Mailing List Archive

linstor Error after adding resource to node
I'm still progressively testing linstor on a group of test servers, and
have now come across a new problem.

I had everything working nicely with 4 nodes: linstor n l showed all 4
working, machines would automatically come back after a reboot, and it
all looked good. I deleted all the storage-pools and re-created them
(same name across all servers), then followed the docs to create my
first resource-definition and volume-definition, and then used
auto-placement on 3 nodes.
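
For reference, the commands I used were roughly along these lines
(reconstructed from memory rather than an exact transcript; the volume
group name and the size are placeholders, and I'm assuming LVM-backed
pools here):

# storage pool with the same name on every node
linstor storage-pool create lvm castle pool_hdd vg_hdd
linstor storage-pool create lvm san5 pool_hdd vg_hdd
linstor storage-pool create lvm san6 pool_hdd vg_hdd
linstor storage-pool create lvm san7 pool_hdd vg_hdd

# first resource: definition, volume definition, then auto-place on 3 nodes
linstor resource-definition create windows-wm
linstor volume-definition create windows-wm 50M
linstor resource create windows-wm --auto-place 3 --storage-pool pool_hdd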

I then decided to get clever and created another 9 test resources with
auto-placement on 4 nodes. All of this worked well, and I got a status
like this:

+--------+------------+-------------+-------+---------+---------------+------------+--------+----------+
| Node   | Resource   | StoragePool | VolNr | MinorNr | DeviceName    | Allocated  | InUse  | State    |
+--------+------------+-------------+-------+---------+---------------+------------+--------+----------+
| castle | testvm1    | pool_hdd    |     0 |    1000 | /dev/drbd1000 | 102.42 MiB | Unused | UpToDate |
| san5   | testvm1    | pool_hdd    |     0 |    1000 | /dev/drbd1000 | 102.42 MiB | Unused | UpToDate |
| san6   | testvm1    | pool_hdd    |     0 |    1000 | /dev/drbd1000 | 102.42 MiB | Unused | UpToDate |
| san7   | testvm1    | pool_hdd    |     0 |    1000 | /dev/drbd1000 | 102.42 MiB | Unused | UpToDate |
| castle | testvm2    | pool_hdd    |     0 |    1002 | /dev/drbd1002 | 102.42 MiB | Unused | UpToDate |
| san5   | testvm2    | pool_hdd    |     0 |    1002 | /dev/drbd1002 | 102.42 MiB | Unused | UpToDate |
| san6   | testvm2    | pool_hdd    |     0 |    1002 | /dev/drbd1002 | 102.42 MiB | Unused | UpToDate |
| san7   | testvm2    | pool_hdd    |     0 |    1002 | /dev/drbd1002 | 102.42 MiB | Unused | UpToDate |
| castle | testvm3    | pool_hdd    |     0 |    1003 | /dev/drbd1003 | 102.42 MiB | Unused | UpToDate |
| san5   | testvm3    | pool_hdd    |     0 |    1003 | /dev/drbd1003 | 102.42 MiB | Unused | UpToDate |
| san6   | testvm3    | pool_hdd    |     0 |    1003 | /dev/drbd1003 | 102.42 MiB | Unused | UpToDate |
| san7   | testvm3    | pool_hdd    |     0 |    1003 | /dev/drbd1003 | 102.42 MiB | Unused | UpToDate |
| castle | testvm4    | pool_hdd    |     0 |    1004 | /dev/drbd1004 | 102.42 MiB | Unused | UpToDate |
| san5   | testvm4    | pool_hdd    |     0 |    1004 | /dev/drbd1004 | 102.42 MiB | Unused | UpToDate |
| san6   | testvm4    | pool_hdd    |     0 |    1004 | /dev/drbd1004 | 102.42 MiB | Unused | UpToDate |
| san7   | testvm4    | pool_hdd    |     0 |    1004 | /dev/drbd1004 | 102.42 MiB | Unused | UpToDate |
| castle | testvm5    | pool_hdd    |     0 |    1005 | /dev/drbd1005 | 102.42 MiB | Unused | UpToDate |
| san5   | testvm5    | pool_hdd    |     0 |    1005 | /dev/drbd1005 | 102.42 MiB | Unused | UpToDate |
| san6   | testvm5    | pool_hdd    |     0 |    1005 | /dev/drbd1005 | 102.42 MiB | Unused | UpToDate |
| san7   | testvm5    | pool_hdd    |     0 |    1005 | /dev/drbd1005 | 102.42 MiB | Unused | UpToDate |
| castle | testvm6    | pool_hdd    |     0 |    1006 | /dev/drbd1006 | 102.42 MiB | Unused | UpToDate |
| san5   | testvm6    | pool_hdd    |     0 |    1006 | /dev/drbd1006 | 102.42 MiB | Unused | UpToDate |
| san6   | testvm6    | pool_hdd    |     0 |    1006 | /dev/drbd1006 | 102.42 MiB | Unused | UpToDate |
| san7   | testvm6    | pool_hdd    |     0 |    1006 | /dev/drbd1006 | 102.42 MiB | Unused | UpToDate |
| castle | testvm7    | pool_hdd    |     0 |    1007 | /dev/drbd1007 | 102.42 MiB | Unused | UpToDate |
| san5   | testvm7    | pool_hdd    |     0 |    1007 | /dev/drbd1007 | 102.42 MiB | Unused | UpToDate |
| san6   | testvm7    | pool_hdd    |     0 |    1007 | /dev/drbd1007 | 102.42 MiB | Unused | UpToDate |
| san7   | testvm7    | pool_hdd    |     0 |    1007 | /dev/drbd1007 | 102.42 MiB | Unused | UpToDate |
| castle | testvm8    | pool_hdd    |     0 |    1008 | /dev/drbd1008 | 102.42 MiB | Unused | UpToDate |
| san5   | testvm8    | pool_hdd    |     0 |    1008 | /dev/drbd1008 | 102.42 MiB | Unused | UpToDate |
| san6   | testvm8    | pool_hdd    |     0 |    1008 | /dev/drbd1008 | 102.42 MiB | Unused | UpToDate |
| san7   | testvm8    | pool_hdd    |     0 |    1008 | /dev/drbd1008 | 102.42 MiB | Unused | UpToDate |
| castle | testvm9    | pool_hdd    |     0 |    1009 | /dev/drbd1009 | 102.42 MiB | Unused | UpToDate |
| san5   | testvm9    | pool_hdd    |     0 |    1009 | /dev/drbd1009 | 102.42 MiB | Unused | UpToDate |
| san6   | testvm9    | pool_hdd    |     0 |    1009 | /dev/drbd1009 | 102.42 MiB | Unused | UpToDate |
| san7   | testvm9    | pool_hdd    |     0 |    1009 | /dev/drbd1009 | 102.42 MiB | Unused | UpToDate |
| castle | windows-wm | pool_hdd    |     0 |    1001 | /dev/drbd1001 |  49.16 MiB | Unused | UpToDate |
| san5   | windows-wm | pool_hdd    |     0 |    1001 | /dev/drbd1001 |  49.16 MiB | Unused | UpToDate |
| san6   | windows-wm | pool_hdd    |     0 |    1001 | /dev/drbd1001 |  49.16 MiB | Unused | UpToDate |
+--------+------------+-------------+-------+---------+---------------+------------+--------+----------+

You can see at the end that windows-wm is only on 3 nodes, while the
other testvm resources are on 4. So I ran this command to add windows-wm
to the 4th node:

linstor resource create san7 windows-wm --storage-pool pool_hdd
SUCCESS:
    Successfully set property key(s): StorPoolName
SUCCESS:
Description:
    New resource 'windows-wm' on node 'san7' registered.
Details:
    Resource 'windows-wm' on node 'san7' UUID is:
50e34cac-6702-45c4-b242-b644415a7644
SUCCESS:
Description:
    Volume with number '0' on resource 'windows-wm' on node 'san7'
successfully registered
Details:
    Volume UUID is: f6529463-03ac-4b31-af69-67b3153de355
SUCCESS:
    Added peer(s) 'san7' to resource 'windows-wm' on 'san6'
SUCCESS:
    Added peer(s) 'san7' to resource 'windows-wm' on 'castle'
SUCCESS:
    Added peer(s) 'san7' to resource 'windows-wm' on 'san5'
SUCCESS:
    Created resource 'windows-wm' on 'san7'
SUCCESS:
Description:
    Resource 'windows-wm' on 'san7' ready
Details:
    Node(s): 'san7', Resource: 'windows-wm'

This looked promising, and seemed to work, so I ran another "linstor
volume list" but got an error:

linstor volume list
ERROR:
Show reports:
    linstor error-reports show 5F733CD9-00000-000004

The contents of that are below:
ERROR REPORT 5F733CD9-00000-000004

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Controller
Version:                            1.9.0
Build ID: 678acd24a8b9b73a735407cd79ca33a5e95eb2e2
Build time:                         2020-09-23T10:27:49+00:00
Error time:                         2020-09-30 00:13:34
Node:                               castle.websitemanagers.com.au

============================================================

Reported error:
===============

Category:                           RuntimeException
Class name:                         NullPointerException
Class canonical name:               java.lang.NullPointerException
Generated at:                       Method 'deviceProviderKindAsString',
Source file 'Json.java', Line #73


Call backtrace:

    Method                                   Native Class:Line number
    deviceProviderKindAsString               N
com.linbit.linstor.api.rest.v1.serializer.Json:73
    apiToVolume                              N
com.linbit.linstor.api.rest.v1.serializer.Json:664
    lambda$apiToResourceWithVolumes$2        N
com.linbit.linstor.api.rest.v1.serializer.Json:477
    accept                                   N
java.util.stream.ReferencePipeline$3$1:195
    forEachRemaining                         N
java.util.ArrayList$ArrayListSpliterator:1655
    copyInto                                 N
java.util.stream.AbstractPipeline:484
    wrapAndCopyInto                          N
java.util.stream.AbstractPipeline:474
    evaluateSequential                       N
java.util.stream.ReduceOps$ReduceOp:913
    evaluate                                 N
java.util.stream.AbstractPipeline:234
    collect                                  N
java.util.stream.ReferencePipeline:578
    apiToResourceWithVolumes                 N
com.linbit.linstor.api.rest.v1.serializer.Json:506
    lambda$listVolumesApiCallRcWithToResponse$1 N
com.linbit.linstor.api.rest.v1.View:112
    accept                                   N
java.util.stream.ReferencePipeline$3$1:195
    forEachRemaining                         N
java.util.ArrayList$ArrayListSpliterator:1655
    copyInto                                 N
java.util.stream.AbstractPipeline:484
    wrapAndCopyInto                          N
java.util.stream.AbstractPipeline:474
    evaluateSequential                       N
java.util.stream.ReduceOps$ReduceOp:913
    evaluate                                 N
java.util.stream.AbstractPipeline:234
    collect                                  N
java.util.stream.ReferencePipeline:578
    lambda$listVolumesApiCallRcWithToResponse$2 N
com.linbit.linstor.api.rest.v1.View:113
    onNext                                   N
reactor.core.publisher.FluxFlatMap$FlatMapMain:378
    onNext                                   N
reactor.core.publisher.FluxContextStart$ContextStartSubscriber:96
    onNext                                   N
reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner:242
    onNext                                   N
reactor.core.publisher.FluxOnAssembly$OnAssemblySubscriber:385
    onNext                                   N
reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner:242
    request                                  N
reactor.core.publisher.Operators$ScalarSubscription:2317
    onSubscribeInner                         N
reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:143
    onSubscribe                              N
reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner:237
    trySubscribeScalarMap                    N
reactor.core.publisher.FluxFlatMap:191
    subscribeOrReturn                        N
reactor.core.publisher.MonoFlatMapMany:49
    subscribe                                N
reactor.core.publisher.Flux:8311
    onNext                                   N
reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:188
    request                                  N
reactor.core.publisher.Operators$ScalarSubscription:2317
    onSubscribe                              N
reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:134
    subscribe                                N
reactor.core.publisher.MonoCurrentContext:35
    subscribe                                N
reactor.core.publisher.Flux:8325
    onNext                                   N
reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:188
    onNext                                   N
reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber:121
    complete                                 N
reactor.core.publisher.Operators$MonoSubscriber:1755
    onComplete                               N
reactor.core.publisher.MonoStreamCollector$StreamCollectorSubscriber:167
    onComplete                               N
reactor.core.publisher.FluxOnAssembly$OnAssemblySubscriber:395
    onComplete                               N
reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner:252
    checkTerminated                          N
reactor.core.publisher.FluxFlatMap$FlatMapMain:838
    drainLoop                                N
reactor.core.publisher.FluxFlatMap$FlatMapMain:600
    innerComplete                            N
reactor.core.publisher.FluxFlatMap$FlatMapMain:909
    onComplete                               N
reactor.core.publisher.FluxFlatMap$FlatMapInner:1013
    onComplete                               N
reactor.core.publisher.FluxMap$MapSubscriber:136
    onComplete                               N
reactor.core.publisher.Operators$MultiSubscriptionSubscriber:1989
    onComplete                               N
reactor.core.publisher.FluxSwitchIfEmpty$SwitchIfEmptySubscriber:78
    complete                                 N
reactor.core.publisher.FluxCreate$BaseSink:438
    drain                                    N
reactor.core.publisher.FluxCreate$BufferAsyncSink:784
    complete                                 N
reactor.core.publisher.FluxCreate$BufferAsyncSink:732
    drainLoop                                N
reactor.core.publisher.FluxCreate$SerializedSink:239
    drain                                    N
reactor.core.publisher.FluxCreate$SerializedSink:205
    complete                                 N
reactor.core.publisher.FluxCreate$SerializedSink:196
    apiCallComplete                          N
com.linbit.linstor.netcom.TcpConnectorPeer:455
    handleComplete                           N
com.linbit.linstor.proto.CommonMessageProcessor:363
    handleDataMessage                        N
com.linbit.linstor.proto.CommonMessageProcessor:287
    doProcessInOrderMessage                  N
com.linbit.linstor.proto.CommonMessageProcessor:235
    lambda$doProcessMessage$3                N
com.linbit.linstor.proto.CommonMessageProcessor:220
    subscribe                                N
reactor.core.publisher.FluxDefer:46
    subscribe                                N
reactor.core.publisher.Flux:8325
    onNext                                   N
reactor.core.publisher.FluxFlatMap$FlatMapMain:418
    drainAsync                               N
reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:414
    drain                                    N
reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:679
    onNext                                   N
reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:243
    drainFused                               N
reactor.core.publisher.UnicastProcessor:286
    drain                                    N
reactor.core.publisher.UnicastProcessor:322
    onNext                                   N
reactor.core.publisher.UnicastProcessor:401
    next                                     N
reactor.core.publisher.FluxCreate$IgnoreSink:618
    next                                     N
reactor.core.publisher.FluxCreate$SerializedSink:153
    processInOrder                           N
com.linbit.linstor.netcom.TcpConnectorPeer:373
    doProcessMessage                         N
com.linbit.linstor.proto.CommonMessageProcessor:218
    lambda$processMessage$2                  N
com.linbit.linstor.proto.CommonMessageProcessor:164
    onNext                                   N
reactor.core.publisher.FluxPeek$PeekSubscriber:177
    runAsync                                 N
reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:439
    run                                      N
reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:526
    call                                     N
reactor.core.scheduler.WorkerTask:84
    call                                     N
reactor.core.scheduler.WorkerTask:37
    run                                      N
java.util.concurrent.FutureTask:264
    run                                      N
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask:304
    runWorker                                N
java.util.concurrent.ThreadPoolExecutor:1128
    run                                      N
java.util.concurrent.ThreadPoolExecutor$Worker:628
    run                                      N java.lang.Thread:834


END OF ERROR REPORT.

Since it looks relevant: error reports 1, 2 and 3 are all similar, for
nodes castle, san5 and san6 (note san7 was the 4th/newest node, the one
I tried to add the resource to). Error report 1 is:
ERROR REPORT 5F733CD9-00000-000001

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Controller
Version:                            1.9.0
Build ID: 678acd24a8b9b73a735407cd79ca33a5e95eb2e2
Build time:                         2020-09-23T10:27:49+00:00
Error time:                         2020-09-30 00:13:01
Node:                               castle.websitemanagers.com.au
Peer:                               Node: 'san6'

============================================================

Reported error:
===============

Category:                           Error
Class name:                         ImplementationError
Class canonical name:               com.linbit.ImplementationError
Generated at:                       Method 'createStorageRscData',
Source file 'CtrlRscLayerDataMerger.java', Line #214

Error message:                      Received unknown storage resource
from satellite

Asynchronous stage backtrace:

    Error has been observed at the following site(s):
        |_ checkpoint ⇢ Execute single-stage API NotifyRscApplied
        |_ checkpoint ⇢ Fallback error handling wrapper
    Stack trace:

Call backtrace:

    Method                                   Native Class:Line number
    createStorageRscData                     N
com.linbit.linstor.core.apicallhandler.CtrlRscLayerDataMerger:214

Suppressed exception 1 of 1:
===============
Category:                           RuntimeException
Class name:                         OnAssemblyException
Class canonical name:
reactor.core.publisher.FluxOnAssembly.OnAssemblyException
Generated at:                       Method 'createStorageRscData',
Source file 'CtrlRscLayerDataMerger.java', Line #214

Error message:
Error has been observed at the following site(s):
    |_ checkpoint ⇢ Execute single-stage API NotifyRscApplied
    |_ checkpoint ⇢ Fallback error handling wrapper
Stack trace:

Call backtrace:

    Method                                   Native Class:Line number
    createStorageRscData                     N
com.linbit.linstor.core.apicallhandler.CtrlRscLayerDataMerger:214
    createStorageRscData                     N
com.linbit.linstor.core.apicallhandler.CtrlRscLayerDataMerger:65
    mergeStorageRscData                      N
com.linbit.linstor.core.apicallhandler.AbsLayerRscDataMerger:285
    merge                                    N
com.linbit.linstor.core.apicallhandler.AbsLayerRscDataMerger:138
    merge                                    N
com.linbit.linstor.core.apicallhandler.AbsLayerRscDataMerger:148
    mergeLayerData                           N
com.linbit.linstor.core.apicallhandler.AbsLayerRscDataMerger:93
    mergeLayerData                           N
com.linbit.linstor.core.apicallhandler.CtrlRscLayerDataMerger:82
    updateVolume                             N
com.linbit.linstor.core.apicallhandler.controller.internal.RscInternalCallHandler:228
    execute                                  N
com.linbit.linstor.api.protobuf.internal.NotifyResourceApplied:45
    executeNonReactive                       N
com.linbit.linstor.proto.CommonMessageProcessor:525
    lambda$execute$13                        N
com.linbit.linstor.proto.CommonMessageProcessor:500
    doInScope                                N
com.linbit.linstor.core.apicallhandler.ScopeRunner:147
    lambda$fluxInScope$0                     N
com.linbit.linstor.core.apicallhandler.ScopeRunner:75
    call                                     N
reactor.core.publisher.MonoCallable:91
    trySubscribeScalarMap                    N
reactor.core.publisher.FluxFlatMap:126
    subscribeOrReturn                        N
reactor.core.publisher.MonoFlatMapMany:49
    subscribe                                N
reactor.core.publisher.Flux:8311
    onNext                                   N
reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:188
    request                                  N
reactor.core.publisher.Operators$ScalarSubscription:2317
    onSubscribe                              N
reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:134
    subscribe                                N
reactor.core.publisher.MonoCurrentContext:35
    subscribe                                N
reactor.core.publisher.InternalFluxOperator:62
    subscribe                                N
reactor.core.publisher.FluxDefer:54
    subscribe                                N
reactor.core.publisher.Flux:8325
    onNext                                   N
reactor.core.publisher.FluxFlatMap$FlatMapMain:418
    drainAsync                               N
reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:414
    drain                                    N
reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:679
    onNext                                   N
reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:243
    drainFused                               N
reactor.core.publisher.UnicastProcessor:286
    drain                                    N
reactor.core.publisher.UnicastProcessor:322
    onNext                                   N
reactor.core.publisher.UnicastProcessor:401
    next                                     N
reactor.core.publisher.FluxCreate$IgnoreSink:618
    next                                     N
reactor.core.publisher.FluxCreate$SerializedSink:153
    processInOrder                           N
com.linbit.linstor.netcom.TcpConnectorPeer:373
    doProcessMessage                         N
com.linbit.linstor.proto.CommonMessageProcessor:218
    lambda$processMessage$2                  N
com.linbit.linstor.proto.CommonMessageProcessor:164
    onNext                                   N
reactor.core.publisher.FluxPeek$PeekSubscriber:177
    runAsync                                 N
reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:439
    run                                      N
reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:526
    call                                     N
reactor.core.scheduler.WorkerTask:84
    call                                     N
reactor.core.scheduler.WorkerTask:37
    run                                      N
java.util.concurrent.FutureTask:264
    run                                      N
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask:304
    runWorker                                N
java.util.concurrent.ThreadPoolExecutor:1128
    run                                      N
java.util.concurrent.ThreadPoolExecutor$Worker:628
    run                                      N java.lang.Thread:834


END OF ERROR REPORT.


So, questions:

1) Why did I end up in this state? I assume something was configured on
castle/san5/san6 but not on san7.

2) How can I fix it?

Thanks,
Adam

_______________________________________________
Star us on GITHUB: https://github.com/LINBIT
drbd-user mailing list
drbd-user@lists.linbit.com
https://lists.linbit.com/mailman/listinfo/drbd-user
Re: linstor Error after adding resource to node
>> Error message: Received unknown storage resource
from satellite

According to the error above, there seems to be an issue with the "storage
resource" on san7.
Have you checked if the storage pool (pool_hdd) has enough space and is
healthy?
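
For example (just the standard listing commands, plus the usual LVM tools
on san7 if the pool is LVM-backed):

linstor node list
linstor storage-pool list
# and directly on san7, for an LVM-backed pool:
vgs
lvs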

On Tue, 29 Sep 2020 at 16:15, Adam Goryachev <
mailinglists@websitemanagers.com.au> wrote:

> [original message quoted in full; trimmed]
Re: linstor Error after adding resource to node
Hi,

I tried to recreate this issue, but without success. My steps, with the
rough commands sketched below, were:

4 Node setup, all LVM
First create a resource with --auto-place 3
Create 9 other resources with --auto-place 4
Create the first resource on the 4th (missing) node
Check "linstor volume list"

That means, there has to be something else in your setup.
What else did you do? I see that your "first" resource "windows-wm" was
more like the second resource, as it got the minor-number 1001, instead of
1000. That minor-number 1000 was later reused by "testvm1". However, was
something broken with the "original" resource using minor-number 1000?

Error report 5F733CD9-00000-000004 is a NullPointerException, but this is
most likely just a side-effect of the original issue.

> Since it looks relevant, error reports 1, 2 and 3 are all similar for
nodes castle, san5 and san6

What about error report 0? Not relevant for this issue?

> 1) Why did I end up in this state? I assume something was configured on
castle/san5/san6 but not on san7.

Not sure... If something were broken on san7, you should also have
gotten an error report from a satellite. The ones you showed here were
all created by the controller (error IDs of the form XXX-00000-YYY are
always controller errors; satellite errors have some other
"random-looking" number instead of the -00000- part).

> 2) How can I fix it?

If I cannot recreate it, there is not much I can do. You could of course
try restarting the controller; that will reload the data from the
database, which might fix things... I would still be curious what caused
all of this...
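
On a package-based install that would be something like this (assuming
systemd; adjust if your controller runs in a container):

systemctl restart linstor-controller
# then check that the satellites reconnect
linstor node list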

Best regards,
Gabor
Re: linstor Error after adding resource to node
On 30/9/20 16:49, Gábor Hernádi wrote:
> Hi,
>
> I tried to recreate this issue, but without success.
>
> 4 Node setup, all LVM
> First create a resource with --auto-place 3,
> Create 9 other resources with --auto-place 4
> Create the first resource on the 4th (missing) node
> Check "linstor volume list"
>
> That means, there has to be something else in your setup.
> What else did you do? I see that your "first" resource "windows-wm"
> was more like the second resource, as it got the minor-number 1001,
> instead of 1000. That minor-number 1000 was later reused by "testvm1".
> However, was something broken with the "original" resource using
> minor-number 1000?
>
Unfortunately, yes, a whole bunch of things have been done on the first
three nodes. I've been slowly messing around trying to get everything
working over the last few months. There was another "testvm3" created
earlier, which I deleted before starting again with the further testing...


> Error report 5F733CD9-00000-000004 is a NullPointerException, but this
> is most likely just a side-effect of the original issue.
>
> > Since it looks relevant, error reports 1, 2 and 3 are all similar
> for nodes castle, san5 and san6
>
> What about error report 0? Not relevant for this issue?
>
Oops, I just didn't realise that there was a 0 report... I've included it
here now:

ERROR REPORT 5F733CD9-00000-000000

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Controller
Version:                            1.9.0
Build ID: 678acd24a8b9b73a735407cd79ca33a5e95eb2e2
Build time:                         2020-09-23T10:27:49+00:00
Error time:                         2020-09-30 00:12:19
Node:                               castle.websitemanagers.com.au
Peer:                               RestClient(192.168.5.207;
'PythonLinstor/1.4.0 (API1.0.4)')

============================================================

Reported error:
===============

Description:
    Dependency not found

Category:                           LinStorException
Class name:                         LinStorException
Class canonical name: com.linbit.linstor.LinStorException
Generated at:                       Method 'checkStorPoolLoaded', Source
file 'CtrlStorPoolResolveHelper.java', Line #225

Error message:                      Dependency not found

Error context:
    The storage pool 'DfltStorPool' for resource 'windows-wm' for
volume number '0' is not deployed on node 'san7'.

Call backtrace:

    Method                                   Native Class:Line number
    checkStorPoolLoaded                      N
com.linbit.linstor.CtrlStorPoolResolveHelper:225
    resolveStorPool                          N
com.linbit.linstor.CtrlStorPoolResolveHelper:149
    resolveStorPool                          N
com.linbit.linstor.CtrlStorPoolResolveHelper:65
    createVolumeResolvingStorPool            N
com.linbit.linstor.core.apicallhandler.controller.CtrlVlmCrtApiHelper:72
    createResourceDb                         N
com.linbit.linstor.core.apicallhandler.controller.CtrlRscCrtApiHelper:396
    createResourceInTransaction              N
com.linbit.linstor.core.apicallhandler.controller.CtrlRscCrtApiCallHandler:171
    lambda$createResource$2                  N
com.linbit.linstor.core.apicallhandler.controller.CtrlRscCrtApiCallHandler:143
    doInScope                                N
com.linbit.linstor.core.apicallhandler.ScopeRunner:147
    lambda$fluxInScope$0                     N
com.linbit.linstor.core.apicallhandler.ScopeRunner:75
    call                                     N
reactor.core.publisher.MonoCallable:91
    trySubscribeScalarMap                    N
reactor.core.publisher.FluxFlatMap:126
    subscribeOrReturn                        N
reactor.core.publisher.MonoFlatMapMany:49
    subscribe                                N
reactor.core.publisher.Flux:8311
    onNext                                   N
reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:188
    request                                  N
reactor.core.publisher.Operators$ScalarSubscription:2317
    onSubscribe                              N
reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:134
    subscribe                                N
reactor.core.publisher.MonoCurrentContext:35
    subscribe                                N
reactor.core.publisher.Flux:8325
    onNext                                   N
reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:188
    request                                  N
reactor.core.publisher.Operators$ScalarSubscription:2317
    onSubscribe                              N
reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:134
    subscribe                                N
reactor.core.publisher.MonoCurrentContext:35
    subscribe                                N
reactor.core.publisher.Flux:8325
    onNext                                   N
reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:188
    onNext                                   N
reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber:121
    complete                                 N
reactor.core.publisher.Operators$MonoSubscriber:1755
    onComplete                               N
reactor.core.publisher.MonoCollect$CollectSubscriber:152
    onComplete                               N
reactor.core.publisher.FluxOnAssembly$OnAssemblySubscriber:395
    onComplete                               N
reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner:252
    checkTerminated                          N
reactor.core.publisher.FluxFlatMap$FlatMapMain:838
    drainLoop                                N
reactor.core.publisher.FluxFlatMap$FlatMapMain:600
    drain                                    N
reactor.core.publisher.FluxFlatMap$FlatMapMain:580
    onComplete                               N
reactor.core.publisher.FluxFlatMap$FlatMapMain:457
    checkTerminated                          N
reactor.core.publisher.FluxFlatMap$FlatMapMain:838
    drainLoop                                N
reactor.core.publisher.FluxFlatMap$FlatMapMain:600
    innerComplete                            N
reactor.core.publisher.FluxFlatMap$FlatMapMain:909
    onComplete                               N
reactor.core.publisher.FluxFlatMap$FlatMapInner:1013
    onComplete                               N
reactor.core.publisher.FluxMap$MapSubscriber:136
    onComplete                               N
reactor.core.publisher.Operators$MultiSubscriptionSubscriber:1989
    onComplete                               N
reactor.core.publisher.FluxSwitchIfEmpty$SwitchIfEmptySubscriber:78
    complete                                 N
reactor.core.publisher.FluxCreate$BaseSink:438
    drain                                    N
reactor.core.publisher.FluxCreate$BufferAsyncSink:784
    complete                                 N
reactor.core.publisher.FluxCreate$BufferAsyncSink:732
    drainLoop                                N
reactor.core.publisher.FluxCreate$SerializedSink:239
    drain                                    N
reactor.core.publisher.FluxCreate$SerializedSink:205
    complete                                 N
reactor.core.publisher.FluxCreate$SerializedSink:196
    apiCallComplete                          N
com.linbit.linstor.netcom.TcpConnectorPeer:455
    handleComplete                           N
com.linbit.linstor.proto.CommonMessageProcessor:363
    handleDataMessage                        N
com.linbit.linstor.proto.CommonMessageProcessor:287
    doProcessInOrderMessage                  N
com.linbit.linstor.proto.CommonMessageProcessor:235
    lambda$doProcessMessage$3                N
com.linbit.linstor.proto.CommonMessageProcessor:220
    subscribe                                N
reactor.core.publisher.FluxDefer:46
    subscribe                                N
reactor.core.publisher.Flux:8325
    onNext                                   N
reactor.core.publisher.FluxFlatMap$FlatMapMain:418
    drainAsync                               N
reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:414
    drain                                    N
reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:679
    onNext                                   N
reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:243
    drainFused                               N
reactor.core.publisher.UnicastProcessor:286
    drain                                    N
reactor.core.publisher.UnicastProcessor:322
    onNext                                   N
reactor.core.publisher.UnicastProcessor:401
    next                                     N
reactor.core.publisher.FluxCreate$IgnoreSink:618
    next                                     N
reactor.core.publisher.FluxCreate$SerializedSink:153
    processInOrder                           N
com.linbit.linstor.netcom.TcpConnectorPeer:373
    doProcessMessage                         N
com.linbit.linstor.proto.CommonMessageProcessor:218
    lambda$processMessage$2                  N
com.linbit.linstor.proto.CommonMessageProcessor:164
    onNext                                   N
reactor.core.publisher.FluxPeek$PeekSubscriber:177
    runAsync                                 N
reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:439
    run                                      N
reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:526
    call                                     N
reactor.core.scheduler.WorkerTask:84
    call                                     N
reactor.core.scheduler.WorkerTask:37
    run                                      N
java.util.concurrent.FutureTask:264
    run                                      N
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask:304
    runWorker                                N
java.util.concurrent.ThreadPoolExecutor:1128
    run                                      N
java.util.concurrent.ThreadPoolExecutor$Worker:628
    run                                      N java.lang.Thread:834


END OF ERROR REPORT.

> > 1) Why did I end up in this state? I assume something was configured
> on castle/san5/san6 but not on san7.
>
> Not sure... If something were broken on san7, you should also have
> gotten an error report from a satellite. The ones you showed here were
> all created by the controller (error IDs of the form XXX-00000-YYY are
> always controller errors; satellite errors have some other
> "random-looking" number instead of the -00000- part).
>
> > 2) How can I fix it?
>
> If I cannot recreate it, there is not much I can do. You could of
> course try restarting the controller; that will reload the data from
> the database, which might fix things... I would still be curious what
> caused all of this...


Sure, I will see if I can work it out. From the 0 error, it looks like I
created some configuration under the name DfltStorPool, and this was
probably not replicated to san7 (because san7 didn't exist back then).
I'm not sure whether I would expect it to be copied to the node
automatically if/when it is required, or whether I should just get an
error saying the resource can't be deployed due to a missing dependency,
but I suspect it shouldn't crash the way it does at the moment...
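
For reference, this is the kind of check I mean (a sketch only, not what
I actually ran; the volume group name is a placeholder):

# does san7 have a storage pool named DfltStorPool at all?
linstor storage-pool list

# if not, it could presumably be created the same way as pool_hdd,
# e.g. for an LVM-backed pool:
linstor storage-pool create lvm san7 DfltStorPool vg_hdd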

OK, so I did restart the controller, and now linstor volume list returns
this:

+--------+------------+-------------+-------+---------+---------------+------------+--------+----------+
| Node   | Resource   | StoragePool | VolNr | MinorNr | DeviceName    | Allocated  | InUse  | State    |
+--------+------------+-------------+-------+---------+---------------+------------+--------+----------+
| castle | testvm1    | pool_hdd    |     0 |    1000 | /dev/drbd1000 | 102.42 MiB | Unused | UpToDate |
| san5   | testvm1    | pool_hdd    |     0 |    1000 | /dev/drbd1000 | 102.42 MiB | Unused | UpToDate |
| san6   | testvm1    | pool_hdd    |     0 |    1000 | /dev/drbd1000 | 102.42 MiB | Unused | UpToDate |
| san7   | testvm1    | pool_hdd    |     0 |    1000 | /dev/drbd1000 | 102.42 MiB | Unused | UpToDate |
| castle | testvm2    | pool_hdd    |     0 |    1002 | /dev/drbd1002 | 102.42 MiB | Unused | UpToDate |
| san5   | testvm2    | pool_hdd    |     0 |    1002 | /dev/drbd1002 | 102.42 MiB | Unused | UpToDate |
| san6   | testvm2    | pool_hdd    |     0 |    1002 | /dev/drbd1002 | 102.42 MiB | Unused | UpToDate |
| san7   | testvm2    | pool_hdd    |     0 |    1002 | /dev/drbd1002 | 102.42 MiB | Unused | UpToDate |
| castle | testvm3    | pool_hdd    |     0 |    1003 | /dev/drbd1003 | 102.42 MiB | Unused | UpToDate |
| san5   | testvm3    | pool_hdd    |     0 |    1003 | /dev/drbd1003 | 102.42 MiB | Unused | UpToDate |
| san6   | testvm3    | pool_hdd    |     0 |    1003 | /dev/drbd1003 | 102.42 MiB | Unused | UpToDate |
| san7   | testvm3    | pool_hdd    |     0 |    1003 | /dev/drbd1003 | 102.42 MiB | Unused | UpToDate |
| castle | testvm4    | pool_hdd    |     0 |    1004 | /dev/drbd1004 | 102.42 MiB | Unused | UpToDate |
| san5   | testvm4    | pool_hdd    |     0 |    1004 | /dev/drbd1004 | 102.42 MiB | Unused | UpToDate |
| san6   | testvm4    | pool_hdd    |     0 |    1004 | /dev/drbd1004 | 102.42 MiB | Unused | UpToDate |
| san7   | testvm4    | pool_hdd    |     0 |    1004 | /dev/drbd1004 | 102.42 MiB | Unused | UpToDate |
| castle | testvm5    | pool_hdd    |     0 |    1005 | /dev/drbd1005 | 102.42 MiB | Unused | UpToDate |
| san5   | testvm5    | pool_hdd    |     0 |    1005 | /dev/drbd1005 | 102.42 MiB | Unused | UpToDate |
| san6   | testvm5    | pool_hdd    |     0 |    1005 | /dev/drbd1005 | 102.42 MiB | Unused | UpToDate |
| san7   | testvm5    | pool_hdd    |     0 |    1005 | /dev/drbd1005 | 102.42 MiB | Unused | UpToDate |
| castle | testvm6    | pool_hdd    |     0 |    1006 | /dev/drbd1006 | 102.42 MiB | Unused | UpToDate |
| san5   | testvm6    | pool_hdd    |     0 |    1006 | /dev/drbd1006 | 102.42 MiB | Unused | UpToDate |
| san6   | testvm6    | pool_hdd    |     0 |    1006 | /dev/drbd1006 | 102.42 MiB | Unused | UpToDate |
| san7   | testvm6    | pool_hdd    |     0 |    1006 | /dev/drbd1006 | 102.42 MiB | Unused | UpToDate |
| castle | testvm7    | pool_hdd    |     0 |    1007 | /dev/drbd1007 | 102.42 MiB | Unused | UpToDate |
| san5   | testvm7    | pool_hdd    |     0 |    1007 | /dev/drbd1007 | 102.42 MiB | Unused | UpToDate |
| san6   | testvm7    | pool_hdd    |     0 |    1007 | /dev/drbd1007 | 102.42 MiB | Unused | UpToDate |
| san7   | testvm7    | pool_hdd    |     0 |    1007 | /dev/drbd1007 | 102.42 MiB | Unused | UpToDate |
| castle | testvm8    | pool_hdd    |     0 |    1008 | /dev/drbd1008 | 102.42 MiB | Unused | UpToDate |
| san5   | testvm8    | pool_hdd    |     0 |    1008 | /dev/drbd1008 | 102.42 MiB | Unused | UpToDate |
| san6   | testvm8    | pool_hdd    |     0 |    1008 | /dev/drbd1008 | 102.42 MiB | Unused | UpToDate |
| san7   | testvm8    | pool_hdd    |     0 |    1008 | /dev/drbd1008 | 102.42 MiB | Unused | UpToDate |
| castle | testvm9    | pool_hdd    |     0 |    1009 | /dev/drbd1009 | 102.42 MiB | Unused | UpToDate |
| san5   | testvm9    | pool_hdd    |     0 |    1009 | /dev/drbd1009 | 102.42 MiB | Unused | UpToDate |
| san6   | testvm9    | pool_hdd    |     0 |    1009 | /dev/drbd1009 | 102.42 MiB | Unused | UpToDate |
| san7   | testvm9    | pool_hdd    |     0 |    1009 | /dev/drbd1009 | 102.42 MiB | Unused | UpToDate |
| castle | windows-wm | pool_hdd    |     0 |    1001 | /dev/drbd1001 |  49.16 MiB | Unused | UpToDate |
| san5   | windows-wm | pool_hdd    |     0 |    1001 | /dev/drbd1001 |  49.16 MiB | Unused | UpToDate |
| san6   | windows-wm | pool_hdd    |     0 |    1001 | /dev/drbd1001 |  49.16 MiB | Unused | UpToDate |
| san7   | windows-wm | pool_hdd    |     0 |    1001 | /dev/drbd1001 |  49.16 MiB | Unused | UpToDate |
+--------+------------+-------------+-------+---------+---------------+------------+--------+----------+

So it would appear that it did actually deploy windows-wm to san7, and it
looks like it's all working again. I'm still rather unsure about my
process, though: with all of my testing I might end up with an unstable
system due to old config/testing bits left over.

To completely "reset" linstor, I've so far found the following configs:

/etc/linstor
/etc/drbd.d
/var/lib/linstor
/var/lib/linstor.d
/var/log/linstor

Plus, I assume, whatever storage has been configured for the pools. Is
there anything else that should be wiped to ensure I am starting with a
clean slate? I'd rather not format the whole system and re-install...
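
For what it's worth, the kind of reset I have in mind is roughly this
(a sketch only, based on the paths above and run on every node; the LV/VG
names are placeholders and I'm not sure the list is complete):

# stop the LINSTOR services first
systemctl stop linstor-controller   # on the controller node
systemctl stop linstor-satellite    # on every node

# remove the config/state directories found so far
rm -rf /etc/linstor /var/lib/linstor /var/lib/linstor.d /var/log/linstor
rm -f /etc/drbd.d/*.res             # only if test resource files were put there

# remove the backing volumes that were created for the test resources
lvremove vg_hdd/testvm1_00000       # placeholder names, repeat per volume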

Thanks,
Adam

