Mailing List Archive

linstor issues
I've tried to follow the limited documentation on installing DRBD 9 and
linstor, and sort of managed to get things working. I have three nodes
(castle, san5 and san6). I've rebuilt the various Ubuntu packages under
Debian, and installed them on Debian buster on all three machines:

drbd-dkms_9.0.22-1ppa1~bionic1_all.deb
drbd-utils_9.13.1-1ppa1~bionic1_amd64.deb
linstor-controller_1.7.1-1ppa1~bionic1_all.deb
linstor-satellite_1.7.1-1ppa1~bionic1_all.deb
linstor-common_1.7.1-1ppa1~bionic1_all.deb
python-linstor_1.1.1-1ppa1~bionic1_all.deb
linstor-client_1.1.1-1ppa1~bionic1_all.deb

After adding the three nodes I had this output:
linstor node list
+--------+-----------+-----------------------+--------+
| Node   | NodeType  | Addresses             | State  |
+--------+-----------+-----------------------+--------+
| castle | SATELLITE | <IP>.204:3366 (PLAIN) | Online |
| san5   | SATELLITE | <IP>.205:3366 (PLAIN) | Online |
| san6   | SATELLITE | <IP>.206:3366 (PLAIN) | Online |
+--------+-----------+-----------------------+--------+

Then I added some storage pools:
linstor storage-pool list
+----------------------+--------+----------+----------+--------------+---------------+--------------+-------+
| StoragePool          | Node   | Driver   | PoolName | FreeCapacity | TotalCapacity | CanSnapshots | State |
+----------------------+--------+----------+----------+--------------+---------------+--------------+-------+
| DfltDisklessStorPool | castle | DISKLESS |          |              |               | False        | Ok    |
| DfltDisklessStorPool | san5   | DISKLESS |          |              |               | False        | Ok    |
| DfltDisklessStorPool | san6   | DISKLESS |          |              |               | False        | Ok    |
| pool                 | castle | LVM      | vg_hdd   |     3.44 TiB |      3.44 TiB | False        | Ok    |
| pool                 | san5   | LVM      | vg_hdd   |     4.36 TiB |      4.36 TiB | False        | Ok    |
| pool                 | san6   | LVM      | vg_ssd   |     1.75 TiB |      1.75 TiB | False        | Ok    |
+----------------------+--------+----------+----------+--------------+---------------+--------------+-------+

Again, everything was looking pretty good.

So, I tried to create a resource, and then I got this:

linstor resource list
+--------------+--------+------+--------+-------------------------+----------+
| ResourceName | Node   | Port | Usage  | Conns                   |    State |
+--------------+--------+------+--------+-------------------------+----------+
| testvm1      | castle | 7000 |        |                         |  Unknown |
| testvm1      | san5   | 7000 |        |                         |  Unknown |
| testvm1      | san6   | 7000 | Unused | Connecting(san5,castle) | UpToDate |
+--------------+--------+------+--------+-------------------------+----------+

There hasn't been any change in over 24 hours, so I'm guessing there is
something stuck/not working, but I don't seem to have many clues on what
it might be.

I've checked through the docs at:
https://www.linbit.com/drbd-user-guide/linstor-guide-1_0-en/ and found
these two commands in section 2.7 Checking the state of your cluster:

# linstor node list
# linstor storage-pool list --groupby Size

However, the second command produces a usage error (documentation bug
perhaps). Editing the command to something valid produces:
linstor storage-pool list --groupby Node
+----------------------+--------+----------+----------+--------------+---------------+--------------+---------+
| StoragePool          | Node   | Driver   | PoolName | FreeCapacity | TotalCapacity | CanSnapshots | State   |
+----------------------+--------+----------+----------+--------------+---------------+--------------+---------+
| DfltDisklessStorPool | castle | DISKLESS |          |              |               | False        | Ok      |
| pool                 | castle | LVM      | vg_hdd   |     3.44 TiB |      3.44 TiB | False        | Ok      |
| DfltDisklessStorPool | san5   | DISKLESS |          |              |               | False        | Warning |
| pool                 | san5   | LVM      | vg_hdd   |              |               | False        | Warning |
| DfltDisklessStorPool | san6   | DISKLESS |          |              |               | False        | Ok      |
| pool                 | san6   | LVM      | vg_ssd   |     1.26 TiB |      1.75 TiB | False        | Ok      |
+----------------------+--------+----------+----------+--------------+---------------+--------------+---------+
WARNING:
Description:
    No active connection to satellite 'san5'
Details:
    The controller is trying to (re-) establish a connection to the
satellite. The controller stored the changes and as soon the satellite
is connected, it will receive this update.

Note: after waiting approximately 20 hours, san5 was shut down cleanly,
so it is currently offline.

dmesg on san6 includes this:
[95078.272184] drbd testvm1: Starting worker thread (from drbdsetup [2398])
[95078.285272] drbd testvm1 castle: Starting sender thread (from drbdsetup [2402])
[95078.290733] drbd testvm1 san5: Starting sender thread (from drbdsetup [2406])
[95078.310399] drbd testvm1/0 drbd1000: meta-data IO uses: blk-bio
[95078.310500] drbd testvm1/0 drbd1000: rs_discard_granularity feature disabled
[95078.310767] drbd testvm1/0 drbd1000: disk( Diskless -> Attaching )
[95078.310775] drbd testvm1/0 drbd1000: Maximum number of peer devices = 7
[95078.310864] drbd testvm1: Method to ensure write ordering: flush
[95078.310867] drbd testvm1/0 drbd1000: Adjusting my ra_pages to backing device's (32 -> 1024)
[95078.310870] drbd testvm1/0 drbd1000: drbd_bm_resize called with capacity == 1048581248
[95078.418753] drbd testvm1/0 drbd1000: resync bitmap: bits=131072656 words=14336077 pages=28001
[95078.418757] drbd testvm1/0 drbd1000: size = 500 GB (524290624 KB)
[95078.593417] drbd testvm1/0 drbd1000: recounting of set bits took additional 64ms
[95078.593429] drbd testvm1/0 drbd1000: disk( Attaching -> Inconsistent ) quorum( no -> yes )
[95078.593431] drbd testvm1/0 drbd1000: attached to current UUID: 0000000000000004
[95078.595412] drbd testvm1 castle: conn( StandAlone -> Unconnected )
[95078.596649] drbd testvm1 san5: conn( StandAlone -> Unconnected )
[95078.599430] drbd testvm1 castle: Starting receiver thread (from drbd_w_testvm1 [2399])
[95078.599742] drbd testvm1 san5: Starting receiver thread (from drbd_w_testvm1 [2399])
[95078.599813] drbd testvm1 castle: conn( Unconnected -> Connecting )
[95078.604454] drbd testvm1 san5: conn( Unconnected -> Connecting )
[95079.113391] drbd testvm1/0 drbd1000: rs_discard_granularity feature disabled
[95079.146175] drbd testvm1: Preparing cluster-wide state change 1272763172 (2->-1 7683/4609)
[95079.146178] drbd testvm1: Committing cluster-wide state change 1272763172 (0ms)
[95079.146184] drbd testvm1: role( Secondary -> Primary )
[95079.146186] drbd testvm1/0 drbd1000: disk( Inconsistent -> UpToDate )
[95079.146256] drbd testvm1/0 drbd1000: size = 500 GB (524290624 KB)
[95079.152264] drbd testvm1: Forced to consider local data as UpToDate!
[95079.156608] drbd testvm1/0 drbd1000: new current UUID: 60E1FC2F9926E84B weak: FFFFFFFFFFFFFFFB
[95079.159415] drbd testvm1: role( Primary -> Secondary )


----- a few weeks later...

I wrote the above intending to have another go at this later. I now have
san5 back online and have rebooted both castle and san6; the status on
all three is now:
linstor n l
+--------+-----------+----------------------------+---------+
| Node   | NodeType  | Addresses                  | State   |
+--------+-----------+----------------------------+---------+
| castle | SATELLITE | 192.168.5.204:3366 (PLAIN) | Unknown |
| san5   | SATELLITE | 192.168.5.205:3366 (PLAIN) | Unknown |
| san6   | SATELLITE | 192.168.5.206:3366 (PLAIN) | Unknown |
+--------+-----------+----------------------------+---------+

Is there any other documentation on what to do when things go wrong? A
checklist to find where the problem might be? With the old DRBD 8.4
/proc/drbd or dmesg seemed to be the two main sources of information,
but now I seem quite out of my depth. Any clues or suggestions on things
to check, additional information to provide/etc would be greatly
appreciated.

Re: linstor issues
Hi,

apparently something is quite broken... maybe it's somehow your setup or
environment, I am not sure...

> linstor resource list
> +--------------+--------+------+--------+-------------------------+----------+
> | ResourceName | Node   | Port | Usage  | Conns                   |    State |
> +--------------+--------+------+--------+-------------------------+----------+
> | testvm1      | castle | 7000 |        |                         |  Unknown |
> | testvm1      | san5   | 7000 |        |                         |  Unknown |
> | testvm1      | san6   | 7000 | Unused | Connecting(san5,castle) | UpToDate |
> +--------------+--------+------+--------+-------------------------+----------+
>

This looks like some kind of network issue.

> # linstor storage-pool list --groupby Size
>
> However, the second command produces a usage error (documentation bug
> perhaps).
>

Thanks for reporting, we will look into this.


> WARNING:
> Description:
> No active connection to satellite 'san5'
> Details:
> The controller is trying to (re-) establish a connection to the
> satellite. The controller stored the changes and as soon the satellite is
> connected, it will receive this update.
>

So Linstor obviously has no connection to satellite 'san5'.


> [95078.599813] drbd testvm1 castle: conn( Unconnected -> Connecting )
> [95078.604454] drbd testvm1 san5: conn( Unconnected -> Connecting )
>

... and DRBD apparently also has trouble connecting...

> linstor n l
> +--------+-----------+----------------------------+---------+
> | Node   | NodeType  | Addresses                  | State   |
> +--------+-----------+----------------------------+---------+
> | castle | SATELLITE | 192.168.5.204:3366 (PLAIN) | Unknown |
> | san5   | SATELLITE | 192.168.5.205:3366 (PLAIN) | Unknown |
> | san6   | SATELLITE | 192.168.5.206:3366 (PLAIN) | Unknown |
> +--------+-----------+----------------------------+---------+
>

Now this is really strange. I will spare you the details, but I
assume you have triggered some bad exception in Linstor which somehow
killed a necessary thread.
You should check
linstor err list
and see if you can find some related error reports.
Also, restarting the controller might help you here.
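
To spot this condition without reading the table by eye, the node list
output can be checked for nodes whose State is anything other than
"Online". The following is only a sketch in Python: the function names are
made up for illustration, and the column separator is a parameter because
it depends on how the client renders the table.

```python
def parse_table(text, sep="|"):
    """Parse a bordered CLI table into a list of dicts keyed by header cell."""
    rows = []
    for raw in text.splitlines():
        line = raw.strip()
        # Header and data rows start and end with the separator.
        if len(line) < 2 * len(sep):
            continue
        if not (line.startswith(sep) and line.endswith(sep)):
            continue
        # Border rows contain no letters or digits; skip them.
        if not any(ch.isalnum() for ch in line):
            continue
        inner = line[len(sep):-len(sep)]
        rows.append([cell.strip() for cell in inner.split(sep)])
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


def nodes_not_online(text, sep="|"):
    """Return the names of nodes whose State column is not 'Online'."""
    return [r["Node"] for r in parse_table(text, sep) if r.get("State") != "Online"]
```

If every node comes back as not online at once, as in the output quoted
above, that points at the controller rather than at three separate
satellites.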

--
Best regards,
Gabor Hernadi

Re: linstor issues
On 6/23/20 4:53 AM, Adam Goryachev wrote:
> I wrote the above intending to have another go at this later, and so
> now I have san5 back online, and have rebooted both castle and san6,
> now my status on all three is:
> linstor n l
> +--------+-----------+----------------------------+---------+
> | Node   | NodeType  | Addresses                  | State   |
> +--------+-----------+----------------------------+---------+
> | castle | SATELLITE | 192.168.5.204:3366 (PLAIN) | Unknown |
> | san5   | SATELLITE | 192.168.5.205:3366 (PLAIN) | Unknown |
> | san6   | SATELLITE | 192.168.5.206:3366 (PLAIN) | Unknown |
> +--------+-----------+----------------------------+---------+
>

I suggest checking whether you can open a TCP connection from the
controller to those satellite ports, e.g. just telnet to 192.168.5.204
port 3366 and see what happens. If it says "connection refused" or just
hangs, then it's commonly either a network issue or a misconfigured
firewall.
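
If telnet is not installed, the same check can be sketched in a few lines
of Python using only the standard library. The function name is
hypothetical. It distinguishes the two failure modes mentioned above, an
immediate refusal versus a hang that ends in a timeout.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return (True, "") if a TCP connection succeeds, else (False, reason)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, ""
    except ConnectionRefusedError:
        # Something answered, but nothing is listening on that port.
        return False, "connection refused"
    except socket.timeout:
        # No answer at all, e.g. packets silently dropped by a firewall.
        return False, "timed out"
    except OSError as exc:
        # Any other network error, e.g. no route to host.
        return False, str(exc)
```

Run on the controller, can_connect("192.168.5.204", 3366) would test the
first satellite; repeat for .205 and .206.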

IPv4 vs. IPv6 could also be an issue. Try changing the listen addresses
from ::0 to 0.0.0.0, because on some distributions/kernels, listening on
IPv6 does not implicitly listen on IPv4 as well (or alternatively, use
the IPv6 global scope addresses instead of the IPv4 addresses).
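
As a generic illustration of that last point (this is not how LINSTOR
itself opens its sockets, just the underlying socket behaviour), a socket
bound to the IPv6 wildcard address only accepts IPv4 clients when the
IPV6_V6ONLY option is off. Whether it is off by default varies between
systems. A minimal Python sketch with a hypothetical function name:

```python
import socket


def listen_socket(port, dual_stack=True):
    """Open a listening TCP socket on the IPv6 wildcard address "::"."""
    sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    # With IPV6_V6ONLY set to 0 the socket also accepts IPv4 clients
    # (as IPv4-mapped addresses); with 1 it accepts IPv6 clients only.
    sock.setsockopt(
        socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0 if dual_stack else 1
    )
    sock.bind(("::", port))
    sock.listen(1)
    return sock
```

A service that binds to "::" with V6ONLY enabled looks healthy locally but
is unreachable on its IPv4 address, which is the symptom to rule out here.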

br,
Robert

_______________________________________________
Star us on GITHUB: https://github.com/LINBIT
drbd-user mailing list
drbd-user@lists.linbit.com
https://lists.linbit.com/mailman/listinfo/drbd-user

Re: linstor issues
On 23/6/20 21:53, Gábor Hernádi wrote:
> Hi,
>
> apparently something is quite broken... maybe it's somehow your setup
> or environment, I am not sure...
>
> linstor resource list
> +--------------+--------+------+--------+-------------------------+----------+
> | ResourceName | Node   | Port | Usage  | Conns                   |    State |
> +--------------+--------+------+--------+-------------------------+----------+
> | testvm1      | castle | 7000 |        |                         |  Unknown |
> | testvm1      | san5   | 7000 |        |                         |  Unknown |
> | testvm1      | san6   | 7000 | Unused | Connecting(san5,castle) | UpToDate |
> +--------------+--------+------+--------+-------------------------+----------+
>
> This looks like some kind of network issues.
>
> # linstor storage-pool list --groupby Size
>
> However, the second command produces a usage error (documentation
> bug perhaps).
>
>
> Thanks for reporting, we will look into this.
>
> WARNING:
> Description:
>     No active connection to satellite 'san5'
> Details:
>     The controller is trying to (re-) establish a connection to
> the satellite. The controller stored the changes and as soon the
> satellite is connected, it will receive this update.
>
>
> So Linstor has obviously no connection to satellite 'san5'.
>
> [95078.599813] drbd testvm1 castle: conn( Unconnected -> Connecting )
> [95078.604454] drbd testvm1 san5: conn( Unconnected -> Connecting )
>
>
> ... and DRBD apparently also has troubles connecting...
>
> linstor n l
> +--------+-----------+----------------------------+---------+
> | Node   | NodeType  | Addresses                  | State   |
> +--------+-----------+----------------------------+---------+
> | castle | SATELLITE | 192.168.5.204:3366 (PLAIN) | Unknown |
> | san5   | SATELLITE | 192.168.5.205:3366 (PLAIN) | Unknown |
> | san6   | SATELLITE | 192.168.5.206:3366 (PLAIN) | Unknown |
> +--------+-----------+----------------------------+---------+
>
>
> Now  this is really strange. I will spare you with some details, but I
> assume you have triggered some bad exception in Linstor which somehow
> killed a necessary thread.
> You should check
>    linstor err list
> and see if you can find some related error reports.
> Also, restarting the controller might help you here.
>
Thank you!

linstor err list showed a list of errors, but the contents didn't make a
lot of sense to me. Let me know if you are interested in them, and I can
send them.

I did a systemctl restart linstor-controller.service on san6, and things
started looking much better.

linstor n l
+--------+-----------+----------------------------+--------+
| Node   | NodeType  | Addresses                  | State  |
+--------+-----------+----------------------------+--------+
| castle | SATELLITE | 192.168.5.204:3366 (PLAIN) | Online |
| san5   | SATELLITE | 192.168.5.205:3366 (PLAIN) | Online |
| san6   | SATELLITE | 192.168.5.206:3366 (PLAIN) | Online |
+--------+-----------+----------------------------+--------+

So, all nodes agree that they are now online and talking to each other.
I assume this proves there are no network issues.

linstor resource list
+--------------+--------+------+--------+--------------------+--------------------+
| ResourceName | Node   | Port | Usage  | Conns              |              State |
+--------------+--------+------+--------+--------------------+--------------------+
| testvm1      | castle | 7000 |        |                    |            Unknown |
| testvm1      | san5   | 7000 | Unused | Connecting(castle) | SyncTarget(12.67%) |
| testvm1      | san6   | 7000 | Unused | Connecting(castle) |           UpToDate |
+--------------+--------+------+--------+--------------------+--------------------+

From this, it looks like san6 (the controller) thinks it has the up to
date data, probably based on the fact it was created there first or
something. The data is syncing to san5 (in progress, and progressing
steadily), so that is good also. However, castle doesn't seem to be
syncing/connecting.
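
The reading of the State column above can be written down as a small
Python helper. This is only a sketch covering the three values that appear
in this output; the function name and the wording of the descriptions are
my own interpretation, not anything defined by LINSTOR.

```python
import re

# Matches values such as "SyncTarget(12.67%)" and captures the percentage.
SYNC = re.compile(r"^SyncTarget\((?P<pct>\d+(?:\.\d+)?)%\)$")


def describe_state(state):
    """Turn a State cell from 'linstor resource list' into a short note."""
    state = state.strip()
    match = SYNC.match(state)
    if match:
        return f"resyncing, {float(match['pct']):.2f}% done"
    if state == "UpToDate":
        return "in sync"
    if state == "Unknown":
        # Assumption: nothing has been reported for this resource on the node.
        return "no state reported; check the satellite on that node"
    return f"unrecognised state: {state}"
```

Here describe_state("Unknown") for castle is the row that needs attention;
the other two rows are behaving as expected.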

On castle, I see this:

Jun 24 11:01:55 castle Satellite[7499]: 11:01:55.177 [DeviceManager]
ERROR LINSTOR/Satellite - SYSTEM - Failed to create meta-data for DRBD
volume testvm1/0 [Report number 5EF2A316-31431-000002]

linstor err show gives this:

ERROR REPORT 5EF2A316-31431-000002

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Satellite
Version:                            1.7.1
Build ID: 6760637d6fae7a5862103ced4ea0ab0a758861f9
Build time:                         2020-05-14T13:14:11+00:00
Error time:                         2020-06-24 11:01:55
Node:                               castle

============================================================

Reported error:
===============

Description:
    Failed to create meta-data for DRBD volume testvm1/0

Category:                           LinStorException
Class name:                         VolumeException
Class canonical name:               com.linbit.linstor.storage.layer.exceptions.VolumeException
Generated at:                       Method 'createMetaData', Source file 'DrbdLayer.java', Line #995

Error message:                      Failed to create meta-data for DRBD volume testvm1/0

Error context:
    An error occurred while processing resource 'Node: 'castle', Rsc: 'testvm1''

Call backtrace:

    Method                                   Native Class:Line number
    createMetaData                           N      com.linbit.linstor.storage.layer.adapter.drbd.DrbdLayer:995
    adjustDrbd                               N      com.linbit.linstor.storage.layer.adapter.drbd.DrbdLayer:575
    process                                  N      com.linbit.linstor.storage.layer.adapter.drbd.DrbdLayer:373
    process                                  N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:731
    processResourcesAndSnapshots             N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:300
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:138
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:258
    phaseDispatchDeviceHandlers              N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:896
    devMgrLoop                               N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:618
    run                                      N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:535
    run                                      N      java.lang.Thread:834

Caused by:
==========

Description:
    Execution of the external command 'drbdadm' failed.
Cause:
    The external command exited with error code 1.
Correction:
    - Check whether the external program is operating properly.
    - Check whether the command line is correct.
      Contact a system administrator or a developer if the command line is no longer valid
      for the installed version of the external program.
Additional information:
    The full command line executed was:
    drbdadm -vvv --max-peers 7 -- --force create-md testvm1/0

    The external command sent the following output data:


    The external command sent the following error information:
    no resources defined!


Category:                           LinStorException
Class name:                         ExtCmdFailedException
Class canonical name:               com.linbit.extproc.ExtCmdFailedException
Generated at:                       Method 'execute', Source file 'DrbdAdm.java', Line #550

Error message:                      The external command 'drbdadm' exited with error code 1


Call backtrace:

    Method                                   Native Class:Line number
    execute                                  N      com.linbit.linstor.storage.layer.adapter.drbd.utils.DrbdAdm:550
    simpleAdmCommand                         N      com.linbit.linstor.storage.layer.adapter.drbd.utils.DrbdAdm:495
    createMd                                 N      com.linbit.linstor.storage.layer.adapter.drbd.utils.DrbdAdm:262
    createMetaData                           N      com.linbit.linstor.storage.layer.adapter.drbd.DrbdLayer:923
    adjustDrbd                               N      com.linbit.linstor.storage.layer.adapter.drbd.DrbdLayer:575
    process                                  N      com.linbit.linstor.storage.layer.adapter.drbd.DrbdLayer:373
    process                                  N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:731
    processResourcesAndSnapshots             N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:300
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:138
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:258
    phaseDispatchDeviceHandlers              N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:896
    devMgrLoop                               N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:618
    run                                      N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:535
    run                                      N      java.lang.Thread:834


END OF ERROR REPORT.

Indeed, re-running the same command from the CLI provides the shown
error message:

drbdadm -vvv --max-peers 7 -- --force create-md testvm1/0
no resources defined!
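
The two useful lines in a report like this are buried in the middle: the
exact command that was run and what it printed on stderr. A small Python
sketch that extracts both from the report text follows. The function names
are made up, and the marker strings are simply the ones that appear in the
report above; other report types may be laid out differently.

```python
def next_nonblank(lines, start):
    """Return the first non-empty line at or after index start, or ""."""
    for line in lines[start:]:
        if line:
            return line
    return ""


def failed_command(report_text):
    """Return (command_line, stderr_line) from an error report, or None."""
    lines = [line.strip() for line in report_text.splitlines()]
    cmd_marker = "The full command line executed was:"
    err_marker = "The external command sent the following error information:"
    if cmd_marker not in lines or err_marker not in lines:
        return None
    # The value of interest is the first non-blank line after each marker.
    command = next_nonblank(lines, lines.index(cmd_marker) + 1)
    stderr = next_nonblank(lines, lines.index(err_marker) + 1)
    return command, stderr
```

For the report above this yields the drbdadm create-md command line and
"no resources defined!", i.e. the same pair reproduced by hand here.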

Some other random status information which may or may not be relevant...

linstor storage-pool list
+----------------------+--------+----------+----------+--------------+---------------+--------------+-------+
| StoragePool          | Node   | Driver   | PoolName | FreeCapacity | TotalCapacity | CanSnapshots | State |
+----------------------+--------+----------+----------+--------------+---------------+--------------+-------+
| DfltDisklessStorPool | castle | DISKLESS |          |              |               | False        | Ok    |
| DfltDisklessStorPool | san5   | DISKLESS |          |              |               | False        | Ok    |
| DfltDisklessStorPool | san6   | DISKLESS |          |              |               | False        | Ok    |
| pool                 | castle | LVM      | vg_hdd   |     2.95 TiB |      3.44 TiB | False        | Ok    |
| pool                 | san5   | LVM      | vg_hdd   |     3.87 TiB |      4.36 TiB | False        | Ok    |
| pool                 | san6   | LVM      | vg_ssd   |     1.26 TiB |      1.75 TiB | False        | Ok    |
+----------------------+--------+----------+----------+--------------+---------------+--------------+-------+

I've tried to restart linstor-satellite service on castle, but it didn't
make any difference.

After a reboot of castle, I now get this:

linstor resource list
+--------------+--------+------+--------+-------+--------------------+
| ResourceName | Node   | Port | Usage  | Conns |              State |
+--------------+--------+------+--------+-------+--------------------+
| testvm1      | castle | 7000 | Unused | Ok    |           Diskless |
| testvm1      | san5   | 7000 | Unused | Ok    | SyncTarget(55.99%) |
| testvm1      | san6   | 7000 | Unused | Ok    |           UpToDate |
+--------------+--------+------+--------+-------+--------------------+

However, looking at the err reports, I see the exact same error
about creating the metadata on castle.

One interesting thing is that the LV seems to have been created:

lvs
  /dev/drbd0: open failed: Wrong medium type
  /dev/drbd1: open failed: Wrong medium type
  LV                            VG      Attr       LSize    Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  backup_system_20200624_062513 storage swi-a-s---    4.00g      system 3.06
  system                        storage owi-aos---    5.00g
  testvm1_00000                 vg_hdd  -wi-a----- <500.11g

Any suggestions on where to look next? Or what I might have done wrong now?

Regards,
Adam

Re: linstor issues
I'm having another crack at this; I think it will be worth it once it works.

Firstly, another documentation error:

https://www.linbit.com/drbd-user-guide/linstor-guide-1_0-en/#s-using_the_linstor_client

> In case anything goes wrong with the storage pool’s VG/zPool, e.g. the
> VG having been renamed or somehow became invalid you can delete the
> storage pool in LINSTOR with the following command, given that only
> resources with all their volumes in the so-called ‘lost’ storage pool
> are attached. This feature is available since LINSTOR v0.9.13.
>
> # linstor storage-pool lost alpha pool_ssd
linstor storage-pool lost castle vg_hdd
usage: linstor storage-pool [-h]
                            {create, delete, list, list-properties,
                            set-property} ...
linstor storage-pool: error: argument {create, delete, list,
list-properties, set-property}: invalid choice: 'lost' (choose from
'create', 'c', 'delete', 'd', 'list', 'l', 'list-properties', 'lp',
'set-property', 'sp')

Changing to use delete instead of lost:

castle:~# linstor storage-pool delete castle vg_hdd
ERROR:
Description:
    Storage pool definition 'vg_hdd' not found.
Cause:
    The specified storage pool definition 'vg_hdd' could not be found
in the database
Correction:
    Create a storage pool definition 'vg_hdd' first.
Details:
    Node: castle, Storage pool name: vg_hdd
Show reports:
    linstor error-reports show 5F0D500C-00000-000000
castle:~# linstor storage-pool list
+----------------------+--------+----------+----------+--------------+---------------+--------------+-------+
| StoragePool          | Node   | Driver   | PoolName | FreeCapacity | TotalCapacity | CanSnapshots | State |
+----------------------+--------+----------+----------+--------------+---------------+--------------+-------+
| DfltDisklessStorPool | castle | DISKLESS |          |              |               | False        | Ok    |
| DfltDisklessStorPool | san5   | DISKLESS |          |              |               | False        | Ok    |
| DfltDisklessStorPool | san6   | DISKLESS |          |              |               | False        | Ok    |
| pool                 | castle | LVM      | vg_hdd   |     2.95 TiB |      3.44 TiB | False        | Ok    |
| pool                 | san5   | LVM      | vg_hdd   |     3.87 TiB |      4.36 TiB | False        | Ok    |
| pool                 | san6   | LVM      | vg_ssd   |     1.26 TiB |      1.75 TiB | False        | Ok    |
+----------------------+--------+----------+----------+--------------+---------------+--------------+-------+

I was hoping I could just remove the storage pool from castle (since it
doesn't seem to be working properly), and then destroy it, re-create it,
and then re-add it and see if that solves the problem. However, while it
seems to exist, it also doesn't (can't delete it).

Possibly part of the cause of my original problem is that I have a
script that automatically creates a snapshot for each LV, and this
created a snapshot of testvm1_00000 named
backup_testvm1_00000_blahblah.... I've now manually deleted that, and
fixed my script to avoid messing with the VG allocated to linstor, but
so far, there is no change in the current status (as per below).
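
For reference, the kind of filter such a snapshot script needs is roughly
the following. This is a Python sketch of the idea only, not the actual
script; the function name and the "backup_" prefix check are assumptions
based on the snapshot names shown in the lvs output earlier.

```python
def lvs_to_snapshot(volumes, excluded_vgs, snapshot_prefix="backup_"):
    """Pick the logical volumes that should get a backup snapshot.

    volumes:      iterable of (lv_name, vg_name) pairs
    excluded_vgs: volume groups handed over to LINSTOR, never to be touched
    """
    return [
        (lv, vg)
        for lv, vg in volumes
        # Skip everything in a LINSTOR-managed VG, and skip existing snapshots.
        if vg not in excluded_vgs and not lv.startswith(snapshot_prefix)
    ]
```

With vg_hdd excluded, testvm1_00000 is left alone and only the LVs in the
other volume group are snapshotted.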

Would appreciate any suggestions on what might be going wrong, and/or
how to fix it?

Regards,
Adam



On 24/6/20 11:46, Adam Goryachev wrote:
>
>
> On 23/6/20 21:53, Gábor Hernádi wrote:
>> Hi,
>>
>> apparently something is quite broken... maybe it's somehow your setup
>> or environment, I am not sure...
>>
>> linstor resource list
>> +--------------+--------+------+--------+-------------------------+----------+
>> | ResourceName | Node   | Port | Usage  | Conns                   |    State |
>> +--------------+--------+------+--------+-------------------------+----------+
>> | testvm1      | castle | 7000 |        |                         |  Unknown |
>> | testvm1      | san5   | 7000 |        |                         |  Unknown |
>> | testvm1      | san6   | 7000 | Unused | Connecting(san5,castle) | UpToDate |
>> +--------------+--------+------+--------+-------------------------+----------+
>>
>> This looks like some kind of network issues.
>>
>> # linstor storage-pool list --groupby Size
>>
>> However, the second command produces a usage error (documentation
>> bug perhaps).
>>
>>
>> Thanks for reporting, we will look into this.
>>
>> WARNING:
>> Description:
>>     No active connection to satellite 'san5'
>> Details:
>>     The controller is trying to (re-) establish a connection to
>> the satellite. The controller stored the changes and as soon the
>> satellite is connected, it will receive this update.
>>
>>
>> So Linstor has obviously no connection to satellite 'san5'.
>>
>> [95078.599813] drbd testvm1 castle: conn( Unconnected -> Connecting )
>> [95078.604454] drbd testvm1 san5: conn( Unconnected -> Connecting )
>>
>>
>> ... and DRBD apparently also has troubles connecting...
>>
>> linstor n l
>> +--------+-----------+----------------------------+---------+
>> | Node   | NodeType  | Addresses                  | State   |
>> +--------+-----------+----------------------------+---------+
>> | castle | SATELLITE | 192.168.5.204:3366 (PLAIN) | Unknown |
>> | san5   | SATELLITE | 192.168.5.205:3366 (PLAIN) | Unknown |
>> | san6   | SATELLITE | 192.168.5.206:3366 (PLAIN) | Unknown |
>> +--------+-----------+----------------------------+---------+
>>
>>
>> Now  this is really strange. I will spare you with some details, but
>> I assume you have triggered some bad exception in Linstor which
>> somehow killed a necessary thread.
>> You should check
>>    linstor err list
>> and see if you can find some related error reports.
>> Also, restarting the controller might help you here.
>>
> Thank you!
>
> linstor err list showed a list of errors, but the contents didn't make
> a lot of sense to me. Let me know if you are interested in them, and I
> can send them.
>
> I did a systemctl restart linstor-controller.service on san6, and
> things started looking much better.
>
> linstor n l
> +--------+-----------+----------------------------+--------+
> | Node   | NodeType  | Addresses                  | State  |
> +--------+-----------+----------------------------+--------+
> | castle | SATELLITE | 192.168.5.204:3366 (PLAIN) | Online |
> | san5   | SATELLITE | 192.168.5.205:3366 (PLAIN) | Online |
> | san6   | SATELLITE | 192.168.5.206:3366 (PLAIN) | Online |
> +--------+-----------+----------------------------+--------+
>
> So, all nodes agree that they are now online and talking to each
> other. I assume this proves there is no network issues.
>
> linstor resource list
> +--------------+--------+------+--------+--------------------+--------------------+
> | ResourceName | Node   | Port | Usage  | Conns              |              State |
> +--------------+--------+------+--------+--------------------+--------------------+
> | testvm1      | castle | 7000 |        |                    |            Unknown |
> | testvm1      | san5   | 7000 | Unused | Connecting(castle) | SyncTarget(12.67%) |
> | testvm1      | san6   | 7000 | Unused | Connecting(castle) |           UpToDate |
> +--------------+--------+------+--------+--------------------+--------------------+
>
> From this, it looks like san6 (the controller) thinks it has the up to
> date data, probably based on the fact it was created there first or
> something. The data is syncing to san5 (in progress, and progressing
> steadily), so that is good also. However, castle doesn't seem to be
> syncing/connecting.
>
> On castle, I see this:
>
> Jun 24 11:01:55 castle Satellite[7499]: 11:01:55.177 [DeviceManager]
> ERROR LINSTOR/Satellite - SYSTEM - Failed to create meta-data for DRBD
> volume testvm1/0 [Report number 5EF2A316-31431-000002]
>
> linstor err show give this:
>
> ERROR REPORT 5EF2A316-31431-000002
>
> ============================================================
>
> Application:                        LINBIT® LINSTOR
> Module:                             Satellite
> Version:                            1.7.1
> Build ID: 6760637d6fae7a5862103ced4ea0ab0a758861f9
> Build time:                         2020-05-14T13:14:11+00:00
> Error time:                         2020-06-24 11:01:55
> Node:                               castle
>
> ============================================================
>
> Reported error:
> ===============
>
> Description:
>     Failed to create meta-data for DRBD volume testvm1/0
>
> Category:                           LinStorException
> Class name:                         VolumeException
> Class canonical name:
> com.linbit.linstor.storage.layer.exceptions.VolumeException
> Generated at:                       Method 'createMetaData', Source
> file 'DrbdLayer.java', Line #995
>
> Error message:                      Failed to create meta-data for
> DRBD volume testvm1/0
>
> Error context:
>     An error occurred while processing resource 'Node: 'castle', Rsc:
> 'testvm1''
>
> Call backtrace:
>
>     Method                                   Native Class:Line number
>     createMetaData                           N
> com.linbit.linstor.storage.layer.adapter.drbd.DrbdLayer:995
>     adjustDrbd                               N
> com.linbit.linstor.storage.layer.adapter.drbd.DrbdLayer:575
>     process                                  N
> com.linbit.linstor.storage.layer.adapter.drbd.DrbdLayer:373
>     process                                  N
> com.linbit.linstor.core.devmgr.DeviceHandlerImpl:731
>     processResourcesAndSnapshots             N
> com.linbit.linstor.core.devmgr.DeviceHandlerImpl:300
>     dispatchResources                        N
> com.linbit.linstor.core.devmgr.DeviceHandlerImpl:138
>     dispatchResources                        N
> com.linbit.linstor.core.devmgr.DeviceManagerImpl:258
>     phaseDispatchDeviceHandlers              N
> com.linbit.linstor.core.devmgr.DeviceManagerImpl:896
>     devMgrLoop                               N
> com.linbit.linstor.core.devmgr.DeviceManagerImpl:618
>     run                                      N
> com.linbit.linstor.core.devmgr.DeviceManagerImpl:535
>     run                                      N java.lang.Thread:834
>
> Caused by:
> ==========
>
> Description:
>     Execution of the external command 'drbdadm' failed.
> Cause:
>     The external command exited with error code 1.
> Correction:
>     - Check whether the external program is operating properly.
>     - Check whether the command line is correct.
>       Contact a system administrator or a developer if the command
> line is no longer valid
>       for the installed version of the external program.
> Additional information:
>     The full command line executed was:
>     drbdadm -vvv --max-peers 7 -- --force create-md testvm1/0
>
>     The external command sent the following output data:
>
>
>     The external command sent the following error information:
>     no resources defined!
>
>
> Category:                           LinStorException
> Class name:                         ExtCmdFailedException
> Class canonical name: com.linbit.extproc.ExtCmdFailedException
> Generated at:                       Method 'execute', Source file
> 'DrbdAdm.java', Line #550
>
> Error message:                      The external command 'drbdadm'
> exited with error code 1
>
>
> Call backtrace:
>
>     Method                                   Native Class:Line number
>     execute                                  N
> com.linbit.linstor.storage.layer.adapter.drbd.utils.DrbdAdm:550
>     simpleAdmCommand                         N
> com.linbit.linstor.storage.layer.adapter.drbd.utils.DrbdAdm:495
>     createMd                                 N
> com.linbit.linstor.storage.layer.adapter.drbd.utils.DrbdAdm:262
>     createMetaData                           N
> com.linbit.linstor.storage.layer.adapter.drbd.DrbdLayer:923
>     adjustDrbd                               N
> com.linbit.linstor.storage.layer.adapter.drbd.DrbdLayer:575
>     process                                  N
> com.linbit.linstor.storage.layer.adapter.drbd.DrbdLayer:373
>     process                                  N
> com.linbit.linstor.core.devmgr.DeviceHandlerImpl:731
>     processResourcesAndSnapshots             N
> com.linbit.linstor.core.devmgr.DeviceHandlerImpl:300
>     dispatchResources                        N
> com.linbit.linstor.core.devmgr.DeviceHandlerImpl:138
>     dispatchResources                        N
> com.linbit.linstor.core.devmgr.DeviceManagerImpl:258
>     phaseDispatchDeviceHandlers              N
> com.linbit.linstor.core.devmgr.DeviceManagerImpl:896
>     devMgrLoop                               N
> com.linbit.linstor.core.devmgr.DeviceManagerImpl:618
>     run                                      N
> com.linbit.linstor.core.devmgr.DeviceManagerImpl:535
>     run                                      N java.lang.Thread:834
>
>
> END OF ERROR REPORT.
>
> Indeed, re-running the same command from the CLI provides the shown
> error message:
>
> drbdadm -vvv --max-peers 7 -- --force create-md testvm1/0
> no resources defined!
>
> Some other random status information which may or may not be relevant...
>
> linstor storage-pool list
> +----------------------+--------+----------+----------+--------------+---------------+--------------+-------+
> | StoragePool          | Node   | Driver   | PoolName | FreeCapacity | TotalCapacity | CanSnapshots | State |
> +----------------------+--------+----------+----------+--------------+---------------+--------------+-------+
> | DfltDisklessStorPool | castle | DISKLESS |          |              |               | False        | Ok    |
> | DfltDisklessStorPool | san5   | DISKLESS |          |              |               | False        | Ok    |
> | DfltDisklessStorPool | san6   | DISKLESS |          |              |               | False        | Ok    |
> | pool                 | castle | LVM      | vg_hdd   |     2.95 TiB |      3.44 TiB | False        | Ok    |
> | pool                 | san5   | LVM      | vg_hdd   |     3.87 TiB |      4.36 TiB | False        | Ok    |
> | pool                 | san6   | LVM      | vg_ssd   |     1.26 TiB |      1.75 TiB | False        | Ok    |
> +----------------------+--------+----------+----------+--------------+---------------+--------------+-------+
>
> I've tried to restart linstor-satellite service on castle, but it
> didn't make any difference.
>
> After a reboot of castle, I now get this:
>
> linstor resource list
> +--------------+--------+------+--------+-------+--------------------+
> | ResourceName | Node   | Port | Usage  | Conns |              State |
> +--------------+--------+------+--------+-------+--------------------+
> | testvm1      | castle | 7000 | Unused | Ok    |           Diskless |
> | testvm1      | san5   | 7000 | Unused | Ok    | SyncTarget(55.99%) |
> | testvm1      | san6   | 7000 | Unused | Ok    |           UpToDate |
> +--------------+--------+------+--------+-------+--------------------+
>
> However, looking at the error reports, I see the exact same error
> about creating the metadata on castle.
>
> One interesting thing is that the LV seems to have been created:
>
> lvs
>   /dev/drbd0: open failed: Wrong medium type
>   /dev/drbd1: open failed: Wrong medium type
>   LV                            VG      Attr       LSize    Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
>   backup_system_20200624_062513 storage swi-a-s---    4.00g      system 3.06
>   system                        storage owi-aos---    5.00g
>   testvm1_00000                 vg_hdd  -wi-a----- <500.11g
>
> Any suggestions on where to look next? Or what I might have done wrong
> now?
>
> Regards,
> Adam
>
> _______________________________________________
> Star us on GITHUB: https://github.com/LINBIT
> drbd-user mailing list
> drbd-user@lists.linbit.com
> https://lists.linbit.com/mailman/listinfo/drbd-user
Re: linstor issues
Hi all,

I've finally had another crack at this, and eventually got it working.

I think I hit all of the following issues at various stages, so I'm
listing them here for anyone who runs into similar problems in the
future.

1) Initially, a firewall was blocking connections between the servers. I
fixed this early on, but I'm not sure whether it had already messed up
some of the config in a way that never recovered.

2) Somehow, some of the servers ended up with the old 8.4 kernel module
installed. I think this was related to installing new kernels from
debian: for some reason the matching header files were not updated, so
dkms couldn't recompile the module for the new kernel, and the system
fell back to the in-kernel 8.4 version.

3) Finally, I think the hardest issue to track down was a copy of
drbdadm and the associated drbd-utils installed under /usr/local, which
took priority over the package-installed version. As a result, drbdadm
couldn't find the drbd resource config files, because linstor was
putting them in /etc/drbd.d and drbdadm was looking in
/usr/local/etc/drbd.d.

Question: Is this something that linstor should be aware of (i.e.,
storing the resource files in the location that the drbdadm executable
actually in use expects)? I guess it's a pretty rare issue, but I'm
curious whether it could be avoided easily.
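
For anyone wanting to check for issue 3 on their own machines, one
approach is to list every drbdadm on the search path in lookup order;
the first hit is the one that actually runs. This is only a minimal
Python sketch: the helper name find_all_on_path is made up for
illustration and is not part of linstor or drbd-utils.

```python
import os


def find_all_on_path(name, path=None):
    """Return every executable called `name` on the search path, in lookup order.

    The first entry is the one a shell would run; later entries are
    shadowed copies.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH components
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            if candidate not in hits:
                hits.append(candidate)
    return hits


if __name__ == "__main__":
    copies = find_all_on_path("drbdadm")
    for index, location in enumerate(copies):
        label = "in use" if index == 0 else "shadowed"
        print(f"{location} ({label})")
    if len(copies) > 1:
        print("Warning: more than one drbdadm found on the search path")
```

If this prints a copy under /usr/local ahead of the packaged one, that
copy is the one being run.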

After a few complete wipe/reinstalls, I found and fixed these issues,
and it now looks like I have a working installation.
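
For issue 2, a quick way to see which DRBD module is actually loaded is
to read the version line of /proc/drbd. The following is a rough Python
sketch, assuming that line has the form "version: X.Y.Z ..."; the
function names are made up for illustration, and the sample strings in
the comments are illustrative, not output captured from these machines.

```python
import re


def drbd_module_version(proc_text):
    """Parse the 'version:' line of /proc/drbd text.

    Returns a (major, minor, patch) tuple, or None if no version line is found.
    Illustrative inputs: "version: 8.4.10 (...)" or "version: 9.0.22-1 (...)".
    """
    match = re.search(r"^version:\s*(\d+)\.(\d+)\.(\d+)", proc_text, re.MULTILINE)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def read_loaded_drbd_version(path="/proc/drbd"):
    """Return the loaded module version, or None if the file cannot be read."""
    try:
        with open(path, encoding="ascii", errors="replace") as handle:
            return drbd_module_version(handle.read())
    except OSError:
        return None  # module not loaded, or not a Linux host


if __name__ == "__main__":
    version = read_loaded_drbd_version()
    if version is None:
        print("No DRBD module version found")
    elif version[0] < 9:
        print("Loaded DRBD module is %d.%d.%d, older than 9" % version)
    else:
        print("Loaded DRBD module is %d.%d.%d" % version)
```

A major version of 8 after a kernel upgrade would suggest the dkms build
did not happen for the new kernel.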

linstor node list
+--------+-----------+----------------------------+--------+
| Node   | NodeType  | Addresses                  | State  |
+--------+-----------+----------------------------+--------+
| castle | SATELLITE | 192.168.5.204:3366 (PLAIN) | Online |
| san5   | SATELLITE | 192.168.5.205:3366 (PLAIN) | Online |
| san6   | SATELLITE | 192.168.5.206:3366 (PLAIN) | Online |
+--------+-----------+----------------------------+--------+
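
For issue 1 (the firewall), a simple TCP connect test between the nodes
can rule out blocked ports. This is a minimal Python sketch, not
anything linstor provides: the function names are made up, and the
choice of ports (3366 from the node list above, 7000 from the earlier
resource list) is an assumption about what needs to be reachable. A
DRBD resource port will also look closed if the resource is simply not
up on that node.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable(nodes, ports, timeout=2.0):
    """Return (name, address, port) for every node/port pair that did not connect.

    `nodes` maps a node name to its address.
    """
    failures = []
    for name, address in nodes.items():
        for port in ports:
            if not port_open(address, port, timeout):
                failures.append((name, address, port))
    return failures


# Example call, to be run from each node in turn:
# unreachable({"castle": "192.168.5.204", "san5": "192.168.5.205",
#              "san6": "192.168.5.206"}, (3366, 7000))
```

An empty result from every node means the tested ports are reachable in
all directions.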

linstor storage-pool list
+----------------------+--------+----------+----------+--------------+---------------+--------------+-------+
| StoragePool          | Node   | Driver   | PoolName | FreeCapacity | TotalCapacity | CanSnapshots | State |
+----------------------+--------+----------+----------+--------------+---------------+--------------+-------+
| DfltDisklessStorPool | castle | DISKLESS |          |              |               | False        | Ok    |
| DfltDisklessStorPool | san5   | DISKLESS |          |              |               | False        | Ok    |
| DfltDisklessStorPool | san6   | DISKLESS |          |              |               | False        | Ok    |
| pool_hdd             | castle | LVM      | vg_hdd   |     2.95 TiB |      3.44 TiB | False        | Ok    |
| pool_hdd             | san5   | LVM      | vg_hdd   |     3.87 TiB |      4.36 TiB | False        | Ok    |
| pool_hdd             | san6   | LVM      | vg_hdd   |     1.26 TiB |      1.75 TiB | False        | Ok    |
+----------------------+--------+----------+----------+--------------+---------------+--------------+-------+

Thank you to those who replied and helped out along the way. I'm
finally feeling a lot more confident about moving forward with this
project. I'll no doubt be asking a few more questions along the way, but
hopefully nothing quite as unexpected as issue 3 will come up again.

Regards,
Adam


On 14/7/20 16:43, Adam Goryachev wrote:
>
> I'm having another crack at this, I think it will be worth it once it
> works.
>
> Firstly, another documentation error:
>
> https://www.linbit.com/drbd-user-guide/linstor-guide-1_0-en/#s-using_the_linstor_client
>
>> In case anything goes wrong with the storage pool’s VG/zPool, e.g.
>> the VG having been renamed or somehow became invalid you can delete
>> the storage pool in LINSTOR with the following command, given that
>> only resources with all their volumes in the so-called ‘lost’ storage
>> pool are attached. This feature is available since LINSTOR v0.9.13.
>>
>> # linstor storage-pool lost alpha pool_ssd
> linstor storage-pool lost castle vg_hdd
> usage: linstor storage-pool [-h]
>                             {create, delete, list, list-properties,
>                             set-property} ...
> linstor storage-pool: error: argument {create, delete, list,
> list-properties, set-property}: invalid choice: 'lost' (choose from
> 'create', 'c', 'delete', 'd', 'list', 'l', 'list-properties', 'lp',
> 'set-property', 'sp')
>
> Changing to use delete instead of lost:
>
> castle:~# linstor storage-pool delete castle vg_hdd
> ERROR:
> Description:
>     Storage pool definition 'vg_hdd' not found.
> Cause:
>     The specified storage pool definition 'vg_hdd' could not be found
> in the database
> Correction:
>     Create a storage pool definition 'vg_hdd' first.
> Details:
>     Node: castle, Storage pool name: vg_hdd
> Show reports:
>     linstor error-reports show 5F0D500C-00000-000000
> castle:~# linstor storage-pool list
> +----------------------+--------+----------+----------+--------------+---------------+--------------+-------+
> | StoragePool          | Node   | Driver   | PoolName | FreeCapacity | TotalCapacity | CanSnapshots | State |
> +----------------------+--------+----------+----------+--------------+---------------+--------------+-------+
> | DfltDisklessStorPool | castle | DISKLESS |          |              |               | False        | Ok    |
> | DfltDisklessStorPool | san5   | DISKLESS |          |              |               | False        | Ok    |
> | DfltDisklessStorPool | san6   | DISKLESS |          |              |               | False        | Ok    |
> | pool                 | castle | LVM      | vg_hdd   |     2.95 TiB |      3.44 TiB | False        | Ok    |
> | pool                 | san5   | LVM      | vg_hdd   |     3.87 TiB |      4.36 TiB | False        | Ok    |
> | pool                 | san6   | LVM      | vg_ssd   |     1.26 TiB |      1.75 TiB | False        | Ok    |
> +----------------------+--------+----------+----------+--------------+---------------+--------------+-------+
>
> I was hoping I could just remove the storage pool from castle (since
> it doesn't seem to be working properly), and then destroy it,
> re-create it, and then re-add it and see if that solves the problem.
> However, while it seems to exist, it also doesn't (can't delete it).
>
> Possibly part of the cause of my original problem is that I have a
> script that automatically creates a snapshot for each LV, and this
> created a snapshot of testvm1_00000 named
> backup_testvm1_00000_blahblah.... I've now manually deleted that, and
> fixed my script to avoid messing with the VG allocated to linstor, but
> so far, there is no change in the current status (as per below).
>
> Would appreciate any suggestions on what might be going wrong, and/or
> how to fix it?
>
> Regards,
> Adam
>
>
>
> On 24/6/20 11:46, Adam Goryachev wrote:
>>
>>
>> On 23/6/20 21:53, Gábor Hernádi wrote:
>>> Hi,
>>>
>>> apparently something is quite broken... maybe it's somehow your
>>> setup or environment, I am not sure...
>>>
>>> linstor resource list
>>> +--------------+--------+------+--------+-------------------------+----------+
>>> | ResourceName | Node   | Port | Usage  | Conns                   |    State |
>>> +--------------+--------+------+--------+-------------------------+----------+
>>> | testvm1      | castle | 7000 |        |                         |  Unknown |
>>> | testvm1      | san5   | 7000 |        |                         |  Unknown |
>>> | testvm1      | san6   | 7000 | Unused | Connecting(san5,castle) | UpToDate |
>>> +--------------+--------+------+--------+-------------------------+----------+
>>>
>>> This looks like some kind of network issue.
>>>
>>> # linstor storage-pool list --groupby Size
>>>
>>> However, the second command produces a usage error
>>> (documentation bug perhaps).
>>>
>>>
>>> Thanks for reporting, we will look into this.
>>>
>>> WARNING:
>>> Description:
>>>     No active connection to satellite 'san5'
>>> Details:
>>>     The controller is trying to (re-) establish a connection to
>>> the satellite. The controller stored the changes and as soon the
>>> satellite is connected, it will receive this update.
>>>
>>>
>>> So Linstor has obviously no connection to satellite 'san5'.
>>>
>>> [95078.599813] drbd testvm1 castle: conn( Unconnected ->
>>> Connecting )
>>> [95078.604454] drbd testvm1 san5: conn( Unconnected -> Connecting )
>>>
>>>
>>> ... and DRBD apparently also has trouble connecting...
>>>
>>> linstor n l
>>> +--------+-----------+----------------------------+---------+
>>> | Node   | NodeType  | Addresses                  | State   |
>>> +--------+-----------+----------------------------+---------+
>>> | castle | SATELLITE | 192.168.5.204:3366 (PLAIN) | Unknown |
>>> | san5   | SATELLITE | 192.168.5.205:3366 (PLAIN) | Unknown |
>>> | san6   | SATELLITE | 192.168.5.206:3366 (PLAIN) | Unknown |
>>> +--------+-----------+----------------------------+---------+
>>>
>>>
>>> Now this is really strange. I will spare you the details, but
>>> I assume you have triggered some bad exception in Linstor which
>>> somehow killed a necessary thread.
>>> You should check
>>>    linstor err list
>>> and see if you can find some related error reports.
>>> Also, restarting the controller might help you here.
>>>
>> Thank you!
>>
>> linstor err list showed a list of errors, but the contents didn't
>> make a lot of sense to me. Let me know if you are interested in them,
>> and I can send them.
>>
>> I did a systemctl restart linstor-controller.service on san6, and
>> things started looking much better.
>>
>> linstor n l
>> +--------+-----------+----------------------------+--------+
>> | Node   | NodeType  | Addresses                  | State  |
>> +--------+-----------+----------------------------+--------+
>> | castle | SATELLITE | 192.168.5.204:3366 (PLAIN) | Online |
>> | san5   | SATELLITE | 192.168.5.205:3366 (PLAIN) | Online |
>> | san6   | SATELLITE | 192.168.5.206:3366 (PLAIN) | Online |
>> +--------+-----------+----------------------------+--------+
>>
>> So, all nodes agree that they are now online and talking to each
>> other. I assume this proves there are no network issues.
>>
>> linstor resource list
>> +--------------+--------+------+--------+--------------------+--------------------+
>> | ResourceName | Node   | Port | Usage  | Conns              |              State |
>> +--------------+--------+------+--------+--------------------+--------------------+
>> | testvm1      | castle | 7000 |        |                    |            Unknown |
>> | testvm1      | san5   | 7000 | Unused | Connecting(castle) | SyncTarget(12.67%) |
>> | testvm1      | san6   | 7000 | Unused | Connecting(castle) |           UpToDate |
>> +--------------+--------+------+--------+--------------------+--------------------+
>>
>> From this, it looks like san6 (the controller) thinks it has the up
>> to date data, probably based on the fact it was created there first
>> or something. The data is syncing to san5 (in progress, and
>> progressing steadily), so that is good also. However, castle doesn't
>> seem to be syncing/connecting.
>>
>> On castle, I see this:
>>
>> Jun 24 11:01:55 castle Satellite[7499]: 11:01:55.177 [DeviceManager]
>> ERROR LINSTOR/Satellite - SYSTEM - Failed to create meta-data for
>> DRBD volume testvm1/0 [Report number 5EF2A316-31431-000002]
>>
>> linstor err show gives this:
>>
>> ERROR REPORT 5EF2A316-31431-000002
>>
>> ============================================================
>>
>> Application:                        LINBIT® LINSTOR
>> Module:                             Satellite
>> Version:                            1.7.1
>> Build ID: 6760637d6fae7a5862103ced4ea0ab0a758861f9
>> Build time:                         2020-05-14T13:14:11+00:00
>> Error time:                         2020-06-24 11:01:55
>> Node:                               castle
>>
>> ============================================================
>>
>> Reported error:
>> ===============
>>
>> Description:
>>     Failed to create meta-data for DRBD volume testvm1/0
>>
>> Category:                           LinStorException
>> Class name:                         VolumeException
>> Class canonical name:
>> com.linbit.linstor.storage.layer.exceptions.VolumeException
>> Generated at:                       Method 'createMetaData', Source
>> file 'DrbdLayer.java', Line #995
>>
>> Error message:                      Failed to create meta-data for
>> DRBD volume testvm1/0
>>
>> Error context:
>>     An error occurred while processing resource 'Node: 'castle', Rsc:
>> 'testvm1''
>>
>> Call backtrace:
>>
>>     Method                                   Native Class:Line number
>>     createMetaData                           N
>> com.linbit.linstor.storage.layer.adapter.drbd.DrbdLayer:995
>>     adjustDrbd                               N
>> com.linbit.linstor.storage.layer.adapter.drbd.DrbdLayer:575
>>     process                                  N
>> com.linbit.linstor.storage.layer.adapter.drbd.DrbdLayer:373
>>     process                                  N
>> com.linbit.linstor.core.devmgr.DeviceHandlerImpl:731
>>     processResourcesAndSnapshots             N
>> com.linbit.linstor.core.devmgr.DeviceHandlerImpl:300
>>     dispatchResources                        N
>> com.linbit.linstor.core.devmgr.DeviceHandlerImpl:138
>>     dispatchResources                        N
>> com.linbit.linstor.core.devmgr.DeviceManagerImpl:258
>>     phaseDispatchDeviceHandlers              N
>> com.linbit.linstor.core.devmgr.DeviceManagerImpl:896
>>     devMgrLoop                               N
>> com.linbit.linstor.core.devmgr.DeviceManagerImpl:618
>>     run                                      N
>> com.linbit.linstor.core.devmgr.DeviceManagerImpl:535
>>     run                                      N java.lang.Thread:834
>>
>> Caused by:
>> ==========
>>
>> Description:
>>     Execution of the external command 'drbdadm' failed.
>> Cause:
>>     The external command exited with error code 1.
>> Correction:
>>     - Check whether the external program is operating properly.
>>     - Check whether the command line is correct.
>>       Contact a system administrator or a developer if the command
>> line is no longer valid
>>       for the installed version of the external program.
>> Additional information:
>>     The full command line executed was:
>>     drbdadm -vvv --max-peers 7 -- --force create-md testvm1/0
>>
>>     The external command sent the following output data:
>>
>>
>>     The external command sent the following error information:
>>     no resources defined!
>>
>>
>> Category:                           LinStorException
>> Class name:                         ExtCmdFailedException
>> Class canonical name: com.linbit.extproc.ExtCmdFailedException
>> Generated at:                       Method 'execute', Source file
>> 'DrbdAdm.java', Line #550
>>
>> Error message:                      The external command 'drbdadm'
>> exited with error code 1
>>
>>
>> Call backtrace:
>>
>>     Method                                   Native Class:Line number
>>     execute                                  N
>> com.linbit.linstor.storage.layer.adapter.drbd.utils.DrbdAdm:550
>>     simpleAdmCommand                         N
>> com.linbit.linstor.storage.layer.adapter.drbd.utils.DrbdAdm:495
>>     createMd                                 N
>> com.linbit.linstor.storage.layer.adapter.drbd.utils.DrbdAdm:262
>>     createMetaData                           N
>> com.linbit.linstor.storage.layer.adapter.drbd.DrbdLayer:923
>>     adjustDrbd                               N
>> com.linbit.linstor.storage.layer.adapter.drbd.DrbdLayer:575
>>     process                                  N
>> com.linbit.linstor.storage.layer.adapter.drbd.DrbdLayer:373
>>     process                                  N
>> com.linbit.linstor.core.devmgr.DeviceHandlerImpl:731
>>     processResourcesAndSnapshots             N
>> com.linbit.linstor.core.devmgr.DeviceHandlerImpl:300
>>     dispatchResources                        N
>> com.linbit.linstor.core.devmgr.DeviceHandlerImpl:138
>>     dispatchResources                        N
>> com.linbit.linstor.core.devmgr.DeviceManagerImpl:258
>>     phaseDispatchDeviceHandlers              N
>> com.linbit.linstor.core.devmgr.DeviceManagerImpl:896
>>     devMgrLoop                               N
>> com.linbit.linstor.core.devmgr.DeviceManagerImpl:618
>>     run                                      N
>> com.linbit.linstor.core.devmgr.DeviceManagerImpl:535
>>     run                                      N java.lang.Thread:834
>>
>>
>> END OF ERROR REPORT.
>>
>> Indeed, re-running the same command from the CLI provides the shown
>> error message:
>>
>> drbdadm -vvv --max-peers 7 -- --force create-md testvm1/0
>> no resources defined!
>>
>> Some other random status information which may or may not be relevant...
>>
>> linstor storage-pool list
>> +----------------------+--------+----------+----------+--------------+---------------+--------------+-------+
>> | StoragePool          | Node   | Driver   | PoolName | FreeCapacity | TotalCapacity | CanSnapshots | State |
>> +----------------------+--------+----------+----------+--------------+---------------+--------------+-------+
>> | DfltDisklessStorPool | castle | DISKLESS |          |              |               | False        | Ok    |
>> | DfltDisklessStorPool | san5   | DISKLESS |          |              |               | False        | Ok    |
>> | DfltDisklessStorPool | san6   | DISKLESS |          |              |               | False        | Ok    |
>> | pool                 | castle | LVM      | vg_hdd   |     2.95 TiB |      3.44 TiB | False        | Ok    |
>> | pool                 | san5   | LVM      | vg_hdd   |     3.87 TiB |      4.36 TiB | False        | Ok    |
>> | pool                 | san6   | LVM      | vg_ssd   |     1.26 TiB |      1.75 TiB | False        | Ok    |
>> +----------------------+--------+----------+----------+--------------+---------------+--------------+-------+
>>
>> I've tried to restart linstor-satellite service on castle, but it
>> didn't make any difference.
>>
>> After a reboot of castle, I now get this:
>>
>> linstor resource list
>> +--------------+--------+------+--------+-------+--------------------+
>> | ResourceName | Node   | Port | Usage  | Conns |              State |
>> +--------------+--------+------+--------+-------+--------------------+
>> | testvm1      | castle | 7000 | Unused | Ok    |           Diskless |
>> | testvm1      | san5   | 7000 | Unused | Ok    | SyncTarget(55.99%) |
>> | testvm1      | san6   | 7000 | Unused | Ok    |           UpToDate |
>> +--------------+--------+------+--------+-------+--------------------+
>>
>> However, looking at the error reports, I see the exact same error
>> about creating the metadata on castle.
>>
>> One interesting thing is that the LV seems to have been created:
>>
>> lvs
>>   /dev/drbd0: open failed: Wrong medium type
>>   /dev/drbd1: open failed: Wrong medium type
>>   LV                            VG      Attr       LSize    Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
>>   backup_system_20200624_062513 storage swi-a-s---    4.00g      system 3.06
>>   system                        storage owi-aos---    5.00g
>>   testvm1_00000                 vg_hdd  -wi-a----- <500.11g
>>
>> Any suggestions on where to look next? Or what I might have done
>> wrong now?
>>
>> Regards,
>> Adam
>>