Mailing List Archive

Slow linstor queries after recent upgrade
Hi,

I'm using LINSTOR on a 5-node Proxmox 7.x cluster. After a recent upgrade
I've run into performance issues with some linstor queries.

Upgrade: linstor-common:amd64 (1.20.2-1, 1.22.1-1), python-linstor:amd64
(1.16.0-1, 1.18.0-1), linstor-client:amd64 (1.16.0-1, 1.18.0-1),
linstor-controller:amd64 (1.20.2-1, 1.22.1-1), linstor-satellite:amd64
(1.20.2-1, 1.22.1-1), linstor-proxmox:amd64 (6.1.0-1, 7.0.0-1)

Some queries work fast:

# time linstor node list
+--------------------------------------------------------------+
| Node  | NodeType  | Addresses                    | State  |
|==============================================================|
| node1 | SATELLITE | xxx.xxx.xxx.xx1:3366 (PLAIN) | Online |
| node2 | SATELLITE | xxx.xxx.xxx.xx2:3366 (PLAIN) | Online |
| node3 | SATELLITE | xxx.xxx.xxx.xx3:3366 (PLAIN) | Online |
| node4 | SATELLITE | xxx.xxx.xxx.xx4:3366 (PLAIN) | Online |
| node5 | SATELLITE | xxx.xxx.xxx.xx5:3366 (PLAIN) | Online |
+--------------------------------------------------------------+

real    0m0.294s
user    0m0.259s
sys     0m0.033s

But some have become very slow, taking up to a minute. I'm certain they
were much faster before this upgrade:

# time linstor volume list
+-------------------------------------------------------------------------------------------------------------------+
| Node  | Resource      | StoragePool          | VolNr | MinorNr | DeviceName    | Allocated  | InUse  | State      |
|===================================================================================================================|
| node3 | linstor_db    | pool-drbd-mgmt       |     0 |    1000 | /dev/drbd1000 | 136.80 MiB | InUse  | UpToDate   |
| node4 | linstor_db    | pool-drbd-mgmt       |     0 |    1000 | /dev/drbd1000 | 141.82 MiB | Unused | UpToDate   |
| node5 | linstor_db    | pool-drbd-mgmt       |     0 |    1000 | /dev/drbd1000 |  36.43 MiB | Unused | UpToDate   |
| node3 | vm-100-disk-2 | DfltDisklessStorPool |     0 |    1012 | /dev/drbd1012 |            | Unused | TieBreaker |
| node4 | vm-100-disk-2 | pool-drbd-4-5        |     0 |    1012 | /dev/drbd1012 |  29.90 GiB | InUse  | UpToDate   |
| node5 | vm-100-disk-2 | pool-drbd-4-5        |     0 |    1012 | /dev/drbd1012 |  29.90 GiB | Unused | UpToDate   |
| node1 | vm-101-disk-3 | DfltDisklessStorPool |     0 |    1048 | /dev/drbd1048 |            | Unused | TieBreaker |
| node4 | vm-101-disk-3 | pool-drbd-4-5        |     0 |    1048 | /dev/drbd1048 | 146.09 GiB | InUse  | UpToDate   |
| node5 | vm-101-disk-3 | pool-drbd-4-5        |     0 |    1048 | /dev/drbd1048 | 147.09 GiB | Unused | UpToDate   |
| node1 | vm-102-disk-2 | pool-drbd-1-2        |     0 |    1021 | /dev/drbd1021 |     16 MiB | InUse  | UpToDate   |
| node2 | vm-102-disk-2 | pool-drbd-1-2        |     0 |    1021 | /dev/drbd1021 |     16 MiB | Unused | UpToDate   |
| node3 | vm-102-disk-2 | DfltDisklessStorPool |     0 |    1021 | /dev/drbd1021 |            | Unused | TieBreaker |
| node1 | vm-102-disk-3 | pool-drbd-1-2        |     0 |    1044 | /dev/drbd1044 |  10.25 GiB | InUse  | UpToDate   |
| node2 | vm-102-disk-3 | pool-drbd-1-2        |     0 |    1044 | /dev/drbd1044 |  10.25 GiB | Unused | UpToDate   |
| node3 | vm-102-disk-3 | DfltDisklessStorPool |     0 |    1044 | /dev/drbd1044 |            | Unused | TieBreaker |

[...]

+-------------------------------------------------------------------------------------------------------------------+

real    0m58.888s
user    0m0.229s
sys     0m0.047s

There are 113 volumes (including TieBreakers):

# linstor volume list | grep -c /dev/drbd
113

The performance graphs in Proxmox are all scattered after the upgrade; I
suppose it's related to the long query times. It looks like this (top
image is before the upgrade, bottom one is after): https://imgur.com/a/c5FrS0y

I've moved the linstor-controller to a different node, but it didn't help.

Is there anything I can do to make it work like before this upgrade? I
will gladly provide additional info if needed.

--
Best regards
Łukasz Wąsikowski

_______________________________________________
Star us on GITHUB: https://github.com/LINBIT
drbd-user mailing list
drbd-user@lists.linbit.com
https://lists.linbit.com/mailman/listinfo/drbd-user
Re: Slow linstor queries after recent upgrade
On Sun, May 07, 2023 at 02:45:53PM +0200, Łukasz Wąsikowski wrote:
> Hi,
>
> I'm using Linstor on 5 node Proxmox 7.x cluster. After recent upgrade I've
> got some performance issues with some linstor queries.
>
> Upgrade: linstor-common:amd64 (1.20.2-1, 1.22.1-1), python-linstor:amd64
> (1.16.0-1, 1.18.0-1), linstor-client:amd64 (1.16.0-1, 1.18.0-1),
> linstor-controller:amd64 (1.20.2-1, 1.22.1-1), linstor-satellite:amd64
> (1.20.2-1, 1.22.1-1), linstor-proxmox:amd64 (6.1.0-1, 7.0.0-1)

I'm pretty sure this isn't a LINSTOR issue but rather a linstor-proxmox
issue: Proxmox calls the plugin's status API *a* *lot*. And in the case
of LINSTOR it isn't just one node calling the controller frequently; all
the nodes query it separately. There is a "cache" in the Proxmox API,
and we even rolled our own that is stored persistently in the file
system. With 7.0.0 we are in a situation where we cannot use the cache
as efficiently as before, and I guess that shows. You can a) increase
"statuscache", or b) downgrade to 6.1.0. 6.1.0 should be fine; the only
thing that changed in 7.0.0 was improved storage information, so if you
have been happy so far, please downgrade. I have an idea how we can keep
the information in the cache more efficiently, but that will also
require some development effort in LINSTOR itself.
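
For reference, option a) would be set in the storage definition. Below is
an illustrative sketch of a /etc/pve/storage.cfg stanza: the storage name,
controller address, and resource group are placeholders, not values from
this thread; "statuscache" is the option named above, and a larger value
lets the plugin reuse cached status results for longer.

```
# /etc/pve/storage.cfg -- illustrative stanza; names and addresses
# below are placeholders, not taken from this thread.
drbd: drbdstorage
        content images, rootdir
        controller xxx.xxx.xxx.xx1
        resourcegroup pve-rg
        statuscache 60
```

Option b), the downgrade, would be along the lines of
"apt install linstor-proxmox=6.1.0-1" (version string taken from the
upgrade log above).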

Regards, rck
Re: Slow linstor queries after recent upgrade
On 08.05.2023 at 09:46, Roland Kammerer wrote:
> On Sun, May 07, 2023 at 02:45:53PM +0200, Łukasz Wąsikowski wrote:
>> Hi,
>>
>> I'm using Linstor on 5 node Proxmox 7.x cluster. After recent upgrade I've
>> got some performance issues with some linstor queries.
>>
>> Upgrade: linstor-common:amd64 (1.20.2-1, 1.22.1-1), python-linstor:amd64
>> (1.16.0-1, 1.18.0-1), linstor-client:amd64 (1.16.0-1, 1.18.0-1),
>> linstor-controller:amd64 (1.20.2-1, 1.22.1-1), linstor-satellite:amd64
>> (1.20.2-1, 1.22.1-1), linstor-proxmox:amd64 (6.1.0-1, 7.0.0-1)
> I'm pretty sure this isn't a LINSTOR issue but more a linstor-proxmox
> issue: Proxmox calls the plugins status API *a* *lot*. And in case of
> LINSTOR it isn't only that 1 nodes calls the controller a lot, but all
> the nodes query it separately. There is a "cache" in the Proxmox API and
> we even rolled our own that is stored persistently in the file system.
> With 7.0.0 we are in a situation where we can not use the cache as
> efficient as before, and I guess that shows. You can a) increase
> "statuscache", or you can downgrade to 6.1.0. 6.1.0 should be fine, the
> only thing that changed was improved storage information, but if you
> have been happy so far, please downgrade. I have an idea how we can keep
> the information in the cache more efficiently, but that will also
> require some development effort in LINSTOR itself.

That helped, thank you!

--
Best regards
Łukasz Wąsikowski
Re: Slow linstor queries after recent upgrade
On Wed, May 10, 2023 at 04:49:56PM +0200, Łukasz Wąsikowski wrote:
> On 08.05.2023 at 09:46, Roland Kammerer wrote:
> > On Sun, May 07, 2023 at 02:45:53PM +0200, Łukasz Wąsikowski wrote:
> > > Hi,
> > >
> > > I'm using Linstor on 5 node Proxmox 7.x cluster. After recent upgrade I've
> > > got some performance issues with some linstor queries.
> > >
> > > Upgrade: linstor-common:amd64 (1.20.2-1, 1.22.1-1), python-linstor:amd64
> > > (1.16.0-1, 1.18.0-1), linstor-client:amd64 (1.16.0-1, 1.18.0-1),
> > > linstor-controller:amd64 (1.20.2-1, 1.22.1-1), linstor-satellite:amd64
> > > (1.20.2-1, 1.22.1-1), linstor-proxmox:amd64 (6.1.0-1, 7.0.0-1)
> > I'm pretty sure this isn't a LINSTOR issue but more a linstor-proxmox
> > issue: Proxmox calls the plugins status API *a* *lot*. And in case of
> > LINSTOR it isn't only that 1 nodes calls the controller a lot, but all
> > the nodes query it separately. There is a "cache" in the Proxmox API and
> > we even rolled our own that is stored persistently in the file system.
> > With 7.0.0 we are in a situation where we can not use the cache as
> > efficient as before, and I guess that shows. You can a) increase
> > "statuscache", or you can downgrade to 6.1.0. 6.1.0 should be fine, the
> > only thing that changed was improved storage information, but if you
> > have been happy so far, please downgrade. I have an idea how we can keep
> > the information in the cache more efficiently, but that will also
> > require some development effort in LINSTOR itself.
>
> That helped, thank you!

Without going into too much technical detail: we did the necessary
development. A new LINSTOR RC1 should follow early next week, and I will
adapt the plugin during the RC period. The next LINSTOR release plus
linstor-proxmox 7.0.1 should then behave well on larger setups again.
Sorry for the inconvenience. This is probably also a good reminder for
people to test the RCs; I publish them for a reason.
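
For readers curious what the persistent status cache mentioned earlier in
this thread boils down to, here is a minimal, hypothetical sketch of a
file-backed cache with a TTL. It is not the plugin's actual code;
fetch_status(), the cache path, and the TTL value are made up for
illustration:

```python
import json
import os
import time

CACHE_PATH = "/tmp/linstor-status.cache"  # hypothetical location
TTL = 60  # seconds; corresponds conceptually to "statuscache"

def fetch_status():
    # Placeholder for the expensive call to the LINSTOR controller.
    return {"fetched_at": time.time(), "volumes": 113}

def cached_status(path=CACHE_PATH, ttl=TTL):
    # Reuse the persisted result if it is younger than the TTL;
    # otherwise refresh it and write it back to disk.
    try:
        if time.time() - os.path.getmtime(path) < ttl:
            with open(path) as f:
                return json.load(f)
    except (OSError, ValueError):
        pass  # missing or corrupt cache file: fall through and refresh
    status = fetch_status()
    with open(path, "w") as f:
        json.dump(status, f)
    return status
```

The point of persisting the cache to the file system is that every status
poll within the TTL is served from disk instead of generating another
round-trip to the controller, which matters when several nodes poll
frequently.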

Regards, rck