Mailing List Archive: Database (Got timeout reading communication packets)

May 8, 2018, 11:17 AM

Post #1 of 5 (742 views)

Just the other day I noticed a bunch of errors spewing from the mysql service. I've spent quite a bit of time trying to track this down, and I haven't had any luck figuring out why this is happening. The following line is repeatedly spewed in the service's journal.

May 08 11:13:47 UBNTU-DBMQ2 mysqld[20788]: 2018-05-08 11:13:47 140127545740032 [Warning] Aborted connection 211 to db: 'nova_api' user: 'nova' host: '192.168.116.21' (Got timeout reading communication packets)

It isn't always nova_api; it happens with all of the OpenStack projects, and from either controller node's IP address.

The database is a MariaDB Galera cluster. Removing HAProxy has no effect. The output only appears on the node receiving the connections: with HAProxy that is multiple nodes, otherwise it is whichever node I specify as the database in my controllers' hosts files.
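
To see how the warnings are spread across databases and controller addresses, the journal lines can be tallied. The following is a minimal sketch, assuming the warnings keep the exact format of the line quoted above; the function name `tally_aborted` is made up for illustration.

```python
import re
from collections import Counter

# Matches the "Aborted connection" warning format quoted in the post above.
ABORTED_RE = re.compile(
    r"Aborted connection (?P<conn_id>\d+) to db: '(?P<db>[^']*)' "
    r"user: '(?P<user>[^']*)' host: '(?P<host>[^']*)' "
    r"\((?P<reason>[^)]*)\)"
)

def tally_aborted(lines):
    """Count aborted-connection warnings per (db, user, host)."""
    counts = Counter()
    for line in lines:
        match = ABORTED_RE.search(line)
        if match:
            counts[(match["db"], match["user"], match["host"])] += 1
    return counts

# The journal line from the post, used as sample input.
SAMPLE = (
    "May 08 11:13:47 UBNTU-DBMQ2 mysqld[20788]: 2018-05-08 11:13:47 "
    "140127545740032 [Warning] Aborted connection 211 to db: 'nova_api' "
    "user: 'nova' host: '192.168.116.21' "
    "(Got timeout reading communication packets)"
)

for (db, user, host), n in tally_aborted([SAMPLE]).most_common():
    print(n, db, user, host)
```

In practice the input would be the lines of the service's journal rather than a single sample line. A roughly even spread across all projects and both controllers would suggest a cause common to every client rather than one misbehaving service.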

Re: Database (Got timeout reading communication packets) [ In reply to ]

eblock at nde

May 14, 2018, 2:32 AM

Post #2 of 5 (737 views)

Hi,

Are these interruptions occasional, or do they occur all the time? Is
this a new issue, or has it happened before?
Does the OpenStack environment work as expected despite these messages,
or do you experience interruptions in the services?

I would check the network setup first (I have read about loose cables
in other threads...), and maybe run some ping tests between the
machines to see if there's anything odd. Since you mention several
services reporting these interruptions, this looks like a network issue
to me.
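
Ping only exercises ICMP, so it can help to time TCP connections to the
database port as well. Below is a minimal sketch, assuming the database
listens on the MySQL/MariaDB default port 3306 (adjust if your setup
differs); the function name and the host name in the commented usage
line are hypothetical.

```python
import socket
import time

def tcp_connect_times(host, port=3306, attempts=5, timeout=2.0):
    """Time repeated TCP connects; a None entry marks a failed attempt."""
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            # Open and immediately close a TCP connection to the port.
            with socket.create_connection((host, port), timeout=timeout):
                results.append(time.monotonic() - start)
        except OSError:
            # Covers refused connections, timeouts and name-resolution errors.
            results.append(None)
    return results

# Example call (placeholder host name, run from a controller node):
# print(tcp_connect_times("database"))
```

Failed attempts or widely varying connect times would point towards the
network; consistently fast connects make a network cause less likely,
although they do not rule it out.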

Regards,
Eugen

_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack

Re: Database (Got timeout reading communication packets) [ In reply to ]

torin.woltjer at granddial

May 14, 2018, 7:02 AM

Post #3 of 5 (737 views)

>are these interruptions occasionally or do they occur all the time? Is
>this a new issue or has this happened before?

This is a 3-node Galera cluster on 3 KVM virtual machines. The errors are
printed constantly in the logs, and no node is spared. I don't know
whether they have always been there, but I noticed them after an update.

>Does the openstack environment work as expected despite these messages
>or do you experience interruptions in the services?

The OpenStack services operate normally. The dashboard is fairly slow, but
it always has been.

>I would check the network setup first (I have read about loose cables
>in different threads...), maybe run some ping tests between the
>machines to see if there's anything weird. Since you mention different
>services reporting these interruptions this seems like a network issue
>to me.

The hosts are all connected with bonded 10G SFP+ cables through a switch.
Pings between the VMs seem fine. If I had to guess, any networking problem
would be between the guest and the host, due to libvirt. Is there anything
I should be looking for there?

Re: Database (Got timeout reading communication packets) [ In reply to ]

eblock at nde

May 14, 2018, 7:41 AM

Post #4 of 5 (737 views)

While I was working on something else I remembered the error messages
you described: I have them, too. This is a lab environment on hardware
nodes with a sufficient network connection, and since we have had to
debug network issues before, we can rule out network problems in our
case.
I found a website [1] on tracking down Galera issues. I tried to apply
those steps, and it seems that the OpenStack code doesn't close its
connections properly, hence the aborted connections.
I'm not sure this is the correct interpretation, but since I haven't
faced any problems related to the OpenStack databases, I decided to
ignore these messages as long as the OpenStack environment works
properly.

Regards,
Eugen

[1] https://www.fromdual.ch/abbrechende-mariadb-mysql-verbindungen

Re: Database (Got timeout reading communication packets) [ In reply to ]

torin.woltjer at granddial

May 14, 2018, 7:51 AM

Post #5 of 5 (737 views)

>While I was working on something else I remembered the error messages
>you described, I have them, too. It's a lab environment on hardware
>nodes with a sufficient network connection, and since we had to debug
>network issues before, we can rule out network problems in our case.
>I found a website [1] to track down galera issues, I tried to apply
>those steps and it seems that the openstack code doesn't close the
>connections properly, hence the aborted connections.
>I'm not sure if this is the correct interpretation, but since I didn't
>face any problems related to the openstack databases I decided to
>ignore these messages as long as the openstack environment works
>properly.

I did think something similar initially, when I noticed a high number of
sleeping connections, but because I was unsure I thought I would ask.
Since this affects all OpenStack services as a whole, which project should
I file a bug report against?
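
The sleeping connections mentioned above can be counted per service
account. Here is a minimal sketch, assuming the rows have been fetched
from `information_schema.PROCESSLIST` (for example with
`SELECT USER, DB, COMMAND, TIME FROM information_schema.PROCESSLIST`)
as dictionaries keyed by column name. The function name and the sample
rows are made up for illustration and are not data from this cluster.

```python
from collections import Counter

def sleeping_by_user(rows, min_idle_seconds=0):
    """Count connections in the 'Sleep' state per (user, db).

    Only rows whose TIME (seconds in the current state) is at least
    min_idle_seconds are counted.
    """
    counts = Counter()
    for row in rows:
        if row["COMMAND"] == "Sleep" and row["TIME"] >= min_idle_seconds:
            counts[(row["USER"], row["DB"])] += 1
    return counts

# Invented sample rows using PROCESSLIST column names.
sample_rows = [
    {"USER": "nova", "DB": "nova_api", "COMMAND": "Sleep", "TIME": 3540},
    {"USER": "nova", "DB": "nova_api", "COMMAND": "Sleep", "TIME": 12},
    {"USER": "neutron", "DB": "neutron", "COMMAND": "Query", "TIME": 0},
]

for (user, db), n in sleeping_by_user(sample_rows, min_idle_seconds=60).items():
    print(user, db, n)
```

If many connections sit idle for close to the server's `wait_timeout`
before they are aborted, that would be consistent with clients keeping
idle connections open rather than closing them, but it is only a hint
and not a diagnosis.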