Mailing List Archive

Re: [Openstack-operators] Recovering from full outage
The neutron-metadata-agent service is running, the agent is alive, and it is listening on port 8775. However, new instances still do not get any metadata such as hostname or keypair. If I run `curl 192.168.116.22:8775` from the compute nodes, I do get a response. So the metadata agent is running, listening, and reachable from the compute nodes, and it worked previously.

I'm stumped.

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com

----------------------------------------
From: Arun Kumar <thangam.arunx@gmail.com>
Sent: 7/12/18 12:01 AM
To: torin.woltjer@granddial.com
Cc: "openstack@lists.openstack.org" <openstack@lists.openstack.org>, openstack-operators@lists.openstack.org
Subject: Re: [Openstack-operators] [Openstack] Recovering from full outage
Hi Torin,

> If I run `ip netns exec qrouter netstat -lnp` or `ip netns exec qdhcp netstat -lnp` on the controller, should I see anything listening on the metadata port (8775)? When I run these commands I don't see that listening, but I have no example of a working system to check against. Can anybody verify this?

You won't see port 8775 in either the qrouter or the qdhcp namespace. Instead, check whether the metadata service is running on the neutron controller node(s) and listening on port 8775. Also, you can verify the metadata and neutron services using the following commands:

service neutron-metadata-agent status
neutron agent-list
netstat -ntplua | grep :8775

Thanks & Regards
Arun

http://thangamaniarun.wordpress.com
Re: [Openstack-operators] Recovering from full outage [ In reply to ]
Are your instances receiving a route to the metadata service
(169.254.169.254) from DHCP? Can you curl the endpoint?
curl http://169.254.169.254/latest/meta-data
Re: [Openstack-operators] Recovering from full outage [ In reply to ]
I tested this on two instances. The first instance has existed since before I began having this issue. The second is created from a cirros test image.

On the first instance:
The route exists: 169.254.169.254 via 172.16.1.1 dev ens3 proto dhcp metric 100.
curl returns information, for example:
`curl http://169.254.169.254/latest/meta-data/public-keys`
0=nextcloud

On the second instance:
The route exists: 169.254.169.254 via 172.16.1.1 dev eth0
curl fails:
`curl http://169.254.169.254/latest/meta-data`
curl: (7) Failed to connect to 169.254.169.254 port 80: Connection timed out

I am curious why one instance is able to connect but not the other. Both instances were running on the same compute node.
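
The two checks above (is the route present, does curl answer) can be scripted from inside an instance. A minimal sketch, assuming a POSIX shell and awk are available in the guest; the function name `metadata_route_gateway` is hypothetical and not part of any OpenStack tool. It reads `ip route` output on stdin and prints the gateway used for the metadata address.

```shell
#!/bin/sh
# Hypothetical helper: print the next hop for the metadata address found in
# `ip route` output on stdin; exit non-zero if there is no such route.
metadata_route_gateway() {
    awk '
        $1 == "169.254.169.254" && $2 == "via" { print $3; found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```

Inside an instance this would be run as `ip route | metadata_route_gateway`, followed by `curl --max-time 5 http://169.254.169.254/latest/meta-data` so that a blocked path fails quickly instead of hanging.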

Torin Woltjer


----------------------------------------
From: John Petrini <jpetrini@coredial.com>
Sent: 7/12/18 9:16 AM
To: torin.woltjer@granddial.com
Cc: thangam.arunx@gmail.com, OpenStack Operators <openstack-operators@lists.openstack.org>, OpenStack Mailing List <openstack@lists.openstack.org>
Subject: Re: [Openstack-operators] [Openstack] Recovering from full outage
Are your instances receiving a route to the metadata service (169.254.169.254) from DHCP? Can you curl the endpoint? curl http://169.254.169.254/latest/meta-data
Re: [Openstack-operators] Recovering from full outage [ In reply to ]
On 07/12/2018 08:20 AM, Torin Woltjer wrote:
> The neutron-metadata-agent service is running, the agent is alive,
> and it is listening on port 8775. However, new instances still do not
> get any information like hostname or keypair. If I run `curl
> 192.168.116.22:8775` from the compute nodes, I do get a response. The
> metadata agent is running, listening, and accessible from the compute
> nodes; and it worked previously.
>
> I'm stumped.

There is also a metadata proxy that runs in the qrouter namespace. You
can verify it's running and getting requests by looking at both iptables
and netstat output.

$ sudo ip netns exec qrouter-$ID iptables-save -c | grep 169
[16:960] -A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -i qr-+ -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 9697
[96:7968] -A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -i qr-+ -p tcp -m tcp --dport 80 -j MARK --set-xmark 0x1/0xffff

The numbers inside [] represent packets:bytes, so non-zero is good.
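
Reading the counters by eye gets tedious when several routers are involved. A minimal sketch, assuming the `[packets:bytes]` prefix format shown above; `metadata_rule_hits` is a hypothetical helper name. It reads `iptables-save -c` output on stdin and reports, for each rule that mentions 169.254.169.254, the packet count, the byte count, and whether the rule has been hit.

```shell
#!/bin/sh
# Hypothetical helper: summarise the counters of every rule in
# `iptables-save -c` output (stdin) that mentions the metadata address.
metadata_rule_hits() {
    awk '/169\.254\.169\.254/ {
        # The first field is the counter prefix, e.g. [16:960].
        split(substr($1, 2, length($1) - 2), c, ":")
        status = (c[1] + 0 > 0) ? "hit" : "unused"
        print c[1], c[2], status
    }'
}
```

For example: `sudo ip netns exec qrouter-$ID iptables-save -c | metadata_rule_hits`. A line ending in `unused` means no packet has matched that rule since the counters were last reset.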

$ sudo ip netns exec qrouter-$ID netstat -anep | grep 9697
tcp 0 0 0.0.0.0:9697 0.0.0.0:* LISTEN 0 294339 4867/haproxy
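
The same netstat check can be wrapped so that it returns a success or failure status. A minimal sketch, assuming the `netstat -anep` column layout shown above (local address in column 4, state in column 6); `has_listener` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the netstat output on stdin contains a TCP
# socket in LISTEN state whose local address ends in the port given as $1.
has_listener() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $6 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```

For example: `sudo ip netns exec qrouter-$ID netstat -anep | has_listener 9697 || echo "no metadata proxy listening"`.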

If you have a running instance you can log into, running curl to the
metadata IP would be helpful to try and diagnose since it would go
through this entire path.

-Brian



_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Re: [Openstack-operators] Recovering from full outage [ In reply to ]
You might want to try giving the neutron-dhcp and metadata agents a restart.
Re: [Openstack-operators] Recovering from full outage [ In reply to ]
Checking iptables for the metadata-proxy inside of qrouter provides the following:
$ ip netns exec qrouter-80c3bc40-b49c-446a-926f-99811adc0c5e iptables-save -c | grep 169
[0:0] -A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -i qr-+ -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 9697
[0:0] -A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -i qr-+ -p tcp -m tcp --dport 80 -j MARK --set-xmark 0x1/0xffff
Packets:Bytes are both 0, so no traffic is touching this rule?

Interestingly the command:
$ ip netns exec qrouter-80c3bc40-b49c-446a-926f-99811adc0c5e netstat -anep | grep 9697
returns nothing, so there isn't actually anything running on 9697 in the network namespace...

This is the output without grep:
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       User       Inode      PID/Program name
raw        0      0 0.0.0.0:112             0.0.0.0:*               7           0          76154      8404/keepalived
raw        0      0 0.0.0.0:112             0.0.0.0:*               7           0          76153      8404/keepalived
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags       Type       State         I-Node   PID/Program name     Path
unix  2      [ ]         DGRAM                    64501    7567/python2
unix  2      [ ]         DGRAM                    79953    8403/keepalived

Could the reason no traffic is touching the rule be that nothing is listening on that port, or is there a second issue down the chain?

Curl fails even after restarting neutron-dhcp-agent and neutron-metadata-agent.

Thank you for this, and any future help.
Re: [Openstack-operators] Recovering from full outage [ In reply to ]
$ ip netns exec qdhcp-87a5200d-057f-475d-953d-17e873a47454 curl http://169.254.169.254
<html>
<head>
<title>404 Not Found</title>
</head>
<body>
<h1>404 Not Found</h1>
The resource could not be found.<br /><br />

</body>
</html>
$ ip netns exec qrouter-80c3bc40-b49c-446a-926f-99811adc0c5e curl http://169.254.169.254
curl: (7) Couldn't connect to server

Torin Woltjer


----------------------------------------
From: "Torin Woltjer" <torin.woltjer@granddial.com>
Sent: 7/12/18 11:16 AM
To: <haleyb.dev@gmail.com>, <thangam.arunx@gmail.com>, "jpetrini@coredial.com" <jpetrini@coredial.com>
Cc: openstack-operators@lists.openstack.org, openstack@lists.openstack.org
Subject: Re: [Openstack] [Openstack-operators] Recovering from full outage
Re: [Openstack-operators] Recovering from full outage [ In reply to ]
On 07/16/2018 08:41 AM, Torin Woltjer wrote:
> $ip netns exec qdhcp-87a5200d-057f-475d-953d-17e873a47454 curl
> http://169.254.169.254
> <html>
> <head>
>  <title>404 Not Found</title>
> </head>
> <body>
>  <h1>404 Not Found</h1>
>  The resource could not be found.<br /><br />
> </body>
> </html>

Strange, don't know where the reply came from for that.

> $ip netns exec qrouter-80c3bc40-b49c-446a-926f-99811adc0c5e curl
> http://169.254.169.254
> curl: (7) Couldn't connect to server

Based on your iptables output, I would think the metadata proxy is
running in the qrouter namespace. However, a curl from there will not
work since it is restricted to only work for incoming packets from the
qr- device(s). You would have to try curl from a running instance.

Is there an haproxy process running? And is it listening on port 9697
in the qrouter namespace?

-Brian


Re: [Openstack-operators] Recovering from full outage [ In reply to ]
I feel pretty dumb about this, but it was fixed by adding a rule to my security groups. I'm still very confused about some of the other behavior that I saw, but at least the problem is fixed now.
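
The message does not say which rule was added. Purely as an illustration, and as an assumption rather than the fix used here: when a security group lacks an egress rule covering it, instances typically cannot reach the metadata address on TCP port 80, and an egress rule for 169.254.169.254/32 restores that path. The sketch below only prints the `openstack` CLI command for a given group so it can be reviewed before running; `metadata_egress_rule_cmd` and the group name `my-group` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) an `openstack` CLI command that
# would allow egress TCP/80 to the metadata address for security group $1.
# Which rule actually fixed the problem in this thread is not stated.
metadata_egress_rule_cmd() {
    printf 'openstack security group rule create --egress --protocol tcp --dst-port 80 --remote-ip 169.254.169.254/32 %s\n' "$1"
}
```

For example, `metadata_egress_rule_cmd my-group` prints the command for a group named `my-group`; compare it against `openstack security group rule list my-group` before applying anything.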

Torin Woltjer


----------------------------------------
From: Brian Haley <haleyb.dev@gmail.com>
Sent: 7/16/18 4:39 PM
To: torin.woltjer@granddial.com, thangam.arunx@gmail.com, jpetrini@coredial.com
Cc: openstack-operators@lists.openstack.org, openstack@lists.openstack.org
Subject: Re: [Openstack] [Openstack-operators] Recovering from full outage