DHCP relay monitoring
Hello

On one of my MX204 routers the DHCP relay crashes after running for some
time and the process stops. It is not restarted automatically, but it
starts again with the following command:

admin@gc-edge1> restart dhcp-service
error: Junos Dynamic Host Configuration Protocol process is not running
Junos Dynamic Host Configuration Protocol process started, pid 72256

I can open a case with JTAC about the cause of the crash, but I am
thinking about how to monitor the relay. None of my current monitoring
tools detects this situation, and it is actually quite critical. With no
relay, the customers' DHCP leases may expire. To a certain extent the
customers will be using unicast to the DHCP server and not many will
notice right away, but soon enough we will have customers that cannot
get online after rebooting their CPE etc.

What options do we have for monitoring running processes on the router?
Are there other processes than DHCP that should be monitored too?

Regards,

Baldur


_______________________________________________
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: DHCP relay monitoring
> On 9/07/2020, at 23:48, Baldur Norddahl <baldur@gigabit.dk> wrote:
>
> Hello
>
> On one of my MX204 routers the DHCP relay crashes after some running time and the process stops. It is not restarted automatically but will start again with the following command:

What version are you on? Are you running IPv6? PPP with IPv6 over the top?

> admin@gc-edge1> restart dhcp-service
> error: Junos Dynamic Host Configuration Protocol process is not running
> Junos Dynamic Host Configuration Protocol process started, pid 72256
>
> I can open a case with JTAC about the cause of the crash, but I am thinking about how to monitor the relay. None of my current monitoring tools detects this situation, and it is actually quite critical. With no relay, the customers' DHCP leases may expire. To a certain extent the customers will be using unicast to the DHCP server and not many will notice right away, but soon enough we will have customers that cannot get online after rebooting their CPE etc.
>
> What options do we have for monitoring running processes on the router? Are there other processes than DHCP that should be monitored too?

One option I’ve used for very similar-sounding issues is to do this on the DHCP server: collect stats for requests per giaddr and alert when they suddenly drop.
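As a rough illustration of the per-giaddr idea, here is a minimal Python sketch. It assumes you can already extract the giaddr of each relayed request from your DHCP server's logs or stats for a fixed sampling interval; the function names, the `ratio` threshold and the `min_baseline` cutoff are made up for the example and would need tuning.

```python
from collections import Counter
from statistics import median


def count_by_giaddr(giaddrs):
    """Count relayed requests per giaddr for one sampling interval."""
    return Counter(giaddrs)


def quiet_relays(history, current, ratio=0.2, min_baseline=10):
    """Return giaddrs whose current count fell far below their usual level.

    history: list of Counter objects from earlier intervals
    current: Counter for the latest interval
    ratio, min_baseline: hypothetical tuning knobs, not from the thread
    """
    relays = set()
    for sample in history:
        relays.update(sample)

    alerts = []
    for giaddr in sorted(relays):
        # Median of past intervals is the "normal" request rate for this relay.
        baseline = median(sample.get(giaddr, 0) for sample in history)
        if baseline < min_baseline:
            continue  # too little traffic to judge either way
        if current.get(giaddr, 0) < baseline * ratio:
            alerts.append(giaddr)
    return alerts
```

A relay that disappears completely from the current interval counts as zero requests, so it is reported as long as its baseline is above `min_baseline`.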

You might see something in the logs when DHCP crashes and can alarm on that with your chosen syslog system.

JUNIPER-JDHCP-MIB may be useful - though when the DHCP process is dead you may get polling timeouts. If your polling system can alarm on those timeouts, that in itself might be a usable signal.

--
Nathan Ward

_______________________________________________
Re: DHCP relay monitoring
Once upon a time, Baldur Norddahl <baldur@gigabit.dk> said:
> What options do we have for monitoring running processes on the
> router? Are there other processes than DHCP that should be monitored
> too?

JUNOS implements many of the standard SNMP MIBs, which include
information about open TCP/UDP ports, so you could monitor
UDP-MIB::udpLocalAddress.0.0.0.0.67. If dhcp-relay is running, that
variable should return 0.0.0.0; if not, the variable won't exist.
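A small Python sketch of that check, assuming the net-snmp `snmpget` command-line tool is installed on the monitoring host and the router answers SNMPv2c. The helper names are invented for the example, and the output parsing relies on net-snmp's usual "IpAddress:" and "No Such Instance" wording.

```python
import subprocess

# OID named above; it exists only while something listens on UDP port 67.
DHCP_LISTENER_OID = "UDP-MIB::udpLocalAddress.0.0.0.0.67"


def relay_listening(snmpget_output):
    """Interpret net-snmp `snmpget` output for the listener OID."""
    if "No Such Instance" in snmpget_output or "No Such Object" in snmpget_output:
        return False
    return snmpget_output.strip().endswith("IpAddress: 0.0.0.0")


def poll_relay(host, community):
    """Query the router once; host and community are supplied by the caller."""
    result = subprocess.run(
        ["snmpget", "-v2c", "-c", community, host, DHCP_LISTENER_OID],
        capture_output=True, text=True, timeout=10,
    )
    if result.returncode != 0:
        return False  # snmpget reported a timeout or error
    return relay_listening(result.stdout)
```

Note that this only shows that some process has UDP port 67 open; it does not prove that the relay is actually forwarding packets.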

--
Chris Adams <cma@cmadams.net>
_______________________________________________
Re: DHCP relay monitoring
On Thu, 9 Jul 2020 13:48:16 +0200
Baldur Norddahl <baldur@gigabit.dk> wrote:

> [snip]
>
> I can open a case with JTAC for the cause of the crash, but I am
> thinking about how to monitor the relay.

In the past I have used traceoptions, which was helpful.

Under [edit system processes dhcp-service traceoptions] I have configured:
    file DHCP.log size 100m files 3;
    level all;
    flag all;

You can then look at that file. There will be a lot of info.

This works on my ACX5048. I have not tried on an MX.

--TimH
_______________________________________________
Re: DHCP relay monitoring
Hi,

> On one of my MX204 routers the DHCP relay crashes after some running time
> and the process stops.

If you are looking for a temporary workaround, you could periodically
check whether the jdhcpd process is running and restart it if it isn't.
Something like this:
https://gist.github.com/tonusoo/084ac04cab30151ce7d2911a13320838
Does jdhcpd log something when it crashes? Perhaps a better approach
would be to trigger the restart based on the jdhcpd (traceoptions) log
message, as the process would then be restored almost immediately.
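The periodic check could look roughly like the Python sketch below. This is not the code from the gist, just an illustration of the logic. `run_cli` is a hypothetical callable that takes a CLI command string and returns its text output; how it reaches the router (SSH, NETCONF, on-box script) is left open.

```python
def process_running(process_listing, name="jdhcpd"):
    """Return True if a ps-style listing contains the named process."""
    for line in process_listing.splitlines():
        # Compare the basename of each field, so /usr/sbin/jdhcpd matches.
        if any(field.rsplit("/", 1)[-1] == name for field in line.split()):
            return True
    return False


def ensure_relay(run_cli, name="jdhcpd"):
    """Restart the DHCP service if its process is missing.

    run_cli is a hypothetical helper supplied by the caller, not a real API.
    """
    listing = run_cli("show system processes")
    if process_running(listing, name):
        return "running"
    run_cli("restart dhcp-service")
    return "restarted"
```

Run from cron or a similar scheduler, this only narrows the outage to the polling interval; it does not replace finding the cause of the crash.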


WBR,
Martin