Mailing List Archive

gNMI on MX960
Hello everyone,

I'm trying (and failing) to get gNMI running on an MX960. All I'm
getting are weird errors. Has anyone successfully used gNMI
subscriptions on an MX box for streaming telemetry? If so, which
config did you use on the box and in the client?

My config on the MX is this:

set system services extension-service request-response grpc ssl address 10.4.0.78
set system services extension-service request-response grpc ssl port 30030
set system services extension-service request-response grpc ssl local-certificate gnmi
set system services extension-service request-response grpc skip-authentication

set security certificates local gnmi "-----BEGIN PRIVATE KEY-----\n[...]\n-----END CERTIFICATE-----\n"

First I had port 3060 which produced weird errors but nothing else.
Moving the gNMI service to port 30030 at least made the gNMI
connection succeed. Then I got weird gRPC errors, so I upgraded JunOS
on the box to 21.2R3-S5.4.

Now, when I try to subscribe to interface counters:

subscribe = {
    "subscription": [
        {
            "path": "/interfaces/interface/state/counters",
            "mode": "sample",
            "sample_interval": 10 * 1000000000,
        },
    ],
    "mode": "stream",
    "use_aliases": False,
    "encoding": "proto",
}

All the box does is spew back a gNMI error that Qos is not
supported. WTF?

"UNKNOWN:Error received from peer ipv4:10.4.0.78:30030 {grpc_message:"Qos not supported", grpc_status:12, created_time:"2024-03-07T16:20:41.756921+01:00"}"

On Arista this worked right after gNMI was enabled on the switch.
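
For reference, the numeric grpc_status in the error above can be decoded against the canonical gRPC status code list. The sketch below is only an illustration: describe_grpc_error is a hypothetical helper (not part of any gRPC library), and the regular expressions assume the error text has the shape shown above.

```python
import re

# Canonical gRPC status codes, indexed by their numeric value (0..16).
GRPC_STATUS = [
    "OK", "CANCELLED", "UNKNOWN", "INVALID_ARGUMENT", "DEADLINE_EXCEEDED",
    "NOT_FOUND", "ALREADY_EXISTS", "PERMISSION_DENIED", "RESOURCE_EXHAUSTED",
    "FAILED_PRECONDITION", "ABORTED", "OUT_OF_RANGE", "UNIMPLEMENTED",
    "INTERNAL", "UNAVAILABLE", "DATA_LOSS", "UNAUTHENTICATED",
]


def describe_grpc_error(text):
    """Extract grpc_status and grpc_message from an error string (hypothetical helper)."""
    status = re.search(r"grpc_status:(\d+)", text)
    message = re.search(r'grpc_message:"([^"]*)"', text)
    code = int(status.group(1)) if status else None
    known = code is not None and code < len(GRPC_STATUS)
    return {
        "code": code,
        "name": GRPC_STATUS[code] if known else "unrecognised",
        "message": message.group(1) if message else None,
    }


error = ('UNKNOWN:Error received from peer ipv4:10.4.0.78:30030 '
         '{grpc_message:"Qos not supported", grpc_status:12, '
         'created_time:"2024-03-07T16:20:41.756921+01:00"}')
print(describe_grpc_error(error))
```

Status 12 is UNIMPLEMENTED, which suggests the server is rejecting something in the request that it does not implement rather than failing at the transport or TLS level.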

Am I missing something? It should not be this hard to get this
working!

Best Regards

Sebastian

--
'Are you Death?' ... IT'S THE SCYTHE, ISN'T IT? PEOPLE ALWAYS NOTICE THE SCYTHE.
-- Terry Pratchett, The Fifth Elephant
_______________________________________________
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: gNMI on MX960
I’ve been spending some time on this as well. Here’s the first thing I would ask you:


If you run “show version | match tele”, for example:

jared@Router> show version | match tele

{master:0}
jared@Router>

vs

jared@Router> show version | match tele
JUNOS na telemetry [21.4R3-S5.17]

{master:0}

What do you see? I’ve had varying results based on the platform.

The second thing is, does your sensor path actually complete?

You may want to try to use a UDP based sensor instead to start to validate the platform will output what you expect, for example:

services {
    analytics {
        streaming-server server-name {
            remote-address 10.0.0.100;
            remote-port 22022;
        }
        export-profile export-interfaces {
            local-address lo0.0-ip-address;
            local-port 9877;
            reporting-rate 60;
            format gpb;
            transport udp;
        }
        sensor data-interfaces {
            server-name server-name;
            export-name export-interfaces;
            resource /interfaces/interface;
        }
    }
}



A simple receiver such as:

#!/usr/bin/python3
import socket
import warnings

import google.protobuf.text_format
import telemetry_top_pb2  # build w/ protoc

# Bind a UDP socket to act as the telemetry receive server
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(('0.0.0.0', 22022))

Telemetry_content = telemetry_top_pb2.TelemetryStream()
while True:
    buf, (src_ip, src_port) = sock.recvfrom(65535)
    print("received %d from %s:%d" % (len(buf), src_ip, src_port))

    # Record any warnings raised while parsing instead of hiding them
    with warnings.catch_warnings(record=True) as w:
        warnings.simplefilter("always")
        ret = Telemetry_content.ParseFromString(buf)
        if len(w) > 0:
            print("Runtime Warning using ParseFromString: ", w)
            for x in w:
                print("\t", x)

    # Print unknown fields too, so unsupported or unparsed sensors are visible
    print(google.protobuf.text_format.MessageToString(Telemetry_content, print_unknown_fields=True, force_colon=True, use_index_order=True))


This will save a lot of the effort/overhead of the certificates etc., and let you make sure the code supports the sensors you expect. You can use protoc to add in the proto files that might be needed.
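
Before pointing the router at the receiver, it can help to confirm that the receiver itself is listening. The following is a minimal sketch, assuming the receiver runs on the same host and on the port 22022 used above; send_test_datagram is a hypothetical helper name. The payload is arbitrary bytes rather than valid GPB, so the receiver should print its "received" line but may then fail in the parse step.

```python
import socket


def send_test_datagram(payload, host="127.0.0.1", port=22022):
    """Send one UDP datagram and return the number of bytes handed to the OS."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        return s.sendto(payload, (host, port))


if __name__ == "__main__":
    # Arbitrary test bytes; this is not a valid TelemetryStream message.
    sent = send_test_datagram(b"ping")
    print("sent %d bytes" % sent)
```

UDP gives no delivery confirmation, so the only evidence of success is the output on the receiver side.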

I’ve seen Juniper output invalid GPB in cases where the software doesn’t support the sensors.

- Jared


Re: gNMI on MX960
Hi Jared,

thanks for the answer.

* Jared Mauch via juniper-nsp <juniper-nsp@puck.nether.net> [2024-03-07 16:41]:
> I’ve been spending some time on this as well, here’s the first thing I would ask you:
>
>
> If you do “show version | match tele”
>
> What do you see? I’ve had varying results based on the platform.

swiesinger@lab-mx960> show version | match tele
JUNOS na telemetry [21.2R3-S5.4]
JUNOS RPD Telemetry Application [21.2R3-S5.4]
JUNOS Services Telemetry [20230427.001720_builder_junos_212_r3_s5]

> The second thing is, does your sensor path actually complete?

What do you mean by complete?

>
> You may want to try to use a UDP based sensor instead to start to
> validate the platform will output what you expect, for example:
>
> [..]
>
> This will save a lot of effort/overhead of the certificates etc, and
> let you make sure the code supports the sensors you expect, and you
> can use protoc to add in the proto files that might be needed.
>
> I’ve seen Juniper output invalid GPB in cases where the software
> doesn’t support the sensors.

I'll try to test it with your example config.

The goal is to use Telegraf (which has a gNMI input plugin) to get
gNMI data from multiple vendors (mostly Arista, Juniper) and output it
to Prometheus and/or InfluxDB.

Best Regards

Sebastian

Re: gNMI on MX960
> On Mar 7, 2024, at 10:48 AM, Sebastian Wiesinger via juniper-nsp <juniper-nsp@puck.nether.net> wrote:
>
> Hi Jared,
>
> thanks for the answer.
>
>> The second thing is, does your sensor path actually complete?
>
> What do you mean by complete?


[edit services analytics]
jared@Router# set sensor asdf resource ?
Possible completions:
  <resource>                                   System resource identifier string
  /junos/services/health-monitor/config/       Health monitoring configuration
  /junos/services/health-monitor/data/         Health monitoring data
  /junos/services/ip-tunnel/usage/             PFE sensor for IP Tunnel statistics
  /junos/services/label-switched-path/usage/   PFE sensor for LSP statistics

>
>>
>> You may want to try to use a UDP based sensor instead to start to
>> validate the platform will output what you expect, for example:
>>
>> [..]
>>
>> This will save a lot of effort/overhead of the certificates etc, and
>> let you make sure the code supports the sensors you expect, and you
>> can use protoc to add in the proto files that might be needed.
>>
>> I’ve seen Juniper output invalid GPB in cases where the software
>> doesn’t support the sensors.
>
> I'll try to test it with your example config.
>
> The goal is to use Telegraf (which has a gNMI input plugin) to get
> gNMI data from multiple vendors (mostly Arista, Juniper) and output it
> to Prometheus and/or InfluxDB.

Yeah, I’ve had a lot of start-stop experience with this myself.

You also want to verify that the sensor paths are actually available in the tree for the code you want to run:

https://apps.juniper.net/telemetry-explorer/select-software?software=Junos%20OS%20Evolved&release=23.1R1&moduleId=All&platform=all&tagId=420294&tagName=lldp


Re: gNMI on MX960
* Jared Mauch via juniper-nsp <juniper-nsp@puck.nether.net> [2024-03-07 16:51]:
> > What do you mean by complete?
>
>
> [edit services analytics]
> jared@Router# set sensor asdf resource ?
> Possible completions:
> <resource> System resource identifier string
> /junos/services/health-monitor/config/ Health monitoring configuration
> /junos/services/health-monitor/data/ Health monitoring data
> /junos/services/ip-tunnel/usage/ PFE sensor for IP Tunnel statistics
> /junos/services/label-switched-path/usage/ PFE sensor for LSP statistics

Okay, I see a lot of paths starting with /junos/ but no openconfig
paths there (like the one I'm testing
/interfaces/interface/state/counters). Don't know if that's expected.


> > The goal is to use Telegraf (which has a gNMI input plugin) to get
> > gNMI data from multiple vendors (mostly Arista, Juniper) and output it
> > to Prometheus and/or InfluxDB.
>
> Yeah, I’ve had a lot of start-stop experience with this myself.
>
> You also want to verify that the sensor paths are available in the code that you want, so are they actually in the tree:
>
> https://apps.juniper.net/telemetry-explorer/select-software?software=Junos%20OS%20Evolved&release=23.1R1&moduleId=All&platform=all&tagId=420294&tagName=lldp


Huh okay that's weird, Junos 21.2R3 is not listed at all on that page,
even though 20.2 and 20.4 are? But that might be a hint, I'll try one
of the releases from that page that have support for the path I want!


Best Regards

Sebastian


Re: gNMI on MX960
> "UNKNOWN:Error received from peer ipv4:10.4.0.78:30030
> {grpc_message:"Qos not supported", grpc_status:12,
> created_time:"2024-03-07T16:20:41.756921+01:00"}"

This would imply that the `qos` field is likely being populated in your
request message:

https://github.com/openconfig/gnmi/blob/master/proto/gnmi/gnmi.proto#L266

This field is unsupported in our implementation at the moment. (Note:
various fields within the APIs range in support across implementations,
may be on a path to deprecation, and are not mandatory.)
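
To make the point concrete, here is a minimal sketch of building the same subscription without a qos entry. It reuses the dict layout posted earlier in the thread; build_subscribe and without_qos are hypothetical helper names, and whether a given client library adds a qos value of its own when the key is absent is an assumption this sketch cannot check.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000  # gNMI sample_interval is expressed in nanoseconds


def build_subscribe(path, interval_seconds):
    """Return a subscribe dict shaped like the one in the original post, with no qos key."""
    return {
        "subscription": [
            {
                "path": path,
                "mode": "sample",
                "sample_interval": interval_seconds * NANOSECONDS_PER_SECOND,
            },
        ],
        "mode": "stream",
        "use_aliases": False,
        "encoding": "proto",
    }


def without_qos(subscribe):
    """Copy a subscribe dict, dropping any top-level qos entry."""
    return {key: value for key, value in subscribe.items() if key != "qos"}


request = build_subscribe("/interfaces/interface/state/counters", 10)
print(request["subscription"][0]["sample_interval"])
print("qos" in without_qos({**request, "qos": {"marking": 0}}))
```

If the error persists with a dict like this, the field is probably being filled in by the client library itself, which is why comparing against a second client is useful.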

Am I to assume your JSON config is for the gnmi input plugin for
telegraf and is the complete config? I don't believe this plugin
supports the qos field either, so can you share the precise version you
are testing with?

How about results with another client such as gNMIc targeting this same
path?

The OpenConfig path `/interfaces/interface/state/counters` is supported
on the version you are indicating

> set system services extension-service request-response grpc
> skip-authentication

Remove this knob - it is unnecessary and will be removed at some point
in the near future. For now (and this plugin), the username/password
would be populated as gRPC metadata (HTTP2 headers)
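
As a rough illustration of what "populated as gRPC metadata" means on the client side, the sketch below builds the key/value pairs a client would attach to a call. The key names "username" and "password" are an assumption based on common gNMI client behaviour, not something confirmed in this thread, and credentials_metadata is a hypothetical helper.

```python
def credentials_metadata(username, password):
    """Build per-call gRPC metadata carrying the login credentials.

    Assumption: the target expects the keys "username" and "password".
    """
    return [("username", username), ("password", password)]


# Placeholder credentials for illustration only.
metadata = credentials_metadata("gnmi", "example-password")
print(metadata)
# With the Python grpc package, a list like this is passed to a stub call
# through its metadata= keyword argument.
```

Since metadata travels as HTTP/2 headers, it is only protected when the channel itself uses TLS.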

> Okay, I see a lot of paths starting with /junos/ but no openconfig
> paths there (like the one I'm testing
> /interfaces/interface/state/counters). Don't know if that's expected.

In your current version, we did not enumerate all supported paths in
this config completion. Note that this portion of the implementation is
also separate from the gNMI side, but you can glean some level of "path
support" from it.

You can also use the following command (currently hidden while we
rework its intent):

show agent sensor-capability

> Huh okay that's weird, Junos 21.2R3 is not listed at all on that page,
> even though 20.2 and 20.4 are? But that might be a hint, I'll try one
> of the releases from that page that have support for the path I want!

Unfortunately, our classic telemetry explorer may give you inaccurate
results for the data you are looking for. It has been reworked into the
following:

https://apps.juniper.net/ydm-explorer/

However, this applies to more recent versions.

In either case, the path you are looking for right now is supported on
the version you are describing. The issue is most likely how the request
message is constructed in the gNMI subscribe RPC.

> You may want to try to use a UDP based sensor instead

If the goal is gNMI, then I suggest sticking with that. If you want to
eliminate any TLS for testing/troubleshooting, then you can swap out
your `ssl` hierarchy under the extension-service config for the hidden
`clear-text` variant.



Re: gNMI on MX960
Thanks, this is helpful, as I've had issues with output of incomplete GPB if the image does not support telemetry.

I'm also getting some incomplete data when I have all the enterprise protos loaded, hence why I posted code that shows incomplete/unparsed protos.

Are there plans to distribute the OpenConfig proto files that Juniper uses, or do these need to be loaded to work correctly?

- Jared

Sent via RFC1925 compliant device

> On Mar 7, 2024, at 1:54 PM, Ebben Aries via juniper-nsp <juniper-nsp@puck.nether.net> wrote:
>
> You can also use the following command (currently hidden today as we
> rework its intent)
>
> show agent sensor-capability
Re: gNMI on MX960
> I'm also getting some incomplete data when I have all the enterprise
> protos loaded, hence why I posted code that shows incomplete/unparsed
> protos.

We can take this one offline to triage - will send you a separate mail

> Are there plans to distribute the OpenConfig proto files that Juniper
> uses, or do these need to be loaded to work correctly?

When using compact GPB encoding (UDP/TCP or a separate gRPC dialout
service), we have a mix of proto-first sensors (origin JTI/JVision) and
those derived from YANG models. If the YANG models change, then the
proto message structs change, and they do quite often as OpenConfig
models evolve, fix past behaviors, etc.

We currently publish all proto IDL here:

https://github.com/Juniper/telemetry/blob/master/23.4/23.4R1.10/protos/junos-telemetry-interface

So the corresponding proto IDL for this OC subtree:

+--rw system
   +--rw processes
      +--ro process* [pid]
         +--ro pid      -> ../state/pid
         +--ro state
            +--ro pid?                  uint64
            +--ro name?                 string
            +--ro args*                 string
            +--ro start-time?           oc-types:timeticks64
            +--ro cpu-usage-user?       oc-yang:counter64
            +--ro cpu-usage-system?     oc-yang:counter64
            +--ro cpu-utilization?      oc-types:percentage
            +--ro memory-usage?         uint64
            +--ro memory-utilization?   oc-types:percentage

Is located here:

https://github.com/Juniper/telemetry/blob/master/23.4/23.4R1.10/protos/junos-telemetry-interface/jkdsd_oc.proto
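
For readers who address the same subtree by path, the sketch below shows how a path string for one leaf of this tree breaks down into elements and list keys. It is only an illustration: parse_gnmi_path is a hypothetical helper, the pid value 1234 is made up, and the parser handles the simple [key=value] form without escaping.

```python
import re


def parse_gnmi_path(path):
    """Split an XPath-style path into (element name, keys) tuples."""
    elems = []
    # An element is a name followed by zero or more [key=value] groups;
    # slashes inside a bracket group (e.g. interface names) stay intact.
    for part in re.findall(r"[^/\[]+(?:\[[^\]]*\])*", path):
        name = part.split("[", 1)[0]
        keys = dict(re.findall(r"\[([^=\]]+)=([^\]]*)\]", part))
        elems.append((name, keys))
    return elems


# 1234 is a made-up process ID used only for illustration.
for elem in parse_gnmi_path("/system/processes/process[pid=1234]/state/cpu-utilization"):
    print(elem)
```

The list key from the tree (pid) ends up attached to the process element, while containers such as state carry no keys.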


Thx

/ebben

Re: gNMI on MX960
* Ebben Aries via juniper-nsp <juniper-nsp@puck.nether.net> [2024-03-07 19:55]:
> Am I to assume your JSON config is for the gnmi input plugin for
> telegraf and is the complete config? I don't believe this plugin
> supports the qos field either so can you share the precise version you
> are testing with?
>
> How about results with another client such as gNMIc targeting this same
> path?
>
> The OpenConfig path `/interfaces/interface/state/counters` is supported
> on the version you are indicating


No, the JSON config I posted is for the Python module pygnmi, which I
used to test with. I have now switched to gNMIc, but now I'm completely
blocked. I updated the box to 22.2R3-S2.8 (the highest recommended
release for MX960), which should support the path I want.

But now it does not work at all, I only get cryptic error messages in
the log:

Mar 8 09:50:07 lab-mx960 na-grpcd[40784]: NA_GRPCD_CONFIG_EDIT_FAILURE: Ephemeral DB Edit config: error_code=14, error_message='failed to connect to all addresses'
Mar 8 09:50:36 lab-mx960 last message repeated 29 times

And I can't make any connection:

$ gnmic -u gnmi -p $(cat gnmi-passwd.txt) --skip-verify -a lab-mx960:30030 capabilities
target "lab-mx960:30030", capabilities request failed: failed to create a gRPC client for target "lab-mx960:30030" : lab-mx960:30030: context deadline exceeded
Error: one or more requests failed
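
A "context deadline exceeded" from the client does not say whether anything is listening on the port at all. One way to narrow that down is a plain TCP connect test, sketched below with the standard library only; tcp_port_open is a hypothetical helper and the three-second timeout is an arbitrary choice.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    print(tcp_port_open("lab-mx960", 30030))
```

A True result only shows that the TCP handshake completes; it says nothing about TLS or the gRPC service behind the port.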

I see that the ephemeral database is currently empty (if this is the
database in question, it's the only one I see):

swiesinger@lab-mx960> show ephemeral-configuration instance junos-analytics
## Last changed: 2024-03-07 17:47:45 CET

swiesinger@lab-mx960>

No idea how to proceed, restarting the daemon and rebooting the RE did
not help.

I also had the same error in the old JunOS version but somewhere in
testing it started working. I can't reproduce it.

> > set system services extension-service request-response grpc
> > skip-authentication
>
> Remove this knob - it is unnecessary and will be removed at some point
> in the near future. For now (and this plugin), the username/password
> would be populated as gRPC metadata (HTTP2 headers)

Okay, I removed it.

> You can also use the following command (currently hidden today as we
> rework its intent)
>
> show agent sensor-capability


Yeah, I think it should work in theory:

swiesinger@lab-mx960> show agent sensor-capability resource /interfaces
Resource : /interfaces
Node Type : container

Resource : /interfaces/interface
Node Type : list
Key(s) : name

Resource : /interfaces/interface/state
Node Type : container

Resource : /interfaces/interface/state/counters
Node Type : container

Resource : /interfaces/interface/state/counters/in-octets
Node Type : leaf
Data Type : uint64
ON_CHANGE Support : False
[..]
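
Output in this shape is easy to check programmatically. The sketch below parses the excerpt above into one record per resource; parse_sensor_capability is a hypothetical helper, and it assumes every field line has the "Name : value" form shown here, which may not hold for the full output of this hidden command.

```python
SAMPLE = """\
Resource : /interfaces
Node Type : container

Resource : /interfaces/interface
Node Type : list
Key(s) : name

Resource : /interfaces/interface/state
Node Type : container

Resource : /interfaces/interface/state/counters
Node Type : container

Resource : /interfaces/interface/state/counters/in-octets
Node Type : leaf
Data Type : uint64
ON_CHANGE Support : False
"""


def parse_sensor_capability(text):
    """Group 'Field : value' lines into one dict per Resource entry."""
    records = []
    current = None
    for line in text.splitlines():
        if ":" not in line:
            continue  # blank lines and markers such as [..]
        field, value = (part.strip() for part in line.split(":", 1))
        if field == "Resource":
            current = {"Resource": value}
            records.append(current)
        elif current is not None:
            current[field] = value
    return records


records = parse_sensor_capability(SAMPLE)
supported = {record["Resource"] for record in records}
print("/interfaces/interface/state/counters" in supported)
```

The same records also expose the ON_CHANGE Support field, which matters when choosing between sample and on-change subscription modes.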

> Our classical telemetry explorer may give you inaccurate results for the
> data you are looking for unfortunately. This is reworked into the
> following rather:
>
> https://apps.juniper.net/ydm-explorer/

Thanks, I'll try to remember that!

If you have any hints about the error above I would be grateful. Our
current SE left and the new one hasn't introduced him-/herself but I
think that would be my only option left at this point.

Best Regards

Sebastian
