Mailing List Archive: NTP Issues Today

Re: NTP Issues Today [ In reply to ]

Nov 20, 2012, 5:01 PM

Post #26 of 38 (2603 views)

Looks like something bad has happened:
Behind the Random NTP Bizarreness of Incorrect Year Being Set
https://isc.sans.edu/diary.html?n&storyid=14548

---
"A few people have written in within the past 18 hours about their NTP
server/clients getting set to the year 2000. The cause of this behavior is
that an NTP server at the US Naval Observatory (pretty much the
authoritative time source in the US) was rebooted and somehow reverted to
the year 2000. This, then, propagated out for a limited time and
downstream time sources also got this value. It's a transient problem and
should already be rectified. Not much really to report except an error at
the top of the food chain causing problems to the layers below. If you
have a problem, just fix the year or resync your NTP server.

Just goes to show how reliant NTP is that it is all but a "fire and forget"
service once configured until "bad things happen". John Bambenek"

---

Alvaro Pereira

Re: NTP Issues Today [ In reply to ]

ikiris at gmail

Nov 20, 2012, 5:03 PM

Post #27 of 38 (2592 views)

That's what happens when you just follow vendor recommendations blindly. If
you do follow that on vm's (which can actually be a good practice), make
sure they pull from your own time infrastructure, and not just the world at
large, and that those servers behave in a sane fashion with regard to time
jumps.

On Tue, Nov 20, 2012 at 6:56 PM, Darius Jahandarie <djahandarie@gmail.com> wrote:

> On Tue, Nov 20, 2012 at 7:49 PM, Jimmy Hess <mysidia@gmail.com> wrote:
> > Are you sure that you are actually using NTP to set your clock?
> > For you to sync with 2000, you should have had multiple confused
> > peers from multiple time sources; possibly a false radio signal....
> >
> > NTP by default has a panic threshold of 1000 seconds.
> >
> > This _should_ have caused NTP to execute a panic shutdown,
> > instead of setting the clock back 30 million seconds.
>
> For VMWare at least, their official recommendation[1] for NTP is to
>
> tinker panic 0
>
> for suspend/resume reasons. I've seen it default in some places.
>
> [1]
> http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1006427
>
> --
> Darius Jahandarie
>
>
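
To make the panic threshold discussed above concrete, here is a minimal Python sketch of the check. It is not ntpd's actual code: the function name `would_panic`, the constant name, and the example dates are assumptions for illustration. The 1000-second default comes from the thread, and a threshold of 0 models `tinker panic 0`, which disables the sanity check.

```python
from datetime import datetime

DEFAULT_PANIC_THRESHOLD_S = 1000  # default panic threshold, per the thread


def would_panic(offset_s: float,
                panic_threshold_s: float = DEFAULT_PANIC_THRESHOLD_S) -> bool:
    """Return True if a daemon with this threshold should refuse the offset.

    A threshold of 0 models `tinker panic 0`: the check is disabled and
    any offset, however large, is accepted.
    """
    if panic_threshold_s == 0:
        return False
    return abs(offset_s) > panic_threshold_s


# Illustrative dates only: a clock in November 2012 is offered a time in
# November 2000 (the thread only says "the year 2000").
local = datetime(2012, 11, 19)
offered = datetime(2000, 11, 19)
offset_s = (offered - local).total_seconds()

print(would_panic(offset_s))     # default threshold: True, the step is refused
print(would_panic(offset_s, 0))  # tinker panic 0: False, the step is accepted
```

With the default threshold the bad time is rejected; with the check disabled it would be applied, which matches the behaviour people reported on hosts configured that way.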

Re: NTP Issues Today [ In reply to ]

george.herbert at gmail

Nov 20, 2012, 5:14 PM

Post #28 of 38 (2596 views)

As a reminder - time infrastructure is not recommended for
virtualization. Make them physicals.

On Tue, Nov 20, 2012 at 5:03 PM, Blake Dunlap <ikiris@gmail.com> wrote:
> That's what happens when you just follow vendor recommendations blindly. If
> you do follow that on vm's (which can actually be a good practice), make
> sure they pull from your own time infrastructure, and not just the world at
> large, and that those servers behave in a sane fashion with regard to time
> jumps.
>
>
> On Tue, Nov 20, 2012 at 6:56 PM, Darius Jahandarie <djahandarie@gmail.com> wrote:
>
>> On Tue, Nov 20, 2012 at 7:49 PM, Jimmy Hess <mysidia@gmail.com> wrote:
>> > Are you sure that you are actually using NTP to set your clock?
>> > For you to sync with 2000, you should have had multiple confused
>> > peers from multiple time sources; possibly a false radio signal....
>> >
>> > NTP by default has a panic threshold of 1000 seconds.
>> >
>> > This _should_ have caused NTP to execute a panic shutdown,
>> > instead of setting the clock back 30 million seconds.
>>
>> For VMWare at least, their official recommendation[1] for NTP is to
>>
>> tinker panic 0
>>
>> for suspend/resume reasons. I've seen it default in some places.
>>
>> [1]
>> http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1006427
>>
>> --
>> Darius Jahandarie
>>
>>

--
-george william herbert
george.herbert@gmail.com

Re: NTP Issues Today [ In reply to ]

rs at seastrom

Nov 21, 2012, 4:20 AM

Post #29 of 38 (2602 views)

Blake Dunlap <ikiris@gmail.com> writes:

> That's what happens when you just follow vendor recommendations blindly. If
> you do follow that on vm's (which can actually be a good practice), make
> sure they pull from your own time infrastructure, and not just the world at
> large, and that those servers behave in a sane fashion with regard to time
> jumps.

Emphatically disagree on the "pull from your own infrastructure"
point. You probably don't have the budget even in a big company for
sufficient diversity of sources [*] for your NTP server and even if
you do the NTP servers will probably be run by the same
person/organization. Mills has called the latter practice out as bad
in the past.

As Leo pointed out, the key is having a large diverse set so that if a
couple of them go nuts they can be voted off the island.

If you have a requirement for super low jitter or holdover if you lose
network, you're looking at on-site devices with OCXO or Rb frequency
standards in them. That doesn't mean you shouldn't be talking to the
rest of the world too though. What if your on-site sources go nuts?
This happens periodically, say every 10 years or so, because of crappy
implementations and worst-current-practices. A re-read of
https://groups.google.com/forum/?fromgroups=#!search/mills$20ntp$20byzantine/comp.protocols.time.ntp/TryjqtAd1XM/R0zzzE13Tl8J
may prove instructive.

(reading list also includes http://www.amazon.com/dp/1439814635/ )

In my experience NTP beats out even DNS for "blatantly wrong configs
in the wild that nevertheless seem to work well enough that dilettante
tech folks don't notice".

I might have replied to this thread yesterday but I was blissfully
unaware of any problems:

rs@bifrost [8] % ntpq -c peers | egrep -v '(===|remote)' | wc -l
11
rs@bifrost [9] %

-r

[*] particularly due to shortsighted US federal government choices on
LORAN, GOES, WWVB time format, etc

Re: NTP Issues Today [ In reply to ]

malayter at gmail

Nov 21, 2012, 4:30 AM

Post #30 of 38 (2580 views)

On Nov 19, 2012, at 6:12 PM, "Scott Weeks" <surfer@mauigateway.com> wrote:

> wbailey@satelliteintelligencegroup.com>
>
> Or you could just concede the fact that the navy is playing with time travel again.
> ----------------------------------------------------------
>
>
> To finish this thread off for the archives...
>
> Apparently something was up with the navy stuff as a post on
> the outages shows.

Re: NTP Issues Today [ In reply to ]

malayter at gmail

Nov 21, 2012, 4:34 AM

Post #31 of 38 (2568 views)

On Nov 19, 2012, at 6:12 PM, "Scott Weeks" <surfer@mauigateway.com> wrote:

> Lesson learned: Use more than one NTP source.
>

The lesson is: use MORE THAN TWO diverse NTP sources.

A man with two watches has no idea what time it actually is.

Re: NTP Issues Today [ In reply to ]

neil at tonal

Nov 21, 2012, 4:58 AM

Post #32 of 38 (2589 views)

On 21/11/12 12:34, Ryan Malayter wrote:
>
> On Nov 19, 2012, at 6:12 PM, "Scott Weeks" <surfer@mauigateway.com> wrote:
>
>> Lesson learned: Use more than one NTP source.
>>
> The lesson is: use MORE THAN TWO diverse NTP sources.
>
> A man with two watches has no idea what time it actually is.
>
>

Per David Mills, from the discussion linked upthread, this should be
FOUR OR MORE...

"Every critical server should have at least four sources, no two from the
same organization and, as much as possible, reachable only via diverse,
nonintersecting paths."

Four, so that the remaining three can reach consensus even if one fails.
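
A minimal Python sketch of why the source count matters. This is a simplified majority check, not ntpd's actual intersection and clustering algorithm; the function name `agreed_offset` and the 0.128-second tolerance are assumptions chosen for illustration.

```python
from typing import List, Optional


def agreed_offset(offsets: List[float], tolerance: float = 0.128) -> Optional[float]:
    """Return the mean of the largest group of mutually close offsets.

    Returns None when no strict majority of sources agree within
    `tolerance` seconds of some candidate, i.e. there is no consensus.
    """
    best: List[float] = []
    for candidate in offsets:
        group = [o for o in offsets if abs(o - candidate) <= tolerance]
        if len(group) > len(best):
            best = group
    if len(best) * 2 <= len(offsets):
        return None  # a tie or a minority cannot outvote the rest
    return sum(best) / len(best)


bad = -378691200.0  # a source that jumped back about twelve years

print(agreed_offset([0.004, bad]))                # None: two watches, no consensus
print(agreed_offset([0.004, 0.003, 0.005, bad]))  # the three sane sources win
```

With four sources, one can be lost or go wrong and the remaining three can still outvote a single falseticker; with only two, one bad source leaves nothing to break the tie.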

-- Neil

Re: NTP Issues Today [ In reply to ]

srao at ctigroup

Nov 21, 2012, 5:06 AM

Post #33 of 38 (2575 views)

Guys:

We were synchronized against multiple sources. Unfortunately the Navy NTP source contaminated multiple downstream sources.

Unless you can trace all your sources, if these sources all have a root source you will break.
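
One partial check for a shared upstream is to group your sources by the refid each one reports, as a rough Python sketch shows below. The refid only reveals a source's immediate upstream (one hop), so this cannot prove full independence. The function name `shared_upstreams` and all hostnames and addresses in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def shared_upstreams(sources: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each refid reported by more than one source to those sources.

    `sources` is a list of (remote, refid) pairs, e.g. taken from the
    first two columns of `ntpq -p` output.
    """
    by_refid: Dict[str, List[str]] = defaultdict(list)
    for remote, refid in sources:
        by_refid[refid].append(remote)
    return {refid: remotes for refid, remotes in by_refid.items() if len(remotes) > 1}


# Hypothetical stratum 2 sources and the upstream each one reports.
sources = [
    ("ntp-a.example.net", "192.0.2.10"),
    ("ntp-b.example.net", "192.0.2.10"),
    ("ntp-c.example.net", "198.51.100.7"),
]
print(shared_upstreams(sources))
```

Here the first two sources would both follow the same upstream, so a fault there could reach the client through two of its three "independent" servers.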

Sid Rao | CTI Group | +1 (317) 262-4677

On Nov 21, 2012, at 8:01 AM, "Neil Harris" <neil@tonal.clara.co.uk> wrote:

> On 21/11/12 12:34, Ryan Malayter wrote:
>>
>> On Nov 19, 2012, at 6:12 PM, "Scott Weeks" <surfer@mauigateway.com> wrote:
>>
>>> Lesson learned: Use more than one NTP source.
>> The lesson is: use MORE THAN TWO diverse NTP sources.
>>
>> A man with two watches has no idea what time it actually is.
>
> Per David Mills, from the discussion linked upthread, this should be FOUR OR MORE...
>
> "Every critical server should have at least four sources, no two from the
> same organization and, as much as possible, reachable only via diverse,
> nonintersecting paths."
>
> Four, so that the remaining three can reach consensus even if one fails.
>
> -- Neil
>
>
>

RE: NTP Issues Today [ In reply to ]

chuckchurch at gmail

Nov 21, 2012, 5:28 AM

Post #34 of 38 (2596 views)

-----Original Message-----
>From: Jimmy Hess [mailto:mysidia@gmail.com]
>Sent: Tuesday, November 20, 2012 7:50 PM
>To: Van Wolfe
>Cc: nanog@nanog.org
>Subject: Re: NTP Issues Today

>This _should_ have caused NTP to execute a panic shutdown,
>instead of setting the clock back 30 million seconds.

>--
>-JH

Sounds like SNTP might have been on the client. Doesn't do much if any
sanity checking. Windows used to use that, was more than happy to change
the time by years if bad time received. Not sure if that is still the case.

Chuck

Re: NTP Issues Today [ In reply to ]

os10rules at gmail

Nov 21, 2012, 5:50 AM

Post #35 of 38 (2574 views)

It sounds like the Navy and whoever else they partner with (NIST?) need
some egress filtering on their NTP servers to catch and prevent events like
this.

Re: NTP Issues Today [ In reply to ]

jra at baylink

Nov 21, 2012, 7:41 AM

Post #36 of 38 (2578 views)

----- Original Message -----
> From: "Sid Rao" <srao@ctigroup.com>

> We were synchronized against multiple sources. Unfortunately the Navy
> NTP source contaminated multiple downstream sources.
>
> Unless you can trace all your sources, if these sources all have a
> root source you will break.

"... against multiple [Stratum 1] sources..."

Baby, if you've ever wondered... whether it matters whether your sources
are strat 1 or not, now you know -- since there's no real way to get
provenance on down-strat time sources that I'm aware of.

Does the NTP code, people who know, give any extra credence to strat-1
sources in its byzantine code?

Cheers,
-- jra
--
Jay R. Ashworth Baylink jra@baylink.com
Designer The Things I Think RFC 2100
Ashworth & Associates http://baylink.pitas.com 2000 Land Rover DII
St Petersburg FL USA #natog +1 727 647 1274

Re: NTP Issues Today [ In reply to ]

msa at latt

Nov 21, 2012, 12:29 PM

Post #37 of 38 (2566 views)

On Wed, Nov 21, 2012 at 10:41:01AM -0500, Jay Ashworth wrote:
> "... against multiple [Stratum 1] sources..."
>
> Baby, if you've ever wondered... whether it matters whether your sources
> are strat 1 or not, now you know -- since there's no real way to get
> provenance on down-strat time sources that I'm aware of.
>
> Does the NTP code, people who know, give any extra credence to strat-1
> sources in its byzantine code?

Not in a way that matters if one of them suddenly becomes a
falseticker. If a reference clock goes insane, it's pretty easily
detected provided you have at least two more servers (or even
peers) configured.

Stratum 1 just means it thinks it has a reference clock
attached, but those clocks fail, go into holdover, what have you
all the time.

NTP will happily select a stratum 2 or lower clock instead
provided it appears stable (low jitter, responded to our last 8
queries, i.e. reach 377 in octal, and is an eligible candidate).

To get an idea what your NTP server will do, try ntpq -p:

msa@paladin:/home/msa (582)$ ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
-nist1.symmetric .ACTS.           1 u  304 1024  377    5.140    3.271   0.581
+nist1-sj.ustimi .ACTS.           1 u  307 1024  377    7.843    5.227   0.729
+64.147.116.229  .ACTS.           1 u  414 1024  377    9.406    5.742   0.068
*usno.pa-x.dec.c .USNO.           1 u  540 1024  377    1.373    4.242   0.032
-pegasus.latt.ne 64.250.177.145   2 u  304 1024  377   61.383    5.920   6.578
-pyramid.latt.ne 216.171.124.36   2 u  361 1024  377    1.076    4.181   0.066

This is a stratum 2 server in the public pool. It's peering
with two other stratum 2 servers that I run. Those two are deselected
(-). The server marked with a * is selected, and those with a + are
included in a weighted average used to maintain the system clock.
If the primary selected server does something wonky, it's going to
select one of the candidates marked with a +.

In this case it has enough stratum 1 servers that it's not
likely to fall back to its peers, but it can do so if those servers
suddenly give it a set of unexpected replies.
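
For anyone who wants to check this from a script, here is a rough Python sketch that reads one peer row of `ntpq -p` output. It assumes each row is on a single line with the ten standard columns, as ntpq prints it. The function name `parse_peer_line` and the dictionary keys are made up for illustration. The tally meanings follow the description above, and the reach column is an 8-bit register printed in octal, so 377 means the last eight polls were all answered.

```python
from typing import Dict, Union

# Tally-code meanings as described in the post above.
TALLY_ROLES = {"*": "selected", "+": "candidate", "-": "deselected"}


def parse_peer_line(line: str) -> Dict[str, Union[str, int, float]]:
    """Parse one peer row of `ntpq -p` output into a small dict."""
    tally, rest = line[0], line[1:]  # the first column holds the tally code
    fields = rest.split()
    reach = int(fields[6], 8)        # reach is printed in octal
    return {
        "role": TALLY_ROLES.get(tally, "other"),
        "remote": fields[0],
        "refid": fields[1],
        "stratum": int(fields[2]),
        "polls_answered": bin(reach).count("1"),  # out of the last 8
        "offset_ms": float(fields[8]),
        "jitter_ms": float(fields[9]),
    }


row = "*usno.pa-x.dec.c .USNO.           1 u  540 1024  377    1.373    4.242   0.032"
peer = parse_peer_line(row)
print(peer["role"], peer["remote"], peer["polls_answered"])
```

Running this over every row and alerting when no peer has the "selected" role, or when too few are candidates, is one simple way to notice trouble before users do.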

--msa

Re: NTP Issues Today [ In reply to ]

ask at develooper

Nov 21, 2012, 2:06 PM

Post #38 of 38 (2588 views)

On Nov 20, 2012, at 13:00, Darius Jahandarie <djahandarie@gmail.com> wrote:

Hi everyone,

I run the NTP Pool system - http://www.pool.ntp.org/ - so I have some opinions on some of this. :-)

> But beyond that, I'm honestly rather curious what server selections
> are a good idea. A first thought would be an adjacent country, but
> maybe there is a benefit to picking things outside of the pool.ntp.org
> selection entirely?

First of all: None of the ~3800 servers in the NTP Pool system were affected by this as far as I can tell from the (copious) monitoring data.

The big benefit to adding some non-pool servers is that you wouldn't be depending basically on a bunch of volunteers (and to a large extent me) for your timekeeping. Though likely you'd just be depending on another group of volunteers.

In addition to depending on the server operators who run the ntpd servers you also depend on:

1) The monitoring system keeping accurate time.
2) The monitoring system doing its job of catching bad servers.
3) The process updating and distributing the DNS data working.
4) The DNS servers working (and not being under a DoS attack or similar).
5) Anything I haven't thought of!

Empirically I believe we've done a better job than just about anyone with a similar scale, but past performance is no promise of the future.

> I see that Jared used *.fedora.pool.ntp.org -- I wonder if there was a
> specific reason for that or if my questions are even worth thinking
> about at all :-).

The servers for x.fedora.pool.ntp.org are in the same "group" as x.pool.ntp.org. If you are in a country with many servers in the pool then you'll very likely get different IPs for the two. If you are in a country with few servers your odds for that aren't so good and it'd be a bit pointless.

Anyone using the NTP Pool in a default configuration (like Fedora does) must get a "vendor zone" setup - http://www.pool.ntp.org/en/vendors.html - so we have at least a little bit of a chance to monitor and mitigate problems.

It also allows us to change what servers are selected, how many IPs are returned etc for a particular vendor. For example if Fedora in the future changes to use 'pool' instead of 'server' in the configuration we could optimize for that.

Ask

--
http://askask.com/