Mailing List Archive

NTP Issues Today
Hello,

Did anyone else experience issues with NTP today? We had our server
times update to the year 2000 at around 3:30 MT, then revert back to 2012.

Thanks,
Van
Re: NTP Issues Today
--- vanwolfe@gmail.com wrote:
From: Van Wolfe <vanwolfe@gmail.com>

Did anyone else experience issues with NTP today? We had our server
times update to the year 2000 at around 3:30 MT, then revert back to 2012.
-----------------------------------------


You need to provide more information. For example, what NTP
source are you using?

scott
Re: NTP Issues Today
Scott,
I can confirm this happened on one of my test servers; it was
pointing to tick.usno.navy.mil and tock.usno.navy.mil at the time.


- Clay
Re: NTP Issues Today
On 11/19/12 6:32 PM, "Scott Weeks" <surfer@mauigateway.com> wrote:
>--- vanwolfe@gmail.com wrote:
>From: Van Wolfe <vanwolfe@gmail.com>
>
>Did anyone else experience issues with NTP today? We had our server
>times update to the year 2000 at around 3:30 MT, then revert back to 2012.
>-----------------------------------------

>You need to provide more information. For example, what NTP
>source are you using?
------------------------------------------
--- chaynes@centracomm.net wrote:
From: Clay Haynes <chaynes@centracomm.net>

I can confirm this happened on one of my test servers; it was
pointing to tick.usno.navy.mil and tock.usno.navy.mil at the time.
-------------------------------------------

That's not a very diverse set of NTP servers. In the future, if
you think it might be an outage, you might try the 'outages'
list: http://puck.nether.net/mailman/listinfo/outages

For this one, you might ask the server contact if there was a
problem: http://support.ntp.org/bin/view/Servers/TockUsnoNavyMil

That assumes you've done your homework first and made sure it
wasn't something in your network.

scott
NTP Issues Today
We had the same issue on our NTP server pointing to tick.usno.navy.mil. It set the date back to the year 2000.

Date: Mon, 19 Nov 2012 16:21:55 -0700
From: Van Wolfe <vanwolfe@gmail.com>
To: nanog@nanog.org
Subject: NTP Issues Today
Message-ID: <CAMeggd4cDQwhxQE_JbvpNR-PKKe9LXqA+KzJ97anHFonjwZhdQ@mail.gmail.com>

Re: NTP Issues Today
Or you could just concede the fact that the Navy is playing with time travel again.

From my Galaxy Note II, please excuse any mistakes.


-------- Original message --------
From: Scott Weeks <surfer@mauigateway.com>
Date: 11/19/2012 3:52 PM (GMT-08:00)
To: nanog@nanog.org
Subject: Re: NTP Issues Today
Re: NTP Issues Today
--- wbailey@satelliteintelligencegroup.com wrote:
From: Warren Bailey <wbailey@satelliteintelligencegroup.com>

Or you could just concede the fact that the Navy is playing with time travel again.
----------------------------------------------------------


To finish this thread off for the archives...

Apparently something was up with the Navy's servers, as a post on
the outages list shows.

Lesson learned: Use more than one NTP source.

scott
Re: NTP Issues Today
In message <CAMeggd4cDQwhxQE_JbvpNR-PKKe9LXqA+KzJ97anHFonjwZhdQ@mail.gmail.com>,
Van Wolfe writes:
> Hello,
>
> Did anyone else experience issues with NTP today? We had our server
> times update to the year 2000 at around 3:30 MT, then revert back to 2012.
>
> Thanks,
> Van

NTP should be immune from this sort of behaviour unless you did an
ntpdate at the wrong moment. The clocks should have been marked
as insane.

Mark
--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: marka@isc.org
RE: NTP Issues Today
Just got paged with a PBX alarm that had 1970 as the year. By the time I logged in, it was showing 2012. Using GPS for time and date.

-----Original Message-----
From: Mark Andrews [mailto:marka@isc.org]
Sent: Monday, November 19, 2012 8:42 PM
To: Van Wolfe
Cc: nanog@nanog.org
Subject: Re: NTP Issues Today


Re: NTP Issues Today
Cross-replying to the outages list.

Is anyone ELSE seeing GPS issues? This could well have been an
unrelated issue on that particular PBX.

If this was real, then the mother of all infrastructure attacks might
be underway...

One glitch on tick and tock and one malfunctioning PBX is not
sufficient evidence of a pattern, much less of hostile activity, to
induce panic, but it would perhaps be a wise time to check
time-related logs?


-george

On Mon, Nov 19, 2012 at 6:08 PM, Wallace Keith
<kwallace@pcconnection.com> wrote:



--
-george william herbert
george.herbert@gmail.com
Re: NTP Issues Today
We had multiple servers synchronized with Windows/MS time change their clock to the year 2000 today. It broke many things, including AD authentication.

These servers had been properly synchronized for years.

They were synchronized with Microsoft and NIST NTP servers.

This may not be isolated.

Sid Rao | CTI Group | +1 (317) 262-4677

On Nov 19, 2012, at 10:29 PM, "George Herbert" <george.herbert@gmail.com> wrote:

Re: NTP Issues Today
In a message written on Mon, Nov 19, 2012 at 04:21:55PM -0700, Van Wolfe wrote:
> Did anyone else experience issues with NTP today? We had our server
> times update to the year 2000 at around 3:30 MT, then revert back to 2012.

I'm surprised the various time geeks aren't all posting their logs, so
I'll kick off:

/tmp/parse-peerstats.pl peerstats.20121119
56250 76367.354 192.5.41.41 91b4 -378691200.312258363 0.088274002 0.014835425 0.263515353
56250 77391.354 192.5.41.41 91b4 -378691200.312258363 0.088274002 0.018668790 0.263749719
56250 78204.354 192.5.41.40 90b4 -378691200.785377324 0.088179350 0.014812585 0.263668835
56250 78416.355 192.5.41.41 91b4 -378691200.785974681 0.088312507 0.014832943 0.209966600
56250 79229.355 192.5.41.40 90b4 -378691200.785377324 0.088179350 0.018668723 378691200.785523713
56250 79442.355 192.5.41.41 91b4 -378691200.785974681 0.088312507 0.018689918 378691200.786114931

Or in more human readable form:
/tmp/parse-peerstats.pl peerstats.20121119
192.5.41.41 off by -378691200.312258363
192.5.41.41 off by -378691200.312258363
192.5.41.40 off by -378691200.785377324
192.5.41.41 off by -378691200.785974681
192.5.41.40 off by -378691200.785377324
192.5.41.41 off by -378691200.785974681

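The script, if you want to run it against your own stats:

#!/usr/bin/perl
use strict;
use warnings;

# Print peerstats lines whose clock offset is more than 10 seconds either way.
# Fields: MJD, seconds past UTC midnight, peer address, status word,
# offset, delay, dispersion, jitter (times in seconds).
while (<>) {
    chomp;
    my ($day, $second, $addr, $status, $offset, $delay, $disp, $jitter) = split;
    next unless defined $offset;           # skip blank or short lines
    if (abs($offset) > 10) {
        # print "$addr off by $offset\n";  # More human friendly
        print "$_\n";                       # Full details
    }
}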

It just looks for servers off by more than 10 seconds and then prints
the line. 378691200 seconds is ~12 years, which lines up with the
year 2000 dates some are reporting.
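The ~12 year figure can be checked directly. This Python sketch assumes the standard ntpd peerstats layout (first field a Modified Julian Day, second field seconds past UTC midnight) and uses a hypothetical helper, `peerstats_time`, that is not part of any NTP tool.

```python
from datetime import datetime, timedelta, timezone

# Modified Julian Day 0 is 1858-11-17 00:00 UTC.
MJD_EPOCH = datetime(1858, 11, 17, tzinfo=timezone.utc)


def peerstats_time(mjd: int, seconds: float) -> datetime:
    """Hypothetical helper: turn the first two peerstats fields into a UTC datetime."""
    return MJD_EPOCH + timedelta(days=mjd, seconds=seconds)


# First falseticker sample in the log: day 56250, 76367.354 s past midnight.
sampled_at = peerstats_time(56250, 76367.354)
# NTP offset is server time minus local time, so adding it gives the server's claim.
claimed = sampled_at + timedelta(seconds=-378691200.312258363)

# Twelve calendar years before 2012-11-19 include three leap days (2004, 2008, 2012).
twelve_years = datetime(2012, 11, 19) - datetime(2000, 11, 19)

print(sampled_at)                    # 2012-11-19 21:12:47.354000+00:00
print(claimed.date())                # 2000-11-19
print(twelve_years.days)             # 4383
print(twelve_years.total_seconds())  # 378691200.0
```

So the offset is exactly twelve calendar years: tick and tock were serving the right time of day on the same date in 2000.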

The IPs are tick.usno.navy.mil and tock.usno.navy.mil.

I can confirm from my vantage point that tick and tock both went about
12 years wrong on Nov 19th for a bit. I can also report that my NTP
server, with sufficient sources, correctly determined they were haywire
and ignored them.

If your machines switched dates yesterday, it probably means your
NTP infrastructure is insufficiently peered and diversified.

--
Leo Bicknell - bicknell@ufp.org - CCIE 3440
PGP keys at http://www.ufp.org/~bicknell/
Re: NTP Issues Today
On Tue, Nov 20, 2012 at 11:38 AM, Leo Bicknell <bicknell@ufp.org> wrote:

>
> If your machines switched dates yesterday, it probably means your
> NTP infrastructure is insufficiently peered and diversified.
>

If you take anything away from this thread, this is it....

-Steve
Re: NTP Issues Today
On 11/19/12 6:08 PM, Wallace Keith wrote:
> Just got paged with a PBX alarm that had 1970 as the year. By the time I logged in, it was showing 2012. Using GPS for time and date.
>


I use GPS for my NTP server and didn't notice anything, but it's
PPS-disciplined after the initial sync, so it doesn't matter as long as
the pulse keeps going.

ntp0# ntpq -pn
remote refid st t when poll reach delay offset jitter
==============================================================================
127.127.1.0 .LOCL. 12 l 10 64 377 0.000 0.000 0.015
+216.171.124.36 .ACTS. 1 u 167 1024 377 26.801 2.387 0.015
+127.127.20.0 .GPS. 0 l 45 64 377 0.000 -0.048 0.015
o127.127.22.0 .PPS. 0 l 27 64 377 0.000 -0.048 0.015


~Seth
Re: NTP Issues Today
After some private replies, I'm going to reply to my own post with
some information here.

It appears many people don't understand how the NTP protocol works.
I suspect many people have configured a "primary" and a "backup"
NTP server on many of their devices. It turns out this is the
_WORST_ possible configuration if you want accurate time:

http://support.ntp.org/bin/view/Support/SelectingOffsiteNTPServers#Section_5.3.3.

To protect against two falseticking servers (tick and tock, as we saw on
the 19th) you need _FIVE_ servers minimum configured if they are both in
the list. More importantly, if you want to protect against a source
(GPS, CDMA, IRIG, WWV, ACTS, etc) false ticking, you need a minimum of
_FOUR_ different source technologies in the list as well.

It's not hard: my box that I posted the logs from peers with 18 servers
using 8 source technologies, all freely available on the Internet...
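Why five: with f falsetickers, the honest sources only form a strict majority when there are at least 2f + 1 of them in total. The Python sketch below illustrates this with a simplified interval-intersection sweep in the spirit of Marzullo's algorithm, which NTP's selection step builds on. It is not ntpd's actual selection code, and the offsets and error bounds are made-up numbers that mimic tick and tock.

```python
def best_overlap(intervals):
    """Marzullo-style sweep over [lo, hi] intervals.

    Returns (count, lo, hi) for the region covered by the most intervals.
    """
    edges = []
    for lo, hi in intervals:
        edges.append((lo, -1))  # an interval opens (sorts before a close at the same point)
        edges.append((hi, +1))  # an interval closes
    edges.sort()

    best, count = 0, 0
    best_lo = best_hi = None
    for i, (point, kind) in enumerate(edges):
        count -= kind  # +1 on an open edge, -1 on a close edge
        if count > best:
            # The best region runs from this open edge to the next edge in the sweep.
            best, best_lo, best_hi = count, point, edges[i + 1][0]
    return best, best_lo, best_hi


def majority_offset(sources):
    """sources: list of (offset, error_bound) pairs in seconds.

    Returns the midpoint of the best-agreeing interval, or None when the
    largest agreeing group is not a strict majority of all sources.
    """
    count, lo, hi = best_overlap([(off - err, off + err) for off, err in sources])
    if 2 * count <= len(sources):
        return None
    return (lo + hi) / 2


# Made-up numbers: three honest servers near zero offset, plus tick and tock
# about 12 years behind and close enough to each other to overlap.
GOOD = [(0.001, 0.05), (0.002, 0.05), (-0.001, 0.05)]
BAD = [(-378691200.31, 0.3), (-378691200.79, 0.3)]

print(majority_offset(BAD))             # tick and tock alone: both "agree" on 2000
print(majority_offset(GOOD[:2] + BAD))  # None: 2 vs 2, no majority
print(majority_offset(GOOD + BAD))      # 3 honest vs 2 bad: a small offset near 0
```

With only tick and tock configured, the two liars outvote nobody and win; adding one honest server still loses 2 to 1; only with three honest servers against the two bad ones does the correct time win.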

--
Leo Bicknell - bicknell@ufp.org - CCIE 3440
PGP keys at http://www.ufp.org/~bicknell/
Re: NTP Issues Today
----- Original Message -----
> From: "Leo Bicknell" <bicknell@ufp.org>

> To protect against two falseticking servers (tick and tock, as we saw on
> the 19th) you need _FIVE_ servers minimum configured if they are both in
> the list. More importantly, if you want to protect against a source
> (GPS, CDMA, IRIG, WWV, ACTS, etc) false ticking, you need a minimum of
> _FOUR_ different source technologies in the list as well.
>
> It's not hard: my box that I posted the logs from peers with 18
> servers using 8 source technologies, all freely available on the Internet...

I'm curious, Leo, what your internal setup looks like. Do you have an
internal pair of masters, all slaved to those externals and one another,
with your machines homed to them? Full mesh? Or something else?

In my last big gig, it was recommended to me that I have all the machines
which had to speak to my DBMS NTP *to it*, and have only it connect to the
rest of my NTP infrastructure. It coming unstuck was of less operational
impact than *pieces of it* going out of sync with one another...

Cheers,
-- jra
--
Jay R. Ashworth Baylink jra@baylink.com
Designer The Things I Think RFC 2100
Ashworth & Associates http://baylink.pitas.com 2000 Land Rover DII
St Petersburg FL USA #natog +1 727 647 1274
Re: NTP Issues Today
On Nov 20, 2012, at 2:28 PM, Jay Ashworth <jra@baylink.com> wrote:

> ----- Original Message -----
>> From: "Leo Bicknell" <bicknell@ufp.org>
>
>> To protect against two falseticking servers (tick and tock, as we saw on
>> the 19th) you need _FIVE_ servers minimum configured if they are both in
>> the list. More importantly, if you want to protect against a source
>> (GPS, CDMA, IRIG, WWV, ACTS, etc) false ticking, you need a minimum of
>> _FOUR_ different source technologies in the list as well.
>>
>> It's not hard: my box that I posted the logs from peers with 18
>> servers using 8 source technologies, all freely available on the Internet...
>
> I'm curious, Leo, what your internal setup looks like. Do you have an
> internal pair of masters, all slaved to those externals and one another,
> with your machines homed to them? Full mesh? Or something else?
>
> In my last big gig, it was recommended to me that I have all the machines
> which had to speak to my DBMS NTP *to it*, and have only it connect to the
> rest of my NTP infrastructure. It coming unstuck was of less operational
> impact than *pieces of it* going out of sync with one another...


Here's a sample NTP config from one of my systems.

-- snip --
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
server 0.fedora.pool.ntp.org
server 1.fedora.pool.ntp.org
server 2.fedora.pool.ntp.org
server 3.fedora.pool.ntp.org

#
server 0.us.pool.ntp.org iburst maxpoll 9
server 1.us.pool.ntp.org iburst maxpoll 9
server 2.us.pool.ntp.org iburst maxpoll 9
server 129.250.35.250 iburst maxpoll 9
server 129.250.35.251 iburst maxpoll 9

-- snip --

You can audit its operation like this:

nat:~$ ntpq -p -n -c ass
remote refid st t when poll reach delay offset jitter
==============================================================================
-129.250.35.250 164.244.221.197 2 u 68 512 377 19.248 -0.135 3.195
+129.250.35.251 192.5.41.40 2 u 439 512 377 41.817 1.109 15.660
-206.57.44.17 204.123.2.5 2 u 126 512 377 37.133 -6.443 9.631
+4.53.160.75 209.81.9.7 2 u 48 512 377 25.209 1.551 8.804
-64.73.32.135 192.5.41.41 2 u 349 512 377 23.418 -0.703 1.721
*50.116.38.157 64.250.177.145 2 u 380 512 377 43.021 1.267 2.136
+208.87.221.228 10.0.22.49 2 u 517 512 377 92.000 0.974 0.678
-206.212.242.132 128.252.19.1 2 u 323 512 377 21.781 -2.873 1.304
+38.229.71.1 204.123.2.72 2 u 211 512 377 21.977 -0.055 2.274

ind assid status conf reach auth condition last_event cnt
===========================================================
1 39973 931a yes yes none outlyer sys_peer 1
2 39974 941a yes yes none candidate sys_peer 1
3 39975 9324 yes yes none outlyer reachable 2
4 39976 942a yes yes none candidate sys_peer 2
5 39977 931a yes yes none outlyer sys_peer 1
6 39978 961a yes yes none sys.peer sys_peer 1
7 39979 9414 yes yes none candidate reachable 1
8 39980 931a yes yes none outlyer sys_peer 1
9 39981 941a yes yes none candidate sys_peer 1


What you would have seen is a falseticker from the impacted clocks.

This is a fairly reasonable setup.

I've also been looking at an item like this:

http://www.netburnerstore.com/ProductDetails.asp?ProductCode=PK70EX-NTP

which is about $300 + misc parts.

Should be well worth it to avoid a 'major outage' that some folks had with needing to reboot their servers, etc.

- Jared
Re: NTP Issues Today
In a message written on Tue, Nov 20, 2012 at 02:28:19PM -0500, Jay Ashworth wrote:
> I'm curious, Leo, what your internal setup looks like. Do you have an
> internal pair of masters, all slaved to those externals and one another,
> with your machines homed to them? Full mesh? Or something else?

My particular internal setup is a tad weird, and so rather than
answer your question, I'm going to answer with some generalities.
The right answer of course depends a lot on how important it is
that boxes have the right time.

If you have 4 or more physical sites, I believe the right answer
is to have on the order of 8 NTP servers. 2 each in 4 sites reaches
the minimum nicely with redundancy. These boxes can have GPS, CDMA
or other technologies if you want, but MUST peer with at least 10
stratum-1 sources outside of your network. Of course if you have
more sites, one server in each of 8 sites is peachy. Those on a
budget could probably get by with 4 servers total, but never less!

All "critical" devices should then be synced to the full set of
internal servers: 4 boxes minimum, 8-10 preferred. NTP will only
use the 10 best servers in its calculations, so there is a steep
dropoff of diminishing returns beyond 10. For most ISPs I would
include all routers in this list.

For the "non-critical" devices? Well, there it gets more complex.
For most I would only configure one server, their default gateway
router. Of course, pushing out a set of 4+ to them, if that is
easy, is a great thing to do.

The interesting thing here is that no devices except for your NTP
servers should ever peer with anything outside of your network.
Why? Let's say your NTP servers all go crazy together. The outside
world is cut off, GPS is spoofed, the world is ending. All that
you have left is that all of your devices are in time with each
other... so at least your logs still correlate and such. That is why
having every device under your master set of NTP servers is important:
one guy with an external peer may choose to use that, and leave the
hive mind, so to speak.
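The layering described above can be expressed as a small config generator. This Python sketch rests on assumptions: the hostnames are hypothetical (two internal servers at each of four sites, plus one gateway router), and it emits only plain `server <host> iburst` lines like those in the ntp.conf sample elsewhere in this thread. Only the internal servers would carry external peers; everything generated here points inward.

```python
# Hypothetical internal NTP servers: two at each of four sites.
INTERNAL_SERVERS = [f"ntp{n}.site{s}.example.net" for s in range(1, 5) for n in (1, 2)]


def client_conf(servers, minimum=4):
    """Return ntp.conf 'server' lines that point a device only at internal servers."""
    if len(servers) < minimum:
        raise ValueError(f"need at least {minimum} servers, got {len(servers)}")
    return "".join(f"server {host} iburst\n" for host in servers)


# Critical devices (e.g. routers) get the full internal set.
critical = client_conf(INTERNAL_SERVERS)

# Non-critical devices get just their default gateway router (hypothetical name).
non_critical = client_conf(["gw1.site1.example.net"], minimum=1)

print(critical, end="")
print(non_critical, end="")
```

Refusing to emit fewer than four servers for critical devices is a deliberate guard against the "primary plus backup" pattern.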

For small players, less than 4 sites, typically just use the NTP
pool servers, configuring 4 per box minimum. If you want the same
protection I just outlined in the paragraph before, make 4 of your
servers talk to the outside world, and make everything else talk
to those. Want to give back to the community? Get a GPS/CDMA/Whatever
box and make it part of the NTP pool. Want to step up your game (which
is what I do), reach out to various Stratum-1's on the net (or find
free, open ones) and peer up 8-20 of them.

> In my last big gig, it was recommended to me that I have all the machines
> which had to speak to my DBMS NTP *to it*, and have only it connect to the
> rest of my NTP infrastructure. It coming unstuck was of less operational
> impact than *pieces of it* going out of sync with one another...

Yep, a prime example of the scenario I described above. Depending on
your level of network redundancy, number of NTP servers, and so on, this
is a fine solution. With one NTP server (the DBMS) the downstream will
always use it, and stay in sync. It's a valid and good config in many
situations.

--
Leo Bicknell - bicknell@ufp.org - CCIE 3440
PGP keys at http://www.ufp.org/~bicknell/
Re: NTP Issues Today
On Nov 20, 2012, at 11:39 AM, Jared Mauch <jared@puck.nether.net> wrote:
>
> I've also been looking at an item like this:
>
> http://www.netburnerstore.com/ProductDetails.asp?ProductCode=PK70EX-NTP
>
> which is about $300 + misc parts.
>
> Should be well worth it to avoid a 'major outage' that some folks had with needing to reboot their servers, etc.
>
> - Jared


Caution: that Netburner device is just GPS-synced, so if GPS ever does go insane you're out of luck. It doesn't list a precision internal clock part.

I am not sure what all is in the dev kit version, but I know the company owner and can ask if anyone cares.




George William Herbert
Sent from my iPhone
Re: NTP Issues Today
On Tue, Nov 20, 2012 at 3:15 PM, Leo Bicknell <bicknell@ufp.org> wrote:
> For small players, less than 4 sites, typically just use the NTP
> pool servers, configuring 4 per box minimum. If you want the same
> protection I just outlined in the paragraph before, make 4 of your
> servers talk to the outside world, and make everything else talk
> to those. Want to give back to the community? Get a GPS/CDMA/Whatever

Choosing the first four servers is usually pretty straightforward:
*.CC.pool.ntp.org

But beyond that, I'm honestly rather curious what server selections
are a good idea. A first thought would be an adjacent country, but
maybe there is a benefit to picking things outside of the pool.ntp.org
selection entirely?

I see that Jared used *.fedora.pool.ntp.org -- I wonder if there was a
specific reason for that or if my questions are even worth thinking
about at all :-).


Happy to hear thoughts.

--
Darius Jahandarie
Re: NTP Issues Today
I usually use time.nist.gov.

On Tue, Nov 20, 2012 at 1:00 PM, Darius Jahandarie <djahandarie@gmail.com> wrote:



--
Mike Lyon
408-621-4826
mike.lyon@gmail.com

http://www.linkedin.com/in/mlyon
Re: NTP Issues Today
On Nov 20, 2012, at 4:00 PM, Darius Jahandarie <djahandarie@gmail.com> wrote:

> Choosing the first four servers is usually pretty straightforward:
> *.CC.pool.ntp.org
>
> But beyond that, I'm honestly rather curious what server selections
> are a good idea. A first thought would be an adjacent country, but
> maybe there is a benefit to picking things outside of the pool.ntp.org
> selection entirely?
>
> I see that Jared used *.fedora.pool.ntp.org -- I wonder if there was a
> specific reason for that or if my questions are even worth thinking
> about at all :-).

I'm by no means a time geek, but… I have some ideas about what you want and can tell you why I picked the settings I did…

1) The 129.250 ones are clocks run by my employer. It is a good idea to know how accurate they are.

2) Some of the pool ones were defaults (e.g. fedora) from the OS distro on the machine I took the example from. You will see freebsd, CentOS and others based on your settings. You may even see time.apple.com if you are on Mac OS X.

3) The CC (country-code) pool servers were selected to provide additional clock diversity.

4) You want low jitter to your clocks; this allows you to have an accurate timing source. This means don't congest that path. If you want something very reliable, don't run it on a server with the other "misc" functions you need (e.g. DNS). If it's important, dedicate some hardware to it. If it is of passing importance, use a fair number of peers.

I was playing with the OWAMP software. Having consistent clocks is important for that (even if they are all off by a few ms). It can be fun to play with and measure things… http://www.internet2.edu/performance/owamp/index.html

5) Monitor your NTP setup periodically. You may see clocks be rejected or outliers. Depending on how close your clocks are, you may see a fair number be unusable. Take this output:

nat:~$ ntpq -n -p -c ass
remote refid st t when poll reach delay offset jitter
==============================================================================
*129.250.35.250 164.244.221.197 2 u 507 512 377 18.883 0.196 18.311
+129.250.35.251 209.51.161.238 2 u 366 512 377 41.349 0.429 2.184
-206.57.44.17 204.123.2.5 2 u 91 512 377 35.884 -5.982 7.099
-4.53.160.75 209.81.9.7 2 u 5 512 377 24.250 1.522 1.353
+64.73.32.135 164.67.62.194 2 u 296 512 377 26.405 -0.956 11.244
+50.116.38.157 64.250.177.145 2 u 897 1024 377 42.978 0.685 1.211
-208.87.221.228 10.0.22.51 2 u 390 512 377 83.858 -2.717 0.814
-206.212.242.132 128.252.19.1 2 u 262 512 377 22.278 -1.640 1.150
+38.229.71.1 204.123.2.72 2 u 95 512 377 20.688 0.113 1.878

ind assid status conf reach auth condition last_event cnt
===========================================================
1 39973 961a yes yes none sys.peer sys_peer 1
2 39974 941a yes yes none candidate sys_peer 1
3 39975 9324 yes yes none outlyer reachable 2
4 39976 932a yes yes none outlyer sys_peer 2
5 39977 941a yes yes none candidate sys_peer 1
6 39978 941a yes yes none candidate sys_peer 1
7 39979 9314 yes yes none outlyer reachable 1
8 39980 931a yes yes none outlyer sys_peer 1
9 39981 941a yes yes none candidate sys_peer 1

Only 5 of the 9 clocks are either a 'candidate' for use or the actual reference clock. The jitter on the reference clock is nearly equal to its delay (!). This is on a business-class Internet link/tier, but from one of the 'usual suspects' that offers residential services as well. I haven't been able to find them operating any customer-accessible clocks, but they may exist.
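That kind of periodic check can be partly scripted. The Python sketch below counts peers by the tally character in the first column of `ntpq -p` output, using the billboard above as sample input. The state names paraphrase the ntpq documentation, and `tally_peers` is an illustration, not an existing tool.

```python
from collections import Counter

# Tally codes from the first column of `ntpq -p` output (paraphrasing the ntpq docs).
TALLY = {
    "*": "sys.peer",     # current synchronization source
    "o": "pps.peer",     # PPS-disciplined synchronization source
    "+": "candidate",    # survivor, used by the combining algorithm
    "#": "backup",       # survivor held in reserve
    "-": "outlier",      # discarded by the clustering algorithm
    "x": "falseticker",  # discarded by the intersection algorithm
    ".": "excess",       # discarded, beyond the first ten by distance
    " ": "reject",       # unreachable, looping, or too distant
}


def tally_peers(billboard: str) -> Counter:
    """Count peers in `ntpq -p` text by the state their tally code denotes."""
    counts = Counter()
    for line in billboard.splitlines():
        body = line.strip()
        if not body or body.startswith(("remote", "=")):
            continue  # skip blank lines, the header row, and the ==== rule
        counts[TALLY.get(line[0], "unknown")] += 1
    return counts


SAMPLE = """\
remote refid st t when poll reach delay offset jitter
==============================================================================
*129.250.35.250 164.244.221.197 2 u 507 512 377 18.883 0.196 18.311
+129.250.35.251 209.51.161.238 2 u 366 512 377 41.349 0.429 2.184
-206.57.44.17 204.123.2.5 2 u 91 512 377 35.884 -5.982 7.099
-4.53.160.75 209.81.9.7 2 u 5 512 377 24.250 1.522 1.353
+64.73.32.135 164.67.62.194 2 u 296 512 377 26.405 -0.956 11.244
+50.116.38.157 64.250.177.145 2 u 897 1024 377 42.978 0.685 1.211
-208.87.221.228 10.0.22.51 2 u 390 512 377 83.858 -2.717 0.814
-206.212.242.132 128.252.19.1 2 u 262 512 377 22.278 -1.640 1.150
+38.229.71.1 204.123.2.72 2 u 95 512 377 20.688 0.113 1.878
"""

counts = tally_peers(SAMPLE)
usable = counts["sys.peer"] + counts["pps.peer"] + counts["candidate"]
print(f"{usable}/{sum(counts.values())} peers usable")  # 5/9 peers usable
if counts["falseticker"]:
    print(f"warning: {counts['falseticker']} falseticker(s)")
```

On November 19th, a well-peered server would have shown tick and tock with an `x` here, which is exactly the condition worth alerting on.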

My config, or one resembling it, will give you a fair amount of clock diversity. Syncing to just one can easily result in being lied to and having your clock reset, as everyone whose clocks went back to 2000 observed.

- Jared
Re: NTP Issues Today
On 11/19/12, Van Wolfe <vanwolfe@gmail.com> wrote:
> Did anyone else experience issues with NTP today? We had our server
> times update to the year 2000 at around 3:30 MT, then revert back to 2012.

Are you sure that you are actually using NTP to set your clock?
For you to sync with 2000, you should have had multiple confused
peers from multiple time sources; possibly a false radio signal....

NTP by default has a panic threshold of 1000 seconds.

This _should_ have caused NTP to execute a panic shutdown,
instead of setting the clock back roughly 379 million seconds.


> Thanks,
> Van
--
-JH
Re: NTP Issues Today
On Tue, Nov 20, 2012 at 7:49 PM, Jimmy Hess <mysidia@gmail.com> wrote:
> Are you sure that you are actually using NTP to set your clock?
> For you to sync with 2000, you should have had multiple confused
> peers from multiple time sources; possibly a false radio signal....
>
> NTP by default has a panic threshold of 1000 seconds.
>
> This _should_ have caused NTP to execute a panic shutdown,
> instead of setting the clock back roughly 379 million seconds.

For VMware at least, their official recommendation[1] for NTP is to

tinker panic 0

for suspend/resume reasons. I've seen it as the default in some places.

[1] http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1006427

--
Darius Jahandarie
Re: NTP Issues Today
On Tue, Nov 20, 2012 at 4:49 PM, Jimmy Hess <mysidia@gmail.com> wrote:

> On 11/19/12, Van Wolfe <vanwolfe@gmail.com> wrote:
> > Did anyone else experience issues with NTP today? We had our server
> > times update to the year 2000 at around 3:30 MT, then revert back to 2012.
>
> Are you sure that you are actually using NTP to set your clock?
> For you to sync with 2000, you should have had multiple confused
> peers from multiple time sources; possibly a false radio signal....
>
> NTP by default has a panic threshold of 1000 seconds.
>
> This _should_ have caused NTP to execute a panic shutdown,
> instead of setting the clock back roughly 379 million seconds.
>

From logs various people have posted, it appears NTPd saw the excessive
time shift and took the reasonable(?) step of killing itself. The OS
detected ntpd's death and took the reasonable step of restarting it. On
startup, ntpd can be reasonably(?) configured with the -g option to bypass
the 1000s limit to set the starting time before going into the regular ntpd
time adjustment code.

In this case that would have set them back to 2000....
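The chain described above can be traced with a simplified Python model of ntpd's documented defaults: a 128 ms step threshold, a 1000 s panic threshold, the `-g` option permitting one large correction at startup, and `tinker panic 0` disabling the panic check. It deliberately ignores the stepout interval and the rest of the clock discipline, and `clock_action` is a hypothetical name, not an ntpd function.

```python
STEP_THRESHOLD = 0.128    # seconds: larger offsets are stepped, smaller ones slewed
PANIC_THRESHOLD = 1000.0  # seconds: ntpd's default panic limit


def clock_action(offset, first_update, allow_first_big_step, panic=PANIC_THRESHOLD):
    """Return 'exit', 'step' or 'slew' for a measured offset (simplified model).

    allow_first_big_step mirrors ntpd's -g option; panic=0 mirrors 'tinker panic 0'.
    """
    too_big = panic > 0 and abs(offset) > panic
    if too_big and not (first_update and allow_first_big_step):
        return "exit"  # ntpd logs a panic message and exits
    if abs(offset) > STEP_THRESHOLD:
        return "step"
    return "slew"


OFFSET = -378691200.3  # tick and tock were about 12 years behind

# The running daemon sees the jump and exits.
print(clock_action(OFFSET, first_update=False, allow_first_big_step=True))  # exit
# The OS restarts it with -g, and the first update steps the clock to 2000.
print(clock_action(OFFSET, first_update=True, allow_first_big_step=True))   # step
# With 'tinker panic 0' the sanity check never fires at all.
print(clock_action(OFFSET, first_update=False, allow_first_big_step=False, panic=0))  # step
# An ordinary small correction is slewed.
print(clock_action(0.01, first_update=False, allow_first_big_step=False))   # slew
```

Each branch is individually reasonable; it is only the restart plus `-g` (or `tinker panic 0`) combined with too few sources that turns a falseticker into a 12-year step.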

It's a good lesson on how a chain of reasonable decisions can lead to a bad
outcome, so you really need to understand the interactions of the whole
system.

Damian
Re: NTP Issues Today
Looks like something bad has happened:
Behind the Random NTP Bizarreness of Incorrect Year Being Set
https://isc.sans.edu/diary.html?n&storyid=14548

---
"A few people have written in within the past 18 hours about their NTP
server/clients getting set to the year 2000. The cause of this behavior is
that an NTP server at the US Naval Observatory (pretty much the
authoritative time source in the US) was rebooted and somehow reverted to
the year 2000. This, then, propagated out for a limited time and
downstream time sources also got this value. It's a transient problem and
should already be rectified. Not much really to report except an error at
the top of the food chain causing problems to the layers below. If you
have a problem, just fix the year or resync your NTP server.

Just goes to show how reliable NTP is: it is all but a "fire and forget"
service once configured until "bad things happen". John Bambenek"

---


Alvaro Pereira
Re: NTP Issues Today [ In reply to ]
That's what happens when you just follow vendor recommendations blindly. If
you do follow that on VMs (which can actually be a good practice), make
sure they pull from your own time infrastructure, and not just the world at
large, and that those servers behave in a sane fashion with regard to time
jumps.


On Tue, Nov 20, 2012 at 6:56 PM, Darius Jahandarie <djahandarie@gmail.com>wrote:

> On Tue, Nov 20, 2012 at 7:49 PM, Jimmy Hess <mysidia@gmail.com> wrote:
> > Are you sure that you are actually using NTP to set your clock?
> > For you to sync with 2000, you should have had multiple confused
> > peers from multiple time sources; possibly a false radio signal....
> >
> > NTP by default has a panic threshold of 1000 seconds.
> >
> > This _should_ have caused NTP to execute a panic shutdown,
> > instead of setting the clock back 30 million seconds.
>
> For VMWare at least, their official recommendation[1] for NTP is to
>
> tinker panic 0
>
> for suspend/resume reasons. I've seen it default in some places.
>
> [1]
> http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1006427
>
> --
> Darius Jahandarie
>
>
Re: NTP Issues Today [ In reply to ]
As a reminder: virtualizing time infrastructure is not recommended.
Make those servers physical.


On Tue, Nov 20, 2012 at 5:03 PM, Blake Dunlap <ikiris@gmail.com> wrote:
> That's what happens when you just follow vendor recommendations blindly. If
> you do follow that on vm's (which can actually be a good practice), make
> sure they pull from your own time infrastructure, and not just the world at
> large, and that those servers behave in a sane fashion with regard to time
> jumps.



--
-george william herbert
george.herbert@gmail.com
Re: NTP Issues Today [ In reply to ]
Blake Dunlap <ikiris@gmail.com> writes:

> That's what happens when you just follow vendor recommendations blindly. If
> you do follow that on vm's (which can actually be a good practice), make
> sure they pull from your own time infrastructure, and not just the world at
> large, and that those servers behave in a sane fashion with regard to time
> jumps.

Emphatically disagree on the "pull from your own infrastructure"
point. You probably don't have the budget even in a big company for
sufficient diversity of sources [*] for your NTP server and even if
you do the NTP servers will probably be run by the same
person/organization. Mills has called the latter practice out as bad
in the past.

As Leo pointed out, the key is having a large diverse set so that if a
couple of them go nuts they can be voted off the island.
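Here is a rough Python sketch of that voting. It assumes each server reports an offset and an error bound, and it uses a Marzullo-style "largest overlap" search. The names `best_overlap` and `vote` are hypothetical, and real ntpd uses the more involved intersection, clustering and combining algorithms from RFC 5905, so treat this as the idea rather than the implementation:

```python
# Simplified, Marzullo-style intersection: a sketch of the idea behind
# NTP's selection step, not the exact algorithm ntpd implements.

def best_overlap(intervals):
    """Return (count, lo, hi) for the region covered by the most intervals."""
    edges = []
    for lo, hi in intervals:
        edges.append((lo, -1))  # interval opens; -1 sorts before +1 on ties
        edges.append((hi, +1))  # interval closes
    edges.sort()
    best = count = 0
    best_lo = best_hi = None
    for i, (x, kind) in enumerate(edges):
        count -= kind  # an opening adds one, a closing removes one
        if count > best:  # only happens on an opening, so edges[i + 1] exists
            best, best_lo, best_hi = count, x, edges[i + 1][0]
    return best, best_lo, best_hi


def vote(sources):
    """sources maps name -> (offset_s, error_bound_s); return surviving names.

    Returns None when no strict majority agrees, i.e. the falsetickers
    cannot be told apart from the truechimers.
    """
    intervals = [(off - err, off + err) for off, err in sources.values()]
    count, lo, hi = best_overlap(intervals)
    if count * 2 <= len(sources):
        return None
    return {
        name for name, (off, err) in sources.items()
        if off - err <= hi and off + err >= lo
    }


sources = {
    "clock-a": (0.010, 0.05),
    "clock-b": (0.000, 0.05),
    "clock-c": (0.020, 0.05),
    "usno-like-1": (-378_691_200.0, 0.05),  # two servers stuck in 2000
    "usno-like-2": (-378_691_200.0, 0.05),
}
print(sorted(vote(sources)))  # ['clock-a', 'clock-b', 'clock-c']
print(vote({"w1": (0.0, 0.05), "w2": (100.0, 0.05)}))  # None: two watches
```

Even two servers that agree with each other get voted off when a larger group agrees on something else. With only two sources, the sketch cannot tell which one is wrong.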

If you have a requirement for super low jitter or holdover if you lose
network, you're looking at on-site devices with OCXO or Rb frequency
standards in them. That doesn't mean you shouldn't be talking to the
rest of the world too though. What if your on-site sources go nuts?
This happens periodically, say every 10 years or so, because of crappy
implementations and worst-current-practices. A re-read of
https://groups.google.com/forum/?fromgroups=#!search/mills$20ntp$20byzantine/comp.protocols.time.ntp/TryjqtAd1XM/R0zzzE13Tl8J
may prove instructive.

(reading list also includes http://www.amazon.com/dp/1439814635/ )

In my experience NTP beats out even DNS for "blatantly wrong configs
in the wild that nevertheless seem to work well enough that dilettante
tech folks don't notice".

I might have replied to this thread yesterday but I was blissfully
unaware of any problems:

rs@bifrost [8] % ntpq -c peers | egrep -v '(===|remote)' | wc -l
11
rs@bifrost [9] %
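For anyone who wants the same check in a monitoring script, here is a minimal Python equivalent of that pipeline. `count_peers` is a hypothetical helper. Like the `egrep -v`, it simply drops the header row (contains "remote") and the ruler row (contains "==="). The sample rows are taken from the ntpq -p listing posted elsewhere in this thread:

```python
# Python equivalent of: ntpq -c peers | egrep -v '(===|remote)' | wc -l
# Counts lines of ntpq's peer listing that are neither the header row
# (contains "remote") nor the ruler row (contains "===").

def count_peers(ntpq_output: str) -> int:
    return sum(
        1
        for line in ntpq_output.splitlines()
        if "===" not in line and "remote" not in line
    )


sample = """     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*usno.pa-x.dec.c .USNO.           1 u  540 1024  377    1.373    4.242   0.032
+nist1-sj.ustimi .ACTS.           1 u  307 1024  377    7.843    5.227   0.729
"""
print(count_peers(sample))  # 2

# On a host with ntpq installed (not run here), the live output could be
# fetched with the standard library:
#   import subprocess
#   out = subprocess.run(["ntpq", "-c", "peers"], capture_output=True,
#                        text=True, check=True).stdout
#   print(count_peers(out))
```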

-r

[*] particularly due to shortsighted US federal government choices on
LORAN, GOES, WWVB time format, etc
Re: NTP Issues Today [ In reply to ]
On Nov 19, 2012, at 6:12 PM, "Scott Weeks" <surfer@mauigateway.com> wrote:

> wbailey@satelliteintelligencegroup.com>
>
> Or you could just concede the fact that the navy is playing with time travel again.
> ----------------------------------------------------------
>
>
> To finish this thread off for the archives...
>
> Apparently something was up with the navy stuff as a post on
> the outages shows.
Re: NTP Issues Today [ In reply to ]
On Nov 19, 2012, at 6:12 PM, "Scott Weeks" <surfer@mauigateway.com> wrote:

> Lesson learned: Use more than one NTP source.
>

The lesson is: use MORE THAN TWO diverse NTP sources.

A man with two watches has no idea what time it actually is.
Re: NTP Issues Today [ In reply to ]
On 21/11/12 12:34, Ryan Malayter wrote:
>
> On Nov 19, 2012, at 6:12 PM, "Scott Weeks" <surfer@mauigateway.com> wrote:
>
>> Lesson learned: Use more than one NTP source.
>>
> The lesson is: use MORE THAN TWO diverse NTP sources.
>
> A man with two watches has no idea what time it actually is.
>
>

Per David Mills, from the discussion linked upthread, this should be
FOUR OR MORE...

"Every critical server should have at least four sources, no two from the
same organization and, as much as possible, reachable only via diverse,
nonintersecting paths."

Four, so that the remaining three can reach consensus even if one fails.
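The arithmetic behind "four" can be sketched in a few lines. This simplification assumes only that the surviving sources must form a strict majority of those still reachable, and `max_falsetickers` is a hypothetical helper. Three sources can outvote one falseticker only while all three are up. Four sources still can after one of them drops out:

```python
# Majority arithmetic behind "use four or more sources" (a simplification:
# the sources that agree must be a strict majority of those reachable).

def max_falsetickers(reachable: int) -> int:
    """Largest number of bad sources that a strict majority of good ones can outvote."""
    if reachable < 1:
        return 0
    return (reachable - 1) // 2


for configured, down in [(2, 0), (3, 0), (3, 1), (4, 0), (4, 1)]:
    tolerated = max_falsetickers(configured - down)
    print(f"{configured} configured, {down} down: tolerates {tolerated} falseticker(s)")
```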

-- Neil
Re: NTP Issues Today [ In reply to ]
Guys:

We were synchronized against multiple sources. Unfortunately the Navy NTP source contaminated multiple downstream sources.

Unless you can trace all your sources, you will break if they all share a common root source.
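One partial check is to group your servers by the refid that `ntpq -p` reports for them. Treat this as a heuristic sketch. `shared_upstreams` is a hypothetical helper, and the hosts and addresses below are made-up documentation values. A refid only names a server's *immediate* upstream, so this can reveal shared parents but not a shared root further up the tree:

```python
# Heuristic: group configured servers by the refid ntpq reports for them.
# At stratum 2+ the refid identifies the immediate upstream server (its
# IPv4 address, or a hash for IPv6); at stratum 1 it names a clock type
# such as .ACTS., which does not imply a shared host, so those are skipped.
from collections import defaultdict


def shared_upstreams(peers):
    """peers: iterable of (remote, refid, stratum).

    Returns {refid: [remotes]} for upstreams used by more than one server.
    """
    groups = defaultdict(list)
    for remote, refid, stratum in peers:
        if stratum >= 2:
            groups[refid].append(remote)
    return {refid: names for refid, names in groups.items() if len(names) > 1}


peers = [  # hypothetical hosts and documentation addresses
    ("a.example.net", "192.0.2.10", 2),
    ("b.example.net", "192.0.2.10", 2),
    ("c.example.net", "198.51.100.7", 2),
    ("d.example.net", ".ACTS.", 1),
    ("e.example.net", ".ACTS.", 1),
]
print(shared_upstreams(peers))  # {'192.0.2.10': ['a.example.net', 'b.example.net']}
```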

Sid Rao | CTI Group | +1 (317) 262-4677

RE: NTP Issues Today [ In reply to ]
-----Original Message-----
>From: Jimmy Hess [mailto:mysidia@gmail.com]
>Sent: Tuesday, November 20, 2012 7:50 PM
>To: Van Wolfe
>Cc: nanog@nanog.org
>Subject: Re: NTP Issues Today

>This _should_ have caused NTP to execute a panic shutdown,
>instead of setting the clock back 30 million seconds.

>--
>-JH

Sounds like the client might have been running SNTP, which does little if
any sanity checking. Windows used to use it and was more than happy to
change the time by years if it received bad time. Not sure if that is
still the case.

Chuck
Re: NTP Issues Today [ In reply to ]
It sounds like the Navy and whoever else they partner with (NIST?) need
some egress filtering on their NTP servers to catch and prevent events like
this.
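Here is a sketch of what such a server-side guard might look like. This is a hypothetical illustration, not a feature of any particular NTP implementation. `safe_to_serve`, `FLOOR` and `MAX_DISAGREEMENT_S` are made-up names, and the thresholds are assumptions. The server refuses to answer clients if its own clock predates a known floor (for example, its software build date) or disagrees with the median of independent references:

```python
# Hypothetical "egress filter" for a time server: refuse to answer clients
# when the local clock looks implausible. Names and thresholds are
# illustrative assumptions, not any real NTP server's configuration.
from datetime import datetime, timezone
from statistics import median

FLOOR = datetime(2012, 1, 1, tzinfo=timezone.utc)  # e.g. software build date
MAX_DISAGREEMENT_S = 1000.0  # mirrors ntpd's default panic threshold


def safe_to_serve(local_now, independent_refs, floor=FLOOR, max_s=MAX_DISAGREEMENT_S):
    """Return False if local time predates the floor, or if it disagrees
    with the median of the independent reference readings by more than max_s."""
    if local_now < floor:
        return False
    if independent_refs:
        ref_ts = median(t.timestamp() for t in independent_refs)
        if abs(local_now.timestamp() - ref_ts) > max_s:
            return False
    return True


good = datetime(2012, 11, 19, 22, 30, tzinfo=timezone.utc)
stuck = datetime(2000, 11, 19, 22, 30, tzinfo=timezone.utc)
print(safe_to_serve(stuck, [good, good, good]))  # predates the floor
print(safe_to_serve(good, [good, good, good]))   # plausible
```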
Re: NTP Issues Today [ In reply to ]
----- Original Message -----
> From: "Sid Rao" <srao@ctigroup.com>

> We were synchronized against multiple sources. Unfortunately the Navy
> NTP source contaminated multiple downstream sources.
>
> Unless you can trace all your sources, if these sources all have a
> root source you will break.

"... against multiple [Stratum 1] sources..."

Baby, if you've ever wondered... whether it matters whether your sources
are strat 1 or not, now you know -- since there's no real way to get
provenance on down-strat time sources that I'm aware of.

Question for those who know: does the NTP code give any extra credence to
strat-1 sources in its byzantine code?

Cheers,
-- jra
--
Jay R. Ashworth Baylink jra@baylink.com
Designer The Things I Think RFC 2100
Ashworth & Associates http://baylink.pitas.com 2000 Land Rover DII
St Petersburg FL USA #natog +1 727 647 1274
Re: NTP Issues Today [ In reply to ]
On Wed, Nov 21, 2012 at 10:41:01AM -0500, Jay Ashworth wrote:
> "... against multiple [Stratum 1] sources..."
>
> Baby, if you've ever wondered... whether it matters whether your sources
> are strat 1 or not, now you know -- since there's no real way to get
> provenance on down-strat time sources that I'm aware of.
>
> Question for those who know: does the NTP code give any extra credence to
> strat-1 sources in its byzantine code?

Not in a way that matters if one of them suddenly becomes a
falseticker. If a reference clock goes insane, it's pretty easily
detected provided you have at least two more servers (or even
peers) configured.

Stratum 1 just means it thinks it has a reference clock
attached, but those clocks fail, go into holdover, or what have you,
all the time.

NTP will happily select a stratum 2 or lower clock instead
provided it appears stable (low jitter, a reach of 377, meaning it
answered each of our last eight polls, and is an eligible candidate.)
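The reach column is easy to misread. It is an 8-bit shift register printed in octal, so 377 means "all of the last eight polls answered", not 377 or 255 queries. Here is a small decoding sketch (`reach_history` is a hypothetical helper). It assumes the usual behaviour where the register shifts left on each poll and the rightmost bit is set when a valid reply arrives:

```python
# Decode ntpq's "reach" column: an 8-bit shift register shown in octal.
# Each poll shifts it left; bit 0 (rightmost) is set if that poll got a
# valid reply. So 377 (octal) = 255 = all of the last eight polls answered.

def reach_history(reach_octal: str) -> list:
    """Return poll results as booleans, most recent poll first."""
    value = int(reach_octal, 8)
    if not 0 <= value <= 0o377:
        raise ValueError(f"reach out of range: {reach_octal!r}")
    return [bool((value >> bit) & 1) for bit in range(8)]


print(sum(reach_history("377")))  # 8 of the last 8 polls answered
print(reach_history("376")[0])    # False: the most recent poll went unanswered
```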

To get an idea of what your NTP server will do, try ntpq -p:

msa@paladin:/home/msa (582)$ ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
-nist1.symmetric .ACTS.           1 u  304 1024  377    5.140    3.271   0.581
+nist1-sj.ustimi .ACTS.           1 u  307 1024  377    7.843    5.227   0.729
+64.147.116.229  .ACTS.           1 u  414 1024  377    9.406    5.742   0.068
*usno.pa-x.dec.c .USNO.           1 u  540 1024  377    1.373    4.242   0.032
-pegasus.latt.ne 64.250.177.145   2 u  304 1024  377   61.383    5.920   6.578
-pyramid.latt.ne 216.171.124.36   2 u  361 1024  377    1.076    4.181   0.066

This is a stratum 2 server in the public pool. It's peering
with two other stratum 2 servers that I run. Those two are deselected
(-). The server marked with a * is selected, and those with a + are
included in a weighted average used to maintain the system clock.
If the primary selected server does something wonky, it's going to
select one of the candidates marked with a +.

In this case it has enough stratum 1 servers that it's not
likely to fall back to its peers, but it can do so if those servers
suddenly give it a set of unexpected replies.
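To read these listings from a script, here is a small parsing sketch. It assumes the column layout shown above. `parse_peers` and `TALLY` are hypothetical names, the labels paraphrase ntpq's documented tally codes, and wrapped rows (for example, long IPv6 addresses) are simply skipped:

```python
# Parse `ntpq -p` rows and label each peer by its tally code (the first
# character). Labels paraphrase ntpq's documented tally codes.

TALLY = {
    "*": "system peer (selected)",
    "+": "candidate (included in the combine)",
    "-": "outlier (discarded by clustering)",
    "x": "falseticker (discarded by intersection)",
    "#": "survivor, backup",
    ".": "excess (not among the first candidates)",
    "o": "PPS peer",
    " ": "rejected",
}


def parse_peers(text: str) -> list:
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("remote") or line.startswith("="):
            continue  # skip blank lines, the header row and the ruler
        tally, fields = line[0], line[1:].split()
        if len(fields) < 7:
            continue  # wrapped or truncated row
        rows.append({
            "remote": fields[0],
            "refid": fields[1],
            "stratum": int(fields[2]),
            "reach": fields[6],
            "status": TALLY.get(tally, "unknown"),
        })
    return rows


sample = """     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
-nist1.symmetric .ACTS.           1 u  304 1024  377    5.140    3.271   0.581
*usno.pa-x.dec.c .USNO.           1 u  540 1024  377    1.373    4.242   0.032
-pegasus.latt.ne 64.250.177.145   2 u  304 1024  377   61.383    5.920   6.578
"""
for row in parse_peers(sample):
    print(row["remote"], row["stratum"], row["status"])
```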

--msa
Re: NTP Issues Today [ In reply to ]
On Nov 20, 2012, at 13:00, Darius Jahandarie <djahandarie@gmail.com> wrote:

Hi everyone,

I run the NTP Pool system - http://www.pool.ntp.org/ - so I have some opinions on some of this. :-)

> But beyond that, I'm honestly rather curious what server selections
> are a good idea. A first thought would be an adjacent country, but
> maybe there is a benefit to picking things outside of the pool.ntp.org
> selection entirely?

First of all: None of the ~3800 servers in the NTP Pool system were affected by this as far as I can tell from the (copious) monitoring data.

The big benefit to adding some non-pool servers is that you wouldn't be depending entirely on a bunch of volunteers (and to a large extent me) for your timekeeping. Though likely you'd just be depending on another group of volunteers.

In addition to depending on the server operators who run the ntpd servers you also depend on:

1) The monitoring system keeping accurate time.
2) The monitoring system doing its job of catching bad servers.
3) The process updating and distributing the DNS data working.
4) The DNS servers working (and not being under a DoS attack or similar).
5) Anything I haven't thought of!

Empirically, I believe we've done a better job than just about anyone at a similar scale, but past performance is no promise of the future.

> I see that Jared used *.fedora.pool.ntp.org -- I wonder if there was a
> specific reason for that or if my questions are even worth thinking
> about at all :-).


The servers for x.fedora.pool.ntp.org are in the same "group" as x.pool.ntp.org. If you are in a country with many servers in the pool, you'll very likely get different IPs for the two. If you are in a country with few servers, your odds of that aren't as good and it'd be a bit pointless.

Anyone using the NTP Pool in a default configuration (like Fedora does) must get a "vendor zone" setup - http://www.pool.ntp.org/en/vendors.html - so we have at least a little bit of a chance to monitor and mitigate problems.

It also allows us to change which servers are selected, how many IPs are returned, etc., for a particular vendor. For example, if Fedora in the future changes to use 'pool' instead of 'server' in the configuration, we could optimize for that.


Ask

--
http://askask.com/