[PATCH] vpnc: fix connectivity gap and ESP packet loss resulting from phase 2 re-keying
Dear VPNC-maintainers,

I would like to submit the attached patch proposal for your review:

20150912_fix_phase2_rekey_gap.patch

The patch provides a possible solution for the connectivity gap and ESP
packet loss resulting from phase 2 re-keying between VPNC and Cisco ASA's.

The extensive testing I did with Cisco ASA's shows very promising results:
the phase 2 re-keying gap is reduced from 55 seconds to less than 1 second
and ESP packet loss is down to a minimum as well. Most of the time no ESP
packets are lost at all; only once every so many vpn connections is a single
ESP packet lost (i.e. received by VPNC on the already terminated rx spi).

Compared to the current behaviour of VPNC this would be a huge improvement.

I don't have access to Nortel, Fortigate etc. and am therefore unfortunately
unable to test and verify the behaviour of the patch with those concentrators.

I have kept the patch as minimal and straightforward as possible. It applies
cleanly against:

vpnc-0.5.3-r550 (May 26 20:09:51 2014, Joerg Mayer).

Should the patch proposal pass your review, I hope you are willing to
consider including it in the VPNC code base. I'm not looking for credits in
any way, shape or form. My only aim is to get this gap and packet loss issue
resolved simply and elegantly, and to keep code changes to a minimum, as they
are bound to introduce new issues that I imagine nobody really wants at this
stage.

Most of this email is just intended to provide background and documentation,
so some will consider it to have an extremely high tl;dr-index. To those I
can only say: just read the summary, or skip all reading entirely, if that's
what works for you, apply the patch if you feel it applies to your situation
or just for the fun of it and start testing.

Comments, possible concerns and suggestions for further improvement are most
welcome. I'll try to address them to the best of my ability.


Patch summary
~~~~~~~~~~~~~

vpnc: fix connectivity gap and ESP packet loss resulting from phase 2 re-keying

On Cisco ASA5510 8.2(4) and ASA5515X 9.1(4) there is a delay of 30-55 seconds
between VPNC instantly 'deleting' the old ipsec SA's and moving to the new
ones and the ASA (peer) 'deleting' the old ipsec SA's and moving to the new
ones.

This delay results in a connectivity gap between VPNC and the ASA. During
this gap tunnels generally do remain intact, but tunnel traffic stalls and ESP
packets exchanged across the tunnel are lost (i.e. ignored by VPNC).

Explicitly confirming the (local) termination of the old ipsec SA's by VPNC
to the peer, immediately following a phase 2 re-key QM packet exchange, tells
the peer to stop using the old ipsec SA's and causes it to switch to the new
ones instantly as well.

The explicit delete notification significantly speeds up the completion of
the entire phase 2 re-key operation and minimises, if not entirely prevents,
ESP packet loss.

This patch implements sending the confirmation of the (local) termination of
the old ipsec SA's to the peer, by sending an ISAKMP_PAYLOAD_D delete
notification to the peer that contains both the terminated inbound and the
terminated outbound ipsec SA's.

The delete notification is sent to the peer immediately following the
successful completion of the phase 2 re-keying QM packet exchange sequence
initiated by the peer.
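
For readers unfamiliar with the payload, here is a minimal sketch of what such
a delete notification could look like on the wire. It is written in Python
purely for illustration and is not the C code from the patch; the function name
build_delete_payload is hypothetical. The field layout follows the Delete
payload of RFC 2408 (section 3.15), with the DOI and protocol values from
RFC 2407. Putting both SPIs into a single payload is an assumption of this
sketch. In a real exchange the payload travels inside an encrypted
Informational exchange together with a hash payload, which is left out here.

```python
import struct

ISAKMP_PAYLOAD_NONE = 0  # "next payload" value meaning nothing follows
IPSEC_DOI = 1            # IPsec Domain of Interpretation (RFC 2407)
PROTO_IPSEC_ESP = 3      # protocol ID for ESP (RFC 2407)


def build_delete_payload(spis, next_payload=ISAKMP_PAYLOAD_NONE):
    """Pack an ISAKMP Delete payload carrying one or more 32-bit ESP SPIs."""
    spi_size = 4  # ESP SPIs are 4 bytes wide
    body = b"".join(struct.pack("!I", spi) for spi in spis)
    length = 12 + len(body)  # 12 bytes of fixed fields precede the SPIs
    header = struct.pack(
        "!BBHIBBH",
        next_payload,     # next payload type
        0,                # reserved
        length,           # total payload length in bytes
        IPSEC_DOI,        # domain of interpretation
        PROTO_IPSEC_ESP,  # protocol the SPIs belong to
        spi_size,         # size of each SPI
        len(spis),        # number of SPIs that follow
    )
    return header + body


# Old rx and tx SPIs as they appear in the patched capture further down.
payload = build_delete_payload([0xE6425491, 0x19F581C8])
print(payload.hex())
```

The fixed part is 12 bytes, so a payload with two ESP SPIs is 20 bytes long.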

Additionally some change-related debug and log messages have been revised or
made more meaningful and consistent. A number of debug messages have been
added to vpnc.c:do_rekey() to provide a clearer view on each of the steps in
the phase 2 re-keying process.


Background
~~~~~~~~~~

I am a long term and very happy user of VPNC and would like to start by
expressing my admiration for and gratitude to all those who came up with the
original idea, created it, made it available to the community and have
maintained and helped improve it over the years.

VPNC has served me well and has proven to be a very reliable vpn client that
is both easy to install and easy to configure on all distributions I work with.

Of course for a while there were occasions when I got disconnected when
using it for longer than 1 hour, but r512 proved to be the perfect solution
for that, and things got even better with r550.

So, what more could I possibly want?

Well, now that I no longer got disconnected, I noticed that my vpn sessions
would stall for about 55 seconds following a phase 2 re-key.

Following such a stall tunnel traffic generally would pick up again as if
nothing had ever happened and then would continue to run smoothly until the
next phase 2 re-key, at which point the whole thing would repeat, and so on.

I thought that as a token of my appreciation of the amount of effort that has
gone and still goes into maintaining VPNC, it would be nice if I could help
improve VPNC a little by providing a possible solution for the delay and
submitting that as a patch proposal to the VPNC maintainers.


Similar cases
~~~~~~~~~~~~~

Searching for possible explanations and trying to become more familiar with
the phase 2 re-key process, I came across a bug that was posted to this
mailing list in early 2014 by Friedemann Stoyan:

Thread: VPN connectivity gap during Phase2 Rekeying
URL: http://www.gossamer-threads.com/lists/vpnc/devel/4072

Friedemann starts by stating

I have become aware of a 30s VPN connectivity gap during Phase2
rekeying between VPNC and a Cisco ASA5515X with 9.1(4).

and then proceeds to describe in great detail the exact same 'annoyance' that
I ran across myself.

Note that the bug report that Friedemann submitted is about r512 but if you
look at the change logs since r512 the relevant phase 2 re-key code actually
hasn't changed that much, if at all.


Research
~~~~~~~~

As a Unix-admin I don't have admin access to our vpn concentrators (ASA's
mostly). Unable to view this from the peer side, I decided to analyse packet
flow with Wireshark on the VPNC side instead.

Below is output extracted from a capture that I ran during a test of r550.
For brevity the capture only shows packets relevant to the issue.

The capture shows the overall duration of the gap between the peer
confirming phase 2 re-key completion by sending a QM3 (T 3431.465392) and the
tunnel traffic returning to normal (T 3486.285648) to be 55 seconds.

During that gap 8 ESP packets are lost, discarded by VPNC for their 'unknown
spi'. To make matters even worse, the ISAKMP_PAYLOAD_D delete notification
sent by a peer in utter despair is also discarded by VPNC for proposing for
deletion what VPNC by then considers to be a 'bogus spi'.

From the moment VPNC receives the ISAKMP_PAYLOAD_D delete notification from
the peer (T 3461.446333), already 30 (!) seconds after phase 2 re-key
completion, it takes another 25 (!) seconds for tunnel traffic to return to
normal and stop stalling.


No.   Time         Src   Dst   Protocol  Info                  Comment
~~~   ~~~~         ~~~   ~~~   ~~~~~~~~  ~~~~                  ~~~~~~~
7504  3429.712316  PEER  VPNC  ESP       ESP (SPI=0x3342d642)  \
7505  3429.713395  VPNC  PEER  ESP       ESP (SPI=0x19e26f70)  |- normal exchange of tunnel pkts on old rx and tx spi
7506  3430.712439  PEER  VPNC  ESP       ESP (SPI=0x3342d642)  |
7507  3430.713520  VPNC  PEER  ESP       ESP (SPI=0x19e26f70)  /

7508  3431.447189  PEER  VPNC  ISAKMP                          --- peer initiates phase 2 re-key by sending QM1 with proposed new tx spi
7509  3431.447963  VPNC  PEER  ISAKMP                          --- VPNC replies by sending QM2 with proposed new rx spi and switches to new SA's
7510  3431.465392  PEER  VPNC  ISAKMP                          --- peer confirms phase 2 re-key completion by sending QM3

7511  3431.714920  PEER  VPNC  ESP       ESP (SPI=0x3342d642)  \
7512  3431.934957  PEER  VPNC  ESP       ESP (SPI=0x3342d642)  |
7513  3432.357945  PEER  VPNC  ESP       ESP (SPI=0x3342d642)  |- peer is unaware of VPNC having switched to the new SA's already and
7514  3433.212068  PEER  VPNC  ESP       ESP (SPI=0x3342d642)  |  continues to use the old SA's (in this case the rx spi) for tunnel traffic;
7517  3434.925162  PEER  VPNC  ESP       ESP (SPI=0x3342d642)  |  this results in VPNC complaining (8x in this test) and discarding those
7518  3438.353240  PEER  VPNC  ESP       ESP (SPI=0x3342d642)  |  packets with:
7521  3445.196854  PEER  VPNC  ESP       ESP (SPI=0x3342d642)  |    'unknown spi 0x3342d642 from peer'
7524  3458.892060  PEER  VPNC  ESP       ESP (SPI=0x3342d642)  /

7525  3461.446333  PEER  VPNC  ISAKMP                          --- peer proposes old tx spi termination (ISAKMP_PAYLOAD_D); as VPNC has already
                                                                   terminated the old tx spi and 'forgotten' all about it, this causes VPNC
                                                                   to just complain and ignore the packet with:
                                                                     'got isakmp delete with bogus spi (expected 0x96b54bc9,
                                                                      received 0x19e26f70), ignoring...'

7531  3486.285648  PEER  VPNC  ESP       ESP (SPI=0xe91f1836)  \
7532  3486.286465  VPNC  PEER  ESP       ESP (SPI=0x96b54bc9)  |- normal exchange of tunnel pkts resumes on new rx and tx spi
7533  3486.299460  PEER  VPNC  ESP       ESP (SPI=0xe91f1836)  |
7534  3486.301185  VPNC  PEER  ESP       ESP (SPI=0x96b54bc9)  /
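
The numbers quoted for this capture can be re-derived from the table with a
few lines of Python. The sketch below is only an illustration: the
is_discarded() helper is a hypothetical, simplified model of the 'unknown spi'
check (it assumes VPNC accepts nothing but the new rx SPI once it has
switched) and is not taken from the vpnc source.

```python
from decimal import Decimal

NEW_RX_SPI = 0xE91F1836  # SPI VPNC expects from the peer after the re-key

# Capture times (in seconds) of the three key events in the table above.
QM3_TIME = Decimal("3431.465392")          # packet 7510: peer sends QM3
PEER_DELETE_TIME = Decimal("3461.446333")  # packet 7525: peer sends its delete
RESUME_TIME = Decimal("3486.285648")       # packet 7531: traffic resumes

# (packet number, SPI) of the ESP packets the peer sent after QM3.
esp_from_peer = [
    (7511, 0x3342D642),
    (7512, 0x3342D642),
    (7513, 0x3342D642),
    (7514, 0x3342D642),
    (7517, 0x3342D642),
    (7518, 0x3342D642),
    (7521, 0x3342D642),
    (7524, 0x3342D642),
    (7531, 0xE91F1836),
    (7533, 0xE91F1836),
]


def is_discarded(spi, current_rx_spi=NEW_RX_SPI):
    """Simplified 'unknown spi' check: only the current rx SPI is accepted."""
    return spi != current_rx_spi


lost = [number for number, spi in esp_from_peer if is_discarded(spi)]
total_gap = RESUME_TIME - QM3_TIME
until_peer_delete = PEER_DELETE_TIME - QM3_TIME
after_peer_delete = RESUME_TIME - PEER_DELETE_TIME

print(f"{len(lost)} ESP packets discarded: {lost}")
print(f"gap: {total_gap} s = {until_peer_delete} s + {after_peer_delete} s")
```

Decimal is used instead of float so that the differences come out exactly as
they would when subtracting the capture timestamps by hand.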


Solution
~~~~~~~~

The fact that the tunnels kept stalling even after the peer had sent the old
tx spi delete notification led me to think that, although the applicable
RFC's do not require the use of the delete notification, perhaps the peer had
all along just been waiting desperately for VPNC to send one.

So, in essence, that is what I added to the code and tested fairly
extensively against our ASA's (there's a wide array of concentrators I
connect to all day every day) on various distributions.

Below is output extracted from a capture that I ran during a test of r550
with the proposed patch applied to it. Again, for brevity the capture only
shows packets relevant to the issue.

The capture shows that the patch reduces the former 55 second gap between the
peer confirming phase 2 re-key completion (T 3429.552318) and tunnel traffic
resuming on the new SA's (T 3430.272291), to less than 1 second.

The entire 'conversation' between VPNC and the peer now runs smoothly. The
delay has disappeared, there are no more 'unknown spi' (i.e. lost ESP)
packets and, since the peer doesn't need to propose old tx spi termination
anymore, the 'bogus spi' packet now belongs to the past as well.


No.   Time         Src   Dst   Protocol  Info                  Comment
~~~   ~~~~         ~~~   ~~~   ~~~~~~~~  ~~~~                  ~~~~~~~
7582  3428.267254  PEER  VPNC  ESP       ESP (SPI=0xe6425491)  \
7583  3428.268257  VPNC  PEER  ESP       ESP (SPI=0x19f581c8)  |- normal exchange of tunnel pkts on old rx and tx spi
7584  3429.272243  PEER  VPNC  ESP       ESP (SPI=0xe6425491)  |
7585  3429.273613  VPNC  PEER  ESP       ESP (SPI=0x19f581c8)  /

7586  3429.532348  PEER  VPNC  ISAKMP                          --- peer initiates phase 2 re-key by sending QM1 with proposed new tx spi
7587  3429.533922  VPNC  PEER  ISAKMP                          --- VPNC replies by sending QM2 with proposed new rx spi and switches to new SA's
7588  3429.552318  PEER  VPNC  ISAKMP                          --- peer confirms phase 2 re-key completion by sending QM3

7589  3429.552955  VPNC  PEER  ISAKMP                          --- VPNC confirms termination of old rx and tx spi to peer (ISAKMP_PAYLOAD_D)

7590  3430.272291  PEER  VPNC  ESP       ESP (SPI=0x02e643ad)  \
7591  3430.273550  VPNC  PEER  ESP       ESP (SPI=0xa2e7bbcf)  |- normal exchange of tunnel pkts resumes on new rx and tx spi
7592  3431.274455  PEER  VPNC  ESP       ESP (SPI=0x02e643ad)  |
7594  3431.276151  VPNC  PEER  ESP       ESP (SPI=0xa2e7bbcf)  /


The proposed solution has been tested to work as shown here on both RHEL 6
and Ubuntu 14 in combination with Cisco ASA551x 8.2(4) and up.
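
As a worked check of the "less than 1 second" figure, the timings of the
patched capture can be subtracted directly. This is a small illustrative
sketch in Python; the variable names are made up for the example and the
timestamps are copied from the capture above.

```python
from decimal import Decimal

PREV_ESP_TIME = Decimal("3429.272243")  # packet 7584: last peer ESP on the old SPI
QM3_TIME = Decimal("3429.552318")       # packet 7588: peer sends QM3
DELETE_TIME = Decimal("3429.552955")    # packet 7589: VPNC sends its delete
RESUME_TIME = Decimal("3430.272291")    # packet 7590: first peer ESP on the new SPI

delete_latency = DELETE_TIME - QM3_TIME   # how quickly VPNC answers QM3
gap = RESUME_TIME - QM3_TIME              # QM3 until traffic on the new SA's
peer_interval = RESUME_TIME - PREV_ESP_TIME  # spacing of the peer's ESP packets

print(f"delete sent {delete_latency} s after QM3")
print(f"traffic on new SA's {gap} s after QM3")
print(f"peer ESP packets {peer_interval} s apart across the re-key")
```

In this excerpt the peer's ESP packets arrive roughly once per second both
before and across the re-key, so the remaining 0.72 seconds may simply be the
wait for the next regular packet rather than an actual stall.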


Managing expectations
~~~~~~~~~~~~~~~~~~~~~

I would like to be able to claim that reducing the gap also prevents packet
loss entirely, but I can't. Testing shows that once every so many vpn
sessions you will lose the odd ESP packet as a result of a phase 2 re-key
operation.

Given the stable state the r550 code is in and the apparent development
freeze and base review it has been undergoing for quite a while now, I doubt
whether the benefit of fixing that will outweigh the effort required.

--
Piet Starreveld
Re: [PATCH] vpnc: fix connectivity gap and ESP packet loss resulting from phase 2 re-keying
Hi,

I've tried this patch with a Fortigate (dunno which version or other
details, because I'm just a customer of the service) with no luck: same
result as the unpatched version, after some time traffic stops, caused by
rekey errors.

regards,
Matteo

On Wed, Dec 9, 2015 at 8:03 PM, Piet Starreveld <pstarrev@gmail.com> wrote:

> [...]
Re: [PATCH] vpnc: fix connectivity gap and ESP packet loss resulting from phase 2 re-keying
Hi Matteo,

On Thu, 2015-12-10 at 15:05 +0100, matteo brancaleoni wrote:
> Hi,
>
>
> I've tried this patch with a Fortigate (dunno which version or other details, because I'm just a customer of the service)
> with no luck: same result as the unpatched version, after some time traffic stops, caused by rekey errors.
>
>
> regards,
> Matteo
>

Many thanks for testing and really sorry to hear the patch didn't resolve the re-key issues you encounter with Fortigate.

Would you perhaps care to share your vpnc.conf and if possible a debug log of the test you performed?

The vpnc.conf I used during the test is reasonably minimal (and based on connecting to ASA's, obviously):

IPSec gateway XXX.XXX.XXX.XXX
IPSec ID idfoo
IPSec secret secretfoo
Vendor cisco
No Detach
NAT Traversal Mode cisco-udp
Cisco UDP Encapsulation Port 10000
DPD idle timeout (our side) 10

One of the things I came across when testing with the ASA's is that the following config setting caused issues:

Perfect Forward Secrecy dh2

Leaving it at the default value (server) or setting it to nopfs resolved those issues.
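
To check a configuration for this setting quickly, something along the following lines could be used. It is a minimal sketch in Python: the helper names pfs_setting and worked_with_asa are hypothetical, the parsing is deliberately naive, and the set of "good" values only reflects the ASA tests described here, not any general rule.

```python
PFS_KEYWORD = "Perfect Forward Secrecy"
# Values that did not cause re-key issues in the ASA tests described above.
WORKED_WITH_ASA = {"server", "nopfs"}


def pfs_setting(config_text):
    """Return the configured PFS value, or 'server' when the option is absent."""
    value = "server"  # default value according to the message above
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(PFS_KEYWORD):
            value = line[len(PFS_KEYWORD):].strip()
    return value


def worked_with_asa(value):
    """True if the value is one that worked in those ASA tests."""
    return value in WORKED_WITH_ASA


# Shortened example config; the gateway is masked as in the original message.
SAMPLE = """\
IPSec gateway XXX.XXX.XXX.XXX
Vendor cisco
Perfect Forward Secrecy dh2
"""

value = pfs_setting(SAMPLE)
verdict = "ok" if worked_with_asa(value) else "may cause re-key issues with an ASA"
print(value, "->", verdict)
```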

I'm not trying to suggest your config is wrong but without additional information it is impossible to determine where things went wrong and exactly what went wrong.

Kind regards,
Piet


_______________________________________________
vpnc-devel mailing list
vpnc-devel@unix-ag.uni-kl.de
https://lists.unix-ag.uni-kl.de/mailman/listinfo/vpnc-devel
http://www.unix-ag.uni-kl.de/~massar/vpnc/
Re: [PATCH] vpnc: fix connectivity gap and ESP packet loss resulting from phase 2 re-keying
Piet, thank you for the patch - i only now discovered this thread,
although i've had issues with vpnc for a few months now.

--- the problem

connecting to a fortigate device, vpn connection dies every 30 minutes,
which is supposed to be phase 2 rekeying.

most often my connection dies with:
vpnc: HMAC mismatch in ESP mode
in this case, vpn sort-of stays up (dpd disabled), but any tunnel
traffic stops

in some less frequent cases it dies with:
/usr/sbin/vpnc: quick mode response rejected:
(ISAKMP_N_INVALID_MESSAGE_ID)(9)
this is not yet supported by vpnc.

--- current state

currently running vpnc mostly through networkmanager, because i couldn't
figure out quickly enough how to get it to apply dns servers to
resolv.conf :)

what i have currently :
r550 with the following changes.

1. commented out "assert(a->next->type == IKE_ATTRIB_LIFE_DURATION);"
(vpnc doesn't connect at all with that in place)

that might be a crude approach to the patch by Jeff Layton in
http://lists.unix-ag.uni-kl.de/pipermail/vpnc-devel/2015-June/004160.html

2. a patch by Ralph Schmieder for phase 1 rekey
http://lists.unix-ag.uni-kl.de/pipermail/vpnc-devel/2015-June/004163.html

3. your patch

none of the above seems to help with the vpn connection going down every
30 minutes or so.

i would be very glad to provide any further debugging information, do
more testing etc.
--
Rihards
Re: [PATCH] vpnc: fix connectivity gap and ESP packet loss resulting from phase 2 re-keying
Hi Rihards,

On Sun, 2016-04-03 at 14:16 +0300, Rihards wrote:
> Piet, thank you for the patch - i only now discovered this thread,
> although i've had issues with vpnc for a few months now.
> --- the problem
> connecting to a fortigate device, vpn connection dies every 30 minutes,
> which is supposed to be phase 2 rekeying.
> most often my connection dies with:
> vpnc: HMAC mismatch in ESP mode
> in this case, vpn sort-of stays up (dpd disabled), but any tunnel
> traffic stops
> in some less frequent cases it dies with:
> /usr/sbin/vpnc: quick mode response rejected:
> (ISAKMP_N_INVALID_MESSAGE_ID)(9)
> this is not yet supported by vpnc.
> --- current state
> currently running vpnc mostly through networkmanager, because i couldn't
> figure out quickly enough how to get it to apply dns servers to
> resolv.conf :)
> what i have currently :
> r550 with the following changes.
> 1. commented out "assert(a->next->type == IKE_ATTRIB_LIFE_DURATION);"
> (vpnc doesn't connect at all with that in place)
> that might be a crude approach to the patch by Jeff Layton in
> http://lists.unix-ag.uni-kl.de/pipermail/vpnc-devel/2015-June/004160.html
> 2. a patch by Ralph Schmieder for phase 1 rekey
> http://lists.unix-ag.uni-kl.de/pipermail/vpnc-devel/2015-June/004163.html
> 3. your patch
> none of the above seems to help with the vpn connection going down every
> 30 minutes or so.
> i would be very glad to provide any further debugging information, do
> more testing etc.

TL;DR-warning

First, please allow me to stress that I am in no way a maintainer of
vpnc, nor am I in any way whatsoever responsible for vpnc. I just proposed
some code a while back that had helped me, hoping it would help others,
that's all.

So far people who tried my patch on Fortigate FWs did so without any
luck - so apparently its impact if any is very limited.

The upstream maintainers of vpnc have remained silent for quite some
time now and haven't replied to any of the patch proposals submitted
following the release of r550. That includes my patch proposal too,
obviously.

My _personal_ interpretation of their silence is that they are no longer
actively supporting vpnc.

You may want to take that into consideration before you start spending
loads of time on debugging and trying to pinpoint the issue and hoping
someone else might help you out. In other words, at the risk of sounding
rude, if you're unable to program a solution for your issue yourself it
is hardly likely that you're gonna find someone here who is going to do
the 'dirty' work for you.

Back to my patch.

My patch explicitly addresses an 'issue' on Cisco ASA5510 8.2(4) and
ASA5515X 9.1(4) where I found a delay of 30-55 seconds between VPNC
instantly 'deleting' the old ipsec SA's and moving to the new ones and
the ASA (peer) 'deleting' the old ipsec SA's and moving to the new ones.

I didn't run into 'HMAC mismatch in ESP mode' errors. It was just this
extremely annoying delay. In other words: the issue you describe isn't
the issue that I 'fixed'.

For the r550 based version I work with, I ultimately ended up doing this:

* vpnc: fix compiler warnings
  Compiling vpnc against newer versions of glibc comes with a number
  of warnings.
  - sysdep.c
  - tunip.c
  - config.c
  - math_group.c
  - vpnc.c
  Note: Differences between GCC 4.8.4 (Ubuntu 14.04) and
  GCC 4.4.7 (SL6.7) make it necessary to exempt 'asprintf'
  occurrences in do_config_to_env() from evaluating the result
  when compiling with GCC 4.4.7. Not doing so will make GCC
  complain 'warning: dereferencing pointer does break
  strict-aliasing rules'.
* vpnc-script: sync to vpnc-script git repo
  Upstream has not sync'd vpnc-script with git repo since Nov 2014.
* vpnc: implement seq_id checking
  Vpnc currently lacks support for seq-id checking.
  Proposal made available to upstream on Jan 27 2015 by:
  David Woodhouse <dwmw2 at XXXXXXX.org>
* vpnc: skip parsing bogus lifetime payload
  Fortinet Fortigate firewalls may send an IKE_ATTRIB_LIFE_TYPE with
  a value of seconds in it, and then send an attribute specifying the
  hash algorithm.
  Proposal made available to upstream on Jun 26 2015 by:
  Jeff Layton <jlayton at XXXXXXX.net>
* vpnc: implement phase 1 rekeying and QoS and modify IKE keepalive behavior
  Vpnc currently lacks support for IKE phase 1 rekeying and QoS
  handling for UDP encap. Additionally, Cisco ASA (rel 9.4) logs show
  'IKE runt' entries for every keep-alive sent.
  Patch proposal made available to upstream on Jun 27 2015 by:
  Ralph Schmieder <ralph.schmieder at XXXXXXX.com>
* vpnc: fix connectivity gap and ESP packet loss resulting from phase 2
  re-keying
  On Cisco ASA5510 8.2(4) and ASA5515X 9.1(4) there is a delay of
  30-55 seconds between VPNC instantly 'deleting' the old ipsec SA's
  and moving to the new ones and the ASA (peer) 'deleting' the old
  ipsec SA's and moving to the new ones.
  Patch proposal made available to upstream on Dec 10 2015 by:
  Piet Starreveld <pstarrev at XXXXXXX.com>
* vpnc: fix unknown vendor ids in phase 1 debug log
  This patch adds proper debugging for the vendor ids below to
  vpnc.c:do_phase1_am_packet2():
  - Cisco Unity 1.0
  - Cisco Concentrator:
  - Microsoft L2TP/IPSec VPN Client:
  Patch proposal not submitted to upstream.
* vpnc: minor tunip logging and debugging changes
  This patch removes obsolete returns from logmsg calls in tunip and
  changes LOG_DEBUG logmsg calls to be handled by DEBUG.
  Patch proposal not submitted to upstream.

Based on that, things you could consider are:
- ensure you have the latest and greatest vpnc-script from
git://git.infradead.org/users/dwmw2/vpnc-scripts.git
- run vpnc independent of NetworkManager (vpnc-script will mod
/etc/resolv.conf based on the session info vpnc gets from the fw)
- add to r550 the patch that David Woodhouse submitted
- add to r550 the patch that Jeff Layton submitted - as opposed to just
commenting out the assert. Jeff's patch takes care of that, btw.
- add to r550 the patch that Ralph Schmieder submitted
- leave my patch out - my guess is that it won't help you overcome
the issue you ran into
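
To illustrate why skipping the bogus attribute is preferable to simply
removing the assert, here is a simplified, hypothetical model. It is
neither vpnc's actual code nor Jeff's patch; the struct and the names
are stand-ins, and only the attribute numbers are taken from RFC 2409,
Appendix A. The parser returns "no lifetime found" when a life-type
attribute is not followed by a life-duration attribute, instead of
aborting or reading the wrong attribute as a duration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* IKE phase 1 attribute classes (RFC 2409, Appendix A). */
enum {
    ATTR_HASH          = 2,
    ATTR_LIFE_TYPE     = 11,
    ATTR_LIFE_DURATION = 12
};

/* Life type values (RFC 2409, Appendix A). */
enum { LIFE_TYPE_SECONDS = 1, LIFE_TYPE_KBYTES = 2 };

/* Hypothetical, simplified attribute node; vpnc's real struct differs. */
struct attr {
    struct attr *next;
    uint16_t type;
    uint32_t value;
};

/*
 * Inspect a LIFE_TYPE attribute and the attribute that follows it.
 * Returns 1 and stores the lifetime in *seconds for a well-formed,
 * seconds-based pair. Returns 0 (and leaves *seconds untouched) when
 * the pair is malformed or not expressed in seconds.
 */
int parse_lifetime(const struct attr *a, uint32_t *seconds)
{
    /* Asserting on our own caller's mistakes is fine ... */
    assert(seconds != NULL);

    if (a == NULL || a->type != ATTR_LIFE_TYPE)
        return 0;
    /* ... but data from the peer is checked and skipped, not asserted:
     * a LIFE_TYPE followed by something else is treated as bogus. */
    if (a->next == NULL || a->next->type != ATTR_LIFE_DURATION)
        return 0;
    if (a->value != LIFE_TYPE_SECONDS)
        return 0;

    *seconds = a->next->value;
    return 1;
}
```

With only the assert commented out, code of this shape would go on to
treat the following attribute (the hash algorithm in the Fortigate case)
as a duration; the explicit check avoids that.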

Please let me be clear: with vpnc, YMMV, on Cisco and especially on
Fortigate, since vpnc was never developed with Fortigates in mind.

Good luck.

Kind regards,
Piet Starreveld



_______________________________________________
vpnc-devel mailing list
vpnc-devel@unix-ag.uni-kl.de
https://lists.unix-ag.uni-kl.de/mailman/listinfo/vpnc-devel
http://www.unix-ag.uni-kl.de/~massar/vpnc/
Re: [PATCH] vpnc: fix connectivity gap and ESP packet loss resulting from phase 2 re-keying
On 2016.04.08. 21:02, Piet Starreveld wrote:
>
> Hi Rihards,

all responses inline

> On zo, 2016-04-03 at 14:16 +0300, Rihards wrote:
>> Piet, thank you for the patch - i only now discovered this thread,
>> although i've had issues with vpnc for a few months now.
>> --- the problem
>> connecting to a fortigate device, vpn connection dies every 30 minutes,
>> which is supposed to be phase 2 rekeying.
>> most often my connection dies with:
>> vpnc: HMAC mismatch in ESP mode
>> in this case, vpn sort-of stays up (dpd disabled), but any tunnel
>> traffic stops
>> in some less frequent cases it dies with:
>> /usr/sbin/vpnc: quick mode response rejected:
>> (ISAKMP_N_INVALID_MESSAGE_ID)(9)
>> this is not yet supported by vpnc.
>> --- current state
>> currently running vpnc mostly through networkmanager, because i couldn't
>> figure out quickly enough how to get it to apply dns servers to
>> resolv.conf :)
>> what i have currently :
>> r550 with the following changes.
>> 1. commented out "assert(a->next->type == IKE_ATTRIB_LIFE_DURATION);"
>> (vpnc doesn't connect at all with that in place)
>> that might be a crude approach to the patch by Jeff Layton in
>> http://lists.unix-ag.uni-kl.de/pipermail/vpnc-devel/2015-June/004160.html
>> 2. a patch by Ralph Schmieder for phase 1 rekey
>> http://lists.unix-ag.uni-kl.de/pipermail/vpnc-devel/2015-June/004163.html
>> 3. your patch
>> none of the above seems to help with the vpn connection going down every
>> 30 minutes or so.
>> i would be very glad to provide any further debugging information, do
>> more testing etc.
>
> TL;DR-warning
>
> First, please allow me to stress that I am in no way a maintainer of
> vpnc nor am I any way whatsoever responsible for vpnc. I just proposed
> some code a while back that had helped me, hoping it would help others,
> that's all.
>
> So far people who tried my patch on Fortigate FWs did so without any
> luck - so apparently its impact if any is very limited.

thank you for the answer. indeed, your patch seems to solve an issue,
specific to cisco devices. it will surely help many people, thank you
for publishing it.

> The upstream maintainers of vpnc have remained silent for quite some
> time now and haven't replied to any of the patch proposals submitted
> following the release of r550. That includes my patch proposal too,
> obviously.
>
> My _personal_ interpretation of their silence is that they are no longer
> actively supporting vpnc.

indeed, with the last release in 2010 and the last svn commit 2 years
ago it's somewhere between dead and very dead :)

> You may want to take that into consideration before you start spending
> loads of time on debugging and trying to pinpoint the issue and hoping
> someone else might help you out. In other words, at the risk of sounding
> rude, if you're unable to program a solution for your issue yourself it
> is hardly likely that you're gonna find someone here who is going to do
> the 'dirty' work for you.

oh, not perceived as rude. that's how opensource communities work and
it's all normal.

my issue indeed seems to be different, as the connection does not
recover, unlike in your case.

> Back to my patch.
>
> My patch explicitly addresses an 'issue' on Cisco ASA5510 8.2(4) and
> ASA5515X 9.1(4) where I found a delay of 30-55 seconds between VPNC
> instantly 'deleting' the old ipsec SA's and moving to the new ones and
> the ASA (peer) 'deleting' the old ipsec SA's and moving to the new ones.
>
> I didn't run into 'HMAC mismatch in ESP mode' errors. It was just this
> extremely annoying delay. In other words: the issue you describe isn't
> the issue that I 'fixed'.
>
> For the r550 based version I work with ultimately I ended up doing this:
> * vpnc: fix compiler warnings
> Compiling vpnc against newer versions of glibc comes with a number
> of warnings.
> - sysdep.c
> - tunip.c
> - config.c
> - math_group.c
> - vpnc.c
> Note: Differences between GCC 4.8.4 (Ubuntu 14.04) and
> GCC 4.4.7 (SL6.7) make it necessary to exempt 'asprintf'
> occurrences in do_config_to_env() from evaluating the result
> when compiling with GCC 4.4.7. Not doing so will make GCC
> complain 'warning: dereferencing pointer does break
> strict-aliasing rules'.
> * vpnc-script: sync to vpnc-script git repo
> Upstream has not sync'd vpnc-script with git repo since Nov 2014.
> * vpnc: implement seq_id checking
> Vpnc currently lacks support for seq-id checking.
> Proposal made available to upstream on Jan 27 2015 by:
> David Woodhouse <dwmw2 at XXXXXXX.org>
> * vpnc: skip parsing bogus lifetime payload
> Fortinet Fortigate firewalls may send an IKE_ATTRIB_LIFE_TYPE with
> a value of seconds in it, and then send an attribute specifying the
> hash algorithm.
> Proposal made available to upstream on Jun 26 2015 by:
> Jeff Layton <jlayton at XXXXXXX.net>
> * vpnc: implement phase 1 rekeying and QoS and modify IKE keepalive
> behavior
> Vpnc currently lacks support for IKE phase 1 rekeying and QoS
> handling for UDP encap. Additionally, Cisco ASA (rel 9.4) logs show
> 'IKE runt' entries for every keep-alive sent,
> Patch proposal made available to upstream on Jun 27 2015 by:
> Ralph Schmieder <ralph.schmieder at XXXXXXX.com>
> * vpnc: fix connectivity gap and ESP packet loss resulting from phase 2
> re-keying
> On Cisco ASA5510 8.2(4) and ASA5515X 9.1(4) there is a delay of
> 30-55 seconds between VPNC instantly 'deleting' the old ipsec SA's
> and moving to the new ones and the ASA (peer) 'deleting' the old
> ipsec SA's and moving to the new ones.
> Patch proposal made available to upstream on Dec 10 2015 by:
> Piet Starreveld <pstarrev at XXXXXXX.com>
> * vpnc: fix unknown vendor ids in phase 1 debug log
> This patch adds proper debugging for below vendor ids to
> vpnc.c:do_phase1_am_packet2():
> - Cisco Unity 1.0
> - Cisco Concentrator:
> - Microsoft L2TP/IPSec VPN Client:
> Patch proposal not submitted to upstream.
> * vpnc: minor tunip logging and debugging changes
> This patch removes obsolete returns from logmsg calls in tunip and
> changes LOG_DEBUG logmsg calls to be handled by DEBUG.
> Patch proposal not submitted to upstream.
>
> Based on that, things you could consider are:
> - ensure you have the latest and greatest vpnc-script from
> git://git.infradead.org/users/dwmw2/vpnc-scripts.git
> - run vpnc independent of NetworkManager (vpnc-script will mod
> /etc/resolv.conf based on the session info vpnc gets from the fw)
> - add to r550 the patch that David Woodhouse submitted
> - add to r550 the patch that Jeff Layton submitted - as opposed to just
> commenting out the assert. Jeff's patch takes care of that, btw.
> - add to r550 the patch that Ralph Schmieder submitted
> - leave my patch out - my guess is that it won't help you overcome
> the issue you ran into

huge thank you for the suggestions, will go through all this. i've tried
most of the patches mentioned above, although maybe not in a very
methodical way.

i also got hopeful after discovering
http://www.gossamer-threads.com/lists/vpnc/devel/3442 , but it looks
like those two patches are old enough to be already applied in r550.

> Please let me be clear: using vpnc YMMV, on Cisco and especially on
> Fortigate since vpnc was never developed with Fortigates in mind.

well, ipsec is ipsec (except when it's different), and the connection is
established, it works - then drops after 30 minutes. it feels close to
working properly :)
thank you again for the answer and the suggestions.

> Good luck.
>
> Kind regards,
> Piet Starreveld
--
Rihards