Mailing List Archive

net/sctp: recursive locking in sctp_do_peeloff

Hello,

I've got the following recursive locking report while running
syzkaller fuzzer on net-next/9c28286b1b4b9bce6e35dd4c8a1265f03802a89a:

[ INFO: possible recursive locking detected ]
4.10.0+ #14 Not tainted
---------------------------------------------
syz-executor3/5560 is trying to acquire lock:
(sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff8401ebcd>] lock_sock
include/net/sock.h:1460 [inline]
(sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff8401ebcd>]
sctp_close+0xcd/0x9d0 net/sctp/socket.c:1497

but task is already holding lock:
(sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff84038110>] lock_sock
include/net/sock.h:1460 [inline]
(sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff84038110>]
sctp_getsockopt+0x450/0x67e0 net/sctp/socket.c:6611

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(sk_lock-AF_INET6);
lock(sk_lock-AF_INET6);

*** DEADLOCK ***

May be due to missing lock nesting notation

1 lock held by syz-executor3/5560:
#0: (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff84038110>] lock_sock
include/net/sock.h:1460 [inline]
#0: (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff84038110>]
sctp_getsockopt+0x450/0x67e0 net/sctp/socket.c:6611

stack backtrace:
CPU: 0 PID: 5560 Comm: syz-executor3 Not tainted 4.10.0+ #14
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x2ee/0x3ef lib/dump_stack.c:52
print_deadlock_bug kernel/locking/lockdep.c:1729 [inline]
check_deadlock kernel/locking/lockdep.c:1773 [inline]
validate_chain kernel/locking/lockdep.c:2251 [inline]
__lock_acquire+0xef2/0x3430 kernel/locking/lockdep.c:3340
lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
lock_sock_nested+0xcb/0x120 net/core/sock.c:2536
lock_sock include/net/sock.h:1460 [inline]
sctp_close+0xcd/0x9d0 net/sctp/socket.c:1497
inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425
inet6_release+0x50/0x70 net/ipv6/af_inet6.c:432
sock_release+0x8d/0x1e0 net/socket.c:597
__sock_create+0x38b/0x870 net/socket.c:1226
sock_create+0x7f/0xa0 net/socket.c:1237
sctp_do_peeloff+0x1a2/0x440 net/sctp/socket.c:4879
sctp_getsockopt_peeloff net/sctp/socket.c:4914 [inline]
sctp_getsockopt+0x111a/0x67e0 net/sctp/socket.c:6628
sock_common_getsockopt+0x95/0xd0 net/core/sock.c:2690
SYSC_getsockopt net/socket.c:1817 [inline]
SyS_getsockopt+0x240/0x380 net/socket.c:1799
entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x44fb79
RSP: 002b:00007f35f232bb58 EFLAGS: 00000212 ORIG_RAX: 0000000000000037
RAX: ffffffffffffffda RBX: 0000000000000084 RCX: 000000000044fb79
RDX: 0000000000000066 RSI: 0000000000000084 RDI: 0000000000000006
RBP: 0000000000000006 R08: 0000000020119000 R09: 0000000000000000
R10: 000000002058dff8 R11: 0000000000000212 R12: 0000000000708000
R13: 0000000000000103 R14: 0000000000000001 R15: 0000000000000000
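
For reference, the getsockopt being issued here is the SCTP peeloff one:
RSI=0x84 is level 132 (IPPROTO_SCTP/SOL_SCTP) and RDX=0x66 is optname 102
(SCTP_SOCKOPT_PEELOFF). From user space the call has roughly this shape
(illustrative sketch only; the fd and association id are placeholders, and
the in-kernel sock_create() failure cannot be forced directly from user
space):

#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>         /* IPPROTO_SCTP */

/* Local mirror of the uapi definitions from include/uapi/linux/sctp.h,
 * declared here only to keep the sketch self-contained.
 */
typedef int sctp_assoc_t;
typedef struct {
        sctp_assoc_t associd;
        int sd;
} sctp_peeloff_arg_t;
#define SCTP_SOCKOPT_PEELOFF    102

/* Peel one association off an SCTP socket; returns the new fd or -1. */
static int do_peeloff(int sd, sctp_assoc_t assoc_id)
{
        sctp_peeloff_arg_t peeloff;
        socklen_t len = sizeof(peeloff);

        memset(&peeloff, 0, sizeof(peeloff));
        peeloff.associd = assoc_id;

        if (getsockopt(sd, IPPROTO_SCTP, SCTP_SOCKOPT_PEELOFF,
                       &peeloff, &len) < 0)
                return -1;

        return peeloff.sd;
}
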
Re: net/sctp: recursive locking in sctp_do_peeloff
On Fri, Mar 10, 2017 at 4:11 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> Hello,
>
> I've got the following recursive locking report while running
> syzkaller fuzzer on net-next/9c28286b1b4b9bce6e35dd4c8a1265f03802a89a:
>
> [ INFO: possible recursive locking detected ]
> 4.10.0+ #14 Not tainted
> ---------------------------------------------
> syz-executor3/5560 is trying to acquire lock:
> (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff8401ebcd>] lock_sock
> include/net/sock.h:1460 [inline]
> (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff8401ebcd>]
> sctp_close+0xcd/0x9d0 net/sctp/socket.c:1497
>
> but task is already holding lock:
> (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff84038110>] lock_sock
> include/net/sock.h:1460 [inline]
> (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff84038110>]
> sctp_getsockopt+0x450/0x67e0 net/sctp/socket.c:6611
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock(sk_lock-AF_INET6);
> lock(sk_lock-AF_INET6);
>
> *** DEADLOCK ***
>
> May be due to missing lock nesting notation

Pretty much the case, I suppose. The lock held by sctp_getsockopt() is
on one socket, while the other lock that sctp_close() is getting later
is on the newly created (which failed) socket during peeloff
operation.

I don't know how to fix this nesting notation in this situation, but
any idea why sock_create failed? Seems security_socket_post_create()
failed in there, so sock_release was called with sock->ops still
valid.
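
Spelling out the path from the backtrace (a simplified sketch, reading it
bottom-up; not the exact code):

  sctp_getsockopt(sk, ...)            /* lock_sock(sk): first sk_lock-AF_INET6 */
    sctp_getsockopt_peeloff(sk, ...)
      sctp_do_peeloff(sk, ...)
        sock_create(...)              /* allocates the new socket, then fails */
          sock_release(newsock)
            inet6_release() -> inet_release()
              sctp_close(newsk)       /* lock_sock(newsk): second sk_lock-AF_INET6 */

Both acquisitions are in the same lockdep class (sk_lock-AF_INET6) but on
two different sockets, so lockdep flags recursion even though no real
deadlock is possible here.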

>
> 1 lock held by syz-executor3/5560:
> #0: (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff84038110>] lock_sock
> include/net/sock.h:1460 [inline]
> #0: (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff84038110>]
> sctp_getsockopt+0x450/0x67e0 net/sctp/socket.c:6611
>
> stack backtrace:
> CPU: 0 PID: 5560 Comm: syz-executor3 Not tainted 4.10.0+ #14
> Hardware name: Google Google Compute Engine/Google Compute Engine,
> BIOS Google 01/01/2011
> Call Trace:
> __dump_stack lib/dump_stack.c:16 [inline]
> dump_stack+0x2ee/0x3ef lib/dump_stack.c:52
> print_deadlock_bug kernel/locking/lockdep.c:1729 [inline]
> check_deadlock kernel/locking/lockdep.c:1773 [inline]
> validate_chain kernel/locking/lockdep.c:2251 [inline]
> __lock_acquire+0xef2/0x3430 kernel/locking/lockdep.c:3340
> lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
> lock_sock_nested+0xcb/0x120 net/core/sock.c:2536
> lock_sock include/net/sock.h:1460 [inline]
> sctp_close+0xcd/0x9d0 net/sctp/socket.c:1497
> inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425
> inet6_release+0x50/0x70 net/ipv6/af_inet6.c:432
> sock_release+0x8d/0x1e0 net/socket.c:597
> __sock_create+0x38b/0x870 net/socket.c:1226
> sock_create+0x7f/0xa0 net/socket.c:1237
> sctp_do_peeloff+0x1a2/0x440 net/sctp/socket.c:4879
> sctp_getsockopt_peeloff net/sctp/socket.c:4914 [inline]
> sctp_getsockopt+0x111a/0x67e0 net/sctp/socket.c:6628
> sock_common_getsockopt+0x95/0xd0 net/core/sock.c:2690
> SYSC_getsockopt net/socket.c:1817 [inline]
> SyS_getsockopt+0x240/0x380 net/socket.c:1799
> entry_SYSCALL_64_fastpath+0x1f/0xc2
> RIP: 0033:0x44fb79
> RSP: 002b:00007f35f232bb58 EFLAGS: 00000212 ORIG_RAX: 0000000000000037
> RAX: ffffffffffffffda RBX: 0000000000000084 RCX: 000000000044fb79
> RDX: 0000000000000066 RSI: 0000000000000084 RDI: 0000000000000006
> RBP: 0000000000000006 R08: 0000000020119000 R09: 0000000000000000
> R10: 000000002058dff8 R11: 0000000000000212 R12: 0000000000708000
> R13: 0000000000000103 R14: 0000000000000001 R15: 0000000000000000
Re: net/sctp: recursive locking in sctp_do_peeloff
On Fri, Mar 10, 2017 at 8:46 PM, Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
> On Fri, Mar 10, 2017 at 4:11 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
>> Hello,
>>
>> I've got the following recursive locking report while running
>> syzkaller fuzzer on net-next/9c28286b1b4b9bce6e35dd4c8a1265f03802a89a:
>>
>> [ INFO: possible recursive locking detected ]
>> 4.10.0+ #14 Not tainted
>> ---------------------------------------------
>> syz-executor3/5560 is trying to acquire lock:
>> (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff8401ebcd>] lock_sock
>> include/net/sock.h:1460 [inline]
>> (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff8401ebcd>]
>> sctp_close+0xcd/0x9d0 net/sctp/socket.c:1497
>>
>> but task is already holding lock:
>> (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff84038110>] lock_sock
>> include/net/sock.h:1460 [inline]
>> (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff84038110>]
>> sctp_getsockopt+0x450/0x67e0 net/sctp/socket.c:6611
>>
>> other info that might help us debug this:
>> Possible unsafe locking scenario:
>>
>> CPU0
>> ----
>> lock(sk_lock-AF_INET6);
>> lock(sk_lock-AF_INET6);
>>
>> *** DEADLOCK ***
>>
>> May be due to missing lock nesting notation
>
> Pretty much the case, I suppose. The lock held by sctp_getsockopt() is
> on one socket, while the other lock that sctp_close() is getting later
> is on the newly created (which failed) socket during peeloff
> operation.


Does this mean that we should never lock 2 sockets at a time, except
for this case? If so, it probably suggests that this case should not
do it either.


> I don't know how to fix this nesting notation in this situation, but
> any idea why sock_create failed? Seems security_socket_post_create()
> failed in there, so sock_release was called with sock->ops still
> valid.

No idea. The fuzzer frequently creates low-memory conditions, but
there are no allocation failure messages in the log (maybe some
allocation used __GFP_NOWARN?).
Re: net/sctp: recursive locking in sctp_do_peeloff
On Fri, Mar 10, 2017 at 12:04 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Fri, Mar 10, 2017 at 8:46 PM, Marcelo Ricardo Leitner
> <marcelo.leitner@gmail.com> wrote:
>> On Fri, Mar 10, 2017 at 4:11 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
>>> Hello,
>>>
>>> I've got the following recursive locking report while running
>>> syzkaller fuzzer on net-next/9c28286b1b4b9bce6e35dd4c8a1265f03802a89a:
>>>
>>> [ INFO: possible recursive locking detected ]
>>> 4.10.0+ #14 Not tainted
>>> ---------------------------------------------
>>> syz-executor3/5560 is trying to acquire lock:
>>> (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff8401ebcd>] lock_sock
>>> include/net/sock.h:1460 [inline]
>>> (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff8401ebcd>]
>>> sctp_close+0xcd/0x9d0 net/sctp/socket.c:1497
>>>
>>> but task is already holding lock:
>>> (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff84038110>] lock_sock
>>> include/net/sock.h:1460 [inline]
>>> (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff84038110>]
>>> sctp_getsockopt+0x450/0x67e0 net/sctp/socket.c:6611
>>>
>>> other info that might help us debug this:
>>> Possible unsafe locking scenario:
>>>
>>> CPU0
>>> ----
>>> lock(sk_lock-AF_INET6);
>>> lock(sk_lock-AF_INET6);
>>>
>>> *** DEADLOCK ***
>>>
>>> May be due to missing lock nesting notation
>>
>> Pretty much the case, I suppose. The lock held by sctp_getsockopt() is
>> on one socket, while the other lock that sctp_close() is getting later
>> is on the newly created (which failed) socket during peeloff
>> operation.
>
>
> Does this mean that we should never lock 2 sockets at a time, except
> for this case? If so, it probably suggests that this case should not
> do it either.
>

Yeah, actually for the error path we don't even need to lock sock
since it is newly allocated and no one else could see it yet.

Instead of checking for the status of the sock, I believe the following
one-line fix should do the trick too. Can you give it a try?

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 0f378ea..4de62d4 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1494,7 +1494,7 @@ static void sctp_close(struct sock *sk, long timeout)

pr_debug("%s: sk:%p, timeout:%ld\n", __func__, sk, timeout);

- lock_sock(sk);
+ lock_sock_nested(sk, SINGLE_DEPTH_NESTING);
sk->sk_shutdown = SHUTDOWN_MASK;
sk->sk_state = SCTP_SS_CLOSING;
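
For reference, SINGLE_DEPTH_NESTING is 1 (include/linux/lockdep.h), and a
non-zero subclass passed to lock_sock_nested() tells lockdep that this is a
deliberate nested acquisition of the same lock class rather than recursion.
A minimal illustration of what the annotation expresses (illustrative sketch
only, not actual kernel code):

/* The parent sock keeps the default subclass (0); the child socket
 * created during peeloff is locked with subclass 1, so lockdep treats
 * the two acquisitions as distinct even though they share the
 * sk_lock-AF_INET6 class.
 */
static void close_child_while_parent_locked(struct sock *parent,
                                            struct sock *child)
{
        lock_sock(parent);                              /* subclass 0 */
        lock_sock_nested(child, SINGLE_DEPTH_NESTING);  /* subclass 1 */
        /* ... tear down child ... */
        release_sock(child);
        release_sock(parent);
}
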
Re: net/sctp: recursive locking in sctp_do_peeloff
On Wed, Mar 15, 2017 at 5:52 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Fri, Mar 10, 2017 at 12:04 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
>> On Fri, Mar 10, 2017 at 8:46 PM, Marcelo Ricardo Leitner
>> <marcelo.leitner@gmail.com> wrote:
>>> On Fri, Mar 10, 2017 at 4:11 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
>>>> Hello,
>>>>
>>>> I've got the following recursive locking report while running
>>>> syzkaller fuzzer on net-next/9c28286b1b4b9bce6e35dd4c8a1265f03802a89a:
>>>>
>>>> [ INFO: possible recursive locking detected ]
>>>> 4.10.0+ #14 Not tainted
>>>> ---------------------------------------------
>>>> syz-executor3/5560 is trying to acquire lock:
>>>> (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff8401ebcd>] lock_sock
>>>> include/net/sock.h:1460 [inline]
>>>> (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff8401ebcd>]
>>>> sctp_close+0xcd/0x9d0 net/sctp/socket.c:1497
>>>>
>>>> but task is already holding lock:
>>>> (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff84038110>] lock_sock
>>>> include/net/sock.h:1460 [inline]
>>>> (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff84038110>]
>>>> sctp_getsockopt+0x450/0x67e0 net/sctp/socket.c:6611
>>>>
>>>> other info that might help us debug this:
>>>> Possible unsafe locking scenario:
>>>>
>>>> CPU0
>>>> ----
>>>> lock(sk_lock-AF_INET6);
>>>> lock(sk_lock-AF_INET6);
>>>>
>>>> *** DEADLOCK ***
>>>>
>>>> May be due to missing lock nesting notation
>>>
>>> Pretty much the case, I suppose. The lock held by sctp_getsockopt() is
>>> on one socket, while the other lock that sctp_close() is getting later
>>> is on the newly created (which failed) socket during peeloff
>>> operation.
>>
>>
>> Does this mean that we should never lock 2 sockets at a time, except
>> for this case? If so, it probably suggests that this case should not
>> do it either.
>>
>
> Yeah, actually for the error path we don't even need to lock sock
> since it is newly allocated and no one else could see it yet.
>
> Instead of checking for the status of the sock, I believe the following
> one-line fix should do the trick too. Can you give it a try?
>
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 0f378ea..4de62d4 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -1494,7 +1494,7 @@ static void sctp_close(struct sock *sk, long timeout)
>
> pr_debug("%s: sk:%p, timeout:%ld\n", __func__, sk, timeout);
>
> - lock_sock(sk);
> + lock_sock_nested(sk, SINGLE_DEPTH_NESTING);
> sk->sk_shutdown = SHUTDOWN_MASK;
> sk->sk_state = SCTP_SS_CLOSING;


Hi Cong,

I've applied the patch on the bots. But so far this report has happened
only once, so I am not sure I will be able to draw any conclusion.
Re: net/sctp: recursive locking in sctp_do_peeloff
On Tue, Mar 14, 2017 at 09:52:15PM -0700, Cong Wang wrote:
> On Fri, Mar 10, 2017 at 12:04 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> > On Fri, Mar 10, 2017 at 8:46 PM, Marcelo Ricardo Leitner
> > <marcelo.leitner@gmail.com> wrote:
> >> On Fri, Mar 10, 2017 at 4:11 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> >>> Hello,
> >>>
> >>> I've got the following recursive locking report while running
> >>> syzkaller fuzzer on net-next/9c28286b1b4b9bce6e35dd4c8a1265f03802a89a:
> >>>
> >>> [ INFO: possible recursive locking detected ]
> >>> 4.10.0+ #14 Not tainted
> >>> ---------------------------------------------
> >>> syz-executor3/5560 is trying to acquire lock:
> >>> (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff8401ebcd>] lock_sock
> >>> include/net/sock.h:1460 [inline]
> >>> (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff8401ebcd>]
> >>> sctp_close+0xcd/0x9d0 net/sctp/socket.c:1497
> >>>
> >>> but task is already holding lock:
> >>> (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff84038110>] lock_sock
> >>> include/net/sock.h:1460 [inline]
> >>> (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff84038110>]
> >>> sctp_getsockopt+0x450/0x67e0 net/sctp/socket.c:6611
> >>>
> >>> other info that might help us debug this:
> >>> Possible unsafe locking scenario:
> >>>
> >>> CPU0
> >>> ----
> >>> lock(sk_lock-AF_INET6);
> >>> lock(sk_lock-AF_INET6);
> >>>
> >>> *** DEADLOCK ***
> >>>
> >>> May be due to missing lock nesting notation
> >>
> >> Pretty much the case, I suppose. The lock held by sctp_getsockopt() is
> >> on one socket, while the other lock that sctp_close() is getting later
> >> is on the newly created (which failed) socket during peeloff
> >> operation.
> >
> >
> > Does this mean that we should never lock 2 sockets at a time, except
> > for this case? If so, it probably suggests that this case should not
> > do it either.
> >
>
> Yeah, actually for the error path we don't even need to lock sock
> since it is newly allocated and no one else could see it yet.
>

Agreed.

> Instead of checking for the status of the sock, I believe the following
> one-line fix should do the trick too. Can you give it a try?
>
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 0f378ea..4de62d4 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -1494,7 +1494,7 @@ static void sctp_close(struct sock *sk, long timeout)
>
> pr_debug("%s: sk:%p, timeout:%ld\n", __func__, sk, timeout);
>
> - lock_sock(sk);
> + lock_sock_nested(sk, SINGLE_DEPTH_NESTING);
> sk->sk_shutdown = SHUTDOWN_MASK;
> sk->sk_state = SCTP_SS_CLOSING;

I refrained from doing this just because it will change the lock annotation
for the first level too, as sctp_close() can be called directly, and that
might mask some other lockdep detections.

Then you probably also need:
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 465a9c8464f9..02506b4406d2 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1543,7 +1543,7 @@ static void sctp_close(struct sock *sk, long timeout)
* held and that should be grabbed before socket lock.
*/
spin_lock_bh(&net->sctp.addr_wq_lock);
- bh_lock_sock(sk);
+ bh_lock_sock_nested(sk);

/* Hold the sock, since sk_common_release() will put sock_put()
* and we have just a little more cleanup.

because sctp_close will re-lock the socket a little later (for backlog
processing).
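
For reference, the two helpers differ only in the lockdep annotation on the
underlying spinlock (paraphrased from include/net/sock.h, not quoted
exactly):

#define bh_lock_sock(__sk)        spin_lock(&((__sk)->sk_lock.slock))
#define bh_lock_sock_nested(__sk) spin_lock_nested(&((__sk)->sk_lock.slock), \
                                                   SINGLE_DEPTH_NESTING)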

Marcelo
Re: net/sctp: recursive locking in sctp_do_peeloff
On Wed, Mar 15, 2017 at 5:52 AM, Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
> On Tue, Mar 14, 2017 at 09:52:15PM -0700, Cong Wang wrote:
>> Instead of checking for the status of the sock, I believe the following
>> one-line fix should do the trick too. Can you give it a try?
>>
>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>> index 0f378ea..4de62d4 100644
>> --- a/net/sctp/socket.c
>> +++ b/net/sctp/socket.c
>> @@ -1494,7 +1494,7 @@ static void sctp_close(struct sock *sk, long timeout)
>>
>> pr_debug("%s: sk:%p, timeout:%ld\n", __func__, sk, timeout);
>>
>> - lock_sock(sk);
>> + lock_sock_nested(sk, SINGLE_DEPTH_NESTING);
>> sk->sk_shutdown = SHUTDOWN_MASK;
>> sk->sk_state = SCTP_SS_CLOSING;
>
> I refrained from doing this just because it will change the lock annotation
> for the first level too, as sctp_close() can be called directly, and
> that might mask some other lockdep detections.

I knew, but for the first level it is fine to use a different class,
it is merely to make lockdep happy. There is no real deadlock here
since they are two different socks anyway.

>
> Then you probably also need:
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 465a9c8464f9..02506b4406d2 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -1543,7 +1543,7 @@ static void sctp_close(struct sock *sk, long timeout)
> * held and that should be grabbed before socket lock.
> */
> spin_lock_bh(&net->sctp.addr_wq_lock);
> - bh_lock_sock(sk);
> + bh_lock_sock_nested(sk);
>
> /* Hold the sock, since sk_common_release() will put sock_put()
> * and we have just a little more cleanup.
>
> because sctp_close will re-lock the socket a little later (for backlog
> processing).
>

Ah, of course I missed the re-lock. Dmitry, please add this piece too.

Thanks.
Re: net/sctp: recursive locking in sctp_do_peeloff
On Wed, Mar 15, 2017 at 7:19 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Wed, Mar 15, 2017 at 5:52 AM, Marcelo Ricardo Leitner
> <marcelo.leitner@gmail.com> wrote:
>> On Tue, Mar 14, 2017 at 09:52:15PM -0700, Cong Wang wrote:
>>> Instead of checking for the status of the sock, I believe the following
>>> one-line fix should do the trick too. Can you give it a try?
>>>
>>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>>> index 0f378ea..4de62d4 100644
>>> --- a/net/sctp/socket.c
>>> +++ b/net/sctp/socket.c
>>> @@ -1494,7 +1494,7 @@ static void sctp_close(struct sock *sk, long timeout)
>>>
>>> pr_debug("%s: sk:%p, timeout:%ld\n", __func__, sk, timeout);
>>>
>>> - lock_sock(sk);
>>> + lock_sock_nested(sk, SINGLE_DEPTH_NESTING);
>>> sk->sk_shutdown = SHUTDOWN_MASK;
>>> sk->sk_state = SCTP_SS_CLOSING;
>>
>> I refrained from doing this just because it will change the lock annotation
>> for the first level too, as sctp_close() can be called directly, and
>> that might mask some other lockdep detections.
>
> I knew, but for the first level it is fine to use a different class,
> it is merely to make lockdep happy. There is no real deadlock here
> since they are two different socks anyway.
>
>>
>> Then you probably also need:
>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>> index 465a9c8464f9..02506b4406d2 100644
>> --- a/net/sctp/socket.c
>> +++ b/net/sctp/socket.c
>> @@ -1543,7 +1543,7 @@ static void sctp_close(struct sock *sk, long timeout)
>> * held and that should be grabbed before socket lock.
>> */
>> spin_lock_bh(&net->sctp.addr_wq_lock);
>> - bh_lock_sock(sk);
>> + bh_lock_sock_nested(sk);
>>
>> /* Hold the sock, since sk_common_release() will put sock_put()
>> * and we have just a little more cleanup.
>>
>> because sctp_close will re-lock the socket a little later (for backlog
>> processing).
>>
>
> Ah, of course I missed the re-lock. Dmitry, please add this piece too.


applied