Mailing List Archive

[PATCH -next] RDMA/hns: Fix return in hns_roce_rereg_user_mr()
If re-registering an MR succeeds in hns_roce_rereg_user_mr(), we should
return NULL instead of passing 0 to ERR_PTR().

Fixes: 4e9fc1dae2a9 ("RDMA/hns: Optimize the MR registration process")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
drivers/infiniband/hw/hns/hns_roce_mr.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c b/drivers/infiniband/hw/hns/hns_roce_mr.c
index 006c84bb3f9f..7089ac780291 100644
--- a/drivers/infiniband/hw/hns/hns_roce_mr.c
+++ b/drivers/infiniband/hw/hns/hns_roce_mr.c
@@ -352,7 +352,9 @@ struct ib_mr *hns_roce_rereg_user_mr(struct ib_mr *ibmr, int flags, u64 start,
free_cmd_mbox:
hns_roce_free_cmd_mailbox(hr_dev, mailbox);

- return ERR_PTR(ret);
+ if (ret)
+ return ERR_PTR(ret);
+ return NULL;
}

int hns_roce_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
--
2.17.1
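
For context, here is a minimal userspace sketch of the pointer-encoding helpers the patch relies on. The definitions are simplified stand-ins for the kernel's ERR_PTR(), PTR_ERR() and IS_ERR() (assumed here, not copied from the tree), and rereg_result() and check_zero_is_null() are hypothetical helpers that mirror only the tail of the patched function. Under these assumptions ERR_PTR(0) already evaluates to a null pointer, so the patch appears to make the success return explicit rather than change what the caller observes.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel helpers (assumption). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	/* Encode a negative errno in the pointer value itself. */
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO addresses. */
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical helper: the return logic added by the patch. */
static void *rereg_result(int ret)
{
	if (ret)
		return ERR_PTR(ret);
	return NULL;
}

/* Hypothetical self-check: a zero "error" is indistinguishable from NULL. */
static void check_zero_is_null(void)
{
	assert(ERR_PTR(0) == NULL);
	assert(!IS_ERR(NULL));
}
```

Calling rereg_result(0) yields NULL, while rereg_result(-12) yields a pointer for which IS_ERR() is true and PTR_ERR() recovers -12.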
Re: [PATCH -next] RDMA/hns: Fix return in hns_roce_rereg_user_mr() [ In reply to ]
On Wed, Aug 04, 2021 at 08:59:39PM +0800, YueHaibing wrote:
> If re-registering an MR succeeds in hns_roce_rereg_user_mr(), we should
> return NULL instead of passing 0 to ERR_PTR().
>
> Fixes: 4e9fc1dae2a9 ("RDMA/hns: Optimize the MR registration process")
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
> ---
> drivers/infiniband/hw/hns/hns_roce_mr.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c b/drivers/infiniband/hw/hns/hns_roce_mr.c
> index 006c84bb3f9f..7089ac780291 100644
> --- a/drivers/infiniband/hw/hns/hns_roce_mr.c
> +++ b/drivers/infiniband/hw/hns/hns_roce_mr.c
> @@ -352,7 +352,9 @@ struct ib_mr *hns_roce_rereg_user_mr(struct ib_mr *ibmr, int flags, u64 start,
> free_cmd_mbox:
> hns_roce_free_cmd_mailbox(hr_dev, mailbox);
>
> - return ERR_PTR(ret);
> + if (ret)
> + return ERR_PTR(ret);
> + return NULL;
> }

I don't understand this function: it returns either ERR_PTR() or NULL, but
it should return &mr->ibmr in the success path. How does it work?

Thanks

Re: [PATCH -next] RDMA/hns: Fix return in hns_roce_rereg_user_mr() [ In reply to ]
On 2021/8/4 21:53, Leon Romanovsky wrote:
> I don't understand this function: it returns either ERR_PTR() or NULL, but
> it should return &mr->ibmr in the success path. How does it work?

Did you mean hns_roce_reg_user_mr()?

hns_roce_rereg_user_mr() returns ERR_PTR() on failure and NULL on success.

In ib_uverbs_rereg_mr(), the old MR is kept if rereg_user_mr() returns NULL, see:

	new_mr = ib_dev->ops.rereg_user_mr(mr, cmd.flags, cmd.start, cmd.length,
					   cmd.hca_va, cmd.access_flags, new_pd,
					   &attrs->driver_udata);
	if (IS_ERR(new_mr)) {
		ret = PTR_ERR(new_mr);
		goto put_new_uobj;
	}
	if (new_mr) {
		.....
		mr = new_mr;
	} else {
		if (cmd.flags & IB_MR_REREG_PD) {
			atomic_dec(&orig_pd->usecnt);
			mr->pd = new_pd;
			atomic_inc(&new_pd->usecnt);
		}
		if (cmd.flags & IB_MR_REREG_TRANS)
			mr->iova = cmd.hca_va;
	}
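
The three-way handling in the excerpt above can be sketched in userspace as follows. Everything here is a hypothetical stand-in: struct mock_mr is a heavily reduced substitute for struct ib_mr, the REREG_* values are arbitrary illustration flags rather than the real IB_MR_REREG_* constants, and finish_rereg() is not a kernel function. Reference counting and the elided part of the new-MR branch are left out.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ERRNO 4095
/* Simplified stand-in for the kernel's IS_ERR() (assumption). */
#define IS_ERR(p) ((unsigned long)(p) >= (unsigned long)-MAX_ERRNO)

/* Hypothetical, reduced stand-in for struct ib_mr. */
struct mock_mr {
	int pd;
	unsigned long iova;
};

/* Arbitrary illustration flags, not the real IB_MR_REREG_* values. */
enum {
	REREG_PD    = 0x1,
	REREG_TRANS = 0x2,
};

/*
 * Interpret the driver's return value:
 *   error pointer -> fail, leave *mr alone
 *   non-NULL      -> driver built a replacement MR, switch to it
 *   NULL          -> driver updated the old MR in place, fix up its fields
 * Returns 0 on success or a negative errno.
 */
static int finish_rereg(struct mock_mr **mr, struct mock_mr *new_mr,
			int flags, int new_pd, unsigned long hca_va)
{
	assert(mr && *mr);

	if (IS_ERR(new_mr))
		return (int)(long)new_mr;	/* what PTR_ERR() would return */

	if (new_mr) {
		*mr = new_mr;
		return 0;
	}

	if (flags & REREG_PD)
		(*mr)->pd = new_pd;
	if (flags & REREG_TRANS)
		(*mr)->iova = hca_va;
	return 0;
}
```

In this sketch a driver that always modifies the MR in place only ever needs the first and last cases, which matches the "ERR_PTR() on failure, NULL on success" behaviour described for hns.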


Re: [PATCH -next] RDMA/hns: Fix return in hns_roce_rereg_user_mr() [ In reply to ]
On Thu, Aug 05, 2021 at 10:36:03AM +0800, YueHaibing wrote:
> Did you mean hns_roce_reg_user_mr()?
>
> hns_roce_rereg_user_mr() returns ERR_PTR() on failure and NULL on success.
>
> In ib_uverbs_rereg_mr(), the old MR is kept if rereg_user_mr() returns NULL, see:
>
> 	new_mr = ib_dev->ops.rereg_user_mr(mr, cmd.flags, cmd.start, cmd.length,
> 					   cmd.hca_va, cmd.access_flags, new_pd,
> 					   &attrs->driver_udata);
> 	if (IS_ERR(new_mr)) {
> 		ret = PTR_ERR(new_mr);
> 		goto put_new_uobj;
> 	}
> 	if (new_mr) {
> 		.....
> 		mr = new_mr;
> 	} else {
> 		if (cmd.flags & IB_MR_REREG_PD) {
> 			atomic_dec(&orig_pd->usecnt);
> 			mr->pd = new_pd;
> 			atomic_inc(&new_pd->usecnt);
> 		}
> 		if (cmd.flags & IB_MR_REREG_TRANS)
> 			mr->iova = cmd.hca_va;
> 	}

You overwrite various fields in the old MR when executing hns_roce_rereg_user_mr().
For example, the mr->access flags are not returned to their original
state after a failure.

Also, I'm not sure it is valid to return NULL in all flows.

Thanks

Re: [PATCH -next] RDMA/hns: Fix return in hns_roce_rereg_user_mr() [ In reply to ]
On 2021/8/5 11:40, Leon Romanovsky wrote:
> You overwrite various fields in the old MR when executing hns_roce_rereg_user_mr().
> For example, the mr->access flags are not returned to their original
> state after a failure.

IMO, if ibv_rereg_mr fails, the MR is in an undefined state and the user needs to call
ibv_dereg_mr to release it, so there is no need to recover the original state.

Also, mlx4_ib_rereg_user_mr seems to do the same thing.

Re: [PATCH -next] RDMA/hns: Fix return in hns_roce_rereg_user_mr() [ In reply to ]
On Thu, Aug 05, 2021 at 05:29:25PM +0800, YueHaibing wrote:
> IMO, if ibv_rereg_mr fails, the MR is in an undefined state and the user needs to call
> ibv_dereg_mr to release it, so there is no need to recover the original state.

The thing is that it is an undefined state in the kernel.
What happens if the user changes access_flags and tries to use that
"broken" MR anyway? Will you catch it?

>
> Also, mlx4_ib_rereg_user_mr seems to do the same thing.

mlx4 does many crazy things.

Re: [PATCH -next] RDMA/hns: Fix return in hns_roce_rereg_user_mr() [ In reply to ]
On Thu, Aug 05, 2021 at 01:58:53PM +0300, Leon Romanovsky wrote:

> > IMO, if ibv_rereg_mr fails, the MR is in an undefined state and the
> > user needs to call ibv_dereg_mr to release it, so there is no
> > need to recover the original state.
>
> The thing is that it is an undefined state in the kernel. What happens
> if the user changes access_flags and tries to use that "broken" MR
> anyway? Will you catch it?

rereg is not atomic; if the rereg fails in the middle, the MR should be
left in some safe state.

Jason
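
One possible reading of "some safe state" is to snapshot the software fields before modifying them and to restore the snapshot if the hardware step fails. The sketch below shows that pattern in userspace only. struct mock_mr, rereg_with_rollback() and the two commit callbacks are hypothetical names invented for illustration; none of this is the hns driver's code, and a real driver may instead choose to leave the MR disabled rather than restored.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, reduced MR state; not the real struct hns_roce_mr. */
struct mock_mr {
	int access;
	int pd;
};

/* Hypothetical hardware step that may fail. */
typedef int (*hw_commit_fn)(const struct mock_mr *mr);

static int commit_ok(const struct mock_mr *mr)
{
	(void)mr;
	return 0;
}

static int commit_fail(const struct mock_mr *mr)
{
	(void)mr;
	return -EIO;
}

/*
 * Apply new attributes, then commit. On failure the software state is
 * rolled back so the MR never keeps half-applied values.
 */
static int rereg_with_rollback(struct mock_mr *mr, int new_access, int new_pd,
			       hw_commit_fn commit)
{
	struct mock_mr saved;
	int ret;

	assert(mr && commit);
	saved = *mr;		/* snapshot before touching anything */

	mr->access = new_access;
	mr->pd = new_pd;

	ret = commit(mr);
	if (ret) {
		*mr = saved;	/* restore the original fields */
		return ret;
	}
	return 0;
}
```

With commit_fail the call returns -EIO and the MR keeps its old access and pd values; with commit_ok the new values stay in place.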
Re: [PATCH -next] RDMA/hns: Fix return in hns_roce_rereg_user_mr() [ In reply to ]
On Thu, Aug 05, 2021 at 09:23:11AM -0300, Jason Gunthorpe wrote:
> On Thu, Aug 05, 2021 at 01:58:53PM +0300, Leon Romanovsky wrote:
>
> > > IMO, if ibv_rereg_mr fails, the MR is in an undefined state and the
> > > user needs to call ibv_dereg_mr to release it, so there is no
> > > need to recover the original state.
> >
> > The thing is that it is an undefined state in the kernel. What happens
> > if the user changes access_flags and tries to use that "broken" MR
> > anyway? Will you catch it?
>
> rereg is not atomic; if the rereg fails in the middle, the MR should be
> left in some safe state.

That is not the case in the hns flow; it leaves such an MR in a limbo state.

Re: [PATCH -next] RDMA/hns: Fix return in hns_roce_rereg_user_mr() [ In reply to ]
On Wed, Aug 04, 2021 at 08:59:39PM +0800, YueHaibing wrote:
> If re-registering an MR succeeds in hns_roce_rereg_user_mr(), we should
> return NULL instead of passing 0 to ERR_PTR().
>
> Fixes: 4e9fc1dae2a9 ("RDMA/hns: Optimize the MR registration process")
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
> drivers/infiniband/hw/hns/hns_roce_mr.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)

Applied to for-next, though hns should be checked to ensure MRs are
not left in some broken state after rereg failure.

Thanks,
Jason