Mailing List Archive

Re: SCP with Resume Feature [ In reply to ]
Is it really the end, though?  Maybe we need to keep supporting the SCP
protocol for interoperation with servers and clients that don't do SFTP?



-------- Forwarded Message --------
Subject: Re: SCP with Resume Feature
Date: Tue, 6 Apr 2021 06:18:27 -0400
From: Demi Marie Obenour <demiobenour@gmail.com>
To: Damien Miller <djm@mindrot.org>
CC: openssh-unix-dev@mindrot.org



On 4/5/21 6:22 PM, Damien Miller wrote:
> On Sat, 3 Apr 2021, Demi Marie Obenour wrote:
>
>> On 4/1/21 1:50 PM, rapier wrote:
>>> Howdy all,
>>>
>>> I know development on SCP is discouraged but being that it's still
>>> in wide use I thought I would do some work some of my users have
>>> been asking for and allow SCP to resume from a partial transfer.
>>
>> Would it be possible to instead reimplement SCP in terms of SFTP, and
>> then add this feature to SFTP? My understanding is that such a
>> re-implementation is something many people have wanted for quite a while.
>
> Yes, and there are patches to do this awaiting review:
>
> https://github.com/openssh/openssh-portable/pull/194
>
> -d

The sooner those get merged, the better, IMO. I for one will celebrate the
end of the SCP protocol.

Sincerely,

Demi

_______________________________________________
openssh-unix-dev mailing list
openssh-unix-dev@mindrot.org
https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev
Re: SCP with Resume Feature [ In reply to ]
On Tue, 6 Apr 2021, rapier wrote:

> Looking at the performance - on my systems sftp seems to be a bit slower
> than scp when dealing with a lot of small files. Not sure why this is
> the case as I haven't looked at the sftp code in years.

The OpenSSH sftp client doesn't do inter-file pipelining - it only
pipelines reads/writes within a transfer, so each new file causes a
stall.

This is all completely fixable on the client side, and shouldn't apply
to things like sshfs at all.

-d
Re: SCP with Resume Feature
On 4/6/21 10:04 PM, Damien Miller wrote:
> On Tue, 6 Apr 2021, rapier wrote:
>
>> Looking at the performance - on my systems sftp seems to be a bit slower
>> than scp when dealing with a lot of small files. Not sure why this is
>> the case as I haven't looked at the sftp code in years.
>
> the OpenSSH sftp client doesn't do inter-file pipelining - it only
> pipelines read/writes within a transfer, so each new file causes a
> stall.
>
> This is all completely fixable on the client side, and shouldn't apply
> to things like sshfs at all.

Gotcha. Is this because of how it sequentially loops through the
readdirs in the two _dir_internal functions? If so, I'm wondering if you
could spawn per-file threads to get some concurrency within a directory.
Just curious; this is the first time I've looked at the sftp code in
years. I hope you don't mind the questions.

Chris
Re: SCP with Resume Feature
I think I saw a fallback in the code. I only looked at it quickly though.

On 4/6/21 8:42 PM, David Newall wrote:
> Is it really the end, though?  Maybe we need to maintain support of SCP
> protocol for interoperation with servers and clients that don't do SFTP?
Re: SCP with Resume Feature
On Tue, 6 Apr 2021, rapier wrote:

> On 4/6/21 10:04 PM, Damien Miller wrote:
> > On Tue, 6 Apr 2021, rapier wrote:
> >
> > > Looking at the performance - on my systems sftp seems to be a bit slower
> > > than scp when dealing with a lot of small files. Not sure why this is
> > > the case as I haven't looked at the sftp code in years.
> >
> > the OpenSSH sftp client doesn't do inter-file pipelining - it only
> > pipelines read/writes within a transfer, so each new file causes a
> > stall.
> >
> > This is all completely fixable on the client side, and shouldn't apply
> > to things like sshfs at all.
>
> Gotcha. Is this because of how it sequentially loops through the readdirs in
> two _dir_internal functions?

Only partly - the client will do SSH2_FXP_READDIR to get the full list of
files and then transfer each file separately. The SSH2_FXP_READDIR requests
are not pipelined at all, and there is no pipelining between obtaining the
file list and the file transfers. Finally, each file transfer incurs a
pipeline stall upon completion.

> If so I'm wondering if you could spawn per file
> threads to get some concurrency within a directory.

I don't think we want a threaded sftp client and AFAIK it isn't necessary
for the main problem. We could add inter-operation pipelining by adding a
work queue structure and driving the next operation from that. This is
similar to what happens inside do_download()/do_upload() already, but
extended to persist across and between different operations.
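
To make the work-queue idea concrete, here is a minimal sketch in Python. It
is not OpenSSH code: fake_request(), run_queue(), the file names and the
window size are all hypothetical, and the "server reply" is just an asyncio
yield. It only shows that when every request is driven from one shared
queue, requests for the next file go out while the previous file's requests
are still outstanding, so the pipeline does not drain at file boundaries.

```python
import asyncio
from collections import deque


async def fake_request(op, in_flight, stats):
    # Hypothetical stand-in for one SFTP request and its reply.
    in_flight.add(op)
    stats["max_in_flight"] = max(stats["max_in_flight"], len(in_flight))
    stats["max_files"] = max(
        stats["max_files"], len({name for name, _ in in_flight})
    )
    await asyncio.sleep(0)  # yield, as if waiting for the server's reply
    in_flight.discard(op)
    stats["completed"] += 1


async def run_queue(files, window):
    # files maps a file name to its number of chunk requests (made-up data).
    queue = deque((name, i) for name, n in files.items() for i in range(n))
    in_flight = set()
    stats = {"max_in_flight": 0, "max_files": 0, "completed": 0}

    async def worker():
        # Each worker issues the next queued request as soon as its previous
        # one completes, regardless of which file that request belongs to.
        while queue:
            await fake_request(queue.popleft(), in_flight, stats)

    # "window" workers means at most "window" requests are outstanding.
    await asyncio.gather(*(worker() for _ in range(window)))
    return stats


if __name__ == "__main__":
    print(asyncio.run(run_queue({"a": 2, "b": 2, "c": 2}, window=4)))
```

The "max_files" counter records how many different files had requests
outstanding at the same moment; a value above 1 is the inter-file overlap
that the current per-file loop does not have.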

> Just curious and this is
> the first time I've looked at the sftp code in years. I hope you don't mind
> the questions.

Not at all :)

-d
Re: SCP with Resume Feature
On Apr 6, 2021, at 10:47 PM, Damien Miller <djm@mindrot.org> wrote:
> On Tue, 6 Apr 2021, rapier wrote:
>> On 4/6/21 10:04 PM, Damien Miller wrote:
>>> On Tue, 6 Apr 2021, rapier wrote:
>>>
>>>> Looking at the performance - on my systems sftp seems to be a bit slower
>>>> than scp when dealing with a lot of small files. Not sure why this is
>>>> the case as I haven't looked at the sftp code in years.
>>>
>>> the OpenSSH sftp client doesn't do inter-file pipelining - it only
>>> pipelines read/writes within a transfer, so each new file causes a
>>> stall.
>>>
>>> This is all completely fixable on the client side, and shouldn't apply
>>> to things like sshfs at all.
>>
>> Gotcha. Is this because of how it sequentially loops through the readdirs in
>> two _dir_internal functions?
>
> Only partly - the client will do SSH2_FXP_READDIR to get the full list of
> files and then transfer each file separately. The SSH2_FXP_READDIR are not
> pipelined at all, there is no pipelining between obtaining the file list
> and the file transfers. Finally each file transfer incurs a pipeline
> stall upon completion.

The good news here is that from a protocol standpoint a server can already break up a READDIR response into multiple chunks. So, while there will still be a stall between READDIR calls on a directory with a very large number of files, a client can start to pipeline the transfers of those files or recursive READDIR calls for subdirectories without waiting for the entire listing of files in the parent directory to be returned, once there’s some mechanism in place to manage that work. To avoid overwhelming the server, you’ll probably want to put a cap on the number of simultaneous requests to any given server, but that can all be managed in the client.
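
As a rough illustration of that pattern (a sketch only, with no real SFTP involved): read_dir_chunks() below is a hypothetical stand-in for a sequence of READDIR replies, transfer() is a placeholder for a file transfer, and the chunk size and request cap are arbitrary assumptions. Transfers are started as each chunk of names arrives, and a semaphore caps how many run at once.

```python
import asyncio


async def read_dir_chunks(names, chunk_size):
    # Hypothetical stand-in for READDIR replies arriving one chunk at a time.
    for i in range(0, len(names), chunk_size):
        await asyncio.sleep(0)  # pretend to wait for the next reply
        yield names[i:i + chunk_size]


async def transfer(name, limit, log):
    # The semaphore caps the number of simultaneous transfers.
    async with limit:
        log.append(("start", name))
        await asyncio.sleep(0)  # placeholder for the actual data transfer
        log.append(("done", name))


async def copy_dir(names, chunk_size=2, max_requests=3):
    limit = asyncio.Semaphore(max_requests)
    log = []
    tasks = []
    async for chunk in read_dir_chunks(names, chunk_size):
        log.append(("listed", tuple(chunk)))
        # Start transfers for this chunk without waiting for the full listing.
        tasks.extend(asyncio.create_task(transfer(n, limit, log)) for n in chunk)
    await asyncio.gather(*tasks)
    return log


if __name__ == "__main__":
    for event in asyncio.run(copy_dir([f"f{i}" for i in range(6)])):
        print(event)
```

In the printed log, "start" events for the first files appear before the last "listed" event, which is the overlap described above.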

In AsyncSSH, I implemented a scandir() call that returns an async iterator of remote directory entries from READDIR and begins to return results even before the full list of file names in a directory has been returned. I then used that to implement an rmtree() call on the client which parallelized recursive deletion of a remote directory tree, and saw a significant speedup on trees with a large number of files/subdirectories. I haven’t yet updated my recursive file transfer client code to leverage this since there was already a good amount of parallelism on the transfers themselves, but perhaps I’ll look into doing this next. With a large number of very small files, I would expect to see some benefit from that.
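
The general shape of such a parallel recursive delete can be sketched as follows. This is not AsyncSSH's actual implementation and does not use its API: fake_scandir(), remove_file() and parallel_rmtree() are hypothetical names, and the "remote" tree is an in-memory dict in which None marks a file. Entries are processed as the iterator yields them, subdirectories are walked concurrently, and a directory is recorded as removed only after everything inside it is gone.

```python
import asyncio


async def fake_scandir(tree):
    # Hypothetical async iterator over directory entries; each step yields
    # control as if another READDIR reply had to arrive first.
    for name, child in list(tree.items()):
        await asyncio.sleep(0)
        yield name, child


async def remove_file(path, removed):
    await asyncio.sleep(0)  # placeholder for a remote remove request
    removed.append(path)


async def parallel_rmtree(tree, path, removed):
    tasks = []
    async for name, child in fake_scandir(tree):
        full = f"{path}/{name}"
        if child is None:
            # A file: start removing it right away.
            tasks.append(asyncio.create_task(remove_file(full, removed)))
        else:
            # A subdirectory: recurse concurrently with the rest of the scan.
            tasks.append(
                asyncio.create_task(parallel_rmtree(child, full, removed))
            )
    await asyncio.gather(*tasks)
    removed.append(path)  # the directory itself goes last


if __name__ == "__main__":
    tree = {"a": None, "sub": {"b": None, "c": None}}
    removed = []
    asyncio.run(parallel_rmtree(tree, "/top", removed))
    print(removed)
```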

That said, is the SCP implementation in OpenSSH currently doing any file-level parallelization? I wouldn’t expect it to, so I’m not sure that would explain the performance difference. If I had to guess, it’s more likely due to the fact that there’s a single round-trip with SCP for each file transfer, whereas SFTP involves separate requests to do an open(), read(), stat(), etc. each of which has its own round-trip. Some of those (such as the read() calls) are parallelized, but you still have to pay for the open() before beginning the reads, and possibly for other things like stat() when preserving attributes.
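
To put that guess in rough numbers, here is a back-of-the-envelope model. Every figure in it is an assumption chosen for illustration (file count, round-trip time, and three serialized round trips per file for SFTP); nothing here was measured, and the helper name total_wait_ms() is made up.

```python
def total_wait_ms(n_files, round_trips_per_file, rtt_ms):
    # Time spent purely waiting on the network when the per-file round trips
    # are serialized and nothing overlaps between files.
    return n_files * round_trips_per_file * rtt_ms


N_FILES = 10_000  # assumed number of small files
RTT_MS = 20       # assumed network round-trip time in milliseconds

# One round trip per file for SCP, as suggested above.
scp_wait = total_wait_ms(N_FILES, 1, RTT_MS)
# Three serialized round trips per file for SFTP is an assumed count,
# for example open, first read, close.
sftp_wait = total_wait_ms(N_FILES, 3, RTT_MS)

print(scp_wait / 1000, "s of waiting for SCP vs", sftp_wait / 1000, "s for SFTP")
```

Under this model the per-file overhead scales linearly with both the file count and the round-trip time, which is why it would mostly show up with many small files on higher-latency links.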
--
Ron Frederick
ronf@timeheart.net



Re: SCP with Resume Feature
On 4/7/21 10:41 AM, Ron Frederick wrote:

> That said, is the SCP implementation in OpenSSH currently doing any file-level parallelization? I wouldn’t expect it to, so I’m not sure that would explain the performance difference. If I had to guess, it’s more likely due to the fact that there’s a single round-trip with SCP for each file transfer, whereas SFTP involves separate requests to do an open(), read(), stat(), etc. each of which has its own round-trip. Some of those (such as the read() calls) are parallelized, but you still have to pay for the open() before beginning the reads, and possibly for other things like stat() when preserving attributes.
>

No parallelization at all. I've thought about it, but I'll have to come
back to it when I have time; there are other deliverables for this
project I need to focus on. As for the number of round trips (RTs) - there
are a couple of message round trips but nothing all that much. The resume
feature increases the number of RTs but it's still faster.

I absolutely agree with Damien about the pipeline stalling being the
major factor. Anyway, I've been looking at learning more about
pipelining. :)

In some cases there *might* be an issue with hitting the outstanding
message request limit but that's not what's happening here. I really do
want to take a closer look at this - especially if SCP is going to
default to the SFTP protocol soon. In the high performance computing
community we do have faster transport tools like GridFTP and Aspera but
they have some serious barriers to entry for a lot of users. SCP is
still widely used for transferring large data sets (people moving TBs of
data via SCP isn't uncommon where I work) so performance in those
environments is a concern of mine.