Mailing List Archive

RT/Linux SCHED_RR/_FIXED to combat latency?
Good morning!

We're experiencing rather very bad latency spikes on busy Linux
systems, for example if one machine is the jumphost (ssh -J) for a few
hundred connections, while at the same time handles CPU intensive
tasks.

Would RT/Linux SCHED_FIXED or SCHED_RR be of help in such a case, e.g.
put all ssh processes into the SCHED_FIXED scheduling class, with a
priority higher than the non-interactive compute processes?

Also, do I interpret it correctly that each forwarded TCP connection
has its own process?!

Ced
--
Cedric Blancher <cedric.blancher@gmail.com>
[https://plus.google.com/u/0/+CedricBlancher/]
Institute Pasteur
_______________________________________________
openssh-unix-dev mailing list
openssh-unix-dev@mindrot.org
https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev
Re: RT/Linux SCHED_RR/_FIXED to combat latency? [ In reply to ]
On Thu, 10 Aug 2023, Cedric Blancher wrote:

>We're experiencing rather very bad latency spikes on busy Linux
>systems, for example if one machine is the jumphost (ssh -J) for a few
>hundred connections, while at the same time handles CPU intensive
>tasks.
>
>Would RT/Linux SCHED_FIXED or SCHED_RR be of help in such a case, e.g.

Did you already check the old and tried method of nice(2)?

If the other load is CPU-intensive, this is usually sufficient.

Normally you’d nice the CPU-intensive load (so anything else on
the system is not affected), but you can also negative-nice the
sshd processes (and therefore their children). That, however, may
not be sufficient and could require negative-nicing some other
processes or kernel tasks as well, so see if your scenario can
just positive-nice the load instead.
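
As a rough illustration of positive-nicing the load, here is a minimal
Python sketch. The helper name nice_process is made up for this example;
os.getpriority and os.setpriority are the standard wrappers around
getpriority(2)/setpriority(2). Raising the nice value needs no privileges,
while a negative increment normally requires root or CAP_SYS_NICE.

```python
import os


def nice_process(pid: int, increment: int) -> int:
    """Add `increment` to the nice value of `pid` and return the new value.

    pid 0 means the calling process. The result is kept inside the
    usual Linux range of -20 (most favoured) to 19 (least favoured).
    """
    current = os.getpriority(os.PRIO_PROCESS, pid)
    target = max(-20, min(19, current + increment))
    os.setpriority(os.PRIO_PROCESS, pid, target)
    return os.getpriority(os.PRIO_PROCESS, pid)
```

Calling nice_process(pid_of_compute_job, 10) from a supervising script is
roughly what renice +10 does relative to the current value; whether that is
enough to smooth out the spikes still has to be measured on the real system.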

gl hf,
//mirabilos
--
Infrastrukturexperte • tarent solutions GmbH
Am Dickobskreuz 10, D-53121 Bonn • http://www.tarent.de/
Telephon +49 228 54881-393 • Fax: +49 228 54881-235
HRB AG Bonn 5168 • USt-ID (VAT): DE122264941
Geschäftsführer: Dr. Stefan Barth, Kai Ebenrett, Boris Esser, Alexander Steeg

_______________________________________________
Re: RT/Linux SCHED_RR/_FIXED to combat latency? [ In reply to ]
On Thu, 10 Aug 2023 at 05:00, Thorsten Glaser <t.glaser@tarent.de> wrote:
>
> On Thu, 10 Aug 2023, Cedric Blancher wrote:
>
> >We're experiencing rather very bad latency spikes on busy Linux
> >systems, for example if one machine is the jumphost (ssh -J) for a few
> >hundred connections, while at the same time handles CPU intensive
> >tasks.
> >
> >Would RT/Linux SCHED_FIXED or SCHED_RR be of help in such a case, e.g.
>
> Did you already check the old and tried method of nice(2)?
>
> If the other load is CPU-intensive, this is usually sufficient.

nice or renice did NOT help to reduce the latency spikes.
Also, I think nice(2) governs scheduler priorities, but does
little to improve latency.

Ced
--
Cedric Blancher <cedric.blancher@gmail.com>
[https://plus.google.com/u/0/+CedricBlancher/]
Institute Pasteur
_______________________________________________
Re: RT/Linux SCHED_RR/_FIXED to combat latency? [ In reply to ]
On Thu, 10 Aug 2023, Cedric Blancher wrote:

> Good morning!
>
> We're experiencing rather very bad latency spikes on busy Linux
> systems, for example if one machine is the jumphost (ssh -J) for a few
> hundred connections, while at the same time handles CPU intensive
> tasks.
>
> Would RT/Linux SCHED_FIXED or SCHED_RR be of help in such a case, e.g.
> put all ssh processes into the SCHED_FIXED scheduling class, with a
> priority higher than the non-interactive compute processes?

If the problem is load caused by the ssh connections then a different
scheduling class isn't likely to help.

> Also, do I interpret it correctly that each forwarded TCP connection
> has its own process?!

Usually yes. If you're using connection multiplexing (ControlPath/
ControlMaster/ControlPersist) then connections from the same user
through the same jump host can be shared.
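
To make the multiplexing options concrete, the following Python sketch
renders a client-side ssh_config(5) stanza for a jump host. The function
name mux_stanza, the host jump.example.org, the socket path and the 10m
persist time are all placeholders chosen for illustration; ControlMaster,
ControlPath, ControlPersist and the %C token are the documented OpenSSH
client options.

```python
def mux_stanza(jump_host: str, persist: str = "10m") -> str:
    """Return a Host block that enables connection sharing for jump_host."""
    lines = [
        f"Host {jump_host}",
        "    ControlMaster auto",          # reuse a master connection if one exists
        "    ControlPath ~/.ssh/cm-%C",    # one control socket per connection tuple
        f"    ControlPersist {persist}",   # keep the master alive after the last client
    ]
    return "\n".join(lines) + "\n"


print(mux_stanza("jump.example.org"), end="")
```

The stanza goes into the config of the user who runs ssh -J, so it only
helps when several sessions come from the same client account; connections
from different users still get their own processes on the jump host.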

-d
_______________________________________________
Re: RT/Linux SCHED_RR/_FIXED to combat latency? [ In reply to ]
On Wed, Aug 9, 2023 at 10:42 PM Cedric Blancher
<cedric.blancher@gmail.com> wrote:
>
> Good morning!
>
> We're experiencing rather very bad latency spikes on busy Linux
> systems, for example if one machine is the jumphost (ssh -J) for a few
> hundred connections, while at the same time handles CPU intensive
> tasks.
>
> Would RT/Linux SCHED_FIXED or SCHED_RR be of help in such a case, e.g.
> put all ssh processes into the SCHED_FIXED scheduling class, with a
> priority higher than the non-interactive compute processes?

Real Time Linux doesn't solve these problems. It attempts to handle
the variety of interrupts more consistently and equitably, but
precisely the sort of "I'm too danged busy with this high priority
process" issues of a highly burdened server are likely to be
*aggravated* by the incorrect guesses of what processes really matter
on a "real-time" system.

If you know which problems are most important and can raise their
priority, great, but unpredictable delays are usually the sign of a
"too-busy with too many processes" kernel.
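
For the case where the important processes are known, raising their
priority via a real-time policy could look roughly like the Python sketch
below. Note that Linux names its fixed-priority policies SCHED_FIFO and
SCHED_RR. The helper names clamp_rt_priority and try_set_round_robin are
invented here; the os.sched_* calls are the standard library wrappers
around the corresponding syscalls. Switching policy normally needs root or
CAP_SYS_NICE, and a busy task under SCHED_RR can starve everything below
it, so treat this as an experiment rather than a fix.

```python
import os


def clamp_rt_priority(policy: int, priority: int) -> int:
    """Clamp `priority` to the range the kernel reports for `policy`."""
    lowest = os.sched_get_priority_min(policy)
    highest = os.sched_get_priority_max(policy)
    return max(lowest, min(highest, priority))


def try_set_round_robin(pid: int, priority: int) -> bool:
    """Try to move `pid` into SCHED_RR; return False if the kernel refuses."""
    param = os.sched_param(clamp_rt_priority(os.SCHED_RR, priority))
    try:
        os.sched_setscheduler(pid, os.SCHED_RR, param)
    except OSError:
        # Typically EPERM when the caller lacks the needed privilege.
        return False
    return True
```

os.sched_getscheduler(pid) can be used afterwards to confirm which policy
the process actually ended up with.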
_______________________________________________
Re: RT/Linux SCHED_RR/_FIXED to combat latency? [ In reply to ]
On Thu, 10 Aug 2023 at 12:47, Cedric Blancher <cedric.blancher@gmail.com> wrote:
[...]
> We're experiencing rather very bad latency spikes on busy Linux
> systems, for example if one machine is the jumphost (ssh -J) for a few
> hundred connections, while at the same time handles CPU intensive
> tasks.

Are these hundreds of connections started around the same time?
Connection establishment is the most computationally expensive part of
the process by some margin, and if you have clients synchronized I
could imagine that causing load spikes.

If that's the case you could try disabling the more expensive key
exchange algorithms (KexAlgorithms in the config of either the client
or server) or host key algos (HostKeyAlgorithms in the server config).
Try benchmarking the available options, but I'd bet the post-quantum
safe default KexAlgorithm (sntrup761x25519-sha512@openssh.com) is the
most expensive one.
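
One way to do that benchmarking is to time a trivial ssh command per key
exchange algorithm. The Python sketch below is only an assumption about
how such a test might be scripted: kex_probe_command and time_command are
hypothetical helper names, and the host is whatever test machine you have
key-based login to. The list of algorithms your build supports comes from
ssh -Q kex.

```python
import subprocess
import time


def kex_probe_command(host: str, kex: str) -> list[str]:
    """Build an ssh command that connects with one KEX algorithm and runs `true`."""
    return [
        "ssh",
        "-o", "BatchMode=yes",          # fail instead of prompting for a password
        "-o", f"KexAlgorithms={kex}",   # offer only this key exchange algorithm
        host,
        "true",
    ]


def time_command(argv: list[str], runs: int = 3) -> float:
    """Return the best wall-clock time in seconds over `runs` executions."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        best = min(best, time.perf_counter() - start)
    return best
```

For example, time_command(kex_probe_command("testhost", "curve25519-sha256"))
can be compared against the same call with
sntrup761x25519-sha512@openssh.com. The numbers include authentication and
network round trips, so only the differences between algorithms on the same
host are meaningful.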

--
Darren Tucker (dtucker at dtucker.net)
GPG key 11EAA6FA / A86E 3E07 5B19 5880 E860 37F4 9357 ECEF 11EA A6FA
Good judgement comes with experience. Unfortunately, the experience
usually comes from bad judgement.