Mailing List Archive

Compared performance of Varnish Cache on x86_64 and aarch64
Hello Varnish community,

I've just posted an article [1] comparing the performance of Varnish
Cache on two similar machines - the main difference is the CPU
architecture - x86_64 vs aarch64.
It uses a specific use case - the backend service just returns static
content. The idea is to compare Varnish on the different architectures,
but also to compare Varnish against the backend HTTP server.
What is interesting is that Varnish gives the same throughput as the
backend server on x86_64, but on aarch64 it is around 30% slower than
the backend.

Any feedback and ideas how to tweak it (VCL or even patches) are very
welcome!

Regards,
Martin

1.
https://medium.com/@martin.grigorov/compare-varnish-cache-performance-on-x86-64-and-aarch64-cpu-architectures-cef5ad5fee5f?sk=1be4c19efc17504fa1afb53dc1d8ef92
Re: Compared performance of Varnish Cache on x86_64 and aarch64
--------
Martin Grigorov writes:

> Any feedback and ideas how to tweak it (VCL or even patches) are very
> welcome!

First you need to tweak your benchmark setup.

aarch64

Thread Stats Avg Stdev Max +/- Stdev
Latency 655.40us 798.70us 28.43ms 90.52%

Strictly speaking, you cannot rule out that the ARM machine
sends responses before it receives the request, because your
standard deviation is larger than your average.

In other words: Those numbers tell us nothing.

If you want to do this comparison, and I would love for you to do so,
you really need to take the time it takes, and get your "noise" down.

Here is how you should do it:

for machine in ARM, INTEL
    Reboot machine
    For i in (at least) 1-5:
        Run test for 5 minutes

If the results from the first run on each machine are very different
from the other four runs, you can disregard it as a startup/bootup
artifact.

Report the numbers for all the runs for both machines.

Make a plot of all those numbers, where you plot the reported
average +/- stddev as a line, and the max value as a dot/cross/box.

If you want to get fancy, you can do a Student's T test to tell
you if there is any real difference. There's a program called
"ministat" which will do this for you.

Also: I can highly recommend this book:

http://www.larrygonick.com/titles/science/the-cartoon-guide-to-statistics/

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
_______________________________________________
varnish-dev mailing list
varnish-dev@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-dev
Re: Compared performance of Varnish Cache on x86_64 and aarch64
Hi Poul-Henning,

Thank you for your answer!

On Tue, Jul 28, 2020 at 5:01 PM Poul-Henning Kamp <phk@phk.freebsd.dk>
wrote:

> --------
> Martin Grigorov writes:
>
> > Any feedback and ideas how to tweak it (VCL or even patches) are very
> > welcome!
>
> First you need to tweak your benchmark setup.
>
> aarch64
>
> Thread Stats Avg Stdev Max +/- Stdev
> Latency 655.40us 798.70us 28.43ms 90.52%
>
> Strictly speaking, you cannot rule out that the ARM machine
> sends responses before it receives the request, because your
> standard deviation is larger than your average.
>

Could you explain in what case(s) the server would send responses before
receiving a request?
Do you think that there might be negative values for the latency of some
requests?


>
> In other words: Those numbers tell us nothing.
>
> If you want to do this comparison, and I would love for you to do so,
> you really need to take the time it takes, and get your "noise" down.
>
> Here is how you should do it:
>
> for machine in ARM, INTEL
> Reboot machine
> For i in (at least) 1-5:
> Run test for 5 minutes
>
> If the results from the first run on each machine are very different
> from the other four runs, you can disregard it as a startup/bootup
> artifact.
>
> Report the numbers for all the runs for both machines.
>
> Make a plot of all those numbers, where you plot the reported
> average +/- stddev as a line, and the max value as a dot/cross/box.
>
> If you want to get fancy, you can do a Student's T test to tell
> you if there is any real difference. There's a program called
> "ministat" which will do this for you.
>

ministat looks cool! Thanks!
I think I can save the raw latencies for all requests into a file and feed
ministat with it!

Gil Tene also didn't like how wrk measures latency, and forked it as
https://github.com/giltene/wrk2. wrk2 measures latency while driving a
constant rate/throughput, whereas wrk pushes for the highest possible
throughput and just reports the latency percentiles.
wrk2 also prints a detailed latency distribution, as shown at
https://github.com/giltene/wrk2#basic-usage (not a plot chart, but still
useful).

The only problem is that wrk2 is not well maintained, and it doesn't work
on modern aarch64 due to the old version of Lua it uses. I'll try to
upgrade it.
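The practical difference between the two measurement models - the
"coordinated omission" problem Gil Tene describes - can be shown with a
toy simulation. This is a made-up model, not code from either tool:

```python
# Toy model: 1000 requests, 1 ms service time each, except one 100 ms stall.
service = [1.0] * 1000
service[500] = 100.0

# Closed loop (wrk-style): the next request is only sent when the previous
# one completes, so the stall is counted once, in a single sample.
closed = list(service)

# Open loop (wrk2-style): requests are *scheduled* every 1 ms regardless of
# progress; requests queued behind the stall inherit the remaining delay.
open_loop = []
now = 0.0  # time at which the server becomes free
for i, s in enumerate(service):
    sched = i * 1.0                # intended send time
    start = max(sched, now)        # waits if the server is still busy
    now = start + s
    open_loop.append(now - sched)  # latency from the intended send time

print(max(closed), sum(closed) / len(closed))          # 100.0 1.099
print(max(open_loop), sum(open_loop) / len(open_loop)) # 100.0 50.5
```

Both models see the same maximum, but the closed loop reports an average
of ~1.1 ms while the constant-rate view reports 50.5 ms, because every
request that should have been sent during the stall was delayed too.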

Regards,
Martin


> Also: I can highly recommend this book:
>
>
> http://www.larrygonick.com/titles/science/the-cartoon-guide-to-statistics/
>
> --
> Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
> phk@FreeBSD.ORG | TCP/IP since RFC 956
> FreeBSD committer | BSD since 4.3-tahoe
> Never attribute to malice what can adequately be explained by incompetence.
>
Re: Compared performance of Varnish Cache on x86_64 and aarch64
--------
Martin Grigorov writes:

> > [...]
>
> Could you explain in what case(s) the server would send responses before
> receiving a request ?

It never would, that's the point!

Your measurement says that there is a 2/3 chance that the latency
is between:

655.40µs - 798.70µs = -143.30µs

and
655.40µs + 798.70µs = 1454.10µs

You cannot conclude _anything_ from those numbers.


--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
Re: Compared performance of Varnish Cache on x86_64 and aarch64
On Wed, Jul 29, 2020 at 3:11 PM Poul-Henning Kamp <phk@phk.freebsd.dk>
wrote:

> --------
> Martin Grigorov writes:
>
> > [...]
> > Could you explain in what case(s) the server would send responses before
> > receiving a request ?
>
> It never would, that's the point!
>
> Your measurement says that there is a 2/3 chance that the latency
> is between:
>
> 655.40µs - 798.70µs = -143.30µs
>
> and
> 655.40µs + 798.70µs = 1454.10µs
>
> You cannot conclude _anything_ from those numbers.
>

This now sounds like: if the latency stats are not correct, then most
probably the throughput is not correct either!
I may switch to a different load testing tool!


>
>
> --
> Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
> phk@FreeBSD.ORG | TCP/IP since RFC 956
> FreeBSD committer | BSD since 4.3-tahoe
> Never attribute to malice what can adequately be explained by incompetence.
>
Re: Compared performance of Varnish Cache on x86_64 and aarch64
On 7/28/20 13:52, Martin Grigorov wrote:
>
> I've just posted an article [1] about comparing the performance of Varnish
> Cache on two similar
> machines - the main difference is the CPU architecture - x86_64 vs aarch64.
> It uses a specific use case - the backend service just returns a static
> content. The idea is
> to compare Varnish on the different architectures but also to compare
> Varnish against the backend HTTP server.
> What is interesting is that Varnish gives the same throughput as the
> backend server on x86_64 but on aarch64 it is around 30% slower than the
> backend.

Does your test account for whether there were any errors in backend
fetches? I don't know if that explains anything, but with a connect
timeout of 10s and a first byte timeout of 5m, any error would have a
considerable effect on the results of a 30 second test.

The test tool output doesn't say anything I can see about error rates --
whether all responses had status 200, and if not, how many had which
other status. Ideally it should be all 200; otherwise the results may
not be valid.

I agree with phk that a statistical analysis is needed for a robust
statement about differences between the two platforms. For that, you'd
need more than the summary stats shown in your blog post -- you need to
collect all of the response times. What I usually do is query Varnish
client request logs for Timestamp:Resp and save the number in the last
column.
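As a rough sketch of that extraction (the log lines below are invented,
and the exact varnishlog invocation and record layout may differ between
versions - treat the command in the comment as an assumption to verify):

```python
import statistics

# Invented sample of varnishlog "Timestamp ... Resp:" records; the real
# thing would be read from a pipe, e.g. something like:
#   varnishlog -g request -i Timestamp | grep 'Resp:'
# A Timestamp record carries three numbers: absolute time, time since the
# start of the transaction, and time since the previous timestamp.
log_lines = [
    "-   Timestamp      Resp: 1596540000.123456 0.000812 0.000034",
    "-   Timestamp      Resp: 1596540000.223456 0.000790 0.000029",
    "-   Timestamp      Resp: 1596540000.323456 0.001102 0.000041",
]

# Save the number in the last column of each record, as suggested above:
resp_times = [float(line.split()[-1]) for line in log_lines]

print(len(resp_times), statistics.mean(resp_times))
```

With all response times collected into one file per platform, the two
samples can then be fed to ministat or to a t-test.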

t.test() in R runs Student's t-test (me R fanboi).


HTH,
Geoff
--
** * * UPLEX - Nils Goroll Systemoptimierung

Scheffelstraße 32
22301 Hamburg

Tel +49 40 2880 5731
Mob +49 176 636 90917
Fax +49 40 42949753

http://uplex.de
Re: Compared performance of Varnish Cache on x86_64 and aarch64
I am sorry for being so late to the game, but here it goes:

ons. 29. jul. 2020 kl. 14:12 skrev Poul-Henning Kamp <phk@phk.freebsd.dk>:
> > Your measurement says that there is a 2/3 chance that the latency
> is between:
>
> 655.40µs - 798.70µs = -143.30µs
>
> and
> 655.40µs + 798.70µs = 1454.10µs

No, it does not. There is no claim anywhere that the numbers are
following a normal distribution or an approximation of it. Of course,
the calculations you do demonstrate that the data is far from normally
distributed (as expected).

> You cannot conclude _anything_ from those numbers.

There are two numbers, the average and the standard deviation, and
they are calculated from the data, but the truth is hidden deeper in
the data. By looking at the particular numbers, I agree completely
that it is wrong to conclude that one is better than the other. I am
not saying that the statements in the article are false, just that you
do not have data to draw the conclusions.

Furthermore I have to say that Geoff got things right (see below). As
a mathematician, I have to say that statistics is hard, and trusting
the output of wrk to draw conclusions is outright the wrong thing to
do.

In this case we have a luxury which you typically do not have: Data is
essentially free. You can run many tests and you can run short or long
tests with different parameters. A 30 second test is simply not enough
for anything.

As Geoff indicated, for each transaction you can extract many relevant
values from varnishlog, with the status, hit/miss, time to first byte
and time to last byte being the most obvious ones. They can be
extracted and saved to a csv file by using varnishncsa with a custom
format string, and you can use R (used it myself as a tool in my
previous job - not a fan) to do statistical analysis on the data. The
Student T suggestion from Geoff is a good idea, but just looking at
one set of numbers without considering other factors is mathematically
problematic.

Anyway, some obvious questions then arise. For example:
- How do the numbers from wrk and varnishlog/varnishncsa compare?
Did wrk report a different total number of transactions than varnish?
If there is a discrepancy, then the errors might be because of some
resource constraint (number of sockets or dropped SYN packets?).
- How do the average and maximum compare between varnish and wrk?
- What is the CPU usage of the kernel, the benchmarking tool and the
varnish processes in the tests?
- What is the difference between the time to first byte and the time
to last byte in Varnish for different object sizes?

When Varnish writes to a socket, it hands bytes over to the kernel,
and when the write call returns, we do not know how far the bytes have
come, and how long it will take before they get to the final
destination. The bytes may be in a kernel buffer, they might be on the
network card, and they might already be received at the client's
kernel, and they might have made it all the way into wrk (which may or
may not have timestamped the response). Typically, depending on many
things, Varnish will report faster times than wrk does, but since returning
from the write call means that the calling thread must be rescheduled,
it is even possible that wrk will see that some requests are faster
than what Varnish reports. Running wrk2 with different speeds in a
series of tests seems natural to me, so that you can observe when (and
how) the system starts running into bottlenecks. Note that the
bottleneck can just as well be in wrk2 itself or on the combined CPU
usage of kernel + Varnish + wrk2.

To complicate things even further: On your ARM vs. x64 tests, my guess
is that both kernel parameters and parameters for the network are
different, and the distributions probably have good reason to choose
different values. It is very likely that these differences affect the
performance of the systems in many ways, and that different tests will
have different "optimal" tunings of kernel and network parameters.

Sorry for rambling, but getting the statistics wrong is so easy. The
question is very interesting, but if you want to draw conclusions, you
should do the analysis, and (ideally) give access to the raw data in
case anyone wants to have a look.

Best,
Pål

fre. 31. jul. 2020 kl. 08:45 skrev Geoff Simmons <geoff@uplex.de>:
> [...]
Re: Compared performance of Varnish Cache on x86_64 and aarch64
Hi,

Thank you all for the feedback!
After some debugging it turned out to be a bug in wrk - most of the
requests' latencies were 0 in the raw reports.

I've looked for a better-maintained HTTP load testing tool and I liked
https://github.com/tsenart/vegeta. It provides correct-looking
statistics, can measure latencies while driving a constant rate, and
last but not least can produce plot charts!
I will update my article and let you know once I'm done!

Regards,
Martin

On Fri, Jul 31, 2020 at 4:43 PM Pål Hermunn Johansen <
hermunn@varnish-software.com> wrote:

> [...]
Re: Compared performance of Varnish Cache on x86_64 and aarch64
Hi,

I've updated the data in the article -
https://medium.com/@martin.grigorov/compare-varnish-cache-performance-on-x86-64-and-aarch64-cpu-architectures-cef5ad5fee5f
Now x86_64 and aarch64 are almost the same!
Varnish gives around 20% less throughput than the Golang HTTP server, but I
guess this is because the Golang server is much simpler than Varnish.

A 3 min run produces around 3GB of Vegeta reports (130MB gzipped). If anyone
wants me to extract some extra data, just let me know!

Regards,
Martin

On Mon, Aug 3, 2020 at 6:14 PM Martin Grigorov <martin.grigorov@gmail.com>
wrote:

> [...]
Re: Compared performance of Varnish Cache on x86_64 and aarch64
Hi,

> Varnish gives around 20% less throughput than the Golang HTTP server but
> I guess this is because the Golang server is much simpler than Varnish.

Since the backend and vegeta are written in Go, it's pretty safe to assume
they are going to use H/2 by default, and that's not the case for your
varnish instance, so that possibly explains some of the differences you
are seeing.

Cheers,

--
Guillaume Quintard


On Tue, Aug 4, 2020 at 4:33 AM Martin Grigorov <martin.grigorov@gmail.com>
wrote:

> [...]
>>> one set of numbers without considering other factors is mathematically
>>> problematic.
>>>
>>> Anyway, some obvious questions then arise. For example:
>>> - How do the numbers between wrk and varnishlog/varnishncsa compare?
>>> Did wrk report a different total number of transactions than Varnish?
>>> If there is a discrepancy, then the errors might be due to some resource
>>> constraint (number of sockets, or dropped SYN packets?).
>>> - How does the average and maximum compare between varnish and wrk?
>>> - What is the CPU usage of the kernel, the benchmarking tool and the
>>> varnish processes in the tests?
>>> - What is the difference between the time to first byte and the time
>>> to last byte in Varnish for different object sizes?
>>>
>>> When Varnish writes to a socket, it hands bytes over to the kernel,
>>> and when the write call returns, we do not know how far the bytes have
>>> come, and how long it will take before they get to the final
>>> destination. The bytes may be in a kernel buffer, they might be on the
>>> network card, and they might be already received at the client's
>>> kernel, and they might have made it all into wrk (which may or may not
>>> have timestamped the response). Typically, depending on many things,
>>> Varnish will report faster times than wrk, but since returning
>>> from the write call means that the calling thread must be rescheduled,
>>> it is even possible that wrk will see that some requests are faster
>>> than what Varnish reports. Running wrk2 with different speeds in a
>>> series of tests seems natural to me, so that you can observe when (and
>>> how) the system starts running into bottlenecks. Note that the
>>> bottleneck can just as well be in wrk2 itself or on the combined CPU
>>> usage of kernel + Varnish + wrk2.
>>>
>>> To complicate things even further: On your ARM vs. x64 tests, my guess
>>> is that both kernel parameters and parameters for the network are
>>> different, and the distributions probably have good reason to choose
>>> different values. It is very likely that these differences affect the
>>> performance of the systems in many ways, and that different tests will
>>> have different "optimal" tunings of kernel and network parameters.
>>>
>>> Sorry for rambling, but getting the statistics wrong is so easy. The
>>> question is very interesting, but if you want to draw conclusions, you
>>> should do the analysis, and (ideally) give access to the raw data in
>>> case anyone wants to have a look.
>>>
>>> Best,
>>> Pål
>>>
>>> fre. 31. jul. 2020 kl. 08:45 skrev Geoff Simmons <geoff@uplex.de>:
>>> >
>>> > On 7/28/20 13:52, Martin Grigorov wrote:
>>> > >
>>> > > I've just posted an article [1] about comparing the performance of
>>> Varnish
>>> > > Cache on two similar
>>> > > machines - the main difference is the CPU architecture - x86_64 vs
>>> aarch64.
>>> > > It uses a specific use case - the backend service just returns a
>>> static
>>> > > content. The idea is
>>> > > to compare Varnish on the different architectures but also to compare
>>> > > Varnish against the backend HTTP server.
>>> > > What is interesting is that Varnish gives the same throughput as the
>>> > > backend server on x86_64 but on aarch64 it is around 30% slower than
>>> the
>>> > > backend.
>>> >
>>> > Does your test account for whether there were any errors in
>>> > backend fetches? Don't know if that explains anything, but with a
>>> > connect timeout of 10s and first byte timeout of 5m, any error would
>>> > have a considerable effect on the results of a 30 second test.
>>> >
>>> > The test tool output doesn't say anything I can see about error rates
>>> --
>>> > whether all responses had status 200, and if not, how many had which
>>> > other status. Ideally it should be all 200, otherwise the results may
>>> > not be valid.
>>> >
>>> > I agree with phk that a statistical analysis is needed for a robust
>>> > statement about differences between the two platforms. For that, you'd
>>> > need more than the summary stats shown in your blog post -- you need to
>>> > collect all of the response times. What I usually do is query Varnish
>>> > client request logs for Timestamp:Resp and save the number in the last
>>> > column.
>>> >
>>> > t.test() in R runs Student's t-test (me R fanboi).
>>> >
>>> >
>>>
>> _______________________________________________
> varnish-dev mailing list
> varnish-dev@varnish-cache.org
> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-dev
>
Re: Compared performance of Varnish Cache on x86_64 and aarch64 [ In reply to ]
Hi Guillaume,

On Tue, Aug 4, 2020 at 5:47 PM Guillaume Quintard <
guillaume@varnish-software.com> wrote:

> Hi,
>
> > Varnish gives around 20% less throughput than the Golang HTTP server but
> I guess this is because the Golang server is much simpler than Varnish.
>
> Since the backend and vegeta are written in Go, it's a pretty safe bet
> they are going to use H/2 by default, and that's not the case for your
> Varnish instance, so that may explain some of the differences you are
> seeing.
>

To use H/2 one has to pass the -http2 parameter (
https://github.com/tsenart/vegeta#-http2).
In addition, I'd need to start the HTTP server with
svr.ListenAndServeTLS(cert, key).
I've added log.Printf("Protocol: %s", r.Proto) to the handler function, and
it prints "HTTP/1.1" no matter whether I use the -http2 parameter for
Vegeta or not.


Re: Compared performance of Varnish Cache on x86_64 and aarch64 [ In reply to ]
I stand corrected, good to see it's not an issue.

On Wed, Aug 5, 2020, 02:17 Martin Grigorov <martin.grigorov@gmail.com>
wrote:

> Hi Guillaume,
>
> On Tue, Aug 4, 2020 at 5:47 PM Guillaume Quintard <
> guillaume@varnish-software.com> wrote:
>
>> Hi,
>>
>> > Varnish gives around 20% less throughput than the Golang HTTP server
>> but I guess this is because the Golang server is much simpler than Varnish.
>>
>> Since the backend and vegeta are written in Go, it's a pretty safe bet
>> they are going to use H/2 by default, and that's not the case for your
>> Varnish instance, so that may explain some of the differences you are
>> seeing.
>>
>
> To use H/2 one has to use -http2 parameter (
> https://github.com/tsenart/vegeta#-http2)
> In addition I'd need to start the HTTP server with
> svr.ListenAndServeTLS(cert, key)
> I've added "log.Printf("Protocol: %s", r.Proto)" to the handle function
> and it prints "HTTP/1.1" no matter whether I use -http2 parameter for
> Vegeta or not