
[OT] Re: handler timeout
Hi.

This is a bit of a different discussion, which is why I am now marking it OT.

I was quite impressed by the numbers you list below (and I still am, despite what follows).
So my first impression was that I am not at all competent to comment further, because I
have never dealt with such numbers, and I have no idea what kind of hardware/software
architecture is needed to handle that kind of load.

But then I made some calculations based on the number of servers you mentioned
earlier, and they lead, at least on the surface, to quite a different view:

100 servers * 24 cores = 2400 cores
42,368,982 requests per day
On average thus :
= 423,689 per server / day
= 17,653 per core / day
= 735 per core / hour
= 12 per core / minute
= 5 s / request

(and that does not look so exceptional anymore, except for your server budget)
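The averages above can be reproduced with a few lines of Python. This is a minimal sketch of the same deliberately crude model (one day's requests spread evenly over every core, all day long); the function and key names are hypothetical. Note that the figures in the message are truncated rather than rounded, and that the last one is really a time budget per request (86,400 s divided by roughly 17,654 requests per core per day, about 4.9 s), not a measured response time.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_core_load(requests_per_day, servers, cores_per_server):
    """Spread one day's requests evenly over all cores (a deliberately crude model)."""
    cores = servers * cores_per_server
    per_server_day = requests_per_day / servers
    per_core_day = per_server_day / cores_per_server
    return {
        "cores": cores,
        "per_server_day": per_server_day,
        "per_core_day": per_core_day,
        "per_core_hour": per_core_day / 24,
        "per_core_minute": per_core_day / (24 * 60),
        # Time each core could spend on one request if it were busy all day.
        "seconds_per_request": SECONDS_PER_DAY / per_core_day,
    }


load = per_core_load(42_368_982, servers=100, cores_per_server=24)
for name, value in load.items():
    print(f"{name:>20}: {value:,.2f}")
```

The per-minute figure of about 12.26 requests is what gives "5 s / request" when rounded; using the exact value gives about 4.9 s, which does not change the argument.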

Of course, I am aware that averages like the ones above are very rough,
that the load on your systems is probably not evenly spread, that not all the time of
those 2400 cores is dedicated to this application, and so on.

But as a starting point, it provides at least one observation:

You have 42,368,982 requests per day, of which 6,619 fail with a timeout, and that is
0.016% of the total.
And we know that the timeout of an HTTP client is typically in the range of 3-5 minutes.
Yet we see above that, as a very rough average, the time each core can spend on one
request is in the order of 5 seconds.

So at least on the surface, it would look like the requests that fail take at least
roughly 36 times longer than the average (3 min * 60 = 180 s, divided by 5 s).

So are these 0.016% of requests really exceptional in how long they take to complete?
And if yes, do you know why?
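The failure share and the slowdown factor can be checked with a small sketch. It takes the totals from PANG J.'s message quoted below (42,368,982 requests, 42,362,363 successful) and the rough 5 s per-request budget from above; the 3 and 5 minute client timeouts are the usual range mentioned in the text, not measured values for this SDK, and the function names are hypothetical.

```python
def failure_share(total, succeeded):
    """Return (failed count, failed fraction) from a total and a success count."""
    failed = total - succeeded
    return failed, failed / total


def slowdown_factor(client_timeout_s, mean_time_s):
    """How many times longer than the mean a request must run to hit the client timeout."""
    return client_timeout_s / mean_time_s


failed, share = failure_share(42_368_982, 42_362_363)
print(f"failed: {failed:,} ({share:.4%})")
for minutes in (3, 5):
    print(f"{minutes} min timeout: {slowdown_factor(minutes * 60, 5):.0f}x the mean")
```

The 3-minute case gives the "at least 36 times" lower bound; a 5-minute client timeout would push the factor to 60.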

Do you have any statistics that classify the requests by how long they take?
For example:
- between 1 and 5 seconds: n1
- between 6 and 15 seconds: n2
...
- more than nnn seconds: nx (subject to the client timeout, hence an error in the log)
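A histogram like the one requested above could be built along these lines. This is a minimal sketch assuming the per-request durations (in seconds) can already be extracted from the server's logs; the bucket bounds are hypothetical, with 180 s standing in for the 3-minute client timeout discussed above. It uses half-open buckets (lower, upper] so that no duration falls between two buckets.

```python
import bisect

# Hypothetical bucket upper bounds in seconds; the last bucket is open-ended.
BOUNDS = [5, 15, 60, 180]


def latency_histogram(latencies_s, bounds=BOUNDS):
    """Count request durations per bucket (lower, upper], plus one '> last bound' bucket."""
    lowers = [0] + bounds[:-1]
    labels = [f"{lo}-{hi} s" for lo, hi in zip(lowers, bounds)] + [f"> {bounds[-1]} s"]
    counts = dict.fromkeys(labels, 0)
    for t in latencies_s:
        # bisect_left gives the index of the first bound >= t, i.e. the bucket index.
        counts[labels[bisect.bisect_left(bounds, t)]] += 1
    return counts


print(latency_histogram([0.05, 2, 5, 7, 14.9, 200, 400]))
```

With the sample durations above, three requests land in the first bucket, two in the second and two beyond 180 s; empty buckets are kept so the output always lists every range in order.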



On 28.03.2018 13:13, PANG J. wrote:
> As shown below,
>
>
> For the last day, the total number of requests was 42,368,982. Not all were successful, but 42,362,363 were.
>
> The failed requests are timeouts.
>
> Thanks.
>
>
>
> On 2018/3/28 6:37 PM, André Warnier (tomcat) wrote:
>> On 28.03.2018 12:31, PANG J. wrote:
>>> What I meant by the client is a mobile app.
>>> The mobile app gets the result from the server via an SDK.
>>
>> OK. But it is very likely that your "mobile app SDK" also has a timeout after it sends
>> a request to a server. Or are you /sure/ that it waits forever?
>> /Precisely what/ makes you think that it is a server-side timeout?
>>
>>> In the future we may move the computing task into the app itself.
>>> But currently it runs on the server side.
>>>
>>> Thanks.
>>>
>>> On 2018/3/28 6:11 PM, André Warnier (tomcat) wrote:
>>>> I believe that the timeout which Pang J. is mentioning may be the browser-side timeout,
>>>> which is fixed at the browser level at about 5 minutes or so.
>>>> When a browser sends a request to a server and does not receive /some/ response within
>>>> the next 5 minutes or so, the browser will drop the connection to the server and pop
>>>> up a message saying "sorry, the server appears not to respond..."
>>>> In other words, it is not a server timeout, it is a client timeout.
>>>> The only way to avoid this is to ensure that the server sends at least /some/ temporary
>>>> response to the client (*) regularly, so that this browser timeout does not occur.
>>>> Unfortunately, that is a bit more complicated to set up than just some parameter
>>>> somewhere.
>>>> But there must be plenty of past discussions of this issue already on the web, along
>>>> with solution guidelines.
>>
>
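The idea from the earliest quoted message (the server regularly sends /some/ partial response so that the client's timeout does not fire) can be sketched with Python's standard library. This is only an illustration under stated assumptions: the client is assumed to enforce an idle (per-read) timeout, as Python's socket timeout does; the timings are scaled down from minutes to fractions of a second; and the `?heartbeat=1` switch, `start_server` and `fetch` are hypothetical names. A client or SDK that enforces a total deadline would not necessarily be helped. The status line and headers are sent immediately here, so the timeout that fires is the one waiting for the body.

```python
import socket
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical timings, scaled down from minutes to fractions of a second for the demo.
WORK_DURATION = 1.0       # how long the slow server-side computation takes (s)
HEARTBEAT_INTERVAL = 0.2  # how often the server sends a filler byte (s)


class SlowHandler(BaseHTTPRequestHandler):
    # HTTP/1.0: no Content-Length needed, the body ends when the connection closes.
    protocol_version = "HTTP/1.0"

    def do_GET(self):
        heartbeat = "heartbeat" in self.path  # hypothetical switch for the demo
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        try:
            deadline = time.monotonic() + WORK_DURATION
            while time.monotonic() < deadline:
                time.sleep(HEARTBEAT_INTERVAL)  # stand-in for a slice of real work
                if heartbeat:
                    self.wfile.write(b" ")  # filler byte the client strips later
            self.wfile.write(b"RESULT")
        except (BrokenPipeError, ConnectionResetError):
            pass  # the client already gave up

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def start_server():
    """Start the demo server on a free local port in a background thread."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(url, idle_timeout):
    """Return the response body, or None if the client-side idle timeout fires."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # bypass env proxies
    try:
        with opener.open(url, timeout=idle_timeout) as resp:
            return resp.read().strip()
    except (socket.timeout, TimeoutError):
        return None
```

With a 0.5 s client timeout, `fetch(base + "?heartbeat=1", idle_timeout=0.5)` should return `b"RESULT"`, while `fetch(base, idle_timeout=0.5)` should return `None`, even though the server does the same amount of work in both cases. Padding with spaces only works if the client can ignore leading filler; a real protocol would more likely rely on chunked encoding or a streaming format.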
Re: [OT] Re: handler timeout
Thanks, Andre, for so much info.

Yes, we do have statistics on how long the requests take:

As you can see, 99.996% take less than 100 ms, 0.002% take between 100 ms and
200 ms, and another 0.002% take more than 200 ms. The maximum QPS is 2314.
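These figures can be set next to the earlier per-core estimate with a short sketch. It assumes the percentages apply to the same day's 42,368,982 requests, and it uses Little's law (mean number of requests in flight = arrival rate × mean time in the system) with 100 ms as a generous stand-in for the duration of almost every request; the result is an order-of-magnitude estimate, not a measurement, and the function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def average_qps(requests_per_day):
    """Mean requests per second over a whole day."""
    return requests_per_day / SECONDS_PER_DAY


def in_flight(qps, latency_s):
    """Little's law, L = lambda * W: mean number of requests in progress."""
    return qps * latency_s


def share_to_count(total, percent):
    """Convert a percentage of the total into an approximate request count."""
    return round(total * percent / 100)


total = 42_368_982
print(f"average QPS: {average_qps(total):.0f}, peak QPS: 2314")
print(f"in flight at peak if every request took 100 ms: {in_flight(2314, 0.1):.0f}")
print(f"> 100 ms: ~{share_to_count(total, 0.004):,}, > 200 ms: ~{share_to_count(total, 0.002):,}")
```

Even at the peak rate this comes to a few hundred requests in progress at once, far fewer than 2400 cores, and the roughly 1,700 requests slower than 100 ms are fewer than the several thousand timeouts mentioned earlier in the thread. If both statistics describe the same requests, that fits the idea that the missing time is spent outside the server, for example on the mobile network.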

You just reminded me that the timeouts may be due to network latency, because
mobile apps may have a worse network connection when they access the system. We
will continue investigating this issue.

Thanks again.


On 2018/3/28 9:18 PM, André Warnier (tomcat) wrote:
> [...]