Mailing List Archive: VIP23 (VSL refactoring) design sketch

VIP23 (VSL refactoring) design sketch

Apr 9, 2019, 10:38 AM

Post #1 of 25 (4439 views)

Hello all,

UPLEX may get support to develop a PR as a first step to implementing VIP23:

https://github.com/varnishcache/varnish-cache/wiki/VIP-23:-Refactor-VSL-to-support-extracting-structured-data-from-%22binary%22-log-payloads

Whether we get the support is yet to be finalized, but if so we'll give
it high priority. phk asked me to post a design sketch in advance before
we put in the major time and effort, so that others can comment, and
warn us off we're going in the wrong direction.

Please look at VIP23 in the wiki along with this mail, I won't repeat
everything here.

The goals are:

- The log does not contain ASCII-formatted data for all fields, but some
"binary" data, especially for numeric fields. Data that are naturally
strings, such as header values, of course remain as strings.

- That means that varnishd writes binary fields into the log.

- VSL clients read the log and format the output.

- Outputs for structured, typed data, in particular JSON from varnishlog
-j, fall out easily, since the data is already structured and typed
(don't need to parse strings in the log).

I think it's important that the PR demonstrates how we can make this
change incrementally, without a "monster commit" that changes everything
at once. So the PR will be the first iteration in the transition:

- The PR implements the structures and functions underlying the new VSL
concept.

- Some, but not all of the SLT tags are handled with the new design.

- SLT tags that are left unchanged in the first iteration have as
metadata "one ASCII field", i.e. one field of type ASCIIZ (for
null-terminated strings), and everything works with them as previously
for VSL.

The remaining steps to complete the refactoring will then be to
re-implement the SLT tags one at a time, until we've covered them all.
After that, we can go about removing code that has become obsolete.

VSL queries with the field selector (tag[n]) will end up working quite
differently. Now we scan the string and separate it at whitespace, in
the end we'll just select field n. During the transition, we'll have to
have both techniques working, depending on whether the SLT tag has been
transitioned.

Design sketch:

- include/tbl/vsl_tags*.h continue as the "source of truth" about SLT
tags. The "code-generating macros" in the tables are extended to express
metadata about the payload fields. I imagine that we add at least:

- n (number of fields)
- a sequence of n symbols such as ASCIIZ, DBL, UINT16, UINT32 etc.
for data types

- The data type symbols form an enum, and the enum indexes into a table
of their sizes in bytes, if they have a fixed size (ASCIIZ, at least,
does not).

- In the first iteration, all but a few selected tags have one field of
type ASCIIZ.

- Current VSL*() functions work with such tags without changes.

- varnishlog -j formats these cases as in Guillaume's first
implementation -- two string fields "tag" and "value".

- VSL queries with [n] separate these at whitespace.

- Suggested SLT tags with the new paradigm in the first iteration:
Timestamp, (Be)RespStatus, (Be)ReqAcct, *Header. Maybe some tags that
currently have binary contents, and something for H/2. We should try to
cover a broad variety of data types.

- The PR includes functions/macros for VSL writes & reads in the new
paradigm, using the metadata, similar in spirit to printf/scanf. Such as:

- VSLw(), VSLwv() etc. for writing to the log.

- VSLr(), VSLrv() etc. for reading from the log.

- Function(s) for "format the fields in a ws-separated string".

- Function(s) for "format header-like" as "<name>: <data>" for
headers and timestamps.

- These functions don't need a printf-like format string, we just make
use of the metadata. The idea is:

- start with a pointer at the start of the payload
- read/write sz bytes at the pointer, where sz is the size of the
next field
- advance the pointer to ptr + sz, continue

- VSL queries with [n] for tags in the new paradigm select field n.

- varnishlog -j emits structured data for tags in the new paradigm.

- std.log() and std.timestamp() are updated for the new paradigm, and
all of the vtc tests concerning them pass without changes.

- All of the vtc tests for varnishlog, varnishncsa, varnishhist and
varnishtop pass without changes.

That would be covered in the PR. The plan after merging the PR would be:

- Incrementally change the remaining SLT tags to use the new paradigm.

- After all of the tags have been updated, remove obsolete code.

After that, we can look at the "other considerations" in VIP23:

- SLT flag UNTRUSTED

- A BOM-like byte in binary logs

- Saving the metadata as a header in binary logs, so that log readers
get the metadata from the file.

- Dynamically extensible SLT tags with their own metadata.

Comments welcome!

Best,
Geoff
--
** * * UPLEX - Nils Goroll Systemoptimierung

Scheffelstraße 32
22301 Hamburg

Tel +49 40 2880 5731
Mob +49 176 636 90917
Fax +49 40 42949753

http://uplex.de

Re: VIP23 (VSL refactoring) design sketch [ In reply to ]

nils.goroll at uplex

Apr 9, 2019, 11:08 PM

Post #2 of 25 (4439 views)

Permalink

Hi,

I am +1 (or more, if I got more votes) overall, in particular, the incremental
approach seems to make a lot of sense to me. Depending on the state of the rest
of varnish-cache core, I would try get this into production ASAP as I have been
for the last couple of years.

Some additional points:

- It would make a lot of sense to include the field definitions for VSL clients
in a VSM segment such that log clients do not need to be recompiled when we
change fields.

- This would also pave the way for possible future dynamic VSL records (added
while varnishd runs, referenced as "VMOD log meta-data" in VIP23). At some
point, phk mentioned he would see them as a requirement for H3, but last I know
this is not the case, so we can put this to the end of the list. Yet we should
head into the right direction to eventually get there.

- We should make it clear that the VSL format does not constitute an API, so log
access is only supported by using the vsl client in libvarnishapi

- We might want to consider making the "file" format written by varnishlog -w an
API, but I guess the advantage over libvarnishapi as the API is pretty minor.

- Geoff, You mentioned once over a burger that the actual binary representation
of doubles is not standardized. By the same argument you give for the byte
order, I do not think that this bears any practical relevance for now, but we
might want to at least mention this and, for future cross-platform support,
might have some kind of "this log uses that representation" info similar to the BOM.

Nils

--

** * * UPLEX - Nils Goroll Systemoptimierung

Scheffelstraße 32
22301 Hamburg

tel +49 40 28805731
mob +49 170 2723133
fax +49 40 42949753

xmpp://slink@jabber.int.uplex.de/

http://uplex.de/

Re: VIP23 (VSL refactoring) design sketch [ In reply to ]

phk at phk

Apr 10, 2019, 12:13 AM

Post #3 of 25 (4439 views)

Permalink

--------
In message <82919b48-25e0-d22f-1029-6d17db040519@uplex.de>, Geoff Simmons writes:

>Whether we get the support is yet to be finalized, but if so we'll give
>it high priority. phk asked me to post a design sketch in advance before
>we put in the major time and effort, so that others can comment, and
>warn us off we're going in the wrong direction.

Yes, I really hate it when people go off and do something in quiet
and come back with something which isn't quite right for some reason
they overlooked.

>- That means that varnishd writes binary fields into the log.

For all its problems, a single call to sprintf is actually pretty
efficient in terms of CPU cache etc, because we almost always have
it stuck there.

I'm not very keen to see a lot of macro-work here, for instance
wrapping three memcpy's for Timestamp etc.

In theory one could extend the printf syntax with %V and similar,
but that looses the compilers type-checking of arguments.

However, just because the "const char *fmt" and the varargs
are the *same* as printf, doesn't mean we have to *do* the same
thing as printf with them.

So as a total early morning strawman, before morning tea has
fully kicked in, but *after* I read your email:

VSLb(vsl, SLT_Timestamp, "%s: %.6f %.6f %.6f",
event, now, now - first, now - *pprev);

Becomes:

VSLb(vsl, SLT_Timestamp, "S%sT%fD%fD%f",
event, now, now - first, now - *pprev);

And serialize into VSL as:

<HEADER><ASCIIZ><8 byte double><8 byte double><8 byte double>

Meanwhile, the "S%sT%fD%fD%f" string (possibly transformed) tells
the client side that we have a string, a timestamp and two durations.

We could make the "veprintf" format string come out of the tbl/mumble.h
paired with the SLT_ define:

#include <tbl/mumble.h> magic produces:

#define SLT_Bin_Timestamp SLT_Timestamp | SLT_BINARY, "S%sT%fD%fD%f"

Then in source call becomes:

VSLb(vsl, SLT_Bin_Timestamp, event, now, now - first, now - *pprev);

That way nobody confuses it with a "regular" printf, and the
SLT_Timestamp bit alerts the VSL code to interpret the fmt string
as "veprintf" syntax.

This would be back-wards source compatible, making the the transition
easier, and could allow us to make A/B testing by substituting "real"
printf formats with a compiletime option.

The "veprintf" syntax needs to be carefully thought out, for instance
"S%f" should be a syntax error, and that needs to be caught somewhere,
preferably at compile time, but worst case we could walk all the
strings from <tbl/mumble.h> with a varnishd -x argument as part of
the build.

>- Outputs for structured, typed data, in particular JSON from varnishlog
>-j, fall out easily, since the data is already structured and typed
>(don't need to parse strings in the log).

So where does <tbl/mumble.h> come from ?

It probably makes a lot of sense to wrap all the information one single
place, possibly as .rst-oid format like ".vsc" files and have python
chew on that and output all of docs, JSON metadata and the macro
definitions for varnishd (which would not tbl/mumble.h then.)

Far out:

Given this level of abstraction we could reorder the arguments
between VSL and presentation, so for instance timestamp could be
serialized as:

<8 byte double><8 byte double><8 byte double><ASCIIZ>

So the VSLQ would always know where to find the doubles, but the
presentation code in libvarnishapi would (still) render the string
in front.

We chould also benchmark emitting a length before ASCIIZ to make
it faster to skip strings elements.

Either 'trick' might make VSLQ faster.

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
_______________________________________________
varnish-dev mailing list
varnish-dev@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-dev

Re: VIP23 (VSL refactoring) design sketch [ In reply to ]

dridi at varnish-software

Apr 10, 2019, 12:43 AM

Post #4 of 25 (4439 views)

Permalink

On Wed, Apr 10, 2019 at 8:08 AM Nils Goroll <nils.goroll@uplex.de> wrote:
>
> Hi,
>
> I am +1 (or more, if I got more votes) overall, in particular, the incremental
> approach seems to make a lot of sense to me. Depending on the state of the rest
> of varnish-cache core, I would try get this into production ASAP as I have been
> for the last couple of years.
>
> Some additional points:
>
> - It would make a lot of sense to include the field definitions for VSL clients
> in a VSM segment such that log clients do not need to be recompiled when we
> change fields.

I think the definitions should be the first step. The second one
should be changing the VSL file format magic number to include a
version to accommodate future changes, so that varnishlog -w can also
write the field definitions for vut -r to pick it up offline too.

> - This would also pave the way for possible future dynamic VSL records (added
> while varnishd runs, referenced as "VMOD log meta-data" in VIP23). At some
> point, phk mentioned he would see them as a requirement for H3, but last I know
> this is not the case, so we can put this to the end of the list. Yet we should
> head into the right direction to eventually get there.
>
> - We should make it clear that the VSL format does not constitute an API, so log
> access is only supported by using the vsl client in libvarnishapi
>
> - We might want to consider making the "file" format written by varnishlog -w an
> API, but I guess the advantage over libvarnishapi as the API is pretty minor.
>
> - Geoff, You mentioned once over a burger that the actual binary representation
> of doubles is not standardized. By the same argument you give for the byte
> order, I do not think that this bears any practical relevance for now, but we
> might want to at least mention this and, for future cross-platform support,
> might have some kind of "this log uses that representation" info similar to the BOM.
>
> Nils
>
> --
>
> ** * * UPLEX - Nils Goroll Systemoptimierung
>
> Scheffelstraße 32
> 22301 Hamburg
>
> tel +49 40 28805731
> mob +49 170 2723133
> fax +49 40 42949753
>
> xmpp://slink@jabber.int.uplex.de/
>
> http://uplex.de/
>
_______________________________________________
varnish-dev mailing list
varnish-dev@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-dev

Re: VIP23 (VSL refactoring) design sketch [ In reply to ]

geoff at uplex

Apr 10, 2019, 2:18 AM

Post #5 of 25 (4439 views)

Permalink

On 4/10/19 09:13, Poul-Henning Kamp wrote:
>

Responding out of order by taking the last two points first, because
they become relevant again further down.

> Far out:
>
> Given this level of abstraction we could reorder the arguments
> between VSL and presentation, so for instance timestamp could be
> serialized as:
>
> <8 byte double><8 byte double><8 byte double><ASCIIZ>
>
> So the VSLQ would always know where to find the doubles, but the
> presentation code in libvarnishapi would (still) render the string
> in front.

Yes, I thought about that too, generally it will be much smarter not to
have variable-length fields first. So the metadata will want to
distinguish between presentation order and field order within the record.

I forgot to mention yesterday that we'll also want a name for each field
in the metadata, for the JSON tag.

Starting to spitball what the code could like like -- the metadata for a
single field, after generation from tbl/whatever, might look something
like this:

struct vslf_meta {
const char * name;
enum vsl_type type;
unsigned order;
}

And then:

VSLF_Meta[SLT_Timestamp] = { 4, // 4 fields
{"t", DBL, 1},
{"since_start", DBL, 2},
[."since_last", DBL, 3},
{"label", ASCIIZ, 0},
}

Or we could generate two tables, one in "field-major" order like this,
and the other in "presentation-major" order.

Or, instead of the "unsigned order" field, it could be a ssize_t, which
has the byte position within the record if it can be precomputed, or -1
if there are variable-length fields in front of it.

Then we could have "presentation-major" order:

VSLF_Meta[SLT_Timestamp] = { 4,
{"label", ASCIIZ, 24},
{"t", DBL, 0},
{"since_start", DBL, 8},
[."since_last", DBL, 16},
}

Notably, for headers we couldn't pre-compute the byte position of the
value field:

VSLF_Meta[SLT_ReqHeader] = { 2,
{"name", ASCIIZ, 0},
{"value", ASCIIZ, -1},
}

> We chould also benchmark emitting a length before ASCIIZ to make
> it faster to skip strings elements.

Thought about that one too, maybe in addition to ASCIIZ we'll want a
type like PSTRING (for "Pascal string", that's how I learned it in
school), which has a length field in front of a non-NULL terminated
string. As we do it for headers in the workspace already.

One byte for the length is enough for header names, but we'd need two
for VSL, since records can be up to just short of 4Kb. (Cookie header
values are infamously likely to fill out the entire log record.)

I'm not entirely sure about PSTRING. The length field would be great for
VSLQ as you say, but it would mean that varnishd is computing strlen()
all the time, and the point here is to hand off work like that to VSL
clients. And at one point we decided to have NULL-terminated strings
everywhere in VSL, so that we can always use the standard str*() functions.

Or are you thinking of having *both* the length field and NULL-termination?

> For all its problems, a single call to sprintf is actually pretty
> efficient in terms of CPU cache etc, because we almost always have
> it stuck there.
>
> I'm not very keen to see a lot of macro-work here, for instance
> wrapping three memcpy's for Timestamp etc.
>
> In theory one could extend the printf syntax with %V and similar,
> but that looses the compilers type-checking of arguments.
>
> However, just because the "const char *fmt" and the varargs
> are the *same* as printf, doesn't mean we have to *do* the same
> thing as printf with them.
>
> So as a total early morning strawman, before morning tea has
> fully kicked in, but *after* I read your email:
>
> VSLb(vsl, SLT_Timestamp, "%s: %.6f %.6f %.6f",
> event, now, now - first, now - *pprev);
>
> Becomes:
>
> VSLb(vsl, SLT_Timestamp, "S%sT%fD%fD%f",
> event, now, now - first, now - *pprev);

I wasn't thinking about doing this with format strings because, if we
have the SLT metadata in tables already, then the tables are enough for
what we need to do. And the format string becomes another thing that can
be incorrect. The caller of VSL*() at least has to get the order of the
varargs fields right, which means checking that against the metadata.
Getting the format string right as well wouldn't be strictly necessary.

I agree that we'll be replacing the long-optimized library printf code,
however we go about it, so it will have to be fast. But we need to do
that in any case, whether we're interpreting format strings or running
through tables of metadata. We can't hand off the format string for
binary writes to vsnprintf(), or anything else in the library. How can
we do it without implementing code of our own that uses something like
memcpy() internally?

As noted above, we'll need to have metadata in tables at least for the
field names used for JSON. And I suspect that VSLQ will benefit from
metadata tables, when it can jump right to the byte where a field is
located. So I don't think that all of this works with format strings only.

If we run through metadata tables to write the binary records, I think
they're likely to be at least in L2 cache. If the set of SLT tags is
fixed, they can be const (but they probably can't be when we get to
extending the tags).

There is a point to be made that format strings are shorter, while a
table entry for the field formats could burst a cacheline. And that
format strings are probably a good idea for output functions used by VSL
clients -- for that, we really could hand of a format to *printf().

> This would be back-wards source compatible, making the the transition
> easier, and could allow us to make A/B testing by substituting "real"
> printf formats with a compiletime option.

That's also a good point.

> So where does <tbl/mumble.h> come from ?
>
> It probably makes a lot of sense to wrap all the information one single
> place, possibly as .rst-oid format like ".vsc" files and have python
> chew on that and output all of docs, JSON metadata and the macro
> definitions for varnishd (which would not tbl/mumble.h then.)

Sure. I was thinking of just extending what we already have in
include/tbl/vsl_tags*.h on the grounds of "don't alarm everyone,
especially not phk, by removing something that's been fundamental so
far". But if you promise not to be alarmed, we could go with something
similar to *.vsc. Probably we could generate headers exactly like
vsl_tags*.h from that.

Thanks for the feedback,
Geoff
--
** * * UPLEX - Nils Goroll Systemoptimierung

Scheffelstraße 32
22301 Hamburg

Tel +49 40 2880 5731
Mob +49 176 636 90917
Fax +49 40 42949753

http://uplex.de

Re: VIP23 (VSL refactoring) design sketch [ In reply to ]

nils.goroll at uplex

Apr 10, 2019, 2:50 AM

Post #6 of 25 (4439 views)

Permalink

my 2¢ on the VSL interface:

If we really want to go for varargs, having format string compile time checking
with v_printflike_() would be a must in order not to lower our standard, yes.

But at least for the built-in tags, I find it more appealing to auto-generate
logging functions for each SLT (not macros) with well-defined arguments, by
example of

void VSLb_ts(struct vsl_log *, const char *event, vtim_real first,
vtim_real *pprev, vtim_real now);

The generated code may or may not memcpy, but even if, why is this an issue?

And for this concept, I don't think the cache line argument holds. Most
arguments will be passed in registers.

For dynamic Tags, we might have some less efficient generic code which follows
some spec and a format string, but we might also compromise a bit and make these
not fully dynamic but rather "extensible" and run our spec2c-compiler for VSL
functions parallel to vmodtool

Nils

--

** * * UPLEX - Nils Goroll Systemoptimierung

Scheffelstraße 32
22301 Hamburg

tel +49 40 28805731
mob +49 170 2723133
fax +49 40 42949753

xmpp://slink@jabber.int.uplex.de/

http://uplex.de/

Re: VIP23 (VSL refactoring) design sketch [ In reply to ]

phk at phk

Apr 10, 2019, 7:07 AM

Post #7 of 25 (4439 views)

Permalink

--------
In message <93c9d1d0-9d1c-ba23-a7aa-cc5e597d171e@uplex.de>, Nils Goroll writes:

>But at least for the built-in tags, I find it more appealing to auto-generate
>logging functions for each SLT (not macros) with well-defined arguments, by
>example of
>
>void VSLb_ts(struct vsl_log *, const char *event, vtim_real first,
> vtim_real *pprev, vtim_real now);
>
>The generated code may or may not memcpy, but even if, why is this an issue?

Because of the cache footprint.

If each VSL record invokes its own C-function, we eat up a LOT of L1+L2
cache bandwidth. If they all call the same function, we do not.

(Timestamp was a badly chosen example, because we already wrap that
in a function call, but you get the idea...)

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
_______________________________________________
varnish-dev mailing list
varnish-dev@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-dev

Re: VIP23 (VSL refactoring) design sketch [ In reply to ]

nils.goroll at uplex

Apr 10, 2019, 7:24 AM

Post #8 of 25 (4439 views)

Permalink

On 10/04/2019 16:07, Poul-Henning Kamp wrote:
> If each VSL record invokes its own C-function, we eat up a LOT of L1+L2
> cache bandwidth. If they all call the same function, we do not.

OK, understood.

I wonder though if all things considered (format string parsing, definition
lookup, varargs processing) this pays off. I got no data on this, but one
counter-argument might be that, for custom functions, one would expect no/very
few conditionals, so we would likely see no/very few pipeline stalls .

Nils

--

** * * UPLEX - Nils Goroll Systemoptimierung

Scheffelstraße 32
22301 Hamburg

tel +49 40 28805731
mob +49 170 2723133
fax +49 40 42949753

xmpp://slink@jabber.int.uplex.de/

http://uplex.de/

Re: VIP23 (VSL refactoring) design sketch [ In reply to ]

geoff at uplex

Apr 10, 2019, 7:59 AM

Post #9 of 25 (4439 views)

Permalink

On 4/10/19 16:24, Nils Goroll wrote:
>
> I wonder though if all things considered (format string parsing, definition
> lookup, varargs processing) this pays off. I got no data on this, but one
> counter-argument might be that, for custom functions, one would expect no/very
> few conditionals, so we would likely see no/very few pipeline stalls .

Remember that this started out when we were considering varnishlog -j.
If the output is to be truly JSON-like, we'd want the data for something
like (Be)ReqAcct to be an object with six integers, rather than a
whitespace-separated string of digit strings.

But yes, writing the binary data efficiently into log buffers, where now
it's mostly vsnprintf(), becomes the challenge. Both of the vsnprintf()
and a new function run through varargs, and both parse a format string
if that's what we'll use. The new function will just copy data instead
of doing things like repeatedly dividing by 10 -- that's where we're
going for a win.

But of course we'd be competing against the many years that have gone
into optimizing the library printf code.

(If vsnprintf() has a way to do it all without branching, I'd like to
see how.)

BTW I've now understood one of the attractions of the format string --
the type-correctness of the args can be checked at compile-time.

So I'm starting to come over to the format-string camp. If we're
generating code from .rst-like resources, as currently with .vsc, we
could use Python to generate the formats, that way we're sure to get
them right.

Best,
Geoff
--
** * * UPLEX - Nils Goroll Systemoptimierung

Scheffelstraße 32
22301 Hamburg

Tel +49 40 2880 5731
Mob +49 176 636 90917
Fax +49 40 42949753

http://uplex.de

Re: VIP23 (VSL refactoring) design sketch [ In reply to ]

geoff at uplex

Apr 11, 2019, 6:40 AM

Post #10 of 25 (4430 views)

Permalink

I'm not sure if we have a consensus, but I'd like to summarize what I've
understood from the conversation so far. I at least would be willing to
go ahead on this basis.

- The "source of truth" and generation of code and docs for SLT tags
follows the model of .vsc files and the vsctool. So data about the tags,
including the new metadata, live in .rst-like sources, from which a
Python tool generates code and docs. This replaces current
include/tbl/vsl_tags*.h.

- New VSL*() code writes binary data to log records. They take
printf-like format strings (which can be generated from the SLT sources)
and varargs, so that the order and types of the args can be checked at
compile-time. But the new VSL-binary code doesn't hand off formatting to
the library printf family; it runs through the format and the varargs,
copying data into the log record.

- Code in the VSL client API extracts and formats data from log records.
The client API encapsulate details about the internal structure of
binary records -- the binary format is not public.

The VSL-binary code will obviously be hot code, and will have to be
efficient. But IMO the first PR should not overdo optimization. The
emphasis should be on correctness and good design, and the PR should be
a reasonable but not exaggerated effort to make it fast. As always, we
optimize after testing and measuring.

Comments welcome again!

Best,
Geoff
--
** * * UPLEX - Nils Goroll Systemoptimierung

Scheffelstraße 32
22301 Hamburg

Tel +49 40 2880 5731
Mob +49 176 636 90917
Fax +49 40 42949753

http://uplex.de

Re: VIP23 (VSL refactoring) design sketch [ In reply to ]

phk at phk

Apr 11, 2019, 6:45 AM

Post #11 of 25 (4430 views)

Permalink

--------
In message <eb2c2e9d-810c-703c-5e3d-8b5444687de0@uplex.de>, Geoff Simmons writes:

>I'm not sure if we have a consensus, but I'd like to summarize what I've
>understood from the conversation so far. I at least would be willing to
>go ahead on this basis.

Sorry, I got distracted by a disk (actually NVME) failure on my laptop
yesterday, and have not tought more about it. Can we conclude on
this monday morning instead ?

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
_______________________________________________
varnish-dev mailing list
varnish-dev@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-dev

Re: VIP23 (VSL refactoring) design sketch [ In reply to ]

geoff at uplex

Apr 11, 2019, 6:47 AM

Post #12 of 25 (4430 views)

Permalink

On 4/11/19 15:45, Poul-Henning Kamp wrote:
>
> Sorry, I got distracted by a disk (actually NVME) failure on my laptop
> yesterday, and have not tought more about it. Can we conclude on
> this monday morning instead ?

np

I hope your data's OK.

--
** * * UPLEX - Nils Goroll Systemoptimierung

Scheffelstraße 32
22301 Hamburg

Tel +49 40 2880 5731
Mob +49 176 636 90917
Fax +49 40 42949753

http://uplex.de

Re: VIP23 (VSL refactoring) design sketch [ In reply to ]

phk at phk

Apr 11, 2019, 8:38 AM

Post #13 of 25 (4430 views)

Permalink

--------
In message <2291598a-aa17-ad9f-ed32-5b77b0edeb9c@uplex.de>, Geoff Simmons writes:

>I hope your data's OK.

No worries, I've been in this trade too long to trust any hardware :-)

Running FreeBSD-Current for 26 years makes you a little bit paranoid
about the software too :-)

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
_______________________________________________
varnish-dev mailing list
varnish-dev@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-dev

Re: VIP23 (VSL refactoring) design sketch [ In reply to ]

nils.goroll at uplex

Apr 11, 2019, 8:45 AM

Post #14 of 25 (4430 views)

Permalink

> - New VSL*() code writes binary data to log records. They take
> printf-like format strings (which can be generated from the SLT sources)
> and varargs, so that the order and types of the args can be checked at
> compile-time.

The alternative - auto-generated functions taking typed arguments, would yield
in a different interface, so while I am perfectly fine with aiming for
correctness first, the decision for

- either a printf-style VSL() or
- custom functions like VSLb_ts()

would ideally be made early on.

I will try to come up with some performance data.

Other than that, I do fully agree with the summary.

Nils

--

** * * UPLEX - Nils Goroll Systemoptimierung

Scheffelstraße 32
22301 Hamburg

tel +49 40 28805731
mob +49 170 2723133
fax +49 40 42949753

xmpp://slink@jabber.int.uplex.de/

http://uplex.de/

Re: VIP23 (VSL refactoring) design sketch [ In reply to ]

geoff at uplex

Apr 12, 2019, 7:00 AM

Post #15 of 25 (4430 views)

Permalink

So I said we shouldn't get to optimizing too soon, and here I am doing
something like that. Because the discussion made me curious.

In the attachment you'll find:

- A first implementation of a function VSLwv() to write binary fields
into a log record buffer, using a printf-like format string and a va_list.

- Test calls for Timestamp, ReqAcct, ReqHeader and RespStatus (these are
#if-0'd out here).

- printf() calls to output the data from the binary log record (also
#if-0'd out). This confirms that we can get the data back out as it went
in, and experiments at how we could go about that.

- A benchmark with ca. one gazillion runs of VSLwv() and the
corresponding sprintf() calls, to compare the binary function with
library printf code.

- Results of running that through callgrind and perf record. To view the
results, you'll need a tool like kcachegrind, and perf report.

VSLwv() doesn't have bounds checking, to make sure that we don't write
past vsl_reclen, which of course a real version must have. That will
mean more conditional branching, but the branches will be predicted
almost always.

I grepped through current VSL*() calls, and these are more or less all
of the format characters we currently use: %s, %.*s, %.ns (for a length
n, as in %.20s), %d, %u, %x, %f, %j* and %z* (for (u)intmax_t and size_t
types). There may be one or two more or less, but that's close to all of
it. I added %hu for a response status as unsigned short (we currently
have %u for that). (There are some %p's for tags like ExpKill, mostly
only interesting for devs.)

Note that for the binary writes, we don't have to care if the data type
is unsigned, just copy the right number of bytes. We'll want the %u
formats for type checking, but VSLwv() can do the same thing for %d and
%u, and %zd and %zu, and so on. It's the output code that needs to get
signedness right.

This version always writes a length field as an unsigned short before a
string (maybe it should be uint16_t), and string fields are not
NUL-terminated.

So a final version would need the bounds checking, may handle one or two
more or fewer format chars, and may handle strings differently. Other
than that, I think this is about everything we'll need it to do.

What did the performance tools say?

Overall, VSLwv() did noticeably better than the library printf calls on
the various measures: fewer CPU cycles, better memory cache hit rates
(both instruction and data), and, generally, fewer branch mispredicts.
Not surprising, because library printf has quite a bit more to do.

As @Nils suspected, the weak spot is branch prediction. My gcc compiled
the switch statement as a jump table (jmpq *rax), and kcachegrind
reports 86% of indirect branch mispredicts there, far more than anything
else -- glibc vfprintf comes in next with 13%.

On the other hand, the library printf code has about twice as many
mispredicts for conditional branches, resulting in more mispredicts for
all types of branches.

So: this first shot at coding something up has a sore spot, but overall
appears to be measurably better than the printf stuff.

Some other lessons learned:

The binary records are not necessarily shorter than the ASCII-formatted
records. Particularly ReqAcct, which we have as 6 uintmax_t fields. That
comes to 48 bytes on my machine, about twice as long as a string like
"110 0 110 210 224 434". I don't think we can do much about that.
Request/responses can transfer up to gigabytes, which is why we need
uintmax_t. Unless we want to try something like variable-length integer
fields (which I doubt).

HTTP header records, with two variable-length strings, also won't be any
shorter than the ASCII format. If they have two 2-byte length fields,
they may come out as a couple of bytes longer (especially if they're
also NUL-terminated).

In the large, I think we'll be saving VSL space -- the binary Timestamp
records, for example, end up about half as long as the ASCII-formatted
version. But we won't be saving space in every record.

As you may notice, some of this code is ... not pretty. With idioms like
this:

*((int *)p) = va_arg(ap, int);

printf("%.*s: %f %f %f\n\n", *((unsigned short *)&buf[24]),
&buf[26], *((double *)&buf[0]), *((double *)&buf[8]),
*((double *)&buf[16]));

... I had pleasant thoughts like "phk is gonna kill me".

If we're going with something like this, we'll probably want some macros
and/or inline functions for better sanity, say:

#define CASTP(ptr, type) (*((type *)(ptr)))

And then:

CASTP(p, int) = va_arg(ap, int);

For the output code, we could have the Python tool generate format
strings (perhaps more than one, such as "space-separated" and
"header-like"), field accessors, and varargs lists.

Say something like:

WS_FMT(SLT_Timestamp) ==> "%.*s: %.6f %.6f %.6f"
FLD_LEN(buf, SLT_Timestamp, 0) ==> CASTP(&buf[24], unsigned short)
FLD(buf, SLT_Timestamp, 0) ==> &buf[26]
FLD(buf, SLT_Timestamp, 1) ==> CASTP(&buf[0], double)
FLD(buf, SLT_Timestamp, 2) ==> CASTP(&buf[8], double)
FLD(buf, SLT_Timestamp, 3) ==> CASTP(&buf[16], double)

FMT_ARGS(buf, SLT_Timestamp) ==> \
FLD_LEN(buf, SLT_Timestamp, 0), FLD(buf, SLT_Timestamp, 0), \
FLD(buf, SLT_Timestamp, 2), FLD(buf, SLT_Timestamp, 3)

So then it's:

printf(WS_FMT(SLT_Timestamp), FMT_ARGS(buf, SLT_Timestamp));

... all generated so it's sure to be right. Something like the FLD()
accessors will of course also be needed for VSLQ, varnishncsa etc.

Best,
Geoff
--
** * * UPLEX - Nils Goroll Systemoptimierung

Scheffelstraße 32
22301 Hamburg

Tel +49 40 2880 5731
Mob +49 176 636 90917
Fax +49 40 42949753

http://uplex.de

Re: VIP23 (VSL refactoring) design sketch [ In reply to ]

dridi at varni

Apr 12, 2019, 7:37 AM

Post #16 of 25 (4430 views)

Permalink

On Fri, Apr 12, 2019 at 4:00 PM Geoff Simmons <geoff@uplex.de> wrote:
>
> So I said we shouldn't get to optimizing too soon, and here I am doing
> something like that. Because the discussion made me curious.
>
> In the attachment you'll find:
>
> - A first implementation of a function VSLwv() to write binary fields
> into a log record buffer, using a printf-like format string and a va_list.
>
> - Test calls for Timestamp, ReqAcct, ReqHeader and RespStatus (these are
> #if-0'd out here).
>
> - printf() calls to output the data from the binary log record (also
> #if-0'd out). This confirms that we can get the data back out as it went
> in, and experiments at how we could go about that.
>
> - A benchmark with ca. one gazillion runs of VSLwv() and the
> corresponding sprintf() calls, to compare the binary function with
> library printf code.
>
> - Results of running that through callgrind and perf record. To view the
> results, you'll need a tool like kcachegrind, and perf report.
>
> VSLwv() doesn't have bounds checking, to make sure that we don't write
> past vsl_reclen, which of course a real version must have. That will
> mean more conditional branching, but the branches will be predicted
> almost always.

Interesting, for textual logs the effect of vsl_reclen is
straightforward but how should we deal with binary records truncation?

> I grepped through current VSL*() calls, and these are more or less all
> of the format characters we currently use: %s, %.*s, %.ns (for a length
> n, as in %.20s), %d, %u, %x, %f, %j* and %z* (for (u)intmax_t and size_t
> types). There may be one or two more or less, but that's close to all of
> it. I added %hu for a response status as unsigned short (we currently
> have %u for that). (There are some %p's for tags like ExpKill, mostly
> only interesting for devs.)
>
> Note that for the binary writes, we don't have to care if the data type
> is unsigned, just copy the right number of bytes. We'll want the %u
> formats for type checking, but VSLwv() can do the same thing for %d and
> %u, and %zd and %zu, and so on. It's the output code that needs to get
> signedness right.

Should we though? If someone runs Varnish in a virtual machine on a
mainframe server and runs varnishlog -w don't we want to be able to
varnishlog -r that file on a "regular" workstation?

> This version always writes a length field as an unsigned short before a
> string (maybe it should be uint16_t), and string fields are not
> NUL-terminated.

Until the vsl_reclen question is sorted out, you can always copy the
string and grab its length while doing so, and once it's written and
you have the length you can write it at the field length's address.

I agree we should settle for uin16_t for that.

Having null-terminated strings open a zero-copy opportunity on the VSL
consumer side.

> So a final version would need the bounds checking, may handle one or two
> more or fewer format chars, and may handle strings differently. Other
> than that, I think this is about everything we'll need it to do.
>
> What did the performance tools say?
>
> Overall, VSLwv() did noticeably better than the library printf calls on
> the various measures: fewer CPU cycles, better memory cache hit rates
> (both instruction and data), and, generally, fewer branch mispredicts.
> Not surprising, because library printf has quite a bit more to do.
>
> As @Nils suspected, the weak spot is branch prediction. My gcc compiled
> the switch statement as a jump table (jmpq *rax), and kcachegrind
> reports 86% of indirect branch mispredicts there, far more than anything
> else -- glibc vfprintf comes in next with 13%.
>
> On the other hand, the library printf code has about twice as many
> mispredicts for conditional branches, resulting in more mispredicts for
> all types of branches.
>
> So: this first shot at coding something up has a sore spot, but overall
> appears to be measurably better than the printf stuff.
>
> Some other lessons learned:
>
> The binary records are not necessarily shorter than the ASCII-formatted
> records. Particularly ReqAcct, which we have as 6 uintmax_t fields. That
> comes to 48 bytes on my machine, about twice as long as a string like
> "110 0 110 210 224 434". I don't think we can do much about that.
> Request/responses can transfer up to gigabytes, which is why we need
> uintmax_t. Unless we want to try something like variable-length integer
> fields (which I doubt).

We could, but then that would force us to pre-compute variable-length
field sizes. (are there others than strings?)

> HTTP header records, with two variable-length strings, also won't be any
> shorter than the ASCII format. If they have two 2-byte length fields,
> they may come out as a couple of bytes longer (especially if they're
> also NUL-terminated).

I think we never discussed the prefix filtering. With binary logs it
should become explicit and a different kind of field that is
essentially a string for a specific purpose.

Since the presence of a prefix may be optional, we'd need to change
std.log() and its underlying VRT function to something like this:

$Module std
$Function VOID log([STRING prefix], STRING msg)

And inside, if VCL <= 4.1 and prefix is null, look for a colon
character and extract the prefix from the message if present.

Individual records in the shmlog would then need a bit flag to tell
whether the first field is a prefix or not.

> In the large, I think we'll be saving VSL space -- the binary Timestamp
> records, for example, end up about half as long as the ASCII-formatted
> version. But we won't be saving space in every record.

But we also expect to save a lot on the consumer side.

> As you may notice, some of this code is ... not pretty. With idioms like
> this:
>
> *((int *)p) = va_arg(ap, int);
>
> printf("%.*s: %f %f %f\n\n", *((unsigned short *)&buf[24]),
> &buf[26], *((double *)&buf[0]), *((double *)&buf[8]),
> *((double *)&buf[16]));
>
> ... I had pleasant thoughts like "phk is gonna kill me".
>
> If we're going with something like this, we'll probably want some macros
> and/or inline functions for better sanity, say:
>
> #define CASTP(ptr, type) (*((type *)(ptr)))
>
> And then:
>
> CASTP(p, int) = va_arg(ap, int);
>
> For the output code, we could have the Python tool generate format
> strings (perhaps more than one, such as "space-separated" and
> "header-like"), field accessors, and varargs lists.
>
> Say something like:
>
> WS_FMT(SLT_Timestamp) ==> "%.*s: %.6f %.6f %.6f"
> FLD_LEN(buf, SLT_Timestamp, 0) ==> CASTP(&buf[24], unsigned short)
> FLD(buf, SLT_Timestamp, 0) ==> &buf[26]
> FLD(buf, SLT_Timestamp, 1) ==> CASTP(&buf[0], double)
> FLD(buf, SLT_Timestamp, 2) ==> CASTP(&buf[8], double)
> FLD(buf, SLT_Timestamp, 3) ==> CASTP(&buf[16], double)
>
> FMT_ARGS(buf, SLT_Timestamp) ==> \
> FLD_LEN(buf, SLT_Timestamp, 0), FLD(buf, SLT_Timestamp, 0), \
> FLD(buf, SLT_Timestamp, 2), FLD(buf, SLT_Timestamp, 3)
>
> So then it's:
>
> printf(WS_FMT(SLT_Timestamp), FMT_ARGS(buf, SLT_Timestamp));
>
> ... all generated so it's sure to be right. Something like the FLD()
> accessors will of course also be needed for VSLQ, varnishncsa etc.
>
>
> Best,
> Geoff
> --
> ** * * UPLEX - Nils Goroll Systemoptimierung
>
> Scheffelstraße 32
> 22301 Hamburg
>
> Tel +49 40 2880 5731
> Mob +49 176 636 90917
> Fax +49 40 42949753
>
> http://uplex.de
_______________________________________________
varnish-dev mailing list
varnish-dev@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-dev

Re: VIP23 (VSL refactoring) design sketch [ In reply to ]

nils.goroll at uplex

Apr 12, 2019, 7:52 AM

Post #17 of 25 (4430 views)

Permalink

On 12/04/2019 16:37, Dridi Boukelmoune wrote:

> Interesting, for textual logs the effect of vsl_reclen is
> straightforward but how should we deal with binary records truncation?

As before - truncate?

For the fixed part, we can make sure that vsl_reclen can not be set to below a
sensible minimum, and I do not see a difference between truncating text and
binary. In both cases the log is just - a log, and may lose data anyway.

> Should we though? If someone runs Varnish in a virtual machine on a
> mainframe server and runs varnishlog -w don't we want to be able to
> varnishlog -r that file on a "regular" workstation?

Ideas of a BOM and a double representation marker were mentioned before. We
would indicate in the log which representation was used and it would be up to
the client to make any endianness adjustments, double representation conversions
and the like.

> Having null-terminated strings open a zero-copy opportunity on the VSL
> consumer side.

Agree.

Nils

--

** * * UPLEX - Nils Goroll Systemoptimierung

Scheffelstraße 32
22301 Hamburg

tel +49 40 28805731
mob +49 170 2723133
fax +49 40 42949753

xmpp://slink@jabber.int.uplex.de/

http://uplex.de/

Re: VIP23 (VSL refactoring) design sketch [ In reply to ]

geoff at uplex

Apr 12, 2019, 8:35 AM

Post #18 of 25 (4430 views)

Permalink

On 4/12/19 16:37, Dridi Boukelmoune wrote:
>
> Interesting, for textual logs the effect of vsl_reclen is
> straightforward but how should we deal with binary records truncation?

On 4/12/19 16:52, Nils Goroll wrote:
> As before - truncate?
>
> For the fixed part, we can make sure that vsl_reclen can not be set to below a
> sensible minimum, and I do not see a difference between truncating text and
> binary. In both cases the log is just - a log, and may lose data anyway.

Because of the idea of re-ordering fields in the binary record, mainly
so that the longer, variable fields go last, probably not the same thing
will happen when we truncate records in the new version as in the old.

If someone has the crazy idea of setting vsl_reclen to the current
minimum of 16b, then currently we get something like this:

Timestamp Request: 1554970

If we put the double fields first and the string field last in the
binary record, then with vsl_reclen=16b we'll get:

Timestamp : 1554970958.415837 0.000023

Both broken, of course, but not the same way, and all because vsl_reclen
is just way too short. Anyway, I think this will be pretty hard to
"break, and break the same way as before".

But I'd go for this rule:

* If a fixed-length field would overflow the buffer, then don't write
it, and stop.

* If a variable-length field overflows the buffer, write as much of it
as we can, then stop.

The outputs will turn out screwy, one way or another.

I agree with @Nils that we should consider increasing minimum
vsl_reclen, because at 16b the logs will just be broken.

>> Note that for the binary writes, we don't have to care if the data type
>> is unsigned, just copy the right number of bytes. We'll want the %u
>> formats for type checking, but VSLwv() can do the same thing for %d and
>> %u, and %zd and %zu, and so on. It's the output code that needs to get
>> signedness right.
>
> Should we though? If someone runs Varnish in a virtual machine on a
> mainframe server and runs varnishlog -w don't we want to be able to
> varnishlog -r that file on a "regular" workstation?

Types may have different sizes, byte orders might be different, doubles
might not even be IEEE754 binary64 format -- there is one scenario in
which that all becomes an issue. When we write a binary log on one
machine, and read it on another architecture. Other than that, we always
write and read logs on the same architecture.

AFAIK, "cross-architecture" log writes and reads via binary file are not
guaranteed to work now either.

It would be kinda awesome if we can abstract VSL enough to get that
scenario to work -- probably by saving the integer type sizes in the
metadata, using a BOM byte, and so forth (but what in the world do we do
about different double formats?). But IMO we should get everything else
to work first.

Is there any architecture for which unsigned and signed integer types
have different byte widths?

> Until the vsl_reclen question is sorted out, you can always copy the
> string and grab its length while doing so, and once it's written and
> you have the length you can write it at the field length's address.

We should use library functions, memcpy or strcpy, for any multi-byte
copying, not do it ourselves. It's really surprising what they do to
optimize what seems to be so simple -- vector instructions, copying
wider types (cast it all to intmax_t and copy 8 bytes at a time)... The
performance differences are significant, we'll never keep up with what
the C compilers are able to figure out.

> Having null-terminated strings open a zero-copy opportunity on the VSL
> consumer side.

Yeah, we made a deliberate choice to always NUL-terminate the log
records, so that clients can always use str*() stuff. There are a few
options to consider now:

* strings are NUL-terminated
* strings have length fields
* strings are NUL-terminated *and* have length fields
* some strings do it one way, other strings do it the other way,
depending on which SLT tag and field it is

>> The binary records are not necessarily shorter than the ASCII-formatted
>> records. Particularly ReqAcct, which we have as 6 uintmax_t fields. That
>> comes to 48 bytes on my machine, about twice as long as a string like
>> "110 0 110 210 224 434". I don't think we can do much about that.
>> Request/responses can transfer up to gigabytes, which is why we need
>> uintmax_t. Unless we want to try something like variable-length integer
>> fields (which I doubt).
>
> We could, but then that would force us to pre-compute variable-length
> field sizes. (are there others than strings?)

Well, we could make "110 0 110 210 224 434" much shorter in binary form,
if we knew that in this case, it's 5 uint8_t's and a uint16_t. That fits
into 7 bytes, rather than 48; except that we'd also have to encode the
integer field widths somehow, if it's going to be sometimes uint8_t,
sometimes uint16_t, sometimes uintmax_t, etc. Over burgers today, @Nils
got to thinking about Huffman encoding. But we agreed that it's better
not to bother with something like that, at least not for now. %^)

Best,
Geoff
--
** * * UPLEX - Nils Goroll Systemoptimierung

Scheffelstraße 32
22301 Hamburg

Tel +49 40 2880 5731
Mob +49 176 636 90917
Fax +49 40 42949753

http://uplex.de

Re: VIP23 (VSL refactoring) design sketch [ In reply to ]

nils.goroll at uplex

Apr 12, 2019, 8:48 AM

Post #19 of 25 (4430 views)

Permalink

On 12/04/2019 17:35, Geoff Simmons wrote:

(integer coding etc)
> But we agreed that it's better not to bother with something like that, at least not for now. %^)
On 10/04/2019 08:08, Nils Goroll wrote:
> - We should make it clear that the VSL format does not constitute an API, so
> log access is only supported by using the vsl client in libvarnishapi

Following this approach, we are free to optimize the VSL format and still keep
the varnishlog -w binary compatible

--

** * * UPLEX - Nils Goroll Systemoptimierung

Scheffelstraße 32
22301 Hamburg

tel +49 40 28805731
mob +49 170 2723133
fax +49 40 42949753

xmpp://slink@jabber.int.uplex.de/

http://uplex.de/

Re: VIP23 (VSL refactoring) design sketch [ In reply to ]

phk at phk

Apr 12, 2019, 12:24 PM

Post #20 of 25 (4430 views)

Permalink

--------
In message <CABoVN9DAYQYd7OiaH-MJg-A425H7OGccZ39aCk0xE_nbqHB9mQ@mail.gmail.com>, Dridi Boukelmoune writes:

>Interesting, for textual logs the effect of vsl_reclen is
>straightforward but how should we deal with binary records truncation?

Maybe redefine the limit to apply to individual strings (ie: header
values) rather than the full record ?

>> Some other lessons learned:
>>
>> The binary records are not necessarily shorter than the ASCII-formatted
>> records.

It is incredible how often people overlook this. I've been ranting
about it for almost two decades now, in particular in context of the
userland/kernel API.

Imagine if in addition to errno there also were a const char *errstr
which the kernel could fill out, that would do *wonders*, in particular
for EINVAL.

My favourite example was a Nx64 card where the driver spent more
than 35kloc of source code on defining a ton of structs for ioctl(2)
usage but no matter what the kernel found deficient, the only thing
it told you was return(EINVAL).

I threw out the entire thing and added a single ioctl which passed a
struct with an input string, containing a command like
"set chan 4 ts 4-12"
and another string out with an error message like:
"channel 11 already allocated to chan 5"
didn't even spend 1kloc on it...

>> comes to 48 bytes on my machine, about twice as long as a string like
>> "110 0 110 210 224 434". I don't think we can do much about that.

It depends what we know about the numbers.

For the 68K they had done a lot of statistics, and therefore their
relative jump instructions worked like this:

[opcode] 0x00 [16 bit signed displacement]
[opcode] 0xff [32 bit signed displacement]
[opcode] [8 bit signed displacement]

This gives good compression, without sinking in the
"variable length integers" like the crap in OSI/X509/H2/H3

Of course Motorola had the benefit that neither zero nor minus 1
gives any meaning as relative jump displacement.

We would probably want to sacrifice 0xfe rather than 0x00 (Benfords
Law and all that)

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
_______________________________________________
varnish-dev mailing list
varnish-dev@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-dev

Re: VIP23 (VSL refactoring) design sketch [ In reply to ]

phk at phk

Apr 14, 2019, 11:06 PM

Post #21 of 25 (4416 views)

Permalink

--------
In message <d1ee0467-fe1d-b135-3825-b3f4c7a1e999@uplex.de>, Geoff Simmons write
s:

Now that I found your mock-up in my spam-folder (no idea why) and
have a semi-working laptop again:

>In the attachment you'll find:
>
>- A first implementation of a function VSLwv() to write binary fields
>into a log record buffer, using a printf-like format string and a va_list

For strict alignment platformns I think you have to explicitly use
memcpy(3) in all cases, and leave it to the compiler to decide if
it really needs to call it.

>I grepped through current VSL*() calls, and these are more or less all
>of the format characters we currently use: %s, %.*s, %.ns (for a length
>n, as in %.20s), %d, %u, %x, %f, %j* and %z* (for (u)intmax_t and size_t
>types).

Yes, the vocabulary isn't that big to begin with.

>As @Nils suspected, the weak spot is branch prediction. My gcc compiled
>the switch statement as a jump table (jmpq *rax), and kcachegrind
>reports 86% of indirect branch mispredicts there, far more than anything
>else -- glibc vfprintf comes in next with 13%.

A switch may not be the optimal performance choice. If we collect
stats on frequency of the various formats, a handsorted if-else-if
chain may be better.

>HTTP header records, with two variable-length strings, also won't be any
>shorter than the ASCII format. If they have two 2-byte length fields,
>they may come out as a couple of bytes longer (especially if they're
>also NUL-terminated).

We could ENUM the most common headers. I've been considering that in
general, but the performance trade-off isn't obvious.

> printf("%.*s: %f %f %f\n\n", *((unsigned short *)&buf[24]),
> &buf[26], *((double *)&buf[0]), *((double *)&buf[8]),
> *((double *)&buf[16]));
>
>-.. I had pleasant thoughts like "phk is gonna kill me".

For a mock-ups to test a concept, I've done far worse myself :-)

>Something like the FLD()
>accessors will of course also be needed for VSLQ, varnishncsa etc.

All things considered, I suspect the generating VSL calls to be
the simpler part of the task and VSLQ to be the harder one, so
we should probably look carefully at that as well.

Poul-Henning

PS: Re: slinks idea with generating functions to emit the VSL
records, I will mention one evil trick I have seen, but say up front
that we are not going there: If you machine-generate the producer
functions you can basically memcpy memcpy'ed the stack-frame into
a network packet (or VSL record). This is horribly CPU-calling-convention
specific, and needs special casing for strings, but in olden days
before really smart compilers (ie: 68k) it was a pretty smart move.

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
_______________________________________________
varnish-dev mailing list
varnish-dev@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-dev

Re: VIP23 (VSL refactoring) design sketch [ In reply to ]

dridi at varni

May 3, 2019, 1:51 AM

Post #22 of 25 (4339 views)

Permalink

On Fri, Apr 12, 2019 at 9:24 PM Poul-Henning Kamp <phk@phk.freebsd.dk> wrote:
>
> --------
> In message <CABoVN9DAYQYd7OiaH-MJg-A425H7OGccZ39aCk0xE_nbqHB9mQ@mail.gmail.com>, Dridi Boukelmoune writes:
>
> >Interesting, for textual logs the effect of vsl_reclen is
> >straightforward but how should we deal with binary records truncation?
>
> Maybe redefine the limit to apply to individual strings (ie: header
> values) rather than the full record ?
>
> >> Some other lessons learned:
> >>
> >> The binary records are not necessarily shorter than the ASCII-formatted
> >> records.
>
> It is incredible how often people overlook this. I've been ranting
> about it for almost two decades now, in particular in context of the
> userland/kernel API.
>
> Imagine if in addition to errno there also were a const char *errstr
> which the kernel could fill out, that would do *wonders*, in particular
> for EINVAL.
>
> My favourite example was a Nx64 card where the driver spent more
> than 35kloc of source code on defining a ton of structs for ioctl(2)
> usage but no matter what the kernel found deficient, the only thing
> it told you was return(EINVAL).
>
> I threw out the entire thing and added a single ioctl which passed a
> struct with an input string, containing a command like
> "set chan 4 ts 4-12"
> and another string out with an error message like:
> "channel 11 already allocated to chan 5"
> didn't even spend 1kloc on it...

I have given a bit more thought to this and going with structured
binary VSL records will have more consequences than a blank-separated
list of fields.

First we lose the ability to treat the whole record as a single sting
out of the box. The client will need to reconstruct the string from
the various field types that may constitute a record. Think string and
regex operators.

Then we further the disconnect between today's VSL parsing and VSL
query parsing. Query parsing of strings does already support quoted
strings, and they are not mandatory. We can even query a prefix
containing blanks by quoting the prefix:

-q 'VCL_Log:"just because" eq "we can"'

My attempt at solving this (#2872) was bringing back consistency by
allowing fields to be quoted. I'm glad with the turbo-encabulator-friendly
outcome for Backend_health, but we need to consider the std.log()
escape hatch over which we have no input control, and for which we
can't currently provide a VCC interface to describe the fields, pass
them as a vararg and still ensure at "compile time" that arguments
match the spec. Sure, VCC could learn to do that but it sounds a tiny
bit complicated.

Finally, regarding the VSL file header that we'd hypothetically dump
with varnishlog -w in the future (which I really hope we do), we could
consider making this a VSL record itself that is not subject to
vsl_reclen, querying or filtering, for the needs of dynamic VSLs, so
that whenever something changes VSL clients see an update. It would
then be varnishlog -w's job to build such a synthetic record in the
VSL file before dumping actual logs.

Dridi
_______________________________________________
varnish-dev mailing list
varnish-dev@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-dev

Re: VIP23 (VSL refactoring) design sketch [ In reply to ]

geoff at uplex

May 3, 2019, 3:07 AM

Post #23 of 25 (4339 views)

Permalink

On 5/3/19 10:51, Dridi Boukelmoune wrote:
>
> First we lose the ability to treat the whole record as a single sting
> out of the box. The client will need to reconstruct the string from
> the various field types that may constitute a record. Think string and
> regex operators.

Yes, Martin and I recently had a conversation about this on #hacking.
For the string equality comparisons (eq and ne), we could conceivably
look into the comparison string and see if it has blanks that cross
field boundaries. But as you pointed out, some VSL payloads have blanks
that are not field separators (VCL_Log, FetchError, etc).

For regex comparisons, where the pattern could include things like "\s",
".*", "\W" and so forth, detecting whether the pattern might match a
field-separating blank has the look of a sophisticated computer
science-y problem.

Martin and I agreed that, at least in the first iteration, for VSLQ
queries with a string operator (equality or regex) we should render the
formatted payload into a temporary buffer and run the comparison against
that. Once we have it all working, maybe we could consider more clever
solutions, but I'd be surprised if we'd gain much from getting clever.

I think that an important part of this will be to emphasize in the docs
that authors of VSL queries should use field[n] indices and numeric
comparisons whenever it's appropriate. My (admittedly evidence-free)
impression is that not everyone uses them when they could.

Say you're scanning backend logs for fetches to backend foo, like this:

-q 'BackendOpen ~ "foo"'

Since the backend name is the second field, it would be better as:

-q 'BackendOpen[2] eq "foo"'

Or to look for backend fetches that take longer than 1 second, it
wouldn't surprise me if some people are doing something like:

-q 'Timestamp:Beresp ~ "[1-9]\.\d*$"'

When it should be:

-q 'Timestamp:Beresp[3] > 1.0'

We'll do much better if we can narrow the query to a field, and if we
don't have to convert numbers at all. And we should make sure that users
know about it.

> Finally, regarding the VSL file header that we'd hypothetically dump
> with varnishlog -w in the future (which I really hope we do), we could
> consider making this a VSL record itself that is not subject to
> vsl_reclen, querying or filtering, for the needs of dynamic VSLs, so
> that whenever something changes VSL clients see an update. It would
> then be varnishlog -w's job to build such a synthetic record in the
> VSL file before dumping actual logs.

I agree, and I'd say that when we do have meta-data entries in the log,
it should be possible for them to appear at any time, in both log files
and in-memory logs. That would put us on the path to dynamic SLT tags.

But we're not planning for that in the first iteration. phk has said
that he doesn't foresee dynamic VSL by the September release. And I
agree that we should get the rest working first.

Best,
Geoff
--
** * * UPLEX - Nils Goroll Systemoptimierung

Scheffelstraße 32
22301 Hamburg

Tel +49 40 2880 5731
Mob +49 176 636 90917
Fax +49 40 42949753

http://uplex.de

Re: VIP23 (VSL refactoring) design sketch [ In reply to ]

dridi at varni

May 3, 2019, 6:07 AM

Post #24 of 25 (4339 views)

Permalink

On Fri, May 3, 2019 at 12:07 PM Geoff Simmons <geoff@uplex.de> wrote:
>
> On 5/3/19 10:51, Dridi Boukelmoune wrote:
> >
> > First we lose the ability to treat the whole record as a single sting
> > out of the box. The client will need to reconstruct the string from
> > the various field types that may constitute a record. Think string and
> > regex operators.
>
> Yes, Martin and I recently had a conversation about this on #hacking.
> For the string equality comparisons (eq and ne), we could conceivably
> look into the comparison string and see if it has blanks that cross
> field boundaries. But as you pointed out, some VSL payloads have blanks
> that are not field separators (VCL_Log, FetchError, etc).
>
> For regex comparisons, where the pattern could include things like "\s",
> ".*", "\W" and so forth, detecting whether the pattern might match a
> field-separating blank has the look of a sophisticated computer
> science-y problem.

Does it have to though? We only use those as raw strings, and never as
regular expression patterns.

> Martin and I agreed that, at least in the first iteration, for VSLQ
> queries with a string operator (equality or regex) we should render the
> formatted payload into a temporary buffer and run the comparison against
> that. Once we have it all working, maybe we could consider more clever
> solutions, but I'd be surprised if we'd gain much from getting clever.

Ack, that's the logical thing to start with.

> I think that an important part of this will be to emphasize in the docs
> that authors of VSL queries should use field[n] indices and numeric
> comparisons whenever it's appropriate. My (admittedly evidence-free)
> impression is that not everyone uses them when they could.

Well, today the fields need to be dup'ed anyway when we need a null
terminated string so that will indeed need to be documented that field
operations become much cheaper than record operations.

> Say you're scanning backend logs for fetches to backend foo, like this:
>
> -q 'BackendOpen ~ "foo"'
>
> Since the backend name is the second field, it would be better as:
>
> -q 'BackendOpen[2] eq "foo"'
>
> Or to look for backend fetches that take longer than 1 second, it
> wouldn't surprise me if some people are doing something like:
>
> -q 'Timestamp:Beresp ~ "[1-9]\.\d*$"'
>
> When it should be:
>
> -q 'Timestamp:Beresp[3] > 1.0'

Agreed, I sometimes realize after running a long query on a VSL of the
GB persuasion that I could have done better.

> We'll do much better if we can narrow the query to a field, and if we
> don't have to convert numbers at all. And we should make sure that users
> know about it.

Make the field index mandatory with [0] meaning the whole record and
people will learn :p

> > Finally, regarding the VSL file header that we'd hypothetically dump
> > with varnishlog -w in the future (which I really hope we do), we could
> > consider making this a VSL record itself that is not subject to
> > vsl_reclen, querying or filtering, for the needs of dynamic VSLs, so
> > that whenever something changes VSL clients see an update. It would
> > then be varnishlog -w's job to build such a synthetic record in the
> > VSL file before dumping actual logs.
>
> I agree, and I'd say that when we do have meta-data entries in the log,
> it should be possible for them to appear at any time, in both log files
> and in-memory logs. That would put us on the path to dynamic SLT tags.
>
> But we're not planning for that in the first iteration. phk has said
> that he doesn't foresee dynamic VSL by the September release. And I
> agree that we should get the rest working first.

I disagree. If we draft a plan that 1) changes the in-memory AND
on-disk format and 2) has later a possibility for dynamic VSLs we
should make this part of the first iteration. We shouldn't change the
format of such a critical piece of the infrastructure every other
release.

Also by making this both a synthetic record for live log consumer by
libvarnishapi and in the future something that varnishd may produce
then it becomes a no brainer for varnishlog. It just writes it on disk
like any other log record. It may need some intelligence though when
logs are rotated, like varnishlog asking for a synthetic record again.

Dridi
_______________________________________________
varnish-dev mailing list
varnish-dev@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-dev

Re: VIP23 (VSL refactoring) design sketch [ In reply to ]

dridi at varni

May 4, 2019, 2:49 AM

Post #25 of 25 (4334 views)

Permalink

On Fri, May 3, 2019 at 3:07 PM Dridi Boukelmoune <dridi@varni.sh> wrote:
>
> On Fri, May 3, 2019 at 12:07 PM Geoff Simmons <geoff@uplex.de> wrote:
> >
> > On 5/3/19 10:51, Dridi Boukelmoune wrote:
> > >
> > > First we lose the ability to treat the whole record as a single sting
> > > out of the box. The client will need to reconstruct the string from
> > > the various field types that may constitute a record. Think string and
> > > regex operators.
> >
> > Yes, Martin and I recently had a conversation about this on #hacking.
> > For the string equality comparisons (eq and ne), we could conceivably
> > look into the comparison string and see if it has blanks that cross
> > field boundaries. But as you pointed out, some VSL payloads have blanks
> > that are not field separators (VCL_Log, FetchError, etc).
> >
> > For regex comparisons, where the pattern could include things like "\s",
> > ".*", "\W" and so forth, detecting whether the pattern might match a
> > field-separating blank has the look of a sophisticated computer
> > science-y problem.
>
> Does it have to though? We only use those as raw strings, and never as
> regular expression patterns.

By the way, yes I'm aware that quoting fields containing blanks would
introduce the need of escaping at the very least double quotes and the
escape character.

> > Martin and I agreed that, at least in the first iteration, for VSLQ
> > queries with a string operator (equality or regex) we should render the
> > formatted payload into a temporary buffer and run the comparison against
> > that. Once we have it all working, maybe we could consider more clever
> > solutions, but I'd be surprised if we'd gain much from getting clever.
>
> Ack, that's the logical thing to start with.

One request I forgot to make in this area, when you rebuild the
complete record, it would be nice to enclose fields that contain
blanks with double quotes, so that we can have the same syntax on 6.0
since we're going to maintain it for a while and I would like to solve
the querying problem there too.

> > I think that an important part of this will be to emphasize in the docs
> > that authors of VSL queries should use field[n] indices and numeric
> > comparisons whenever it's appropriate. My (admittedly evidence-free)
> > impression is that not everyone uses them when they could.
>
> Well, today the fields need to be dup'ed anyway when we need a null
> terminated string so that will indeed need to be documented that field
> operations become much cheaper than record operations.
>
> > Say you're scanning backend logs for fetches to backend foo, like this:
> >
> > -q 'BackendOpen ~ "foo"'
> >
> > Since the backend name is the second field, it would be better as:
> >
> > -q 'BackendOpen[2] eq "foo"'
> >
> > Or to look for backend fetches that take longer than 1 second, it
> > wouldn't surprise me if some people are doing something like:
> >
> > -q 'Timestamp:Beresp ~ "[1-9]\.\d*$"'
> >
> > When it should be:
> >
> > -q 'Timestamp:Beresp[3] > 1.0'
>
> Agreed, I sometimes realize after running a long query on a VSL of the
> GB persuasion that I could have done better.
>
> > We'll do much better if we can narrow the query to a field, and if we
> > don't have to convert numbers at all. And we should make sure that users
> > know about it.
>
> Make the field index mandatory with [0] meaning the whole record and
> people will learn :p
>
> > > Finally, regarding the VSL file header that we'd hypothetically dump
> > > with varnishlog -w in the future (which I really hope we do), we could
> > > consider making this a VSL record itself that is not subject to
> > > vsl_reclen, querying or filtering, for the needs of dynamic VSLs, so
> > > that whenever something changes VSL clients see an update. It would
> > > then be varnishlog -w's job to build such a synthetic record in the
> > > VSL file before dumping actual logs.
> >
> > I agree, and I'd say that when we do have meta-data entries in the log,
> > it should be possible for them to appear at any time, in both log files
> > and in-memory logs. That would put us on the path to dynamic SLT tags.
> >
> > But we're not planning for that in the first iteration. phk has said
> > that he doesn't foresee dynamic VSL by the September release. And I
> > agree that we should get the rest working first.
>
> I disagree. If we draft a plan that 1) changes the in-memory AND
> on-disk format and 2) has later a possibility for dynamic VSLs we
> should make this part of the first iteration. We shouldn't change the
> format of such a critical piece of the infrastructure every other
> release.
>
> Also by making this both a synthetic record for live log consumer by
> libvarnishapi and in the future something that varnishd may produce
> then it becomes a no brainer for varnishlog. It just writes it on disk
> like any other log record. It may need some intelligence though when
> logs are rotated, like varnishlog asking for a synthetic record again.
>
> Dridi
_______________________________________________
varnish-dev mailing list
varnish-dev@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-dev

Mailing List Archive

Attached Files:

Attached Files:

Attached Files:

Attached Files:

Attached Files:

Attached Files:

Attached Files:

Attached Files:

Attached Files:

Attached Files:

Attached Files:

Attached Files:

Attached Files:

Attached Files: