Mailing List Archive

VFP and VDP configurations
Was thinking about how configurable processors could work, so just throwing
out some ideas. Going to focus on VFPs. Basically, the user adds VFPs
via VCL as usual, with an optional position. These VFPs are put
on a candidate list. Then:

- Each VFP defines a string which states its input and output format
- When Varnish constructs the VFP chain, it starts at beresp and uses
that as the first output, and then it chains together VFPs matching inputs
with outputs. It uses the candidate list and priorities to guide this
construction, but it will move things around to get a best fit.
- Varnish has access to builtin VFPs. These VFPs are always available
and are used to fill in any gaps when it cannot find a way to match an
output and input when constructing the chain.

So from VCL, here is how we add VFPs:

VOID add_vfp(VFP init, ENUM position = DEFAULT);

VFP is "struct vfp" and any VMOD can return that, thus registering itself
as a VFP. This contains all the callback and its input and output
requirements.
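
To make that concrete, here is a minimal sketch of what such a descriptor
could carry once the format declarations are added. The field and callback
names are purely illustrative and not the existing struct vfp in varnishd:

#include <stddef.h>

/* Hypothetical callback shapes, just to keep the sketch self-contained. */
typedef int  vfp_init_cb(void *priv);   /* runs when placed in the chain */
typedef int  vfp_pull_cb(void *priv, void *buf, size_t *len);
typedef void vfp_fini_cb(void *priv);

struct vfp {
    const char        *name;     /* e.g. "gunzip" */
    const char *const *input;    /* accepted formats: {"gzip", "gz", NULL} */
    const char *const *output;   /* produced formats: {"text", "plain", "none", NULL} */
    const char        *position; /* recommendation used for DEFAULT, e.g. "STEVEDORE" */
    vfp_init_cb       *init;
    vfp_pull_cb       *pull;
    vfp_fini_cb       *fini;
    void              *priv;
};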

position is: DEFAULT, FRONT, MIDDLE, LAST, FETCH, STEVEDORE

DEFAULT lets the VMOD recommend a position, otherwise it falls back to
LAST. FETCH and STEVEDORE are special positions which tell Varnish to put
the VFP first or last, regardless of the actual FRONT and LAST positions.

So this would be our current list of VFPs with the format
(input)name(output):

(text,plain,none)esi(esitext)
(text,plain,none)esi_gzip(gzip)
(text,plain,none)gzip(gzip,gz)
(gzip,gz)gunzip(text,plain,none)

gzip and gunzip have a preferred position of STEVEDORE. This means they will
behave the same as beresp.do_gzip and beresp.do_gunzip when added by the
user. Also, gzip and gunzip are builtin, so they never need to be
explicitly added if they are needed by other VFPs. (From here on out
I will simplify text, plain, and none to text).
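
As a side note on matching: the format specifiers are really small sets of
labels, so "output matches input" can be read as "the label sets intersect".
A minimal sketch of that check, reusing the label arrays from the struct
sketch above (illustrative only, not existing Varnish code):

#include <string.h>

/* One VFP's output is compatible with the next VFP's input if they share
 * at least one label, e.g. gzip's output {"gzip", "gz"} matches gunzip's
 * input {"gzip", "gz"}, but not esi's input {"text", "plain", "none"}. */
static int
formats_match(const char *const *out, const char *const *in)
{
    const char *const *o;
    const char *const *i;

    for (o = out; *o != NULL; o++)
        for (i = in; *i != NULL; i++)
            if (strcmp(*o, *i) == 0)
                return (1);
    return (0);
}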

Also, when a VFP is successfully added from the candidate list to the
actual chain, it is initialized. During that initialization, it can see
beresp, all the VFPs in front of it, and the other candidates. It can
then add new VFPs to the candidate list, remove itself or other VFPs from
the chain, or delete itself or other VFPs entirely. Orphaned VFPs get put
back on the candidate list.

So for example, anytime the builtin gunzip VFP is added, it will add gzip
as a STEVEDORE VFP candidate (unless a gunzip VFP is already there). This
means content will always maintain its encoding going to storage, but the
user can override.
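
From the VFP's point of view, that could look something like the sketch
below. The chain_* helpers, POS_STEVEDORE, and VFP_gzip are hypothetical
names used only for illustration:

struct vfp;          /* as sketched earlier */
struct vfp_chain;    /* the chain being built plus its candidate list */

/* Hypothetical candidate-list API. */
int  chain_has_candidate(struct vfp_chain *, const char *name);
void chain_add_candidate(struct vfp_chain *, const struct vfp *, int pos);
extern const struct vfp VFP_gzip;
#define POS_STEVEDORE 5

static int
gunzip_init(struct vfp_chain *chain)
{
    /* Re-encode before storage, unless the user explicitly asked for
     * decoded objects by adding a gunzip VFP themselves. */
    if (!chain_has_candidate(chain, "gunzip"))
        chain_add_candidate(chain, &VFP_gzip, POS_STEVEDORE);
    return (0);
}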

Example:

import myvfp;

sub vcl_backend_response
{
    add_vfp(myvfp.init());
    add_vfp(esi);
}

So we start at beresp.http.Content-Encoding to figure out the output of
beresp. We can also optionally look at Content-Type. So in this example, we
have a gzip response:

VFP chain: beresp(gzip)
VFP candidates: (text)myvfp(text), (text)esi(esitext)
VFP builtin: (gzip)gunzip(text), (text)gzip(gzip)

The algorithm for building the chain attempts to place candidates, in order,
from the candidate list onto the actual chain by matching outputs to inputs.
There is some flexibility in that it can reorder the candidates if that
allows a match. FETCH and STEVEDORE always need to be first and last, if
possible. Finally, if it cannot match any more candidates, it then starts
considering the builtins, and the process repeats until it's not possible to
add any more VFPs. This means it's possible some VFPs cannot be added if
their input cannot be generated from the beresp.
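
A minimal sketch of that greedy loop, building on the struct vfp and
formats_match() sketches above (FETCH/STEVEDORE pinning and the init
callbacks are left out, and none of this is real varnishd code):

#include <stddef.h>

/* Assumes the struct vfp and formats_match() sketches from earlier. */
static void
build_chain(const struct vfp **cand, size_t ncand,
    const struct vfp *const *builtin, size_t nbuiltin,
    const struct vfp **chain, size_t *nchain, const char *const *beresp_fmt)
{
    const char *const *tail = beresp_fmt;  /* output at the end of the chain */
    size_t i, j;
    int progress = 1;

    while (progress) {
        progress = 0;
        /* First try the user candidates, reordering if that helps. */
        for (i = 0; i < ncand; i++) {
            if (cand[i] != NULL && formats_match(tail, cand[i]->input)) {
                chain[(*nchain)++] = cand[i];
                tail = cand[i]->output;
                cand[i] = NULL;            /* consumed */
                progress = 1;
            }
        }
        if (progress)
            continue;
        /* Nothing fits directly: look for a builtin whose output would
         * unlock one of the remaining candidates. */
        for (i = 0; i < nbuiltin && !progress; i++) {
            if (!formats_match(tail, builtin[i]->input))
                continue;
            for (j = 0; j < ncand; j++) {
                if (cand[j] != NULL &&
                    formats_match(builtin[i]->output, cand[j]->input)) {
                    chain[(*nchain)++] = builtin[i];
                    tail = builtin[i]->output;
                    progress = 1;
                    break;
                }
            }
        }
    }
}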

So back to the above example:

VFP chain: beresp(gzip)
VFP candidates: (text)myvfp(text), (text)esi(esitext)
VFP builtin: (gzip)gunzip(text), (text)gzip(gzip)

Neither myvfp nor esi can be placed since their inputs do not match gzip.
Varnish then goes through the builtins, finds that gunzip will allow a match
to happen, and adds it:

VFP chain: beresp(gzip) > (gzip)gunzip(text)
VFP candidates: (text)myvfp(text), (text)esi(esitext)
VFP builtin: (gzip)gunzip(text), (text)gzip(gzip)

When gunzip gets initialized, it will add gzip at the STEVEDORE position:

VFP chain: beresp(gzip) > (gzip)gunzip(text)
VFP candidates: (text)myvfp(text), (text)esi(esitext),
STEVEDORE:(text)gzip(gzip)
VFP builtin: (gzip)gunzip(text), (text)gzip(gzip)

Next, myvfp and esi can both be added since their outputs and inputs match
up, giving us the final configuration:

VFP chain: beresp(gzip) > (gzip)gunzip(text) > (text)myvfp(text) >
(text)esi(esitext)
VFP candidates: STEVEDORE:(text)gzip(gzip)
VFP builtin: (gzip)gunzip(text), (text)gzip(gzip)

The STEVEDORE gzip candidate cannot be placed since esi outputs a special
text format, esitext, which prevents any further processing. ESI could have
had a little bit of intelligence here, as it knows it has a gzip counterpart:
it could have seen that a gzip-output VFP was in the candidate list, deleted
itself, and added esi_gzip to the candidates instead. This would have given us:

VFP chain: beresp(gzip) > (gzip)gunzip(text) > (text)myvfp(text) >
(text)esi_gzip(gzip)

Brotli example

Let's say we have a brotli VMOD and it has these VFPs:

(text)brotli(brotli,br)
(brotli,br)unbrotli(text)

Also, during init, these 2 VFPs are added to the builtin list. So now Varnish
has these builtins:

VFP builtin: (gzip)gunzip(text), (text)gzip(gzip),
(text)brotli(brotli,br), (brotli,br)unbrotli(text)
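
A sketch of what that registration might look like inside the VMOD.
register_builtin_vfp() is a hypothetical hook and the struct layout is the
one sketched earlier, so this is illustrative only:

static const char *fmt_text[] = { "text", "plain", "none", NULL };
static const char *fmt_br[]   = { "brotli", "br", NULL };

static const struct vfp VFP_brotli = {
    .name = "brotli", .input = fmt_text, .output = fmt_br,
    /* .init/.pull/.fini callbacks omitted in this sketch */
};
static const struct vfp VFP_unbrotli = {
    .name = "unbrotli", .input = fmt_br, .output = fmt_text,
};

/* Hypothetical registration hook, called when the VCL loads. */
void register_builtin_vfp(const struct vfp *);

void
brotli_vmod_load(void)
{
    register_builtin_vfp(&VFP_brotli);
    register_builtin_vfp(&VFP_unbrotli);
}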

Varnish can use these anywhere to make the VFP chain work. So in the
previous example (minus esi), we could still get our VFPs working when the
beresp is brotli. unbrotli will queue brotli at the STEVEDORE, content
will go into the cache as brotli, and our VFPs still get text:

VFP chain: beresp(br) > (br)unbrotli(text) > (text)myvfp(text) >
(text)brotli(br)

Transient buffer example

We could build a theoretical transient VFP vmod which buffers the VFP input
in transient storage and passes it on as one large contiguous buffer. It
would look like:

(text)buffer(buffertext)

And this would be added as a builtin. We could then have a regex
substitution vmod like this:

(buffertext)regex(text)

And our VCL would look like:

sub vcl_backend_response
{
    add_vfp(regex.vfp());
    regex.add("<title>.*</title>", "<title>new title</title>");
    regex.add("host", "newhost");
}

This will give us:

VFP chain: beresp(gzip)
VFP candidates: (buffertext)regex(text)
VFP builtin: (gzip)gunzip(text), (text)gzip(gzip), (text)brotli(br),
(br)unbrotli(text), (text)buffer(buffertext)

Since regex cannot be placed on gzip output, we find that the gunzip > buffer
combination gives us what we need. gunzip adds gzip at the STEVEDORE and we
end up with this:

VFP chain: beresp(gzip) > (gzip)gunzip(text) > (text)buffer(buffertext)
> (buffertext)regex(text) > (text)gzip(gzip)
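
To put a bit of flesh on the buffer idea, here is a minimal sketch of the
accumulate-then-forward behaviour it would implement. The callback shapes
are hypothetical, and a real VFP would allocate from transient storage
rather than realloc():

#include <stdlib.h>
#include <string.h>

struct buffer_state {
    char   *mem;
    size_t  len;
};

/* Called for each chunk produced by the previous VFP in the chain. */
static int
buffer_collect(struct buffer_state *bs, const void *chunk, size_t len)
{
    char *p = realloc(bs->mem, bs->len + len);

    if (p == NULL)
        return (-1);
    memcpy(p + bs->len, chunk, len);
    bs->mem = p;
    bs->len += len;
    return (0);
}

/* Called at end of input: hand the whole body downstream in one go, so a
 * (buffertext) consumer like regex sees one contiguous buffer. */
static int
buffer_finish(struct buffer_state *bs,
    int (*push)(void *priv, const void *buf, size_t len), void *priv)
{
    return (push(priv, bs->mem, bs->len));
}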

Anyway, I could go on with all kinds of other cool examples, but hopefully
I got my idea across. Thank you for reading through this long email!
Re: VFP and VDP configurations
Hi,

at first, I found Reza's concept appealing and there are some aspects which I
think we should take from it:

- take the protocol VFPs (v1f_*, h2_body) out of the game for vcl

- format specifiers:

- have:

(gzip), (plain) *1), (esi)

- ideas:

(br), (buffertext) *2)

esi being a format which can contain gzip segments, but that would
be opaque to other vfps

- the notion of format conversion(s) that a vfp can handle, e.g.

- have:

esi: (plain)->(esi), (gzip)->(esi)

gzip: (plain)->(gzip)
ungzip: (gzip)->(plain)

- ideas:

br: (plain)->(br)
unbr: (br)->(plain)

re: (plain)->(plain)


But reflecting on it, I am not so sure about runtime resolution and these
aspects in particular:

- "algorithm (...) can reorder the candidates if that allows a match."

- "(A VFP) can (...) add new VFPs to the candidate list, remove itself, remove
other VFPs, or delete itself or other VFPs

I wonder how we would even guarantee that this algorithm ever terminates.

So I think we really need to have VCL compile time checking of all possible
outcomes:

- Either by keeping track of all possible filter chain states at each point
during VCL compilation

- or by restricting ourselves to setting all of the filter chain at once.

The latter will probably lead to largish decision trees in VCL for advanced
cases, but I think we should start with this simple and safe solution with the
format/conversion check.

Nils

*1) "(text)" in reza's concept

*2) not sure if this is a good idea, maybe multi-segment regexen are the better
idea
Re: VFP and VDP configurations
> So from VCL, here is how we add VFPs:
>
> VOID add_vfp(VFP init, ENUM position = DEFAULT);
>
> VFP is "struct vfp" and any VMOD can return that, thus registering itself as
> a VFP. This contains all the callback and its input and output requirements.
>
> position is: DEFAULT, FRONT, MIDDLE, LAST, FETCH, STEVEDORE
>
> DEFAULT lets the VMOD recommend a position, otherwise it falls back to LAST.
> FETCH and STEVEDORE are special positions which tells Varnish to put the VFP
> in front or last, regardless of actual FRONT and LAST.

I think the position should be mapped closer to HTTP semantics:

$Enum {
    content,
    assembly,
    encoding,
    transfer,
};

The `content` value would map to Accept/Content-Type headers, working
on the original body. The order shouldn't matter (otherwise you are
changing the content type) and you could for example chain operations:

- js-minification
- js-obfuscation

You should expect the same results regardless of the order; of course,
the simplest would be to keep the order set in VCL. The `content` step
would feed from storage where the body is buffered.

The `assembly` value would map to ESI-like features, and would feed
from the content, with built-in support for Varnish's subset of ESI.

The `encoding` value would map to Accept-Encoding/Content-Encoding
headers. With built-in support for gzip and opening support for other
encodings. It would feed from the contents after an optional assembly.

The `transfer` value would map to Transfer-Encoding headers, with
built-in support for chunked encoding. ZeGermans could implement
trailers this way.

Would this step make sense in h2? If not, should Varnish just ignore them?

Now problems arise if you have an `encoding` step in a VFP (e.g. gzip'd
in storage) and use `content` or `assembly` steps in a VDP for that
same object, or a different encoding altogether. But in your proposal
you don't seem bothered by this prospect. Neither am I, because that's
only a classic memory vs CPU trade-off. But it might be hard to implement
the current ESI+gzip optimization if we go this route (or a good reason to
go back to upstream zlib).

Dridi
Re: VFP and VDP configurations
> - format specifiers:
>
> - have:
>
> (gzip), (plain) *1), (esi)
>
> - ideas:
>
> (br), (buffertext) *2)
>
> esi being a format which can contain gzip segments, but that's would
> be opaque to other vfps
[...]
> *1) "(text)" in reza's concept

Or "identity" to match HTTP vocabulary.

> *2) not sure if this is a good idea, maybe multi segment regexen are the better
> idea

For lack of a better place to comment, in my previous message I put
`content` before `assembly`. On second thought it should be the other
way around, otherwise <esi> tags break the content type.

Dridi
Re: VFP and VDP configurations
> take the protocol-vpfs v1f_* h2_body out of the game for vcl

Those will be builtin on the delivery side. So I didn't really dive into
VDPs, but it works similarly to VFPs in that the client expects a certain
kind of response, so it's up to the VDP chain to produce a matching output.
So if the client wants an H2 range response gzipped, then that chain needs
to be put together starting at resp in the stevedore and ending at the
client. So it's different, but the same structure and rules apply.

> I wonder how we would even guarantee that this algorithm ever terminates.

Right, since processors can modify the chain as it's being built and change
things mid-flight, this could definitely happen. So the only thing to do
here is have a loop counter and break out after a certain number of
attempts at creating the best-fit chain. It's kind of like a graph search
where, when you hit each node, the node can change the graph ahead of you
or optionally move you back a few positions. So in this case, it's very
possible to get stuck in an unavoidable loop.
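
Something along the lines of this minimal sketch, where MAX_PASSES and the
chain_* helpers are hypothetical placeholders rather than real varnishd code:

#define MAX_PASSES 64    /* arbitrary safety cap */

struct vfp_chain;
/* Place one VFP and run its init (which may reshuffle the candidates);
 * returns 0 when no further placement was possible. */
int chain_make_progress(struct vfp_chain *);
int chain_is_complete(struct vfp_chain *);

static int
build_chain_bounded(struct vfp_chain *chain)
{
    int pass;

    for (pass = 0; pass < MAX_PASSES; pass++) {
        if (!chain_make_progress(chain))
            return (chain_is_complete(chain) ? 0 : -1);
    }
    return (-1);    /* give up: candidates kept reshuffling the chain */
}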

> I think the position should be mapped closer to HTTP semantics

I think this makes too many assumptions? For example, where would security
processors go? Knowing what I know about what's possible with these things,
I think the processor universe might be bigger than the 4 categories you
listed out.

I think this brings up an important point, which is that for us to be
successful here, we really need to bring forward some new processors to be
our seeds for building this new framework. This will drive the requirements
that we need. I think there will be a lot of uncertainty if we build this
based on theoretical processors. I think its alright if these new
processors are simple and our new framework starts off simple as well. This
can then evolve as we learn more. For me, I have written a handful of
processors already, so a lot of what I am proposing here comes from past
experience.



--
Reza Naghibi
Varnish Software

Re: VFP and VDP configurations
>> I think the position should be mapped closer to HTTP semantics
>
> I think this makes too many assumptions? For example, where would security
> processors go? Knowing what I know about whats possible with these things, I
> think the processor universe might be bigger than the 4 categories you
> listed out.

I'm a bit perplexed regarding theoretical security processors...

> I think this brings up an important point, which is that for us to be
> successful here, we really need to bring forward some new processors to be
> our seeds for building this new framework. This will drive the requirements
> that we need. I think there will be a lot of uncertainty if we build this
> based on theoretical processors.

...since you explicitly advise against designing for theory.

With the 4 categories I listed I can fit real-life processors in all of them:

- assembly: esi, edgestash, probably other kinds of include-able templates
- content: minification, obfuscation, regsub, exif cleanup, resizing,
watermarking
- encoding: gzip, br
- transfer: identity, chunked, trailers

My examples were VDP-oriented (from storage to proto) but would work
the other way around too (except assembly, which I can't picture in a
VFP). You can map encoding and transfer processors to headers:
imagining that both gzip and brotli processors are registered, core
code could pick one or none based on good old content negotiation.
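
For the encoding step, a minimal sketch of that negotiation, assuming core
code knows which encoding processors are registered (pick_encoding() is
hypothetical and ignores q-values, which real negotiation would honour):

#include <string.h>

/* Naive pick of a registered encoding processor from Accept-Encoding:
 * prefer br if a brotli processor is registered and acceptable, fall
 * back to gzip, else identity. */
static const char *
pick_encoding(const char *accept_encoding, int have_br, int have_gzip)
{
    if (accept_encoding == NULL)
        return (NULL);                           /* identity */
    if (have_br && strstr(accept_encoding, "br") != NULL)
        return ("br");
    if (have_gzip && strstr(accept_encoding, "gzip") != NULL)
        return ("gzip");
    return (NULL);
}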

Now where would I put security processors? The only place where it
would make sense to me is content. But then again, please define
security (I see two cases off the top of my head, both would run on
content).

> I think its alright if these new processors
> are simple and our new framework starts off simple as well. This can then
> evolve as we learn more. For me, I have written a handful of processors
> already, so a lot of what I am proposing here comes from past experience.

Sure, with the ongoing work to clarify vmod ABIs this one should
definitely start as "strict" until we get to something stable. However
on the VCL side it is not that simple, because we don't want to break
"vcl x.y" if we can avoid it.

We could mimic the feature/debug parameters:

set beresp.deliver = "[+-]value(,...)*";

A + would append a processor to the right step (depending on where it
was registered), a - would remove it from the pipeline, and a lack of
prefix would replace the list altogether. That would create an
equivalent for the `do_*` properties, or even better the `do_*`
properties could be syntactic sugar:

set beresp.do_esi = true;
set beresp.do_br = true;
# same as
set beresp.deliver = "+esi,br";

Dridi
Re: VFP and VDP configurations
On 12/10/2017 06:36 PM, Reza Naghibi wrote:
> Basically, the user adds VFP
> processors via VCL as usual with an optional position.

What did you mean here by "as usual"? A user doesn't have a means to add
VFPs or VDPs via VCL -- I thought that this discussion is about how that
would work.

> - Varnish has access to builtin VFPs. These VFPs are always available
> and are used to fill in any gaps when it cannot find a way to match and
> output and input when constructing the chain.

Are we considering ways for a VFP/VDP defined in a VMOD to replace one
of the builtins?

... assuming that the same thoughts apply to VDPs, and that esi and
gzip/gunzip are among the builtin VDPs ...

As we've talked about before, I'd like to take a shot at a VDP for
parallel ESIs. It seems to me that the pesi VDP wouldn't be worked into
the chain, but would rather substitute the builtin esi VDP in the chain.

So it would be something along the lines of:

sub vcl_deliver {
    replace_vdp(esi, pesi.vdp());
    # ...
}


Best,
Geoff
--
** * * UPLEX - Nils Goroll Systemoptimierung

Scheffelstraße 32
22301 Hamburg

Tel +49 40 2880 5731
Mob +49 176 636 90917
Fax +49 40 42949753

http://uplex.de
Re: VFP and VDP configurations
> A user doesn't have a means to add VFPs or VDPs via VCL

Well, I guess I meant like this:

beresp.do_esi = true

In effect, the above statement adds the ESI VDP to the beresp. ESI would
not be part of the "builtin" VDPs in this new scheme; rather, it's just a
plain old user VDP. Builtins, as I have defined them, are VDPs which are
always available to be used transparently to the user. So in your case,
switching out ESI would be done like this:

import my_parallel_esi;

sub vcl_backend_response
{
    add_vfp(my_parallel_esi.init());
    // Do not use beresp.do_esi
}

Because no other ESI VDP was added, my_parallel_esi will be the only one to
run.

> Are we considering ways for a VFP/VDP defined in a VMOD to replace one of
> the builtins?

I see no reason why not.

--
Reza Naghibi
Varnish Software
