Mailing List Archive

Using a parser generator in next generation VCL compiler
Hi,

Linpro AS recently assigned me to work part-time on Varnish development,
and I have just started getting to know the docs and source.

I have spent most of my time looking at the internals of the VCL compiler.
My understanding is that an extension of VCL is planned for version 2, so
I think it might be a good idea to base the next-generation compiler on a
parser generated by a YACC-like parser generator. I think this would make
the VCL grammar easier to verify and grasp (for new developers), as well
as easier to extend.

Any thoughts?

I look forward to working more on Varnish in 2007.

--
Knut Aksel Røysland
Linpro AS

Using a parser generator in next generation VCL compiler [ In reply to ]
In message <458BD522.3090304 at linpro.no>, Knut Aksel Røysland
writes:

>I have spent most time looking at the internals of the VCL-compiler.
>[...]

>Any thoughts?

I deliberately didn't use lex/yacc in v1, partly because the VCL language
is so simple, but mostly because I wanted strong error handling.

YACC-generated grammars and LEX-generated lexers have horrible
error detection and reporting.

If you try to make up for that deficiency, your .y and .l files
become unreadable as a result, more than negating any advantage
of having used them in the first place.

Remember that the people who are going to write VCL programs are
not programmers. They are admins; they may know a bit of Perl
or PHP, but they are not programmers.

We need to give them really good error messages, or they will
hate VCL with a vengeance.

But since most of what we will be adding to VCL is actions,
it might very well be a good idea to spend some time on
unifying their argument handling, and possibly to use
table-driven matching for them rather than the current
switch {} in Action(), lest it become too horrible.
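
To illustrate what table-driven matching could look like, here is a minimal C sketch. The action names, the `arg_kind` enum, and `find_action()` are hypothetical and chosen only for illustration; they are not taken from the Varnish source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical argument kinds; the real VCL actions may need others. */
enum arg_kind { ARG_NONE, ARG_NUMBER, ARG_NAME };

struct action {
	const char	*name;	/* keyword as written in the VCL source */
	enum arg_kind	arg;	/* the kind of argument the action expects */
};

/*
 * One row per action (illustrative names only). Adding an action
 * means adding a row here instead of a new case in a switch.
 */
static const struct action actions[] = {
	{ "pass",	ARG_NONE },
	{ "fetch",	ARG_NONE },
	{ "error",	ARG_NUMBER },
	{ "call",	ARG_NAME },
};

/* Return the table entry for name, or NULL if it is not a known action. */
static const struct action *
find_action(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof actions / sizeof actions[0]; i++)
		if (strcmp(actions[i].name, name) == 0)
			return (&actions[i]);
	return (NULL);
}
```

With a table like this, the argument parsing can be written once per `arg_kind` rather than once per action, and a failed lookup is a natural place to report an unknown action.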

I know this runs counter to conventional thinking in compiler
circles, but I'm not alone in thinking that lex/yacc is both over-
and underkill for simple languages. See, for instance, the LCC
book, which provided a lot of my inspiration for the VCL compiler.

Poul-Henning

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
Using a parser generator in next generation VCL compiler [ In reply to ]
On 2006-12-22 13:23:41 +0000, Poul-Henning Kamp wrote:
> In message <458BD522.3090304 at linpro.no>, Knut Aksel Røysland
> writes:
>
> >I have spent most time looking at the internals of the VCL-compiler.
> >[...]
>
> >Any thoughts?
>
> I deliberately didn't use lex/yacc in v1 because the VCL language
> is so simple but mostly because I wanted strong error handling.
>
> YACC generated grammars and LEX generated lexers have horrible
> error detection and reporting.

did you look at ragel: http://www.cs.queensu.ca/home/thurston/ragel/ ?
a real life example: http://rubyforge.org/viewvc/trunk/ext/http11/?root=mongrel

darix

--
openSUSE - SUSE Linux is my linux
openSUSE is good for you
www.opensuse.org
Using a parser generator in next generation VCL compiler [ In reply to ]
In message <20061222151313.GB3344 at pixel.global-banlist.de>, Marcus Rueckert
writes:

>> YACC generated grammars and LEX generated lexers have horrible
>> error detection and reporting.
>
>did you look at ragel: http://www.cs.queensu.ca/home/thurston/ragel/ ?
>a real life example: http://rubyforge.org/viewvc/trunk/ext/http11/?root=mongrel

No, I didn't, because I also don't want Varnish to depend on everything
and the kitchen sink :-)

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
Using a parser generator in next generation VCL compiler [ In reply to ]
* Poul-Henning Kamp

>[...]
> I deliberately didn't use lex/yacc in v1 because the VCL language
> is so simple but mostly because I wanted strong error handling.

Thanks for clearing that up. As a newcomer, I find it hard to judge
how much deliberate design lies behind each part of the source.

> YACC generated grammars and LEX generated lexers have horrible
> error detection and reporting.
>[...]

Certainly, good error reporting should not be sacrificed. But having VCL
abstracted into some kind of BNF-ish representation appears attractive
to me. Maybe this representation could be mixed with doc-strings to
form the basis for generating a 100% accurate VCL reference manual? This
is AFAIK outside the scope of YACC anyway, but maybe there are more
"modern" parser generators out there that also have better error reporting?

But if the grammar is not going to become much more complex than today,
I guess this might be overkill.
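
As a rough illustration of the doc-string idea, the C sketch below keeps each grammar rule next to its description in one table and renders a plain-text reference from it. The rule names, productions, and `render_manual()` are assumptions made up for this example; they are not the actual VCL grammar, and the table does not drive a parser here.

```c
#include <assert.h>
#include <stdio.h>

struct rule {
	const char *name;	/* nonterminal */
	const char *bnf;	/* right-hand side in BNF-ish notation */
	const char *doc;	/* doc-string shown in the manual */
};

/* Illustrative subset only; not the real VCL grammar. */
static const struct rule grammar[] = {
	{ "sub", "'sub' ID compound",
	  "Defines a named subroutine." },
	{ "if_stmt", "'if' '(' cond ')' compound",
	  "Runs the block when the condition holds." },
	{ "compound", "'{' statement* '}'",
	  "A brace-delimited list of statements." },
};

/*
 * Write one manual entry per rule into buf (at most len bytes,
 * always NUL-terminated). Returns the number of characters written.
 */
static size_t
render_manual(char *buf, size_t len)
{
	size_t i, used = 0;
	int n;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof grammar / sizeof grammar[0]; i++) {
		n = snprintf(buf + used, len - used, "%s ::= %s\n    %s\n\n",
		    grammar[i].name, grammar[i].bnf, grammar[i].doc);
		if (n < 0 || (size_t)n >= len - used) {
			buf[used] = '\0';	/* drop the partial entry */
			break;
		}
		used += (size_t)n;
	}
	return (used);
}
```

Because the manual is produced from the same table that describes the rules, the two cannot drift apart, which is the accuracy property suggested above.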

--
Knut Aksel Røysland (off for Christmas holiday)
Linpro AS

Using a parser generator in next generation VCL compiler [ In reply to ]
On 2006-12-22 16:05:02 +0000, Poul-Henning Kamp wrote:
> In message <20061222151313.GB3344 at pixel.global-banlist.de>, Marcus Rueckert
> writes:
>
> >> YACC generated grammars and LEX generated lexers have horrible
> >> error detection and reporting.
> >
> >did you look at ragel: http://www.cs.queensu.ca/home/thurston/ragel/ ?
> >a real life example: http://rubyforge.org/viewvc/trunk/ext/http11/?root=mongrel
>
> No I didn't, because I also don't want Varnish to depend on everything
> and the kitchen sink :-)

does it really matter whether you depend on the generated source of
lex/yacc or ragel? :)
you can run ragel when creating a new release, so you don't need ragel
on the build hosts.

darix

--
openSUSE - SUSE Linux is my linux
openSUSE is good for you
www.opensuse.org
Using a parser generator in next generation VCL compiler [ In reply to ]
In message <20061222162936.GC3344 at pixel.global-banlist.de>, Marcus Rueckert
writes:

>> No I didn't, because I also don't want Varnish to depend on everything
>> and the kitchen sink :-)
>
>does it really matter whether you depend on the generated source of
>lex/yacc or ragel?:)
>you can run ragel when creating a new release, so you don't need ragel
>on the build hosts.

Maybe not, but it matters to me that I depend on none of the three.

We are not talking about parsing Ada here, nor COBOL or PL/I or, for
that matter, C. We are talking about a quite small programming
language. There is no rocket science involved.

I do realize that all of you have been brainwashed by CS professors
into believing that yacc/lex is the holy grail of compiler construction.

But I maintain that lex or yacc would be a step backwards for VCL.

Primarily because error reporting sucks with code generated by
those, but also because both lex and yacc lose readability once
you move from textbook examples to real-world code.

So for anybody wanting to convince me that using lex/yacc/something
else is a better path, here is the minimum hurdle to clear:

1. At least as good and precise error reporting facilities,
including being able to quote the source code with a
graphical marker of the trouble spot.

2. At least as readable and clear source code.
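
To make the first requirement concrete, here is a minimal C sketch of an error message that quotes the offending source line and puts a marker under the trouble spot. `format_error()` and its output layout are hypothetical and only illustrate the idea; this is not the Varnish implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format an error for the byte at src[off]: a "line, column" header,
 * the offending source line, and a caret under the trouble spot.
 * Tabs are not expanded, so the caret only lines up for
 * space-indented input.
 */
static int
format_error(char *buf, size_t len, const char *src, size_t off,
    const char *msg)
{
	size_t line = 1, start = 0, end, i;

	assert(buf != NULL && src != NULL && msg != NULL);
	assert(off <= strlen(src));

	/* Find the start of the line containing off, counting lines. */
	for (i = 0; i < off; i++) {
		if (src[i] == '\n') {
			line++;
			start = i + 1;
		}
	}
	/* Find the end of that line. */
	for (end = off; src[end] != '\0' && src[end] != '\n'; end++)
		continue;

	/* "%*s" pads with (off - start) spaces before the caret. */
	return (snprintf(buf, len, "line %zu, column %zu: %s\n%.*s\n%*s^\n",
	    line, off - start + 1, msg,
	    (int)(end - start), src + start,
	    (int)(off - start), ""));
}
```

A hand-written lexer only has to remember the byte offset of each token for this to work, which is the kind of bookkeeping that tends to be awkward to retrofit onto generated code.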

As to whether it is a worthwhile use of the project's
resources to do so, my vote is no. There are far more
important fish to fry.


--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.