Mailing List Archive

Stackless & String-processing
In article <378A40AD.B4FDB79F@appliedbiometrics.com>,
Christian Tismer <tismer@appliedbiometrics.com> wrote:
>
> Stackless Python 0.3
>
>The possibilities are for instance:
>
>Continuations, with which we will build Coroutines and Generators
>
>Coroutines are able to run at the speed of a single C function call,
>which makes them a considerable alternative in certain algorithms.
>This is no longer a speculation since I have working coroutine
>prototypes, written with continuations.

I've been looking at Icon, and it occurs to me that if coroutines and
generators were available at the Python level, it might yield a way of
doing string processing that is more "Pythonic" than regexps.

Regexps are nice because when you have a pattern that they can
directly represent, then you can simply specify a pattern and then you
don't have to worry about the tedious bookkeeping of looping over the
string and keeping track of the state variable.

However, when you need to match a pattern that a regexp can't match,
then suddenly you need to break the string into tokens with regexps
and then loop over the pieces and keep track of a bunch of state
variables that don't necessarily correspond to the pieces you are
actually interested in.

This is unpleasant, because a) clumsy bookkeeping is bad, and b)
there's two different sorts of syntax needed to do basically
similar tasks. If we could compose generators just like functions,
then the bookkeeping can be abstracted away and the same approach
will work for arbitrarily complicated parsing tasks.

So I think it would be nice if these lovely toys were available at
the Python level. Is this a possibility?

(I'd love to be able to define coroutines in Python like so:

def evens(z):
    for elt in z:
        if elt % 2 == 0:
            suspend(elt)

It would probably require some rethinking of Python's iteration
protocol though.)
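
For concreteness, here is a minimal sketch of how that might read with a
yield-style suspend (roughly the form Python's generators eventually took),
and of how two such generators would then compose without any explicit
state bookkeeping; the names are purely illustrative:

def evens(z):
    for elt in z:
        if elt % 2 == 0:
            yield elt            # suspend, handing elt back to the caller

def squares(z):
    for elt in z:
        yield elt * elt

for x in squares(evens(range(10))):
    print x                      # prints 0, 4, 16, 36, 64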


Neel
Stackless & String-processing [ In reply to ]
[Neel Krishnaswami]
> ...
> I've been looking at Icon, and it occurs to me that if coroutines and
> generators were available at the Python level, it might yield a way of
> doing string processing that is more "Pythonic" than regexps.

Yes, and you can find much more about this in the very early days of the
String SIG archives.

> Regexps are nice because when you have a pattern that they can
> directly represent, then you can simply specify a pattern and then you
> don't have to worry about the tedious bookkeeping of looping over the
> string and keeping track of the state variable.
>
> However, when you need to match a pattern that a regexp can't match,
> then suddenly you need to break the string into tokens with regexps
> and then loop over the pieces and keep track of a bunch of state
> variables that don't necessarily correspond to the pieces you are
> actually interested in.
>
> This is unpleasant, because a) clumsy bookkeeping is bad, and b)
> there's two different sorts of syntax needed to do basically
> similar tasks.

The latter is one of the arguments Griswold gave in the paper that
introduced Icon, contrasting Icon's uniform approach to SNOBOL4's distinct
pattern and procedural sublanguages.

I don't think he'd make the same argument today! Icon is an idiosyncratic
delight, but most string pattern matching was in fact easier (to write,
modify, or read) in SNOBOL4. Once you start using Icon for complex pattern
matching, you'll soon discover that it's very hard to go back to old code
and disentangle "the pattern" from "the processing" -- *because* everything
looks the same, and it's all intermixed. The distinct sublanguages in
SNOBOL4 enforced a clean separation; and that was more often an aid than a
burden.

I've noted before that writing string matching code in Icon feels a lot like
writing in an assembly language for SNOBOL4. Of course the latter didn't
have Icon's power in other areas (e.g. it's at least possible to program
structural pattern-matchers in Icon, using generators to match e.g. tree
patterns; SNOBOL4's patterns started & ended with strings).

> If we could compose generators just like functions, then the bookkeeping
> can be abstracted away and the same approach will work for arbitrarily
> complicated parsing tasks.

The real advantage of regexps is speed, & that's probably why they'll
always be much more popular. SNOBOL4 and Icon didn't define "bal" builtins
because you couldn't code that pattern yourself <wink>. bal is beyond a
regexp's abilities, but it's still a simple kind of pattern, and just runs
too darned slow if implemented via the recursive definition as run with the
general backtracking machinery.
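
As a rough illustration of the kind of pattern bal is, here is a sketch
(assuming a yield-style generator; this simplifies SNOBOL4's BAL rather than
reproducing it) that yields every prefix length at which the parentheses
balance, something a regexp cannot express but a few lines of backtracking
code can:

def bal(s, start=0):
    # Yield each j > start such that s[start:j] is nonempty and balanced
    # with respect to parentheses (a simplification of SNOBOL4's BAL).
    depth = 0
    for j in range(start, len(s)):
        if s[j] == '(':
            depth = depth + 1
        elif s[j] == ')':
            depth = depth - 1
            if depth < 0:
                return
        if depth == 0:
            yield j + 1

Each yielded position is one candidate end of the match, so a caller can
resume bal to backtrack to the next alternative, which is exactly the kind
of general machinery that runs slowly when it's the only tool available.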

> So I think it would be nice if these lovely toys were available at
> the Python level. Is this a possibility?

I've been bugging Guido about generators since '91, but for the billion and
one uses other than string matching. Anything is possible -- except that
I'll ever stop bugging him <wink>.

> (I'd love to be able to define coroutines in Python like so:
>
> def evens(z):
>     for elt in z:
>         if elt % 2 == 0:
>             suspend(elt)

How do you intend to resume it?

> It would probably require some rethinking of Python's iteration
> protocol though.)

That's not a hangup; Guido already knows how to do it cleanly. Like any Big
New Feature there are real questions about costs vs benefits, and so far
this stuff has been widely viewed as "too esoteric". I think it's natural
as breathing, to the point that a *non*-resumable function strikes me as
never inhaling <wink>.

For now, there's a working implementation of a generator protocol (somewhat
more flexible than Icon's) in the source distribution, under
Demo/threads/Generator.py. I also posted a general (thread-based) coroutine
implementation a bit over 5 years ago. Building these on Christian's
stackless Python instead should run much faster.
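
For the curious, here is a rough sketch of how a thread-based generator
protocol can be put together. It is an illustration using only the standard
threading machinery, not the actual Demo/threads/Generator.py code, and
every name in it is made up:

import threading, Queue      # the Queue module is spelled "queue" in later Pythons

class ThreadGenerator:
    # The producer runs in its own thread and blocks in put() until the
    # consumer has taken the previous value, so it acts as a semi-coroutine.
    def __init__(self, func):
        self._queue = Queue.Queue(1)
        self._done = Exception("exhausted")      # unique sentinel value
        worker = threading.Thread(target=self._run, args=(func,))
        worker.setDaemon(True)
        worker.start()

    def _run(self, func):
        func(self.put)                   # func calls put(value) to "suspend"
        self._queue.put(self._done)

    def put(self, value):
        self._queue.put(value)           # blocks while the last value is unread

    def get(self):
        value = self._queue.get()
        if value is self._done:
            raise EOFError("generator exhausted")
        return value

def evens(put):
    for i in range(10):
        if i % 2 == 0:
            put(i)

g = ThreadGenerator(evens)
print g.get(), g.get(), g.get()          # 0 2 4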

BTW, generators are much easier to implement than coroutines -- the former
don't require threads or stacklessness or continuations under the covers,
just some relatively straightforward changes to Python's current
implementation (Steven Majewski noted this 5 years ago). This is why
generators work in all ports of Icon, but coroutines are an optional feature
that's supported only if a platform guru has taken the time to write the
platform-dependent context-switching assembly code Icon coexps require.

degeneratoredly y'rs - tim
Stackless & String-processing [ In reply to ]
[Neel Krishnaswami]
: > If we could compose generators just like functions, then the bookkeeping
: > can be abstracted away and the same approach will work for arbitrarily
: > complicated parsing tasks.
Tim Peters (tim_one@email.msn.com) wrote:
: The latter is one of the arguments Griswold gave in the paper that
: introduced Icon, contrasting Icon's uniform approach to SNOBOL4's distinct
: pattern and procedural sublanguages.
:
: I don't think he'd make the same argument today! Icon is an idiosyncratic
: delight, but most string pattern matching was in fact easier (to write,
: modify, or read) in SNOBOL4. Once you start using Icon for complex pattern
: matching, you'll soon discover that it's very hard to go back to old code
: and disentangle "the pattern" from "the processing" -- *because* everything
: looks the same, and it's all intermixed. The distinct sublanguages in
: SNOBOL4 enforced a clean separation; and that was more often an aid than a
: burden.

As far as I understand this thread I think Neel is right, and Tim and
Griswold are wrong. I have been recently doing a lot of programming with
parser combinators in Clean and Haskell. They are something like what
Neel is talking about (as the name "parser combinators" suggests you
write small "string parsers" as functions, and then have "parser
combinators" that stitch these together). Note that in this parser
combinator approach there is no distinction between the code for "the
pattern" and the code for "the processing" -- they are all written in
the same language (Clean/Haskell in my case). At first I found the
idea of having the two types of code (the pattern and the processing)
in the same language (rather than in 2 distinct languages as in the
traditional way) a bit difficult. But I persevered and I am glad I
did. I found that once you reach a certain level of familiarity with
combinators you can write new pattern and processing code very
quickly and very reliably. The trick is to get the combinators right.
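
To make the idea concrete in Python terms, here is a minimal sketch of such
combinators. Every name is invented for illustration, and the convention
assumed is that a parser is a function from a text and a position to a list
of (value, new-position) results, empty on failure:

def literal(s):
    # Primitive parser: match the fixed string s.
    def parse(text, pos):
        if text[pos:pos + len(s)] == s:
            return [(s, pos + len(s))]
        return []
    return parse

def seq(p, q):
    # Combinator: p followed by q.
    def parse(text, pos):
        return [((v1, v2), pos2)
                for (v1, pos1) in p(text, pos)
                for (v2, pos2) in q(text, pos1)]
    return parse

def alt(p, q):
    # Combinator: p or else q (both alternatives are kept).
    def parse(text, pos):
        return p(text, pos) + q(text, pos)
    return parse

greeting = seq(alt(literal("hello"), literal("hi")), literal(" world"))
print greeting("hi world", 0)        # [(('hi', ' world'), 8)]

"The processing" is then just a matter of mapping functions over the values
a parser produces, in the same language as the pattern itself.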

graham
--
This ain't no technological breakdown
Oh no, this is the road to hell
Stackless & String-processing [ In reply to ]
In article <000601bece8f$b549a420$51a22299@tim>,
Tim Peters <tim_one@email.msn.com> wrote:
>[Neel Krishnaswami]
>
>> (I'd love to be able to define coroutines in Python like so:
>>
>> def evens(z):
>>     for elt in z:
>>         if elt % 2 == 0:
>>             suspend(elt)
>
>
>> It would probably require some rethinking of Python's iteration
>> protocol though.)
>
>How do you intend to resume it?

I left this unspecified because I don't know the answer. :) I figured
that it could be rolled up into the iteration protocol and I could
use it without having to completely understand how it all works.

[The iteration protocol]

>That's not a hangup; Guido already knows how to do it cleanly.

Good -- now I don't have to worry at all.

>BTW, generators are much easier to implement than coroutines -- the former
>don't require threads or stacklessness or continuations under the covers,
>just some relatively straightforward changes to Python's current
>implementation (Steven Majewski noted this 5 years ago).

What's the difference, exactly? AFAICT you need to save the execution
state when suspending both coroutines and generators, but I might be
missing something obvious....


Neel
Stackless & String-processing [ In reply to ]
In article <7mkrrt$a81$1@cronkite.cc.uga.edu>,
Graham Matthews <graham@sloth.math.uga.edu> wrote:
>
>[Parser Combinators]
>
>They are something like what Neel is talking about (as the name
>"parser combinators" suggests you write small "string parsers" as
>functions, and then have "parser combinators" that stitch these
>together). Note that in this parser combinator approach there is no
>distinction between the code for "the pattern" and the code for "the
>processing" -- they are all written in the same language
>(Clean/Haskell in my case).

Thanks for the tip -- it led me to do some digging on the web and I
found a nice description of this in a paper called "Monadic Parser
Combinators" by Graham Hutton and Erik Meijer. Its URL is:

http://www.cs.nott.ac.uk/Department/Staff/gmh/monparsing.ps

>At first I found the idea of having the two types of code (the
>pattern and the processing) in the same language (rather than in 2
>distinct languages as in the traditional way) a bit difficult. But I
>persevered and I am glad I did. I found that once you reach a
>certain level of familiarity with combinators you can write new
>pattern and processing code very quickly and very reliably. The
>trick is to get the combinators right.

I think that this weekend I'm going to try and put together a basic
version of this parsing system in Python based on the ideas in this
paper -- basically just write a simple AST class and the combinators
they describe. Then I can add proper support for backtracking and lazy
evaluation and all the stuff with serious hack value when Christian
Tismer makes continuations available to us. :)

This really seems like a clean and straightforward way of doing
things. My experience with recursive descent parsers has been pretty
bad, but now I suspect that this is because I wasn't organizing my
work in a systematic enough fashion.


Neel
Stackless & String-processing [ In reply to ]
[Tim]
> BTW, generators are much easier to implement than coroutines -- ...

[Neel]
> What's the difference, exactly? AFAICT you need to save the execution
> state when suspending both coroutines and generators, but I might be
> missing something obvious....

Just that a generator (in the Icon sense) is a semi-coroutine: it always
suspends directly to its caller, at the point it was called. A world of
difference follows from that:

1) From the caller's point of view, it's an ordinary call. So on *that* end
of it, generators are no harder than vanilla procedure calls.

2) From the generator's point of view, "the execution state" is a single
frame (its own). A coroutine can transfer to any other at any time, with
intervening calls piling up an arbitrary number of frames in the state.

3) From a recursive interpreter's point of view, the implementation language
(C, in Python's case) can't get any of its own frames stuck between caller
and callee during suspension of a generator (the generator *returns* to its
caller; it doesn't *resume* it; this asymmetry is the heart of it).

It's #3 (dealing with an intertwined C stack) that requires
platform-specific hackery to make coexps work in Icon, while the lack of C
stack pollution is why generators can work without platform-specific cruft.

Continuations and threads are harder still to implement (efficiently) -- the
less restrictive the rules, the less the implementation can exploit to make
its life easier.

a-phenomenon-you-may-observe-again-some-day<wink>-ly y'rs - tim
Stackless & String-processing [ In reply to ]
In article <7mlvju$6v3$1@brick.cswv.com>, Neel Krishnaswami
<neelk@brick.cswv.com> wrote:

> In article <7mkrrt$a81$1@cronkite.cc.uga.edu>,
> Graham Matthews <graham@sloth.math.uga.edu> wrote:
> >
> >[Parser Combinators]
> >
> >They are something like what Neel is talking about (as the name
> >"parser combinators" suggests you write small "string parsers" as
> >functions, and then have "parser combinators" that stitch these
> >together). Note that in this parser combinator approach there is no
> >distinction between the code for "the pattern" and the code for "the
> >processing" -- they are all written in the same language
> >(Clean/Haskell in my case).
<snip>
> This really seems like a clean and straightforward way of doing
> things. My experience with recursive descent parsers has been pretty
> bad, but now I suspect that this is because I wasn't organizing my
> work in a systematic enough fashion.
Clean and straightforward it may be, but not efficient, and subject to
termination problems on left-recursive grammars. Techniques for
compiling regular expressions and deterministic context-free grammars
into efficient pattern-matching/parsing automata do not extend readily
to programs in a general purpose declarative (functional or logic)
programming language. Certain clever things can be done (in particular
see "The Functional Treatment of Parsing," by Rene Leermakers,
Kluwer Academic Publishers, 1993), but, as far as I know, nothing beats
the automata-theoretic optimization techniques for pattern matching and
parsing, of which the typical modern compiler for regular expressions
is just the best-known instance.

-- F
Stackless & String-processing [ In reply to ]
Neel Krishnaswami wrote:
>
> What's the difference, exactly? AFAICT you need to save the execution
> state when suspending both coroutines and generators, but I might be
> missing something obvious....

The crucial difference is that a generator is never resumed
after its caller has returned. This means that the generator's
state can be pushed onto the same stack as the caller's.

A coroutine, on the other hand, can outlive the context in
which it was created, and therefore needs a stack all of
its own.

Another way to think about it is that a generator call is
equivalent to an ordinary call where one of the parameters
is a procedure. For example, where in "generator python"
you might write

def even_numbers_up_to(n):
    for i in range(n):
        if i % 2 == 0:
            yield(i)

for e in even_numbers_up_to(42):
    print e, "is an even number!"

you could do the same thing in ordinary python as

def for_each_even_number_up_to(n, do_it):
    for i in range(n):
        if i % 2 == 0:
            do_it(i)

def print_it(e):
    print e, "is an even number!"

for_each_even_number_up_to(42, print_it)

which clearly can be executed quite happily using
a single stack.

If Python had something akin to Smalltalk code blocks,
generators wouldn't be needed. It would be nifty to
be able to write something like

for_each_even_number_up_to(42) -> (e):
    print e, "is an even number!"

Greg
Stackless & String-processing [ In reply to ]
Greg Ewing wrote:
>
> Neel Krishnaswami wrote:
> >
> > What's the difference, exactly? AFAICT you need to save the execution
> > state when suspending both coroutines and generators, but I might be
> > missing something obvious....
>
> The crucial difference is that a generator is never resumed
> after its caller has returned. This means that the generator's
> state can be pushed onto the same stack as the caller's.

Fine.

> A coroutine, on the other hand, can outlive the context in
> which it was created, and therefore needs a stack all of
> its own.

Smell of danger...

> Another way to think about it is that a generator call is
> equivalent to an ordinary call where one of the parameters
> is a procedure. For example, where in "generator python"
> you might write

...

> If Python had something akin to Smalltalk code blocks,
> generators wouldn't be needed. It would be nifty to
> be able to write something like
>
> for_each_even_number_up_to(42) -> (e):
>     print e, "is an even number!"

Traceback (innermost last):
File "<inbox:Re: Stackless & String-processing>", line 22, in claim
SemanticMisconceptionError:
things should be made as simple as possible, but not any simpler
>>> #:-)

>>> # sorry did you really mean that?
>>> # Where is your generator's state?
>>> # how would you do backtracking?

hoping-that-I'm-misconcepted-and-not-you-ly y'rs - chris

--
Christian Tismer :^) <mailto:tismer@appliedbiometrics.com>
Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net
10553 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
we're tired of banana software - shipped green, ripens at home
Stackless & String-processing [ In reply to ]
Graham Matthews <graham@sloth.math.uga.edu> wrote:
: >[Parser Combinators]
Neel Krishnaswami (neelk@brick.cswv.com) wrote:
: Thanks for the tip -- it led me to do some digging on the web and I
: found a nice description of this in a paper called "Monadic Parser
: Combinators" by Graham Hutton and Erik Meijer. Its URL is:
:
: http://www.cs.nott.ac.uk/Department/Staff/gmh/monparsing.ps

Yes this is a nice place to start. Another nice place to look is in
the Clean documentation. Somewhere in their docs (I think in the old
book) there is a Clean implementation of combinators (without all the
"do" notation).

The trickiest part for me was to get the error handling to work well,
since Clean/Haskell don't have exceptions. This should be "easier"
in Python, although you are welcome to see my code if you want. Also
feel free to email me questions.

Neel Krishnaswami (neelk@brick.cswv.com) wrote:
: This really seems like a clean and straightforward way of doing
: things. My experience with recursive descent parsers has been pretty
: bad, but now I suspect that this is because I wasn't organizing my
: work in a systematic enough fashion.

I had similar experiences with rec.descent parsers. By the way this
technique is not limited to parsing. I use essentially the same
code to implement a type checker too. A parser has input a string
or a file, and output an AST. The type checker has the same structure
as a parser but it's input (the thing it parses) is an AST and it's
output an AST annotated with types. Combinators are more general
than just parsing and patterns. They are a form of sequencing
construct.

graham

--
This ain't no technological breakdown
Oh no, this is the road to hell
Stackless & String-processing [ In reply to ]
Fernando Pereira (pereira@research.att.com) wrote:
: but, as far as I know, nothing beats
: the automata-theoretic optimization techniques for pattern matching and
: parsing, of which the typical modern compiler for regular expressions
: is just the best-known instance.

In what sense are you using the words "nothing beats". I am sure that
the automata-theoretic approach produces the fastest pattern matchers
and parsers. So on this sense "nothing beats them". But the class of
languages recognisable by automaton doesn't cover all the languages of
interest (in fact my guess is it covers very few ... context ...). So
in this sense lots of things "beat" the automata approach. And then
there are software engineering issues ...

graham

--
This ain't no technological breakdown
Oh no, this is the road to hell
Stackless & String-processing [ In reply to ]
Neel Krishnaswami (neelk@brick.cswv.com) wrote:
: I think that this weekend I'm going to try and put together a basic
: version of this parsing system in Python based on the ideas in this
: paper -- basically just write a simple AST class and the combinators
: they describe. Then I can add proper support for backtracking and lazy
: evaluation and all the stuff with serious hack value when Christian
: Tismer makes continuations available to us. :)

It seems to me that you might also be able to make the regular expression
module available through this interface. All you have to do is wrap the
regular expression up in a function/object that can be combined like the
other parsers. Indeed you will need "primitive parsers" for things like
numbers, and these are really just a special case of a "primitive"
regular expression parser.
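
Concretely, such a wrapping might look like the sketch below, assuming the
usual convention that a parser maps a text and a position to a list of
(value, new-position) results; the name regex is illustrative:

import re

def regex(pattern):
    # Wrap a compiled regular expression as a primitive parser.
    compiled = re.compile(pattern)
    def parse(text, pos):
        m = compiled.match(text, pos)
        if m:
            return [(m.group(0), m.end())]
        return []
    return parse

number = regex(r'[0-9]+')            # a primitive parser for integer literals
print number("42 apples", 0)         # [('42', 2)]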

graham

--
This ain't no technological breakdown
Oh no, this is the road to hell
Stackless & String-processing [ In reply to ]
[Neel Krishnaswami]
> ...
> I found a nice description of this in a paper called "Monadic Parser
> Combinators" by Graham Hutton and Erik Meijer. Its URL is:
> http://www.cs.nott.ac.uk/Department/Staff/gmh/monparsing.ps

[Graham Matthews]
> Yes this is a nice place to start. Another nice place to look is in
> the Clean documentation. Somewhere in their docs (I think in the old
> book) there is a Clean implementation of combinators (without all the
> "do" notation).

A nice paper on Clean parsing combinators can be downloaded as item 5 from
the "Case Studies" list at

http://www.cs.kun.nl/~clean/

Interestingly enough, the next paper on the list ("An Interpreter") declines
to use combinators for its parsing task, instead building a conventional
"by hand" recursive descent parser.

another-two-years-and-they'll-discover-lex/yacc<wink>-ly y'rs - tim
Stackless & String-processing [ In reply to ]
In article <7mnirc$t2b$2@cronkite.cc.uga.edu>, Graham Matthews
<graham@sloth.math.uga.edu> wrote:

> Fernando Pereira (pereira@research.att.com) wrote:
> : but, as far as I know, nothing beats
> : the automata-theoretic optimization techniques for pattern matching and
> : parsing, of which the typical modern compiler for regular expressions
> : is just the best-known instance.
>
> In what sense are you using the words "nothing beats". I am sure that
> the automata-theoretic approach produces the fastest pattern matchers
> and parsers. So on this sense "nothing beats them".
I made the sense clear. For regular and deterministic CF languages,
although my main use of finite-state techniques these days is in speech
and text processing, involving some less known and some new techniques
for combining and optimizing weighted finite-state transducers
<http://www.research.att.com/sw/tools/fsm/>.
> But the class of
> languages recognisable by automaton doesn't cover all the languages of
> interest (in fact my guess is it covers very few ... context ...).
In what sense of "very few"? Most programming languages are designed to
have an LR(k) or LL(k) backbone, and regular lexical structure (those
that aren't are cursed by their unhappy implementors). {S|HT|X}ML and
other Internet protocols obey similar or even stronger restrictions as
far as I know.

Naturally occurring sequences (text, speech, proteins, DNA, ...) do not
fit neatly into the standard formal language classes, and for those the
question is that of what language class can best approximate the string
distribution, which pushes one quickly to very restrictive language
classes (eg. bounded-context regular languages, or lexicalized
stochastic CFGs) learnable automatically from data.
> So
> in this sense lots of things "beat" the automata approach.
Opportunities for optimization => efficiency. Crucial in text and
speech indexing, Web search, biological sequence processing, event log
analysis, to name just a few.
> And then
> there are software engineering issues ...
Agreed. However, more traditional lexer- and parser-generators had
sensible answers to many of those issues, that might not be as elegant
as monadic approaches, but that paid serious attention to optimization
in the regular and CF backbones. By the way, I find it faintly amusing
to see the functional programming community rediscovering and dressing
in categorial garb a simple idea that was a main motivator for the
development of logic programming in the early 70s (see
<http://akpublic.research.att.com:9000/~pereira/bib.html#Pereira+Warren-
80:DCGs> for a survey). It *is* a good idea for rapid prototyping, but
interspersing pattern matching/parsing with arbitrary computations
(even purely functional ones) effectively blocks optimization, which is
required in large-scale applications.

-- F
Stackless & String-processing [ In reply to ]
Tim Peters (tim_one@email.msn.com) wrote:
: A nice paper on Clean parsing combinators can be downloaded as item 5
: >
: Interestingly enough, the next paper on the list ("An Interpreter") declines
: to use combinators for its parsing task, instead building a conventional
: "by hand" recursive descent parser.

Yes it's kind of strange that they don't use the combinators to build
the interpreter. One of Peyton-Jones' papers has a good (but small)
example of making an interpreter with combinators (it's probably the
first paper of his on monads).

Tim Peters (tim_one@email.msn.com) wrote:
: another-two-years-and-they'll-discover-lex/yacc<wink>-ly y'rs - tim

I'll take that as a BIG <wink>. Once you use these combinators you will
never go back to lex/yacc (unless you want LALR(n) grammars).

graham
--
The girls is all salty, the boys is all sweet,
the food ain't too shabby
an' they piss in the streets
In France, way down in France, way on down, way on down in France
Stackless & String-processing [ In reply to ]
<graham@sloth.math.uga.edu> wrote:
: > But the class of
: > languages recognisable by automaton doesn't cover all the languages of
: > interest (in fact my guess is it covers very few ... context ...).
Fernando Pereira (pereira@research.att.com) wrote:
: In what sense of "very few"? Most programming languages are designed to
: have an LR(k) or LL(k) backbone, and regular lexical structure

So the regular expressions help you with the regular lexical structure.
But they don't help with the LR(n) bit. They also don't help with context
dependencies. That leaves out a lot of languages.

<graham@sloth.math.uga.edu> wrote:
: > And then
: > there are software engineering issues ...
Fernando Pereira (pereira@research.att.com) wrote:
: Agreed. However, more traditional lexer- and parser-generators had
: sensible answers to many of those issues, that might not be as elegant
: as monadic approaches, but that paid serious attention to optimization
: in the regular and CF backbones.

I don't really see why using parser combinators means that you cannot
optimise. You are not precluded from using regular expression parsers
as primitive parsers, nor are you precluded from using LALR(n) parsers
as primitive parsers. Hence you can optimise those parts of your
parsing with automaton if you want. So I don't really understand your
criticism.

Also as I said parser combinators are really more of a sequencing
technique -- one that applies nicely to parsing.

Fernando Pereira (pereira@research.att.com) wrote:
: By the way, I find it faintly amusing
: to see the functional programming community rediscovering and dressing
: in categorial garb a simple idea that was a main motivator for the
: development of logic programming in the early 70s (see
: <http://akpublic.research.att.com:9000/~pereira/bib.html#Pereira+Warren-
: 80:DCGs> for a survey).

Yes combinators are much like definite clause grammars in prolog. But if
I remember correctly from my use of DCG's, they gave you a fixed set of
combinators? Moreover they were limited to string parsing. The combinator
approach in the fp world has neither limitation.

Fernando Pereira (pereira@research.att.com) wrote:
: It *is* a good idea for rapid prototyping, but
: interspersing pattern matching/parsing with arbitrary computations
: (even purely functional ones) effectively blocks optimization, which is
-------------------------------

Can you explain why? I don't see it, since you can have reg expr parsers
and other such fast parsers as primitive parsers that you combine ...

graham
--
The girls is all salty, the boys is all sweet,
the food ain't too shabby
an' they piss in the streets
In France, way down in France, way on down, way on down in France
Stackless & String-processing [ In reply to ]
In article <7mqaht$r27$2@cronkite.cc.uga.edu>, Graham Matthews
<graham@sloth.math.uga.edu> wrote:

> <graham@sloth.math.uga.edu> wrote:
> : > But the class of
> : > languages recognisable by automaton doesn't cover all the languages of
> : > interest (in fact my guess is it covers very few ... context ...).
> Fernando Pereira (pereira@research.att.com) wrote:
> : In what sense of "very few"? Most programming languages are designed to
> : have an LR(k) or LL(k) backbone, and regular lexical structure
>
> So the regular expressions help you with the regular lexical structure.
> But they don't help with the LR(n) bit. They also don't help with context
> dependencies. That leaves out a lot of languages.
By "automaton" you seem to have misunderstood "finite-state automaton."
I was referring more generally to determinizable classes of automata,
eg. determinizable PDAs. Hence my reference to LR and LL constructions,
which effectively attempt to determinize PDAs (and fail with conflicts
if they cannot).
>
> <graham@sloth.math.uga.edu> wrote:
> : > And then
> : > there are software engineering issues ...
> Fernando Pereira (pereira@research.att.com) wrote:
> : Agreed. However, more traditional lexer- and parser-generators had
> : sensible answers to many of those issues, that might not be as elegant
> : as monadic approaches, but that paid serious attention to optimization
> : in the regular and CF backbones.
>
> I don't really see why using parser combinators means that you cannot
> optimise. You are not precluded from using regular expression parsers
> as primitive parsers, nor are you precluded from using LALR(n) parsers
> as primitive parsers. Hence you can optimise those parts of your
> parsing with automaton if you want. So I don't really understand your
> criticism.
The monadic approach mixes up the search structure for pattern matching
or parsing (what is elsewhere represented as an automaton) with other
data processing. Unless your optimizer can solve the halting problem,
it will not be able to determinize otherwise determinizable control
structures because it cannot figure out the data dependencies.
>
> Also as I said parser combinators are really more of a sequencing
> technique -- one that applies nicely to parsing.
>
Sure, but so what? Automata, too, are a sequencing technique, usable for
many other things besides pattern matching and parsing. The fundamental
difference is that automata-theoretic approaches separate neatly the
representation of sequences from what one wants to do with the parts of
those sequences, allowing optimizations that are difficult to obtain
within general-purpose programs.
> Fernando Pereira (pereira@research.att.com) wrote:
> : By the way, I find it faintly amusing
> : to see the functional programming community rediscovering and dressing
> : in categorial garb a simple idea that was a main motivator for the
> : development of logic programming in the early 70s (see
> : <http://akpublic.research.att.com:9000/~pereira/bib.html#Pereira+Warren-
> : 80:DCGs> for a survey).
>
> Yes combinators are much like definite clause grammars in prolog. But if
> I remember correctly from my use of DCG's, they gave you a fixed set of
> combinators?
It's easy enough to define one's own combinators, but not as elegantly
as in a higher-order functional language.
>Moreover they were limited to string parsing.
Not true. There's nothing in DCGs that ties them to strings. The
underlying programming pattern (often called "difference lists") is in
fact a general accumulator pattern that has been used in many other
applications, eg. code generation.
> Fernando Pereira (pereira@research.att.com) wrote:
> : It *is* a good idea for rapid prototyping, but
> : interspersing pattern matching/parsing with arbitrary computations
> : (even purely functional ones) effectively blocks optimization, which is
> -------------------------------
>
> Can you explain why? I don't see it, since you can have reg expr parsers
> and other such fast parsers as primitive parsers that you combine ...
Either you put all the interesting stuff in separate primitive parsers,
in which case I'm not sure what's the point of the monadic approach, or
you combine parsing and other data processing more tightly, which takes
advantage of the monadic approach to achieve greater conciseness, but
then it is hard to optimize the parser because the other data
processing is in the way of the necessary analyses. Focusing on
determinization (a crucial part of regular and LR constructions), you
need to determine whether two alternative computation paths that
consume the same input sequence can be merged into a single path. In
general, this is undecidable since it requires figuring out whether two
arbitrary functions (those associated to those two paths) produce the
same output. Now, it might be interesting to see whether compiler
optimization and code rewriting techniques could be used in trying to
recognize possible optimizations, but the general point stands that
tangling pattern-matching or parsing with general computation makes
optimization very difficult.

-- F

Stackless & String-processing [ In reply to ]
<graham@sloth.math.uga.edu> wrote:
> I don't really see why using parser combinators means that you cannot
> optimise. You are not precluded from using regular expression parsers
> as primitive parsers, nor are you precluded from using LALR(n) parsers
> as primitive parsers. Hence you can optimise those parts of your
> parsing with automaton if you want. So I don't really understand your
> criticism.
Fernando Pereira (pereira@research.att.com) wrote:
: The monadic approach mixes up the search structure for pattern matching
: or parsing (what is elsewhere represented as an automaton) with other
: data processing. Unless your optimizer can solve the halting problem,
: it will not be able to determinize otherwise determinizable control
: structures because it cannot figure out the data dependencies.

You have totally misunderstood what I said. I said "you can have regular
expression parsers as primitive parsers". There is no need for the compiler
to "solve the halting problem". The programmer is free to use regular
expression/automaton techniques where he/she wants. Such parsers take
strings and return ASTs. They can then be combined with other parsers
as if they were primitive parsers. This is purely a programming issue
and has nothing to do with the halting problem, compiler optimisations,
etc. To make it clearer a programmer can do

p = some_parser_that_uses_automaton :: String -> Ast
q = some_parser_that_uses_some_other_technique :: String -> Ast
pq = p <&> q
where <&> is the combinator that sequences parsers

Where does the halting problem enter into this? The programmer has to write
the automaton code him/herself, but there are fp tools that do this, much
like for example Yacc.

<graham@sloth.math.uga.edu> wrote:
: > Also as I said parser combinators are really more of a sequencing
: > technique -- one that applies nicely to parsing.
Fernando Pereira (pereira@research.att.com) wrote:
: Sure, but so what? Automata, too, are a sequencing technique, usable for
: many other things besides pattern matching and parsing. The fundamental
: difference is that automata-theoretic approaches separate neatly the
: representation of sequences from what one wants to do with the parts of
: those sequences, allowing optimizations that are difficult to obtain
: within general-purpose programs.

I still don't understand why you think the combinator approach is so
inefficient, given that you can have automaton based parsers embedded
in it at will.

The point of saying that combinators are more of a sequencing tool than
a parsing tool is to point out that they are more general than automaton.
You can program much richer sequencing with combinators than you can
with automaton (not surprisingly). For those parts of your parser or
sequencing where you only need automaton you use automaton as outlined
above. But where you need more powerful sequencing you have it in the
combinator approach.
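
As one concrete example of sequencing that no fixed automaton-style pattern
gives you directly, sketched under the same parser-as-function convention
(all names illustrative): a count followed by exactly that many characters,
where the second parser is chosen from the value produced by the first.

def digits(text, pos):
    # Primitive parser: one or more decimal digits.
    end = pos
    while end < len(text) and text[end].isdigit():
        end = end + 1
    if end > pos:
        return [(text[pos:end], end)]
    return []

def take(n):
    # Primitive parser: exactly n characters.
    def parse(text, pos):
        if pos + n <= len(text):
            return [(text[pos:pos + n], pos + n)]
        return []
    return parse

def bind(p, f):
    # Combinator: run p, then run the parser f(value) on the remaining input.
    def parse(text, pos):
        results = []
        for (value, pos1) in p(text, pos):
            results.extend(f(value)(text, pos1))
        return results
    return parse

counted = bind(digits, lambda d: take(int(d)))
print counted("3abcdef", 0)          # [('abc', 4)]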


<graham@sloth.math.uga.edu> wrote:
: > Can you explain why? I don't see it, since you can have reg expr parsers
: > and other such fast parsers as primitive parsers that you combine ...
Fernando Pereira (pereira@research.att.com) wrote:
: Either you put all the interesting stuff in separate primitive parsers,
: in which case I'm not sure what's the point of the monadic approach,

Err isn't this what I said you do ... I said "you can have regular
expression parsers as primitive parsers". The point of the monadic
approach is that it allows you to cleanly program arbitrary sequencing
when you need it. Moreover it makes for sound software engineering,
etc. That too is what I said in the last mail ....

graham
--
The girls is all salty, the boys is all sweet,
the food ain't too shabby
an' they piss in the streets
In France, way down in France, way on down, way on down in France
Stackless & String-processing [ In reply to ]
In article <7mssne$m8r$1@cronkite.cc.uga.edu>, Graham Matthews
<graham@sloth.math.uga.edu> wrote:
> You have totally misunderstood what I said. I said "you can have regular
> expression parsers as primitive parsers". There is no need for the compiler
> to "solve the halting problem". The programmer is free to use regular
> expression/automaton techniques where he/she wants. Such parsers take
> strings and return ASTs. They can then be combined with other parsers
> as if they were primitive parsers.
One last try. I understood your point from the beginning. However, you
did not understand *mine*. If you push the complex
pattern-matching/parsing machinery into black boxes, there's not much
for the monadic glue to do, is there? And even then, some
optimizations, those involving the various black boxes, will be
blocked. Here's a simple example. Assume that there are two black boxes
A and B, and that you use monadic glue to say "A or B". A and B may be
deterministic, but the result will not be, and exhaustive search will
be required in the glue. Sure, I can recognize such cases by hand and
create a new black box "A or B" which is optimized, but that is an
error-prone hand optimization. And one may not notice subtler
inefficiencies.
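
In combinator terms the example looks like the sketch below (the names are
illustrative): the "A or B" glue cannot see inside the two black boxes, so
it runs both, even though one character of lookahead would have settled the
choice.

def string(s):
    # A "black box" primitive parser: match a fixed string.
    def parse(text, pos):
        if text[pos:pos + len(s)] == s:
            return [(s, pos + len(s))]
        return []
    return parse

def alt(p, q):
    # The "A or B" glue: both branches are always tried.
    def parse(text, pos):
        return p(text, pos) + q(text, pos)
    return parse

keyword = alt(string("begin"), string("end"))
print keyword("end;", 0)             # the "begin" branch runs (and fails) first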

-- F
Stackless & String-processing [ In reply to ]
[Graham Matthews]
> You have totally misunderstood what I said.
> ...

[Fernando Pereira]
> One last try. I understood your point from the beginning. However, you
> did not understood *mine*.
> ...

Trust me on this one, fellows: you each understand the other, and have for
several (interesting) msgs already. Graham has an unusual way of expressing
his appreciation for a good discussion <wink>.

http://www.deja.com/getdoc.xp?AN=316262255

leaping-thru-hoops-ly y'rs - tim
Stackless & String-processing [ In reply to ]
<graham@sloth.math.uga.edu> wrote:
> expression parsers as primitive parsers". There is no need for the compiler
> to "solve the halting problem". The programmer is free to use regular
> expression/automaton techniques where he/she wants. Such parsers take
> strings and return ASTs. They can then be combined with other parsers
> as if they were primitive parsers.
Fernando Pereira (pereira@research.att.com) wrote:
: One last try. I understood your point from the beginning. However, you
: did not understand *mine*. If you push the complex
: pattern-matching/parsing machinery into black boxes, there's not much
: for the monadic glue to do, is there?

What your combinators do depends on your application. If all you are
doing is parsing using primitive automaton based parsers then your
combinators won't have much to do. But if you are doing more than
that then they will have more to do, and hence be more complex. For
example you can use the same combinators you define for parsing to do
type checking, model state, etc -- things I would not want to do with
an automaton. That is why I said that combinators are really sequencing
constructs.

Fernando Pereira (pereira@research.att.com) wrote:
: And even then, some
: optimizations, those involving the various black boxes, will be
: blocked. Here's a simple example. Assume that there are two black boxes
: A and B, and that you use monadic glue to say "A or B". A and B may be
: deterministic, but the result will not be, and exhaustive search will
: be required in the glue. Sure, I can recognize such cases by hand and
: create a new black box "A or B" which is optimized, but that is an
: error-prone hand optimization. And one may not notice subtler
: inefficiencies.

Your point appears to be that if you were programming only using automaton
for parsing you wouldn't miss the above optimisation, but that somehow
using combinators means you would miss the above optimisation. You
and I must program differently then. If I were using automaton, even
in the context of also using combinators, I would be aware of the
potential for such optimisations and look for them.

graham
--
The girls is all salty, the boys is all sweet,
the food ain't too shabby
an' they piss in the streets
In France, way down in France, way on down, way on down in France
Stackless & String-processing [ In reply to ]
[Graham Matthews]
: > You have totally misunderstood what I said.
[Fernando Pereira]
: > One last try. I understood your point from the beginning. However, you
: > did not understand *mine*.
Tim Peters (tim_one@email.msn.com) wrote:
: Trust me on this one, fellows: you each understand the other, and have for
: several (interesting) msgs already. Graham has an unusual way of expressing
: his appreciation for a good discussion

Tim I don't think that we are agreeing: Fernando seems to want to
view combinators as parsing constructs and hence says things like
"the combinators don't have to do much if you are using automata
based black boxes" (to paraphrase). But combinators are more than
parsing constructs -- they are sequencing constructs used for a wide
variety of tasks where automata techniques are not available.

graham

--
The girls is all salty, the boys is all sweet,
the food ain't too shabby
an' they piss in the streets
In France, way down in France, way on down, way on down in France
Stackless & String-processing [ In reply to ]
In article <7mvika$cpj$1@cronkite.cc.uga.edu>, Graham Matthews
<graham@sloth.math.uga.edu> wrote:

> <graham@sloth.math.uga.edu> wrote:
> > expression parsers as primitive parsers". There is no need for the compiler
> > to "solve the halting problem". The programmer is free to use regular
> > expression/automaton techniques where he/she wants. Such parsers take
> > strings and return ASTs. They can then be combined with other parsers
> > as if they were primitive parsers.
> Fernando Pereira (pereira@research.att.com) wrote:
> : One last try. I understood your point from the beginning. However, you
> : did not understand *mine*. If you push the complex
> : pattern-matching/parsing machinery into black boxes, there's not much
> : for the monadic glue to do, is there?
>
> What your combinators do depends on your application. If all you are
> doing is parsing using primitive automaton based parsers then your
> combinators won't have much to do. But if you are doing more than
> that then they will have more to do, and hence be more complex. For
> example you can use the same combinators you define for parsing to do
> type checking, model state, etc -- things I would not want to do with
> an automaton. That is why I said that combinators are really sequencing
> constructs.
*What* I've been saying is that there is a generality-efficiency
tradeoff, and that the monadic approach goes for generality at the
expense of efficiency, by precluding optimizations in those very
important cases in which combinators are used for parsing. After all,
the papers cited earlier in this discussion were all about parsing, and
the thread was originally about string processing, coroutines and Icon
generators. Here's an interesting question. A context-free grammar can
be seen as a set of simultaneous recursive definitions in which the
RHSs are polynomials (where + is alternation and * is concatenation;
more generally, this can be expressed in terms of semirings and formal
power series, with some very interesting advantages). A regexp pattern
matcher is just a special case (modulo complications with non-regular
extensions like matching against pattern variables set by earlier \(
... \) ). Now, could the mappings of (special cases of) such grammars
to deterministic automata be extended to interesting cases of monadic
expressions? I have some vague ideas on how this might go based on
recent work on weighted transducer determinization by my colleague
Mehryar Mohri -- basically, one defines an algebraic structure over
"outputs" so that it makes sense to talk about the greatest common
factor of alternative outputs generated from the same input prefix, and
one can give sufficient conditions for determinizability.
>
> Fernando Pereira (pereira@research.att.com) wrote:
> : And even then, some
> : optimizations, those involving the various black boxes, will be
> : blocked. Here's a simple example. Assume that there are two black boxes
> : A and B, and that you use monadic glue to say "A or B". A and B may be
> : deterministic, but the result will not be, and exhaustive search will
> : be required in the glue. Sure, I can recognize such cases by hand and
> : create a new black box "A or B" which is optimized, but that is an
> : error-prone hand optimization. And one may not notice subtler
> : inefficiencies.
>
> Your point appears to be that if you were programming only using automaton
> for parsing you wouldn't miss the above optimisation, but that somehow
> using combinators means you would miss the above optimisation. You
> and I must program differently then. If I were using automaton, even
> in the context of also using combinators, I would be aware of the
> potential for such optimisations and look for them.
You might be, by looking at the code and changing it by hand. But a
program would not, because in general the optimizable skeleton of
sequence, alternation and Kleene closure combinators is interspersed
with arbitrary functional code. The point of automata-based compilation
of regexps and CFGs is that it is *automatic*. Clearly, a monadic
program that is a direct transcription of a regexp or a CFG could be
similarly compiled. But from my readings (those mentioned in the thread
and others) as well as your comments, the whole point of monadic
combinators for parsing is to interweave the grammatical structure with
other computations, thus making the underlying grammatical structure
less accessible to optimization. As I suggested above, it would be
interesting to define classes of monadic programs that still allow (an
appropriate extension of) those optimizations. But AFAIK that has not
been done.

-- F
Stackless & String-processing [ In reply to ]
[Tim, on Graham & Fernando's combinator thread]
> Trust me on this one, fellows: you each understand the other,
> and have for several (interesting) msgs already. ...

[Graham]
> Tim I don't think that we are agreeing.

Who said anything about agreeing? You two obviously disagree, but
disagreement doesn't imply misunderstanding. Heck, you've both repeated the
same unbudging positions so often even *I* understand what you're both
saying <wink>.

less-filling-tastes-great-ly y'rs - tim
Stackless & String-processing [ In reply to ]
<graham@sloth.math.uga.edu> wrote:
> What your combinators do depends on your application. If all you are
> doing is parsing using primitive automaton based parsers then your
> combinators won't have much to do. But if you are doing more than
> that then they will have more to do, and hence be more complex. For
> example you can use the same combinators you define for parsing to do
> type checking, model state, etc -- things I would not want to do with
> an automaton. That is why I said that combinators are really sequencing
> constructs.
Fernando Pereira (pereira@research.att.com) wrote:
: *What* I've been saying is that there is a generality-efficiency
: tradeoff, and that the monadic approach goes for generality at the
: expense of efficiency, by precluding optimizations in those very
: important cases in which combinators are used for parsing.

And I am saying that this claim that generality implies loss of
efficiency is a red herring. If you are writing a program that uses
combinator based parsing there is nothing to stop you using automaton
based techniques in your program. You just add the appropriate
automaton based primitives (eg. for regular expressions). You
might end up adding in so many automaton based primitives that
you don't need to do much with your combinators, but whether or
not that happens depends entirely on your application and your
needs. The point is that combinator based parsing doesn't prevent
you from using other more efficient techniques as and where
appropriate!

Have you ever written a combinator based parsing program?

<graham@sloth.math.uga.edu> wrote:
> Your point appears to be that if you were programming only using automaton
> for parsing you wouldn't miss the above optimisation, but that somehow
> using combinators means you would miss the above optimisation. You
> and I must program differently then. If I were using automaton, even
> in the context of also using combinators, I would be aware of the
> potential for such optimisations and look for them.
Fernando Pereira (pereira@research.att.com) wrote:
: You might be, by looking at the code and changing it by hand. But a
: program would not, because in general the optimizable skeleton of
: sequence, alternation and Kleene closure combinators is interspersed
: with arbitrary functional code.

Could you please point out to me where I claimed that "a program could
optimise your combinator program". If you can't find such a quote then
what is your point?

Fernando Pereira (pereira@research.att.com) wrote:
: But from my readings (those mentioned in the thread
: and others) as well as your comments, the whole point of monadic
: combinators for parsing is to interweave the grammatical structure with
: other computations, thus making the underlying grammatical structure
: less accessible to optimization. As I suggested above, it would be
: interesting to define classes of monadic programs that still allow (an
: appropriate extension of) those optimizations. But AFAIK that has not
: been done.

It hasn't been done, it won't likely be done, and I never said it
could be done. So what is your point?

graham
--
The girls is all salty, the boys is all sweet,
the food ain't too shabby
an' they piss in the streets
In France, way down in France, way on down, way on down in France
Stackless & String-processing [ In reply to ]
Tim Peters (tim_one@email.msn.com) wrote:
: Who said anything about agreeing? You two obviously disagree, but
: disagreement doesn't imply misunderstanding. Heck, you've both repeated the
: same unbudging positions so often even *I* understand what you're both
: saying <wink>.
:
: + other equally useful contributions.

What is the point of your post Tim? Getting your jollies from adding
useless material to a thread?

graham

--
The girls is all salty, the boys is all sweet,
the food ain't too shabby
an' they piss in the streets
In France, way down in France, way on down, way on down in France
Stackless & String-processing [ In reply to ]
Graham Matthews <graham@sloth.math.uga.edu> wrote:
> Tim Peters (tim_one@email.msn.com) wrote:
> : Who said anything about agreeing? You two obviously disagree, but
> : disagreement doesn't imply misunderstanding. Heck, you've both repeated the
> : same unbudging positions so often even *I* understand what you're both
> : saying <wink>.
> :
> : + other equally useful contributions.

> What is the point of your post Tim? Getting your jollies from adding
> useless material to a thread?

Actually this makes Tim go up in his political standing in this newsgroup,
if this is possible. :)

Actually, no. This kind of post is actually socially useful. By adding
some lightness to a heated discussion it's sometimes possible to prevent
nastiness from occurring.

Of course I shouldn't give such Grand Usenet Secrets away; my Pythonic
political standing is sure to drop. :)

Regards,

Martijn
Stackless & String-processing [ In reply to ]
If you're trying to parse something using a grammar
that needs heavy transformations to make it tractable,
then obviously using monadic parsers with arbitrary
code embedded in them is going to get in the way
of those transformations.

However, not all parsing problems are like that.
Programming languages are usually designed to be
straightforward to parse efficiently. Monadic
techniques can allow such parsers to be written
very elegantly.

Greg
Stackless & String-processing [ In reply to ]
[Tim]
> Who said anything about agreeing? You two obviously disagree, but
> disagreement doesn't imply misunderstanding. Heck, you've both
> repeated the same unbudging positions so often even *I* understand
> what you're both saying <wink>.

[Graham Matthews]
> : What is the point of your post Tim?

'Twas but a gentle attempt to break the thread out of the loop it appears to
be retracing now for the third time.

> Getting your jollies from adding useless material to a thread?

Sometimes, yes. Don't think it applies in this case, though.

> :
> : + other equally useful contributions.

My sig? There was nothing else of mine in the msg.

sarcasm-doesn't-work-on-imaginary-targets<wink>-ly y'rs - tim
Stackless & String-processing [ In reply to ]
In article <7n27r8$mm4$1@cronkite.cc.uga.edu>, Graham Matthews
<graham@sloth.math.uga.edu> wrote:
> Could you please point out to me where I claimed that "a program could
> optimise your combinator program". If you can't find such a quote then
> what is your point?
You didn't. The point, which is obvious to anyone who has actually done
serious pattern matching and parsing applications operating on
gigabytes of data, is that pattern matchers and parsers *must* be
optimized automatically (in the same way that programs in high-level
languages *must* be optimized automatically) because humans are just
not good enough at doing hand optimization of complex programs. Insofar
as a style of writing pattern-matchers/parsers is incompatible with
automatic optimization, it is only useful for toy examples, or, at
best, for proofs of concept (although in my experience a proof of
concept that doesn't deal with efficiency issues is not much of a
proof).
>
> Fernando Pereira (pereira@research.att.com) wrote:
> : But from my readings (those mentioned in the thread
> : and others) as well as your comments, the whole point of monadic
> : combinators for parsing is to interweave the grammatical structure with
> : other computations, thus making the underlying grammatical structure
> : less accessible to optimization. As I suggested above, it would be
> : interesting to define classes of monadic programs that still allow (an
> : appropriate extension of) those optimizations. But AFAIK that has not
> : been done.
>
> It hasn't been done, it won't likely be done, and I never said it
> could be done. So what is your point?
That it might be interesting to understand under what conditions it
could be done, building on the work I mentioned in my earlier posting
as well as on the work on attribute grammars, which is directly
relevant. Of course, if what is at stake is religion rather than
science, such curiosity may be frowned upon. Oh well... Enough said.

-- F
Stackless & String-processing [ In reply to ]
[Fernando Pereira]
> ...
> But from my readings (those mentioned in the thread and others) as well
> as your comments, the whole point of monadic combinators for parsing is
> to interweave the grammatical structure with other computations,

That was at least part of the point, the rest being the perceived advantages
of using the same notation (language) for both parsing and semantic
processing. Based on your work with weighted automata, I'd guess you'd be
the last person on earth to dismiss the advantages of a unifying framework
<wink>.

> thus making the underlying grammatical structure less accessible to
> optimization.

Whereas your approaches have factored things in such a way as to make them
more accessible to optimization.

If it isn't clear yet <wink>, efficiency isn't driving Graham here. He's
found a cool new approach, and is having fun with it.

Different strokes. Most of my parsing life has been concerned with
artificial languages, and there I've got a different complaint about
intermixing syntax with semantics: the grammar is interesting for its own
sake, and it's a positive burden at times to clutter it up with semantic
annotations!

Case in point: Python's grammar is formally defined by the file
Grammar/Grammar in the source distribution. That's the input to a parser
generator. At 60 non-noise lines, it's quite possible to digest the whole
thing, and minor changes in syntax can be effected without touching anything
else (how minor? exactly what doesn't require changing anything else
<wink>).

This is a joy as it is; even extending it to a conventional attributed
grammar would get in the way of the purposes to which the grammar file is
*usually* put.

I haven't tried using parser combinators, but in reading the papers the
question I had over and over is whether they scale well in practice, whether
measured by speed or maintainability (they eventually answered the "speed"
one: no). It didn't bode well that each new example seemed to require
creating a new combinator, and then you've got the implementation of *that*
to worry about on top of the target grammar and the semantics. It seemed to
carry the flavor of programming in an assembly language for "a real parsing
engine" (as Icon often feels to me in contrast to SNOBOL4 as grammars get
large). "You can have any sequencing you want, but you have to build it
yourself."

> As I suggested above, it would be interesting to define classes of
> monadic programs that still allow (an appropriate extension of) those
> optimizations. But AFAIK that has not been done.

It would. Not for me, though <wink>.

lex/yacc-looking-better-all-the-time-ly y'rs - tim
Stackless & String-processing [ In reply to ]
[Fernando Pereira]
> But from my readings (those mentioned in the thread and others) as well
> as your comments, the whole point of monadic combinators for parsing is
> to interweave the grammatical structure with other computations,
Tim Peters (tim_one@email.msn.com) wrote:
: If it isn't clear yet <wink>, efficiency isn't driving Graham here. He's
: found a cool new approach, and is having fun with it.

No that's not the case. Efficiency drives all my programs *when it's a
necessity*. My point is that using combinators:

a) is a very nice way of structuring a parser since it is very amenable
to change and robust under change.

b) doesn't have to cost anything in efficiency since you can freely mix
combinators with other parsing techniques.

c) is a general programming technique useful in a vast range of programs
outside of just parsing. For example I have used it for type checking
and writing a GUI. The literature has a lot of other examples of this
approach.

Moreover the point of combinators is not to "interweave the grammatical
structure with other computations". Combinators do do that, and I think
it's a great benefit. But the point is to:

a) clearly separate sequencing code from other code. In a combinator
program the precise control flow of the program is clear.

b) allow programmers to program their own sequencing combinators (rather
than providing some fixed set that the language designer hopes will
be enough.)

Tim Peters (tim_one@email.msn.com) wrote:
: Different strokes. Most of my parsing life has been concerned with
: artificial languages, and there I've got a different complaint about
: intermixing syntax with semantics: the grammar is interesting for its own
: sake, and it's a positive burden at times to clutter it up with semantic
: annotations!

I have been using combinators for some time now and never noticed the
"grammar" being cluttered with all these "semantic annotations". What
annotations one requires can easily be separated from the grammar
either by sensible layout, or by the design of clever combinators that
hide semantic annotations away.
Tim Peters (tim_one@email.msn.com) wrote:
: Case in point: Python's grammar is formally defined by the file
: Grammar/Grammar in the source distribution. That's the input to a parser
: generator. At 60 non-noise lines, it's quite possible to digest the whole
: thing, and minor changes in syntax can be effected without touching anything
: else (how minor? exactly what doesn't require changing anything else

Combinators are very robust under change. I have a parser written using
combinators and I often tweak the grammar, without requiring large
changes to other code or the parser itself.

Have you ever written a combinator based parser or are you just
"theorising"?

Tim Peters (tim_one@email.msn.com) wrote:
: I haven't tried using parser combinators, but in reading the papers the
: question I had over and over is whether they scale well in practice, whether
: measured by speed or maintainability (they eventually answered the "speed"
: one: no).

Ah there's the rub -- you haven't written a combinator based parser, so
your comments are just what you *think* might happen if you did.

I don't know where you got the idea that speed is a problem! For some
kinds of parsing I wouldn't use combinators; rather, I would use
automaton techniques. But for most kinds of parsing they are plenty
fast enough.

Tim Peters (tim_one@email.msn.com) wrote:
: It didn't bode well that each new example seemed to require
: creating a new combinator, and then you've got the implementation of *that*
: to worry about on top of the target grammar and the semantics. It seemed to
: carry the flavor of programming in an assembly language for "a real parsing
: engine" (as Icon often feels to me in contrast to SNOBOL4 as grammars get
: large). "You can have any sequencing you want, but you have to build it
: yourself."

Let me guess Tim -- you have read two papers on combinators. One is the
Hutton/Meijer paper on "monadic parser combinators", and the other is
the chapter in the old Clean book. Each new example in these papers
requires a different combinator because each new example illustrates
some new sequencing idea -- e.g. determinisation, continuation, ...
The papers are illustrating the generality of the techniques, and hence
you expect to see a lot of different combinators. In practice however
(something you wouldn't know about right?) you don't use all these
combinators. I have written a parser for a new programming language
and I used 10 combinators (5 of which I re-used in the type checker!).
Yes you do have to build your own combinators (that's the point of the
approach!), but once done you stick them in a library and re-use them.

graham:
--
The girls is all salty, the boys is all sweet,
the food ain't too shabby
an' they piss in the streets
In France, way down in France, way on down, way on down in France
Stackless & String-processing [ In reply to ]
Hi friends,

My understanding of this thread is limited and reduces
to the following sketch:

Graham Matthews wrote:
> [Fernando Pereira] [arguing]
> [Tim Peters] [digging in]
[Graham] [ elaborating ]
> [Tim Peters] [syntax mixed with sematix]
[Graham] [ elaborating ]
> Tim Peters [insists]
[Graham] [ combinatores are robust, is Tim theorising? ]
> Tim Peters [ papers, not tried ]
[Graham] [ made a point, Tim is theorising ]
> Tim Peters [ combinators as assembly language ]
[Graham, now at Hutton/Meijer and the old Clean book]

I can't see an end to this and would like to understand more.

I believe this thread would not have evolved if I had not
begun the Stackless Python / continuation/coroutine/generator
talk. Now I'm asking you to excuse my limited understanding.

Besides myself, I think there are more interested readers
who would be happy to know what you are talking about.
I may assume that you would use private email if this
were not your intent. Also I'm sure that we'll see its
relevance to Python soon.

Can you please be all a little more concrete?
How would your patterns look in Python?
Please show some small example code pieces.

This would also be valuable input for my implementation,
and would give me more reason to finish it.

gimme-a-reason-to-read-then-I'll-give-you-a-reason-to-write - ly y'rs

--
Christian Tismer :^) <mailto:tismer@appliedbiometrics.com>
Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net
10553 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
we're tired of banana software - shipped green, ripens at home
Stackless & String-processing [ In reply to ]
[Tim]
> ...
> If it isn't clear yet <wink>, efficiency isn't driving Graham
> here. He's found a cool new approach, and is having fun with it.

[Graham Matthews]
> No that's not the case.

OK, my mistake: you didn't find it, it's not cool, it's not new, and/or
you're not having fun <wink>.

> Efficiency drives all my programs *when it's a necessity*.

Ack, I'm speaking English here. If efficiency isn't your *primary* concern,
then efficiency isn't "driving" you. Dictionary. Look up. Efficiency is
Fernando's primary concern wrt parsing approaches -- it drives him. That's
how you two differ. It's important to note that efficiency really is a
necessity in his applications (have you read any of his papers? no claim
makes sense out of context).

> My point is that using combinators:
>
> a) is a very nice way of structuring a parser since it is very amenable
> to change and robust under change.

Could well be.

> b) doesn't have to cost anything in efficiency since you can freely mix
> combinators with other parsing techniques.

Like programming in Python doesn't have to cost anything in efficiency since
you can freely mix Python with other languages? It's a true enough
argument, but useless in practice unless 5% of the code accounts for 95% of
the runtime. Which it often *does* in general-purpose Python apps. For
parsing, it's not clear what kind of sequencing you could want that isn't
already covered by the "other parsing techniques" that are presumably doing
95% of the (presumed to be) efficiency-critical computation. Have something
concrete in mind?

> c) is a general programming technique useful in a vast range of programs
> outside of just parsing. For example I have used it for type checking
> and writing a GUI. The literature has a lot of other examples of this
> approach.

I didn't see anyone argue against that. This thread started with string
processing, and everyone else has stuck to that.

> Moreover the point of combinators is not to "interweave the grammatical
> structure with other computations". Combinators do do that, and I think
> it's a great benefit.

Except that depends on the app: sometimes a benefit, sometimes not. I've
used systems of both kinds, and neither is a pure win.

> But the point is to:
>
> a) clearly separate sequencing code from other code. In a combinator
> program the precise control flow of the program is clear.

In a way deeper than it is in a functional program not using monads? IOW,
is the clarity of "control flow" due to monads specifically or just to
functional semantics in general? AFAIK, "combinator" hasn't sprouted a new
meaning beyond its old "higher-order curried function" one.

> b) allow programmers to program their own sequencing combinators (rather
> than providing some fixed set that the language designer hopes will
> be enough.)

WRT parsing specifically, what kinds of desirable sequencing do you find
lacking in other approaches? Any parsing framework caters more-than-less
directly to juxtaposition, choice, optionality, repetition and (at least
conceptual) backtracking. I understand you can model all those with
combinators. What I haven't seen is an example of something clearly useful
(wrt parsing specifically) *beyond* those.
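
For concreteness -- and purely as a sketch, with every name below invented
on the spot rather than taken from anybody's library -- here's roughly what
I mean by "those", under the Hutton/Meijer convention that a parser maps an
input string to a list of (value, remaining-input) pairs:

def seq(p, q):
    # juxtaposition: p followed by q, pairing up their results
    def parse(s):
        results = []
        for (v1, rest1) in p(s):
            for (v2, rest2) in q(rest1):
                results.append(((v1, v2), rest2))
        return results
    return parse

def alt(p, q):
    # choice: every parse of p, plus every parse of q
    def parse(s):
        return p(s) + q(s)
    return parse

def optional(p, default=None):
    # optionality: p where it succeeds, plus the "absent" reading
    def parse(s):
        return p(s) + [(default, s)]
    return parse

def many(p):
    # repetition: zero or more p's (p must consume input to terminate)
    def parse(s):
        results = [([], s)]
        for (v, rest) in p(s):
            for (vs, rest2) in many(p)(rest):
                results.append(([v] + vs, rest2))
        return results
    return parse

Backtracking falls out of keeping *all* the (value, rest) pairs around
instead of committing to the first one.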

> ...
> I have been using combinators for some time now and never noticed the
> "grammar" being cluttered with all these "semantic annotations".

We often see most clearly that which we hate <wink>. I saw 'em all over the
place in the papers.

> What annotations one requires can easily be separated from the grammar
> either by sensible layout, or by the design of clever combinators that
> hide semantic annotations away.

I mentioned indirectly that what drives me is variously "speed or
maintainability" in parsing systems. "clever" is at odds with the latter --
unless all the cleverness comes built in to the system once & for all.

> ...
> Combinators are very robust under change. I have a parser written using
> combinators and I often tweak the grammar, without requiring large
> changes to other code or the parser itself.

That's encouraging!

> Have you ever written a combinator based parser or are you just
> "theorising"?

You ask that immediately before quoting my

> : I haven't tried using parser combinators, but in reading the papers ...

? Prophetic <wink>.

> Ah there's the rub -- you haven't written a combinator based parser, so
> your comments are just what you *think* might happen if you did.

Oh yes! Parsing with combinators is a Porsche. I've driven all kinds of
cars for thirty years, in various terrains and in all kinds of weather; even
built a few engines from metal I smolt myself <wink>, that other people
relied on. I used functional languages extensively too, although very
lightly this decade. So when I read the Porsche owner's manual it's not
exactly blind imagination driving my concerns. It's certainly no substitute
for a test drive, though. I installed Clean with the aim of doing that, but
had too much fun poking at other aspects of the system to get around to it.

> I don't know where you got the idea that speed is a problem!

*Can* be a problem. Fernando covered that exhaustively, and you actually
haven't disagreed with his points for all your words to the contrary <wink>.
I also have much other experience crafting elegant functional solutions that
had to be tossed in practice for the same kinds of "can't automatically
optimize a huge nest of clever higher-order applications" reasons.

> For some kinds of parsing I wouldn't use combinators; rather, I would use
> automaton techniques. But for most kinds of parsing they are plenty
> fast enough.

"Most" depends on what you're doing. For most things I do for fun, even
SNOBOL4 was plenty fast enough (and I don't doubt for an instant that even a
casually tossed-together combinator-based program is faster than that!).
Half the things I do for work deal with artificial languages that are
specifically designed for one-token lookahead techniques, because they're
designed for peak speed over megabytes of input; the other half is parsing
huge ambiguous grammars on cheap PCs via dynamic programming techniques, and
there we *claw* to save bytes and milliseconds, no holds barred.

Different strokes. I believe I'd enjoy combinator approaches for my own
fun; I still doubt they'd be practical for my work (but I'd rather try that
than continue arguing about it <0.9 wink>).

> ...
> Each new example in these papers requires a different combinator because
> each new example illustrates some new sequencing idea -- e.g.
> determinisation, continuation, ...
> The papers are illustrating the generality of the techniques, and hence
> you expect to see a lot of different combinators.

Yes, and I confess to over-panicking at what I indeed should have expected.

> In practice however (something you wouldn't know about right?) you don't
> use all these combinators.

Which casts some doubt on the great utility of being able to create new
ones. Parsing is an old problem. If only a handful of combinators are of
actual use in parsing problems, freeze the semantics, package them in a
library, and implement a pleasant parser generator on top of them <wink>.

> I have written a parser for a new programming language and I used 10
> combinators (5 of which I re-used in the type checker!).
> Yes you do have to build your own combinators (that's the point of the
> approach!), but once done you stick them in a library and re-use them.

That's fine for a one-man show. "Maintainability" in my life cuts across
programmers, and it's Not Good for each person to reinvent their own basic
machinery -- even to have the ability to do so. One reason I like Python is
that it *doesn't* let the jerk down the hall invent their own control
structures <0.5 wink>.

I will give parser combinators a try; they do sound interesting to me. More
interesting would be to hear about that "new programming language"!

i-believe-you-can-parse-it<wink>-ly y'rs - tim
Stackless & String-processing [ In reply to ]
Tim Peters wrote:
>
> In a way deeper than it is in a functional program not using monads? IOW,
> is the clarity of "control flow" due to monads specifically or just to
> functional semantics in general?

Monads are really only a big deal in the context of
functional languages, where there is no such thing
as control flow. Monads provide an astoundingly
elegant way of writing what is effectively procedural
code in a functional language. If you've ever used
one of the pre-monad systems for doing I/O in a
functional language, you'll appreciate what a
massive achievement this is.

However, Python is already a procedural language,
so there's no need to use monads there to fake
procedural code. It would be as pointless as
putting training wheels on a tricycle.

With regard to parser combinators, there are various
things at work in the functional language context that
don't apply to Python.

Using monads and parser combinators in a functional
language, you can write parsers that look very much
like a BNF grammar or whatever your favourite notation
is -- *without* needing any special tool to compile
and execute them.

This is largely due to the general sparseness of the
syntax, and the ability to define new infix operators,
neither of which apply to Python. You can't translate
the code into Python without losing most of the notational
convenience which accounts for a large part of the
technique's attractiveness.

The idea of composing parsing functions can still
be useful, however. For instance you might have a
function with the signature

def p_separated_list(p_item, p_separator):

where p_item and p_separator are parsers for
list items and separator tokens. Which incidentally
could be an answer to

> What I haven't seen is an example of something clearly useful
> (wrt parsing specifically) *beyond* those.

Note that the implementation of p_separated_list
need not be quite the standard version you might
find in a library. In a parser for Python code, for
example, it would take care of that quirky comma-at-
the-end-of-a-sequence thing.
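
A rough sketch of one possible implementation (the allow_trailing flag,
the helper names in the usage comment, and the simplified convention --
each parser takes a token list and returns either a (value,
remaining-tokens) pair or None on failure -- are all just assumptions made
for the sketch):

def p_separated_list(p_item, p_separator, allow_trailing=0):
    # Parser for:  item (separator item)*  [separator]
    def parse(tokens):
        first = p_item(tokens)
        if first is None:
            return None
        items, rest = [first[0]], first[1]
        while 1:
            sep = p_separator(rest)
            if sep is None:
                break
            nxt = p_item(sep[1])
            if nxt is None:
                if allow_trailing:      # the trailing-comma quirk
                    rest = sep[1]
                break
            items.append(nxt[0])
            rest = nxt[1]
        return (items, rest)
    return parse

# e.g., given hypothetical p_identifier and p_comma parsers:
#   p_arglist = p_separated_list(p_identifier, p_comma, allow_trailing=1)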

> I will give parser combinators a try; they do sound interesting to me. More
> interesting would be to hear about that "new programming language"!

If you're interested in examples of monad use for
parsing and compiling, I had a fit of insanity last
year in which I implemented a compiler in Hugs using
monads for several different things, including parsing.
If you'd like to see it, I could dig it out and put it
on a web page somewhere.

Greg
Stackless & String-processing [ In reply to ]
In article <3795F4E4.8330755F@appliedbiometrics.com>,
Christian Tismer <tismer@appliedbiometrics.com> wrote:
>
>I believe this thread would not have evolved if I had not
>begun the Stackless Python / continuation/coroutine/generator
>talk. Now I'm asking you to excuse my limited understanding.
>
>Can you please be all a little more concrete?
>How would your patterns look in Python?
>Please show some small example code pieces.

I am working through the Hutton/Meijer paper, and (slowly!)
implementing the parsers they describe. Basically, the important part
isn't the "monadic" part, but the "combinator" part. The idea is that
you can/should write primitive parsers, and then write higher-order
functions to weave those little parsers into a bigger parser. It's a
nice way to write recursive descent parsers without sinking into
spaghetti code, basically.

(This is kind of hard to do right in Python, because the lack of
lexical scoping and easy-to-use metaclasses makes it trickier to mimic
the combinators they describe exactly. But it's still a fun project.)

One of the consequences of this approach is that outputting all the
parses for an ambiguous grammar as a stream of parse trees is very
straightforward. In Haskell or Gofer, you could rely on the laziness
of the language to avoid unnecessary computations, but in Python
it would be not-fun to wait for Python to compute the whole list.

Here's a (simple, rough-draft) combinator I've written:

class Bind:
    def __init__(self, parser1, func):
        self.parser1 = parser1
        self.func = func            # func(v :: AST) -> Parser; e.g. Result

    def __call__(self, input):
        parse_list = self.parser1(input)
        lst = []
        for ast, s in parse_list:
            foo = self.func(ast)(s)
            for elt in foo:
                lst.append(elt)     # <-- a generator would help a lot here!
        return lst
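
To see Bind in action, here's a rough sketch with a few toy primitives in
the Hutton/Meijer style (Result is the parser alluded to in the comment
above; Fail, Item and symbol are made-up names for this sketch; a parser
maps an input string to a list of (value, remaining-input) pairs):

class Result:
    # succeed immediately with a fixed value, consuming no input
    def __init__(self, value):
        self.value = value
    def __call__(self, input):
        return [(self.value, input)]

class Fail:
    # the parser that never succeeds
    def __call__(self, input):
        return []

class Item:
    # consume exactly one character
    def __call__(self, input):
        if input:
            return [(input[0], input[1:])]
        return []

def symbol(ch):
    # match one particular character
    def check(c, ch=ch):            # default arg in lieu of lexical scoping
        if c == ch:
            return Result(c)
        return Fail()
    return Bind(Item(), check)

# 'a' followed by 'b', returned as a pair:
p_ab = Bind(symbol('a'), lambda a:
            Bind(symbol('b'), lambda b, a=a: Result((a, b))))
print(p_ab("abc"))                  # -> [(('a', 'b'), 'c')]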

I would love to rewrite the __call__ method so that it returns a
generator that returns parses one at a time rather than returning a
list. Note that the combinator approach magnifies the benefits of
generators, since parsers are normally built through composition, and
each sub-parser can potentially generate a sequence of parses.
Laziness is a real win here.
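
As a sketch of what that might look like -- assuming a hypothetical
"yield" facility of the kind this thread is after, which suspends the
function and hands back one value at a time:

class LazyBind:
    # Like Bind above, but produces (ast, remaining-input) pairs one at
    # a time instead of building the whole list up front.
    def __init__(self, parser1, func):
        self.parser1 = parser1
        self.func = func
    def __call__(self, input):
        for ast, s in self.parser1(input):
            for elt in self.func(ast)(s):
                yield elt

A consumer that only wants the first successful parse then only pays for
the work needed to find it.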

Also, it might make sense to add a xreadlines() method to file
objects, which returns the lines of a file one at a time, because then
you can write code in the natural style of:

for line in file.xreadlines():
    blah_blah_blah(line)

without having to read the whole file into memory. (It would
also likely eliminate about 80% of the push for assignment
expressions.)
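
In the meantime, much the same effect can be had with the old for-loop
protocol (a sketch; LazyLines is a made-up name, not a proposal for the
real file object):

class LazyLines:
    # Hand back lines one at a time without reading the whole file.
    # The for loop calls __getitem__ with 0, 1, 2, ... and stops when
    # IndexError is raised.
    def __init__(self, fileobj):
        self.fileobj = fileobj
    def __getitem__(self, index):
        line = self.fileobj.readline()
        if not line:                # '' means end of file
            raise IndexError(index)
        return line

# for line in LazyLines(open("huge.txt")):
#     blah_blah_blah(line)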


Neel
Stackless & String-processing [ In reply to ]
[Tim]
: > If it isn't clear yet <wink>, efficiency isn't driving Graham
: > here. He's found a cool new approach, and is having fun with it.
[Graham Matthews]
: > Efficiency drives all my programs *when it's a necessity*.
[Tim]
: Ack, I'm speaking English here. If efficiency isn't your *primary* concern,
: then efficiency isn't "driving" you. Dictionary. Look up.

No, Tim, I suggest you look up the dictionary. The dictionary definition of
"driven" does not preclude one being driven by more than one concern (and
indeed by several concerns of equal importance). For example, I am driven
by efficiency, correctness, and expressiveness in roughly equal measure.
So efficiency is driving me. Perhaps the motto here is that he who lives
by the dictionary shall die by the dictionary.

At this point I am signing off this thread. It is no longer remotely
relevant to the group. I refer all questions about combinators to
Tim, who, although he has never written a combinator based parser,
appears to know everything about them. Oh, and he can look up words in
the dictionary too, though perhaps not understand the definition he finds.

graham

--
Black velvet in that little boy's smile
Black velvet and a soft southern sky
A new religion that'll bring you to your knees
Black velvet, if you please
Stackless & String-processing [ In reply to ]
Christian Tismer wrote:
>
> def backtrack():
>     a = next_alternative(list)
>     b = next_alternative(otherlist)
>     c = next_alternative(listwhatever)
>     if specialcondition(a, b, c):
>         return 42
>     fail()

This might not do what you seem to think
it will do under Icon semantics. For each value
generated for a, you will get all the values of
b, etc., so you'll get the cross product of
the three lists. If you were expecting a parallel
iteration, that's not what you'll get.

This illustrates why I don't like this sort of
implicit generator syntax -- often it's not obvious
what it's going to do. Switching languages for a
moment, rewriting the above in Scheme it comes
out as:

(define (backtrack cont)
  (for-each-alternative list
    (lambda (a)
      (for-each-alternative otherlist
        (lambda (b)
          (for-each-alternative listwhatever
            (lambda (c)
              (if (specialcondition a b c)
                  (cont 42)
                  '()))))))))

which makes it perfectly clear that you have
three nested loops.

Currently there is no elegant way to write that
sort of thing in Python. Either you have to split
the thunks off into separate procedures, or put up
with a seriously crippled lambda syntax. Either
way, you get namespace problems. That's why I
suggested an extension, which would allow you to
write

def backtrack(cont):
    for_each_alternative(list) -> (a):
        for_each_alternative(otherlist) -> (b):
            for_each_alternative(listwhatever) -> (c):
                if specialcondition(a, b, c):
                    cont(42)

which is actually one line shorter than the
generator version, and (to my way of thinking)
the control flow is clearer.

Greg