Mailing List Archive

Re: Source Filters, the next generation
Paul Marquess wrote :
|| The scope of a filter is limited to the current file *only*. So any
|| files that get included using 'require' or 'use' will not be affected.
||
|| A Filter remains active until
||
|| 1 the end of the file
|| 2 *IF* the filter has been coded to support it, a filter can be
|| removed using 'no'.
||
|| I would imagine that *most* filters will use 1 above.

Is there not also:

3 the filter removes itself

That would be useful for cases where the format of the filtered
text includes an explicit termination method. For example, a
uudecode filter would terminate itself when it found the final
'end' line in its input stream.
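
A minimal sketch of case 3, assuming a simplified stand-in for the filter machinery rather than the real interface. The names make_reader, SelfRemoving and run_source are made up for illustration, and upper-casing stands in for the real decoding work:

```perl
use strict;
use warnings;

# Hypothetical stand-in for filter_read: one source line per call,
# status 1 while lines remain, 0 at end of file.
sub make_reader {
    my @lines = @_;
    return sub {
        my ($buffer) = @_;
        return 0 unless @lines;
        $$buffer = shift @lines;
        return 1;
    };
}

package SelfRemoving;

sub new {
    my ($class, $read) = @_;
    return bless { read => $read, active => 1 }, $class;
}

# Upper-cases each line until a line consisting of 'end' arrives, then
# marks itself inactive so the driver stops calling it.
sub filter {
    my ($self, $buffer) = @_;
    my $status = $self->{read}->($buffer);
    if ($status) {
        if ($$buffer =~ /^end$/) {
            $self->{active} = 0;    # case 3: the filter removes itself
            $$buffer = '';          # swallow the terminator line
        }
        else {
            $$buffer = uc $$buffer;
        }
    }
    return $status;
}

package main;

# Hypothetical driver: bypasses the filter once it has removed itself.
sub run_source {
    my $read   = make_reader(@_);
    my $filter = SelfRemoving->new($read);
    my $out    = '';
    while (1) {
        my $line   = '';
        my $status = $filter->{active}
                   ? $filter->filter(\$line)
                   : $read->(\$line);
        last unless $status;
        $out .= $line;
    }
    return $out;
}

print run_source("abc\n", "def\n", "end\n", "ghi\n");
```

Everything after the 'end' line reaches the parser unfiltered, which is the behaviour a uudecode-style filter would want.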

--
Maybe we can fix reality one of these days. | John Macdonald
<- Larry Wall Carl Dichter -> | jmm@Elegant.COM
I'd be happy if we could just index it. |
Re: Source Filters, the next generation [ In reply to ]
> From: Paul Marquess <pmarquess@bfsec.bt.co.uk>
>
> From: Tim Bunce <Tim.Bunce@ig.co.uk>
> >
> > > From: Paul Marquess <pmarquess@bfsec.bt.co.uk>
> > >
> > > Below is a complete source filter which will substitute every occurrence
> > > of the string "Today" into the string "Tomorrow". 'new' is used to
> > > instantiate a new copy of the source filter and the 'filter' method is
> > > the place where all the filtering takes place. I'm only concerned here
> > > about the operation of the filter method.
> > >
> > > package Future ;
> > >
> > > sub new { bless [] }
> > >
> > > sub filter
> > > {
> > >     my ($self, $buffer) = @_ ;
> > >     my ($status) ;
> > >
> > >     if ($status = filter_read($buffer)) {
> > >         $$buffer =~ s/Today/Tomorrow/g
> > >     }
> > >     $status ;
> > > }
> > > 1 ;
> > >
> > > The $buffer parameter is a reference to a scalar. The filter is
> > > expected to return the source (after filtering it) via this reference.
> > >
> > > filter_read is used by the filter to say "give me another line". It
> > > expects its first parameter to be a reference to a scalar as well and,
> > > as I'm sure you have guessed, it returns the source line via that
> > > reference.
> > >
> > > The use of references seems quite efficient (it does seem to cut down
> > > on the amount of unnecessary copying of buffers) but I'm a bit uneasy
> > > about it.
> > >
> > > So the question is - can anyone think of an interface which is cleaner
> > > but is still efficient? Having lived with the current interface for a
> > > while I think I could live with the way I've done it, but I would
> > > like to improve it if possible.
> > >
> > I think my original suggestion was to localise and use $_:
> >
> > sub filter
> > {
> >     my $self = shift ;
> >     my $status ;
> >
> >     if ($status = filter_read()) {
> >         s/Today/Tomorrow/g
> >     }
> >     $status ;
> > }
> > 1 ;
> >
> > I'm not quite sure what the right save_*() or SAVE*() function/macro to
> > use is (maybe someone else does, should be easy to find) but I think
> > this is a good approach.
>
> Ah yes, I remember now.
>
> The localised $_ model looks great for trivial filters, like the one
> above. I just wonder if it would be appropriate for a filter which
> required even a little bit more logic... then again, it might be ok :-)
>
You could still pass the reference in. Trivial filters would ignore it,
complex ones could use it to store state if needed.

> Either way, I think it's worth investigating, so I could do with a few
> hints on how to go about it.
>
> Here is my first attempt. This code is used to call the 'filter'
> method. The parts to do with manipulating $_ are highlighted.
>
> /* remember the current idx */
> av_push(idx_stack, newSViv(idx)) ;
> /* save current $_ */ <==========
> av_push(def_stack, GvSV(defgv)) ; <==========
> /* make $_ use our buffer */ <==========
> GvSV(defgv) = OUTPUT_SV(my_sv) ; <==========
>
> [...]
>
> Although the above code does seem to work ok, I would really like some
> feedback on its correctness.

I think the av_push's and explicit stacks should be changed to use
the right save_*() or SAVE*() function/macros from scope.c (I wish
they were documented) and thus make implicit use of perl's own stacks.
If you don't do this you must add G_EVAL to perl_call_method()
otherwise a die will mess up your stacks.
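
The hazard can be illustrated at the Perl level, as an analogy only: `local` relies on perl's own save stack, much as the save_*()/SAVE*() facilities do from C. This is not the XS code under discussion, and the names @def_stack, with_explicit_stack and with_local are made up for the sketch:

```perl
use strict;
use warnings;

our $buffer = 'caller value';
my @def_stack;

# Explicit save/restore: the restore is skipped if the callback dies.
sub with_explicit_stack {
    my ($callback) = @_;
    push @def_stack, $buffer;
    $buffer = 'filter buffer';
    $callback->();
    $buffer = pop @def_stack;
    return;
}

# local: perl restores the old value when the scope exits, even via die.
sub with_local {
    my ($callback) = @_;
    local $buffer = 'filter buffer';
    $callback->();
    return;
}

eval { with_explicit_stack(sub { die "filter failed\n" }) };
my $after_explicit = $buffer;             # value was never restored
my $leftover       = scalar @def_stack;   # stale entry left behind

$buffer = 'caller value';
eval { with_local(sub { die "filter failed\n" }) };
my $after_local = $buffer;                # restored automatically

print "explicit: $after_explicit (stack depth $leftover)\n";
print "local:    $after_local\n";
```

With the explicit stack, a die inside the callback leaves both the variable and the private stack in the wrong state unless the caller traps it, which is the role G_EVAL would play on the C side.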

> > But why do it in perl? Why not find a public domain CPP and include that?
> > Much less effort and faster too.
>
> Good point, anyone know of a public domain cpp implementation we could
> use?
>
Not off hand. Suggestions anyone?

> Paul
>
Tim.
Re: Source Filters, the next generation [ In reply to ]
>>>>> "Paul" == Paul Marquess <pmarquess@bfsec.bt.co.uk> writes:
In article <9508180943.AA21266@claudius.bfsec.bt.co.uk> pmarquess@bfsec.bt.co.uk (Paul Marquess) writes:


Paul> Below is a complete source filter which will substitute every
Paul> occurrence of the string "Today" into the string
Paul> "Tomorrow". 'new' is used to instantiate a new copy of the
Paul> source filter and the 'filter' method is the place where all
Paul> the filtering takes place. I'm only concerned here about the
Paul> operation of the filter method.

Paul> package Future ;

Paul> sub new { bless [] }

Paul> sub filter
Paul> {
Paul>     my ($self, $buffer) = @_ ;
Paul>     my ($status) ;
Paul>
Paul>     if ($status = filter_read($buffer)) {
Paul>         $$buffer =~ s/Today/Tomorrow/g
Paul>     }
Paul>     $status ;
Paul> }

Paul> 1 ;

This scheme works very well for simple transformations, but not that
well for transformations that don't preserve line boundaries (like
preprocessors from a different, application-specific language). In such
contexts the speed advantage might even become a disadvantage. On the
other hand, these preprocessors will probably be slow anyway.


On a slightly different point, how will different layers of
preprocessors work, like a compressed ppp input ? Will this work out
of the box, or does it need thinking about?

Jost

--
Jost Krieger, Rechenzentrum der Ruhr-Universitaet Bochum
Jost.Krieger@rz.ruhr-uni-bochum.de
C=de;AD=d400;PD=ruhr-uni-bochum;OU=rz;OU=ruba;S=Krieger;G=Jost;
Re: Source Filters, the next generation [ In reply to ]
Jost Krieger wrote :
||
|| This scheme works very well for simple transformations, but not that
|| well for transformations that don't preserve line boundaries (like
|| preprocessors from a different, application-specific language). In such
|| contexts the speed advantage might even become a disadvantage. On the
|| other hand, these preprocessors will probably be slow anyway.
||
|| On a slightly different point, how will different layers of
|| preprocessors work, like a compressed ppp input ? Will this work out
|| of the box, or does it need thinking about?

The complicated transform would use a separate buffer variable
for its "input" and its "output". It would extract lines,
words, or bytes (whichever was most appropriate) from its input
buffer, and transform them into lines (or what ever) in its
output buffer. It would return the output buffer whenever it
seemed sensible to do so, and it would refill its input buffer
whenever necessary.
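
A minimal sketch of that arrangement, assuming an upstream that delivers fixed-size blocks with no regard for line boundaries. The names make_block_reader, LineAssembler and collect_lines are hypothetical, and reassembling lines stands in for a real transform:

```perl
use strict;
use warnings;

# Hypothetical upstream reader: delivers the source in fixed-size blocks
# that ignore line boundaries. Returns 1 with data, 0 at end of input.
sub make_block_reader {
    my ($text, $size) = @_;
    return sub {
        my ($buffer) = @_;
        return 0 unless length $text;
        $$buffer = substr($text, 0, $size, '');
        return 1;
    };
}

package LineAssembler;

sub new {
    my ($class, $read) = @_;
    return bless { read => $read, input => '' }, $class;
}

# Returns one complete line per call via $$output, refilling the private
# input buffer from upstream only when it holds no complete line.
sub filter {
    my ($self, $output) = @_;
    while ($self->{input} !~ /\n/) {
        my $block = '';
        last unless $self->{read}->(\$block);
        $self->{input} .= $block;
    }
    return 0 unless length $self->{input};
    # Take everything up to and including the first newline, or the
    # unterminated remainder at end of input.
    $self->{input} =~ s/\A(.*?\n|.+)//s;
    $$output = $1;
    return 1;
}

package main;

sub collect_lines {
    my ($text, $size) = @_;
    my $filter = LineAssembler->new(make_block_reader($text, $size));
    my @lines;
    my $line;
    push @lines, $line while $filter->filter(\$line);
    return @lines;
}

print join('|', collect_lines("one\ntwo\nthree", 4)), "\n";
```

The input buffer lives in the filter object between calls, so leftover bytes from one block are simply carried over to the next call.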

Multiple layers work out of the box, to any depth - it has been
thought about.
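
Layering can be pictured as each filter reading from the one below it. The sketch below assumes a simplified closure-based chain, not the actual implementation; make_reader and layer are hypothetical names:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the bottom of the chain: one line per call,
# status 1 while lines remain, 0 at end of file.
sub make_reader {
    my @lines = @_;
    return sub {
        my ($buffer) = @_;
        return 0 unless @lines;
        $$buffer = shift @lines;
        return 1;
    };
}

# Wrap an upstream reader with a per-line transform. The result has the
# same calling convention, so layers can be stacked to any depth.
sub layer {
    my ($upstream, $transform) = @_;
    return sub {
        my ($buffer) = @_;
        my $status = $upstream->($buffer);
        $transform->($buffer) if $status;
        return $status;
    };
}

my $chain = layer(
    layer(
        make_reader("Today is fine\n"),
        sub { ${ $_[0] } =~ s/Today/Tomorrow/g },   # innermost runs first
    ),
    sub { ${ $_[0] } = uc ${ $_[0] } },             # then the outer layer
);

my $result = '';
my $line   = '';
$result .= $line while $chain->(\$line);
print $result;
```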

--
Maybe we can fix reality one of these days. | John Macdonald
<- Larry Wall Carl Dichter -> | jmm@Elegant.COM
I'd be happy if we could just index it. |
Re: Source Filters, the next generation [ In reply to ]
> From: jmm@elegant.com (John Macdonald)
>
> Paul Marquess wrote :
> || The scope of a filter is limited to the current file *only*. So any
> || files that get included using 'require' or 'use' will not be affected.
> ||
> || A Filter remains active until
> ||
> || 1 the end of the file
> || 2 *IF* the filter has been coded to support it, a filter can be
> || removed using 'no'.
> ||
> || I would imagine that *most* filters will use 1 above.
>
> Is there not also:
>
> 3 the filter removes itself
>
> That would be useful for cases where the format of the filtered
> text includes an explicit termination method. For example, a
> uudecode filter would terminate itself when it found the final
> 'end' line in its input stream.

Quite true, forgot about that one.

Paul
Re: Source Filters, the next generation [ In reply to ]
> From: jmm@elegant.com (John Macdonald)
>
> Paul Marquess wrote :
> || First off, I'm thinking of changing the way a Perl Source Filter (i.e.
> || a Source Filter written in Perl) gets installed. For those of you who
> || haven't been paying attention, it currently works like this
> [ ... ]
> || Although this works fine, it doesn't seem quite right. What I think would
> || be better is for the end user to be able to type
> ||
> || use myfilter ;
> ||
> || and have the filter module 'attach' itself to the source filter
> || low level code. Something along these lines:
> [ ... ]
> || I personally think that this is a *much* better interface. Any
> || comments before I go and implement it?
>
> Excellent. This means that a filter can be invisibly changed from
> a pipelined invocation of an external program to an internal perl
> or C subroutine without causing any change in how it gets invoked
> by a user's program - which *shouldn't* have to care about the
> way the filter happened to be implemented. Good idea.

To be honest, I hadn't thought of the other types of filter - the perl
one was the only one I was thinking of.

Now that you mention it though, I do think it would be a good idea to
allow the implementation to be hidden.

To allow that to work properly, how about if I restructure the filters
along these lines:

Low level filter interface: Filter::call, Filter::exec and direct C interface.

These are not meant to be called directly by the end user and would
involve redoing Filter::exec to work more like the new Filter::call
interface.

Application Filters: Filter::cpp, Filter::tee and a work-alike Filter::exec.

This then raises the question of naming the filters. The low level ones
could get renamed

Filter::util::call
Filter::util::exec

Does this sound like overkill?

> || Below is a complete source filter which will substitute every occurrence
> || of the string "Today" into the string "Tomorrow". 'new' is used to
> || instantiate a new copy of the source filter and the 'filter' method is
> || the place where all the filtering takes place. I'm only concerned here
> || about the operation of the filter method.
>
> I expect that there will be two types of filters - textual
> transforms and binary manipulations. The s/Today/Tomorrow/ is
> an example of a textual transform. It is characterized by
> getting text lines as input and making some changes to that
> input "in place" and not needing to keep much state and input
> data around from one invocation to the next. Something like
> uncompress is an example of a binary manipulation - it gets
> binary data with no fixed "lines" involved, and can have some
> trailing data from one block that cannot be processed until it
> goes on to read the next block (yet it has already filtered
> enough output data from the block that there is no need to get
> another input block immediately).

I agree.

> It might be useful to provide a second form of filter routine
> for the textual filter type. It could have the calling
> environment already call filter_read before the filter routine
> is called - so that buffer already contains input data to be
> transformed. The filter routine would just do the transform, on
> the buffer in place. So, your sample Future filter would be
> something like:
>
> || package Future ;
> ||
> || sub new { bless [] }
> ||
> || sub transform_filter
> || {
> ||     my ($self, $buffer) = @_ ;
> ||
> ||     $$buffer =~ s/Today/Tomorrow/g
> || }
> ||
> || 1 ;
>
> (It might even be better to have transform filters be called as
> subroutines rather than as methods, and not be given the $self
> argument.)
>
> The point of all this is to make textual transform filters
> *very* simple to write by hiding all of the interface stuff up
> in the calling area (and possibly making it faster too in the
> process).
>
> The current filter interface would still be used for binary
> manipulation filters that might have to use a different buffer
> for reading than what it fills with filtered data, and it would
> have an object available from the method interface - for keeping
> any history necessary like expansion tables for uncompress,
> definitions for cpp, etc.

Interesting. I'll stick it in the source filters todo list.
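
A minimal sketch of how the calling environment might drive such a transform-only filter, assuming a simplified stand-in for filter_read. The names make_reader, call_transform_filter and run_source are hypothetical:

```perl
use strict;
use warnings;

package Future;

sub new { return bless [], shift }

# Transform-only filter: the buffer already holds a line when called,
# so there is no read and no status handling here.
sub transform_filter {
    my ($self, $buffer) = @_;
    $$buffer =~ s/Today/Tomorrow/g;
    return;
}

package main;

# Hypothetical stand-in for filter_read: one line per call, status 1
# while lines remain, 0 at end of file.
sub make_reader {
    my @lines = @_;
    return sub {
        my ($buffer) = @_;
        return 0 unless @lines;
        $$buffer = shift @lines;
        return 1;
    };
}

# Hypothetical calling environment: it does the read itself and hands
# the filled buffer to the transform only when there is data.
sub call_transform_filter {
    my ($filter, $read, $buffer) = @_;
    my $status = $read->($buffer);
    $filter->transform_filter($buffer) if $status;
    return $status;
}

sub run_source {
    my $read   = make_reader(@_);
    my $filter = Future->new;
    my ($out, $line) = ('', '');
    $out .= $line while call_transform_filter($filter, $read, \$line);
    return $out;
}

print run_source("Today is fine\n", "See you Today\n");
```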


Paul
Re: Source Filters, the next generation [ In reply to ]
> From: Tim Bunce <Tim.Bunce@ig.co.uk>
>
>
> > From: Paul Marquess <pmarquess@bfsec.bt.co.uk>
> >
> > First off, I'm thinking of changing the way a Perl Source Filter (i.e.
> > a Source Filter written in Perl) gets installed. For those of you who
> > haven't been paying attention, it currently works like this
> >
> > use Filter::call qw(myfilter args) ;
> >
> > where 'myfilter' is the name of the Perl Source Filter module and
> > 'args' are any optional parameters that the filter may require. The
> > user's filter module then has this sort of structure
> >
> > package myfilter ;
> > sub new { bless [] }
> > sub filter { code to filter source }
> >
> > Although this works fine, it doesn't seem quite right. What I think would
> > be better is for the end user to be able to type
> >
> > use myfilter ;
> >
> > and have the filter module 'attach' itself to the source filter
> > low level code. Something along these lines:
> >
> > package myfilter ;
> > use Filter::call ;
> > sub filter { ... }
> > sub import { attach to Filter::call }
> >
> Fine by me.
>
> > This means that if the fabled Perl pre-processor (ppp) ever gets
> > implemented it can be used like this
> >
> > use ppp ;
> >
> > rather than this
> >
> > use Filter::call 'ppp' ;
> >
> > I personally think that this is a *much* better interface. Any
> > comments before I go and implement it?
> >
> Nope. Go for it!

ok.

> > Below is a complete source filter which will substitute every occurrence
> > of the string "Today" into the string "Tomorrow". 'new' is used to
> > instantiate a new copy of the source filter and the 'filter' method is
> > the place where all the filtering takes place. I'm only concerned here
> > about the operation of the filter method.
> >
> > package Future ;
> >
> > sub new { bless [] }
> >
> > sub filter
> > {
> >     my ($self, $buffer) = @_ ;
> >     my ($status) ;
> >
> >     if ($status = filter_read($buffer)) {
> >         $$buffer =~ s/Today/Tomorrow/g
> >     }
> >     $status ;
> > }
> > 1 ;
> >
> >
> > The $buffer parameter is a reference to a scalar. The filter is
> > expected to return the source (after filtering it) via this reference.
> >
> > filter_read is used by the filter to say "give me another line". It
> > expects its first parameter to be a reference to a scalar as well and,
> > as I'm sure you have guessed, it returns the source line via that
> > reference.
> >
> > The use of references seems quite efficient (it does seem to cut down
> > on the amount of unnecessary copying of buffers) but I'm a bit uneasy
> > about it.
> >
> > So the question is - can anyone think of an interface which is cleaner
> > but is still efficient? Having lived with the current interface for a
> > while I think I could live with the way I've done it, but I would
> > like to improve it if possible.
> >
> I think my original suggestion was to localise and use $_:
>
> sub filter
> {
>     my $self = shift ;
>     my $status ;
>
>     if ($status = filter_read()) {
>         s/Today/Tomorrow/g
>     }
>     $status ;
> }
> 1 ;
>
> I'm not quite sure what the right save_*() or SAVE*() function/macro to
> use is (maybe someone else does, should be easy to find) but I think
> this is a good approach.

Ah yes, I remember now.

The localised $_ model looks great for trivial filters, like the one
above. I just wonder if it would be appropriate for a filter which
required even a little bit more logic... then again, it might be ok :-)

Either way, I think it's worth investigating, so I could do with a few
hints on how to go about it.

Here is my first attempt. This code is used to call the 'filter'
method. The parts to do with manipulating $_ are highlighted.

dSP ;
int count ;


/* remember the current idx */
av_push(idx_stack, newSViv(idx)) ;

/* save current $_ */ <==========
av_push(def_stack, GvSV(defgv)) ; <==========
/* make $_ use our buffer */ <==========
GvSV(defgv) = OUTPUT_SV(my_sv) ; <==========

PUSHMARK(sp) ;
XPUSHs(PERL_OBJECT(my_sv)) ;
PUTBACK ;
count = perl_call_method("filter", 0) ;
SPAGAIN ;
if (count != 1)
croak("Filter::call - %s::filter returned %d values, 1 was expected \n",
PERL_MODULE(my_sv), count) ;

n = POPi ;
if (fdebug)
warn("status = %d, length op buf = %d\n",
n, SvCUR(OUTPUT_SV(my_sv))) ;
PUTBACK ;

sv_free(av_pop(idx_stack)) ;

/* restore $_ */
GvSV(defgv) = av_pop(def_stack) ; <============


and filter_read gets the $_ buffer like this

SV * buffer = GvSV(defgv) ;

Although the above code does seem to work ok, I would really like some
feedback on its correctness.


> > So... what do you think about the idea of having a ppp? I *might* be
> > prepared to give this a go, *if* there is enough interest.
> >
> Yes please. Certainly doing #define's, #if's and #ifdef's should be
> straightforward. Doing macro substitutions with parameters might be a
> little more tricky!

*If* I do it, it would only be #if #elsif #else #endif #define and
#undef. I wouldn't even attempt macros with parameters.
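
A rough sketch of what such a restricted preprocessor might look like, not a design for ppp itself. It makes several simplifying assumptions: `#if NAME` is treated as "NAME is currently defined" (no expression evaluation), `#elsif` is left out, and the function name preprocess is made up:

```perl
use strict;
use warnings;

# Returns the lines of @source that survive conditional processing.
sub preprocess {
    my @source = @_;
    my %defined;
    my @active = (1);    # stack: is the current region being emitted?
    my @output;

    for my $line (@source) {
        if ($line =~ /^#(if|else|endif|define|undef)\b\s*(\w*)/) {
            my ($directive, $name) = ($1, $2);
            if ($directive eq 'if') {
                push @active,
                    ($active[-1] && exists $defined{$name}) ? 1 : 0;
            }
            elsif ($directive eq 'else') {
                die "#else without #if\n" if @active < 2;
                # Flip only when the enclosing region is itself active.
                $active[-1] = ($active[-2] && !$active[-1]) ? 1 : 0;
            }
            elsif ($directive eq 'endif') {
                die "#endif without #if\n" if @active < 2;
                pop @active;
            }
            elsif ($active[-1]) {
                if ($directive eq 'define') { $defined{$name} = 1 }
                else                        { delete $defined{$name} }
            }
            next;    # directive lines are never emitted
        }
        push @output, $line if $active[-1];
    }
    die "missing #endif\n" if @active > 1;
    return @output;
}

print preprocess(
    "#define DEBUG\n",
    "#if DEBUG\n", "print 1;\n", "#else\n", "print 2;\n", "#endif\n",
    "#undef DEBUG\n",
    "#if DEBUG\n", "print 3;\n", "#endif\n",
    "print 4;\n",
);
```

One wrinkle any Perl preprocessor has to face is that `#` also starts a Perl comment, so a comment line beginning `#if` would be taken as a directive by this sketch.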

> But why do it in perl? Why not find a public domain CPP and include that?
> Much less effort and faster too.

Good point, anyone know of a public domain cpp implementation we could
use?

Only thing with using an existing cpp implementation is interfacing to
it. The simplest way would be to use Filter::exec, but I would kinda
like to avoid the sub-process method.

Making an existing cpp work with the filter_read model might be a
non-trivial task.

Paul
Re: Source Filters, the next generation [ In reply to ]
In <9508211734.aa11068@post.demon.co.uk>
On Mon, 21 Aug 1995 16:54:14 +0100
Tim Bunce <Tim.Bunce@ig.co.uk> writes:
>>
>> Good point, anyone know of a public domain cpp implementation we could
>> use?
>>
>Not off hand. Suggestions anyone?
>
It is GPL'ed, but gcc-2.7.* introduces a cpp 'library'; worryingly, it
is not used by default, except for fixincludes.
Re: Source Filters, the next generation [ In reply to ]
> From: Paul Marquess <pmarquess@bfsec.bt.co.uk>
>
> The more I think about the use of $_ for the buffer, the more I like it.
>
:-)

> I *might* make one change though. Currently filter_read explicitly
> returns the status and implicitly returns the buffer in $_. If
> filter_read is called in an array context, it could return the
> status/buffer pair explicitly.
>
> ($status, $buffer) = filter_read
>
> Similarly the filter method itself could either return only the status
> (with the buffer implicitly in $_) or a status/buffer pair.
>
> That could allow more complex filters to avoid possible confusion with
> the use of $_.
>
> What do you think?
>
Is it worth it? (To be honest, at 4:30am I'm past caring :-)

Umm, if list context is optional and filters are stacked you might
need to juggle things ($_/$buffer) around in the interface layer
between two filters (ramble ramble). Either that or enforce
'list context' == 'return two values' else croak.

I'd be tempted to avoid the issue/complexity. Keep it simple etc.
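
For reference, the context-sensitive return being weighed here could be sketched with wantarray. This filter_read is a hypothetical stand-in that reads from an in-memory list, not the real routine:

```perl
use strict;
use warnings;

my @source = ("Today is Monday\n", "Today is busy\n");

# Hypothetical filter_read: in list context it returns the explicit
# (status, buffer) pair; in scalar context it returns only the status
# and leaves the line in $_.
sub filter_read {
    my $status = @source ? 1 : 0;
    my $line   = $status ? shift @source : '';
    return ($status, $line) if wantarray;
    $_ = $line;
    return $status;
}

# List-context caller: no reliance on $_.
my ($list_status, $list_buffer) = filter_read();
$list_buffer =~ s/Today/Tomorrow/g;

# Scalar-context caller: works on a localised $_.
my ($scalar_status, $scalar_buffer);
{
    local $_;
    $scalar_status = filter_read();
    s/Today/Tomorrow/g if $scalar_status;
    $scalar_buffer = $_;
}

print $list_buffer, $scalar_buffer;
```

The two calling styles coexist in one routine, but every layer between stacked filters would need the same dual behaviour, which is the juggling described above.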

> OK, I think I've got the SAVE* stuff figured out.
>
Great! Care to document it?

:-)

> Paul
>
Tim.