Mailing List Archive

Re: [Patches] selfnanny.py: checking for "self" in every method
On Sat, 4 Mar 2000, Guido van Rossum wrote:

> Before we all start writing nannies and checkers, how about a standard
> API design first?

I thoroughly agree -- we should have a standard API. I tried to write
selfnanny so it could be callable from any API possible (e.g., it can
take either a file, a string, an ast or a tuple representation).

> I will want to call various nannies from a "Check"
> command that I plan to add to IDLE.

Very cool: what I imagine is a sort of modular PyLint.

> I already did this with tabnanny,
> and found that it's barely possible -- it's really written to run like
> a script.

Mine definitely isn't: it's designed to run both like a script and like
a module. One outstanding bug: no docos. To be supplied upon request <0.5
particular nanny is worth while.

> Since parsing is expensive, we probably want to share the parse tree.

Yes. Probably as an AST, and transform to tuples/lists inside the
checkers.

> Ideas?

Here's a strawman API:
There's a package called Nanny
Every module in that package should have a function called check_ast.
Its argument is an AST object, and its output should be a list
of three-tuples: (line-number, error-message, None) or
(line-number, error-message, (column-begin, column-end)) (each tuple can
be a different form).
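
A minimal sketch of what one module in such a package might contain. It assumes today's standard ast module as a stand-in for the parser-module AST discussed in this thread, and the check itself (first argument must be named self) is only an illustration of the return shape, not selfnanny's actual code:

```python
import ast


def check_ast(tree):
    """Return complaints about methods whose first argument is not 'self'.

    Each complaint is a three-tuple in the shape the strawman describes:
    (line-number, error-message, None) or
    (line-number, error-message, (column-begin, column-end)).
    """
    complaints = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.ClassDef):
            continue
        for item in node.body:
            if not isinstance(item, ast.FunctionDef):
                continue
            args = item.args.posonlyargs + item.args.args
            if not args:
                # Nothing to point at, so report the line number only.
                complaints.append(
                    (item.lineno,
                     "method %r takes no arguments" % item.name,
                     None))
            elif args[0].arg != "self":
                first = args[0]
                # Here the column range of the offending name is known.
                complaints.append(
                    (first.lineno,
                     "first argument of %r is %r, not 'self'"
                     % (item.name, first.arg),
                     (first.col_offset, first.end_col_offset)))
    return complaints


if __name__ == "__main__":
    source = "class A:\n    def bad(this, x): pass\n"
    for complaint in check_ast(ast.parse(source)):
        print(complaint)
```

This naive version would also flag static and class methods; a real checker would need to look at decorators before complaining.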

Problems?
(I'm CCing to python-dev. Please follow up to that discussion to
python-dev only, as I don't believe it belongs in patches)
--
Moshe Zadka <mzadka@geocities.com>.
http://www.oreilly.com/news/prescod_0300.html
Re: selfnanny.py / nanny architecture
> > Guido van Rossum wrote:
> > Before we all start writing nannies and checkers, how about a standard
> > API design first?

> Moshe Zadka wrote:
> Here's a strawman API:
> There's a package called Nanny
> Every module in that package should have a function called check_ast.
> Its argument is an AST object, and its output should be a list
> of three-tuples: (line-number, error-message, None) or
> (line-number, error-message, (column-begin, column-end)) (each tuple can
> be a different form).

Greg Wilson wrote:

The SUIF (Stanford University Intermediate Format) group has been working
on an extensible compiler framework for about ten years now. The
framework is based on an extensible AST spec; anyone can plug in a new
analysis or optimization algorithm by writing one or more modules that
read and write decorated ASTs. (See http://suif.stanford.edu for more
information.)

Based on their experience, I'd suggest that every nanny take an AST as an
argument, and add complaints in place as decorations to the nodes. A
terminal nanny could then collect these and display them to the user. I
think this architecture will make it simpler to write meta-nannies.
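
A rough sketch of the decorate-in-place idea, assuming Python's standard ast module rather than SUIF's format. The attribute name complaints and all function names are hypothetical, chosen only for illustration:

```python
import ast


def annotate(node, message):
    """Attach a complaint to a node in place.

    'complaints' is a made-up attribute name, not part of the ast module.
    """
    if not hasattr(node, "complaints"):
        node.complaints = []
    node.complaints.append(message)


def bare_except_nanny(tree):
    """Example nanny: decorate every bare 'except:' handler."""
    for node in ast.walk(tree):
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            annotate(node, "bare except clause")


def terminal_nanny(tree):
    """Collect all decorations as (line-number, message) pairs."""
    found = []
    for node in ast.walk(tree):
        for message in getattr(node, "complaints", []):
            # Some nodes carry no line number; fall back to 0 for those.
            found.append((getattr(node, "lineno", 0), message))
    return sorted(found)
```

Any number of nannies can be run over the same tree before the terminal nanny is called, and a meta-nanny could inspect or filter the decorations left by earlier ones.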

I'd further suggest that the AST be something that can be manipulated
through DOM, since (a) it's designed for tree-crunching, (b) it's already
documented reasonably well, (c) it'll save us re-inventing a wheel, and
(d) generating human-readable output in a variety of customizable formats
ought to be simple (well, simpler than the alternatives).

Greg
RE: [Patches] selfnanny.py: checking for "self" in every method
[Guido van Rossum]
> Before we all start writing nannies and checkers, how about a standard
> API design first? I will want to call various nannies from a "Check"
> command that I plan to add to IDLE. I already did this with tabnanny,
> and found that it's barely possible -- it's really written to run like
> a script.

I like Moshe's suggestion fine, except with an abstract base class named
Nanny with a virtual method named check_ast. Nannies should (of course)
derive from that.
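
One way the base-class variant could look, sketched with the modern abc and ast modules. Only the names Nanny and check_ast come from the proposal; check_source and LambdaNanny are hypothetical additions for illustration:

```python
import ast
from abc import ABC, abstractmethod


class Nanny(ABC):
    """Hypothetical base class formalizing the one-method interface."""

    @abstractmethod
    def check_ast(self, tree):
        """Return a list of (line-number, error-message, columns-or-None)."""

    def check_source(self, source):
        """Utility shared by all subclasses: parse, then check."""
        return self.check_ast(ast.parse(source))


class LambdaNanny(Nanny):
    """Toy subclass that complains about every lambda expression."""

    def check_ast(self, tree):
        return [
            (node.lineno, "lambda used",
             (node.col_offset, node.end_col_offset))
            for node in ast.walk(tree)
            if isinstance(node, ast.Lambda)
        ]
```

Instantiating Nanny directly raises TypeError because check_ast is abstract, which is roughly what "virtual method" buys in Python terms.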

> Since parsing is expensive, we probably want to share the parse tree.

What parse tree? Python's parser module produces an AST not nearly "A
enough" for reasonably productive nanny writing. GregS & BillT have
improved on that, but it's not in the std distrib. Other "problems" include
the lack of original source lines in the trees, and lack of column-number
info.

Note that by the time Python has produced a parse tree, all evidence of the
very thing tabnanny is looking for has been removed. That's why she used
the tokenize module to begin with.

God knows tokenize is too funky to use too when life gets harder (check out
checkappend.py's tokeneater state machine for a preliminary taste of that).

So the *only* solution is to adopt Christian's Stackless so I can rewrite
tokenize as a coroutine like God intended <wink>.

Seriously, I don't know of anything that produces a reasonably usable (for
nannies) parse tree now, except via modifying a Python grammar for use with
John Aycock's SPARK; the latter also comes with very pleasant & powerful
tree pattern-matching abilities. But it's probably too slow for everyday
"just folks" use. Grabbing the GregS/BillT enhancement is probably the most
practical thing we could build on right now (but tabnanny will have to
remain a special case).

unsure-about-the-state-of-simpleparse-on-mxtexttools-for-this-ly y'rs - tim
Re: RE: [Patches] selfnanny.py: checking for "self" in every method
On Sat, 4 Mar 2000, Tim Peters wrote:

> I like Moshe's suggestion fine, except with an abstract base class named
> Nanny with a virtual method named check_ast. Nannies should (of course)
> derive from that.

Why? The C++ you're programming damaged your common sense cycles?

> > Since parsing is expensive, we probably want to share the parse tree.
>
> What parse tree? Python's parser module produces an AST not nearly "A
> enough" for reasonably productive nanny writing.

As a note, selfnanny uses the parser module AST.

> GregS & BillT have
> improved on that, but it's not in the std distrib. Other "problems" include
> the lack of original source lines in the trees,

The parser module has source lines.

> and lack of column-number info.

Yes, that sucks.

> Note that by the time Python has produced a parse tree, all evidence of the
> very thing tabnanny is looking for has been removed. That's why she used
> the tokenize module to begin with.

Well, it's one of the few nannies which would be in that position.

> God knows tokenize is too funky to use too when life gets harder (check out
> checkappend.py's tokeneater state machine for a preliminary taste of that).

Why doesn't checkappend.py use the parser module?

> Grabbing the GregS/BillT enhancement is probably the most
> practical thing we could build on right now

You got some pointers?

> (but tabnanny will have to remain a special case).

tim-will-always-be-a-special-case-in-our-hearts-ly y'rs, Z.

--
Moshe Zadka <mzadka@geocities.com>.
http://www.oreilly.com/news/prescod_0300.html
RE: RE: [Patches] selfnanny.py: checking for "self" in every method
[Tim]
>> [make Nanny a base class]

[Moshe Zadka]
> Why?

Because it's an obvious application for OO design. A common base class
formalizes the interface and can provide useful utilities for subclasses.

> The C++ you're programming damaged your common sense cycles?

Yes, very, but that isn't relevant here <wink>. It's good Python sense too.

>> [parser module produces trees far too concrete for comfort]

> As a note, selfnanny uses the parser module AST.

Understood, but selfnanny has a relatively trivial task. Hassling with
tuples nested dozens deep for even relatively simple stmts is both a PITA
and a time sink.

>> [parser doesn't give source lines]

> The parser module has source lines.

No, it does not (it only returns terminals, as isolated strings). The
tokenize module does deliver original source lines in their entirety (as
well as terminals, as isolated strings; and column numbers).
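
To make the difference concrete, here is a small sketch using the tokenize module as it exists in current Python (generate_tokens yielding tuples with start, end, and line fields). The function name find_name is made up for this example:

```python
import io
import tokenize


def find_name(source, name):
    """Yield (line-number, (column-begin, column-end), source-line)
    for each NAME token equal to `name`."""
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type == tokenize.NAME and tok.string == name:
            (row, col_begin), (_, col_end) = tok.start, tok.end
            # tok.line is the complete physical line the token came from.
            yield row, (col_begin, col_end), tok.line
```

Each hit carries the line number, the column range, and the whole original line, which is the information a checker needs to display the offending source.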

>> and lack of column-number info.

> Yes, that sucks.

> ...
> Why doesn't checkappend.py use the parser module?

Because it wanted to display the actual source line containing an offending
"append" (which, again, the parser module does not supply). Besides, it was
a trivial variation on tabnanny.py, of which I have approximately 300 copies
on my disk <wink>.

>> Grabbing the GregS/BillT enhancement is probably the most
>> practical thing we could build on right now

> You got some pointers?

Download python2c (http://www.mudlib.org/~rassilon/p2c/) and grab
transformer.py from the zip file. The latter supplies a very useful
post-processing pass over the parser module's output, squashing it *way*
down.
RE: RE: [Patches] selfnanny.py: checking for "self" in every method
On Sun, 5 Mar 2000, Tim Peters wrote:

> [Tim]
> >> [make Nanny a base class]
>
> [Moshe Zadka]
> > Why?
>
> Because it's an obvious application for OO design. A common base class
> formalizes the interface and can provide useful utilities for subclasses.

The interface is just one function. You're welcome to have a do-nothing
nanny that people *can* derive from: I see no point in making them derive
from a base class.

> > As a note, selfnanny uses the parser module AST.
>
> Understood, but selfnanny has a relatively trivial task.

That it does, and it was painful.

> >> [parser doesn't give source lines]
>
> > The parser module has source lines.
>
> No, it does not (it only returns terminals, as isolated strings).

Sorry, misunderstanding: it seemed obvious to me you wanted line numbers.
For lines, use the linecache module...
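
A sketch of how a line number from a checker could be turned back into source text with linecache.getline. The function name show_complaint and the report layout are assumptions; the tuple shape is the one from the strawman API:

```python
import linecache


def show_complaint(filename, complaint):
    """Format one (line-number, message, columns-or-None) tuple together
    with the source line it refers to, fetched through linecache."""
    lineno, message, columns = complaint
    # getline returns '' if the file or line cannot be read.
    text = linecache.getline(filename, lineno).rstrip("\n")
    report = "%s:%d: %s\n    %s" % (filename, lineno, message, text)
    if columns is not None:
        begin, end = columns
        # Underline the reported column range beneath the source line.
        report += "\n    " + " " * begin + "^" * (end - begin)
    return report
```

This recovers whole lines but not columns, so column information still has to come from the checker itself.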

> > You got some pointers?
>
> Download python2c (http://www.mudlib.org/~rassilon/p2c/) and grab
> transformer.py from the zip file.

I'll have a look.
--
Moshe Zadka <mzadka@geocities.com>.
http://www.oreilly.com/news/prescod_0300.html
Re: RE: [Patches] selfnanny.py: checking for "self" in every method
> >> [parser doesn't give source lines]
>
> > The parser module has source lines.
>
> No, it does not (it only returns terminals, as isolated strings). The
> tokenize module does deliver original source lines in their entirety (as
> well as terminals, as isolated strings; and column numbers).

Moshe meant line numbers -- it has those.

> > Why doesn't checkappend.py use the parser module?
>
> Because it wanted to display the actual source line containing an offending
> "append" (which, again, the parser module does not supply). Besides, it was
> a trivial variation on tabnanny.py, of which I have approximately 300 copies
> on my disk <wink>.

Of course, another argument for making things more OO. (The code used
in tabnanny.py to process files and, recursively, directories from
sys.argv is replicated a thousand times in various scripts of mine --
Tim took it from my now-defunct tabpolice.py. This should be in the
std library somehow...)
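
For reference, the kind of helper meant here might be sketched as follows with os.walk. The name python_files is hypothetical, and this is not the code from tabnanny.py:

```python
import os


def python_files(args):
    """Yield the .py files named by `args`, descending into directories."""
    for arg in args:
        if os.path.isdir(arg):
            for dirpath, dirnames, filenames in os.walk(arg):
                dirnames.sort()  # make the traversal order deterministic
                for name in sorted(filenames):
                    if name.endswith(".py"):
                        yield os.path.join(dirpath, name)
        else:
            # Files named explicitly are passed through without filtering.
            yield arg
```

A script would typically loop over python_files(sys.argv[1:]) and hand each path to its checker.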

> >> Grabbing the GregS/BillT enhancement is probably the most
> >> practical thing we could build on right now
>
> > You got some pointers?
>
> Download python2c (http://www.mudlib.org/~rassilon/p2c/) and grab
> transformer.py from the zip file. The latter supplies a very useful
> post-processing pass over the parser module's output, squashing it *way*
> down.

Those of you who have seen the compiler-sig should know that Jeremy
made an improvement which will find its way into p2c. It's currently
on display in the Python CVS tree in the nondist branch: see
http://www.python.org/pipermail/compiler-sig/2000-February/000011.html
and the ensuing thread for more details.

--Guido van Rossum (home page: http://www.python.org/~guido/)
RE: RE: [Patches] selfnanny.py: checking for "self" in every method
>>>>> "TP" == Tim Peters <tim_one@email.msn.com> writes:

>>> Grabbing the GregS/BillT enhancement is probably the most
>>> practical thing we could build on right now

>> You got some pointers?

TP> Download python2c (http://www.mudlib.org/~rassilon/p2c/) and
TP> grab transformer.py from the zip file. The latter supplies a
TP> very useful post-processing pass over the parser module's output,
TP> squashing it *way* down.

The compiler tools in python/nondist/src/Compiler include Bill &
Greg's transformer code, a class-based AST (each node is a subclass of
the generic node), and a visitor framework for walking the AST.
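
As an illustration of the class-based AST plus visitor style, here is a sketch using ast.NodeVisitor from today's standard library. It is a stand-in, not the API of the compiler tools described above, and the NoneComparisonNanny check is an invented example:

```python
import ast


class NoneComparisonNanny(ast.NodeVisitor):
    """Visitor that complains about '== None' and '!= None' comparisons."""

    def __init__(self):
        self.complaints = []

    def visit_Compare(self, node):
        # NodeVisitor dispatches on the node's class name: visit_<ClassName>.
        for op, right in zip(node.ops, node.comparators):
            if (isinstance(op, (ast.Eq, ast.NotEq))
                    and isinstance(right, ast.Constant)
                    and right.value is None):
                self.complaints.append(
                    (node.lineno, "comparison to None with == or !=", None))
        self.generic_visit(node)  # keep walking into nested expressions
```

Because each node type is its own class, a checker only needs to define methods for the node types it cares about and can ignore the rest of the tree.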

The APIs and organization are in a bit of flux; Mark Hammond suggested
some reorganization that I've not finished yet. I may finish it up
this evening.

The transformer module does a good job of including line numbers, but
I've occasionally run into a node that didn't have a lineno
attribute when I expected it would. I haven't taken the time to
figure out if my expectation was unreasonable or if the transformer
should be fixed.

The compiler-sig might be a good place to discuss this further. A
warning framework was one of my original goals for the SIG. I imagine
we could convince Guido to move warnings + compiler tools into the
standard library if they end up being useful.

Jeremy