Mailing List Archive

Is it possible to view tokenizer output?
Hi, I'm just getting into the CPython codebase just for fun, and I've
just started messing around with the tokenizer and the grammar. I was
wondering, is there a way to just print out the results of the tokenizer
(as in just the stream of tokens it generates) in a human readable
format? It would be really helpful for debugging. Hope the question's
not too basic.

Cheers.

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/2ZTZBAN5H2ET2IB7EXTKD27R5T6QVHZB/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Is it possible to view tokenizer output?
On 30/05/2022 at 00:59, Jack wrote:
> Hi, I'm just getting into the CPython codebase just for fun, and I've
> just started messing around with the tokenizer and the grammar. I was
> wondering, is there a way to just print out the results of the
> tokenizer (as in just the stream of tokens it generates) in a human
> readable format? It would be really helpful for debugging. Hope the
> question's not too basic.

python -m tokenize file.py

?

See https://docs.python.org/3/library/tokenize.html#command-line-usage

Cheers,
Jean
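
The same token stream can also be produced from Python code. The following is a minimal sketch using the documented tokenize.generate_tokens function; the helper name dump_tokens is made up here for illustration. Like the command-line form, it goes through the tokenize module.

```python
import io
import tokenize


def dump_tokens(source: str) -> list[tuple[str, str]]:
    """Return (token name, token text) pairs for a source string."""
    readline = io.StringIO(source).readline
    return [
        # exact_type distinguishes operators (EQUAL, PLUS, ...) from plain OP.
        (tokenize.tok_name[tok.exact_type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


if __name__ == "__main__":
    for name, text in dump_tokens("x = 1 + 2\n"):
        print(f"{name:<10} {text!r}")
```

Each pair holds the token type name and the matched text, which is usually enough for eyeballing a token stream while debugging.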

Re: Is it possible to view tokenizer output?
Thanks! I didn't even know about that module. Does this take into
account your local changes to the tokenizer, though? I've added a new
token type to Grammar/Tokens, and some code to tokenizer.c to return
that token type in appropriate circumstances. I've stepped through the
tokenizer in the debugger, so I *think* it's working. When I run -m
tokenize as you suggest, I don't see my custom token type.

The devguide mentions that "Lib/tokenize.py needs changes to match
changes to the tokenizer.", so I'm guessing I would have to manually
repeat my changes in tokenize.py to see them, right? But what I want to
see is what tokenizer.c is producing when my newly built Python binary
actually reads a file.
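
One quick sanity check for this situation, sketched under assumptions: as far as I know, Lib/token.py is generated from Grammar/Tokens, so after regenerating it and rebuilding, a new token type should show up in token.tok_name. This only confirms that the name is known to the Python-level token module; it says nothing about what tokenizer.c actually emits. MYTOKEN is a hypothetical placeholder for the custom token type, and token_is_known is a helper name invented for this example.

```python
import token


def token_is_known(name: str) -> bool:
    """Return True if the token module defines a token type with this name."""
    return name in token.tok_name.values()


if __name__ == "__main__":
    # "MYTOKEN" is a hypothetical placeholder for a custom token type.
    for name in ("NAME", "MYTOKEN"):
        print(name, token_is_known(name))
```

Run with the newly built binary, this reports whether the custom name made it into the generated module.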

On 30/05/2022 00:09, Jean Abou Samra wrote:
> python -m tokenize file.py
>
> ?
>
> See https://docs.python.org/3/library/tokenize.html#command-line-usage
>
> Cheers,
> Jean
>
>
Re: Is it possible to view tokenizer output?
python -m tokenize < file-to-parse.py

See the comment at the top of tokenize.py. IIRC, it re-implements the
tokenizer; it does not call the one used for Python code.

Eric

On 5/29/2022 6:59 PM, Jack wrote:
> Hi, I'm just getting into the CPython codebase just for fun, and I've
> just started messing around with the tokenizer and the grammar. I was
> wondering, is there a way to just print out the results of the
> tokenizer (as in just the stream of tokens it generates) in a human
> readable format? It would be really helpful for debugging. Hope the
> question's not too basic.
>
> Cheers.
Re: Is it possible to view tokenizer output?
Well, I just stuck a print statement in _PyTokenizer_Get() and it's done
the job for me, for now.

Thanks,

Jack

On 30/05/2022 00:36, Eric V. Smith wrote:
> python -m tokenize < file-to-parse.py
>
> See the comment at the top of tokenize.py. IIRC, it re-implements the
> tokenizer, it does not call the one used for python code.
>
> Eric
Re: Is it possible to view tokenizer output?
On Mon, May 30, 2022 at 1:40 AM Eric V. Smith <eric@trueblade.com> wrote:
> python -m tokenize < file-to-parse.py
>
> See the comment at the top of tokenize.py. IIRC, it re-implements the
> tokenizer, it does not call the one used for python code.

Ah right, I would be surprised if there were a public Python API
to get the tokenizer output, since there is no public C API for that
:-)

I just removed the <token.h> header file since it was never usable outside
Python C internals: there is no public C API to just run the tokenizer
and get its output.

Victor
Re: Is it possible to view tokenizer output?
There is no *public* one, but there is a private one accessible from Python
that I added for testing purposes.

On Mon, 30 May 2022, 15:17 Victor Stinner, <vstinner@python.org> wrote:

> On Mon, May 30, 2022 at 1:40 AM Eric V. Smith <eric@trueblade.com> wrote:
> > python -m tokenize < file-to-parse.py
> >
> > See the comment at the top of tokenize.py. IIRC, it re-implements the
> > tokenizer, it does not call the one used for python code.
>
> Ah right, I would be surprised that there would be a public Python API
> to get the tokenizer output, since there is no public C API for that
> :-)
>
> I just removed <token.h> header file since it was never usable outside
> Python C internals: there is no public C API to just run the tokenizer
> and get its output.
>
> Victor
Re: Is it possible to view tokenizer output?
It is on the main branch but, as I mentioned, it is **exclusively** for
internal consumption:

https://github.com/python/cpython/blob/8136606769661c103c46d142e52ecbbbb88803f6/Lib/tokenize.py#L685

On Mon, 30 May 2022 at 17:37, Jack <jack.jjmillist@gmail.com> wrote:

> Hi Pablo, could you clarify please? Is that on the main branch, or would
> you be willing to share the code?
Re: Is it possible to view tokenizer output?
Hi Pablo, could you clarify please? Is that on the main branch, or would
you be willing to share the code?

On 30/05/2022 16:23, Pablo Galindo Salgado wrote:
> There is no *public* one but there is a private one accessible from
> Python I added for testing purposes.
Re: Is it possible to view tokenizer output?
You should maybe move the code out of the stdlib (to tests?) if it
should not be used. Otherwise, someone somehow will start to rely on
it, and then complain when it breaks :-)

Victor

On Mon, May 30, 2022 at 6:51 PM Pablo Galindo Salgado
<pablogsal@gmail.com> wrote:
>
> Is on the main branch but as I mentioned is **exclusively** for internal consumption:
>
> https://github.com/python/cpython/blob/8136606769661c103c46d142e52ecbbbb88803f6/Lib/tokenize.py#L685



--
Night gathers, and now my watch begins. It shall not end until my death.