Mailing List Archive

ast.parse, ast.dump, but with comment preservation?
I wrote a little open-source tool to expose internal constructs in OpenAPI. Along the way, I added related functionality to:
- Generate/update a function prototype to/from a class
- JSON schema
- Automatically add type annotations to all function arguments, class attributes, declarations, and assignments

alongside a bunch of other features. All implemented using just the builtin modules (plus astor on Python < 3.9; and optionally black).

Now I'm almost at the point where I can run it—without issue—against, e.g., the entire TensorFlow codebase. Unfortunately this is causing huge `diff`s because the comments aren't preserved (and there are some whitespace issues… but I should be able to resolve the latter).

Is the only viable solution to rewrite around redbaron or libcst? I don't need to parse the comments, just dump them out unedited where they're found…

Thanks for any suggestions

PS: Library is https://github.com/SamuelMarks/cdd-python (might relicense with CC0… anyway too early for others to use; wait for the 0.1.0 release ;])
--
https://mail.python.org/mailman/listinfo/python-list
Re: ast.parse, ast.dump, but with comment preservation? [ In reply to ]
On Thu, Dec 16, 2021 at 2:47 PM samue...@gmail.com
<samuelmarks@gmail.com> wrote:
>
> I wrote a little open-source tool to expose internal constructs in OpenAPI. Along the way, I added related functionality to:
> - Generate/update a function prototype to/from a class
> - JSON schema
> - Automatically add type annotations to all function arguments, class attributes, declarations, and assignments
>
> alongside a bunch of other features. All implemented using just the builtin modules (plus astor on Python < 3.9; and optionally black).
>
> Now I'm almost at the point where I can run it—without issue—against, e.g., the entire TensorFlow codebase. Unfortunately this is causing huge `diff`s because the comments aren't preserved (and there are some whitespace issues… but I should be able to resolve the latter).
>
> Is the only viable solution available to rewrite around redbaron | libcst? - I don't need to parse the comments just dump them out unedited whence they're found…
>
> Thanks for any suggestions
>
> PS: Library is https://github.com/SamuelMarks/cdd-python (might relicense with CC0… anyway too early for others to use; wait for the 0.1.0 release ;])

I haven't actually used it, but what you may want to try is lib2to3.
It's capable of full text reconstruction like you're trying to do.

Otherwise: Every AST node contains line and column information, so you
could possibly work the other way: keep the source code as well as the
AST, and make changes line by line as you have need.
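
A minimal sketch of that second idea, assuming Python 3.8+ (for end_lineno / end_col_offset). The helper name replace_node_source and the sample source are made up for illustration; this is not taken from any existing library. It overwrites only the text span that one AST node occupies, so comments and spacing elsewhere survive untouched:

```python
import ast


def replace_node_source(source: str, node: ast.AST, new_text: str) -> str:
    """Splice new_text over the span that `node` occupies in `source`.

    Text outside that span, comments included, is left as-is.
    """
    lines = source.splitlines(keepends=True)
    # ast column offsets count UTF-8 bytes, so slice the encoded lines.
    first = lines[node.lineno - 1].encode("utf-8")
    last = lines[node.end_lineno - 1].encode("utf-8")
    prefix = first[: node.col_offset].decode("utf-8")
    suffix = last[node.end_col_offset:].decode("utf-8")
    lines[node.lineno - 1: node.end_lineno] = [prefix + new_text + suffix]
    return "".join(lines)


source = (
    "# keep this comment\n"
    "timeout = 30  # seconds\n"
    "retries = 3\n"
)
tree = ast.parse(source)
# Locate the assignment to `timeout`, then rewrite only its value.
assign = next(
    n
    for n in ast.walk(tree)
    if isinstance(n, ast.Assign)
    and isinstance(n.targets[0], ast.Name)
    and n.targets[0].id == "timeout"
)
result = replace_node_source(source, assign.value, "60")
print(result)
```

If several nodes are rewritten in one pass, applying the edits from the bottom of the file upward keeps the earlier line and column numbers valid.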

ChrisA
Re: ast.parse, ast.dump, but with comment preservation? [ In reply to ]
> On 16 Dec 2021, at 03:49, samue...@gmail.com <samuelmarks@gmail.com> wrote:
>
> I wrote a little open-source tool to expose internal constructs in OpenAPI. Along the way, I added related functionality to:
> - Generate/update a function prototype to/from a class
> - JSON schema
> - Automatically add type annotations to all function arguments, class attributes, declarations, and assignments
>
> alongside a bunch of other features. All implemented using just the builtin modules (plus astor on Python < 3.9; and optionally black).
>
> Now I'm almost at the point where I can run it—without issue—against, e.g., the entire TensorFlow codebase. Unfortunately this is causing huge `diff`s because the comments aren't preserved (and there are some whitespace issues… but I should be able to resolve the latter).
>
> Is the only viable solution available to rewrite around redbaron | libcst? - I don't need to parse the comments just dump them out unedited whence they're found…
>
> Thanks for any suggestions

Have a look at the code that is used by https://github.com/asottile/pyupgrade
There are a couple of libraries that it uses that do what I think you want to do.
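
For a feel of the token-level approach, here is a rough sketch using only the standard library's tokenize module. This is an assumption about the general technique, not a description of how pyupgrade or its dependencies actually work, and collect_comments is a hypothetical helper name. Unlike ast, the tokenizer reports comments together with their positions, so they can be written back where they were found:

```python
import io
import tokenize


def collect_comments(source: str):
    """Return (line, column, text) for every comment in source."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [
        (tok.start[0], tok.start[1], tok.string)
        for tok in tokens
        if tok.type == tokenize.COMMENT
    ]


source = "x = 1  # first\n# standalone\ny = 2\n"
comments = collect_comments(source)
for line, col, text in comments:
    print(line, col, text)
```

Line numbers are 1-based and columns 0-based, which matches the lineno / col_offset convention of ast nodes for ASCII source.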

Barry

>
> PS: Library is https://github.com/SamuelMarks/cdd-python (might relicense with CC0… anyway too early for others to use; wait for the 0.1.0 release ;])
Re: ast.parse, ast.dump, but with comment preservation? [ In reply to ]
Hi!

Maybe RedBaron could help you?

https://github.com/PyCQA/redbaron

IIRC, it aims to conserve the exact same representation of the source
code, including comments and empty lines.

--lucas


On 16/12/2021 04:37, samue...@gmail.com wrote:
> I wrote a little open-source tool to expose internal constructs in OpenAPI. Along the way, I added related functionality to:
> - Generate/update a function prototype to/from a class
> - JSON schema
> - Automatically add type annotations to all function arguments, class attributes, declarations, and assignments
>
> alongside a bunch of other features. All implemented using just the builtin modules (plus astor on Python < 3.9; and optionally black).
>
> Now I'm almost at the point where I can run it—without issue—against, e.g., the entire TensorFlow codebase. Unfortunately this is causing huge `diff`s because the comments aren't preserved (and there are some whitespace issues… but I should be able to resolve the latter).
>
> Is the only viable solution available to rewrite around redbaron | libcst? - I don't need to parse the comments just dump them out unedited whence they're found…
>
> Thanks for any suggestions
>
> PS: Library is https://github.com/SamuelMarks/cdd-python (might relicense with CC0… anyway too early for others to use; wait for the 0.1.0 release ;])
Re: ast.parse, ast.dump, but with comment preservation? [ In reply to ]
On Thursday, December 16, 2021 at 5:56:51 AM UTC-5, lucas wrote:
> Hi!
>
> Maybe RedBaron could help you?
>
> https://github.com/PyCQA/redbaron
>
> IIRC, it aims to conserve the exact same representation of the source
> code, including comments and empty lines.
>
> --lucas
> On 16/12/2021 04:37, samue...@gmail.com wrote:
> > I wrote a little open-source tool to expose internal constructs in OpenAPI. Along the way, I added related functionality to:
> > - Generate/update a function prototype to/from a class
> > - JSON schema
> > - Automatically add type annotations to all function arguments, class attributes, declarations, and assignments
> >
> > alongside a bunch of other features. All implemented using just the builtin modules (plus astor on Python < 3.9; and optionally black).
> >
> > Now I'm almost at the point where I can run it—without issue—against, e.g., the entire TensorFlow codebase. Unfortunately this is causing huge `diff`s because the comments aren't preserved (and there are some whitespace issues… but I should be able to resolve the latter).
> >
> > Is the only viable solution available to rewrite around redbaron | libcst? - I don't need to parse the comments just dump them out unedited whence they're found…
> >
> > Thanks for any suggestions
> >
> > PS: Library is https://github.com/SamuelMarks/cdd-python (might relicense with CC0… anyway too early for others to use; wait for the 0.1.0 release ;])

Ended up writing my own CST and added it to that library of mine (link above).

My target is adding/removing/changing of: docstrings, function return types, function arguments, and Assign/AnnAssign. All but the last are now implemented.

I was careful not to replace code elsewhere in my codebase: the new CST code lives in its own files, and everything else still works exclusively with the builtin `ast` module, as before.