Mailing List Archive

VIM and [Python] block jumping
Angus MacKay wrote:
>
> I am not looking for indentation but just code navigation.
>
> when code uses identifiers for blocks (such as "()" in lisp, "{}" in
> most sane languages, "#if/#endif" for C preproc ...) vim can navigate
> blocks very easily (with [{/]} for "{}", [#/]# for "#if/#endif" ...)
>
> with python this becomes harder. is there any way to do it? (I think I
> could write a function for it but I thought I'd see if it had already
> been done)

That's a neat idea. I just wrote this up, so it hasn't been tested a
lot, but it seems to work. The way I implemented this requires that you
have python support compiled into VIM. It's probably possible to
reimplement this as a pure-VIM script, but the python version is
probably orders of magnitude faster (not to mention being easier to
write). :-)

(I had once implemented the C-style [{ ]} motion keys as vim-scripts
before I realized that VIM already had built-in support for that. They
were quite slow.)

First, make a file called "vimMotion.py":

## begin vimMotion.py ##

import vim
import re

# Returns a tuple containing the indentation, and the line number where we
# got that indentation. That can be different than lineNo if lineNo refers
# to an empty (ie: only whitespace) line.
def getIndent(lineNo):
    for line in vim.current.buffer[lineNo:]:
        result = getIndentQuick(line)
        if result >= 0:
            return (result, lineNo)
        lineNo = lineNo + 1
    return (0, len(vim.current.buffer) - 1)

# Grabs the initial whitespace and the stuff after it into two groups.
indentRE = re.compile(r"([\t ]*)(.*)")

# Returns the indentation for a line, or -1 if it can't be determined
# (ie: only whitespace).
def getIndentQuick(line):
    match = indentRE.match(line)
    if len(match.group(2)) == 0:
        # If the line is just whitespace, we don't know the indentation.
        return -1
    else:
        # Expand tabs like Python does (tab stops every 8 columns).
        # str.expandtabs works on both Python 2 and Python 3.
        return len(match.group(1).expandtabs(8))

# If dir is positive, goes down; if negative, goes up. Other than that, the
# value is irrelevant.
def blockMotion(dir):
    # Vim's cursor rows are 1-based; buffer indices are 0-based.
    lineNo0 = vim.current.window.cursor[0] - 1
    indent0, lineNo = getIndent(lineNo0)
    newLine = lineNo0

    if indent0 == 0:
        # This is an optimization: no line can be indented less than 0, so
        # the scans below would just run to the end (or start) of the buffer.
        if dir > 0:
            newLine = len(vim.current.buffer) - 1
        else:
            newLine = 0
    elif dir > 0:
        # Scan down to the first non-blank line indented less than indent0.
        indent = indent0
        numLines = len(vim.current.buffer)
        while lineNo < numLines and indent >= indent0:
            lineNo = lineNo + 1
            indent, lineNo = getIndent(lineNo)
        newLine = lineNo
    else:
        # Scan up to the first non-blank line indented less than indent0.
        indent = indent0
        lineNo = lineNo0
        while lineNo > 0 and indent >= indent0:
            lineNo = lineNo - 1
            qIndent = getIndentQuick(vim.current.buffer[lineNo])
            if qIndent >= 0:
                indent = qIndent
        newLine = lineNo
    # Move the cursor (Vim wants a 1-based row and a 0-based column).
    vim.current.window.cursor = (newLine + 1, 0)

## end vimMotion.py ##

Then add the following three lines to your .vimrc (giving the full path
to vimMotion.py unless Vim is always started in the directory that
holds it):

pyfile vimMotion.py
nmap [{ :python blockMotion(-1)<CR>^
nmap ]} :python blockMotion(1)<CR>^

You should probably only execute these when editing a python file. On a
Vim built with Python 3 only, use py3file and :python3 in place of
pyfile and :python.

Note that python is a little bit funny as to where blocks can end,
because multiple blocks can all end at exactly the same position. That
means that doing ']}' can potentially (and often does) take you out of
several blocks. In a language with braces, ]} will always take you out
of one block at a time.
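The block-ending behaviour described above can be checked with a small, Vim-free sketch. `blocks_closed_at` is a hypothetical helper (not part of vimMotion.py) that keeps a stack of open indentation levels and counts how many of them a given line closes. It assumes spaces-only indentation and ignores continuation lines.

```python
def blocks_closed_at(lines, target):
    """Count the indented blocks that end just before lines[target].

    Hypothetical helper for illustration only: assumes spaces-only
    indentation, a non-blank target line, and no continuation lines.
    """
    def indent(line):
        return len(line) - len(line.lstrip(" "))

    # Stack of indentation levels that are still open, e.g. [0, 4, 8].
    open_levels = [0]
    for line in lines[:target]:
        if not line.strip():
            continue  # whitespace-only lines don't open or close blocks
        level = indent(line)
        if level > open_levels[-1]:
            open_levels.append(level)  # a deeper line opens a new block
        else:
            while level < open_levels[-1]:
                open_levels.pop()      # a shallower line closes blocks
    # Every open level deeper than the target line is closed by it.
    level = indent(lines[target])
    return sum(1 for open_level in open_levels if open_level > level)


src = [
    "def f(x):",
    "    if x:",
    "        for i in range(x):",
    "            print(i)",
    "print('done')",
]
print(blocks_closed_at(src, 4))  # 3: the for, the if and the def all end here
```

With braces, each closing } pops exactly one level; here a single dedent to column 0 pops three at once.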

The other tricky part was figuring out the indentation of a line. The
two things that make this tricky are tabs (but string.expandtabs made
that pretty easy), and the fact that python seems to ignore
empty/whitespace-only lines when figuring out block levels. The "right"
thing to do seems to be to scan forward until we find a line with some
non-whitespace on it.
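The observation about whitespace-only lines can be checked against Python's own tokenizer (the standard tokenize module): it emits INDENT/DEDENT tokens only for lines that contain something, so a blank line inside a block produces no DEDENT. A small sketch for a current Python 3; `indent_token_names` is a name made up for this example.

```python
import io
import tokenize


def indent_token_names(source):
    """Return the names of the INDENT/DEDENT tokens Python produces for source."""
    names = []
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type in (tokenize.INDENT, tokenize.DEDENT):
            names.append(tokenize.tok_name[tok.type])
    return names


# A block whose middle line is empty, and one whose middle line is just
# two spaces (less than the block's indentation).
with_blank = "if x:\n    a = 1\n\n    b = 2\nc = 3\n"
with_spaces = "if x:\n    a = 1\n  \n    b = 2\nc = 3\n"
print(indent_token_names(with_blank))   # ['INDENT', 'DEDENT']
print(indent_token_names(with_spaces))  # ['INDENT', 'DEDENT']
```

If blank lines counted, the two-space line would have closed the block and the fourth line would have reopened it, giving a second INDENT.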

Let me know if you have any problems.

(note: this was posted to both comp.lang.python and comp.editors, since
people in both those groups might be interested in this)

--
C. Laurence Gonsalves
clgonsal@kami.com
http://www.cryogen.com/clgonsal/
"Any sufficiently advanced technology is indistinguishable from magic" -- Arthur C. Clarke
VIM and [Python] block jumping [ In reply to ]
Of course, right after I posted that, I realized that there's a problem.
The blockMotion function doesn't account for lines that are
continuations of other lines. That means that triple-quoted string
literals, lines with more {[('s than }])'s, and lines joined with \ can
screw it up. Are there any other situations? I'll try to post a
corrected version later. Some of those problems are pretty nasty though
(triple-quoted strings in particular).

--
C. Laurence Gonsalves
clgonsal@kami.com
http://www.cryogen.com/clgonsal/
"Any sufficiently advanced technology is indistinguishable from magic" -- Arthur C. Clarke
VIM and [Python] block jumping [ In reply to ]
In article <3785DA06.974567F3@kami.com>,
C. Laurence Gonsalves <clgonsal@kami.com> wrote:
>Of course, right after I posted that, I realized that there's a problem.
>The blockMotion function doesn't account for lines that are
>continuations of other lines. That means that triple-quoted string
>literals, lines with more {[('s than }])'s, and lines joined with \ can
>screw it up. Are there any other situations? I'll try to post a
>corrected version later. Some of those problems are pretty nasty though
>(triple-quoted strings in particular).

Being a faithful Emacs-ite, I've never used vim. But since you can
script vim with Python, it might be possible to use the tokenizer in
the standard library (module tokenize) to handle this cleanly. Likely
you'd need to extend it a bit to handle the syntax errors that show up
in half-written code gracefully, but taking vim that one step past
syntax-awareness and into structure-awareness would be a very slick
hack.


Neel
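The tokenize suggestion above can be sketched as follows. `statement_start_lines` is a hypothetical helper, not part of any existing Vim script: it reports which physical lines begin a logical line, which is what blockMotion would need in order to skip continuation lines. Catching tokenize.TokenError (unclosed bracket or string at end of input) and SyntaxError (tokenize raises IndentationError, a subclass, on a bad dedent) is one simple way to survive half-written code; it just stops at the first problem rather than recovering.

```python
import io
import tokenize

# Tokens that never start a statement on their own.
_SKIP = {tokenize.NL, tokenize.COMMENT, tokenize.INDENT,
         tokenize.DEDENT, tokenize.ENDMARKER}


def statement_start_lines(source):
    """Return the sorted 1-based numbers of lines that begin a logical line.

    Continuation lines (inside brackets, after a backslash, or inside a
    triple-quoted string) are left out, as are blank and comment-only lines.
    """
    starts = set()
    at_statement_start = True
    readline = io.StringIO(source).readline
    try:
        for tok in tokenize.generate_tokens(readline):
            if tok.type in _SKIP:
                continue
            if tok.type == tokenize.NEWLINE:
                at_statement_start = True  # the logical line ended here
                continue
            if at_statement_start:
                starts.add(tok.start[0])
                at_statement_start = False
    except (tokenize.TokenError, SyntaxError):
        # Half-written code: keep what was found before the tokenizer gave up.
        pass
    return sorted(starts)


src = (
    "def f(a,\n"          # 1: starts a statement
    "      b):\n"         # 2: continued via an open bracket
    "    s = '''one\n"    # 3: starts a statement
    "two'''\n"            # 4: inside a triple-quoted string
    "    t = 1 + \\\n"    # 5: starts a statement
    "        2\n"         # 6: joined with a backslash
    "    return s\n"      # 7: starts a statement
)
print(statement_start_lines(src))  # [1, 3, 5, 7]
```

A comment that happens to end in a backslash does not join lines, and the tokenizer gets that right for free: "x = 1  # ends with \" followed by "y = 2" yields two statement starts.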
VIM and [Python] block jumping [ In reply to ]
[C. Laurence Gonsalves]
> ...
> The blockMotion function doesn't account for lines that are
> continuations of other lines. That means that triple-quoted string
> literals, lines with more {[('s than }])'s, and lines joined with \ can
> screw it up. Are there any other situations?

No, those are the only ways a line can get continued in Python. Note that
there are subtleties, like that a backslash preceding a newline doesn't
*always* mean continuation (it doesn't at the tail end of a comment!).

> I'll try to post a corrected version later. Some of those problems are
> pretty nasty though (triple-quoted strings in particular).

Also single-quoted strings continued via backslash. Strings are the real
headache, because the special characters you're looking for lose their magic
when they're inside a string.
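The point about strings can be illustrated with a hand-rolled scan that does only the minimum work needed. This is not PyParse's code: `continuation_flags` is a hypothetical sketch that tracks only bracket depth, the current string delimiter and backslash-newlines, and treats everything inside a string (brackets, '#', quotes of the other kind) as inert. Assumptions: f-strings with nested quotes are not handled, and an unterminated one-line string is simply dropped at end of line.

```python
def continuation_flags(lines):
    """Return one boolean per physical line: True if it continues an earlier line."""
    flags = []
    depth = 0       # unclosed (, [ and {
    quote = None    # open string delimiter: ', ", ''' or """ (None outside strings)
    joined = False  # the previous physical line ended in a backslash-newline
    for line in lines:
        flags.append(depth > 0 or quote is not None or joined)
        joined = False
        i, n = 0, len(line)
        while i < n:
            ch = line[i]
            if quote is not None:
                # Inside a string only the closing delimiter and backslashes matter.
                if ch == "\\":
                    if i == n - 1:
                        joined = True  # the string continues on the next line
                    i += 2             # skip the escaped character
                elif line.startswith(quote, i):
                    i += len(quote)
                    quote = None
                else:
                    i += 1
                continue
            if ch == "#":
                break  # comment: a trailing backslash here does not join lines
            if ch in "'\"":
                quote = ch * 3 if line.startswith(ch * 3, i) else ch
                i += len(quote)
                continue
            if ch in "([{":
                depth += 1
            elif ch in ")]}":
                depth = max(depth - 1, 0)
            elif ch == "\\" and i == n - 1:
                joined = True  # explicit line joining
            i += 1
        if quote in ("'", '"') and not joined:
            quote = None  # a one-line string can't span lines without a backslash
    return flags


lines = [
    "s = '''a (",       # the "(" is inside a string and doesn't count
    "b'''",             # continuation: still inside the triple-quoted string
    "w = ')'",          # ")" inside a string doesn't close anything
    "u = 1  # note \\",  # backslash at the end of a comment is inert
    "v = 2",
]
print(continuation_flags(lines))  # [False, True, False, False, False]
```

Every character is looked at once, and nothing outside brackets, quotes, '#' and backslashes is interpreted, which is where the speed of this style of scan comes from.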

[Neel Krishnaswami]
> ...
> But since you can script vim with Python, it might be possible to use
> the tokenizer in the standard library (module tokenize) to handle this
> cleanly. Likely you'd need to extend it a bit to handle the syntax
> errors that show up in half-written code gracefully ...

This can be made to work, but I found it too slow to bear in an editor --
"subjectively instantaneous" is the goal there. Most of the tokens tokenize
delivers are simply irrelevant to this task, so most of the work that goes
into producing them is wasted.

The current CVS version of IDLE has a new module PyParse.py, which does the
minimum necessary to determine whether a line is part of a continuation, and
if so why. It can do this 100x faster than tokenize, even in
non-pathological programs. The next version of IDLE uses this to do Emacs
pymode-like "very smart" indentation, and the next version of Mark Hammond's
PythonWin also uses this module.

It's GUI-independent (straight self-contained Python), but as called from
AutoIndent.py (also shared between IDLE and PythonWin) takes major advantage
of the host system's colorization, to get a fast answer to the question "is
this random character in a string?". The colorizer generally knows this
already, for every character in the program, and AutoIndent/PyParse use that
to find a good (close) place to begin reparsing the file.

If the host can't answer that question reliably, then PyParse needs to
reparse the entire file from its start in order to get a guaranteed-correct
answer, so that's what it does. PyParse is fast enough that this is *still*
"subjectively instantaneous" for most files, but becomes a noticeable delay
near the end of long files (like e.g. Tkinter.py).
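The idea of finding a close place to begin reparsing can be sketched as follows. This is a guess at the general shape, not PyParse's actual algorithm: `find_parse_start` and its heuristic (a column-0 `def` or `class` that the colorizer says is not inside a string) are assumptions, and `in_string` stands in for whatever the host's colorizer can answer. With no such predicate it falls back to the start of the file.

```python
import re

# Heuristic (an assumption for this sketch): a line that starts at column 0
# with "def" or "class" begins a fresh top-level statement, unless that text
# is really inside a multi-line string.
_SYNC_RE = re.compile(r"(?:def|class)\b")


def find_parse_start(lines, line_no, in_string=None):
    """Return a 0-based index <= line_no from which reparsing can safely begin.

    in_string(index) is a caller-supplied predicate answering "is the first
    character of lines[index] inside a string?". Without it no candidate can
    be trusted, so reparse from the top of the file.
    """
    if in_string is None:
        return 0
    for i in range(line_no, -1, -1):
        if _SYNC_RE.match(lines[i]) and not in_string(i):
            return i
    return 0


code = [
    'x = 1',
    'def f():',
    '    s = """',
    'def not_code',   # looks like a def, but the colorizer knows it's in a string
    '"""',
    '    return s',
]
print(find_parse_start(code, 5, lambda i: i == 3))  # 1
print(find_parse_start(code, 5))                    # 0 (no colorizer help)
```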

pyparse-is-incomprehensible-but-that's-why-it's-fast<wink>-ly y'rs - tim
VIM and [Python] block jumping [ In reply to ]
Tim Peters wrote:
>
> [C. Laurence Gonsalves]
> > ...
> > The blockMotion function doesn't account for lines that are
> > continuations of other lines. That means that triple-quoted string
> > literals, lines with more {[('s than }])'s, and lines joined with \ can
> > screw it up. Are there any other situations?
>
> No, those are the only ways a line can get continued in Python. Note that
> there are subtleties, like that a backslash preceding a newline doesn't
> *always* mean continuation (it doesn't at the tail end of a comment!).

I had been meaning to test some of those weird cases.

> Also single-quoted strings continued via backslash. Strings are the real
> headache, because the special characters you're looking for lose their magic
> when they're inside a string.

Yup, but I'd thought of a simple way to determine if I'm in a
string...

> The current CVS version of IDLE has a new module PyParse.py, which does the
> minimum necessary to determine whether a line is part of a continuation, and
> if so why. It can do this 100x faster than tokenize, even in
> non-pathological programs. The next version of IDLE uses this to do Emacs
> pymode-like "very smart" indentation, and the next version of Mark Hammond's
> PythonWin also uses this module.
>
> It's GUI-independent (straight self-contained Python), but as called from
> AutoIndent.py (also shared between IDLE and PythonWin) takes major advantage
> of the host system's colorization, to get a fast answer to the question "is
> this random character in a string?". The colorizer generally knows this
> already, for every character in the program, and AutoIndent/PyParse use that
> to find a good (close) place to begin reparsing the file.

That's exactly what I was thinking of doing. It's possible to ask Vim
what syntax highlighting a character is getting. That would make it easy
to determine if a \ is in a string, for example.
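In Vim, that question can be put to the syntax engine from Python via vim.eval. A sketch, assuming syntax highlighting is on and that the Python syntax file defines strings as regions linked to the standard String highlight group (as Vim's bundled python.vim does for pythonString); `char_in_string` is a hypothetical name. It uses synstack() rather than a single synID() because escapes inside a string, including a trailing backslash, may be highlighted as their own group (e.g. Special).

```python
def char_in_string(lnum, col, vim_eval=None):
    """Ask Vim's syntax highlighting whether the character at (lnum, col) is in a string.

    lnum and col are 1-based, as synstack() expects (note that
    vim.current.window.cursor reports a 0-based column). vim_eval defaults
    to vim.eval, which exists only inside Vim's embedded Python; it is a
    parameter so the function can also be exercised outside Vim.
    """
    if vim_eval is None:
        import vim  # only importable from inside Vim
        vim_eval = vim.eval
    # synstack() lists every syntax item at the position (outermost first),
    # so an escape still shows its enclosing string region. synIDtrans()
    # follows highlight links, e.g. pythonString -> String.
    template = "map(synstack({0}, {1}), 'synIDattr(synIDtrans(v:val), \"name\")')"
    # vim.eval turns a Vim List into a Python list of strings.
    return "String" in vim_eval(template.format(lnum, col))
```

Inside Vim this would be called as, say, char_in_string(12, 30); outside Vim a stand-in evaluator can be passed to see which expression gets sent.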

Is PyParse available any way other than from IDLE CVS?

Thanks.

--
C. Laurence Gonsalves clgonsal@kami.com
http://www.cryogen.com/clgonsal/

"Any sufficiently advanced technology is indistinguishable from magic."
-- Arthur C. Clarke