Mailing List Archive: Introductions / Code cleanup

So, I've been using Trac for a couple of weeks now. I noticed it show
up in the Debian repository just about the time I was getting a Wiki set
up for my projects at work. I decided to check Trac out and have been
quite impressed. For now Trac hasn't been used too heavily at work
since I'm mainly working on a couple projects by myself, but as I start
to shift over to our new product and some new developers start to take
over some of my work I think it will become more important.

So, as I've been looking through the Trac code, there are places I
believe the code could be tidied up, or abstracted a bit. Now, this
isn't really a bug so I didn't want to file a ticket, but I think some
things could be done to improve the readability and reduce some code
duplication.

In general I like to use a more functional style programming when
working in Python, which leads me to use iterators quite frequently.

I've noticed there are a variety of places that loop over the rows
returned by a SQL query like:

while 1:
row = cursor.fetchone()
if not row:
break
# do something with row

I'd suggest adding a generator that would abstract this:

def row_iterator(cursor):
while 1:
row = cursor.fetchone()
if not row:
raise StopIteration
yield row

Now the loop becomes:

for row in row_iterator(cursor):
# do something with row

The itertools package has some nice functions for working with
iterators. I tend to define a simple function based off of them to keep
track of an index while iterating over an object:

def counter(obj):
return izip(count(), obj)

Now, I can pass a list or other iterable object to the function and
replace the loops like:

for i in range(len(lst)):
value = lst[i]
# do something with i and value

with:

for i, value in counter(lst):
# do something with i and value

Also, I've noticed that when building url strings there are places that
manual manipulation could be replaced by using the urlparse package.

I'm also not in favor of manually escaping variables to use in building
SQL query strings. I generally find it preferable to use '?' or if
possible named parameters to bind variables since the database driver
will handle the necessary conversions.

In core.py the parse_path_info function uses a bunch of regular
expressions to parse out the pieces of the path to determine the module
and arguments needed. I think a nicer way to handle this would be just
to look at the first part of the path to determine which module should
handle it, then leave the rest of the parsing up to the module. Now,
even if you didn't want to move that functionality into the modules, the
function could still be reworked. One of the telling features of that
function is that it's pretty easy to recognize that a lot of the code
was copied and pasted, a clear sign that there's probably a better way
to abstract the code.

Anyways, I hope that didn't come off as pretentious. I know that when
you're working you generally focus more on working through your current
problem than thinking about ways to generalize what you're doing to use
in the future. But looking through the codebase for the first time, I'm
just trying to make sense of it, so it stands out when a portion of code
isn't as straightforward as it could be.

I'd like to help start making some of these types of changes in Trac,
but I'm not really sure what the protocol should be. I don't want to
file tickets for every change like this. I suppose I'll post patches to
the mailing list. The mailing list volume doesn't seem too heavy at
this point, but at some point it may be good to create separate user and
developer lists.

--
Matthew Good <trac@matt-good.net>

Matthew Good wrote:

> So, I've been using Trac for a couple of weeks now. I noticed it show
> up in the Debian repository just about the time I was getting a Wiki set
> up for my projects at work. I decided to check Trac out and have been
> quite impressed. For now Trac hasn't been used too heavily at work
> since I'm mainly working on a couple projects by myself, but as I start
> to shift over to our new product and some new developers start to take
> over some of my work I think it will become more important.
>
> So, as I've been looking through the Trac code, there are places I
> believe the code could be tidied up, or abstracted a bit. Now, this
> isn't really a bug so I didn't want to file a ticket, but I think some
> things could be done to improve the readability and reduce some code
> duplication.
>
Hi Matthew,

Sure, the constant development and addition of new features has made the
parts of the code less readable and organized than preferred. It would
be a great addition to the project if we could get more contributions in
the form of general code improvements and not just new features.

Unfortunately Trac has since the beginning been compatible with
Python 2.1 because that is basically the python version debian stable
users are stuck with. Fortunately it looks like the next debian stable
version (sarge) will be released soon, so we might be able to drop
Python 2.1 (and 2.2) support in a not so distant future...

> In general I like to use a more functional style programming when
> working in Python, which leads me to use iterators quite frequently.
>
> I've noticed there are a variety of places that loop over the rows
> returned by a SQL query like:
>
> while 1:
> row = cursor.fetchone()
> if not row:
> break
> # do something with row
>
This is because generators are not available in python 2.1 :/

> I'd suggest adding a generator that would abstract this:
>
> def row_iterator(cursor):
> while 1:
> row = cursor.fetchone()
> if not row:
> raise StopIteration
> yield row
>
> Now the loop becomes:
>
> for row in row_iterator(cursor):
> # do something with row
>

The sqlite module already has built in generator support if you
are using python >= 2.3 like this:

cursor.execute('SELECT something FROM somewhere')
for row in cursor:
print row

>
> The itertools package has some nice functions for working with
> iterators. I tend to define a simple function based off of them to keep
> track of an index while iterating over an object:
>
same thing here, not available in python2.1 :/

> def counter(obj):
> return izip(count(), obj)
>
> Now, I can pass a list or other iterable object to the function and
> replace the loops like:
>
> for i in range(len(lst)):
> value = lst[i]
> # do something with i and value
>
> with:
>
> for i, value in counter(lst):
> # do something with i and value
>
I think python2.3 already has a builtin function that does what you
described above:

for i, value in enumerate(lst):
# do something with i and value

> Also, I've noticed that when building url strings there are places that
> manual manipulation could be replaced by using the urlparse package.
>
The url query string is parsed using the FieldStorage class from the cgi
and mod_python modules and the actual url is parsed using a set of hairy
regexps. This might be done easier with urlparse but I've never used
that module.

> I'm also not in favor of manually escaping variables to use in building
> SQL query strings. I generally find it preferable to use '?' or if
> possible named parameters to bind variables since the database driver
> will handle the necessary conversions.

As far as I know Trac uses '%s' in query strings everywhere and lets the
database driver do the actual binding and conversion. I think this is
what you suggested except that we use '%s' instead of '?' everywhere
but the result is the same. The only place in trac where the binding
is done outside the database driver is the QueryModule and that is
because the dynamic behaviour of that module makes it hard to do
it the ordinary way.

>
> In core.py the parse_path_info function uses a bunch of regular
> expressions to parse out the pieces of the path to determine the module
> and arguments needed. I think a nicer way to handle this would be just
> to look at the first part of the path to determine which module should
> handle it, then leave the rest of the parsing up to the module. Now,
> even if you didn't want to move that functionality into the modules, the
> function could still be reworked. One of the telling features of that
> function is that it's pretty easy to recognize that a lot of the code
> was copied and pasted, a clear sign that there's probably a better way
> to abstract the code.

I think the "module" design is one of the last pieces left from the
first version of Trac and could indeed need some work. Our longterm goal
is to make Trac a thin framework where different modules like wiki,
source browser, issue-tracker could be added just by dropping a file
into a plug-in folder without needing to manually modifying any files.
Unfortunately this isn't a easy as it sounds because of things like
database schemas and html templates.

>
> Anyways, I hope that didn't come off as pretentious. I know that when
> you're working you generally focus more on working through your current
> problem than thinking about ways to generalize what you're doing to use
> in the future. But looking through the codebase for the first time, I'm
> just trying to make sense of it, so it stands out when a portion of code
> isn't as straightforward as it could be.

Not at all, we appreciate your feedback and I think you're correct when
you say it's easier to locate portions of code needing cleanup when
you look at a piece of code for the first time. This is also an
excellent opportunity to add more unit tests to test parts being
rewritten.

>
> I'd like to help start making some of these types of changes in Trac,
> but I'm not really sure what the protocol should be. I don't want to
> file tickets for every change like this. I suppose I'll post patches to
> the mailing list. The mailing list volume doesn't seem too heavy at
> this point, but at some point it may be good to create separate user and
> developer lists.
>
It's probably best to join the #trac irc-channel on freenode.
I'm not there all the time but there is a lot of people there familiar
with the Trac source code. Once we have identified a good place
to start you can file an enhancement ticket and attach patches to that.

Cheers!
/ Jonas
--
Jonas Borgstr?m | Edgewall Software
jonas@edgewall.com | Professional GNU/Linux & Open Source Consulting.
| http://www.edgewall.com/