On Sunday 16 May 2004 10:53 CET Theo wrote:
> On Sat, May 15, 2004 at 06:26:27AM -0700, bugzilla-daemon wrote:
> > About your comment to my second patch, I don't really understand what
> > you mean. The points of these patch are:
> > 1. Don't reinvent the wheel and do the header parsing in spamd.raw but
> > use Message.pm. That's trivial.
>
> Right, we agree on that.
Ok.
> > 2. Don't store two copies of the mail in the memory. Currently the mail
> > is first completely read into @msglines which is then copied inside the
> > Message.pm ctor. At that point, two copies of the mail exist in memory
> > while we theoretically only need one if we could just feed the incoming
> > stuff directly line-per-line to the Message.pm. That's what the
> > anonymous subroutine is for. As that one also has to check for the
> > size, I can't just feed the ctor a reference to $client->getline().
>
> Well, a few things here.
>
> 1) The current method is cleaner.
Right, but the other one takes less memory :)
> 2) I don't think the patch works. It references non-global/non-local
> variables in the anon-sub. Not only that, but the variables it wants to
> use really have no way of being passed in without mucking up the Message
> API, which I don't want to do.
It does work, at least did it look like when I tested it. There's Perl
Voodoo involved: The anonymous subroutine is created in spamd's namespace
and has access to all variables of its parent routine. Even after it got
fed to the Message.pm ctor. I first couldn't believe it either, but it
works. I'll check it another time to be really sure though :)
> 3) We should do an 'undef @msglines' after the parse. The Message code
> specifically wants to get a copy of the message for itself, since it'll
> be messing with it during the parse.
Yes, that was my first idea too. But with this we still have two instances
of the messages in the memory, though for a shorter time:
We first read stuff into @msglines, feed that array to the ctor which then
copies it into its own "cache". At this point we've got the two instances
of the message. Even if we undef @msglines after the ctor returns, we still
had that "spike" of memory usage.
We also copied a whole potentially large array from A to B but I don't know
if that has much impact on performance.
My point is that we don't really need to cache the whole message first if we
can feed the incoming stuff directly line-per-line to the Message object.
Another idea I had was to create an "empty" object first and then offer a
method to feed later on. But that would have even more impact on the API.
> 4) Changing the M::SA::Message code in a non-trivial way is beyond this
> ticket IMNSHO.
Actually, the change is pretty trivial, I just reordered some of the other
variable handling and made sure that it doesn't bail out with a nasty error
if you feed it a ref to something it doesn't like (a has for example).
But you're right, that bug is fixed, that's why I took this to the list ;-)
> So what I was saying was we can easily put the parse before the use of
> the message-id. I was also commenting that we could probably put the
> expected!=actual test before the message-id output, so we don't have
> to worry about calling $mail->finish(). We can also undef the msglines
> array to free up some memory during processing.
Nope, we can't put the size checking before we extract the Message-Id
because else we would have some random messages about mail with incorrect
sizes in the syslog for which the user can't find out which the culprit
was.
Cheers,
Malte
--
[SGT] Simon G. Tatham: "How to Report Bugs Effectively"
<http://www.chiark.greenend.org.uk/~sgtatham/bugs.html>
[ESR] Eric S. Raymond: "How To Ask Questions The Smart Way"
<http://www.catb.org/~esr/faqs/smart-questions.html>
> On Sat, May 15, 2004 at 06:26:27AM -0700, bugzilla-daemon wrote:
> > About your comment to my second patch, I don't really understand what
> > you mean. The points of these patch are:
> > 1. Don't reinvent the wheel and do the header parsing in spamd.raw but
> > use Message.pm. That's trivial.
>
> Right, we agree on that.
Ok.
> > 2. Don't store two copies of the mail in the memory. Currently the mail
> > is first completely read into @msglines which is then copied inside the
> > Message.pm ctor. At that point, two copies of the mail exist in memory
> > while we theoretically only need one if we could just feed the incoming
> > stuff directly line-per-line to the Message.pm. That's what the
> > anonymous subroutine is for. As that one also has to check for the
> > size, I can't just feed the ctor a reference to $client->getline().
>
> Well, a few things here.
>
> 1) The current method is cleaner.
Right, but the other one takes less memory :)
> 2) I don't think the patch works. It references non-global/non-local
> variables in the anon-sub. Not only that, but the variables it wants to
> use really have no way of being passed in without mucking up the Message
> API, which I don't want to do.
It does work, at least did it look like when I tested it. There's Perl
Voodoo involved: The anonymous subroutine is created in spamd's namespace
and has access to all variables of its parent routine. Even after it got
fed to the Message.pm ctor. I first couldn't believe it either, but it
works. I'll check it another time to be really sure though :)
> 3) We should do an 'undef @msglines' after the parse. The Message code
> specifically wants to get a copy of the message for itself, since it'll
> be messing with it during the parse.
Yes, that was my first idea too. But with this we still have two instances
of the messages in the memory, though for a shorter time:
We first read stuff into @msglines, feed that array to the ctor which then
copies it into its own "cache". At this point we've got the two instances
of the message. Even if we undef @msglines after the ctor returns, we still
had that "spike" of memory usage.
We also copied a whole potentially large array from A to B but I don't know
if that has much impact on performance.
My point is that we don't really need to cache the whole message first if we
can feed the incoming stuff directly line-per-line to the Message object.
Another idea I had was to create an "empty" object first and then offer a
method to feed later on. But that would have even more impact on the API.
> 4) Changing the M::SA::Message code in a non-trivial way is beyond this
> ticket IMNSHO.
Actually, the change is pretty trivial, I just reordered some of the other
variable handling and made sure that it doesn't bail out with a nasty error
if you feed it a ref to something it doesn't like (a has for example).
But you're right, that bug is fixed, that's why I took this to the list ;-)
> So what I was saying was we can easily put the parse before the use of
> the message-id. I was also commenting that we could probably put the
> expected!=actual test before the message-id output, so we don't have
> to worry about calling $mail->finish(). We can also undef the msglines
> array to free up some memory during processing.
Nope, we can't put the size checking before we extract the Message-Id
because else we would have some random messages about mail with incorrect
sizes in the syslog for which the user can't find out which the culprit
was.
Cheers,
Malte
--
[SGT] Simon G. Tatham: "How to Report Bugs Effectively"
<http://www.chiark.greenend.org.uk/~sgtatham/bugs.html>
[ESR] Eric S. Raymond: "How To Ask Questions The Smart Way"
<http://www.catb.org/~esr/faqs/smart-questions.html>