[ANN] DejaGrabber 0.1 -- search Dejanews with Python
If you'd like a convenient way of searching the DejaNews archive using
Python, I'm happy to present the DejaGrabber module. This is a module
to make searching Dejanews easier. There are two classes, DejaGrabber
and Article. I've tried to add a reasonable number of useful
docstrings, too.

The interface to DejaGrabber could probably use a bit of work. If I
ever use it enough to find out what the flaws are, I will fix
it. Patches are welcome. :)

Here's an example:
>>> from DejaGrabber import *
>>> d = DejaGrabber(group='comp.lang.python', author='Tim Peters')
>>> l = d.get_messages(4)
>>> print l
[.99/06/23 30 comp.lang.python RE: Python Exes! Tim Peters
, 99/06/18 28 comp.lang.python RE: Newbie: Truth values (th Tim Peters
, 99/06/18 28 comp.lang.python RE: DATE ARITHMETIC Tim Peters
, 99/06/18 28 comp.lang.python RE: NameError Problem Tim Peters
, 99/05/18 25 comp.lang.python RE: while (a=b()) ... Tim Peters
]

You can grab a copy at

http://www.sff.net/people/neelk/free-software/DejaGrabber.py


Neel
[ANN] DejaGrabber 0.1 -- search Dejanews with Python
Neel:
This looks really great..
But I have this:

Traceback (innermost last):
File "<interactive input>", line 1, in ?
File "C:\Program Files\Py152\Lib\httplib.py", line 51, in __init__
if host: self.connect(host, port)
File "C:\Program Files\Py152\Lib\httplib.py", line 79, in connect
self.sock.connect(host, port)
File "<string>", line 1, in connect
error: (10065, 'winsock error')

when I am trying to reach www.python.org through a proxy (www.mc.xerox.com).
How do I handle this proxy-related situation?

Thanx -- val

Neel Krishnaswami <neelk@brick.cswv.com> wrote in message
news:7kuu73$1ki$1@brick.cswv.com...
>
> If you'd like a convenient way of searching the DejaNews archive using
> Python, I'm happy to present the DejaGrabber module. This is a module
> to make searching Dejanews easier. There are two classes, DejaGrabber
> and Article. I've tried to add a reasonable number of useful
> docstrings, too.
>
> The interface to DejaGrabber could probably use a bit of work. If I
> ever use it enough to find out what the flaws are, I will fix
> it. Patches are welcome. :)
>
> Here's an example:
> >>> from DejaGrabber import *
> >>> d = DejaGrabber(group='comp.lang.python', author='Tim Peters')
> >>> l = d.get_messages(4)
> >>> print l
> [.99/06/23 30 comp.lang.python RE: Python Exes! Tim Peters
> , 99/06/18 28 comp.lang.python RE: Newbie: Truth values (th Tim Peters
> , 99/06/18 28 comp.lang.python RE: DATE ARITHMETIC Tim Peters
> , 99/06/18 28 comp.lang.python RE: NameError Problem Tim Peters
> , 99/05/18 25 comp.lang.python RE: while (a=b()) ... Tim Peters
> ]
>
> You can grab a copy at
>
> http://www.sff.net/people/neelk/free-software/DejaGrabber.py
>
>
> Neel
[ANN] DejaGrabber 0.1 -- search Dejanews with Python
In article <7l0520$qe4$1@news.wrc.xerox.com>,
"Val Bykoski" <vbykovsk@sdsp.mc.xerox.com> wrote:
> Neel:
> This looks really great..
> But I have this:
>
> Traceback (innermost last):
> File "<interactive input>", line 1, in ?
> File "C:\Program Files\Py152\Lib\httplib.py", line 51, in __init__
> if host: self.connect(host, port)
> File "C:\Program Files\Py152\Lib\httplib.py", line 79, in connect
> self.sock.connect(host, port)
> File "<string>", line 1, in connect
> error: (10065, 'winsock error')
>
> when I am trying to reach www.python.org through a proxy (www.mc.xerox.com).
> How do I handle this proxy-related situation?
>
> Thanx -- val

Well, I can't access www.mc.xerox.com; presumably it is for Xerox internal
use only. Is this machine a Windows box by any chance, possibly running
WinProxy? Your problem looks exactly like one I had with WinProxy on an
NT box. WinProxy is slightly broken. The authors intend to fix it in
their next release, but for now you would need to add a workaround
(kindly provided to me by a reader of this group) to your Python
libraries. Unfortunately I don't have the fix here; it is on my
firewalled machine at work. You could search for the fix in this group;
just search for WinProxy.

If this isn't your problem, I guess it is something else...

Good luck,
- Bruce


[ANN] DejaGrabber 0.1 -- search Dejanews with Python
Val Bykoski <vbykovsk@sdsp.mc.xerox.com> wrote in
<7l0520$qe4$1@news.wrc.xerox.com>:

>Neel:
>when I am trying to reach www.python.org through a proxy (www.mc.xerox.com).
>How do I handle this proxy-related situation?
>
>Thanx -- val
>
You need to set an environment variable HTTP_PROXY to point to the proxy server
before using DejaGrabber.
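
A minimal sketch of that advice in present-day Python 3 (not the Python
1.5.2 of this thread). It assumes DejaGrabber fetches its pages through
urllib, which reads the *_proxy environment variables; httplib, as used
in the traceback above, connects directly and does not consult them.
The proxy address below is a made-up placeholder, not a real server.

```python
import os
import urllib.request

# Hypothetical proxy address; substitute your site's real host and port.
PROXY_URL = "http://proxy.example.com:8080"

# Set the variable before any urllib-based code opens a connection.
# The lowercase name is used here because urllib prefers it when both
# the lowercase and uppercase spellings are present.
os.environ["http_proxy"] = PROXY_URL

# urllib builds its proxy map from the environment; inspect it to confirm.
proxies = urllib.request.getproxies()
print(proxies.get("http"))
```

The variable can equally be set in the shell before starting Python, so
that no code change is needed at all.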

--
Duncan Booth duncan@dales.rmplc.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?
http://dales.rmplc.co.uk/Duncan
[ANN] DejaGrabber 0.1 -- search Dejanews with Python
Neel Krishnaswami <neelk@alum.mit.edu> wrote in
<7kuu73$1ki$1@brick.cswv.com>:

>
>If you'd like a convenient way of searching the DejaNews archive using
>Python, I'm happy to present the DejaGrabber module. This is a module
>to make searching Dejanews easier. There are two classes, DejaGrabber
>and Article. I've tried to add a reasonable number of useful
>docstrings, too.
>
>Here's an example:
>>>> from DejaGrabber import *
>>>> d = DejaGrabber(group='comp.lang.python', author='Tim Peters')
>>>> l = d.get_messages(4)
>>>> print l
>[.99/06/23 30 comp.lang.python RE: Python Exes! Tim Peters
>, 99/06/18 28 comp.lang.python RE: Newbie: Truth values (th Tim Peters
>, 99/06/18 28 comp.lang.python RE: DATE ARITHMETIC Tim Peters
>, 99/06/18 28 comp.lang.python RE: NameError Problem Tim Peters
>, 99/05/18 25 comp.lang.python RE: while (a=b()) ... Tim Peters
>]
>

Nice bit of work. Two comments though:
Why does get_messages(4) return 5 messages? The doc string implies it
should return only as many as its argument.

It would be useful if a test example was included in a "if
__name__=='__main__'" block at the end of the file. At the very least the
example you give above, but perhaps more usefully a full command line
driven grabber.

--
Duncan Booth duncan@dales.rmplc.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?
http://dales.rmplc.co.uk/Duncan
[ANN] DejaGrabber 0.1 -- search Dejanews with Python
duncan@rcp.co.uk (Duncan Booth) writes:
> Nice bit of work. Two comments though:
> Why does get_messages(4) return 5 messages? The doc string implies it
> should return only as many as its argument.

It's neat, admittedly (I thought of writing one myself the other day, and
then found that one ;->). The first time I need to do some serious Deja
action, I'll be using it :>.

There's a slip in the slicing logic in get_messages; the following
patch fixed the behavior (for me, anyhow).

Index: DejaGrabber.py
===================================================================
RCS file: /home/fingon/cvs/projects/python/DejaGrabber.py,v
retrieving revision 1.1
diff -u -r1.1 DejaGrabber.py
--- DejaGrabber.py 1999/06/28 09:35:31 1.1
+++ DejaGrabber.py 1999/06/28 09:40:18
@@ -208,6 +208,5 @@
message_list.extend(l)
message_list.sort()
message_list.reverse()
- if len(message_list) > n:
- del message_list[n:-1]
+ message_list = message_list[:n]
return message_list
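
To see why the removed lines leave one message too many, here is a small
standalone sketch. Plain strings stand in for Article objects, and the
two function names are made up for illustration; they are not part of
DejaGrabber. The slice n:-1 stops before the last element, so the del
removes everything from position n up to but not including the final
item, and n+1 items survive.

```python
def truncate_buggy(messages, n):
    """Mimic the removed lines: keeps the first n items plus the last one."""
    messages = list(messages)  # work on a copy
    if len(messages) > n:
        del messages[n:-1]     # the slice excludes the final element
    return messages


def truncate_fixed(messages, n):
    """Mimic the patched line: keeps at most the first n items."""
    return list(messages)[:n]


sample = ["m1", "m2", "m3", "m4", "m5", "m6", "m7"]
print(len(truncate_buggy(sample, 4)))  # 5
print(len(truncate_fixed(sample, 4)))  # 4
```

Slicing with [:n] is also safe when the list is shorter than n, which is
why the patched version needs no length check.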


> It would be useful if a test example was included in a "if
> __name__=='__main__'" block at the end of the file. At the very least the
> example you give above, but perhaps more usefully a full command line
> driven grabber.



-Markus

--
I consider "Anthill Inside" to be much more comforting sticker than
"Intel Inside".
[ANN] DejaGrabber 0.1 -- search Dejanews with Python
In article <al89094u0e3.fsf@myntti.helsinki.fi>,
Markus Stenberg <mstenber@cc.Helsinki.FI> wrote:
>duncan@rcp.co.uk (Duncan Booth) writes:
>> Nice bit of work. Two comments though:
>> Why does get_messages(4) return 5 messages? The doc string implies it
>> should return only as many as its argument.
>
>It's neat, admittedly (I thought of writing one myself the other day, and
>then found that one ;->). The first time I need to do some serious Deja
>action, I'll be using it :>.
>
>There's a slip in the slicing logic in get_messages; the following
>patch fixed the behavior (for me, anyhow).

[Code snipped]

Thanks -- I've applied the change and put the new version up.

>> It would be useful if a test example was included in a "if
>> __name__=='__main__'" block at the end of the file. At the very least the
>> example you give above, but perhaps more usefully a full command line
>> driven grabber.

This is a good idea, but I'm not sure what the right interface would
look like. I suspect that producing a command-line interface useful
enough to play well with other tools would be a bigger job than the class
itself -- a decent UI must be flexible, and flexibility is always
trickier than it seems.

That said, I'm all for it, and will probably take a shot at it RSN. A
query can return multiple messages, and I think we might want the
output in any of these forms:

a) A list of headers,
b) A big file of headers+messages,
c) A lot of little files each with a message in it,
d) A single HTML file with a list of headers with links
to the message bodies (using HTML in-page links).

Any I missed, or any of these fundamentally the wrong thing to do for
a reason I'm not seeing?
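
One possible shape for such a driver, as a rough sketch in present-day
Python 3 rather than a settled design. The option names and the four
mode names are made up here to mirror (a) through (d) above; only
DejaGrabber(group=..., author=...) and get_messages(n) come from the
example in the announcement, and they are mentioned in a comment rather
than called, so the sketch runs without the module.

```python
import argparse

# Hypothetical names for the four output styles (a)-(d) listed above.
OUTPUT_MODES = ("headers", "single-file", "many-files", "html")


def build_parser():
    parser = argparse.ArgumentParser(
        description="Search Dejanews from the command line (sketch).")
    parser.add_argument("--group", help="newsgroup, e.g. comp.lang.python")
    parser.add_argument("--author", help="author name to match")
    parser.add_argument("-n", "--count", type=int, default=10,
                        help="maximum number of messages to fetch")
    parser.add_argument("--output", choices=OUTPUT_MODES, default="headers",
                        help="how to write the results")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # A real driver would now build DejaGrabber(group=args.group,
    # author=args.author), call get_messages(args.count), and hand the
    # result to a writer chosen by args.output.
    return args


if __name__ == "__main__":
    print(main())
```

Keeping the output style as a single choice option leaves room to add
further modes later without changing the rest of the interface.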


Neel