Yes, WebSphinx is not very scalable, as it stores data about each page
in memory, and even stores parent-child page relationships in memory.
Why not use the web crawler in Lucene Sandbox? The link is on Lucene's
home page.
Otis
--- Mike Tinnes <tinnes@ecliptictech.com> wrote:
>
> I've been using webSphinx with Lucene by simply extending the Crawler
> class
> and placing my Lucene code in the overridden 'visit' method. Seems to
> work,
> but I've encountered problems with 'OutOfMemory' errors when crawling
> large
> sites with 512mb and also using the -Xmx VM args. The sphinx faq
> mentions
> the problem, but the recommened fixes don't seem to help. Alas I've
> resorted
> to implementing a custom crawler.
>
>
> ----- Original Message -----
> From: "A Rambocus" <eem2ar@eim.surrey.ac.uk>
> To: <lucene-user@jakarta.apache.org>
> Sent: Thursday, June 27, 2002 7:55 AM
> Subject: SPIDER /CRAWLERS /ROBOTS with lucene
>
>
> >
> > Hello all does anyone know how to integrate th eWebSphinx with
> lucene...
> > - the code previous distributed on this list does not work!
> >
> > I am currently trying spindle.......
> >
> > but does anyone know if lucene could be used to support image
> indexing
> > since this would be very helpful!!
> >
> > Cheers
> >
> > Ajay R
> >
> >
> > --
> > To unsubscribe, e-mail:
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> > For additional commands, e-mail:
> <mailto:lucene-user-help@jakarta.apache.org>
> >
>
>
> --
> To unsubscribe, e-mail:
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
> <mailto:lucene-user-help@jakarta.apache.org>
>
__________________________________________________
Do You Yahoo!?
Yahoo! - Official partner of 2002 FIFA World Cup
http://fifaworldcup.yahoo.com --
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>