Mailing List Archive

SPIDER /CRAWLERS /ROBOTS with lucene
Hello all does anyone know how to integrate th eWebSphinx with lucene...
- the code previous distributed on this list does not work!

I am currently trying spindle.......

but does anyone know if lucene could be used to support image indexing
since this would be very helpful!!

Cheers

Ajay R


--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
Re: SPIDER /CRAWLERS /ROBOTS with lucene [ In reply to ]
I've been using webSphinx with Lucene by simply extending the Crawler class
and placing my Lucene code in the overridden 'visit' method. Seems to work,
but I've encountered problems with 'OutOfMemory' errors when crawling large
sites with 512mb and also using the -Xmx VM args. The sphinx faq mentions
the problem, but the recommened fixes don't seem to help. Alas I've resorted
to implementing a custom crawler.


----- Original Message -----
From: "A Rambocus" <eem2ar@eim.surrey.ac.uk>
To: <lucene-user@jakarta.apache.org>
Sent: Thursday, June 27, 2002 7:55 AM
Subject: SPIDER /CRAWLERS /ROBOTS with lucene


>
> Hello all does anyone know how to integrate th eWebSphinx with lucene...
> - the code previous distributed on this list does not work!
>
> I am currently trying spindle.......
>
> but does anyone know if lucene could be used to support image indexing
> since this would be very helpful!!
>
> Cheers
>
> Ajay R
>
>
> --
> To unsubscribe, e-mail:
<mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
<mailto:lucene-user-help@jakarta.apache.org>
>


--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
Re: SPIDER /CRAWLERS /ROBOTS with lucene [ In reply to ]
Yes, WebSphinx is not very scalable, as it stores data about each page
in memory, and even stores parent-child page relationships in memory.
Why not use the web crawler in Lucene Sandbox? The link is on Lucene's
home page.

Otis

--- Mike Tinnes <tinnes@ecliptictech.com> wrote:
>
> I've been using webSphinx with Lucene by simply extending the Crawler
> class
> and placing my Lucene code in the overridden 'visit' method. Seems to
> work,
> but I've encountered problems with 'OutOfMemory' errors when crawling
> large
> sites with 512mb and also using the -Xmx VM args. The sphinx faq
> mentions
> the problem, but the recommened fixes don't seem to help. Alas I've
> resorted
> to implementing a custom crawler.
>
>
> ----- Original Message -----
> From: "A Rambocus" <eem2ar@eim.surrey.ac.uk>
> To: <lucene-user@jakarta.apache.org>
> Sent: Thursday, June 27, 2002 7:55 AM
> Subject: SPIDER /CRAWLERS /ROBOTS with lucene
>
>
> >
> > Hello all does anyone know how to integrate th eWebSphinx with
> lucene...
> > - the code previous distributed on this list does not work!
> >
> > I am currently trying spindle.......
> >
> > but does anyone know if lucene could be used to support image
> indexing
> > since this would be very helpful!!
> >
> > Cheers
> >
> > Ajay R
> >
> >
> > --
> > To unsubscribe, e-mail:
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> > For additional commands, e-mail:
> <mailto:lucene-user-help@jakarta.apache.org>
> >
>
>
> --
> To unsubscribe, e-mail:
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
> <mailto:lucene-user-help@jakarta.apache.org>
>


__________________________________________________
Do You Yahoo!?
Yahoo! - Official partner of 2002 FIFA World Cup
http://fifaworldcup.yahoo.com

--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>