Mailing List Archive: a single index

Sep 7, 2007, 7:39 AM

Post #1 of 4

I am working with Lucene and I am new to it.

I want to index HTML documents. For this I run:

java org.w3c.tidy.Tidy -m *.html

java org.apache.lucene.demo.IndexHTML -create -index index .\

All of this builds an index, and when I search from the web page it does
show me the documents and their summaries.

After that I index the PDF files:

org.pdfbox.searchengine.lucene.IndexFiles -create -index pdf \

This also builds an index, but the PDF index replaces the HTML index.

How can I get a single index, so that a search from the web page shows me
both HTML and PDF documents?

Thanks.
--
View this message in context: http://www.nabble.com/a-single-index-tf4401665.html#a12556579
Sent from the Lucene - General mailing list archive at Nabble.com.

a single index

payo22 at yahoo

Sep 7, 2007, 7:39 AM

Post #2 of 4

I am using the Lucene demo.

Thanks.

a single index

payo22 at yahoo

Sep 7, 2007, 7:45 AM

Post #3 of 4

My base directory is

C:\Tomcat\webapps\luceneweb

I am using the Lucene demo.

Thanks.

Re: a single index

hossman_lucene at fucit

Sep 12, 2007, 11:47 AM

Post #4 of 4

Please note that "Lucene" is a Java library for building applications.
The examples you refer to in your message are two applications built with
the Lucene library; those applications are really just demonstrations of
the types of things that are possible using the Lucene library (and the
PDFBox library).
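One plausible cause, which the thread itself does not confirm: both demo commands are run with `-create`. A Lucene index writer opened in create mode overwrites any existing index in its directory, while append mode adds to it. If both commands write to the same index directory, the PDF run would therefore replace the HTML documents. The toy model below is plain Java, not Lucene or PDFBox code; `ToyIndex` and `run` are hypothetical names, and the sketch does not show whether the two demo indexers can safely share one index.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model, NOT Lucene's API: a list of document paths stands in for an
 * index directory, to show create-vs-append behavior.
 */
public class SingleIndexSketch {

    /** Hypothetical stand-in for one on-disk index directory. */
    public static final class ToyIndex {
        private final List<String> docs = new ArrayList<>();

        /** Like opening a writer: create == true discards existing docs. */
        public void open(boolean create) {
            if (create) {
                docs.clear();
            }
        }

        public void add(String path) {
            docs.add(path);
        }

        /** Immutable snapshot of the documents currently in the index. */
        public List<String> docs() {
            return List.copyOf(docs);
        }
    }

    /** One indexer run over a batch of files (assumed to mirror one demo command). */
    public static void run(ToyIndex index, boolean create, List<String> files) {
        index.open(create);
        for (String f : files) {
            index.add(f);
        }
    }

    public static void main(String[] args) {
        List<String> html = List.of("a.html", "b.html");
        List<String> pdf = List.of("c.pdf");

        // Both runs use create=true on the same index: the PDF run
        // discards the HTML documents added by the first run.
        ToyIndex overwritten = new ToyIndex();
        run(overwritten, true, html);
        run(overwritten, true, pdf);
        System.out.println("create, create: " + overwritten.docs());

        // First run creates, second run appends: one index holds both types.
        ToyIndex single = new ToyIndex();
        run(single, true, html);
        run(single, false, pdf);
        System.out.println("create, append: " + single.docs());
    }
}
```

Running `main` prints `create, create: [c.pdf]` followed by `create, append: [a.html, b.html, c.pdf]`.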

If you want to do more complicated things, you either need to write your
own application (you can base it on the sample code you are currently
running) or you need to look into existing applications.
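If you write your own application, the main structural change from running two separate demo indexers is to walk the document tree once and send every file into the same index. The sketch below covers only that routing step in plain Java. The class name `OneIndexerSketch`, the `Kind` categories and the file-extension rules are assumptions for illustration. The actual Lucene and PDFBox parsing and indexing calls are left as a comment rather than guessed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Sketch of the control flow for one custom indexer that handles both
 * HTML and PDF files in a single pass. It only decides which (hypothetical)
 * parser each file needs; no Lucene or PDFBox code is called here.
 */
public class OneIndexerSketch {

    /** Which parser a file would be sent to. */
    public enum Kind { HTML, PDF, SKIP }

    /** Classifies a file by its extension, case-insensitively. */
    public static Kind classify(Path file) {
        String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
        if (name.endsWith(".html") || name.endsWith(".htm")) {
            return Kind.HTML;
        }
        if (name.endsWith(".pdf")) {
            return Kind.PDF;
        }
        return Kind.SKIP;
    }

    /** Walks root once and groups every regular file by Kind, sorted by path. */
    public static Map<Kind, List<Path>> plan(Path root) throws IOException {
        Map<Kind, List<Path>> byKind = new EnumMap<>(Kind.class);
        for (Kind k : Kind.values()) {
            byKind.put(k, new ArrayList<>());
        }
        try (Stream<Path> walk = Files.walk(root)) {
            List<Path> files = walk.filter(Files::isRegularFile)
                                   .sorted()
                                   .collect(Collectors.toList());
            for (Path f : files) {
                byKind.get(classify(f)).add(f);
            }
        }
        return byKind;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        Map<Kind, List<Path>> plan = plan(root);
        // A real application would open ONE Lucene index writer here
        // (create mode only on the very first run), turn each HTML file and
        // each PDF into a Lucene Document with the appropriate parser, add
        // them all to that same writer, and close it once at the end.
        System.out.println("HTML files: " + plan.get(Kind.HTML).size());
        System.out.println("PDF files:  " + plan.get(Kind.PDF).size());
        System.out.println("skipped:    " + plan.get(Kind.SKIP).size());
    }
}
```

Run it with the document root as its only argument; it prints how many files fall into each category. Opening the writer once and feeding it both document types avoids the second run replacing the first.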

In the first case, please consult the java-user@lucene mailing list if
you need assistance.

In the second case, it may help to review this list of applications...

http://wiki.apache.org/lucene-java/PoweredBy

...based on the situation you describe, however, I would think that
Nutch may be the best place for you to start...

http://lucene.apache.org/nutch/

-Hoss