Hi all,

How can I index documents (PDF, HTML, XML) remotely?
I am using Lucene 2.0.0. Currently I run

java org.w3c.tidy.Tidy -m *.html

to parse the HTML,

java org.apache.lucene.demo.IndexHTML -create -index index .\

to index the HTML, and

java org.pdfbox.searchengine.lucene.IndexFiles -create -index C:\tomcat\webapps\luceneweb\index .\

to index the PDFs.

But how can I parse XML? I use

java dom.DOMFilter *.xml

but how can I index the XML?

Thanks
--
View this message in context: http://www.nabble.com/Index-remotely-documents-tf4430491.html#a12639240
Sent from the Lucene - General mailing list archive at Nabble.com.