Mailing List Archive

Indexing documents without an extension..
Hi all,

Is there any way within Lucene to index a document that has no file extension?

For example, I need to index a document named "DOC" (not "DOC.doc").

Thanks in advance.




mmcd
Re: Indexing documents without an extension..
>
> Is there any way within Lucene to index a document that has no file
> extension?
> For example, I need to index a document named "DOC" (not "DOC.doc").

You can take a look at the Nutch MimeType resolver
(http://lucene.apache.org/nutch/apidocs/org/apache/nutch/util/mime/package-summary.html).
It resolves MIME types using a file-extension repository, and for some MIME
types it can also use magic numbers to determine the document's MIME type from
its content (without the extension).
This Nutch utility has no dependency on Nutch code, and it could be a good
idea to move it to the Lucene code (or perhaps to a utility library common to
both Lucene and Nutch...).
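
To illustrate the magic-number idea for a file such as "DOC", here is a
minimal sketch in plain Java. It does not use the Nutch resolver's API: the
class MagicSniffer, its methods, and the small signature table are
hypothetical names made up for this example, and a real resolver covers far
more types.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Lucene or Nutch.
final class MagicSniffer {

    // Leading bytes ("magic numbers") of a few common formats.
    private static final byte[][] MAGIC = {
        {'%', 'P', 'D', 'F', '-'},
        {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
         (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1},
        {'P', 'K', 0x03, 0x04},
        {'{', '\\', 'r', 't', 'f'},
    };

    // MIME type reported for the signature at the same index in MAGIC.
    private static final String[] TYPES = {
        "application/pdf",
        // OLE2 container. Assumed to be Word here, although old Excel and
        // PowerPoint files start with the same bytes.
        "application/msword",
        "application/zip",
        "application/rtf",
    };

    // Returns the MIME type whose signature matches the start of head,
    // or a generic binary type when nothing matches.
    static String detect(byte[] head) {
        for (int i = 0; i < MAGIC.length; i++) {
            if (startsWith(head, MAGIC[i])) {
                return TYPES[i];
            }
        }
        return "application/octet-stream";
    }

    // Reads the first few bytes of the file and sniffs them; the file
    // name and its extension (if any) are never consulted.
    static String detectFile(String path) throws IOException {
        byte[] head = new byte[8];
        int n = 0;
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            while (n < head.length) {
                int r = in.read(head, n, head.length - n);
                if (r < 0) {
                    break; // file is shorter than the buffer
                }
                n += r;
            }
        }
        return detect(Arrays.copyOf(head, n));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Calling MagicSniffer.detectFile("DOC") would then give a MIME type that can
be used to pick the right parser before handing the extracted text to the
Lucene indexer.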

Regards

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/