Mailing List Archive

Lucene files
Hi folks,

I was looking into the structure of the index files, and here is what
I have found. I'd like to know whether what I have seen is right and
whether I have missed anything:

*.fdt:
- starts with the number of records
- for each record:
  - record number
  - whether it is tokenized
  - record value (for fields where stored is true; this is for short
    fields)

*.fnm:
- contains field information; for each field:
  - field name
  - whether the field is indexed

segment:
- contains the base name of the segment file

*.prx:
- contains the proximity offsets for each indexed word

*.freq:
if the document is not optimized
- contains the document number and the frequency of the word in the
  document
if the document is optimized
- contains the frequency of the word in the document

.f([1-9])+:
I don't know (I need your help here)

.tii or .tis:
I'm really not sure about these:
- contains the word
- proximity
- frequency
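
As a rough sketch, the list above can be written down as a lookup by
file extension. Everything in it is an assumption: the descriptions are
only the unverified guesses from this message, and guess_role and the
example file names are made up for illustration.

```python
import re

# Roles as guessed in the list above; none of them is confirmed.
GUESSED_ROLES = {
    "fdt": "record values for stored fields",
    "fnm": "field names and whether each field is indexed",
    "prx": "proximity offsets for each indexed word",
    "freq": "document numbers and word frequencies",
    "tii": "words, with proximity and frequency (unsure)",
    "tis": "words, with proximity and frequency (unsure)",
}

# Same pattern as in the list above: ".f" followed by digits 1-9.
NUMBERED_F = re.compile(r"\.f([1-9])+$")


def guess_role(file_name):
    """Return the guessed role of an index file, or None if unrecognized."""
    if file_name == "segment":
        return "base name of the segment file"
    if NUMBERED_F.search(file_name):
        return "unknown"  # the open question in this message
    _, dot, ext = file_name.rpartition(".")
    if not dot:
        return None  # no extension at all
    return GUESSED_ROLES.get(ext)


if __name__ == "__main__":
    # Hypothetical file names, only to exercise the lookup.
    for name in ["_a.fdt", "_a.f1", "segment", "_a.xyz"]:
        print(name, "->", guess_role(name))
```

Calling guess_role on each name in an index directory gives a quick
checklist of which files are still unexplained.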


Thanks in advance :)

--

Trémont romain
EPITA promotion 2004
Stagiaire chez AIS

--
To unsubscribe, e-mail: <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>
Re: Lucene files
Doug put together a document about this

http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg01971.html

--Peter


On Monday, October 28, 2002, at 04:01 AM, Tremont romain wrote:

> [...]
Re: Lucene files
On Mon, 28 Oct 2002 07:18:52 -0800
Peter Carlson <carlson@bookandhammer.com> wrote:

> Doug put together a document about this
>
> http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg01971.html
>
> --Peter


Thanks, that's exactly what I was looking for (even better than I
expected).


--

Trémont romain
EPITA promotion 2004
Stagiaire chez AIS

Re: Lucene files
Should I cast this doc into XML? I would like to do that, as well as the
ranking function from the FAQ.

--Clemens

----- Original Message -----
From: "Peter Carlson" <carlson@bookandhammer.com>
To: "Lucene Developers List" <lucene-dev@jakarta.apache.org>
Sent: Monday, October 28, 2002 4:18 PM
Subject: Re: Lucene files


> Doug put together a document about this
>
> http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg01971.html
>
> --Peter
Re: Lucene files
+1 for that! I'd love to see that in CVS somewhere. I've been meaning
to do it myself, and was just wondering whether Doug has updated
anything since he sent his stuff, or would want to add anything else
before it is made available.

I'd say put it in XML if you can, and let's make it available through
the web site. I also have another Web document that goes into more
detail, written by somebody who peeked into the files with an editor
and analyzed the contents.
If you want it, let me know, and I can send it to you for XMLization.

Otis


--- Clemens Marschner <cmad@lanlab.de> wrote:
> Should I cast this doc into XML? I would like to do that, as well as
> the
> ranking function from the FAQ
>
> --Clemens