Accessing the "contents" field

Mar 13, 2002, 9:36 PM

Post #1 of 8

Hi,

I would like to access the contents field of a document, for example

doc(i).get("contents")

This should return a String (am I right?), but when I print it out I find that it is null. How do I go about accessing the contents of the file?

Rosh.

--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>

Re: Accessing the "contents" field

otis_gospodnetic at yahoo

Mar 13, 2002, 10:21 PM

Post #2 of 8

Make sure that you added it to the index as a stored field, not just as
an indexed one. Look at the Javadoc for the Field class to see the
different field types.

Otis
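For reference, the field types Otis points to can be summarised as data. The flags below follow the Lucene 1.x Javadoc for the Field class as best recalled (stored, indexed, tokenized, in the same order as the Field(String, String, boolean, boolean, boolean) constructor); verify them against the Javadoc for the version you use. FieldKind is a hypothetical helper for illustration, not part of Lucene.

```java
// Hypothetical summary of the Lucene 1.x Field factory methods (not part of Lucene).
// Flags are as recalled from the Lucene 1.x Javadoc; check your version.
public enum FieldKind {
    KEYWORD(true, true, false),      // Field.Keyword(String, String)
    UNINDEXED(true, false, false),   // Field.UnIndexed(String, String)
    UNSTORED(false, true, true),     // Field.UnStored(String, String)
    TEXT_STRING(true, true, true),   // Field.Text(String, String)
    TEXT_READER(false, true, true);  // Field.Text(String, Reader)

    public final boolean stored;     // value kept in the index, returned by get(name)
    public final boolean indexed;    // searchable
    public final boolean tokenized;  // run through the analyzer before indexing

    FieldKind(boolean stored, boolean indexed, boolean tokenized) {
        this.stored = stored;
        this.indexed = indexed;
        this.tokenized = tokenized;
    }

    public static void main(String[] args) {
        // Print the table so the differences are easy to compare.
        for (FieldKind k : values()) {
            System.out.printf("%-12s stored=%b indexed=%b tokenized=%b%n",
                    k, k.stored, k.indexed, k.tokenized);
        }
    }
}
```

Only the kinds with stored=true give a non-null value from Document.get(name) on a search hit, which is consistent with the Reader-valued Field.Text coming back as null in this thread.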

Re: Accessing the "contents" field

suneethad at india

Mar 13, 2002, 10:35 PM

Post #3 of 8

Hi Roshan,
You've got to get the contents as
doc(i).get("body")
The field name you ask for has to match the name the field was added under.

Suneetha.

RE: Accessing the "contents" field

SreenivasuluM at PLANETASIA

Mar 13, 2002, 10:58 PM

Post #4 of 8

Hi Roshan,
Here is the solution.

> doc.add(Field.Text("body", (Reader) new InputStreamReader(is)));

When adding the fields to the document, there is another overloaded
method that takes a String instead of a Reader, and that one actually
stores the contents in the document.
First read the contents from the Reader into a String, then pass the
String to that method.
It works.

sreeni
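A minimal sketch of the first step sreeni describes, reading everything from a Reader into a String, using only the JDK. ReaderUtil and readAll are hypothetical names chosen for illustration; the resulting String could then be passed to the String-valued Field.Text(String, String) mentioned in this thread.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper: drains a Reader into a String so that a
// String-valued (stored) field can be used instead of a Reader-valued one.
public class ReaderUtil {

    // Reads every character until end of stream. Does not close the reader;
    // the caller still owns it (IndexFiles closes its InputStream itself).
    public static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[4096];
        int n;
        while ((n = reader.read(buffer)) != -1) {
            sb.append(buffer, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = readAll(new StringReader("hello sharon"));
        System.out.println(text); // prints "hello sharon"
    }
}
```

In IndexFiles this would replace the Reader-valued line with something like doc.add(Field.Text("body", ReaderUtil.readAll(new InputStreamReader(is)))); the trade-off is that each file is held in memory as one String and a full copy of its text is stored in the index.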

Re: Accessing the "contents" field

rnavendra at ccnetwork

Mar 13, 2002, 11:34 PM

Post #5 of 8

It is being added as a Text field, which I gather is stored. It is also being added as a Reader; this might be the problem, but I am not sure.

Here is my code. Can anybody please help me?

import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.*;

import java.io.*;

public class IndexFiles {
    // usage: IndexFiles index-path file . . .
    public static void main(String[] args) throws Exception {
        String indexPath = args[0];
        IndexWriter writer;

        writer = new IndexWriter(indexPath, new SimpleAnalyzer(), false);
        for (int i = 1; i < args.length; i++) {
            System.out.println("Indexing file " + args[i]);
            InputStream is = new FileInputStream(args[i]);

            // We create a Document with two Fields, one which contains
            // the file path, and one the file's contents.
            Document doc = new Document();
            doc.add(Field.UnIndexed("path", args[i]));
            doc.add(Field.Text("body", (Reader) new InputStreamReader(is)));

            writer.addDocument(doc);
            is.close();
        }

        writer.close();
    }
}

Re: Accessing the "contents" field

rnavendra at ccnetwork

Mar 13, 2002, 11:54 PM

Post #6 of 8

Suneetha,

This is my search code. I have tried the get("body") call, but it returns null, so I cannot actually process the document contents. I need to do this to extract data such as document titles, etc.

Currently it uses get("path") to get the document's path, but if I change that to get("body") and print the result with System.out.print, it prints "null".

Anyway, here is the code:

import java.io.IOException;
import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.search.Searcher;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Hits;
import org.apache.lucene.queryParser.QueryParser;

public class Search {
    public static void main(String[] args) throws Exception {
        String indexPath = "index", queryString = "sharon";

        Searcher searcher = new IndexSearcher(indexPath);
        Query query = QueryParser.parse(queryString, "body",
                                        new StopAnalyzer());
        Hits hits = searcher.search(query);

        for (int i = 0; i < hits.length(); i++) {
            System.out.println(hits.doc(i).get("path") + "; Score: " +
                               hits.score(i));
        }
    }
}

Re: Accessing the "contents" field

carlson at bookandhammer

Mar 14, 2002, 1:07 AM

Post #7 of 8

Hi,

You're adding the field with Field.Text(String, Reader), which does not
store the field contents. See the API docs.

Either use
doc.add(Field.Text(String, String))
or
doc.add(new Field(String, String, true, true, true))
to store, index, and tokenize it.

Hope this helps

--Peter

Re: Accessing the "contents" field

william_wws at hotmail

Mar 14, 2002, 7:09 AM

Post #8 of 8

Hi Roshan,
You will only have the contents in a field if you add them yourself.
For example:
doc.add(Field.Text("contents", "My text"));

"contents" is not a default field name.
William.
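To tie the thread together, here is a toy model in plain Java of the two reasons get(name) returned null here: asking for a name that was never added ("contents" instead of "body"), and asking for a field that was indexed but not stored. ToyDocument and its methods are hypothetical and deliberately much simpler than Lucene's Document; the sketch only mimics the behaviour discussed in the posts above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only, NOT Lucene's API. It illustrates the two causes of a
// null from get(name) that came up in this thread.
public class ToyDocument {
    // Names of every field added (in a real index these would be searchable).
    private final Set<String> fieldNames = new HashSet<>();
    // Values are kept only for stored fields, so only those can be returned.
    private final Map<String, String> storedValues = new HashMap<>();

    public void add(String name, String value, boolean stored) {
        fieldNames.add(name);
        if (stored) {
            storedValues.put(name, value);
        }
    }

    public boolean hasField(String name) {
        return fieldNames.contains(name);
    }

    // Returns null if no field has this exact name, or if it was not stored.
    public String get(String name) {
        return storedValues.get(name);
    }

    public static void main(String[] args) {
        ToyDocument doc = new ToyDocument();
        doc.add("path", "notes.txt", true);                // like Field.UnIndexed: stored
        doc.add("body", "text read from a Reader", false); // like Field.Text(String, Reader): not stored

        System.out.println(doc.get("path"));     // notes.txt
        System.out.println(doc.get("body"));     // null (field exists but was not stored)
        System.out.println(doc.get("contents")); // null (no field has this name)
    }
}
```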
