Mailing List Archive

Accessing the "contents" field
Hi,

I would like to access the contents field of a document, for example

doc(i).get("contents")

This should return a String (am I right?), but when I print it out I find that it is null. How do I go about accessing the contents of the file?

Rosh.


--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
Re: Accessing the "contents" field
Make sure that you added it to the index as a stored field, and not
just indexed. Look at the Javadoc for Field class to see different
field types.

Otis

Re: Accessing the "contents" field
Hi Roshan,
You have to get the contents as
doc(i).get("body")
The field name passed to get() has to match the name the field was given when it was added to the index.

Suneetha.

RE: Accessing the "contents" field
Hi Roshan,
Here is the solution.

> doc.add(Field.Text("body", (Reader) new InputStreamReader(is)));

When adding fields to the document, there is another overloaded
method that takes a String instead of a Reader, and that one actually
stores the contents in the document.
First read the contents from the Reader into a String, then pass the String to
that method.
It works.


sreeni
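
A minimal sketch of the "read the Reader into a String first" step described above, in plain Java with no Lucene dependency so that it runs on its own. The class and method names (ReaderContents, readAll) are made up for illustration, and StringBuilder assumes a newer JDK than the one current when this thread was written.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ReaderContents {
    // Drains the Reader into a String so the text can be handed to a
    // String-based field factory instead of the Reader-based one.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String body = readAll(new StringReader("hello lucene"));
        System.out.println(body);
        // In the indexing loop from this thread, the call would then become
        // (not compiled here, it needs the Lucene jar on the classpath):
        //   doc.add(Field.Text("body", body));
    }
}
```

In the IndexFiles program from this thread, the Reader passed to readAll would be the InputStreamReader wrapped around the FileInputStream. Keep in mind that this holds the whole file in memory.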


Re: Accessing the "contents" field
It is being added as a Text field, which is stored, I gather. It is also being added as a Reader; this might be the problem, I am not sure.

Here is my code; can anybody please help me?

import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.*;

import java.io.*;

public class IndexFiles {
    // usage: IndexFiles index-path file . . .
    public static void main(String[] args) throws Exception {
        String indexPath = args[0];
        IndexWriter writer;

        writer = new IndexWriter(indexPath, new SimpleAnalyzer(), false);
        for (int i = 1; i < args.length; i++) {
            System.out.println("Indexing file " + args[i]);
            InputStream is = new FileInputStream(args[i]);

            // We create a Document with two Fields, one which contains
            // the file path, and one the file's contents.
            Document doc = new Document();
            doc.add(Field.UnIndexed("path", args[i]));
            doc.add(Field.Text("body", (Reader) new InputStreamReader(is)));

            writer.addDocument(doc);
            is.close();
        }

        writer.close();
    }
}



>>> otis_gospodnetic@yahoo.com 03/14/02 03:21PM >>>
Make sure that you added it to the index as a stored field, and not
just indexed. Look at the Javadoc for Field class to see different
field types.

Otis

Re: Accessing the "contents" field
Suneetha,

This is my search code. I have tried the get("body") call but it returns null, so I cannot actually process the document contents. I need to do this to extract data such as document titles.

Currently it uses get("path") to get the document's path, but if I change that to get("body") to print the contents, it prints "null".

Anyway, here is the code:

import java.io.IOException;
import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.search.Searcher;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Hits;
import org.apache.lucene.queryParser.QueryParser;

public class Search {
    public static void main(String[] args) throws Exception {
        String indexPath = "index", queryString = "sharon";

        Searcher searcher = new IndexSearcher(indexPath);
        Query query = QueryParser.parse(queryString, "body",
                                        new StopAnalyzer());
        Hits hits = searcher.search(query);

        for (int i = 0; i < hits.length(); i++) {
            System.out.println(hits.doc(i).get("path") + "; Score: " +
                               hits.score(i));
        }
    }
}

>>> suneethad@india.adventnet.com 03/14/02 03:35PM >>>
Hi Roshan,
You have to get the contents as
doc(i).get("body")
The field name passed to get() has to match the name the field was given when it was added to the index.

Suneetha.

Re: Accessing the "contents" field
Hi,

You're adding the field with Field.Text(String, Reader), which does not store the field
contents. See the API docs.

Either use
doc.add(Field.Text(String, String)) or

doc.add(new Field(String, String, true, true, true))
to store, index and tokenize.

Hope this helps

--Peter
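
To illustrate why get() comes back null, here is a toy model in plain Java. It is not Lucene's Document class: ToyDocument and addText are hypothetical names, and the model only mimics the behaviour described in this thread. A value added as a String is stored and can be retrieved, a value added as a Reader is not stored, and a field name that was never added has no value either.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for a document, NOT Lucene's Document class.
class ToyDocument {
    private final Map<String, String> stored = new HashMap<>();

    // String value: the text is kept, so it can be returned later.
    void addText(String name, String value) {
        stored.put(name, value);
    }

    // Reader value: a real indexer would consume the stream for its tokens.
    // This model keeps nothing for later retrieval.
    void addText(String name, Reader value) {
        // intentionally stores nothing
    }

    // Returns the stored value, or null if nothing was stored under name.
    String get(String name) {
        return stored.get(name);
    }

    public static void main(String[] args) {
        ToyDocument doc = new ToyDocument();
        doc.addText("path", "a.txt");
        doc.addText("body", new StringReader("some text"));
        System.out.println(doc.get("path"));     // stored String value
        System.out.println(doc.get("body"));     // null: added as a Reader
        System.out.println(doc.get("contents")); // null: never added
    }
}
```

The same three cases match what the thread reports: "path" comes back, "body" added through a Reader does not, and "contents" was never a field in the index at all.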


Re: Accessing the "contents" field
Hi Roshan,
You will only have the contents in a field if you add it yourself.
For example:
doc.add(Field.Text("contents", "My text"));

"contents" is not a default name.
William.

>From: "ROSHAN NAVENDRA" <rnavendra@ccnetwork.com.au>
>Reply-To: "Lucene Users List" <lucene-user@jakarta.apache.org>
>To: <lucene-user@jakarta.apache.org>
>Subject: Accessing the "contents" field
>Date: Thu, 14 Mar 2002 14:36:53 +1000
>
>Hi,
>
>I would like to access the contents field of a document, for example
>
>doc(i).get("contents")
>
>this should return a String (am I right?) but when I print it out I find
>that it is null. How do I go about accessing the contents of the file?
>
>Rosh.