Mailing List Archive

Problem of indexing pdf files
Hello,

I get the following warning messages when I index PDF files with Lucene:

log4j:WARN No appenders could be found for logger (org.pdfbox.pdfparser.PDFParser).
log4j:WARN Please initialize the log4j system properly.

This is the code I am using:

if (pdf.exists())
{
    String text = "";
    try {
        PDDocument document = PDDocument.load(pdf); // load the file

        PDFTextStripper pts = new PDFTextStripper(); // extract the text
        text = pts.getText(document);
        document.close();
    }
    catch (IOException e) {
        // an IOException can mean more than a missing file, so print the cause
        System.out.println("Could not read PDF: " + e.getMessage());
    }
    mDocument.add(Field.Text("fulltext", text));
}


Thanks,
MTREDDY




Tirupati Reddy Manyam
24-06-08,
Sundugaullee-24,
79110 Freiburg
GERMANY.

Phone: 00497618811257
cell : 004917624649007

__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com
Re: Problem of indexing pdf files
Hi,

Lucene on its own will not index PDF files directly; it only indexes text. But there is an ongoing content search project on SourceForge called "docSearcher". docSearcher supports indexing PDF and all the other MS format files except ppt files, so I think you should have a look at it. Most importantly, docSearcher is built using Lucene.

The warnings you mentioned are common. You have to add an appender to the logger and set up the properties file for log4j.
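
For illustration, a minimal log4j.properties might look like the sketch below. It assumes log4j 1.x and that the file sits at the root of the classpath, where log4j looks for it by default. The appender name "stdout", the WARN level, and the output pattern are arbitrary choices, not anything PDFBox requires.

```properties
# Send everything at WARN level or above to an appender named "stdout".
log4j.rootLogger=WARN, stdout

# "stdout" writes to the console.
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout

# Timestamp, level, logger name, message, newline.
log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} %-5p [%c] %m%n
```

With a root logger and at least one appender defined, the "No appenders could be found" warning should no longer appear.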



Best Regards,
Mano


mmcd
Re: Problem of indexing pdf files
That's a log4j warning message, because one of the PDFBox classes is
trying to log something, and you don't have log4j configured
appropriately. This is not a Lucene issue, and it's a warning, so you
can ignore it if you want.

Otis

