Mailing List Archive

Lucene in action at www.mil.fi
Hello,

I'm glad to inform you that I've built a complete Lucene-based web
search solution for the Finnish Defence Forces web site and that it's
online as of this moment.

You can see it in action at:
http://www2.mil.fi:8080/haku/haku?q=hornet

The user interface is in Finnish, but I hope you can get a general grasp
of what's going on there anyway.

As for the technical facts, I basically built a web crawler for indexing
the www.mil.fi sites, and a servlet/XML/XSL-based frontend that
delivers the results to your screen.
The crawler is capable of indexing HTML (I used the Swing
parser), PDF (I used xpdf, which is kinda bubble-gum-ish, but it works
;) and images (they're searched for by filename only).
And for the front end, I have a servlet that does the searching,
prints out XML (raw XML output:
http://www2.mil.fi:8080/haku/raw?q=hornet) which is then transformed to
HTML via XSL (I wrote a neat little servlet filter for this).
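
The XML-to-HTML step described above could look roughly like the sketch
below. It is not the servlet filter from this post: it only shows the
transform such a filter would wrap, using the JDK's javax.xml.transform
API. The class name, the shape of the XML (results/hit elements), the
stylesheet and the example.org URLs are all assumptions made for
illustration, since the post does not show the real output format.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not the author's filter: applies an XSL stylesheet to search-result XML.
class ResultRenderer {

    // Assumed shape of the raw XML; the real output of the search servlet is not shown in the post.
    static final String SAMPLE_XML =
        "<results query=\"hornet\">"
      + "<hit url=\"http://example.org/hornet.html\" title=\"Hornet\"/>"
      + "<hit url=\"http://example.org/f18.pdf\" title=\"F-18\"/>"
      + "</results>";

    // Assumed stylesheet: renders every hit as a linked list item.
    static final String SAMPLE_XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"html\" indent=\"no\"/>"
      + "<xsl:template match=\"/results\">"
      + "<ul><xsl:for-each select=\"hit\">"
      + "<li><a href=\"{@url}\"><xsl:value-of select=\"@title\"/></a></li>"
      + "</xsl:for-each></ul>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Transforms the XML with the stylesheet and returns the result as a string.
    static String toHtml(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Wrapped so callers do not have to handle a checked exception in this sketch.
            throw new IllegalStateException("XSL transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toHtml(SAMPLE_XML, SAMPLE_XSL));
    }
}
```

In a servlet setup, a filter would presumably capture the XML written by
the search servlet and pass it through a method like toHtml before
sending the response; that wiring is left out here.
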
The search servlet also has a simple query parser: the incoming
query is parsed so that the default operator is AND instead of OR.
So basically, if you type 'hornet picture', the actual search
sent to Lucene will be '+hornet +picture' - I wanted it to be
Google-like.
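
The rewriting described above ('hornet picture' becoming
'+hornet +picture') can be sketched in a few lines of plain Java. This
is not the parser used on the site: the class and method names are
hypothetical, and leaving terms that already start with '+' or '-'
untouched is an assumption. Quoted phrases and other query syntax are
not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: makes every whitespace-separated term required.
class RequireAllTerms {

    static String rewrite(String userQuery) {
        List<String> clauses = new ArrayList<>();
        for (String term : userQuery.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // blank input yields a single empty token
            }
            // Assumption: an explicit '+' or '-' prefix from the user is kept as typed.
            if (term.startsWith("+") || term.startsWith("-")) {
                clauses.add(term);
            } else {
                clauses.add("+" + term);
            }
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(rewrite("hornet picture"));
    }
}
```

The rewritten string would then be handed to Lucene's query parser in
place of the raw user input.
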

Anyway, check it out and feel free to ask me if you'd like to know
something more about the implementation.

Also, feel free to mention "Finnish Defence Forces" in the "Powered by"
section of the Lucene web site.

Thanks go to all the Lucene developers - it's great stuff :D

Jari Aarniala

----------------------------------------------
Jari Aarniala
foo@welho.com
Vantaa, .fi
"death is the last dance eternal"




--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
Re: Lucene in action at www.mil.fi
Hi Jari

Where do you build your index? On the filesystem? Do you use a database?

Laura


RE: Lucene in action at www.mil.fi
The index is built on the local filesystem once every day.

> -----Original Message-----
> From: lucene@libero.it [mailto:lucene@libero.it]
> Sent: 23 April 2002 10:06
> To: lucene-user@jakarta.apache.org
> Subject: Re: Lucene in action at www.mil.fi
>
> Hi Jari
>
> Where do you build your index? On the filesystem? Do you use a database?
>
> Laura
RE: Lucene in action at www.mil.fi
Hi Jari,

What happens if you have to index about 15 million pages?
My concern is how the index behaves once the number of indexed pages
grows beyond 10 million.

Laura

