Hello,
I'm glad to inform you that I've built a complete Lucene-based web
search solution for the Finnish Defence Forces web site and that it's
online as of this moment.
You can see it in action at:
http://www2.mil.fi:8080/haku/haku?q=hornet
The user interface is in Finnish, but I hope you can get a general grasp
of what's going on there anyway.
As for the technical facts, I basically built a web crawler for indexing
the www.mil.fi sites, and a servlet/XML/XSL-based front end that
delivers the results to your screen.
The crawler can index HTML (I used the Swing parser), PDF (I used
xpdf; it's kinda bubble-gum-ish, but it works) and images (which are
searched by filename only).
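For anyone curious what the HTML side of a crawler like this might look like, here is a minimal sketch using the JDK's Swing parser (javax.swing.text.html.parser.ParserDelegator) to collect page text for indexing and href links to follow. The class name and what it collects are assumptions for illustration; the post doesn't show the actual crawler code, and the Lucene indexing step is left out.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: extracts indexable text and outgoing links from one HTML page.
public class SwingHtmlExtractor {
    public final StringBuilder text = new StringBuilder(); // text to hand to the indexer
    public final List<String> links = new ArrayList<>();   // hrefs the crawler could follow

    public void parse(String html) throws IOException {
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Called for each run of character data between tags.
                text.append(data).append(' ');
            }

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // Record the href of every <a> start tag.
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };
        // true = ignore any charset declared inside the document
        new ParserDelegator().parse(new StringReader(html), callback, true);
    }
}
```

Relative links such as /kuvat/hornet.jpg would still need to be resolved against the page URL (e.g. with java.net.URL) before the crawler fetches them.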
For the front end, I have a servlet that does the searching and
prints out XML (raw XML output:
http://www2.mil.fi:8080/haku/raw?q=hornet), which is then transformed to
HTML via XSL (I wrote a neat little servlet filter for this).
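The XML-to-HTML step can be done with the JDK's built-in XSLT support (javax.xml.transform). This is a minimal sketch, not the filter described above: the <results>/<hit> element names and the stylesheet are hypothetical, since the post doesn't show the servlet's XML format, and the servlet filter plumbing (capturing the servlet's output before it reaches the client) is omitted.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XmlToHtml {
    // Hypothetical stylesheet: turns <results query=".."><hit url=".." title=".."/></results>
    // into an HTML list of links.
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html'/>"
        + "<xsl:template match='/results'>"
        + "<html><body>"
        + "<h1>Results for <xsl:value-of select='@query'/></h1>"
        + "<ul><xsl:for-each select='hit'>"
        + "<li><a href='{@url}'><xsl:value-of select='@title'/></a></li>"
        + "</xsl:for-each></ul>"
        + "</body></html>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the given XSL stylesheet to an XML string and returns the result as text.
    public static String transform(String xml, String xsl) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xml = "<results query='hornet'>"
                + "<hit url='http://www.mil.fi/hornet.html' title='Hornet'/></results>";
        System.out.println(transform(xml, XSL));
    }
}
```

In a filter serving concurrent requests, the usual pattern is to compile the stylesheet once with TransformerFactory.newTemplates() and call newTransformer() per request, since Transformer instances are not thread-safe.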
The search servlet also has a simple query parser: the incoming
query is rewritten so that the default operator is AND instead of OR.
So if you type 'hornet picture', the query actually sent to Lucene
is '+hornet +picture'. I wanted it to be Google-like.
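The AND-by-default rewrite can be sketched as a plain string transformation. This is an assumption about how it might be done, not the servlet's actual code: the hypothetical DefaultAndQuery.rewrite splits on whitespace and prefixes each term with '+', leaving terms that already carry '+' or '-' untouched. It does not handle quoted phrases or field syntax.

```java
import java.util.ArrayList;
import java.util.List;

public class DefaultAndQuery {
    // Prefixes every bare term with '+' so Lucene's query syntax treats it as required.
    // Terms the user already marked with '+' (required) or '-' (prohibited) are kept as-is.
    public static String rewrite(String userQuery) {
        List<String> out = new ArrayList<>();
        for (String term : userQuery.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // an empty or blank query yields no terms
            }
            if (term.startsWith("+") || term.startsWith("-")) {
                out.add(term);
            } else {
                out.add("+" + term);
            }
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        System.out.println(rewrite("hornet picture")); // prints "+hornet +picture"
    }
}
```

Later Lucene releases also let QueryParser's default operator be switched to AND directly, which avoids hand-rolled rewriting of this kind.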
Anyway, check it out and feel free to ask me if you'd like to know
something more about the implementation.
Also, feel free to mention "Finnish Defence Forces" in the "Powered by"
section of the Lucene web site.
Thanks go to all the Lucene developers - it's great stuff :D
Jari Aarniala
----------------------------------------------
Jari Aarniala
foo@welho.com
Vantaa, .fi
"death is the last dance eternal"
--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>