Mailing List Archive: fun with kinosearch

fun with kinosearch

May 30, 2006, 8:57 AM

Post #1 of 17

I have been having fun with KinoSearch, and I have documented my
experiences here:

http://dewey.library.nd.edu/morgan/kinosearch/

FYI.

--
Eric Lease Morgan
Head, Digital Access and Information Architecture Department
University Libraries of Notre Dame

(574) 631-8604

fun with kinosearch [ In reply to ]

marvin at rectangular

May 30, 2006, 9:40 AM

Post #2 of 17

On May 30, 2006, at 8:45 AM, Eric Lease Morgan wrote:
> http://dewey.library.nd.edu/morgan/kinosearch/

Great writeup! It's like the KinoSearch::Docs::Tutorial that I'd
like to write if I could find the time, but better than I could do
even if I did find the time.

The first search I tried was 'fun'. It turned up one document:
"State prison life by one who has been there". Keywords: "Prisoners;
Personal narratives; Prison violence; History;"

I'm a little concerned about this: "First of all, after merging
indexes they seem to lose their fielded searching ability." Can you
elaborate on what you are not able to do after merging indexes?

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

fun with kinosearch [ In reply to ]

brian.cassidy at nald

May 30, 2006, 11:14 AM

Post #3 of 17

Hey Eric,

> -----Original Message-----
> I have been having fun with KinoSearch, and I have documented my
> experiences here:
>
> http://dewey.library.nd.edu/morgan/kinosearch/

In your section regarding index merging, I couldn't help but think to myself
that you'd probably rather use a multi-index searcher. This would alleviate
the pain of having to re-merge indices from time to time and also allows you
to mix and match indices on the fly for searching.

In the Lucene world, this would be the MultiSearcher class. I haven't delved
too far into the Kinosearch classes, but KinoSearch::Index::MultiReader may
provide what you'd need.

-Brian Cassidy
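As Marvin notes in the next post, KinoSearch has no MultiSearcher yet, but the idea Brian describes is easy to picture: run the query against each index separately, then merge the per-index hit lists into one ranking. The C sketch below shows only that merge step. The `Hit` struct and `merge_hits()` are hypothetical, not part of KinoSearch or Lucene, and the sketch assumes scores from different indexes are directly comparable, which in practice may require sharing term statistics (such as document frequencies) across the indexes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical hit record: the index a hit came from, the document
 * number within that index, and its relevance score. */
typedef struct {
    int   index_id;
    int   doc_id;
    float score;
} Hit;

/* qsort comparator: highest score first. */
static int by_score_desc(const void *a, const void *b) {
    float sa = ((const Hit *)a)->score;
    float sb = ((const Hit *)b)->score;
    return (sa < sb) - (sa > sb);
}

/* Combine the hit lists from several independently searched indexes
 * into one list ranked by score, keeping at most max_hits entries.
 * lists[i] holds counts[i] hits. Returns a malloc'd array whose usable
 * length is stored in *out_count; the caller frees it. Returns NULL
 * (with *out_count set to 0) if allocation fails. */
Hit *merge_hits(Hit **lists, const size_t *counts, size_t n_lists,
                size_t max_hits, size_t *out_count) {
    assert(out_count != NULL);
    size_t total = 0;
    for (size_t i = 0; i < n_lists; i++)
        total += counts[i];

    /* Allocate at least one element so malloc(0) is never called. */
    Hit *merged = malloc((total ? total : 1) * sizeof *merged);
    if (merged == NULL) {
        *out_count = 0;
        return NULL;
    }

    /* Concatenate every index's hits, then rank them all together. */
    size_t pos = 0;
    for (size_t i = 0; i < n_lists; i++) {
        if (counts[i] > 0)
            memcpy(merged + pos, lists[i], counts[i] * sizeof *merged);
        pos += counts[i];
    }
    qsort(merged, total, sizeof *merged, by_score_desc);

    *out_count = total < max_hits ? total : max_hits;
    return merged;
}
```

Since each index usually returns its hits already ranked, a k-way merge would avoid re-sorting everything; concatenating and calling `qsort()` just keeps the sketch short.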

fun with kinosearch [ In reply to ]

marvin at rectangular

May 30, 2006, 11:41 AM

Post #4 of 17

On May 30, 2006, at 11:03 AM, Brian Cassidy wrote:

> In your section regarding index merging, I couldn't help but think
> to myself that you'd probably rather use a multi-index searcher. This
> would alleviate the pain of having to re-merge indices from time to
> time and also allows you to mix and match indices on the fly for
> searching.
>
> In the Lucene world, this would be the MultiSearcher class. I
> haven't delved too far into the Kinosearch classes, but
> KinoSearch::Index::MultiReader may provide what you'd need.

As in Lucene, KinoSearch's MultiReader reads multi-segment indexes,
not multiple indexes. MultiSearcher has not been ported, and it's
not at the top of my to-do list (FieldCache and Sort are).

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

fun with kinosearch [ In reply to ]

perl at peknet

May 31, 2006, 6:46 AM

Post #5 of 17

Thanks, Eric, for posting this good example.

I'm glad you mentioned Swish-e in your notes. I have been following the
KinoSearch development for several months. I've also been working on the
next version of Swish-e, which will offer KinoSearch as one of a few
possible backends (along with Xapian and some others).

Thanks, Marvin, for all your good work. It's always a pleasure to find
OSS that's so well documented and thought out.

cheers,
pek

Eric Lease Morgan scribbled on 5/30/06 10:45 AM:
>
> I have been having fun with KinoSearch, and I have documented my
> experiences here:
>
> http://dewey.library.nd.edu/morgan/kinosearch/
>
> FYI.
>
> --Eric Lease Morgan
> Head, Digital Access and Information Architecture Department
> University Libraries of Notre Dame
>
> (574) 631-8604
>
> _______________________________________________
> KinoSearch mailing list
> KinoSearch@rectangular.com
> http://www.rectangular.com/mailman/listinfo/kinosearch
>

--
Peter Karman . http://peknet.com/ . peter@peknet.com

fun with kinosearch [ In reply to ]

marvin at rectangular

May 31, 2006, 1:14 PM

Post #6 of 17

Peter,

Thanks for the kind words. If you've done any work on integrating
KinoSearch as a backend, I'd love to see it.

I understand that Swish now powers search.cpan.org, correct? I
noticed immediately when that change was made live -- much more
relevant results, much more quickly.

Best,

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

On May 31, 2006, at 6:33 AM, Peter Karman wrote:

> I'm glad you mentioned Swish-e in your notes. I have been following
> the KinoSearch development for several months. I've also been
> working on the next version of Swish-e, which will offer KinoSearch
> as one of a few possible backends (along with Xapian and some others).
>
> Thanks, Marvin, for all your good work. It's always a pleasure to
> find OSS that's so well documented and thought out.
>
> cheers,
> pek

Re: fun with kinosearch [sru] [ In reply to ]

emorgan at nd

May 31, 2006, 9:32 PM

Post #7 of 17

On May 30, 2006, at 11:45 AM, Eric Lease Morgan wrote:

> http://dewey.library.nd.edu/morgan/kinosearch/

I have implemented a simple SRU2KinoSearch client:

http://dewey.library.nd.edu/morgan/kinosearch/sru/client.html

Brian Cassidy brought to my attention that KinoSearch supports the
same query language as Lucene, and since the Perl module named
CQL::Node contains a toLucene method for converting CQL queries into
Lucene queries, I was able to interface with my KinoSearch index.
Hence the Simple SRU2KinoSearch Client.

Hooray for open source software and open standards!

--
Eric Lease Morgan

I'm hiring a Senior Programmer Analyst. See
http://dewey.library.nd.edu/morgan/programmer/.

fun with kinosearch [ In reply to ]

peter at peknet

Jun 5, 2006, 11:56 AM

Post #8 of 17

Marvin Humphrey scribbled on 5/31/06 3:01 PM:
> Peter,
>
> Thanks for the kind words. If you've done any work on integrating
> KinoSearch as a backend, I'd love to see it.
>

Will definitely post to this list when I have something to see. :)

The plan at present is to have a doc parser in C (using libxml2) to
parse docs according to the current Swish-e -S prog API (thus taking
advantage of all the existing code for spidering, converting doc types,
etc.), and then a callback handler to take the resulting list of
words/metadata and store them with whatever backend the user chooses. I
am planning a KinoSearch backend for sure, since I am writing a Perl
interface to the C library as I go. Makes for much quicker devel time.
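Peter's plan separates parsing from storage: the parser produces words and metadata, and a callback hands them to whichever backend the user picked. A minimal C sketch of that split follows. `Backend`, `emit_words()`, and the toy `CountingStore` are hypothetical illustrations, not Swish-e's actual API, and the tokenizer (lowercased runs of ASCII letters and digits) is a stand-in for real libxml2-based parsing.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical backend interface: whatever store the user picks
 * (KinoSearch, Xapian, ...) supplies a context pointer and a callback. */
typedef struct {
    void *ctx;  /* backend-private state */
    void (*add_word)(void *ctx, const char *field, const char *word);
} Backend;

/* Split text into lowercased alphanumeric tokens and hand each one,
 * tagged with its field name, to the backend. Words longer than 63
 * characters are truncated in this sketch. Returns the word count. */
size_t emit_words(const Backend *be, const char *field, const char *text) {
    assert(be != NULL && be->add_word != NULL);
    char word[64];
    size_t len = 0, emitted = 0;
    for (const char *p = text; ; p++) {
        if (*p != '\0' && isalnum((unsigned char)*p)) {
            if (len < sizeof word - 1)
                word[len++] = (char)tolower((unsigned char)*p);
            continue;
        }
        if (len > 0) {  /* a token just ended: pass it to the backend */
            word[len] = '\0';
            be->add_word(be->ctx, field, word);
            emitted++;
            len = 0;
        }
        if (*p == '\0')
            break;
    }
    return emitted;
}

/* Toy backend for demonstration: counts words and keeps the last one. */
typedef struct {
    size_t count;
    char   last[64];
} CountingStore;

static void counting_add_word(void *ctx, const char *field, const char *word) {
    CountingStore *store = ctx;
    (void)field;  /* a real backend would index the word under this field */
    store->count++;
    strncpy(store->last, word, sizeof store->last - 1);
    store->last[sizeof store->last - 1] = '\0';
}
```

Swapping in a KinoSearch or Xapian store would then mean supplying a different `add_word` callback and context pointer; the parser itself would not change.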

Which reminds me: while learning Inline::C for the Perl interface, I ran
across some of your posts, Marvin, on the Inline list. It seems like
you've opted to write your own Perl/C stuff with a special build script
instead of using Inline::C.

Care to explain why you went that direction? Just curious as to what
pitfalls you might have encountered, so that I can avoid them myself. :)

> I understand that Swish now powers search.cpan.org, correct? I noticed
> immediately when that change was made live -- much more relevant
> results, much more quickly.
>

I wasn't aware of Swish-e being used at that cpan url (which I use every
day). But now I see that news item here:
http://log.perl.org/2005/10/new_search_engi.html

That's the great thing about writing OSS: you cast your bread upon the
water and who knows where it floats. :)

pek

--
Peter Karman . http://peknet.com/ . peter@peknet.com

fun with kinosearch [ In reply to ]

marvin at rectangular

Jun 6, 2006, 10:46 AM

Post #9 of 17

On Jun 5, 2006, at 11:41 AM, Peter Karman wrote:

> Which reminds me: while learning Inline::C for the Perl interface,
> I ran across some of your posts, Marvin, on the Inline list. It
> seems like you've opted to write your own Perl/C stuff with a
> special build script instead of using Inline::C.
>
> Care to explain why you went that direction? Just curious as to
> what pitfalls you might have encountered, so that I can avoid them
> myself. :)

There are two reasons.

First, I didn't want to introduce Inline::C as a dependency. Not
everyone can use it. I wanted dependencies held down to a bare
minimum. Etc.

Second, Inline::C doesn't work very well when you want to do
complicated things. What Inline::C does is write XS for you. If you
need to do something that's not Inline::C's forte, you need to learn
XS anyway. It got me started learning perlapi, but I soon wanted
greater control than it was able to provide. Inline::C is like
training wheels for your XS bicycle.

KinoSearch's current build system took a little while to set up, but
for the most part it's not very tricky. Most of it is just basic
Perl text wrangling, the main point of which is to keep the XS and C
code in the same file as the relevant Perl code. If I just populated
the src directory with the .c and .h files, and concatenated all the
XS code in all the Perl modules into one giant KinoSearch.xs file, a
lot of the code in Build.PL could go away.

The mildly tricky stuff is the auto-generation of the typemap, and
the way it keeps track of which files have been modified (which
spares me from recompiling everything every time).

One thing I recommend in general is limiting XS to glue code, while
doing most of your work in pure C. XS is kind of clunky, and it's
easier to see what's going on if you separate all the stuff needed
for moving across the Perl/C boundary from the code that does "real"
work.
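To illustrate the split Marvin recommends: the "real" work lives in a plain C function with no Perl types, so it can be compiled and tested on its own, and the XS layer only converts between Perl and C values. In this sketch `count_words()` and the `My::Module` package are hypothetical, and the XS glue is shown in a comment because it needs `xsubpp` and the Perl headers to build.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* "Real" work in plain C: no Perl headers, no SV*, testable on its own.
 * Counts whitespace-separated words in a buffer of len bytes. */
size_t count_words(const char *buf, size_t len) {
    assert(buf != NULL || len == 0);
    size_t n = 0;
    int in_word = 0;
    for (size_t i = 0; i < len; i++) {
        int space = isspace((unsigned char)buf[i]);
        if (!space && !in_word)
            n++;                   /* start of a new word */
        in_word = !space;
    }
    return n;
}

/* The XS file would then hold only thin glue that turns a Perl scalar
 * into a C buffer and length, something like:
 *
 *   MODULE = My::Module   PACKAGE = My::Module
 *
 *   UV
 *   count_words(sv)
 *       SV *sv
 *   CODE:
 *   {
 *       STRLEN len;
 *       const char *buf = SvPV(sv, len);
 *       RETVAL = count_words(buf, len);
 *   }
 *   OUTPUT:
 *       RETVAL
 */
```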

Cheers,

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

fun with kinosearch [ In reply to ]

bpphillips+ml at gmail

Jun 12, 2006, 6:31 AM

Post #10 of 17

Hey Marvin,
I'm curious if there's any ETA for the "Sort" functionality you mentioned as
being on the top of your list.

Thanks,
Brian

On 5/30/06, Marvin Humphrey <marvin@rectangular.com> wrote:
>
>
> As in Lucene, KinoSearch's MultiReader reads multi-segment indexes,
> not multiple indexes. MultiSearcher has not been ported, and it's
> not at the top of my to-do list (FieldCache and Sort are).
>
> Marvin Humphrey
> Rectangular Research
> http://www.rectangular.com/
>

fun with kinosearch [ In reply to ]

marvin at rectangular

Jun 13, 2006, 8:56 AM

Post #11 of 17

On Jun 12, 2006, at 6:14 AM, Brian Phillips wrote:

> I'm curious if there's any ETA for the "Sort" functionality you
> mentioned as being on the top of your list.

Honestly, Brian, I'm not sure. Development on Lucy is about to
commence in earnest, OSCON's coming up, and my main clients have some
pretty intense work for me right now. FieldCache I have to have
going by the end of this month, 'cause that feature's being
sponsored. The Sort functionality probably isn't that much more
work on top of that, but I haven't looked into it that closely.
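For context on why Sort builds on FieldCache: in Lucene, a field cache holds one value per document for a given field, in an array indexed by document number, so sorting hits by that field needs only an array lookup per hit. A minimal C sketch of sorting with such a cache follows; `FieldCache`, `SortHit`, and `sort_by_field()` are hypothetical names, not KinoSearch's API, and the cache is assumed to be already loaded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical field cache: one integer value (e.g. a year) per
 * document, loaded once and indexed by document number. */
typedef struct {
    const int *values;
    size_t     num_docs;
} FieldCache;

typedef struct {
    size_t doc_id;
    int    sort_key;  /* copied from the field cache before sorting */
} SortHit;

/* qsort comparator: ascending field value. */
static int by_key_asc(const void *a, const void *b) {
    int ka = ((const SortHit *)a)->sort_key;
    int kb = ((const SortHit *)b)->sort_key;
    return (ka > kb) - (ka < kb);
}

/* Sort hits by a field value instead of by score: look up each hit's
 * value in the in-memory cache (no per-document reads), then qsort. */
void sort_by_field(SortHit *hits, size_t n, const FieldCache *cache) {
    assert(cache != NULL);
    for (size_t i = 0; i < n; i++) {
        assert(hits[i].doc_id < cache->num_docs);
        hits[i].sort_key = cache->values[hits[i].doc_id];
    }
    qsort(hits, n, sizeof *hits, by_key_asc);
}
```

`qsort()` is not stable, so hits with equal field values come back in no particular order; a real implementation would likely break ties by score or document number.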

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

fun with kinosearch [ In reply to ]

jhfoo-ml at extracktor

Sep 5, 2006, 11:44 PM

Post #12 of 17

Saw this posting and thought I'd jump in for a few words...

I'm using swish-e for my indexing purposes. Just want to thank you and
your teammates for the perl library abstraction. Integrating swish-e was
a snap!

Any hint on when the next release will be out?

Peter Karman wrote:
> Thanks, Eric, for posting this good example.
>
> I'm glad you mentioned Swish-e in your notes. I have been following
> the KinoSearch development for several months. I've also been working
> on the next version of Swish-e, which will offer KinoSearch as one of
> a few possible backends (along with Xapian and some others).
>
> Thanks, Marvin, for all your good work. It's always a pleasure to find
> OSS that's so well documented and thought out.
>
> cheers,
> pek
>
> Eric Lease Morgan scribbled on 5/30/06 10:45 AM:
>>
>> I have been having fun with KinoSearch, and I have documented my
>> experiences here:
>>
>> http://dewey.library.nd.edu/morgan/kinosearch/
>>
>> FYI.
>>
>> --Eric Lease Morgan
>> Head, Digital Access and Information Architecture Department
>> University Libraries of Notre Dame
>>
>> (574) 631-8604
>

fun with kinosearch [ In reply to ]

henka at cityweb

Sep 6, 2006, 11:43 PM

Post #13 of 17

>> Thanks, Marvin, for all your good work. It's always a pleasure to find
>> OSS that's so well documented and thought out.

...and from a commercial perspective, an OSS project that actually
responds professionally and timeously to bug fixes and requests.

An absolute pleasure working with Marvin.

fun with kinosearch [ In reply to ]

david at kineticode

Sep 7, 2006, 9:06 AM

Post #14 of 17

On Sep 6, 2006, at 23:31, henka@cityweb.co.za wrote:

> An absolute pleasure working with Marvin.

So give this man work!

Cheers,

David

fun with kinosearch [ In reply to ]

marvin at rectangular

Sep 7, 2006, 12:38 PM

Post #15 of 17

On Sep 6, 2006, at 11:31 PM, henka@cityweb.co.za wrote:

> ...and from a commercial perspective, an OSS project that actually
> responds professionally and timeously to bug fixes and requests.
>
> An absolute pleasure working with Marvin.

You are kind to say so. Thank you.

Marvin Humphrey

--
I'm looking for a part time job.

fun with kinosearch [ In reply to ]

marvin at rectangular

Sep 7, 2006, 12:55 PM

Post #16 of 17

On Sep 7, 2006, at 8:55 AM, David E. Wheeler wrote:

> So give this man work!

I've gotten some nice contract work out of this forum. We'll see how
things go -- I'm not looking very hard right now. And Nicholas
Clark's experience speaks to the effectiveness of job-hunting via
email signature: "And maybe I'm being a bit arrogant here in some of
my assumptions, but even when I was looking for a job a few months
ago I was surprised and disappointed that no-one in London seemed to
be interested in hiring the current pumpking."
<http://london.pm.org/pipermail/london.pm/Week-of-Mon-20060731/003436.html>. Nevertheless,
I figure this is an easy way to put out some feelers.

Marvin Humphrey

--
I'm looking for a part time job.

fun with kinosearch [ In reply to ]

henka at cityweb

Sep 7, 2006, 10:52 PM

Post #17 of 17

> Nevertheless,
> I figure this is an easy way to put out some feelers.

It's your call, Marvin, but I recommend that you place a "donate to this
project" PayPal button at www.rectangular.com.

Open source desperately needs to be sponsored wherever possible - it
improves quality, service levels, etc; however, open source projects also
need to be more commercial in their approach to marketing and sponsorship
endeavours.

just my 2c (...and after conversion due to exchange rates, that's 0.003
US$ cents) ;-)