Mailing List Archive

State of multisearcher/sorting in svn
Hello Marvin,

I've been diligently reading (in some cases glassily, so I may have missed
something important:) the subversion commits and noticed:

Revision: 3469
Modified:
trunk/c_src/KinoSearch/Search/SortSpec.bp
trunk/c_src/KinoSearch/Search/SortSpec.c
trunk/perl/lib/KinoSearch/Search/MultiSearcher.pm
trunk/perl/lib/KinoSearch/Search/SortSpec.pm
Log:
Port the rest of SortSpec to C.

I've lost track a bit: can you provide a description of the current
status of multisearch/sorting (as of latest svn commit)? I vaguely recall
that the two (multisearch/sort) were on your todo list at some point.

Regards (and keep up the amazing work!)
Henry



_______________________________________________
KinoSearch mailing list
KinoSearch@rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch
Re: State of multisearcher/sorting in svn
On Jun 10, 2008, at 1:23 AM, Henry wrote:

> I've been diligently reading (in some cases glassily, so I may have missed
> something important:) the subversion commits and noticed:

> Log:
> Port the rest of SortSpec to C.

There weren't any meaningful functional changes in that commit. It
was just another step in the process of porting the modules, so that
KS can run from C and be bound to other languages.

> can you provide a description of the current
> status of multisearch/sorting (as of latest svn commit)? I vaguely recall
> that the two (multisearch/sort) were on your todo list at some point.

There's a working implementation, but it's disabled by default and
requires an undocumented call to enable it.

KinoSearch::Search::MultiSearcher->set_enable_sorting(1);

It's that way because I basically want only people who are subscribed
to this list to be able to use that feature.

Sorting at the single-machine level works pretty well. The "sort
cache", which is maintained for each sortable field, is actually an
array of 32-bit integers, one for each document, each of which indicates
the document's rank in a list sorted on that field. When a sorted search
is requested, these rank numbers are compared rather than the
original field values. It's very fast, and the memory footprint to
maintain the cache, while substantial, is smaller because we only need
32-bit integers rather than the original strings.
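To make the sort-cache idea concrete, here is a minimal C sketch, assuming the field values are available as an array of C strings indexed by document number. The names build_sort_cache() and sort_hits() are hypothetical, and this is not KinoSearch's actual implementation; it only shows how a per-document 32-bit rank can stand in for string comparisons at search time.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* qsort() takes no context pointer, so the comparators read these
 * file-scope variables. This makes the sketch non-reentrant. */
static const char **g_values;
static const uint32_t *g_ords;

/* Order document numbers by their field value. */
static int cmp_doc_by_value(const void *a, const void *b) {
    uint32_t da = *(const uint32_t *)a;
    uint32_t db = *(const uint32_t *)b;
    return strcmp(g_values[da], g_values[db]);
}

/* Hypothetical: fill ords[doc] with the document's rank in a list sorted
 * on the field. Documents with equal values share a rank. Returns 0 on
 * success or -1 if scratch memory cannot be allocated. */
int build_sort_cache(const char **values, uint32_t num_docs, uint32_t *ords) {
    assert(values != NULL && ords != NULL);
    if (num_docs == 0) return 0;
    uint32_t *order = malloc(num_docs * sizeof *order);
    if (order == NULL) return -1;
    for (uint32_t i = 0; i < num_docs; i++) order[i] = i;
    g_values = values;
    qsort(order, num_docs, sizeof *order, cmp_doc_by_value);
    uint32_t rank = 0;
    for (uint32_t i = 0; i < num_docs; i++) {
        /* Bump the rank only when the value changes. */
        if (i > 0 && strcmp(values[order[i - 1]], values[order[i]]) != 0) rank++;
        ords[order[i]] = rank;
    }
    free(order);
    return 0;
}

/* Compare two hits by cached rank: one integer compare, no strings. */
static int cmp_hit_by_ord(const void *a, const void *b) {
    uint32_t oa = g_ords[*(const uint32_t *)a];
    uint32_t ob = g_ords[*(const uint32_t *)b];
    return (oa > ob) - (oa < ob);
}

/* Hypothetical: sort matching document numbers ascending by field value,
 * consulting only the rank cache. */
void sort_hits(uint32_t *doc_ids, size_t num_hits, const uint32_t *ords) {
    if (num_hits == 0) return;
    g_ords = ords;
    qsort(doc_ids, num_hits, sizeof *doc_ids, cmp_hit_by_ord);
}
```

The strings are only touched while the cache is built; after that, every comparison during a sorted search is a single integer compare, which is where the speed and the smaller footprint come from.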

Unfortunately, that model breaks down at the multi-machine level
because the rank numbers are no longer comparable. That means that
once we have the top hits for each node, we have to retrieve the
original string values, send them across the network, and sort at the
master node.
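A minimal C sketch of that master-side step, assuming each node sends back its top hits together with the original string value of the sort field. The RemoteHit struct and master_sort_hits() are hypothetical names for illustration, not part of KinoSearch; the point is that the master can only compare strings, because each node's ranks mean nothing outside that node.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: one hit as returned by a search node. Node-local ranks
 * can't be compared across nodes, so each hit carries the original
 * sort-field string. */
typedef struct {
    int node;               /* which node produced the hit */
    unsigned doc_id;        /* document number local to that node */
    const char *sort_value; /* original field value sent over the network */
} RemoteHit;

static int cmp_remote_hit(const void *a, const void *b) {
    const RemoteHit *ha = a;
    const RemoteHit *hb = b;
    int c = strcmp(ha->sort_value, hb->sort_value);
    if (c != 0) return c;
    /* Break ties on node, then doc_id, so the result doesn't depend on
     * qsort's unspecified ordering of equal elements. */
    if (ha->node != hb->node) return (ha->node > hb->node) - (ha->node < hb->node);
    return (ha->doc_id > hb->doc_id) - (ha->doc_id < hb->doc_id);
}

/* Sort all hits gathered from every node by string value, in place, and
 * return how many of them make the overall top `limit`. */
size_t master_sort_hits(RemoteHit *hits, size_t num_hits, size_t limit) {
    assert(hits != NULL || num_hits == 0);
    if (num_hits == 0) return 0;
    qsort(hits, num_hits, sizeof *hits, cmp_remote_hit);
    return num_hits < limit ? num_hits : limit;
}
```

The cost is that every candidate hit's string must cross the network and be compared with strcmp() at the master, which is exactly the work the single-machine rank cache avoids.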

The infrastructure required to pull that trick off is quite
elaborate. It took a long time to write, and I'm concerned that, by
dint of its sheer size, there are bugs lurking. In particular, I
don't like the implementation of MultiLexicon. I wish there were a
better way.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


Re: State of multisearcher/sorting in svn
Apologies for delay in responding - my email forwarding from public
spam-magnet to private email was b0rk3n.

> On Jun 10, 2008, at 1:23 AM, Henry wrote:
> There's a working implementation, but it's disabled by default and
> requires an undocumented call to enable it.
>
> KinoSearch::Search::MultiSearcher->set_enable_sorting(1);
>
> It's that way because I basically want only people who are subscribed
> to this list to be able to use that feature.

Ah yes, I recall the above function; I was just curious as to whether
anything had changed.

> The infrastructure required to pull that trick off is quite
> elaborate. It took a long time to write, and I'm concerned that, by
> dint of its sheer size, there are bugs lurking. In particular, I
> don't like the implementation of MultiLexicon. I wish there were a
> better way.

Kudos for pulling it off! As Yoda would say, a Perl Monk of elevated
eliteness you are, creamygoodness.

I hope to start flexing this aspect of KS soon and will provide feedback
then.

Cheers
Henry
