
API for subclassing Scorer (was adding a proximity scorer)
On Jun 16, 2007, at 12:05 AM, Nathan Kurz wrote:

>> Ideally, our discussion will result in an improvement upon that
>> scheme that will allow you to write your ORScorer subclass without
>> touching BoilerPlater. Something like this:
>>
>> package MyORScorer;
>> use base qw( KinoSearch::Search::ORScorer );
>>
>> __PACKAGE__->register_c_method( tally => 'my_tally' );
>>
>> use Inline C => <<'END_C';
>>
>> kino_Tally*
>> my_tally(kino_OrScorer *self) {
>> /* ... */
>> }
>>
>> END_C
> That seems like a great goal. For now I'm happy writing C.

OK, check. But I want to make it easier for you to maintain your
Scorer subclass, and I want to make it easier for other people to
write them.

> Perhaps
> more useful for most people would be the ability to override a
> BoilerPlated C method with a Perl function, with it automatically
> wrapped in just enough C to push the args.

Yes, I absolutely agree we should do that.

In this particular case, adding Perl's function call overhead to
Scorer_Tally() would be a disaster for search-time performance,
because it's inner loop code. But that's not true everywhere, and
for rapid prototyping taking the performance hit would be acceptable.

> You aren't already doing this anywhere, are you?

No, but the prime candidate would be Similarity->length_norm.

> Personally, though, I'd probably rather see a greater split between
> the Perl and the C. I love them both individually, but I'd be more
> comfortable with a standard C library (libidf?) with a Perl wrapper
> and a clearly defined boundary.

This is clearly the direction that KS is headed.

... ...

Let's design the ideal API for subclassing Scorer, then work
backwards to implement it and see how close we can get.

* It should be possible to implement a Scorer class entirely in
Perl and have KS use it. (Schema and FieldSpec sort of work
this way.)
* It should be possible to override individual methods used by
a Scorer implemented in C with wrapped Perl subroutines.
* It should be possible to override individual methods used by
a Scorer implemented in C with C functions, as in the code
block at the top of this post. (This is fairly easy.)
* It should be possible to add additional Perl member variables
to a Scorer implemented in C.
* It should be possible to add additional C member variables to
a Scorer implemented in C.
* It _must_ be possible to upgrade KS without encountering binary
compatibility problems such as reordered vtables or object
structs.
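
One way to reconcile C-level overrides with the binary-compatibility requirement is to resolve methods by name at load time rather than by a fixed position in a vtable. The following is a minimal sketch under that assumption, not existing KS code: `Scorer`, `MethodTable`, `scorer_register_method`, and `scorer_lookup` are hypothetical names invented for illustration, and the `float` return type stands in for whatever a real tally would return.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types for illustration; not actual KinoSearch structs. */
typedef struct Scorer Scorer;
typedef float (*scorer_method_t)(Scorer *self);

typedef struct {
    const char      *name;  /* method name, e.g. "tally" */
    scorer_method_t  func;  /* current implementation */
} MethodEntry;

#define MAX_METHODS 8

typedef struct {
    MethodEntry entries[MAX_METHODS];
    size_t      count;
} MethodTable;

struct Scorer {
    MethodTable *methods;     /* shared by all instances of a class */
    float        base_score;
};

/* Look a method up by name; returns NULL if it is not registered. */
scorer_method_t
scorer_lookup(const MethodTable *table, const char *name) {
    for (size_t i = 0; i < table->count; i++) {
        if (strcmp(table->entries[i].name, name) == 0) {
            return table->entries[i].func;
        }
    }
    return NULL;
}

/* Register a method, or replace it if the name is already present.
 * Returns 0 on success, -1 if the table is full. */
int
scorer_register_method(MethodTable *table, const char *name,
                       scorer_method_t func) {
    assert(table != NULL && name != NULL && func != NULL);
    for (size_t i = 0; i < table->count; i++) {
        if (strcmp(table->entries[i].name, name) == 0) {
            table->entries[i].func = func;
            return 0;
        }
    }
    if (table->count == MAX_METHODS) {
        return -1;
    }
    table->entries[table->count].name = name;
    table->entries[table->count].func = func;
    table->count++;
    return 0;
}

/* Two interchangeable implementations of the same method. */
float default_tally(Scorer *self) { return self->base_score; }
float my_tally(Scorer *self)      { return self->base_score * 2.0f; }
```

Because callers only ever ask for "tally" by name, the order of entries can change between releases without breaking a compiled subclass. A string lookup is too slow for inner-loop code, so a caller would presumably fetch the function pointer once per search and reuse it.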

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
API for subclassing Scorer (was adding a proximity scorer)
On 6/16/07, Marvin Humphrey <marvin@rectangular.com> wrote:
> > Personally, though, I'd probably rather see a greater split between
> > the Perl and the C. I love them both individually, but I'd be more
> > comfortable with a standard C library (libidf?) with a Perl wrapper
> > and a clearly defined boundary.
>
> This is clearly the direction that KS is headed.

From the outside, I'm not sure that this is clear. Currently, the C
code (which I take to be proto-Lucy) seems very intimately tied to the
KinoSearch (and Lucene, and presumably Ferret) class hierarchies, and
the boundaries between the layers seem pretty malleable. Not that this
is a direction you want to go, but I'd be more comfortable with a
standard procedural C library (hard to override) with bindings that
allow the object hierarchy to be created in Perl or whatever.

Without prejudice, I can see why you've taken the route you have, but
I'd hesitate to call it standard. I think a worthwhile question is
to ask whether an outsider considering implementing full-text-search
in another language would find it advantageous to link to your library
rather than implementing just the parts they felt they needed. For
example (in a direction I've considered) if I were designing a search
component using an Apache module done purely in C, would I link to
this?

> Let's design the ideal API for subclassing Scorer, then work
> backwards to implement it and see how close we can get.

Probably only semantics, but I'd start by defining the problem a
little differently: the goal is to allow someone to easily change the
way in which scoring happens. Subclassing the existing Scorer is one
way to do this, but making the scoring procedure simple and clear
enough that they can implement their own Scorer should be a priority.
And making it possible to change the operation of an existing class
without subclassing is nice too. That said...

> * It should be possible to implement a Scorer class entirely in
> Perl and have KS use it. (Schema and FieldSpec sort of work
> this way.)

Yes, that would be useful. Even having examples of the code done in
Perl would be useful to make it easier to understand what's happening.
If the default KinoSearch could be those Perl examples selectively
overridden with C using the same mechanism that a user would use to
customize, that would be fantastic.

> * It should be possible to override individual methods used by
> a Scorer implemented in C with wrapped Perl subroutines.

This would be impressive. I'd agree this would be ideal, but I'd be
willing to make this a lower priority --- the kind of thing one
designs well enough to make possible in the future but doesn't
implement right now. Are there examples of this in other software
that could be used as a pattern?

> * It should be possible to override individual methods used by
> a Scorer implemented in C with C functions, as in the code
> block at the top of this post. (This is fairly easy.)

Yes, this seems like appropriate fruit. In addition to the inline
approach, I'd like to see it possible to load an external shared
library and use a method in that. If possible, I'd also like to see it
possible to override the method directly in the base class (or
perhaps one instance of it), rather than only in the subclass.
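
To make the distinction between overriding a class and overriding a single instance concrete, here is a rough sketch in plain C. It assumes a hypothetical layout in which each object points at a shared class record and also carries an optional per-instance slot; `ScorerClass`, `tally_override`, and `scorer_tally` are names made up for this example, not anything in the current code.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Scorer Scorer;
typedef float (*tally_t)(Scorer *self);

/* Hypothetical class record shared by every instance of a class. */
typedef struct {
    tally_t tally;
} ScorerClass;

struct Scorer {
    ScorerClass *klass;           /* class-wide methods */
    tally_t      tally_override;  /* NULL unless this one instance is overridden */
    float        weight;
};

/* Dispatch: a per-instance override wins, otherwise use the class method. */
float
scorer_tally(Scorer *self) {
    assert(self != NULL && self->klass != NULL);
    tally_t fn = self->tally_override ? self->tally_override
                                      : self->klass->tally;
    return fn(self);
}

/* Sample implementations to swap in and out. */
float base_tally(Scorer *self)    { return self->weight; }
float boosted_tally(Scorer *self) { return self->weight + 10.0f; }
float zero_tally(Scorer *self)    { (void)self; return 0.0f; }
```

Assigning to `klass->tally` changes behavior for every instance that has no override of its own, while assigning to `tally_override` affects only that object. A function pointer obtained from an external shared library (for example through POSIX `dlsym`) could be stored in either slot the same way.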

Currently, it's often difficult to get your subclass to be actually
used. Thus I'd also like the code to avoid hardcoded constructors,
and provide a similar override mechanism to call your custom subclass
constructor:
KinoSearch::Search::BooleanScorer->override( newORScorer => 'MyORScorer_new' );
Which is to say, hardcoded constructors should become class methods.
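
On the C side, the constructor idea might look something like the sketch below: the class record holds a pointer to the constructor it uses for its children, and an override call swaps that pointer. Everything here is hypothetical and for illustration only, including `BooleanScorerClass`, `BooleanScorer_override_new_or_scorer`, `BooleanScorer_make_child`, and the `kind` tag used to show which constructor ran.

```c
#include <assert.h>
#include <stdlib.h>

/* Tag recording which constructor built the object (illustration only). */
enum { KIND_OR_SCORER, KIND_MY_OR_SCORER };

typedef struct ORScorer {
    int kind;
} ORScorer;

typedef ORScorer *(*or_scorer_ctor_t)(void);

/* Hypothetical class record: BooleanScorer keeps the constructor it uses
 * for ORScorer children here instead of calling ORScorer_new directly. */
typedef struct {
    or_scorer_ctor_t new_or_scorer;
} BooleanScorerClass;

ORScorer *
ORScorer_new(void) {
    ORScorer *self = malloc(sizeof *self);
    assert(self != NULL);
    self->kind = KIND_OR_SCORER;
    return self;
}

/* A user's subclass constructor: build the parent, then customize. */
ORScorer *
MyORScorer_new(void) {
    ORScorer *self = ORScorer_new();
    self->kind = KIND_MY_OR_SCORER;
    return self;
}

BooleanScorerClass BOOLEANSCORER = { ORScorer_new };

/* Counterpart of the override() call above: swap the constructor. */
void
BooleanScorer_override_new_or_scorer(or_scorer_ctor_t ctor) {
    assert(ctor != NULL);
    BOOLEANSCORER.new_or_scorer = ctor;
}

/* BooleanScorer code asks its class for a child rather than hardcoding one. */
ORScorer *
BooleanScorer_make_child(void) {
    return BOOLEANSCORER.new_or_scorer();
}
```

After `BooleanScorer_override_new_or_scorer(MyORScorer_new)`, every child that BooleanScorer creates comes from the custom constructor, with no change to BooleanScorer's own code.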

> * It should be possible to add additional Perl member variables
> to a Scorer implemented in C.
> * It should be possible to add additional C member variables to
> a Scorer implemented in C.

I can see why you are interested in an inside-out object model. I
wasn't familiar with it before you mentioned it. I can see why it's
appealing, but it's still too new for me to evaluate. At first
glance, this seems like it would be complex.
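
For reference, an inside-out object in Perl keeps each attribute in a class-level hash keyed by the object's identity instead of inside the object itself. A rough C analogue, sketched below, is a side table keyed by object address, which lets a subclass attach a new member variable without touching the parent's struct layout. The names (`SideTable`, `side_table_set`, `side_table_get`) and the fixed-size table are assumptions for illustration, and the sketch leaves out deletion, which a real version would need when an object is destroyed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOTS 64  /* fixed-size open-addressing table, for illustration only */

typedef struct {
    const void *key;    /* object address; NULL marks an empty slot */
    int         value;  /* the extra member variable */
} Slot;

typedef struct {
    Slot slots[SLOTS];
} SideTable;

/* Derive a starting slot from the object's address. */
static size_t
slot_index(const void *key) {
    return (size_t)(((uintptr_t)key >> 4) % SLOTS);
}

/* Linear probing: return the slot holding key, or the first empty slot
 * found, or NULL if the table is full and key is absent. */
static Slot *
find_slot(SideTable *t, const void *key) {
    size_t start = slot_index(key);
    for (size_t i = 0; i < SLOTS; i++) {
        Slot *s = &t->slots[(start + i) % SLOTS];
        if (s->key == key || s->key == NULL) {
            return s;
        }
    }
    return NULL;
}

/* Attach or update the extra variable for obj. Returns 0, or -1 if full. */
int
side_table_set(SideTable *t, const void *obj, int value) {
    assert(t != NULL && obj != NULL);
    Slot *s = find_slot(t, obj);
    if (s == NULL) {
        return -1;
    }
    s->key   = obj;
    s->value = value;
    return 0;
}

/* Fetch the extra variable for obj. Returns 0, or -1 if none was set. */
int
side_table_get(SideTable *t, const void *obj, int *out) {
    assert(t != NULL && obj != NULL && out != NULL);
    Slot *s = find_slot(t, obj);
    if (s == NULL || s->key == NULL) {
        return -1;
    }
    *out = s->value;
    return 0;
}
```

The object struct never changes size, so code compiled against the old layout keeps working; the cost is an extra lookup on every access to the added variable.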

> * It _must_ be possible to upgrade KS without encountering binary
> compatibility problems such as reordered vtables or object
> structs.

I'm sure you've thought about this part much more than I have. Do you
mean that it must be possible to upgrade the Perl portion only while
leaving the C portion untouched? Or vice versa? Or both? (perhaps
this is obvious --- I'm getting tired)

Have a good night,

Nathan Kurz
nate@verse.com