Mailing List Archive

remove Hits->seek() [patch]
Here's a poorly tested first pass at a patch to remove Hits->seek().

Is this the direction you were anticipating? If not, I can happily
try other approaches. As it is, it changes Seachable->search() to
call self->TopDocs, and then passes to resulting TopDocs to the new
Hits object at construction. I kept the ability to reset the fetch
iterator, but now a new search is required to extend the results.
Despite being a tiny patch, there are numerous things that could have
been done differently, so feel free to point towards the way you would
prefer.

It seems to pass the test suite, but I haven't done any live testing
beyond the sample dir. Patch is 'svn diff lib t' from within the
kinosearch/perl directory. I can clean it up and change it as you
desire after your feedback.

Thanks!

Nathan Kurz
nate@verse.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hits_seek.patch
Type: application/octet-stream
Size: 9215 bytes
Desc: not available
Url : http://www.rectangular.com/pipermail/kinosearch/attachments/20070713/6a702a6c/hits_seek.obj
remove Hits->seek() [patch] [ In reply to ]
On Jul 13, 2007, at 2:09 AM, Nathan Kurz wrote:

> Is this the direction you were anticipating?

Yep!

Thanks, applied as repository revision 2490.

> Despite being a tiny patch, there are numerous things that could have
> been done differently, so feel free to point towards the way you would
> prefer.

Anything I might have done differently is superficial.

The important bit is that search-time control flow is now less
convoluted, which should make the code base both easier to spelunk
and easier to develop. Thank you for having the insight to find this
simplification.

> It seems to pass the test suite, but I haven't done any live testing
> beyond the sample dir.

No other testing is necessary. There weren't any calls to Hits->seek
outside of lib/ and t/. I'll add a changelog entry later for the users.

> Patch is 'svn diff lib t' from within the
> kinosearch/perl directory. I can clean it up and change it as you
> desire after your feedback.

My only feedback is that this is a well-constructed patch which
demonstrates your familiarity with the material and upholds the
necessary internal conventions. Nicely done.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
remove Hits->seek() [patch] [ In reply to ]
On 7/13/07, Marvin Humphrey <marvin@rectangular.com> wrote:
> The important bit is that search-time control flow is now less
> convoluted, which should make the code base both easier to spelunk
> and easier to develop.

My one concern about the way I did it (and this is essentially an
artifact from the previous implementation) is that if a search offset
is used, the Hits still contains all the results but just sets its
internal offset (nee tick) to the correct starting position.

Thus this is potentially a trap, since if someone tries uses a
search() offset and then resets the Hits offset to zero to start over,
they won't be getting the same Hit they got the first time. I wasn't
sure if this merited a documentation note.

The alternative would be to build score_docs taking the search offset
into account, so that the earlier results are not even present. For
large offsets this might be helpful, especially when combined with
SearchClient. But it would change behaviour slightly.

> My only feedback is that this is a well-constructed patch which
> demonstrates your familiarity with the material and upholds the
> necessary internal conventions. Nicely done.

Thanks for the compliment. What should I tackle next?

Nathan Kurz
nate@verse.com
remove Hits->seek() [patch] [ In reply to ]
On Jul 13, 2007, at 11:49 AM, Nathan Kurz wrote:

> if someone tries uses a
> search() offset and then resets the Hits offset to zero to start over,
> they won't be getting the same Hit they got the first time.

Not a problem. Because there is no public API for resetting the
offset, this is an implementation detail. We can change the
internals later if we decide to add that public functionality.

> What should I tackle next?

I've got a ToDo list on the wiki:

http://www.rectangular.com/kinosearch/wiki/ToDoList

Pick whatever you'd like except for Windows compat, which I'm working
on.

I'd guess that the task you'd enjoy most would be this one (: it's
nice and "reductive" :)...

Remove position-generating code in Scorer subclasses and Tally,
now that the decision has been made not to implement the
position-aware BooleanScorer in core.

To elaborate: we'll keep Tally as an extensible container for scoring
information, but the core Scorer subclasses will only use it for
score and num_matchers.

* The ScoreProx class can be eliminated entirely.

* TermScorer:
o Eliminate the sprox member var.
o Deal with the fallout of eliminating sprox in TermScorer's
subclasses.

* PhraseScorer:
o Eliminate the sprox member var.
o Kill off the static function build_prox().

* ANDScorer:
o Eliminate the vestigial raw_prox_bb member var.

* Tally:
o Eliminate the the sprox_cap, num_sproxen and sproxen member vars.
o Eliminate the Tally_Add_Sprox and Tally_Zap_Sproxen methods.
o Kill Tally_Destroy, since inheriting from Obj_Destroy is fine
once we eliminate the sproxen member var.

That sounds like a lot, but it's not. Basically, "svn remove"
ScoreProx.h and ScoreProx.c, then keep deleting other stuff until the
library compiles again, at which point I think all tests will pass.
Lemme know if you get a successful compile and tests are still
failing. (Watch out for the Build.PL bug that Edward Betts just
encountered -- use "./Build distclean" liberally.)

While you're deleting all that stuff, you might glance over it.
It'll probably find its way back into position-aware scorers that
either you write or I write. (FYI, the "rich positions" stuff will
still work, since it depends on the behavior of RichPosting, not
ScoreProx and Tally.)

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/