I'm trying to do a paged result set which is sorted by date. For
paging I'm using the ->seek method (or the offset and num_wanted
args to search, which comes to the same thing) and for sorting
I'm using a sort spec.
Unfortunately what it looks like is that the seek is happening
before the sort - which I guess makes sense from an implementation
point of view, but isn't expected behaviour from a user's point of
view.
To show how unexpected, here is some code which retrieves a given
number of results, then (hopefully) displays the earliest hit in
the dataset:
use Glob;

my $sort = KinoSearch::Search::SortSpec->new();
$sort->add( field => "date" );

my $hits = Glob->searcher->search(
    query      => "tag:theology",
    offset     => 0,
    num_wanted => shift(@ARGV),
    sort_spec  => $sort,
);

my $hit = $hits->fetch_hit_hashref;
print $hit->{id}, " ", $hit->{date}, "\n";
Here's the earliest result when we're retrieving 10:
% perl test.pl 10
523 2005-10-05
And the earliest when we're retrieving 30:
% perl test.pl 30
836 2004-11-29
Now it is of course the earliest of the 10 hits, and the
earliest of the 30 hits, and of course these are different,
but what I really wanted was the earliest of the 10 earliest
hits and earliest of the 30 earliest hits and these ought to
be the same!
I don't know if this is a bug. It's a bug if setting a sort_spec
is expected to sort the document collection, but if it is
expected merely to sort the result set, then it's just a major
annoyance; either way it looks like I have to retrieve all the
hits and then sort them myself.
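For reference, the fallback I have in mind would look roughly like
the sketch below. It is plain Perl with no KinoSearch calls at all:
@all_hits is hypothetical stand-in data for "every hit for the
query" (ids 523 and 836 are the ones from the runs above, 101 and
212 are invented for illustration), and page_by_date is a made-up
helper name, not part of any library.

```perl
use strict;
use warnings;

# Hypothetical stand-in data. In real use this would be every hit for
# the query, fetched in one go before any paging is applied.
my @all_hits = (
    { id => 523, date => '2005-10-05' },
    { id => 836, date => '2004-11-29' },
    { id => 101, date => '2006-01-17' },    # invented for illustration
    { id => 212, date => '2003-07-02' },    # invented for illustration
);

# Sort the whole set first, then slice out one page.
# Dates in YYYY-MM-DD form sort correctly as plain strings.
sub page_by_date {
    my ( $hits, $offset, $num_wanted ) = @_;
    my @sorted = sort { $a->{date} cmp $b->{date} } @$hits;
    return () if $offset >= @sorted;
    my $last = $offset + $num_wanted - 1;
    $last = $#sorted if $last > $#sorted;
    return @sorted[ $offset .. $last ];
}

my @page = page_by_date( \@all_hits, 0, 2 );
print "$_->{id} $_->{date}\n" for @page;
```

Done this way, the first hit is the same whatever num_wanted is,
which is the behaviour I was expecting, but the cost is pulling
every hit into memory before the first page can be shown.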
Is there a better way?
Simon