On Jun 7, 2007, at 1:55 PM, Edward Betts wrote:
> I'd like to be able to highlight the matches in a field without
> creating an excerpt from it.
At least one other person has made the same feature request
(<http://rt.cpan.org/Ticket/Display.html?id=25400>).
The revised Highlighter API introduced in 0.20_03 is intended to
facilitate such features. You can even process the same field
multiple times if you want.
    # Short excerpt from the 'content' field.
    $highlighter->add_spec(
        field          => 'content',
        name           => 'less',
        excerpt_length => 50,
    );
    # Long excerpt from the same field, stored under a different name.
    $highlighter->add_spec(
        field          => 'content',
        name           => 'more',
        excerpt_length => 2000,
    );
    ...
    print "$hit->{excerpts}{less}\n";
    ...
    print "$hit->{excerpts}{more}\n";
As I mentioned in my reply to that bug report, you can sort of fake
up a non-excerpted excerpt by making excerpt_length a large number.
However, as was pointed out to me, Highlighter will tack on an
ellipsis unless the field ends with a period.
That's a bug that needs fixin'. Highlighter should not tack on an
ellipsis if the end of the excerpt coincides with the end of the
field value.
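To make the intended behavior concrete, here is a minimal sketch of the rule in plain Perl. It is not the Highlighter code: excerpt_with_ellipsis() is a hypothetical helper, and the " ..." marker is an assumption about how the ellipsis is rendered.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of KinoSearch: cut an excerpt out of a
# field value and append an ellipsis only when text was actually cut off.
sub excerpt_with_ellipsis {
    my ( $text, $start, $length ) = @_;

    # Clamp the end of the excerpt to the end of the field value.
    my $end = $start + $length;
    $end = length($text) if $end > length($text);

    my $excerpt = substr( $text, $start, $end - $start );

    # No ellipsis when the excerpt ends exactly where the field ends.
    $excerpt .= ' ...' if $end < length($text);
    return $excerpt;
}

print excerpt_with_ellipsis( 'A short title', 0, 2000 ), "\n";
print excerpt_with_ellipsis( 'A rather longer body of text', 0, 8 ), "\n";
```

With a rule like this, the large-excerpt_length trick would return the field unchanged whether or not it ends with a period.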
> A typical use case would be for
> highlighting in titles, like Google does.
Another use would be highlighting within URLs, something Google also
does.
> I would have a go at implementing it, but I'm not sure how best to fit
> it into the class hierarchy, and where to put the result in the data
> structure returned by fetch_hit_hashref.
It should still go under $hit->{excerpts}.
I think there's a bit of a disconnect because of the name of that
hash key and the name of the Hits method, "create_excerpts". Those
names sort-of imply that you can't use the Highlighter without
excerpting. Maybe that Hits method should be named "set_highlighter"
instead, though having the word "highlighter" in there sort-of
implies the opposite -- that you can't create excerpts without
highlighting -- which is just as misleading.
In any case, there should be a way to turn off excerpting via the
Highlighter->add_spec API. I think the best way to do that is to add
an extract_excerpt parameter to add_spec():
    $highlighter->add_spec(
        field           => 'title',
        extract_excerpt => 0,    # default 1
    );
Another possibility would be to treat an explicit undef supplied to
excerpt_length as an indication that no excerpting should be
performed, but I think people reading the docs wouldn't find that as
easily.
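Whichever parameter ends up switching excerpting off, the result stored under $hit->{excerpts} would be the whole field value with the matches marked up. A rough sketch of that output, using a plain regex as a stand-in: highlight_whole_field() is a hypothetical function, the <strong> tags are an assumption, and none of this describes how Highlighter locates matches internally.

```perl
use strict;
use warnings;

# Hypothetical stand-in, not the Highlighter implementation: wrap each
# query term in <strong> tags and return the entire field value, with
# no excerpting and no ellipsis.
sub highlight_whole_field {
    my ( $text, @terms ) = @_;
    return $text unless @terms;

    # Build one case-insensitive alternation from the escaped terms.
    my $alternation = join '|', map { quotemeta } @terms;
    ( my $marked = $text ) =~ s/\b($alternation)\b/<strong>$1<\/strong>/gi;
    return $marked;
}

print highlight_whole_field( 'Highlighting matches in a title', 'title' ), "\n";
```

The sketch skips HTML-escaping of the field value, which real output for a web page would need.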
Do you feel like taking this on?
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/