r3793 - in trunk/perl: . lib/KinoSearch/Docs/Cookbook
Author: creamyg
Date: 2008-08-29 13:32:22 -0700 (Fri, 29 Aug 2008)
New Revision: 3793

Added:
trunk/perl/lib/KinoSearch/Docs/Cookbook/CachedSearcher.pod
Modified:
trunk/perl/MANIFEST
Log:
Add KinoSearch::Docs::Cookbook::CachedSearcher.


Modified: trunk/perl/MANIFEST
===================================================================
--- trunk/perl/MANIFEST 2008-08-29 20:19:31 UTC (rev 3792)
+++ trunk/perl/MANIFEST 2008-08-29 20:32:22 UTC (rev 3793)
@@ -62,6 +62,7 @@
lib/KinoSearch/Doc.pm
lib/KinoSearch/Doc/HitDoc.pm
lib/KinoSearch/Docs/Cookbook.pod
+lib/KinoSearch/Docs/Cookbook/CachedSearcher.pod
lib/KinoSearch/Docs/Cookbook/CustomQuery.pod
lib/KinoSearch/Docs/Cookbook/CustomQueryParser.pod
lib/KinoSearch/Docs/DocNums.pod

Added: trunk/perl/lib/KinoSearch/Docs/Cookbook/CachedSearcher.pod
===================================================================
--- trunk/perl/lib/KinoSearch/Docs/Cookbook/CachedSearcher.pod (rev 0)
+++ trunk/perl/lib/KinoSearch/Docs/Cookbook/CachedSearcher.pod 2008-08-29 20:32:22 UTC (rev 3793)
@@ -0,0 +1,92 @@
+=head1 NAME
+
+KinoSearch::Docs::Cookbook::CachedSearcher - Improve search-time
+responsiveness with a cached Searcher.
+
+=head1 ABSTRACT
+
+At the core of every Searcher object is an IndexReader, and when an
+IndexReader object is created, a small portion of the InvIndex is loaded into
+memory. Additional caches are filled as relevant queries arrive.
+
+For small document collections on lightly-loaded servers, the time to warm up
+the Searcher/Reader isn't worth worrying about. For large document
+collections or busy servers, the warmup time may become significant, in which
+case reusing the Searcher is likely to speed up your application.
+
+=head1 FastCGI
+
+A script running under standard CGI runs once per request. In contrast, a
+script running on a FastCGI-enabled webserver using the CGI::Fast module from
+CPAN starts upon the first request, then executes a loop once per request.
+
+Create your Searcher outside this loop:
+
+    my $searcher = KinoSearch::Searcher->new(
+        invindex => MySchema->read('/path/to/invindex/')
+    );
+    while ( my $cgi = CGI::Fast->new ) {
+        my $hits = $searcher->search( query => $cgi->param('q') || '' );
+        ...
+    }
+
+=head1 mod_perl
+
+Under mod_perl, the Searcher can be stored in a module loaded by startup.pl.
+
+    package CachedSearcher;
+
+    my $searcher;
+
+    sub obtain {
+        $searcher ||= KinoSearch::Searcher->new(
+            invindex => MySchema->read('/path/to/invindex/')
+        );
+        return $searcher;
+    }
+
+    sub refresh {
+        undef $searcher;
+        return obtain();
+    }
+
+    # Load at startup rather than wait for first request.
+    obtain();
+
+Individual search processes call CachedSearcher->obtain rather than
+creating their own Searcher object. If the index gets updated, a special HTTP
+request can be made which triggers a call to CachedSearcher->refresh.
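+
+To see the caching behaviour in isolation, here is a minimal stand-alone
+sketch of the same obtain/refresh pattern. StubSearcher and CachedStub are
+hypothetical names invented for this illustration; StubSearcher stands in for
+KinoSearch::Searcher and does nothing but count how often it is constructed.
+
+    use strict;
+    use warnings;
+
+    # Hypothetical stand-in for KinoSearch::Searcher (illustration only).
+    package StubSearcher;
+    my $built = 0;
+    sub new   { $built++; return bless {}, shift }
+    sub built { return $built }
+
+    # Same shape as CachedSearcher above, but wrapping the stub.
+    package CachedStub;
+    my $searcher;
+    sub obtain  { $searcher ||= StubSearcher->new; return $searcher }
+    sub refresh { undef $searcher; return obtain() }
+
+    package main;
+    CachedStub->obtain for 1 .. 3;      # constructs once, then reuses
+    print StubSearcher->built, "\n";    # prints 1
+    CachedStub->refresh;                # discards the object and rebuilds
+    print StubSearcher->built, "\n";    # prints 2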
+
+=head1 Benchmarks
+
+Using Benchmark::Stopwatch to measure a lightly-modified version of the sample
+search.cgi app, we get the following results for a query for "congress" under
+standard CGI...
+
+    NAME              TIME     CUMULATIVE    PERCENTAGE
+    load modules      0.121    0.121         73.754%
+    init searcher     0.004    0.125         2.626%
+    process search    0.032    0.158         19.735%
+    fetch hits        0.006    0.164         3.877%
+    _stop_            0.000    0.164         0.008%
+
+... and these results under CGI::Fast:
+
+    NAME              TIME     CUMULATIVE    PERCENTAGE
+    process search    0.002    0.002         24.213%
+    fetch hits        0.006    0.008         75.602%
+    _stop_            0.000    0.008         0.186%
+
+As the numbers indicate, for a simple term query, the time to load modules and
+warm up the Searcher overwhelms the time to run the search and return results.
+
+=head1 COPYRIGHT
+
+Copyright 2008 Marvin Humphrey
+
+=head1 LICENSE, DISCLAIMER, BUGS, etc.
+
+See L<KinoSearch> version 0.20.
+
+=cut
+

