Mailing List Archive

Module Wais 2.1 available
For all WWW-WAIS gateway implementors:

The Perl module Wais 2.1 will be available real soon now at your
favourite CPAN site.

I append the documentation for convenience.

... Yes there is documentation now ;-)

Randal: Would you as the author of chat2 replace the 'do' by a '&'? It
would make the tests look prettier:


t/basic.............Use of "do" to call subroutines is deprecated at /usr/local/ls6/perl5.001n/lib/perl5/chat2.pl line 265.
ok
t/dict..............Use of "do" to call subroutines is deprecated at /usr/local/ls6/perl5.001n/lib/perl5/chat2.pl line 265.
ok
t/parallel..........Use of "do" to call subroutines is deprecated at /usr/local/ls6/perl5.001n/lib/perl5/chat2.pl line 265.
ok
All tests successful.
Files=3, Tests=17, 34 secs ( 1.83 cusr 0.68 csys = 2.52 cpu)


Currently I also do this in Wais.pm to avoid warnings:-(

# make strict happy

@we_know = ($chat::name, $chat::debug, $chat::aliases,
$chat::family, $chat::nfound, $chat::thisbuf,
$chat::thishost, $chat::timeleft);
@we_know = ();

The module is tested with 5.001n and 5.002b1f.

--
@J = split //,"J!k Phau^eHeens%rarrot&\ncl t ";
for(0..24){print $J[$_*7%($#J+1)]}
------------------------------------------------------------------------
NAME
Wais - access to freeWAIS-sf libraries

SYNOPSIS
use Wais;

DESCRIPTION
The interface is divided into four major parts.

SFgate 4.0
For backward compatibility the functions used in
SFgate up to version 4 are still present. Their
use is deprecated and they are not documented
here. These functions may not be supported in
future versions of this module.

Protocol XS functions which provide a low-level access to
the WAIS protocol. E.g. generate_search_apdu()
constructs a request message.

SFgate 5.0
Perl functions that implement high-level access
to WAIS servers. E.g. parallel searching is
supported.

dictionary
A bunch of XS functions useful for inspecting
local databases.

We will start with the SFgate 5.0 functions.

USAGE
The main high-level interface consists of the functions
Wais::Search and Wais::Retrieve. Both return a reference
to an object of the class Wais::Result.

Wais::Search

Arguments of Wais::Search are hash references, one for
each database to search. The keys of the hashes should be:

query The query to submit.

database The database which should be searched.

host host is optional. It defaults to 'localhost'.

port port is optional. It defaults to 210.

tag A tag by which individual results can be
associated to a database/host/port triple. If
omitted defaults to the database name.

relevant If present, must be a reference to an array
containing alternating document ids and types.
Document ids must be of type Wais::Docid.

Here is a complete example:

$result = Wais::Search({'query' => 'pfeifer',
'database' => $db1,
'host' => 'ls6',
'relevant' => [$id, 'TEXT']},
{'query' => 'pfeifer',
'database' => $db2});

If host is 'localhost' and database.src exists, a local
search is performed instead of connecting to a server.

Wais::Search will open $Wais::maxnumfd connections in
parallel at most.
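
The defaults listed above can be sketched in plain Perl. The helper
normalize_request below is hypothetical and not part of the Wais
module, and the database name 'demo' is a placeholder. It only
illustrates how an omitted host, port, and tag would be filled in for
one request hash.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Wais.pm: fill in the documented
# defaults for one Wais::Search request hash.
sub normalize_request {
    my ($req) = @_;
    my %full = %$req;
    $full{host} = 'localhost' unless defined $full{host};
    $full{port} = 210         unless defined $full{port};
    # The tag falls back to the database name.
    $full{tag}  = $full{database} unless defined $full{tag};
    return \%full;
}

# 'demo' is a placeholder database name.
my $req = normalize_request({ query => 'pfeifer', database => 'demo' });
printf "%s:%d %s [%s]\n", @{$req}{qw(host port database tag)};
```

The original hash is copied rather than modified, so the caller's
request stays as written.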

Wais::Retrieve

Wais::Retrieve should be called with named parameters
(i.e. a hash). Valid parameters are database, host, port,
docid, and type.

$result = Wais::Retrieve('database' => $db,
'docid' => $id,
'host' => 'ls6',
'type' => 'TEXT');

Defaults are the same as for Wais::Search. In addition
type defaults to 'TEXT'.
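
Named parameters with defaults are commonly handled by merging the
caller's pairs over a hash of defaults. The function retrieve_args
below is a hypothetical stand-in, not the code of Wais::Retrieve.
Treating database and docid as required is an assumption, and the
string 'id-1' stands in for a real Wais::Docid object.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the argument handling of a function that
# takes named parameters; the real Wais::Retrieve may differ.
sub retrieve_args {
    my %arg = (
        host => 'localhost',   # same default as Wais::Search
        port => 210,           # same default as Wais::Search
        type => 'TEXT',        # additional default for retrieval
        @_,                    # caller-supplied pairs override defaults
    );
    # Assumption: these two have no sensible default.
    die "database and docid are required\n"
        unless defined $arg{database} && defined $arg{docid};
    return %arg;
}

# 'demo' and 'id-1' are placeholders.
my %arg = retrieve_args(database => 'demo', docid => 'id-1', host => 'ls6');
print join(', ', map { "$_=$arg{$_}" } sort keys %arg), "\n";
```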

Wais::Result

The functions Wais::Search and Wais::Retrieve return
references to objects blessed into Wais::Result. The
following methods are available:

diagnostics
Returns an array of diagnostic messages. Each
element (if any) is a reference to an array
consisting of

tag The tag of the corresponding search request
or 'document' if the request was a retrieve
request.

code The WAIS diagnostic code.

message A textual diagnostic message.

header Returns an array of WAIS document headers. Each
element (if any) is a reference to an array
consisting of

tag The tag of the corresponding search request
or 'document' if the request was a retrieve
request.

score

lines Length of the corresponding document in
lines.

length Length of the corresponding document in
bytes.

headline

types A reference to an array of types valid for
docid.

docid A reference to the WAIS identifier blessed
into Wais::Docid.

text Returns the text fetched by Wais::Retrieve.
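
The record layouts above can be unpacked with list assignment. The
sketch below runs on mock data in the documented field order. All
values (tags, scores, the diagnostic code, and the strings standing in
for Wais::Docid objects) are made up for illustration, and
format_headers is a hypothetical helper. With a real result the lists
would come from $result->header and $result->diagnostics.

```perl
use strict;
use warnings;

# Mock data in the documented layout; every value is invented.
my @headers = (
    [ 'db1', 1000, 42, 2048, 'First headline',  ['TEXT'],         'docid-a' ],
    [ 'db2',  750, 10,  512, 'Second headline', ['TEXT', 'HTML'], 'docid-b' ],
);
my @diagnostics = ( [ 'db2', 1, 'example diagnostic' ] );

# Hypothetical helper: format one line per header record.
sub format_headers {
    my @out;
    for my $h (@_) {
        my ($tag, $score, $lines, $length, $headline, $types, $docid) = @$h;
        push @out, sprintf '%-4s %5d %4d lines %6d bytes  %s (%s)',
            $tag, $score, $lines, $length, $headline, join(',', @$types);
    }
    return @out;
}

print "$_\n" for format_headers(@headers);
for my $d (@diagnostics) {
    my ($tag, $code, $message) = @$d;
    print "diagnostic from $tag: [$code] $message\n";
}
```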

Dictionary
There are a couple of functions to inspect local
databases. See the inspect script in the distribution. You
need the Curses module to run it. Also adapt the directory
settings in the top part.

Wais::dictionary

%frequency = Wais::dictionary($database);
%frequency = Wais::dictionary($database, $field);
%frequency = Wais::dictionary($database, 'foo*');
%frequency = Wais::dictionary($database, $field, 'foo*');

The function returns an array of alternating words and
frequencies: each word from the global or field dictionary
that matches the prefix (if given) is followed by its
frequency. In a scalar context, the number of matching
words is returned.
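
Because the list alternates words and frequencies, it can be assigned
directly to a hash. The sketch below uses a mock list with invented
counts in place of a real call such as
Wais::dictionary($database, 'foo*'). The helper by_frequency is
hypothetical.

```perl
use strict;
use warnings;

# Mock of the documented list-context return value: alternating
# words and frequencies. The counts are invented.
my @pairs = ( foo => 12, fool => 3, foot => 7 );

# Hypothetical helper: words ordered by descending frequency,
# with ties broken alphabetically.
sub by_frequency {
    my %frequency = @_;
    return sort { $frequency{$b} <=> $frequency{$a} || $a cmp $b }
           keys %frequency;
}

my %frequency = @pairs;
printf "%-6s %d\n", $_, $frequency{$_} for by_frequency(@pairs);
print scalar(keys %frequency), " matching words\n";
```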

Wais::list_offset

The function takes the same arguments as Wais::dictionary.
It returns the same array (or, in a scalar context, the
same word count), with the word frequencies replaced by the
offset of the posting list in the inverted file.

Wais::postings

%postings = Wais::postings($database, 'foo');
%postings = Wais::postings($database, $field, 'foo');

Returns an array of alternating numeric document ids and
references to arrays. The first element of each referenced
array is the internal weight of the word with respect to
the document. The other elements are the word/character
positions of the occurrences of the word in the document.
If freeWAIS-sf is compiled with -DPROXIMITY, word
positions are returned, otherwise character positions.

In a scalar context, the number of occurrences of the word
is returned.
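
The posting structure can be taken apart as follows. The data is a
mock in the layout described above, with invented document ids,
weights, and positions. The helper summarize_postings is hypothetical
and not part of the module.

```perl
use strict;
use warnings;

# Mock of the documented return value: alternating numeric document
# ids and references to [weight, position, position, ...].
my @raw = ( 17 => [ 0.75, 4, 120 ], 23 => [ 0.25, 9 ] );

# Hypothetical helper: one summary record per document, ordered by id.
sub summarize_postings {
    my %postings = @_;
    my @out;
    for my $docid (sort { $a <=> $b } keys %postings) {
        # First element is the weight, the rest are positions.
        my ($weight, @positions) = @{ $postings{$docid} };
        push @out, { docid => $docid, weight => $weight,
                     count => scalar @positions, positions => \@positions };
    }
    return @out;
}

for my $p (summarize_postings(@raw)) {
    printf "doc %d: weight %.2f, %d occurrence(s) at %s\n",
        $p->{docid}, $p->{weight}, $p->{count}, join(' ', @{ $p->{positions} });
}
```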

Wais::headline

$headline = Wais::headline($database, $docid);

The function retrieves the headline (only the text!) of
the document numbered $docid.

Protocol
Wais::generate_search_apdu

$apdu = Wais::generate_search_apdu($query,$database);
$relevant = [$id1, 'TEXT', $id2, 'HTML'];
$apdu = Wais::generate_search_apdu($query,$database,$relevant);

Document ids must be of type Wais::Docid as returned by
Wais::Result::header or Wais::Search::header.
$Wais::maxdoc may be set to modify the number of documents
to retrieve.

Wais::generate_retrieval_apdu

$apdu = Wais::generate_retrieval_apdu($database, $docid, $type);
$apdu = Wais::generate_retrieval_apdu($database, $docid,
$type, $chunk);

Requests the chunk numbered $chunk of the document whose
id is $docid (must be of type Wais::Docid). $chunk
defaults to 0. $Wais::CHARS_PER_PAGE may be set to
influence the chunk size.

Wais::local_answer

$answer = Wais::local_answer($apdu);

Answer the request by local search/retrieval. The message
header is stripped from the result for convenience (see
the code of Wais::Search or the documentation of
Wais::Search::new below).

Wais::Search::new

$result = Wais::Search::new($message);

Turn the result message into an object of type Wais::Search.
The following methods are available: diagnostics, header,
and text. The results of these methods are pretty much the
same as for Wais::Result; only the tags are missing.

diagnostics
Return an array of references to [$code,
$message]

header Return an array of references to [$score,
$lines, $length, $headline, $types, $docid].

text Returns the chunk of the document requested. For
documents larger than $Wais::CHARS_PER_PAGE more
than one request must be sent.
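
How many requests a large document needs can be estimated from its
length in bytes as reported by the header method. The sketch below
assumes that every chunk except the last carries a full page of
$Wais::CHARS_PER_PAGE bytes, which the text above does not state
explicitly. chunk_count and chunk_numbers are hypothetical helpers.
Each returned number would be passed as $chunk to
Wais::generate_retrieval_apdu.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Page size used for the illustration; 4096 is the documented
# default of $Wais::CHARS_PER_PAGE.
my $chars_per_page = 4096;

# Hypothetical helper: number of chunk requests for a document of
# $length bytes, assuming all chunks but the last are full pages.
sub chunk_count {
    my ($length, $page) = @_;
    return 0 if $length <= 0;
    return ceil($length / $page);
}

# Chunk numbers start at 0, matching the default of $chunk.
sub chunk_numbers {
    my ($length, $page) = @_;
    return (0 .. chunk_count($length, $page) - 1);
}

# A 10000 byte document (invented length) needs three requests.
print join(' ', chunk_numbers(10_000, $chars_per_page)), "\n";
```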

Wais::Search::DESTROY

The objects will be destroyed by Perl.

VARIABLES
$Wais::version
Generated by: sprintf(buf, "Wais %3.1f%d",
VERSION, PATCHLEVEL);

$Wais::errmsg
Set to a verbose error message if something
went wrong. Most functions return undef on
failure after setting $Wais::errmsg.

$Wais::maxdoc
Maximum number of hits to return when searching.
Defaults to 40.

$Wais::CHARS_PER_PAGE
Maximum number of bytes to retrieve in a single
retrieve request. Wais::Retrieve sends multiple
requests if necessary to retrieve a document.
CHARS_PER_PAGE defaults to 4096.

$Wais::timeout
Number of seconds to wait for an answer from
remote servers. Defaults to 120.

$Wais::maxnumfd
Maximum number of file descriptors to use
simultaneously in Wais::Search.

BUGS
Wais::Search currently splits the requests into groups of
$Wais::maxnumfd requests. Since some requests of a group
might be local and/or some might refer to the same
host/port, groups may not use all $Wais::maxnumfd possible
file descriptors. Therefore some performance may be lost
when more than $Wais::maxnumfd requests are processed.
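
The effect can be illustrated with a fixed-size grouping in plain
Perl. This is only a sketch: the actual grouping code inside
Wais::Search is not shown here and may differ. group_requests is a
hypothetical helper, and the request names and the limit of two are
invented.

```perl
use strict;
use warnings;

# Hypothetical helper: split a list of requests into consecutive
# groups of at most $max elements.
sub group_requests {
    my ($max, @requests) = @_;
    die "group size must be positive\n" unless $max > 0;
    my @groups;
    push @groups, [ splice @requests, 0, $max ] while @requests;
    return @groups;
}

# Five requests with an assumed limit of two descriptors: the second
# group holds a local request, so it opens only one connection.
my @requests = qw(remote1 remote2 local remote3 remote4);
print join(' | ', map { "@$_" } group_requests(2, @requests)), "\n";
```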

AUTHOR
Ulrich Pfeifer <pfeifer@ls6.informatik.uni-dortmund.de>

--
Ulrich UNIVERSITAET-DORTMUND telefax: 49 231 755 2405 /////
Pfeifer Lehrstuhl Informatik VI voice: 49 231 755 3032 ____UNI DO
@RR D-44221 Dortmund postbox: 50 05 00 \\*\\////
http://ls6-www.informatik.uni-dortmund.de/WhoIsWhoAtLS6.html \\\\\//