Text::Query::Simple - Match text against simple query expression and return relevance value for ranking
use Text::Query::Simple;
# Constructor
$query = Text::Query::Simple->new([QSTRING] [OPTIONS]);
# Methods
$query->prepare(QSTRING [OPTIONS]);
$query->match([TARGET]);
$query->matchscalar([TARGET]);
This module provides an object that tests a string or list of strings against a query expression similar to an AltaVista "simple query" and returns a "relevance value." Elements of the query expression may be regular expressions or literal text, and may be assigned weights.
Query expressions are compiled into an internal form when a new object is
created or the prepare method is called; they are not recompiled on each
match.
Query expressions consist of words (sequences of non-whitespace), regexps
or phrases (quoted strings) separated by whitespace. Words or phrases
prefixed with a + must be present for the expression to match; words or
phrases prefixed with a - must be absent for the expression to match.
A successful match returns a count of the number of times any of the words
(except ones prefixed with -) appeared in the text. This type of result
is useful for ranking documents according to relevance.
Words or phrases may optionally be followed by a number in parentheses (no whitespace is allowed between the word or phrase and the parenthesized number). This number specifies the weight given to the word or phrase; it will be added to the count each time the word or phrase appears in the text. If a weight is not given, a weight of 1 is assumed.
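The matching rules above can be sketched in plain Perl. This is only an illustration of the described semantics, not the module's implementation. The score function and its term format (hash references with text, weight and sign keys) are hypothetical, terms are matched as case-insensitive literal substrings, the query is supplied already parsed rather than as a QSTRING, and a failed match is assumed to yield 0.

```perl
use strict;
use warnings;

# Hypothetical helper: score $text against pre-parsed query terms.
# Each term is { text => ..., weight => ..., sign => '+', '-' or '' }.
sub score {
    my ($text, @terms) = @_;
    my $total = 0;
    for my $t (@terms) {
        my $weight = defined $t->{weight} ? $t->{weight} : 1;   # default weight is 1
        my $sign   = defined $t->{sign}   ? $t->{sign}   : '';
        my $lit    = quotemeta $t->{text};
        # Count non-overlapping, case-insensitive occurrences of the term.
        my $hits = () = $text =~ /$lit/gi;
        return 0 if $sign eq '-' && $hits;     # forbidden term is present
        return 0 if $sign eq '+' && !$hits;    # required term is missing
        $total += $hits * $weight unless $sign eq '-';
    }
    return $total;
}

# Roughly the query: +information(2) retrieval -spam
my @terms = (
    { text => 'information', weight => 2, sign => '+' },
    { text => 'retrieval' },
    { text => 'spam', sign => '-' },
);

print score('Information retrieval ranks information sources.', @terms), "\n";  # 5
print score('Retrieval only.', @terms), "\n";                                   # 0
print score('information about spam', @terms), "\n";                            # 0
```

In the first call "information" appears twice with weight 2 and "retrieval" once with weight 1, giving 5. The other two calls fail because a + term is missing or a - term is present.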
use Text::Query::Simple;
my $q=Text::Query::Simple->new('+hello world');
die "bad query expression" if not defined $q;
$count=$q->match;
...
$q->prepare('goodbye adios -"ta ta"',-litspace=>1,-case=>1);
#requires single space between the two ta's
if ($q->match($line)) {
#doesn't match "Goodbye"
...
}
$q->prepare('\\bintegrate\\b',-regexp=>1);
#won't match "disintegrated"
...
$q->prepare('information(2) retrieval');
#information has twice the weight of retrieval
new([QSTRING] [OPTIONS]) is the constructor for a new query object. If QSTRING is given, it will be compiled to internal form.
OPTIONS are passed in a hash-like fashion, using key and value pairs.
Possible options are:
-case - If true, do case-sensitive match.
-litspace - If true, match spaces (except between operators) in
QSTRING literally. If false, match spaces as \s+.
-regexp - If true, treat patterns in QSTRING as regular expressions
rather than literal text.
-whole - If true, match whole words only, not substrings of words.
The constructor will return undef if a QSTRING was supplied and had
illegal syntax.
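The effect of the four options can be illustrated by building a regex for a single query term. This is a sketch of one plausible mapping, not the module's actual code. The term_regex function is hypothetical, it handles one already-extracted term with no leading or trailing whitespace, and it approximates -whole with \b word boundaries.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one query term into a compiled regex.
sub term_regex {
    my ($term, %opt) = @_;
    # -regexp: keep the term as a pattern; otherwise escape metacharacters.
    my $quote = $opt{-regexp} ? sub { $_[0] } : sub { quotemeta $_[0] };
    # -litspace: keep spaces literal; otherwise any whitespace run matches.
    my $pat = $opt{-litspace}
        ? $quote->($term)
        : join('\s+', map { $quote->($_) } split(/\s+/, $term));
    # -whole: require word boundaries on both sides.
    $pat = "\\b(?:$pat)\\b" if $opt{-whole};
    # -case: case-sensitive when true, case-insensitive otherwise.
    return $opt{-case} ? qr/$pat/ : qr/$pat/i;
}

my $loose  = term_regex('ta ta');
my $strict = term_regex('ta ta', -litspace => 1);
print "loose matches\n"    if "ta  \tta" =~ $loose;
print "strict rejects\n"   unless "ta  \tta" =~ $strict;
print "whole rejects\n"    unless 'disintegrated' =~ term_regex('integrate', -whole => 1);
print "case rejects\n"     unless 'Goodbye' =~ term_regex('goodbye', -case => 1);
```

With the defaults the term behaves as a case-insensitive literal substring, which is why each option only needs to tighten or relax one aspect of the pattern.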
prepare(QSTRING [OPTIONS]) compiles the query expression in QSTRING to internal form and sets any
options (same as in the constructor). prepare may be used to change
the query expression and options for an existing query object. If
OPTIONS are omitted, any options set by a previous call to the
constructor or prepare remain in effect.
This method returns a reference to the query object if the syntax of the
expression was legal, or undef if not.
match([TARGET]): If TARGET is a scalar, match returns the number of words in the
string specified by TARGET that match the query object's query
expression. If TARGET is not given, the match is made against $_.
If TARGET is an array, match returns a list of references to
anonymous arrays consisting of each element followed by its match count.
The list is sorted in descending order by match count. If the elements of
TARGET were anonymous arrays, the match count is appended to each
element. This allows arbitrary information (such as a filename) to be
associated with each element.
If TARGET is a reference to an array, match returns a reference to
a sorted list of matching items, with counts, for all elements.
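The shape of the list result described above can be sketched without the module. Everything here is illustrative: count_perl is a hypothetical stand-in for a compiled query, rank is a hypothetical helper, the text to be matched is assumed to be the first item of each anonymous array, and elements with a count of 0 are kept in the result.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a compiled query: counts occurrences of "perl".
sub count_perl {
    my ($text) = @_;
    my $hits = () = $text =~ /perl/gi;
    return $hits;
}

# Hypothetical helper mirroring the described list behaviour.
sub rank {
    my ($scorer, @targets) = @_;
    my @rows;
    for my $item (@targets) {
        if (ref($item) eq 'ARRAY') {
            # Anonymous array: copy it and append the match count.
            push @rows, [ @$item, $scorer->($item->[0]) ];
        }
        else {
            # Plain string: pair it with its match count.
            push @rows, [ $item, $scorer->($item) ];
        }
    }
    # Highest count first; the count is always the last item of a row.
    return sort { $b->[-1] <=> $a->[-1] } @rows;
}

my @ranked = rank(
    \&count_perl,
    [ 'Python only',        'b.txt' ],
    [ 'Perl and more perl', 'a.txt' ],
    [ 'perl',               'c.txt' ],
);

# Prints a.txt: 2, c.txt: 1, b.txt: 0 (one per line).
print "$_->[1]: $_->[-1]\n" for @ranked;
```

Carrying the filename inside each anonymous array means the caller can report which document each count belongs to after sorting.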
matchscalar([TARGET]) behaves just like match when TARGET is a scalar or is not given.
It is slightly faster than match under these circumstances.
This module requires Perl 5.005 or higher due to the use of evaluated expressions in regexes.
Eric Bohlman (ebohlman@netcom.com)
The parse_tokens routine was adapted from the parse_line routine in Text::ParseWords.
Copyright (c) 1998 Eric Bohlman. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.