Text::Query::Simple - Match text against simple query expression and return relevance value for ranking
    use Text::Query::Simple;

    # Constructor
    $query = Text::Query::Simple->new([QSTRING] [OPTIONS]);

    # Methods
    $query->prepare(QSTRING [OPTIONS]);
    $query->match([TARGET]);
    $query->matchscalar([TARGET]);
This module provides an object that tests a string or list of strings against a query expression similar to an AltaVista "simple query" and returns a "relevance value." Elements of the query expression may be regular expressions or literal text, and may be assigned weights.
Query expressions are compiled into an internal form when a new object is created or the prepare method is called; they are not recompiled on each match.
Query expressions consist of words (sequences of non-whitespace), regexps or phrases (quoted strings) separated by whitespace. Words or phrases prefixed with a + must be present for the expression to match; words or phrases prefixed with a - must be absent for the expression to match.
A successful match returns a count of the number of times any of the words (except ones prefixed with -) appeared in the text. This type of result is useful for ranking documents according to relevance.
Words or phrases may optionally be followed by a number in parentheses (no whitespace is allowed between the word or phrase and the parenthesized number). This number specifies the weight given to the word or phrase; it will be added to the count each time the word or phrase appears in the text. If a weight is not given, a weight of 1 is assumed.
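The scoring rule described above can be sketched in plain Perl. This is only an illustration of the documented behavior, not the module's implementation: score_text and the hash layout for terms are hypothetical, and the sketch assumes case-insensitive substring matching of literal words.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::Query::Simple): score $text against
# a list of terms, each a hash with keys word, weight, and sign ('+', '-', '').
sub score_text {
    my ($text, @terms) = @_;
    my $score = 0;
    for my $t (@terms) {
        # Count case-insensitive, non-overlapping literal occurrences.
        my $hits = () = $text =~ /\Q$t->{word}\E/gi;
        if ($t->{sign} eq '-') {
            return 0 if $hits;    # an excluded term is present: no match
            next;                 # excluded terms never add to the count
        }
        return 0 if $t->{sign} eq '+' && !$hits;    # a required term is missing
        $score += $hits * $t->{weight};
    }
    return $score;
}

# Terms corresponding to the query: +information(2) retrieval -spam
my @terms = (
    { word => 'information', weight => 2, sign => '+' },
    { word => 'retrieval',   weight => 1, sign => ''  },
    { word => 'spam',        weight => 1, sign => '-' },
);

my $score = score_text('Information retrieval ranks information sources.', @terms);
print "$score\n";    # 2 hits * weight 2 + 1 hit * weight 1 = 5
```

A text containing "spam", or one lacking "information", scores 0 in this sketch, which corresponds to a failed match.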
    use Text::Query::Simple;

    my $q = Text::Query::Simple->new('+hello world');
    die "bad query expression" if not defined $q;
    $count = $q->match;
    ...
    # requires single space between the two ta's
    $q->prepare('goodbye adios -"ta ta"', -litspace => 1);
    if ($q->match($line, -case => 1)) {    # doesn't match "Goodbye"
        ...
    }
    $q->prepare('\\bintegrate\\b', -regexp => 1);    # won't match "disintegrated"
    ...
    $q->prepare('information(2) retrieval');    # information has twice the weight of retrieval
The constructor, new, creates a new query object. If a QSTRING is given, it will be compiled to internal form.
OPTIONS are passed in a hash-like fashion, using key and value pairs. Possible options are:
-case - If true, do case-sensitive match.
-litspace - If true, match spaces (except between operators) in QSTRING literally. If false, match spaces as \s+.
-regexp - If true, treat patterns in QSTRING as regular expressions rather than literal text.
-whole - If true, match whole words only, not substrings of words.
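As a rough illustration of what these options mean, the sketch below turns one query term into a compiled pattern. term_to_regex is a hypothetical helper written for this example; it is not the module's internal code, and it ignores operators, weights, and phrases.

```perl
use strict;
use warnings;

# Hypothetical helper: build a pattern for a single term from the four
# options described above. An illustration only, not the module's code.
sub term_to_regex {
    my ($term, %opt) = @_;
    # Literal text is escaped unless the term is itself a regexp.
    my $pat = $opt{-regexp} ? $term : quotemeta $term;
    # Unless -litspace is set, any run of whitespace matches a space.
    $pat =~ s/(?:\\?\s)+/\\s+/g unless $opt{-litspace};
    # -whole anchors the term at word boundaries.
    $pat = "\\b$pat\\b" if $opt{-whole};
    # Matching is case-insensitive unless -case is set.
    return $opt{-case} ? qr/$pat/ : qr/$pat/i;
}

my $loose  = term_to_regex('ta ta');
my $strict = term_to_regex('ta ta', -litspace => 1);
print "loose matches\n"  if "Ta   ta" =~ $loose;
print "strict matches\n" if "Ta   ta" =~ $strict;
```

With three spaces between the words, only the pattern built without -litspace matches.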
The constructor will return undef if a QSTRING was supplied and had illegal syntax.
prepare compiles QSTRING to internal form and sets any options (same as in the constructor). prepare may be used to change the query expression and options for an existing query object. If OPTIONS are omitted, any options set by a previous call to the constructor or prepare remain in effect.
This method returns a reference to the query object if the syntax of the expression was legal, or undef if not.
If TARGET is a scalar, match returns the number of words in the string specified by TARGET that match the query object's query expression. If TARGET is not given, the match is made against $_.
If TARGET is an array, match returns a list of references to anonymous arrays consisting of each element followed by its match count. The list is sorted in descending order by match count. If the elements of TARGET were anonymous arrays, the match count is appended to each element. This allows arbitrary information (such as a filename) to be associated with each element.
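The shape of the list result can be sketched in plain Perl. rank_targets and its scoring callback are hypothetical names used only for this illustration; the sketch assumes that, for anonymous-array elements, the text to be matched is the first field, and that elements with a count of 0 are kept.

```perl
use strict;
use warnings;

# Hypothetical helper: score each target with a caller-supplied callback and
# return [fields..., count] array refs sorted by count, highest first.
sub rank_targets {
    my ($score_cb, @targets) = @_;
    my @rows;
    for my $t (@targets) {
        # Array-ref elements keep their extra fields; plain scalars become
        # a one-field row. Assumption: the text to score is field 0.
        my @fields = ref $t eq 'ARRAY' ? @$t : ($t);
        push @rows, [ @fields, $score_cb->($fields[0]) ];
    }
    my @sorted = sort { $b->[-1] <=> $a->[-1] } @rows;
    return @sorted;
}

# Stand-in scorer: count occurrences of "perl", ignoring case.
my $count_perl = sub { my $n = () = $_[0] =~ /perl/gi; $n };

my @ranked = rank_targets($count_perl,
    [ 'Perl and more perl', 'a.txt' ],
    [ 'no match here',      'b.txt' ],
    [ 'perl perl perl',     'c.txt' ],
);
print join(' ', @$_[1, 2]), "\n" for @ranked;
```

Each printed line shows a filename followed by its count, so the filename travels with the text through the ranking.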
If TARGET is a reference to an array, match returns a reference to a sorted list of matching items, with counts, for all elements.
matchscalar behaves just like match when TARGET is a scalar or is not given. It is slightly faster than match under these circumstances.
This module requires Perl 5.005 or higher due to the use of evaluated expressions in regexes.
Eric Bohlman (ebohlman@netcom.com)
The parse_tokens routine was adapted from the parse_line routine in Text::ParseWords.
Copyright (c) 1998 Eric Bohlman. All rights reserved. This program is free software; you can redistribute and/or modify it under the same terms as Perl itself.