Text::Query::Advanced - Match text against Boolean expression
    use Text::Query::Advanced;

    # Constructor
    $query = Text::Query::Advanced->new([QSTRING] [OPTIONS]);

    # Methods
    $query->prepare(QSTRING [OPTIONS]);
    $query->match([TARGET]);
    $query->matchscalar([TARGET]);

    # Methods that can be overridden to produce custom query trees, etc.
    $query->build_final_expression(Q1);
    $query->build_expression(Q1,Q2);
    $query->build_expression_finish(Q1);
    $query->build_conj(Q1,Q2,F);
    $query->build_near(Q1,Q2);
    $query->build_concat(Q1,Q2);
    $query->build_negation(Q1);
    $query->build_literal(Q1);
This module provides an object that matches a string or list of strings against a Boolean query expression similar to an AltaVista ``advanced query''. Elements of the query expression may be regular expressions or literal text.
Query expressions are compiled into an internal form (currently, a regular expression making use of most of the tricks listed in Recipe 6.17 of _The Perl Cookbook_) when a new object is created or the prepare method is called; they are not recompiled on each match.
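As a rough illustration of what such a compiled form can look like, the sketch below hand-writes single regular expressions for AND, OR, and NOT using lookahead assertions, which is the general approach of that Cookbook recipe. The patterns and variable names are assumptions made for illustration; the regexp the module actually generates may differ.

```perl
use strict;
use warnings;

# Hand-written patterns, compiled once with qr// and reused for every match.
# Illustrative only; not the module's actual output.
my $and = qr/^(?=.*hello)(?=.*world)/si;   # "hello and world"
my $or  = qr/hello|world/i;                # "hello or world"
my $not = qr/^(?!.*hello)/si;              # "not hello"

my @lines = ('Hello, world', 'hello there', 'goodbye');

my @and_hits = grep { $_ =~ $and } @lines;
my @or_hits  = grep { $_ =~ $or  } @lines;
my @not_hits = grep { $_ =~ $not } @lines;

print "and: @and_hits\n";
print "or:  @or_hits\n";
print "not: @not_hits\n";
```

Because each pattern is compiled once, the per-line cost is a single regexp match regardless of how many terms the query contains.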
The class provided by this module may be subclassed to produce query processors that match against input other than literal strings, e.g. indices.
Query expressions consist of literal strings (or regexps) joined by the following operators, in order of precedence from lowest to highest:
Operator names are not case-sensitive. Note that if you want to use a | in a regexp, you need to escape it with a backslash to keep it from being seen as a query operator. Sub-expressions may be quoted in single or double quotes to match ``and,'' ``or,'' or ``not'' literally and may be grouped in parentheses, ( and ), to alter the precedence of evaluation.
A parenthesized sub-expression may also be concatenated with other sub-expressions to match sequences: (Perl or Python) interpreter would match either ``Perl interpreter'' or ``Python interpreter''. Concatenation has a precedence higher than NOT but lower than AND. Juxtaposition of simple words has the highest precedence of all.
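The concatenation example can be pictured as an alternation group followed by the next word. The pattern below is a hypothetical hand-written equivalent, assuming the default options (case-insensitive matching, spaces matched as \s+); it is not taken from the module.

```perl
use strict;
use warnings;

# Hypothetical hand-written equivalent of "(Perl or Python) interpreter":
# an alternation group, then whitespace, then the following word.
my $concat = qr/(?:Perl|Python)\s+interpreter/i;

for my $text ('the Perl interpreter', 'a Python  interpreter', 'Ruby interpreter') {
    printf "%-24s %s\n", $text, ($text =~ $concat ? 'matches' : 'no match');
}
```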
    use Text::Query::Advanced;
    my $q = Text::Query::Advanced->new('hello and world');
    die "bad query expression" if not defined $q;
    print if $q->match;
    ...
    $q->prepare('goodbye or adios or ta ta', -litspace=>1, -case=>1);
    # requires single space between the two ta's
    if ($q->match($line)) {
        # doesn't match "Goodbye"
        ...
    }
    $q->prepare('"and" or "or"');
    # quoting operators for literal match
    ...
    $q->prepare('\\bintegrate\\b', -regexp=>1);
    # won't match "disintegrated"
new([QSTRING] [OPTIONS])
This is the constructor. If a QSTRING is given, it will be compiled to internal form.
OPTIONS are passed in a hash-like fashion, using key and value pairs. Possible options are:
-case - If true, do case-sensitive match.
-litspace - If true, match spaces (except between operators) in QSTRING literally. If false, match spaces as \s+.
-near - Sets the number of words that can occur between two expressions and still satisfy the NEAR operator. The default is 10.
-regexp - If true, treat patterns in QSTRING as regular expressions rather than literal text.
-whole - If true, match whole words only, not substrings of words.
The constructor will return undef if a QSTRING was supplied and had illegal syntax.
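To make the effect of these options concrete, here is a minimal sketch of how a single query term could be turned into a regexp under each setting. The helper term_to_regexp is hypothetical and is not part of the module; it only mirrors the option descriptions above (the -near option is not covered).

```perl
use strict;
use warnings;

# Hypothetical helper: turn one query word or phrase into a regexp,
# following the option descriptions above. Not part of the module.
sub term_to_regexp {
    my ($term, %opt) = @_;

    # -regexp: use the term as a pattern; otherwise escape metacharacters.
    my $pat = $opt{-regexp} ? $term : quotemeta($term);

    # -litspace false: a space in the term matches any run of whitespace.
    # quotemeta turns each space into "\ ", so the backslash is optional here.
    $pat =~ s/(?:\\?\s)+/\\s+/g unless $opt{-litspace};

    # -whole: anchor the term on word boundaries.
    $pat = "\\b(?:$pat)\\b" if $opt{-whole};

    # -case false: match case-insensitively.
    return $opt{-case} ? qr/$pat/ : qr/$pat/i;
}

my $loose  = term_to_regexp('ta ta');
my $strict = term_to_regexp('ta ta', -litspace => 1, -case => 1);
my $whole  = term_to_regexp('integrate', -whole => 1);

print "loose matches\n"      if "Ta   ta"       =~ $loose;
print "strict rejects\n"     if "Ta   ta"       !~ $strict;
print "whole-word rejects\n" if "disintegrated" !~ $whole;
```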
prepare(QSTRING [OPTIONS])
Compiles the query expression in QSTRING to internal form and sets any options (same as in the constructor). prepare may be used to change the query expression and options for an existing query object. If OPTIONS are omitted, any options set by a previous call to the constructor or prepare remain in effect.
This method returns a reference to the query object if the syntax of the expression was legal, or undef if not.
match([TARGET])
If TARGET is a scalar, match returns a true value if the string specified by TARGET matches the query object's query expression. If TARGET is not given, the match is made against $_.
If TARGET is an array, match returns a (possibly empty) list of all matching elements. If the elements of the array are references to sub-arrays, the match is done against the first element of each sub-array. This allows arbitrary information (e.g. filenames) to be associated with each string to match.
If TARGET is a reference to an array, match returns a reference to a (possibly empty) list of all matching elements.
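The list behaviour described above can be sketched without the module. In the example below, $compiled stands in for a compiled query and match_list is a hypothetical helper; both are assumptions made for illustration rather than the module's own code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a compiled query ("hello and world").
my $compiled = qr/^(?=.*hello)(?=.*world)/si;

# Keep each element whose string (or whose first sub-array element) matches.
sub match_list {
    my ($re, @targets) = @_;
    return grep {
        my $text = ref($_) eq 'ARRAY' ? $_->[0] : $_;
        $text =~ $re;
    } @targets;
}

# Each record pairs the text to match with associated data (a file name).
my @records = (
    ['hello, world',    'a.txt'],
    ['goodbye, world',  'b.txt'],
    ['Hello big world', 'c.txt'],
);

my @hits = match_list($compiled, @records);
print "$_->[1]\n" for @hits;   # file names of the matching records
```

Returning the whole sub-array, not just the matched string, is what lets the caller recover the associated file name.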
matchscalar([TARGET])
Behaves just like match when TARGET is a scalar or is not given. Slightly faster than match under these circumstances.
The following methods are used to generate regexps based on query elements. They may be overridden to generate other forms of matching code, such as trees to be used by a custom version of match that evaluates index lists or the like.
All these methods return a scalar corresponding to the code that performs the specified operation. As supplied, they return regexp strings, but overridden methods could return objects, array references, etc.
Parameters Q1 and Q2 are the same type of scalar as the return values.
build_final_expression(Q1)
Performs any final processing of a top-level expression; the supplied method uses qr// to compile the regexp. The return value will be stored in the object's matchexp field. It is NOT necessarily of a type that can be passed to the other code-generation methods.
build_expression(Q1,Q2)
Generate code needed to match Q1 OR Q2.
build_expression_finish(Q1)
build_conj(Q1,Q2,F)
Generate code needed to match Q1 AND Q2. F will be true if this is the first time this method is called in a sequence of several conjunctions (the supplied method uses this to factor a common ^ out of the generated sub-expressions, which greatly speeds up matching).
build_near(Q1,Q2)
Generate code needed to match Q1 NEAR Q2.
build_concat(Q1,Q2)
Generate code needed to match Q1 immediately followed by Q2.
build_negation(Q1)
Generate code needed to match NOT Q1.
build_literal(Q1)
Generate code needed to match Q1 as a literal.
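As an illustration of code-generation methods that return something other than regexp strings, the sketch below builds a small tree of array references and evaluates it against a word index. My::TreeBuilder and its evaluate method are hypothetical: the class reuses four of the method names above but stands alone, and wiring it into the module's parser as a subclass is not shown.

```perl
use strict;
use warnings;

# Hypothetical tree-building class. It reuses method names listed above
# but is standalone; it does not inherit from the module.
package My::TreeBuilder;

sub new              { return bless {}, shift }
sub build_literal    { my ($self, $q1)      = @_; return ['lit', lc($q1)] }
sub build_negation   { my ($self, $q1)      = @_; return ['not', $q1] }
sub build_conj       { my ($self, $q1, $q2) = @_; return ['and', $q1, $q2] }
sub build_expression { my ($self, $q1, $q2) = @_; return ['or',  $q1, $q2] }

# Evaluate a tree against a set of words (e.g. the terms indexed for a file).
sub evaluate {
    my ($self, $node, $words) = @_;
    my ($op, $l, $r) = @$node;
    return exists $words->{$l}                                        if $op eq 'lit';
    return !$self->evaluate($l, $words)                               if $op eq 'not';
    return $self->evaluate($l, $words) && $self->evaluate($r, $words) if $op eq 'and';
    return $self->evaluate($l, $words) || $self->evaluate($r, $words) if $op eq 'or';
    die "unknown node type: $op";
}

package main;

my $builder = My::TreeBuilder->new;

# Tree for: perl and not python
my $tree = $builder->build_conj(
    $builder->build_literal('Perl'),
    $builder->build_negation($builder->build_literal('Python')),
    1,   # F: first conjunction in the sequence (unused by this sketch)
);

my %index = map { $_ => 1 } qw(perl regexp cookbook);
print $builder->evaluate($tree, \%index) ? "match\n" : "no match\n";
```

Here Q1 and Q2 are array references, the same type the methods return, which is the contract the text above describes for overridden methods.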
Eric Bohlman (ebohlman@netcom.com)
The parse_tokens routine was adapted from the parse_line routine in Text::ParseWords.
Copyright (c) 1998-1999 Eric Bohlman. All rights reserved. This program is free software; you can redistribute and/or modify it under the same terms as Perl itself.