XML::XQL - A Perl module for querying XML tree structures with XQL
use XML::XQL;
use XML::XQL::DOM;

$parser = new XML::DOM::Parser;
$doc = $parser->parsefile ("file.xml");

# Return all elements with tagName='title' under the root element 'book'
$query = new XML::XQL::Query (Expr => "book/title");
@result = $query->solve ($doc);
$query->dispose; # Avoid memory leaks - Remove circular references

# Or (to save some typing)
@result = XML::XQL::solve ("book/title", $doc);

# Or (to save even more typing)
@result = $doc->xql ("book/title");
The XML::XQL module implements the XQL (XML Query Language) proposal submitted to the XSL Working Group in September 1998. The spec can be found at: http://www.w3.org/TandS/QL/QL98/pp/xql.html Most of the contents related to the XQL syntax can also be found in the XML::XQL::Tutorial manpage that comes with this distribution. Note that XQL is not the same as XML-QL!
The current implementation only works with the XML::DOM module, but once the design is stable and the major bugs are flushed out, other extensions might follow, e.g. for XML::Grove.
XQL was designed to be extensible and this implementation tries to stick to that. Users can add their own functions, methods, comparison operators and data types. Plugging in a new XML tree structure (like XML::Grove) should be a piece of cake.
To use the XQL module, either
use XML::XQL;
or
use XML::XQL::Strict;
The Strict module only provides the core XQL functionality as found in the XQL spec. By default (i.e. by using XML::XQL) you get 'XQL+', which has some additional features.
See the section Additional Features in XQL+ for the differences.
This module is still in development. See the To-do list in XQL.pm for what still needs to be done. Any suggestions are welcome; the sooner these implementation issues are resolved, the sooner we can all use this module.
If you find a bug, you would do me a great favor by sending it to me in the form of a test case. See the file t/xql_template.t that comes with this distribution.
If you have written a cool comparison operator, function, method or XQL data type that you would like to share, send it to tjmather@tjmather.com and I will add it to this module.
@result = XML::XQL::solve ("doc//book", $doc);
This is provided as a shortcut for:
$query = new XML::XQL::Query (Expr => "doc//book");
@result = $query->solve ($doc);
$query->dispose;
Note that with the XML::XQL::DOM manpage, you can also write (see the XML::DOM::Node manpage for details):
@result = $doc->xql ("doc//book");
XML::XQL::setDocParser() sets the XML::DOM::Parser that is used by the document() method. By default it uses an XML::DOM::Parser that was created without any arguments, i.e.
$PARSER = new XML::DOM::Parser;
The second parameter must be a reference to a Perl function or an anonymous sub. E.g. '\&my_func' or 'sub { ... code ... }'
If ALLOWED_OUTSIDE (default is 0) is set to 1, the function or method may also be used outside subqueries in node queries. (See NodeQuery parameter in Query constructor)
If CONST (default is 0) is set to 1, the function is considered to be "constant". See Constant Function Invocations for details.
If QUERY_ARG (default is 0) is not -1, the argument with that index is considered to be a 'query parameter'. If the query parameter is a subquery that returns multiple values, the result list of the function invocation will contain one result value for each value of the subquery. E.g. 'length(book/author)' will return a list of Numbers, denoting the string lengths of all the author elements returned by 'book/author'.
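The one-result-per-value behavior of a query parameter can be pictured in plain Perl. This is only an illustrative sketch and not the module's own code: the @authors list is an assumed stand-in for the values returned by the subquery 'book/author', and apply_per_value is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: apply a function to every value produced by a
# subquery and collect one result per value.
sub apply_per_value {
    my ($func, @values) = @_;
    return map { $func->($_) } @values;
}

# Assumed stand-in for the text of the elements matched by 'book/author'.
my @authors = ("Herman Melville", "Jane Austen");

# One length per author, in the same order as the subquery results.
my @lengths = apply_per_value(sub { length $_[0] }, @authors);
print join(", ", @lengths), "\n";   # prints "15, 11"
```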
Note that only methods (not functions) may appear after a Bang "!" operator. This is checked when parsing the XQL query string.
See also: defineMethod
Function values are always converted to Perl strings with xql_toString before they are passed to the Perl function implementation. The function return value is cast to an object of type RETURN_TYPE, or to the empty list [] if the result is undef. It uses expandType to expand XQL primitive type names. If RETURN_TYPE is "*", it returns the function result as is, unless the function result is undef, in which case it returns [].
The second parameter must be a reference to a Perl function or an anonymous sub. E.g. '\&my_func' or 'sub { ... code ... }'
If ALLOWED_OUTSIDE (default is 0) is set to 1, the function or method may also be used outside subqueries in node queries. (See NodeQuery parameter in Query constructor)
Note that only methods (not functions) may appear after a Bang "!" operator. This is checked when parsing the XQL query string.
See also: defineFunction
E.g. define the operators $my_op$ and $my_op2$:
defineComparisonOperators ('my_op' => \&my_op, 'my_op2' => sub { ... insert code here ... });
The value() call for Elements with the specified TAG_NAME uses the specified function. The function will receive two parameters: the first is the Element node itself and the second is the TAG_NAME of the Element node.
FUNCREF should be a reference to a Perl function, e.g. \&my_sub, or an anonymous sub.
E.g. to define that all Elements with tag name 'date-of-birth' should return XML::XQL::Date objects:
defineElementValueConvertor ('date-of-birth', sub {
    my $elem = shift;
    # Always pass in the node as the second parameter. This is
    # the reference node for the object, which is used when
    # sorting values in document order.
    new XML::XQL::Date ($elem->xql_text, $elem);
});
These convertors can only be specified at a global level, not on a per query basis. To undefine a convertor, simply pass a FUNCREF of undef.
The value() call for Attributes with the specified ATTR_NAME and a parent Element with the specified ELEM_TAG_NAME uses the specified function. An ELEM_TAG_NAME of "*" will match regardless of the tag name of the parent Element. The function will receive three parameters: the first is the Attribute node itself, the second is the ATTR_NAME and the third is the tag name of the parent Element (even if ELEM_TAG_NAME was "*").
FUNCREF should be a reference to a Perl function, e.g. \&my_sub, or an anonymous sub.
These convertors can only be specified at a global level, not on a per query basis. To undefine a convertor, simply pass a FUNCREF of undef.
Overriding the ALIAS for "date" also affects the object type returned by the date() function.
When printing the error message, the subexpression that caused the error will be enclosed by the delimiters, i.e. underlined on Unix.
For certain subexpressions the significant keyword, e.g. "$and$", is enclosed in the bold delimiters BOLD_ON (default: `tput bold` on Unix, "" elsewhere) and BOLD_OFF (default: (`tput rmul` . `tput smul`) on Unix, "" elsewhere, see $BoldOff in XML::XQL::XQL.pm for details.)
# at a global level - shared by all queries (that don't (re)define 'q')
XML::XQL::defineTokenQ ('k');
XML::XQL::defineTokenQQ (undef);

# at a query level - only defined for this query
$query = new XML::XQL::Query (Expr => "book/title", q => 'k', qq => undef);
From now on k// works like q// did and qq// doesn't work at all anymore.
$queryExpr = "book/title          # this comment is inside the query string
              [. = 'Moby Dick']"; # this comment is outside
E.g. $AND$, $And$, $aNd$, and, And, aNd are all valid replacements for $and$.
Note that XQL+ comparison operators ($match$, $no_match$, $isa$, $can$) still require dollar delimiters and are case-sensitive.
When casting the values to be matched, both are converted to Text.
When casting the values to be matched, both are converted to Text.
value() function returns an XML::XQL::Date object. (Note that the value() function can be overridden to return a specific object type for certain elements and attributes.) It uses expandType to expand XQL primitive type names.
value() function returns an object that implements the (Perl) swim() method. (Note that the value() function can be overridden to return a specific object type for certain elements and attributes.)
once() will cache the QUERY result for the rest of the query.
Note that "constant" function invocations are always cached. See also Constant Function Invocations.
For most Node types, it converts the value() to a string (with xql_toString) to match the string, and uses xql_setValue to set the new value in case it matched. For XQL primitives (Boolean, Number, Text) and other data types (e.g. Date) it uses xql_toString to match the String and xql_setValue to set the result. Beware that performing a substitution on a primitive that was found in the original XQL query expression changes the value of that constant.
If MODE is 0 (default), it treats Element nodes differently by matching and replacing text blocks occurring in the Element node. A text block is defined as the concatenation of the raw text of subsequent Text, CDATASection and EntityReference nodes. In this mode it skips embedded Element nodes. If a text block matches, it is replaced by a single Text node, regardless of the original node type(s).
If MODE is 1, it treats Element nodes like the other nodes, i.e. it converts the value() to a string etc. Note that the default implementation of value() calls text(), which normalizes whitespace and includes embedded Element descendants (recursively.) This is probably not what you want to use in most cases, but since I'm not a professional psychic... :-)
??? add more specifics
E.g. 'eval("2 + 5", "Number")' returns a Number object with the value 7, and 'eval("$ENV{USER}")' returns a Text object with the user name.
Consider using once() to cache the return value when the invocation will return the same result for each invocation within a query.
??? add more specifics
new() function) is considered to be a 'query parameter'. See defineFunction for a definition of query parameter. It uses expandType to expand XQL primitive type names.
The document() function creates a new XML::DOM::Document for each result of QUERY (QUERY may be a simple string expression, like "/usr/enno/file.xml". See t/xql_document.t or below for an example with a more complex QUERY.)
document() may be abbreviated to doc().
document() uses an XML::DOM::Parser underneath, which can be set with XML::XQL::setDocParser(). By default it uses a parser that was created without any arguments, i.e.
$PARSER = new XML::DOM::Parser;
Let's try a more complex example, assuming $doc contains:
<doc>
 <file name="file1.xml"/>
 <file name="file2.xml"/>
</doc>
Then the following query will return two XML::DOM::Document objects, one for file1.xml and one for file2.xml:
@result = XML::XQL::solve ("document(doc/file/@name)", $doc);
The resulting documents can be used as input for following queries, e.g.
@result = XML::XQL::solve ("document(doc/file/@name)/root/bla", $doc);
will return all /root/bla elements from the documents returned by document().
DOM_nodeType() returns 4 and 5 respectively, whereas nodeType() returns 3, because they are considered text nodes.
The function result is cast to the appropriate XQL primitive type (Number, Text or Boolean), or to an empty list if the result was undef.
The following functions were found in the XPath specification:
substring-before("1999/04/01","/") returns 1999.
substring-after("1999/04/01","/") returns 04/01,
and
substring-after("1999/04/01","19") returns 99/04/01.
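The behavior of these two functions can be approximated in plain Perl with index and substr. This is a minimal sketch and not the module's implementation: the names substring_before and substring_after are hypothetical, and returning an empty string when the separator is absent follows the XPath definition rather than anything verified against this module.

```perl
use strict;
use warnings;

# Text before the first occurrence of $sep, or "" if $sep does not occur.
sub substring_before {
    my ($str, $sep) = @_;
    my $pos = index($str, $sep);
    return $pos < 0 ? "" : substr($str, 0, $pos);
}

# Text after the first occurrence of $sep, or "" if $sep does not occur.
sub substring_after {
    my ($str, $sep) = @_;
    my $pos = index($str, $sep);
    return $pos < 0 ? "" : substr($str, $pos + length $sep);
}

print substring_before("1999/04/01", "/"), "\n";   # prints "1999"
print substring_after("1999/04/01", "/"), "\n";    # prints "04/01"
print substring_after("1999/04/01", "19"), "\n";   # prints "99/04/01"
```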
substring("12345",2,3) returns "234".
If the third argument is not specified, it returns the substring starting at the position specified in the second argument and continuing to the end of the string. For example,
substring("12345",2) returns "2345".
More precisely, each character in the string is considered to have a numeric position: the position of the first character is 1, the position of the second character is 2 and so on.
NOTE: This differs from the substr method, which treats the position of the first character as 0.
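The difference between the two numbering schemes can be sketched in plain Perl. The helper xpath_substring below is hypothetical and only illustrates the 1-based convention on top of Perl's 0-based substr; it assumes integer arguments with a start position of at least 1 and performs no rounding.

```perl
use strict;
use warnings;

# Hypothetical helper: 1-based substring built on Perl's 0-based substr.
# Assumes $start is an integer >= 1; no rounding is performed.
sub xpath_substring {
    my ($str, $start, $len) = @_;
    return defined $len
        ? substr($str, $start - 1, $len)
        : substr($str, $start - 1);
}

print xpath_substring("12345", 2, 3), "\n";   # prints "234"
print xpath_substring("12345", 2), "\n";      # prints "2345"
print substr("12345", 2, 3), "\n";            # prints "345" (0-based)
```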
The XPath spec says this about rounding, but that is not true in this implementation: The returned substring contains those characters for which the position of the character is greater than or equal to the rounded value of the second argument and, if the third argument is specified, less than the sum of the rounded value of the second argument and the rounded value of the third argument; the comparisons and addition used for the above follow the standard IEEE 754 rules; rounding is done as if by a call to the round function.
Note that the generated XQL wrapper for the Perl built-in substr does not allow the argument to be omitted.
translate("bar","abc","ABC") returns the string BAr.
If there is a character in the second argument string with no character at a corresponding position in the third argument string (because the second argument string is longer than the third argument string), then occurrences of that character in the first argument string are removed. For example,
translate("--aaa--","abc-","ABC") returns "AAA".
If a character occurs more than once in the second argument string, then the first occurrence determines the replacement character. If the third argument string is longer than the second argument string, then excess characters are ignored.
NOTE: The translate function is not a sufficient solution for case conversion in all languages. A future version may provide additional functions for case conversion.
This function was implemented using tr///d.
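Because tr/// needs its character lists at compile time, the rules above are easier to show with an explicit lookup table. The following is an illustrative sketch only, not the module's code; xpath_translate is a hypothetical name, and it mirrors the behavior described above (first occurrence wins, unmatched characters are removed, excess replacement characters are ignored).

```perl
use strict;
use warnings;

# Hypothetical helper that mimics translate() with a lookup table.
sub xpath_translate {
    my ($str, $from, $to) = @_;
    my @from = split //, $from;
    my @to   = split //, $to;

    my %map;
    for my $i (0 .. $#from) {
        # The first occurrence of a character decides its replacement.
        next if exists $map{ $from[$i] };
        # undef marks a character without a counterpart: it is removed.
        $map{ $from[$i] } = $i <= $#to ? $to[$i] : undef;
    }

    my $out = '';
    for my $ch (split //, $str) {
        if (!exists $map{$ch}) {
            $out .= $ch;            # not listed: keep as is
        }
        elsif (defined $map{$ch}) {
            $out .= $map{$ch};      # listed with a replacement
        }
        # listed without a replacement: dropped
    }
    return $out;
}

print xpath_translate("bar", "abc", "ABC"), "\n";        # prints "BAr"
print xpath_translate("--aaa--", "abc-", "ABC"), "\n";   # prints "AAA"
```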
true() and false().
date() function.
value() (i.e. the XQL value() function is used, which returns a Text value by default, but may return any data type if the user so chooses.) The resulting values are then cast to the type of the object with the highest xql_primType() value. They are as follows: Node (0), Text (1), Number (2), Boolean (3), Date (4), other data types (4 by default, but this may be overridden by the user.)
E.g. if one value is a Text value and the other is a Number, the Text value is cast to a Number and the resulting low-level (Perl) comparison is (for $eq$):
$number->xql_toString == $text->xql_toString
If both were Text values, it would have been
$text1->xql_toString eq $text2->xql_toString
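The casting rule can be pictured with a small stand-alone sketch. It is not the module's implementation: values are modelled as plain hashes with assumed 'type' and 'value' keys, xql_eq is a hypothetical name, and only the Text and Number cases described above are handled.

```perl
use strict;
use warnings;

# Ranks as listed above; the operand with the higher rank decides the cast.
my %PRIM_TYPE = (Node => 0, Text => 1, Number => 2, Boolean => 3, Date => 4);

# Hypothetical $eq$ for Text and Number values only.
sub xql_eq {
    my ($x, $y) = @_;
    my ($highest) = sort { $PRIM_TYPE{$b} <=> $PRIM_TYPE{$a} }
                    ($x->{type}, $y->{type});
    return $highest eq 'Number'
        ? $x->{value} == $y->{value}    # numeric comparison
        : $x->{value} eq $y->{value};   # string comparison
}

my $text   = { type => 'Text',   value => '7.0' };
my $number = { type => 'Number', value => 7 };
my $text2  = { type => 'Text',   value => '7' };

print xql_eq($text, $number) ? "equal" : "different", "\n";  # prints "equal"
print xql_eq($text, $text2)  ? "equal" : "different", "\n";  # prints "different"
```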
Note that the XQL spec is vague and even conflicting where it concerns type casting. This implementation resulted after talking to Joe Lapp, one of the spec writers.
I will add more stuff here to explain it all, but for now, look at the code for the primitive XQL types or the Date class (the XML::XQL::Date manpage in Date.pm.)
Non-node values that have no associated reference node, always end up at the end of the result list in the order that they were added. The XQL spec states that the reference node for an XML Attribute is the Element to which it belongs, and that the order of values with the same reference node is undefined. This means that the order of an Element and its attributes would be undefined. But since the XML::DOM module keeps track of the order of the attributes, the XQL engine does the same, and therefore, the attributes of an Element are sorted and appear after their parent Element in a sorted result list.
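The ordering rule for values without a reference node can be sketched as follows. This is an assumption-laden illustration, not the engine's code: each value is a hash with a hypothetical 'doc_pos' key standing in for the document position of its reference node, and values that lack the key have no reference node.

```perl
use strict;
use warnings;

# Assumed sample data; 'doc_pos' is a hypothetical document position.
my @values = (
    { label => 'text-b',  doc_pos => 5 },
    { label => 'const-1' },               # no reference node
    { label => 'elem-a',  doc_pos => 2 },
    { label => 'const-2' },               # no reference node
);

# Values with a reference node are sorted in document order ...
my @with = sort { $a->{doc_pos} <=> $b->{doc_pos} }
           grep { defined $_->{doc_pos} } @values;

# ... and the others follow in the order in which they were added.
my @without = grep { !defined $_->{doc_pos} } @values;

my @sorted = (@with, @without);
print join(", ", map { $_->{label} } @sorted), "\n";
# prints "elem-a, text-b, const-1, const-2"
```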
date("12-03-1998")
true()
sin(0.3)
length("abc")
date(substr("12-03-1998 is the date", 0, 10))
are constant, but not:
length(book[2])
Results of constant function invocations are cached and calculated only once for each query. See also the CONST parameter in defineFunction.
It is not necessary to wrap constant function invocations in a once() call.
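The effect of caching a constant invocation can be illustrated with a generic memoization sketch. It makes no claim about how the module stores its cache: %cache stands for an assumed per-query cache, and cached_call and slow_length are hypothetical names.

```perl
use strict;
use warnings;

my %cache;       # assumed to live for the duration of one query
my $calls = 0;   # counts how often the real function runs

# Stand-in for a constant function such as length("abc").
sub slow_length {
    $calls++;
    return length $_[0];
}

# Evaluate $func only the first time a name/argument combination is seen.
sub cached_call {
    my ($name, $func, @args) = @_;
    my $key = join "\0", $name, @args;
    $cache{$key} = $func->(@args) unless exists $cache{$key};
    return $cache{$key};
}

my $first  = cached_call('length', \&slow_length, 'abc');
my $second = cached_call('length', \&slow_length, 'abc');
print "results: $first, $second; real calls: $calls\n";
# prints "results: 3, 3; real calls: 1"
```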
Constant XQL functions are: date, true, false and a lot of the XQL+ wrappers for Perl builtin functions. Function wrappers for certain builtins are not made constant on purpose to force the invocation to be evaluated every time, e.g. 'mkdir("/user/enno/my_dir", "0644")' (although constant in appearance) may return different results for multiple invocations. See %PerlFunc in Plus.pm for details.
The count() function has no parameters in the XQL spec. In this implementation it will return the number of QUERY results when passed a QUERY parameter.
text() method adds the expanded text() value of sub-Elements. When RECURSE is set to 0 (default is 1), it will not include sub-elements. This is useful e.g. when using the $match$ operator in a recursive context (using the // operator), so it won't return parent Elements when one of the children matches.
the XML::XQL::Query manpage, the XML::XQL::DOM manpage, the XML::XQL::Date manpage
The Japanese version of this document can be found on-line at http://member.nifty.ne.jp/hippo2000/perltips/xml/xql.htm
The XML::XQL::Tutorial manual page. The Japanese version can be found at http://member.nifty.ne.jp/hippo2000/perltips/xml/xql/tutorial.htm
The XQL spec at http://www.w3.org/TandS/QL/QL98/pp/xql.html
The Design of XQL at http://www.texcel.no/whitepapers/xql-design.html
The DOM Level 1 specification at http://www.w3.org/TR/REC-DOM-Level-1
The XML spec (Extensible Markup Language 1.0) at http://www.w3.org/TR/REC-xml
The XML::Parser and XML::Parser::Expat manual pages.
Enno Derksen is the original author.
Please send bugs, comments and suggestions to T.J. Mather <tjmather@tjmather.com>