Text::BibTeX::Entry - read and parse BibTeX files
    use Text::BibTeX;             # do not use Text::BibTeX::Entry alone!

    # ...assuming that $bibfile and $newbib are both objects of class
    # Text::BibTeX::File, opened for reading and writing (respectively):

    # Entry creation/parsing methods:
    $entry = new Text::BibTeX::Entry;
    $entry->read ($bibfile);
    $entry->parse ($filename, $filehandle);
    $entry->parse_s ($entry_text);

    # or:
    $entry = new Text::BibTeX::Entry $bibfile;
    $entry = new Text::BibTeX::Entry $filename, $filehandle;
    $entry = new Text::BibTeX::Entry $entry_text;

    # Entry query methods
    warn "error in input" unless $entry->parse_ok;
    $metatype = $entry->metatype;
    $type = $entry->type;

    # if metatype is BTE_REGULAR or BTE_MACRODEF:
    $key = $entry->key;           # only for BTE_REGULAR metatype
    $num_fields = $entry->num_fields;
    @fieldlist = $entry->fieldlist;
    $has_title = $entry->exists ('title');
    $title = $entry->get ('title');
    # or:
    ($val1,$val2,...$valn) = $entry->get ($field1, $field2, ..., $fieldn);

    # if metatype is BTE_COMMENT or BTE_PREAMBLE:
    $value = $entry->value;

    # Author name methods
    @authors = $entry->split ('author');
    ($first_author) = $entry->names ('author');

    # Entry modification methods
    $entry->set_type ($new_type);
    $entry->set_key ($new_key);
    $entry->set ('title', $new_title);
    # or:
    $entry->set ($field1, $val1, $field2, $val2, ..., $fieldn, $valn);
    $entry->delete (@fields);
    $entry->set_fieldlist (\@fieldlist);

    # Entry output methods
    $entry->write ($newbib);
    $entry->print ($filehandle);
    $entry_text = $entry->print_s;

    # Miscellaneous methods
    $entry->warn ($entry_warning);
    # or:
    $entry->warn ($field_warning, $field);
Text::BibTeX::Entry does all the real work of reading and parsing BibTeX files. (Well, actually it just provides an object-oriented Perl front-end to a C library that does all that. But that's not important right now.)
BibTeX entries can be read either from Text::BibTeX::File objects (using the read method), or directly from a filehandle (using the parse method), or from a string (using parse_s). The first is preferable, since you don't have to worry about supplying the filename, and because of the extra functionality provided by the Text::BibTeX::File class.
Currently, this means that you may specify the database structure to which entries are expected to conform via the File class. This lets you ensure that entries follow the rules for required fields and mutually constrained fields for a particular type of database, and also gives you access to all the methods of the structured entry class for this database structure. See the Text::BibTeX::Structure manpage for details on database structures.
Once you have the entry, you can query it or change it in a variety of ways. The query methods are parse_ok, type, key, num_fields, fieldlist, exists, and get. Methods for changing the entry are set_type, set_key, set_fieldlist, delete, and set.
Finally, you can output BibTeX entries, again either to an open Text::BibTeX::File object, a filehandle or a string. (A filehandle or File object must, of course, have been opened in write mode.) Output to a File object is done with the write method, to a filehandle via print, and to a string with print_s. Using the File class is recommended for future extensibility, although it currently doesn't offer anything extra.
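The read, modify, and output cycle described above can be sketched end to end with the string-based methods alone. This is a minimal sketch, assuming Text::BibTeX is installed; the entry text and the replacement title are made-up illustration data, and only methods named in this manpage are used.

```perl
use strict;
use warnings;
use Text::BibTeX;   # do not use Text::BibTeX::Entry alone

# Illustrative input; any single BibTeX entry would do.
my $text = '@article{foo, title = {Old Title}, year = 1997}';

# Read: parse one entry from a string and check that the parse succeeded.
my $entry = Text::BibTeX::Entry->new;
$entry->parse_s($text);
die "error in input\n" unless $entry->parse_ok;

# Modify: change one field and drop another.
$entry->set('title', 'New Title');
$entry->delete('year');

# Output: serialise the entry back to a string.
my $output = $entry->print_s;
print $output;
```

The same three steps apply when reading from and writing to Text::BibTeX::File objects; only read and write replace parse_s and print_s.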
new ([SOURCE]) creates a new Text::BibTeX::Entry object. If the SOURCE parameter is supplied, it must be one of the following: a Text::BibTeX::File (or descendant class) object, a filename/filehandle pair, or a string. It calls read to read from a Text::BibTeX::File object, parse to read from a filehandle, and parse_s to read from a string.
A filehandle can be specified as a GLOB reference, or as an IO::Handle (or descendants) object, or as a FileHandle (or descendants) object. (But there's really no point in using FileHandle objects, since Text::BibTeX requires Perl 5.004, which always includes the IO modules.) You cannot pass in the name of a filehandle as a string, though, because Text::BibTeX::Entry conforms to the use strict pragma (which disallows such symbolic references).
The corresponding filename should be supplied in order to allow for accurate error messages; if you simply don't have the filename, you can pass undef and you'll get error messages without a filename. (It's probably better to rearrange your code so that the filename is available, though.)
Thus, the following are equivalent ways to read from a file named by $filename (error handling ignored):
    # good ol' fashioned filehandle and GLOB ref
    open (BIBFILE, $filename);
    $entry = new Text::BibTeX::Entry ($filename, \*BIBFILE);

    # newfangled IO::File thingy
    $file = new IO::File $filename;
    $entry = new Text::BibTeX::Entry ($filename, $file);
But using a Text::BibTeX::File object is simpler and preferred:
    $file = new Text::BibTeX::File $filename;
    $entry = new Text::BibTeX::Entry $file;
Returns the new object, unless SOURCE is supplied and reading/parsing the entry fails (e.g., due to end of file) -- then it returns false.
read takes a Text::BibTeX::File object (or descendant). The next entry will be read from the file associated with that object.
Returns the same as parse (or parse_s): false if no entry found (e.g., at end-of-file), true otherwise. To see if the parse itself failed (due to errors in the input), call the parse_ok method.
[...] Text::BibTeX::Entry object you've been tossing around. But you don't need to know any of that -- I just figured if you've read this far, you might want to know something about the inner workings of this module.)
The success of the parse is stored internally so that you can later query it with the parse_ok method. Even in the presence of syntax errors, you'll usually get something resembling your input, but it's usually not wise to try to do anything with it. Just call parse_ok, and if it returns false then silently skip to the next entry. (The error messages printed out by the parser should be quite adequate for the user to figure out what's wrong. And no, there's currently no way for you to capture or redirect those error messages -- they're always printed to stderr by the underlying C code. That should change in future releases.)
If no '@' signs are seen on the input before reaching end-of-file, then we've exhausted all the entries in the file, and parse returns a false value. Otherwise, it returns a true value -- even if there were syntax errors. Hence, it's important to check parse_ok.
The FILENAME parameter is only used for generating error messages, but anybody using your program will certainly appreciate your setting it correctly!
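The advice above (stop when reading returns false, skip any entry for which parse_ok is false) amounts to the loop below. This is a minimal sketch, assuming Text::BibTeX is installed; the temporary file and the keys ok1 and ok2 are invented so that the example is self-contained.

```perl
use strict;
use warnings;
use Text::BibTeX;
use File::Temp qw(tempfile);

# Write a tiny illustrative database to a temporary file.
my ($fh, $filename) = tempfile(SUFFIX => '.bib');
print $fh "\@article{ok1, title = {First}}\n";
print $fh "\@article{ok2, title = {Second}}\n";
close $fh;

my $bibfile = Text::BibTeX::File->new($filename)
    or die "$filename: $!\n";

my @keys;
# The constructor returns false once no further entry is found.
while (my $entry = Text::BibTeX::Entry->new($bibfile)) {
    # A true return only means an entry was found; check the parse itself.
    next unless $entry->parse_ok;
    push @keys, $entry->key;
}
$bibfile->close;

print "$_\n" for @keys;
```

Reading through a Text::BibTeX::File object, as here, also means the filename reaches the parser's error messages without being passed separately.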
[...] parse_s with the same string will give you the same results each time. Thus, there's no point in putting multiple entries in one string.
[...] stderr for the user's edification, but no notice is available to the calling code.)
[...] @string entries), and regular (all other entry types). Text::BibTeX exports four constants for these metatypes: BTE_COMMENT, BTE_PREAMBLE, BTE_MACRODEF, and BTE_REGULAR.)
[...] undef for entries that don't have a key, such as macro definition (@string) entries.)
[...] scalar in front of a call to fieldlist. See below for the consequences of calling fieldlist in a scalar context.)
    $author = $entry->get ('author');
    ($author, $editor) = $entry->get ('author', 'editor');
If a FIELD is not present in the entry, undef will be returned at its place in the return list. However, you can't completely trust this as a test for presence or absence of a field; it is possible for a field to be present but undefined. Currently this can only happen due to certain syntax errors in the input, or if you pass an undefined value to set, or if you create a new field with set_fieldlist (the new field's value is implicitly set to undef).
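Because an undefined value does not prove that a field is absent, presence is better tested with exists and definedness checked separately. The following is a minimal sketch, assuming Text::BibTeX is installed; the entry text and the %status hash are illustrative only.

```perl
use strict;
use warnings;
use Text::BibTeX;

my $entry = Text::BibTeX::Entry->new;
$entry->parse_s('@article{foo, title = {A Title}}');   # illustrative entry
die "error in input\n" unless $entry->parse_ok;

my %status;
for my $field (qw(title year)) {
    # Call get in list context and keep the single value returned.
    my ($value) = $entry->get($field);

    if (!$entry->exists($field)) {
        $status{$field} = 'absent';
    }
    elsif (!defined $value) {
        $status{$field} = 'present but undefined';
    }
    else {
        $status{$field} = 'present';
    }
    print "$field: $status{$field}\n";
}
```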
Normally, the field value is what the input looks like after "maximal processing": quote characters are removed, whitespace is collapsed (the same way that BibTeX itself does it), macros are expanded, and multiple tokens are pasted together. (See bt_postprocess for details on the post-processing performed by btparse.)
For example, if your input file has the following:
    @string{of = "of"}
    @string{foobars = "Foobars"}

    @article{foobar,
      title = { The Mating Habits } # of # " Adult " # foobars
    }
then using get to query the value of the title field from the foobar entry would give the string "The Mating Habits of Adult Foobars".
However, in certain circumstances you may wish to preserve the values as they appear in the input. This is done by setting a preserve_values flag at some point; then, get will return not strings but Text::BibTeX::Value objects. Each Value object is a list of Text::BibTeX::SimpleValue objects, each of which in turn consists of a simple value type (string, macro, or number) and the text of the simple value.
Various ways to set the preserve_values flag and the interface to both Value and SimpleValue objects are described in the Text::BibTeX::Value manpage.
[...] @comment and @preamble entries. For instance, the entry
    @preamble{" This is a preamble" # {---the concatenation of several strings}}
would return a value of "This is a preamble---the concatenation of several strings".
If this entry was parsed in "value preservation" mode, then value acts like get, and returns a Value object rather than a simple string.
This is the only part of the module that makes any assumption about the nature of the data, namely that certain fields are lists delimited by a simple word such as "and", and that the delimited sub-strings are human names of the "First von Last" or "von Last, Jr., First" style used by BibTeX. If you are using this module for anything other than bibliographic data, you can most likely forget about these two methods. However, if you are in fact hacking on BibTeX-style bibliographic data, these could come in very handy -- the name-parsing done by BibTeX is not trivial, and the list-splitting would also be a pain to implement in Perl because you have to pay attention to brace-depth. (Not that it wasn't a pain to implement in C -- it's just a lot more efficient than a Perl implementation would be.)
Incidentally, both of these methods assume that the strings being split have already been "collapsed" in the BibTeX way, i.e. all leading and trailing whitespace removed and internal whitespace reduced to single spaces. This should always be the case when using these two methods on a Text::BibTeX::Entry object, but these are actually just front ends to more general functions in Text::BibTeX. (More general in that you supply the string to be parsed, rather than supplying the name of an entry field.) Should you ever use those more general functions directly, you might have to worry about collapsing whitespace; see the Text::BibTeX manpage (the split_list and split_name functions in particular) for more information.
Please note that the interface to author name parsing is experimental, subject to change, and open to discussion. Please let me know if you have problems with it, think it's just perfect, or whatever.
[...] split just because the names are the same: in particular, DELIM must be a simple string (no regexps), and delimiters that are at the beginning or end of the string, or at non-zero brace depth, or not surrounded by whitespace, are ignored. Some examples might illuminate matters:
    if field F is...                 then split (F) returns...
    'Name1 and Name2'                ('Name1', 'Name2')
    'Name1 and and Name2'            ('Name1', undef, 'Name2')
    'Name1 and'                      ('Name1 and')
    'and Name2'                      ('and Name2')
    'Name1 {and} Name2 and Name3'    ('Name1 {and} Name2', 'Name3')
    '{Name1 and Name2} and Name3'    ('{Name1 and Name2}', 'Name3')
Note that a warning will be issued for empty names (as in the second example above). A warning ought to be issued for delimiters at the beginning or end of a string, but currently this isn't done. (Hmmm.)
DESC is a one-word description of the substrings; it defaults to 'name'. It is only used for generating warning messages.
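To make the delimiter rules concrete, here is a pure-Perl sketch that reproduces the examples listed for split. It is an illustration only, not the library's C implementation: the function name sketch_split is hypothetical, the delimiter is compared by exact string equality (an assumption), the input is expected to be whitespace-collapsed already, and no warning is issued for empty names.

```perl
use strict;
use warnings;

# Split $string on the word $delim, ignoring delimiters that are at the
# start or end of the string, inside braces, or not set off by whitespace.
sub sketch_split {
    my ($string, $delim) = @_;
    $delim = 'and' unless defined $delim;

    # Break the string into whitespace-separated words; whitespace at
    # non-zero brace depth stays inside the current word.
    my @words;
    my ($word, $depth) = ('', 0);
    for my $char (split //, $string) {
        if ($char =~ /\s/ && $depth == 0) {
            push @words, $word if length $word;
            $word = '';
            next;
        }
        $depth++ if $char eq '{';
        $depth-- if $char eq '}' && $depth > 0;
        $word .= $char;
    }
    push @words, $word if length $word;

    # A word equal to the delimiter ends the current substring, unless it
    # is the first or last word. An empty substring becomes undef.
    my (@result, @current);
    for my $i (0 .. $#words) {
        if ($words[$i] eq $delim && $i > 0 && $i < $#words) {
            push @result, @current ? join(' ', @current) : undef;
            @current = ();
        }
        else {
            push @current, $words[$i];
        }
    }
    push @result, @current ? join(' ', @current) : undef;
    return @result;
}

# Show a few of the cases from the examples.
for my $input ('Name1 and Name2',
               'Name1 and and Name2',
               '{Name1 and Name2} and Name3') {
    my @parts = map { defined $_ ? "'$_'" : 'undef' } sketch_split($input);
    print "$input: (", join(', ', @parts), ")\n";
}
```

Working word by word rather than with a regular expression keeps the brace-depth bookkeeping explicit, which is the part the text singles out as awkward to get right.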
Returns a list of Text::BibTeX::Name objects, each of which represents one name. Use the part method to query these objects; see the Text::BibTeX::Name manpage for details on the interface to name objects (and on name-parsing as well).
For example, if this entry:
    @article{foo,
             author = {John Smith and Hacker, J. Random and
                       Ludwig van Beethoven and {Foo, Bar and Company}}}
has been parsed into a Text::BibTeX::Entry object $entry, then
    @names = $entry->names ('author');
will put a list of Text::BibTeX::Name objects in @names. These can be queried individually as described in the Text::BibTeX::Name manpage; for instance,
    @last = $names[0]->part ('last');
would put the list of tokens comprising the last name of the first author into the @last array: ('Smith').
[...] BTE_COMMENT, BTE_PREAMBLE, BTE_MACRODEF, and BTE_REGULAR, which are all optionally exported from Text::BibTeX).
[...] undef or unsupplied, in which case FIELD will simply be set to undef -- this is where the difference between the exists method and testing the definedness of field values becomes clear.)
Multiple (FIELD, VALUE) pairs may be supplied; they will be processed in order (i.e. the input is treated like a list, not a hash). For example:
    $entry->set ('author', $author);
    $entry->set ('author', $author, 'editor', $editor);
VALUE can be either a simple string or a Text::BibTeX::Value object; it doesn't matter if the entry was parsed in "full post-processing" or "preserve input values" mode.
[...] undef and a warning is printed. Conversely, if any of the fields currently present in the entry are not named in the list of fields supplied to set_fieldlist, they are deleted from the entry and another warning is printed.
[...] Text::BibTeX::File object, opened for output). Currently the printout is not particularly human-friendly; a highly configurable pretty-printer will be developed eventually.
[...] number(s)) to WARNING, appends a newline, and passes it to Perl's warn. If FIELD is supplied, the line number given is just that of the field; otherwise, the range of lines for the whole entry is given. (Well, almost -- currently, the line number of the last field is used as the last line of the whole entry. This is a bug.)
For example, if lines 10-15 of file foo.bib look like this:
    @article{homer97,
      author = {Homer Simpson and Ned Flanders},
      title = {Territorial Imperatives in Modern Suburbia},
      journal = {Journal of Suburban Studies},
      year = 1997
    }
then, after parsing this entry to $entry, the calls
    $entry->warn ('what a silly entry');
    $entry->warn ('what a silly journal', 'journal');
would result in the following warnings being issued:
    foo.bib, lines 10-14: what a silly entry
    foo.bib, line 13: what a silly journal
If FIELD is not supplied, returns a two-element list containing the line numbers of the beginning and end of the whole entry. (Actually, the "end" line number is currently inaccurate: it's really the line number of the last field in the entry. But it's better than nothing.)
[...] Text::BibTeX::File object -- if you just passed a filename/filehandle pair to parse, you can't get the filename back. (Sorry.)
the Text::BibTeX manpage, the Text::BibTeX::File manpage, the Text::BibTeX::Structure manpage
Greg Ward <gward@python.net>
Copyright (c) 1997-2000 by Gregory P. Ward. All rights reserved. This file is part of the Text::BibTeX library. This library is free software; you may redistribute it and/or modify it under the same terms as Perl itself.