Text::BibTeX - interface to read and parse BibTeX files
    use Text::BibTeX;

    $bibfile = new Text::BibTeX::File "foo.bib";
    $newfile = new Text::BibTeX::File ">newfoo.bib";

    while ($entry = new Text::BibTeX::Entry $bibfile)
    {
       next unless $entry->parse_ok;

          .             # hack on $entry contents, using various
          .             # Text::BibTeX::Entry methods
          .

       $entry->write ($newfile);
    }
The Text::BibTeX module serves mainly as a high-level introduction to the Text::BibTeX library, for both code and documentation purposes. The code loads the two fundamental modules for processing BibTeX files (Text::BibTeX::File and Text::BibTeX::Entry), and this documentation gives a broad overview of the whole library that isn't available in the documentation for the individual modules that comprise it.
In addition, the Text::BibTeX module provides a number of miscellaneous functions that are useful in processing BibTeX data (especially the kind that comes from bibliographies as defined by BibTeX 0.99, rather than generic database files). These functions don't generally fit in the object-oriented class hierarchy centred around the Text::BibTeX::Entry class, mainly because they are specific to bibliographic data and operate on generic strings (rather than being tied to a particular BibTeX entry). These are also documented here, in MISCELLANEOUS FUNCTIONS.
Note that every module described here begins with the Text::BibTeX prefix. For brevity, I have dropped this prefix from most class and module names in the rest of this manual page (and in most of the other manual pages in the library).
The Text::BibTeX library includes a number of modules, many of which provide classes. Usually, the relationship is simple and obvious: a module provides a class of the same name---for instance, the Text::BibTeX::Entry module provides the Text::BibTeX::Entry class.
There are a few exceptions, though: most obviously, the Text::BibTeX module doesn't provide any classes itself; it merely loads two modules (Text::BibTeX::Entry and Text::BibTeX::File) that do. The other exceptions are mentioned in the descriptions below, and discussed in detail in the documentation for the respective modules.
The modules are presented roughly in order of increasing specialization: the first three are essential for any program that processes BibTeX data files, regardless of what kind of data they hold. The later modules are specialized for use with bibliographic databases, and serve both to emulate BibTeX 0.99's standard styles and to provide an example of how to define a database structure through such specialized modules. Each module is fully documented in its respective manual page.
Text::BibTeX
Loads the two fundamental modules (Entry and File), and provides a number of miscellaneous functions that don't fit anywhere in the class hierarchy.
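Text::BibTeX::File
Provides an object-oriented interface to BibTeX database files, which may be opened for reading or for writing.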
Text::BibTeX::Entry
Provides an object-oriented interface to BibTeX entries, which can be parsed from File objects, arbitrary filehandles, or strings. Manages all the properties of a single entry: type, key, fields, and values. Also serves as the base class for the structured entry classes (described in detail in the Text::BibTeX::Structure manpage).
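Text::BibTeX::Value
Provides an object-oriented interface to field values and to the simple values (strings, macros, and numbers) they are built from. If desired, you can instruct Text::BibTeX to return Text::BibTeX::Value objects, which give you access to the original form of the data.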
Text::BibTeX::Structure
Provides the Structure and StructuredEntry classes, which serve primarily as base classes for the two kinds of classes that define database structures. Read this man page for a comprehensive description of the mechanism for implementing Perl classes analogous to BibTeX ``style files''.
Text::BibTeX::Bib
Provides the BibStructure and BibEntry classes, which serve two purposes: they fulfill the same role as the standard style files of BibTeX 0.99, and they give an example of how to write new database structures. These ultimately derive from, respectively, the Structure and StructuredEntry classes provided by the Structure module.
Text::BibTeX::BibSort
One of the BibEntry class's base classes: handles the generation of sort keys for sorting prior to output formatting.
Text::BibTeX::BibFormat
One of the BibEntry class's base classes: handles the formatting of bibliographic data for output in a markup language such as LaTeX.
Text::BibTeX::Name
A class used by the Bib structure and specific to bibliographic data as defined by BibTeX itself: parses individual author names into ``first'', ``von'', ``last'', and ``jr'' parts.
Text::BibTeX::NameFormat
Also specific to bibliographic data: puts split-up names (as parsed by the Name class) back together in a custom way.
For a first time through the library, you'll probably want to confine your reading to the Text::BibTeX::File manpage and the Text::BibTeX::Entry manpage. The other modules will come in handy eventually, especially if you need to emulate BibTeX in a fairly fine-grained way (e.g. parsing names, generating sort keys). But for the simple database hacks that are the bread and butter of the Text::BibTeX library, the File and Entry classes are the bulk of what you'll need. You may also find some of the material in this manual page useful, namely CONSTANT VALUES and UTILITY FUNCTIONS.
The Text::BibTeX module has a number of optional exports, most of them constant values described in CONSTANT VALUES below. The default exports are a subset of these constant values that are used particularly often, the ``entry metatypes'' (also accessible via the export tag metatypes). Thus, the following two lines are equivalent:
    use Text::BibTeX;
    use Text::BibTeX qw(:metatypes);
Some of the various subroutines provided by the module are also exportable. bibloop, split_list, purify_string, and change_case are all useful in everyday processing of BibTeX data, but don't really fit anywhere in the class hierarchy. They may be imported from Text::BibTeX using the subs export tag. check_class and display_list are also exportable, but only by name; they are not included in any export tag. (These two mainly exist for use by other modules in the library.) For instance, to use Text::BibTeX and import the entry metatype constants and the common subroutines:
    use Text::BibTeX qw(:metatypes :subs);
Another group of subroutines exists for direct manipulation of the macro table maintained by the underlying C library. These functions (see Macro table functions, below) allow you to define, delete, and query the value of BibTeX macros (or ``abbreviations''). They may be imported en masse using the macrosubs export tag:
    use Text::BibTeX qw(:macrosubs);
The Text::BibTeX module makes a number of constant values available. These correspond to the values of various enumerated types in the underlying C library, btparse, and their meanings are more fully explained in the btparse documentation.
Each group of constants is optionally exportable using an export tag given in the descriptions below.
Entry metatypes
BTE_UNKNOWN, BTE_REGULAR, BTE_COMMENT, BTE_PREAMBLE, BTE_MACRODEF. The metatype method in the Entry class always returns one of these values. The latter three describe, respectively, comment, preamble, and string entries; BTE_REGULAR describes all other entry types. BTE_UNKNOWN should never be seen (it's mainly useful for C code that might have to detect half-baked data structures). See also btparse. Export tag: metatypes.
Node types
BTAST_STRING, BTAST_MACRO, BTAST_NUMBER. Used to distinguish the three kinds of simple values---strings, macros, and numbers. The SimpleValue class's type method always returns one of these three values. See also the Text::BibTeX::Value manpage, btparse. Export tag: nodetypes.
Name parts
BTN_FIRST, BTN_VON, BTN_LAST, BTN_JR, BTN_NONE. Used to specify the various parts of a name after it has been split up. These are mainly useful when using the NameFormat class. See also bt_split_names and bt_format_names. Export tag: nameparts.
Join methods
BTJ_MAYTIE, BTJ_SPACE, BTJ_FORCETIE, BTJ_NOTHING. Used to tell the NameFormat class how to join adjacent tokens together; see the Text::BibTeX::NameFormat manpage and bt_format_names. Export tag: joinmethods.
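As a rough illustration of how the entry metatype constants might be used, the sketch below parses a couple of entries from strings and labels each one by metatype. It assumes Text::BibTeX is installed and relies on the Entry class's parse_s method (see the Text::BibTeX::Entry manpage); the describe helper, the %label table, and the sample entries are invented for this example.

```perl
use strict;
use warnings;
use Text::BibTeX;    # the default exports include the BTE_* constants

# Map each metatype constant to a readable label (labels are our own).
my %label = (
    BTE_REGULAR()  => 'regular',
    BTE_COMMENT()  => 'comment',
    BTE_PREAMBLE() => 'preamble',
    BTE_MACRODEF() => 'macro definition',
);

# Hypothetical helper: parse one entry from a string and name its metatype.
sub describe {
    my ($text) = @_;
    my $entry = Text::BibTeX::Entry->new;
    $entry->parse_s($text);
    return 'unparseable' unless $entry->parse_ok;
    return $label{ $entry->metatype } // 'unknown';
}

print describe('@string{jan = "January"}'), "\n";
print describe('@article{key1, title = "A Title", year = 1999}'), "\n";
```

The constants are called with explicit parentheses so that they are evaluated as function calls rather than being quoted as bare hash keys.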
Text::BibTeX provides several functions that operate outside of the normal class hierarchy. Of these, only bibloop is likely to be of much use to you in writing everyday BibTeX-hacking programs; the other two (check_class and display_list) are mainly provided for the use of other modules in the library. They are documented here mainly for completeness, but also because they might conceivably be useful in other circumstances.
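bibloop (ACTION, FILES [, DEST])
Loops over the entries in a set of BibTeX files, calling a caller-supplied subroutine on each one. ACTION is a reference to that subroutine, and FILES is a reference to the list of filenames to process. DEST, if given, should be a Text::BibTeX::File object (opened for output) to which entries might be printed.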
The subroutine referenced by ACTION is called with exactly one argument: the Text::BibTeX::Entry object representing the entry currently being processed. Information about both the entry itself and the file where it originated is available through this object; see the Text::BibTeX::Entry manpage. The ACTION subroutine is only called if the entry was successfully parsed; any syntax errors will result in a warning message being printed, and that entry being skipped. Note that all successfully parsed entries are passed to the ACTION subroutine, even preamble, string, and comment entries. To skip these pseudo-entries and only process ``regular'' entries, your action subroutine should look something like this:
    sub action {
       my $entry = shift;
       return unless $entry->metatype == BTE_REGULAR;
       # process $entry ...
    }
If your action subroutine needs any more arguments, you can just create a closure (anonymous subroutine) as a wrapper, and pass it to bibloop:
    sub action {
       my ($entry, $extra_stuff) = @_;
       # ...
    }

    my $extra = ...;
    Text::BibTeX::bibloop (sub { &action ($_[0], $extra) }, \@files);
If the ACTION subroutine returns a true value and DEST was given, then the processed entry will be written to DEST.
check_class
Ensures that PACKAGE names a class meeting certain requirements. It checks that the class named by PACKAGE derives from SUPERCLASS (using the universal method isa). This derivation might be through multiple inheritance, or through several generations of a class hierarchy; the only requirement is that SUPERCLASS is somewhere in PACKAGE's tree of base classes. Finally, it checks that PACKAGE provides each method listed in METHODS (a reference to a list of method names). This is done with the universal method can, so the methods might actually come from one of PACKAGE's base classes.
DESCRIPTION should be a brief string describing the class that was expected to be provided by PACKAGE. It is used for generating warning messages if any of the class requirements are not met.
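The two checks described above can be sketched in plain Perl. This is not the library's code: class_problems is a hypothetical helper that returns a list of complaints instead of printing warnings, its argument order is chosen for the example, and the My::Base and My::Derived classes exist only to exercise it.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the checks described above: verify inheritance
# with isa, then verify each required method with can.
sub class_problems {
    my ($package, $description, $superclass, $methods) = @_;
    my @problems;
    push @problems, "$package is not a $description: it does not derive from $superclass"
        unless $package->isa($superclass);
    for my $method (@$methods) {
        push @problems, "$package is not a $description: no method $method"
            unless $package->can($method);
    }
    return @problems;
}

# Two throwaway classes; My::Derived inherits new and describe from My::Base.
{ package My::Base;    sub new { return bless {}, shift } sub describe { return 'base' } }
{ package My::Derived; our @ISA = ('My::Base'); sub render { return 'text' } }

my @ok  = class_problems('My::Derived', 'demo class', 'My::Base', [qw(new describe render)]);
my @bad = class_problems('My::Derived', 'demo class', 'My::Base', [qw(sort_key)]);

print scalar(@ok), " problem(s) with the complete method list\n";
print "$_\n" for @bad;
```

Because can follows the inheritance chain, the inherited new and describe methods satisfy the check even though My::Derived does not define them itself.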
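check_class is mainly used by the supervisory code in Text::BibTeX::Structure, to ensure that user-supplied structure modules meet the rules required of them.
display_list (LIST, QUOTE)
Converts a list of strings into a single string following the conventions of written English. LIST is a reference to the list of strings. If the list has two elements, they are joined with " and " and the resulting string is returned. If the list has N elements for N >= 3, elements 1..N-1 are joined with commas, and the final element is tacked on with an intervening ", and ".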
If QUOTE is true, then each string is encased in single quotes before anything else is done.
This is used elsewhere in the library for two very distinct purposes: for generating warning messages describing lists of fields that should be present or are conflicting in an entry, and for generating lists of author names in formatted bibliographies.
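The joining rule is simple enough to sketch in plain Perl. The english_list function below is a stand-alone illustration rather than the library's implementation; in particular, its handling of empty and one-element lists is an assumption, since only the two-element and longer cases are spelled out above.

```perl
use strict;
use warnings;

# Illustrative re-implementation of the joining rule (not the library code).
sub english_list {
    my ($items, $quote) = @_;

    # Quote first, so that quoting happens before any joining.
    my @list = $quote ? map { "'$_'" } @$items : @$items;

    return ''                      if @list == 0;    # assumption
    return $list[0]                if @list == 1;    # assumption
    return "$list[0] and $list[1]" if @list == 2;
    return join(', ', @list[0 .. $#list - 1]) . ', and ' . $list[-1];
}

print english_list([qw(author editor)]), "\n";            # author and editor
print english_list([qw(title year publisher)], 1), "\n";  # 'title', 'year', and 'publisher'
```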
In addition to loading the File and Entry modules, Text::BibTeX loads the XSUB code which bridges the Perl modules to the underlying C library, btparse. This XSUB code provides a number of miscellaneous utility functions, most of which are put into other packages in the Text::BibTeX family for use by the corresponding classes. (For instance, the XSUB code loaded by Text::BibTeX provides a function Text::BibTeX::Entry::parse, which is actually documented as the parse method of the Text::BibTeX::Entry class---see the Text::BibTeX::Entry manpage.) However, for completeness this function---and all the other functions that become available when you use Text::BibTeX---are at least mentioned here. The only functions from this group that you're ever likely to use are described in Generic string-processing functions.
The startup and shutdown functions just initialize and shut down the underlying C library. Don't call either one of them; the Text::BibTeX startup/shutdown code takes care of it as appropriate. They're just mentioned here for completeness.
split_list
Splits STRING on the delimiter DELIM. In BibTeX itself the delimiter is hard-coded as "and"; here, you can supply any string. Instances of DELIM in STRING are considered delimiters if they are at brace-depth zero, surrounded by whitespace, and not at the beginning or end of STRING; the comparison is case-insensitive. See bt_split_names for full details of how splitting is done (it's not the same as Perl's split function).
Returns the list of strings resulting from splitting STRING on DELIM.
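To make the delimiter rule concrete, here is a plain-Perl approximation of it. It is only a sketch of the rule as stated above, not btparse's algorithm, and split_at_depth_zero is a made-up name; for real work, use split_list itself.

```perl
use strict;
use warnings;

# Approximation of the rule described above (NOT the btparse algorithm):
# DELIM splits only at brace depth zero, with whitespace on both sides,
# never at the very start or end of the string, and ignoring case.
sub split_at_depth_zero {
    my ($string, $delim) = @_;
    my @parts;
    my $current = '';
    my $depth   = 0;

    pos($string) = 0;
    while (pos($string) < length $string) {
        # Try to consume " DELIM " here, provided some text precedes it
        # and more non-whitespace text follows it.
        if ($depth == 0 && $current =~ /\S/
            && $string =~ /\G\s+\Q$delim\E\s+(?=\S)/gci) {
            push @parts, $current;
            $current = '';
            next;
        }
        # Otherwise copy one character, tracking brace depth as we go.
        $string =~ /\G(.)/gcs;
        $depth++ if $1 eq '{';
        $depth-- if $1 eq '}' && $depth > 0;
        $current .= $1;
    }
    push @parts, $current;
    return @parts;
}

my @names = split_at_depth_zero(
    'Knuth, Donald E. and {Barnes and Noble} AND Lamport, Leslie', 'and');
print "$_\n" for @names;
```

The "and" inside the braces is left alone because it sits at brace depth one, while the upper-case "AND" still splits because the comparison ignores case.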
purify_string does not modify STRING in place; a purified copy of the input string is returned.
OPTIONS is currently unused.
change_case
Converts the case of STRING according to a single-character transformation code ('u', 'l', or 't'). See bt_misc for details; again, change_case differs from the C interface in that STRING is not modified in place---the input string is copied, and the transformed copy is returned.
Although these functions are provided by the Text::BibTeX module, they are actually in the Text::BibTeX::Entry package. That's because they are implemented in C, and thus loaded with the XSUB code that Text::BibTeX loads; however, they are actually methods in the Text::BibTeX::Entry class. Thus, they are documented as methods in the Text::BibTeX::Entry manpage.
These functions allow direct access to the macro table maintained by btparse, the C library underlying Text::BibTeX. In the normal course of events, macro definitions always accumulate, and are only defined as a result of parsing a macro definition (@string) entry. btparse never deletes old macro definitions for you, and doesn't have any built-in default macros. If, for example, you wish to start fresh with new macros for every file, use delete_all_macros. If you wish to pre-define certain macros, use add_macro_text. (But note that the Bib structure, as part of its mission to emulate BibTeX 0.99, defines the standard ``month name'' macros for you.)
See also bt_macros in the btparse documentation for a description of the C interface to these functions.
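add_macro_text
Defines a macro with the given expansion text. If you redefine a macro that already exists, add_macro_text() issues a warning saying so.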
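macro_text
Returns the expansion text of a macro. If the macro is not defined, a warning is issued and the result is undef. FILENAME and LINE, if supplied, are used for generating this warning; they should be supplied if you're looking up the macro as a result of finding it in a file.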
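A minimal sketch of working with the macro table, assuming Text::BibTeX is installed and that the macrosubs export tag supplies delete_all_macros, add_macro_text, and macro_text with the argument orders used here (macro name first, then expansion text). The macro name jgg and its expansion are invented for the example.

```perl
use strict;
use warnings;
use Text::BibTeX qw(:macrosubs);

# Start from an empty macro table, then pre-define one abbreviation.
delete_all_macros();
add_macro_text('jgg', 'Journal of Gnats and Gnus');

# Look the expansion back up and show it.
my $expansion = macro_text('jgg');
print "jgg expands to: $expansion\n";
```

Clearing the table first avoids the redefinition warning if a macro of the same name was left over from an earlier file.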
These are both private functions for the use of the Name class, and therefore are put in the Text::BibTeX::Name package. You should use the interface provided by that class for parsing names in the BibTeX style.
These are private functions for the use of the NameFormat class, and therefore are put in the Text::BibTeX::NameFormat package. You should use the interface provided by that class for formatting names in the BibTeX style.
Text::BibTeX inherits several limitations from its base C library, btparse; see btparse/BUGS AND LIMITATIONS for details. In addition, Text::BibTeX will not work with a Perl binary built using the sfio library. This is because Perl's I/O abstraction layer does not extend to third-party C libraries that use stdio, and btparse most certainly does use stdio.
btool_faq, the Text::BibTeX::File manpage, the Text::BibTeX::Entry manpage, the Text::BibTeX::Value manpage
Greg Ward <gward@python.net>
Copyright (c) 1997-2000 by Gregory P. Ward. All rights reserved. This file is part of the Text::BibTeX library. This library is free software; you may redistribute it and/or modify it under the same terms as Perl itself.
The btOOL home page, where you can get up-to-date information about Text::BibTeX (and download the latest version) is
    http://starship.python.net/~gward/btOOL/
You will also find the latest version of btparse, the C library underlying Text::BibTeX, there. btparse is needed to build Text::BibTeX, and must be downloaded separately.
Both libraries are also available on CTAN (the Comprehensive TeX Archive Network, http://www.ctan.org/tex-archive/) and CPAN (the Comprehensive Perl Archive Network, http://www.cpan.org/). Look in biblio/bibtex/utils/btOOL/ on CTAN, and authors/Greg_Ward/ on CPAN. For example,
    http://www.ctan.org/tex-archive/biblio/bibtex/utils/btOOL/
    http://www.cpan.org/authors/Greg_Ward
will both get you to the latest version of Text::BibTeX and btparse -- but of course, you should always access busy sites like CTAN and CPAN through a mirror.