Text::BibTeX - interface to read and parse BibTeX files

NAME

Text::BibTeX - interface to read and parse BibTeX files

SYNOPSIS

   use Text::BibTeX;

   my $bibfile = Text::BibTeX::File->new ("foo.bib");
   my $newfile = Text::BibTeX::File->new (">newfoo.bib");

   while (my $entry = Text::BibTeX::Entry->new ($bibfile))
   {
      next unless $entry->parse_ok;

      # hack on $entry contents, using various
      # Text::BibTeX::Entry methods

      $entry->write ($newfile);
   }

DESCRIPTION

The Text::BibTeX module serves mainly as a high-level introduction to the Text::BibTeX library, for both code and documentation purposes. The code loads the two fundamental modules for processing BibTeX files (Text::BibTeX::File and Text::BibTeX::Entry), and this documentation gives a broad overview of the whole library that isn't available in the documentation for the individual modules that comprise it.

In addition, the Text::BibTeX module provides a number of miscellaneous functions that are useful in processing BibTeX data (especially the kind that comes from bibliographies as defined by BibTeX 0.99, rather than generic database files). These functions don't generally fit in the object-oriented class hierarchy centred around the Text::BibTeX::Entry class, mainly because they are specific to bibliographic data and operate on generic strings (rather than being tied to a particular BibTeX entry). These are also documented here, in MISCELLANEOUS FUNCTIONS.

Note that every module described here begins with the Text::BibTeX prefix. For brevity, I have dropped this prefix from most class and module names in the rest of this manual page (and in most of the other manual pages in the library).

MODULES AND CLASSES

The Text::BibTeX library includes a number of modules, many of which provide classes. Usually, the relationship is simple and obvious: a module provides a class of the same name---for instance, the Text::BibTeX::Entry module provides the Text::BibTeX::Entry class. There are a few exceptions, though: most obviously, the Text::BibTeX module doesn't provide any classes itself, it merely loads two modules (Text::BibTeX::Entry and Text::BibTeX::File) that do. The other exceptions are mentioned in the descriptions below, and discussed in detail in the documentation for the respective modules.

The modules are presented roughly in order of increasing specialization: the first three are essential for any program that processes BibTeX data files, regardless of what kind of data they hold. The later modules are specialized for use with bibliographic databases, and serve both to emulate BibTeX 0.99's standard styles and to provide an example of how to define a database structure through such specialized modules. Each module is fully documented in its respective manual page.

Text::BibTeX: Loads the two fundamental modules (Entry and File), and provides a number of miscellaneous functions that don't fit anywhere in the class hierarchy.
Text::BibTeX::File: Provides an object-oriented interface to BibTeX database files. In addition to the obvious attributes of filename and filehandle, the ``file'' abstraction manages properties such as the database structure and options for it.
Text::BibTeX::Entry: Provides an object-oriented interface to BibTeX entries, which can be parsed from File objects, arbitrary filehandles, or strings. Manages all the properties of a single entry: type, key, fields, and values. Also serves as the base class for the structured entry classes (described in detail in the Text::BibTeX::Structure manpage).
Text::BibTeX::Value: Provides an object-oriented interface to values and simple values, high-level constructs that can be used to represent the strings associated with each field in an entry. Normally, field values are returned simply as Perl strings, with macros expanded and multiple strings ``pasted'' together. If desired, you can instruct Text::BibTeX to return Text::BibTeX::Value objects, which give you access to the original form of the data.
Text::BibTeX::Structure: Provides the Structure and StructuredEntry classes, which serve primarily as base classes for the two kinds of classes that define database structures. Read this man page for a comprehensive description of the mechanism for implementing Perl classes analogous to BibTeX ``style files''.
Text::BibTeX::Bib: Provides the BibStructure and BibEntry classes, which serve two purposes: they fulfill the same role as the standard style files of BibTeX 0.99, and they give an example of how to write new database structures. These ultimately derive from, respectively, the Structure and StructuredEntry classes provided by the Structure module.
Text::BibTeX::BibSort: One of the BibEntry class's base classes: handles the generation of sort keys for sorting prior to output formatting.
Text::BibTeX::BibFormat: One of the BibEntry class's base classes: handles the formatting of bibliographic data for output in a markup language such as LaTeX.
Text::BibTeX::Name: A class used by the Bib structure and specific to bibliographic data as defined by BibTeX itself: parses individual author names into ``first'', ``von'', ``last'', and ``jr'' parts.
Text::BibTeX::NameFormat: Also specific to bibliographic data: puts split-up names (as parsed by the Name class) back together in a custom way.

For a first pass through the library, you'll probably want to confine your reading to the Text::BibTeX::File manpage and the Text::BibTeX::Entry manpage. The other modules will come in handy eventually, especially if you need to emulate BibTeX in a fairly fine-grained way (e.g. parsing names, generating sort keys). But for the simple database hacks that are the bread and butter of the Text::BibTeX library, the File and Entry classes are the bulk of what you'll need. You may also find some of the material in this manual page useful, namely CONSTANT VALUES and UTILITY FUNCTIONS.

EXPORTS

The Text::BibTeX module has a number of optional exports, most of them constant values described in CONSTANT VALUES below. The default exports are a subset of these constant values that are used particularly often, the ``entry metatypes'' (also accessible via the export tag metatypes). Thus, the following two lines are equivalent:

   use Text::BibTeX;
   use Text::BibTeX qw(:metatypes);

Some of the various subroutines provided by the module are also exportable. bibloop, split_list, purify_string, and change_case are all useful in everyday processing of BibTeX data, but don't really fit anywhere in the class hierarchy. They may be imported from Text::BibTeX using the subs export tag. check_class and display_list are also exportable, but only by name; they are not included in any export tag. (These two mainly exist for use by other modules in the library.) For instance, to use Text::BibTeX and import the entry metatype constants and the common subroutines:

   use Text::BibTeX qw(:metatypes :subs);

Another group of subroutines exists for direct manipulation of the macro table maintained by the underlying C library. These functions (see Macro table functions, below) allow you to define, delete, and query the value of BibTeX macros (or ``abbreviations''). They may be imported en masse using the macrosubs export tag:

   use Text::BibTeX qw(:macrosubs);

CONSTANT VALUES

The Text::BibTeX module makes a number of constant values available. These correspond to the values of various enumerated types in the underlying C library, btparse, and their meanings are more fully explained in the btparse documentation.

Each group of constants is optionally exportable using an export tag given in the descriptions below.

Entry metatypes: BTE_UNKNOWN, BTE_REGULAR, BTE_COMMENT, BTE_PREAMBLE, BTE_MACRODEF. The metatype method in the Entry class always returns one of these values. The latter three describe, respectively, comment, preamble, and string entries; BTE_REGULAR describes all other entry types. BTE_UNKNOWN should never be seen (it's mainly useful for C code that might have to detect half-baked data structures). See also btparse. Export tag: metatypes.
AST node types: BTAST_STRING, BTAST_MACRO, BTAST_NUMBER. Used to distinguish the three kinds of simple values---strings, macros, and numbers. The SimpleValue class' type method always returns one of these three values. See also the Text::BibTeX::Value manpage, btparse. Export tag: nodetypes.
Name parts: BTN_FIRST, BTN_VON, BTN_LAST, BTN_JR, BTN_NONE. Used to specify the various parts of a name after it has been split up. These are mainly useful when using the NameFormat class. See also bt_split_names and bt_format_names. Export tag: nameparts.
Join methods: BTJ_MAYTIE, BTJ_SPACE, BTJ_FORCETIE, BTJ_NOTHING. Used to tell the NameFormat class how to join adjacent tokens together; see the Text::BibTeX::NameFormat manpage and bt_format_names. Export tag: joinmethods.
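
As an illustration of the entry metatypes, here is a minimal sketch that maps the value returned by the metatype method to a descriptive label. The labels and the sample entry text are made up for this example, and it assumes the Entry class's parse_s method accepts the entry text as its only argument (see the Text::BibTeX::Entry manpage). Note the parentheses after each constant name: without them, Perl's => operator would quote the name as a plain string.

```perl
use strict;
use warnings;
use Text::BibTeX qw(:metatypes);

# Map each metatype constant to a label; the labels are arbitrary.
my %label = (
   BTE_REGULAR()  => 'regular entry',
   BTE_COMMENT()  => 'comment',
   BTE_PREAMBLE() => 'preamble',
   BTE_MACRODEF() => 'macro definition',
);

# Parse one entry from a string (sample text invented for illustration).
my $entry = Text::BibTeX::Entry->new;
$entry->parse_s ('@string{acm = "ACM"}');

if ($entry->parse_ok) {
   my $kind = $label{$entry->metatype} // 'unknown';
   print $entry->type, " is a $kind\n";
}
```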

UTILITY FUNCTIONS

Text::BibTeX provides several functions that operate outside of the normal class hierarchy. Of these, only bibloop is likely to be of much use to you in writing everyday BibTeX-hacking programs; the other two (check_class and display_list) are mainly provided for the use of other modules in the library. They are documented here mainly for completeness, but also because they might conceivably be useful in other circumstances.

bibloop (ACTION, FILES [, DEST])

Loops over all entries in a set of BibTeX files, performing some caller-supplied action on each entry. FILES should be a reference to the list of filenames to process, and ACTION a reference to a subroutine that will be called on each entry. DEST, if given, should be a Text::BibTeX::File object (opened for output) to which entries might be printed.

The subroutine referenced by ACTION is called with exactly one argument: the Text::BibTeX::Entry object representing the entry currently being processed. Information about both the entry itself and the file where it originated is available through this object; see the Text::BibTeX::Entry manpage. The ACTION subroutine is only called if the entry was successfully parsed; any syntax errors will result in a warning message being printed, and that entry being skipped. Note that all successfully parsed entries are passed to the ACTION subroutine, even preamble, string, and comment entries. To skip these pseudo-entries and process only ``regular'' entries, your action subroutine should look something like this:

   sub action {
      my $entry = shift;
      return unless $entry->metatype == BTE_REGULAR;
      # process $entry ...
   }

If your action subroutine needs any more arguments, you can just create a closure (anonymous subroutine) as a wrapper, and pass it to bibloop:

   sub action {
      my ($entry, $extra_stuff) = @_;
      # ...
   }

   my $extra = ...;
   Text::BibTeX::bibloop (sub { &action ($_[0], $extra) }, \@files);

If the ACTION subroutine returns a true value and DEST was given, then the processed entry will be written to DEST.
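
Putting the pieces together, the following sketch copies only the regular entries of one file to a new file by passing a DEST argument. The file names and entry contents are invented for the example, and the input file is created on the fly purely so that the sketch is self-contained. It also assumes that the File class provides a close method, as described in the Text::BibTeX::File manpage.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Text::BibTeX qw(:metatypes :subs);

# Build a small input file; names and contents are illustrative only.
my $dir = tempdir (CLEANUP => 1);
my ($in, $out) = ("$dir/in.bib", "$dir/out.bib");
open (my $fh, '>', $in) or die "can't create $in: $!";
print $fh <<'BIB';
@string{acm = "ACM"}
@article{key1, title = "First", year = 1997}
@book{key2, title = "Second", year = 1998}
BIB
close ($fh);

my $regular = 0;

# Called once per successfully parsed entry; a true return value
# asks bibloop to write the entry to DEST.
sub keep_regular {
   my $entry = shift;
   return 0 unless $entry->metatype == BTE_REGULAR;
   $regular++;
   return 1;
}

my $dest = Text::BibTeX::File->new (">$out");
bibloop (\&keep_regular, [$in], $dest);
$dest->close;

print "copied $regular regular entries\n";
```

The action is passed as \&keep_regular rather than through a scalar variable, which stays valid even if bibloop is declared with a code-block prototype.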

check_class (PACKAGE, DESCRIPTION, SUPERCLASS, METHODS)

Ensures that a PACKAGE implements a class meeting certain requirements. First, it inspects Perl's symbol tables to ensure that a package named PACKAGE actually exists. Then, it ensures that the class named by PACKAGE derives from SUPERCLASS (using the universal method isa). This derivation might be through multiple inheritance, or through several generations of a class hierarchy; the only requirement is that SUPERCLASS is somewhere in PACKAGE's tree of base classes. Finally, it checks that PACKAGE provides each method listed in METHODS (a reference to a list of method names). This is done with the universal method can, so the methods might actually come from one of PACKAGE's base classes.

DESCRIPTION should be a brief string describing the class that was expected to be provided by PACKAGE. It is used for generating warning messages if any of the class requirements are not met.

This is mainly used by the supervisory code in Text::BibTeX::Structure, to ensure that user-supplied structure modules meet the rules required of them.
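
The three checks can be sketched in plain Perl as follows. This is not the library's implementation: class_problems is a hypothetical helper that returns a list of complaints instead of issuing warnings, and the two test packages exist only to exercise it.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::BibTeX): same arguments as
# check_class, but returns a list of problems instead of warning.
sub class_problems {
   my ($package, $description, $superclass, $methods) = @_;

   # 1. The package must have a non-empty symbol table.
   {
      no strict 'refs';
      return ("$description package $package does not exist")
         unless %{"${package}::"};
   }

   my @problems;

   # 2. SUPERCLASS must appear somewhere among PACKAGE's base classes.
   push (@problems, "$package is not a $superclass")
      unless $package->isa ($superclass);

   # 3. Every required method must be callable, possibly inherited.
   push (@problems, map  { "$package lacks method $_" }
                    grep { ! $package->can ($_) } @$methods);

   return @problems;
}

# Two throwaway classes to exercise the checks.
package My::Base    { sub describe { 'base' } }
package My::Derived { our @ISA = ('My::Base'); sub known_fields { () } }

my @ok  = class_problems ('My::Derived', 'structure', 'My::Base',
                          ['describe', 'known_fields']);
my @bad = class_problems ('My::Derived', 'structure', 'My::Base',
                          ['describe', 'format']);
print "problems: ", scalar (@ok), " then ", scalar (@bad), "\n";
```

Because the method check uses can, the inherited describe method counts as provided by My::Derived, which mirrors the behaviour described above.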

display_list (LIST, QUOTE)

Converts a list of strings to the grammatical conventions of a human language (currently, only English rules are supported). LIST must be a reference to a list of strings. If this list is empty, the empty string is returned. If it has one element, then just that element is returned. If it has two elements, then they are joined with the string " and " and the resulting string is returned. Otherwise, the list has N elements for N >= 3; elements 1..N-1 are joined with commas, and the final element is tacked on with an intervening ", and ".

If QUOTE is true, then each string is encased in single quotes before anything else is done.

This is used elsewhere in the library for two very distinct purposes: for generating warning messages describing lists of fields that should be present or are conflicting in an entry, and for generating lists of author names in formatted bibliographies.
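
The joining rules are simple enough to sketch in a few lines of Perl. The function below is a hypothetical re-implementation written from the description above, not the library's own code; in particular, it assumes that "joined with commas" means a comma followed by a space.

```perl
use strict;
use warnings;

# Hypothetical re-implementation of the rules described above; the
# real display_list may differ in details such as spacing.
sub sketch_display_list {
   my ($list, $quote) = @_;

   # Quote first, so that the joining rules see the quoted strings.
   my @items = $quote ? map ("'$_'", @$list) : @$list;

   return ''                     if @items == 0;
   return $items[0]              if @items == 1;
   return join (' and ', @items) if @items == 2;
   return join (', ', @items[0 .. $#items - 1]) . ', and ' . $items[-1];
}

print sketch_display_list (['author', 'editor'], 1), "\n";
print sketch_display_list (['Tom', 'Dick', 'Harry'], 0), "\n";
```

The two calls print 'author' and 'editor' (with the single quotes) and Tom, Dick, and Harry, respectively.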

MISCELLANEOUS FUNCTIONS

In addition to loading the File and Entry modules, Text::BibTeX loads the XSUB code which bridges the Perl modules to the underlying C library, btparse. This XSUB code provides a number of miscellaneous utility functions, most of which are put into other packages in the Text::BibTeX family for use by the corresponding classes. (For instance, the XSUB code loaded by Text::BibTeX provides a function Text::BibTeX::Entry::parse, which is actually documented as the parse method of the Text::BibTeX::Entry class---see the Text::BibTeX::Entry manpage.) However, for completeness this function, and every other function that becomes available when you use Text::BibTeX, is at least mentioned here. The only functions from this group that you're ever likely to use are described in Generic string-processing functions.

Startup/shutdown functions

These just initialize and shut down the underlying C library. Don't call either of them; the Text::BibTeX startup/shutdown code takes care of it as appropriate. They're just mentioned here for completeness.

initialize ()
cleanup ()

Generic string-processing functions

split_list (STRING, DELIM [, FILENAME [, LINE [, DESCRIPTION]]]): Splits a string on a fixed delimiter according to the BibTeX rules for splitting up lists of names. With BibTeX, the delimiter is hard-coded as "and"; here, you can supply any string. Instances of DELIM in STRING are considered delimiters if they are at brace-depth zero, surrounded by whitespace, and not at the beginning or end of STRING; the comparison is case-insensitive. See bt_split_names for full details of how splitting is done (it's not the same as Perl's split function). Returns the list of strings resulting from splitting STRING on DELIM.
purify_string (STRING [, OPTIONS]): ``Purifies'' STRING in the BibTeX way (usually for generation of sort keys). See bt_misc for details; note that, unlike the C interface, purify_string does not modify STRING in-place. A purified copy of the input string is returned. OPTIONS is currently unused.
change_case (TRANSFORM, STRING [, OPTIONS]): Transforms the case of STRING according to TRANSFORM (a single character, one of 'u', 'l', or 't'). See bt_misc for details; again, change_case differs from the C interface in that STRING is not modified in-place---the input string is copied, and the transformed copy is returned.
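
A short sketch of the three functions in use, assuming they are imported with the subs export tag. The author list and title are invented sample data, and the exact purified and case-changed strings are left to bt_misc rather than asserted here.

```perl
use strict;
use warnings;
use Text::BibTeX qw(:subs);

# Sample data, made up for this example.
my $authors = 'Donald E. Knuth and Leslie Lamport and {Barnes and Noble}';
my $title   = 'The {TeX}book: A Gentle Introduction';

# The "and" inside braces is not at brace-depth zero, so it is not
# treated as a delimiter.
my @names = split_list ($authors, 'and');
print scalar (@names), " names\n";
print "  $_\n" for @names;

# Both functions return a new string and leave $title untouched.
my $sort_key = purify_string ($title);
my $lower    = change_case ('l', $title);
print "$sort_key\n$lower\n";
```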

Entry-parsing functions

Although these functions are provided by the Text::BibTeX module, they are actually in the Text::BibTeX::Entry package. That's because they are implemented in C, and thus loaded with the XSUB code that Text::BibTeX loads; however, they are actually methods in the Text::BibTeX::Entry class. Thus, they are documented as methods in the Text::BibTeX::Entry manpage.

parse (ENTRY_STRUCT, FILENAME, FILEHANDLE)
parse_s (ENTRY_STRUCT, TEXT)

Macro table functions

These functions allow direct access to the macro table maintained by btparse, the C library underlying Text::BibTeX. In the normal course of events, macro definitions always accumulate, and are only defined as a result of parsing a macro definition (@string) entry. btparse never deletes old macro definitions for you, and doesn't have any built-in default macros. If, for example, you wish to start fresh with new macros for every file, use delete_all_macros. If you wish to pre-define certain macros, use add_macro_text. (But note that the Bib structure, as part of its mission to emulate BibTeX 0.99, defines the standard ``month name'' macros for you.)

See also bt_macros in the btparse documentation for a description of the C interface to these functions.

add_macro_text (MACRO, TEXT [, FILENAME [, LINE]]): Defines a new macro, or redefines an old one. MACRO is the name of the macro, and TEXT is the text it should expand to. FILENAME and LINE are just used to generate any warnings about the macro definition. The only such warning occurs when you redefine an old macro: its value is overridden, and add_macro_text() issues a warning saying so.
delete_macro (MACRO): Deletes a macro from the macro table. If MACRO isn't defined, takes no action.
delete_all_macros (): Deletes all macros from the macro table.
macro_length (MACRO): Returns the length of a macro's expansion text. If the macro is undefined, returns 0; no warning is issued.
macro_text (MACRO [, FILENAME [, LINE]]): Returns the expansion text of a macro. If the macro is not defined, issues a warning and returns undef. FILENAME and LINE, if supplied, are used for generating this warning; they should be supplied if you're looking up the macro as a result of finding it in a file.
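
The following sketch exercises the macro table functions in sequence, assuming they are imported with the macrosubs export tag. The macro name jacm and its expansion text are arbitrary choices for the example.

```perl
use strict;
use warnings;
use Text::BibTeX qw(:macrosubs);

# Start from an empty table so that earlier parsing can't interfere.
delete_all_macros ();

# "jacm" is an arbitrary macro name chosen for this example.
add_macro_text ('jacm', 'Journal of the ACM');

my $length = macro_length ('jacm');
my $text   = macro_text ('jacm');
print "jacm expands to '$text' ($length characters)\n";

# Deleting an undefined macro takes no action, so the second call is
# harmless; macro_length then reports 0 without a warning.
delete_macro ('jacm');
delete_macro ('jacm');
my $after = macro_length ('jacm');
print "length after deletion: $after\n";
```

macro_length is used for the final check because, unlike macro_text, it issues no warning when the macro is undefined.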

Name-parsing functions

These are both private functions for the use of the Name class, and therefore are put in the Text::BibTeX::Name package. You should use the interface provided by that class for parsing names in the BibTeX style.

_split (NAME_STRUCT, NAME, FILENAME, LINE, NAME_NUM, KEEP_CSTRUCT)
free (NAME_STRUCT)

Name-formatting functions

These are private functions for the use of the NameFormat class, and therefore are put in the Text::BibTeX::NameFormat package. You should use the interface provided by that class for formatting names in the BibTeX style.

create ([PARTS [, ABBREV_FIRST]])
free (FORMAT_STRUCT)
_set_text (FORMAT_STRUCT, PART, PRE_PART, POST_PART, PRE_TOKEN, POST_TOKEN)
_set_options (FORMAT_STRUCT, PART, ABBREV, JOIN_TOKENS, JOIN_PART)
format_name (NAME_STRUCT, FORMAT_STRUCT)

BUGS AND LIMITATIONS

Text::BibTeX inherits several limitations from its base C library, btparse; see btparse/BUGS AND LIMITATIONS for details. In addition, Text::BibTeX will not work with a Perl binary built using the sfio library. This is because Perl's I/O abstraction layer does not extend to third-party C libraries that use stdio, and btparse most certainly does use stdio.

AUTHOR

Greg Ward <gward@python.net>

COPYRIGHT

Copyright (c) 1997-2000 by Gregory P. Ward. All rights reserved. This file is part of the Text::BibTeX library. This library is free software; you may redistribute it and/or modify it under the same terms as Perl itself.

AVAILABILITY

The btOOL home page, where you can get up-to-date information about Text::BibTeX (and download the latest version) is

   http://starship.python.net/~gward/btOOL/

You will also find the latest version of btparse, the C library underlying Text::BibTeX, there. btparse is needed to build Text::BibTeX, and must be downloaded separately.

Both libraries are also available on CTAN (the Comprehensive TeX Archive Network, http://www.ctan.org/tex-archive/) and CPAN (the Comprehensive Perl Archive Network, http://www.cpan.org/). Look in biblio/bibtex/utils/btOOL/ on CTAN, and authors/Greg_Ward/ on CPAN. For example,

   http://www.ctan.org/tex-archive/biblio/bibtex/utils/btOOL/
   http://www.cpan.org/authors/Greg_Ward

will both get you to the latest version of Text::BibTeX and btparse -- but of course, you should always access busy sites like CTAN and CPAN through a mirror.
