PDF::API2::Basic::PDF::File - Holds the trailers and cross-reference tables for a PDF file




 $p = PDF::API2::Basic::PDF::File->open("filename.pdf", 1);
 $p->release;       # IMPORTANT!


This class keeps track of the directory aspects of a PDF file. The directory has two parts: the main directory object, which is the parent of all other objects, and a chain of cross-reference tables with their corresponding trailer dictionaries, starting with the main directory object.


Rather than exposing everything through methods, which would be a lot of work, this class hierarchy keeps various instance variables that are accessed as hash entries on the object. To distinguish instance variables from content variables (which may come from the PDF content itself), the name of each instance variable starts with a space.

Variables which do not start with a space directly reflect elements in a PDF dictionary. In the case of a PDF::API2::Basic::PDF::File, the elements reflect those in the trailer dictionary.
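The naming convention above can be illustrated with a plain hash. This is only a sketch: $file is a hypothetical stand-in, not a real PDF::API2::Basic::PDF::File object, and the values are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a file object: a plain hash, not the real class.
my $file = {
    ' fname'  => 'example.pdf',    # instance variable (leading space)
    ' maxobj' => 12,               # instance variable (leading space)
    'Size'    => 12,               # trailer dictionary entry
    'Root'    => 'placeholder for the catalog object',
};

# Split the keys by whether they start with a space.
my @instance_keys = sort grep { /^ / } keys %$file;
my @trailer_keys  = sort grep { !/^ / } keys %$file;

print "instance variables: ", join(', ', map { "'$_'" } @instance_keys), "\n";
print "trailer entries:    ", join(', ', map { "'$_'" } @trailer_keys), "\n";
```

Because the space is part of the key, $file->{' fname'} and $file->{'fname'} are different entries.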

Since some variables are not designed for class users to access, variables are marked in the documentation with (R) to indicate that such an entry should only be used as read-only information. (P) indicates that the information is private and not designed for user use at all, but is included in the documentation for completeness and to ensure that nobody else tries to use it.

newroot
This variable allows the user to create a new root entry to appear in the trailer dictionary that is output when the file is written or appended. If you wish to override the root element in the dictionary you have, use this entry to indicate that without losing the current Root entry. Note that newroot should point to a PDF-level object and not just to a dictionary which does not have object status.

INFILE (R)
Contains the filehandle used to read this information into this PDF directory. It is an IO object.

fname (R)
This is the filename which is reflected by INFILE, or the original IO object passed in.

update (R)
This indicates that the read file has been opened for update and that at some point, $p->appendfile() can be called to update the file with the changes that have been made to the memory representation.

maxobj (R)
Contains the first usable object number above any that have already appeared in the file so far.

outlist (P)
This is a list of Objind objects which are to be output when the next appendfile or outfile occurs.

firstfree (P)
Contains the first free object in the free object list. Free objects are removed from the front of the list and added to the end.

lastfree (P)
Contains the last free object in the free list. It may be the same as the firstfree if there is only one free object.

objcache (P)
All objects are held in the cache to ensure that a system only has one occurrence of each object. In effect, the objind class acts as a container type class to hold the PDF object structure and it would be unfortunate if there were two identical place-holders floating around a system.

epos (P)
The end location of the read-file.
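The behaviour described for firstfree and lastfree (take from the front, add to the end) is that of a first-in, first-out queue. A minimal sketch follows, assuming a hypothetical linked list of plain hashes; the names add_free, take_free, first, last and next are illustrative only and are not the class's real internals.

```perl
use strict;
use warnings;

# Hypothetical free list: each node is a hash with 'num' and 'next'.
my %list = (first => undef, last => undef);

# Add a freed object number to the end of the list.
sub add_free {
    my ($list, $num) = @_;
    my $node = { num => $num, next => undef };
    if (defined $list->{last}) {
        $list->{last}{next} = $node;
    }
    else {
        $list->{first} = $node;    # list was empty
    }
    $list->{last} = $node;
    return $node;
}

# Remove and return the object number at the front of the list.
sub take_free {
    my ($list) = @_;
    my $node = $list->{first};
    return undef unless defined $node;
    $list->{first} = $node->{next};
    $list->{last}  = undef unless defined $list->{first};
    return $node->{num};
}

add_free(\%list, $_) for (7, 3, 9);
my @reused = map { take_free(\%list) } 1 .. 3;
print "reused in order: @reused\n";
```

With a single free object, first and last refer to the same node, which matches the description of lastfree above.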

Each trailer dictionary contains a number of private instance variables which hold the chain together.

loc (P)
Contains the location of the start of the cross-reference table preceding the trailer.

xref (P)
Contains an anonymous array of each cross-reference table entry.

prev (P)
A reference to the previous table. Note that this differs from the Prev entry in the PDF trailer dictionary, which contains the file location of the previous cross-reference table.



$p->release()

Releases ALL of the memory used by the PDF document and all of its component objects. After calling this method, do NOT expect to have anything left in the PDF::API2::Basic::PDF::File object (so if you need to save, be sure to do it before calling this method).

NOTE: It is important that you call this method on any PDF::API2::Basic::PDF::File object when you wish to destroy it and free up its memory. Internally, PDF files have an enormous number of cross-references, and this causes circular references within the internal data structures. Calling 'release()' forces a brute-force cleanup of the data structures, freeing up all of the memory. Once you've called this method, though, don't expect to be able to do anything else with the PDF::API2::Basic::PDF::File object; it'll have no internal state whatsoever.
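The reason a manual cleanup is needed can be shown with plain Perl reference counting. The following is a sketch only: the Node class and make_cycle are hypothetical and do not reflect how the module stores its objects. It shows that two objects referring to each other are not freed when they go out of scope, unless the cycle is broken by hand first.

```perl
use strict;
use warnings;

my $destroyed = 0;    # counts objects that Perl has actually freed

package Node;         # hypothetical class, for illustration only
sub new     { my ($class) = @_; return bless {}, $class }
sub DESTROY { $destroyed++ }

package main;

# Two objects that refer to each other form a cycle.
sub make_cycle {
    my ($x, $y) = (Node->new, Node->new);
    $x->{peer} = $y;
    $y->{peer} = $x;
    return $x;
}

# Case 1: drop the only outside reference without cleaning up.
{
    my $leaked = make_cycle();
}
my $after_leak = $destroyed;       # nothing has been freed yet

# Case 2: break the cycle by hand first, as a release-style method would.
{
    my $obj  = make_cycle();
    my $peer = delete $obj->{peer};
    delete $peer->{peer};
}
my $after_release = $destroyed;    # both objects were freed

print "freed after leak: $after_leak, freed after manual cleanup: $after_release\n";
```

The objects from the first case stay allocated until the interpreter exits, which is the kind of leak that calling release() is meant to avoid.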

Developer note: As part of the brute-force cleanup done here, this method will throw a warning message whenever unexpected key values are found within the PDF::API2::Basic::PDF::File object. This is done to help ensure that any unexpected and unfreed values are brought to your attention so that you can bug us to keep the module updated properly; otherwise the potential for memory leaks due to dangling circular references will exist.


Appends the objects queued for output to the read file and then appends the appropriate trailer.

($value, $str) = $p->readval($str, %opts)

Reads a PDF value from the current position in the file. If $str is too short, more data is read from the current location in the file until the whole object has been read. This is a recursive call which may slurp in a whole big stream (unprocessed).

Returns the recursive data structure read and also the current $str that has been read from the file.
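The shape of this call (take a string, return a value plus the remaining string, and recurse for nested values) can be sketched as follows. This is not the module's parser: read_value is a hypothetical function that handles only integers, names and arrays, and it omits the step of reading more data from the file when the string runs out.

```perl
use strict;
use warnings;

# Hypothetical mini-reader: handles integers, /Names and [arrays] only.
# Returns the parsed value and the unconsumed remainder of the string.
sub read_value {
    my ($str) = @_;
    $str =~ s/^\s+//;                          # skip leading whitespace

    if ($str =~ s/^\[//) {                     # array: recurse until ']'
        my @items;
        while (1) {
            $str =~ s/^\s+//;
            last if $str =~ s/^\]//;
            die "unterminated array\n" if $str eq '';
            (my $item, $str) = read_value($str);
            push @items, $item;
        }
        return (\@items, $str);
    }
    if ($str =~ s/^\/([^\s\[\]\/]+)//) {       # name such as /Type
        return ("/$1", $str);
    }
    if ($str =~ s/^([+-]?\d+)//) {             # integer
        return ($1 + 0, $str);
    }
    die "unrecognised input: $str\n";
}

my ($value, $rest) = read_value('[1 [/Two 3] 4] trailing');
print "top-level items: ", scalar(@$value), "\n";
print "unconsumed text: '$rest'\n";
```

Returning the leftover string is what lets each recursive call continue where the previous one stopped.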

$p->copy($outpdf, \&filter)

Iterates over every object in the file, reading the object, calling filter with the object, and outputting the result. If filter is not defined, the input is simply copied to the output.
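The filter pattern described here can be sketched with arrays standing in for the input and output files. copy_objects and the object hashes below are hypothetical and are not the class's real storage or method.

```perl
use strict;
use warnings;

# Hypothetical stand-in for copy(): @$objects plays the input file and
# @$out the output file.
sub copy_objects {
    my ($objects, $out, $filter) = @_;
    $filter = sub { return $_[0] } unless defined $filter;    # default: copy as-is
    for my $obj (@$objects) {
        push @$out, $filter->($obj);
    }
    return scalar @$out;
}

my @input = ({ num => 1, type => 'Page' }, { num => 2, type => 'Font' });

# No filter: objects pass through unchanged.
my @plain;
copy_objects(\@input, \@plain);

# With a filter: each object is replaced by whatever the filter returns.
my @tagged;
copy_objects(\@input, \@tagged, sub {
    my ($obj) = @_;
    return { %$obj, seen => 1 };    # return a modified copy
});

print "copied ", scalar(@plain), " objects, filtered ", scalar(@tagged), "\n";
```

In this sketch the filter returns a modified copy, so the input objects are left untouched.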

$offset = $p->locate_obj($num, $gen)

Returns the file offset of the requested object by following the chain of cross-reference tables until it finds the one that contains it.
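Following the chain can be sketched as a simple loop over linked tables. The layout below is an assumption made for illustration: each table is a hash whose ' xref' entry maps an object number to [offset, generation] and whose ' prev' entry points to the older table. find_offset is a hypothetical function, not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical trailer chain, newest first. Each ' xref' maps an
# object number to [file offset, generation].
my $oldest = { ' xref' => { 1 => [15, 0], 2 => [120, 0] }, ' prev' => undef };
my $newest = { ' xref' => { 2 => [900, 1] }, ' prev' => $oldest };

# Walk from the newest table back through ' prev' until a match is found.
sub find_offset {
    my ($table, $num, $gen) = @_;
    while (defined $table) {
        my $entry = $table->{' xref'}{$num};
        return $entry->[0] if defined $entry && $entry->[1] == $gen;
        $table = $table->{' prev'};
    }
    return;    # not found in any table
}

my $updated  = find_offset($newest, 2, 1);    # found in the newest table
my $original = find_offset($newest, 1, 0);    # found only in the older table
print "object 2 gen 1 at $updated, object 1 gen 0 at $original\n";
```

Searching the newest table first means an updated object takes precedence over the version recorded in an earlier table.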


Martin Hosken Martin_Hosken@sil.org

Copyright Martin Hosken 1999 and onwards

No warranty or expression of effectiveness, least of all regarding anyone's safety, is implied in this software or documentation.


This Perl Text::PDF module is licensed under the Perl Artistic License.
