PDF::Core - Core Library for PDF library


  use PDF::Core;
  $pdf = PDF::Core->new;
  $res = $pdf->GetObject($ref);
  $name = UnQuoteName($pdfname);
  $string = UnQuoteString($pdfstring);
  $pdfname = QuoteName($name);
  $pdfhexstring = QuoteHexString($string);
  $pdfstring = QuoteString($string);
  $obj = PDFGetPrimitive (filehandle, \$offset);
  $line = PDFGetLine (filehandle, \$offset);


The main purpose of the PDF::Core library is to provide the data structure and the constructor for the more general PDF library.

Helper functions

These functions are not part of the class, but perform useful services.

UnQuoteName ( string )

This function processes quoted characters in a PDF-name. PDF-names returned by GetObject are already processed by this function.

Returns a string.
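As a rough illustration of what processing quoted characters in a name involves, the sketch below decodes the "#xx" escapes (two hex digits) that PDF names have used since PDF 1.2. The helper unquote_name_sketch is hypothetical and is not the library's UnQuoteName, whose exact behaviour may differ.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of PDF::Core: replace each "#xx"
# escape (two hex digits) in a PDF name with the character it encodes.
sub unquote_name_sketch {
    my ($name) = @_;
    $name =~ s/#([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $name;
}

print unquote_name_sketch('/Lime#20Green'), "\n";   # prints "/Lime Green"
```

Names without escapes, such as /MediaBox, pass through unchanged.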

UnQuoteString ( string )

This function extracts the text from PDF-strings and PDF-hexstrings. It processes all quoted characters and removes the enclosing delimiters (parentheses or angle brackets).

WARNING: The current version doesn't handle Unicode strings properly.

Returns a string.
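To make the unquoting step concrete, here is a minimal sketch for parenthesised PDF-strings only. It assumes the standard backslash escapes (\n, \r, \t, \b, \f, octal codes, escaped parentheses and backslashes, and backslash-newline as a line continuation). The helper unquote_string_sketch is hypothetical, is not the library's UnQuoteString, and ignores Unicode just as the warning above says.

```perl
use strict;
use warnings;

# Single-letter escapes; "\b" is backspace inside a double-quoted Perl string.
my %escape = (n => "\n", r => "\r", t => "\t", b => "\b", f => "\f");

# Hypothetical helper, not part of PDF::Core.
# Each backslash sequence is replaced as follows:
#   1-3 octal digits    -> the character with that code
#   an end-of-line      -> nothing (line continuation)
#   n, r, t, b, f       -> the matching control character
#   anything else       -> the character itself (covers parentheses and backslash)
sub unquote_string_sketch {
    my ($s) = @_;
    $s =~ s/^\(//;      # drop the opening parenthesis
    $s =~ s/\)\z//;     # drop the closing parenthesis
    $s =~ s{\\([0-7]{1,3}|\r?\n|.)}{
        my $e = $1;
          $e =~ /^[0-7]/     ? chr(oct($e) & 0xFF)
        : $e =~ /\n/         ? ''
        : exists $escape{$e} ? $escape{$e}
        :                      $e;
    }gse;
    return $s;
}

print unquote_string_sketch('(two \(2\) words)'), "\n";   # prints "two (2) words"
```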

QuoteName ( string )

This function quotes problematic characters in a PDF-name. This function should be used before writing a PDF-name back to a PDF-file.

Returns a string.
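The reverse direction can be sketched in the same spirit. This example assumes byte strings and the PDF 1.2 rule that characters outside the printable range "!" to "~", the delimiter characters and "#" itself are written as "#xx". The helper quote_name_sketch is hypothetical; the set of characters the library's QuoteName treats as problematic may differ.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of PDF::Core: write unsafe characters
# of a PDF name as "#xx". Assumes a byte string (character codes 0-255).
sub quote_name_sketch {
    my ($name) = @_;
    # Keep an optional leading slash out of the escaping step.
    my ($slash, $body) = $name =~ m{\A(/?)(.*)\z}s;
    $body =~ s/([^!-~]|[#()<>\[\]{}\/%])/sprintf('#%02X', ord($1))/ge;
    return $slash . $body;
}

print quote_name_sketch('/Lime Green'), "\n";   # prints "/Lime#20Green"
```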

QuoteHexString ( string )

This function translates a string into a PDF-hexstring.

Returns a string.
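For orientation, a PDF-hexstring is the byte string written as hex digits between angle brackets. The sketch below shows a round trip with Perl's built-in pack and unpack. Both helpers are hypothetical and are not the library's functions; the decoder relies on the PDF rules that whitespace inside a hexstring is ignored and that a missing final digit counts as 0.

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of PDF::Core.

# Encode a byte string as <HEXDIGITS>.
sub to_hexstring_sketch {
    my ($bytes) = @_;
    return '<' . uc(unpack('H*', $bytes)) . '>';
}

# Decode <HEXDIGITS> back into a byte string.
sub from_hexstring_sketch {
    my ($hex) = @_;
    $hex =~ s/^<|>$//g;                 # strip the angle brackets
    $hex =~ s/\s+//g;                   # whitespace is not significant
    $hex .= '0' if length($hex) % 2;    # odd number of digits: pad with 0
    return pack('H*', $hex);
}

print to_hexstring_sketch('HELLO'), "\n";                 # prints "<48454C4C4F>"
print from_hexstring_sketch('<48 45 4c4C4 F>'), "\n";     # prints "HELLO"
```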

QuoteString ( string )

This function translates a string into a PDF-string. Problematic characters are quoted.

WARNING: The current version doesn't handle Unicode strings properly.

Returns a string.
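A minimal sketch of the quoting idea, assuming that only the backslash and the two parentheses need escaping in a parenthesised PDF-string. The helper quote_string_sketch is hypothetical; the library's QuoteString may quote additional characters.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of PDF::Core: escape backslashes and
# parentheses, then wrap the text in parentheses.
sub quote_string_sketch {
    my ($text) = @_;
    $text =~ s/([\\()])/\\$1/g;
    return "($text)";
}

print quote_string_sketch('two (2) words'), "\n";   # prints "(two \(2\) words)"
```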

PDFGetPrimitive ( filehandle, offset )

This internal function is used while parsing a PDF-file. Unless you are writing extensions for this library and need to parse some special parts of the PDF-file, stay away and use GetObject instead.

This function has many quirks and limitations. Check the source for details.

PDFGetline ( filehandle, offset )

This internal function was used to read a line from a PDF-file. It has many limitations, and you should stay away from it unless you know what you are doing. Use GetObject or PDFGetPrimitive instead.


new ( [ filename ] )

This is the constructor of a new PDF object. If the filename is missing, it returns an empty PDF descriptor (which can be filled with $pdf->TargetFile). Otherwise, it acts as the PDF::Parse::TargetFile method.


The available methods are:

GetObject (reference)

This method returns the PDF-object for reference. The string reference must match the regular expression /^\d+ \d+ R$/, where the first number is the object number and the second number is the generation number.

The return value is a PDF-primitive, the type depends on the content of the object:

The object could not be found or an error occurred. Not all referenced objects need to be present in a PDF-file. This value can be ignored.

Hash Reference
If UNIVERSAL::isa ($retval, "HASH") is true, the object is a PDF-dictionary. The keys of the hash should be either a PDF name (e.g. /MediaBox) or a generated value like Stream_Offset. Everything else is an error.

The values of the hash can be any PDF-primitive, including PDF-arrays and other dictionaries.

This is the most common value returned by GetObject. If the key Stream_Offset exists, the dictionary is followed by stream data, starting at the file offset indicated by this value.

Array Reference
If UNIVERSAL::isa ($retval, "ARRAY") is true, the object is a PDF-array. Each element may be of a different type, and may contain further references to arrays or any other PDF-primitive.

String matching /^\d+ \d+ R$/
This is a reference to another PDF-object. This value can be passed to GetObject. This kind of value may appear in place of most other types. Some PDF-writing programs seem to take special pleasure in writing references where a simple number is expected. If the final number is needed, use code like this to resolve references:

  while ($len =~ m/^\d+ \d+ R$/) { $len = $self->GetObject ($len); }

Example: 22 0 R

String matching /^\//
This is a name in a PDF dictionary. This string is already processed by UnQuoteName and may differ from the value in the PDF-file. In some very old and strange non-standard PDF-files, this may lead to confusion.

Example: /MediaBox

String matching /^\(.*\)$/
This is a string. It may contain newlines, quoted characters and other strange stuff. Use PDF::UnQuoteString to extract the text.

Example: (This is\na string with two \(2\) lines.)

String matching /^<.*>$/
This is a hex encoded string. Use PDF::UnQuoteString to extract the text.

Example: <48 45 4c4C4 F1c>

String matching /^[\d.\+\-]+$/
This is probably a number.

Example: 611

String matching none of the above
This is either a PDF bareword (e.g. true, false, ...) or a value generated by this method like Stream_Offset.

Example: true
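The tests listed above can be applied in order to decide what kind of value GetObject returned. The following sketch does that with a hypothetical helper, classify_primitive, which is not part of the library. It assumes that a missing object is reported as an undefined value, and it adds the /s modifier to the two string patterns because strings may contain newlines.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of PDF::Core: name the kind of a value
# by applying the documented tests in the documented order.
sub classify_primitive {
    my ($value) = @_;
    return 'missing'    if !defined $value;    # assumption: "not found" is undef
    return 'dictionary' if UNIVERSAL::isa($value, 'HASH');
    return 'array'      if UNIVERSAL::isa($value, 'ARRAY');
    return 'reference'  if $value =~ m/^\d+ \d+ R$/;
    return 'name'       if $value =~ m/^\//;
    return 'string'     if $value =~ m/^\(.*\)$/s;
    return 'hexstring'  if $value =~ m/^<.*>$/s;
    return 'number'     if $value =~ m/^[\d.\+\-]+$/;
    return 'bareword';                         # true, false, generated values
}

print classify_primitive('22 0 R'), "\n";      # prints "reference"
print classify_primitive('/MediaBox'), "\n";   # prints "name"
print classify_primitive([ 0, 0 ]), "\n";      # prints "array"
```

The order matters: the reference test has to run before the number and bareword tests, since a reference also starts with digits.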

To improve performance, GetObject uses an internal cache for objects. Repeated requests for the same object are not read from the file but satisfied from the cache. The caching mechanism can be turned off with the variable $PDF::Core::UseObjectCache.


Special care must be taken when returned objects are modified. If the object contains sub-objects, the sub-objects are not duplicated, and all changes affect all other copies of this object. Make your own copy if you need to modify those values.
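One way to get an independent copy is dclone from the core Storable module. The sketch below uses a hand-written hash as a stand-in for a dictionary returned by GetObject (the data is made up for illustration) and contrasts a deep copy with a shallow one.

```perl
use strict;
use warnings;
use Storable qw(dclone);

# Stand-in for a dictionary returned by GetObject (hypothetical data).
my $cached = { '/Type' => '/Page', '/MediaBox' => [ 0, 0, 612, 792 ] };

my $shallow = { %$cached };      # copies only the top-level hash
my $deep    = dclone($cached);   # copies nested arrays and hashes too

$deep->{'/MediaBox'}[2]    = 595;   # leaves $cached untouched
$shallow->{'/MediaBox'}[3] = 842;   # also changes $cached: the array is shared

print "@{ $cached->{'/MediaBox'} }\n";   # prints "0 0 612 842"
```

The shallow copy shows the problem described above: its /MediaBox entry still points at the same array as the original.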


Available variables are:

Contains the version of the installed library.

$PDF::Core::UseObjectCache
If this variable is true, all processed objects will be added to the object cache. If only the header information of a PDF is read, or very big PDFs are processed, turning off the cache reduces memory usage.


  Copyright (c) 1998 - 2000 Antonio Rosella Italy antro@tiscalinet.it, Johannes Blach dw235@yahoo.com

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.


The latest version of this library is likely to be available from:

