Data::Dump::Streamer - Accurately serialize a data structure as Perl code.
    use Data::Dump::Streamer;
    use DDS;                           # optionally installed alias

    Dump($x,$y);                       # Prints to STDOUT
    Dump($x,$y)->Out();                #   "          "

    my $o=Data::Dump::Streamer->new(); # Returns a new ...
    my $o=Dump();                      # ... uninitialized object.

    my $o=Dump($x,$y);                 # Returns an initialized object
    my $s=Dump($x,$y)->Out();          #  "  a string of the dumped obj
    my @l=Dump($x,$y);                 #  "  a list of code fragments
    my @l=Dump($x,$y)->Out();          #  "  a list of code fragments

    Dump($x,$y)->To(\*STDERR)->Out();  # Prints to STDERR

    Dump($x,$y)->Names('foo','bar')    # Specify Names
               ->Out();

    Dump($x,$y)->Indent(0)->Out();     # No indent

    Dump($x,$y)->To(\*STDERR)          # Output to STDERR
               ->Indent(0)             # ... no indent
               ->Names('foo','bar')    # ... specify Names
               ->Out();                # Print...

    $o->Data($x,$y);                   # OO form of what Dump($x,$y) does.
    $o->Names('Foo','Names');          #  ...
    $o->Out();                         #  ...
Given a list of scalars or reference variables, writes out their contents in perl syntax. The references can also be objects. The contents of each variable are output using as few Perl statements as is convenient, usually only one. Self-referential structures, closures, and objects are output correctly.
The return value can be evaled to get back an identical copy of the original reference structure. In some cases this may require the use of utility subs that Data::Dump::Streamer will optionally export.
This module is very similar in concept to the core module Data::Dumper, with the major differences being that this module is designed to output to a stream instead of constructing its output in memory (trading speed for memory), and that the traversal over the data structure is effectively breadth first versus the depth first traversal done by the others.
In fact the data structure is scanned twice, first in breadth first mode to perform structural analysis, and then in depth first mode to actually produce the output, but obeying the depth relationships of the first pass.
As of version 1.11 DDS has had the ability to dump closures properly. This means that the lexicals that are bound to the closure are dumped along with the subroutine that uses them. This makes it much easier to debug code that uses closures and to a certain extent provides a persistency framework for closure based code. The way this works is that DDS figures out what all the lexicals are that are bound to the CODE refs it is dumping and then pretends that it had originally been called with all of them as its arguments (along with the original arguments as well, of course).
One consequence of the way the dumping process works is that all of the recreated subroutines will be in the same scope. This of course can lead to collisions as two subroutines can easily be bound to different variables that have the same name.
The way that DDS resolves these collisions is that it renames one of the
variables with a special name so that presumably there are no collisions.
However this process is very simplistic with no checks to prevent
collisions with other lexicals or other globals that may be used by other
dumped code. In some situations it may be necessary to change the default
value of the rename template which may be done by using the EclipseName
method.
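The collision described above comes from ordinary closure behaviour, which a few lines of plain Perl can show. This is only an illustrative sketch; make_counter and the variable names are hypothetical and are not part of Data::Dump::Streamer.

```perl
use strict;
use warnings;

# Hypothetical factory: every call creates a fresh lexical named $count.
sub make_counter {
    my $count = shift;
    return sub { return $count++ };
}

my $from_one = make_counter(1);
my $from_ten = make_counter(10);

# Both subs use the name "$count", yet each is bound to its own variable.
print $from_one->(), " ", $from_ten->(), "\n";
print $from_one->(), " ", $from_ten->(), "\n";
```

If both subs were recreated in a single scope, the two variables could no longer share the name $count, which is why one of them has to be renamed in the dump.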
Similar to the problem of colliding lexicals is the problem of colliding lexicals and globals. DDS pays no attention to globals when dumping closures, which can potentially result in lexicals being declared that will eclipse their global namesake. There is currently no way around this other than to avoid accessing a global and a lexical with the same name from the subs being dumped. An example is

    my $a = sub { $a++ };
    Dump( sub { $a->() } );

which will not be dumped correctly. Generally speaking this kind of thing is bad practice anyway, so this should probably be viewed as a "feature". :-)
Generally, if the closures being dumped avoid accessing lexicals and globals with the same name from out of scope, and all of the CODE being dumped avoids vars with the EclipseName in their names, the dumps should be valid and should eval back into existence properly.
Note that the behaviour of dumping closures is subject to change in future versions, as it's possible that I will put some additional effort into more sophisticated ways of avoiding name collisions in the dump.
While Data::Dump::Streamer is at heart an object oriented module, it is expected (based on experience with using Data::Dumper) that the common case will not exploit these features. Nevertheless the method based approach is convenient, and accordingly a compromise hybrid approach has been provided via the Dump() subroutine, such as:

    Dump($foo);
    $as_string= Dump($foo)->Out();
All attribute methods are designed to be chained together. This means that when used as set attribute (called with arguments) they return the object they were called against. When used as get attributes (called without arguments) they return the value of the attribute.
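The get/set convention described above can be sketched in plain Perl. The class My::Chainable and its two accessors are hypothetical and only mimic the calling style; they are not the module's own implementation.

```perl
use strict;
use warnings;

# Hypothetical class, for illustration only.
package My::Chainable;

sub new { return bless { indent => 2, names => [] }, shift }

# With arguments: set the attribute and return the object (chainable).
# Without arguments: return the current value.
sub Indent {
    my $self = shift;
    if (@_) {
        $self->{indent} = shift;
        return $self;
    }
    return $self->{indent};
}

sub Names {
    my $self = shift;
    if (@_) {
        $self->{names} = [@_];
        return $self;
    }
    return @{ $self->{names} };
}

package main;

my $obj = My::Chainable->new->Indent(0)->Names('foo', 'bar');
print $obj->Indent, "\n";               # get form
print join(',', $obj->Names), "\n";     # get form
```

Returning the object from the set form is what lets several settings be strung together in one expression.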
From an OO point of view the key methods are the Data() and Out() methods. These correspond to the breadth first and depth first traversal, and need to be called in this order. Some attributes must be set prior to the Data() phase and some need only be set before the Out() phase.
Attributes once set last the lifetime of the object, unless explicitly reset.
This module provides hooks to allow objects to override how they are represented. The basic idea is that a subroutine (or method) is provided which is responsible for the override. The return of the method governs how the object will be represented when dumped, and how it will be restored. The basic calling convention is
    my ( $proxy, $thaw, $postop )= $callback->($obj);
    #or                          = $obj->$method();
The Freezer() method controls what methods to use as a default method and also allows per class overrides. When dumping an object of a given class for the first time, it tries to execute the class specific handler if one is specified, then the user specific generic handler if one has been specified, and then "DDS_freeze". This means that class authors can implement a DDS_freeze() method and their objects will automatically be serialized as necessary. Note that if either the class specific or generic handler is defined but false, DDS_freeze() will not be used even if it is present.
The interface of the Freezer() handler in detail is as follows:
$obj
$proxy
$obj. It may be one of the following values:
undef (first time only)
$proxy is taken to mean that it should be ignored. It's like saying IgnoreClass(ref($obj)); Note that undef has a special meaning when the callback is called the first time.
do{} to wrap multistatement code.
$thaw
$proxy representation into the real thing. It is only relevant when $proxy is a reference.
/^(->)?((?:\w*::)\w+)(\(\))?$/ in which case it is taken as a sub name when the string ends in () and a method name when the string doesn't. If the -> is present then the sub or method is called inline. If it is not then the sub or method is called after the main dump.
$_ to the variable in question.
$postdump
$thaw but is called in process instead of being emitted as part of the dump. Any return is ignored. It is only relevant when $proxy is a reference.
An example DDS_freeze method is one I had to put together for an object which contained a key whose value was a ref to an array tied to the value of another key. Dumping this got crazy, so I wanted to suppress dumping the tied array. I did it this way:

    sub DDS_freeze {
        my $self=shift;
        delete $self->{'tie'};
        return ($self,'->fix_tie','fix_tie');
    }

    sub fix_tie {
        my $self=shift;
        if ( ! $self->{'tie'} ) {
            $self->{str}="" unless defined $self->{str};
            tie my @a, 'Tie::Array::PackedC', $self->{str};
            $self->{'tie'} = \@a;
        }
        return $self;
    }
The $postop means the object is relatively unaffected after the dump, the $thaw says that we should also include the method inline as we dump. An example dump of an object like this might be
$Foo1=bless({ str=>'' },'Foo')->fix_tie();
Whereas if we omit the -> then we would get:

    $Foo1=bless({ str=>'' },'Foo');
    $Foo1->fix_tie();

In our example it wouldn't actually make a difference, but the former style can be nicer to read if the object is embedded in another. However the arrow notation is slightly more dangerous, in that it's possible that the internals of the object will not be fully linked when the method is evaluated. The second form guarantees that the object will be fully linked when the method is evaluated.
See Controlling Hash Traversal and Display Order for a different way to control the representation of hash based objects.
When dumping a hash you may control the order the keys will be output and which keys will be included. The basic idea is to specify a subroutine which takes a hash as an argument and returns a reference to an array containing the keys to be dumped.
You can use the KeyOrder() routine or the SortKeys() routine to specify the sorter to be used.
The routine will be called in the following way:
    ( $key_array, $thaw ) = $sorter->($hash,($pass=0),$addr,$class);
    ( $key_array,)        = $sorter->($hash,($pass=1),$addr,$class);
$hash is the hash to be dumped, $addr is the refaddr() of the $hash, and $class will be set if the hash has been blessed.
When $pass is 0 the $thaw variable may be supplied as well as the keyorder. If it is defined then it specifies what thaw action to perform after dumping the hash. See $thaw in Controlling Object Representation for details as to how it works. This allows an object to define those keys needed to recreate itself properly, and a followup hook to recreate the rest.
Note that if a Freezer() method is defined and returns a $thaw then the $thaw returned by the sorter will override it.
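A sorter following the calling convention above might look like the sketch below. The sub, the sample hash and the rule of skipping keys with a leading underscore are all assumptions made for illustration; only the argument list and the array-ref return value come from the description above.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Hypothetical sorter: receives the hash, the pass number, the refaddr and
# the class (if blessed), and returns a reference to an array of keys.
my $sorter = sub {
    my ($hash, $pass, $addr, $class) = @_;
    # Assumption of this sketch: keys starting with "_" are private and
    # are left out; the rest are returned in alphabetical order.
    my @keys = sort grep { !/^_/ } keys %$hash;
    return \@keys;
};

my %config = ( name => 'demo', _cache => {}, size => 3 );

# Call it by hand the way the first pass would.
my ($key_array) = $sorter->( \%config, 0, refaddr(\%config), undef );
print "@$key_array\n";
```

Per the text above, a sub like this would be registered through KeyOrder() or SortKeys(); this sketch returns no $thaw, so no thaw action is requested.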
By default Data::Dump::Streamer will "run length encode" array values. This means that when an array value is simple (i.e., it is not referenced and does not contain a reference) and is repeated multiple times, the output will be a single list multiplier statement, and not each item output separately. Thus Dump([0,0,0,0]) will be output something like
$ARRAY1 = [ (0) x 4 ];
This is particularly useful when dealing with large arrays that are only partly filled, and when accidentally the array has been made very large, such as with the improper use of pseudo-hash notation.
To disable this feature you may set the Rle() property to FALSE. By default it is enabled and set to TRUE.
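The idea behind the list multiplier output can be sketched in plain Perl. The helper rle_list below is hypothetical and is not the module's own algorithm; it assumes simple, defined, non-reference numbers.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse runs of equal simple values into
# "(value) x count" terms and return the result as Perl source text.
sub rle_list {
    my @values = @_;
    my @terms;
    my $i = 0;
    while ( $i < @values ) {
        my $j = $i;
        $j++ while $j < @values && $values[$j] eq $values[$i];
        my $count = $j - $i;
        push @terms, $count > 1 ? "($values[$i]) x $count" : $values[$i];
        $i = $j;
    }
    return '[ ' . join( ', ', @terms ) . ' ]';
}

my $code = rle_list( 1, 0, 0, 0, 2 );
print "$code\n";

# The generated text evals back to the original list.
my $back = eval $code;
print "round trip ok\n" if "@$back" eq "1 0 0 0 2";
```

The saving grows with the length of each run, which is why the feature matters most for large, sparsely filled arrays.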
It's possible to have an alias to Data::Dump::Streamer created and installed for easier usage in one liners and short scripts. Data::Dump::Streamer is a bit long to type sometimes. However because this technically means polluting the root level namespace, and having it listed on CPAN, I have elected to have the installer not install it by default. If you wish it to be installed you must explicitly state so when Makefile.PL is run:

    perl Makefile.PL DDS [Other MakeMaker options]
Then a normal 'make test, make install' invocation will install DDS.
Using DDS is identical to Data::Dump::Streamer.
You can also specify an alias at use-time, then use that alias in the rest of your program, thus avoiding the permanent (but modest) namespace pollution of the previous method.
    use Data::Dump::Streamer as => 'DDS';

    # or if you prefer
    use Data::Dump::Streamer;
    import Data::Dump::Streamer as => 'DDS';

You can use any alias you like, but that doesn't mean you should. Folks doing as => 'DBI' will be mercilessly ridiculed.
If PadWalker 1.0 is installed you can use DumpLex() to try to automatically determine the names of the vars being dumped. As long as the vars being dumped have my or our declarations in scope the vars will be correctly named. PadWalker will also be used instead of the B:: modules when dumping closures when it is available.
For drop in compatibility with the Dumper() usage of Data::Dumper, you may request that the Dumper() method is exported. It will not be exported by default. In addition the standard Data::Dumper::Dumper() may be exported on request as DDumper. If you provide the tag :Dumper then both will be exported.
See Dump() for a better way to do things.
This routine behaves very differently depending on the context it is called in and whether arguments are provided.
If called with no arguments it is exactly equivalent to calling

    Data::Dump::Streamer->new()

which means it returns an object reference.
If called with arguments and in scalar context it is equivalent to calling

    Data::Dump::Streamer->new()->Data(@vals)

except that the actual breadth first traversal is delayed until Out() is called. This means that options that must be provided before the Data() phase can be provided after the call to Dump(). Again, it returns an object reference.
If called with arguments and in void or list context it is equivalent to calling

    Data::Dump::Streamer->new()->Data(@vals)->Out()

The reason this is true in list context is to make

    print Dump(...),"\n";

do the right thing. Combined with method chaining, this also means options can be added or removed as required quite easily and naturally.
So to put it short:
    my $obj=Dump($x,$y);         # Returns an object
    my $str=Dump($x,$y)->Out();  # Returns a string of the dump.
    my @code=Dump($x,$y);        # Returns a list of the dump.

    Dump($x,$y);                 # prints the dump.
    print Dump($x,$y);           # prints the dump.
It should be noted that the setting of $\ will affect the behaviour of both of

    Dump($x,$y);
    print Dump($x,$y);

but it will not affect the behaviour of

    print scalar Dump($x,$y);
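The context-dependent behaviour described above rests on Perl's wantarray. The sketch below mimics it with a hypothetical class My::Dumpish and a hypothetical Dumpish() sub; it is not the module's code and the "$VAR = ..." output format is invented for the example.

```perl
use strict;
use warnings;

# Hypothetical stand-in class, for illustration only.
package My::Dumpish;

sub new { my ($class, @vals) = @_; return bless { vals => [@vals] }, $class }

sub Out {
    my $self = shift;
    my @code = map { "\$VAR = $_;\n" } @{ $self->{vals} };
    return @code          if wantarray;           # list context: fragments
    return join '', @code if defined wantarray;   # scalar context: one string
    print @code;                                  # void context: print
    return;
}

package main;

sub Dumpish {
    my $obj = My::Dumpish->new(@_);
    if ( !defined wantarray ) { $obj->Out; return }   # void: print now
    return wantarray ? $obj->Out : $obj;              # list: fragments, scalar: object
}

Dumpish(1, 2);                  # void context, prints both statements
my @list = Dumpish(1, 2);       # list of code fragments
my $o    = Dumpish(1, 2);       # object; nothing output yet
my $str  = $o->Out;             # string of the whole dump
```

wantarray returns undef in void context, false in scalar context and true in list context, which is enough to tell the three cases apart.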
Note: As of 1.11 Dump also works as a method, with identical properties as when called as a subroutine, with the exception that when called with no arguments it is a synonym for Out(). Thus

    $obj->Dump($foo)->Names('foo')->Out();

will work fine, as will the odd looking:

    $obj->Dump($foo)->Names('foo')->Dump();

which are both the same as

    $obj->Names('foo')->Data($foo)->Out();

Hopefully this should make method use more or less DWIM.
If called with arguments then the internal object state is reset before scanning the list of arguments provided.
If called with no arguments then whatever arguments were provided to Dump()
will be scanned.
Returns $self.
Data() and then printed, if called with no values then whatever was scanned last with Data() or Dump() is printed.
If the To() attribute was provided then will dump to whatever object was specified there (any object, including filehandles that accept the print() method), and will always return $self.
If the To() attribute was not provided then will use an internal printing object, returning either a list or scalar or printing to STDOUT in void context.
This routine is virtually always called without arguments as the last method in the method chain.
    Dump->Arguments(1)->Out(@vars);
    $obj->Data(@vars)->Out();
    Dump(@vars)->Out;
    Data::Dump::Streamer->Out(@vars);
All should DWIM.
If no names are provided then names are generated automatically based on the type of object being dumped, with abbreviations applied to compound class names.
If called with arguments then returns the object itself, otherwise in list context returns the list of names in use, or in scalar context a reference or undef. In void context with no arguments the names are cleared.
NOTE: Must be called before Data() is called.
Purity(1) but more accurate.
When Purity() is set to FALSE aliases will be output with a function call wrapper of 'alias_to' whose argument will be the value the item is an alias to. This wrapper does nothing, and is only there as a visual cue. Likewise, 'make_ro' will be output when the value was readonly, and again the effect is cosmetic only.
If a filehandle is specified then it is used until it is explicitly changed, or the object is destroyed.
Defaults to False.
Default is Indent(2)
If indent is False then no indentation is done, and all optional whitespace is omitted. See OptSpace() for more details.
Defaults to True.
Newlines are appended to each statement regardless of this value.
If Indent() and Indentkeys are True then hashes with more than one key value pair are dumped such that the keys and values line up. Note however this means each key has to be quoted twice. Not advised for very large data structures. Additional logic may enhance this feature soon.
Defaults to True.
NOTE: Must be set before Data() is called.
If Indent is set to 0 then this value is automatically set to the empty string. When Indent is set back to a non zero value the old value will be restored if it has not been changed from the empty string in the intervening time.
TYPE_OR_OBJ may be a string representing a class, or '' for representing unblessed objects, or it may be a reference to a hash.
VALUE may be a string representing one of the built in sort mechanisms, or it may be a reference to a subroutine, or a method name if TYPE_OR_OBJ is not an object.
The built in sort mechanisms are 'alphabetical'/'lexical', 'numeric', 'smart'/'intelligent' and 'each'.
If VALUE is omitted returns the current value for the given type.
If TYPE_OR_OBJ is omitted or FALSE it defaults to '' which represents unblessed hashes.
See Controlling Hash Traversal and Display Order for more details.
$self->KeyOrder( "", @_ );
If Verbose is False then a simple placeholder saying 'A' or 'R' is provided. (In most situations perl requires a placeholder, and as such one is always provided, even if technically it could be omitted.)
This setting does not change the followup statements that fix up the structure, and does not result in a loss of accuracy, it just makes it a little harder to read. OTOH, it means dumps can be quite a bit smaller and less noisy.
Defaults to True.
NOTE: Must be set before Data() is called.
Defaults to True.
NOTE: Must be set before Data() is called.
CodeStub().
Caveat Emptor, dumping subroutine references is hardly a secure act, and it is provided here only for convenience.
Note using this routine is at your own risk as of DDS 1.11, how it interacts with the newer advanced closure dumping process is undefined.
"%s_eclipse_%d"
where the "%s" represents the name of the var being eclipsed, and the "%d" a counter to ensure all such mappings are unique.
new() when dumping a CODE ref. If passed a list of scalars the list is used as the arguments. If passed an array reference then this array is assumed to contain a list of arguments. If no arguments are provided returns an array ref of arguments in scalar context, and a list of arguments in list context.
Note using this routine is at your own risk as of DDS 1.11, how it interacts with the newer advanced closure dumping process is undefined.
Defaults to 'sub { Carp::confess "Dumped code stub!" }'
Defaults to 'do{ local *F; eval "format F =\nFormat Stub\n.\n"; *F{FORMAT} }'
Defaults to True.
x operator.
What this means is that if an array contains repeated elements then instead of outputting each and every one a list multiplier will be output. This means that considerably less space is taken to dump redundant data.
If ACTION is false it indicates that the given CLASS should not have any serialization hooks called.
If ACTION is a string then it is taken to be the method name that will be executed to freeze the object. CLASS->can(METHOD) must return true or the setting will be ignored.
If ACTION is a code ref it is executed with the object as the argument.
When called with no arguments returns in scalar context the generic serialization method (defaults to 'DDS_freeze'), in list context returns the generic serialization method followed by a list of pairs of Classname=>ACTION.
If the action executes a sub or method it is expected to return a list of three values:
    ( $proxy, $thaw, $postdump )=$obj->DDS_freeze();
See Controlling Object Representation for more details.
NOTE: Must be set before Data() is called.
If called with no args returns a list of items ignored (using the refaddr to represent objects). If called with a single argument returns whether that argument is ignored. If called with more than one argument then expects a list of pairs of object => is_ignored.
Returns $self when setting.
NOTE: Must be set before Data() is called.
    my $prelude_code=$compressor->();  # no arguments.
    my $code=$compressor->('string');  # string argument

The sub will be called with no arguments at the beginning of the dump to allow any require statements or similar to be added. During the dump the sub will be called with a single argument when compression is required. The code returned in this case is expected to be an EXPR that will evaluate back to the original string.
By default DDS will use Compress::Zlib in conjunction with MIME::Base64 to do compression and encoding, and exposes the 'usqz' subroutine for handling the decoding and decompression.
The abbreviated name was chosen as when using the default compressor every string will be represented by a string like
usqz('....')
Meaning that eight characters are required without considering the data itself. Likewise Base64 was chosen because it is a representation that is high-bit safe, compact and easy to quote. Escaped strings are much less efficient for storing binary data.
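A callback obeying the convention above could be sketched as follows. It is a hypothetical stand-in, not the default compressor: to stay within core modules it only Base64 encodes the string and performs no compression.

```perl
use strict;
use warnings;
use MIME::Base64 ();

# Hypothetical callback: Base64 only, no compression (an assumption made
# to keep the sketch self-contained).
my $compressor = sub {
    # No arguments: return prelude code for the start of the dump.
    return "use MIME::Base64 ();\n" unless @_;
    # One argument: return an EXPR that evaluates back to the string.
    my $encoded = MIME::Base64::encode_base64( $_[0], '' );
    return "MIME::Base64::decode_base64('$encoded')";
};

my $prelude_code = $compressor->();
my $code         = $compressor->("binary\0data");

# Evaluating prelude plus expression recovers the original string.
my $restored = eval "$prelude_code $code";
print "round trip ok\n" if $restored eq "binary\0data";
```

The Base64 alphabet contains no quote characters, so the encoded text can be placed inside single quotes without escaping.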
As mentioned in Verbose there is a notation used to make understanding the output easier. However at first glance it can probably be a bit confusing. Take the following example:
    my $x=1;
    my $y=[];
    my $array=sub{\@_ }->( $x,$x,$y );
    push @$array,$y,1;
    unshift @$array,\$array->[-1];
    Dump($array);

Which prints (without the comments of course):

    $ARRAY1 = [
                'R: $ARRAY1->[5]',        # resolved by fix 1
                1,
                'A: $ARRAY1->[1]',        # resolved by fix 2
                [],
                'V: $ARRAY1->[3]',        # resolved by fix 3
                1
              ];
    $ARRAY1->[0] = \$ARRAY1->[5];         # fix 1
    alias_av(@$ARRAY1, 2, $ARRAY1->[1]);  # fix 2
    $ARRAY1->[4] = $ARRAY1->[3];          # fix 3
The first entry, 'R: $ARRAY1->[5]', indicates that this slot in the array holds a reference to the currently undefined $ARRAY1->[5], and as such the value will have to be provided later in what the author calls 'fix' statements. The third entry, 'A: $ARRAY1->[1]', indicates that this element of the array is in fact the exact same scalar as exists in $ARRAY1->[1], or is in other words an alias to that variable. Again, this cannot be expressed in a single statement and so generates another, different, fix statement. The fifth entry, 'V: $ARRAY1->[3]', indicates that this slot holds a value (actually a reference value) that is identical to one elsewhere, but is currently undefined. In this case it is because the value it needs is the reference returned by the anonymous array constructor in the fourth element ($ARRAY1->[3]). Again this results in yet another different fix statement. If Verbose() is off then only an 'R', 'A' or 'V' tag is emitted, as a marker of some form is necessary.
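The effect of the 'R' and 'V' fix statements can be checked with plain Perl. This sketch uses bare placeholders and leaves out the alias fix, since that one needs alias_av(); the value 42 is just a probe chosen for the example.

```perl
use strict;
use warnings;

# The array as first emitted, with placeholders in the R and V slots.
# Slot 2 holds a plain copy here because fix 2 (the alias) is omitted.
my $ARRAY1 = [ 'R', 1, 1, [], 'V', 1 ];

$ARRAY1->[0] = \$ARRAY1->[5];   # fix 1: slot 0 now references slot 5
$ARRAY1->[4] = $ARRAY1->[3];    # fix 3: slot 4 shares the array ref in slot 3

${ $ARRAY1->[0] } = 42;         # writing through the reference ...
print $ARRAY1->[5], "\n";       # ... changes slot 5
print "shared\n" if $ARRAY1->[4] == $ARRAY1->[3];
```

Neither relationship can be written inside the initial array literal, because the target element does not exist until the literal has been built.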
All of this specialized behaviour can be bypassed by setting Purity() to FALSE, in which case the output will look very similar to what Data::Dumper outputs in low Purity setting.
In a later version I'll try to expand this section with more examples.
Data::Dumper is much faster than this module for many things. However IMO it is less readable, and definitely less accurate. YMMV.
By default exports the Dump() command. Or may export on request the same command as Stream(). A Data::Dumper::Dumper compatibility routine is provided via requesting Dumper, and access to the real Data::Dumper::Dumper routine is provided via DDumper. The latter two are exported together with the :Dumper tag.
Additionally there are a set of internally used routines that are exposed. These are mostly direct copies of routines from Array::RefElem, Lexical::Alias and Scalar::Util; however some, where marked, have had their semantics slightly changed, returning defined but false instead of undef for negative checks, or throwing errors on failure.
The following XS subs (and tagnames for various groupings) are exportable on request.
    :Dumper
          Dumper
          DDumper

    :undump              # Collection of routines needed to undump something
          alias_av            # aliases a given array value to a scalar
          alias_hv            # aliases a given hashes value to a scalar
          alias_ref           # aliases a scalar to another scalar
          make_ro             # makes a scalar read only
          lock_keys           # pass through to Hash::Util::lock_keys
          lock_keys_plus      # like lock_keys, but adds keys to those present
          lock_ref_keys       # like lock_keys but operates on a hashref
          lock_ref_keys_plus  # like lock_keys_plus but operates on a hashref
          dualvar             # make a variable with different string/numeric
                              # representation
          alias_to            # pretend to return an alias, used in low
                              # purity mode to indicate a value is actually
                              # an alias to something else.

    :alias               # all croak on failure
          alias_av(@Array,$index,$var);
          alias_hv(%hash,$key,$var);
          alias_ref(\$var1,\$var2);
          push_alias(@array,$var);

    :util
          blessed($var)           #undef or a class name.
          isweak($var)            #returns true if $var contains a weakref
          reftype($var)           #the underlying type or false but defined.
          refaddr($var)           #a references address
          refcount($var)          #the number of times a reference is referenced
          sv_refcount($var)       #the number of times a scalar is referenced.
          weak_refcount($var)     #the number of weakrefs to an object.
                                  #sv_refcount($var)-weak_refcount($var) is the true
                                  #SvREFCOUNT() of the var.
          looks_like_number($var) #if perl will think this is a number.

          regex($var)     # In list context returns the pattern and the modifiers,
                          # in scalar context returns the pattern in (?msix:) form.
                          # If not a regex returns false.
          readonly($var)  # returns whether the $var is readonly
          weaken($var)    # cause the reference contained in var to become weak.
          make_ro($var)   # causes $var to become readonly, returns the value of $var.
          reftype_or_glob # returns the reftype of a reference, or if its not
                          # a reference but a glob then the globs name
          refaddr_or_glob # similar to reftype_or_glob but returns an address
                          # in the case of a reference.
          globname        # returns an evalable string to represent a glob, or
                          # the empty string if not a glob.

    :all                 # (Dump() and Stream() and Dumper() and DDumper()
                         #  and all of the XS)
    :bin                 # (not Dump() but all of the rest of the XS)
By default exports only Dump(), DumpLex() and DumpVars(). Tags are provided for exporting 'all' subroutines, as well as 'bin' (not Dump()), 'util' (only introspection utilities) and 'alias' for the aliasing utilities. If you need to ensure that you can eval the results (undump) then use the 'undump' tag.
Code with this many debug statements is certain to have errors. :-)
Please report them with as much of the error output as possible.
Be aware that to a certain extent this module is subject to the whimsies of your local perl. The same code may not produce the same dump on two different installs and versions. Luckily these don't seem to pop up often.
Yves Orton, yves at cpan org.
Copyright (C) 2003-2005 Yves Orton
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
Contains code derived from works by Gisle Aas, Graham Barr, Jeff Pinyan, Richard Clamp, and Gurusamy Sarathy.
Thanks to Dan Brook, Yitzchak Scott-Thoennes, eric256, Joshua ben Jore, Jim Cromie, Curtis ``Ovid'' Poe and anybody that I've forgotten for patches, feedback and ideas.
the Data::Dumper manpage - the mother of them all
the Data::Dumper::Simple manpage - Auto named vars with source filter interface.
the Data::Dumper::Names manpage - Auto named vars without source filtering.
the Data::Dumper::EasyOO manpage - easy to use wrapper for DD
the Data::Dump manpage - Has cool feature to squeeze data
the Data::Dump::Streamer manpage - The best perl dumper. But I would say that. :-)
the Data::TreeDumper manpage - Non perl output, lots of rendering options
And of course www.perlmonks.org and the perl manpage itself.