YAML - YAML Ain't Markup Language

NAME

YAML - YAML Ain't Markup Language (tm)

SYNOPSIS

    use YAML;


    # Load a YAML stream of 3 YAML documents into Perl data structures.
    my ($hashref, $arrayref, $string) = Load(<<'...');
    ---
    name: ingy
    age: old
    weight: heavy
    # I should comment that I also like pink, but don't tell anybody.
    favorite colors:
        - red
        - white
        - blue
    ---
    - Clark Evans
    - Oren Ben-Kiki
    - Brian Ingerson
    --- >
    You probably think YAML stands for "Yet Another Markup Language". It
    ain't! YAML is really a data serialization language. But if you want
    to think of it as a markup, that's OK with me. A lot of people try
    to use XML as a serialization format.


    "YAML" is catchy and fun to say. Try it. "YAML, YAML, YAML!!!"
    ...


    # Dump the Perl data structures back into YAML.
    print Dump($string, $arrayref, $hashref);


    # YAML::Dump is used the same way you'd use Data::Dumper::Dumper
    use Data::Dumper;
    print Dumper($string, $arrayref, $hashref);

DESCRIPTION

The YAML.pm module implements a YAML Loader and Dumper based on the YAML 1.0 specification. http://www.yaml.org/spec/

YAML is a generic data serialization language that is optimized for human readability. It can be used to express the data structures of most modern programming languages. (Including Perl!!!)

For information on the YAML syntax, please refer to the YAML specification.

WHY YAML IS COOL

YAML is readable for people.

It makes clear sense out of complex data structures. You should find that YAML is an exceptional data dumping tool. Structure is shown through indentation, YAML supports recursive data, and hash keys are sorted by default. In addition, YAML supports several styles of scalar formatting for different types of data.

YAML is editable.

YAML was designed from the ground up to be an excellent syntax for configuration files. Almost all programs need configuration files, so why invent a new syntax for each one? And why subject users to the complexities of XML or native Perl code?

YAML is multilingual.

Yes, YAML supports Unicode. But I'm actually referring to programming languages. YAML was designed to meet the serialization needs of Perl, Python, Ruby, Tcl, PHP and Java. It was also designed to be interoperable between those languages. That means any YAML serialization produced by Perl can be processed by Python, and be guaranteed to return the data structure intact. (Even if it contained Perl-specific structures like GLOBs.)

YAML is taint safe.

Using modules like Data::Dumper for serialization is fine as long as you can be sure that nobody can tamper with your data files or transmissions. That's because you need to use Perl's eval() built-in to deserialize the data. Somebody could add a snippet of Perl to erase your files.

YAML's parser does not need to eval anything.

YAML is full featured.

YAML can accurately serialize all of the common Perl data structures and deserialize them again without losing data relationships. Although it is not 100% perfect (no serializer is or can be perfect), it fares as well as the popular current modules: Data::Dumper, Storable, XML::Dumper and Data::Denter.

YAML.pm also has the ability to handle code (subroutine) references and typeglobs. (Still experimental) These features are not found in Perl's other serialization modules.

YAML is extensible.

The YAML language has been designed to be flexible enough to solve its own problems. The markup itself has three basic constructs which resemble Perl's hash, array and scalar. By default, these map to their Perl equivalents. But each YAML node also supports a type (or ``transfer method'') which can cause that node to be interpreted in a completely different manner. That's how YAML can support oddball structures like Perl's typeglob.

YAML.pm plays well with others.

YAML has been designed to interact well with other Perl modules like POE and Time::Object. (Date support coming soon.)

USAGE

Exported Functions

The following functions are exported by YAML.pm by default when you use YAML.pm like this:

    use YAML;

To prevent YAML.pm from exporting functions, say:

    use YAML ();

Dump(list-of-Perl-data-structures)

Turn Perl data into YAML. This function works very much like Data::Dumper::Dumper(). It takes a list of Perl data structures and dumps them into a serialized form. It returns a string containing the YAML stream. The structures can be references or plain scalars.

Load(string-containing-a-YAML-stream)

Turn YAML into Perl data. This is the opposite of Dump, just like Storable's thaw() function or the eval() function in relation to Data::Dumper. It parses a string containing a valid YAML stream into a list of Perl data structures.
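
As a minimal sketch of the round trip, assuming YAML.pm is installed and exports Dump and Load as described above (the variable names are only illustrative):

    use strict;
    use warnings;
    use YAML;

    # Illustrative data: a hash containing a nested array.
    my $original = { name => 'ingy', colors => [ 'red', 'white', 'blue' ] };

    # Dump returns a string holding the YAML stream.
    my $yaml = Dump($original);

    # Load parses that stream back into Perl data (one value per document).
    my ($copy) = Load($yaml);

    print "name:   $copy->{name}\n";
    print "colors: @{ $copy->{colors} }\n";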

Store()

This function is deprecated, and is now referred to as Dump. It is still available for the time being, but will generate a warning if you are using -w. You are using -w, aren't you? :)

The reason for this deprecation is that the YAML spec talks about programs called Loaders and Dumpers. ``Storers'' is too hard to say, I guess...

Exportable Functions

DumpFile(filepath, list)

Writes the YAML stream to a file instead of just returning a string.

LoadFile(filepath)

Reads the YAML stream from a file instead of a string.
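
A short sketch of the two file functions used together. The file name is hypothetical and chosen only for this example; both functions must be imported explicitly because they are not exported by default:

    use strict;
    use warnings;
    use YAML qw(DumpFile LoadFile);

    # Hypothetical file name, used only for this example.
    my $path = 'settings.yaml';

    # Write one document to the file...
    DumpFile($path, { editor => 'vi', tabstop => 4 });

    # ...and read it back.
    my ($settings) = LoadFile($path);
    print "editor: $settings->{editor}\n";

    unlink $path;    # clean up the example file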

Bless(perl-node, [yaml-node | class-name])

Associate a normal Perl node with a yaml node. A yaml node is an object tied to the YAML::Node class. The second argument is either a yaml node that you've already created or a class (package) name that supports a yaml_dump() function. A yaml_dump() function should take a perl node and return a yaml node. If no second argument is provided, Bless will create a yaml node. This node is not returned, but can be retrieved with the Blessed() function.

Here's an example of how to use Bless. Say you have a hash containing three keys, but you only want to dump two of them. Furthermore the keys must be dumped in a certain order. Here's how you do that:

    use YAML qw(Dump Bless);
    $hash = {apple => 'good', banana => 'bad', cauliflower => 'ugly'};
    print Dump $hash;
    Bless($hash)->keys(['banana', 'apple']);
    print Dump $hash;

produces:

    --- #YAML:1.0
    apple: good
    banana: bad
    cauliflower: ugly
    --- #YAML:1.0
    banana: bad
    apple: good

Bless returns the tied part of a yaml-node, so that you can call the YAML::Node methods. This is the same thing that YAML::Node::ynode() returns. So another way to do the above example is:

    use YAML qw(:all);
    use YAML::Node;
    $hash = {apple => 'good', banana => 'bad', cauliflower => 'ugly'};
    print Dump $hash;
    Bless($hash);
    $ynode = ynode(Blessed($hash));
    $ynode->keys(['banana', 'apple']);
    print Dump $hash;

Blessed(perl-node)

Returns the yaml node that a particular perl node is associated with (see above). Returns undef if the node is not (YAML) blessed.

Dumper()

Alias to Dump(). For Data::Dumper fans.

freeze() and thaw()

Aliases to Dump() and Load(). For Storable fans.

This will also allow YAML.pm to be plugged directly into modules like POE.pm, that use the freeze/thaw API for internal serialization.
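
For illustration, a sketch of the Storable-style aliases, assuming they are imported by name as listed under Exportable Functions (the data is made up):

    use strict;
    use warnings;
    use YAML qw(freeze thaw);

    my $frozen   = freeze({ session => 42 });   # same job as Dump()
    my ($thawed) = thaw($frozen);               # same job as Load()

    print "session: $thawed->{session}\n";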

Exportable Function Groups

This is a list of the various groups of exported functions that you can import using the following syntax:

    use YAML ':groupname';

all: Imports Dump(), Load(), Store(), DumpFile(), LoadFile(), Bless() and Blessed().
POE: Imports freeze() and thaw().
Storable: Imports freeze() and thaw().

Class Methods

YAML can also be used in an object oriented manner. At this point it offers no real advantage. This interface will be improved in a later release.

new()

New returns a new YAML object. For example:

    my $y = YAML->new;
    $y->Indent(4);
    $y->dump($foo, $bar);

Object Methods

dump(): OO version of Dump().
load(): OO version of Load().

Options

YAML options are set using a group of global variables in the YAML namespace. This is similar to how Data::Dumper works.

For example, to change the indentation width, do something like:

    local $YAML::Indent = 3;

The current options are:

Indent

This is the number of space characters to use for each indentation level when doing a Dump(). The default is 2.

By the way, YAML can use any number of characters for indentation at any level. So if you are editing YAML by hand, feel free to do it any way that looks pleasing to you; just be consistent for a given level.
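
A small sketch of changing the indentation for a single Dump() call. It assumes the $YAML::Indent variable behaves as described above; using local inside a block keeps the change from leaking into the rest of the program:

    use strict;
    use warnings;
    use YAML;

    my $data = { outer => { inner => 'value' } };

    print Dump($data);            # default indentation

    {
        # local restores the previous setting when this block exits.
        local $YAML::Indent = 4;
        print Dump($data);        # nested keys indented by four spaces
    }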

UseHeader

Default is 1. (true)

This tells YAML.pm whether to use a separator string for a Dump operation. This only applies to the first document in a stream. Subsequent documents must have a YAML header by definition.

UseVersion

Default is 1. (true)

Tells YAML.pm whether to include the YAML version on the separator/header.

The canonical form is:

    --- YAML:1.0

SortKeys

Default is 1. (true)

Tells YAML.pm whether or not to sort hash keys when storing a document.

YAML::Node objects can have their own sort order, which is usually what you want. To override the YAML::Node order and sort the keys anyway, set SortKeys to 2.

AnchorPrefix

Default is ''.

Anchor names are normally numeric. YAML.pm simply starts with '1' and increases by one for each new anchor. This option allows you to specify a string to be prepended to each anchor number.

UseCode

Setting the UseCode option is a shortcut to set both the DumpCode and LoadCode options at once. Setting UseCode to '1' tells YAML.pm to dump Perl code references as Perl (using B::Deparse) and to load them back into memory using eval(). The reason this has to be an option is that using eval() to parse untrusted code is, well, untrustworthy. Safe deserialization is one of the core goals of YAML.

DumpCode

Determines if and how YAML.pm should serialize Perl code references. By default YAML.pm will dump code references as dummy placeholders (much like Data::Dumper). If DumpCode is set to '1' or 'deparse', code references will be dumped as actual Perl code.

DumpCode can also be set to a subroutine reference so that you can write your own serializing routine. YAML.pm passes you the code ref. You pass back the serialization (as a string) and a format indicator. The format indicator is a simple string like: 'deparse' or 'bytecode'.

LoadCode

LoadCode is the opposite of DumpCode. It tells YAML if and how to deserialize code references. When set to '1' or 'deparse' it will use eval(). Since this is potentially risky, only use this option if you know where your YAML has been.

LoadCode can also be set to a subroutine reference so that you can write your own deserializing routine. YAML.pm passes the serialization (as a string) and a format indicator. You pass back the code reference.
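
The following sketch round-trips a code reference using the UseCode shortcut. It assumes that B::Deparse is available and that the YAML being loaded is trusted, since loading goes through eval(); the hash key and the subroutine are illustrative only:

    use strict;
    use warnings;
    use YAML;

    my $data = { add => sub { $_[0] + $_[1] } };

    my $copy;
    {
        # Enable both DumpCode and LoadCode, for this block only.
        # Only do this with YAML you produced yourself: loading uses eval().
        local $YAML::UseCode = 1;
        my $yaml = Dump($data);   # code ref is dumped as Perl source text
        ($copy) = Load($yaml);    # source text is compiled back with eval()
    }

    print $copy->{add}->(2, 3), "\n";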

UseBlock

YAML.pm uses heuristics to guess which scalar style is best for a given node. Sometimes you'll want all multiline scalars to use the 'block' style. If so, set this option to 1.

NOTE: YAML's block style is akin to Perl's here-document.

ForceBlock

Force every possible scalar to be block formatted. NOTE: Escape characters cannot be formatted in a block scalar.

UseFold

If you want to force YAML to use the 'folded' style for all multiline scalars, then set $UseFold to 1.

NOTE: YAML's folded style is akin to the way HTML folds text, except smarter.

UseAliases

YAML has an alias mechanism such that any given structure in memory gets serialized once. Any other references to that structure are serialized only as alias markers. This is how YAML can serialize duplicate and recursive structures.

Sometimes, when you KNOW that your data is nonrecursive in nature, you may want to serialize such that every node is expressed in full (i.e., as a copy of the original). Setting $YAML::UseAliases to 0 will allow you to do this. This also may result in faster processing because the lookup overhead is bypassed.

THIS OPTION CAN BE DANGEROUS. *If* your data is recursive, this option *will* cause Dump() to run in an endless loop, chewing up your computer's memory. You have been warned.
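
To make the difference concrete, here is a sketch using data that is shared but deliberately NOT recursive, so turning aliases off is safe (the structure is made up for the example):

    use strict;
    use warnings;
    use YAML;

    my $shared = [ 'a', 'b' ];
    my $data   = { first => $shared, second => $shared };   # not recursive

    print Dump($data);        # second occurrence is emitted as an alias

    {
        local $YAML::UseAliases = 0;
        print Dump($data);    # both occurrences are written out in full
    }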

CompressSeries

Default is 1.

Compresses the formatting of arrays of hashes:

    -
      foo: bar
    - 
      bar: foo

becomes:

    - foo: bar
    - bar: foo

Since this output is usually more desirable, this option is turned on by default.

YAML TERMINOLOGY

YAML is a full featured data serialization language, and thus has its own terminology.

It is important to remember that although YAML is heavily influenced by Perl and Python, it is a language in its own right, not merely a representation of Perl structures.

YAML has three constructs that are conspicuously similar to Perl's hash, array, and scalar. They are called mapping, sequence, and string respectively. By default, they do what you would expect. But each instance may have an explicit or implicit type that makes it behave differently. In this manner, YAML can be extended to represent Perl's Glob, Python's tuple, or Ruby's Bignum.

stream

A YAML stream is the full sequence of bytes that a YAML parser would read or a YAML emitter would write. A stream may contain one or more YAML documents separated by YAML headers.

    ---
    a: mapping
    foo: bar
    ---
    - a
    - sequence
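
As a sketch of how a multi-document stream looks from Perl, assuming Dump and Load behave as described under USAGE (variable names are illustrative):

    use strict;
    use warnings;
    use YAML;

    # Dumping two structures yields one stream holding two documents.
    my $stream = Dump({ a => 'mapping', foo => 'bar' }, [ 'a', 'sequence' ]);

    # In list context, Load returns one value per document.
    my ($map, $seq) = Load($stream);

    print ref($map), ' ', ref($seq), "\n";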

document

A YAML document is an independent data structure representation within a stream. It is a top level node.

    --- YAML:1.0
    This: top level mapping
    is:
        - a
        - YAML
        - document

node

A YAML node is the representation of a particular data structure. Nodes may contain other nodes. (In Perl terms, nodes are like scalars: strings, arrayrefs and hashrefs. But this refers to the serialized format, not the in-memory structure.)

transfer method

This is similar to a type. It indicates how a particular YAML node serialization should be transferred into or out of memory. For instance, a Foo::Bar object would use the transfer method 'perl/Foo::Bar':

    - !perl/Foo::Bar
        foo: 42
        bar: stool

collection

A collection is the generic term for a YAML data grouping. YAML has two types of collections: mappings and sequences. (Similar to hashes and arrays)

mapping

A mapping is a YAML collection defined by key/value pairs. By default YAML mappings are loaded into Perl hashes.

    a mapping:
        foo: bar
        two: times two is 4

sequence

A sequence is a YAML collection defined by an ordered list of elements. By default YAML sequences are loaded into Perl arrays.

    a sequence:
        - one bourbon
        - one scotch
        - one beer

scalar

A scalar is a YAML node that is a single value. By default YAML scalars are loaded into Perl scalars.

    a scalar key: a scalar value
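
The default mappings described above can be checked with a short sketch like this one. It reuses the sample keys from this section; the stream is built with join rather than a here-document only so that the example does not depend on its own indentation:

    use strict;
    use warnings;
    use YAML;

    my $yaml = join "\n",
        '---',
        'a mapping:',
        '    foo: bar',
        'a sequence:',
        '    - one bourbon',
        '    - one scotch',
        'a scalar key: a scalar value',
        '';

    my ($doc) = Load($yaml);

    print ref($doc->{'a mapping'}),  "\n";   # HASH
    print ref($doc->{'a sequence'}), "\n";   # ARRAY
    print $doc->{'a scalar key'},    "\n";   # a scalar value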

YAML has many styles for representing scalars. This is important because varying data will have varying formatting requirements to retain the optimum human readability.

simple scalar

This is a single line of unquoted text. All simple scalars are automatic candidates for ``implicit transferring''. This means that their type is determined automatically by examination. Unless they match a set of predetermined YAML regex patterns, they will raise a parser exception. The typical uses for this are simple alpha strings, integers, real numbers, dates, times and currency.

    - a simple string
    - -42
    - 3.1415
    - 12:34
    - 123 this is an error

single quoted scalar

This is similar to Perl's use of single quotes. It means no escaping and no implicit transfer. It must be used on a single line.

    - 'When I say ''\n'' I mean "backslash en"'

double quoted scalar

This is similar to Perl's use of double quotes. Character escaping can be used. There is no implicit transfer and it must still be single line.

    - "This scalar\nhas two lines, and a bell -->\a"

folded scalar

This is a multiline scalar which begins on the next line. It is indicated by a single right angle bracket ('>'). It is unescaped like the single quoted scalar. Line folding is also performed.

    - >
     This is a multiline scalar which begins on
     the next line. It is indicated by a single
     right angle bracket. It is unescaped like
     the single quoted scalar. Line folding is
     also performed.

block scalar

This final multiline form is akin to Perl's here-document except that (as in all YAML data) scope is indicated by indentation. Therefore, no ending marker is required. The data is verbatim. No line folding.

    - |
        QTY  DESC          PRICE  TOTAL
        ---  ----          -----  -----
          1  Foo Fighters  $19.95 $19.95
          2  Bar Belles    $29.95 $59.90

parser

A YAML processor has four stages: parse, load, dump, emit.

A parser parses a YAML stream. YAML.pm's Load() function contains a parser.

loader

The other half of the Load() function is a loader. This takes the information from the parser and loads it into a Perl data structure.

dumper

The Dump() function consists of a dumper and an emitter. The dumper walks through each Perl data structure and gives info to the emitter.

emitter

The emitter takes info from the dumper and turns it into a YAML stream.

NOTE: In YAML.pm the parser/loader and the dumper/emitter code are currently very closely tied together. When libyaml is written (in C) there will be a definite separation. libyaml will contain a parser and emitter, and YAML.pm (and YAML.py etc) will supply the loader and dumper.

For more information please refer to the immensely helpful YAML specification available at http://www.yaml.org/spec/.

ysh - The YAML Shell

The YAML distribution ships with a script called 'ysh', the YAML shell. ysh provides a simple, interactive way to play with YAML. If you type in Perl code, it displays the result in YAML. If you type in YAML it turns it into Perl code.

To run ysh, (assuming you installed it along with YAML.pm) simply type:

    ysh [options]

Please read the ysh documentation for the full details. There are lots of options.

BUGS & DEFICIENCIES

If you find a bug in YAML, please try to recreate it in the YAML Shell with logging turned on ('ysh -L'). When you have successfully reproduced the bug, please mail the LOG file to the author (ingy@cpan.org).

WARNING: This is *ALPHA* code.

BIGGER WARNING: This is *TRIAL1* of the YAML 1.0 specification. The YAML syntax may change before it is finalized. Based on past experience, it probably will change. The authors of this spec have worked for over a year putting together YAML 1.0, and we have flipped it on its syntactical head almost every week. We're a fickle lot, we are. So use this at your own risk!!!

Circular Leaves

YAML is quite capable of serializing circular references. And for the most part it can deserialize them correctly too. One notable exception is a reference to a leaf node containing itself. This is hard to do from pure Perl in any elegant way. The ``canonical'' example is:

    $foo = \$foo;

This serializes fine, but I can't parse it correctly yet. Unfortunately, every wiseguy programmer in the world seems to try this first when you ask them to test your serialization module, even though it is of almost no real world value. So please don't report this bug unless you have a pure Perl patch to fix it for me.

By the way, similar non-leaf structures Dump and Load just fine:

    $foo->[0] = $foo;

You can test these examples using 'ysh -r'. This option makes sure that the example can be deserialized after it is serialized. We call that ``roundtripping'', thus the '-r'.
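
The same round trip can also be checked from a script. This is only a sketch of the non-leaf case shown above, assuming Dump and Load handle it as described:

    use strict;
    use warnings;
    use YAML;

    my $foo = [];
    $foo->[0] = $foo;             # the array contains a reference to itself

    my ($copy) = Load(Dump($foo));

    # After a successful round trip the copy should point back at itself.
    print "round trip ok\n" if $copy->[0] == $copy;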

Unicode

Unicode is not yet supported. The YAML specification dictates that all strings be Unicode, but this early implementation just uses ASCII.

Structured Keys

Python, Java and perhaps others support using any data type as the key to a hash. YAML also supports this. Perl5 only uses strings as hash keys.

YAML.pm can currently parse structured keys, but their meaning gets lost when they are loaded into a Perl hash. Consider this example using the YAML Shell:

    ysh > ---
    yaml> ?
    yaml>  foo: bar
    yaml> : baz
    yaml> ...
    $VAR1 = {
              'HASH(0x1f1d20)' => 'baz'
            };
    ysh >

YAML.pm will need to be fixed to preserve these keys somehow. Why? Because if YAML.pm gets a YAML document from YAML.py it must be able to return it with the Python data intact.

Globs, Subroutines, Regexes and File Handles

As far as I know, other Perl serialization modules are not capable of serializing and deserializing typeglobs, subroutines (code refs), regexes and file handles. YAML.pm has dumping capabilities for all of these. Loading them may produce wild results. Take care.

NOTE: For a (huge) dump of Perl's global guts, try:

    perl -MYAML -e '$YAML::UseCode=1; print Dump \%main::'

To limit this to a single namespace try:

    perl -MCGI -MYAML -e '$YAML::UseCode=1; print Dump \%CGI::'

Speed

This is a pure Perl implementation that has been optimized for programmer readability, not for computational speed.

Neil Watkiss and Clark Evans are currently developing libyaml, the official C implementation of the YAML parser and emitter. YAML.pm will be refactored to use this library once it is stable. Other languages like Python, Tcl, PHP, Ruby, JavaScript and Java can make use of the same core library.

Please join us on the YAML mailing list if you are interested in implementing something.

https://lists.sourceforge.net/lists/listinfo/yaml-core

Streaming Access

This module Dumps and Loads in one operation. There is no interface for parsing or emitting a YAML stream one node at a time. It's all or nothing.

An upcoming release will have support for incremental parsing. Incremental dumping is harder. Stay tuned.

RESOURCES

Please read the YAML::Node manpage for advanced YAML features.

http://www.yaml.org is the official YAML website.

http://www.yaml.org/spec/ is the YAML 1.0 specification.

http://wiki.yaml.org/spec/ is the official YAML wiki.

YAML has been registered as a SourceForge project (http://www.sourceforge.net). Currently we are only using the mailing list facilities there.

IMPLEMENTATIONS

This is the first implementation of YAML functionality based on the 1.0 specification.

The following people have shown an interest in doing implementations. Please contact them if you are also interested in writing an implementation.

    ---
    - name:    Neil Watkiss
      project: 
        - libyaml
        - YAML mode for the vim editor
      email:   nwatkiss@ttul.org

    - name:    Brian Ingerson
      project: YAML.pm, libyaml Perl binding
      email:   ingy@ttul.org


    - name:    Clark Evans
      project: libyaml, Python binding
      email:   cce@clarkevans.com

    - name:    Oren Ben-Kiki
      project: Java Loader/Dumper
      email:   orenbk@richfx.com

    - name:    Paul Prescod
      project: YAML Antagonist/Anarchist
      email:   paul@prescod.net

    - name:    Ryan King
      project: YAML test specialist
      email:   rking@panoptic.com

    - name:    Steve Howell
      project: Python and Ruby implementations
      email:   showell@zipcon.net

    - name:    Patrick Leboutillier
      project: Java Loader/Dumper
      email:   patrick_leboutillier@hotmail.com

    - name:    Shane Caraveo
      project: PHP Loader/Dumper
      email:   shanec@activestate.com

    - name:    Brian Quinlan
      project: Python Loader/Dumper
      email:   brian@sweetapp.com

    - name:    Jeff Hobbs
      project: Tcl Loader/Dumper
      email:   jeff@hobbs.org

    - name:    Claes Jacobsson
      project: JavaScript Loader/Dumper
      email:   claes@contiller.se

AUTHOR

Brian Ingerson <INGY@cpan.org> is responsible for YAML.pm.

The YAML language is the result of a ton of collaboration between Oren Ben-Kiki, Clark Evans and Brian Ingerson. Several others have added help along the way.

Neil Watkiss is pioneering libyaml. Bless that boy!

Ryan King offered much help on the 0.35 release. An XP advocate extraordinaire, he helped me refactor my entire test suite into its current form. Regression tests are extremely important to the success of this project.

COPYRIGHT

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

See http://www.perl.com/perl/misc/Artistic.html
