PDL::IO::FlexRaw -- A flexible binary i/o format for PerlDL.


        use PDL;
        use PDL::IO::FlexRaw;
        ($x,$y,...) = readflex("filename" [, $hdr])
        ($x,$y,...) = mapflex("filename" [, $hdr] [, $opts])
        $hdr = writeflex($file, $pdl1, $pdl2,...)
        writeflexhdr($file, $hdr)


FlexRaw is a generic method for the input and output of `raw' data arrays. In particular, it is designed to read output from FORTRAN 77 UNFORMATTED files and the low-level C write function, even if the files are compressed or gzipped. As in FastRaw, the data file is supplemented by a header file (although this can be replaced by the optional $hdr argument). More information can be included in the header file than for FastRaw -- the description can be extended to several data objects within a single input file.

For example, to read the output of a FORTRAN program

        real*4 a(4,600,600)
        open (8,file='banana',status='new',form='unformatted')
        write (8) a
        close (8)

the header file (`banana.hdr') could look like

        # FlexRaw file header
        # Header word for F77 form=unformatted
        Byte 1 4
        # Data
        Float 3            # this is ignored
                 4 600 600
        Byte 1 4           As is this, as we've got all dims

The data can then be input using

        $a = (readflex('banana'))[1];

The format of the hdr file is an extension of that used by FastRaw. Comment lines (starting with #) are allowed, as are descriptive names (as elsewhere: byte, short, ushort, long, float, double) for the data types -- note that case is ignored by FlexRaw. After the type, one integer specifies the number of dimensions of the data `chunk', and subsequent integers the size of each dimension. So the specifier above (`Float 3 4 600 600') describes our FORTRAN array. A scalar can be described as `float 0' (or `float 1 1', or `float 2 1 1', etc.). When all the dimensions are read -- or a # appears after whitespace -- the rest of the current input line is ignored.
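The header grammar described above can be illustrated with a small parser. This is only a sketch in plain Perl, not the code FlexRaw itself uses: parse_flex_header is a hypothetical helper, and treating `f77' and `swap' as keywords that stand alone on their line is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PDL::IO::FlexRaw): turn header text
# into a list of { Type, NDims, Dims } entries.
sub parse_flex_header {
    my ($text) = @_;
    my (@chunks, $cur);
    for my $line (split /\n/, $text) {
        # A '#' at the start of the line or after whitespace begins a comment
        $line =~ s/(?:^|\s)#.*$//;
        for my $tok (split ' ', $line) {
            if (!defined $cur) {
                my $type = lc $tok;              # case is ignored
                if ($type eq 'f77' || $type eq 'swap') {
                    # Assumption: these keywords take no further arguments
                    push @chunks, { Type => $type };
                    last;
                }
                $cur = { Type => $type };
            }
            elsif (!defined $cur->{NDims}) {
                $cur->{NDims} = $tok;            # number of dimensions
                $cur->{Dims}  = [];
            }
            else {
                push @{ $cur->{Dims} }, $tok;    # size of the next dimension
            }
            if (defined $cur->{NDims} && @{ $cur->{Dims} } == $cur->{NDims}) {
                push @chunks, $cur;              # all dimensions have been read
                undef $cur;
                last;                            # the rest of the line is ignored
            }
        }
    }
    return \@chunks;
}

# The `banana.hdr' example from above
my $hdr = parse_flex_header(<<'HDR');
# FlexRaw file header
# Header word for F77 form=unformatted
Byte 1 4
# Data
Float 3            # this is ignored
         4 600 600
Byte 1 4           As is this, as we've got all dims
HDR

printf "%s: %s\n", $_->{Type}, join('x', @{ $_->{Dims} }) for @$hdr;
```

The dimensions of one chunk may be spread over several lines, which is why the parser keeps the partially read chunk in $cur between lines.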

What about the extra 4 bytes at the head and tail, which we just threw away? These are added by FORTRAN (at least on Suns, Alphas and Linux), and specify the number of bytes written by each WRITE -- the same number is put at the start and the end of each chunk of data. You may need to know all this in some cases. In general, FlexRaw tries to handle it itself, if you simply add a line saying `f77' to the header file, before any data specifiers:

        # FlexRaw file header for F77 form=unformatted
        f77
        # Data
        Float 3
        4 600 600

-- the redundancy in FORTRAN data files even allows FlexRaw to deal automatically with files written on other machines which use back-to-front byte ordering. The detection is not infallible, but the chance of it failing is about 1 in 4 billion, even if you regularly read 4Gb files! It does not currently work for compressed files, however, so you can say `swap' (again before any data specifiers) to make certain the byte order is swapped.
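To make the record layout concrete, the following plain-Perl sketch builds a simulated unformatted record (byte count, data, byte count) and uses the redundant counts to guess the byte order. It is an illustration under stated assumptions, not FlexRaw's own detection code: 4-byte record markers and a file holding exactly one record are assumed, and f77_record and guess_byte_order are hypothetical helpers.

```perl
use strict;
use warnings;

# Build one simulated F77 unformatted record: <byte count> data <byte count>.
# Assumption: the record markers are 4-byte integers.
sub f77_record {
    my ($big_endian, @values) = @_;
    my $data   = pack($big_endian ? 'f>*' : 'f<*', @values);   # 4-byte floats
    my $marker = pack($big_endian ? 'N'   : 'V',   length $data);
    return $marker . $data . $marker;
}

# Hypothetical helper: guess the byte order of a single-record file by
# checking which reading of both markers matches the length of the data.
sub guess_byte_order {
    my ($record) = @_;
    my $expected = length($record) - 8;    # total length minus the two markers
    for my $try ([little => 'V'], [big => 'N']) {
        my ($name, $fmt) = @$try;
        return $name
            if unpack($fmt, substr($record, 0, 4)) == $expected
            && unpack($fmt, substr($record, -4))   == $expected;
    }
    return 'unknown';
}

my $le = f77_record(0, 1.5, -2.25, 3.0);
my $be = f77_record(1, 1.5, -2.25, 3.0);

print guess_byte_order($le), "\n";
print guess_byte_order($be), "\n";

# Strip the markers and decode the big-endian payload
my @values = unpack('f>*', substr($be, 4, -4));
print "@values\n";
```

A check of this kind needs to look at both ends of the record, which is consistent with the remark above that compressed files (which cannot be seeked) need an explicit `swap'.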

The optional $hdr argument allows the use of an anonymous array to give header information, rather than using a .hdr file. For example,

        $header = [
            {Type => 'f77'},
            {Type => 'float', NDims => 3, Dims => [ 4,600,600 ] }
        ];
        @a = readflex('banana',$header);

reads our example file again. As a special case, when NDims is 1, Dims may be given as a scalar.

Within PDL, readflex and writeflex can be used to write several pdls to a single file -- e.g.

        use PDL;
        use PDL::IO::FlexRaw;
        @pdls = ($pdl1, $pdl2, ...);
        $hdr = writeflex("fname",@pdls);
        @pdl2 = readflex("fname",$hdr);
        @pdl3 = readflex("fname");

-- writeflex produces the data file and returns the file header as an anonymous array, which can be written to a .hdr file using writeflexhdr.
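A complete round trip might look like the sketch below. It uses only the functions documented here; the temporary directory and the file name `demo' are hypothetical, and it assumes that writeflexhdr stores the header where a later readflex call without $hdr will look for it.

```perl
use strict;
use warnings;
use PDL;
use PDL::IO::FlexRaw;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/demo";              # hypothetical file name

my $x = sequence(float, 4, 3);       # 4x3 array of floats
my $y = sequence(long, 5);           # 5-element vector of longs

# Write both pdls to one data file, then store the header next to it
my $hdr = writeflex($file, $x, $y);
writeflexhdr($file, $hdr);

# Read back, once with the in-memory header and once via the header file
my ($x1, $y1) = readflex($file, $hdr);
my ($x2, $y2) = readflex($file);

print "dims of first pdl:  ", join('x', $x2->dims), "\n";
print "dims of second pdl: ", join('x', $y2->dims), "\n";
```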

The reading of compressed data is switched on automatically if the filename requested ends in .gz or .Z, or if the originally specified filename does not exist, but one of these compressed forms does.
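The file name rule can be written out as a short plain-Perl sketch. resolve_flex_name is a hypothetical helper, not a FlexRaw function, and the order in which the two suffixes are tried is an assumption.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: returns (name to open, true if it is compressed)
sub resolve_flex_name {
    my ($name) = @_;
    return ($name, 1) if $name =~ /\.(?:gz|Z)$/;   # explicit compressed name
    return ($name, 0) if -e $name;                 # plain file exists
    for my $ext ('.gz', '.Z') {                    # assumption: .gz is tried first
        return ("$name$ext", 1) if -e "$name$ext";
    }
    return ($name, 0);                             # leave the error to the caller
}

# Demonstration with an empty stand-in for a compressed data file
my $dir = tempdir(CLEANUP => 1);
open(my $fh, '>', "$dir/banana.gz") or die "Cannot create file: $!";
close($fh);

my ($name, $compressed) = resolve_flex_name("$dir/banana");
print "$name ", ($compressed ? "(compressed)" : "(plain)"), "\n";
```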

If writeflex and readflex are given a reference to a file handle as a first parameter instead of a filename, then the data is read from or written to the open filehandle. This gives an easy way to read an arbitrary slice in a big data volume, as in the following example:

        use PDL;
        use PDL::IO::FlexRaw;
        open(DATA, "raw3d.dat") or die "Cannot open raw3d.dat: $!";
        binmode(DATA);        # make sure seek() counts raw bytes
        # assume we know the data size from an external source
        ($width, $height, $data_size) = (256,256, 4);
        my $slice_num = 64;   # slice to look at
        # Seek to slice
        seek(DATA, $width*$height*$data_size * $slice_num, 0);
        $pdl = readflex \*DATA, [{Dims=>[$width, $height], Type=>'long'}];

WARNING: In later versions of perl (5.8 and up) you must be sure that your file is in ``raw'' mode (see the perlfunc man page entry for ``binmode'' for details). Both readflex and writeflex automagically switch the file to raw mode for you -- but in code like the snippet above, you could end up seeking to the wrong byte if you forget to make the binmode() call.
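The slice arithmetic and the binmode call can be checked without PDL. The sketch below writes a small volume of 4-byte integers to a temporary file, seeks to one slice and decodes it with unpack instead of readflex; the sizes and the little-endian layout are arbitrary assumptions chosen for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($width, $height, $data_size) = (4, 3, 4);   # tiny volume, 4-byte integers
my $nslices = 5;

# Write the volume: every element of slice number s holds the value s
my ($out, $fname) = tempfile(UNLINK => 1);
binmode($out);
for my $s (0 .. $nslices - 1) {
    print {$out} pack('l<*', ($s) x ($width * $height));
}
close($out) or die "Cannot close $fname: $!";

# Read a single slice back
open(my $in, '<', $fname) or die "Cannot open $fname: $!";
binmode($in);                                    # seek offsets must count raw bytes
my $slice_num   = 3;
my $slice_bytes = $width * $height * $data_size;
seek($in, $slice_bytes * $slice_num, 0) or die "Cannot seek: $!";
my $got = read($in, my $buf, $slice_bytes);
die "Short read" unless defined $got && $got == $slice_bytes;
close($in);

my @slice = unpack('l<*', $buf);
print "slice $slice_num: @slice\n";
```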

Mapflex memory maps, rather than reads, the data files. Its interface is similar to `readflex'. Extra options specify if the data is to be loaded `ReadOnly', if the data file is to be `Creat'-ed anew on the basis of the header information or `Trunc'-ated to the length of the data read. The extra speed of access brings with it some limitations: mapflex won't read compressed data, auto-detect f77 files or read f77 files written by more than a single unformatted write statement. More seriously, data alignment constraints mean that mapflex cannot read some files, depending on the requirements of the host OS (it may also vary depending on the setting of the `uac' flag on any given machine). You may have run into similar problems with common blocks in FORTRAN.

For instance, floating point numbers may have to align on 4 byte boundaries -- if the data file consists of 3 bytes then a float, it cannot be read. Mapflex will warn about this problem when it occurs, and return the PDLs mapped before the problem arose. This can be dealt with either by reorganizing the data file (large types first helps, as a rule-of-thumb), or more simply by using `readflex'.
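Whether a given layout is likely to cause trouble can be estimated from the header alone. The sketch below computes the byte offset of each chunk and flags those that do not start on a multiple of their element size. That alignment rule is an assumption (the real requirement depends on the host), the element sizes are the usual ones for the types listed earlier, and chunk_offsets is a hypothetical helper.

```perl
use strict;
use warnings;

# Usual element sizes in bytes (assumed, not queried from PDL)
my %size = (byte => 1, short => 2, ushort => 2, long => 4, float => 4, double => 8);

# Hypothetical helper: byte offset and alignment status of each chunk
sub chunk_offsets {
    my ($hdr) = @_;
    my $offset = 0;
    my @report;
    for my $chunk (@$hdr) {
        my $type   = lc $chunk->{Type};
        my $elsize = $size{$type} // die "Unknown type $chunk->{Type}";
        my $nelem  = 1;
        $nelem *= $_ for @{ $chunk->{Dims} };    # a 0-dim scalar has one element
        push @report, {
            Type    => $type,
            Offset  => $offset,
            Aligned => ($offset % $elsize == 0 ? 1 : 0),
        };
        $offset += $nelem * $elsize;
    }
    return \@report;
}

# Three bytes followed by a float: the float starts at offset 3
my $bad  = chunk_offsets([ { Type => 'byte',  Dims => [3] },
                           { Type => 'float', Dims => [1] } ]);
# The same data with the larger type first
my $good = chunk_offsets([ { Type => 'float', Dims => [1] },
                           { Type => 'byte',  Dims => [3] } ]);

for my $r (@$bad, @$good) {
    printf "%-5s at offset %d: %s\n", $r->{Type}, $r->{Offset},
        $r->{Aligned} ? 'aligned' : 'misaligned';
}
```

Putting the float first moves it to offset 0, which is the `large types first' rule of thumb in action.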


The test on two dimensional byte arrays fails using g77 2.7.2, but not Sun f77. I hope this isn't my problem!

Assumes gzip is on the PATH.

Can't auto-swap compressed files, because it can't seek on them.

The header format may not agree with that used elsewhere.

Should it handle handles?

Mapflex should warn and fall back to reading on SEGV? Would have to make sure that the data was written back after it was `destroyed'.



Read a binary file with flexible format specification

 ($x,$y,...) = readflex("filename" [, $hdr])
 ($x,$y,...) = readflex(FILEHANDLE [, $hdr])


Write a binary file with flexible format specification

  $hdr = writeflex($file, $pdl1, $pdl2,...)
  $hdr = writeflex(FILEHANDLE, $pdl1, $pdl2,...)


Memory map a binary file with flexible format specification

 ($x,$y,...) = mapflex("filename" [, $hdr] [, $opts])


Copyright (C) Robin Williams <rjrw@ast.leeds.ac.uk> 1997. All rights reserved. There is no warranty. You are allowed to redistribute this software / documentation under certain conditions. For details, see the file COPYING in the PDL distribution. If this file is separated from the PDL distribution, the copyright notice should be included in the file.
