Text::xSV - read character separated files
    use Text::xSV;
    my $csv = new Text::xSV;
    $csv->open_file("foo.csv");
    $csv->bind_header();
    while ($csv->get_row()) {
        my ($name, $age) = $csv->extract(qw(name age));
        print "$name is $age years old\n";
    }
This module is for reading character separated data. The most common example is comma-separated. However, that is far from the only possibility: the same basic format is exported by Microsoft products using tabs, colons, or other characters.
The format is a series of rows separated by returns. Within each row you have a series of fields separated by your character separator. Fields may either be unquoted, in which case they do not contain a double-quote, separator, or return, or they are quoted, in which case they may contain anything, and will encode double-quotes by pairing them. In Microsoft products, quoted fields are strings and unquoted fields can be interpreted as being of various datatypes based on a set of heuristics. By and large this fact is irrelevant in Perl because Perl is largely untyped. The one exception that this module handles is that empty unquoted fields are treated as nulls, which are represented in Perl as undefined values. If you want a zero-length string, quote it.
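The field rules above can be sketched in a few lines of Perl. This is only an illustration of the format, not the module's actual code: parse_row is a hypothetical helper, and it assumes it is handed one complete row as a string.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::xSV: split one complete row
# into fields according to the rules described above.
sub parse_row {
    my ($row, $sep) = @_;
    $sep = ',' unless defined $sep;
    my $q_sep = quotemeta $sep;
    my @fields;
    $row =~ s/\r?\n\z//;    # drop the return that ends the row
    pos($row) = 0;
    while (1) {
        if ($row =~ /\G"((?:[^"]|"")*)"/gc) {
            # Quoted field: may contain anything, "" stands for one "
            (my $value = $1) =~ s/""/"/g;
            push @fields, $value;
        }
        elsif ($row =~ /\G([^"$q_sep\n]+)/gc) {
            # Non-empty unquoted field
            push @fields, $1;
        }
        else {
            # Empty unquoted field: a null, represented as undef
            push @fields, undef;
        }
        last unless $row =~ /\G$q_sep/gc;
    }
    die "Malformed row\n" if pos($row) < length $row;
    return @fields;
}

my @fields = parse_row(qq{Ann,"says ""hi""",,""\n});
print scalar(@fields), " fields\n";
```

Here the third field comes back undefined while the fourth is a defined zero-length string, which is the distinction between an empty unquoted field and a quoted empty one.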
People usually naively solve this with split. A next step up is to read a line and parse it. Unfortunately this choice of interface (which is made by Text::CSV on CPAN) makes it impossible to handle returns embedded in a field, because a single row may then span several lines. Therefore the parser may need access to the whole file.
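Both failure modes are easy to reproduce. The snippet below is a small demonstration using made-up sample data; it does not use Text::xSV or Text::CSV.

```perl
use strict;
use warnings;

# Naive split: the comma inside the quoted field is treated as a separator.
my $row   = qq{"Smith, John",42\n};
my @naive = split /,/, $row;
print scalar(@naive), " pieces from a row with 2 fields\n";

# Line-at-a-time reading: a quoted field containing a return is cut in half.
my $data = qq{"line one\nline two",7\n};
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $first_line = <$fh>;
close $fh;

# Count the double-quotes seen so far; an odd count means a quote is open.
my $quotes = ($first_line =~ tr/"//);
print "first line ends inside a quoted field\n" if $quotes % 2;
```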
This module solves the problem by creating a CSV object with access to the filehandle. If, while parsing, it notices that a new line is needed, it can read one at will.
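As a rough sketch of that idea (not the module's real implementation), a reader that owns the filehandle can keep pulling lines until the row is complete. The helper name read_raw_row is hypothetical, and the quote-counting test assumes well-formed input in which quotes inside a quoted field are always doubled.

```perl
use strict;
use warnings;

# Hypothetical helper, not Text::xSV's actual code: return the text of
# the next complete row, reading extra lines while a quote is open.
sub read_raw_row {
    my ($fh) = @_;
    my $row = <$fh>;
    return unless defined $row;
    # An odd number of double-quotes means a quoted field is still open,
    # because quotes inside a quoted field always come in pairs.
    while (($row =~ tr/"//) % 2) {
        my $more = <$fh>;
        die "End of file inside a quoted field\n" unless defined $more;
        $row .= $more;
    }
    return $row;
}

my $data = qq{id,note\n1,"first line\nsecond line"\n2,plain\n};
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @rows;
while (defined(my $row = read_raw_row($fh))) {
    push @rows, $row;
}
close $fh;
print scalar(@rows), " rows\n";
```

The caller never sees the half-finished line, which is something a parse-one-line interface cannot offer.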
First you set up and initialize an object, then you read the CSV file through it. The creation can also do multiple initializations at once. Here are the available methods:
new
set_filename
set_fh
set_filter
set_sep
open_file
bind_fields (bind_headers is preferred)
bind_headers
get_row
extract
When I say single character separator, I mean it.
Performance could be better. That is largely because the API was chosen for the simplicity of a "proof of concept", rather than for performance. One idea to speed it up would be to provide an API where you bind the requested fields once and then fetch many times, rather than binding the request for every row.
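One way such a bind-once API might look is sketched below. Nothing here exists in Text::xSV: make_extractor is a hypothetical name, and rows are assumed to be already parsed into array references.

```perl
use strict;
use warnings;

# Hypothetical sketch, not part of the Text::xSV API: resolve field
# names to column positions once, and return a closure that pulls
# those columns out of any row.
sub make_extractor {
    my ($header, @wanted) = @_;
    my %position;
    @position{@$header} = 0 .. $#$header;
    my @index;
    for my $field (@wanted) {
        die "No field named '$field'\n" unless exists $position{$field};
        push @index, $position{$field};
    }
    # The name lookup is done; each call is now a plain array slice.
    return sub {
        my ($row) = @_;
        return @{$row}[@index];
    };
}

my $extract = make_extractor([qw(name age city)], qw(age name));
for my $row (['Ann', 31, 'Oslo'], ['Bob', 47, 'Lima']) {
    my ($age, $name) = $extract->($row);
    print "$name is $age years old\n";
}
```

The hash lookup by field name happens once per binding instead of once per row, which is the saving the paragraph above is after.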
Also note that should you ever play around with the special variables $`, $&, or $', you will find that this module can get much, much slower. The cause of this problem is that Perl only calculates those variables if it has ever seen one of them in the program. This module does many, many matches, and calculating those variables for each match is slow.
I need to find out what conversions are done by Microsoft products that Perl won't do on the fly upon trying to use the values.
I need a real test suite.
Ben Tilly (ben_tilly@operamail.com) Originally posted at http://www.perlmonks.org/node_id=65094.
Copyright 2001. This may be modified and distributed on the same terms as Perl.