Spreadsheet::ParseExcel - Get information from Excel file


NAME

Spreadsheet::ParseExcel - Get information from Excel file


SYNOPSIS

    
I<new interface>
    use strict;
    use Spreadsheet::ParseExcel;
    my $excel = Spreadsheet::ParseExcel::Workbook->Parse($file);
    foreach my $sheet (@{$excel->{Worksheet}}) {
        printf("Sheet: %s\n", $sheet->{Name});
        $sheet->{MaxRow} ||= $sheet->{MinRow};
        foreach my $row ($sheet->{MinRow} .. $sheet->{MaxRow}) {
            $sheet->{MaxCol} ||= $sheet->{MinCol};
            foreach my $col ($sheet->{MinCol} ..  $sheet->{MaxCol}) {
                my $cell = $sheet->{Cells}[$row][$col];
                if ($cell) {
                    printf("( %s , %s ) => %s\n", $row, $col, $cell->{Val});
                }
            }
        }
    }

old interface use strict; use Spreadsheet::ParseExcel; my $oExcel = Spreadsheet::ParseExcel->new;

    #1.1 Normal Excel97
    my $oBook = $oExcel->Parse('Excel/Test97.xls');
    my($iR, $iC, $oWkS, $oWkC);
    print "FILE  :", $oBook->{File} , "\n";
    print "COUNT :", $oBook->{SheetCount} , "\n";
    print "AUTHOR:", $oBook->{Author} , "\n";
    for(my $iSheet=0; $iSheet < $oBook->{SheetCount} ; $iSheet++) {
        $oWkS = $oBook->{Worksheet}[$iSheet];
        print "--------- SHEET:", $oWkS->{Name}, "\n";
        for(my $iR = $oWkS->{MinRow} ; 
                defined $oWkS->{MaxRow} && $iR <= $oWkS->{MaxRow} ; $iR++) {
            for(my $iC = $oWkS->{MinCol} ;
                            defined $oWkS->{MaxCol} && $iC <= $oWkS->{MaxCol} ; $iC++) {
                $oWkC = $oWkS->{Cells}[$iR][$iC];
                print "( $iR , $iC ) =>", $oWkC->Value, "\n" if($oWkC);  # Formatted Value
                print "( $iR , $iC ) =>", $oWkC->{Val}, "\n" if($oWkC);  # Original Value
            }
        }
    }


DESCRIPTION

Spreadsheet::ParseExcel makes you to get information from Excel95, Excel97, Excel2000 file.

Functions

new
$oExcel = Spreadsheet::ParseExcel->new( [ CellHandler => \&subCellHandler, NotSetCell => undef | 1, ]);

Constructor.

CellHandler (experimental)
specify callback function when a cell is detected.

subCellHandler gets arguments like below:

sub subCellHandler ($oBook, $iSheet, $iRow, $iCol, $oCell);

CAUTION : The atributes of Workbook may not be complete. This function will be called almost order by rows and columns. Take care almost, not perfectly.

NotSetCell (experimental)
specify set or not cell values to Workbook object.

Parse
$oWorkbook = $oParse->Parse($sFileName [, $oFmt]);

return Workbook object. if error occurs, returns undef.

$sFileName
name of the file to parse

From 0.12 (with OLE::Storage_Lite v.0.06), scalar reference of file contents (ex. \$sBuff) or IO::Handle object (inclucdng IO::File etc.) are also available.

$oFmt
Formatter Class to format the value of cells.

ColorIdxToRGB
$sRGB = $oParse->ColorIdxToRGB($iColorIdx);

ColorIdxToRGB returns RGB string corresponding to specified color index. RGB string has 6 charcters, representing RGB hex value. (ex. red = 'FF0000')

Workbook

Spreadsheet::ParseExcel::Workbook

Workbook class has these methods :

Parse
(class method) : same as Spreadsheet::ParseExcel

Worksheet
$oWorksheet = $oBook->Worksheet($sName);

Worksheet returns a Worksheet object with $sName or undef. If there is no worksheet with $sName and $sName contains only digits, it returns a Worksheet object at that position.

Workbook class has these properties :

File
Name of the file

Author
Author of the file

Flg1904
If this flag is on, date of the file count from 1904.

Version
Version of the file

SheetCount
Numbers of Worksheet s in that Workbook

Worksheet[SheetNo]
Array of Worksheets class

PrintArea[SheetNo]
Array of PrintArea array refs.

Each PrintArea is : [ StartRow, StartColumn, EndRow, EndColumn]

PrintTitle[SheetNo]
Array of PrintTitle hash refs.

Each PrintTitle is : { Row => [StartRow, EndRow], Column => [StartColumn, EndColumn]}

Worksheet

Spreadsheet::ParseExcel::Worksheet

Worksheet class has these methods:

Cell ( ROW, COL )
Return the Cell iobject at row ROW and column COL if it is defined. Otherwise return undef.

RowRange ()
Return a two-element list (MIN, MAX) containing the minimum and maximum of defined rows in the worksheet If there is no row defined MAX is smaller than MIN.

ColRange ()
Return a two-element list (MIN, MAX) containing the minimum and maximum of defined columns in the worksheet If there is no row defined MAX is smaller than MIN.

Worksheet class has these properties:

Name
Name of that Worksheet

DefRowHeight
Default height of rows

DefColWidth
Default width of columns

RowHeight[Row]
Array of row height

ColWidth[Col]
Array of column width (undef means DefColWidth)

Cells[Row][Col]
Array of Cells infomation in the worksheet

Landscape
Print in horizontal(0) or vertical (1).

Scale
Print scale.

FitWidth
Number of pages with fit in width.

FitHeight
Number of pages with fit in height.

PageFit
Print with fit (or not).

PaperSize
Papar size. The value is like below:
  Letter               1, LetterSmall          2, Tabloid              3 ,
  Ledger               4, Legal                5, Statement            6 ,
  Executive            7, A3                   8, A4                   9 ,
  A4Small             10, A5                  11, B4                  12 ,
  B5                  13, Folio               14, Quarto              15 ,
  10x14               16, 11x17               17, Note                18 ,
  Envelope9           19, Envelope10          20, Envelope11          21 ,
  Envelope12          22, Envelope14          23, Csheet              24 ,
  Dsheet              25, Esheet              26, EnvelopeDL          27 ,
  EnvelopeC5          28, EnvelopeC3          29, EnvelopeC4          30 ,
  EnvelopeC6          31, EnvelopeC65         32, EnvelopeB4          33 ,
  EnvelopeB5          34, EnvelopeB6          35, EnvelopeItaly       36 ,
  EnvelopeMonarch     37, EnvelopePersonal    38, FanfoldUS           39 ,
  FanfoldStdGerman    40, FanfoldLegalGerman  41, User                256

PageStart
Start page number.

UsePage
Use own start page number (or not).

LeftMergin, RightMergin, TopMergin, BottomMergin, HeaderMergin, FooterMergin
Mergins for left, right, top, bottom, header and footer.

HCenter
Print in horizontal center (or not)

VCenter
Print in vertical center (or not)

Header
Content of print header. Please refer Excel Help.

Footer
Content of print footer. Please refer Excel Help.

PrintGrid
Print with Gridlines (or not)

PrintHeaders
Print with headings (or not)

NoColor
Print in black-white (or not).

Draft
Print in draft mode (or not).

Notes
Print with notes (or not).

LeftToRight
Print left to right(0) or top to down(1).

HPageBreak
Array ref of horizontal page breaks.

VPageBreak
Array ref of vertical page breaks.

MergedArea
Array ref of merged areas. Each merged area is : [ StartRow, StartColumn, EndRow, EndColumn]

Cell

Spreadsheet::ParseExcel::Cell

Cell class has these properties:

Value
Method Formatted value of that cell

Val
Original Value of that cell

Type
Kind of that cell ('Text', 'Numeric', 'Date')

Code
Character code of that cell (undef, 'ucs2', '_native_') undef tells that cell seems to be ascii. '_native_' tells that cell seems to be 'sjis' or something like that.

Format
Format for that cell.

Merged
That cells is merged (or not).

Rich
Array ref of font informations about each characters.

Each entry has : [ Start Position, Font Object]

For more information please refer sample/dmpExR.pl

Format

Spreadsheet::ParseExcel::Format

Format class has these properties:

Font
Font object for that Format.

AlignH
Horizontal Alignment.
  0: (standard), 1: left,       2: center,     3: right,      
  4: fill ,      5: justify,    7:equal_space

Notice: 6 may be merge but it seems not to work.

AlignV
Vertical Alignment.
    0: top,  1: vcenter, 2: bottom, 3: vjustify, 4: vequal_space

Indent
Number of indent

Wrap
Wrap (or not).

Shrink
Display in shrinking (or not)

Rotate
In Excel97, 2000 : degrees of string rotation. In Excel95 or earlier : 0: No rotation, 1: Top down, 2: 90 degrees anti-clockwise, 3: 90 clockwise

JustLast
JustLast (or not). I have never seen this attribute.

ReadDir
Direction for read.

BdrStyle
Array ref of boder styles : [Left, Right, Top, Bottom]

BdrColor
Array ref of boder color indexes : [Left, Right, Top, Bottom]

BdrDiag
Array ref of diag boder kind, style and color index : [Kind, Style, Color] Kind : 0: None, 1: Right-Down, 2:Right-Up, 3:Both

Fill
Array ref of fill pattern and color indexes : [Pattern, Front Color, Back Color]

Lock
Locked (or not).

Hidden
Hiddedn (or not).

Style
Style format (or Cell format)

Font

Spreadsheet::ParseExcel::Font

Format class has these properties:

Name
Name of that font.

Bold
Bold (or not).

Italic
Italic (or not).

Height
Size (height) of that font.

Underline
Underline (or not).

UnderlineStyle
0: None, 1: Single, 2: Double, 0x21: Single(Account), 0x22: Double(Account)

Color
Color index for that font.

Strikeout
Strikeout (or not).

Super
0: None, 1: Upper, 2: Lower


Formatter class

Spreadsheet::ParseExcel::Fmt*

Formatter class will convert cell data.

Spreadsheet::ParseExcel includes 2 formatter classes: FmtDefault and FmtJapanese. You can create your own FmtClass as you like.

Formatter class(Spreadsheet::ParseExcel::Fmt*) should provide these functions:

ChkType($oSelf, $iNumeric, $iFmtIdx)
tells type of the cell that has specified value.
$oSelf
Formatter itself

$iNumeric
If on, the value seems to be number

$iFmtIdx
Format index number of that cell

TextFmt($oSelf, $sText, $sCode)
converts original text into applicatable for Value.
$oSelf
Formatter itself

$sText
Original text

$sCode
Character code of Original text

ValFmt($oSelf, $oCell, $oBook)
converts original value into applicatable for Value.
$oSelf
Formatter itself

$oCell
Cell object

$oBook
Workbook object

FmtString($oSelf, $oCell, $oBook)
get format string for the $oCell.
$oSelf
Formatter itself

$oCell
Cell object

$oBook
WorkBook object contains that cell


KNOWN PROBLEM

This module can not get the values of fomulas in Excel files made with Spreadsheet::WriteExcel. Normaly (ie. By Excel application), formula has the result with it. But Spreadsheet::WriteExcel writes formula with no result. If you set your Excel application ``Auto Calculation'' off. (maybe [Tool]-[Option]-[Calculation] or something) You will see the same result.


AUTHOR

Current maintainer: Gabor Szabo szabgab@cpan.org

    http://www.szabgab.com/

Original author: Kawai Takanori (Hippo2000) kwitknr@cpan.org

    http://member.nifty.ne.jp/hippo2000/            (Japanese)
    http://member.nifty.ne.jp/hippo2000/index_e.htm (English)


SEE ALSO

XLHTML, OLE::Storage, Spreadsheet::WriteExcel, OLE::Storage_Lite

This module is based on herbert within OLE::Storage and XLHTML.

XLSTools: http://perl.jonallen.info/projects/xlstools


TODO

- Add tests, and more tests

- Spreadsheet::ParseExcel : Password protected data, Formulas support, HyperLink support, Named Range support

- Spreadsheet::ParseExcel::SaveParser : Catch up Spreadsheet::WriteExce feature, Create new Excel fle

See also:

 L<http://www.cpanforum.com/dist/Spreadsheet-ParseExcel>;
 and
 http://www.perlmonks.org/index.pl?node_id=490656
 http://www.perlmonks.org/index.pl?node_id=379743 
 http://www.perlmonks.org/index.pl?node_id=433192
 http://www.perlmonks.org/index.pl?node_id=422147


COPYRIGHT

Copyright (c) 2006-2007 Gabor Szabo Copyright (c) 2000-2006 Kawai Takanori All rights reserved.

You may distribute under the terms of either the GNU General Public License or the Artistic License, as specified in the Perl README file.


ACKNOWLEDGEMENTS

First of all, I would like to acknowledge valuable program and modules : XHTML, OLE::Storage and Spreadsheet::WriteExcel.

In no particular order: Yamaji Haruna, Simamoto Takesi, Noguchi Harumi, Ikezawa Kazuhiro, Suwazono Shugo, Hirofumi Morisada, Michael Edwards, Kim Namusk, Slaven Rezić, Grant Stevens, and many many people + Kawai Mikako.

 Spreadsheet::ParseExcel - Get information from Excel file