Locale::Script - ISO codes for script identification (ISO 15924)
    use Locale::Script;
    use Locale::Constants;

    $script  = code2script('ph');                          # 'Phoenician'
    $code    = script2code('Tibetan');                     # 'bo'
    $code3   = script2code('Tibetan', LOCALE_CODE_ALPHA_3); # 'bod'
    $codeN   = script2code('Tibetan', LOCALE_CODE_NUMERIC); # 330

    @codes   = all_script_codes();
    @scripts = all_script_names();
The Locale::Script module provides access to the ISO codes for identifying scripts, as defined in ISO 15924. For example, Egyptian hieroglyphs are denoted by the two-letter code 'eg', the three-letter code 'egy', and the numeric code 050.
You can either access the codes via the conversion routines (described below), or with the two functions which return lists of all script codes or all script names.
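There are three different code sets you can use for identifying scripts:
LOCALE_CODE_ALPHA_2
    Two-letter codes, such as 'bo' for Tibetan.
LOCALE_CODE_ALPHA_3
    Three-letter codes, such as 'bod' for Tibetan.
LOCALE_CODE_NUMERIC
    Numeric codes, such as 330 for Tibetan.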
All of the routines take an optional additional argument which specifies the code set to use. If not specified, it defaults to the two-letter codes. This is partly for backwards compatibility (previous versions of Locale modules only supported the alpha-2 codes), and partly because they are the most widely used codes.
The alpha-2 and alpha-3 codes are not case-dependent, so you can use 'BO', 'Bo', 'bO' or 'bo' for Tibetan. When a code is returned by one of the functions in this module, it will always be lower-case.
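The two rules above (a default code set, and case-insensitive codes) can be illustrated with a small sketch. It does not use Locale::Script itself: the function sketch_code2script(), the SKETCH_* constants, and the three-entry table are hypothetical stand-ins, filled with the Tibetan codes quoted in this document, so that the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the code set symbols; not the module's constants.
use constant { SKETCH_ALPHA_2 => 1, SKETCH_ALPHA_3 => 2, SKETCH_NUMERIC => 3 };

# Stand-in data: the Tibetan codes quoted earlier in this document.
my %NAME_FOR = (
    SKETCH_ALPHA_2() => { 'bo'  => 'Tibetan' },
    SKETCH_ALPHA_3() => { 'bod' => 'Tibetan' },
    SKETCH_NUMERIC() => { '330' => 'Tibetan' },
);

sub sketch_code2script {
    my ($code, $codeset) = @_;
    return undef unless defined $code;
    $codeset = SKETCH_ALPHA_2 unless defined $codeset;  # default: two-letter codes
    my $table = $NAME_FOR{$codeset} or return undef;    # unknown code set
    return $table->{ lc $code };                        # fold case before lookup
}

my $by_default = sketch_code2script('BO');                    # 'Tibetan'
my $by_alpha3  = sketch_code2script('Bod', SKETCH_ALPHA_3);   # 'Tibetan'
print "$by_default $by_alpha3\n";
```

Folding the code to lower case before the lookup is one simple way to get both behaviours described above: any mix of case is accepted on input, and the stored codes are lower-case.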
The standard defines various special codes.
The private codes are not recognised by Locale::Script, but the others are.
There are three conversion routines: code2script(), script2code(), and script_code2code().
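code2script ( CODE, [ CODESET ] )
This function takes a script code and returns the name of the script it identifies. If the code is not a valid script code, undef will be returned:
    $script = code2script('cy');   # Cyrillic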
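script2code ( STRING, [ CODESET ] )
This function takes a script name and returns the corresponding script code. If the argument is not recognised as a script name, undef will be returned:
    $code = script2code('Gothic', LOCALE_CODE_ALPHA_3);
    # $code will now be 'gth'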
The case of the script name is not important. See the section KNOWN BUGS AND LIMITATIONS below.
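script_code2code ( CODE, CODESET, CODESET )
This function takes a script code from one code set and returns the corresponding code from another code set:
    $alpha2 = script_code2code('jwi', LOCALE_CODE_ALPHA_3 => LOCALE_CODE_ALPHA_2);
    # $alpha2 will now be 'jw' (Javanese)
If the code passed is not a valid script code in the first code set, or if there isn't a code for the corresponding script in the second code set, then undef will be returned.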
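The two failure cases for a code-to-code conversion can be seen in a self-contained sketch that goes from code to name and then from name to code. This is only an illustration under stated assumptions, not a claim about how script_code2code() is implemented: sketch_alpha3_to_alpha2() and both tables are hypothetical, and the 'zzz' entry is invented purely to show a script with no code in the second set.

```perl
use strict;
use warnings;

# Stand-in tables. 'jwi' and 'bod' come from this document;
# 'zzz' is an invented entry with no alpha-2 counterpart.
my %NAME_FOR_ALPHA3 = (
    jwi => 'Javanese',
    bod => 'Tibetan',
    zzz => 'Example-only script',
);
my %ALPHA2_FOR_NAME = (
    'Javanese' => 'jw',
    'Tibetan'  => 'bo',
);

sub sketch_alpha3_to_alpha2 {
    my ($code) = @_;
    return undef unless defined $code;

    # Step 1: the code must be valid in the first code set.
    my $name = $NAME_FOR_ALPHA3{ lc $code };
    return undef unless defined $name;

    # Step 2: the script must have a code in the second code set;
    # a missing key yields undef.
    return $ALPHA2_FOR_NAME{$name};
}

for my $code (qw(jwi qqq zzz)) {
    my $alpha2 = sketch_alpha3_to_alpha2($code);
    print "$code => ", (defined $alpha2 ? $alpha2 : 'undef'), "\n";
}
```

Because both failures produce undef, a caller that needs to tell them apart would have to check the first code separately, for example with code2script().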
There are two functions which can be used to obtain a list of all codes, or all script names:
all_script_codes ( [ CODESET ] )
all_script_names ( [ CODESET ] )
The following example illustrates use of the code2script() function. The user is prompted for a script code, and then told the corresponding script name:
    use Locale::Script;
    use Locale::Constants;

    $| = 1;   # turn off output buffering

    print "Enter script code: ";
    chomp($code = <STDIN>);   # strip the trailing newline, if any
    $script = code2script($code, LOCALE_CODE_ALPHA_2);
    if (defined $script) {
        print "$code = $script\n";
    } else {
        print "'$code' is not a valid script code!\n";
    }
KNOWN BUGS AND LIMITATIONS
When using script2code(), the script name must currently appear exactly as it does in the source of the module, apart from letter case. For example, script2code('Egyptian hieroglyphs') will return 'eg', as expected. But the following will all return undef:
    script2code('hieroglyphs')
    script2code('Egyptian Hieroglypics')
If there's need for it, a future version could have variants for script names.
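Until such variants exist, a caller could layer a small alias table over the lookup. The sketch below is one possible approach, not part of Locale::Script: script2code_with_variants() and the alias table are hypothetical, and the lookup is passed in as a code reference so that the example can run with a stand-in instead of the real script2code().

```perl
use strict;
use warnings;

# Hypothetical alias table: lower-cased variant => canonical script name.
my %CANONICAL_FOR = (
    'hieroglyphs' => 'Egyptian hieroglyphs',
);

# $lookup is any code ref that maps an exact script name to a code.
# Extra arguments (such as a code set) are passed through unchanged.
sub script2code_with_variants {
    my ($lookup, $name, @rest) = @_;
    return undef unless defined $name;

    (my $key = lc $name) =~ s/^\s+|\s+$//g;   # fold case, trim whitespace
    my $canonical = exists $CANONICAL_FOR{$key} ? $CANONICAL_FOR{$key} : $name;
    return $lookup->($canonical, @rest);
}

# Stand-in for script2code(), so the sketch runs without the module.
my %CODE_FOR = ( 'egyptian hieroglyphs' => 'eg' );
my $stand_in = sub { my ($n) = @_; return $CODE_FOR{ lc $n } };

my $code = script2code_with_variants($stand_in, 'hieroglyphs');
print "$code\n";   # prints "eg"
```

An alias table only covers variants someone has listed in advance; a misspelling such as 'Egyptian Hieroglypics' would still return undef.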
In the current implementation, all data is read in when the module is loaded, and then held in memory. A lazy implementation would be more memory friendly.
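As a rough illustration of what a lazy approach might look like, the sketch below builds its lookup table on the first call instead of at load time. Everything in it is hypothetical (lazy_code2script(), the "code:name" record format, the two records); a real module would more likely read its data from a file or a DATA section, which is where the memory saving would come from.

```perl
use strict;
use warnings;

# Hypothetical raw records in "code:name" form.
my @RAW = (
    'bo:Tibetan',
    'eg:Egyptian hieroglyphs',
);

my %name_for;      # stays empty until the first lookup
my $loaded = 0;

# Parse the raw records into the lookup table, once.
sub _load_table {
    for my $record (@RAW) {
        my ($code, $name) = split /:/, $record, 2;
        $name_for{$code} = $name;
    }
    $loaded = 1;
}

sub lazy_code2script {
    my ($code) = @_;
    return undef unless defined $code;
    _load_table() unless $loaded;      # pay the parsing cost on first use only
    return $name_for{ lc $code };
}

my $name = lazy_code2script('EG');
print "$name\n";   # prints "Egyptian hieroglyphs"
```

A program that loads the module but never performs a lookup would then never build the table at all.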
Neil Bowers <neil@bowers.com>
Copyright (c) 2002-2004 Neil Bowers.
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.