Locale::Messages - Gettext Like Message Retrieval |
Locale::Messages - Gettext Like Message Retrieval
use Locale::Messages (:locale_h :libintl_h);
gettext $msgid; dgettext $textdomain, $msgid; dcgettext $textdomain, $msgid, LC_MESSAGES; ngettext $msgid, $msgid_plural, $count; dngettext $textdomain, $msgid, $msgid_plural, $count; dcngettext $textdomain, $msgid, $msgid_plural, $count, LC_MESSAGES; textdomain $textdomain; bindtextdomain $textdomain, $directory; bind_textdomain_codeset $textdomain, $encoding; bind_textdomain_filter $textdomain, \&filter, $data; turn_utf_8_on ($variable); turn_utf_8_off ($variable); nl_putenv ('OUTPUT_CHARSET=koi8-r'); my $category = LC_CTYPE; my $category = LC_NUMERIC; my $category = LC_TIME; my $category = LC_COLLATE; my $category = LC_MONETARY; my $category = LC_MESSAGES; my $category = LC_ALL;
The module Locale::Messages is a wrapper around the interface to message translation according to the Uniforum approach that is for example used in GNU gettext and Sun's Solaris. It is intended to allow Locale::Messages(3) to switch between different implementations of the lower level libraries but this is not yet implemented.
Normally you should not use this module directly, but the high level interface Locale::TextDomain(3) that provides a much simpler interface. This description is therefore deliberately kept brief. Please refer to the GNU gettext documentation available at http://www.gnu.org/manual/gettext/ for in-depth and background information on the topic.
The lower level module Locale::gettext_pp(3) provides the Perl
implementation of gettext()
and related functions.
The module exports by default nothing. Every function has to be imported explicitely or via an export tag (EXPORT TAGS).
print gettext "Hello World!\n";
If no translation can be found, the unmodified MSGID is returned, i. e. the function can never fail, and will never mess up your original message.
Note for Perl 5.6 and later: The returned string will always have
the UTF-8 flag off by default. See the documentation for function
bind_textdomain_filter()
for a way to change this behavior.
One common mistake is this:
print gettext "Hello $name!";
Perl will interpolate the variable $name
before the function
will see the string. Unless the corresponding message catalog
contains a message ``Hello Tom!'', ``Hello Dick!'' or ``Hello Harry!'',
no translation will be found.
Using printf()
and friends has its own problems:
print sprintf (gettext ("This is the %s %s."), $color, $thing);
(The example is stupid because neither color nor thing will get translated here ...).
In English the adjective (the color) will precede the noun, many
other languages (for example French or Italian) differ here. The
translator of the message may therefore have a hard time to find
a translation that will still work and not sound stupid in the
target language. Many C implementations of printf()
allow to
change the order of the arguments, and a French translator could
then say:
"C'est le %$2s %$1s."
Perl printf()
implements this feature as of version 5.8 or better.
Consequently you can only use it, if you are sure that your software
will run with Perl 5.8 or a later version.
Another disadvantage of using printf()
is its cryptic syntax (maybe
not for you but translators of your software may have their own
opinion).
See the description of the function __x()
in Locale::TextDomain(3)
for a much better way to get around this problem.
Non-ASCII message ids ...
You should note that the function (and all other similar functions
in this module) does a bytewise comparison of the MSGID for the
lookup in the translation catalog, no matter whether obscure utf-8
flags are set on it, whether the string looks like utf-8, whether
the utf8(3pm)
pragma is used, or whatever other weird method past
or future perl(1)
versions invent for guessing character sets of
strings.
Using other than us-ascii characters in Perl source code is a call
for trouble, a compatibility nightmare. Furthermore, GNU gettext
only lately introduced support for non-ascii character sets in sources,
and support for this feature may not be available everywhere. If
you absolutely want to use MSGIDs in non-ascii character sets,
it is wise to choose utf-8. This will minimize the risk that perl(1)
itself will mess with the strings, and it will also be a guaranty
that you can later translate your project into arbitrary target
languages.
Other character sets can theoretically work. Yet, using another character set in the Perl source code than the one used in your message catalogs will never work, since the lookup is done bytewise, and all strings with non-ascii characters will not be found.
Even if you have solved all these problems, there is still one show stopper left: The gettext runtime API lacks a possibility to specify the character set of the source code (including the original strings). Consequently - in absence of a hint for the input encoding - strings without a translation are not subject to output character set conversion. In other words: If the (non-determinable) output character set differs from the character set used in the source code, output can be a mixture of two character sets. There is no point in trying to address this problem in the pure Perl version of the gettext functions. because breaking compatibilty between the Perl and the C version is a price too high to pay.
This all boils down to: Only use ASCII characters in your translatable strings!
dgettext()
but retrieves the message from the specified CATEGORY
instead of the default category LC_MESSAGES
.
print "$count file(s) deleted.\n";
or
printf "$count file%s deleted.\n", $count == 1 ? '' : 's';
The first example looks awkward, the second will only work in English
and languages with similar plural rules. Before ngettext()
was introduced,
the best practice for internationalized programs was:
if ($count == 1) { print gettext "One file deleted.\n"; } else { printf gettext "%d files deleted.\n"; }
This is a nuisance for the programmer and often still not sufficient for an adequate translation. Many languages have completely different ideas on numerals. Some (French, Italian, ...) treat 0 and 1 alike, others make no distinction at all (Japanese, Korean, Chinese, ...), others have two or more plural forms (Russian, Latvian, Czech, Polish, ...). The solution is:
printf (ngettext ("One file deleted.\n", "%d files deleted.\n", $count), # argument to ngettext! $count); # argument to printf!
In English, or if no translation can be found, the first argument
(MSGID) is picked if $count
is one, the second one otherwise.
For other languages, the correct plural form (of 1, 2, 3, 4, ...)
is automatically picked, too. You don't have to know anything about
the plural rules in the target language, ngettext()
will take care
of that.
This is most of the time sufficient but you will have to prove your creativity in cases like
printf "%d file(s) deleted, and %d file(s) created.\n";
ngettext()
but retrieves the translation from the specified
textdomain instead of the default domain.
dngettext()
but retrieves the translation from the specified
category, instead of the default category LC_MESSAGES
.
bindtextdomain "my-package", "./mylocale";
Say, the selected locale (actually the selected locale for category
LC_MESSAGES
) of the program is 'fr_CH', then the message catalog
will be expected in ./mylocale/fr_CH/LC_MESSAGES/my-package.mo.
If you want an object method to be called, pass the object itself in the data parameter and write a wrapper function. Example:
sub wrapper { my ($string, $obj) = @_; $obj->filterMethod ($string); } my $obj = MyPackage->new;
bind_textdomain_filter ('mydomain', \&wrapper, $obj);
The function cannot fail and always returns a true value.
Attention: If you use the function for setting the utf-8 flag,
it is your responsability to ensure that the output is really
utf-8. You should only use it, if you have set the environment
variable OUTPUT_CHARSET to ``utf-8''. Additionally you should
call bind_textdomain_codeset()
with ``utf-8'' as the second
argument.
This function has been introduced in libintl-perl 1.16 and it is not part of the standard gettext API.
The flag does not mean that the string is in fact valid utf-8!
The function was introduced with libintl-perl version 1.16.
The function was introduced with libintl-perl version 1.07.
You will normally want to use that in a BEGIN block of your main script.
The function was introduced with libintl-perl version 1.03 and is not part of the standard gettex API.
putenv(3)
function. The sole purpose of this
function is to work around some ideosyncrasies in the environment
processing of Windows systems. If you want to portably set or
unset environment variables, use this function instead of directly
manipulating %ENV
.
The argument ENVSPEC may have three different forms.
LANGUAGE
to ``fr_CH''.
LANGUAGE
to an
empty string. Under Windows, however, the environment variable will
be deleted instead (and is no longer present in %ENV
). Since
within libintl-perl empty environment variables are useless, consider
this usage as deprecated.
putenv(3)
(resp.
_putenv())
in the so-called standard C library of MS-Windows, you
may suspect that this is an invalid argument. This is not the case!
Passing a variable name not followed by an equal sign will always
delete the variable, no matter which operating system you use.
The function returns true for success, and false for failure. Possible reasons for failure are an invalid syntax or - only under Windows - failure to allocate space for the new environment entry ($! will be set accordingly in this case).
Why all this hassle? The 32-bit versions of MS-DOS (currently
Windows 95/98/ME/NT/2000/XP/CE/.NET) maintain two distinct blocks
of environment variables per process. Which block is considered
the ``correct'' environment is a compile-time option of the Perl
interpreter. Unfortunately, if you have build the XS version
Locale::gettext_xs(3) under Windows, the underlying library may use
a different environment block, and changes you make to %ENV
may
not be visible to the library.
The function nl_putenv()
is mostly a funny way of saying
LANGUAGE=some_value but it does its best, to pass this information to the gettext library. Under other operating systems than Windows, it only operates on C<%ENV>, under Windows it will call the C library function _putenv() (after doing some cleanup to its arguments), before manipulating C<%ENV>.
Please note, that you %ENV
is updated by nl_putenv()
automatically.
The function has been introduced in libintl-perl version 1.10.
You can (maybe) get the same constants from POSIX(3); see there for a detailed description
POSIX(3)
were included here. Even if it was present in your systems C include
file locale.h, it was not provided by POSIX(3). Perl 5.8 and later
seems to export the constant if available, although it is not documented
in POSIX(3).
Locale::Messages(3) makes an attempt to guess the value of this category for all systems, and assumes the arbitrary value 1729 otherwise.
The module does not export anything unless explicitely requested. You can import groups of functions via two tags:
A complete example:
1: use Locale::Messages qw (:locale_h :libintl_h); 2: use POSIX qw (setlocale); 3: setlocale (LC_MESSAGES, ''); 4: textdomain ('my-package'); 5: bindtextdomain ('my-package' => '/usr/local/share/locale'); 6: 7: print gettext ("Hello world!\n");
Step by step: Line 1 imports the necessary functions and constants. In line 3 we set the locale for category LC_MESSAGES to the default user settings. For C programs you will often read that LC_ALL is the best category here but this will also change the locale for LC_NUMERIC and many programs will not work reliably after changing that category in Perl; choose your own poison!
In line 4 we say that all messages (translations) without an explicit domain specification should be retrieved from the message catalog for the domain 'my-package'. Line 5 has the effect that the message catalog will be searched under the directory /usr/local/share/locale.
If the user has selected the locale 'fr_CH', and if the file /usr/local/share/locale/fr_CH/LC_MESSAGES/my-package.mo exists, and if it contains a GNU message object file with a translation for the string ``Hello world!\n'', then line 7 will print the French translation (for Switzerland CH) to STDOUT.
The documentation for GNU gettext explains how to extract translatable strings from your Perl files and how to create message catalogs.
Another less portable example: If your system uses the GNU libc you should be able to find various files with the name libc.mo, the message catalog for the library itself. If you have found these files under /usr/share/locale, then you can try the following:
use Locale::Messages qw (:locale_h :libintl_h); use POSIX qw (setlocale);
setlocale LC_MESSAGES, ""; textdomain "libc";
# The following is actually not needed, since this is # one of the default search directories. bindtextdomain libc => '/usr/share/locale'; bind_textdomain_codeset libc => 'iso-8859-1';
print gettext ("No such file or directory");
See Locale::TextDomain(3) for much simpler ways.
Copyright (C) 2002-2004, Guido Flohr <guido@imperia.net>, all rights reserved. See the source code for details.
This software is contributed to the Perl community by Imperia (http://www.imperia.net/).
Locale::TextDomain(3pm), Locale::gettext_pp(3pm), Encode(3pm),
perllocale(3pm), POSIX(3pm), perl(1), gettext(1), gettext(3)
Locale::Messages - Gettext Like Message Retrieval |