HTML::Mason::Admin - Mason Administrator's Guide
This guide is intended for the sys admin/web master in charge of installing, configuring, or tuning a Mason system.
The Config.pm file contains global settings for your system, such as the presence or absence of certain optional modules. Makefile.PL initially creates Config.pm based on your environment, placing it in the lib/HTML/Mason subdirectory of the distribution. After that, you can edit it by hand, following the comments inside. In most cases, you will not need to edit it.
``make install'' copies Config.pm to your Perl library directory (e.g. /usr/lib/perl5/site_perl/HTML/Mason) along with the other module files. This allows Mason internally to grab the configuration data with ``use HTML::Mason::Config''.
When upgrading from a previous version, ``make install'' will maintain the previous Config.pm values.
There are three ways to configure a Mason site:
The next three sections discuss these methods in detail. We recommend that you start with the simplest method and work your way forward as the need for flexibility arises.
It is important to note that you cannot mix httpd.conf configuration directives with a handler script. Depending on how you declare your PerlHandler, one or the other will always take precedence and the other will be ignored.
A simple configuration looks like:
DocumentRoot /usr/local/www/htdocs
PerlSetVar MasonCompRoot /usr/local/www/htdocs
PerlSetVar MasonDataDir /usr/local/www/mason
PerlModule HTML::Mason::ApacheHandler
<FilesMatch "\.html$">
    SetHandler perl-script
    PerlHandler HTML::Mason::ApacheHandler
</FilesMatch>
MasonCompRoot specifies the component root, the top of your component source tree. In simple configurations the component root is the same as the server's DocumentRoot. (See CONFIGURING VIRTUAL SITES to understand why the component root and DocumentRoot might differ.)
When Mason handles a request, the requested filename must be underneath your component root -- that gives Mason a legitimate component to start with. If the filename is not underneath the component root, Mason will place a warning in the error logs and return 404.
MasonDataDir specifies the data directory, a writable directory that Mason uses for various features and optimizations. Mason will create the directory on startup, if necessary, and set its permissions according to the web server User/Group.
Mason's configuration parameters are set via mod_perl's PerlSetVar and PerlAddVar directives. The latter directive is only available in mod_perl version 1.24 and greater. Though these parameters are all strings, Mason treats them in different ways. The different parameter types are:
The string is eval'ed. This is used for parameters that expect subroutine references. For example, an anonymous subroutine might look like:
PerlSetVar MasonDieHandler "sub { die join ' -- ', @_ }"
A reference to a named subroutine would look like this:
PerlSetVar MasonDieHandler "\&Carp::croak"
For list parameters, use PerlAddVar for the values, like this:
PerlAddVar MasonPreloads /foo/bar/baz.comp
PerlAddVar MasonPreloads /foo/bar/quux.comp
As noted above, PerlAddVar is only available in mod_perl 1.24 and up. This means that it is only possible to assign a single value (using PerlSetVar) to list parameters if you are using a mod_perl older than 1.24.
Almost all of Mason's configuration parameters correspond to new() parameters for the HTML::Mason::Parser, HTML::Mason::Interp, and HTML::Mason::ApacheHandler objects. Configuration parameters are named predictably by taking the mixed-caps form of the new() parameter and prefixing ``Mason''. For example, comp_root becomes MasonCompRoot and data_dir becomes MasonDataDir.
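The naming rule above can be sketched in a few lines of Perl. This is only an illustration of the rule as stated; the helper name directive_name is ours and is not part of Mason.

```perl
use strict;
use warnings;

# Convert a new() parameter name such as "comp_root" into the
# corresponding httpd.conf directive name ("MasonCompRoot").
# directive_name is a hypothetical helper, not a Mason function.
sub directive_name {
    my ($param) = @_;
    # Capitalize each underscore-separated word, then join and prefix.
    my $mixed = join '', map { ucfirst(lc($_)) } split /_/, $param;
    return "Mason$mixed";
}

print directive_name($_), "\n" for qw(comp_root data_dir use_object_files);
```

A helper like this is handy when moving settings from a handler script into httpd.conf, since the parameter names can be translated mechanically.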
allow_globals parameter.
default_escape_flags parameter.
ignore_warnings_expr parameter.
in_package parameter.
postamble parameter.
preamble parameter.
taint_check parameter.
use_strict parameter.
preprocess parameter.
postprocess parameter.
comp_root parameter. This parameter can either be a single value, such as '/foo/comp/root', or a list of keys and values such as:
PerlAddVar MasonCompRoot "root1 => /foo/comp/root1"
PerlAddVar MasonCompRoot "shared => /my/shared/comps"
See the Multiple component roots section for more details on multiple component roots.
data_dir parameter.
allow_recursive_autohandlers parameter.
autohandler_name parameter.
code_cache_max_size parameter.
current_time parameter.
data_cache_dir parameter.
die_handler parameter.
max_recurse parameter.
out_method parameter.
out_mode parameter.
preloads parameter.
static_file_root parameter.
system_log_events parameter.
system_log_file parameter.
system_log_separator parameter.
use_autohandlers parameter.
use_data_cache parameter.
use_dhandlers parameter.
use_object_files parameter.
use_reload_file parameter.
apache_status_title parameter.
auto_send_headers parameter.
debug_handler_proc parameter.
debug_handler_script parameter.
debug_mode parameter.
debug_perl_binary parameter.
error_mode parameter.
top_level_predicate parameter.
These are the only parameters that do not correspond to new() parameters.
CGI or mod_perl. This determines whether to use the CGI or Apache::Request module for argument handling. Please see HTML::Mason::ApacheHandler/PARAMETERS TO THE use() DECLARATION for more details.
This can only be set once, regardless of how you set MasonMultipleConfig. Defaults to CGI.
If you set this option, the Apache child processes will need the proper permissions to create the data directories.
To maximize shared memory, you should put your Mason configuration directives in the configuration file before you load the HTML::Mason::ApacheHandler module via a PerlModule directive.
Because certain features were not available in mod_perl before 1.21_01, Mason will be less memory efficient when using those older versions.
For maximum flexibility, you may choose to write a custom script to create your Mason objects and handle requests. In our documentation and examples we call this script handler.pl, though you may name it whatever you like.
The handler.pl file is responsible for creating the three Mason objects and supplying the many parameters that control how your components are parsed and executed. It also provides the opportunity to execute arbitrary code at three important junctures: the server initialization, the beginning of a request, and the end of a request.
A wide set of behaviors can be implemented with a mere few lines of well-placed Perl in your handler.pl. In this section we present the basics of setting up handler.pl as well as some ideas for more advanced applications.
handler.pl creates three Mason objects: the Parser, Interpreter, and ApacheHandler. The Parser compiles components into Perl subroutines; the Interpreter executes those compiled components; and the ApacheHandler routes mod_perl requests to Mason. These objects are created once in the parent httpd and then copied to each child process.
These objects have a fair number of possible parameters. Only two of them are required, comp_root and data_dir; these are discussed in the next two subsections. The various parameters are documented in the individual reference manuals for each object: HTML::Mason::Parser, HTML::Mason::Interp, and HTML::Mason::ApacheHandler.
The advantage of embedding these parameters in objects is that advanced configurations can create more than one set of objects, choosing which set to use at request time. For example, suppose you have a staging site and a production site running on the same web server, distinguishing between them with a configuration variable called version:
# Create Mason objects for staging site
my $parser1 = new HTML::Mason::Parser;
my $interp1 = new HTML::Mason::Interp (parser=>$parser1, ...);
my $ah1 = new HTML::Mason::ApacheHandler (interp=>$interp1);

# Create Mason objects for production site
my $parser2 = new HTML::Mason::Parser;
my $interp2 = new HTML::Mason::Interp (parser=>$parser2, ...);
my $ah2 = new HTML::Mason::ApacheHandler (interp=>$interp2);

sub handler {
    ...

    # Choose the right ApacheHandler
    if ($r->dir_config('version') eq 'staging') {
        $ah1->handle_request($r);
    } else {
        $ah2->handle_request($r);
    }
}
You may specify multiple component roots to be searched in the spirit of Perl's @INC. To do so you must specify a list of lists:
comp_root => [[key1, root1], [key2, root2], ...]
Each pair consists of a key and root. The key is a string that identifies the root mnemonically to a component developer. Keys are case-insensitive and must be distinct.
For example:
comp_root => [[private=>'/usr/home/joe/comps'], [main=>'/usr/local/www/htdocs']]
This specifies two component roots, a main component tree and a private tree which overrides certain components. The order is respected à la @INC, so private is searched first and main second. (I chose the => notation here because it looks cleaner, but note that this is a list of lists, not a hash.)
The key has several purposes. Object and data cache filenames use the (uppercased) key to make sure different components sharing the same path have different cache and object files. For example, if a component /foo/bar is found in 'private', then the object file will be
<data_dir>/obj/PRIVATE/foo/bar
and the cache file
<data_dir>/cache/PRIVATE+2ffoo+2fbar
The key is also included whenever Mason prints the component title, as in an error message:
error while executing /foo/bar [private]: ...
This lets you know which version of the component was running.
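The search order and the object file naming described above can be sketched as follows. This is a minimal illustration, not Mason's own lookup code; find_component and object_file are hypothetical helpers, and the data directory passed in the usage lines is just the example path used earlier in this guide.

```perl
use strict;
use warnings;

# Search an ordered list of [key, root] pairs for a component path and
# return the key and root of the first match, or an empty list if the
# component is not found under any root. (Hypothetical helper.)
sub find_component {
    my ($comp_root, $comp_path) = @_;
    foreach my $pair (@$comp_root) {
        my ($key, $root) = @$pair;
        return ($key, $root) if -f "$root$comp_path";
    }
    return;
}

# Build the object file name from the uppercased key, following the
# <data_dir>/obj/KEY/path layout shown above. (Hypothetical helper.)
sub object_file {
    my ($data_dir, $key, $comp_path) = @_;
    return "$data_dir/obj/" . uc($key) . $comp_path;
}

my @roots = ([private => '/usr/home/joe/comps'],
             [main    => '/usr/local/www/htdocs']);
if (my ($key, $root) = find_component(\@roots, '/foo/bar')) {
    print "found in [$key]: ",
          object_file('/usr/local/www/mason', $key, '/foo/bar'), "\n";
}
```

Because the roots are tried in order, a file placed under the private root shadows a file with the same path under main, just as an earlier @INC entry shadows a later one.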
Components will often need access to external Perl modules. Any such modules that export symbols should be listed in handler.pl, rather than the standard practice of using a PerlModule configuration directive. This is because components are executed inside the HTML::Mason::Commands package, and can only access symbols exported to that package. Here's a sample module list:
{
    package HTML::Mason::Commands;
    use CGI ':standard';
    use LWP::UserAgent;
    ...
}
In any case, for optimal memory utilization, make sure all Perl modules are used in the parent process, and not in components. Otherwise, each child allocates its own copy and you lose the benefit of shared memory between parent processes and their children. See Vivek Khera's mod_perl tuning FAQ (perl.apache.org/tuning) for details.
Unix web servers that run on privileged ports like 80 start with a root parent process, then spawn children running as the 'User' and 'Group' specified in httpd.conf. This difference leads to permission errors when child processes try to write files or directories created by the parent process.
To work around this conflict, Mason remembers all directories and files created at startup, returning them in response to $interp->files_written. This list can be fed to a chown() at the end of the startup code in handler.pl:
chown (Apache->server->uid, Apache->server->gid, $interp->files_written);
With just a few lines in handler.pl you can make a global hash (e.g. %session) available to all components containing persistent user session data. If you set a value in the hash, you will see the change in future visits by the same user. The key piece is Jeffrey Baker's Apache::Session module, available from CPAN.
The file eg/session_handler.pl in the distribution contains the lines to activate cookie-based sessions using Apache::Session and CGI::Cookie. You can use eg/session_handler.pl as your new handler.pl base, or just copy out the appropriate pieces to your existing handler.pl.
The session code is customizable; you can change the user ID location (e.g. URL instead of cookie), the user data storage mechanism (e.g. DBI database), and the name of the global hash.
Global variables generally make programs harder to read, maintain, and debug, and this is no less true for Mason. Due to the persistent mod_perl environment, globals require extra initialization and cleanup care. And the highly modular nature of Mason pages does not mix well with globals: it is no fun trying to track down which of twenty components is stepping on your variable. With the ability to pass parameters and declare lexical (my) variables in components, there is very little need for globals at all.
That said, there are times when it is very useful to make a value available to all Mason components: a DBI database handle, a hash of user session information, the server root for forming absolute URLs. Usually you initialize the global in your handler.pl, either outside the handler() subroutine (if you only need to set it once) or inside (if you need to set it every request).
Mason by default parses components in strict mode, so you can't simply start referring to a new global or you'll get a fatal error. The solution is to invoke use vars inside the package that components execute in, by default HTML::Mason::Commands:
{
    package HTML::Mason::Commands;
    use vars qw($dbh %session);
}
Alternatively you can use the Parser/allow_globals parameter or method:
my $parser = new HTML::Mason::Parser (..., allow_globals => [qw($dbh %session)]);
$parser->allow_globals(qw($foo @bar));
The only advantage to allow_globals is that it will do the right thing if you've chosen a different package for components to run in (via the Parser/in_package Parser parameter).
Similarly, to initialize the variable in handler.pl, you need to set it in the component package:
$HTML::Mason::Commands::dbh = DBI->connect(...);
Alternatively you can use the Interp/set_global Interp method:
$interp->set_global(dbh => DBI->connect(...));
Again, set_global will do the right thing if you've chosen a different package for components.
Now when referring to these globals inside components, you can use the plain variable name:
$dbh->prepare...
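The mechanism behind this can be tried outside of Mason with plain Perl. In the sketch below the package My::Commands is a hypothetical stand-in for the component package, and a string stands in for a real database handle; the point is only that a use vars declaration in a package lets code compiled in that package use the short names under strict.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the package that components execute in.
{
    package My::Commands;
    use vars qw($dbh %session);
}

# Initialize the globals from outside, using fully qualified names,
# much as handler.pl would do at startup.
$My::Commands::dbh     = 'fake-database-handle';
%My::Commands::session = (user => 'joe');

# Code compiled inside the package can use the plain names even
# though strict mode is on.
my $result = eval q{
    package My::Commands;
    use strict;
    "$dbh for $session{user}";
};
die $@ if $@;
print "$result\n";    # prints "fake-database-handle for joe"
```

Without the use vars line, the eval would fail to compile under strict, which is the error a component would hit when referring to an undeclared global.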
Mason should be prevented from serving images, tarballs, and other binary files as regular components. Such a file may inadvertently contain a Mason character sequence such as ``<%'', causing an error.
There are several ways to restrict which file types are handled by Mason. One way is with a line at the top of handler(), e.g.:
return -1 if $r->content_type && $r->content_type !~ m|^text/|i;
This line allows text/html and text/plain to pass through but not much else. It is included (commented out) in the default handler.pl.
Another way is specifying a filename pattern in the Apache configuration, e.g.:
<FilesMatch "(\.html|\.txt|^[^\.]+)$">
    SetHandler perl-script
    PerlHandler HTML::Mason
</FilesMatch>
This directs Mason to handle only files with .html, .txt, or no extension.
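Both tests can be checked offline before they go into a live configuration. The sketch below applies the same two patterns to sample inputs; handled_by_name and handled_by_type are hypothetical helper names, and the name test is applied to the last path segment only, on the assumption that this is what the filename pattern is matched against.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# True if the file name (ignoring its directory) ends in .html or
# .txt, or has no extension at all. Same pattern as the filename
# example above. (Hypothetical helper.)
sub handled_by_name {
    my ($path) = @_;
    return basename($path) =~ /(\.html|\.txt|^[^\.]+)$/ ? 1 : 0;
}

# True if the content type is unknown or is a text type, mirroring
# the "return -1" line shown earlier. (Hypothetical helper.)
sub handled_by_type {
    my ($content_type) = @_;
    return 1 unless $content_type;
    return $content_type =~ m|^text/|i ? 1 : 0;
}

foreach my $file (qw(/docs/index.html /docs/README /img/logo.gif)) {
    printf "%-18s %s\n", $file,
        handled_by_name($file) ? 'Mason' : 'not Mason';
}
```

Running a list of real file names from the site through such a check is a cheap way to catch binary files that would otherwise be parsed as components.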
Users may exploit a server-side scripting environment by invoking scripts with malicious or unintended arguments. Mason administrators need to be particularly wary of this because of the tendency to break out ``subroutines'' into individually accessible file components.
For example, a Mason developer might create a helpful shared component for performing SQL queries:
$m->comp('sql_select', table=>'employee', where=>'id=315');
This is a perfectly reasonable component to create and call internally, but clearly presents a security risk if accessible via URL:
http://www.foo.com/sql_select?table=credit_cards&where=*
Of course a web user would have to obtain the name of this component through guesswork or other means, but obscurity alone does not properly secure a system. Rather, you should choose a site-wide policy for distinguishing top-level components from private components, and make sure your developers stick to this policy. You can then prevent private components from being served.
One solution is to place all private components inside a directory, say /private, that lies under the component root but outside the document root.
Another solution is to decide on a naming convention, for example, that all private components begin with ``_'', or that all top-level components must end in ``.html''. Then turn all private requests away with a 404 NOT_FOUND (rather than, say, a 403 FORBIDDEN which would provide more information than necessary). Use either an Apache directive
PerlModule Apache::Constants
<FilesMatch "^_">
    SetHandler perl-script
    PerlInitHandler Apache::Constants::NOT_FOUND
</FilesMatch>
or a handler.pl directive:
return 404 if $r->filename =~ m{/_[^/]+$};
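Since $r->filename holds a full filesystem path, the underscore has to be matched at the start of the last path segment rather than at the start of the whole string. The sketch below shows one such test on plain strings; is_private is a hypothetical helper, and the sample paths are made up for illustration.

```perl
use strict;
use warnings;

# True if the last segment of a path begins with an underscore.
# Works for both bare names and full paths. (Hypothetical helper.)
sub is_private {
    my ($filename) = @_;
    return $filename =~ m{(?:^|/)_[^/]+$} ? 1 : 0;
}

foreach my $path ('/usr/local/www/htdocs/_sql_select',
                  '/usr/local/www/htdocs/index.html',
                  '/usr/local/www/htdocs/my_page.html') {
    printf "%-40s %s\n", $path, is_private($path) ? 'private' : 'public';
}
```

Note that this only inspects the final segment: a public-looking file inside a directory named with a leading underscore would still be served, so pick one convention (file names or directories) and test for exactly that.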
Even after you've safely protected internal components, top-level components that process arguments (such as form handlers) still present a risk. Users can invoke such a component with arbitrary argument values via a handcrafted query string. Always check incoming arguments for validity and never place argument values directly into SQL, shell commands, etc. Unfortunately, Mason does not yet work with Perl's taint checking, which would help enforce these principles.
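As one way of applying this advice, the sketch below validates two incoming arguments before any SQL is built. The allowlist of table names, the argument names, and the helper build_select are all assumptions made for illustration; the returned statement uses a ``?'' placeholder so that the value is passed to the database separately rather than pasted into the SQL text.

```perl
use strict;
use warnings;

# Hypothetical allowlist of tables that web users may query.
my %ALLOWED_TABLE = map { $_ => 1 } qw(employee department);

# Validate raw arguments and return a SQL string with a placeholder
# plus its bind value, or die if anything looks wrong.
sub build_select {
    my (%args) = @_;
    my $table = defined $args{table} ? $args{table} : '';
    my $id    = defined $args{id}    ? $args{id}    : '';
    die "unknown table\n" unless $ALLOWED_TABLE{$table};
    die "bad id\n"        unless $id =~ /\A[0-9]+\z/;
    return ("SELECT * FROM $table WHERE id = ?", $id);
}

my ($sql, @bind) = build_select(table => 'employee', id => '315');
print "$sql  [@bind]\n";
```

The table name cannot be a placeholder in most databases, which is why it is checked against a fixed list instead; everything else travels as a bind value.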
By default Mason will decline requests for directories, leaving Apache to serve up a directory index or a FORBIDDEN as appropriate. Unfortunately this rule applies even if there is a dhandler in the directory: /foo/bar/dhandler does not get a chance to handle a request for /foo/bar.
If you would like Mason to handle directory requests, do the following:
1. Set the ApacheHandler/decline_dirs ApacheHandler parameter to 0.
2. If your handler.pl contains the standard ``return -1'' line to decline non-text requests (as given in the previous section), add a clause allowing directory types:
return -1 if $r->content_type && $r->content_type !~ m|^text/|i && $r->content_type !~ m|directory$|i;
The dhandler that catches a directory request is responsible for setting a reasonable content type.
This section explains how standard Mason features work and how to administer them.
Config.pm controls which packages are used.
The most important task is selecting a good DBM package. Most standard DBM packages (SDBM, ODBM, NDBM) are unsuitable for data caching due to significant limitations on the size of keys and values. Perl only comes with SDBM, so you'll need to obtain a good-quality package if you haven't already. At this time the best options are Berkeley DB (DB_File) version 2.x, available at www.sleepycat.com, and GNU's gdbm (GDBM), available at GNU mirror sites everywhere. Stay away from Berkeley DB version 1.x on Linux which has a serious memory leak (and is unfortunately pre-installed on many distributions).
As far as the serialization methods, all of them should work fine. Data::Dumper is probably simplest: it comes with the latest versions of Perl, is required by Mason anyway, and produces readable output (possibly useful for debugging cache files). On the other hand Storable is significantly faster than the other options according to the MLDBM documentation.
Data caching will not work on systems lacking flock(), such as Windows 95 and 98.
When a component calls $m->cache or $m->cache_self for the first time, Mason automatically creates a new cache file under data_dir/cache. The name of the file is determined by encoding the path as follows:
s/([^\w\.\-\~])/sprintf('+%02x', ord $1)/eg;
This is like URL encoding with a '+' escape character. For example, the cache file for component /foo/bar is data_dir/cache/foo+2fbar.
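The substitution is easy to wrap in a function for use in maintenance scripts. In the sketch below, cache_file_name applies exactly the expression shown above; component_path is our own assumed inverse for mapping a cache file name back to a path, and both names are hypothetical. The sketch assumes plain byte strings.

```perl
use strict;
use warnings;

# Encode a path into a cache file name using the substitution above.
sub cache_file_name {
    my ($path) = @_;
    (my $name = $path) =~ s/([^\w\.\-\~])/sprintf('+%02x', ord $1)/eg;
    return $name;
}

# Assumed inverse: turn each "+xx" escape back into its character.
sub component_path {
    my ($name) = @_;
    (my $path = $name) =~ s/\+([0-9a-f]{2})/chr(hex($1))/eg;
    return $path;
}

print cache_file_name('foo/bar'), "\n";            # foo+2fbar
print cache_file_name('PRIVATE/foo/bar'), "\n";    # PRIVATE+2ffoo+2fbar
print component_path('foo+2fbar'), "\n";           # foo/bar
```

Being able to decode the names is useful when a cleanup job needs to report which components own the largest cache files.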
Currently Mason never deletes cache files, not even when the associated component file is modified. (This may change in the near future.) Thus cache files hang around and grow indefinitely. You may want to use a cron job or similar mechanism to delete cache files that get too large or too old. For example:
# Shoot cache files more than 30 days old
foreach (<data_dir/cache/*>) {    # path to cache directory
    unlink $_ if -M $_ >= 30;
}
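A slightly fuller version of such a cleanup job might look like the following. It is a sketch under our own assumptions: purge_old_files is a hypothetical helper, it only looks at plain files directly inside the given directory, and the directory to clean is supplied by the caller.

```perl
use strict;
use warnings;

# Delete plain files in $dir that were last modified more than
# $max_days ago. Returns the number of files removed.
# (Hypothetical helper, not part of Mason.)
sub purge_old_files {
    my ($dir, $max_days) = @_;
    my $removed = 0;
    opendir(my $dh, $dir) or die "cannot open $dir: $!";
    foreach my $name (readdir $dh) {
        my $file = "$dir/$name";
        next unless -f $file;
        # -M gives the file's age in days relative to script start.
        $removed += unlink($file) if -M $file > $max_days;
    }
    closedir $dh;
    return $removed;
}
```

From a cron job this could be called as purge_old_files("$data_dir/cache", 30), with $data_dir set to your own Mason data directory. Checking -f first keeps the job from touching subdirectories or anything that is not a regular file.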
In general you can feel free to delete cache files periodically and without warning, because the data cache mechanism is explicitly not guaranteed -- developers are warned that cached data may disappear anytime and components must still function.
If for some reason you want to disable data caching, specify use_data_cache=>0 to the Interp object. This will cause all $m->cache calls to return undef without doing anything.
A debug file is a Perl script that creates a fake Apache request object ($r) and calls the same PerlHandler that Apache called. Debug files are created under data_dir/debug/<username> for authenticated users, and in data_dir/debug/anon for anonymous users.
Debug files can only be used with the handler.pl configuration method.
Several ApacheHandler parameters are required to activate and configure debug files:
/usr/bin/perl. This is used in the Unix ``shebang'' line at the top of each debug file.
handler.pl script. Debug files invoke handler.pl just as Apache does at startup, to load needed modules and create Mason objects.
handler.pl. This routine is called with the saved Apache request object.
Here's a sample ApacheHandler constructor with all debug options:
my $ah = new HTML::Mason::ApacheHandler (interp=>$interp,
    debug_mode=>'all',
    debug_perl_binary=>'/usr/local/bin/perl',
    debug_handler_script=>'/usr/local/mason/eg/handler.pl',
    debug_handler_proc=>'HTML::Mason::handler');
When replaying a request through a debug file, the global variable $HTML::Mason::IN_DEBUG_FILE will be set to 1. This is useful if you want to omit certain flags (like preloading) in handler.pl when running under debug. For example:
my %extra_flags = ($HTML::Mason::IN_DEBUG_FILE) ? () : (preloads=>[...]);
my $interp = new HTML::Mason::Interp (..., %extra_flags);
The previewer is a web based utility that allows site developers to:
The web-based previewer interface (a single component, actually) allows the developer to select a variety of options such as time, browser, and display mode. The set of these options together is called a previewer configuration. Configurations can be saved under one of several preview ports. For more information on how the previewer is used, see HTML::Mason::Devel.
Follow these steps to activate the Previewer:
Listen your.site.ip.address:3001
...
Listen your.site.ip.address:3005
You'll also probably want to restrict access to these ports in your access.conf. If you have multiple site developers, it is helpful to use username/password access control, since the previewer will use the username to keep configurations separate.
In handler.pl, add the line
use HTML::Mason::Preview;
somewhere underneath ``use HTML::Mason''. Then add code to your handler routine to intercept Previewer requests on the ports defined above. Your handler should end up looking like this:
sub handler {
    my ($r) = @_;

    # Compute port number from Host header
    my $host = $r->header_in('Host');
    my ($port) = ($host =~ /:([0-9]+)$/);
    $port = 80 if (!defined($port));

    # Handle previewer request on special ports
    if ($port >= 3001 && $port <= 3005) {
        my $parser = new HTML::Mason::Parser(...);
        my $interp = new HTML::Mason::Interp(...);
        my $ah = new HTML::Mason::ApacheHandler (...);
        return HTML::Mason::Preview::handle_preview_request($r,$ah);
    } else {
        $ah->handle_request($r);    # else, normal request handler
    }
}
The three ``new'' lines inside the if block should look exactly the same as the lines at the top of handler.pl. Note that these separate Mason objects are created for a single request and discarded. The reason is that the previewer may alter the objects' settings, so it is safer to create new ones every time.
To test whether the previewer is working: restart your server, go to the previewer interface, and click ``View''. You should see your site's home page.
Mason will log various events to a system log file if you so desire. This can be useful for performance monitoring and debugging.
The format of the system log was designed to be easy to parse by programs, although it is not unduly hard to read for humans. Every event is logged on one line. Each line consists of multiple fields delimited by a common separator, by default ctrl-A. The first three fields are always the same: time, the name of the event, and the current pid ($$). These are followed by one or more fields specific to the event.
The events are:
EVENT NAME    DESCRIPTION                       EXTRA FIELDS
REQ_START     start of HTTP request             request number, URL + query string
REQ_END       end of HTTP request               request number, error flag (1 if error occurred, 0 otherwise)
CACHE_READ    attempt to read from data cache   component path, cache key, success flag (1 if item found, 0 otherwise)
              ($m->cache)
CACHE_STORE   store to data cache               component path, cache key
COMP_LOAD     component loaded into memory      component path
              for first time
The request number is an incremental value that uniquely identifies each request for a given child process. Use it to match up REQ_START/REQ_END pairs.
To turn on logging, specify a string value to system_log_events containing one or more event names separated by '|'. In addition to individual event names, the following names can be used to specify multiple events:
REQUEST = REQ_START | REQ_END
CACHE = CACHE_READ | CACHE_STORE
ALL = All events
For example, to log REQ_START, REQ_END, and COMP_LOAD events, you could use
system_log_events => "REQUEST|COMP_LOAD"
Note that this is a string, not a set of constants or'd together.
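To see what a given event string selects, the group names can be expanded by hand. The sketch below is not Mason's own parsing code; expand_events is a hypothetical helper that follows the event and group names listed above and rejects anything else.

```perl
use strict;
use warnings;

# Group names and the individual events they stand for, as listed above.
my %GROUPS = (
    REQUEST => [qw(REQ_START REQ_END)],
    CACHE   => [qw(CACHE_READ CACHE_STORE)],
    ALL     => [qw(REQ_START REQ_END CACHE_READ CACHE_STORE COMP_LOAD)],
);
my %KNOWN = map { $_ => 1 } @{ $GROUPS{ALL} };

# Expand a string such as "REQUEST|COMP_LOAD" into a sorted list of
# individual event names. (Hypothetical helper.)
sub expand_events {
    my ($spec) = @_;
    my %on;
    foreach my $name (split /\s*\|\s*/, $spec) {
        if    ($GROUPS{$name}) { $on{$_} = 1 for @{ $GROUPS{$name} } }
        elsif ($KNOWN{$name})  { $on{$name} = 1 }
        else                   { die "unknown event name: $name\n" }
    }
    my @events = sort keys %on;
    return @events;
}

print join(', ', expand_events('REQUEST|COMP_LOAD')), "\n";
```

A check like this in a deployment script catches misspelled event names before the server is restarted.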
Configuration Options
By default, the system log will be placed in data_dir/etc/system.log. You can change this with system_log_file.
The default line separator is ctrl-A. The advantage of this separator is that it is very unlikely to appear in any of the fields, making it easy to split() the line. The disadvantage is that it will not always display, e.g. from a Unix shell, making the log harder to read casually. You can change the separator to any sequence of characters with system_log_separator.
The time on each log line will be of the form ``seconds.microseconds'' if you are using Time::HiRes, and simply ``seconds'' otherwise. See the Config.pm section.
Sample Log Parser
Here is a code skeleton for parsing the various events in a log. You can also find this in eg/parselog.pl in the Mason distribution.
open(LOG,"mason.log") or die "cannot open mason.log: $!";
while (<LOG>) {
    chomp;
    my (@fields) = split("\cA");
    my ($time,$event,$pid) = splice(@fields,0,3);
    if ($event eq 'REQ_START') {
        my ($reqnum,$url) = @fields;
        ...
    } elsif ($event eq 'REQ_END') {
        my ($reqnum,$errflag) = @fields;
        ...
    } elsif ($event eq 'CACHE_READ') {
        my ($comp,$key,$hitflag) = @fields;
        ...
    } elsif ($event eq 'CACHE_STORE') {
        my ($comp,$key) = @fields;
        ...
    } elsif ($event eq 'COMP_LOAD') {
        my ($comp) = @fields;
        ...
    } else {
        warn "unrecognized event type: $event\n";
    }
}
close(LOG);
Suggested Uses
Performance: REQUEST events are useful for analyzing the performance of all Mason requests occurring on your site, and identifying the slowest requests. eg/perflog.pl in the Mason distribution is a log parser that outputs the average compute time of each unique URL, in order from slowest to quickest.
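The idea behind this kind of report can be sketched in a few lines. The code below does not reproduce eg/perflog.pl; average_times is a hypothetical helper that pairs REQ_START and REQ_END lines by process id and request number, assuming the default ctrl-A separator and the field order described earlier.

```perl
use strict;
use warnings;

# Given log lines, return a hash ref mapping each URL to the average
# time between its REQ_START and REQ_END events. (Hypothetical helper.)
sub average_times {
    my (@lines) = @_;
    my (%start, %total, %count);
    foreach my $line (@lines) {
        chomp(my $copy = $line);
        my ($time, $event, $pid, $reqnum, $extra) = split /\cA/, $copy;
        next unless defined $reqnum;
        my $id = "$pid/$reqnum";    # request numbers are per process
        if ($event eq 'REQ_START') {
            $start{$id} = { time => $time, url => $extra };
        }
        elsif ($event eq 'REQ_END') {
            my $s = delete $start{$id} or next;
            $total{ $s->{url} } += $time - $s->{time};
            $count{ $s->{url} }++;
        }
    }
    my %avg = map { $_ => $total{$_} / $count{$_} } keys %total;
    return \%avg;
}

# Usage: print URLs from slowest to quickest.
sub print_report {
    my ($avg) = @_;
    foreach my $url (sort { $avg->{$b} <=> $avg->{$a} } keys %$avg) {
        printf "%8.3f  %s\n", $avg->{$url}, $url;
    }
}
```

Without Time::HiRes the timestamps are whole seconds, so averages over many requests are more meaningful than any single measurement.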
Server activity: REQUEST events are useful for determining what your web server children are working on, especially when you have a runaway. For a given process, simply tail the log and find the last REQ_START event with that process id. (You can also use the Apache status page for this.)
Cache efficiency: CACHE events are useful for monitoring cache ``hit rates'' (number of successful reads over total number of reads) over all components that use a data cache. Because stores to a cache are more expensive than reads, a high hit rate is essential for the cache to have a beneficial effect. If a particular cache hit rate is too low, you may want to consider changing how frequently it is expired or whether to use it at all.
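A hit rate per component can be computed directly from the CACHE_READ lines. The following is a minimal sketch, assuming the default ctrl-A separator and the CACHE_READ field order given in the event table; hit_rates is a hypothetical helper.

```perl
use strict;
use warnings;

# Given log lines, return a hash ref mapping each component path to
# its cache hit rate (successful reads / total reads).
# (Hypothetical helper.)
sub hit_rates {
    my (@lines) = @_;
    my (%reads, %hits);
    foreach my $line (@lines) {
        chomp(my $copy = $line);
        my ($time, $event, $pid, $comp, $key, $hit) = split /\cA/, $copy;
        next unless defined $event and $event eq 'CACHE_READ';
        $reads{$comp}++;
        $hits{$comp}++ if $hit;
    }
    my %rate = map { $_ => ($hits{$_} || 0) / $reads{$_} } keys %reads;
    return \%rate;
}
```

Components whose rate stays low over a long log are the ones to review first, either by lengthening their expiration or by dropping the cache altogether.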
Load frequency: COMP_LOAD events are useful for monitoring your code cache. Too many loads may indicate that your code cache is too small. Also, if you can turn off the code cache for a short time, COMP_LOAD events will tell you which components are loaded most often and thus good candidates for preloading.
This section explains Mason's various performance enhancements and how to administer them.
When Mason loads a component, it places it in a memory cache.
The maximum size of the cache is specified with the Interp/code_cache_max_size Interp parameter; default is 10MB. When the cache fills up, Mason frees up space by discarding a number of components. The discard algorithm is least frequently used (LFU), with a periodic decay to gradually eliminate old frequency information. In a nutshell, the components called most often in recent history should remain in the cache. Very large components (over 20% of the maximum cache size) never get cached, on the theory that they would force out too many other components.
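To make the policy concrete, here is a toy cache that follows the same three rules: reject entries over 20% of the maximum size, evict the least frequently used entries when space is needed, and periodically decay the usage counts. It is a sketch under our own assumptions, not Mason's implementation: the class name Sketch::CodeCache is hypothetical, sizes are supplied by the caller, ties are broken by path name, and decay simply halves every count.

```perl
use strict;
use warnings;

{
    package Sketch::CodeCache;    # hypothetical name, not part of Mason

    sub new {
        my ($class, %args) = @_;
        my $self = { max_size => $args{max_size}, size => 0, entries => {} };
        return bless $self, $class;
    }

    # Return the cached code for $path, counting the access.
    sub get {
        my ($self, $path) = @_;
        my $e = $self->{entries}{$path} or return;
        $e->{freq}++;
        return $e->{code};
    }

    # Store an entry; returns 0 if it is too large to be cached.
    sub put {
        my ($self, $path, $code, $size) = @_;
        return 0 if $size * 5 > $self->{max_size};    # over 20% of max
        $self->remove($path);
        my $entries = $self->{entries};
        # Evict least frequently used entries until the new one fits.
        while ($self->{size} + $size > $self->{max_size}) {
            my ($victim) = sort {
                $entries->{$a}{freq} <=> $entries->{$b}{freq} or $a cmp $b
            } keys %$entries;
            $self->remove($victim);
        }
        $entries->{$path} = { code => $code, size => $size, freq => 1 };
        $self->{size} += $size;
        return 1;
    }

    sub remove {
        my ($self, $path) = @_;
        my $e = delete $self->{entries}{$path} or return;
        $self->{size} -= $e->{size};
        return;
    }

    # Halve all counts so that old popularity fades over time.
    sub decay {
        my ($self) = @_;
        $_->{freq} /= 2 for values %{ $self->{entries} };
        return;
    }
}
```

The decay step is what keeps a component that was popular last week from permanently crowding out one that is popular today.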
Note that the ``size'' of a component in memory cannot literally be measured. It is estimated by the length of the source text plus some overhead. Your process growth will not match the code cache size exactly.
You can monitor the performance of the memory cache by turning on system logs and counting the COMP_LOAD events. If these are occurring frequently even for a long-running process, you may want to increase the size of your code cache.
You can prepopulate the cache with components that you know will be accessed often; see Preloading. Note that preloaded components possess no special status in the cache and can be discarded like any others.
Naturally, a cache entry is invalidated if the corresponding component source file changes.
To turn off code caching completely, set Interp/code_cache_max_size to 0.
The in-memory code cache is only useful on a per-process basis. Each process must build and maintain its own cache. Shared memory caches are conceivable in the future, but even those will not survive between web server restarts.
As a secondary, longer-term cache mechanism, Mason stores a compiled form of each component in an object file under data_dir/obj/component-path. Any server process can eval the object file and save time on parsing the component source file. The object file is recreated whenever the source file changes.
Besides improving performance, object files are essential for debugging and interpretation of errors. Line numbers in error messages are given in terms of the object file. The curious-minded can peek inside an object file to see exactly how Mason converted a given component to a Perl object.
If you change any Parser options, you must remove object files previously created under that parser for the changes to take effect.
If for some reason you don't want Mason to create object files, set the Interp/use_object_files Interp parameter to 0.
You can tell Mason to preload a set of components in the parent process, rather than loading them on demand, using the Interp/preloads Interp parameter. Each child server will start with those components loaded in the memory cache. The trade-offs are:
Try to preload components that are used frequently and do not change often. (If a preloaded component changes, all the children will have to reload it from scratch.)
Normally, every time Mason executes a component, it checks the last modified time of its source file to see if it needs to be reloaded. These file checks are convenient for development, but for a production site they degrade performance unnecessarily.
To remedy this, Mason has an accelerated mode that changes its behavior in two ways:
1. Does not check component source files at all, relying solely on object files. This means the developer or an automated system is responsible for recompiling any components that change and recreating object files, using the Parser/make_component Parser method.
2. Rather than continuously checking whether object files have changed, Mason monitors a ``reload file'' containing an ever-growing list of components that have changed. Whenever a component changes, the developer or an automated system is responsible for appending the component path to the reload file. The reload file is kept in data_dir/etc/reload.lst.
You can activate this mode with the Interp/use_reload_file Interp parameter.
The advantage of using this mode is that Mason stats one file per request instead of ten or twenty. The disadvantage is an increase in maintenance costs as the object and reload files have to be kept up-to-date. Automated editorial tools, and cron jobs that periodically scan the component hierarchy for changes, are two possible solutions. The Mason content management system automatically handles this task.
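Both sides of the reload file can be sketched with ordinary file operations. This illustrates the idea only: note_changed and read_new_entries are hypothetical helpers, the one-path-per-line format is our assumption about the file's layout, and the way the reader remembers its position (a byte offset) is likewise an assumption rather than a description of Mason's internals.

```perl
use strict;
use warnings;

# Writer side: append a changed component path to the reload file.
sub note_changed {
    my ($reload_file, $comp_path) = @_;
    open(my $fh, '>>', $reload_file)
        or die "cannot append to $reload_file: $!";
    print {$fh} "$comp_path\n";
    close $fh or die "cannot close $reload_file: $!";
    return;
}

# Reader side: return the paths appended since the byte offset held
# in $$pos_ref, and advance the offset. A single size check is enough
# to tell that nothing has changed.
sub read_new_entries {
    my ($reload_file, $pos_ref) = @_;
    my $size = (-s $reload_file) || 0;
    return () unless $size > $$pos_ref;
    open(my $fh, '<', $reload_file)
        or die "cannot read $reload_file: $!";
    seek($fh, $$pos_ref, 0);
    chomp(my @paths = <$fh>);
    $$pos_ref = tell($fh);
    close $fh;
    return @paths;
}
```

A publishing script would call note_changed after recreating each object file, while each server process would keep its own offset and call read_new_entries once per request.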
Site builders often maintain two versions of their sites: the production (published) version visible to the world, and the development (staging) version visible internally. Developers try out changes on the staging site and push the pages to production once they are satisfied.
The priorities for the staging site are rapid development and easy debugging, while the main priority for the production site is performance. This section describes various ways to adapt Mason for each case.
Mason can spew data in two modes. In ``batch'' mode Mason computes the entire page in a memory buffer and then transmits it all at once. In ``stream'' mode Mason outputs data as soon as it is computed. (This does not take into account buffering done by Apache or the O/S.) The default mode is ``batch''.
Batch mode has the advantage of better error handling. Suppose an error occurs in the middle of a page. In stream mode, the error message interrupts existing output, often appearing in an awkward HTML context such as the middle of a table which never gets closed. In batch mode, the error message is output neatly and alone.
Batch mode also offers more flexibility in controlling HTTP headers (see Devel/sending_http_headers) and in handling mid-request error conditions (see Request/clear_buffer).
Stream mode may help get data to the browser more quickly, allowing server and browser to work in parallel. It also prevents memory buildup for very large responses.
Since Apache does its own buffering, stream mode does not entail immediate delivery of output to the client. You must set $|=1 to turn off Apache buffering completely (generally not a good idea) or call $m->flush_buffer to flush the buffer selectively.
In terms of making your server seem responsive, the initial bytes are most important. You can send these early by calling $m->flush_buffer in key locations such as the common page header. However, this dilutes the advantages of batch mode mentioned above. Tradeoffs...
You control output mode by setting interp->out_mode to ``batch'' or ``stream''.
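When an error occurs, Mason can respond by:
1. Showing the error message in the browser.
2. Dying, so that the error message goes to the server's error log and the browser receives a generic server error instead.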
The first option is ideal for development, where you want immediate feedback on the error. The second option is usually desired for production so that users are not exposed to messy error messages. You control this option by setting ah->error_mode to ``html'' or ``fatal'' respectively.
As discussed in the debugging section, you can control when Mason creates a debug file. While creating a debug file is not especially expensive, it does involve some work and the creation of a new file, so you probably want to avoid doing it on every request to a frequently visited site. We recommend setting debug_mode to 'all' in development, and 'error' or 'none' in production.
Consider reload files only for frequently visited production sites.
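The recommendations in this section might be combined in a handler.pl along the following lines. Treat this as a sketch under assumptions rather than a reference configuration: the MASON_ENV environment variable is a hypothetical switch, the directories are the ones from the simple configuration earlier in this guide, and it assumes that out_mode, use_reload_file, error_mode and debug_mode can all be given as constructor parameters. Check the Interp and ApacheHandler documentation for your version.

```perl
use strict;
use HTML::Mason;
use HTML::Mason::ApacheHandler;

# Hypothetical switch: set MASON_ENV=production in the server environment
# of the production site; anything else is treated as staging.
my $production = ($ENV{MASON_ENV} || '') eq 'production';

my $interp = HTML::Mason::Interp->new(
    comp_root       => '/usr/local/www/htdocs',
    data_dir        => '/usr/local/www/mason',
    out_mode        => 'batch',               # the default; cleaner error pages
    use_reload_file => $production ? 1 : 0,   # skip source file checks in production
);

my $ah = HTML::Mason::ApacheHandler->new(
    interp     => $interp,
    error_mode => $production ? 'fatal' : 'html',
    debug_mode => $production ? 'error' : 'all',
);

sub handler {
    my ($r) = @_;
    $ah->handle_request($r);
}
```

Keeping the switch in one place means the staging and production servers can share the same handler.pl and differ only in their environment.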
These examples extend the single site configurations given so far.
If you want to share some components between your sites, arrange your httpd.conf so that all DocumentRoots live under a single component space:
    # Web site #1
    <VirtualHost www.site1.com>
      DocumentRoot /usr/local/www/htdocs/site1
      <Location />
        SetHandler perl-script
        PerlHandler HTML::Mason::ApacheHandler
      </Location>
    </VirtualHost>

    # Web site #2
    <VirtualHost www.site2.com>
      DocumentRoot /usr/local/www/htdocs/site2
      <Location />
        SetHandler perl-script
        PerlHandler HTML::Mason::ApacheHandler
      </Location>
    </VirtualHost>

    # Mason configuration
    PerlSetVar MasonCompRoot "/usr/local/www/htdocs"
    PerlSetVar MasonDataDir "/usr/local/mason"
    PerlModule HTML::Mason::ApacheHandler
The directory structure for this scenario might look like:
    /usr/local/www/htdocs/    # component root
      +- shared/              # shared components
      +- site1/               # DocumentRoot for first site
      +- site2/               # DocumentRoot for second site
Incoming URLs for each site can only request components in their respective DocumentRoots, while components internally can call other components anywhere in the component space. The shared/ directory is a private directory for use by components, inaccessible from the Web.
Sometimes your sites need to have completely distinct component hierarchies, e.g. if you are providing Mason ISP services for multiple users. In this case the component root must change depending on the site requested. Since you can't change an interpreter's component root dynamically, you need to maintain separate Mason objects for each site in your handler.pl:
    my (%interp,%ah);
    foreach my $site (qw(site1 site2 site3)) {
        $interp{$site} = new HTML::Mason::Interp
            (comp_root => "/usr/local/www/$site",
             data_dir  => "/usr/local/mason/$site");
        $ah{$site} = new HTML::Mason::ApacheHandler (interp => $interp{$site});
    }

    ...

    sub handler {
        my ($r) = @_;
        my $site = $r->dir_config('site');
        $ah{$site}->handle_request($r);
    }
We assume each virtual server configuration section has a line of the form:

    PerlSetVar site <site_name>
Above we pre-create all Mason objects in the parent. Another scheme is to create objects on demand in the child:
    my (%interp,%ah);

    ...

    sub handler {
        my ($r) = @_;
        my $site = $r->dir_config('site');
        # create the Mason objects for this site on first use
        unless (exists $interp{$site}) {
            # get comp_root from PerlSetVar as well
            my $comp_root = $r->dir_config('comp_root');
            $interp{$site} = new HTML::Mason::Interp(comp_root=>$comp_root,...);
            $ah{$site} = new HTML::Mason::ApacheHandler(interp=>$interp{$site},...);
        }
        $ah{$site}->handle_request($r);
    }
The advantage of the second scheme is that you don't have to hardcode as much information in the handler.pl. The disadvantage is a slight memory and performance impact. On development servers this shouldn't matter; on production servers you may wish to profile the two schemes.
Jonathan Swartz, swartz@pobox.com and Dave Rolsky, autarch@urth.org
HTML::Mason, HTML::Mason::Parser, HTML::Mason::Interp, HTML::Mason::ApacheHandler