LWP::Parallel::UserAgent - A class for parallel User Agents
    require LWP::Parallel::UserAgent;
    $ua = LWP::Parallel::UserAgent->new();
    ...
    $ua->redirect (0); # prevents automatic following of redirects
    $ua->max_hosts(5); # sets maximum number of locations accessed in parallel
    $ua->max_req  (5); # sets maximum number of parallel requests per host
    ...
    $ua->register ($request); # or
    $ua->register ($request, '/tmp/sss'); # or
    $ua->register ($request, \&callback, 4096);
    ...
    $ua->wait ( $timeout );
    ...
    sub callback { my($data, $response, $protocol) = @_; .... }
This class implements a user agent that accesses web resources in parallel.
Using an LWP::Parallel::UserAgent as your user agent, you typically start by registering your requests, along with how you want the agent to process the incoming results (see $ua->register).
Then you wait for the results by calling $ua->wait. This method returns only when all requests have returned an answer or the agent has timed out. Individual callback functions can also indicate that the agent should stop waiting for requests and return (see $ua->register).
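The register-then-wait workflow described above can be sketched as follows. This is a minimal sketch, not the module's reference usage: the helper name fetch_all is hypothetical, and the way results are read back (wait returning a hash reference of entry objects that each offer a response method) is assumed from the examples in the LWP::Parallel manpage rather than stated in this section.

```perl
use strict;
use warnings;
use HTTP::Request;
use LWP::Parallel::UserAgent;

# Fetch all URLs in parallel; returns a hash reference of URL => status text.
# fetch_all is a hypothetical helper name used only for this sketch.
sub fetch_all {
    my ($timeout, @urls) = @_;
    my $pua = LWP::Parallel::UserAgent->new();
    $pua->redirect(0);     # do not follow redirects automatically
    $pua->max_hosts(5);    # at most 5 locations in parallel
    $pua->max_req(5);      # at most 5 parallel requests per host

    my %status;
    for my $url (@urls) {
        my $request = HTTP::Request->new(GET => $url);
        # register() returns an object only if registration failed
        if (my $error = $pua->register($request)) {
            $status{$url} = 'not registered: ' . $error->message;
        }
    }

    # Assumption: wait() returns a hash reference of entry objects,
    # each with a response() accessor.
    my $entries = $pua->wait($timeout);
    for my $entry (values %$entries) {
        my $response = $entry->response;
        $status{ $response->request->url } =
            $response->code . ' ' . $response->message;
    }
    return \%status;
}

# Only touches the network when URLs are given on the command line.
if (@ARGV) {
    my $status = fetch_all(10, @ARGV);
    print "$_\t$status->{$_}\n" for sort keys %$status;
}
```

Run it with one or more URLs as arguments; without arguments the script only defines the helper.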
See the LWP::Parallel manpage for a set of simple examples.
LWP::Parallel::UserAgent is a subclass of LWP::UserAgent, but not all of its methods are available here. However, you can use its main methods, $ua->simple_request and $ua->request, to simulate singular access with this package. Of course, if a single request is all you need, you should probably use LWP::UserAgent in the first place, since it will be faster than the emulation here.
For parallel access, you will need to use the new methods that come with LWP::Parallel::UserAgent, called $pua->register and $pua->wait. See below for more information on each method.
Optionally, you can give it an existing LWP::Parallel::UserAgent (or even an LWP::UserAgent) as a first argument, and it will "clone" a new one from it. (This just copies the behavior of LWP::UserAgent. I have never actually tried this, so let me know if it does not do what you want.)
However, if you want to re-use the same UserAgent object for a number of "runs", you should call $ua->initialize after you have processed the results of the previous call to $ua->wait, but before registering any new requests.
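Re-using one agent for several runs might look like the sketch below. The helper name run_batches and the batch layout are hypothetical, and reading results through the entries returned by wait is again an assumption taken from the LWP::Parallel manpage examples.

```perl
use strict;
use warnings;
use HTTP::Request;
use LWP::Parallel::UserAgent;

# Run several batches of URLs through one agent, one batch per "run".
# Returns a list of "URL code" strings collected across all runs.
sub run_batches {
    my ($timeout, @batches) = @_;
    my $pua = LWP::Parallel::UserAgent->new();
    my @results;
    for my $batch (@batches) {
        $pua->register(HTTP::Request->new(GET => $_)) for @$batch;
        # Assumption: wait() returns a hash reference of entry objects.
        my $entries = $pua->wait($timeout);
        for my $entry (values %$entries) {
            my $response = $entry->response;
            push @results, $response->request->url . ' ' . $response->code;
        }
        # Results are processed, so reset before registering the next batch.
        $pua->initialize;
    }
    return @results;
}

# Hypothetical usage: each command-line argument is a comma-separated batch.
if (@ARGV) {
    print "$_\n" for run_batches(10, map { [ split /,/ ] } @ARGV);
}
```

The call to initialize sits after the results of the run have been read and before the next batch is registered, which is the ordering the paragraph above asks for.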
See $ua->register for how to change the behaviour for particular requests only.
Note: Although it says 'host', it really means 'netloc/server'! That is, multiple servers on the same host (e.g. one server running on port 80, the other one on port 6060) will count as two 'hosts'.
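To make the netloc rule concrete, the following sketch counts distinct host:port pairs in a list of URLs using the URI module. It only illustrates the counting rule from the note above; it is not taken from the module's internals, and count_netlocs and the example URLs are hypothetical.

```perl
use strict;
use warnings;
use URI;

# Count distinct netlocs (host:port pairs) among a list of URLs.
# Illustration of the rule only, not the module's own bookkeeping.
sub count_netlocs {
    my @urls = @_;
    my %seen;
    for my $url (@urls) {
        # host_port() appends the scheme's default port when none is given
        my $netloc = URI->new($url)->host_port;
        $seen{$netloc}++;
    }
    return scalar keys %seen;
}

my @urls = (
    'http://www.example.com/index.html',         # default port 80
    'http://www.example.com:6060/index.html',    # same host, other server
    'http://www.example.com/about.html',         # same netloc as the first
);
print count_netlocs(@urls), " netlocs\n";
```

The three URLs share one host name but yield two netlocs, so they would count as two 'hosts' under the rule above.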
HTTP::Request object containing the HTML-Error message is returned. Otherwise (that is, in case of a success) it will return undef.
The $request should be a reference to an HTTP::Request object with values defined for at least the method() and url() attributes.
$size specifies the number of bytes Parallel::UserAgent should try to read each time some new data arrives. Setting it to '0' or 'undef' will make Parallel::UserAgent use the default (8k).
Specifying $redirect_ok will alter the redirection behaviour for this particular request only. '1' or any other true value will force Parallel::UserAgent to follow redirects, even if the default is set to 'no_redirect' (see $ua->redirect). '0' or any other false value should do the reverse. See LWP::UserAgent for using an object's requests_redirectable list for fine-tuning this behavior.
If $arg is a scalar it is taken as a filename where the content of the response is stored.
If $arg is a reference to a subroutine, then this routine is called as chunks of the content are received. An optional $size argument is taken as a hint for an appropriate chunk size. The callback function is called with 3 arguments: the data received this time, a reference to the response object and a reference to the protocol object. The callback can use the predefined constants C_ENDCON, C_LASTCON and C_ENDALL as a return value in order to influence pending and active connections. C_ENDCON will end this connection immediately, whereas C_LASTCON will indicate that no further connections should be made. C_ENDALL will immediately end all requests and let the Parallel::UserAgent return from $pua->wait().
If $arg is omitted, then the content is stored in the response object itself.
If $arg is an LWP::Parallel::UserAgent::Entry object, then this request will be registered as a follow-up request to this particular entry. This will not create a new entry, but instead link the current response (i.e. the reason for re-registering) as $response->previous to the new response of this request. All other fields are either re-initialized ($request, $fullpath, $proxy) or left untouched ($arg, $size). (This should only be used internally.)
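A callback that uses one of these constants might look like the sketch below. Several details are assumptions rather than facts from this section: that the constants are imported through the :CALLBACK export tag (as in the LWP::Parallel manpage examples), that returning nothing leaves the connection untouched, and the hypothetical names limited_reader, $MAX_BYTES and %bytes_seen.

```perl
use strict;
use warnings;
use HTTP::Request;
use LWP::Parallel::UserAgent qw(:CALLBACK);   # assumed to export C_ENDCON etc.

my $MAX_BYTES = 64 * 1024;   # hypothetical per-response limit
my %bytes_seen;              # bytes received so far, keyed by URL

# A plain function, not a method: there is no $self argument.
sub limited_reader {
    my ($data, $response, $protocol) = @_;
    my $url = $response->request->url;
    $response->add_content($data);    # keep the chunk in the response
    $bytes_seen{$url} += length $data;
    # End this connection once the limit has been reached ...
    return C_ENDCON if $bytes_seen{$url} >= $MAX_BYTES;
    # ... otherwise return nothing (assumed to mean "no special instruction").
    return;
}

# Only touches the network when URLs are given on the command line.
if (@ARGV) {
    my $pua = LWP::Parallel::UserAgent->new();
    for my $url (@ARGV) {
        # 4096 is the chunk-size hint passed as $size
        $pua->register(HTTP::Request->new(GET => $url), \&limited_reader, 4096);
    }
    $pua->wait(10);
    print "$_: $bytes_seen{$_} bytes\n" for sort keys %bytes_seen;
}
```

Returning C_LASTCON or C_ENDALL instead would follow the same pattern; only the effect on the other connections differs, as described above.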
LWP::Parallel::UserAgent->request also allows the registration of follow-up requests to existing requests that required redirection or authentication. In order to do this, a Parallel::UserAgent::Entry object will be passed as the second argument to the call. Usually, this should not be used directly, but left to the internal $ua->handle_response method!
Please note that while $pua->on_return is a method (which should be overridden in a subclass), a callback function is NOT a method, and does not have $self as its first parameter. (See more on callbacks below)
The purpose of $pua->on_return is mainly to provide messages when a request returns. However, you can also re-register follow-up requests in case you need them.
If you need specialized follow-up requests depending on the request that just returned, use a callback function instead (which can be different for each request registered). Otherwise you might end up writing a HUGE if..elsif..else.. branch in this global method.
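Overriding on_return in a subclass could be sketched as follows. The subclass name ReportingUA and its @log array are hypothetical, and the argument list after $self ($request, $response, $entry) is assumed from the LWP::Parallel manpage examples rather than documented in this section.

```perl
use strict;
use warnings;

package ReportingUA;    # hypothetical subclass name
use parent 'LWP::Parallel::UserAgent';

our @log;               # one message per returned request

# A method, so $self comes first. The remaining arguments are an
# assumption based on the LWP::Parallel manpage examples.
sub on_return {
    my ($self, $request, $response, $entry) = @_;
    push @log, sprintf '%s returned %s %s',
        $request->url, $response->code, $response->message;
    return;
}

package main;
use HTTP::Request;

# Only touches the network when URLs are given on the command line.
if (@ARGV) {
    my $pua = ReportingUA->new();
    $pua->register(HTTP::Request->new(GET => $_)) for @ARGV;
    $pua->wait(10);
    print "$_\n" for @ReportingUA::log;
}
```

Because on_return is shared by every request the agent handles, it suits uniform reporting like this; per-request logic belongs in a callback passed to register.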
on_return or on_failure if you want to make sure an entry that you do not need does not occupy valuable main memory.
This method should not be called directly. Instead, indicate for each individual request registered with $ua->register() whether or not you want Parallel::UserAgent to handle redirects and security, or specify a default value for all requests in Parallel::UserAgent by using $ua->redirect().
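Setting an agent-wide default and overriding it for a single request might look like this sketch. The position of $redirect_ok as the fourth argument of register (after $arg and $size) is an assumption, and register_following is a hypothetical helper name.

```perl
use strict;
use warnings;
use HTTP::Request;
use LWP::Parallel::UserAgent;

my $pua = LWP::Parallel::UserAgent->new();
$pua->redirect(0);    # agent-wide default: do not follow redirects

# Register one request that should follow redirects despite the default.
# $arg and $size are left undefined, so the content stays in the response
# object and the default chunk size is used. Passing $redirect_ok as the
# fourth argument is an assumption about register()'s argument order.
sub register_following {
    my ($agent, $url) = @_;
    return $agent->register(HTTP::Request->new(GET => $url), undef, undef, 1);
}

# Only touches the network when URLs are given on the command line:
# the first URL follows redirects, the others use the agent default.
if (@ARGV) {
    my ($follow_url, @other_urls) = @ARGV;
    register_following($pua, $follow_url);
    $pua->register(HTTP::Request->new(GET => $_)) for @other_urls;
    $pua->wait(10);
}
```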
$ua->simple_request dispatches a single WWW request on behalf of a user, and returns the response received. The $request should be a reference to an HTTP::Request object with values defined for at least the method() and url() attributes.
If $arg is a scalar it is taken as a filename where the content of the response is stored.
If $arg is a reference to a subroutine, then this routine is called as chunks of the content are received. An optional $size argument is taken as a hint for an appropriate chunk size.
If $arg is omitted, then the content is stored in the response object itself.
Process a request, including redirects and security. This method may actually send several different simple requests.
The arguments are the same as for simple_request().
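Singular access through request could be sketched as below. The helper name fetch_one is hypothetical; for a single request like this, plain LWP::UserAgent is usually the better choice.

```perl
use strict;
use warnings;
use HTTP::Request;
use LWP::Parallel::UserAgent;

# Fetch a single URL and return the response object.
# With a filename the body is written to that file; without one it is
# kept in the response object itself.
sub fetch_one {
    my ($url, $file) = @_;
    my $pua     = LWP::Parallel::UserAgent->new();
    my $request = HTTP::Request->new(GET => $url);
    return defined $file
        ? $pua->request($request, $file)
        : $pua->request($request);
}

# Only touches the network when a URL is given on the command line.
if (@ARGV) {
    my $response = fetch_one(@ARGV);
    print $response->code, ' ', $response->message, "\n";
}
```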
use_alarm([$boolean])
You can register a callback function. See LWP::UserAgent for details.
Probably lots! This was meant only as an interim release until this functionality is incorporated into LWPng, the next generation libwww module (though it has been this way for over 2 years now!)
Needs a lot more documentation on how callbacks work!
Copyright 1997-2004 Marc Langheinrich <marclang@cpan.org>
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.