WWW::Mechanize::Firefox - use Firefox as if it were WWW::Mechanize
SYNOPSIS
use WWW::Mechanize::Firefox;
my $mech = WWW::Mechanize::Firefox->new();
$mech->get('http://google.com');
$mech->eval_in_page('alert("Hello Firefox")');
my $png = $mech->content_as_png();
This module will let you automate Firefox through the
Mozrepl plugin. You need to have that plugin
installed in your Firefox.
The Mozrepl plugin that this module uses no longer works because key technologies
it depends on were retired from the Mozilla platform in November 2017.
Therefore this module cannot be used with Firefox versions greater than 54.
CONSTRUCTOR and CONFIGURATION
$mech->new( %args )
use WWW::Mechanize::Firefox;
my $mech = WWW::Mechanize::Firefox->new();
Creates a new instance and connects it to Firefox.
Note that Firefox must have the mozrepl
extension installed and enabled.
The following options are recognized:
tab - regex for the title of the tab to reuse. If no matching tab is
found, the constructor dies.
If you pass in the string current, the currently
active tab will be used instead.
If you pass in a MozRepl::RemoteObject instance, this will be used
as the new tab. This is convenient if you have an existing tab
in Firefox as object already, for example created through
Firefox::Application->addTab().
create - will create a new tab if no existing tab matching
the criteria given in tab can be found.
activate - make the tab the active tab
launch - name of the program to launch if we can't connect to it on
the first try.
frames - an array reference of ids of subframes to include when
searching for elements on a page.
If you want to always search through all frames, just pass 1. This
is the default.
To prevent searching through frames, pass
frames => 0
To whitelist frames to be searched, pass the list
of frame selectors:
frames => ['#content_frame']
autodie - whether web failures are converted into fatal Perl errors. See
the autodie accessor. True by default to make error checking easier.
To make errors non-fatal, pass
autodie => 0
in the constructor.
agent - the name of the User Agent to use. This overrides
how Firefox identifies itself.
bufsize - Net::Telnet buffer size, if the default of 1MB is not enough
events - the set of default Javascript events to listen for while
waiting for a reply. In fact, WWW::Mechanize::Firefox will almost always
wait until a 'DOMContentLoaded' or 'load' event. 'pagehide' events
will tell it for what frames to wait.
repl - a premade MozRepl::RemoteObject instance or a connection string
suitable for initializing one
use_queue - whether to use the command queueing of MozRepl::RemoteObject.
Default is 1.
js_JSON - whether to use the native JSON encoder of Firefox
js_JSON => 'native', # force using the native JSON encoder
The default is to autodetect whether a native JSON encoder is available and
whether the transport is UTF-8 safe.
pre_events - the events that are sent to an input field before its
value is changed. By default this is [focus].
post_events - the events that are sent to an input field after its
value is changed. By default this is [blur, change].
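As a rough illustration of how several of these options combine, the sketch below collects them in one place. The option names are the ones listed above; the title pattern, the frame selector and the helper names constructor_args and connect_mech are made up for this example. Connecting needs a running Firefox (version 54 or older) with mozrepl enabled, so the constructor call is kept in a sub of its own.

```perl
use strict;
use warnings;

# Illustrative option set: the names are documented above, the values are assumptions.
sub constructor_args {
    return (
        tab      => qr/^Example/,        # reuse a tab whose title starts with "Example"
        create   => 1,                   # ... or create a new tab if none matches
        activate => 1,                   # make the tab the active tab
        autodie  => 0,                   # do not turn web failures into fatal errors
        frames   => ['#content_frame'],  # whitelist this frame for element searches
    );
}

# Hypothetical wrapper: loads the module lazily and connects to Firefox.
sub connect_mech {
    require WWW::Mechanize::Firefox;
    return WWW::Mechanize::Firefox->new( constructor_args() );
}
```

Calling connect_mech() returns the $mech object used in the rest of this document.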
$mech->agent( $product_id );
$mech->agent('wonderbot/JS 1.0');
Set the product token that is used to identify the user agent on the network.
The agent value is sent as the "User-Agent" header in the requests. The default
is whatever Firefox uses.
To reset the user agent to the Firefox default, pass an empty string:
$mech->agent('');
$mech->autodie( [$state] )
$mech->autodie(0);
Accessor to get/set whether warnings become fatal.
$mech->events()
$mech->events( ['load'] );
Sets or gets the set of Javascript events that WWW::Mechanize::Firefox
will wait for after requesting a new page. Returns an array reference.
Changing the set of events will most likely make WWW::Mechanize::Firefox
stall while waiting for a response.
This method is special to WWW::Mechanize::Firefox.
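Because changing the event set can make the module stall, it may be safer to change it only for a single operation and restore it afterwards. A minimal sketch, assuming ->events() behaves as the getter/setter described above; with_events is a hypothetical helper, not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code with a temporary event set, then restore the old one.
sub with_events {
    my ($mech, $events, $code) = @_;
    my $saved = [ @{ $mech->events } ];   # copy the current array reference
    $mech->events($events);
    my @result;
    my $ok  = eval { @result = $code->(); 1 };
    my $err = $@;
    $mech->events($saved);                # restore even if $code died
    die $err unless $ok;
    return @result;
}
```

Usage might look like with_events( $mech, ['load'], sub { $mech->get($url) } ).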
$mech->on_event()
$mech->on_event(1); # prints every page load event
# or give it a callback
$mech->on_event(sub { my ($ev) = @_; warn "Page loaded with $ev->{name} event" });
Gets/sets the notification handler for the Javascript event
that finished a page load. Set it to 1 to output via warn,
or a code reference to call it with the event.
This method is special to WWW::Mechanize::Firefox.
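A callback can also collect events instead of printing them. The following sketch assumes the callback receives the event as its first argument and that the event exposes a name field, as in the example above; make_event_recorder is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a callback and the array it records event names into.
sub make_event_recorder {
    my @seen;
    my $callback = sub {
        my ($ev) = @_;
        push @seen, $ev->{name};
        return;
    };
    return ( $callback, \@seen );
}
```

After my ($cb, $seen) = make_event_recorder(); $mech->on_event($cb); the names of the events that finished each page load accumulate in @$seen.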
$mech->cookies()
my $cookie_jar = $mech->cookies();
Returns a HTTP::Cookies object that was initialized
from the live Firefox instance.
Note: ->set_cookie is not yet implemented,
nor is saving the cookie jar.
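Since the returned object is an HTTP::Cookies jar, its scan method can be used to inspect the cookies read-only. A small sketch; cookie_names_for is a hypothetical helper and the domain matching is deliberately simplistic.

```perl
use strict;
use warnings;

# Hypothetical helper: names of all cookies whose domain ends in $domain.
sub cookie_names_for {
    my ($jar, $domain) = @_;
    my @names;
    # HTTP::Cookies->scan passes version, key, value, path, domain, ... to the callback.
    $jar->scan(sub {
        my ($version, $key, $val, $path, $cookie_domain) = @_;
        push @names, $key if $cookie_domain =~ /\Q$domain\E\z/;
    });
    my @sorted = sort @names;
    return @sorted;
}
```

For example, print join ', ', cookie_names_for( $mech->cookies, 'example.com' );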
JAVASCRIPT METHODS
$mech->allow( %options )
Enables or disables browser features for the current tab.
The following options are recognized:
plugins - Whether to allow plugin execution.
javascript - Whether to allow Javascript execution.
metaredirects - Attribute stating if refresh based redirects can be allowed.
frames, subframes - Attribute stating if it should allow subframes (framesets/iframes) or not.
images - Attribute stating whether or not images should be loaded.
Options not listed remain unchanged.
Disable Javascript
$mech->allow( javascript => 0 );
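Several features can be switched in one call. The sketch below fetches a page with Javascript, images and plugins turned off and then turns them back on; it assumes all three were enabled beforehand, and fetch_text_only is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: load $url with active content disabled and return its text.
sub fetch_text_only {
    my ($mech, $url) = @_;
    $mech->allow( javascript => 0, images => 0, plugins => 0 );
    $mech->get($url);
    my $text = $mech->content( format => 'text' );
    # Assumption: the features were on before, so switch them back on.
    $mech->allow( javascript => 1, images => 1, plugins => 1 );
    return $text;
}
```

If ->get dies (autodie is on by default), the features stay disabled, so a caller that cares should wrap the call in eval.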
$mech->js_errors()
print $_->{message}
for $mech->js_errors();
An interface to the Javascript Error Console.
Returns the list of errors in the JEC.
Maybe this should be called js_messages or
js_console_messages instead.
$mech->eval_in_page( $str, @args )
Evaluates the given Javascript fragment in the
context of the web page.
Returns a pair of value and Javascript type.
This allows access to variables and functions declared
"globally" on the web page.
The returned result needs to be treated with
extreme care because
it might lead to Javascript execution in the context of
your application instead of the context of the webpage.
This should be evident for functions and complex data
structures like objects. When working with results from
untrusted sources, you can only safely use simple
types like string.
If you want to modify the environment the code is run under,
pass in a hash reference as the second parameter. All keys
will be inserted into the this object as well as
this.window. Also, complex data structures are only
supported if they contain no objects.
If you need finer control, you'll have to
write the Javascript yourself.
This method is special to WWW::Mechanize::Firefox.
Also, using this method opens a potential security risk as
the returned values can be objects and using these objects
can execute malicious code in the context of the Firefox application.
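One way to follow the advice about simple types is to reject anything that comes back as a reference. This is only a sketch: eval_simple is a hypothetical helper, and it assumes that Javascript objects and functions are returned to Perl as references while strings and numbers arrive as plain scalars.

```perl
use strict;
use warnings;

# Hypothetical helper: evaluate $js in the page and accept only plain scalar results.
sub eval_simple {
    my ($mech, $js, $env) = @_;
    # $env is the optional hash reference that modifies the environment.
    my ($value, $type) = $mech->eval_in_page( $js, $env ? ($env) : () );
    die "Refusing to use a result of Javascript type '$type'\n"
        if ref $value;
    return $value;
}
```

For example, my $title = eval_simple( $mech, 'document.title' );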
$mech->unsafe_page_property_access( ELEMENT )
Allows you unsafe access to properties of the current page. Using
such properties is an incredibly bad idea.
This is why the function dies. If you really want to use
this function, edit the source code.
UI METHODS
See also Firefox::Application for how to add more than one tab
and how to manipulate windows and tabs.
$mech->application()
my $ff = $mech->application();
Returns the Firefox::Application object for manipulating
more parts of the Firefox UI and application.
$mech->autoclose_tab
$mech->autoclose_tab( 0 ); # keep tab open after program end
Set whether to close the tab associated with the instance.
$mech->tab()
Gets the object that represents the Firefox tab used by WWW::Mechanize::Firefox.
This method is special to WWW::Mechanize::Firefox.
$mech->make_progress_listener( %callbacks )
my $eventlistener = $mech->make_progress_listener(
onStateChange => \&onStateChange,
);
Creates an unconnected nsIWebProgressListener interface
which calls the Perl subroutines you pass in.
Returns a handle. Once the handle gets released, all callbacks will
get stopped. Also, all Perl callbacks will get deregistered from the
Javascript bridge, so make sure not to use the same callback
in different progress listeners at the same time.
The sender may still call your callbacks.
$mech->progress_listener( $source, %callbacks )
my $eventlistener = $mech->progress_listener(
$browser,
onLocationChange => \&onLocationChange,
);
Sets up the callbacks for the nsIWebProgressListener interface
to be the Perl subroutines you pass in.
$source needs to support .addProgressListener and .removeProgressListener.
Returns a handle. Once the handle gets released, all callbacks will
get stopped. Also, all Perl callbacks will get deregistered from the
Javascript bridge, so make sure not to use the same callback
in different progress listeners at the same time.
A Content-Length header will be automatically calculated if
it is not given.
The following options are recognized:
headers - a hash of HTTP headers to send. If not given,
the content type will be generated automatically.
data - the raw data to send, if you've encoded it already.
$mech->add_header( $name => $value, ... )
$mech->add_header(
'X-WWW-Mechanize-Firefox' => "I'm using it",
Encoding => 'text/klingon',
);
This method sets up custom headers that will be sent with every HTTP(S)
request that Firefox makes.
Using multiple instances of WWW::Mechanize::Firefox objects with the same
application together with changed request headers will most likely have weird
effects. So don't do that.
Note that currently, we only support one value per header.
Some versions of Firefox don't work with the method that is used to set
the custom headers. Please see t/60-mech-custom-headers.t for the exact
versions where the implemented mechanism doesn't work. Roughly, this is
for versions 17 to 24 of Firefox.
$mech->delete_header( $name , $name2... )
$mech->delete_header( 'User-Agent' );
Removes HTTP headers from the agent's list of special headers. Note
that Firefox may still send a header with its default value.
$mech->reset_headers
$mech->reset_headers();
Removes all custom headers and makes Firefox send its defaults again.
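Because custom headers apply to every request Firefox makes, it can help to limit them to one block of code. A minimal sketch using only add_header and reset_headers; with_headers is a hypothetical helper, and since reset_headers removes all custom headers, any that were set earlier are lost as well.

```perl
use strict;
use warnings;

# Hypothetical helper: send extra headers only while $code runs.
sub with_headers {
    my ($mech, $headers, $code) = @_;
    $mech->add_header(%$headers);
    my @result;
    my $ok  = eval { @result = $code->(); 1 };
    my $err = $@;
    $mech->reset_headers;    # back to the Firefox defaults, even if $code died
    die $err unless $ok;
    return @result;
}
```

Usage might look like with_headers( $mech, { 'X-Debug' => '1' }, sub { $mech->get($url) } ), where X-Debug is an invented header name.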
$mech->synchronize( $event, $callback )
Wraps a synchronization semaphore around the callback
and waits until the event $event fires on the browser.
If you want to wait for one of multiple events to occur,
pass an array reference as the first parameter.
Usually, you want to use it like this:
my $l = $mech->xpath('//a[@onclick]', single => 1);
$mech->synchronize('DOMFrameContentLoaded', sub {
$mech->click( $l );
});
It is necessary to synchronize with the browser whenever
a click triggers an action that takes some time to complete and
fires an event on the browser object.
The DOMFrameContentLoaded event is fired by Firefox when
the whole DOM and all iframes have been loaded.
If your document doesn't have frames, use the DOMContentLoaded
event instead.
If you leave out $event, the value of ->events() will
be used instead.
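The array reference form can be wrapped in a small helper. This sketch waits for either of two events after a click; click_and_wait and its default event list are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: click the element matching $query and wait for one of $events.
sub click_and_wait {
    my ($mech, $query, $events) = @_;
    $events ||= [ 'DOMContentLoaded', 'load' ];    # assumed default for this sketch
    my $link = $mech->xpath( $query, single => 1 );
    return $mech->synchronize( $events, sub {
        $mech->click($link);
    });
}
```

For example, click_and_wait( $mech, '//a[@onclick]' );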
$mech->res() / $mech->response(%options)
my $response = $mech->response(headers => 0);
Returns the current response as a HTTP::Response object.
The headers option tells the module whether to fetch the headers
from Firefox or not. This is mainly an internal optimization hack.
$mech->success()
$mech->get('http://google.com');
print "Yay"
if $mech->success();
Returns a boolean telling whether the last request was successful.
If there hasn't been an operation yet, returns false.
This is a convenience function that wraps $mech->res->is_success.
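When autodie is switched off, the outcome of a request has to be checked by hand. A minimal sketch that combines ->success with the HTTP::Response object from ->res; describe_last_request is a hypothetical helper, and it assumes ->res may return nothing before the first request.

```perl
use strict;
use warnings;

# Hypothetical helper: summarize the last request as a short string.
sub describe_last_request {
    my ($mech) = @_;
    return 'ok' if $mech->success;
    my $res = $mech->res;
    # status_line is the HTTP::Response method returning e.g. "404 Not Found".
    return $res ? 'failed: ' . $res->status_line : 'no response yet';
}
```

After $mech->get($url), print describe_last_request($mech) reports either ok or the failing status line.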
$mech->status()
Returns the HTTP status code of the response.
This is a 3-digit number like 200 for OK, 404 for not found, and so on.
$mech->reload( [$bypass_cache] )
$mech->reload();
Reloads the current page. If $bypass_cache
is a true value, the browser is not allowed to
use a cached page. This is the difference between
pressing F5 (cached) and shift-F5 (uncached).
Returns the (new) response.
$mech->back( [$synchronize] )
$mech->back();
Goes one page back in the page history.
Returns the (new) response.
$mech->forward( [$synchronize] )
$mech->forward();
Goes one page forward in the page history.
Returns the (new) response.
$mech->uri()
print "We are at " . $mech->uri;
Returns the current document URI.
CONTENT METHODS
$mech->document()
Returns the DOM document object.
This is WWW::Mechanize::Firefox specific.
$mech->docshell()
my $ds = $mech->docshell;
Returns the docShell Javascript object associated with the tab.
This is WWW::Mechanize::Firefox specific.
$mech->content( %options )
print $mech->content;
print $mech->content( format => 'html' ); # default
print $mech->content( format => 'text' ); # identical to ->text
This always returns the content as a Unicode string. It tries
to decode the raw content according to its input encoding.
This currently only works for HTML pages, not for images etc.
Recognized options:
document - the document to use.
Default is $self->document.
format - the format of the content to return
The allowed values are html and text. The default is html.
$mech->text()
Returns the text of the current HTML content. If the content isn't
HTML, $mech will die.
$mech->content_encoding()
print "The content is encoded as ", $mech->content_encoding;
Returns the encoding that the content is in. This can be used
to convert the content from UTF-8 back to its native encoding.
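The conversion back to the native encoding can be sketched with the core Encode module. This assumes that ->content returns a decoded Unicode string, as stated above, and that the name returned by ->content_encoding is one Encode recognizes; content_as_native_bytes is a hypothetical helper.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: the page content as bytes in the page's own encoding.
sub content_as_native_bytes {
    my ($mech) = @_;
    my $encoding = $mech->content_encoding;
    # encode() croaks if $encoding is not a name known to Encode.
    return encode( $encoding, $mech->content );
}
```

The result is a byte string suitable for writing to a file opened in raw mode.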
$mech->update_html( $html )
$mech->update_html($html);
Writes $html into the current document. This is mostly
implemented as a convenience method for HTML::Display::MozRepl.