Open-source project: openwpm/OpenWPM
Repository: https://github.com/openwpm/OpenWPM
Primary language: Python 67.4%
Project description:

OpenWPM

OpenWPM is a web privacy measurement framework which makes it easy to collect data for privacy studies on a scale of thousands to millions of websites. OpenWPM is built on top of Firefox, with automation provided by Selenium. It includes several hooks for data collection. Check out the instrumentation section below for more details.

Installation

OpenWPM is tested on Ubuntu 18.04 via TravisCI and is commonly used via the Docker container that this repo builds, which is also based on Ubuntu. Although we don't officially support other platforms, conda is a cross-platform utility, and the install script can be expected to work on OSX and other Linux distributions. OpenWPM does not support Windows: #503

Pre-requisites

The main pre-requisite for OpenWPM is conda, a cross-platform package management tool. Conda is open-source, and can be installed from https://docs.conda.io/en/latest/miniconda.html.

Install

An installation script is provided. All installation is confined to your conda environment and should not affect your machine. The installation script will, however, override any existing conda environment named openwpm. To run the install script, run
After running the install script, activate your conda environment by running:

Mac OSX

You may need to install

We do not run CI tests for Mac, so new issues may arise. We welcome PRs to fix these issues and add full CI testing for Mac.

Running Firefox with xvfb on OSX is untested and will require the user to install an X11 server. We suggest XQuartz. This setup has not been tested; we welcome feedback on whether it works.

Quick Start

Once installed, it is very easy to run a quick test of OpenWPM. Check out
More information on the instrumentation and configuration parameters is given below. The docs provide a more in-depth tutorial and a description of the available methods of data collection.

Troubleshooting

This error indicates that Firefox exited during startup (or was prevented from starting). There are many possible causes of this error:

Documentation

Further information is available at OpenWPM's Documentation Page.

Advice for Measurement Researchers

OpenWPM is often used for web measurement research. We recommend the following for researchers using the tool:

Use a versioned release. We aim to follow Firefox's release cadence, which is roughly once every four weeks. If we happen to fall behind on checking in new releases, please file an issue. Versions more than a few months out of date will use unsupported versions of Firefox, which are likely to have known security vulnerabilities. Versions older than v0.10.0 are from a previous architecture and should not be used.

Include the OpenWPM version number in your publication. As of v0.10.0, OpenWPM pins all Python, npm, and system dependencies. Including this information alongside your work will allow other researchers to contextualize the results, and can be helpful if future versions of OpenWPM have instrumentation bugs that impact results.

Developer instructions

If you want to contribute to OpenWPM, have a look at our CONTRIBUTING.md.

Instrumentation and Configuration

OpenWPM provides a breadth of configuration options, which can be found in Configuration.md. More detail on the output is available below.

Storage

OpenWPM distinguishes between two types of data, structured and unstructured. Structured data is all data captured by the instrumentation or emitted by the platform. Generally speaking, all data you download is unstructured data.

For each of the data classes we offer a variety of storage providers, and you are encouraged to implement your own should the provided backends not be enough for you.

We have an outstanding issue to enable saving content generated by commands, such as screenshots and page dumps, to unstructured storage (see #232).

Local Storage

For storing structured data locally we offer two StorageProviders:
For storing unstructured data locally we also offer two solutions:
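
As a rough illustration of the structured/unstructured distinction in this section, the sketch below keeps record-like data in a SQLite table and raw downloaded content as files on disk. It is not OpenWPM's StorageProvider API: the class names `SketchStructuredStore` and `SketchUnstructuredStore`, their methods, the `records` table, and the file names are all hypothetical and chosen only for this example.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


class SketchStructuredStore:
    """Hypothetical structured store: rows of JSON-encoded records in SQLite."""

    def __init__(self, db_path: Path) -> None:
        self.conn = sqlite3.connect(str(db_path))
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS records (table_name TEXT, payload TEXT)"
        )

    def store_record(self, table_name: str, record: dict) -> None:
        # Structured data: a record with named fields, tagged with a logical table.
        self.conn.execute(
            "INSERT INTO records VALUES (?, ?)",
            (table_name, json.dumps(record, sort_keys=True)),
        )
        self.conn.commit()

    def count(self, table_name: str) -> int:
        row = self.conn.execute(
            "SELECT COUNT(*) FROM records WHERE table_name = ?", (table_name,)
        ).fetchone()
        return row[0]


class SketchUnstructuredStore:
    """Hypothetical unstructured store: opaque bytes written to files."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def store_blob(self, filename: str, blob: bytes) -> Path:
        # Unstructured data: downloaded content saved as-is, with no schema.
        target = self.root / filename
        target.write_bytes(blob)
        return target


def demo() -> tuple:
    """Store one record and one blob in a temporary directory and read them back."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        structured = SketchStructuredStore(base / "crawl.sqlite")
        unstructured = SketchUnstructuredStore(base / "content")
        structured.store_record(
            "http_requests", {"url": "https://example.com/", "visit_id": 1}
        )
        blob_path = unstructured.store_blob("page.html", b"<html></html>")
        result = (structured.count("http_requests"), blob_path.read_bytes())
        structured.conn.close()
        return result
```

Keeping the two stores behind separate classes mirrors the idea that each data class can have its own backend, so one could be swapped (for example, for a remote service) without touching the other.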

Remote storage

When running in the cloud, saving records to disk is not a reasonable thing to do, so we offer remote StorageProviders for S3 (see #823) and GCP. Currently, all remote StorageProviders write to the respective object storage service (S3/GCS). The structured providers use the Parquet format.

NOTE: The Parquet and SQL schemas should be kept in sync except for output-specific columns.

Docker Deployment for OpenWPM

OpenWPM can be run in a Docker container. This is similar to running OpenWPM in a virtual machine, only with less overhead.

Building the Docker Container

Step 1: install Docker on your system. Most Linux distributions have Docker in their repositories. It can also be installed from docker.com. For Ubuntu you can use:
You can test the installation with:

Note: in order to run Docker without root privileges, add your user to the
Step 2: to build the image, run the following command from a terminal within the root OpenWPM directory:

    docker build -f Dockerfile -t openwpm .

After a few minutes, the container is ready to use.

Running Measurements from inside the Container

You can run the demo measurement from inside the container, as follows:

First of all, you need to give the container permissions on your local X-server. You can do this by running:

Then you can run the demo script using:

    mkdir -p docker-volume && docker run -v $PWD/docker-volume:/opt/OpenWPM/datadir \
    -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix --shm-size=2g \
    -it openwpm

Note: the

This command uses bind-mounts to share scripts and output between the container and host, as explained below (note the paths in the command assume it's being run from the root OpenWPM directory):
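
The host-to-container mapping behind each `-v` flag in the command above can be made explicit with a small Python sketch that assembles the same argument list. This is only an illustration: the function name `build_docker_run_argv` and its parameters are hypothetical, and the flags are copied from the command shown in this section rather than taken from any OpenWPM helper.

```python
import shlex
from pathlib import PurePosixPath


def build_docker_run_argv(repo_root: str, display: str, image: str = "openwpm") -> list:
    """Assemble the `docker run` arguments used in this section (hypothetical helper)."""
    # Host directory that receives crawl output; it is bind-mounted into the container.
    host_datadir = PurePosixPath(repo_root) / "docker-volume"
    return [
        "docker", "run",
        # -v HOST:CONTAINER shares a host path with the container.
        "-v", f"{host_datadir}:/opt/OpenWPM/datadir",
        # Pass the host's DISPLAY so the browser can reach the local X server.
        "-e", f"DISPLAY={display}",
        # Share the X11 socket directory with the container.
        "-v", "/tmp/.X11-unix:/tmp/.X11-unix",
        "--shm-size=2g",
        "-it", image,
    ]


def as_shell_command(argv: list) -> str:
    """Render the argument list as a single shell-quoted command line."""
    return shlex.join(argv)
```

For example, `as_shell_command(build_docker_run_argv("/home/user/OpenWPM", ":0"))` yields a command line equivalent to the one above, with `$PWD` and `$DISPLAY` already substituted. The sketch does not create the `docker-volume` directory; the `mkdir -p` step from the original command is still needed.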
Alternatively, it is possible to run jobs as the user openwpm in the container too, but this might cause problems with non-headless browsers. It is therefore only recommended for headless crawls.

MacOS GUI applications in Docker

Requirements: Install XQuartz by following these instructions.

Given properly installed prerequisites (including a reboot), the helper script
To open a bash session within the environment:
Or, run commands directly:

Citation

If you use OpenWPM in your research, please cite our CCS 2016 publication on the infrastructure. You can use the following BibTeX.
OpenWPM has been used in over 75 studies.

License

OpenWPM is licensed under GNU GPLv3. Additional code has been included from FourthParty and Privacy Badger, both of which are licensed GPLv3+.