Provides Python access to Google's parser for robots.txt files, as used by their Googlebot web crawler.
Websites may provide an optional robots.txt file at their domain's root to govern the access and behavior of web crawlers. Googlebot, one of the most prominent crawlers, is largely responsible for promoting this standard, and sites interested in SEO closely conform to Googlebot's behavior.
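For illustration, a hypothetical robots.txt file might look like the snippet below (the paths and rules are invented for this example, not taken from any real site):

```
# Applies to every crawler.
User-agent: *
# Keep crawlers out of the private area...
Disallow: /private/
# ...except for one publicly shareable page (the most specific matching rule wins).
Allow: /private/status.html
```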
All credit for the parser goes to the Google team who created, open-sourced, and promoted it.
SEO (Search Engine Optimization): the process of modifying a website's content or metadata to boost its ranking in search engines' page indexes. Higher rankings lead to higher positions in users' search results, which in turn bring more visitors. For further details, see the Wikipedia page on SEO.
Basic usage of the RobotsMatcher class provided by Google:
```python
import jwm.robotstxt.googlebot

robotstxt = """
user-agent: GoodBot
allowed: /path
"""

matcher = jwm.robotstxt.googlebot.RobotsMatcher()
assert matcher.AllowedByRobots(robotstxt, ("GoodBot",), "/path")
```

Check out the documentation for further details. For more use cases, see the test cases for jwm.robotstxt and robotstxt.
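As an additional, hypothetical sketch (the user agent, rules, and paths here are invented for illustration and are not taken from the project's tests), the same matcher can also be used to check that a disallow rule is enforced:

```python
import jwm.robotstxt.googlebot

# Illustrative robots.txt body with a single disallow rule.
robotstxt = """
user-agent: BadBot
disallow: /private
"""

matcher = jwm.robotstxt.googlebot.RobotsMatcher()

# Paths under /private are blocked for BadBot...
assert not matcher.AllowedByRobots(robotstxt, ("BadBot",), "/private/data")
# ...while paths with no matching rule remain allowed by default.
assert matcher.AllowedByRobots(robotstxt, ("BadBot",), "/public")
```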
Install from PyPI under the jwm.robotstxt distribution.
```shell
pip install jwm.robotstxt
```

Import into your program through the jwm.robotstxt.googlebot package.
```python
import jwm.robotstxt.googlebot
```

It is highly recommended to install Python projects into a virtual environment; see PEP 405 for motivation.
Create a virtual environment in the .venv directory.
```shell
python3 -m venv ./.venv
```

Activate with the correct command for your system.
```shell
# Linux/MacOS
. ./.venv/bin/activate
```

```shell
# Windows
.\.venv\Scripts\activate
```

Make sure you have cloned the repository and its submodules.
```shell
git clone --recurse-submodules https://github.com/jwmorley73/jwm.robotstxt.git
```

Install the project using pip. This will build the required robotstxt static library files and link them into the produced Python package.
```shell
pip install .
```

If you want to include the developer tooling, add the dev optional dependencies.
```shell
pip install .[dev]
```

- 32-bit Windows is not supported.