Refactoring and improvements (#81)
fcanobrash committed Sep 21, 2020
1 parent b865832 commit b58da97
Showing 13 changed files with 918 additions and 669 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -1,4 +1,8 @@
build/
dist/
*.egg-info/
.python-version
__pycache__
.tox/
.direnv/
.envrc
152 changes: 115 additions & 37 deletions README.md
@@ -1,14 +1,25 @@
# Scrapy Autounit

[![AppVeyor](https://ci.appveyor.com/api/projects/status/github/scrapinghub/scrapy-autounit?branch=master&svg=true)](https://ci.appveyor.com/project/scrapinghub/scrapy-autounit/branch/master)
[![PyPI Version](https://img.shields.io/pypi/v/scrapy-autounit.svg?color=blue)](https://pypi.python.org/pypi/scrapy-autounit/)
 
## Documentation
- [Overview](#overview)
- [Installation](#installation)
- [Usage](#usage)
- [Caveats](#caveats)
- [Settings](#settings)
- [Command Line Interface](#command-line-interface)
- [Internals](#internals)
 

## Overview

Scrapy-Autounit is an automatic test generation tool for your Scrapy spiders.

It generates test fixtures and test cases as you run your spiders.
The test fixtures are generated from the items and requests that your spider yields, then the test cases evaluate those fixtures against your spiders' callbacks.

The fixtures are generated from the items and requests that your spider returns, then the test cases evaluate those fixtures against your spiders' callbacks.

Scrapy Autounit generates fixtures and tests per spider and callback under the Scrapy project root directory.
Here is an example of the directory tree of your project once the fixtures are created:
@@ -36,12 +47,14 @@ my_project
│   └── my_spider.py
└── scrapy.cfg
```
 

## Installation

```
pip install scrapy_autounit
```
 

## Usage

@@ -62,74 +75,92 @@ To generate your fixtures and tests just run your spiders as usual, Scrapy Autou
$ scrapy crawl my_spider
```
When the spider finishes, a directory `autounit` is created in your project root dir, containing all the generated tests/fixtures for the spider you just ran (see the directory tree example above).
If you want to **update** your tests and fixtures you only need to run your spiders again.

If you want to **update** your tests and fixtures you only need to run your spiders again or use the [`autounit update`](#autounit-update) command line tool.

### Running tests
To run your tests you can use the regular `unittest` commands.

###### Test all
```
$ python -m unittest
$ python -m unittest discover autounit/tests/
```
###### Test a specific spider
```
$ python -m unittest discover -s autounit.tests.my_spider
$ python -m unittest discover autounit/tests/my_spider/
```
###### Test a specific callback
```
$ python -m unittest discover -s autounit.tests.my_spider.my_callback
```
###### Test a specific fixture
```
$ python -m unittest autounit.tests.my_spider.my_callback.test_fixture2
$ python -m unittest discover autounit/tests/my_spider/my_callback/
```
 

## Caveats
- Keep in mind that as long as `AUTOUNIT_ENABLED` is on, each time you run a spider, tests/fixtures are generated for its callbacks.
This means that if you have your tests/fixtures ready to go, this setting should be off to prevent undesired overwrites.
Each time you want to regenerate your tests (e.g.: due to changes in your spiders), you can turn this on again and run your spiders as usual.
For example, this setting should be off when running your spiders in Scrapy Cloud.

- Autounit uses an internal `_autounit` key in requests' meta dictionaries. Avoid using/overriding this key in your spiders when adding data to meta to prevent unexpected behaviours.
- Autounit uses an internal `_autounit_cassette` key in requests' meta dictionaries. Avoid using/overriding this key in your spiders when adding data to meta to prevent unexpected behaviours.
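  For instance, a minimal sketch (the spider, URL and `page` meta key are hypothetical) of meta usage that stays clear of the reserved key:

```python
import scrapy


class ExampleSpider(scrapy.Spider):
    name = 'example'
    start_urls = ['https://example.com']

    def parse(self, response):
        # Custom meta keys are fine; only '_autounit_cassette' is reserved
        yield scrapy.Request(
            response.urljoin('/page/2'),
            callback=self.parse,
            meta={'page': 2},
        )
```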
 

## Settings

**AUTOUNIT_ENABLED**
###### General

- **AUTOUNIT_ENABLED**
Set this to `True` or `False` to enable or disable unit test generation.

**AUTOUNIT_MAX_FIXTURES_PER_CALLBACK**
- **AUTOUNIT_MAX_FIXTURES_PER_CALLBACK**
Sets the maximum number of fixtures to store per callback.
`Minimum: 10`
`Default: 10`

**AUTOUNIT_SKIPPED_FIELDS**
- **AUTOUNIT_EXTRA_PATH**
This is an extra string element to add to the test path and name between the spider name and callback name. You can use this to separate tests from the same spider with different configurations.
`Default: None`

###### Output

- **AUTOUNIT_DONT_TEST_OUTPUT_FIELDS**
Sets a list of fields to be skipped from testing your callbacks' items. It's useful to bypass fields that return a different value on each run.
For example, if you have a field that is always set to `datetime.now()` in your spider, you probably want to add that field to this list to be skipped on tests. Otherwise you'll get a different value when generating your fixtures than when running your tests, making your tests fail.
`Default: []`
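For instance, a minimal sketch, assuming a hypothetical project whose items carry a volatile `scraped_at` field:

```python
# settings.py (hypothetical project)
# 'scraped_at' is set to datetime.now() in the spider, so its value differs
# between fixture generation and test runs; skip it when comparing items
AUTOUNIT_DONT_TEST_OUTPUT_FIELDS = ['scraped_at']
```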

**AUTOUNIT_REQUEST_SKIPPED_FIELDS**
Sets a list of request fields to be skipped when running your tests.
Similar to AUTOUNIT_SKIPPED_FIELDS but applied to requests instead of items.
###### Requests

- **AUTOUNIT_DONT_TEST_REQUEST_ATTRS**
Sets a list of request attributes to be skipped when running your tests.
`Default: []`

**AUTOUNIT_EXCLUDED_HEADERS**
- **AUTOUNIT_DONT_RECORD_HEADERS**
Sets a list of headers to exclude from requests recording.
For security reasons, Autounit already excludes `Authorization` and `Proxy-Authorization` headers by default; if you want to include them in your fixtures see *`AUTOUNIT_INCLUDED_AUTH_HEADERS`*.
For security reasons, Autounit already excludes `Authorization` and `Proxy-Authorization` headers by default; if you want to record them in your fixtures see *`AUTOUNIT_RECORD_AUTH_HEADERS`*.
`Default: []`

**AUTOUNIT_INCLUDED_AUTH_HEADERS**
- **AUTOUNIT_RECORD_AUTH_HEADERS**
If you want to include `Authorization` or `Proxy-Authorization` headers in your fixtures, add one or both of them to this list.
`Default: []`

**AUTOUNIT_INCLUDED_SETTINGS**
Sets a list of settings names to be recorded in the generated test case.
###### Spider attributes

- **AUTOUNIT_DONT_RECORD_SPIDER_ATTRS**
Sets a list of spider attributes that won't be recorded into your fixtures.
`Default: []`

**AUTOUNIT_EXTRA_PATH**
This is an extra string element to add to the test path and name between the spider name and callback name. You can use this to separate tests from the same spider with different configurations.
`Default: None`
- **AUTOUNIT_DONT_TEST_SPIDER_ATTRS**
Sets a list of spider attributes to be skipped when testing your callbacks. These attributes will still be recorded.
`Default: []`

###### Settings

- **AUTOUNIT_RECORD_SETTINGS**
Sets a list of settings names to be recorded in the generated test case.
`Default: []`

---
**Note**: Remember that you can always apply any of these settings per spider by including them in your spider's `custom_settings` class attribute - see https://docs.scrapy.org/en/latest/topics/settings.html#settings-per-spider.
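For instance, a hedged sketch (spider name and values are hypothetical) of per-spider configuration:

```python
import scrapy


class MySpider(scrapy.Spider):
    name = 'my_spider'
    # Overrides the project-wide settings for this spider only
    custom_settings = {
        'AUTOUNIT_ENABLED': True,
        'AUTOUNIT_EXTRA_PATH': 'variant_a',  # keep this configuration's fixtures separate
    }
```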
 

## Command line interface

@@ -162,20 +193,26 @@ The original request that triggered the callback.
***`response`***
The response obtained from the original request and passed to the callback.

***`result`***
***`output_data`***
The callback's output such as items and requests.
_Same as ***`result`*** prior to v0.0.28._

***`middlewares`***
The relevant middlewares to replicate when running the tests.

***`settings`***
The settings explicitly recorded by the *`AUTOUNIT_INCLUDED_SETTINGS`* setting.

***`spider_args`***
The arguments passed to the spider in the crawl.
***`init_attrs`***
The spider's attributes right after its _\_\_init\_\__ call.

***`input_attrs`***
The spider's attributes right before running the callback.
_Same as ***`spider_args`*** or ***`spider_args_in`*** prior to v0.0.28._

***`python_version`***
Indicates if the fixture was recorded in python 2 or 3.
***`output_attrs`***
The spider's attributes right after running the callback.
_Same as ***`spider_args_out`*** prior to v0.0.28._

For example, to inspect a specific fixture's request, we can do the following:
```
@@ -184,12 +221,53 @@ $ autounit inspect my_spider my_callback 4 | jq '.request'

### `autounit update`

You can update your fixtures to match your latest changes in a particular callback to avoid running the whole spider.
For example, this updates all the fixtures for a specific callback:
This command updates your fixtures to match your latest changes without running the whole spider again.
You can update the whole project, an entire spider, just a callback or a single fixture.

###### Update the whole project
```
$ autounit update
WARNING: this will update all the existing fixtures from the current project
Do you want to continue? (y/n)
```

###### Update every callback in a spider
```
$ autounit update -s my_spider
```

###### Update every fixture in a callback
```
$ autounit update -s my_spider -c my_callback
```

###### Update a single fixture
```
$ autounit update my_spider my_callback
# Update fixture number 5
$ autounit update -s my_spider -c my_callback -f 5
```
Optionally you can specify a particular fixture to update with `-f` or `--fixture`:
 

## Internals

The `AutounitMiddleware` uses a [`Recorder`](scrapy_autounit/recorder.py) to record [`Cassettes`](scrapy_autounit/cassette.py) as binary fixtures.

Then, the tests use a [`Player`](scrapy_autounit/player.py) to play back those `Cassettes` and compare their output against your current callbacks.

The fixtures contain a pickled and compressed `Cassette` instance that you can get programmatically by doing:
```python
from scrapy_autounit.cassette import Cassette

cassette = Cassette.from_fixture(path_to_your_fixture)
# cassette.request
# cassette.response
# cassette.output_data
# ...
```

If you know what you're doing, you can modify that cassette and re-record it by using:
```python
from scrapy_autounit.recorder import Recorder

Recorder.update_fixture(cassette, path)
```
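For instance, a hedged sketch of a full round trip (the fixture path and tweaked attribute are hypothetical):

```python
from scrapy_autounit.cassette import Cassette
from scrapy_autounit.recorder import Recorder

path = 'autounit/fixtures/my_spider/my_callback/fixture1.bin'  # hypothetical path
cassette = Cassette.from_fixture(path)
cassette.init_attrs['some_flag'] = True  # tweak the recorded init attributes
Recorder.update_fixture(cassette, path)  # re-record the modified cassette
```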
98 changes: 98 additions & 0 deletions scrapy_autounit/cassette.py
@@ -0,0 +1,98 @@
import pickle
import sys
import zlib

from scrapy.crawler import Crawler
from scrapy.utils.conf import build_component_list
from scrapy.utils.project import get_project_settings

from .utils import get_spider_class


class Cassette:
    """
    Helper class to store request, response and output data.
    """
    # Version of the fixture format recorded by this class
    FIXTURE_VERSION = 2

    def __init__(
        self,
        spider=None,
        spider_name=None,
        request=None,
        response=None,
        init_attrs=None,
        input_attrs=None,
        output_attrs=None,
        output_data=None,
        middlewares=None,
        included_settings=None,
        python_version=None,
        filename=None,
    ):
        self.spider_name = spider_name
        self.middlewares = middlewares
        self.included_settings = included_settings
        if spider:
            # When a live spider is given, derive these values from its settings
            self.spider_name = spider.name
            self.middlewares = self._get_middlewares(spider.settings)
            self.included_settings = self._get_included_settings(spider.settings)

        self.request = request
        self.response = response
        self.init_attrs = init_attrs
        self.input_attrs = input_attrs
        self.output_attrs = output_attrs
        self.output_data = output_data
        self.filename = filename
        self.python_version = python_version or sys.version_info.major

    @classmethod
    def from_fixture(cls, fixture):
        # Fixtures are zlib-compressed pickles of a Cassette instance
        with open(fixture, 'rb') as f:
            binary = f.read()
        cassette = pickle.loads(zlib.decompress(binary))
        return cassette

    def _get_middlewares(self, settings):
        # Record only the spider middlewares that run after AutounitMiddleware
        # (excluding Autounit itself), so playback re-applies the same processing
        full_list = build_component_list(settings.getwithbase('SPIDER_MIDDLEWARES'))
        autounit_mw_path = list(filter(lambda x: x.endswith('AutounitMiddleware'), full_list))[0]
        start = full_list.index(autounit_mw_path)
        mw_paths = [mw for mw in full_list[start:] if mw != autounit_mw_path]
        return mw_paths

    def _get_included_settings(self, settings):
        # Use the new setting; if empty, try the deprecated one
        names = settings.getlist('AUTOUNIT_RECORD_SETTINGS', [])
        if not names:
            names = settings.getlist('AUTOUNIT_INCLUDED_SETTINGS', [])
        included = {name: settings.get(name) for name in names}
        return included

    def get_spider(self):
        # Rebuild the spider for playback from the recorded name,
        # settings and init attributes
        settings = get_project_settings()
        spider_cls = get_spider_class(self.spider_name, settings)

        spider_cls.update_settings(settings)
        for k, v in self.included_settings.items():
            settings.set(k, v, priority=50)

        crawler = Crawler(spider_cls, settings)
        spider = spider_cls.from_crawler(crawler, **self.init_attrs)
        return spider

    def pack(self):
        # Protocol 2 keeps fixtures loadable from both Python 2 and 3
        return zlib.compress(pickle.dumps(self, protocol=2))

    def to_dict(self):
        return {
            'spider_name': self.spider_name,
            'request': self.request,
            'response': self.response,
            'output_data': self.output_data,
            'middlewares': self.middlewares,
            'settings': self.included_settings,
            'init_attrs': self.init_attrs,
            'input_attrs': self.input_attrs,
            'output_attrs': self.output_attrs,
        }