Refactoring and improvements (#81)
fcanobrash committed Sep 21, 2020
1 parent b865832 commit b58da97
Showing 13 changed files with 918 additions and 669 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -1,4 +1,8 @@
build/
dist/
*.egg-info/
.python-version
__pycache__
.tox/
.direnv/
.envrc
152 changes: 115 additions & 37 deletions README.md
@@ -1,14 +1,25 @@
# Scrapy Autounit

[![AppVeyor](https://ci.appveyor.com/api/projects/status/github/scrapinghub/scrapy-autounit?branch=master&svg=true)](https://ci.appveyor.com/project/scrapinghub/scrapy-autounit/branch/master)
[![PyPI Version](https://img.shields.io/pypi/v/scrapy-autounit.svg?color=blue)](https://pypi.python.org/pypi/scrapy-autounit/)
 
## Documentation
- [Overview](#overview)
- [Installation](#installation)
- [Usage](#usage)
- [Caveats](#caveats)
- [Settings](#settings)
- [Command Line Interface](#command-line-interface)
- [Internals](#internals)
 

## Overview

Scrapy-Autounit is an automatic test generation tool for your Scrapy spiders.

It generates test fixtures and test cases as you run your spiders.
The test fixtures are generated from the items and requests that your spider yields, then the test cases evaluate those fixtures against your spiders' callbacks.

The fixtures are generated from the items and requests that your spider returns, then the test cases evaluate those fixtures against your spiders' callbacks.

Scrapy Autounit generates fixtures and tests per spider and callback under the Scrapy project root directory.
Here is an example of the directory tree of your project once the fixtures are created:
@@ -36,12 +47,14 @@ my_project
│   └── my_spider.py
└── scrapy.cfg
```
 

## Installation

```
pip install scrapy_autounit
```
 

## Usage

@@ -62,74 +75,92 @@ To generate your fixtures and tests just run your spiders as usual, Scrapy Autou
$ scrapy crawl my_spider
```
When the spider finishes, a directory `autounit` is created in your project root dir, containing all the generated tests/fixtures for the spider you just ran (see the directory tree example above).
If you want to **update** your tests and fixtures you only need to run your spiders again.

If you want to **update** your tests and fixtures you only need to run your spiders again or use the [`autounit update`](#autounit-update) command line tool.

### Running tests
To run your tests you can use the regular `unittest` commands.

###### Test all
```
$ python -m unittest
$ python -m unittest discover autounit/tests/
```
###### Test a specific spider
```
$ python -m unittest discover -s autounit.tests.my_spider
$ python -m unittest discover autounit/tests/my_spider/
```
###### Test a specific callback
```
$ python -m unittest discover -s autounit.tests.my_spider.my_callback
```
###### Test a specific fixture
```
$ python -m unittest autounit.tests.my_spider.my_callback.test_fixture2
$ python -m unittest discover autounit/tests/my_spider/my_callback/
```
 

## Caveats
- Keep in mind that as long as `AUTOUNIT_ENABLED` is on, each time you run a spider, tests/fixtures are generated for its callbacks.
This means that if you have your tests/fixtures ready to go, this setting should be off to prevent undesired overwrites.
Each time you want to regenerate your tests (e.g.: due to changes in your spiders), you can turn this on again and run your spiders as usual.
For example, this setting should be off when running your spiders in Scrapy Cloud.

- Autounit uses an internal `_autounit` key in requests' meta dictionaries. Avoid using/overriding this key in your spiders when adding data to meta to prevent unexpected behaviours.
- Autounit uses an internal `_autounit_cassette` key in requests' meta dictionaries. Avoid using/overriding this key in your spiders when adding data to meta to prevent unexpected behaviours.
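  For instance, a minimal sketch (the spider, URL and `page` meta key are hypothetical) of meta usage that stays clear of the reserved key:

```python
import scrapy


class ExampleSpider(scrapy.Spider):
    name = 'example'
    start_urls = ['https://example.com']

    def parse(self, response):
        # Custom meta keys are fine; only '_autounit_cassette' is reserved
        yield scrapy.Request(
            response.urljoin('/page/2'),
            callback=self.parse,
            meta={'page': 2},
        )
```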
 

## Settings

**AUTOUNIT_ENABLED**
###### General

- **AUTOUNIT_ENABLED**
Set this to `True` or `False` to enable or disable unit test generation.

**AUTOUNIT_MAX_FIXTURES_PER_CALLBACK**
- **AUTOUNIT_MAX_FIXTURES_PER_CALLBACK**
Sets the maximum number of fixtures to store per callback.
`Minimum: 10`
`Default: 10`

**AUTOUNIT_SKIPPED_FIELDS**
- **AUTOUNIT_EXTRA_PATH**
This is an extra string element to add to the test path and name between the spider name and callback name. You can use this to separate tests from the same spider with different configurations.
`Default: None`

###### Output

- **AUTOUNIT_DONT_TEST_OUTPUT_FIELDS**
Sets a list of fields to be skipped from testing your callbacks' items. It's useful to bypass fields that return a different value on each run.
For example, if you have a field that is always set to `datetime.now()` in your spider, you probably want to add that field to this list to be skipped on tests. Otherwise you'll get a different value when generating your fixtures than when running your tests, making your tests fail.
`Default: []`
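For instance, a minimal sketch, assuming a hypothetical project whose items carry a volatile `scraped_at` field:

```python
# settings.py (hypothetical project)
# 'scraped_at' is set to datetime.now() in the spider, so its value differs
# between fixture generation and test runs; skip it when comparing items
AUTOUNIT_DONT_TEST_OUTPUT_FIELDS = ['scraped_at']
```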

**AUTOUNIT_REQUEST_SKIPPED_FIELDS**
Sets a list of request fields to be skipped when running your tests.
Similar to AUTOUNIT_SKIPPED_FIELDS but applied to requests instead of items.
###### Requests

- **AUTOUNIT_DONT_TEST_REQUEST_ATTRS**
Sets a list of request attributes to be skipped when running your tests.
`Default: []`

**AUTOUNIT_EXCLUDED_HEADERS**
- **AUTOUNIT_DONT_RECORD_HEADERS**
Sets a list of headers to exclude from requests recording.
For security reasons, Autounit already excludes `Authorization` and `Proxy-Authorization` headers by default; if you want to include them in your fixtures see *`AUTOUNIT_INCLUDED_AUTH_HEADERS`*.
For security reasons, Autounit already excludes `Authorization` and `Proxy-Authorization` headers by default; if you want to record them in your fixtures see *`AUTOUNIT_RECORD_AUTH_HEADERS`*.
`Default: []`

**AUTOUNIT_INCLUDED_AUTH_HEADERS**
- **AUTOUNIT_RECORD_AUTH_HEADERS**
If you want to include `Authorization` or `Proxy-Authorization` headers in your fixtures, add one or both of them to this list.
`Default: []`

**AUTOUNIT_INCLUDED_SETTINGS**
Sets a list of settings names to be recorded in the generated test case.
###### Spider attributes

- **AUTOUNIT_DONT_RECORD_SPIDER_ATTRS**
Sets a list of spider attributes that won't be recorded into your fixtures.
`Default: []`

**AUTOUNIT_EXTRA_PATH**
This is an extra string element to add to the test path and name between the spider name and callback name. You can use this to separate tests from the same spider with different configurations.
`Default: None`
- **AUTOUNIT_DONT_TEST_SPIDER_ATTRS**
Sets a list of spider attributes to be skipped when testing your callbacks. These attributes will still be recorded.
`Default: []`

###### Settings

- **AUTOUNIT_RECORD_SETTINGS**
Sets a list of settings names to be recorded in the generated test case.
`Default: []`

---
**Note**: Remember that you can always apply any of these settings per spider by including them in your spider's `custom_settings` class attribute - see https://docs.scrapy.org/en/latest/topics/settings.html#settings-per-spider.
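For instance, a hedged sketch (spider name and values are hypothetical) of per-spider configuration:

```python
import scrapy


class MySpider(scrapy.Spider):
    name = 'my_spider'
    # Overrides the project-wide settings for this spider only
    custom_settings = {
        'AUTOUNIT_ENABLED': True,
        'AUTOUNIT_EXTRA_PATH': 'variant_a',  # keep this configuration's fixtures separate
    }
```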
 

## Command line interface

@@ -162,20 +193,26 @@ The original request that triggered the callback.
***`response`***
The response obtained from the original request and passed to the callback.

***`result`***
***`output_data`***
The callback's output such as items and requests.
_Same as ***`result`*** prior to v0.0.28._

***`middlewares`***
The relevant middlewares to replicate when running the tests.

***`settings`***
The settings explicitly recorded by the *`AUTOUNIT_INCLUDED_SETTINGS`* setting.

***`spider_args`***
The arguments passed to the spider in the crawl.
***`init_attrs`***
The spider's attributes right after its _\_\_init\_\__ call.

***`input_attrs`***
The spider's attributes right before running the callback.
_Same as ***`spider_args`*** or ***`spider_args_in`*** prior to v0.0.28._

***`python_version`***
Indicates if the fixture was recorded in python 2 or 3.
***`output_attrs`***
The spider's attributes right after running the callback.
_Same as ***`spider_args_out`*** prior to v0.0.28._

For example, to inspect a specific fixture's request, we can do the following:
```
@@ -184,12 +221,53 @@ $ autounit inspect my_spider my_callback 4 | jq '.request'

### `autounit update`

You can update your fixtures to match your latest changes in a particular callback to avoid running the whole spider.
For example, this updates all the fixtures for a specific callback:
This command updates your fixtures to match your latest changes without running the whole spider again.
You can update the whole project, an entire spider, just a callback or a single fixture.

###### Update the whole project
```
$ autounit update
WARNING: this will update all the existing fixtures from the current project
Do you want to continue? (y/n)
```

###### Update every callback in a spider
```
$ autounit update -s my_spider
```

###### Update every fixture in a callback
```
$ autounit update -s my_spider -c my_callback
```

###### Update a single fixture
```
$ autounit update my_spider my_callback
# Update fixture number 5
$ autounit update -s my_spider -c my_callback -f 5
```
Optionally you can specify a particular fixture to update with `-f` or `--fixture`:
 

## Internals

The `AutounitMiddleware` uses a [`Recorder`](scrapy_autounit/recorder.py) to record [`Cassettes`](scrapy_autounit/cassette.py) as binary fixtures.

Then, the tests use a [`Player`](scrapy_autounit/player.py) to play back those `Cassettes` and compare their output against your current callbacks.

The fixtures contain a pickled and compressed `Cassette` instance that you can get programmatically by doing:
```python
from scrapy_autounit.cassette import Cassette

cassette = Cassette.from_fixture(path_to_your_fixture)
# cassette.request
# cassette.response
# cassette.output_data
# ...
```

If you know what you're doing, you can modify that cassette and re-record it by using:
```python
from scrapy_autounit.recorder import Recorder

Recorder.update_fixture(cassette, path)
```
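For instance, a hedged sketch of a full round trip (the fixture path and tweaked attribute are hypothetical):

```python
from scrapy_autounit.cassette import Cassette
from scrapy_autounit.recorder import Recorder

path = 'autounit/fixtures/my_spider/my_callback/fixture1.bin'  # hypothetical path
cassette = Cassette.from_fixture(path)
cassette.init_attrs['some_flag'] = True  # tweak the recorded init attributes
Recorder.update_fixture(cassette, path)  # re-record the modified cassette
```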
98 changes: 98 additions & 0 deletions scrapy_autounit/cassette.py
@@ -0,0 +1,98 @@
import pickle
import sys
import zlib

from scrapy.crawler import Crawler
from scrapy.utils.conf import build_component_list
from scrapy.utils.project import get_project_settings

from .utils import get_spider_class


class Cassette:
    """
    Helper class to store request, response and output data.
    """
    # Version of the fixture format recorded by this class
    FIXTURE_VERSION = 2

    def __init__(
        self,
        spider=None,
        spider_name=None,
        request=None,
        response=None,
        init_attrs=None,
        input_attrs=None,
        output_attrs=None,
        output_data=None,
        middlewares=None,
        included_settings=None,
        python_version=None,
        filename=None,
    ):
        self.spider_name = spider_name
        self.middlewares = middlewares
        self.included_settings = included_settings
        if spider:
            # When a live spider is given, derive these values from its settings
            self.spider_name = spider.name
            self.middlewares = self._get_middlewares(spider.settings)
            self.included_settings = self._get_included_settings(spider.settings)

        self.request = request
        self.response = response
        self.init_attrs = init_attrs
        self.input_attrs = input_attrs
        self.output_attrs = output_attrs
        self.output_data = output_data
        self.filename = filename
        self.python_version = python_version or sys.version_info.major

    @classmethod
    def from_fixture(cls, fixture):
        # Fixtures are zlib-compressed pickles of a Cassette instance
        with open(fixture, 'rb') as f:
            binary = f.read()
        cassette = pickle.loads(zlib.decompress(binary))
        return cassette

    def _get_middlewares(self, settings):
        # Record only the spider middlewares that run after AutounitMiddleware
        # (excluding Autounit itself), so playback re-applies the same processing
        full_list = build_component_list(settings.getwithbase('SPIDER_MIDDLEWARES'))
        autounit_mw_path = list(filter(lambda x: x.endswith('AutounitMiddleware'), full_list))[0]
        start = full_list.index(autounit_mw_path)
        mw_paths = [mw for mw in full_list[start:] if mw != autounit_mw_path]
        return mw_paths

    def _get_included_settings(self, settings):
        # Use the new setting; if empty, try the deprecated one
        names = settings.getlist('AUTOUNIT_RECORD_SETTINGS', [])
        if not names:
            names = settings.getlist('AUTOUNIT_INCLUDED_SETTINGS', [])
        included = {name: settings.get(name) for name in names}
        return included

    def get_spider(self):
        # Rebuild the spider for playback from the recorded name,
        # settings and init attributes
        settings = get_project_settings()
        spider_cls = get_spider_class(self.spider_name, settings)

        spider_cls.update_settings(settings)
        for k, v in self.included_settings.items():
            settings.set(k, v, priority=50)

        crawler = Crawler(spider_cls, settings)
        spider = spider_cls.from_crawler(crawler, **self.init_attrs)
        return spider

    def pack(self):
        # Protocol 2 keeps fixtures loadable from both Python 2 and 3
        return zlib.compress(pickle.dumps(self, protocol=2))

    def to_dict(self):
        return {
            'spider_name': self.spider_name,
            'request': self.request,
            'response': self.response,
            'output_data': self.output_data,
            'middlewares': self.middlewares,
            'settings': self.included_settings,
            'init_attrs': self.init_attrs,
            'input_attrs': self.input_attrs,
            'output_attrs': self.output_attrs,
        }