A collection of processors to clean-up field values #18
Comments
I think we have several options on how to implement it.

Option 1: Decorators

The idea is that all processing logic would be implemented as decorators, which you can use to decorate extraction methods (later to be decorated with @field):

    @clean_str
    def name(self):
        return " some value \n\n\t"

Pro:
Cons:
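
As a rough sketch of Option 1, a processing decorator could look like the following. The thread never defines what `clean_str` does, so the whitespace normalization here is an assumption, `ProductPage` is a hypothetical class, and `@field` is left out so the snippet runs without web-poet installed.

```python
import functools
import re


def clean_str(method):
    """Hypothetical processing decorator: normalize whitespace in the returned string."""

    @functools.wraps(method)
    def wrapper(self):
        value = method(self)
        # Collapse every whitespace run into one space, then trim the ends.
        return re.sub(r"\s+", " ", value).strip()

    return wrapper


class ProductPage:
    @clean_str
    def name(self):
        return "  some   value \n\n\t"


print(ProductPage().name())  # prints "some value"
```

Because of `functools.wraps`, the decorated method keeps its original name and docstring, which matters if another decorator is stacked on top of it later.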
Option 1.1: Field decorators

Extraction decorators can be pre-decorated with @field:

    @clean_str_field
    def name(self):
        return " some value \n\n\t"

    # or
    from zyte_common_items import fields
    # ...
    @fields.clean_str
    def name(self):
        return " some value \n\n\t"

    # or
    from zyte_common_items.product import fields
    # ...
    @fields.name
    def name(self):
        return " some value \n\n\t"

It's very similar to the (1. Decorators) approach. Pros and Cons compared to (1):

Pro:
Cons:

To be continued with other options :)
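
One way the pre-decorated variant from Option 1.1 might be built is by composing the two decorators. This is only a sketch: `field` below is a property-based stand-in for web-poet's `@field`, and the body of `clean_str` is assumed.

```python
import re


def clean_str(method):
    """Hypothetical processing decorator (Option 1 style)."""

    def wrapper(self):
        return re.sub(r"\s+", " ", method(self)).strip()

    return wrapper


def field(method):
    """Stand-in for web-poet's @field: here just a read-only property."""
    return property(method)


def clean_str_field(method):
    """Pre-decorated variant: apply processing first, then the field decorator."""
    return field(clean_str(method))


class ProductPage:
    @clean_str_field
    def name(self):
        return " some value \n\n\t"


print(ProductPage().name)
```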
Option 2. Regular functions

One can write regular processing functions, and use them in the fields:

    @field
    def name(self):
        return clean_str(" some value \n\n\t")

Pros:
Cons:
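
For comparison, here is a sketch of the regular-function variant. The body of `clean_str` is an assumption (here it also passes `None` through unchanged), and a plain `property` stands in for `@field` so the snippet needs only the standard library.

```python
from typing import Optional


def clean_str(value: Optional[str]) -> Optional[str]:
    """Hypothetical processor: collapse whitespace runs and trim the ends."""
    if value is None:
        return None
    # str.split() with no arguments splits on any whitespace run and drops
    # leading/trailing whitespace, so joining with " " normalizes the string.
    return " ".join(value.split())


class ProductPage:
    @property  # stand-in for @field
    def name(self):
        return clean_str(" some value \n\n\t")


page = ProductPage()
print(repr(page.name))
```

With this approach the processing call is explicit in every extraction method, and nothing enforces that it is applied.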
Option 3:
Option 5: combine Option 3 and Option 1.1

We can implement Option 3, and provide some helpers for creation of combined "fields + processing" decorators.

    from functools import partial
    from web_poet import field

    name_field = partial(field, out=[clean_str])
    # ...
    # parameters work
    @name_field(cached=True)
    def name(self):
        return " some value \n\n\t"

We may have something more advanced to support customizing

I'm currently in favor of going with Option 5, because
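
To make the Option 5 mechanics concrete, the sketch below uses a simplified stand-in for `field` that accepts `cached` and `out`. It is not web-poet's implementation, only an assumption-laden model of a decorator that runs a list of processors over the returned value; `clean_str`, `ProductPage` and the `brand` field are hypothetical.

```python
from functools import partial


def field(method=None, *, cached=False, out=None):
    """Simplified stand-in for web_poet.field (an assumption, not the real code).

    Works both as @field and as @field(cached=True, out=[...]).
    """
    processors = list(out or [])

    def decorate(func):
        cache_attr = "_cached_" + func.__name__

        def getter(self):
            if cached and hasattr(self, cache_attr):
                return getattr(self, cache_attr)
            value = func(self)
            # Apply each output processor in order.
            for processor in processors:
                value = processor(value)
            if cached:
                setattr(self, cache_attr, value)
            return value

        return property(getter)

    # Bare @field passes the method directly; @field(...) returns the decorator.
    return decorate(method) if method is not None else decorate


def clean_str(value):
    """Hypothetical processor: collapse whitespace runs and trim the ends."""
    return " ".join(value.split())


# Combined "field + processing" decorator, as proposed in Option 5.
name_field = partial(field, out=[clean_str])


class ProductPage:
    @name_field(cached=True)  # extra parameters still work
    def name(self):
        return " some value \n\n\t"

    @name_field  # bare usage works too
    def brand(self):
        return "\tAcme  Corp "


page = ProductPage()
print(page.name, "|", page.brand)
```

Since `partial` only pre-fills the `out` keyword, every other parameter of the underlying decorator remains available at the call site.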
+1 to 3, +0.5 to 5. I find If a long

    clean_name = [clean_str]

    @field(cached=True, out=clean_name)
    def name(self):
        pass
Closing this, as the processors feature is implemented in web-poet, and zyte-common-items has already gained some built-in processors.
Stemming off from the discussion in #15.
We need a library of functions that can be used to preprocess the data values placed into the item. Ideally, this should be used by https://github.com/scrapinghub/web-poet fields. For example:
A guideline in web-poet should be created as well to properly document the processors found in zyte-common-items.