Commit 93aaf0c

Merge pull request ckan#44 from qld-gov-au/QOLDEV-347-fix-ckan-2.10
QOLDEV-347 Prepare for CKAN 2.10
2 parents 963ada1 + b62aa6c commit 93aaf0c

31 files changed: 926 additions and 1497 deletions.

.flake8

Lines changed: 0 additions & 4 deletions
@@ -17,8 +17,4 @@ max-line-length=127
 
 # List ignore rules one per line.
 ignore =
-    E501
-    C901
     W503
-    F401
-    F403

.github/workflows/test.yml

Lines changed: 5 additions & 17 deletions
@@ -10,22 +10,20 @@ jobs:
   lint:
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v2
-      - uses: actions/setup-python@v2
+      - uses: actions/checkout@v3
+      - uses: actions/setup-python@v4
         with:
-          python-version: '3.x'
+          python-version: '3.10'
       - name: Install requirements
         run: pip install flake8 pycodestyle
       - name: Check syntax
         run: flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics --extend-exclude ckan
-      #- name: Run flake8
-      # run: flake8 . --count --max-line-length=127 --statistics --exclude ckan
 
   test:
     needs: lint
     strategy:
       matrix:
-        ckan-version: ["2.10", 2.9, 2.9-py2, 2.8, 2.7]
+        ckan-version: ["2.10", 2.9]
       fail-fast: false
 
     name: CKAN ${{ matrix.ckan-version }}
@@ -54,7 +52,7 @@ jobs:
       CKAN_REDIS_URL: redis://redis:6379/1
 
     steps:
-      - uses: actions/checkout@v2
+      - uses: actions/checkout@v3
       - name: Install requirements
         run: |
           pip install -r requirements.txt
@@ -64,17 +62,7 @@ jobs:
           # Replace default path to CKAN core config file with the one on the container
           sed -i -e 's/use = config:.*/use = config:\/srv\/app\/src\/ckan\/test-core.ini/' test.ini
       - name: Setup extension (CKAN >= 2.9)
-        if: ${{ matrix.ckan-version != '2.7' && matrix.ckan-version != '2.8' }}
         run: |
           ckan -c test.ini db init
-      - name: Setup extension (CKAN 2.8)
-        if: ${{ matrix.ckan-version == '2.8' }}
-        run: |
-          paster --plugin=ckan db init -c test.ini
-      - name: Setup extension (CKAN 2.7)
-        if: ${{ matrix.ckan-version == '2.7' }}
-        run: |
-          psql -d "postgresql://datastore_write:pass@postgres/datastore_test" -f full_text_function.sql
-          paster --plugin=ckan db init -c test.ini
       - name: Run tests
         run: pytest --ckan-ini=test.ini --cov=ckanext.xloader --disable-warnings ckanext/xloader/tests
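
The `sed` step kept in the workflow above rewrites the `use = config:` line of `test.ini` so that it points at the core config file inside the container. For readers less used to `sed`, the same substitution can be sketched in Python. The function name `point_at_core_config` and the sample INI text are assumptions made for illustration; only the target path comes from the workflow.

```python
import re

# Path taken from the workflow's sed expression.
CORE_INI_PATH = '/srv/app/src/ckan/test-core.ini'


def point_at_core_config(ini_text, core_ini_path=CORE_INI_PATH):
    """Replace the target of every 'use = config:...' line with core_ini_path.

    Hypothetical helper: mirrors `sed 's/use = config:.*/use = config:<path>/'`.
    """
    # A function replacement avoids any backslash interpretation in the path.
    return re.sub(
        r'use = config:.*',
        lambda match: 'use = config:' + core_ini_path,
        ini_text,
    )


# Hypothetical test.ini content, not taken from the repository.
sample = (
    "[app:main]\n"
    "use = config:../ckan/test-core.ini\n"
    "ckan.plugins = xloader\n"
)
rewritten = point_at_core_config(sample)
print(rewritten)
```

Because `.` does not match a newline by default, the pattern stops at the end of the line, which matches the line-by-line behaviour of `sed`.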

MANIFEST.in

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,6 @@
-include full_text_function.sql
 include *requirements*.txt
 include CHANGELOG
 include LICENSE
 include README.rst
+include ckanext/xloader/config_declaration.yaml
 recursive-include ckanext/xloader/templates *.html
-recursive-include ckanext/xloader/templates-bs2 *.html

README.rst

Lines changed: 27 additions & 132 deletions
Original file line numberDiff line numberDiff line change
@@ -69,8 +69,8 @@ DataPusher - job queue is done by ckan-service-provider which is bespoke,
6969
complicated and stores jobs in its own database (sqlite by default).
7070

7171
XLoader - job queue is done by RQ, which is simpler, is backed by Redis, allows
72-
access to the CKAN model and is CKAN's default queue technology (since CKAN
73-
2.7). You can also debug jobs easily using pdb. Job results are stored in
72+
access to the CKAN model and is CKAN's default queue technology.
73+
You can also debug jobs easily using pdb. Job results are stored in
7474
Sqlite by default, and for production simply specify CKAN's database in the
7575
config and it's held there - easy.
7676

@@ -98,7 +98,7 @@ Caveat - column types
9898
Note: With XLoader, all columns are stored in DataStore's database as 'text'
9999
type (whereas DataPusher did some rudimentary type guessing - see 'Robustness'
100100
above). However once a resource is xloaded, an admin can use the resource's
101-
Data Dictionary tab (CKAN 2.7 onwards) to change these types to numeric or
101+
Data Dictionary tab to change these types to numeric or
102102
datestamp and re-load the file. When migrating from DataPusher to XLoader you
103103
can preserve the types of existing resources by using the ``migrate_types``
104104
command.
@@ -116,13 +116,10 @@ Compatibility with core CKAN versions:
 =============== =============
 CKAN version    Compatibility
 =============== =============
-2.3             no longer tested and you must install ckanext-rq
-2.4             no longer tested and you must install ckanext-rq
-2.5             no longer tested and you must install ckanext-rq
-2.6             no longer tested and you must install ckanext-rq
-2.7             yes
-2.8             yes
-2.9             yes (both Python2 and Python3)
+2.7             no longer supported (last supported version: 0.12.2)
+2.8             no longer supported (last supported version: 0.12.2)
+2.9             yes (Python3) (last supported version for Python 2.7: 0.12.2)
+2.10            yes
 =============== =============
 
 ------------
@@ -144,24 +141,7 @@ To install XLoader:
        pip install -r https://raw.githubusercontent.com/ckan/ckanext-xloader/master/requirements.txt
        pip install -U requests[security]
 
-4. If you are using CKAN version before 2.8.x you need to define the
-   ``populate_full_text_trigger`` in your database
-   ::
-
-       sudo -u postgres psql datastore_default -f full_text_function.sql
-
-   If successful it will print
-   ::
-
-       CREATE FUNCTION
-       ALTER FUNCTION
-
-   NB this assumes you used the defaults for the database name and username.
-   If in doubt, check your config's ``ckan.datastore.write_url``. If you don't have
-   database name ``datastore_default`` and username ``ckan_default`` then adjust
-   the psql option and ``full_text_function.sql`` before running this.
-
-5. Add ``xloader`` to the ``ckan.plugins`` setting in your CKAN
+4. Add ``xloader`` to the ``ckan.plugins`` setting in your CKAN
    config file (by default the config file is located at
    ``/etc/ckan/default/production.ini``).
 
@@ -170,12 +150,12 @@ To install XLoader:
 
    Ensure ``datastore`` is also listed, to enable CKAN DataStore.
 
-6. Starting CKAN 2.10 you will need to set an API Token to be able to
+5. Starting CKAN 2.10 you will need to set an API Token to be able to
    execute jobs against the server::
 
        ckanext.xloader.api_token = <your-CKAN-generated-API-Token>
 
-7. If it is a production server, you'll want to store jobs info in a more
+6. If it is a production server, you'll want to store jobs info in a more
    robust database than the default sqlite file. It can happily use the main
    CKAN postgres db by adding this line to the config, but with the same value
    as you have for ``sqlalchemy.url``::
@@ -184,31 +164,13 @@ To install XLoader:
 
    (This step can be skipped when just developing or testing.)
 
-8. Restart CKAN. For example if you've deployed CKAN with Apache on Ubuntu::
+7. Restart CKAN. For example if you've deployed CKAN with Apache on Ubuntu::
 
        sudo service apache2 reload
 
-9. Run the worker. First test it on the command-line::
-
-       paster --plugin=ckan jobs -c /etc/ckan/default/ckan.ini worker
-
-   or if you have CKAN version 2.6.x or less (and are therefore using ckanext-rq)::
-
-       paster --plugin=ckanext-rq jobs -c /etc/ckan/default/ckan.ini worker
-
-   Test it will load a CSV ok by submitting a `CSV in the web interface <http://docs.ckan.org/projects/datapusher/en/latest/using.html#ckan-2-2-and-above>`_
-   or in another shell::
+8. Run the worker::
 
-       paster --plugin=ckanext-xloader xloader submit <dataset-name> -c /etc/ckan/default/ckan.ini
-
-   Clearly, running the worker on the command-line is only for testing - for
-   production services see:
-
-       http://docs.ckan.org/en/ckan-2.7.0/maintaining/background-tasks.html#using-supervisor
-
-   If you have CKAN version 2.6.x or less then you'll need to download
-   `supervisor-ckan-worker.conf <https://raw.githubusercontent.com/ckan/ckan/master/ckan/config/supervisor-ckan-worker.conf>`_ and adjust the ``command`` to reference
-   ckanext-rq.
+       ckan -c /etc/ckan/default/ckan.ini jobs worker
 
 
 ---------------
@@ -217,58 +179,7 @@ Config settings
 
 Configuration:
 
-::
-
-    # The connection string for the jobs database used by XLoader. The
-    # default of an sqlite file is fine for development. For production use a
-    # Postgresql database.
-    ckanext.xloader.jobs_db.uri = sqlite:////tmp/xloader_jobs.db
-
-    # The formats that are accepted. If the value of the resource.format is
-    # anything else then it won't be 'xloadered' to DataStore (and will therefore
-    # only be available to users in the form of the original download/link).
-    # Case insensitive.
-    # (optional, defaults are listed in plugin.py - DEFAULT_FORMATS).
-    ckanext.xloader.formats = csv application/csv xls application/vnd.ms-excel
-
-    # The maximum size of files to load into DataStore. In bytes. Default is 1 GB.
-    ckanext.xloader.max_content_length = 1000000000
-
-    # To always use messytables to load data, instead of attempting a direct
-    # PostgreSQL COPY, set this to True. This more closely matches the
-    # DataPusher's behavior. It has the advantage that the column types
-    # are guessed. However it is more error prone, far slower and you can't run
-    # the CPU-intensive queue on a separate machine.
-    ckanext.xloader.just_load_with_messytables = False
-
-    # The maximum time for the loading of a resource before it is aborted.
-    # Give an amount in seconds. Default is 60 minutes
-    ckanext.xloader.job_timeout = 3600
-
-    # Ignore the file hash when submitting to the DataStore, if set to True
-    # resources are always submitted (if their format matches), if set to
-    # False (default), resources are only submitted if their hash has changed.
-    ckanext.xloader.ignore_hash = False
-
-    # When loading a file that is bigger than `max_content_length`, xloader can
-    # still try and load some of the file, which is useful to display a
-    # preview. Set this option to the desired number of lines/rows that it
-    # loads in this case.
-    # If the file-type is supported (CSV, TSV) an excerpt with the number of
-    # `max_excerpt_lines` lines will be submitted while the `max_content_length`
-    # is not exceeded.
-    # If set to 0 (default) files that exceed the `max_content_length` will
-    # not be loaded into the datastore.
-    ckanext.xloader.max_excerpt_lines = 100
-
-    # Requests verifies SSL certificates for HTTPS requests. Setting verify to
-    # False should only be enabled during local development or testing. Default
-    # to True.
-    ckanext.xloader.ssl_verify = True
-
-    # Uses a specific API token for the xloader_submit action instead of the
-    # apikey of the site_user
-    ckanext.xloader.api_token = ckan-provided-api-token
+See the extension's `config_declaration.yaml <ckanext/xloader/config_declaration.yaml>`_ file.
 
 
 ------------------------
@@ -280,7 +191,7 @@ in the directory up from your local ckan repo::
 
     git clone https://github.com/ckan/ckanext-xloader.git
     cd ckanext-xloader
-    python setup.py develop
+    pip install -e .
     pip install -r requirements.txt
     pip install -r dev-requirements.txt
 
@@ -301,8 +212,8 @@ To upgrade from DataPusher to XLoader:
    ``ckan.plugins`` line replace ``datapusher`` with ``xloader``.
 
 4. (Optional) If you wish, you can disable the direct loading and continue to
-   just use messytables - for more about this see the docs on config option:
-   ``ckanext.xloader.just_load_with_messytables``
+   just use tabulator - for more about this see the docs on config option:
+   ``ckanext.xloader.use_type_guessing``
 
 5. Stop the datapusher worker::
 
@@ -322,35 +233,31 @@ command-line interface.
 
 e.g. ::
 
-    [2.9] ckan -c /etc/ckan/default/ckan.ini xloader submit <dataset-name>
-    [pre-2.9] paster --plugin=ckanext-xloader xloader submit <dataset-name> -c /etc/ckan/default/ckan.ini
+    ckan -c /etc/ckan/default/ckan.ini xloader submit <dataset-name>
 
 For debugging you can try xloading it synchronously (which does the load
 directly, rather than asking the worker to do it) with the ``-s`` option::
 
-    [2.9] ckan -c /etc/ckan/default/ckan.ini xloader submit <dataset-name> -s
-    [pre-2.9] paster --plugin=ckanext-xloader xloader submit <dataset-name> -s -c /etc/ckan/default/ckan.ini
+    ckan -c /etc/ckan/default/ckan.ini xloader submit <dataset-name> -s
 
 See the status of jobs::
 
-    [2.9] ckan -c /etc/ckan/default/ckan.ini xloader status
-    [pre-2.9] paster --plugin=ckanext-xloader xloader status -c /etc/ckan/default/development.ini
+    ckan -c /etc/ckan/default/ckan.ini xloader status
 
 Submit all datasets' resources to the DataStore::
 
-    [2.9] ckan -c /etc/ckan/default/ckan.ini xloader submit all
-    [pre-2.9] paster --plugin=ckanext-xloader xloader submit all -c /etc/ckan/default/ckan.ini
+    ckan -c /etc/ckan/default/ckan.ini xloader submit all
 
 Re-submit all the resources already in the DataStore (Ignores any resources
 that have not been stored in DataStore e.g. because they are not tabular)::
 
-    [2.9] ckan -c /etc/ckan/default/ckan.ini xloader submit all-existing
-    [pre-2.9] paster --plugin=ckanext-xloader xloader submit all-existing -c /etc/ckan/default/ckan.ini
+    ckan -c /etc/ckan/default/ckan.ini xloader submit all-existing
+
 
 **Full list of XLoader CLI commands**::
 
-    [2.9] ckan -c /etc/ckan/default/ckan.ini xloader --help
-    [pre-2.9] paster --plugin=ckanext-xloader xloader --help
+    ckan -c /etc/ckan/default/ckan.ini xloader --help
+
 
 Jobs and workers
 ----------------
@@ -363,8 +270,7 @@ Useful commands:
 
 Clear (delete) all outstanding jobs::
 
-    CKAN 2.9, Python 3 ckan -c /etc/ckan/default/ckan.ini jobs clear [QUEUES]
-    CKAN <2.9, Python 2 paster --plugin=ckanext-xloader xloader jobs clear [QUEUES] -c /etc/ckan/default/development.ini
+    ckan -c /etc/ckan/default/ckan.ini jobs clear [QUEUES]
 
 If having trouble with the worker process, restarting it can help::
 
@@ -385,13 +291,6 @@ exist**
 Your DataStore permissions have not been set-up - see:
 <https://docs.ckan.org/en/latest/maintaining/datastore.html#set-permissions>
 
-**When editing a package, all its existing resources get re-loaded by xloader**
-
-This behavior was documented in
-`Issue 75 <https://github.com/ckan/ckanext-xloader/issues/75>`_ and is related
-to a bug in CKAN that is fixed in versions 2.6.9, 2.7.7, 2.8.4
-and 2.9.0+.
-
 -----------------
 Running the Tests
 -----------------
@@ -402,12 +301,8 @@ The first time, your test datastore database needs the trigger applied::
 
 To run the tests, do::
 
-    nosetests --nologcapture --with-pylons=test.ini
-
-To run the tests and produce a coverage report, first make sure you have
-coverage installed in your virtualenv (``pip install coverage``) then run::
+    pytest --ckan-ini=test.ini ckanext/xloader/tests
 
-    nosetests --nologcapture --with-pylons=test.ini --with-coverage --cover-package=ckanext.xloader --cover-inclusive --cover-erase --cover-tests
 
 ----------------------------------
 Releasing a New Version of XLoader

ckanext/xloader/action.py

Lines changed: 2 additions & 8 deletions
@@ -10,7 +10,7 @@
 from ckan.logic import side_effect_free
 import ckan.plugins as p
 from dateutil.parser import parse as parse_date
-from six import text_type as str
+from dateutil.parser import isoparse as parse_iso_date
 
 import ckanext.xloader.schema
 
@@ -99,8 +99,7 @@ def xloader_submit(context, data_dict):
                 for job in get_queue().get_jobs()
                 if 'xloader_to_datastore' in str(job)  # filter out test_job etc
             ]
-            updated = datetime.datetime.strptime(
-                existing_task['last_updated'], '%Y-%m-%dT%H:%M:%S.%f')
+            updated = parse_iso_date(existing_task['last_updated'])
             time_since_last_updated = datetime.datetime.utcnow() - updated
             if (res_id not in queued_res_ids
                     and time_since_last_updated > assume_task_stillborn_after):
@@ -158,11 +157,6 @@ def xloader_submit(context, data_dict):
         job = enqueue_job(
             jobs.xloader_data_into_datastore, [data], rq_kwargs=dict(timeout=timeout)
         )
-    except TypeError:
-        # This except provides support for 2.7.
-        job = _enqueue(
-            jobs.xloader_data_into_datastore, [data], timeout=timeout
-        )
     except Exception:
         log.exception('Unable to enqueued xloader res_id=%s', res_id)
         return False
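
The second hunk above swaps a fixed-format `strptime` call for `dateutil`'s `isoparse`. A small sketch of the practical difference follows: the old format string requires fractional seconds, while an ISO 8601 parser also accepts timestamps without them. The sample timestamps are made up for illustration, and the fallback to the standard library's `datetime.fromisoformat` is an assumption of this sketch (so it runs without `dateutil` installed), not something the commit does.

```python
import datetime

try:
    # The parser the commit switches to.
    from dateutil.parser import isoparse as parse_iso_date
except ImportError:
    # Fallback for this sketch only: stdlib ISO parser (Python 3.7+).
    parse_iso_date = datetime.datetime.fromisoformat

# Format string used by the removed strptime call.
OLD_FORMAT = '%Y-%m-%dT%H:%M:%S.%f'


def parse_old(value):
    """Parse a timestamp the way the removed code did."""
    return datetime.datetime.strptime(value, OLD_FORMAT)


# Hypothetical 'last_updated' values.
with_fraction = '2023-03-01T10:15:30.250000'
without_fraction = '2023-03-01T10:15:30'

# Both approaches agree when fractional seconds are present.
assert parse_old(with_fraction) == parse_iso_date(with_fraction)

# The fixed format rejects a timestamp that lacks fractional seconds.
try:
    parse_old(without_fraction)
    old_accepts_whole_seconds = True
except ValueError:
    old_accepts_whole_seconds = False

# The ISO parser handles it, so the age calculation can proceed.
updated = parse_iso_date(without_fraction)
time_since_last_updated = datetime.datetime.utcnow() - updated
print(old_accepts_whole_seconds, updated, time_since_last_updated)
```

One caveat worth keeping in mind: if a timestamp carried a UTC offset, the parsed value would be timezone-aware, and subtracting it from the naive result of `utcnow()` would raise a `TypeError`. The sketch only uses naive timestamps.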

ckanext/xloader/command.py

Lines changed: 1 addition & 4 deletions
@@ -119,10 +119,7 @@ def _submit_resource(self, resource, user, indent=0):
             self.error_occured = True
 
     def print_status(self):
-        try:
-            import ckan.lib.jobs as rq_jobs
-        except ImportError:
-            import ckanext.rq.jobs as rq_jobs
+        import ckan.lib.jobs as rq_jobs
         jobs = rq_jobs.get_queue().jobs
         if not jobs:
             print('No jobs currently queued')
