@@ -69,8 +69,8 @@ DataPusher - job queue is done by ckan-service-provider which is bespoke,
complicated and stores jobs in its own database (sqlite by default).

XLoader - job queue is done by RQ, which is simpler, is backed by Redis, allows
- access to the CKAN model and is CKAN's default queue technology (since CKAN
- 2.7). You can also debug jobs easily using pdb. Job results are stored in
+ access to the CKAN model and is CKAN's default queue technology.
+ You can also debug jobs easily using pdb. Job results are stored in
Sqlite by default, and for production simply specify CKAN's database in the
config and it's held there - easy.

@@ -98,7 +98,7 @@ Caveat - column types
Note: With XLoader, all columns are stored in DataStore's database as 'text'
type (whereas DataPusher did some rudimentary type guessing - see 'Robustness'
above). However once a resource is xloaded, an admin can use the resource's
- Data Dictionary tab (CKAN 2.7 onwards) to change these types to numeric or
+ Data Dictionary tab to change these types to numeric or
datestamp and re-load the file. When migrating from DataPusher to XLoader you
can preserve the types of existing resources by using the ``migrate_types``
command.
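As an illustration of the column-type caveat in this hunk, here is a minimal sketch of what "rudimentary type guessing" over text values could look like. It is not code from XLoader or DataPusher: ``guess_column_type`` is a hypothetical helper, and the type names ``text``, ``numeric`` and ``timestamp`` are assumed to match the choices offered in the Data Dictionary.

```python
from datetime import datetime


def guess_column_type(values):
    """Guess 'numeric', 'timestamp' or 'text' for a column of strings.

    Hypothetical helper for illustration only; empty cells are ignored.
    """
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        # Nothing to inspect, so keep the safe default.
        return "text"

    def all_parse(parser):
        # True only if every non-empty cell is accepted by the parser.
        for value in non_empty:
            try:
                parser(value)
            except ValueError:
                return False
        return True

    if all_parse(float):
        return "numeric"
    if all_parse(datetime.fromisoformat):
        return "timestamp"
    return "text"
```

A single non-conforming cell makes the whole column fall back to ``text``, which is the type XLoader stores by default, so a wrong guess never blocks a load in this sketch.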
@@ -116,13 +116,10 @@ Compatibility with core CKAN versions:
=============== =============
CKAN version    Compatibility
=============== =============
- 2.3             no longer tested and you must install ckanext-rq
- 2.4             no longer tested and you must install ckanext-rq
- 2.5             no longer tested and you must install ckanext-rq
- 2.6             no longer tested and you must install ckanext-rq
- 2.7             yes
- 2.8             yes
- 2.9             yes (both Python2 and Python3)
+ 2.7             no longer supported (last supported version: 0.12.2)
+ 2.8             no longer supported (last supported version: 0.12.2)
+ 2.9             yes (Python3) (last supported version for Python 2.7: 0.12.2)
+ 2.10            yes
=============== =============

------------
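The updated compatibility table can also be expressed as a small lookup, for example in a deployment script that checks the CKAN version before installing. This is only a sketch: the dictionary is transcribed from the added rows of the table, and ``xloader_compatibility`` is a hypothetical helper, not part of ckanext-xloader.

```python
# Entries transcribed from the "+" rows of the compatibility table.
COMPATIBILITY = {
    "2.7": "no longer supported (last supported version: 0.12.2)",
    "2.8": "no longer supported (last supported version: 0.12.2)",
    "2.9": "yes (Python3) (last supported version for Python 2.7: 0.12.2)",
    "2.10": "yes",
}


def xloader_compatibility(ckan_version):
    """Return the table entry for a CKAN version string such as '2.10.4'."""
    # Only the major.minor part is listed in the table.
    major_minor = ".".join(ckan_version.split(".")[:2])
    return COMPATIBILITY.get(major_minor, "not listed")
```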
@@ -144,24 +141,7 @@ To install XLoader:
pip install -r https://raw.githubusercontent.com/ckan/ckanext-xloader/master/requirements.txt
pip install -U requests[security]

- 4. If you are using CKAN version before 2.8.x you need to define the
- ``populate_full_text_trigger`` in your database
- ::
-
- sudo -u postgres psql datastore_default -f full_text_function.sql
-
- If successful it will print
- ::
-
- CREATE FUNCTION
- ALTER FUNCTION
-
- NB this assumes you used the defaults for the database name and username.
- If in doubt, check your config's ``ckan.datastore.write_url``. If you don't have
- database name ``datastore_default`` and username ``ckan_default`` then adjust
- the psql option and ``full_text_function.sql`` before running this.
-
- 5. Add ``xloader`` to the ``ckan.plugins`` setting in your CKAN
+ 4. Add ``xloader`` to the ``ckan.plugins`` setting in your CKAN
config file (by default the config file is located at
``/etc/ckan/default/production.ini``).

@@ -170,12 +150,12 @@ To install XLoader:

Ensure ``datastore`` is also listed, to enable CKAN DataStore.

- 6. Starting CKAN 2.10 you will need to set an API Token to be able to
+ 5. Starting CKAN 2.10 you will need to set an API Token to be able to
execute jobs against the server::

ckanext.xloader.api_token = <your-CKAN-generated-API-Token>

- 7. If it is a production server, you'll want to store jobs info in a more
+ 6. If it is a production server, you'll want to store jobs info in a more
robust database than the default sqlite file. It can happily use the main
CKAN postgres db by adding this line to the config, but with the same value
as you have for ``sqlalchemy.url``::
@@ -184,31 +164,13 @@ To install XLoader:

(This step can be skipped when just developing or testing.)

- 8. Restart CKAN. For example if you've deployed CKAN with Apache on Ubuntu::
+ 7. Restart CKAN. For example if you've deployed CKAN with Apache on Ubuntu::

sudo service apache2 reload

- 9. Run the worker. First test it on the command-line::
-
- paster --plugin=ckan jobs -c /etc/ckan/default/ckan.ini worker
-
- or if you have CKAN version 2.6.x or less (and are therefore using ckanext-rq)::
-
- paster --plugin=ckanext-rq jobs -c /etc/ckan/default/ckan.ini worker
-
- Test it will load a CSV ok by submitting a `CSV in the web interface <http://docs.ckan.org/projects/datapusher/en/latest/using.html#ckan-2-2-and-above>`_
- or in another shell::
+ 8. Run the worker::

- paster --plugin=ckanext-xloader xloader submit <dataset-name> -c /etc/ckan/default/ckan.ini
-
- Clearly, running the worker on the command-line is only for testing - for
- production services see:
-
- http://docs.ckan.org/en/ckan-2.7.0/maintaining/background-tasks.html#using-supervisor
-
- If you have CKAN version 2.6.x or less then you'll need to download
- `supervisor-ckan-worker.conf <https://raw.githubusercontent.com/ckan/ckan/master/ckan/config/supervisor-ckan-worker.conf>`_ and adjust the ``command`` to reference
- ckanext-rq.
+ ckan -c /etc/ckan/default/ckan.ini jobs worker


---------------
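The installation steps in the hunks above touch several config settings (``ckan.plugins``, ``ckanext.xloader.api_token`` and the jobs database URI). The following is a minimal sketch of a pre-flight check for them. It assumes the settings live in the ``[app:main]`` section of the CKAN ini file and that the jobs database key is ``ckanext.xloader.jobs_db.uri``, as in the config block this change removes from the README; ``check_xloader_config`` is a hypothetical helper, not part of CKAN or XLoader.

```python
import configparser


def check_xloader_config(ini_text):
    """Return a list of human-readable problems found in a CKAN ini file."""
    # Interpolation is disabled because CKAN ini files may contain
    # "%(here)s"-style values that configparser would otherwise try to expand.
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.read_string(ini_text)
    if not parser.has_section("app:main"):
        return ["no [app:main] section found"]
    main = parser["app:main"]

    problems = []
    plugins = main.get("ckan.plugins", "").split()
    for required in ("xloader", "datastore"):
        if required not in plugins:
            problems.append("'%s' is missing from ckan.plugins" % required)

    # Needed from CKAN 2.10 onwards, according to the install steps.
    if not main.get("ckanext.xloader.api_token", "").strip():
        problems.append("ckanext.xloader.api_token is not set")

    jobs_uri = main.get("ckanext.xloader.jobs_db.uri", "").strip()
    if not jobs_uri or jobs_uri.startswith("sqlite"):
        problems.append("jobs database is sqlite (fine for development only)")
    return problems
```

For a real file, the text could be read with ``open("/etc/ckan/default/ckan.ini").read()`` and passed in. An empty list means none of the checked settings looks wrong, not that the installation is complete.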
@@ -217,58 +179,7 @@ Config settings

Configuration:

- ::
-
- # The connection string for the jobs database used by XLoader. The
- # default of an sqlite file is fine for development. For production use a
- # Postgresql database.
- ckanext.xloader.jobs_db.uri = sqlite:////tmp/xloader_jobs.db
-
- # The formats that are accepted. If the value of the resource.format is
- # anything else then it won't be 'xloadered' to DataStore (and will therefore
- # only be available to users in the form of the original download/link).
- # Case insensitive.
- # (optional, defaults are listed in plugin.py - DEFAULT_FORMATS).
- ckanext.xloader.formats = csv application/csv xls application/vnd.ms-excel
-
- # The maximum size of files to load into DataStore. In bytes. Default is 1 GB.
- ckanext.xloader.max_content_length = 1000000000
-
- # To always use messytables to load data, instead of attempting a direct
- # PostgreSQL COPY, set this to True. This more closely matches the
- # DataPusher's behavior. It has the advantage that the column types
- # are guessed. However it is more error prone, far slower and you can't run
- # the CPU-intensive queue on a separate machine.
- ckanext.xloader.just_load_with_messytables = False
-
- # The maximum time for the loading of a resource before it is aborted.
- # Give an amount in seconds. Default is 60 minutes
- ckanext.xloader.job_timeout = 3600
-
- # Ignore the file hash when submitting to the DataStore, if set to True
- # resources are always submitted (if their format matches), if set to
- # False (default), resources are only submitted if their hash has changed.
- ckanext.xloader.ignore_hash = False
-
- # When loading a file that is bigger than `max_content_length`, xloader can
- # still try and load some of the file, which is useful to display a
- # preview. Set this option to the desired number of lines/rows that it
- # loads in this case.
- # If the file-type is supported (CSV, TSV) an excerpt with the number of
- # `max_excerpt_lines` lines will be submitted while the `max_content_length`
- # is not exceeded.
- # If set to 0 (default) files that exceed the `max_content_length` will
- # not be loaded into the datastore.
- ckanext.xloader.max_excerpt_lines = 100
-
- # Requests verifies SSL certificates for HTTPS requests. Setting verify to
- # False should only be enabled during local development or testing. Default
- # to True.
- ckanext.xloader.ssl_verify = True
-
- # Uses a specific API token for the xloader_submit action instead of the
- # apikey of the site_user
- ckanext.xloader.api_token = ckan-provided-api-token
+ See the extension's `config_declaration.yaml <ckanext/xloader/config_declaration.yaml>`_ file.


------------------------
@@ -280,7 +191,7 @@ in the directory up from your local ckan repo::

git clone https://github.com/ckan/ckanext-xloader.git
cd ckanext-xloader
- python setup.py develop
+ pip install -e .
pip install -r requirements.txt
pip install -r dev-requirements.txt

@@ -301,8 +212,8 @@ To upgrade from DataPusher to XLoader:
``ckan.plugins`` line replace ``datapusher`` with ``xloader``.

4. (Optional) If you wish, you can disable the direct loading and continue to
- just use messytables - for more about this see the docs on config option:
- ``ckanext.xloader.just_load_with_messytables``
+ just use tabulator - for more about this see the docs on config option:
+ ``ckanext.xloader.use_type_guessing``

5. Stop the datapusher worker::

@@ -322,35 +233,31 @@ command-line interface.

e.g. ::

- [2.9] ckan -c /etc/ckan/default/ckan.ini xloader submit <dataset-name>
- [pre-2.9] paster --plugin=ckanext-xloader xloader submit <dataset-name> -c /etc/ckan/default/ckan.ini
+ ckan -c /etc/ckan/default/ckan.ini xloader submit <dataset-name>

For debugging you can try xloading it synchronously (which does the load
directly, rather than asking the worker to do it) with the ``-s`` option::

- [2.9] ckan -c /etc/ckan/default/ckan.ini xloader submit <dataset-name> -s
- [pre-2.9] paster --plugin=ckanext-xloader xloader submit <dataset-name> -s -c /etc/ckan/default/ckan.ini
+ ckan -c /etc/ckan/default/ckan.ini xloader submit <dataset-name> -s

See the status of jobs::

- [2.9] ckan -c /etc/ckan/default/ckan.ini xloader status
- [pre-2.9] paster --plugin=ckanext-xloader xloader status -c /etc/ckan/default/development.ini
+ ckan -c /etc/ckan/default/ckan.ini xloader status

Submit all datasets' resources to the DataStore::

- [2.9] ckan -c /etc/ckan/default/ckan.ini xloader submit all
- [pre-2.9] paster --plugin=ckanext-xloader xloader submit all -c /etc/ckan/default/ckan.ini
+ ckan -c /etc/ckan/default/ckan.ini xloader submit all

Re-submit all the resources already in the DataStore (Ignores any resources
that have not been stored in DataStore e.g. because they are not tabular)::

- [2.9] ckan -c /etc/ckan/default/ckan.ini xloader submit all-existing
- [pre-2.9] paster --plugin=ckanext-xloader xloader submit all-existing -c /etc/ckan/default/ckan.ini
+ ckan -c /etc/ckan/default/ckan.ini xloader submit all-existing
+

**Full list of XLoader CLI commands**::

- [2.9] ckan -c /etc/ckan/default/ckan.ini xloader --help
- [pre-2.9] paster --plugin=ckanext-xloader xloader --help
+ ckan -c /etc/ckan/default/ckan.ini xloader --help
+

Jobs and workers
----------------
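With the ``paster`` variants gone, every ``xloader`` command in the hunk above has the same shape, which makes it easy to script. Below is a minimal sketch that only builds the argument list. ``xloader_command`` is a hypothetical helper, and the default config path is the one used in the README examples.

```python
import shlex


def xloader_command(action, *args, config="/etc/ckan/default/ckan.ini", sync=False):
    """Build the argv for `ckan -c <config> xloader <action> [args] [-s]`."""
    argv = ["ckan", "-c", config, "xloader", action, *args]
    if sync:
        # -s asks xloader to do the load directly instead of queueing it.
        argv.append("-s")
    return argv


if __name__ == "__main__":
    # Print the commands rather than running them; pass the list to
    # subprocess.run(argv, check=True) to execute on a CKAN host.
    for argv in (
        xloader_command("submit", "my-dataset", sync=True),
        xloader_command("status"),
        xloader_command("submit", "all-existing"),
    ):
        print(shlex.join(argv))
```

Keeping the command as a list avoids shell quoting problems if a dataset name contains unusual characters.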
@@ -363,8 +270,7 @@ Useful commands:

Clear (delete) all outstanding jobs::

- CKAN 2.9, Python 3 ckan -c /etc/ckan/default/ckan.ini jobs clear [QUEUES]
- CKAN <2.9, Python 2 paster --plugin=ckanext-xloader xloader jobs clear [QUEUES] -c /etc/ckan/default/development.ini
+ ckan -c /etc/ckan/default/ckan.ini jobs clear [QUEUES]

If having trouble with the worker process, restarting it can help::

@@ -385,13 +291,6 @@ exist**
Your DataStore permissions have not been set-up - see:
<https://docs.ckan.org/en/latest/maintaining/datastore.html#set-permissions>

- **When editing a package, all its existing resources get re-loaded by xloader**
-
- This behavior was documented in
- `Issue 75 <https://github.com/ckan/ckanext-xloader/issues/75>`_ and is related
- to a bug in CKAN that is fixed in versions 2.6.9, 2.7.7, 2.8.4
- and 2.9.0+.
-
-----------------
Running the Tests
-----------------
@@ -402,12 +301,8 @@ The first time, your test datastore database needs the trigger applied::

To run the tests, do::

- nosetests --nologcapture --with-pylons=test.ini
-
- To run the tests and produce a coverage report, first make sure you have
- coverage installed in your virtualenv (``pip install coverage``) then run::
+ pytest --ckan-ini=test.ini ckanext/xloader/tests

- nosetests --nologcapture --with-pylons=test.ini --with-coverage --cover-package=ckanext.xloader --cover-inclusive --cover-erase --cover-tests

----------------------------------
Releasing a New Version of XLoader