A Python library for storing Data Packages in different storages.
See the sections below for how to obtain a tabular storage object.
The high-level API is easy to use.
With a Data Package in the current directory, we can import it into a storage:
import datapackage_storage

# <storage> is a Tabular Storage instance (see the storage sections below)
datapackage_storage.import_package(<storage>, 'descriptor.json')
We can also export it from a storage:
import datapackage_storage

# Export the stored Data Package back to a descriptor and data files
datapackage_storage.export_package(<storage>, 'descriptor.json', 'datapackage_name')
Between the high-level interface and the low-level drivers, the package uses the Tabular Storage concept:
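The sketch below illustrates the kind of interface a Tabular Storage object is expected to expose; the method names and signatures here are an assumption for illustration, not the definitive spec.

# Illustrative sketch of a Tabular Storage driver (method names are assumed)
class ExampleStorage(object):

    def __init__(self, **options):
        self.__schemas = {}  # table name -> JSON Table Schema
        self.__data = {}     # table name -> list of rows

    @property
    def tables(self):
        # Names of existing tables
        return list(self.__schemas)

    def check(self, table):
        # Return True if the table exists
        return table in self.__schemas

    def create(self, table, schema):
        # Create a table from a JSON Table Schema descriptor
        self.__schemas[table] = schema
        self.__data[table] = []

    def delete(self, table):
        # Remove the table and its data
        del self.__schemas[table]
        del self.__data[table]

    def describe(self, table):
        # Return the table's JSON Table Schema
        return self.__schemas[table]

    def read(self, table):
        # Return the table's rows
        return self.__data[table]

    def write(self, table, data):
        # Append rows to the table
        self.__data[table].extend(data)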
Install the jsontableschema-bigquery-py package.
To start using the Google BigQuery service:
- Create a new project - link
- Create a service key - link
- Download the JSON credentials and set the GOOGLE_APPLICATION_CREDENTIALS environment variable
We can then get a storage this way:
import io
import os
import json
import jtsbq
from apiclient.discovery import build
from oauth2client.client import GoogleCredentials

# Authenticate using the downloaded service key
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '.credentials.json'
credentials = GoogleCredentials.get_application_default()
service = build('bigquery', 'v2', credentials=credentials)

# Read the project id from the same credentials file
project = json.load(io.open('.credentials.json', encoding='utf-8'))['project_id']

storage = jtsbq.Storage(service, project, 'dataset', prefix='prefix')
Install the jsontableschema-sql-py package.
SQLAlchemy is used as the SQL wrapper. We can get a storage this way:
import jtssql
from sqlalchemy import create_engine

# The table name prefix belongs to the storage, not to the engine
engine = create_engine('sqlite:///:memory:')
storage = jtssql.Storage(engine, prefix='prefix')
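Because the storage just wraps a SQLAlchemy engine, other backends can be used by changing the connection URL; the DSN below is only a placeholder, not a real configuration:

from sqlalchemy import create_engine
import jtssql

# Placeholder PostgreSQL DSN - replace user, password, host and database
engine = create_engine('postgresql://user:password@localhost:5432/database')
storage = jtssql.Storage(engine, prefix='prefix')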
A Data Package maps onto a storage as follows:

- datapackage.json -> *not stored*
- datapackage.json resources -> storage tables
- data/data.csv schema -> storage table schema
- data/data.csv data -> storage table data
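To make the mapping concrete, here is a sketch of a round trip through the SQL storage; the descriptor path, export path, and dataset name are illustrative assumptions, not fixed values:

import jtssql
import datapackage_storage
from sqlalchemy import create_engine

# In-memory SQLite storage
engine = create_engine('sqlite:///:memory:')
storage = jtssql.Storage(engine, prefix='prefix')

# Import the local Data Package: each resource becomes a storage table
datapackage_storage.import_package(storage, 'descriptor.json')

# Export it back from the storage into a new Data Package
datapackage_storage.export_package(storage, 'export/descriptor.json', 'datapackage_name')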
API documentation is provided as docstrings: