
openknowledge-archive/datapackage-bigquery-py


datapackage-bigquery-py


Generate and load BigQuery tables based on a Data Package.

Usage

This section is intended to be used by end-users of the library.

Import/Export

See the section below on how to get a tabular storage object.

The high-level API is easy to use.

With a Data Package in the current directory, we can import it into BigQuery:

import dpbq

# storage is a tabular storage object (see Tabular Storage below)
dpbq.import_package(storage, 'descriptor.json')
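The import call needs a descriptor.json in the current directory. As an illustrative sketch only, a minimal Data Package with one tabular resource at data/data.csv (matching the Mappings section below) can be created with the standard library. The package name, resource name, field names and rows here are hypothetical; only the descriptor layout (resources with a path and a schema of fields) follows the Data Package and JSON Table Schema conventions.

```python
import json
import os

# Hypothetical minimal Data Package: one tabular resource backed by data/data.csv
descriptor = {
    "name": "example-package",
    "resources": [
        {
            "name": "data",
            "path": "data/data.csv",
            "schema": {
                "fields": [
                    {"name": "id", "type": "integer"},
                    {"name": "name", "type": "string"},
                ]
            },
        }
    ],
}

# Write the CSV file that the resource's path points to
os.makedirs("data", exist_ok=True)
with open("data/data.csv", "w", encoding="utf-8", newline="") as f:
    f.write("id,name\n1,english\n2,chinese\n")

# Write the descriptor itself into the current directory
with open("descriptor.json", "w", encoding="utf-8") as f:
    json.dump(descriptor, f, indent=2)
```

With both files in place, the import call above can be pointed at 'descriptor.json'.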

We can also export it from BigQuery:

import dpbq

# storage is a tabular storage object (see Tabular Storage below)
dpbq.export_package(storage, 'descriptor.json')

Tabular Storage

To start using the Google BigQuery service:

  • Create a new project
  • Create a service key
  • Download the JSON credentials and set the GOOGLE_APPLICATION_CREDENTIALS environment variable
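Before building the BigQuery client, it can help to fail early when the credentials are not set up. The following is a minimal sketch using only the standard library; load_project_id is a hypothetical helper, not part of dpbq or jtsbq, and it assumes the downloaded service key JSON contains a project_id field (the storage snippet below makes the same assumption).

```python
import json
import os


def load_project_id(env_var="GOOGLE_APPLICATION_CREDENTIALS"):
    """Return project_id from the service key file named by env_var.

    Hypothetical helper: raises a clear error if the variable is unset or
    the file is missing, instead of failing later inside the client library.
    """
    path = os.environ.get(env_var)
    if not path:
        raise RuntimeError(env_var + " is not set")
    if not os.path.isfile(path):
        raise RuntimeError(env_var + " points to a missing file: " + path)
    with open(path, encoding="utf-8") as f:
        return json.load(f)["project_id"]
```

The returned value can then be passed as the project argument when creating the storage object.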

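We can then get a storage object this way:

import io
import os
import json
import jtsbq
from apiclient.discovery import build
from oauth2client.client import GoogleCredentials

# Point the Google client libraries at the downloaded service key
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '.credentials.json'
credentials = GoogleCredentials.get_application_default()
# Build a BigQuery API v2 client
service = build('bigquery', 'v2', credentials=credentials)
# Read the project id from the same credentials file (closing it afterwards)
with io.open('.credentials.json', encoding='utf-8') as file:
    project = json.load(file)['project_id']
# Tables will be stored in the 'dataset' dataset of this project
storage = jtsbq.Storage(service, project, 'dataset')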

Design Overview

Storage & Drivers

See the jsontableschema layer README.

Mappings

datapackage.json -> *not stored*
datapackage.json resources -> BigQuery tables
data/data.csv schema -> BigQuery table schema
data/data.csv data -> BigQuery table data
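To illustrate the "schema -> BigQuery table schema" row, here is a hedged sketch of converting a JSON Table Schema into the field list format used by the BigQuery v2 API (name, type and mode per field). The type table and the to_bigquery_schema function are assumptions for illustration; the exact mapping used by jsontableschema-bigquery-py may differ.

```python
# Illustrative mapping only: JSON Table Schema type -> BigQuery column type
TYPE_MAP = {
    "string": "STRING",
    "integer": "INTEGER",
    "number": "FLOAT",
    "boolean": "BOOLEAN",
    "date": "DATE",
    "datetime": "TIMESTAMP",
}


def to_bigquery_schema(table_schema):
    """Convert a JSON Table Schema dict into a BigQuery-style schema dict."""
    fields = []
    for field in table_schema["fields"]:
        # JSON Table Schema treats a missing type as "string";
        # unknown types also fall back to STRING in this sketch
        bq_type = TYPE_MAP.get(field.get("type", "string"), "STRING")
        required = field.get("constraints", {}).get("required", False)
        fields.append({
            "name": field["name"],
            "type": bq_type,
            "mode": "REQUIRED" if required else "NULLABLE",
        })
    return {"fields": fields}
```

Mapping required fields to REQUIRED and everything else to NULLABLE keeps CSV rows with empty cells loadable.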

Drivers

The default Google BigQuery client is used as part of the jsontableschema-bigquery-py package (see its docs).

Development

This section is intended to be used by tech users collaborating on this project.

Getting Started

To activate the virtual environment, install dependencies, add a pre-commit hook that reviews and tests the code, and get the run command as a unified developer interface:

$ source activate.sh

Reviewing

The project follows a set of style guides.

To check the project against the Python style guide:

$ run review

Testing

To run tests with coverage check:

$ run test

Coverage data will be in the .coverage file.