odata2avro is a Python command-line tool that automatically
converts OData datasets to Avro. Combined with standard
Hadoop tooling, it makes it simple to ingest OData data from
Microsoft Azure DataMarket into Hadoop.
$ odata2avro ODATA_XML AVRO_SCHEMA AVRO_FILE
This command reads data from ODATA_XML and creates two files:
AVRO_SCHEMA, the Avro schema in JSON format, and AVRO_FILE,
the Avro-encoded data.
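To give an idea of what the conversion involves: an OData Atom feed stores each record as typed `<d:...>` properties, which map naturally onto an Avro record schema. The sketch below derives a schema from a single entry using only the standard library. It is illustrative only — the EDM-to-Avro type mapping, the sample field names, and the `schema_from_entry` helper are assumptions, not the tool's actual implementation.

```python
import json
import xml.etree.ElementTree as ET

# Standard namespace URIs used by OData (v2/v3) Atom feeds.
NS = {
    "m": "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata",
    "d": "http://schemas.microsoft.com/ado/2007/08/dataservices",
}

# Illustrative EDM -> Avro primitive type mapping; odata2avro's actual
# mapping may differ.
EDM_TO_AVRO = {"Edm.String": "string", "Edm.Int32": "int", "Edm.Double": "double"}

def schema_from_entry(xml_text, record_name):
    """Derive an Avro record schema from the first <m:properties> element."""
    root = ET.fromstring(xml_text)
    props = root.find(".//m:properties", NS)
    fields = []
    for prop in props:
        # Strip the namespace prefix from the tag to get the property name.
        name = prop.tag.split("}", 1)[1]
        edm_type = prop.get("{%s}type" % NS["m"], "Edm.String")
        avro_type = EDM_TO_AVRO.get(edm_type, "string")
        # Union with "null" so missing values stay representable.
        fields.append({"name": name, "type": ["null", avro_type]})
    return {"type": "record", "name": record_name, "fields": fields}

# A tiny hand-written entry standing in for real DataMarket output.
sample = """<entry xmlns="http://www.w3.org/2005/Atom"
  xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"
  xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices">
  <content type="application/xml">
    <m:properties>
      <d:Kenteken m:type="Edm.String">ABC123</d:Kenteken>
      <d:Cilinderinhoud m:type="Edm.Int32">1600</d:Cilinderinhoud>
    </m:properties>
  </content>
</entry>"""

print(json.dumps(schema_from_entry(sample, "cars"), indent=2))
```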
# Download OData data in XML format
$ curl 'https://api.datamarket.azure.com/opendata.rdw/VRTG.Open.Data/v1/KENT_VRTG_O_DAT?$top=100' > cars.xml
# Convert data to Avro
$ odata2avro cars.xml cars.avsc cars.avro
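Before pushing the output to HDFS, you can sanity-check that the conversion produced a valid Avro container: per the Avro specification, object container files begin with the 4-byte magic `b"Obj\x01"`. A minimal stdlib sketch (the `looks_like_avro` helper and the probe file are illustrative, not part of odata2avro):

```python
import os
import tempfile

def looks_like_avro(path):
    """Check the 4-byte magic that opens every Avro object container file."""
    with open(path, "rb") as f:
        return f.read(4) == b"Obj\x01"

# Demonstrate on a throwaway file carrying just the magic bytes.
tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".avro")
tmp.write(b"Obj\x01" + b"\x00" * 16)
tmp.close()
print(looks_like_avro(tmp.name))  # True
os.unlink(tmp.name)
```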
# Upload to HDFS
$ hdfs dfs -put cars.avro cars.avsc /tmp
# Create Avro-backed Hive table using Avro schema stored in /tmp/cars.avsc
$ hive -e "
CREATE TABLE cars
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.url'='hdfs:///tmp/cars.avsc');"
# Load data from /tmp/cars.avro to the cars table
$ hive -e "LOAD DATA INPATH '/tmp/cars.avro' INTO TABLE cars"
# Query with Impala
$ impala-shell -i <impala-daemon-ip> -q "REFRESH cars; select count(*) from cars"
+----------+
| count(*) |
+----------+
| 100      |
+----------+
$ pip install odata2avro
Please create an issue if you spot a problem or bug; we'll try to get back to you as soon as possible.