A Python datalake client.
pip install pyarrow ness
import ness
dl = ness.dl(bucket="mybucket", key="mydatalake")
df = dl.read("mytable")
# Sync all tables
dl.sync()
# Sync a single table
dl.sync("mytable")
# Sync and read a single table
df = dl.read("mytable", sync=True)
Specify the input data source format (the default is parquet):
import ness
dl = ness.dl(bucket="mybucket", key="mydatalake", format="csv")
Files are synced using the default AWS profile; you can configure another one:
import ness
dl = ness.dl(bucket="mybucket", key="mydatalake", profile="myprofile")
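To see which profile names are available to pass as `profile`, the following is a minimal sketch that lists the sections of an AWS shared credentials file. It assumes ness resolves profiles through the standard AWS configuration, which the text above does not confirm, and `list_aws_profiles` is a hypothetical helper, not part of ness.

```python
import configparser
from pathlib import Path


def list_aws_profiles(credentials_path=None):
    """Return the profile names defined in an AWS shared credentials file.

    Hypothetical helper, not part of ness. Assumes the standard
    ~/.aws/credentials location unless a path is given; overrides such as
    AWS_SHARED_CREDENTIALS_FILE are not handled here.
    """
    if credentials_path is None:
        path = Path.home() / ".aws" / "credentials"
    else:
        path = Path(credentials_path)

    parser = configparser.ConfigParser()
    # A missing file is silently skipped, which yields an empty list.
    parser.read(path)
    # Each section header such as [default] or [myprofile] is one profile.
    return parser.sections()


if __name__ == "__main__":
    print(list_aws_profiles())
```

Any name this returns could then be tried as the `profile` argument shown above.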
Usage: ness sync [OPTIONS] S3_URI

Options:
  --format TEXT   Data lake source format.
  --profile TEXT  AWS profile.
  --table TEXT    Table name to sync.
  --help          Show this message and exit.
ness sync bucket/key --table mytable
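The command line can also be driven from a script, for example to sync several tables in turn. The sketch below only uses the options listed in the usage text above; `build_sync_command` and `sync_tables` are hypothetical helpers, not part of ness, and it assumes the `ness` executable is on the PATH.

```python
import subprocess


def build_sync_command(s3_uri, table=None, fmt=None, profile=None):
    """Build the argument list for `ness sync` from the documented options.

    Hypothetical helper, not part of ness.
    """
    cmd = ["ness", "sync", s3_uri]
    if fmt is not None:
        cmd += ["--format", fmt]
    if profile is not None:
        cmd += ["--profile", profile]
    if table is not None:
        cmd += ["--table", table]
    return cmd


def sync_tables(s3_uri, tables, **options):
    """Run `ness sync` once per table.

    Raises subprocess.CalledProcessError if any run exits with an error.
    """
    for table in tables:
        subprocess.run(build_sync_command(s3_uri, table=table, **options), check=True)
```

For instance, `sync_tables("bucket/key", ["mytable", "othertable"], profile="myprofile")` would run the command once for each of the two tables, where `othertable` is a placeholder name.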