Issue 627 parquet output #683
base: master
petl/test/io/test_parquet.py (outdated)

    def test_fromparquet(tmp_path):
        path = make_sample(tmp_path)
        tbl = etl.io.fromparquet(str(path))
        assert tbl.header() == ('x',)

Check warning: Code scanning / Bandit (reported by Codacy)
Use of assert detected. The enclosed code will be removed when compiling to optimised byte code.

petl/test/io/test_parquet.py (outdated)

        path = make_sample(tmp_path)
        tbl = etl.io.fromparquet(str(path))
        assert tbl.header() == ('x',)
        assert list(tbl.values()) == [(1,), (2,), (3,)]

Check warning: Code scanning / Bandit (reported by Codacy)
Use of assert detected. The enclosed code will be removed when compiling to optimised byte code.

petl/test/io/test_parquet.py (outdated)

        out = tmp_path / 'out.parquet'
        tbl.toparquet(str(out))
        df2 = pd.read_parquet(out)
        assert list(df2['y']) == [10,20]

Check warning: Code scanning / Bandit (reported by Codacy)
Use of assert detected. The enclosed code will be removed when compiling to optimised byte code.

        if not indices and field == () and len(hdr) == 1:
            indices = [0]

        assert indices, 'no field selected'

Check warning: Code scanning / Bandit (reported by Codacy)
Use of assert detected. The enclosed code will be removed when compiling to optimised byte code.

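The hunk above adds a fallback so that asking for values with no field argument selects the only column of a single-column table. Below is a minimal, self-contained sketch of that selection logic; `select_indices` and `NoFieldSelected` are hypothetical names, not petl's API. It raises an exception instead of using `assert`, which is one way to address the Bandit warning, because assertions are stripped when Python runs with `-O`.

```python
class NoFieldSelected(Exception):
    """Hypothetical error type; petl itself has its own FieldSelectionError."""


def select_indices(hdr, field):
    """Map a field spec (name, int index, or tuple of these) to column indices.

    Hypothetical sketch: an empty tuple spec on a single-column header
    falls back to column 0, mirroring the hunk under review.
    """
    names = [str(h) for h in hdr]
    spec = field if isinstance(field, (list, tuple)) else (field,)
    indices = []
    for s in spec:
        if isinstance(s, int) and 0 <= s < len(names):
            indices.append(s)
        elif s in names:
            indices.append(names.index(s))
        else:
            raise NoFieldSelected('unknown field: %r' % (s,))
    # Fallback from the PR: no field requested and exactly one column.
    if not indices and field == () and len(names) == 1:
        indices = [0]
    # An explicit raise, unlike `assert`, is not stripped by `python -O`.
    if not indices:
        raise NoFieldSelected('no field selected')
    return indices
```

With this sketch, `select_indices(('x',), ())` returns `[0]`, while a two-column header with no field still raises `NoFieldSelected`.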
petl/io/__init__.py (outdated)

    from petl.io.gsheet import fromgsheet, togsheet, appendgsheet

    from petl.io.parquet import fromparquet, toparquet

Check warning: Code scanning / Ruff (reported by Codacy)
`petl.io.parquet.fromparquet` imported but unused; consider removing, adding to `__all__`, or using a redundant alias (F401)

petl/io/__init__.py (outdated)

    from petl.io.gsheet import fromgsheet, togsheet, appendgsheet

    from petl.io.parquet import fromparquet, toparquet

Check warning: Code scanning / Ruff (reported by Codacy)
`petl.io.parquet.toparquet` imported but unused; consider removing, adding to `__all__`, or using a redundant alias (F401)

petl/io/parquet.py (outdated)

    from __future__ import absolute_import, print_function, division

    # standard library dependencies
    from petl.compat import PY2

Check warning: Code scanning / Ruff (reported by Codacy)
`petl.compat.PY2` imported but unused (F401)

    import operator

Check warning: Code scanning / Ruff (reported by Codacy)
Module level import not at top of file (E402)

    import operator

Check warning: Code scanning / Ruff (reported by Codacy)
Redefinition of unused `operator` from line 8 (F811)

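Most of the `operator` warnings on this PR share one root cause: a second `import operator` added partway down a module that already imports it at line 8. A sketch of the intended shape follows, with a single standard-library import at the top and a values iterator built on `operator.itemgetter`; `iter_values` is a hypothetical name, and its single-column fallback is a simplified version of the field-selection hunk reviewed earlier.

```python
# Standard-library imports go once, at the top of the module; this single
# placement avoids the E402, F811 and "reimported" warnings together.
import operator


def iter_values(rows, *field):
    """Yield the values of ``field`` from ``rows`` (header row first).

    Hypothetical sketch: with no field and a single-column header,
    the only column is used.
    """
    it = iter(rows)
    hdr = next(it, ())
    names = list(hdr)
    if not field and len(names) == 1:
        indices = [0]
    else:
        indices = [names.index(name) for name in field]
    if not indices:
        raise ValueError('no field selected')
    # itemgetter with one index returns a bare value; with several, a tuple.
    getvalue = operator.itemgetter(*indices)
    for row in it:
        yield getvalue(row)
```

Because `operator.itemgetter` with a single index returns bare values, a one-column table yields `[1, 2, 3]` under this sketch rather than 1-tuples.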
Pylint (reported by Codacy) found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

petl/io/__init__.py (outdated)

    from petl.io.gsheet import fromgsheet, togsheet, appendgsheet

    from petl.io.parquet import fromparquet, toparquet

Check warning: Code scanning / Prospector (reported by Codacy)
'petl.io.parquet.fromparquet' imported but unused (F401)

petl/io/parquet.py (outdated)

    from __future__ import absolute_import, print_function, division

    # standard library dependencies
    from petl.compat import PY2

Check warning: Code scanning / Prospector (reported by Codacy)
Unused PY2 imported from petl.compat (unused-import)

    import operator

Check warning: Code scanning / Prospector (reported by Codacy)
Reimport 'operator' (imported line 8) (reimported)

    import operator

Check warning: Code scanning / Prospector (reported by Codacy)
redefinition of unused 'operator' from line 8 (F811)

    import operator

Check warning: Code scanning / Prospector (reported by Codacy)
Import "import operator" should be placed at the top of the module (wrong-import-position)

petl/io/parquet.py (outdated)

    @@ -0,0 +1,64 @@
    # -*- coding: utf-8 -*-

Check warning: Code scanning / Pylintpython3 (reported by Codacy)
Missing module docstring

petl/io/parquet.py (outdated)

    from __future__ import absolute_import, print_function, division

    # standard library dependencies
    from petl.compat import PY2

Check notice: Code scanning / Pylintpython3 (reported by Codacy)
Unused PY2 imported from petl.compat

petl/io/parquet.py (outdated)

    # third-party dependencies
    import pandas as pd

Check warning: Code scanning / Pylintpython3 (reported by Codacy)
third party import "pandas" should be placed before first party imports "petl.compat.PY2", "petl.io.pandas.fromdataframe", "petl.util.base.Table", "petl.io.sources.read_source_from_arg"

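Besides its position, the module-level `import pandas as pd` in `petl/io/parquet.py` matters because `petl/io/__init__.py` imports `fromparquet` and `toparquet` unconditionally, as the snippets on this PR show; with both in place, importing `petl.io` fails wherever pandas is not installed. One option, sketched below, is to defer the import into the functions that need it and point users at the `parquet` extra added in `setup.py`. `import_optional` and `read_parquet_frame` are hypothetical helpers, not part of petl.

```python
import importlib


def import_optional(name, extra):
    """Import an optional dependency or raise an ImportError naming the extra.

    Hypothetical helper; ``extra`` is the setup.py extra that provides it.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        raise ImportError(
            '%s is required here; install it with "pip install petl[%s]"'
            % (name, extra)
        )


def read_parquet_frame(path, **kwargs):
    """Hypothetical reader: pandas is imported only when this is called."""
    pd = import_optional('pandas', 'parquet')
    return pd.read_parquet(path, **kwargs)
```

The trade-off is that the missing dependency is reported when a Parquet function is first called rather than at import time, which keeps the rest of `petl.io` usable without pandas.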
petl/io/parquet.py (outdated)

        src = read_source_from_arg(source)
        with src.open('rb') as f:
            df = pd.read_parquet(f, **kwargs)

Check warning: Code scanning / Pylintpython3 (reported by Codacy)
Module 'pandas' has no 'read_parquet' member

    import operator

Check warning: Code scanning / Pylintpython3 (reported by Codacy)
Imports from package operator are not grouped

    import operator

Check notice: Code scanning / Pylintpython3 (reported by Codacy)
Reimport 'operator' (imported line 8)

    import operator

Check warning: Code scanning / Pylintpython3 (reported by Codacy)
Import "import operator" should be placed at the top of the module

    import operator

Check warning: Code scanning / Pylintpython3 (reported by Codacy)
standard import "operator" should be placed before first party imports "petl.compat.imap", "petl.errors.FieldSelectionError", "petl.comparison.comparable_itemgetter"

Branch updated from 7e81cf6 to 8b302aa.

petl/test/io/test_parquet.py (outdated)

    @@ -0,0 +1,23 @@
    import pandas as pd

Check warning: Code scanning / Pylintpython3 (reported by Codacy)
Django was not configured. For more information run pylint --load-plugins=pylint_django --help-msg=django-not-configured

petl/test/io/test_parquet.py (outdated)

    @@ -0,0 +1,23 @@
    import pandas as pd

Check warning: Code scanning / Pylintpython3 (reported by Codacy)
Missing module docstring

petl/test/io/test_parquet.py (outdated)

    def make_sample(tmp_path):
        df = pd.DataFrame([{'x': 1}, {'x': 2}, {'x': 3}])

Check warning: Code scanning / Pylintpython3 (reported by Codacy)
Module 'pandas' has no 'DataFrame' member

petl/io/parquet.py (outdated)

    @@ -0,0 +1,64 @@
    # -*- coding: utf-8 -*-

Check warning: Code scanning / Pylint (reported by Codacy)
Missing module docstring

petl/io/parquet.py (outdated)

    from __future__ import absolute_import, print_function, division

    # standard library dependencies
    from petl.compat import PY2

Check notice: Code scanning / Pylint (reported by Codacy)
Unused PY2 imported from petl.compat

petl/io/parquet.py (outdated)

        """

        src = read_source_from_arg(source)
        with src.open('rb') as f:

Check warning: Code scanning / Pylint (reported by Codacy)
Variable name "f" doesn't conform to snake_case naming style

petl/io/parquet.py (outdated)

        src = read_source_from_arg(source)
        with src.open('rb') as f:
            df = pd.read_parquet(f, **kwargs)

Check warning: Code scanning / Pylint (reported by Codacy)
Variable name "df" doesn't conform to snake_case naming style

petl/io/parquet.py (outdated)

        src = read_source_from_arg(source)
        with src.open('rb') as f:
            df = pd.read_parquet(f, **kwargs)

Check warning: Code scanning / Pylint (reported by Codacy)
Module 'pandas' has no 'read_parquet' member

    import operator

Check warning: Code scanning / Pylint (reported by Codacy)
standard import "import operator" should be placed before "from petl.compat import imap, izip, izip_longest, ifilter, ifilterfalse, reduce, next, string_types, text_type"

    import operator

Check warning: Code scanning / Pylint (reported by Codacy)
Import "import operator" should be placed at the top of the module

    import operator

Check notice: Code scanning / Pylint (reported by Codacy)
Reimport 'operator' (imported line 8)

    import operator

Check warning: Code scanning / Pylint (reported by Codacy)
Imports from package operator are not grouped

setup.py (outdated)

        'xlsx': ['openpyxl>=2.6.2'],
        'xpath': ['lxml>=4.4.0'],
        'whoosh': ['whoosh'],
        "parquet": ["pandas>=1.3.0","pyarrow>=4.0.0"]

Check warning: Code scanning / Pylint (reported by Codacy)
Exactly one space required after comma

Pull Request Test Coverage Report for Build 16583777847
💛 - Coveralls

petl/io/parquet.py (outdated)

        """
        src = write_source_from_arg(source)
        with src.open('wb') as f:
            df = todataframe(table)

Check warning: Code scanning / Pylint (reported by Codacy)
Variable name "df" doesn't conform to snake_case naming style

Nice addition! Maybe you would consider:

This pull request sets up GitHub code scanning for this repository. Once the scans have completed and the checks have passed, the analysis results for this pull request branch will appear on this overview. Once you merge this pull request, the 'Security' tab will show more code scanning analysis results (for example, for the default branch). Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results. For more information about GitHub code scanning, check out the documentation.

Branch updated from f2d0d2c to 56071d9.

@juarezr Thank you for the feedback! I also noticed that the new GitHub workflows are rejecting multiple SARIF runs ("Error: The CodeQL Action does not support uploading multiple SARIF runs with the same category."). I'm not yet familiar enough with the Actions YAML to fix it right away. Would you like me to open a follow-up issue and investigate it separately?
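
One possible workaround for the CodeQL upload error, assuming the Codacy step writes a single SARIF file that contains one run per analysis tool: split that file into one file per run, then upload each file separately with a distinct `category` input on the `upload-sarif` step. A Python sketch of the split step follows; `split_sarif_runs` and the `run-N.sarif` file names are hypothetical, and the workflow YAML that uploads the results is not shown.

```python
import json
import os


def split_sarif_runs(src_path, out_dir):
    """Write each run of a SARIF log to its own file and return the paths.

    Sketch only: top-level keys such as "version" and "$schema" are copied
    unchanged, and each output file has exactly one entry in "runs".
    """
    with open(src_path, encoding='utf-8') as fh:
        log = json.load(fh)
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i, run in enumerate(log.get('runs', [])):
        single = dict(log, runs=[run])  # copy the log, keep only this run
        path = os.path.join(out_dir, 'run-%d.sarif' % i)
        with open(path, 'w', encoding='utf-8') as fh:
            json.dump(single, fh)
        paths.append(path)
    return paths
```

Each run's `tool.driver.name` would be a natural value to use as its category, so that repeated uploads for the same tool replace each other instead of colliding.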

This PR adds Parquet file handling:

- Adds `fromparquet` and `toparquet` for reading and writing Parquet tables.
- Hooks these routines onto the core Table API.
- Adds tests.
- Updates the I/O documentation.
- Includes pandas and pyarrow in the test requirements.

Closes issue #627.
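
"Hooks these routines onto the core Table API" refers to the pattern petl uses for its writers: the module-level function is also assigned as a `Table` method, which is why the test above can call `tbl.toparquet(str(out))`. The sketch below shows that pattern with a minimal stand-in `Table` class and a hypothetical `todicts` writer, chosen so that it runs without petl or pandas.

```python
class Table(object):
    """Minimal stand-in for petl's Table: iteration yields the header, then rows."""

    def __init__(self, rows):
        self._rows = [tuple(r) for r in rows]

    def __iter__(self):
        return iter(self._rows)


def todicts(table):
    """Hypothetical writer: convert a header-first table to a list of dicts."""
    it = iter(table)
    hdr = next(it)
    return [dict(zip(hdr, row)) for row in it]


# Attach the module-level function as a method so both call styles work:
# todicts(tbl) and tbl.todicts().
Table.todicts = todicts
```

Because a plain function assigned to a class becomes a method, `tbl.todicts()` receives `tbl` as its `table` argument, and no wrapper code is needed.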