Add parquet input support (fixes #40) #41

pbourke · 2025-12-17T22:08:51Z

This PR adds support for loading documents from Parquet files, refactors document loading code for better reusability, and improves the testing infrastructure.

Introduced InputDataType.PARQUET enum value and implemented load_parquet_doc() and load_parquet_dir() functions
Extracted common DataFrame processing logic into _load_docs_from_dataframe() helper function to support both CSV and Parquet formats
Added comprehensive tests in tests/autod/io/document_test.py
Added test step to GitHub Actions workflow

Copilot

Pull request overview

This PR adds support for loading documents from Parquet files, addressing issue #40. The implementation refactors the document loading code by extracting common DataFrame processing logic into a reusable helper function, which improves code maintainability and reduces duplication between CSV and Parquet format handling.

Introduced InputDataType.PARQUET enum value and implemented load_parquet_doc() and load_parquet_dir() functions for Parquet file support
Refactored common DataFrame processing logic into _load_docs_from_dataframe() helper function to support both CSV and Parquet formats
Added comprehensive parametrized tests covering both CSV and Parquet formats with file and directory scenarios
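Directory loading (`load_parquet_dir()` and the directory test scenarios) plausibly reduces to applying a per-file loader to every matching file. The sketch below is an illustration that uses only the standard library. `load_dir` and its parameters are hypothetical and are not part of benchmark_qed.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def load_dir(
    directory: str | Path,
    suffix: str,
    load_file: Callable[[Path], list[T]],
) -> list[T]:
    """Hypothetical helper: apply a per-file loader to each matching file.

    Files are visited in sorted order so the combined result is deterministic.
    """
    root = Path(directory)
    if not root.is_dir():
        raise NotADirectoryError(str(root))
    results: list[T] = []
    for path in sorted(root.glob(f"*{suffix}")):
        # Only files ending in `suffix` are loaded; everything else is skipped.
        results.extend(load_file(path))
    return results
```

For example, `load_dir("data/", ".parquet", load_parquet_doc)` would collect documents from every Parquet file in `data/`. Sorting the paths keeps document order stable across filesystems, which keeps directory-level tests deterministic.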

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 3 comments.


File	Description
tests/autod/io/document_test.py	New comprehensive test suite for document loading functionality, with parametrized tests covering CSV, Parquet, JSON, and text formats
tests/autod/io/__init__.py	Added package initialization file with copyright header
tests/autod/__init__.py	Added package initialization file with copyright header
benchmark_qed/autod/io/document.py	Refactored CSV loading to use new `_load_docs_from_dataframe()` helper, implemented `load_parquet_doc()` and `load_parquet_dir()`, updated `create_documents()` to support Parquet format, and improved `load_documents()` attribute handling
benchmark_qed/autod/io/enums.py	Added `PARQUET = "parquet"` enum value to `InputDataType`
ruff.toml	Added flake8-copyright configuration with regex pattern for Microsoft copyright headers
pyproject.toml	Updated test coverage path from `tests/unit` to `tests` and added pytest configuration for temporary path retention
.gitignore	Uncommented `.idea/` to ignore JetBrains IDE files
.github/workflows/python-ci.yml	Added test execution step to CI workflow
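The following sketch shows one way `create_documents()` might route the new `InputDataType.PARQUET` value. Only `PARQUET = "parquet"` comes from the PR. The other enum members and their values, the `register` helper, the `LOADERS` registry and the file-versus-directory rule are assumptions made for illustration.

```python
from __future__ import annotations

from enum import Enum
from pathlib import Path
from typing import Callable


class InputDataType(str, Enum):
    # Only PARQUET = "parquet" is confirmed by the PR; the other members are assumed.
    JSON = "json"
    CSV = "csv"
    TEXT = "text"
    PARQUET = "parquet"


Loader = Callable[[Path], list[str]]

# Hypothetical registry: input type -> (single-file loader, directory loader).
LOADERS: dict[InputDataType, tuple[Loader, Loader]] = {}


def register(kind: InputDataType, file_loader: Loader, dir_loader: Loader) -> None:
    """Hypothetical: associate an input type with its file and directory loaders."""
    LOADERS[kind] = (file_loader, dir_loader)


def create_documents(input_path: str | Path, input_type: InputDataType | str) -> list[str]:
    """Hypothetical dispatcher: pick a loader by type, then by file vs directory."""
    kind = InputDataType(input_type)  # accepts "parquet" or InputDataType.PARQUET
    if kind not in LOADERS:
        raise NotImplementedError(f"no loader registered for {kind.value!r}")
    file_loader, dir_loader = LOADERS[kind]
    path = Path(input_path)
    return dir_loader(path) if path.is_dir() else file_loader(path)
```

Subclassing `str` lets a plain config string such as `"parquet"` be passed where the enum is expected, because `InputDataType("parquet")` resolves to the `PARQUET` member.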


pbourke added 6 commits December 16, 2025 21:00

Add JetBrains .idea folder to .gitignore

ccc661a

add parquet input

1544ecb

added tests to cover all of autod.io.document

ac69af3

Fixed formatting errors

ba93404

fixed linting and formatting

e6f97ec

Enable tests in CI

d7b6d5a

pbourke marked this pull request as ready for review December 17, 2025 23:19

pbourke requested review from andresmor-ms and Copilot December 17, 2025 23:20

Copilot started reviewing on behalf of pbourke December 17, 2025 23:21

Updated semversioner

5fe649f

Copilot AI reviewed Dec 17, 2025


pyproject.toml (outdated, resolved)

tests/autod/io/document_test.py (outdated, resolved)

tests/autod/io/document_test.py (outdated, resolved)

resolved CoPilot review comments

d31a2ed

andresmor-ms reviewed Dec 18, 2025


tests/autod/io/document_test.py (outdated, resolved)

andresmor-ms approved these changes Dec 18, 2025


Removed noqa annotation from document_test.py imports section and fixed ruff error

cde34d5

pbourke merged commit b85285f into main Dec 19, 2025
4 checks passed

pbourke deleted the feat/add_parquet_input branch December 19, 2025 00:53

3 participants

Add parquet input support (fixes #40) #41

Add parquet input support (fixes #40) #41

Uh oh!

Conversation

pbourke commented Dec 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

pbourke commented Dec 17, 2025 •

edited

Loading