feat(api-ui): add conversion logic to convert ML format to OMOP format for cases #13
Conversation
Pull Request Overview
This PR adds a full pipeline for converting ML-format CSV data into OMOP-compliant CSVs, plus validation and repository changes to support safe persistence and upsert behavior.
- Introduces convert.py to generate four OMOP CSV outputs (person.csv, measurement.csv, observation.csv, visit_occurrence.csv).
- Adds verification.py to perform primary-key uniqueness and referential-integrity checks on those outputs.
- Updates service and repository layers to commit sessions and use PostgreSQL ON CONFLICT upserts.
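For orientation, here is a minimal sketch of what such an ML-to-OMOP conversion step can look like. The input column names (subject_id, gender, measurement_value) and the convert(input_path, out_dir) entry point are illustrative assumptions, not the actual contents of convert.py.

```python
# Minimal sketch of an ML-to-OMOP conversion step (illustrative only).
# Input/output column names are assumptions; the PR's convert.py defines the real mapping.
import csv
from pathlib import Path

def convert(input_path: str = "data.csv", out_dir: str = ".") -> None:
    out = Path(out_dir)
    with open(input_path, newline="") as f:
        rows = list(csv.DictReader(f))

    # person.csv: one OMOP person row per unique subject, with a generated surrogate ID.
    person_ids = {}
    with open(out / "person.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["person_id", "gender_source_value"])
        writer.writeheader()
        for row in rows:
            subject = row["subject_id"]  # assumed ML-format column
            if subject not in person_ids:
                person_ids[subject] = len(person_ids) + 1
                writer.writerow({
                    "person_id": person_ids[subject],
                    "gender_source_value": row.get("gender", ""),
                })

    # measurement.csv: one row per measurement, referencing the generated person_id.
    with open(out / "measurement.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["measurement_id", "person_id", "value_as_number"])
        writer.writeheader()
        for i, row in enumerate(rows, start=1):
            writer.writerow({
                "measurement_id": i,
                "person_id": person_ids[row["subject_id"]],
                "value_as_number": row.get("measurement_value", ""),
            })

if __name__ == "__main__":
    convert()
```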
Reviewed Changes
Copilot reviewed 7 out of 12 changed files in this pull request and generated 2 comments.
Summary per file:
| File | Description |
|---|---|
| src/user/service/configuration_service.py | Commit session after processing CSV rows |
| src/answer/service/answer_service.py | Correct typo in user_email variable |
| src/answer/repository/answer_repository.py | Add commit in add_answer(), annotate with Session type |
| src/analytics/service/analytics_service.py | Enhance record_metrics() with timezone, durations, upsert call |
| src/analytics/repository/analytics_repository.py | Implement add_or_update() using PostgreSQL ON CONFLICT upsert |
| script/transform_csv/verification.py | New validation script for OMOP CSV outputs |
| script/transform_csv/convert.py | New conversion script from ML-format CSV to OMOP-format CSVs |
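For context on the add_or_update() row above: a PostgreSQL upsert in SQLAlchemy is typically built with the dialect-specific insert construct and on_conflict_do_update. The table and column names below are placeholders, not the PR's actual schema; only the ON CONFLICT mechanism reflects what the table describes.

```python
# Sketch of a PostgreSQL upsert via SQLAlchemy's ON CONFLICT support.
# Table and column names here are hypothetical placeholders.
from sqlalchemy.dialects.postgresql import insert
from sqlalchemy.orm import Session

def add_or_update(session: Session, metrics_table, user_email: str, duration: float) -> None:
    stmt = insert(metrics_table).values(user_email=user_email, duration=duration)
    stmt = stmt.on_conflict_do_update(
        index_elements=["user_email"],              # conflict target: the unique key
        set_={"duration": stmt.excluded.duration},  # update the existing row on conflict
    )
    session.execute(stmt)
    session.commit()
```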
Comments suppressed due to low confidence (2)
script/transform_csv/convert.py:1
- There are no automated tests covering the conversion logic in convert.py. Adding unit tests for ID generation, field mappings, and output file creation will improve reliability and make future refactoring safer.
#!/usr/bin/env python3
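A sketch of the kind of unit test this suppressed comment asks for, assuming a convert(input_path, out_dir) entry point like the one sketched above; the import path and column names are hypothetical.

```python
# Example pytest covering ID generation and output file creation for the converter.
# `convert`, its import path, and the column names are assumptions about convert.py's interface.
import csv

from script.transform_csv.convert import convert  # hypothetical import path

def test_person_ids_are_unique(tmp_path):
    # Arrange: a tiny ML-format input with a repeated subject.
    input_csv = tmp_path / "data.csv"
    input_csv.write_text(
        "subject_id,gender,measurement_value\n"
        "A,F,1.0\n"
        "A,F,2.0\n"
        "B,M,3.0\n"
    )

    # Act: run the conversion into the temp directory.
    convert(str(input_csv), str(tmp_path))

    # Assert: person.csv exists and contains one unique person_id per subject.
    with open(tmp_path / "person.csv", newline="") as f:
        person_ids = [row["person_id"] for row in csv.DictReader(f)]
    assert len(person_ids) == 2
    assert len(set(person_ids)) == len(person_ids)
```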
src/analytics/service/analytics_service.py:37
- Missing import for get_user_email_from_jwt(), which will cause a NameError at runtime. Please import it (for example, from src.user.auth_utils import get_user_email_from_jwt) at the top of the file.
user_email: str = get_user_email_from_jwt()
| result["status"] = "failed" | ||
| responses.append(result) | ||
|
|
||
| self.repository.session.commit() |
Copilot AI · Jul 11, 2025
[nitpick] Committing directly in the service layer couples transaction management with business logic. Consider encapsulating commit behavior inside the repository or using a dedicated transaction manager to centralize persistence concerns.
Suggested change:

```diff
-self.repository.session.commit()
+self.repository.commit_transaction()
```
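One way to follow this suggestion is a thin commit_transaction() method on the repository, so services never touch the session directly. The sketch below uses the method name from the suggestion; the rollback-on-failure behavior is an added assumption.

```python
# Sketch of encapsulating transaction management in the repository layer.
# The rollback-on-error handling is an assumption beyond the reviewer's one-line suggestion.
from sqlalchemy.orm import Session

class AnalyticsRepository:
    def __init__(self, session: Session) -> None:
        self.session = session

    def commit_transaction(self) -> None:
        """Commit the current unit of work, rolling back if the commit fails."""
        try:
            self.session.commit()
        except Exception:
            self.session.rollback()
            raise
```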
| print(f" {status:<4} – {name}") | ||
| all_ok &= result | ||
|
|
||
| print("\nOverall:", "✔︎ ALL CHECKS PASSED" if all_ok else "✘ SOME CHECKS FAILED") |
Copilot AI · Jul 11, 2025
The validation script prints a PASS/FAIL summary but always exits with status 0. Consider using sys.exit(1) when all_ok is false so that CI jobs can detect failures automatically.
| print("\nOverall:", "✔︎ ALL CHECKS PASSED" if all_ok else "✘ SOME CHECKS FAILED") | |
| print("\nOverall:", "✔︎ ALL CHECKS PASSED" if all_ok else "✘ SOME CHECKS FAILED") | |
| if not all_ok: | |
| sys.exit(1) |
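Putting this comment together with the PASS/FAIL loop above, a standalone version of the summary with a CI-friendly exit code could look like the following sketch; the check names, files, and columns are illustrative rather than verification.py's actual code.

```python
# Sketch of a verification script that fails the CI job when any check fails.
# Check names, files, and columns are illustrative; the exit-code pattern is the point.
import csv
import sys

def unique_primary_key(path: str, column: str) -> bool:
    with open(path, newline="") as f:
        values = [row[column] for row in csv.DictReader(f)]
    return len(values) == len(set(values))

def main() -> None:
    checks = {
        "person.person_id is unique": unique_primary_key("person.csv", "person_id"),
        "measurement.measurement_id is unique": unique_primary_key("measurement.csv", "measurement_id"),
    }
    all_ok = True
    for name, result in checks.items():
        status = "PASS" if result else "FAIL"
        print(f" {status:<4} – {name}")
        all_ok &= result

    print("\nOverall:", "✔︎ ALL CHECKS PASSED" if all_ok else "✘ SOME CHECKS FAILED")
    if not all_ok:
        sys.exit(1)  # non-zero exit so CI can detect the failure

if __name__ == "__main__":
    main()
```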
SUMMARY
This PR introduces a transformation pipeline that converts ML-format input data (CSV) into standardized OMOP-format CSVs, enabling compatibility with downstream clinical data analytics tools.
Key features:
- convert.py script that reads data.csv and generates four OMOP-compliant files: person.csv, measurement.csv, observation.csv, and visit_occurrence.csv.
- verification.py, a lightweight validation script to check primary key uniqueness and referential integrity across all output files.
- Repository-layer improvements (add_or_update() with PostgreSQL's ON CONFLICT, session commits).

TEST PLAN
✅ Ran convert.py locally on a sample data.csv, verified correct file generation and row counts.
✅ Verified content and structure of all four OMOP output files (.csv) via manual inspection and sample slicing.
✅ Executed verification.py to validate:
✅ Confirmed that new logic in analytics_repository.py and answer_repository.py preserves data as expected across repeated API invocations (tested with mocked sessions and live PostgreSQL).
✅ All modified services and repository functions were manually tested for correctness, including access control checks and commit persistence.
Pre-merge author checklist
I've clearly explained:
I've included tests I've run to ensure my changes work.
I've added unit tests for any new code, if applicable.
I've documented any added code.