feat(api-ui): add conversion logic to convert ML format to OMOP format for cases #13

hoangsonww · 2025-07-11T21:27:31Z

SUMMARY

This PR introduces a transformation pipeline that converts ML-format input data (CSV) into standardized OMOP-format CSVs, enabling compatibility with downstream clinical data analytics tools.

Key features:

Adds a robust convert.py script that reads data.csv and generates four OMOP-compliant files: person.csv, measurement.csv, observation.csv, and visit_occurrence.csv.
Ensures reproducibility and safety by appending rows with fixed starting IDs and without modifying or reading existing rows.
Includes verification.py, a lightweight validation script to check primary key uniqueness and referential integrity across all output files.
Enhances analytics and answer repository/service logic with UPSERT behavior and transactional integrity (e.g., add_or_update() with PostgreSQL's ON CONFLICT, session commits).

TEST PLAN

✅ Ran convert.py locally on a sample data.csv, verified correct file generation and row counts.
✅ Verified content and structure of all four OMOP output files (.csv) via manual inspection and sample slicing.
✅ Executed verification.py to validate:
- uniqueness of primary keys,
- referential integrity of foreign keys across tables.
✅ Confirmed that new logic in analytics_repository.py and answer_repository.py preserves data as expected across repeated API invocations (tested with mocked sessions and live PostgreSQL).
✅ All modified services and repository functions were manually tested for correctness, including access control checks and commit persistence.

Pre-merge author checklist

I've clearly explained:
- What problem this PR is solving.
- How this problem was solved.
- How reviewers can test my changes.
I've included tests I've run to ensure my changes work.
I've added unit tests for any new code, if applicable.
I've documented any added code.

…t for cases

Copilot

Pull Request Overview

This PR adds a full pipeline for converting ML-format CSV data into OMOP-compliant CSVs, plus validation and repository changes to support safe persistence and upsert behavior.

Introduces convert.py to generate four OMOP CSV outputs (person.csv, measurement.csv, observation.csv, visit_occurrence.csv).
Adds verification.py to perform primary-key uniqueness and referential-integrity checks on those outputs.
Updates service and repository layers to commit sessions and use PostgreSQL ON CONFLICT upserts.

Reviewed Changes

Copilot reviewed 7 out of 12 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
src/user/service/configuration_service.py	Commit session after processing CSV rows
src/answer/service/answer_service.py	Correct typo in `user_email` variable
src/answer/repository/answer_repository.py	Add commit in `add_answer()`, annotate with `Session` type
src/analytics/service/analytics_service.py	Enhance `record_metrics()` with timezone, durations, upsert call
src/analytics/repository/analytics_repository.py	Implement `add_or_update()` using PostgreSQL ON CONFLICT upsert
script/transform_csv/verification.py	New validation script for OMOP CSV outputs
script/transform_csv/convert.py	New conversion script from ML-format CSV to OMOP-format CSVs

Comments suppressed due to low confidence (2)

script/transform_csv/convert.py:1

There are no automated tests covering the conversion logic in convert.py. Adding unit tests for ID generation, field mappings, and output file creation will improve reliability and make future refactoring safer.

#!/usr/bin/env python3

src/analytics/service/analytics_service.py:37

Missing import for get_user_email_from_jwt(), which will cause a NameError at runtime. Please import it (for example from src.user.auth_utils import get_user_email_from_jwt) at the top of the file.

        user_email: str = get_user_email_from_jwt()

Copilot · 2025-07-11T21:28:55Z

src/user/service/configuration_service.py

                result["status"] = "failed"
            responses.append(result)

+        self.repository.session.commit()


[nitpick] Committing directly in the service layer couples transaction management with business logic. Consider encapsulating commit behavior inside the repository or using a dedicated transaction manager to centralize persistence concerns.

Suggested change

self.repository.session.commit()

self.repository.commit_transaction()

Copilot · 2025-07-11T21:28:56Z

script/transform_csv/verification.py

+    print(f"  {status:<4} – {name}")
+    all_ok &= result
+
+print("\nOverall:", "✔︎ ALL CHECKS PASSED" if all_ok else "✘ SOME CHECKS FAILED")


The validation script prints a PASS/FAIL summary but always exits with status 0. Consider using sys.exit(1) when all_ok is false so that CI jobs can detect failures automatically.

Suggested change

print("\nOverall:", "✔︎ ALL CHECKS PASSED" if all_ok else "✘ SOME CHECKS FAILED")

print("\nOverall:", "✔︎ ALL CHECKS PASSED" if all_ok else "✘ SOME CHECKS FAILED")

if not all_ok:

sys.exit(1)

feat(api-ui): add conversion logic to convert ML format to OMOP forma…

4bcff46

…t for cases

hoangsonww requested a review from Copilot July 11, 2025 21:27

hoangsonww self-assigned this Jul 11, 2025

hoangsonww added documentation Improvements or additions to documentation enhancement New feature or request good first issue Good for newcomers labels Jul 11, 2025

hoangsonww closed this Jul 11, 2025

Copilot AI reviewed Jul 11, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(api-ui): add conversion logic to convert ML format to OMOP format for cases #13

feat(api-ui): add conversion logic to convert ML format to OMOP format for cases #13

Uh oh!

hoangsonww commented Jul 11, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Jul 11, 2025

Uh oh!

Copilot AI Jul 11, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	self.repository.session.commit()
	self.repository.commit_transaction()

feat(api-ui): add conversion logic to convert ML format to OMOP format for cases #13

feat(api-ui): add conversion logic to convert ML format to OMOP format for cases #13

Uh oh!

Conversation

hoangsonww commented Jul 11, 2025

SUMMARY

TEST PLAN

Pre-merge author checklist

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Copilot AI Jul 11, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jul 11, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants