Skip to content

fix: resolve 3 remaining issues from PR #16 review#18

Merged
Sudhendra merged 3 commits intomainfrom
fix/pr16-review-issues
Feb 12, 2026
Merged

fix: resolve 3 remaining issues from PR #16 review#18
Sudhendra merged 3 commits intomainfrom
fix/pr16-review-issues

Conversation

@Sudhendra
Copy link
Owner

Summary

Addresses the 3 remaining issues identified during the review of PR #16 (phase-6-loss_function).

  • Fix 1 (CRITICAL): NameError on malformed JSON — moved stats, sanitized_data, and unsanitized_data initialization before the JSONL loading loop so the except json.JSONDecodeError handler can safely reference them
  • Fix 2 (MEDIUM): Moved dagshub/mlflow from core dependencies to [project.optional-dependencies] mlflow since they're only used by scripts/mlflow_logger.py
  • Fix 3 (LOW): Removed stale RULE C FIX docstring, duplicate import re, unused from typing import Tuple; changed Tuple[bool, str] to tuple[bool, str]

Test Coverage

  • Added test_malformed_json_line_does_not_crash — writes a JSONL file with a bad line, verifies sanitize_and_extract() handles it gracefully (parse_errors == 1, no crash)
  • All 56 tests pass (test_sanitization_manager.py + test_tokenizer.py)

Files Changed

File Change
scripts/data_sanitization.py Fixes 1 & 3
pyproject.toml Fix 2
tests/test_sanitization_manager.py New regression test for Fix 1

- Fix NameError on malformed JSON lines by initializing stats/unsanitized_data
  before the JSONL loading loop instead of after it
- Move dagshub/mlflow from core dependencies to optional [mlflow] extra
- Remove stale RULE C FIX docstring, duplicate import re, and unused Tuple import;
  use builtin tuple[] instead of typing.Tuple
- Add test for malformed JSON line handling in sanitize_and_extract
CI installs only the [dev] extra, not [mlflow]. Add pytest.importorskip
for dagshub and mlflow so the test module is skipped gracefully when
these optional dependencies are absent.
@Sudhendra Sudhendra merged commit 48f6f73 into main Feb 12, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant

Comments