
Conversation


@ibro45 ibro45 commented Nov 24, 2025

Summary

This PR redesigns Lighter's architecture to embrace Lightning-native patterns while maintaining the power of YAML configuration.

Core Changes

New Core Classes:

  • LighterModule: Optional base class providing automatic optimizer setup and dual logging (step + epoch level)
  • LighterDataModule: Simple wrapper for organizing dataloaders in YAML
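A config wiring these two classes together might look like the sketch below; the argument names on LighterDataModule (train_dataloader, etc.) and the dataset path are illustrative assumptions, not taken from this PR:

```yaml
# Hypothetical wiring — LighterDataModule argument names are assumed
model:
  _target_: project.models.my_model.MyModel
  hidden_dim: 256

data:
  _target_: lighter.LighterDataModule
  train_dataloader:                          # assumed argument name
    _target_: torch.utils.data.DataLoader
    batch_size: 32
    shuffle: true
    dataset:
      _target_: project.datasets.MyDataset   # hypothetical dataset class
```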

Removed Legacy Components:

  • System class (orchestrator replaced by modular Runner)
  • adapters.py (functionality integrated into LighterModule)
  • schema.py (validation delegated to Sparkwheel)

Simplified Callback System:

  • Flattened writer structure: CSVWriter, FileWriter, BaseWriter
  • Removed nested writer/ directory organization

Project Discovery: Marker-Based Auto-Discovery

Old approach (config-based):

project: path/to/my_project  # Explicit path in config

New approach (marker-based):

Projects are now auto-discovered via a __lighter__.py marker file. Simply create this file in your project root:

my_project/
├── __lighter__.py    # Marker file - marks this as a Lighter project
├── __init__.py 
├── models/
│   ├── __init__.py 
│   └── my_model.py
└── configs/
    └── train.yaml

When you run lighter from a directory containing __lighter__.py, the directory is automatically imported as a Python module named project. This lets you reference custom modules in YAML configs:

model:
  _target_: project.models.my_model.MyModel
  hidden_dim: 256

Benefits:

  • No config boilerplate - just add the marker file
  • Works from any subdirectory containing the marker
  • Clear project boundaries - the marker explicitly defines what's a Lighter project
  • Consistent module naming - always project.* in configs

Runner Refactoring

  • Modular architecture with ProjectImporter and ConfigLoader helper classes
  • ProjectImporter.auto_discover_and_import() handles marker-based discovery
  • Unified CLI inputs parameter replacing separate config/overrides args
  • Leverages Sparkwheel for auto-detection and validation

Dual Approach Philosophy

Users can now choose:

  1. LightningModule - Existing Lightning code works unchanged with YAML configs
  2. LighterModule - Less boilerplate with automatic features

Summary by CodeRabbit

  • New Features

    • New CI, PR-title, test-summary, release and dependency-review workflows; multi-config CLI/runner with project auto-discovery; prediction exporters (CSV + filesystem); LighterModule and LighterDataModule primitives; CIFAR‑10 example and example configs.
  • Documentation

    • Major site rewrite to YAML/config-first content (Quick Start, Guides, Examples, Reference, FAQ); legacy design/how‑to docs consolidated or removed.
  • Tests

    • Expanded test fixtures and updated integration tests.
  • Refactor

    • Public API reorganized and several legacy utilities removed.


ibro45 and others added 19 commits November 16, 2025 17:35
- Add explicit Callable[..., Any] type hints in adapters and callbacks
- Add proper generic types for dicts and lists in schema and utils
- Add type annotations to writer functions (write_tensor, write_image, write_video)
- Improve type safety with Optional and Union types where appropriate
- Add type: ignore directives for complex generic scenarios

These changes improve IDE support, catch potential type errors earlier,
and make the codebase more maintainable.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Replace stateful self.mode attribute with dynamic _get_current_mode() method
that queries the trainer's state (training, validating, testing, predicting).

Benefits:
- Mode is always in sync with trainer state
- Eliminates potential stale mode bugs
- Makes code more robust and easier to reason about
- Properly handles sanity check as validation mode

This change affects _prepare_batch, _forward, _calculate_loss, and
_calculate_metrics methods, all of which now query mode dynamically.
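A minimal sketch of this dynamic lookup, using a stand-in object for the trainer's state flags (the real method is a private `_get_current_mode` on the module and may differ in detail):

```python
from types import SimpleNamespace

def get_current_mode(trainer) -> str:
    """Derive the mode from the trainer's state flags on every call,
    instead of caching a stateful `self.mode` attribute."""
    if trainer.sanity_checking or trainer.validating:
        return "val"  # sanity check is treated as validation
    if trainer.training:
        return "train"
    if trainer.testing:
        return "test"
    if trainer.predicting:
        return "predict"
    raise RuntimeError("Trainer is not in a recognized state")

def fake_trainer(**flags):
    """Minimal stand-in for a Lightning Trainer's boolean state flags."""
    defaults = dict(training=False, validating=False,
                    sanity_checking=False, testing=False, predicting=False)
    defaults.update(flags)
    return SimpleNamespace(**defaults)
```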

Replace separate config and overrides parameters with unified inputs list.
Sparkwheel now auto-detects file paths vs overrides based on content.

Changes:
- Replace config + overrides params with single inputs list
- Remove comma-separated config file parsing (use space separation)
- Simplify CLI examples: 'lighter fit base.yaml exp.yaml lr=0.001'
- Update Runner.run() to use Config.update() with auto-detection
- Improve CLI help text and examples

This aligns with Sparkwheel's idiomatic usage pattern and simplifies
the user experience.
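The auto-detection contract can be approximated by a simple heuristic like the one below; Sparkwheel's actual detection logic may be more sophisticated (e.g., checking file existence), so treat this as a sketch:

```python
def classify_inputs(inputs):
    """Split CLI inputs into config file paths and key=value overrides.
    Heuristic: anything ending in .yaml/.yml is a config file."""
    configs, overrides = [], []
    for item in inputs:
        if item.endswith((".yaml", ".yml")):
            configs.append(item)
        else:
            overrides.append(item)
    return configs, overrides
```

Under this scheme, `lighter fit base.yaml exp.yaml lr=0.001` yields two config files merged in order, plus one override.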

- Upgrade sparkwheel to >=0.0.6 for improved auto-detection features
- Update uv.lock with new dependency versions
- Fix GitHub workflow YAML formatting for ignoreLabels field

Update documentation to reflect dynamic mode detection from trainer state.
Add explanations of how _get_current_mode() queries trainer.training,
trainer.validating, etc. to ensure mode is always in sync.

- Update CLI tests to use new inputs parameter instead of config+overrides
- Update Runner tests to pass inputs list instead of separate parameters
- Remove tests for self.mode attribute (replaced with _get_current_mode())
- Update integration tests to use new CLI argument format
- Fix type annotations in test fixtures

All tests updated to align with the refactored Runner CLI interface
and System mode detection mechanism.

Major architectural changes:
- Replace System class with LighterModule (model.py) for cleaner separation
- Add Data module (data.py) for dataloader configuration
- Remove adapters pattern - functionality integrated into LighterModule
- Delete utility modules: model.py, patches.py, types/containers.py
- Update type enums and utility functions to support new architecture
- Simplify data utilities and misc helpers

This refactor simplifies the core API by removing the adapter abstraction
layer and providing a more direct interface through LighterModule.

Callback system improvements:
- Flatten writer callback structure (remove nested writer/ directory)
- Create base_writer.py with BaseWriter abstract class
- Consolidate csv_writer.py and file_writer.py at top level
- Enhance Freezer callback with improved parameter handling
- Remove callbacks/utils.py (functionality moved to callbacks themselves)

This simplifies the callback organization and makes writers easier to
discover and import.

Runner improvements:
- Leverage Sparkwheel's auto-detection for config validation
- Remove separate schema.py module (validation now handled by Sparkwheel)
- Simplify config loading and pruning logic
- Improve error handling and CLI interface
- Better integration with LighterModule instead of System

This change reduces code complexity by relying on Sparkwheel's built-in
schema validation capabilities rather than maintaining a separate schema module.

Example project changes:
- Add __lighter__.py for Sparkwheel auto-discovery
- Add model.py with CIFAR10Module (replaces inline System config)
- Move experiments/ → configs/ directory
- Create example.yaml, example_autodiscovery.yaml, example_overrides.yaml
- Update configurations to use LighterModule and new structure

This demonstrates the new pattern where models are defined in Python files
and referenced in configs, rather than being fully defined in YAML.

Test suite changes:
- Add conftest.py with shared fixtures
- Add tests/fixtures/ with plain Lightning module fixtures
- Refactor unit tests for LighterModule (replace System tests)
- Update callback tests for new writer structure
- Refactor runner tests for schema-less validation
- Add test_plain_lightning.py for plain Lightning integration
- Remove obsolete tests: adapters, schema, containers, model utils
- Update integration tests for CIFAR and config validation

All tests now work with the refactored Model/Data/Callback architecture
and Sparkwheel-based configuration system.

Documentation overhaul:
- Delete old structure: design/, how-to/, tutorials/, migration/
- Add comprehensive guides: best-practices, configuration, custom-code,
  lighter-module, lightning-module, training
- Add practical examples: image-classification, multi-gpu
- Add quickstart guide and CLI reference
- Rewrite index.md and FAQ for new architecture
- Update mkdocs.yml navigation structure
- Remove outdated images (coverage.svg, feature screenshots, diagrams)

New docs focus on practical guides and examples rather than abstract
design documents, making it easier for users to get started.

Project infrastructure updates:
- Add .github/workflows/release.yml for automated releases
- Update pyproject.toml dependencies and metadata
- Expand .gitignore for better coverage
- Rewrite README.md to reflect new architecture and features
- Update uv.lock with latest dependency versions

These changes complete the redesign by updating all project-level
configuration to support the new architecture.

Documentation improvements:
- Add comprehensive coverage of _args_, _mode_, and _disabled_ syntax
- Document callable and debug modes with practical examples
- Expand logging documentation with detailed loss/metrics/optimizer stats
- Add examples for multi-component loss logging
- Clarify automatic optimizer stats logging (LR, momentum, betas, weight decay)
- Update quick reference table with new syntax elements

Breaking change to metrics API:
- Remove automatic list-to-MetricCollection conversion in LighterModule
- Require explicit MetricCollection wrapper in configs for multiple metrics
- Add validation with helpful error messages showing correct syntax
- Support both single Metric and MetricCollection in logging
- Update CIFAR10 examples to use MetricCollection wrapper
- Update test suite for new metrics API

This change makes metric configuration more explicit and aligns with
torchmetrics best practices for handling multiple metrics.
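Under the new API, a multi-metric config would wrap metrics explicitly in a MetricCollection, roughly like the sketch below; the exact _args_ handling and metric class names are assumptions, not copied from this PR:

```yaml
metrics:
  _target_: torchmetrics.MetricCollection
  _args_:
    - _target_: torchmetrics.classification.MulticlassAccuracy
      num_classes: 10
    - _target_: torchmetrics.classification.MulticlassF1Score
      num_classes: 10
```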

Add DDP testing infrastructure:
- Add test_callbacks_writers_ddp.py with 10 DDP tests for writers
- Test rank-specific temp files, distributed gathering, and cleanup
- Test barrier synchronization and edge cases (empty ranks, None paths)
- Mock distributed environment to enable fast, debuggable tests on single device
- Update existing error tests for consistency

This mock-based approach provides fast feedback during development while
ensuring comprehensive coverage of distributed code paths.
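The mock-based approach can be sketched like this; make_mock_ddp_trainer is a hypothetical fixture helper mirroring the described setup, not code from this PR:

```python
from unittest.mock import MagicMock

def make_mock_ddp_trainer(world_size: int = 2, global_rank: int = 0):
    """Build a mocked 'distributed' trainer so writer callbacks can be
    exercised quickly on a single device, without real process groups."""
    trainer = MagicMock()
    trainer.world_size = world_size
    trainer.global_rank = global_rank
    trainer.is_global_zero = global_rank == 0
    # broadcast becomes an identity function: every rank "receives" the value
    trainer.strategy.broadcast = lambda x, src=0: x
    # barrier is a no-op on a single device
    trainer.strategy.barrier = lambda: None
    return trainer
```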

@ibro45 ibro45 requested a review from surajpaib November 24, 2025 15:12

coderabbitai bot commented Nov 24, 2025

Walkthrough

Removes legacy System, adapters, schema, many writer/utils; adds LighterModule and LighterDataModule, writer primitives (BaseWriter, FileWriter, CsvWriter), Runner refactor (ProjectImporter + ConfigLoader, list-of-configs contract), CI/docs reorganization, CIFAR example project, and new tests/fixtures.

Changes

Cohort / File(s) Change Summary
CI & GitHub Actions
.github/actions/setup/action.yml, .github/workflows/ci.yml, .github/workflows/ci-full.yml, .github/workflows/check_pull_request_title.yml, .github/workflows/publish.yml, .github/workflows/release.yml, .github/workflows/dependency-review.yml, .github/workflows/docs-publish.yml
Add composite setup action and new/updated workflows (matrix/full CI, semantic PR title check, dependency-review, docs publish, release); adjust concurrency, permissions, timeouts; remove legacy workflows.
CI scripts & coverage
.github/scripts/test_summary.py, .codecov.yml
Add test-summary script for GitHub Actions and Codecov configuration; wire artifact uploads and summaries.
Documentation (add / reorganize)
README.md, mkdocs.yml, docs/index.md, docs/quickstart.md, docs/faq.md, docs/reference/cli.md, docs/guides/*, docs/examples/*, docs/tutorials/*
Large rewrite toward YAML/config-first docs; add Quick Start, guides, examples and CLI reference; restructure mkdocs navigation.
Documentation (removed)
docs/design/*, docs/how-to/*, docs/migration/from-pytorch-lightning.md, many legacy how-to/design/tutorial files
Bulk deletion of legacy design, how-to, migration, and tutorial docs.
Project example (CIFAR)
CIFAR project
projects/cifar10/__lighter__.py, projects/cifar10/configs/example.yaml, projects/cifar10/configs/example_overrides.yaml, projects/cifar10/model.py, projects/cifar10/experiments/*
Add CIFAR-10 example project and CIFAR10Model (subclassing LighterModule) plus configs/overrides; remove old experiment YAMLs.
Public API exports
src/lighter/__init__.py
Remove System from exports; add LighterModule and LighterDataModule to package exports.
Model & Data primitives
src/lighter/model.py, src/lighter/data.py
Add LighterModule (structured LightningModule base) and LighterDataModule (thin wrapper exposing provided DataLoaders).
Engine / Runner rewrite
src/lighter/engine/runner.py, src/lighter/engine/schema.py
Rework Runner into modular flow with ProjectImporter and ConfigLoader; Runner.run now accepts list-of-configs; remove dataclass-based schema module.
Adapters & System removed
src/lighter/adapters.py, src/lighter/system.py
Remove adapters module and the System implementation (large behavioral deletions).
Callbacks / writers refactor
src/lighter/callbacks/base_writer.py, src/lighter/callbacks/csv_writer.py, src/lighter/callbacks/file_writer.py, src/lighter/callbacks/__init__.py, src/lighter/callbacks/freezer.py, src/lighter/callbacks/utils.py
Introduce BaseWriter, CsvWriter, FileWriter and writer registry; expose Csv/File writers; remove legacy writer base/table and utils; update Freezer typing/logic.
Utilities — additions & removals
src/lighter/utils/dynamic_imports.py, src/lighter/utils/data.py, src/lighter/utils/misc.py, src/lighter/utils/logging.py, src/lighter/utils/model.py, src/lighter/utils/patches.py, src/lighter/utils/types/*
Add robust dynamic import + multiprocessing pickling support (import_module_from_path); rework collate_replace_corrupted and get_optimizer_stats; remove many model/patches/types modules and utilities.
Tests & fixtures
tests/conftest.py, tests/fixtures/*, tests/integration/test_cifar.py, tests/configs/test1.yaml
Add test fixtures, dummy dataset and Lightning module examples; update integration tests to use project auto-discovery and list-of-configs input; remove args block in test config.
Tooling / packaging / ignores
pyproject.toml, .gitignore, .editorconfig (removed), .safety-policy.yml (removed), .dockerignore (removed), docker/* (removed)
Bump sparkwheel, add cloudpickle, adjust dev/test deps; update .gitignore; remove repository config files (editorconfig, safety policy, dockerfiles).

Sequence Diagram(s)

sequenceDiagram
    participant CLI
    participant Runner
    participant ConfigLoader
    participant ProjectImporter
    participant Trainer
    participant Model
    participant DataModule

    CLI->>Runner: run(stage, inputs: list, **stage_kwargs)
    Runner->>ConfigLoader: load(inputs)
    ConfigLoader-->>Runner: Config
    Runner->>ProjectImporter: auto_discover_and_import()
    ProjectImporter-->>Runner: success/failure
    Runner->>Runner: _resolve_model(config)
    Runner->>Runner: _resolve_trainer(config)
    Runner->>Runner: _resolve_datamodule(config, model)
    Runner->>Trainer: start(stage, model, datamodule, **stage_kwargs)
    Trainer->>Model: training_step / validation_step / test_step / predict_step
    Model-->>Trainer: outputs (loss / pred / metrics)
    Trainer-->>Runner: save artifacts / logs, finish

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120+ minutes

  • Focus areas:
    • src/lighter/model.py — output normalization, logging semantics, configure_optimizers and step contracts.
    • src/lighter/engine/runner.py — ConfigLoader/ProjectImporter behavior, CLI integration, config merging and persistence, new run(inputs) contract.
    • src/lighter/callbacks/{base_writer,file_writer,csv_writer}.py — multi-process correctness, aggregation, indexing, and cleanup.
    • Large deletions (src/lighter/system.py, src/lighter/adapters.py, src/lighter/engine/schema.py, many utils/types) — search for stale imports and verify replacement flows.
    • tests/integration/test_cifar.py and tests/conftest.py — validate working-directory-based project auto-discovery and list-of-configs behavior.

Suggested reviewers

  • surajpaib

Poem

🐰 I hopped through diffs both bold and neat,
swapped Systems for modules, made CI fleet.
Runner hums, CSVs stack bright rows,
configs chant YAML where the wind blows.
A little rabbit cheers — new paths to go.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 55.93%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (2 passed)

  • Description Check ✅ Passed — Check skipped because CodeRabbit's high-level summary is enabled.
  • Title check ✅ Passed — The title 'feat: Redesign architecture and embrace Lightning native patterns' directly and clearly describes the main architectural redesign presented in this PR, which transforms Lighter into a configuration layer aligned with PyTorch Lightning patterns while preserving YAML configuration.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2df7442 and a9c95fc.

⛔ Files ignored due to path filters (4)
  • paper/lighter_diagrams.pptx is excluded by !**/*.pptx
  • paper/overview_all.png is excluded by !**/*.png
  • paper/overview_system.png is excluded by !**/*.png
  • paper/paper.pdf is excluded by !**/*.pdf
📒 Files selected for processing (2)
  • paper/paper.bib (0 hunks)
  • paper/paper.md (0 hunks)
💤 Files with no reviewable changes (2)
  • paper/paper.md
  • paper/paper.bib


@ibro45 ibro45 changed the title Redesign: Simplify Architecture and Embrace PyTorch Lightning Native Patterns feat: Redesign architecture and embrace Lightning native patterns Nov 24, 2025
@ibro45 ibro45 marked this pull request as ready for review November 24, 2025 15:21
@dosubot dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Nov 24, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
pyproject.toml (2)

7-7: Critical: Version number not updated to reflect v3.0 release.

The PR objectives specify a major version bump to v3.0 with breaking changes (System and Adapters removed, config keys renamed), but the version in pyproject.toml remains at 0.0.5. This needs to be updated to 3.0.0 before release.

-version = "0.0.5"
+version = "3.0.0"

83-105: Removal of disallow_any_generics = true reduces type safety.

The mypy configuration previously enforced stricter type checking with disallow_any_generics = true, but this constraint has been removed. Given the scale of the architectural refactor (1,500 LOC → 999 LOC) and the introduction of new abstractions (LighterModule, LighterDataModule), this relaxation degrades type safety guarantees at a critical moment.

Consider re-enabling this constraint or documenting why it was necessary to relax it.

 [tool.mypy]
 # https://mypy.readthedocs.io/en/latest/config_file.html#using-a-pyproject-toml-file
 python_version = "3.10"
 pretty = true
 show_traceback = true
 
 allow_redefinition = false
 check_untyped_defs = true
 disallow_incomplete_defs = true
+disallow_any_generics = true
 ignore_missing_imports = true
 implicit_reexport = false
 no_implicit_optional = true
 show_column_numbers = true
 show_error_codes = true
 show_error_context = true
 strict_equality = true
 strict_optional = true
 warn_no_return = true
 warn_redundant_casts = true
 warn_return_any = true
 warn_unreachable = true
 warn_unused_configs = true
 warn_unused_ignores = true
🧹 Nitpick comments (23)
src/lighter/utils/misc.py (1)

127-129: Consider documenting why weight_decay is conditionally extracted.

The weight_decay hyperparameter is only extracted when non-zero, while other hyperparameters (lr, momentum, betas) are always extracted when present. This design choice reduces log clutter but creates inconsistency—users might expect weight_decay to always appear in the returned dictionary.

Consider either:

  • Adding a comment explaining the rationale for conditional extraction
  • Or extracting weight_decay unconditionally for consistency
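For illustration, the described behavior amounts to something like the following per-param-group extraction; extract_group_stats is a hypothetical rewrite of the relevant part of get_optimizer_stats, not the actual implementation:

```python
def extract_group_stats(param_group):
    """Extract loggable hyperparameters from one optimizer param group:
    lr/momentum whenever present, betas split into beta1/beta2, and
    weight_decay only when non-zero (the conditional noted above)."""
    stats = {}
    for key in ("lr", "momentum"):
        if key in param_group:
            stats[key] = param_group[key]
    if "betas" in param_group:
        stats["beta1"], stats["beta2"] = param_group["betas"]
    # weight_decay is skipped when zero, which reduces log clutter but
    # makes its presence in the output inconsistent with the other keys
    if param_group.get("weight_decay", 0.0):
        stats["weight_decay"] = param_group["weight_decay"]
    return stats
```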
src/lighter/utils/data.py (1)

10-10: Consider adding type hint for dataset length.

The type hint Dataset is generic, but the code assumes len(dataset) is available (line 44). While this is a valid assumption for most PyTorch datasets, consider documenting this requirement or using Sized protocol for clarity.

Optional: Add to docstring:

     Args:
         batch: The batch of data from the DataLoader.
-        dataset: The dataset being used, which should return `None` for corrupted examples.
+        dataset: The dataset being used, which should return `None` for corrupted examples.
+            Must be a sized dataset (implement `__len__`).
         default_collate_fn: The default collate function to use once the batch is clean.
docs/reference/cli.md (1)

80-88: Consider adding language hint for directory structure.

The directory structure visualization is clear, but the markdown linter suggests adding a language specifier. Consider using text or plaintext to silence the linter warning while maintaining readability.

-```
+```text
 outputs/
 └── YYYY-MM-DD/
     └── HH-MM-SS/
docs/quickstart.md (2)

28-35: Add language specifier to fenced code block.

The directory structure code block should specify a language (e.g., text or leave it for plain formatting) for consistency with MkDocs rendering.

Apply this diff:

 Create these files:
 
-```
+```text
 mnist_classifier/
 ├── __lighter__.py          # Marker file (tells Lighter this is a project)
 ├── __init__.py             # Makes it a Python package
 ├── model.py                # Your model code
 └── configs/
     └── config.yaml         # Your experiment config

---

`61-154`: **Convert indented code blocks to fenced blocks.**

The tabbed content uses indented code blocks. For consistency with the rest of the documentation and better syntax highlighting, consider using fenced code blocks with language specifiers.



This applies to both the LightningModule tab (starting at line 63) and the LighterModule tab (starting at line 157).

docs/guides/lighter-module.md (1)

`642-646`: **Add language specifier to code block.**

The example logged metrics output should specify a language (e.g., `text` or `yaml`) for consistency.



Apply this diff:

```diff
 **Example logged metrics:**
 
-```
+```text
 train/optimizer/Adam/lr/epoch: 0.001
 train/optimizer/Adam/beta1/epoch: 0.9
 train/optimizer/Adam/beta2/epoch: 0.999

.github/workflows/ci.yml (1)

`21-100`: **CI workflow looks solid; optionally gate tests on quality job**

The quality matrix and test job are set up cleanly with sensible timeouts, concurrency, and artifacts. If you’d like to avoid burning cycles when format/lint/types fail, you could add `needs: quality` to the `tests` job so it only runs after a successful quality pass.

tests/integration/test_cifar.py (1)

`3-45`: **Auto-discovery setup and config paths look correct**

Switching into `projects/cifar10` via `monkeypatch.chdir` and then using `configs/example.yaml` plus a relative overrides file matches the new `Runner.run(stage, inputs=list)` API and should exercise the full config-loading pipeline. If you ever want CI coverage for this path, consider adding a lighter-weight variant (e.g., fewer epochs/smaller dataset) without the `slow` mark.

docs/guides/lightning-module.md (1)

`422-440`: **Minor markdownlint nit: use headings instead of bold text for “Option 1/2”**

Lines where you use `**Option 1: In Config**` / `**Option 2: In Code**` are effectively headings and trigger MD036 (`no-emphasis-as-heading`). If you care about a clean markdownlint run, consider changing them to `### Option 1: In Config` / `### Option 2: In Code`.

docs/guides/training.md (1)

`49-65`: **Add languages to a few fenced code blocks for markdownlint compliance**

The directory tree and config-structure examples are fenced with bare ``` blocks, which triggers MD040 (`fenced-code-language`). Adding a language such as `text` to those blocks (e.g., ```text) will keep markdownlint happy without affecting how the docs render.  



Also applies to: 761-789

tests/conftest.py (1)

`19-41`: **Consider adding a default value to mock `strategy.broadcast` for safety**

The current mock at line 35 is `trainer.strategy.broadcast = lambda x, src: x`. Based on codebase analysis, the only call to this method is in `src/lighter/callbacks/base_writer.py:41`, which explicitly passes `src=0`. However, to make the fixture more defensive against future usage patterns, adding a default value is recommended:

```diff
-    trainer.strategy.broadcast = lambda x, src: x
+    trainer.strategy.broadcast = lambda x, src=0: x
```

This change is optional but improves fixture robustness without impacting current test behavior.

src/lighter/data.py (1)

83-110: Confirm Lightning’s expectations when dataloaders are None

The wrapper is clean, but the current behavior will return None from val_dataloader / test_dataloader / predict_dataloader when they aren’t configured. LightningDataModule examples and helpers generally return actual dataloaders (or empty iterables) and skip stages by omitting the method or using flags, not by returning None. (lightning.ai)

It would be safer to either:

  • enforce that any defined loader argument is non-None, or
  • map None to an empty list (or similar) so that Trainer sees a valid “no data” dataloader.

Otherwise users may hit confusing runtime errors if they configure only a train loader.

src/lighter/callbacks/csv_writer.py (1)

32-151: CsvWriter logic looks solid; consider a couple of robustness tweaks

The per-rank temp-file pattern, sequence length inference, and DDP merge all look correct and align well with Lightning’s predict flow.

Two minor improvements you might consider:

  1. Empty-prediction epochs
     If no rank writes any rows (e.g., zero-length dataset with allow_zero_length_dataloader_with_multiple_devices=True), dfs will be empty and pd.concat(dfs, ...) will raise. A small guard avoids that edge case:

```diff
 dfs = [pd.read_csv(path) for path in all_temp_paths if path is not None]
+if not dfs:
+    return
 df = pd.concat(dfs, ignore_index=True)
```

  2. Doc/typo
     The docstring in on_predict_epoch_end reads “saves the temporary file it to the final destination”; drop the extra “it” for clarity.

These are minor; the core design looks good.

src/lighter/callbacks/base_writer.py (1)

37-75: BaseWriter DDP/path setup is clean; keep an eye on the private Lightning hook

The path broadcast + directory creation logic for Stage.PREDICT looks correct and mirrors common Lightning patterns. The predict hook also sensibly bails on empty outputs and delegates to write.

The only concern is already called out in the comment: touching trainer.predict_loop._predictions is relying on a private Lightning attribute and may break on trainer internals changes. Once the upstream issue is resolved, it’d be good to hide this behind a small utility or feature flag so it’s easy to remove.

docs/examples/multi-gpu.md (1)

160-162: Optional wording tweak: avoid repeated “very large …” phrasing

Minor style nit: “very large batch sizes” and “very large models” are slightly repetitive. If you care about polish, consider alternatives like “extremely large batch sizes” or “models that don’t fit on a single GPU.”

Also applies to: 295-299

docs/index.md (1)

36-36: Optional: Address markdownlint MD036/MD046 warnings.

If you care about a clean markdownlint run, consider:

  • Turning **YAML configuration for PyTorch Lightning experiments** into a proper heading (e.g., ## YAML configuration...) to satisfy MD036.
  • Ensuring all code blocks are fenced rather than indented (MD046), especially around the Python examples in the comparison tabs.

Also applies to: 167-217

projects/cifar10/model.py (1)

13-41: Guard against missing criterion in CIFAR10Model.

Both training_step and validation_step assume self.criterion is callable; if a config omits it, users will get a vague NoneType is not callable error. Consider either:

  • Enforcing it explicitly:
if self.criterion is None:
    raise RuntimeError("CIFAR10Model requires `criterion` to be configured in the YAML (e.g. CrossEntropyLoss).")
loss = self.criterion(pred, y)

or

  • Falling back to a sensible default loss when criterion is None.

This keeps the example more robust and beginner‑friendly.

src/lighter/callbacks/freezer.py (1)

60-81: Freezer behavior looks correct; consider loosening type hints and tightening logging.

The freezing/unfreezing logic (including until_step/until_epoch and the names/name_starts_with + exception handling) looks consistent and matches the described behavior.

Two small polish ideas:

  • on_train_batch_start currently annotates pl_module: LighterModule, but _set_model_requires_grad already works with any torch.nn.Module. You could widen the annotation (and docstring) to LightningModule | Module to avoid suggesting that Freezer only works with LighterModule.
  • In _set_model_requires_grad, you append exception parameters to unfrozen_layers when requires_grad is False, but only log unfrozen_layers when requires_grad is True. Either skip appending them in the freezing path or log them in both cases for clearer diagnostics.

These don’t affect correctness but would make the callback a bit clearer to users.

Also applies to: 89-147
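To illustrate the second point, here is a sketch of a requires-grad setter that reports the exception parameters in both the freeze and unfreeze paths (the function name and signature are hypothetical, not Lighter's exact `_set_model_requires_grad`):

```python
import logging

logger = logging.getLogger(__name__)

def set_requires_grad(named_params, requires_grad, exceptions):
    """Toggle requires_grad on all parameters except the named exceptions,
    logging the skipped parameters regardless of direction so freeze and
    unfreeze runs produce symmetric diagnostics."""
    skipped = []
    for name, param in named_params:
        if name in exceptions:
            skipped.append(name)
            continue  # leave exception parameters untouched
        param.requires_grad = requires_grad
    action = "unfreezing" if requires_grad else "freezing"
    logger.info("Skipped %d exception parameter(s) while %s: %s", len(skipped), action, skipped)
    return skipped
```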

src/lighter/engine/runner.py (2)

52-69: Handle auto-discovery failures from ProjectImporter more defensively.

ProjectImporter.auto_discover_and_import() treats the presence of __lighter__.py as a signal to import the current working directory as a project package, but import_module_from_path requires an __init__.py and will raise if it is missing. Right now that exception will bubble up and break the run.

Consider catching FileNotFoundError / ModuleNotFoundError here, logging a warning, and returning False so users with a marker file but no package scaffold get a clear message instead of a hard failure.
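A minimal sketch of that defensive wrapper, where `import_fn` stands in for `import_module_from_path` (the helper name and message wording are illustrative, not Lighter's actual implementation):

```python
import logging

logger = logging.getLogger(__name__)

def safe_auto_discover(project_dir, import_fn):
    """Attempt marker-based project import, degrading to a warning.

    import_fn raises when the marker directory is not a proper package
    (e.g. missing __init__.py); catching those errors here turns a hard
    failure into an actionable message and a False return value.
    """
    try:
        import_fn("project", project_dir)
        return True
    except (FileNotFoundError, ModuleNotFoundError) as exc:
        logger.warning(
            "Found __lighter__.py in %s but could not import it as a package "
            "(is __init__.py missing?): %s", project_dir, exc,
        )
        return False
```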


31-38: Optional: Avoid output-dir collisions within the same second.

OutputDir.create_timestamped() uses YYYY-MM-DD/HH-MM-SS as the directory path. If multiple runs start within the same second, they’ll share the same directory. If you want strict per-run isolation, you could append a short random suffix or a monotonic counter to the timestamp component.
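One way to get that isolation, sketched here as an illustrative variant of `OutputDir.create_timestamped()` rather than the shipped code:

```python
import secrets
from datetime import datetime
from pathlib import Path

def timestamped_dir(root: Path) -> Path:
    """Create a per-run output directory that stays unique even when
    several runs start within the same second, by appending a short
    random suffix to the timestamp component."""
    stamp = datetime.now().strftime("%Y-%m-%d/%H-%M-%S")
    path = root / f"{stamp}-{secrets.token_hex(3)}"
    # exist_ok=False surfaces any residual collision instead of silently sharing
    path.mkdir(parents=True, exist_ok=False)
    return path
```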

src/lighter/callbacks/file_writer.py (1)

153-161: Adjust loguru message formatting for the empty-values debug log.

The debug call

logger.debug("FileWriter value key '%s' yielded no samples; skipping batch", self.value_key)

uses %s-style formatting, but loguru expects {} placeholders or f-strings, so self.value_key won’t be interpolated. You could switch to either:

logger.debug("FileWriter value key '{}' yielded no samples; skipping batch", self.value_key)
# or
logger.debug(f"FileWriter value key '{self.value_key}' yielded no samples; skipping batch")

for clearer logs.

src/lighter/model.py (2)

81-127: Constructor and _prepare_metrics look correct; optional centralization of metric conversion

Initializing network, optional criterion/optimizer/scheduler, and validating metrics via _prepare_metrics is type-safe and plays well with torchmetrics and Lightning module registration.

If you later want to avoid duplicating the “wrap into MetricCollection” logic that also exists in lighter.utils.containers.Metrics._convert_to_collection, you could centralize that conversion in a shared helper or accept the Metrics container directly here—but that’s a minor, optional refactor, not a blocker.


385-425: configure_optimizers and mode property behavior

configure_optimizers enforcing that an optimizer is provided (via a clear ValueError) is reasonable for training-centric usage, and the dictionary shape you return is compatible with Lightning’s optimizer configuration patterns. The mode property nicely wraps the trainer’s state flags, including mapping sanity checks to validation mode, which keeps log prefixes uniform across hooks.

One thing to double-check: if you expect some users to run pure evaluation or prediction with LighterModule instances that don’t configure an optimizer, you may want to allow self.optimizer to be None and have configure_optimizers return None in that case, since Lightning accepts that for optimizer-free loops. Whether that’s desirable depends on how tightly you want to couple LighterModule to training.
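If you go that route, the change is small. A minimal sketch of the relaxed behavior (illustrative only; the shipped LighterModule raises a ValueError instead, and the `lr_scheduler` wrapping here assumes Lightning's dictionary format):

```python
class EvalFriendlyModule:
    """Minimal LighterModule-like sketch whose configure_optimizers
    tolerates a missing optimizer for evaluation/prediction-only runs."""

    def __init__(self, optimizer=None, scheduler=None):
        self.optimizer = optimizer
        self.scheduler = scheduler

    def configure_optimizers(self):
        if self.optimizer is None:
            # Lightning accepts None for optimizer-free loops
            # (validate/test/predict), so pure inference keeps working.
            return None
        config = {"optimizer": self.optimizer}
        if self.scheduler is not None:
            config["lr_scheduler"] = {"scheduler": self.scheduler}
        return config
```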

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4f73035 and 926ff14.

⛔ Files ignored due to path filters (6)
  • assets/images/coverage.svg is excluded by !**/*.svg
  • assets/images/features_dark.png is excluded by !**/*.png
  • assets/images/features_light.png is excluded by !**/*.png
  • docs/assets/images/overview_all.png is excluded by !**/*.png
  • docs/assets/images/overview_system.png is excluded by !**/*.png
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (80)
  • .github/actions/setup/action.yml (1 hunks)
  • .github/scripts/test_summary.py (1 hunks)
  • .github/workflows/check_pull_request_title.yml (1 hunks)
  • .github/workflows/ci-full.yml (1 hunks)
  • .github/workflows/ci.yml (1 hunks)
  • .github/workflows/code_quality.yml (0 hunks)
  • .github/workflows/dependency-review.yml (1 hunks)
  • .github/workflows/docs-publish.yml (1 hunks)
  • .github/workflows/publish.yml (3 hunks)
  • .github/workflows/release.yml (1 hunks)
  • .github/workflows/tests.yml (0 hunks)
  • .gitignore (1 hunks)
  • README.md (2 hunks)
  • docs/design/adapters.md (0 hunks)
  • docs/design/overview.md (0 hunks)
  • docs/design/philosophy.md (0 hunks)
  • docs/design/system.md (0 hunks)
  • docs/examples/image-classification.md (1 hunks)
  • docs/examples/multi-gpu.md (1 hunks)
  • docs/faq.md (1 hunks)
  • docs/guides/best-practices.md (1 hunks)
  • docs/guides/configuration.md (1 hunks)
  • docs/guides/custom-code.md (1 hunks)
  • docs/guides/lighter-module.md (1 hunks)
  • docs/guides/lightning-module.md (1 hunks)
  • docs/guides/training.md (1 hunks)
  • docs/how-to/adapters.md (0 hunks)
  • docs/how-to/configuration.md (0 hunks)
  • docs/how-to/experiment_tracking.md (0 hunks)
  • docs/how-to/freezers.md (0 hunks)
  • docs/how-to/inferers.md (0 hunks)
  • docs/how-to/metrics.md (0 hunks)
  • docs/how-to/project_module.md (0 hunks)
  • docs/how-to/recipes.md (0 hunks)
  • docs/how-to/run.md (0 hunks)
  • docs/how-to/troubleshooting.md (0 hunks)
  • docs/how-to/writers.md (0 hunks)
  • docs/index.md (1 hunks)
  • docs/migration/from-pytorch-lightning.md (0 hunks)
  • docs/quickstart.md (1 hunks)
  • docs/reference/cli.md (1 hunks)
  • docs/tutorials/get-started.md (0 hunks)
  • mkdocs.yml (3 hunks)
  • projects/cifar10/__lighter__.py (1 hunks)
  • projects/cifar10/configs/example.yaml (1 hunks)
  • projects/cifar10/configs/example_autodiscovery.yaml (1 hunks)
  • projects/cifar10/configs/example_overrides.yaml (1 hunks)
  • projects/cifar10/experiments/example.yaml (0 hunks)
  • projects/cifar10/experiments/example_overrides.yaml (0 hunks)
  • projects/cifar10/model.py (1 hunks)
  • pyproject.toml (2 hunks)
  • src/lighter/__init__.py (1 hunks)
  • src/lighter/adapters.py (0 hunks)
  • src/lighter/callbacks/__init__.py (1 hunks)
  • src/lighter/callbacks/base_writer.py (1 hunks)
  • src/lighter/callbacks/csv_writer.py (1 hunks)
  • src/lighter/callbacks/file_writer.py (1 hunks)
  • src/lighter/callbacks/freezer.py (2 hunks)
  • src/lighter/callbacks/utils.py (0 hunks)
  • src/lighter/callbacks/writer/base.py (0 hunks)
  • src/lighter/callbacks/writer/file.py (0 hunks)
  • src/lighter/callbacks/writer/table.py (0 hunks)
  • src/lighter/data.py (1 hunks)
  • src/lighter/engine/runner.py (4 hunks)
  • src/lighter/engine/schema.py (0 hunks)
  • src/lighter/model.py (1 hunks)
  • src/lighter/system.py (0 hunks)
  • src/lighter/utils/data.py (2 hunks)
  • src/lighter/utils/dynamic_imports.py (1 hunks)
  • src/lighter/utils/logging.py (2 hunks)
  • src/lighter/utils/misc.py (1 hunks)
  • src/lighter/utils/model.py (0 hunks)
  • src/lighter/utils/patches.py (0 hunks)
  • src/lighter/utils/types/containers.py (0 hunks)
  • src/lighter/utils/types/enums.py (0 hunks)
  • tests/configs/test1.yaml (0 hunks)
  • tests/conftest.py (1 hunks)
  • tests/fixtures/__init__.py (1 hunks)
  • tests/fixtures/plain_lightning_modules.py (1 hunks)
  • tests/integration/test_cifar.py (1 hunks)
💤 Files with no reviewable changes (33)
  • tests/configs/test1.yaml
  • docs/how-to/metrics.md
  • docs/how-to/project_module.md
  • docs/tutorials/get-started.md
  • docs/how-to/adapters.md
  • docs/design/adapters.md
  • docs/design/overview.md
  • src/lighter/system.py
  • docs/how-to/writers.md
  • src/lighter/callbacks/writer/table.py
  • src/lighter/callbacks/writer/file.py
  • docs/how-to/configuration.md
  • src/lighter/callbacks/utils.py
  • src/lighter/utils/model.py
  • docs/how-to/freezers.md
  • src/lighter/utils/patches.py
  • docs/how-to/experiment_tracking.md
  • .github/workflows/tests.yml
  • projects/cifar10/experiments/example.yaml
  • src/lighter/utils/types/containers.py
  • docs/how-to/recipes.md
  • docs/how-to/troubleshooting.md
  • docs/how-to/run.md
  • docs/design/system.md
  • docs/migration/from-pytorch-lightning.md
  • docs/how-to/inferers.md
  • src/lighter/engine/schema.py
  • docs/design/philosophy.md
  • projects/cifar10/experiments/example_overrides.yaml
  • src/lighter/callbacks/writer/base.py
  • src/lighter/utils/types/enums.py
  • .github/workflows/code_quality.yml
  • src/lighter/adapters.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-01-27T02:17:12.626Z
Learnt from: ibro45
Repo: project-lighter/lighter PR: 142
File: lighter/engine/schema.py:0-0
Timestamp: 2025-01-27T02:17:12.626Z
Learning: The adapters schema in Lighter requires:
- batch and logging keys for all modes (train, val, test, predict)
- criterion key only for train and val modes
- metrics key for train, val, and test modes but not for predict mode

Applied to files:

  • projects/cifar10/configs/example.yaml
🧬 Code graph analysis (15)
projects/cifar10/configs/example_overrides.yaml (1)
tests/unit/test_engine_config.py (1)
  • test_config_overrides (87-119)
src/lighter/utils/data.py (1)
tests/unit/test_utils_data.py (2)
  • test_collate_replace_corrupted_basic (4-27)
  • test_collate_replace_corrupted_all_corrupted (30-42)
src/lighter/callbacks/freezer.py (2)
src/lighter/model.py (1)
  • LighterModule (22-425)
tests/unit/test_callbacks_freezer.py (2)
  • test_freezer_set_model_requires_grad_with_exceptions (232-269)
  • test_freezer_functionality (84-97)
src/lighter/utils/misc.py (1)
tests/unit/test_utils_misc.py (2)
  • test_get_optimizer_stats_with_betas (143-181)
  • test_get_optimizer_stats (98-140)
tests/fixtures/plain_lightning_modules.py (1)
src/lighter/model.py (1)
  • LighterModule (22-425)
docs/guides/lighter-module.md (1)
src/lighter/system.py (1)
  • System (28-349)
tests/conftest.py (2)
tests/unit/test_model.py (1)
  • mock_trainer_state (24-34)
tests/unit/test_engine_runner.py (1)
  • mock_trainer (22-31)
src/lighter/callbacks/__init__.py (3)
src/lighter/callbacks/csv_writer.py (1)
  • CsvWriter (20-151)
src/lighter/callbacks/file_writer.py (1)
  • FileWriter (112-212)
src/lighter/callbacks/writer/table.py (2)
  • TableWriter (17-72)
  • __init__ (26-28)
src/lighter/model.py (3)
src/lighter/utils/misc.py (1)
  • get_optimizer_stats (86-131)
src/lighter/utils/types/enums.py (1)
  • Mode (21-25)
src/lighter/utils/patches.py (1)
  • PatchedModuleDict (8-99)
src/lighter/callbacks/base_writer.py (6)
src/lighter/utils/types/enums.py (1)
  • Stage (14-18)
src/lighter/callbacks/csv_writer.py (2)
  • write (76-117)
  • setup (39-49)
src/lighter/callbacks/file_writer.py (2)
  • write (153-184)
  • setup (135-151)
src/lighter/callbacks/writer/base.py (5)
  • BaseWriter (20-143)
  • setup (69-102)
  • on_predict_batch_end (104-143)
  • __init__ (33-46)
  • write (56-67)
src/lighter/callbacks/writer/table.py (1)
  • TableWriter (17-72)
src/lighter/callbacks/writer/file.py (2)
  • FileWriter (19-64)
  • writers (40-48)
src/lighter/callbacks/file_writer.py (3)
src/lighter/utils/types/enums.py (1)
  • Stage (14-18)
src/lighter/callbacks/writer/file.py (3)
  • writers (40-48)
  • FileWriter (19-64)
  • write (50-64)
src/lighter/callbacks/writer/base.py (2)
  • setup (69-102)
  • write (56-67)
src/lighter/__init__.py (3)
src/lighter/data.py (1)
  • LighterDataModule (12-110)
src/lighter/engine/runner.py (1)
  • Runner (100-248)
src/lighter/model.py (1)
  • LighterModule (22-425)
projects/cifar10/model.py (1)
src/lighter/model.py (1)
  • LighterModule (22-425)
tests/integration/test_cifar.py (2)
src/lighter/engine/runner.py (1)
  • run (107-152)
src/lighter/utils/types/enums.py (1)
  • Stage (14-18)
src/lighter/engine/runner.py (2)
src/lighter/utils/dynamic_imports.py (1)
  • import_module_from_path (82-111)
src/lighter/utils/types/enums.py (1)
  • Stage (14-18)
🪛 LanguageTool
docs/reference/cli.md

[style] ~38-~38: This adverb was used twice in the sentence. Consider removing one of them or replacing them with a synonym.
Context: ...) | No | | --weights_only BOOL | Load only weights (security option) | No | | `OVE...

(ADVERB_REPETITION_PREMIUM)


[style] ~107-~107: This adverb was used twice in the sentence. Consider removing one of them or replacing them with a synonym.
Context: ...) | No | | --weights_only BOOL | Load only weights (security option) | No | | `OVE...

(ADVERB_REPETITION_PREMIUM)


[style] ~149-~149: This adverb was used twice in the sentence. Consider removing one of them or replacing them with a synonym.
Context: ...) | No | | --weights_only BOOL | Load only weights (security option) | No | | `OVE...

(ADVERB_REPETITION_PREMIUM)


[style] ~198-~198: This adverb was used twice in the sentence. Consider removing one of them or replacing them with a synonym.
Context: ...) | No | | --weights_only BOOL | Load only weights (security option) | No | | `OVE...

(ADVERB_REPETITION_PREMIUM)

docs/quickstart.md

[style] ~384-~384: Using many exclamation marks might seem excessive (in this case: 6 exclamation marks for a text that’s 3141 characters long)
Context: ... Q: Can I use multiple GPUs? A: Yes! Just add trainer::devices=-1 (all GPU...

(EN_EXCESSIVE_EXCLAMATION)

docs/guides/lighter-module.md

[style] ~957-~957: Using many exclamation marks might seem excessive (in this case: 9 exclamation marks for a text that’s 5464 characters long)
Context: ... give you YAML configs and CLI overrides! ## Next Steps - [Lightning Module Gui...

(EN_EXCESSIVE_EXCLAMATION)

docs/faq.md

[style] ~255-~255: Using many exclamation marks might seem excessive (in this case: 6 exclamation marks for a text that’s 3714 characters long)
Context: ...5/joss.08101) ## Can I contribute? Yes! Lighter is open source: - **Report bug...

(EN_EXCESSIVE_EXCLAMATION)

docs/guides/best-practices.md

[grammar] ~515-~515: Use a hyphen to join words.
Context: ...) ### Use Structured Logging Group related metrics: python # Good - gro...

(QB_NEW_EN_HYPHEN)

docs/examples/multi-gpu.md

[style] ~160-~160: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...## Square Root Scaling Alternative for very large batch sizes: ```yaml model: optimize...

(EN_WEAK_ADJECTIVE)


[style] ~297-~297: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...FSDP (Fully Sharded Data Parallel) For very large models that don't fit on single GPU: `...

(EN_WEAK_ADJECTIVE)

🪛 markdownlint-cli2 (0.18.1)
docs/guides/training.md

49-49: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


761-761: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


782-782: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/reference/cli.md

80-80: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/guides/lightning-module.md

422-422: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


431-431: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

docs/quickstart.md

28-28: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


63-63: Code block style
Expected: fenced; Actual: indented

(MD046, code-block-style)


157-157: Code block style
Expected: fenced; Actual: indented

(MD046, code-block-style)

docs/index.md

36-36: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


167-167: Code block style
Expected: fenced; Actual: indented

(MD046, code-block-style)


217-217: Code block style
Expected: fenced; Actual: indented

(MD046, code-block-style)

docs/guides/lighter-module.md

642-642: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/examples/image-classification.md

24-24: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


246-246: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/guides/best-practices.md

15-15: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


239-239: Strong style
Expected: asterisk; Actual: underscore

(MD050, strong-style)


239-239: Strong style
Expected: asterisk; Actual: underscore

(MD050, strong-style)


356-356: Strong style
Expected: asterisk; Actual: underscore

(MD050, strong-style)


356-356: Strong style
Expected: asterisk; Actual: underscore

(MD050, strong-style)


729-729: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/guides/custom-code.md

17-17: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


38-38: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


74-74: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


409-409: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


517-517: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


636-636: Strong style
Expected: asterisk; Actual: underscore

(MD050, strong-style)


636-636: Strong style
Expected: asterisk; Actual: underscore

(MD050, strong-style)


694-694: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/guides/configuration.md

179-179: Code block style
Expected: fenced; Actual: indented

(MD046, code-block-style)

🔇 Additional comments (38)
.gitignore (1)

171-171: LGTM!

Both additions are appropriate for the architectural redesign:

  • Line 171: **/outputs/ aligns with the Runner improvements that now generate timestamped output directories. Pattern is consistent with existing artifact ignores (lightning_logs/, logs/, tensorboard/).
  • Line 176: AGENTS.md extends the existing agent-file ignore pattern (CLAUDE.md, GEMINI.md), preventing accidental commits of AI agent state.

Also applies to: 176-176

src/lighter/utils/dynamic_imports.py (1)

106-107: Good defensive programming.

This guard prevents a potential AttributeError on line 109 if spec.loader is None, providing a clearer error message instead. According to Python's importlib documentation, the loader attribute can indeed be None in certain edge cases.

src/lighter/utils/misc.py (3)

86-106: Well-documented expansion of hyperparameter extraction.

The docstring clearly describes the expanded functionality and provides helpful examples of the key naming conventions for all extracted hyperparameters.


107-115: Clean key naming convention.

The base_key + conditional suffix approach keeps keys simple for single-group optimizers while clearly distinguishing multiple groups. The learning rate extraction is straightforward.


121-125: Original review comment is incorrect—implementation matches test expectations.

Verification shows the actual test assertions in tests/unit/test_utils_misc.py expect exactly what the implementation creates:

  • Tests expect: "optimizer/Adam/beta1" and "optimizer/Adam/beta2" keys
  • Implementation creates: stats_dict[f"{base_key}/beta1{suffix}"] and stats_dict[f"{base_key}/beta2{suffix}"]

The original review comment incorrectly claimed tests expect "optimizer/Adam/momentum", but the actual test file asserts "optimizer/Adam/beta1" and "optimizer/Adam/beta2". The implementation is correct and aligns with test expectations.

Likely an incorrect or invalid review comment.

pyproject.toml (1)

69-81: Dependency and test tooling updates align with PR improvements.

The additions of types-PyYAML>=6.0.0 and pytest-github-actions-annotate-failures>=0.2.0 support better type checking for YAML configuration and improved test reporting, both consistent with the PR's refactoring goals and CI/CD improvements.

.github/workflows/dependency-review.yml (1)

1-22: LGTM!

The dependency review workflow is well-configured with appropriate permissions, timeout, and failure thresholds. The comment-summary-in-pr: on-failure setting ensures PRs are only commented on when issues are detected, reducing noise.

.github/workflows/release.yml (1)

23-29: Excellent safeguard against accidental releases.

The tag verification step ensures that only tags on the main branch trigger releases, preventing accidental releases from feature branches or forks.

.github/scripts/test_summary.py (1)

9-52: LGTM!

The test summary script correctly handles:

  • Missing test results file
  • XML parsing errors
  • Metric extraction with safe defaults
  • Appropriate exit codes for CI integration

The logic for calculating passed tests and determining status is sound.

docs/guides/best-practices.md (1)

1-1001: Comprehensive and well-structured best practices guide.

This documentation provides excellent coverage of production-ready patterns, from project structure through deployment. The examples are clear and actionable, covering critical topics like reproducibility, performance optimization, and debugging strategies.

docs/examples/image-classification.md (1)

1-712: Excellent end-to-end example with clear progression.

This documentation effectively demonstrates both LightningModule and LighterModule approaches, with complete working code and configs. The troubleshooting section and hyperparameter tuning examples are particularly valuable for users.

.github/workflows/ci-full.yml (1)

1-73: Well-structured CI workflow with good coverage.

The workflow properly:

  • Tests across multiple Python versions with fail-fast disabled
  • Generates comprehensive test reports and artifacts
  • Uses test summary script for readable output
  • Conditionally uploads coverage to Codecov only for Python 3.12

The fail_ci_if_error: true on Codecov upload means the workflow will fail if Codecov upload fails (e.g., missing token). This is a strict but defensible choice for ensuring coverage tracking works.

.github/workflows/check_pull_request_title.yml (1)

7-15: LGTM! Well-configured concurrency and permissions.

The concurrency group correctly uses the PR number for grouping, and the permissions follow the principle of least privilege with only the necessary read/write scopes.

docs/guides/custom-code.md (1)

1-827: Excellent comprehensive guide!

This custom code guide is thorough, well-structured, and provides practical examples covering the full range of use cases. The troubleshooting section effectively addresses common pitfalls, and the progressive examples (from simple to complete) make it easy to follow.

projects/cifar10/__lighter__.py (1)

1-3: LGTM! Clear marker file for autodiscovery.

The comments effectively explain the purpose and usage pattern for the __lighter__.py marker file.

projects/cifar10/configs/example_overrides.yaml (1)

1-2: LGTM! Clear example overrides.

The override syntax correctly demonstrates the :: path navigation pattern documented in the CLI reference.

src/lighter/utils/logging.py (1)

10-10: LGTM! Improved type annotations.

The addition of explicit dict[str, Any] typing improves code clarity and type checking without changing behavior.

Also applies to: 47-47

src/lighter/callbacks/__init__.py (1)

1-5: Review comment partially unverifiable; CsvWriter documentation gap identified.

The verification revealed inconclusive findings regarding the original review comment:

  • TableWriter: No references found in the codebase, making the removal claim unverifiable. TableWriter may have existed in a prior version, but it's absent from the current searchable repository.

  • CsvWriter: Confirmed to exist in src/lighter/callbacks/csv_writer.py with comprehensive implementation and unit tests. However, CsvWriter has no documentation in the docs/ folder, unlike FileWriter which is documented in docs/guides/training.md and docs/reference/cli.md.

  • Migration guide: No MIGRATION.md, CHANGELOG.md, or v3.0 breaking changes documentation was found.

Key gap: CsvWriter is exported in the public API (__all__) but lacks corresponding documentation. FileWriter has usage examples, but CsvWriter does not.

.github/workflows/docs-publish.yml (1)

14-22: Verification complete—setup action correctly handles doc dependencies.

The setup action exists at .github/actions/setup/action.yml and properly implements the install-groups parameter by passing it to uv sync. The workflow's --only-group doc parameter is valid uv syntax, and the 'doc' dependency group is properly defined in pyproject.toml with mkdocs packages. The refactoring is sound.

docs/guides/configuration.md (1)

1-580: LGTM! Comprehensive and accurate configuration documentation.

The configuration guide is thorough, well-structured, and provides clear examples of all essential Sparkwheel symbols. The distinction between @ (lazy/resolved) and % (eager/raw) references is particularly well explained, which is critical for correct metric configuration.

.github/actions/setup/action.yml (1)

1-28: LGTM! Well-structured composite action.

The setup action follows GitHub Actions best practices:

  • Clear input definitions with sensible defaults
  • Caching enabled for uv packages
  • Conditional dependency installation for flexibility

The string comparison inputs.install-deps == 'true' is correct for GitHub Actions (inputs are always strings).

.github/workflows/publish.yml (3)

9-11: LGTM! Concurrency control added.

The concurrency configuration prevents multiple publish workflows from running simultaneously, which is appropriate for package publishing to avoid race conditions.


27-30: LGTM! Setup action usage is correct.

Using the new composite setup action with install-deps: 'false' is appropriate here since uv build doesn't require project dependencies to be installed - only the build tool itself.


43-47: LGTM! Artifact handling improves workflow reliability.

Uploading the dist directory as an artifact allows the publish job to run in a separate environment, improving isolation and reproducibility.

src/lighter/__init__.py (1)

11-15: LGTM! Public API correctly updated.

The exports correctly reflect the architectural changes:

  • ✅ Added LighterDataModule and LighterModule (new abstractions)
  • ✅ Removed System (breaking change as documented)
  • ✅ Retained Runner

Note: The version remains "0.0.5" but the PR objectives mention v3.0 with breaking changes. This may be addressed in a separate version bump commit as noted in the migration checklist.

mkdocs.yml (3)

20-28: LGTM! Enhanced navigation features improve UX.

The additional Material theme features provide a better documentation experience:

  • navigation.instant - SPA-like page loading
  • navigation.tracking - URL updates as user scrolls
  • search.suggest and search.highlight - Better search experience
  • content.tabs.link - Synchronized tabs across pages

These align well with the comprehensive documentation additions in this PR.


72-88: LGTM! Navigation restructure is clear and task-oriented.

The new navigation structure (Quick Start → Guides → Examples → Reference → FAQ) provides a logical learning path that aligns with the documented user journey in the PR objectives.


111-111: LGTM! Tables extension added for documentation needs.

The tables markdown extension is needed for reference tables in the new documentation (e.g., the quick reference table in the configuration guide).

projects/cifar10/configs/example_autodiscovery.yaml (1)

1-100: Otherwise, LGTM! Comprehensive CIFAR-10 autodiscovery example.

The configuration demonstrates key Lighter features:

  • ✅ Autodiscovery with __lighter__.py marker
  • ✅ Project references (project.model.*)
  • ✅ Proper metric configuration with MetricCollection
  • ✅ Reference reuse for dataloaders
  • ✅ Complete train/val/test/predict pipeline
docs/guides/lighter-module.md (1)

1-1013: LGTM! Comprehensive and accurate LighterModule guide.

Excellent documentation that covers:

  • ✅ Clear comparison with LightningModule
  • ✅ Multiple complete, runnable examples
  • ✅ Accurate API usage matching src/lighter/model.py
  • ✅ Common patterns and best practices
  • ✅ Integration with Lightning features (hooks, schedulers, mixed precision)

The automatic logging explanation (loss, metrics, optimizer stats) aligns perfectly with the implementation in the relevant code snippets.

projects/cifar10/configs/example.yaml (1)

1-98: CIFAR-10 example config matches the new schema and looks consistent

Top-level trainer, model, and data blocks are wired coherently, and using %::train_metrics plus %data::val_dataloader keeps metrics and dataloaders DRY while avoiding shared mutable state. I don’t see anything that needs changing here.

tests/fixtures/plain_lightning_modules.py (1)

1-118: Fixtures exercise key Lightning and LighterModule patterns cleanly

The dataset and three modules are minimal but well-structured for testing: batch shapes match the linear layers, loss usage is correct, and the dict-based outputs in MyLighterModule should integrate smoothly with Lighter’s step/output normalization. No changes needed.

docs/faq.md (1)

1-269: FAQ examples and reference semantics align with the new configuration model

The Q&A and examples around :: paths plus @ vs % references are consistent with the updated config behavior (e.g., cloning metrics vs reusing instantiated objects), and I don’t see any inaccuracies that need code or docs changes here.

README.md (1)

116-168: README examples line up well with the new LighterModule/LighterDataModule APIs

The LighterModule YAML and usage examples here match the new configuration patterns (network/criterion/optimizer/metrics) and the LighterDataModule wrapper shown elsewhere. This section does a good job demonstrating the intended “minimal boilerplate” workflow.

src/lighter/callbacks/file_writer.py (1)

21-110: Registry + FileWriter design looks solid.

The writer registry and the FileWriter implementation are clean and composable: resolving writers by name, handling tensors/CPU transfer, and using (global_rank, world_size) to compute unique per-sample indices is a nice fit for DDP prediction workflows. I don’t see functional issues in this core path.

Also applies to: 112-185

src/lighter/model.py (3)

1-79: Clear, user-focused API documentation for LighterModule

The top-level and class docstrings (including the example) make the responsibilities and return contract of subclasses very explicit and match the v3 “user control + automatic logging” goals. No changes needed here.


225-291: Batch-end hooks and _normalize_output enforce a clean step-output contract

Routing on_*_batch_end through _on_batch_end and _normalize_output gives you a single place to enforce the “Tensor or dict with optional structured loss” contract, and the 'loss'-dict 'total' check is a helpful guardrail for multi-component losses. This design keeps callbacks and logging predictable across train/val/test while intentionally skipping predict mode.


293-369: Logging helpers correctly implement dual step/epoch logging and optimizer stats

The _log_loss, _log_metrics, and _log_optimizer_stats helpers, together with _log, implement a consistent .../step vs .../epoch naming scheme and DDP-friendly sync_dist settings. Logging optimizer stats only once per epoch at batch_idx == 0 is a good compromise between insight and overhead. I don’t see any functional issues here.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
.codecov.yml (2)

39-39: Expand docs ignore pattern for consistency.

The pattern "docs/*" only matches top-level files in the docs/ directory and won't ignore nested documentation files. For consistency with the test pattern on line 37 ("tests/**/*"), use "docs/**/*" to ignore all documentation paths recursively.

- - "docs/*"            # Documentation
+ - "docs/**/*"         # Documentation

14-16: Clarify comment on threshold behavior.

The comment on line 14 states "only fail if below target," but line 16 sets informational: false, which means the check will fail if below the 90% target. The threshold: null setting is correct—it means there's no threshold modifier beyond the target itself. Clarify the comment to avoid confusion:

- threshold: null       # No threshold - only fail if below target
+ threshold: null       # No additional threshold; fail check if below target
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 926ff14 and d854e52.

📒 Files selected for processing (1)
  • .codecov.yml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: types
  • GitHub Check: Tests (Python 3.12)
🔇 Additional comments (1)
.codecov.yml (1)

4-49: Coverage configuration aligns well with redesign objectives.

The configuration sets appropriate targets (90% project, 80% patch) given the substantial refactor outlined in the PR. Ignoring test files, docs, cache, and config fixtures is sensible. GitHub Checks annotations will provide clear feedback on changed files.

Aside from the minor nitpicks noted above, the overall structure and values are sound for tracking coverage through this major redesign.

Since the PR mentions new CI workflows uploading coverage, verify that your .github/workflows/ci.yml and .github/workflows/ci-full.yml are properly configured to collect and upload coverage reports so Codecov has data to process against this configuration.

ibro45 and others added 3 commits November 27, 2025 12:53
Configuration improvements:
- Clarify codecov threshold comment for better understanding
- Use wildcard pattern for docs coverage ignore (docs/**/* instead of docs/*)
- Add test artifact ignores (config.yaml files in root and project directories)

These changes improve coverage reporting accuracy and prevent test-generated
configuration files from being accidentally committed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Major improvements to Runner class:
- Remove OutputDir class; save config directly to trainer.log_dir instead
- Add _save_config() method that integrates with Lightning's logging system
- Improve workflow documentation with detailed step-by-step docstrings
- Refactor CLI to use shared add_common_args() function (DRY principle)
- Change CLI boolean flags to action="store_true" for better UX
- Remove unused datetime and dataclass imports
- Remove config parameter from _execute() (no longer needed)

Benefits:
- Simpler architecture: leverages PyTorch Lightning's built-in directory management
- Better integration: config.yaml appears alongside checkpoints in version directories
- Cleaner CLI: --verbose instead of --verbose True
- More maintainable: shared argument definitions prevent drift

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Add safety mechanism to prevent infinite loops when replacement samples
are also corrupted:
- Add max_retries parameter (default: 100) to limit retry iterations
- Raise RuntimeError with detailed message if limit is exceeded
- Track iteration count separately from corrupted sample count
- Update docstring with parameter documentation and exception info

This prevents the collate function from hanging indefinitely when datasets
have high corruption rates or systematic corruption issues.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
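The bounded-retry collate behavior described in the commit above can be sketched as follows (names are hypothetical; fetch_replacement stands in for whatever callable supplies a replacement sample):

```python
def collate_replace_corrupted(batch, fetch_replacement, max_retries: int = 100):
    """Drop corrupted (None) samples and fetch replacements.

    The iteration count is tracked separately from the corrupted-sample
    count and capped by max_retries, so a dataset whose replacements are
    also corrupted raises instead of looping forever.
    """
    valid = [s for s in batch if s is not None]
    needed = len(batch) - len(valid)
    retries = 0
    while needed > 0:
        if retries >= max_retries:
            raise RuntimeError(
                f"Exceeded {max_retries} retries while replacing corrupted samples; "
                "the dataset may have a high or systematic corruption rate."
            )
        replacement = fetch_replacement()
        retries += 1
        if replacement is not None:
            valid.append(replacement)
            needed -= 1
    return valid
```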
ibro45 and others added 3 commits December 1, 2025 16:12
- Remove projects folder restrictions (now tracked)
- Add .claude/, lightning_logs/, .datasets/, *.zip to ignores

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Register the module in sys.modules before calling exec_module to
properly handle circular imports within dynamically loaded modules.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
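The register-before-exec pattern referenced in the commit above follows the standard importlib recipe. A minimal sketch (function name is illustrative, not the actual Lighter helper):

```python
import importlib.util
import sys


def import_module_from_path(module_name: str, file_path: str):
    """Load a module from a source file.

    The module is registered in sys.modules *before* exec_module runs,
    so circular imports inside the module can resolve it by name.
    On failure, the half-initialized entry is removed again.
    """
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot create an import spec for {file_path}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module  # register first
    try:
        spec.loader.exec_module(module)
    except Exception:
        del sys.modules[module_name]
        raise
    return module
```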
…pickler

Rewrites dynamic_imports module to work with PyTorch DataLoader using
spawn multiprocessing strategy (num_workers > 0). The key insight is that
cloudpickle serializes user classes by value, but ForkingPickler has
special reducers for multiprocessing internals (pipes, queues, fds).

Changes:
- Add _HybridPickler combining ForkingPickler reducers + cloudpickle dispatch
- Add _ModuleRegistry and _DynamicModuleFinder for submodule resolution
- Remove deprecated OPTIONAL_IMPORTS pattern
- Add cloudpickle>=3.0.0 dependency

Tested with all 8 projects using spawn multiprocessing successfully.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
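The hybrid-pickler strategy in the commit above can be sketched with the stdlib alone: reuse ForkingPickler's extra reducers (pipes, queues, fds) while delegating everything else to ordinary pickling. The real implementation dispatches the remainder to cloudpickle for by-value serialization of user classes; this simplified version uses stdlib pickle instead.

```python
import copyreg
import io
import pickle
from multiprocessing.reduction import ForkingPickler


class HybridPickler(pickle.Pickler):
    """Pickler that layers ForkingPickler's multiprocessing reducers
    on top of the default dispatch table (simplified sketch)."""

    def __init__(self, file, protocol=None):
        super().__init__(file, protocol)
        # Start from the global dispatch table, then add ForkingPickler's
        # special reducers for multiprocessing internals. _extra_reducers
        # is a private CPython detail, hence the defensive getattr.
        table = copyreg.dispatch_table.copy()
        table.update(getattr(ForkingPickler, "_extra_reducers", {}))
        self.dispatch_table = table
```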
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (5)
docs/examples/image-classification.md (1)

195-215: Minor inconsistency: Missing download: true and p parameter across examples.

In the LightningModule example (line 198), the train dataset includes download: true. In the LighterModule example (line 378), this parameter is omitted. Similarly, the RandomHorizontalFlip at line 207–208 explicitly sets p: 0.5, while line 388 omits it (relying on default).

These are functionally safe (defaults work fine), but the inconsistency may confuse readers trying to understand best practices. Consider making both examples consistent.

If harmonizing, add explicit parameters to the LighterModule section:

   dataset:
     _target_: torchvision.datasets.CIFAR10
     root: ./data
     train: true
+    download: true
     transform:
       _target_: torchvision.transforms.Compose
       transforms:
         - _target_: torchvision.transforms.RandomCrop
           size: 32
           padding: 4
         - _target_: torchvision.transforms.RandomHorizontalFlip
+          p: 0.5

Also applies to: 378-393

docs/guides/configuration.md (1)

17-17: Add language specifiers to fenced code blocks.

Code blocks should declare their language for proper syntax highlighting. Update lines 17, 38, 74, 409, 519 (and similar patterns throughout) to specify the language:

-```
+```yaml

This improves documentation rendering and code highlighting across platforms.

Also applies to: 38-38, 74-74, 409-409, 519-519

docs/guides/best-practices.md (1)

17-17: Add language specifiers to fenced code blocks.

Consistent with the configuration guide, specify language on all code blocks for proper syntax highlighting:

-```
+```yaml

or

-```
+```python

This pattern appears throughout the file (lines 17, 38, 74, 409, 519, 696).

Also applies to: 38-38, 74-74, 409-409, 519-519, 696-696

docs/guides/custom-code.md (1)

17-17: Add language specifiers to fenced code blocks.

Consistent with other guides, add language specifiers to all code blocks (lines 17, 38, 74, 409, 519, 638, 696):

-```
+```python

or

-```
+```yaml

This ensures consistent syntax highlighting across the documentation.

Also applies to: 38-38, 74-74, 409-409, 519-519, 638-638, 696-696

src/lighter/utils/dynamic_imports.py (1)

152-192: Consider the log level for already-imported modules.

The function implementation is correct and handles all edge cases appropriately. However, line 171 logs a warning when a module is already imported. If this function is called multiple times intentionally (e.g., in testing or initialization checks), the warning could be noisy. Consider using logger.debug instead of logger.warning for this case.

Apply this diff to adjust the log level:

-        logger.warning(f"Module '{module_name}' already imported, skipping.")
+        logger.debug(f"Module '{module_name}' already imported, skipping.")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 78733a4 and 4d1c5ff.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
  • .gitignore (1 hunks)
  • docs/examples/image-classification.md (1 hunks)
  • docs/guides/best-practices.md (1 hunks)
  • docs/guides/configuration.md (1 hunks)
  • docs/guides/custom-code.md (1 hunks)
  • pyproject.toml (2 hunks)
  • src/lighter/utils/dynamic_imports.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • .gitignore
  • pyproject.toml
🧰 Additional context used
🪛 LanguageTool
docs/guides/best-practices.md

[grammar] ~525-~525: Use a hyphen to join words.
Context: ...) ### Use Structured Logging Group related metrics: python # Good - gro...

(QB_NEW_EN_HYPHEN)

🪛 markdownlint-cli2 (0.18.1)
docs/examples/image-classification.md

24-24: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


246-246: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/guides/best-practices.md

17-17: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


38-38: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


74-74: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


409-409: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


519-519: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


638-638: Strong style
Expected: asterisk; Actual: underscore

(MD050, strong-style)


638-638: Strong style
Expected: asterisk; Actual: underscore

(MD050, strong-style)


696-696: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/guides/configuration.md

17-17: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


38-38: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


74-74: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


409-409: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


519-519: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/guides/custom-code.md

17-17: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


38-38: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


74-74: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


409-409: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


519-519: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


638-638: Strong style
Expected: asterisk; Actual: underscore

(MD050, strong-style)


638-638: Strong style
Expected: asterisk; Actual: underscore

(MD050, strong-style)


696-696: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (11)
docs/examples/image-classification.md (2)

1-420: Excellent comprehensive example aligning with PR redesign objectives.

The example demonstrates both traditional LightningModule and new LighterModule approaches end-to-end, with clear narrative progression. Configuration examples correctly use the new project.models.* pattern with __lighter__.py marker-based discovery, and demonstrate proper use of Lighter's YAML-driven config system. Code snippets are syntactically sound and best-practice idiomatic. Troubleshooting section and "Next Steps" add practical value. Well-organized for learners at multiple levels.


594-594: Verify GitHub repository reference is correct.

Line 594 references the Lighter GitHub repository location for the complete CIFAR-10 example. Confirm this path exists on the target branch (main) and that the example code is actually committed there, or update the reference.

docs/guides/configuration.md (1)

1-590: Excellent comprehensive guide with clear structure and accurate examples.

The guide effectively teaches Sparkwheel's 5 essential symbols, progresses logically from basics to advanced patterns, and includes critical safety guidance (metrics via % not @). Examples are concrete, the distinction between :: (config navigation) and . (Python attributes) is well-emphasized, and the quick reference table is a valuable resource. Aligns well with PR objectives for Sparkwheel-based configuration.

docs/guides/best-practices.md (1)

1-1010: Excellent battle-tested best practices guide covering full ML lifecycle.

This guide is comprehensive and practical, covering development, training, debugging, reproducibility, testing, performance optimization, and production deployment. Project structure guidance correctly highlights __lighter__.py marker and demonstrates config composition via base+experiment pattern. Logging section properly emphasizes dual logging (step + epoch) which pairs well with LighterModule. Safety guidance is explicit (metrics via % not @, .eval() for inference, batch_size in logging). Checklists are valuable for preventing oversights. The guide bridges configuration system and practical implementations effectively.

docs/guides/custom-code.md (1)

1-828: Excellent comprehensive guide for custom code integration with marker-based auto-discovery.

The guide clearly explains the __lighter__.py marker pattern central to the PR's redesign, progressing from quick examples to complete working projects. Project structure guidance is explicit about __init__.py requirements. Custom component examples (models, datasets, transforms, LightningModule) are realistic and production-grade. Troubleshooting section effectively addresses common pain points (ModuleNotFoundError, ImportError, AttributeError) with clear root causes and solutions. Common patterns (separation of concerns, shared base classes, config inheritance) demonstrate best practices. The end-to-end CIFAR-10 example ties everything together convincingly. Quick reference provides easy syntax lookup.

src/lighter/utils/dynamic_imports.py (6)

36-53: LGTM!

The registry implementation correctly maps module names to paths and provides proper prefix matching for submodule resolution.


56-93: LGTM!

The meta-path finder correctly integrates with Python's import system and properly handles both package and module resolution paths.


95-108: LGTM!

The loader correctly executes modules and registers them with cloudpickle for by-value serialization, ensuring multiprocessing compatibility.


147-149: The global patching of multiprocessing.reduction.dump is not an undocumented side effect but the core intended functionality of this module, which is fully explained in the module docstring. The docstring explicitly describes the hybrid pickler strategy and why the multiprocessing.reduction module must be patched. The module initialization comment at line 147 ("# Module initialization: install finder and patch multiprocessing") also makes this explicit. No changes are needed.

Likely an incorrect or invalid review comment.


30-30: The cloudpickle dependency is already appropriately constrained (cloudpickle>=3.0.0 in pyproject.toml), which is compatible with Python 3.10+ and multiprocessing spawn mode. No version compatibility issues exist.


128-139: Acknowledged design trade-off: private attribute access is protected with defensive fallback.

The _extra_reducers attribute access on line 132 is already documented as a runtime-only private detail in the code comment (line 131). The implementation mitigates fragility by using getattr with a safe default fallback. The CloudPickler instantiation on line 138 appears intentional for maintaining independent pickler state rather than a performance concern. This hybrid approach is a justified design trade-off to preserve ForkingPickler's multiprocessing internal reducers while enabling cloudpickle's by-value serialization for dynamic imports.

Comment on lines +24 to +35
```
cifar10/
├── __lighter__.py # Marker file (enables project.* imports)
├── __init__.py
├── models.py # Model definitions
├── data.py # Data utilities (optional)
├── configs/
│ ├── resnet18.yaml # ResNet-18 config
│ ├── resnet50.yaml # ResNet-50 config
│ └── efficientnet.yaml # EfficientNet config
└── outputs/ # Generated by Lighter
```

⚠️ Potential issue | 🟡 Minor

Add language specification to fenced code block (directory structure).

Code fences should explicitly specify a language. The directory structure block is text/plaintext.

Apply this fix:

-```
+```text
 cifar10/
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

24-24: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In docs/examples/image-classification.md around lines 24 to 35 the fenced code
block showing the directory structure lacks a language specifier; change the
opening fence from ``` to ```text (or ```plaintext) so the block is explicitly
marked as plain text, leaving the content unchanged.

Comment on lines +246 to +252
```
Epoch 0: 100%|████████| 391/391 [00:45<00:00, 8.58it/s, loss=1.82, train/acc=0.335, v_num=0]
Validation: 100%|████████| 79/79 [00:05<00:00, 14.23it/s]
Epoch 1: 100%|████████| 391/391 [00:44<00:00, 8.76it/s, loss=1.45, train/acc=0.472, v_num=0]
...
Epoch 99: 100%|████████| 391/391 [00:43<00:00, 8.98it/s, loss=0.23, train/acc=0.921, v_num=0]
```

⚠️ Potential issue | 🟡 Minor

Add language specification to fenced code block (expected output).

The expected output block should specify a language (typically text or bash for terminal output).

Apply this fix:

-```
+```text
 Epoch 0: 100%|████████| 391/391 [00:45<00:00, 8.58it/s, loss=1.82, train/acc=0.335, v_num=0]
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

246-246: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In docs/examples/image-classification.md around lines 246 to 252, the fenced
code block showing expected terminal output lacks a language tag; update the
opening triple backticks to include a language such as text (e.g., ```text) so
the block is rendered with correct syntax/formatting for plain terminal output.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
pyproject.toml (1)

84-106: Mypy configuration change—relax type-checking severity.

Line 92 previously had disallow_any_generics = true (removed). This relaxation acknowledges the complexity of type-hinting the new dynamic module loading patterns and configuration system but may hide legitimate type issues. Consider documenting why this strictness was removed and whether specific modules should retain stricter checking.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4d1c5ff and 0dd9d38.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (1)
  • pyproject.toml (2 hunks)
🔇 Additional comments (3)
pyproject.toml (3)

41-41: New cloudpickle dependency is appropriate.

The addition of cloudpickle>=3.0.0 aligns with the architectural refactor (particularly the ProjectImporter and dynamic module loading patterns mentioned in the PR) where native pickling support for dynamic classes and functions is beneficial.


73-73: Type stubs addition is sound.

Adding types-PyYAML>=6.0.0 to the types group provides proper type hints for YAML operations, supporting the stricter mypy checks enabled elsewhere in the config.


81-81: CI/test integration dependency is reasonable.

The addition of pytest-github-actions-annotate-failures>=0.2.0 improves test reporting in GitHub Actions, aligning with the expanded test infrastructure present in this PR.

… in docstrings; update Freezer to support LightningModule and LighterModule out-of-the-box
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
src/lighter/callbacks/file_writer.py (3)

21-66: WriterRegistry API is clean; consider exposing available writers for diagnostics

The WriterRegistry implementation (decorator-based register with duplicate-name protection and get with a clear error) looks good and matches the intended usage in FileWriter.

One small enhancement you might consider: add a keys() method or include the list of available writers in the get() error message. Currently, when a misconfigured writer name is used, the error message only states the name is not registered without showing the available options (tensor, image_2d, image_3d, text), making it harder for users to diagnose the issue.
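The suggested enhancement can be sketched as follows. This is an illustrative registry, not the actual Lighter class; the point is that get() lists the available names in its error message:

```python
class WriterRegistry:
    """Decorator-based writer registry with diagnostic lookup errors."""

    def __init__(self):
        self._writers = {}

    def register(self, name):
        def decorator(fn):
            if name in self._writers:
                raise ValueError(f"Writer '{name}' is already registered.")
            self._writers[name] = fn
            return fn
        return decorator

    def keys(self):
        return sorted(self._writers)

    def get(self, name):
        if name not in self._writers:
            # Listing the registered names makes misconfiguration
            # immediately diagnosable from the traceback alone.
            raise ValueError(f"Writer '{name}' is not registered. Available: {self.keys()}")
        return self._writers[name]
```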


71-110: Document CPU tensor expectation and ensure safe reshaping in writer functions

The built-in writers are straightforward implementations, but two minor improvements would enhance robustness:

  • write_tensor and the image writers assume CPU tensors. While FileWriter._prepare_value calls detach().cpu() in normal callback flow, direct external use of these functions could receive GPU tensors. A docstring note would clarify the expectation (e.g., "expects CPU tensors; use tensor.detach().cpu() before calling if needed").

  • write_image_3d uses tensor.view(...) which fails if the tensor is non-contiguous. Use .reshape() instead or call .contiguous() first:

     shape = tensor.shape
-    tensor = tensor.view(shape[0], shape[1] * shape[2], shape[3])
+    tensor = tensor.reshape(shape[0], shape[1] * shape[2], shape[3])

153-203: Add comprehensive tests for FileWriter edge cases and DDP indexing

The setup() and write() methods correctly implement:

  • Stage gating and directory validation
  • DDP-safe global indexing with _counter = global_rank and _step = world_size, ensuring indices interleave across ranks without collisions
  • Sequence conversion with length validation between value and name keys
  • Per-sample file writing with atomic directory creation

To strengthen test coverage, add tests covering:

  1. Edge cases: calling write() before setup(), empty sequences from value_key, and value/name length mismatches. These paths currently lack automated coverage (notably line 156).

  2. DDP simulation: Verify global indexing under multi-rank scenarios (e.g., simulate two ranks with different _counter/_step values) to prevent unintended regressions in the naming scheme during future refactoring.

src/lighter/callbacks/csv_writer.py (1)

85-127: Add debug logging to CsvWriter.write() early returns and verify test coverage for logging behavior

The code correctly handles edge cases (empty batches, missing writer, length mismatches) but has two silent early returns:

  1. Line 86–87: if self._csv_writer is None (non‑predict stages or missing setup)
  2. Line 102–103: if num_samples == 0 (no matching keys or all empty)

These would benefit from debug logging to help diagnose misconfigurations. The BaseWriter base class already imports loguru.logger, so add similar logging here.

Regarding tests: The existing test suite (tests/unit/test_callbacks_writers.py) already covers all the scenarios mentioned:

  • test_write_non_sequence_values (line 587) covers all‑scalar case
  • test_write_inconsistent_lengths (line 494) covers length mismatches
  • test_write_missing_key (line 507) covers missing keys
  • test_write_empty_batch (line 576) and test_on_predict_epoch_end_before_setup (line 618) cover the silent return paths

However, none verify that logging occurs. Adding tests that assert debug logs are emitted during the silent returns would complete the coverage picture.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0dd9d38 and 2897a57.

📒 Files selected for processing (3)
  • src/lighter/callbacks/csv_writer.py (1 hunks)
  • src/lighter/callbacks/file_writer.py (1 hunks)
  • src/lighter/callbacks/freezer.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/lighter/callbacks/freezer.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/lighter/callbacks/csv_writer.py (3)
src/lighter/callbacks/base_writer.py (3)
  • BaseWriter (13-75)
  • setup (37-55)
  • write (27-35)
src/lighter/utils/types/enums.py (1)
  • Stage (14-18)
src/lighter/callbacks/writer/table.py (3)
  • TableWriter (17-72)
  • on_predict_epoch_end (46-72)
  • write (36-44)
🪛 GitHub Check: codecov/patch
src/lighter/callbacks/csv_writer.py

[warning] 51-51: src/lighter/callbacks/csv_writer.py#L51
Added line #L51 was not covered by tests


[warning] 87-87: src/lighter/callbacks/csv_writer.py#L87
Added line #L87 was not covered by tests


[warning] 148-148: src/lighter/callbacks/csv_writer.py#L148
Added line #L148 was not covered by tests

src/lighter/callbacks/file_writer.py

[warning] 156-156: src/lighter/callbacks/file_writer.py#L156
Added line #L156 was not covered by tests

🔇 Additional comments (3)
src/lighter/callbacks/csv_writer.py (1)

128-162: No action needed — implementation and test coverage are complete

The distributed temp-file aggregation is sound. The current guard using dist.is_initialized() is the standard PyTorch pattern used consistently in similar code (e.g., file_writer.py checks trainer.world_size > 1). Adding dist.is_available() is defensive overkill since torch.distributed is imported unconditionally and dist.is_initialized() gracefully handles both single-process and multi-process contexts.

Comprehensive tests already cover all scenarios:

  • Single-process case: test_csv_writer_merge_without_dist
  • Multi-process case with concatenation and cleanup: test_csv_writer_merge_with_dist
  • Edge cases (empty ranks, None paths): test_csv_writer_handles_empty_rank and test_csv_writer_handles_none_paths
  • Line 148 (if not dfs: return) is exercised in test_csv_writer_handles_empty_rank

Likely an incorrect or invalid review comment.

src/lighter/callbacks/file_writer.py (2)

204-230: Length consistency check is already implemented; the concerns about tensor and sequence handling are well-addressed

The helper methods effectively handle tensors (detached and moved to CPU) and convert various sequence types (lists, tuples, and generic Sequence objects like range) into uniform lists. The code also correctly excludes strings, bytes, and bytearray from sequence iteration.

The length consistency check between value_key and name_key outputs is already present (raises ValueError with a clear mismatch message), so no additional validation is needed there. The support for arbitrary Sequence types beyond plain Python containers is tested and working as intended.


133-152: BaseWriter has already been refactored to a single-parameter signature—no mismatch exists.

BaseWriter.__init__ currently accepts only path (per src/lighter/callbacks/base_writer.py), and FileWriter.__init__ correctly calls super().__init__(directory) with that single argument. The constructor works as intended; no refactoring is needed.

Likely an incorrect or invalid review comment.

Comment on lines 41 to 59
def __init__(self, path: str | Path, keys: list[str]) -> None:
super().__init__(path)
self.keys = keys
self._temp_path: Path | None = None
self._csv_writer: Any = None # csv.writer type is not easily annotated
self._csv_file: TextIOWrapper | None = None

def setup(self, trainer: Trainer, pl_module: LighterModule, stage: str) -> None:
super().setup(trainer, pl_module, stage)
if stage != Stage.PREDICT:
return

# Create a temporary file for writing predictions
self._temp_path = self.path.with_suffix(f".tmp_rank{trainer.global_rank}{self.path.suffix}")
self._csv_file = open(self._temp_path, "w", newline="")
self._csv_writer = csv.writer(self._csv_file)
# Write header
self._csv_writer.writerow(self.keys)


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cd src/lighter/callbacks && pwd && ls -la

Repository: project-lighter/lighter

Length of output: 531


🏁 Script executed:

cat -n src/lighter/callbacks/base_writer.py

Repository: project-lighter/lighter

Length of output: 3324


🏁 Script executed:

cat -n src/lighter/callbacks/csv_writer.py | head -70

Repository: project-lighter/lighter

Length of output: 3011


🏁 Script executed:

rg "class.*BaseWriter|def __init__" src/lighter/callbacks/ -A 3 -B 1

Repository: project-lighter/lighter

Length of output: 2601


Remove redundant stage check in CsvWriter.setup()

Lines 50–51 in CsvWriter.setup() repeat the stage filtering already performed by BaseWriter.setup(). Delete the inner check:

def setup(self, trainer: Trainer, pl_module: LighterModule, stage: str) -> None:
    super().setup(trainer, pl_module, stage)
    # Create a temporary file for writing predictions
    self._temp_path = self.path.with_suffix(f".tmp_rank{trainer.global_rank}{self.path.suffix}")
    self._csv_file = open(self._temp_path, "w", newline="")
    self._csv_writer = csv.writer(self._csv_file)
    # Write header
    self._csv_writer.writerow(self.keys)

This eliminates the dead code branch and improves coverage.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 51-51: src/lighter/callbacks/csv_writer.py#L51
Added line #L51 was not covered by tests

🤖 Prompt for AI Agents
In src/lighter/callbacks/csv_writer.py around lines 41 to 59, remove the
redundant "if stage != Stage.PREDICT: return" check inside CsvWriter.setup()
because BaseWriter.setup() already performs stage filtering; update the method
to call super().setup(trainer, pl_module, stage) and then immediately create the
temp file, open the CSV file, create the csv.writer, and write the header
(self._csv_writer.writerow(self.keys)), leaving all other logic unchanged.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/lighter/callbacks/freezer.py (1)

71-79: Consider validating until_step and until_epoch to prevent edge cases.

If until_step=0 or until_epoch=0, the unfreeze condition is immediately true on the first batch, but since _frozen_state is False, nothing happens and the model never gets frozen (the callback becomes a no-op). While this configuration is unlikely, adding validation in __init__ to ensure until_step > 0 and until_epoch > 0 would prevent confusion.

Example validation:

 if until_step is not None and until_epoch is not None:
     raise ValueError("Only one of `until_step` or `until_epoch` can be specified.")
+
+if until_step is not None and until_step <= 0:
+    raise ValueError("`until_step` must be greater than 0.")
+if until_epoch is not None and until_epoch <= 0:
+    raise ValueError("`until_epoch` must be greater than 0.")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2897a57 and f228ef0.

📒 Files selected for processing (1)
  • src/lighter/callbacks/freezer.py (2 hunks)
🔇 Additional comments (1)
src/lighter/callbacks/freezer.py (1)

8-8: LGTM! Import aligns with Lightning-native architecture.

The transition from System to LightningModule aligns with the PR objectives of embracing Lightning-native patterns.

…ion steps

fix(csv_writer): correct docstring for saving temporary file
refactor(runner): improve logging message for project module import
feat(dynamic_imports): enhance module import handling with error checks
test(tests): add tests for error handling in Runner and dynamic imports
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

♻️ Duplicate comments (2)
src/lighter/engine/runner.py (1)

263-283: --weights_only added to incompatible stages (validate/test).

The add_common_args function adds --weights_only to all stage parsers, but PyTorch Lightning's Trainer.validate() and Trainer.test() methods do not accept this parameter. Running lighter validate config.yaml --weights_only will cause:

TypeError: validate() got an unexpected keyword argument 'weights_only'

The comment on line 240 ("argparse subparsers ensure only valid parameters") is incorrect—add_common_args adds weights_only universally.

Move --weights_only to only the stages that support it (fit, predict):

 def add_common_args(stage_parser):
     """Add common arguments to a stage subparser."""
     stage_parser.add_argument(
         "inputs",
         nargs="+",
         help="Config files and overrides. Example: config.yaml model::optimizer::lr=0.001",
     )
     stage_parser.add_argument(
         "--ckpt_path",
         type=str,
         default=None,
         help='Path to checkpoint. Can be "last", "best", or a file path.',
     )
-    stage_parser.add_argument(
-        "--weights_only",
-        action="store_true",
-        default=None,
-        help="Load only weights from checkpoint (security option).",
-    )
+
+def add_weights_only_arg(stage_parser):
+    """Add weights_only argument (only for fit/predict)."""
+    stage_parser.add_argument(
+        "--weights_only",
+        action="store_true",
+        default=None,
+        help="Load only weights from checkpoint (security option).",
+    )

Then call add_weights_only_arg only for fit_parser and predict_parser.

src/lighter/callbacks/csv_writer.py (1)

50-51: Remove redundant stage check.

These lines duplicate the stage filtering already performed in BaseWriter.setup(). As noted in the previous review, this check is unreachable dead code.

Apply this diff to remove the dead code:

 def setup(self, trainer: Trainer, pl_module: LighterModule, stage: str) -> None:
     super().setup(trainer, pl_module, stage)
-    if stage != Stage.PREDICT:
-        return

     # Create a temporary file for writing predictions
🧹 Nitpick comments (5)
projects/cifar10/model.py (1)

57-61: Consider flexible batch unpacking for predict mode.

The method unpacks y from the batch but doesn't use it. While this works if the dataloader always provides labels, it could be more flexible:

Apply this diff to handle both labeled and unlabeled batches:

     def predict_step(self, batch, batch_idx):
         """Prediction step - return dict for FileWriter compatibility."""
-        x, y = batch
+        x = batch[0] if isinstance(batch, (tuple, list)) else batch
         pred = self(x)
         return {"pred": pred}

Or if labels are always present but intentionally unused, consider clarifying with:

     def predict_step(self, batch, batch_idx):
         """Prediction step - return dict for FileWriter compatibility."""
-        x, y = batch
+        x, _ = batch  # Labels provided but unused in prediction
         pred = self(x)
         return {"pred": pred}
src/lighter/utils/dynamic_imports.py (3)

35-53: _ModuleRegistry is not thread-safe.

The _modules dictionary is accessed without synchronization. While Python's GIL protects against data corruption, concurrent register and find_root operations from multiple threads (e.g., during parallel imports) could yield inconsistent views.

For this use case (typically single-threaded import at startup), this is likely acceptable, but worth noting if concurrent dynamic imports are ever needed.


131-142: Minor: CloudPickler instantiated per reducer_override call.

Each call to reducer_override creates a new CloudPickler(BytesIO()) instance. For large object graphs with many user-defined classes, this adds overhead. Consider caching or reusing a single instance if profiling shows this as a bottleneck.

For typical checkpoint/DataLoader use cases, this is likely negligible.


150-152: Global side effects on module import.

Importing this module globally patches sys.meta_path and multiprocessing.reduction.dump. This is intentional for DataLoader worker support but has implications:

  1. The finder is inserted at position 0, taking priority over all other finders
  2. The reduction.dump patch affects all multiprocessing in the process

This design is appropriate for the use case, but document these effects prominently (perhaps in the package's __init__.py or README) so users understand the behavior when importing lighter.
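For readers unfamiliar with the mechanism, the snippet below illustrates what inserting a finder at position 0 of `sys.meta_path` means: it is consulted for every import before any other finder. This is a toy observer, not Lighter's implementation, and the `lighter_demo_project` package name is hypothetical:

```python
import importlib.abc
import sys


class LoggingFinder(importlib.abc.MetaPathFinder):
    """Toy finder: records lookups under a prefix, resolves nothing itself."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self.seen: list[str] = []

    def find_spec(self, fullname, path=None, target=None):
        if fullname == self.prefix or fullname.startswith(self.prefix + "."):
            self.seen.append(fullname)
        return None  # defer to the remaining finders


finder = LoggingFinder("lighter_demo_project")
sys.meta_path.insert(0, finder)  # position 0: consulted before all other finders
try:
    import lighter_demo_project  # noqa: F401 -- not installed, so this raises
except ImportError:
    pass
finally:
    sys.meta_path.remove(finder)
```

Because a position-0 finder sees every import in the process, keeping its `find_spec` cheap for non-matching names matters.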

src/lighter/callbacks/csv_writer.py (1)

89-103: Consider simplifying the num_samples inference logic.

The nested conditionals for determining num_samples work correctly but could be more readable. The logic tries to find the first sequence-type key, and if none exists, assumes a single sample.

Here's a clearer alternative:

-# Determine the number of samples in the batch.
-num_samples = 0
-for key in self.keys:
-    if key in outputs:
-        length = self._get_sequence_length(outputs[key])
-        if length is not None:
-            num_samples = length
-            break
-        else:
-            # If it's not a sequence type we handle, assume it's a single sample
-            if num_samples == 0:
-                num_samples = 1
-
-if num_samples == 0:
-    return
+# Determine the number of samples by finding the first sequence-type output
+num_samples = None
+for key in self.keys:
+    if key in outputs:
+        length = self._get_sequence_length(outputs[key])
+        if length is not None:
+            num_samples = length
+            break
+
+# If no sequence found, assume single sample; if no keys in outputs, skip
+if num_samples is None:
+    num_samples = 1 if any(key in outputs for key in self.keys) else 0
+
+if num_samples == 0:
+    return

This makes the intent clearer: find a sequence length, or default to 1 if outputs exist but aren't sequences.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f228ef0 and f7f46a6.

📒 Files selected for processing (4)
  • projects/cifar10/model.py (1 hunks)
  • src/lighter/callbacks/csv_writer.py (1 hunks)
  • src/lighter/engine/runner.py (4 hunks)
  • src/lighter/utils/dynamic_imports.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/lighter/callbacks/csv_writer.py (2)
src/lighter/callbacks/base_writer.py (3)
  • BaseWriter (13-75)
  • setup (37-55)
  • write (27-35)
src/lighter/utils/types/enums.py (1)
  • Stage (14-18)
src/lighter/engine/runner.py (3)
src/lighter/utils/dynamic_imports.py (2)
  • import_module_from_path (155-200)
  • get (44-46)
src/lighter/utils/types/enums.py (1)
  • Stage (14-18)
tests/unit/test_utils_misc.py (1)
  • method (48-49)
🪛 GitHub Check: codecov/patch
src/lighter/utils/dynamic_imports.py

[warning] 94-94: src/lighter/utils/dynamic_imports.py#L94
Added line #L94 was not covered by tests


[warning] 137-137: src/lighter/utils/dynamic_imports.py#L137
Added line #L137 was not covered by tests


[warning] 147-147: src/lighter/utils/dynamic_imports.py#L147
Added line #L147 was not covered by tests

src/lighter/callbacks/csv_writer.py

[warning] 51-51: src/lighter/callbacks/csv_writer.py#L51
Added line #L51 was not covered by tests


[warning] 87-87: src/lighter/callbacks/csv_writer.py#L87
Added line #L87 was not covered by tests


[warning] 148-148: src/lighter/callbacks/csv_writer.py#L148
Added line #L148 was not covered by tests

src/lighter/engine/runner.py

[warning] 62-62: src/lighter/engine/runner.py#L62
Added line #L62 was not covered by tests

🔇 Additional comments (14)
projects/cifar10/model.py (4)

1-4: LGTM!

Clean import structure and clear module documentation.


13-31: Well-structured training step implementation.

The implementation demonstrates good practices: defensive checks for required components (criterion), optional metrics handling, and dict-based returns for framework compatibility. The logic is clear and maintainable.


33-44: LGTM!

Validation step follows the same solid pattern as training_step with appropriate defensive checks and consistent return structure.


46-55: LGTM!

Test step appropriately omits loss computation while maintaining consistent return structure for predictions and targets.

src/lighter/utils/dynamic_imports.py (2)

59-95: LGTM!

The meta path finder correctly handles both package and module resolution. The _resolve_path method properly tries the package path first (__init__.py) before falling back to a module file.


98-110: LGTM!

The loader correctly delegates execution to SourceFileLoader and registers the module with cloudpickle for by-value serialization, enabling proper multiprocessing support.

src/lighter/engine/runner.py (4)

23-40: LGTM!

Clean implementation of marker-based project discovery. The hardcoded "project" module name aligns with the PR objective of enabling project.* references in YAML configs.


43-62: LGTM!

The loader correctly delegates input type detection to Sparkwheel. The error re-wrapping provides clear messaging.


154-171: Verify dataloader method check is sufficient.

The check uses any() to see if the model has any dataloader method. A model with only train_dataloader would pass this check, but would fail at runtime if validate stage is requested without val_dataloader.

This is likely acceptable since PyTorch Lightning will raise a clear error, but consider whether a stage-aware check would provide better early feedback.
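A stage-aware check might look like the sketch below. The helper and its stage-to-hook mapping are illustrative (they follow PyTorch Lightning's hook naming conventions, not an existing Lighter function):

```python
STAGE_DATALOADERS = {
    "fit": ("train_dataloader",),
    "validate": ("val_dataloader",),
    "test": ("test_dataloader",),
    "predict": ("predict_dataloader",),
}


def check_dataloaders(model: object, stage: str) -> None:
    """Fail early if the model lacks the dataloader hook(s) the stage needs."""
    missing = [
        name
        for name in STAGE_DATALOADERS[stage]
        if not callable(getattr(model, name, None))
    ]
    if missing:
        raise ValueError(
            f"Stage '{stage}' requires {missing}, but the model does not define them."
        )


class OnlyTrain:
    """Model defining only train_dataloader, the scenario described above."""

    def train_dataloader(self):
        return []


check_dataloaders(OnlyTrain(), "fit")  # passes
try:
    check_dataloaders(OnlyTrain(), "validate")
    validate_ok = True
except ValueError:
    validate_ok = False
```

This surfaces the misconfiguration before the trainer starts, rather than mid-run.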


188-219: LGTM!

Good defensive checks for logger and log_dir existence. The yaml.dump options preserve config readability.

src/lighter/callbacks/csv_writer.py (4)

1-18: LGTM!

The imports are well-organized and all necessary for the CSV writing functionality with distributed support.


20-46: LGTM!

The class initialization is clean, with appropriate type hints and comprehensive documentation including a YAML configuration example.


60-83: LGTM!

The helper methods correctly handle various data types (lists, tuples, tensors including scalars) with appropriate conversions to Python types for CSV writing.


128-162: LGTM!

The distributed aggregation logic is well-implemented, correctly using all_gather_object to collect temporary file paths across ranks, with rank 0 performing the concatenation and cleanup. The fallback for non-distributed execution (lines 139-142) ensures compatibility with single-GPU setups.

Comment on lines 239 to 241
# Execute the stage method with CLI kwargs
# Note: argparse subparsers ensure only valid parameters for each stage are passed
stage_method = getattr(trainer, str(stage))
Contributor

⚠️ Potential issue | 🟡 Minor

Misleading comment.

The comment "argparse subparsers ensure only valid parameters for each stage are passed" is inaccurate. The subparsers don't validate parameter compatibility with Trainer methods—they only define which CLI flags are available. The --weights_only flag is currently added to all stages via add_common_args.

Remove or correct this comment after fixing the --weights_only issue above.

🤖 Prompt for AI Agents
In src/lighter/engine/runner.py around lines 239-241, the existing comment
claiming "argparse subparsers ensure only valid parameters for each stage are
passed" is misleading; replace it with a brief, accurate comment stating that
argparse subparsers only control which CLI flags are available and do not
validate compatibility with Trainer method signatures, and then remove the
incorrect assurance. Also, ensure the related --weights_only flag is not
indiscriminately added to all stages (it should be registered only for the
stage(s) that support it rather than via add_common_args) — if currently added
globally, move its registration to the appropriate per-stage subparser or update
add_common_args to accept a per-stage whitelist.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (1)
src/lighter/utils/dynamic_imports.py (1)

150-152: Global side effects at import time.

The module installs a meta path finder and patches multiprocessing.reduction.dump when imported. While necessary for the functionality and documented in the module docstring, be aware that this affects all subsequent imports and multiprocessing usage in the process.

Consider documenting any potential interactions with other libraries that might also customize import mechanisms or multiprocessing serialization.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1a3c567 and 68a1ca5.

📒 Files selected for processing (1)
  • src/lighter/utils/dynamic_imports.py (1 hunks)
🧰 Additional context used
🪛 GitHub Check: codecov/patch
src/lighter/utils/dynamic_imports.py

[warning] 94-94: src/lighter/utils/dynamic_imports.py#L94
Added line #L94 was not covered by tests


[warning] 137-137: src/lighter/utils/dynamic_imports.py#L137
Added line #L137 was not covered by tests


[warning] 147-147: src/lighter/utils/dynamic_imports.py#L147
Added line #L147 was not covered by tests


[warning] 199-199: src/lighter/utils/dynamic_imports.py#L199
Added line #L199 was not covered by tests


[warning] 201-202: src/lighter/utils/dynamic_imports.py#L201-L202
Added lines #L201 - L202 were not covered by tests

🔇 Additional comments (1)
src/lighter/utils/dynamic_imports.py (1)

155-173: No action needed — cloudpickle API is stable and correctly versioned.

The cloudpickle.register_pickle_by_value() function is a documented, stable API available in cloudpickle 3.0.0 and later. The project's dependency constraint (cloudpickle>=3.0.0 in pyproject.toml) is appropriate. Both usages at lines 110 and 197 correctly pass module objects, which is the expected argument type. No version-specific compatibility issues exist.

Comment on lines +84 to +95
def _resolve_path(self, fullname: str, root_name: str, root_path: Path) -> Path | None:
    """Resolve 'project.models.net' to '/path/to/project/models/net.py'."""
    if fullname == root_name:
        return root_path / "__init__.py"

    relative = fullname[len(root_name) + 1 :].replace(".", "/")

    # Try package first, then module
    package_init = root_path / relative / "__init__.py"
    if package_init.is_file():
        return package_init
    return root_path / f"{relative}.py"
Contributor

⚠️ Potential issue | 🟡 Minor

Consider adding test coverage for edge cases.

The _resolve_path method has a branch (line 94-95) that returns a .py file path when no __init__.py exists. This path is not covered by tests according to static analysis.

While the logic is correct (the caller checks existence), adding a test case for importing a module (not a package) would improve reliability.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 94-94: src/lighter/utils/dynamic_imports.py#L94
Added line #L94 was not covered by tests

Comment on lines +131 to +142
def reducer_override(self, obj: Any) -> Any:
    """Use cloudpickle's reducer for objects not in dispatch_table."""
    # If it's in ForkingPickler's extra reducers, let standard dispatch handle it
    # _extra_reducers is a class attribute that exists at runtime but isn't typed
    extra_reducers = cast(dict[type, Any], getattr(ForkingPickler, "_extra_reducers", {}))
    if type(obj) in extra_reducers:
        return NotImplemented

    # For everything else, try cloudpickle's reducer_override
    # This handles dynamically imported classes registered with register_pickle_by_value
    pickler = cloudpickle.CloudPickler(BytesIO())
    return pickler.reducer_override(obj)
Contributor

⚠️ Potential issue | 🟡 Minor

Test coverage missing for multiprocessing internals path.

Line 137 returns NotImplemented when an object's type is in ForkingPickler._extra_reducers. This is a critical path that preserves ForkingPickler's special handling for multiprocessing objects (pipes, queues, connections). The lack of test coverage could hide regressions.

Consider adding a test that pickles a multiprocessing object (e.g., a Queue or Pipe) to verify this path works correctly.
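A lightweight version of that test can check the guard condition directly: sockets are among the types `ForkingPickler` registers special reducers for, so a faithful `reducer_override` must defer on them. The helper below mirrors the guard rather than importing the production code:

```python
import socket
from multiprocessing.reduction import ForkingPickler


def defers_to_forking_pickler(obj: object) -> bool:
    """Mirror the guard in reducer_override: defer for types that
    ForkingPickler special-cases (sockets, pipe connections, ...)."""
    extra_reducers = getattr(ForkingPickler, "_extra_reducers", {})
    return type(obj) in extra_reducers


sock = socket.socket()
try:
    is_special = defers_to_forking_pickler(sock)
finally:
    sock.close()
```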

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 137-137: src/lighter/utils/dynamic_imports.py#L137
Added line #L137 was not covered by tests

🤖 Prompt for AI Agents
In src/lighter/utils/dynamic_imports.py lines 131-142, add a unit test that
exercises the ForkingPickler._extra_reducers branch: create a multiprocessing
object (e.g., multiprocessing.get_context('spawn').Queue() or a Pipe
connection), call reducer_override on that object and assert it returns
NotImplemented, then verify the object can still be pickled using the standard
forking/multiprocessing path (e.g., pickle.dumps with the appropriate
ForkingPickler/context or actually passing the object through a child process)
to ensure the special multiprocessing handling remains intact.

Comment on lines +195 to +202
try:
    spec.loader.exec_module(module)
    cloudpickle.register_pickle_by_value(module)
    _registry.register(module_name, module_path)
except Exception:
    # Clean up on failure so retry sees a clean state
    sys.modules.pop(module_name, None)
    raise
Contributor

⚠️ Potential issue | 🟡 Minor

Registry inconsistency fix confirmed.

The previous concern about registry being left inconsistent if module loading fails has been properly addressed. The _registry.register call now happens after successful exec_module (line 198), and the try-except block cleans up sys.modules on failure (line 201), ensuring consistent state for retries.

However, the error path (lines 199-202) is not covered by tests. Consider adding a test case that triggers a module loading failure (e.g., syntax error in __init__.py) to verify cleanup works correctly.
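Such a test needs only the standard library. The loader below is a simplified stand-in for `import_module_from_path` (no registry, no cloudpickle registration), but it exercises the same cleanup contract:

```python
import importlib.util
import sys
import tempfile
from pathlib import Path


def load_package(module_name: str, package_dir: Path):
    """Exec the package's __init__.py; pop it from sys.modules on failure."""
    spec = importlib.util.spec_from_file_location(module_name, package_dir / "__init__.py")
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module
    try:
        spec.loader.exec_module(module)
    except Exception:
        sys.modules.pop(module_name, None)  # retry sees a clean state
        raise
    return module


with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "broken_pkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("def broken(:\n")  # deliberate syntax error
    try:
        load_package("broken_pkg", pkg)
        failed = False
    except SyntaxError:
        failed = True
```

The temporary directory is removed automatically, so the test leaves no state behind.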

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 199-199: src/lighter/utils/dynamic_imports.py#L199
Added line #L199 was not covered by tests


[warning] 201-202: src/lighter/utils/dynamic_imports.py#L201-L202
Added lines #L201 - L202 were not covered by tests

🤖 Prompt for AI Agents
In src/lighter/utils/dynamic_imports.py around lines 195 to 202, add a unit test
that intentionally triggers a module load failure (for example create a
temporary package directory with a syntax error in __init__.py), attempt to
import it through the dynamic import function used here, and assert that after
the failure sys.modules does not contain the failing module name and that
_registry has not registered the module; ensure the test cleans up temp files
and reverses any environment changes so it deterministically verifies the
cleanup path exercised by the except block.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/lighter/callbacks/csv_writer.py (1)

55-65: Re‑evaluate need for local stage check in setup

You still guard setup with if stage != Stage.PREDICT: return after calling super().setup(...). If BaseWriter.setup() already filters by stage, this extra check may be redundant and is currently untested (per coverage). Consider either:

  • Relying solely on BaseWriter.setup() for stage filtering, or
  • Keeping the check but adding a small test that asserts no file is created when stage != Stage.PREDICT.

Since this was already raised on a previous revision, feel free to keep it if BaseWriter.setup() has no stage logic or you prefer the explicit guard.

🧹 Nitpick comments (1)
src/lighter/callbacks/csv_writer.py (1)

169-176: Good hook coverage for cleanup; consider also removing temp files on exceptions

Calling _close_file() from both on_exception() and teardown() addresses the earlier file‑descriptor leak and makes cleanup robust across error paths and normal shutdown. One optional enhancement: if _temp_path is set, you could also unlink() it in these hooks to avoid leaving stray .tmp_rank*.csv files behind when prediction fails mid‑epoch.

For example:

    def on_exception(self, trainer: Trainer, pl_module: LighterModule, exception: BaseException) -> None:
        """Close the file on errors to prevent file handle leaks."""
-        self._close_file()
+        self._close_file()
+        if self._temp_path is not None and self._temp_path.exists():
+            self._temp_path.unlink()
+        self._temp_path = None

(Same pattern could be applied in teardown for the PREDICT stage.)

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 68a1ca5 and 98cdd5e.

📒 Files selected for processing (1)
  • src/lighter/callbacks/csv_writer.py (1 hunks)
🧰 Additional context used
🪛 GitHub Check: codecov/patch
src/lighter/callbacks/csv_writer.py

[warning] 58-58: src/lighter/callbacks/csv_writer.py#L58
Added line #L58 was not covered by tests


[warning] 94-94: src/lighter/callbacks/csv_writer.py#L94
Added line #L94 was not covered by tests


[warning] 155-156: src/lighter/callbacks/csv_writer.py#L155-L156
Added lines #L155 - L156 were not covered by tests

🔇 Additional comments (2)
src/lighter/callbacks/csv_writer.py (2)

41-54: Centralized file handle cleanup looks good

The _close_file() helper plus the _csv_file/_csv_writer reset is a solid fix for the earlier resource-leak concern and simplifies cleanup logic across hooks. No issues here.


135-168: The callback signature is correct for this repository's PyTorch Lightning version.

The pyproject.toml pins pytorch-lightning>=2.1.3, which is major version 2.x. In PyTorch Lightning 2.x, the Callback.on_predict_epoch_end hook signature is (trainer, pl_module) without an outputs parameter. The current implementation matches this signature exactly, so no changes are needed.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
src/lighter/callbacks/csv_writer.py (1)

57-58: Redundant stage check (unaddressed from previous review).

The stage filtering at lines 57-58 duplicates the check already performed by BaseWriter.setup() (called at line 56). This was flagged in the previous review but remains unaddressed.

Apply this diff to remove the redundant check:

 def setup(self, trainer: Trainer, pl_module: LighterModule, stage: str) -> None:
     super().setup(trainer, pl_module, stage)
-    if stage != Stage.PREDICT:
-        return
-
     # Create a temporary file for writing predictions
     self._temp_path = self.path.with_suffix(f".tmp_rank{trainer.global_rank}{self.path.suffix}")
🧹 Nitpick comments (1)
src/lighter/callbacks/csv_writer.py (1)

96-103: Consider validating all keys are present, not just one.

The current validation (lines 96-103) only ensures at least one configured key exists in outputs. However, later during record construction (lines 134-135), all keys are required, which means partial key coverage will fail late with a less informative error context.

For clearer error reporting, validate all keys upfront:

-# Validate that at least one configured key is present in outputs
-present_keys = [key for key in self.keys if key in outputs]
-if not present_keys:
-    missing_keys = self.keys
+# Validate that all configured keys are present in outputs
+missing_keys = [key for key in self.keys if key not in outputs]
+if missing_keys:
     raise KeyError(
-        f"CsvWriter: none of the configured keys {missing_keys} were found in outputs. "
+        f"CsvWriter: the following configured keys were not found in outputs: {missing_keys}. "
         f"Available keys in outputs: {list(outputs.keys())}"
     )

This ensures failures happen at validation time rather than mid-iteration.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 98cdd5e and 189179c.

📒 Files selected for processing (1)
  • src/lighter/callbacks/csv_writer.py (1 hunks)
🧰 Additional context used
🪛 GitHub Check: codecov/patch
src/lighter/callbacks/csv_writer.py

[warning] 58-58: src/lighter/callbacks/csv_writer.py#L58
Added line #L58 was not covered by tests


[warning] 94-94: src/lighter/callbacks/csv_writer.py#L94
Added line #L94 was not covered by tests


[warning] 161-161: src/lighter/callbacks/csv_writer.py#L161
Added line #L161 was not covered by tests

🔇 Additional comments (2)
src/lighter/callbacks/csv_writer.py (2)

48-53: LGTM: Proper resource cleanup helper.

The _close_file() helper correctly implements defensive cleanup with null/closed checks and is properly invoked from on_exception(), teardown(), and on_predict_epoch_end(). This addresses the resource leak concern from the previous review.


141-173: LGTM: Robust distributed aggregation.

The implementation correctly handles both distributed and single-process scenarios:

  • Uses all_gather_object() to collect temp paths from all ranks
  • Rank 0 performs CSV concatenation with pandas and cleans up temp files
  • Defensive None checks guard against edge cases

The logic is sound for multi-GPU prediction workflows.

…dling

fix(tests): add test for multiprocessing objects in subprocess to verify ForkingPickler behavior
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
src/lighter/callbacks/csv_writer.py (1)

55-65: Redundant stage check in setup() (already handled by BaseWriter.setup).

BaseWriter.setup() returns early for non‑PREDICT stages, so the extra if stage != Stage.PREDICT here is dead code and hurts coverage. You can rely on the base implementation:

     def setup(self, trainer: Trainer, pl_module: LighterModule, stage: str) -> None:
-        if stage != Stage.PREDICT:
-            return
         super().setup(trainer, pl_module, stage)

         # Create a temporary file for writing predictions

This keeps the logic in one place and removes an untested branch.

🧹 Nitpick comments (3)
src/lighter/callbacks/csv_writer.py (3)

41-54: Ctor/state handling looks solid; consider defensive copy of keys.

Implementation is clean and _close_file() correctly resets internal state. To avoid surprises if the caller mutates the original list in-place after construction, you could defensively copy:

-        self.keys = keys
+        self.keys = list(keys)

This keeps the writer’s configuration immutable from the outside.


67-90: Tighten write() validation to fail fast when configured keys are missing.

The per-sample loop still checks if key not in outputs and raises KeyError, even though the docstring says “These keys must be present in the outputs dictionary.” You can centralize this validation before any length logic or row construction:

     def write(self, outputs: dict[str, Any], batch: Any, batch_idx: int, dataloader_idx: int) -> None:
         if self._csv_writer is None:
             return

-        # Validate that at least one configured key is present in outputs
-        present_keys = [key for key in self.keys if key in outputs]
-        if not present_keys:
-            missing_keys = self.keys
-            raise KeyError(
-                f"CsvWriter: none of the configured keys {missing_keys} were found in outputs. "
-                f"Available keys in outputs: {list(outputs.keys())}"
-            )
+        # Validate that all configured keys are present in outputs
+        missing = [key for key in self.keys if key not in outputs]
+        if missing:
+            raise KeyError(
+                f"CsvWriter expected keys {missing} in outputs but they were missing. "
+                f"Available keys in outputs: {list(outputs.keys())}"
+            )

         # Determine the number of samples in the batch.
         num_samples = 0
         ...
-            for key in self.keys:
-                if key not in outputs:
-                    raise KeyError(f"CsvWriter expected key '{key}' in outputs but it was missing.")
-
-                value = outputs[key]
+            for key in self.keys:
+                value = outputs[key]
                 record.append(self._get_record_value(value, i))

This preserves current behavior (all keys required) but makes the failure mode immediate and avoids repeated presence checks inside the hot loop.

Also applies to: 92-140


141-182: Epoch-end aggregation and cleanup path looks correct; consider encoding for CSV opens.

The aggregation logic (close, gather per-rank temp paths, pandas.concat, final write, unlink temps, reset state) is coherent and should behave well in both single-process and DDP setups. One minor robustness tweak: explicitly set an encoding when opening the temp file to avoid platform-dependent defaults:

-        self._csv_file = open(self._temp_path, "w", newline="")
+        self._csv_file = open(self._temp_path, "w", newline="", encoding="utf-8")

That makes the whole CSV pipeline (csv.writer + pandas.read_csv / to_csv) encoding-predictable across environments.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 189179c and bb9a1a2.

📒 Files selected for processing (1)
  • src/lighter/callbacks/csv_writer.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: ibro45
Repo: project-lighter/lighter PR: 173
File: projects/cifar10/model.py:6-11
Timestamp: 2025-12-07T17:00:14.929Z
Learning: When reviewing code that uses LighterModule as a base class, the subclass does not need to define __init__ or forward methods. LighterModule provides __init__ that accepts network, criterion, optimizer, scheduler, and metrics parameters, and forward() that delegates to self.network. Subclasses only need to implement step methods (training_step, validation_step, test_step, predict_step). The network architecture is defined separately and passed via YAML configuration.
🧬 Code graph analysis (1)
src/lighter/callbacks/csv_writer.py (2)
src/lighter/callbacks/base_writer.py (3)
  • BaseWriter (13-75)
  • setup (37-55)
  • write (27-35)
src/lighter/utils/types/enums.py (1)
  • Stage (14-18)

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Actionable comments posted: 2

🧹 Nitpick comments (3)
src/lighter/engine/runner.py (3)

32-36: Consider searching parent directories for the marker file.

The current implementation only checks Path.cwd() / "__lighter__.py". Users must run lighter from the exact directory containing the marker. Consider walking up the directory tree (e.g., checking ., .., ../.. until reaching a filesystem boundary) to allow running lighter from subdirectories within a project.

Example:

     @staticmethod
     def auto_discover_and_import() -> bool:
         """
         Auto-discover project from __lighter__.py marker file.
         Returns True if project was imported, False otherwise.
         """
-        cwd = Path.cwd()
-        marker = cwd / "__lighter__.py"
-
-        if not marker.exists():
-            return False
-
-        import_module_from_path("project", cwd)
-        logger.info(f"Imported 'project' module from '{cwd}'")
-        return True
+        current = Path.cwd()
+        for parent in [current, *current.parents]:
+            marker = parent / "__lighter__.py"
+            if marker.exists():
+                import_module_from_path("project", parent)
+                logger.info(f"Imported 'project' module from '{parent}'")
+                return True
+        return False

107-107: Make random seed configurable.

seed_everything() is called without arguments, using a random seed. For reproducibility, consider making the seed configurable via config or CLI.

Example:

-        seed_everything()
+        seed = config.get("seed")
+        if seed is not None:
+            seed_everything(seed)

Then users can add to their config:

seed: 42

361-361: Type inconsistency: args.command is string but Runner.run expects Stage.

args.command from argparse is a string ("fit", "validate", etc.), but Runner.run() is typed to accept stage: Stage. While this works because Stage is a StrEnum, explicitly converting improves type safety and makes the intent clear.

     # Execute command
     try:
-        Runner().run(args.command, args.inputs, **stage_kwargs)
+        Runner().run(Stage(args.command), args.inputs, **stage_kwargs)
     except Exception as e:
         # Suppress exception chain to avoid duplicate tracebacks
         raise e from None
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bb9a1a2 and 2df7442.

📒 Files selected for processing (1)
  • src/lighter/engine/runner.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/lighter/engine/runner.py (3)
src/lighter/utils/dynamic_imports.py (2)
  • import_module_from_path (155-204)
  • get (44-46)
src/lighter/utils/types/enums.py (1)
  • Stage (14-18)
tests/unit/test_engine_runner.py (2)
  • test_cli_commands (207-223)
  • test_setup_stage_with_project (132-143)

@ibro45 ibro45 merged commit 8361ca1 into main Dec 9, 2025
8 checks passed
@ibro45 ibro45 deleted the redesign branch December 9, 2025 00:57

Labels

size:XXL This PR changes 1000+ lines, ignoring generated files.
