feat: Redesign architecture and embrace Lightning native patterns #173
- Add explicit Callable[..., Any] type hints in adapters and callbacks
- Add proper generic types for dicts and lists in schema and utils
- Add type annotations to writer functions (write_tensor, write_image, write_video)
- Improve type safety with Optional and Union types where appropriate
- Add type: ignore directives for complex generic scenarios

These changes improve IDE support, catch potential type errors earlier, and make the codebase more maintainable.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Replace stateful self.mode attribute with dynamic _get_current_mode() method that queries the trainer's state (training, validating, testing, predicting).

Benefits:
- Mode is always in sync with trainer state
- Eliminates potential stale mode bugs
- Makes code more robust and easier to reason about
- Properly handles sanity check as validation mode

This change affects _prepare_batch, _forward, _calculate_loss, and _calculate_metrics methods, all of which now query mode dynamically.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
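The mode lookup described in this commit can be sketched as follows. This is an illustrative stand-in, not the actual `_get_current_mode()` source; it assumes the trainer exposes the boolean state flags (`training`, `validating`, `sanity_checking`, `testing`, `predicting`) that Lightning trainers provide.

```python
from types import SimpleNamespace

def get_current_mode(trainer) -> str:
    """Derive the current mode from the trainer's state flags (sketch).

    Sanity checking is treated as validation, matching the commit's
    "properly handles sanity check as validation mode" behavior.
    """
    if trainer.training:
        return "train"
    if trainer.validating or trainer.sanity_checking:
        return "val"
    if trainer.testing:
        return "test"
    if trainer.predicting:
        return "predict"
    raise RuntimeError("Trainer is not in a recognized state")

# Dummy object standing in for a real Lightning Trainer:
trainer = SimpleNamespace(training=False, validating=False,
                          sanity_checking=True, testing=False, predicting=False)
print(get_current_mode(trainer))
```

Because the mode is computed on every call, it can never go stale the way a cached `self.mode` attribute could.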
Replace separate config and overrides parameters with unified inputs list. Sparkwheel now auto-detects file paths vs overrides based on content.

Changes:
- Replace config + overrides params with single inputs list
- Remove comma-separated config file parsing (use space separation)
- Simplify CLI examples: 'lighter fit base.yaml exp.yaml lr=0.001'
- Update Runner.run() to use Config.update() with auto-detection
- Improve CLI help text and examples

This aligns with Sparkwheel's idiomatic usage pattern and simplifies the user experience.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
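One plausible shape for such auto-detection is sketched below. This is a hypothetical heuristic for illustration only; Sparkwheel's real detection logic may differ.

```python
def classify_inputs(inputs):
    """Split a flat CLI input list into config files and overrides.

    Heuristic (assumed, not Sparkwheel's actual rule): a token containing
    '=' that does not end in a YAML extension is an override; everything
    else is treated as a config file path.
    """
    configs, overrides = [], []
    for item in inputs:
        if "=" in item and not item.endswith((".yaml", ".yml")):
            overrides.append(item)
        else:
            configs.append(item)
    return configs, overrides

# Mirrors the CLI example 'lighter fit base.yaml exp.yaml lr=0.001':
print(classify_inputs(["base.yaml", "exp.yaml", "lr=0.001"]))
```

With a rule like this, the user no longer separates configs from overrides; ordering on the command line is all that matters.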
- Upgrade sparkwheel to >=0.0.6 for improved auto-detection features
- Update uv.lock with new dependency versions
- Fix GitHub workflow YAML formatting for ignoreLabels field

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Update documentation to reflect dynamic mode detection from trainer state. Add explanations of how _get_current_mode() queries trainer.training, trainer.validating, etc. to ensure mode is always in sync.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
- Update CLI tests to use new inputs parameter instead of config+overrides
- Update Runner tests to pass inputs list instead of separate parameters
- Remove tests for self.mode attribute (replaced with _get_current_mode())
- Update integration tests to use new CLI argument format
- Fix type annotations in test fixtures

All tests updated to align with the refactored Runner CLI interface and System mode detection mechanism.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Major architectural changes:
- Replace System class with LighterModule (model.py) for cleaner separation
- Add Data module (data.py) for dataloader configuration
- Remove adapters pattern - functionality integrated into LighterModule
- Delete utility modules: model.py, patches.py, types/containers.py
- Update type enums and utility functions to support new architecture
- Simplify data utilities and misc helpers

This refactor simplifies the core API by removing the adapter abstraction layer and providing a more direct interface through LighterModule.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Callback system improvements:
- Flatten writer callback structure (remove nested writer/ directory)
- Create base_writer.py with BaseWriter abstract class
- Consolidate csv_writer.py and file_writer.py at top level
- Enhance Freezer callback with improved parameter handling
- Remove callbacks/utils.py (functionality moved to callbacks themselves)

This simplifies the callback organization and makes writers easier to discover and import.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Runner improvements:
- Leverage Sparkwheel's auto-detection for config validation
- Remove separate schema.py module (validation now handled by Sparkwheel)
- Simplify config loading and pruning logic
- Improve error handling and CLI interface
- Better integration with LighterModule instead of System

This change reduces code complexity by relying on Sparkwheel's built-in schema validation capabilities rather than maintaining a separate schema module.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Example project changes:
- Add __lighter__.py for Sparkwheel auto-discovery
- Add model.py with CIFAR10Module (replaces inline System config)
- Move experiments/ → configs/ directory
- Create example.yaml, example_autodiscovery.yaml, example_overrides.yaml
- Update configurations to use LighterModule and new structure

This demonstrates the new pattern where models are defined in Python files and referenced in configs, rather than being fully defined in YAML.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Test suite changes:
- Add conftest.py with shared fixtures
- Add tests/fixtures/ with plain Lightning module fixtures
- Refactor unit tests for LighterModule (replace System tests)
- Update callback tests for new writer structure
- Refactor runner tests for schema-less validation
- Add test_plain_lightning.py for plain Lightning integration
- Remove obsolete tests: adapters, schema, containers, model utils
- Update integration tests for CIFAR and config validation

All tests now work with the refactored Model/Data/Callback architecture and Sparkwheel-based configuration system.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Documentation overhaul:
- Delete old structure: design/, how-to/, tutorials/, migration/
- Add comprehensive guides: best-practices, configuration, custom-code, lighter-module, lightning-module, training
- Add practical examples: image-classification, multi-gpu
- Add quickstart guide and CLI reference
- Rewrite index.md and FAQ for new architecture
- Update mkdocs.yml navigation structure
- Remove outdated images (coverage.svg, feature screenshots, diagrams)

New docs focus on practical guides and examples rather than abstract design documents, making it easier for users to get started.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Project infrastructure updates:
- Add .github/workflows/release.yml for automated releases
- Update pyproject.toml dependencies and metadata
- Expand .gitignore for better coverage
- Rewrite README.md to reflect new architecture and features
- Update uv.lock with latest dependency versions

These changes complete the redesign by updating all project-level configuration to support the new architecture.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Documentation improvements:
- Add comprehensive coverage of _args_, _mode_, and _disabled_ syntax
- Document callable and debug modes with practical examples
- Expand logging documentation with detailed loss/metrics/optimizer stats
- Add examples for multi-component loss logging
- Clarify automatic optimizer stats logging (LR, momentum, betas, weight decay)
- Update quick reference table with new syntax elements

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Breaking change to metrics API:
- Remove automatic list-to-MetricCollection conversion in LighterModule
- Require explicit MetricCollection wrapper in configs for multiple metrics
- Add validation with helpful error messages showing correct syntax
- Support both single Metric and MetricCollection in logging
- Update CIFAR10 examples to use MetricCollection wrapper
- Update test suite for new metrics API

This change makes metric configuration more explicit and aligns with torchmetrics best practices for handling multiple metrics.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Add DDP testing infrastructure:
- Add test_callbacks_writers_ddp.py with 10 DDP tests for writers
- Test rank-specific temp files, distributed gathering, and cleanup
- Test barrier synchronization and edge cases (empty ranks, None paths)
- Mock distributed environment to enable fast, debuggable tests on single device
- Update existing error tests for consistency

This mock-based approach provides fast feedback during development while ensuring comprehensive coverage of distributed code paths.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
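The mock-based approach described above can be set up with nothing but the standard library. The sketch below is illustrative (the attribute names mirror Lightning's trainer/strategy surface, but this is a test double, not the real API or the repository's actual fixture):

```python
from unittest.mock import MagicMock

def make_mock_ddp_trainer(world_size: int = 2, global_rank: int = 0):
    """Build a stand-in trainer whose strategy mimics a DDP environment
    without spawning processes, so distributed code paths can be exercised
    on a single device."""
    trainer = MagicMock()
    trainer.world_size = world_size
    trainer.global_rank = global_rank
    trainer.is_global_zero = global_rank == 0
    # broadcast returns its input unchanged, as in a single-device run
    trainer.strategy.broadcast = lambda obj, src=0: obj
    # barrier is a no-op; there is only one "process" in the test
    trainer.strategy.barrier = lambda: None
    return trainer

trainer = make_mock_ddp_trainer(world_size=2, global_rank=0)
assert trainer.strategy.broadcast("/tmp/out", src=0) == "/tmp/out"
```

Such a fixture keeps DDP tests fast and debuggable while still covering rank-dependent branches (e.g. `is_global_zero` gating).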
Walkthrough

Removes legacy System, adapters, schema, and many writer/utils modules; adds LighterModule and LighterDataModule, writer primitives (BaseWriter, FileWriter, CsvWriter), a Runner refactor (ProjectImporter + ConfigLoader, list-of-configs contract), CI/docs reorganization, a CIFAR example project, and new tests/fixtures.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CLI
    participant Runner
    participant ConfigLoader
    participant ProjectImporter
    participant Trainer
    participant Model
    participant DataModule
    CLI->>Runner: run(stage, inputs: list, **stage_kwargs)
    Runner->>ConfigLoader: load(inputs)
    ConfigLoader-->>Runner: Config
    Runner->>ProjectImporter: auto_discover_and_import()
    ProjectImporter-->>Runner: success/failure
    Runner->>Runner: _resolve_model(config)
    Runner->>Runner: _resolve_trainer(config)
    Runner->>Runner: _resolve_datamodule(config, model)
    Runner->>Trainer: start(stage, model, datamodule, **stage_kwargs)
    Trainer->>Model: training_step / validation_step / test_step / predict_step
    Model-->>Trainer: outputs (loss / pred / metrics)
    Trainer-->>Runner: save artifacts / logs, finish
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120+ minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches: 🧪 Generate unit tests (beta)

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

⛔ Files ignored due to path filters (4)
📒 Files selected for processing (2)
💤 Files with no reviewable changes (2)
Actionable comments posted: 10
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
pyproject.toml (2)
7-7: Critical: Version number not updated to reflect v3.0 release.

The PR objectives specify a major version bump to v3.0 with breaking changes (System and Adapters removed, config keys renamed), but the version in pyproject.toml remains at 0.0.5. This needs to be updated to 3.0.0 before release.

```diff
-version = "0.0.5"
+version = "3.0.0"
```
83-105: Removal of `disallow_any_generics = true` reduces type safety.

The mypy configuration previously enforced stricter type checking with `disallow_any_generics = true`, but this constraint has been removed. Given the scale of the architectural refactor (1,500 LOC → 999 LOC) and the introduction of new abstractions (LighterModule, LighterDataModule), this relaxation degrades type safety guarantees at a critical moment.

Consider re-enabling this constraint or documenting why it was necessary to relax it.

```diff
 [tool.mypy]
 # https://mypy.readthedocs.io/en/latest/config_file.html#using-a-pyproject-toml-file
 python_version = "3.10"
 pretty = true
 show_traceback = true
 allow_redefinition = false
 check_untyped_defs = true
 disallow_incomplete_defs = true
+disallow_any_generics = true
 ignore_missing_imports = true
 implicit_reexport = false
 no_implicit_optional = true
 show_column_numbers = true
 show_error_codes = true
 show_error_context = true
 strict_equality = true
 strict_optional = true
 warn_no_return = true
 warn_redundant_casts = true
 warn_return_any = true
 warn_unreachable = true
 warn_unused_configs = true
 warn_unused_ignores = true
```
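For context, the flag only changes what the type checker accepts; runtime behavior is identical. The sketch below (illustrative, not code from this repository) shows the kind of bare-generic annotation `disallow_any_generics` rejects versus the parameterized form it requires:

```python
# With `disallow_any_generics = true`, mypy rejects bare generics such as
# `list` or `dict` and requires explicit type parameters. Both functions
# run identically; only static checking differs.

def count_labels_loose(labels: list) -> dict:  # mypy would flag these bare generics
    return count_labels_strict(labels)

def count_labels_strict(labels: list[str]) -> dict[str, int]:  # accepted
    counts: dict[str, int] = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return counts

print(count_labels_strict(["cat", "dog", "cat"]))
```

The stricter form costs little to write and surfaces mistakes (e.g. passing a `dict[int, str]` where `dict[str, int]` was meant) at check time rather than at runtime.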
🧹 Nitpick comments (23)
src/lighter/utils/misc.py (1)
127-129: Consider documenting why weight_decay is conditionally extracted.

The `weight_decay` hyperparameter is only extracted when non-zero, while other hyperparameters (lr, momentum, betas) are always extracted when present. This design choice reduces log clutter but creates inconsistency: users might expect `weight_decay` to always appear in the returned dictionary.

Consider either:
- Adding a comment explaining the rationale for conditional extraction
- Or extracting weight_decay unconditionally for consistency
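If the unconditional route is taken, the extraction could look like the sketch below. This is a hypothetical helper operating on plain `param_groups` dicts; the real `get_optimizer_stats` in `lighter.utils.misc` differs.

```python
def get_optimizer_stats_sketch(param_groups):
    """Extract common hyperparameters from optimizer param groups,
    including weight_decay unconditionally (the consistency option above).

    `param_groups` is assumed to be a list of dicts shaped like
    torch.optim.Optimizer.param_groups entries.
    """
    stats = {}
    for i, group in enumerate(param_groups):
        prefix = f"group{i}/" if len(param_groups) > 1 else ""
        for key in ("lr", "momentum", "weight_decay"):
            if key in group:
                stats[f"optimizer/{prefix}{key}"] = group[key]
        if "betas" in group:  # Adam-style optimizers store a (beta1, beta2) pair
            stats[f"optimizer/{prefix}beta1"], stats[f"optimizer/{prefix}beta2"] = group["betas"]
    return stats

groups = [{"lr": 1e-3, "betas": (0.9, 0.999), "weight_decay": 0.0}]
print(get_optimizer_stats_sketch(groups))
```

Here `weight_decay: 0.0` still appears in the output, which trades a little log noise for a predictable set of keys.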
src/lighter/utils/data.py (1)
10-10: Consider adding type hint for dataset length.

The type hint `Dataset` is generic, but the code assumes `len(dataset)` is available (line 44). While this is a valid assumption for most PyTorch datasets, consider documenting this requirement or using the `Sized` protocol for clarity.

Optional: Add to docstring:

```diff
 Args:
     batch: The batch of data from the DataLoader.
-    dataset: The dataset being used, which should return `None` for corrupted examples.
+    dataset: The dataset being used, which should return `None` for corrupted examples.
+        Must be a sized dataset (implement `__len__`).
```

docs/reference/cli.md (1)
80-88: Consider adding language hint for directory structure.

The directory structure visualization is clear, but the markdown linter suggests adding a language specifier. Consider using `text` or `plaintext` to silence the linter warning while maintaining readability.

````diff
-```
+```text
 outputs/
 └── YYYY-MM-DD/
     └── HH-MM-SS/
````

docs/quickstart.md (2)
28-35: Add language specifier to fenced code block.

The directory structure code block should specify a language (e.g., `text`, or leave it for plain formatting) for consistency with MkDocs rendering.

Apply this diff:

````diff
 Create these files:
-```
+```text
 mnist_classifier/
 ├── __lighter__.py   # Marker file (tells Lighter this is a project)
 ├── __init__.py      # Makes it a Python package
 ├── model.py         # Your model code
 └── configs/
     └── config.yaml  # Your experiment config
````

61-154: Convert indented code blocks to fenced blocks.

The tabbed content uses indented code blocks. For consistency with the rest of the documentation and better syntax highlighting, consider using fenced code blocks with language specifiers. This applies to both the LightningModule tab (starting at line 63) and the LighterModule tab (starting at line 157).

docs/guides/lighter-module.md (1)

642-646: Add language specifier to code block.

The example logged metrics output should specify a language (e.g., `text` or `yaml`) for consistency.

Apply this diff:

````diff
 **Example logged metrics:**
-```
+```text
 train/optimizer/Adam/lr/epoch: 0.001
 train/optimizer/Adam/beta1/epoch: 0.9
 train/optimizer/Adam/beta2/epoch: 0.999
````

.github/workflows/ci.yml (1)

21-100: CI workflow looks solid; optionally gate tests on quality job

The quality matrix and test job are set up cleanly with sensible timeouts, concurrency, and artifacts. If you'd like to avoid burning cycles when format/lint/types fail, you could add `needs: quality` to the `tests` job so it only runs after a successful quality pass.

tests/integration/test_cifar.py (1)

3-45: Auto-discovery setup and config paths look correct

Switching into `projects/cifar10` via `monkeypatch.chdir` and then using `configs/example.yaml` plus a relative overrides file matches the new `Runner.run(stage, inputs=list)` API and should exercise the full config-loading pipeline.
If you ever want CI coverage for this path, consider adding a lighter-weight variant (e.g., fewer epochs/smaller dataset) without the `slow` mark.

docs/guides/lightning-module.md (1)

422-440: Minor markdownlint nit: use headings instead of bold text for "Option 1/2"

Lines where you use `**Option 1: In Config**` / `**Option 2: In Code**` are effectively headings and trigger MD036 (`no-emphasis-as-heading`). If you care about a clean markdownlint run, consider changing them to `### Option 1: In Config` / `### Option 2: In Code`.

docs/guides/training.md (1)

49-65: Add languages to a few fenced code blocks for markdownlint compliance

The directory tree and config-structure examples are fenced with bare ``` blocks, which triggers MD040 (`fenced-code-language`). Adding a language such as `text` to those blocks (e.g., ```text) will keep markdownlint happy without affecting how the docs render.

Also applies to: 761-789

tests/conftest.py (1)

19-41: Consider adding a default value to mock `strategy.broadcast` for safety

The current mock at line 35 is `trainer.strategy.broadcast = lambda x, src: x`. Based on codebase analysis, the only call to this method is in `src/lighter/callbacks/base_writer.py:41`, which explicitly passes `src=0`. However, to make the fixture more defensive against future usage patterns, adding a default value is recommended:

```diff
- trainer.strategy.broadcast = lambda x, src: x
+ trainer.strategy.broadcast = lambda x, src=0: x
```

This change is optional but improves fixture robustness without impacting current test behavior.
src/lighter/data.py (1)
83-110: Confirm Lightning's expectations when dataloaders are `None`

The wrapper is clean, but the current behavior will return `None` from `val_dataloader`/`test_dataloader`/`predict_dataloader` when they aren't configured. LightningDataModule examples and helpers generally return actual dataloaders (or empty iterables) and skip stages by omitting the method or using flags, not by returning `None`. (lightning.ai)

It would be safer to either:
- enforce that any defined loader argument is non-`None`, or
- map `None` to an empty list (or similar) so that Trainer sees a valid "no data" dataloader.

Otherwise users may hit confusing runtime errors if they configure only a train loader.
src/lighter/callbacks/csv_writer.py (1)
32-151: CsvWriter logic looks solid; consider a couple of robustness tweaks

The per-rank temp-file pattern, sequence length inference, and DDP merge all look correct and align well with Lightning's predict flow.

Two minor improvements you might consider:

- Empty-prediction epochs: if no rank writes any rows (e.g., a zero-length dataset with `allow_zero_length_dataloader_with_multiple_devices=True`), `dfs` will be empty and `pd.concat(dfs, ...)` will raise. A small guard avoids that edge case:

```diff
 dfs = [pd.read_csv(path) for path in all_temp_paths if path is not None]
+if not dfs:
+    return
 df = pd.concat(dfs, ignore_index=True)
```

- Doc/typo: the docstring in `on_predict_epoch_end` reads "saves the temporary file it to the final destination"; drop the extra "it" for clarity.

These are minor; the core design looks good.
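The empty-case guard generalizes beyond pandas. The stdlib-only sketch below mimics the merge step with `csv` so the edge case can be seen in isolation (illustrative names; the callback itself uses `pd.read_csv`/`pd.concat`):

```python
import csv
import io

def merge_rank_csvs(csv_texts):
    """Merge per-rank CSV payloads into one table of rows.

    Returns None when no rank produced any data, instead of raising the
    way an unguarded concat would.
    """
    rows, header = [], None
    for text in csv_texts:
        reader = csv.reader(io.StringIO(text))
        file_header = next(reader, None)
        if file_header is None:  # this rank's file was empty
            continue
        header = header or file_header
        rows.extend(reader)
    if header is None:  # no rank wrote anything: skip instead of raising
        return None
    return [header] + rows

print(merge_rank_csvs(["id,pred\n0,cat\n", "id,pred\n1,dog\n"]))
```

The `if header is None: return None` branch is the analogue of the suggested `if not dfs: return` guard.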
src/lighter/callbacks/base_writer.py (1)
37-75: BaseWriter DDP/path setup is clean; keep an eye on the private Lightning hook

The path broadcast + directory creation logic for `Stage.PREDICT` looks correct and mirrors common Lightning patterns. The predict hook also sensibly bails on empty outputs and delegates to `write`.

The only concern is already called out in the comment: touching `trainer.predict_loop._predictions` relies on a private Lightning attribute and may break when trainer internals change. Once the upstream issue is resolved, it'd be good to hide this behind a small utility or feature flag so it's easy to remove.

docs/examples/multi-gpu.md (1)
160-162: Optional wording tweak: avoid repeated "very large …" phrasing

Minor style nit: "very large batch sizes" and "very large models" are slightly repetitive. If you care about polish, consider alternatives like "extremely large batch sizes" or "models that don't fit on a single GPU."
Also applies to: 295-299
docs/index.md (1)
36-36: Optional: Address markdownlint MD036/MD046 warnings.

If you care about a clean markdownlint run, consider:

- Turning **YAML configuration for PyTorch Lightning experiments** into a proper heading (e.g., `## YAML configuration...`) to satisfy MD036.
- Ensuring all code blocks are fenced rather than indented (MD046), especially around the Python examples in the comparison tabs.
Also applies to: 167-217
projects/cifar10/model.py (1)
13-41: Guard against missing `criterion` in CIFAR10Model.

Both `training_step` and `validation_step` assume `self.criterion` is callable; if a config omits it, users will get a vague `NoneType is not callable` error. Consider either:

- Enforcing it explicitly:

```python
if self.criterion is None:
    raise RuntimeError("CIFAR10Model requires `criterion` to be configured in the YAML (e.g. CrossEntropyLoss).")
loss = self.criterion(pred, y)
```

- Falling back to a sensible default loss when `criterion` is `None`.

This keeps the example more robust and beginner-friendly.
src/lighter/callbacks/freezer.py (1)
60-81: Freezer behavior looks correct; consider loosening type hints and tightening logging.

The freezing/unfreezing logic (including `until_step`/`until_epoch` and the `names`/`name_starts_with` + exception handling) looks consistent and matches the described behavior.

Two small polish ideas:

- `on_train_batch_start` currently annotates `pl_module: LighterModule`, but `_set_model_requires_grad` already works with any `torch.nn.Module`. You could widen the annotation (and docstring) to `LightningModule | Module` to avoid suggesting that Freezer only works with LighterModule.
- In `_set_model_requires_grad`, you append exception parameters to `unfrozen_layers` when `requires_grad` is `False`, but only log `unfrozen_layers` when `requires_grad` is `True`. Either skip appending them in the freezing path or log them in both cases for clearer diagnostics.

These don't affect correctness but would make the callback a bit clearer to users.

Also applies to: 89-147
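The name-prefix freezing pattern discussed above can be reduced to a framework-free sketch. `FakeParam` and the helper below are hypothetical stand-ins; the real Freezer callback iterates `torch.nn.Module.named_parameters()`:

```python
class FakeParam:
    """Minimal stand-in for torch.nn.Parameter (only requires_grad matters)."""
    def __init__(self):
        self.requires_grad = True

def set_requires_grad(named_params, requires_grad, name_starts_with=(), except_names=()):
    """Toggle requires_grad on parameters whose name matches a prefix,
    skipping explicitly excepted names. Returns the names it touched."""
    touched = []
    for name, param in named_params.items():
        if name in except_names:
            continue  # exception parameters are left untouched
        if any(name.startswith(prefix) for prefix in name_starts_with):
            param.requires_grad = requires_grad
            touched.append(name)
    return touched

params = {
    "backbone.conv1.weight": FakeParam(),
    "backbone.fc.weight": FakeParam(),
    "head.weight": FakeParam(),
}
frozen = set_requires_grad(params, False,
                           name_starts_with=("backbone",),
                           except_names=("backbone.fc.weight",))
assert frozen == ["backbone.conv1.weight"]
assert params["head.weight"].requires_grad is True
```

Returning only the names actually modified (rather than also appending exceptions) is the logging behavior the review comment suggests.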
src/lighter/engine/runner.py (2)
52-69: Handle auto-discovery failures from `ProjectImporter` more defensively.

`ProjectImporter.auto_discover_and_import()` treats the presence of `__lighter__.py` as a signal to import the current working directory as a `project` package, but `import_module_from_path` requires an `__init__.py` and will raise if it is missing. Right now that exception will bubble up and break the run.

Consider catching `FileNotFoundError`/`ModuleNotFoundError` here, logging a warning, and returning `False` so users with a marker file but no package scaffold get a clear message instead of a hard failure.
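A defensive variant of that flow might look like the stdlib-only sketch below (illustrative; `ProjectImporter`'s real code and error handling differ):

```python
import importlib.util
from pathlib import Path

def try_import_project(project_dir) -> bool:
    """Import project_dir as a package, returning False with a message
    instead of raising when the package scaffold is missing."""
    init_file = Path(project_dir) / "__init__.py"
    if not init_file.is_file():
        # Marker file without a package scaffold: warn and bail gracefully.
        print(f"Found __lighter__.py but no __init__.py in {project_dir}; skipping import")
        return False
    spec = importlib.util.spec_from_file_location("project", init_file)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return True

# A directory with no __init__.py is reported, not fatal:
assert try_import_project("/nonexistent/project") is False
```

Checking for `__init__.py` up front (rather than catching the exception after the fact) makes the failure mode explicit and easy to message.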
31-38: Optional: Avoid output-dir collisions within the same second.

`OutputDir.create_timestamped()` uses `YYYY-MM-DD/HH-MM-SS` as the directory path. If multiple runs start within the same second, they'll share the same directory. If you want strict per-run isolation, you could append a short random suffix or a monotonic counter to the timestamp component.

src/lighter/callbacks/file_writer.py (1)
153-161: Adjust loguru message formatting for the empty-values debug log.

The debug call

```python
logger.debug("FileWriter value key '%s' yielded no samples; skipping batch", self.value_key)
```

uses `%s`-style formatting, but loguru expects `{}` placeholders or f-strings, so `self.value_key` won't be interpolated. You could switch to either:

```python
logger.debug("FileWriter value key '{}' yielded no samples; skipping batch", self.value_key)
# or
logger.debug(f"FileWriter value key '{self.value_key}' yielded no samples; skipping batch")
```

for clearer logs.
src/lighter/model.py (2)
81-127: Constructor and `_prepare_metrics` look correct; optional centralization of metric conversion

Initializing `network`, optional `criterion`/`optimizer`/`scheduler`, and validating metrics via `_prepare_metrics` is type-safe and plays well with torchmetrics and Lightning module registration.

If you later want to avoid duplicating the "wrap into `MetricCollection`" logic that also exists in `lighter.utils.containers.Metrics._convert_to_collection`, you could centralize that conversion in a shared helper or accept the `Metrics` container directly here, but that's a minor, optional refactor, not a blocker.

385-425: `configure_optimizers` and `mode` property behavior

`configure_optimizers` enforcing that an optimizer is provided (via a clear `ValueError`) is reasonable for training-centric usage, and the dictionary shape you return is compatible with Lightning's optimizer configuration patterns. The `mode` property nicely wraps the trainer's state flags, including mapping sanity checks to validation mode, which keeps log prefixes uniform across hooks.

One thing to double-check: if you expect some users to run pure evaluation or prediction with `LighterModule` instances that don't configure an optimizer, you may want to allow `self.optimizer` to be `None` and have `configure_optimizers` return `None` in that case, since Lightning accepts that for optimizer-free loops. Whether that's desirable depends on how tightly you want to couple LighterModule to training.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (6)
- `assets/images/coverage.svg` is excluded by `!**/*.svg`
- `assets/images/features_dark.png` is excluded by `!**/*.png`
- `assets/images/features_light.png` is excluded by `!**/*.png`
- `docs/assets/images/overview_all.png` is excluded by `!**/*.png`
- `docs/assets/images/overview_system.png` is excluded by `!**/*.png`
- `uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (80)
- `.github/actions/setup/action.yml` (1 hunks)
- `.github/scripts/test_summary.py` (1 hunks)
- `.github/workflows/check_pull_request_title.yml` (1 hunks)
- `.github/workflows/ci-full.yml` (1 hunks)
- `.github/workflows/ci.yml` (1 hunks)
- `.github/workflows/code_quality.yml` (0 hunks)
- `.github/workflows/dependency-review.yml` (1 hunks)
- `.github/workflows/docs-publish.yml` (1 hunks)
- `.github/workflows/publish.yml` (3 hunks)
- `.github/workflows/release.yml` (1 hunks)
- `.github/workflows/tests.yml` (0 hunks)
- `.gitignore` (1 hunks)
- `README.md` (2 hunks)
- `docs/design/adapters.md` (0 hunks)
- `docs/design/overview.md` (0 hunks)
- `docs/design/philosophy.md` (0 hunks)
- `docs/design/system.md` (0 hunks)
- `docs/examples/image-classification.md` (1 hunks)
- `docs/examples/multi-gpu.md` (1 hunks)
- `docs/faq.md` (1 hunks)
- `docs/guides/best-practices.md` (1 hunks)
- `docs/guides/configuration.md` (1 hunks)
- `docs/guides/custom-code.md` (1 hunks)
- `docs/guides/lighter-module.md` (1 hunks)
- `docs/guides/lightning-module.md` (1 hunks)
- `docs/guides/training.md` (1 hunks)
- `docs/how-to/adapters.md` (0 hunks)
- `docs/how-to/configuration.md` (0 hunks)
- `docs/how-to/experiment_tracking.md` (0 hunks)
- `docs/how-to/freezers.md` (0 hunks)
- `docs/how-to/inferers.md` (0 hunks)
- `docs/how-to/metrics.md` (0 hunks)
- `docs/how-to/project_module.md` (0 hunks)
- `docs/how-to/recipes.md` (0 hunks)
- `docs/how-to/run.md` (0 hunks)
- `docs/how-to/troubleshooting.md` (0 hunks)
- `docs/how-to/writers.md` (0 hunks)
- `docs/index.md` (1 hunks)
- `docs/migration/from-pytorch-lightning.md` (0 hunks)
- `docs/quickstart.md` (1 hunks)
- `docs/reference/cli.md` (1 hunks)
- `docs/tutorials/get-started.md` (0 hunks)
- `mkdocs.yml` (3 hunks)
- `projects/cifar10/__lighter__.py` (1 hunks)
- `projects/cifar10/configs/example.yaml` (1 hunks)
- `projects/cifar10/configs/example_autodiscovery.yaml` (1 hunks)
- `projects/cifar10/configs/example_overrides.yaml` (1 hunks)
- `projects/cifar10/experiments/example.yaml` (0 hunks)
- `projects/cifar10/experiments/example_overrides.yaml` (0 hunks)
- `projects/cifar10/model.py` (1 hunks)
- `pyproject.toml` (2 hunks)
- `src/lighter/__init__.py` (1 hunks)
- `src/lighter/adapters.py` (0 hunks)
- `src/lighter/callbacks/__init__.py` (1 hunks)
- `src/lighter/callbacks/base_writer.py` (1 hunks)
- `src/lighter/callbacks/csv_writer.py` (1 hunks)
- `src/lighter/callbacks/file_writer.py` (1 hunks)
- `src/lighter/callbacks/freezer.py` (2 hunks)
- `src/lighter/callbacks/utils.py` (0 hunks)
- `src/lighter/callbacks/writer/base.py` (0 hunks)
- `src/lighter/callbacks/writer/file.py` (0 hunks)
- `src/lighter/callbacks/writer/table.py` (0 hunks)
- `src/lighter/data.py` (1 hunks)
- `src/lighter/engine/runner.py` (4 hunks)
- `src/lighter/engine/schema.py` (0 hunks)
- `src/lighter/model.py` (1 hunks)
- `src/lighter/system.py` (0 hunks)
- `src/lighter/utils/data.py` (2 hunks)
- `src/lighter/utils/dynamic_imports.py` (1 hunks)
- `src/lighter/utils/logging.py` (2 hunks)
- `src/lighter/utils/misc.py` (1 hunks)
- `src/lighter/utils/model.py` (0 hunks)
- `src/lighter/utils/patches.py` (0 hunks)
- `src/lighter/utils/types/containers.py` (0 hunks)
- `src/lighter/utils/types/enums.py` (0 hunks)
- `tests/configs/test1.yaml` (0 hunks)
- `tests/conftest.py` (1 hunks)
- `tests/fixtures/__init__.py` (1 hunks)
- `tests/fixtures/plain_lightning_modules.py` (1 hunks)
- `tests/integration/test_cifar.py` (1 hunks)
💤 Files with no reviewable changes (33)
- tests/configs/test1.yaml
- docs/how-to/metrics.md
- docs/how-to/project_module.md
- docs/tutorials/get-started.md
- docs/how-to/adapters.md
- docs/design/adapters.md
- docs/design/overview.md
- src/lighter/system.py
- docs/how-to/writers.md
- src/lighter/callbacks/writer/table.py
- src/lighter/callbacks/writer/file.py
- docs/how-to/configuration.md
- src/lighter/callbacks/utils.py
- src/lighter/utils/model.py
- docs/how-to/freezers.md
- src/lighter/utils/patches.py
- docs/how-to/experiment_tracking.md
- .github/workflows/tests.yml
- projects/cifar10/experiments/example.yaml
- src/lighter/utils/types/containers.py
- docs/how-to/recipes.md
- docs/how-to/troubleshooting.md
- docs/how-to/run.md
- docs/design/system.md
- docs/migration/from-pytorch-lightning.md
- docs/how-to/inferers.md
- src/lighter/engine/schema.py
- docs/design/philosophy.md
- projects/cifar10/experiments/example_overrides.yaml
- src/lighter/callbacks/writer/base.py
- src/lighter/utils/types/enums.py
- .github/workflows/code_quality.yml
- src/lighter/adapters.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-01-27T02:17:12.626Z
Learnt from: ibro45
Repo: project-lighter/lighter PR: 142
File: lighter/engine/schema.py:0-0
Timestamp: 2025-01-27T02:17:12.626Z
Learning: The adapters schema in Lighter requires:
- batch and logging keys for all modes (train, val, test, predict)
- criterion key only for train and val modes
- metrics key for train, val, and test modes but not for predict mode
Applied to files:
projects/cifar10/configs/example.yaml
🧬 Code graph analysis (15)
projects/cifar10/configs/example_overrides.yaml (1)
tests/unit/test_engine_config.py (1)
test_config_overrides(87-119)
src/lighter/utils/data.py (1)
tests/unit/test_utils_data.py (2)
`test_collate_replace_corrupted_basic` (4-27)
`test_collate_replace_corrupted_all_corrupted` (30-42)
src/lighter/callbacks/freezer.py (2)
src/lighter/model.py (1)
`LighterModule` (22-425)
tests/unit/test_callbacks_freezer.py (2)
`test_freezer_set_model_requires_grad_with_exceptions` (232-269)
`test_freezer_functionality` (84-97)
src/lighter/utils/misc.py (1)
tests/unit/test_utils_misc.py (2)
`test_get_optimizer_stats_with_betas` (143-181)
`test_get_optimizer_stats` (98-140)
tests/fixtures/plain_lightning_modules.py (1)
src/lighter/model.py (1)
LighterModule(22-425)
docs/guides/lighter-module.md (1)
src/lighter/system.py (1)
System(28-349)
tests/conftest.py (2)
tests/unit/test_model.py (1)
`mock_trainer_state` (24-34)
tests/unit/test_engine_runner.py (1)
mock_trainer(22-31)
src/lighter/callbacks/__init__.py (3)
src/lighter/callbacks/csv_writer.py (1)
`CsvWriter` (20-151)
src/lighter/callbacks/file_writer.py (1)
`FileWriter` (112-212)
src/lighter/callbacks/writer/table.py (2)
`TableWriter` (17-72)
`__init__` (26-28)
src/lighter/model.py (3)
src/lighter/utils/misc.py (1)
get_optimizer_stats(86-131)src/lighter/utils/types/enums.py (1)
Mode(21-25)src/lighter/utils/patches.py (1)
PatchedModuleDict(8-99)
src/lighter/callbacks/base_writer.py (6)
src/lighter/utils/types/enums.py (1)
Stage(14-18)src/lighter/callbacks/csv_writer.py (2)
write(76-117)setup(39-49)src/lighter/callbacks/file_writer.py (2)
write(153-184)setup(135-151)src/lighter/callbacks/writer/base.py (5)
BaseWriter(20-143)setup(69-102)on_predict_batch_end(104-143)__init__(33-46)write(56-67)src/lighter/callbacks/writer/table.py (1)
TableWriter(17-72)src/lighter/callbacks/writer/file.py (2)
FileWriter(19-64)writers(40-48)
src/lighter/callbacks/file_writer.py (3)
src/lighter/utils/types/enums.py (1)
Stage(14-18)src/lighter/callbacks/writer/file.py (3)
writers(40-48)FileWriter(19-64)write(50-64)src/lighter/callbacks/writer/base.py (2)
setup(69-102)write(56-67)
src/lighter/__init__.py (3)
src/lighter/data.py (1)
LighterDataModule(12-110)src/lighter/engine/runner.py (1)
Runner(100-248)src/lighter/model.py (1)
LighterModule(22-425)
projects/cifar10/model.py (1)
src/lighter/model.py (1)
LighterModule(22-425)
tests/integration/test_cifar.py (2)
src/lighter/engine/runner.py (1)
run(107-152)src/lighter/utils/types/enums.py (1)
Stage(14-18)
src/lighter/engine/runner.py (2)
src/lighter/utils/dynamic_imports.py (1)
import_module_from_path(82-111)src/lighter/utils/types/enums.py (1)
Stage(14-18)
🪛 LanguageTool
docs/reference/cli.md
[style] ~38-~38: This adverb was used twice in the sentence. Consider removing one of them or replacing them with a synonym.
Context: ...) | No | | --weights_only BOOL | Load only weights (security option) | No | | `OVE...
(ADVERB_REPETITION_PREMIUM)
[style] ~107-~107: This adverb was used twice in the sentence. Consider removing one of them or replacing them with a synonym.
Context: ...) | No | | --weights_only BOOL | Load only weights (security option) | No | | `OVE...
(ADVERB_REPETITION_PREMIUM)
[style] ~149-~149: This adverb was used twice in the sentence. Consider removing one of them or replacing them with a synonym.
Context: ...) | No | | --weights_only BOOL | Load only weights (security option) | No | | `OVE...
(ADVERB_REPETITION_PREMIUM)
[style] ~198-~198: This adverb was used twice in the sentence. Consider removing one of them or replacing them with a synonym.
Context: ...) | No | | --weights_only BOOL | Load only weights (security option) | No | | `OVE...
(ADVERB_REPETITION_PREMIUM)
docs/quickstart.md
[style] ~384-~384: Using many exclamation marks might seem excessive (in this case: 6 exclamation marks for a text that’s 3141 characters long)
Context: ... Q: Can I use multiple GPUs? A: Yes! Just add trainer::devices=-1 (all GPU...
(EN_EXCESSIVE_EXCLAMATION)
docs/guides/lighter-module.md
[style] ~957-~957: Using many exclamation marks might seem excessive (in this case: 9 exclamation marks for a text that’s 5464 characters long)
Context: ... give you YAML configs and CLI overrides! ## Next Steps - [Lightning Module Gui...
(EN_EXCESSIVE_EXCLAMATION)
docs/faq.md
[style] ~255-~255: Using many exclamation marks might seem excessive (in this case: 6 exclamation marks for a text that’s 3714 characters long)
Context: ...5/joss.08101) ## Can I contribute? Yes! Lighter is open source: - **Report bug...
(EN_EXCESSIVE_EXCLAMATION)
docs/guides/best-practices.md
[grammar] ~515-~515: Use a hyphen to join words.
Context: ...) ### Use Structured Logging Group related metrics: python # Good - gro...
(QB_NEW_EN_HYPHEN)
docs/examples/multi-gpu.md
[style] ~160-~160: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...## Square Root Scaling Alternative for very large batch sizes: ```yaml model: optimize...
(EN_WEAK_ADJECTIVE)
[style] ~297-~297: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...FSDP (Fully Sharded Data Parallel) For very large models that don't fit on single GPU: `...
(EN_WEAK_ADJECTIVE)
🪛 markdownlint-cli2 (0.18.1)
docs/guides/training.md
49-49: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
761-761: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
782-782: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
docs/reference/cli.md
80-80: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
docs/guides/lightning-module.md
422-422: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
431-431: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
docs/quickstart.md
28-28: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
63-63: Code block style
Expected: fenced; Actual: indented
(MD046, code-block-style)
157-157: Code block style
Expected: fenced; Actual: indented
(MD046, code-block-style)
docs/index.md
36-36: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
167-167: Code block style
Expected: fenced; Actual: indented
(MD046, code-block-style)
217-217: Code block style
Expected: fenced; Actual: indented
(MD046, code-block-style)
docs/guides/lighter-module.md
642-642: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
docs/examples/image-classification.md
24-24: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
246-246: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
docs/guides/best-practices.md
15-15: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
239-239: Strong style
Expected: asterisk; Actual: underscore
(MD050, strong-style)
239-239: Strong style
Expected: asterisk; Actual: underscore
(MD050, strong-style)
356-356: Strong style
Expected: asterisk; Actual: underscore
(MD050, strong-style)
356-356: Strong style
Expected: asterisk; Actual: underscore
(MD050, strong-style)
729-729: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
docs/guides/custom-code.md
17-17: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
38-38: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
74-74: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
409-409: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
517-517: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
636-636: Strong style
Expected: asterisk; Actual: underscore
(MD050, strong-style)
636-636: Strong style
Expected: asterisk; Actual: underscore
(MD050, strong-style)
694-694: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
docs/guides/configuration.md
179-179: Code block style
Expected: fenced; Actual: indented
(MD046, code-block-style)
🔇 Additional comments (38)
.gitignore (1)
171-171: LGTM! Both additions are appropriate for the architectural redesign:
- Line 171: **/outputs/ aligns with the Runner improvements that now generate timestamped output directories. The pattern is consistent with existing artifact ignores (lightning_logs/, logs/, tensorboard/).
- Line 176: AGENTS.md extends the existing agent-file ignore pattern (CLAUDE.md, GEMINI.md), preventing accidental commits of AI agent state.
Also applies to: 176-176
src/lighter/utils/dynamic_imports.py (1)
106-107: Good defensive programming. This guard prevents a potential AttributeError on line 109 if spec.loader is None, providing a clearer error message instead. According to Python's importlib documentation, the loader attribute can indeed be None in certain edge cases.
src/lighter/utils/misc.py (3)
86-106: Well-documented expansion of hyperparameter extraction. The docstring clearly describes the expanded functionality and provides helpful examples of the key naming conventions for all extracted hyperparameters.
107-115: Clean key naming convention. The base_key + conditional suffix approach keeps keys simple for single-group optimizers while clearly distinguishing multiple groups. The learning rate extraction is straightforward.
121-125: Original review comment is incorrect: implementation matches test expectations. Verification shows the actual test assertions in tests/unit/test_utils_misc.py expect exactly what the implementation creates:
- Tests expect: "optimizer/Adam/beta1" and "optimizer/Adam/beta2" keys
- Implementation creates: stats_dict[f"{base_key}/beta1{suffix}"] and stats_dict[f"{base_key}/beta2{suffix}"]
The original review comment incorrectly claimed tests expect "optimizer/Adam/momentum", but the actual test file asserts "optimizer/Adam/beta1" and "optimizer/Adam/beta2". The implementation is correct and aligns with test expectations. Likely an incorrect or invalid review comment.
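The base_key + suffix scheme discussed above can be sketched in plain Python. This is a hedged illustration only, not the actual get_optimizer_stats in src/lighter/utils/misc.py: the function name, the plain dict param groups standing in for a torch optimizer's param_groups, and the SGD momentum branch are assumptions.

```python
def get_optimizer_stats_sketch(optimizer_name, param_groups):
    """Flatten optimizer hyperparameters into 'optimizer/<Name>/<hp>[/group_i]' keys."""
    stats = {}
    base_key = f"optimizer/{optimizer_name}"
    multiple_groups = len(param_groups) > 1
    for i, group in enumerate(param_groups):
        # Suffix the group index only when several param groups exist, keeping
        # single-group keys short (e.g. 'optimizer/Adam/lr').
        suffix = f"/group_{i}" if multiple_groups else ""
        stats[f"{base_key}/lr{suffix}"] = group["lr"]
        if "betas" in group:  # Adam-style optimizers
            stats[f"{base_key}/beta1{suffix}"] = group["betas"][0]
            stats[f"{base_key}/beta2{suffix}"] = group["betas"][1]
        if "momentum" in group:  # SGD-style optimizers
            stats[f"{base_key}/momentum{suffix}"] = group["momentum"]
    return stats
```

With a single Adam param group this yields exactly the key shapes the tests above assert on: "optimizer/Adam/lr", "optimizer/Adam/beta1", "optimizer/Adam/beta2".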
pyproject.toml (1)
69-81: Dependency and test tooling updates align with PR improvements. The additions of types-PyYAML>=6.0.0 and pytest-github-actions-annotate-failures>=0.2.0 support better type checking for YAML configuration and improved test reporting, both consistent with the PR's refactoring goals and CI/CD improvements.
.github/workflows/dependency-review.yml (1)
1-22: LGTM! The dependency review workflow is well-configured with appropriate permissions, timeout, and failure thresholds. The comment-summary-in-pr: on-failure setting ensures PRs are only commented on when issues are detected, reducing noise.
.github/workflows/release.yml (1)
23-29: Excellent safeguard against accidental releases. The tag verification step ensures that only tags on the main branch trigger releases, preventing accidental releases from feature branches or forks.
.github/scripts/test_summary.py (1)
9-52: LGTM! The test summary script correctly handles:
- Missing test results file
- XML parsing errors
- Metric extraction with safe defaults
- Appropriate exit codes for CI integration
The logic for calculating passed tests and determining status is sound.
docs/guides/best-practices.md (1)
1-1001: Comprehensive and well-structured best practices guide. This documentation provides excellent coverage of production-ready patterns, from project structure through deployment. The examples are clear and actionable, covering critical topics like reproducibility, performance optimization, and debugging strategies.
docs/examples/image-classification.md (1)
1-712: Excellent end-to-end example with clear progression. This documentation effectively demonstrates both LightningModule and LighterModule approaches, with complete working code and configs. The troubleshooting section and hyperparameter tuning examples are particularly valuable for users.
.github/workflows/ci-full.yml (1)
1-73: Well-structured CI workflow with good coverage. The workflow properly:
- Tests across multiple Python versions with fail-fast disabled
- Generates comprehensive test reports and artifacts
- Uses the test summary script for readable output
- Conditionally uploads coverage to Codecov only for Python 3.12
The fail_ci_if_error: true on the Codecov upload means the workflow will fail if the Codecov upload fails (e.g., missing token). This is a strict but defensible choice for ensuring coverage tracking works.
.github/workflows/check_pull_request_title.yml (1)
7-15: LGTM! Well-configured concurrency and permissions. The concurrency group correctly uses the PR number for grouping, and the permissions follow the principle of least privilege with only the necessary read/write scopes.
docs/guides/custom-code.md (1)
1-827: Excellent comprehensive guide! This custom code guide is thorough, well-structured, and provides practical examples covering the full range of use cases. The troubleshooting section effectively addresses common pitfalls, and the progressive examples (from simple to complete) make it easy to follow.
projects/cifar10/__lighter__.py (1)
1-3: LGTM! Clear marker file for autodiscovery. The comments effectively explain the purpose and usage pattern for the __lighter__.py marker file.
projects/cifar10/configs/example_overrides.yaml (1)
1-2: LGTM! Clear example overrides. The override syntax correctly demonstrates the :: path navigation pattern documented in the CLI reference.
src/lighter/utils/logging.py (1)
10-10: LGTM! Improved type annotations. The addition of explicit dict[str, Any] typing improves code clarity and type checking without changing behavior.
Also applies to: 47-47
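The :: path navigation used in the overrides above can be illustrated with a tiny override applier. This is a hypothetical sketch of the idea only; Sparkwheel's real parser also handles type coercion, list indices, and the other symbols, all of which are omitted here.

```python
def apply_override(config: dict, override: str) -> dict:
    """Apply a 'path::to::key=value' style override to a nested dict in place."""
    path, _, value = override.partition("=")
    *parents, leaf = path.split("::")
    node = config
    for key in parents:
        node = node.setdefault(key, {})  # create intermediate sections as needed
    node[leaf] = value  # a real parser would coerce the value's type here
    return config


cfg = {"model": {"optimizer": {"lr": "0.01"}}}
apply_override(cfg, "model::optimizer::lr=0.001")
```

After the call, cfg["model"]["optimizer"]["lr"] holds "0.001", mirroring how lr=0.001 on the CLI reaches a nested config key.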
src/lighter/callbacks/__init__.py (1)
1-5: Review comment partially unverifiable; CsvWriter documentation gap identified. The verification revealed inconclusive findings regarding the original review comment:
- TableWriter: No references found in the codebase, making the removal claim unverifiable. TableWriter may have existed in a prior version, but it's absent from the current searchable repository.
- CsvWriter: Confirmed to exist in src/lighter/callbacks/csv_writer.py with a comprehensive implementation and unit tests. However, CsvWriter has no documentation in the docs/ folder, unlike FileWriter, which is documented in docs/guides/training.md and docs/reference/cli.md.
- Migration guide: No MIGRATION.md, CHANGELOG.md, or v3.0 breaking-changes documentation was found.
Key gap: CsvWriter is exported in the public API (__all__) but lacks corresponding documentation. FileWriter has usage examples, but CsvWriter does not.
.github/workflows/docs-publish.yml (1)
14-22: Verification complete: the setup action correctly handles doc dependencies. The setup action exists at .github/actions/setup/action.yml and properly implements the install-groups parameter by passing it to uv sync. The workflow's --only-group doc parameter is valid uv syntax, and the 'doc' dependency group is properly defined in pyproject.toml with mkdocs packages. The refactoring is sound.
1-580: LGTM! Comprehensive and accurate configuration documentation. The configuration guide is thorough, well-structured, and provides clear examples of all essential Sparkwheel symbols. The distinction between @ (lazy/resolved) and % (eager/raw) references is particularly well explained, which is critical for correct metric configuration.
.github/actions/setup/action.yml (1)
1-28: LGTM! Well-structured composite action. The setup action follows GitHub Actions best practices:
- Clear input definitions with sensible defaults
- Caching enabled for uv packages
- Conditional dependency installation for flexibility
The string comparison inputs.install-deps == 'true' is correct for GitHub Actions (inputs are always strings).
.github/workflows/publish.yml (3)
9-11: LGTM! Concurrency control added. The concurrency configuration prevents multiple publish workflows from running simultaneously, which is appropriate for package publishing to avoid race conditions.
27-30: LGTM! Setup action usage is correct. Using the new composite setup action with install-deps: 'false' is appropriate here, since uv build doesn't require project dependencies to be installed, only the build tool itself.
43-47: LGTM! Artifact handling improves workflow reliability. Uploading the dist directory as an artifact allows the publish job to run in a separate environment, improving isolation and reproducibility.
src/lighter/__init__.py (1)
11-15: LGTM! Public API correctly updated. The exports correctly reflect the architectural changes:
- ✅ Added LighterDataModule and LighterModule (new abstractions)
- ✅ Removed System (breaking change, as documented)
- ✅ Retained Runner
Note: The version remains "0.0.5", but the PR objectives mention v3.0 with breaking changes. This may be addressed in a separate version bump commit, as noted in the migration checklist.
mkdocs.yml (3)
20-28: LGTM! Enhanced navigation features improve UX. The additional Material theme features provide a better documentation experience:
- navigation.instant: SPA-like page loading
- navigation.tracking: URL updates as the user scrolls
- search.suggest and search.highlight: better search experience
- content.tabs.link: synchronized tabs across pages
These align well with the comprehensive documentation additions in this PR.
72-88: LGTM! Navigation restructure is clear and task-oriented. The new navigation structure (Quick Start → Guides → Examples → Reference → FAQ) provides a logical learning path that aligns with the documented user journey in the PR objectives.
111-111: LGTM! Tables extension added for documentation needs. The tables markdown extension is needed for reference tables in the new documentation (e.g., the quick reference table in the configuration guide).
projects/cifar10/configs/example_autodiscovery.yaml (1)
1-100: Otherwise, LGTM! Comprehensive CIFAR-10 autodiscovery example. The configuration demonstrates key Lighter features:
- ✅ Autodiscovery with the __lighter__.py marker
- ✅ Project references (project.model.*)
- ✅ Proper metric configuration with MetricCollection
- ✅ Reference reuse for dataloaders
- ✅ Complete train/val/test/predict pipeline
docs/guides/lighter-module.md (1)
1-1013: LGTM! Comprehensive and accurate LighterModule guide. Excellent documentation that covers:
- ✅ Clear comparison with LightningModule
- ✅ Multiple complete, runnable examples
- ✅ Accurate API usage matching src/lighter/model.py
- ✅ Common patterns and best practices
- ✅ Integration with Lightning features (hooks, schedulers, mixed precision)
The automatic logging explanation (loss, metrics, optimizer stats) aligns perfectly with the implementation in the relevant code snippets.
projects/cifar10/configs/example.yaml (1)
1-98: CIFAR-10 example config matches the new schema and looks consistent. Top-level trainer, model, and data blocks are wired coherently, and using %::train_metrics plus %data::val_dataloader keeps metrics and dataloaders DRY while avoiding shared mutable state. I don't see anything that needs changing here.
tests/fixtures/plain_lightning_modules.py (1)
1-118: Fixtures exercise key Lightning and LighterModule patterns cleanly. The dataset and three modules are minimal but well-structured for testing: batch shapes match the linear layers, loss usage is correct, and the dict-based outputs in MyLighterModule should integrate smoothly with Lighter's step/output normalization. No changes needed.
docs/faq.md (1)
1-269: FAQ examples and reference semantics align with the new configuration model. The Q&A and examples around :: paths plus @ vs % references are consistent with the updated config behavior (e.g., cloning metrics vs reusing instantiated objects), and I don't see any inaccuracies that need code or docs changes here.
README.md (1)
116-168: README examples line up well with the new LighterModule/LighterDataModule APIs. The LighterModule YAML and usage examples here match the new configuration patterns (network/criterion/optimizer/metrics) and the LighterDataModule wrapper shown elsewhere. This section does a good job demonstrating the intended "minimal boilerplate" workflow.
src/lighter/callbacks/file_writer.py (1)
21-110: Registry + FileWriter design looks solid. The writer registry and the FileWriter implementation are clean and composable: resolving writers by name, handling tensors/CPU transfer, and using (global_rank, world_size) to compute unique per-sample indices is a nice fit for DDP prediction workflows. I don't see functional issues in this core path.
Also applies to: 112-185
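One common way to turn (global_rank, world_size) into a unique per-sample index, as praised above, is sketched below. This is an assumed scheme matching DistributedSampler-style interleaving, not necessarily the exact formula FileWriter uses.

```python
def global_sample_index(batch_idx, batch_size, sample_in_batch, global_rank, world_size):
    """Map a (rank-local batch, in-batch position) pair to a globally unique index.

    With an interleaving sampler, global batch b is processed by rank b % world_size,
    so rank r sees global batches r, r + world_size, r + 2 * world_size, ...
    """
    global_batch = batch_idx * world_size + global_rank
    return global_batch * batch_size + sample_in_batch
```

Because each (rank, batch, position) triple maps to a distinct integer, two ranks writing prediction files concurrently never collide on an output index.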
src/lighter/model.py (3)
1-79: Clear, user-focused API documentation for LighterModule. The top-level and class docstrings (including the example) make the responsibilities and return contract of subclasses very explicit and match the v3 "user control + automatic logging" goals. No changes needed here.
225-291: Batch-end hooks and _normalize_output enforce a clean step-output contract. Routing on_*_batch_end through _on_batch_end and _normalize_output gives you a single place to enforce the "Tensor or dict with optional structured loss" contract, and the 'loss'-dict 'total' check is a helpful guardrail for multi-component losses. This design keeps callbacks and logging predictable across train/val/test while intentionally skipping predict mode.
293-369: Logging helpers correctly implement dual step/epoch logging and optimizer stats. The _log_loss, _log_metrics, and _log_optimizer_stats helpers, together with _log, implement a consistent .../step vs .../epoch naming scheme and DDP-friendly sync_dist settings. Logging optimizer stats only once per epoch at batch_idx == 0 is a good compromise between insight and overhead. I don't see any functional issues here.
Actionable comments posted: 0
🧹 Nitpick comments (2)
.codecov.yml (2)
39-39: Expand docs ignore pattern for consistency. The pattern "docs/*" only matches top-level files in the docs/ directory and won't ignore nested documentation files. For consistency with the test pattern on line 37 ("tests/**/*"), use "docs/**/*" to ignore all documentation paths recursively.
- - "docs/*" # Documentation
+ - "docs/**/*" # Documentation
14-16: Clarify comment on threshold behavior. The comment on line 14 states "only fail if below target," but line 16 sets informational: false, which means the check will fail if below the 90% target. The threshold: null setting is correct: it means there's no threshold modifier beyond the target itself. Clarify the comment to avoid confusion:
- threshold: null # No threshold - only fail if below target
+ threshold: null # No additional threshold; fail check if below target
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.codecov.yml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: types
- GitHub Check: Tests (Python 3.12)
🔇 Additional comments (1)
.codecov.yml (1)
4-49: Coverage configuration aligns well with redesign objectives. The configuration sets appropriate targets (90% project, 80% patch) given the substantial refactor outlined in the PR. Ignoring test files, docs, cache, and config fixtures is sensible. GitHub Checks annotations will provide clear feedback on changed files.
A few minor nitpicks above aside, the overall structure and values are sound for tracking coverage through this major redesign.
Since the PR mentions new CI workflows uploading coverage, verify that your .github/workflows/ci.yml and .github/workflows/ci-full.yml are properly configured to collect and upload coverage reports so Codecov has data to process against this configuration.
Configuration improvements:
- Clarify the codecov threshold comment for better understanding
- Use a wildcard pattern for the docs coverage ignore (docs/**/* instead of docs/*)
- Add test artifact ignores (config.yaml files in root and project directories)
These changes improve coverage reporting accuracy and prevent test-generated configuration files from being accidentally committed.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Major improvements to the Runner class:
- Remove the OutputDir class; save the config directly to trainer.log_dir instead
- Add a _save_config() method that integrates with Lightning's logging system
- Improve workflow documentation with detailed step-by-step docstrings
- Refactor the CLI to use a shared add_common_args() function (DRY principle)
- Change CLI boolean flags to action="store_true" for better UX
- Remove unused datetime and dataclass imports
- Remove the config parameter from _execute() (no longer needed)
Benefits:
- Simpler architecture: leverages PyTorch Lightning's built-in directory management
- Better integration: config.yaml appears alongside checkpoints in version directories
- Cleaner CLI: --verbose instead of --verbose True
- More maintainable: shared argument definitions prevent drift
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
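The "save config to trainer.log_dir" idea above can be sketched as follows. This is a hypothetical helper, not the actual _save_config() method: the function name, the YAML-as-text shortcut, and the None log_dir guard are assumptions, since the real Runner integrates with Lightning's logger object.

```python
from pathlib import Path


def save_config(config_text, log_dir):
    """Persist the resolved config next to Lightning's checkpoints for this run."""
    if log_dir is None:
        return None  # no logger attached, nothing sensible to write
    path = Path(log_dir) / "config.yaml"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(config_text)
    return path
```

Because log_dir is Lightning's versioned directory (e.g. lightning_logs/version_0), the saved config.yaml lands beside the checkpoints for that exact run.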
Add a safety mechanism to prevent infinite loops when replacement samples are also corrupted:
- Add a max_retries parameter (default: 100) to limit retry iterations
- Raise RuntimeError with a detailed message if the limit is exceeded
- Track the iteration count separately from the corrupted sample count
- Update the docstring with parameter documentation and exception info
This prevents the collate function from hanging indefinitely when datasets have high corruption rates or systematic corruption issues.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
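The retry-capped behavior this commit describes can be sketched as follows. This is a simplified stand-in: the real collate_replace_corrupted lives in src/lighter/utils/data.py and works on actual Dataset objects with a real collate function, whereas here plain lists and None-as-corrupted are assumed.

```python
import random


def collate_replace_corrupted(batch, dataset, collate_fn=list, max_retries=100):
    """Replace corrupted (None) samples by re-drawing from the dataset, with a retry cap."""
    target_size = len(batch)
    batch = [sample for sample in batch if sample is not None]  # drop corrupted entries
    retries = 0
    while len(batch) < target_size:
        if retries >= max_retries:
            raise RuntimeError(
                f"Failed to fill a batch after {max_retries} retries; "
                "the dataset may be systematically corrupted."
            )
        candidate = dataset[random.randrange(len(dataset))]
        if candidate is not None:  # replacements can be corrupted too
            batch.append(candidate)
        retries += 1
    return collate_fn(batch)
```

The iteration counter is deliberately separate from the count of corrupted samples: a dataset where most random draws are also corrupted would otherwise loop forever.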
- Remove projects folder restrictions (now tracked)
- Add .claude/, lightning_logs/, .datasets/, *.zip to ignores
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Register the module in sys.modules before calling exec_module to properly handle circular imports within dynamically loaded modules. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
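The pattern this commit describes, combined with the spec.loader guard noted in the review above, can be sketched with stdlib mechanics alone. The helper name mirrors the real import_module_from_path, but the body is a simplified sketch and omits anything Lighter-specific such as __lighter__.py discovery.

```python
import importlib.util
import sys


def import_module_from_path(module_name, path):
    """Load a module from a file path, registering it before execution."""
    spec = importlib.util.spec_from_file_location(module_name, str(path))
    if spec is None or spec.loader is None:
        # loader can legitimately be None; fail with a clear message instead
        # of an AttributeError further down.
        raise ImportError(f"Could not create an import spec for {path!r}.")
    module = importlib.util.module_from_spec(spec)
    # Register BEFORE exec_module so circular imports inside the loaded
    # module can resolve back to this (partially initialized) module.
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return module
```

Registering first is the same ordering CPython's own import machinery uses; without it, a module that (directly or indirectly) imports itself during exec_module triggers a second, conflicting load.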
…pickler
Rewrites the dynamic_imports module to work with PyTorch DataLoader using the spawn multiprocessing strategy (num_workers > 0). The key insight is that cloudpickle serializes user classes by value, but ForkingPickler has special reducers for multiprocessing internals (pipes, queues, fds).
Changes:
- Add _HybridPickler combining ForkingPickler reducers + cloudpickle dispatch
- Add _ModuleRegistry and _DynamicModuleFinder for submodule resolution
- Remove the deprecated OPTIONAL_IMPORTS pattern
- Add cloudpickle>=3.0.0 dependency
Tested with all 8 projects using spawn multiprocessing successfully.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
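The combination this commit describes can be sketched with stdlib pieces alone. This is a hypothetical illustration, not the actual _HybridPickler: cloudpickle's by-value dispatch is represented by a manually registered reducer, and the class copies ForkingPickler's reducer table (a CPython implementation detail, _extra_reducers) so registrations stay local.

```python
import io
import pickle
from multiprocessing.reduction import ForkingPickler


class HybridPicklerSketch(ForkingPickler):
    """ForkingPickler subclass with its own reducer table.

    ForkingPickler already knows how to reduce multiprocessing internals
    (pipes, queues, file descriptors); extra reducers layered on top play
    the role cloudpickle's by-value dispatch plays in the real design.
    """

    # Copy the parent's table so register() below doesn't mutate it globally.
    _extra_reducers = dict(ForkingPickler._extra_reducers)

    @classmethod
    def dumps(cls, obj, protocol=None):
        buf = io.BytesIO()
        cls(buf, protocol).dump(obj)
        return buf.getvalue()


class Payload:
    def __init__(self, value):
        self.value = value


def _reduce_payload(obj):
    # Custom reducer: rebuild Payload from its value on unpickling.
    return (Payload, (obj.value,))


HybridPicklerSketch.register(Payload, _reduce_payload)
```

The point of the hybrid: the multiprocessing reducers must win for OS-level handles, while everything else can fall through to by-value serialization so spawned DataLoader workers can reconstruct dynamically imported user classes.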
Actionable comments posted: 2
🧹 Nitpick comments (5)
docs/examples/image-classification.md (1)
195-215: Minor inconsistency: missing download: true and p parameter across examples. In the LightningModule example (line 198), the train dataset includes download: true. In the LighterModule example (line 378), this parameter is omitted. Similarly, the RandomHorizontalFlip at lines 207-208 explicitly sets p: 0.5, while line 388 omits it (relying on the default).
These are functionally safe (defaults work fine), but the inconsistency may confuse readers trying to understand best practices. Consider making both examples consistent.
If harmonizing, add explicit parameters to the LighterModule section:
  dataset:
    _target_: torchvision.datasets.CIFAR10
    root: ./data
    train: true
+   download: true
    transform:
      _target_: torchvision.transforms.Compose
      transforms:
        - _target_: torchvision.transforms.RandomCrop
          size: 32
          padding: 4
        - _target_: torchvision.transforms.RandomHorizontalFlip
+         p: 0.5
Also applies to: 378-393
docs/guides/configuration.md (1)
17-17: Add language specifiers to fenced code blocks. Code blocks should declare their language for proper syntax highlighting. Update lines 17, 38, 74, 409, 519 (and similar patterns throughout) to specify the language:
-```
+```yaml
This improves documentation rendering and code highlighting across platforms.
Also applies to: 38-38, 74-74, 409-409, 519-519
docs/guides/best-practices.md (1)
17-17: Add language specifiers to fenced code blocks. Consistent with the configuration guide, specify a language on all code blocks for proper syntax highlighting:
-```
+```yaml
or
-```
+```python
This pattern appears throughout the file (lines 17, 38, 74, 409, 519, 696).
Also applies to: 38-38, 74-74, 409-409, 519-519, 696-696
docs/guides/custom-code.md (1)
17-17: Add language specifiers to fenced code blocks. Consistent with other guides, add language specifiers to all code blocks (lines 17, 38, 74, 409, 519, 638, 696):
-```
+```python
or
-```
+```yaml
This ensures consistent syntax highlighting across the documentation.
Also applies to: 38-38, 74-74, 409-409, 519-519, 638-638, 696-696
src/lighter/utils/dynamic_imports.py (1)
152-192: Consider the log level for already-imported modules. The function implementation is correct and handles all edge cases appropriately. However, line 171 logs a warning when a module is already imported. If this function is called multiple times intentionally (e.g., in testing or initialization checks), the warning could be noisy. Consider using logger.debug instead of logger.warning for this case.
Apply this diff to adjust the log level:
- logger.warning(f"Module '{module_name}' already imported, skipping.")
+ logger.debug(f"Module '{module_name}' already imported, skipping.")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
.gitignore (1 hunks)
docs/examples/image-classification.md (1 hunks)
docs/guides/best-practices.md (1 hunks)
docs/guides/configuration.md (1 hunks)
docs/guides/custom-code.md (1 hunks)
pyproject.toml (2 hunks)
src/lighter/utils/dynamic_imports.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- .gitignore
- pyproject.toml
🧰 Additional context used
🪛 LanguageTool
docs/guides/best-practices.md
[grammar] ~525-~525: Use a hyphen to join words.
Context: ...) ### Use Structured Logging Group related metrics: python # Good - gro...
(QB_NEW_EN_HYPHEN)
🪛 markdownlint-cli2 (0.18.1)
docs/examples/image-classification.md
24-24: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
246-246: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
docs/guides/best-practices.md
17-17: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
38-38: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
74-74: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
409-409: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
519-519: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
638-638: Strong style
Expected: asterisk; Actual: underscore
(MD050, strong-style)
638-638: Strong style
Expected: asterisk; Actual: underscore
(MD050, strong-style)
696-696: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
docs/guides/configuration.md
17-17: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
38-38: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
74-74: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
409-409: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
519-519: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
docs/guides/custom-code.md
17-17: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
38-38: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
74-74: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
409-409: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
519-519: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
638-638: Strong style
Expected: asterisk; Actual: underscore
(MD050, strong-style)
638-638: Strong style
Expected: asterisk; Actual: underscore
(MD050, strong-style)
696-696: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (11)
docs/examples/image-classification.md (2)
1-420: Excellent comprehensive example aligning with PR redesign objectives.The example demonstrates both traditional LightningModule and new LighterModule approaches end-to-end, with clear narrative progression. Configuration examples correctly use the new
project.models.*pattern with__lighter__.pymarker-based discovery, and validate proper use of Lighter's YAML-driven config system. Code snippets are syntactically sound and best-practice idiomatic. Troubleshooting section and "Next Steps" add practical value. Well-organized for learners at multiple levels.
594-594: Verify GitHub repository reference is correct.

Line 594 references the Lighter GitHub repository location for the complete CIFAR-10 example. Confirm this path exists on the target branch (main) and that the example code is actually committed there, or update the reference.
docs/guides/configuration.md (1)
1-590: Excellent comprehensive guide with clear structure and accurate examples.

The guide effectively teaches Sparkwheel's 5 essential symbols, progresses logically from basics to advanced patterns, and includes critical safety guidance (metrics via `%`, not `@`). Examples are concrete, the distinction between `::` (config navigation) and `.` (Python attributes) is well emphasized, and the quick reference table is a valuable resource. Aligns well with PR objectives for Sparkwheel-based configuration.

docs/guides/best-practices.md (1)
1-1010: Excellent battle-tested best-practices guide covering the full ML lifecycle.

This guide is comprehensive and practical, covering development, training, debugging, reproducibility, testing, performance optimization, and production deployment. Project structure guidance correctly highlights the `__lighter__.py` marker and demonstrates config composition via the base+experiment pattern. The logging section properly emphasizes dual logging (step + epoch), which pairs well with LighterModule. Safety guidance is explicit (metrics via `%`, not `@`; `.eval()` for inference; `batch_size` in logging). Checklists are valuable for preventing oversights. The guide bridges the configuration system and practical implementations effectively.

docs/guides/custom-code.md (1)
1-828: Excellent comprehensive guide for custom code integration with marker-based auto-discovery.

The guide clearly explains the `__lighter__.py` marker pattern central to the PR's redesign, progressing from quick examples to complete working projects. Project structure guidance is explicit about `__init__.py` requirements. Custom component examples (models, datasets, transforms, LightningModule) are realistic and production-grade. The troubleshooting section effectively addresses common pain points (ModuleNotFoundError, ImportError, AttributeError) with clear root causes and solutions. Common patterns (separation of concerns, shared base classes, config inheritance) demonstrate best practices. The end-to-end CIFAR-10 example ties everything together convincingly. The quick reference provides easy syntax lookup.

src/lighter/utils/dynamic_imports.py (6)
36-53: LGTM! The registry implementation correctly maps module names to paths and provides proper prefix matching for submodule resolution.

56-93: LGTM! The meta-path finder correctly integrates with Python's import system and properly handles both package and module resolution paths.

95-108: LGTM! The loader correctly executes modules and registers them with cloudpickle for by-value serialization, ensuring multiprocessing compatibility.
147-149: The global patching of `multiprocessing.reduction.dump` is not an undocumented side effect but the core intended functionality of this module, which is fully explained in the module docstring. The docstring explicitly describes the hybrid pickler strategy and why the multiprocessing.reduction module must be patched. The module initialization comment at line 147 ("# Module initialization: install finder and patch multiprocessing") also makes this explicit. No changes are needed.

Likely an incorrect or invalid review comment.
30-30: The cloudpickle dependency is already appropriately constrained (`cloudpickle>=3.0.0` in pyproject.toml), which is compatible with Python 3.10+ and multiprocessing spawn mode. No version compatibility issues exist.
128-139: Acknowledged design trade-off: private attribute access is protected with defensive fallback.

The `_extra_reducers` attribute access on line 132 is already documented as a runtime-only private detail in the code comment (line 131). The implementation mitigates fragility by using `getattr` with a safe default fallback. The `CloudPickler` instantiation on line 138 appears intentional for maintaining independent pickler state rather than a performance concern. This hybrid approach is a justified design trade-off to preserve ForkingPickler's multiprocessing internal reducers while enabling cloudpickle's by-value serialization for dynamic imports.
```
cifar10/
├── __lighter__.py           # Marker file (enables project.* imports)
├── __init__.py
├── models.py                # Model definitions
├── data.py                  # Data utilities (optional)
├── configs/
│   ├── resnet18.yaml        # ResNet-18 config
│   ├── resnet50.yaml        # ResNet-50 config
│   └── efficientnet.yaml    # EfficientNet config
└── outputs/                 # Generated by Lighter
```
Add language specification to fenced code block (directory structure).
Code fences should explicitly specify a language. The directory structure block is text/plaintext.
Apply this fix:
-```
+```text
 cifar10/

🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
24-24: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
In docs/examples/image-classification.md around lines 24 to 35, the fenced code
block showing the directory structure lacks a language specifier; change the
opening fence from ``` to ```text (or ```plaintext) so the block is explicitly
marked as plain text, leaving the content unchanged.
```
Epoch 0: 100%|████████| 391/391 [00:45<00:00, 8.58it/s, loss=1.82, train/acc=0.335, v_num=0]
Validation: 100%|████████| 79/79 [00:05<00:00, 14.23it/s]
Epoch 1: 100%|████████| 391/391 [00:44<00:00, 8.76it/s, loss=1.45, train/acc=0.472, v_num=0]
...
Epoch 99: 100%|████████| 391/391 [00:43<00:00, 8.98it/s, loss=0.23, train/acc=0.921, v_num=0]
```
Add language specification to fenced code block (expected output).
The expected output block should specify a language (typically text or bash for terminal output).
Apply this fix:
-```
+```text
 Epoch 0: 100%|████████| 391/391 [00:45<00:00, 8.58it/s, loss=1.82, train/acc=0.335, v_num=0]

🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
246-246: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
In docs/examples/image-classification.md around lines 246 to 252, the fenced
code block showing expected terminal output lacks a language tag; update the
opening triple backticks to include a language such as text (e.g., ```text) so
the block is rendered with correct syntax/formatting for plain terminal output.
Actionable comments posted: 1
🧹 Nitpick comments (1)
pyproject.toml (1)
84-106: Mypy configuration change — relax type-checking severity.

Line 92 previously had `disallow_any_generics = true` (removed). This relaxation acknowledges the complexity of type-hinting the new dynamic module loading patterns and configuration system but may hide legitimate type issues. Consider documenting why this strictness was removed and whether specific modules should retain stricter checking.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (1)
pyproject.toml (2 hunks)
🔇 Additional comments (3)
pyproject.toml (3)
41-41: New cloudpickle dependency is appropriate.

The addition of `cloudpickle>=3.0.0` aligns with the architectural refactor (particularly the ProjectImporter and dynamic module loading patterns mentioned in the PR), where native pickling support for dynamic classes and functions is beneficial.

73-73: Type stubs addition is sound.

Adding `types-PyYAML>=6.0.0` to the types group provides proper type hints for YAML operations, supporting the stricter mypy checks enabled elsewhere in the config.

81-81: CI/test integration dependency is reasonable.

The addition of `pytest-github-actions-annotate-failures>=0.2.0` improves test reporting in GitHub Actions, aligning with the expanded test infrastructure present in this PR.
… in docstrings; update Freezer to support LightningModule and LighterModule out-of-the-box
Actionable comments posted: 1
🧹 Nitpick comments (4)
src/lighter/callbacks/file_writer.py (3)
21-66: WriterRegistry API is clean; consider exposing available writers for diagnostics

The `WriterRegistry` implementation (decorator-based `register` with duplicate-name protection and `get` with a clear error) looks good and matches the intended usage in `FileWriter`.

One small enhancement you might consider: add a `keys()` method, or include the list of available writers in the `get()` error message. Currently, when a misconfigured writer name is used, the error message only states that the name is not registered without showing the available options (tensor, image_2d, image_3d, text), making it harder for users to diagnose the issue.
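A sketch of the suggested enhancement (illustrative names only, not Lighter's actual implementation): a `keys()` method plus a `get()` error that lists the registered writers.

```python
class WriterRegistry:
    """Decorator-based registry with duplicate protection and a
    diagnostic lookup error that lists the available writers."""

    def __init__(self):
        self._writers = {}

    def register(self, name):
        def decorator(fn):
            if name in self._writers:
                raise ValueError(f"Writer '{name}' is already registered.")
            self._writers[name] = fn
            return fn
        return decorator

    def keys(self):
        # Sorted so the error message is stable and easy to scan
        return sorted(self._writers)

    def get(self, name):
        if name not in self._writers:
            raise ValueError(
                f"Writer '{name}' is not registered. "
                f"Available writers: {self.keys()}"
            )
        return self._writers[name]


registry = WriterRegistry()

@registry.register("tensor")
def write_tensor(path, tensor):
    ...
```

With this, a typo such as `registry.get("tenosr")` fails with a message that names the valid options.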
71-110: Document CPU tensor expectation and ensure safe reshaping in writer functions

The built-in writers are straightforward implementations, but two minor improvements would enhance robustness:

- `write_tensor` and the image writers assume CPU tensors. While `FileWriter._prepare_value` calls `detach().cpu()` in the normal callback flow, direct external use of these functions could receive GPU tensors. A docstring note would clarify the expectation (e.g., "expects CPU tensors; use `tensor.detach().cpu()` before calling if needed").
- `write_image_3d` uses `tensor.view(...)`, which fails if the tensor is non-contiguous. Use `.reshape()` instead, or call `.contiguous()` first:

 shape = tensor.shape
-tensor = tensor.view(shape[0], shape[1] * shape[2], shape[3])
+tensor = tensor.reshape(shape[0], shape[1] * shape[2], shape[3])
153-203: Add comprehensive tests for FileWriter edge cases and DDP indexing

The `setup()` and `write()` methods correctly implement:

- Stage gating and directory validation
- DDP-safe global indexing with `_counter = global_rank` and `_step = world_size`, ensuring indices interleave across ranks without collisions
- Sequence conversion with length validation between value and name keys
- Per-sample file writing with atomic directory creation
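The rank-interleaved indexing described here can be sketched as plain arithmetic; `global_indices` is a hypothetical helper written only to illustrate the naming scheme:

```python
def global_indices(global_rank, world_size, num_samples):
    """Indices a rank assigns to its samples when its counter starts at
    global_rank and advances by world_size: ranks interleave, never collide."""
    counter = global_rank
    indices = []
    for _ in range(num_samples):
        indices.append(counter)
        counter += world_size
    return indices

# With 2 ranks and 3 samples each:
# rank 0 -> [0, 2, 4], rank 1 -> [1, 3, 5]
```

A multi-rank test can simulate two such counter/step pairs and assert the resulting index sets are disjoint.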
To strengthen test coverage, add tests covering:

Edge cases: calling `write()` before `setup()`, empty sequences from `value_key`, and value/name length mismatches. These paths currently lack automated coverage (notably line 156).

DDP simulation: Verify global indexing under multi-rank scenarios (e.g., simulate two ranks with different `_counter`/`_step` values) to prevent unintended regressions in the naming scheme during future refactoring.

src/lighter/callbacks/csv_writer.py (1)
85-127: Add debug logging to CsvWriter.write() early returns and verify test coverage for logging behavior

The code correctly handles edge cases (empty batches, missing writer, length mismatches) but has two silent early returns:

- Lines 86–87: `if self._csv_writer is None` (non-predict stages or missing setup)
- Lines 102–103: `if num_samples == 0` (no matching keys or all empty)

These would benefit from debug logging to help diagnose misconfigurations. The `BaseWriter` base class already imports `loguru.logger`, so add similar logging here.

Regarding tests: the existing test suite (`tests/unit/test_callbacks_writers.py`) already covers all the scenarios mentioned:

- `test_write_non_sequence_values` (line 587) covers the all-scalar case
- `test_write_inconsistent_lengths` (line 494) covers length mismatches
- `test_write_missing_key` (line 507) covers missing keys
- `test_write_empty_batch` (line 576) and `test_on_predict_epoch_end_before_setup` (line 618) cover the silent return paths

However, none verify that logging occurs. Adding tests that assert debug logs are emitted during the silent returns would complete the coverage picture.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- src/lighter/callbacks/csv_writer.py (1 hunks)
- src/lighter/callbacks/file_writer.py (1 hunks)
- src/lighter/callbacks/freezer.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/lighter/callbacks/freezer.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/lighter/callbacks/csv_writer.py (3)
src/lighter/callbacks/base_writer.py (3)
- `BaseWriter` (13-75), `setup` (37-55), `write` (27-35)

src/lighter/utils/types/enums.py (1)
- `Stage` (14-18)

src/lighter/callbacks/writer/table.py (3)
- `TableWriter` (17-72), `on_predict_epoch_end` (46-72), `write` (36-44)
🪛 GitHub Check: codecov/patch
src/lighter/callbacks/csv_writer.py
[warning] 51-51: src/lighter/callbacks/csv_writer.py#L51
Added line #L51 was not covered by tests
[warning] 87-87: src/lighter/callbacks/csv_writer.py#L87
Added line #L87 was not covered by tests
[warning] 148-148: src/lighter/callbacks/csv_writer.py#L148
Added line #L148 was not covered by tests
src/lighter/callbacks/file_writer.py
[warning] 156-156: src/lighter/callbacks/file_writer.py#L156
Added line #L156 was not covered by tests
🔇 Additional comments (3)
src/lighter/callbacks/csv_writer.py (1)
128-162: No action needed — implementation and test coverage are complete

The distributed temp-file aggregation is sound. The current guard using `dist.is_initialized()` is the standard PyTorch pattern used consistently in similar code (e.g., `file_writer.py` checks `trainer.world_size > 1`). Adding `dist.is_available()` is defensive overkill since `torch.distributed` is imported unconditionally and `dist.is_initialized()` gracefully handles both single-process and multi-process contexts.

Comprehensive tests already cover all scenarios:

- Single-process case: `test_csv_writer_merge_without_dist`
- Multi-process case with concatenation and cleanup: `test_csv_writer_merge_with_dist`
- Edge cases (empty ranks, None paths): `test_csv_writer_handles_empty_rank` and `test_csv_writer_handles_none_paths`
- Line 148 (`if not dfs: return`) is exercised in `test_csv_writer_handles_empty_rank`

Likely an incorrect or invalid review comment.
src/lighter/callbacks/file_writer.py (2)
204-230: Length consistency check is already implemented; the concerns about tensor and sequence handling are well-addressed

The helper methods effectively handle tensors (detached and moved to CPU) and convert various sequence types (lists, tuples, and generic `Sequence` objects like `range`) into uniform lists. The code also correctly excludes strings, bytes, and bytearray from sequence iteration.

The length consistency check between `value_key` and `name_key` outputs is already present (raises `ValueError` with a clear mismatch message), so no additional validation is needed there. The support for arbitrary `Sequence` types beyond plain Python containers is tested and working as intended.
133-152: BaseWriter has already been refactored to a single-parameter signature — no mismatch exists.

`BaseWriter.__init__` currently accepts only `path` (per `src/lighter/callbacks/base_writer.py`), and `FileWriter.__init__` correctly calls `super().__init__(directory)` with that single argument. The constructor works as intended; no refactoring is needed.

Likely an incorrect or invalid review comment.
```python
def __init__(self, path: str | Path, keys: list[str]) -> None:
    super().__init__(path)
    self.keys = keys
    self._temp_path: Path | None = None
    self._csv_writer: Any = None  # csv.writer type is not easily annotated
    self._csv_file: TextIOWrapper | None = None

def setup(self, trainer: Trainer, pl_module: LighterModule, stage: str) -> None:
    super().setup(trainer, pl_module, stage)
    if stage != Stage.PREDICT:
        return

    # Create a temporary file for writing predictions
    self._temp_path = self.path.with_suffix(f".tmp_rank{trainer.global_rank}{self.path.suffix}")
    self._csv_file = open(self._temp_path, "w", newline="")
    self._csv_writer = csv.writer(self._csv_file)
    # Write header
    self._csv_writer.writerow(self.keys)
```
🧩 Analysis chain
🏁 Script executed:
cd src/lighter/callbacks && pwd && ls -la

Repository: project-lighter/lighter
Length of output: 531

🏁 Script executed:

cat -n src/lighter/callbacks/base_writer.py

Repository: project-lighter/lighter
Length of output: 3324

🏁 Script executed:

cat -n src/lighter/callbacks/csv_writer.py | head -70

Repository: project-lighter/lighter
Length of output: 3011

🏁 Script executed:

rg "class.*BaseWriter|def __init__" src/lighter/callbacks/ -A 3 -B 1

Repository: project-lighter/lighter
Length of output: 2601
Remove redundant stage check in CsvWriter.setup()
Lines 50–51 in CsvWriter.setup() repeat the stage filtering already performed by BaseWriter.setup(). Delete the inner check:
```python
def setup(self, trainer: Trainer, pl_module: LighterModule, stage: str) -> None:
    super().setup(trainer, pl_module, stage)
    # Create a temporary file for writing predictions
    self._temp_path = self.path.with_suffix(f".tmp_rank{trainer.global_rank}{self.path.suffix}")
    self._csv_file = open(self._temp_path, "w", newline="")
    self._csv_writer = csv.writer(self._csv_file)
    # Write header
    self._csv_writer.writerow(self.keys)
```

This eliminates the dead code branch and improves coverage.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 51-51: src/lighter/callbacks/csv_writer.py#L51
Added line #L51 was not covered by tests
🤖 Prompt for AI Agents
In src/lighter/callbacks/csv_writer.py around lines 41 to 59, remove the
redundant "if stage != Stage.PREDICT: return" check inside CsvWriter.setup()
because BaseWriter.setup() already performs stage filtering; update the method
to call super().setup(trainer, pl_module, stage) and then immediately create the
temp file, open the CSV file, create the csv.writer, and write the header
(self._csv_writer.writerow(self.keys)), leaving all other logic unchanged.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/lighter/callbacks/freezer.py (1)
71-79: Consider validating `until_step` and `until_epoch` to prevent edge cases.

If `until_step=0` or `until_epoch=0`, the unfreeze condition is immediately true on the first batch, but since `_frozen_state` is `False`, nothing happens and the model never gets frozen (the callback becomes a no-op). While this configuration is unlikely, adding validation in `__init__` to ensure `until_step > 0` and `until_epoch > 0` would prevent confusion.

Example validation:

 if until_step is not None and until_epoch is not None:
     raise ValueError("Only one of `until_step` or `until_epoch` can be specified.")
+
+if until_step is not None and until_step <= 0:
+    raise ValueError("`until_step` must be greater than 0.")
+if until_epoch is not None and until_epoch <= 0:
+    raise ValueError("`until_epoch` must be greater than 0.")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/lighter/callbacks/freezer.py (2 hunks)
🔇 Additional comments (1)
src/lighter/callbacks/freezer.py (1)
8-8: LGTM! Import aligns with Lightning-native architecture.

The transition from System to LightningModule aligns with the PR objectives of embracing Lightning-native patterns.
…ion steps
fix(csv_writer): correct docstring for saving temporary file
refactor(runner): improve logging message for project module import
feat(dynamic_imports): enhance module import handling with error checks
test(tests): add tests for error handling in Runner and dynamic imports
Actionable comments posted: 4
♻️ Duplicate comments (2)
src/lighter/engine/runner.py (1)
263-283: `--weights_only` added to incompatible stages (validate/test).

The `add_common_args` function adds `--weights_only` to all stage parsers, but PyTorch Lightning's `Trainer.validate()` and `Trainer.test()` methods do not accept this parameter. Running `lighter validate config.yaml --weights_only` will cause:

TypeError: validate() got an unexpected keyword argument 'weights_only'

The comment on line 240 ("argparse subparsers ensure only valid parameters") is incorrect — `add_common_args` adds `weights_only` universally.

Move `--weights_only` to only the stages that support it (fit, predict):

 def add_common_args(stage_parser):
     """Add common arguments to a stage subparser."""
     stage_parser.add_argument(
         "inputs",
         nargs="+",
         help="Config files and overrides. Example: config.yaml model::optimizer::lr=0.001",
     )
     stage_parser.add_argument(
         "--ckpt_path",
         type=str,
         default=None,
         help='Path to checkpoint. Can be "last", "best", or a file path.',
     )
-    stage_parser.add_argument(
-        "--weights_only",
-        action="store_true",
-        default=None,
-        help="Load only weights from checkpoint (security option).",
-    )
+
+def add_weights_only_arg(stage_parser):
+    """Add weights_only argument (only for fit/predict)."""
+    stage_parser.add_argument(
+        "--weights_only",
+        action="store_true",
+        default=None,
+        help="Load only weights from checkpoint (security option).",
+    )

Then call `add_weights_only_arg` only for `fit_parser` and `predict_parser`.

src/lighter/callbacks/csv_writer.py (1)
50-51: Remove redundant stage check.

These lines duplicate the stage filtering already performed in `BaseWriter.setup()`. As noted in the previous review, this check is unreachable dead code.

Apply this diff to remove the dead code:

 def setup(self, trainer: Trainer, pl_module: LighterModule, stage: str) -> None:
     super().setup(trainer, pl_module, stage)
-    if stage != Stage.PREDICT:
-        return
     # Create a temporary file for writing predictions
🧹 Nitpick comments (5)
projects/cifar10/model.py (1)
57-61: Consider flexible batch unpacking for predict mode.

The method unpacks `y` from the batch but doesn't use it. While this works if the dataloader always provides labels, it could be more flexible.

Apply this diff to handle both labeled and unlabeled batches:

 def predict_step(self, batch, batch_idx):
     """Prediction step - return dict for FileWriter compatibility."""
-    x, y = batch
+    x = batch[0] if isinstance(batch, (tuple, list)) else batch
     pred = self(x)
     return {"pred": pred}

Or if labels are always present but intentionally unused, consider clarifying with:

 def predict_step(self, batch, batch_idx):
     """Prediction step - return dict for FileWriter compatibility."""
-    x, y = batch
+    x, _ = batch  # Labels provided but unused in prediction
     pred = self(x)
     return {"pred": pred}

src/lighter/utils/dynamic_imports.py (3)
35-53: `_ModuleRegistry` is not thread-safe.

The `_modules` dictionary is accessed without synchronization. While Python's GIL protects against data corruption, concurrent `register` and `find_root` operations from multiple threads (e.g., during parallel imports) could yield inconsistent views.

For this use case (typically single-threaded import at startup), this is likely acceptable, but worth noting if concurrent dynamic imports are ever needed.
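Should concurrent registration ever become a requirement, a lock-guarded variant is straightforward. The sketch below uses assumed method names mirroring the review discussion, not the actual Lighter class:

```python
import threading


class LockedModuleRegistry:
    """Registry mapping module names to root paths, guarded by a lock."""

    def __init__(self):
        self._modules = {}
        self._lock = threading.Lock()

    def register(self, name, path):
        with self._lock:
            self._modules[name] = path

    def find_root(self, fullname):
        """Prefix match: 'project.models.net' resolves under 'project'."""
        with self._lock:
            for name, path in self._modules.items():
                if fullname == name or fullname.startswith(name + "."):
                    return name, path
        return None
```

The lock adds negligible overhead on the import path while making register/lookup interleavings well defined.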
131-142: Minor: `CloudPickler` instantiated per `reducer_override` call.

Each call to `reducer_override` creates a new `CloudPickler(BytesIO())` instance. For large object graphs with many user-defined classes, this adds overhead. Consider caching or reusing a single instance if profiling shows this as a bottleneck.

For typical checkpoint/DataLoader use cases, this is likely negligible.
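For context, `reducer_override` is the standard `pickle.Pickler` hook (Python 3.8+) that hybrid picklers build on. A stdlib-only sketch of the mechanism, with an illustrative `Marker` class standing in for the dynamically imported objects:

```python
import io
import pickle


class Marker:
    def __init__(self, value):
        self.value = value


class OverridingPickler(pickle.Pickler):
    """Intercept selected objects and return a (callable, args) reduction;
    returning NotImplemented falls back to the normal pickling machinery."""

    def reducer_override(self, obj):
        if isinstance(obj, Marker):
            return (Marker, (obj.value,))
        return NotImplemented


buf = io.BytesIO()
OverridingPickler(buf).dump({"m": Marker(7), "n": 42})
restored = pickle.loads(buf.getvalue())
```

A cached inner pickler, as the comment suggests, would live on the pickler instance rather than being constructed inside each `reducer_override` call.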
150-152: Global side effects on module import.

Importing this module globally patches `sys.meta_path` and `multiprocessing.reduction.dump`. This is intentional for DataLoader worker support but has implications:

- The finder is inserted at position 0, taking priority over all other finders
- The `reduction.dump` patch affects all multiprocessing in the process

This design is appropriate for the use case, but document these effects prominently (perhaps in the package's `__init__.py` or README) so users understand the behavior when importing `lighter`.

src/lighter/callbacks/csv_writer.py (1)
89-103: Consider simplifying the num_samples inference logic.

The nested conditionals for determining `num_samples` work correctly but could be more readable. The logic tries to find the first sequence-type key and, if none exists, assumes a single sample.

Here's a clearer alternative:

-# Determine the number of samples in the batch.
-num_samples = 0
-for key in self.keys:
-    if key in outputs:
-        length = self._get_sequence_length(outputs[key])
-        if length is not None:
-            num_samples = length
-            break
-        else:
-            # If it's not a sequence type we handle, assume it's a single sample
-            if num_samples == 0:
-                num_samples = 1
-
-if num_samples == 0:
-    return
+# Determine the number of samples by finding the first sequence-type output
+num_samples = None
+for key in self.keys:
+    if key in outputs:
+        length = self._get_sequence_length(outputs[key])
+        if length is not None:
+            num_samples = length
+            break
+
+# If no sequence found, assume single sample; if no keys in outputs, skip
+if num_samples is None:
+    num_samples = 1 if any(key in outputs for key in self.keys) else 0
+
+if num_samples == 0:
+    return

This makes the intent clearer: find a sequence length, or default to 1 if outputs exist but aren't sequences.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- projects/cifar10/model.py (1 hunks)
- src/lighter/callbacks/csv_writer.py (1 hunks)
- src/lighter/engine/runner.py (4 hunks)
- src/lighter/utils/dynamic_imports.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/lighter/callbacks/csv_writer.py (2)
src/lighter/callbacks/base_writer.py (3)
- `BaseWriter` (13-75), `setup` (37-55), `write` (27-35)

src/lighter/utils/types/enums.py (1)
- `Stage` (14-18)
src/lighter/engine/runner.py (3)
src/lighter/utils/dynamic_imports.py (2)
- `import_module_from_path` (155-200), `get` (44-46)

src/lighter/utils/types/enums.py (1)
- `Stage` (14-18)

tests/unit/test_utils_misc.py (1)
- `method` (48-49)
🪛 GitHub Check: codecov/patch
src/lighter/utils/dynamic_imports.py
[warning] 94-94: src/lighter/utils/dynamic_imports.py#L94
Added line #L94 was not covered by tests
[warning] 137-137: src/lighter/utils/dynamic_imports.py#L137
Added line #L137 was not covered by tests
[warning] 147-147: src/lighter/utils/dynamic_imports.py#L147
Added line #L147 was not covered by tests
src/lighter/callbacks/csv_writer.py
[warning] 51-51: src/lighter/callbacks/csv_writer.py#L51
Added line #L51 was not covered by tests
[warning] 87-87: src/lighter/callbacks/csv_writer.py#L87
Added line #L87 was not covered by tests
[warning] 148-148: src/lighter/callbacks/csv_writer.py#L148
Added line #L148 was not covered by tests
src/lighter/engine/runner.py
[warning] 62-62: src/lighter/engine/runner.py#L62
Added line #L62 was not covered by tests
🔇 Additional comments (14)
projects/cifar10/model.py (4)
1-4: LGTM! Clean import structure and clear module documentation.

13-31: Well-structured training step implementation.

The implementation demonstrates good practices: defensive checks for required components (criterion), optional metrics handling, and dict-based returns for framework compatibility. The logic is clear and maintainable.

33-44: LGTM! Validation step follows the same solid pattern as training_step, with appropriate defensive checks and consistent return structure.

46-55: LGTM! Test step appropriately omits loss computation while maintaining a consistent return structure for predictions and targets.
src/lighter/utils/dynamic_imports.py (2)
59-95: LGTM! The meta path finder correctly handles both package and module resolution. The `_resolve_path` method properly tries the package path first (`__init__.py`) before falling back to a module file.
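The package-first, module-second order can be illustrated with pure path arithmetic (a sketch; the real finder additionally checks which candidate exists on disk):

```python
from pathlib import PurePosixPath


def resolve_candidates(fullname, root_name, root_path):
    """Candidate files for a dotted name under a project root, in the
    order the finder tries them: package __init__.py, then module .py."""
    root = PurePosixPath(root_path)
    if fullname == root_name:
        return [root / "__init__.py"]
    relative = fullname[len(root_name) + 1:].replace(".", "/")
    return [root / relative / "__init__.py", root / f"{relative}.py"]
```

Trying the package candidate first means a directory package always shadows a same-named module file.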
98-110: LGTM! The loader correctly delegates execution to `SourceFileLoader` and registers the module with cloudpickle for by-value serialization, enabling proper multiprocessing support.

src/lighter/engine/runner.py (4)
23-40: LGTM! Clean implementation of marker-based project discovery. The hardcoded `"project"` module name aligns with the PR objective of enabling `project.*` references in YAML configs.

43-62: LGTM! The loader correctly delegates input type detection to Sparkwheel. The error re-wrapping provides clear messaging.
154-171: Verify dataloader method check is sufficient.

The check uses `any()` to see if the model has any dataloader method. A model with only `train_dataloader` would pass this check, but would fail at runtime if the `validate` stage is requested without `val_dataloader`.

This is likely acceptable since PyTorch Lightning will raise a clear error, but consider whether a stage-aware check would provide better early feedback.
188-219: LGTM! Good defensive checks for logger and log_dir existence. The `yaml.dump` options preserve config readability.

src/lighter/callbacks/csv_writer.py (4)
1-18: LGTM! The imports are well-organized and all necessary for the CSV writing functionality with distributed support.

20-46: LGTM! The class initialization is clean, with appropriate type hints and comprehensive documentation including a YAML configuration example.

60-83: LGTM! The helper methods correctly handle various data types (lists, tuples, tensors including scalars) with appropriate conversions to Python types for CSV writing.
128-162: LGTM! The distributed aggregation logic is well-implemented, correctly using `all_gather_object` to collect temporary file paths across ranks, with rank 0 performing the concatenation and cleanup. The fallback for non-distributed execution (lines 139-142) ensures compatibility with single-GPU setups.
src/lighter/engine/runner.py
Outdated
# Execute the stage method with CLI kwargs
# Note: argparse subparsers ensure only valid parameters for each stage are passed
stage_method = getattr(trainer, str(stage))
Misleading comment.
The comment "argparse subparsers ensure only valid parameters for each stage are passed" is inaccurate. The subparsers don't validate parameter compatibility with Trainer methods—they only define which CLI flags are available. The --weights_only flag is currently added to all stages via add_common_args.
Remove or correct this comment after fixing the --weights_only issue above.
🤖 Prompt for AI Agents
In src/lighter/engine/runner.py around lines 239-241, the existing comment
claiming "argparse subparsers ensure only valid parameters for each stage are
passed" is misleading; replace it with a brief, accurate comment stating that
argparse subparsers only control which CLI flags are available and do not
validate compatibility with Trainer method signatures, and then remove the
incorrect assurance. Also, ensure the related --weights_only flag is not
indiscriminately added to all stages (it should be registered only for the
stage(s) that support it rather than via add_common_args) — if currently added
globally, move its registration to the appropriate per-stage subparser or update
add_common_args to accept a per-stage whitelist.
Actionable comments posted: 3
🧹 Nitpick comments (1)
src/lighter/utils/dynamic_imports.py (1)
150-152: Global side effects at import time.

The module installs a meta path finder and patches `multiprocessing.reduction.dump` when imported. While necessary for the functionality and documented in the module docstring, be aware that this affects all subsequent imports and multiprocessing usage in the process.

Consider documenting any potential interactions with other libraries that might also customize import mechanisms or multiprocessing serialization.
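A minimal illustration of the mechanism (not Lighter's actual finder): a meta path finder inserted at position 0 that serves one virtual module from an in-memory source string, then uninstalls itself. The module name and source here are invented for the demo.

```python
import importlib.abc
import importlib.util
import sys

# One virtual module, served from memory instead of disk
_VIRTUAL_SOURCES = {"demo_project": "ANSWER = 41 + 1\n"}


class InMemoryFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    def find_spec(self, fullname, path=None, target=None):
        if fullname in _VIRTUAL_SOURCES:
            return importlib.util.spec_from_loader(fullname, self)
        return None  # defer to the remaining finders

    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        exec(_VIRTUAL_SOURCES[module.__name__], module.__dict__)


finder = InMemoryFinder()
sys.meta_path.insert(0, finder)   # the global side effect noted in the review
try:
    import demo_project
    print(demo_project.ANSWER)    # -> 42
finally:
    sys.meta_path.remove(finder)  # undo the global patch when done
```

Because the finder sits at position 0, it wins over every other finder for the names it claims, which is exactly why such patches deserve prominent documentation.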
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/lighter/utils/dynamic_imports.py (1 hunks)
🧰 Additional context used
🪛 GitHub Check: codecov/patch
src/lighter/utils/dynamic_imports.py
[warning] 94-94: src/lighter/utils/dynamic_imports.py#L94
Added line #L94 was not covered by tests
[warning] 137-137: src/lighter/utils/dynamic_imports.py#L137
Added line #L137 was not covered by tests
[warning] 147-147: src/lighter/utils/dynamic_imports.py#L147
Added line #L147 was not covered by tests
[warning] 199-199: src/lighter/utils/dynamic_imports.py#L199
Added line #L199 was not covered by tests
[warning] 201-202: src/lighter/utils/dynamic_imports.py#L201-L202
Added lines #L201 - L202 were not covered by tests
🔇 Additional comments (1)
src/lighter/utils/dynamic_imports.py (1)
155-173: No action needed: cloudpickle API is stable and correctly versioned.

The `cloudpickle.register_pickle_by_value()` function is a documented, stable API available in cloudpickle 3.0.0 and later. The project's dependency constraint (`cloudpickle>=3.0.0` in pyproject.toml) is appropriate. Both usages at lines 110 and 197 correctly pass module objects, which is the expected argument type. No version-specific compatibility issues exist.
```python
def _resolve_path(self, fullname: str, root_name: str, root_path: Path) -> Path | None:
    """Resolve 'project.models.net' to '/path/to/project/models/net.py'."""
    if fullname == root_name:
        return root_path / "__init__.py"

    relative = fullname[len(root_name) + 1 :].replace(".", "/")

    # Try package first, then module
    package_init = root_path / relative / "__init__.py"
    if package_init.is_file():
        return package_init
    return root_path / f"{relative}.py"
```
Consider adding test coverage for edge cases.
The _resolve_path method has a branch (line 94-95) that returns a .py file path when no __init__.py exists. This path is not covered by tests according to static analysis.
While the logic is correct (the caller checks existence), adding a test case for importing a module (not a package) would improve reliability.
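A test along those lines could look like the sketch below. For self-containment it reimplements the resolution logic shown above as a free function; a real test would instead call the finder's `_resolve_path` method:

```python
import tempfile
from pathlib import Path

# Mirror of the _resolve_path logic above, as a free function for illustration.
def resolve_path(fullname: str, root_name: str, root_path: Path) -> Path:
    if fullname == root_name:
        return root_path / "__init__.py"
    relative = fullname[len(root_name) + 1 :].replace(".", "/")
    package_init = root_path / relative / "__init__.py"
    if package_init.is_file():
        return package_init
    return root_path / f"{relative}.py"

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "models").mkdir()
    (root / "models" / "net.py").write_text("x = 1\n")
    # 'models' has no __init__.py, so resolution must fall through to net.py
    resolved = resolve_path("project.models.net", "project", root)
    assert resolved == root / "models" / "net.py"
```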
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 94-94: src/lighter/utils/dynamic_imports.py#L94
Added line #L94 was not covered by tests
```python
def reducer_override(self, obj: Any) -> Any:
    """Use cloudpickle's reducer for objects not in dispatch_table."""
    # If it's in ForkingPickler's extra reducers, let standard dispatch handle it
    # _extra_reducers is a class attribute that exists at runtime but isn't typed
    extra_reducers = cast(dict[type, Any], getattr(ForkingPickler, "_extra_reducers", {}))
    if type(obj) in extra_reducers:
        return NotImplemented

    # For everything else, try cloudpickle's reducer_override
    # This handles dynamically imported classes registered with register_pickle_by_value
    pickler = cloudpickle.CloudPickler(BytesIO())
    return pickler.reducer_override(obj)
```
Test coverage missing for multiprocessing internals path.
Line 137 returns NotImplemented when an object's type is in ForkingPickler._extra_reducers. This is a critical path that preserves ForkingPickler's special handling for multiprocessing objects (pipes, queues, connections). The lack of test coverage could hide regressions.
Consider adding a test that pickles a multiprocessing object (e.g., a Queue or Pipe) to verify this path works correctly.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 137-137: src/lighter/utils/dynamic_imports.py#L137
Added line #L137 was not covered by tests
🤖 Prompt for AI Agents
In src/lighter/utils/dynamic_imports.py lines 131-142, add a unit test that
exercises the ForkingPickler._extra_reducers branch: create a multiprocessing
object (e.g., multiprocessing.get_context('spawn').Queue() or a Pipe
connection), call reducer_override on that object and assert it returns
NotImplemented, then verify the object can still be pickled using the standard
forking/multiprocessing path (e.g., pickle.dumps with the appropriate
ForkingPickler/context or actually passing the object through a child process)
to ensure the special multiprocessing handling remains intact.
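The child-process part of such a test could be sketched as below. This assumes a POSIX "fork" start method for brevity; a real regression test should also cover "spawn", which pickles the `Queue` via `ForkingPickler` and thus exercises the `reducer_override` fast path directly:

```python
import multiprocessing as mp

def child(q):
    q.put("ok")

# Pass a Queue through a child process to confirm multiprocessing's special
# handling still works (i.e., the custom reducer defers with NotImplemented).
ctx = mp.get_context("fork")  # assumption: POSIX host; use "spawn" to hit pickling
queue = ctx.Queue()
proc = ctx.Process(target=child, args=(queue,))
proc.start()
result = queue.get(timeout=10)
proc.join()
```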
```python
try:
    spec.loader.exec_module(module)
    cloudpickle.register_pickle_by_value(module)
    _registry.register(module_name, module_path)
except Exception:
    # Clean up on failure so retry sees a clean state
    sys.modules.pop(module_name, None)
    raise
```
Registry inconsistency fix confirmed.
The previous concern about registry being left inconsistent if module loading fails has been properly addressed. The _registry.register call now happens after successful exec_module (line 198), and the try-except block cleans up sys.modules on failure (line 201), ensuring consistent state for retries.
However, the error path (lines 199-202) is not covered by tests. Consider adding a test case that triggers a module loading failure (e.g., syntax error in __init__.py) to verify cleanup works correctly.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 199-199: src/lighter/utils/dynamic_imports.py#L199
Added line #L199 was not covered by tests
[warning] 201-202: src/lighter/utils/dynamic_imports.py#L201-L202
Added lines #L201 - L202 were not covered by tests
🤖 Prompt for AI Agents
In src/lighter/utils/dynamic_imports.py around lines 195 to 202, add a unit test
that intentionally triggers a module load failure (for example create a
temporary package directory with a syntax error in __init__.py), attempt to
import it through the dynamic import function used here, and assert that after
the failure sys.modules does not contain the failing module name and that
_registry has not registered the module; ensure the test cleans up temp files
and reverses any environment changes so it deterministically verifies the
cleanup path exercised by the except block.
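A minimal, self-contained version of that cleanup test might look like this. It reproduces the load/cleanup pattern inline rather than calling `import_module_from_path`, so names and structure are illustrative:

```python
import importlib.util
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "badpkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("def broken(:\n")  # deliberate syntax error

    spec = importlib.util.spec_from_file_location("badpkg", pkg / "__init__.py")
    module = importlib.util.module_from_spec(spec)
    sys.modules["badpkg"] = module
    try:
        spec.loader.exec_module(module)  # raises SyntaxError
        failed = False
    except SyntaxError:
        # Mirror the except-block under review: retry must see a clean state
        sys.modules.pop("badpkg", None)
        failed = True

assert failed and "badpkg" not in sys.modules
```

The real test would additionally assert that `_registry` did not register the module.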
…event handle leaks
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/lighter/callbacks/csv_writer.py (1)
55-65: Re-evaluate need for local stage check in `setup`

You still guard `setup` with `if stage != Stage.PREDICT: return` after calling `super().setup(...)`. If `BaseWriter.setup()` already filters by stage, this extra check may be redundant and is currently untested (per coverage). Consider either:
- Relying solely on `BaseWriter.setup()` for stage filtering, or
- Keeping the check but adding a small test that asserts no file is created when `stage != Stage.PREDICT`.

Since this was already raised on a previous revision, feel free to keep it if `BaseWriter.setup()` has no stage logic or you prefer the explicit guard.
🧹 Nitpick comments (1)
src/lighter/callbacks/csv_writer.py (1)
169-176: Good hook coverage for cleanup; consider also removing temp files on exceptions

Calling `_close_file()` from both `on_exception()` and `teardown()` addresses the earlier file-descriptor leak and makes cleanup robust across error paths and normal shutdown. One optional enhancement: if `_temp_path` is set, you could also `unlink()` it in these hooks to avoid leaving stray `.tmp_rank*.csv` files behind when prediction fails mid-epoch.

For example:

```diff
 def on_exception(self, trainer: Trainer, pl_module: LighterModule, exception: BaseException) -> None:
     """Close the file on errors to prevent file handle leaks."""
-    self._close_file()
+    self._close_file()
+    if self._temp_path is not None and self._temp_path.exists():
+        self._temp_path.unlink()
+    self._temp_path = None
```

(The same pattern could be applied in `teardown` for the PREDICT stage.)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/lighter/callbacks/csv_writer.py (1 hunks)
🧰 Additional context used
🪛 GitHub Check: codecov/patch
src/lighter/callbacks/csv_writer.py
[warning] 58-58: src/lighter/callbacks/csv_writer.py#L58
Added line #L58 was not covered by tests
[warning] 94-94: src/lighter/callbacks/csv_writer.py#L94
Added line #L94 was not covered by tests
[warning] 155-156: src/lighter/callbacks/csv_writer.py#L155-L156
Added lines #L155 - L156 were not covered by tests
🔇 Additional comments (2)
src/lighter/callbacks/csv_writer.py (2)
41-54: Centralized file handle cleanup looks good

The `_close_file()` helper plus the `_csv_file`/`_csv_writer` reset is a solid fix for the earlier resource-leak concern and simplifies cleanup logic across hooks. No issues here.
135-168: The callback signature is correct for this repository's PyTorch Lightning version.

The pyproject.toml pins `pytorch-lightning>=2.1.3`, which is major version 2.x. In PyTorch Lightning 2.x, the `Callback.on_predict_epoch_end` hook signature is `(trainer, pl_module)` without an `outputs` parameter. The current implementation matches this signature exactly, so no changes are needed.
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/lighter/callbacks/csv_writer.py (1)
57-58: Redundant stage check (unaddressed from previous review).

The stage filtering at lines 57-58 duplicates the check already performed by `BaseWriter.setup()` (called at line 56). This was flagged in the previous review but remains unaddressed. Apply this diff to remove the redundant check:

```diff
 def setup(self, trainer: Trainer, pl_module: LighterModule, stage: str) -> None:
     super().setup(trainer, pl_module, stage)
-    if stage != Stage.PREDICT:
-        return
-
     # Create a temporary file for writing predictions
     self._temp_path = self.path.with_suffix(f".tmp_rank{trainer.global_rank}{self.path.suffix}")
```
🧹 Nitpick comments (1)
src/lighter/callbacks/csv_writer.py (1)
96-103: Consider validating all keys are present, not just one.

The current validation (lines 96-103) only ensures at least one configured key exists in `outputs`. However, later during record construction (lines 134-135), all keys are required, which means partial key coverage will fail late with a less informative error context.

For clearer error reporting, validate all keys upfront:

```diff
-# Validate that at least one configured key is present in outputs
-present_keys = [key for key in self.keys if key in outputs]
-if not present_keys:
-    missing_keys = self.keys
+# Validate that all configured keys are present in outputs
+missing_keys = [key for key in self.keys if key not in outputs]
+if missing_keys:
     raise KeyError(
-        f"CsvWriter: none of the configured keys {missing_keys} were found in outputs. "
+        f"CsvWriter: the following configured keys were not found in outputs: {missing_keys}. "
         f"Available keys in outputs: {list(outputs.keys())}"
     )
```

This ensures failures happen at validation time rather than mid-iteration.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/lighter/callbacks/csv_writer.py (1 hunks)
🧰 Additional context used
🪛 GitHub Check: codecov/patch
src/lighter/callbacks/csv_writer.py
[warning] 58-58: src/lighter/callbacks/csv_writer.py#L58
Added line #L58 was not covered by tests
[warning] 94-94: src/lighter/callbacks/csv_writer.py#L94
Added line #L94 was not covered by tests
[warning] 161-161: src/lighter/callbacks/csv_writer.py#L161
Added line #L161 was not covered by tests
🔇 Additional comments (2)
src/lighter/callbacks/csv_writer.py (2)
48-53: LGTM: Proper resource cleanup helper.

The `_close_file()` helper correctly implements defensive cleanup with null/closed checks and is properly invoked from `on_exception()`, `teardown()`, and `on_predict_epoch_end()`. This addresses the resource leak concern from the previous review.
141-173: LGTM: Robust distributed aggregation.

The implementation correctly handles both distributed and single-process scenarios:
- Uses `all_gather_object()` to collect temp paths from all ranks
- Rank 0 performs CSV concatenation with pandas and cleans up temp files
- Defensive None checks guard against edge cases
The logic is sound for multi-GPU prediction workflows.
…dling fix(tests): add test for multiprocessing objects in subprocess to verify ForkingPickler behavior
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/lighter/callbacks/csv_writer.py (1)
55-65: Redundant stage check in `setup()` (already handled by `BaseWriter.setup`).

`BaseWriter.setup()` returns early for non-`PREDICT` stages, so the extra `if stage != Stage.PREDICT` here is dead code and hurts coverage. You can rely on the base implementation:

```diff
 def setup(self, trainer: Trainer, pl_module: LighterModule, stage: str) -> None:
-    if stage != Stage.PREDICT:
-        return
     super().setup(trainer, pl_module, stage)

     # Create a temporary file for writing predictions
```

This keeps the logic in one place and removes an untested branch.
🧹 Nitpick comments (3)
src/lighter/callbacks/csv_writer.py (3)
41-54: Ctor/state handling looks solid; consider defensive copy of `keys`.

Implementation is clean and `_close_file()` correctly resets internal state. To avoid surprises if the caller mutates the original list in-place after construction, you could defensively copy:

```diff
-        self.keys = keys
+        self.keys = list(keys)
```

This keeps the writer's configuration immutable from the outside.
67-90: Tighten `write()` validation to fail fast when configured keys are missing.

The per-sample loop still checks `if key not in outputs` and raises `KeyError`, even though the docstring says "These keys must be present in the `outputs` dictionary." You can centralize this validation before any length logic or row construction:

```diff
 def write(self, outputs: dict[str, Any], batch: Any, batch_idx: int, dataloader_idx: int) -> None:
     if self._csv_writer is None:
         return

-    # Validate that at least one configured key is present in outputs
-    present_keys = [key for key in self.keys if key in outputs]
-    if not present_keys:
-        missing_keys = self.keys
-        raise KeyError(
-            f"CsvWriter: none of the configured keys {missing_keys} were found in outputs. "
-            f"Available keys in outputs: {list(outputs.keys())}"
-        )
+    # Validate that all configured keys are present in outputs
+    missing = [key for key in self.keys if key not in outputs]
+    if missing:
+        raise KeyError(
+            f"CsvWriter expected keys {missing} in outputs but they were missing. "
+            f"Available keys in outputs: {list(outputs.keys())}"
+        )

     # Determine the number of samples in the batch.
     num_samples = 0
     ...
-    for key in self.keys:
-        if key not in outputs:
-            raise KeyError(f"CsvWriter expected key '{key}' in outputs but it was missing.")
-
-        value = outputs[key]
+    for key in self.keys:
+        value = outputs[key]
         record.append(self._get_record_value(value, i))
```

This preserves current behavior (all keys required) but makes the failure mode immediate and avoids repeated presence checks inside the hot loop.
Also applies to: 92-140
141-182: Epoch-end aggregation and cleanup path looks correct; consider encoding for CSV opens.

The aggregation logic (close, gather per-rank temp paths, `pandas.concat`, final write, unlink temps, reset state) is coherent and should behave well in both single-process and DDP setups. One minor robustness tweak: explicitly set an encoding when opening the temp file to avoid platform-dependent defaults:

```diff
-        self._csv_file = open(self._temp_path, "w", newline="")
+        self._csv_file = open(self._temp_path, "w", newline="", encoding="utf-8")
```

That makes the whole CSV pipeline (`csv.writer` + `pandas.read_csv`/`to_csv`) encoding-predictable across environments.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/lighter/callbacks/csv_writer.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: ibro45
Repo: project-lighter/lighter PR: 173
File: projects/cifar10/model.py:6-11
Timestamp: 2025-12-07T17:00:14.929Z
Learning: When reviewing code that uses LighterModule as a base class, the subclass does not need to define __init__ or forward methods. LighterModule provides __init__ that accepts network, criterion, optimizer, scheduler, and metrics parameters, and forward() that delegates to self.network. Subclasses only need to implement step methods (training_step, validation_step, test_step, predict_step). The network architecture is defined separately and passed via YAML configuration.
🧬 Code graph analysis (1)
src/lighter/callbacks/csv_writer.py (2)
src/lighter/callbacks/base_writer.py (3)
`BaseWriter` (13-75), `setup` (37-55), `write` (27-35)
src/lighter/utils/types/enums.py (1)
`Stage` (14-18)
…checkpoint loading help text
Actionable comments posted: 2
🧹 Nitpick comments (3)
src/lighter/engine/runner.py (3)
32-36: Consider searching parent directories for the marker file.

The current implementation only checks `Path.cwd() / "__lighter__.py"`. Users must run `lighter` from the exact directory containing the marker. Consider walking up the directory tree (e.g., checking `.`, `..`, `../..` until reaching a filesystem boundary) to allow running `lighter` from subdirectories within a project.

Example:

```diff
 @staticmethod
 def auto_discover_and_import() -> bool:
     """
     Auto-discover project from __lighter__.py marker file.

     Returns True if project was imported, False otherwise.
     """
-    cwd = Path.cwd()
-    marker = cwd / "__lighter__.py"
-
-    if not marker.exists():
-        return False
-
-    import_module_from_path("project", cwd)
-    logger.info(f"Imported 'project' module from '{cwd}'")
-    return True
+    current = Path.cwd()
+    for parent in [current, *current.parents]:
+        marker = parent / "__lighter__.py"
+        if marker.exists():
+            import_module_from_path("project", parent)
+            logger.info(f"Imported 'project' module from '{parent}'")
+            return True
+    return False
```
107-107: Make random seed configurable.
`seed_everything()` is called without arguments, using a random seed. For reproducibility, consider making the seed configurable via config or CLI.

Example:

```diff
-    seed_everything()
+    seed = config.get("seed")
+    if seed is not None:
+        seed_everything(seed)
```

Then users can add to their config:

```yaml
seed: 42
```
361-361: Type inconsistency: `args.command` is a string but `Runner.run` expects `Stage`.

`args.command` from argparse is a string ("fit", "validate", etc.), but `Runner.run()` is typed to accept `stage: Stage`. While this works because `Stage` is a `StrEnum`, explicitly converting improves type safety and makes the intent clear.

```diff
 # Execute command
 try:
-    Runner().run(args.command, args.inputs, **stage_kwargs)
+    Runner().run(Stage(args.command), args.inputs, **stage_kwargs)
 except Exception as e:
     # Suppress exception chain to avoid duplicate tracebacks
     raise e from None
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/lighter/engine/runner.py (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: ibro45
Repo: project-lighter/lighter PR: 173
File: projects/cifar10/model.py:6-11
Timestamp: 2025-12-07T17:00:14.929Z
Learning: When reviewing code that uses LighterModule as a base class, the subclass does not need to define __init__ or forward methods. LighterModule provides __init__ that accepts network, criterion, optimizer, scheduler, and metrics parameters, and forward() that delegates to self.network. Subclasses only need to implement step methods (training_step, validation_step, test_step, predict_step). The network architecture is defined separately and passed via YAML configuration.
🧬 Code graph analysis (1)
src/lighter/engine/runner.py (3)
src/lighter/utils/dynamic_imports.py (2)
`import_module_from_path` (155-204), `get` (44-46)
src/lighter/utils/types/enums.py (1)
`Stage` (14-18)
tests/unit/test_engine_runner.py (2)
`test_cli_commands` (207-223), `test_setup_stage_with_project` (132-143)
Summary
This PR redesigns Lighter's architecture to embrace Lightning-native patterns while maintaining the power of YAML configuration.
Core Changes
New Core Classes:
- `LighterModule`: Optional base class providing automatic optimizer setup and dual logging (step + epoch level)
- `LighterDataModule`: Simple wrapper for organizing dataloaders in YAML

Removed Legacy Components:
- `System` class (orchestrator replaced by modular Runner)
- `adapters.py` (functionality integrated into LighterModule)
- `schema.py` (validation delegated to Sparkwheel)

Simplified Callback System:
- `CSVWriter`, `FileWriter`, `BaseWriter` writer/directory organization

Project Discovery: Marker-Based Auto-Discovery
Old approach (config-based):
New approach (marker-based):
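A hypothetical project layout, sketching how the marker-based flow might look (directory, file, and class names here are illustrative, not taken from the PR):

```
my_project/
├── __lighter__.py      # empty marker file; its presence triggers auto-discovery
├── models/
│   └── net.py          # defines a custom network, e.g. MyNet
└── experiment.yaml     # can then reference project.models.net.MyNet
```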
Projects are now auto-discovered via a `__lighter__.py` marker file. Simply create this file in your project root. When you run `lighter` from a directory containing `__lighter__.py`, the directory is automatically imported as a Python module named `project`. This lets you reference custom modules in YAML configs.

Benefits:
- `project.*` in configs

Runner Refactoring
- `ProjectImporter` and `ConfigLoader` helper classes
- `ProjectImporter.auto_discover_and_import()` handles marker-based discovery
- `inputs` parameter replacing separate config/overrides args

Dual Approach Philosophy
Users can now choose:
Summary by CodeRabbit
New Features
Documentation
Tests
Refactor