
Rssa large files #194

Merged
dodu94 merged 7 commits into developing from rssa-large-files
Feb 10, 2026

Conversation


@AlvaroCubi AlvaroCubi commented Feb 10, 2026

Adds functions to scan RSSA files instead of loading them into memory all at once, enabling work with very large files that do not fit into memory.

Summary by CodeRabbit

  • New Features

    • Added file persistence capabilities to save and reload RSSA data and parameters.
    • Introduced efficient track scanning functionality for streaming large datasets.
  • Refactor

    • Restructured RSSA with explicit parameters and tracks attributes for improved usability.
  • Tests

    • Added comprehensive tests for new persistence and track scanning workflows.


coderabbitai bot commented Feb 10, 2026

Walkthrough

This PR refactors RSSA from a path-based class to a dataclass with explicit public fields (parameters, tracks) and introduces IO methods (read_from_file, save_to_files, load_from_saved_files). FileParameters gains JSON serialization support. The rssa_reader is enhanced with streaming via scan_tracks, and parse_header/parse_tracks are refactored to accept file paths and return typed results.

Changes

Cohort / File(s) Summary
RSSA Dataclass Refactoring
src/f4enix/output/rssa/rssa.py
Converted class to dataclass with public fields (parameters: FileParameters, tracks: pl.DataFrame). Added read_from_file(), save_to_files(), and load_from_saved_files() methods for IO. Removed reliance on internal path attribute; get_summary now builds output without file path reference.
FileParameters Serialization
src/f4enix/output/rssa/rssa_helpers.py
Added save_to_json() and load_from_json() methods to FileParameters for JSON persistence. Serialization converts numpy-backed fields to Python basic types and handles nested SurfaceParameters reconstruction.
Reader Enhancement
src/f4enix/output/rssa/rssa_reader.py
Refactored parse_header() and parse_tracks() to accept file paths (Path | str) and return typed results (FileParameters, pl.DataFrame). Added BYTES_PER_TRACK constant and SCHEMA definition. Introduced scan_tracks() for streaming large files via Polars IO plugin with batch reading support.
Test Updates
tests/test_rssa.py
Updated fixture creation to use read_from_file() instead of direct path construction. Added tests for save_to_files/load_from_saved_files round-trip and scan_tracks workflow with header parsing and track streaming.
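
The batched, fixed-record-size reading that scan_tracks builds on can be sketched with the standard library alone. This is an illustrative sketch, not the f4enix implementation: the helper name, the batch size, and the synthetic file are all made up here; only the 96-byte record size comes from the BYTES_PER_TRACK constant described above.

```python
import io

BYTES_PER_TRACK = 96  # fixed record size used by the RSSA reader

def iter_track_batches(file_like, batch_size=1024):
    """Yield byte chunks, each holding up to batch_size complete records."""
    while True:
        chunk = file_like.read(BYTES_PER_TRACK * batch_size)
        if not chunk:
            break
        # Trim trailing bytes that do not form a complete record,
        # mirroring the defensive trim suggested later in the review.
        remainder = len(chunk) % BYTES_PER_TRACK
        if remainder:
            chunk = chunk[:-remainder]
        if chunk:
            yield chunk

# Hypothetical example: 10 full records plus 5 stray trailing bytes,
# streamed in batches of 4 records.
fake_file = io.BytesIO(bytes(BYTES_PER_TRACK * 10 + 5))
batch_sizes = [
    len(b) // BYTES_PER_TRACK
    for b in iter_track_batches(fake_file, batch_size=4)
]
# batch_sizes is [4, 4, 2]
```

Reading a bounded number of records per iteration is what keeps memory usage flat regardless of file size, which is the point of the streaming refactor.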

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • dodu94

Poem

🐰 A dataclass springs forth, with fields so clean and bright,
From path-bound paths we hop to JSON's crystalline light,
Tracks stream in batches, save and load with grace,
The RSSA hops forward to a brighter place! ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage — ⚠️ Warning: docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
  • Title check — ❓ Inconclusive: the title is vague and generic ('large files') and does not clearly convey the main technical changes. Resolution: consider a more descriptive title such as 'Add streaming support for large RSSA files' or 'Refactor RSSA to dataclass with streaming I/O capabilities'.
✅ Passed checks (1 passed)
  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.





codecov bot commented Feb 10, 2026

Codecov Report

❌ Patch coverage is 95.89041% with 3 lines in your changes missing coverage. Please review.

Files with missing lines:
  • src/f4enix/output/rssa/rssa_reader.py — patch 92.30%, 3 lines missing ⚠️

Coverage changes:
  • src/f4enix/output/rssa/rssa.py — 100.00% <100.00%> (ø)
  • src/f4enix/output/rssa/rssa_helpers.py — 100.00% <100.00%> (ø)
  • src/f4enix/output/rssa/rssa_reader.py — 87.35% <92.30%> (+2.74%) ⬆️

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/f4enix/output/rssa/rssa.py (1)

107-108: ⚠️ Potential issue | 🟡 Minor

Minor: trailing comma in summary output.

Line 108 produces " photons: 0, " with a trailing comma and space. This looks like a formatting oversight.

📝 Suggested fix
-        summary += f" photons: {self.photon_tracks.shape[0]}, "
+        summary += f" photons: {self.photon_tracks.shape[0]}.\n"
src/f4enix/output/rssa/rssa_reader.py (1)

200-231: ⚠️ Potential issue | 🟡 Minor

The "b" column transformation loses semantic information for multi-digit photon indicators.

The formula abs(b) / 10^floor(log10(abs(b))) extracts only the leading single digit. Photon tracks documented as having indicator 16 are transformed to 1, not 16. While the binary filtering logic (b == 8 for neutrons, b != 8 for photons) still works correctly, the transformed b column no longer matches the documented particle-type encoding stated at line 37 of rssa.py.

This appears intentional for filtering efficiency, but the docstring is misleading: after transformation, photons become 1, not 16. Either clarify the documentation or preserve the original b value if the full encoded information is needed elsewhere.

Additionally, guard against b == 0 to prevent log10(0) from producing -inf.
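
The transformation under discussion can be reproduced in plain Python to see the collapse from 16 to 1 and why a zero guard is needed. This is an illustrative sketch (the helper name and the return value for b == 0 are choices made here, not part of the reviewed code):

```python
import math

def leading_digit(b: int) -> int:
    """Reduce a track-type indicator to its leading digit, following the
    reviewed formula abs(b) / 10**floor(log10(abs(b)))."""
    if b == 0:
        # Guard: math.log10(0) raises ValueError (and numpy's log10(0)
        # yields -inf). Returning 0 here is an illustrative choice.
        return 0
    magnitude = 10 ** math.floor(math.log10(abs(b)))
    return abs(b) // magnitude

# A photon indicator of 16 collapses to 1, while 8 (neutron) stays 8,
# which is why the b == 8 / b != 8 filtering still works after the
# transformation even though the documented encoding (16) is lost.
```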

🤖 Fix all issues with AI agents
In `@src/f4enix/output/rssa/rssa.py`:
- Around line 49-60: The docstring example for RSSA shows a filename in the
get_summary output but the current get_summary() implementation (in class RSSA)
is path-agnostic and emits "RSSA file was recorded..." without the filename;
update the example in the docstring (or adjust RSSA.get_summary) so they match:
either remove the filename from the example output or change RSSA.get_summary to
include a filename attribute when available (consult RSSA.read_from_file and the
RSSA instance attributes to include the original filename). Ensure the docstring
example and the get_summary() behavior are consistent and reference
RSSA.read_from_file, the RSSA class, and get_summary() for locating the code to
change.
- Line 1: The module docstring is stale and mentions "parsing of D1S-UNED
meshinfo files" while this file handles RSSA parsing; update the top-level
docstring in rssa.py to accurately describe the module's purpose (e.g., parsing
RSSA files, expected inputs/outputs, and any relevant classes or functions like
parse_rssa, RSSAParser or similar symbols present in the file) so it matches the
actual functionality.

In `@tests/test_rssa.py`:
- Around line 71-83: The test_scan_tracks_file currently calls .with_columns()
and .filter() with no arguments which will raise a TypeError; update the test to
either remove these no-arg calls or replace them with concrete Polars
expressions (e.g., .with_columns([...]) or .filter(some_predicate)) so
scanned_tracks remains a valid LazyFrame before .collect(); look for the
test_scan_tracks_file function and modify the chain using scan_tracks,
scanned_tracks, and RSSA to eliminate the empty .with_columns()/.filter() calls.
🧹 Nitpick comments (4)
src/f4enix/output/rssa/rssa_helpers.py (1)

66-81: Consider using @classmethod instead of @staticmethod for the alternate constructor.

load_from_json is a factory that returns a FileParameters instance — the idiomatic Python pattern for this is @classmethod so subclasses work correctly and the intent (alternate constructor) is clearer. Not blocking, but a nice improvement.

♻️ Suggested change
-    @staticmethod
-    def load_from_json(path: Path) -> "FileParameters":
+    @classmethod
+    def load_from_json(cls, path: Path) -> "FileParameters":
         """Loads the file parameters from a JSON file."""
         with open(path) as infile:
             data = json.load(infile)
             data["surfaces"] = [
                 SurfaceParameters(
                     id=surface["id"],
                     info=surface["info"],
                     type=surface["type"],
                     num_parameters=surface["num_parameters"],
                     parameters=surface["parameters"],
                 )
                 for surface in data["surfaces"]
             ]
-            return FileParameters(**data)
+            return cls(**data)
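
The subclassing argument can be shown with a minimal, self-contained sketch. Params and Surface below are hypothetical stand-ins for FileParameters and SurfaceParameters, reduced to two fields each:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Surface:
    id: int
    type: int

@dataclass
class Params:
    np1: int
    surfaces: list

    def save_to_json(self, path) -> None:
        with open(path, "w") as outfile:
            json.dump(asdict(self), outfile)

    @classmethod
    def load_from_json(cls, path) -> "Params":
        with open(path) as infile:
            data = json.load(infile)
        # Rebuild the nested dataclasses, then construct via cls so a
        # subclass calling load_from_json gets an instance of itself —
        # a hard-coded Params(**data) would always return the base class.
        data["surfaces"] = [Surface(**s) for s in data["surfaces"]]
        return cls(**data)

class ExtendedParams(Params):
    pass
```

With the `@staticmethod` version, `ExtendedParams.load_from_json(...)` would silently return a plain `Params`; with `cls(**data)` it returns an `ExtendedParams`.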
src/f4enix/output/rssa/rssa_reader.py (2)

200-210: Reshape at EOF may fail if the file has trailing bytes not aligned to 96.

When number_of_bytes_to_read > 0 and the file has fewer bytes remaining than requested, np.fromfile returns whatever is left. If that remainder isn't a multiple of 96, data.reshape(-1, 96) at line 209 raises a ValueError. For well-formed RSSA files this shouldn't occur, but a brief guard would improve robustness.

🛡️ Suggested defensive trim
     data = np.fromfile(file, BYTE, number_of_bytes_to_read)
 
+    # Trim any trailing bytes that don't form a complete track record
+    remainder = len(data) % 96
+    if remainder != 0:
+        data = data[:-remainder]
+
+    if len(data) == 0:
+        return pl.DataFrame(schema=SCHEMA)
+
     data = data.reshape(-1, 96)

45-51: parse_tracks re-parses the header just to skip it — minor inefficiency.

The header is parsed and discarded. This is fine for correctness, but callers who already have the header (e.g., after calling parse_header) pay the cost twice. Consider accepting an optional pre-parsed offset or combining both into a single call. Not blocking given the header is small.

src/f4enix/output/rssa/rssa.py (1)

65-70: read_from_file opens and parses the file header twice.

parse_header(path) opens the file and reads the header, then parse_tracks(path) opens the file again and re-parses the header just to skip it. For small headers this is negligible, but it's easy to avoid by reading header + tracks in a single open:

♻️ Suggested single-pass read
     @staticmethod
     def read_from_file(path: Path | str) -> "RSSA":
         """Loads the RSSA file from the given path as generated by MCNP."""
-        parameters = parse_header(path)
-        tracks = parse_tracks(path)
+        with open(path, "rb") as infile:
+            from f4enix.output.rssa.rssa_reader import _parse_header_binary, _parse_tracks_binary
+            parameters = _parse_header_binary(infile)
+            tracks = _parse_tracks_binary(infile)
         return RSSA(parameters, tracks)

Alternatively, the reader module could expose a single parse_file function that returns both in one pass.

@@ -1,79 +1,95 @@
"""This module is related to the parsing of D1S-UNED meshinfo files."""

⚠️ Potential issue | 🟡 Minor

Stale module docstring.

The docstring says "parsing of D1S-UNED meshinfo files" but this module handles RSSA files. Please update.

📝 Suggested fix
-"""This module is related to the parsing of D1S-UNED meshinfo files."""
+"""This module is related to the parsing and representation of RSSA files."""

Comment on lines +49 to +60
Examples
--------
>>> from f4enix.output.rssa import RSSA
... my_rssa = RSSA.read_from_file('small_cyl.w')
... print(my_rssa)
RSSA file small_cyl.w was recorded using the following surfaces:
Surface ID: 1, type: 1
The total number of tracks recorded is 72083.
Neutrons: 72083 photons: 0.
The simulation that produced this RSSA run 100000 histories.
The amount of independent histories that reached the RSSA surfaces was 70797.
"""

⚠️ Potential issue | 🟡 Minor

Docstring example output doesn't match get_summary() implementation.

The example on line 54 shows "RSSA file small_cyl.w was recorded..." but get_summary() at line 102 produces "RSSA file was recorded..." (no filename, since the class is now path-agnostic).

📝 Suggested fix for the example output
-    RSSA file small_cyl.w was recorded using the following surfaces:
+    RSSA file was recorded using the following surfaces:

Comment on lines +71 to +83
def test_scan_tracks_file(rssa):
    path = Path(RESOURCES.joinpath("small_cyl.w"))  # type: ignore
    file_parameters = parse_header(path)
    scanned_tracks = scan_tracks(path)
    scanned_tracks = (
        scanned_tracks.head(3)
        .with_columns()  # We can filter by columns
        .filter()  # We can apply any predicate to the LazyFrame
    )
    small_rssa = RSSA(file_parameters, scanned_tracks.collect())

    assert small_rssa.tracks.shape == (3, 11)
    assert rssa.tracks.head(3).equals(small_rssa.tracks)

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

Polars LazyFrame filter method no arguments behavior TypeError

💡 Result:

In Polars, LazyFrame.filter is not a “no-arg”/no-op method.

  • In older versions (e.g., docs for 0.18), the signature is filter(predicate: IntoExpr), so calling lf.filter() raises Python’s TypeError because the required predicate argument is missing. (docs.pola.rs)
  • In current stable/dev docs, the signature is filter(*predicates, **constraints), but it’s still defined as “filter rows … based on a predicate expression”, so calling it with no predicates/constraints is meaningless and you should avoid it. (docs.pola.rs)

What to do instead

  • If you meant “don’t filter”, just don’t call filter:
    • lf = lf if predicate is None else lf.filter(predicate)
  • If you need a deliberate no-op filter, pass a trivial predicate:


🌐 Web query:

Polars LazyFrame with_columns no arguments

💡 Result:

In Python Polars, LazyFrame.with_columns is defined with only optional varargs/kwargs:

LazyFrame.with_columns(*exprs, **named_exprs) -> LazyFrame

So calling it with no arguments is valid and is effectively a no-op (it returns a new LazyFrame representing “add zero columns”). [1]

If you’re conditionally building expressions, the common pattern is:

exprs: list[pl.Expr] = [...]

lf = lf.with_columns(exprs) if exprs else lf

(You can pass a list of expressions directly; Polars accepts that form too.) [1]

Sources
[1] Polars docs: polars.LazyFrame.with_columns signature and usage: (docs.pola.rs)


🌐 Web query:

Polars documentation LazyFrame filter required arguments

💡 Result:

In current Polars Python docs, LazyFrame.filter is defined as:

LazyFrame.filter(
    *predicates: IntoExprColumn | Iterable[IntoExprColumn] | bool | list[bool] | np.ndarray,
    **constraints: Any,
) -> LazyFrame

So there are no strictly required named parameters in the signature; instead you pass:

  • one or more positional predicates (boolean expression(s)), and/or
  • zero or more keyword constraints like colname=value (equivalent to pl.col("colname") == value, AND-ed together). [1][2]

In older docs (e.g., 0.18) the signature was:

LazyFrame.filter(predicate: IntoExpr) -> Self

where the single predicate argument was required. [3]

Sources: Polars LazyFrame.filter (stable/current) [1], (py-polars reference) [2], and version 0.18 docs [3].



Remove the no-arg calls to .with_columns() and .filter().

.with_columns() with no arguments is a valid no-op in Polars, but .filter() without a predicate expression will raise a TypeError. If the intent is to demonstrate these methods are available, use actual expressions or remove the calls entirely.

♻️ Suggested fix
     scanned_tracks = (
         scanned_tracks.head(3)
-        .with_columns()  # We can filter by columns
-        .filter()  # We can apply any predicate to the LazyFrame
     )

@dodu94 dodu94 merged commit 3cfeaa4 into developing Feb 10, 2026
8 checks passed
@dodu94 dodu94 deleted the rssa-large-files branch February 10, 2026 15:07
