LCORE-1276 Add configuration snapshot for telemetry that masks PII #1155
tisnik merged 4 commits into lightspeed-core:main from
Conversation
Walkthrough
Adds a telemetry package that constructs JSON-serializable, PII-masked configuration snapshots from lightspeed-stack and llama-stack sources, plus unit tests and fixtures validating extraction, masking, and serialization.
Changes
Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant Client as Client
    participant BuildAPI as build_configuration_snapshot()
    participant Lightspeed as build_lightspeed_stack_snapshot()
    participant Llama as build_llama_stack_snapshot()
    participant ExtractMask as FieldExtraction&Masking
    participant Serialize as JSONSerialization
    Client->>BuildAPI: invoke with Configuration + optional llama_config_path
    BuildAPI->>Lightspeed: build lightspeed snapshot
    Lightspeed->>ExtractMask: traverse LIGHTSPEED_STACK_FIELDS
    ExtractMask->>ExtractMask: apply PASSTHROUGH / SENSITIVE masking
    ExtractMask->>Serialize: convert to JSON-serializable values
    Serialize-->>Lightspeed: return masked lightspeed snapshot
    BuildAPI->>Llama: build llama snapshot (reads YAML if path provided)
    Llama->>ExtractMask: traverse LLAMA_STACK_FIELDS and lists
    ExtractMask->>ExtractMask: per-item extraction & masking
    ExtractMask->>Serialize: convert nested/list items
    Serialize-->>Llama: return masked llama snapshot
    BuildAPI->>BuildAPI: merge lightspeed + llama snapshots
    BuildAPI-->>Client: return combined masked configuration snapshot
```
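To make the flow above concrete, here is a minimal sketch of how the three functions in the diagram could fit together. The builder bodies and the top-level key names are illustrative stand-ins, not the module's actual schema; the real extraction, masking, and merge logic live in the new telemetry package.

```python
from typing import Any


def build_lightspeed_stack_snapshot(configuration: Any) -> dict[str, Any]:
    # Stand-in: the real builder walks LIGHTSPEED_STACK_FIELDS and masks values.
    return {"service": {"host": "localhost"}}


def build_llama_stack_snapshot(llama_config_path: str) -> dict[str, Any]:
    # Stand-in: the real builder reads and parses the YAML at llama_config_path.
    return {"status": "not_available"}


def build_configuration_snapshot(
    configuration: Any, llama_config_path: str | None = None
) -> dict[str, Any]:
    """Merge the two masked snapshots into one JSON-serializable dict."""
    snapshot = {"lightspeed_stack": build_lightspeed_stack_snapshot(configuration)}
    if llama_config_path is not None:
        snapshot["llama_stack"] = build_llama_stack_snapshot(llama_config_path)
    return snapshot
```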
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 3 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@tests/unit/telemetry/conftest.py`:
- Around line 170-298: The fixture build_fully_populated_config constructs
TLSConfiguration.tls_key_password with PurePosixPath which mismatches the
model's FilePath/Path type; update the TLSConfiguration construction in
build_fully_populated_config to use Path(PII_TLS_PASS) or a plain string for
tls_key_password (instead of PurePosixPath) so the test's types match
TLSConfiguration.tls_key_password and reflect real validation.
🧹 Nitpick comments (5)
src/telemetry/configuration_snapshot.py (3)
300-314: `_set_nested_value` assumes no path is a prefix of another path. If a `FieldSpec` path like `"a.b"` is registered alongside `"a.b.c"`, the first call stores a scalar at `target["a"]["b"]`, and the second call tries to traverse into it as a dict — causing an `AttributeError`/`TypeError` at runtime. The current registries are safe, but this is fragile for future maintainers. Consider adding a guard or an assertion:
🛡️ Defensive check
```diff
     parts = path.split(".")
     current = target
     for part in parts[:-1]:
-        if part not in current:
+        if part not in current or not isinstance(current[part], dict):
             current[part] = {}
         current = current[part]
     current[parts[-1]] = value
```
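For illustration, a minimal reproduction of the hazard, assuming the simple implementation shown in the diff above (before the guard is added); the paths `"a.b"` / `"a.b.c"` are made up:

```python
# Simple implementation from the diff above, without the isinstance guard.
def _set_nested_value(target: dict, path: str, value) -> None:
    parts = path.split(".")
    current = target
    for part in parts[:-1]:
        if part not in current:
            current[part] = {}
        current = current[part]
    current[parts[-1]] = value


snapshot: dict = {}
_set_nested_value(snapshot, "a.b", "configured")  # stores a scalar at snapshot["a"]["b"]
try:
    _set_nested_value(snapshot, "a.b.c", "configured")
except TypeError as exc:
    # 'str' object does not support item assignment
    print(f"prefix collision: {exc}")
```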
260-280: `str(value)` fallback in passthrough could leak unexpected types. The final `else str(value)` branch converts unknown types to their string representation. If a new config type is introduced that embeds PII in `__str__`, it would leak through passthrough. The `SecretStr`/`PurePath` guards are good defense-in-depth, but consider logging a warning when hitting the fallback branch so future regressions are visible.
💡 Optional: add warning on fallback
```diff
     # Safety: mask SecretStr and file paths; stringify everything else
-    return CONFIGURED if isinstance(value, (SecretStr, PurePath)) else str(value)
+    if isinstance(value, (SecretStr, PurePath)):
+        return CONFIGURED
+    logger.debug("Passthrough fallback to str() for type %s", type(value).__name__)
+    return str(value)
```
160-228: Provider loop is clean — verify the provider list stays in sync with llama-stack. The hardcoded provider names (`inference`, `safety`, `vector_io`, …) will silently miss any new provider types added to llama-stack in the future. This is fine given the allowlist design intent, but consider adding a comment noting this list should be updated when llama-stack adds new provider categories.
tests/unit/telemetry/test_configuration_snapshot.py (2)
505-512: Temp file cleanup pattern could be more robust. In `test_invalid_yaml`, `test_missing_providers_section`, and `test_server_fields_masked`, temp files are created with `delete=False` and cleaned up inline. If `build_llama_stack_snapshot` raises an unexpected exception, the cleanup line is skipped. Consider using pytest's `tmp_path` fixture instead, which auto-cleans regardless of test outcome.
💡 Example using tmp_path
```diff
-    def test_invalid_yaml(self) -> None:
+    def test_invalid_yaml(self, tmp_path: Path) -> None:
         """Test that invalid YAML returns not_available status."""
-        with tempfile.NamedTemporaryFile(mode="w", suffix=".yaml", delete=False) as f:
-            f.write(": invalid: yaml: [")
-            path = f.name
-        result = build_llama_stack_snapshot(path)
-        Path(path).unlink()
+        path = tmp_path / "invalid.yaml"
+        path.write_text(": invalid: yaml: [")
+        result = build_llama_stack_snapshot(str(path))
         assert result == {"status": NOT_AVAILABLE}
```
640-647: Allowlist check only validates top-level keys. `test_snapshot_only_contains_allowlisted_fields` checks that no unexpected top-level keys exist, but doesn't verify nested keys. For example, if a bug in `_set_nested_value` introduced extra nested keys, this test wouldn't catch it. The PII leak tests provide a stronger guarantee, so this is mostly a belt-and-suspenders concern.
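A sketch of a stricter check that walks nested keys as well; `iter_key_paths` and the example allowlist below are hypothetical, not part of the PR's test suite:

```python
from typing import Any, Iterator


def iter_key_paths(node: Any, prefix: str = "") -> Iterator[str]:
    """Yield dotted key paths for every nested dict key in a snapshot."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            yield path
            yield from iter_key_paths(value, path)


EXPECTED_PATHS = {"service", "service.host", "service.port"}  # example allowlist
snapshot = {"service": {"host": "localhost", "port": 8080}}
unexpected = set(iter_key_paths(snapshot)) - EXPECTED_PATHS
assert not unexpected, f"Unexpected keys in snapshot: {unexpected}"
```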
🧹 Nitpick comments (3)
src/telemetry/configuration_snapshot.py (2)
262-284: Safety-net fallback to `str()` could silently leak sensitive data for unexpected types. If a new Pydantic field type is introduced that isn't `SecretStr`, `PurePath`, or a primitive, `_serialize_passthrough` will call `str()` on it, potentially exposing sensitive content. The `debug`-level log makes this easy to miss in production. Consider using `warning` level or defaulting to `CONFIGURED` for unknown types to err on the side of safety.
🛡️ Safer fallback proposal
```diff
     # Safety: mask SecretStr and file paths; log and stringify everything else
-    if not isinstance(value, (SecretStr, PurePath)):
-        logger.debug("Passthrough fallback to str() for type %s", type(value).__name__)
-    return CONFIGURED if isinstance(value, (SecretStr, PurePath)) else str(value)
+    logger.warning("Passthrough fallback for unexpected type %s — masking as configured", type(value).__name__)
+    return CONFIGURED
```
335-360: Return type `list[dict[str, Any]] | str` is imprecise — consider a dedicated sentinel or typed union. The function returns either a list or the string literal `"not_configured"`. Downstream consumers must check `isinstance(result, str)` to distinguish. Using `Literal["not_configured"]` in the union would make the contract clearer, though this is a minor API polish point.
tests/unit/telemetry/test_configuration_snapshot.py (1)
699-709: Duplicate detection has O(n²) complexity — minor but easily improved. The `paths.count(p)` inside a list comprehension iterates the full list for each element. For the current registry sizes this is negligible, but a `Counter` or `set` approach would be cleaner.
♻️ Simpler duplicate detection
```diff
     def test_no_duplicate_paths_in_lightspeed_registry(self) -> None:
         """Verify no duplicate paths in lightspeed-stack registry."""
         paths = [spec.path for spec in LIGHTSPEED_STACK_FIELDS]
-        duplicates = [p for p in paths if paths.count(p) > 1]
-        assert not duplicates, f"Duplicate paths: {set(duplicates)}"
+        seen: set[str] = set()
+        duplicates = {p for p in paths if p in seen or seen.add(p)}  # type: ignore[func-returns-value]
+        assert not duplicates, f"Duplicate paths: {duplicates}"
```
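An alternative using `collections.Counter`, as the comment mentions; the `paths` list below is stand-in data rather than the real registry:

```python
from collections import Counter

# In the test, paths would be [spec.path for spec in LIGHTSPEED_STACK_FIELDS].
paths = ["service.host", "service.port", "database.postgres.namespace"]
duplicates = {p for p, count in Counter(paths).items() if count > 1}
assert not duplicates, f"Duplicate paths: {duplicates}"
```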
@are-ces, could you PTAL? A few notes:
🤖 Fix all issues with AI agents
Verify each finding against the current code and only fix it if needed.
In `@src/telemetry/configuration_snapshot.py`:
- Around line 205-225: The current ListFieldSpec generator uses a hardcoded
tuple of provider category names; change it to iterate the actual provider keys
from the parsed config (e.g., use the dict keys from the variable holding
providers in the snapshot builder or Config object) so new llama-stack provider
categories are picked up automatically, while keeping the per-item FieldSpec
masking of "provider_id" and "provider_type" with MaskingType.PASSTHROUGH;
update the expression that creates the ListFieldSpec instances (the generator
producing ListFieldSpec(f"providers.{provider_name}",
item_fields=(FieldSpec("provider_id", ...), FieldSpec("provider_type", ...))))
to use dynamic provider_name values from the config's providers dict instead of
the hardcoded tuple.
- Around line 419-422: The snapshot currently treats db_path as not configured
when backend.get("db_path") is falsy; update the check in the result dict so
db_path is considered CONFIGURED whenever it is present (non-None) by using an
explicit is not None test (e.g., use CONFIGURED if backend.get("db_path") is not
None else NOT_CONFIGURED) to match the mask_value SENSITIVE handling and avoid
treating empty string as not configured; modify the code that builds result (the
dict with keys "type" and "db_path") accordingly.
In `@tests/unit/telemetry/test_configuration_snapshot.py`:
- Around line 32-38: The tests explicitly import helper functions
build_fully_populated_config and build_minimal_config from conftest; convert
those helpers into pytest fixtures in conftest (e.g., def
build_fully_populated_config() -> Config marked with `@pytest.fixture` and
similarly for build_minimal_config) and then update tests to accept them as
function arguments instead of importing them, so test_configuration_snapshot.py
uses dependency-injected fixtures rather than direct imports while keeping
existing helper names (build_fully_populated_config, build_minimal_config) for
minimal test changes.
🧹 Nitpick comments (3)
🤖 Fix all nitpicks with AI agents
src/telemetry/configuration_snapshot.py (2)
205-225: Hardcoded provider category list will silently miss new llama-stack providers. The `NOTE` comment is good, but this will fail silently when llama-stack adds new provider categories — the snapshot simply won't include them. Consider extracting the provider names dynamically from the parsed config's `providers` dict keys so the snapshot stays in sync automatically, while still applying the same safe per-item masking (only `provider_id` and `provider_type`).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/telemetry/configuration_snapshot.py` around lines 205 - 225, The current ListFieldSpec generator uses a hardcoded tuple of provider category names; change it to iterate the actual provider keys from the parsed config (e.g., use the dict keys from the variable holding providers in the snapshot builder or Config object) so new llama-stack provider categories are picked up automatically, while keeping the per-item FieldSpec masking of "provider_id" and "provider_type" with MaskingType.PASSTHROUGH; update the expression that creates the ListFieldSpec instances (the generator producing ListFieldSpec(f"providers.{provider_name}", item_fields=(FieldSpec("provider_id", ...), FieldSpec("provider_type", ...)))) to use dynamic provider_name values from the config's providers dict instead of the hardcoded tuple.
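One possible shape for the dynamic variant, sketched with stand-in types since the real `FieldSpec`/`ListFieldSpec` definitions are not reproduced here:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:  # stand-in for the module's FieldSpec
    path: str
    masking: str = "passthrough"  # stand-in for MaskingType.PASSTHROUGH


@dataclass(frozen=True)
class ListFieldSpec:  # stand-in for the module's ListFieldSpec
    path: str
    item_fields: tuple = ()


def provider_specs(parsed_yaml: dict | None) -> tuple:
    """Build one ListFieldSpec per provider category found in the parsed config."""
    providers = (parsed_yaml or {}).get("providers") or {}
    return tuple(
        ListFieldSpec(
            f"providers.{name}",
            item_fields=(FieldSpec("provider_id"), FieldSpec("provider_type")),
        )
        for name in providers
    )


specs = provider_specs({"providers": {"inference": [], "safety": []}})
```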
419-422: Falsy check on `db_path` treats empty string as not configured. `backend.get("db_path")` evaluates to falsy for `""`, `0`, and `False` — not just `None`. If a config ever sets `db_path` to an empty string (misconfiguration but possible), the snapshot would report `not_configured` rather than `configured`. Consider using an explicit `is not None` check for consistency with how `mask_value` handles SENSITIVE fields (any non-None → `configured`).
Suggested fix
- "db_path": CONFIGURED if backend.get("db_path") else NOT_CONFIGURED, + "db_path": CONFIGURED if backend.get("db_path") is not None else NOT_CONFIGURED,🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/telemetry/configuration_snapshot.py` around lines 419 - 422, The snapshot currently treats db_path as not configured when backend.get("db_path") is falsy; update the check in the result dict so db_path is considered CONFIGURED whenever it is present (non-None) by using an explicit is not None test (e.g., use CONFIGURED if backend.get("db_path") is not None else NOT_CONFIGURED) to match the mask_value SENSITIVE handling and avoid treating empty string as not configured; modify the code that builds result (the dict with keys "type" and "db_path") accordingly.
tests/unit/telemetry/test_configuration_snapshot.py (1)
32-38: Explicit import from conftest module is fine for helper functions, but consider using fixtures instead. Importing `build_fully_populated_config` and `build_minimal_config` as plain functions from conftest works, but defining them as pytest fixtures in conftest would be more idiomatic and align with pytest conventions. This would also allow automatic dependency injection and reduce boilerplate calls throughout the tests.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/telemetry/test_configuration_snapshot.py` around lines 32 - 38, The tests explicitly import helper functions build_fully_populated_config and build_minimal_config from conftest; convert those helpers into pytest fixtures in conftest (e.g., def build_fully_populated_config() -> Config marked with `@pytest.fixture` and similarly for build_minimal_config) and then update tests to accept them as function arguments instead of importing them, so test_configuration_snapshot.py uses dependency-injected fixtures rather than direct imports while keeping existing helper names (build_fully_populated_config, build_minimal_config) for minimal test changes.
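A sketch of the fixture-based layout; `build_minimal_config` below is a toy stand-in for the real conftest helper, which returns a Configuration model:

```python
# conftest.py (sketch)
import pytest


def build_minimal_config() -> dict:
    return {"service": {"host": "localhost", "port": 8080}}  # toy stand-in


@pytest.fixture
def minimal_config() -> dict:
    return build_minimal_config()


# test_configuration_snapshot.py (sketch): the fixture is injected by pytest
# instead of being imported from conftest.
def test_service_port_passthrough(minimal_config: dict) -> None:
    assert minimal_config["service"]["port"] == 8080
```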
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/telemetry/configuration_snapshot.py`:
- Around line 424-425: Namespace is being exposed raw for the metadata store;
treat it as SENSITIVE like database.postgres.namespace in
LIGHTSPEED_STACK_FIELDS and mask it before assigning to result["namespace"].
Locate the metadata branch where store_name == "metadata" (and symbols store,
result, NOT_CONFIGURED) and replace the direct store.get("namespace",
NOT_CONFIGURED) assignment with the same masking utility used elsewhere in this
module for LIGHTSPEED_STACK_FIELDS (i.e., call the existing mask/obfuscation
helper used for sensitive fields and pass the field identifier for namespace),
so the stored namespace is masked consistently.
```python
if store_name == "metadata":
    result["namespace"] = store.get("namespace", NOT_CONFIGURED)
```
Namespace value leaked without masking — inconsistent with lightspeed-stack treatment.
database.postgres.namespace is classified as SENSITIVE in LIGHTSPEED_STACK_FIELDS (Line 146), but the llama-stack metadata store namespace is passed through raw here. Namespace values can contain environment-specific or identifying strings and should be masked consistently.
Proposed fix
```diff
 if store_name == "metadata":
-    result["namespace"] = store.get("namespace", NOT_CONFIGURED)
+    result["namespace"] = CONFIGURED if store.get("namespace") is not None else NOT_CONFIGURED
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/telemetry/configuration_snapshot.py` around lines 424 - 425, Namespace is
being exposed raw for the metadata store; treat it as SENSITIVE like
database.postgres.namespace in LIGHTSPEED_STACK_FIELDS and mask it before
assigning to result["namespace"]. Locate the metadata branch where store_name ==
"metadata" (and symbols store, result, NOT_CONFIGURED) and replace the direct
store.get("namespace", NOT_CONFIGURED) assignment with the same masking utility
used elsewhere in this module for LIGHTSPEED_STACK_FIELDS (i.e., call the
existing mask/obfuscation helper used for sensitive fields and pass the field
identifier for namespace), so the stored namespace is masked consistently.
tisnik
left a comment
LGTM, interesting code, nice one
```python
if isinstance(value, Enum):
    return value.value
if isinstance(value, (list, tuple)):
    return [_serialize_passthrough(v) for v in value]
```
ha, probably the only second recursion in our code :)
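For context, a self-contained sketch of the recursive passthrough serializer the snippet above belongs to, reconstructed from the review discussion rather than copied from the module:

```python
from enum import Enum
from pathlib import PurePath

from pydantic import SecretStr

CONFIGURED = "configured"


def _serialize_passthrough(value):
    """Convert a passthrough value into something JSON-serializable."""
    if value is None or isinstance(value, (str, int, float, bool)):
        return value
    if isinstance(value, Enum):
        return value.value
    if isinstance(value, (list, tuple)):
        return [_serialize_passthrough(v) for v in value]  # the recursion in question
    # Safety: mask SecretStr and file paths; stringify everything else
    return CONFIGURED if isinstance(value, (SecretStr, PurePath)) else str(value)
```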
Description
Add configuration snapshot for telemetry that masks PII. New module builds JSON-serializable snapshots of both lightspeed-stack and llama-stack configurations at startup. It's opt-in, not opt-out, so additions don't get leaked, but they need to be explicitly added.
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Output should show "configured" / "not_configured" for sensitive fields (e.g. API keys), but normal values for non-sensitive (e.g. port numbers).
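For illustration, a masked snapshot could look roughly like the following; the key names and values are a guess based on the fields discussed in review, not the module's exact schema:

```python
snapshot = {
    "service": {"host": "localhost", "port": 8080},         # non-sensitive: real values pass through
    "database": {"postgres": {"namespace": "configured"}},  # sensitive: masked
    "inference": {"api_key": "configured"},                 # sensitive: masked
    "providers": {
        "inference": [{"provider_id": "openai", "provider_type": "remote::openai"}],  # passthrough
    },
}
```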
Summary by CodeRabbit
New Features
Documentation
Tests