
feat(gateway): add GET /apps/{id}/belongs-to discovery endpoint#398

Merged
bburda merged 16 commits into main from feat/issue-196-app-belongs-to
May 14, 2026

Conversation

Collaborator

@bburda bburda commented May 13, 2026

Summary

Implements GET /api/v1/apps/{app-id}/belongs-to. Returns the area that
contains an app, resolved via the app's parent component and that
component's area assignment.

The belongs-to resource was already listed in app capabilities; this PR
wires it up with a real handler, route, capability URI on GET /apps/{id},
unit + integration tests, and the matching REQ_INTEROP_106 spec entry.

Pattern mirrors the existing relationship endpoints (is-located-on,
depends-on): collection wrapper with 0 or 1 element, x-medkit.missing=true
on the item when an area id cannot be resolved.
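A minimal sketch of that wrapper shape, assuming the handler serializes the body by hand; the helper name and parameters here are illustrative, not the gateway's actual API:

```cpp
#include <optional>
#include <sstream>
#include <string>

// Hypothetical sketch of the 0-or-1 element collection wrapper described
// above: a resolved area yields one item, an unresolvable parent component
// yields one item flagged x-medkit.missing=true, and no parent at all
// yields an empty collection.
std::string build_belongs_to_body(const std::optional<std::string>& area_id,
                                  const std::optional<std::string>& dangling_component) {
  std::ostringstream out;
  out << "{\"items\":[";
  if (area_id) {
    out << "{\"id\":\"" << *area_id << "\"}";
  } else if (dangling_component) {
    // Unresolvable parent: emit a placeholder item flagged as missing,
    // carrying the dangling component id for diagnosis.
    out << "{\"x-medkit\":{\"missing\":true,"
        << "\"component_id\":\"" << *dangling_component << "\"}}";
  }
  // No parent component at all -> legitimately empty collection.
  out << "]}";
  return out.str();
}
```

The empty-collection and missing-item cases stay distinguishable to HATEOAS clients without changing the response's outer shape.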


Issue


Type

  • Bug fix
  • New feature or tests
  • Breaking change
  • Documentation only

Testing

  • Unit: 6 new cases in test_discovery_handlers.cpp covering happy path,
    no-component, no-area, missing-area, invalid id, unknown app. Plus
    belongs-to URI assertion in the existing GetAppReturnsLinksAndCapabilities
    and supports_resource("belongs-to") for the APP capability set.
  • Integration: 2 new cases in test_hateoas.test.py (happy path against
    demo manifest, 404 for unknown app).
  • Full local unit suite (2130 tests): green.
  • Linters (clang-format, ament_copyright, cpplint, etc.) + clang-tidy on
    changed files: clean.
  • Sphinx docs build: clean apart from the unrelated auto-generated
    verification.rst path.

Also bumps the TriggerManagerTest.LifetimeExpiry sleep margin in a
separate commit. The test created a 1 s trigger and slept 1500 ms; under
parallel CTest load the 500 ms margin past expiry was occasionally short
and the trigger still reported active. Bumped to 2500 ms.


Checklist

  • Breaking changes are clearly described (and announced in docs / changelog if needed)
  • Tests were added or updated if needed
  • Docs were updated if behavior or public API changed

Copilot AI review requested due to automatic review settings May 13, 2026 12:15
Contributor

Copilot AI left a comment


Pull request overview

Adds a new discovery relationship endpoint GET /api/v1/apps/{app-id}/belongs-to to resolve an App’s parent Area via its host Component, and wires it through routing, docs, and tests in the gateway.

Changes:

  • Implement DiscoveryHandlers::handle_app_belongs_to() and register the new route in RESTServer::setup_routes().
  • Extend discovery/HATEOAS expectations with unit + integration tests, and assert APP capability support for belongs-to.
  • Update REST API docs and requirements/spec traceability for REQ_INTEROP_106; adjust a flaky trigger-manager timing test.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

  • src/ros2_medkit_integration_tests/test/features/test_hateoas.test.py: Adds integration coverage for /apps/{id}/belongs-to (happy path + 404).
  • src/ros2_medkit_gateway/test/test_trigger_manager.cpp: Increases the sleep margin in LifetimeExpiry to reduce CI flakiness.
  • src/ros2_medkit_gateway/test/test_entity_resource_model.cpp: Adds an assertion that APP capabilities include belongs-to.
  • src/ros2_medkit_gateway/test/test_discovery_handlers.cpp: Adds unit coverage for the new belongs-to handler and updates app detail expectations.
  • src/ros2_medkit_gateway/src/http/rest_server.cpp: Registers the /belongs-to route for apps (OpenAPI + handler wiring).
  • src/ros2_medkit_gateway/src/http/handlers/discovery_handlers.cpp: Implements the belongs-to handler and exposes the resource URI from GET /apps/{id}.
  • src/ros2_medkit_gateway/include/ros2_medkit_gateway/core/http/handlers/discovery_handlers.hpp: Declares the new handler entrypoint in DiscoveryHandlers.
  • docs/requirements/specs/discovery.rst: Adds REQ_INTEROP_106 requirement entry for the new endpoint.
  • docs/api/rest.rst: Documents GET /api/v1/apps/{app_id}/belongs-to behavior and example response.

Comment thread src/ros2_medkit_gateway/test/test_trigger_manager.cpp Outdated
Comment thread src/ros2_medkit_gateway/test/test_entity_resource_model.cpp
@bburda bburda self-assigned this May 13, 2026
@bburda bburda requested a review from Copilot May 13, 2026 15:44
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Comment thread src/ros2_medkit_gateway/src/http/handlers/discovery_handlers.cpp
@bburda bburda requested a review from mfaferek93 May 13, 2026 17:51
bburda added a commit that referenced this pull request May 14, 2026
…head

The gateway shutdown sequence (mdns stop, REST server stop, transport
teardown, plugin manager shutdown, member destructors) completes well
under a second normally, but under TSan/ASan overhead on CI runners
it can exceed the launch_ros defaults (5s sigterm_timeout / 5s
sigkill_timeout). When that happens launch_testing escalates
SIGINT -> SIGTERM -> SIGKILL and the TestShutdown.test_exit_codes
assertion reports exit code -9 even though the gateway is mid-shutdown.

The TSan run for PR #398 hit exactly this on
test_scenario_thermal_protection: the gateway logged TriggerTopicSubscriber
shutdown complete and other steps but was still cleaning up when the
SIGKILL timer fired.

Set sigterm_timeout=30 and sigkill_timeout=15 in create_gateway_node so
every integration test that goes through this factory gets the longer
windows. The bigger numbers do not slow down healthy runs since the
process exits as soon as shutdown actually finishes.
Collaborator

@mfaferek93 mfaferek93 left a comment


A few comments :)

Comment thread src/ros2_medkit_gateway/src/http/rest_server.cpp
Comment thread src/ros2_medkit_gateway/src/http/handlers/discovery_handlers.cpp
Comment thread src/ros2_medkit_gateway/src/http/handlers/discovery_handlers.cpp
Comment thread src/ros2_medkit_gateway/src/http/handlers/discovery_handlers.cpp
Comment thread src/ros2_medkit_gateway/test/test_discovery_handlers.cpp
Comment thread src/ros2_medkit_gateway/src/http/rest_server.cpp
Comment thread src/ros2_medkit_gateway/src/http/handlers/discovery_handlers.cpp Outdated
Comment thread src/ros2_medkit_gateway/src/http/handlers/discovery_handlers.cpp Outdated
Comment thread src/ros2_medkit_gateway/test/test_trigger_manager.cpp
bburda added 10 commits May 14, 2026 17:28
Returns the area that contains an app, resolved transitively via the
app's parent component and that component's area assignment. Follows
the existing relationship-endpoint pattern (is-located-on, depends-on):
collection response wrapped in {"items": [...]} with 0 or 1 element.

The "belongs-to" resource was already declared in app capabilities; this
wires it through with a real handler, route registration, capability URI
on the app detail response, unit + integration tests, and REQ_INTEROP_106.

closes #196
Replace the fixed sleep in TriggerManagerTest.LifetimeExpiry with a
bounded polling loop. The test now exits as soon as is_active flips to
false on a fast machine (~1 s) and only sits up to the 5 s deadline if
the scheduler stalls under heavy parallel test load.
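The bounded-polling pattern described above can be sketched as follows; the helper name, 5 s deadline, and 50 ms interval are illustrative defaults, not the test's actual constants:

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Return true as soon as the predicate reports inactive; give up only at
// the deadline. A fast machine exits on the first successful poll instead
// of sitting out a fixed worst-case sleep.
bool wait_until_inactive(const std::function<bool()>& is_active,
                         std::chrono::milliseconds deadline = std::chrono::milliseconds(5000),
                         std::chrono::milliseconds interval = std::chrono::milliseconds(50)) {
  const auto stop_at = std::chrono::steady_clock::now() + deadline;
  while (std::chrono::steady_clock::now() < stop_at) {
    if (!is_active()) return true;  // trigger expired -> done
    std::this_thread::sleep_for(interval);
  }
  return !is_active();  // one final check at the deadline
}
```

Unlike a fixed sleep, the only run that pays the full deadline is one where the scheduler genuinely stalls past it.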

Also split AppSupportsIsLocatedOn into two cases so the name reflects
what each test asserts: AppSupportsIsLocatedOn and AppSupportsBelongsTo.
…lity

handle_get_app already added the top-level "belongs-to" URI when the app
has a parent component but did not also add it to the _links object.
That left the new relationship discoverable from the top-level field but
not from _links, while is-located-on and depends-on appear in both.

Mirror the existing pattern: gate the _links entry on the same
component_id condition and assert it in the unit test.
FaultManagerTest cases that called start_spinning() and then invoked
fault_manager.get_snapshots() or get_rosbag() did not join the spin
thread before letting test-scope locals run their destructors. Under
TSan this produced flaky "data race in operator delete" warnings where
the executor thread cleaning up the rclcpp client's pending request
overlapped with destruction of the result and local fixtures in the
main thread.

Add stop_spinning() right after each get_*() call so the spin thread
is cancelled and joined before the test reads from result or anything
goes out of scope. Mirrors the pattern already used by the notify and
trigger-event tests in this file.
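The join-before-teardown pattern above can be sketched like this; SpinFixture and its method names are illustrative stand-ins for the test fixture, with a plain loop in place of the rclcpp executor:

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// The essential property: stop_spinning() cancels and joins the worker, so
// by the time test-scope locals are destroyed no other thread can still be
// touching them. The destructor is a safety net for early test exits.
class SpinFixture {
 public:
  void start_spinning() {
    running_.store(true);
    worker_ = std::thread([this] {
      while (running_.load()) {
        // stand-in for executor_.spin_some();
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
      }
    });
  }
  void stop_spinning() {
    running_.store(false);
    if (worker_.joinable()) worker_.join();
  }
  bool spinning() const { return running_.load(); }
  ~SpinFixture() { stop_spinning(); }
 private:
  std::atomic<bool> running_{false};
  std::thread worker_;
};
```

Calling stop_spinning() right after the last get_*() call, as the commit does, guarantees the ordering TSan was complaining about.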
The "Upload coverage to Codecov" step runs in an ubuntu:noble container
that does not include gpg. codecov/codecov-action@v5 runs a dependency
check that requires gpg and exits with code 1 when it is missing,
failing the coverage job on every push to main since the step is gated
on github.ref == 'refs/heads/main'. Add gpg to the apt-get install
line so the action can validate uploader signatures.
…head

The gateway shutdown sequence (mdns stop, REST server stop, transport
teardown, plugin manager shutdown, member destructors) completes well
under a second normally, but under TSan/ASan overhead on CI runners
it can exceed the launch_ros defaults (5s sigterm_timeout / 5s
sigkill_timeout). When that happens launch_testing escalates
SIGINT -> SIGTERM -> SIGKILL and the TestShutdown.test_exit_codes
assertion reports exit code -9 even though the gateway is mid-shutdown.

The TSan run for PR #398 hit exactly this on
test_scenario_thermal_protection: the gateway logged TriggerTopicSubscriber
shutdown complete and other steps but was still cleaning up when the
SIGKILL timer fired.

Set sigterm_timeout=30 and sigkill_timeout=15 in create_gateway_node so
every integration test that goes through this factory gets the longer
windows. The bigger numbers do not slow down healthy runs since the
process exits as soon as shutdown actually finishes.
Three gateway plugins held rclcpp::Node or rclcpp::Subscription members
and called shutdown() from a non-noexcept destructor. On Rolling rclcpp
the destructor of those resources throws graph_listener::NodeNotFoundError
when the rclcpp context has already been invalidated by rclcpp::shutdown()
(which is exactly the situation during late teardown). An exception
escaping a destructor invokes std::terminate(), which is what
test_beacon_param TestShutdown.test_exit_codes saw as exit code -6 on
the rolling job.

Plugin_manager::shutdown_all() already catches exceptions from
plugin->shutdown(), but that does not help when the plugin destructor
itself runs the throw (either by calling shutdown() again or by member
auto-destruction once the explicit reset() failed).

Fix in three plugins (parameter_beacon, topic_beacon, graph_provider):

1. Mark the destructor noexcept and wrap the shutdown() call in
   try-catch so nothing escapes the destructor.
2. Wrap the rclcpp resource reset() inside shutdown() in try-catch so
   the first shutdown attempt (driven by plugin_manager) drops the
   resource even when rclcpp throws on tear-down.

Only the destruction sites were touched; the rest of shutdown() and the
normal happy path are unchanged.
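The two-layer guard can be sketched as below; BeaconPlugin and the injected teardown callback are illustrative stand-ins for the plugins and their rclcpp members:

```cpp
#include <functional>
#include <stdexcept>
#include <utility>

// Layer 1: shutdown() catches exceptions from the resource teardown, so the
// first, plugin_manager-driven call still drops the resource even when
// rclcpp throws during late teardown. Layer 2: the destructor is noexcept
// and wraps its own shutdown() call, so nothing can escape a destructor and
// reach std::terminate().
class BeaconPlugin {
 public:
  explicit BeaconPlugin(std::function<void()> teardown)
      : teardown_resource_(std::move(teardown)) {}

  void shutdown() {
    if (shut_down_) return;
    try {
      teardown_resource_();  // may throw once the rclcpp context is invalid
    } catch (...) {
      // The resource is considered dropped either way; log in a real plugin.
    }
    shut_down_ = true;
  }

  ~BeaconPlugin() noexcept {
    try { shutdown(); } catch (...) {}  // nothing may escape a destructor
  }

  bool is_shut_down() const { return shut_down_; }

 private:
  std::function<void()> teardown_resource_;
  bool shut_down_ = false;
};
```

A plugin whose destructor runs without a prior shutdown() call now swallows the throw instead of aborting the process with SIGABRT.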
auth_config.cpp had no entries for apps/*/belongs-to or apps/*/is-located-on
under any of viewer / operator / configurator. The apps/* wildcard in the
catch-all entry is single-segment (auth_requirement_policy maps * to [^/]+),
so it does not cover the relationship subresources. Only admin had a
matching apps/** fallback, which meant every other role hit 403 on these
endpoints.

Add the two new entries for each role next to the existing apps/*/depends-on
entry. is-located-on is pre-existing - this also fixes the same gap that
shipped with REQ_INTEROP_105.
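Why the single-segment wildcard misses the subresources can be demonstrated in isolation; the pattern string here mirrors the described mapping of * to [^/]+ and is illustrative, not the actual auth_config entry:

```cpp
#include <regex>
#include <string>

// apps/* with * mapped to the single-segment pattern [^/]+ matches
// apps/{id} but cannot cross a slash, so apps/{id}/belongs-to falls
// through to whatever fallback entry exists (only admin's apps/**).
bool matches_single_segment_wildcard(const std::string& path) {
  static const std::regex rule("apps/[^/]+");
  return std::regex_match(path, rule);
}
```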
handle_app_belongs_to silently returned items=[] when the app's
component_id pointed to a component the discovery layer could not resolve.
HATEOAS clients then could not tell 'app legitimately has no parent area'
from 'manifest broken, parent component gone'.

Add the same symmetric branch handle_app_is_located_on already has:
emit a single item with x-medkit.missing=true and an x-medkit
extension field carrying the dangling component id, plus a RCLCPP_WARN.
A component that resolves but has no area_id assigned still produces
items=[] - that is a valid manifest configuration, not a broken ref.

Also tighten the chain traversal with an atomic ThreadSafeEntityCache
snapshot so the three lookups (app, component, area) cannot observe
mixed cache generations under a concurrent refresh, and switch the
three app relationship handlers to validate_entity_for_route so remote
apps in an aggregation cluster get forwarded instead of 404.

Mirror unit test AppIsLocatedOnReturnsMissingItemWhenHostComponentUnresolved.
…shot

GET /apps/{id} exposes the available subresources on three parallel
surfaces (top-level URI references, _links, and the capabilities array);
the belongs-to addition only updated two of them. App::to_capabilities,
used by notifications / snapshots / change events, also lacked the link.
Brought all four sites into agreement by adding BELONGS_TO and
IS_LOCATED_ON entries to CapabilityBuilder::Capability (IS_LOCATED_ON
was already missing - boy-scouted along with the new one), pushing them
into the caps vector under the same component_id gate as the existing
URI / _links emission, and adding belongs-to to App::to_capabilities.

SOVD ISO 17978-3 §7.6 defines /belongs-to only for /apps. The previous
EntityCapabilities table advertised belongs-to for SERVER and COMPONENT
too, so supports_resource("belongs-to") returned true on those types and
clients hit 404 when following the advertised resource. Dropped both
entries from resources_; SERVER and COMPONENT keep their other
relationship resources.

Add an atomic ThreadSafeEntityCache::get_app_with_links() that resolves
an App together with its parent Component and Area under a single
shared_lock. Three sequential get_* calls each take a fresh shared_lock,
which lets a writer refresh advance the generation between them and
return a mixed-generation view (app from gen N, component from N+1,
area from N). The handler refactor is in the previous commit; this one
adds the method and its unit test coverage via the handler suite.
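The single-lock snapshot can be sketched as follows; the entity structs, member names, and std::map storage are simplified stand-ins for ThreadSafeEntityCache, not its real layout:

```cpp
#include <map>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>
#include <tuple>

struct App { std::string component_id; };
struct Component { std::string area_id; };
struct Area { std::string name; };

// Resolve app -> parent component -> area under ONE shared_lock, so a
// concurrent writer refresh cannot advance the cache generation between
// the three lookups and hand back a mixed-generation view.
class EntityCache {
 public:
  std::tuple<std::optional<App>, std::optional<Component>, std::optional<Area>>
  get_app_with_links(const std::string& app_id) const {
    std::shared_lock lock(mutex_);  // one lock for the whole chain
    auto a = apps_.find(app_id);
    if (a == apps_.end()) return {};
    std::optional<Component> comp;
    std::optional<Area> area;
    if (auto c = components_.find(a->second.component_id); c != components_.end()) {
      comp = c->second;
      if (auto r = areas_.find(c->second.area_id); r != areas_.end()) area = r->second;
    }
    return {a->second, comp, area};
  }

  void insert(std::string id, App app) {
    std::unique_lock lock(mutex_);
    apps_[std::move(id)] = std::move(app);
  }
  void insert(std::string id, Component c) {
    std::unique_lock lock(mutex_);
    components_[std::move(id)] = std::move(c);
  }
  void insert(std::string id, Area a) {
    std::unique_lock lock(mutex_);
    areas_[std::move(id)] = std::move(a);
  }

 private:
  mutable std::shared_mutex mutex_;
  std::map<std::string, App> apps_;
  std::map<std::string, Component> components_;
  std::map<std::string, Area> areas_;
};
```

Readers still run concurrently with each other; only a writer refresh has to wait for in-flight chain traversals to release the shared lock.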

Document on docs/api/rest.rst that standalone apps deliberately do not
expose the belongs-to URI - SOVD §7.6 says "Each URI reference is only
present if the entity supports that collection". HATEOAS-only clients
must rely on absence of the link.

Switch test_hateoas integration tests for the 200 paths to
self.poll_endpoint(), which adds retry under sanitizer / CI load instead
of the raw requests.get(timeout=10) used before. Update the 404 shape to
the generic 'Entity not found' / parameters.entity_id payload emitted by
validate_entity_for_route (the relationship handlers now go through it
so remote apps in an aggregation cluster get forwarded).
@bburda bburda force-pushed the feat/issue-196-app-belongs-to branch from eb3ae1c to 97c654c Compare May 14, 2026 15:41
bburda added 6 commits May 14, 2026 18:12
The :ref:`sovd-api-discovery` target lives in the separate knowledge/
Sphinx project and is not reachable from ros2_medkit's docs build, so
sphinx-build -W rejected it as an undefined label. Keep the ISO 17978-3
§7.6 citation in prose; consumers can look the section up directly.
… cases

handle_app_depends_on iterates app.depends_on and resolves each entry
via cache.get_app(). Each call takes a fresh shared_lock, which lets a
writer refresh advance the cache generation between the app fetch and
the N per-dependency fetches - a 5-dependency app could return 5
dependencies from 5 different generations.

Add ThreadSafeEntityCache::get_app_with_dependencies() that resolves the
app plus every entry in its depends_on under a single shared_lock, and
switch the handler to use it. Discovery-manager fallback preserves the
old shape (no atomic guarantee, but rare path used when the entity isn't
cached yet).

Seed the demo manifest with four HATEOAS edge-case fixtures
(hateoas-edge-area, hateoas-component-no-area, hateoas-app-standalone,
hateoas-app-on-area-less-component) and add the matching integration
tests so the standalone-app and component-without-area branches of
/belongs-to and /is-located-on are exercised end-to-end, not just in
unit tests.

The 'unresolved component' and 'dangling area' branches are not seeded
in the integration manifest because validators R007 (app -> missing
component) and R006 (component -> missing area) reject the manifest and
trigger a fallback to runtime_only discovery, which disables the rest
of the suite. Both branches remain covered by the unit tests
AppBelongsToReturnsMissingItemWhenParentComponentUnresolved and
AppBelongsToReturnsMissingItemWhenAreaUnresolved.
… count assertions

The new hateoas-edge-area, hateoas-component-no-area, hateoas-app-standalone
and hateoas-app-on-area-less-component manifest entries (added to support
the standalone-app / area-less integration tests in test_hateoas) tripped
the exact-count assertions in test_hybrid_suppression and the
allow-list-style checks in test_discovery_gap_fill. Both suites load the
same demo_nodes_manifest.yaml.

Extend MANIFEST_ALL_AREAS, MANIFEST_TOP_LEVEL_AREAS, MANIFEST_COMPONENTS
and MANIFEST_APPS in test_hybrid_suppression to include the new ids, plus
add MANIFEST_APPS_WITH_BINDING so the test_manifest_apps_are_online check
skips the manifest-only fixtures (no ros_binding means they never reach
is_online=True). Mirror the allow-list updates in test_discovery_gap_fill.
…fixtures

test_scenario_discovery_manifest::test_25_no_topic_components_in_manifest_only
hard-codes the set of expected component IDs and rejects any others, so
the hateoas-component-no-area fixture (added in dfc4dae to support the
new integration tests for the standalone / area-less branches) tripped
it under manifest_only mode. Add the fixture id to the allow-list.
ASan job hit 'ld: final link failed: No space left on device' three
runs in a row on the c220212 CI cycle, deterministically aborting at
the test_plugin_entity_routing / test_plugin_manager link step. The
github-hosted runner gives the container ~14 GB on /, and the
combination of Debug (-O0 -g) plus ASan + UBSan instrumentation
produces enough object-file/debug-info bloat to exhaust that even with
ccache warm.

- Switch -DCMAKE_BUILD_TYPE=Debug to -DCMAKE_BUILD_TYPE=RelWithDebInfo
  for sanitizer-asan, matching what sanitizer-tsan already uses. The
  ROS2MedkitSanitizers cmake module overrides optimisation to -O1 in
  both cases, so this only drops the -O0 expansion and keeps debug
  info. ASan/UBSan stack traces stay readable.
- Extend the Free disk space step with /var/cache/apt, /var/lib/apt/lists,
  /usr/share/doc, /usr/share/man, /usr/share/locale cleanup inside the
  container, plus df -h before and after the build for diagnostics.

No source change - the test/build behaviour is preserved; only the
disk footprint shrinks.
…apt install

The aggressive disk cleanup in the prior commit rm'd /usr/share/doc,
/usr/share/man and /usr/share/locale. dpkg postinstall scripts for
ros-jazzy-desktop, openjdk-21, libvtk9 and libpcl-dev fail when those
trees are missing - the ASan job aborted after 5 min in 'Set up ROS 2
Jazzy' with dpkg returning 100 on every dependency.

Roll those three rm -rf paths back. Keep the RelWithDebInfo build-type
change (which was the actual fix for the ASan disk-space failure) and
the df -h diagnostic.
@mfaferek93 mfaferek93 self-requested a review May 14, 2026 20:49
@bburda bburda merged commit aeac123 into main May 14, 2026
14 checks passed
@bburda bburda deleted the feat/issue-196-app-belongs-to branch May 14, 2026 20:50


Development

Successfully merging this pull request may close these issues.

Implement GET /apps/{app-id}/belongs-to endpoint

3 participants