feat(gateway): add GET /apps/{id}/belongs-to discovery endpoint#398
Merged
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
Adds a new discovery relationship endpoint GET /api/v1/apps/{app-id}/belongs-to to resolve an App’s parent Area via its host Component, and wires it through routing, docs, and tests in the gateway.
Changes:
- Implement
DiscoveryHandlers::handle_app_belongs_to()and register the new route inRESTServer::setup_routes(). - Extend discovery/HATEOAS expectations with unit + integration tests, and assert APP capability support for
belongs-to. - Update REST API docs and requirements/spec traceability for
REQ_INTEROP_106; adjust a flaky trigger-manager timing test.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/ros2_medkit_integration_tests/test/features/test_hateoas.test.py | Adds integration coverage for /apps/{id}/belongs-to (happy path + 404). |
| src/ros2_medkit_gateway/test/test_trigger_manager.cpp | Increases the sleep margin in LifetimeExpiry to reduce CI flakiness. |
| src/ros2_medkit_gateway/test/test_entity_resource_model.cpp | Adds an assertion that APP capabilities include belongs-to. |
| src/ros2_medkit_gateway/test/test_discovery_handlers.cpp | Adds unit coverage for the new belongs-to handler and updates app detail expectations. |
| src/ros2_medkit_gateway/src/http/rest_server.cpp | Registers the /belongs-to route for apps (OpenAPI + handler wiring). |
| src/ros2_medkit_gateway/src/http/handlers/discovery_handlers.cpp | Implements the belongs-to handler and exposes the resource URI from GET /apps/{id}. |
| src/ros2_medkit_gateway/include/ros2_medkit_gateway/core/http/handlers/discovery_handlers.hpp | Declares the new handler entrypoint in DiscoveryHandlers. |
| docs/requirements/specs/discovery.rst | Adds REQ_INTEROP_106 requirement entry for the new endpoint. |
| docs/api/rest.rst | Documents GET /api/v1/apps/{app_id}/belongs-to behavior and example response. |
bburda
added a commit
that referenced
this pull request
May 14, 2026
…head The gateway shutdown sequence (mdns stop, REST server stop, transport teardown, plugin manager shutdown, member destructors) completes well under a second normally, but under TSan/ASan overhead on CI runners it can exceed the launch_ros defaults (5s sigterm_timeout / 5s sigkill_timeout). When that happens launch_testing escalates SIGINT -> SIGTERM -> SIGKILL and the TestShutdown.test_exit_codes assertion reports exit code -9 even though the gateway is mid-shutdown. The TSan run for PR #398 hit exactly this on test_scenario_thermal_protection: the gateway logged TriggerTopicSubscriber shutdown complete and other steps but was still cleaning up when the SIGKILL timer fired. Set sigterm_timeout=30 and sigkill_timeout=15 in create_gateway_node so every integration test that goes through this factory gets the longer windows. The bigger numbers do not slow down healthy runs since the process exits as soon as shutdown actually finishes.
mfaferek93
reviewed
May 14, 2026
Returns the area that contains an app, resolved transitively via the
app's parent component and that component's area assignment. Follows
the existing relationship-endpoint pattern (is-located-on, depends-on):
collection response wrapped in {"items": [...]} with 0 or 1 element.
The "belongs-to" resource was already declared in app capabilities; this
wires it through with a real handler, route registration, capability URI
on the app detail response, unit + integration tests, and REQ_INTEROP_106.
closes #196
Replace the fixed sleep in TriggerManagerTest.LifetimeExpiry with a bounded polling loop. The test now exits as soon as is_active flips to false on a fast machine (~1 s) and only sits up to the 5 s deadline if the scheduler stalls under heavy parallel test load. Also split AppSupportsIsLocatedOn into two cases so the name reflects what each test asserts: AppSupportsIsLocatedOn and AppSupportsBelongsTo.
…lity handle_get_app already added the top-level "belongs-to" URI when the app has a parent component but did not also add it to the _links object. That left the new relationship discoverable from the top-level field but not from _links, while is-located-on and depends-on appear in both. Mirror the existing pattern: gate the _links entry on the same component_id condition and assert it in the unit test.
FaultManagerTest cases that called start_spinning() and then invoked fault_manager.get_snapshots() or get_rosbag() did not join the spin thread before letting test-scope locals run their destructors. Under TSan this produced flaky "data race in operator delete" warnings where the executor thread cleaning up the rclcpp client's pending request overlapped with destruction of the result and local fixtures in the main thread. Add stop_spinning() right after each get_*() call so the spin thread is cancelled and joined before the test reads from result or anything goes out of scope. Mirrors the pattern already used by the notify and trigger-event tests in this file.
The "Upload coverage to Codecov" step runs in an ubuntu:noble container that does not include gpg. codecov/codecov-action@v5 runs a dependency check that requires gpg and exits with code 1 when it is missing, failing the coverage job on every push to main since the step is gated on github.ref == 'refs/heads/main'. Add gpg to the apt-get install line so the action can validate uploader signatures.
…head The gateway shutdown sequence (mdns stop, REST server stop, transport teardown, plugin manager shutdown, member destructors) completes well under a second normally, but under TSan/ASan overhead on CI runners it can exceed the launch_ros defaults (5s sigterm_timeout / 5s sigkill_timeout). When that happens launch_testing escalates SIGINT -> SIGTERM -> SIGKILL and the TestShutdown.test_exit_codes assertion reports exit code -9 even though the gateway is mid-shutdown. The TSan run for PR #398 hit exactly this on test_scenario_thermal_protection: the gateway logged TriggerTopicSubscriber shutdown complete and other steps but was still cleaning up when the SIGKILL timer fired. Set sigterm_timeout=30 and sigkill_timeout=15 in create_gateway_node so every integration test that goes through this factory gets the longer windows. The bigger numbers do not slow down healthy runs since the process exits as soon as shutdown actually finishes.
Three gateway plugins held rclcpp::Node or rclcpp::Subscription members and called shutdown() from a non-noexcept destructor. On Rolling rclcpp the destructor of those resources throws graph_listener::NodeNotFoundError when the rclcpp context has already been invalidated by rclcpp::shutdown() (which is exactly the situation during late teardown). An exception escaping a destructor invokes std::terminate(), which is what test_beacon_param TestShutdown.test_exit_codes saw as exit code -6 on the rolling job. Plugin_manager::shutdown_all() already catches exceptions from plugin->shutdown(), but that does not help when the plugin destructor itself runs the throw (either by calling shutdown() again or by member auto-destruction once the explicit reset() failed). Fix in three plugins (parameter_beacon, topic_beacon, graph_provider): 1. Mark the destructor noexcept and wrap the shutdown() call in try-catch so nothing escapes the destructor. 2. Wrap the rclcpp resource reset() inside shutdown() in try-catch so the first shutdown attempt (driven by plugin_manager) drops the resource even when rclcpp throws on tear-down. Lifted only the destruction sites; the rest of shutdown() and the normal happy path are unchanged.
auth_config.cpp had no entries for apps/*/belongs-to or apps/*/is-located-on under any of viewer / operator / configurator. The apps/* wildcard in the catch-all entry is single-segment (auth_requirement_policy maps * to [^/]+), so it does not cover the relationship subresources. Only admin had a matching apps/** fallback, which meant every other role hit 403 on these endpoints. Add the two new entries for each role next to the existing apps/*/depends-on entry. is-located-on is pre-existing - this also fixes the same gap that shipped with REQ_INTEROP_105.
handle_app_belongs_to silently returned items=[] when the app's component_id pointed to a component the discovery layer could not resolve. HATEOAS clients then could not tell 'app legitimately has no parent area' from 'manifest broken, parent component gone'. Add the same symmetric branch handle_app_is_located_on already has: emit a single item with x-medkit.missing=true and an x-medkit extension field carrying the dangling component id, plus a RCLCPP_WARN. A component that resolves but has no area_id assigned still produces items=[] - that is a valid manifest configuration, not a broken ref. Also tighten the chain traversal with an atomic ThreadSafeEntityCache snapshot so the three lookups (app, component, area) cannot observe mixed cache generations under a concurrent refresh, and switch the three app relationship handlers to validate_entity_for_route so remote apps in an aggregation cluster get forwarded instead of 404. Mirror unit test AppIsLocatedOnReturnsMissingItemWhenHostComponentUnresolved.
…shot
GET /apps/{id} exposes the available subresources on three parallel
surfaces (top-level URI references, _links, and the capabilities array);
the belongs-to addition only updated two of them. App::to_capabilities,
used by notifications / snapshots / change events, also lacked the link.
Brought all four sites into agreement by adding BELONGS_TO and
IS_LOCATED_ON entries to CapabilityBuilder::Capability (IS_LOCATED_ON
was already missing - boy-scouted along with the new one), pushing them
into the caps vector under the same component_id gate as the existing
URI / _links emission, and adding belongs-to to App::to_capabilities.
SOVD ISO 17978-3 §7.6 defines /belongs-to only for /apps. The previous
EntityCapabilities table advertised belongs-to for SERVER and COMPONENT
too, so supports_resource("belongs-to") returned true on those types and
clients hit 404 when following the advertised resource. Dropped both
entries from resources_; SERVER and COMPONENT keep their other
relationship resources.
Add an atomic ThreadSafeEntityCache::get_app_with_links() that resolves
an App together with its parent Component and Area under a single
shared_lock. Three sequential get_* calls each take a fresh shared_lock,
which lets a writer refresh advance the generation between them and
return a mixed-generation view (app from gen N, component from N+1,
area from N). The handler refactor is in the previous commit; this one
adds the method and its unit test coverage via the handler suite.
Document on docs/api/rest.rst that standalone apps deliberately do not
expose the belongs-to URI - SOVD §7.6 says "Each URI reference is only
present if the entity supports that collection". HATEOAS-only clients
must rely on absence of the link.
Switch test_hateoas integration tests for the 200 paths to
self.poll_endpoint(), which adds retry under sanitizer / CI load instead
of the raw requests.get(timeout=10) used before. Update the 404 shape to
the generic 'Entity not found' / parameters.entity_id payload emitted by
validate_entity_for_route (the relationship handlers now go through it
so remote apps in an aggregation cluster get forwarded).
eb3ae1c to
97c654c
Compare
The :ref:`sovd-api-discovery` target lives in the separate knowledge/ Sphinx project and is not reachable from ros2_medkit's docs build, so sphinx-build -W rejected it as an undefined label. Keep the ISO 17978-3 §7.6 citation in prose; consumers can look the section up directly.
… cases handle_app_depends_on iterates app.depends_on and resolves each entry via cache.get_app(). Each call takes a fresh shared_lock, which lets a writer refresh advance the cache generation between the app fetch and the N per-dependency fetches - a 5-dependency app could return 5 dependencies from 5 different generations. Add ThreadSafeEntityCache::get_app_with_dependencies() that resolves the app plus every entry in its depends_on under a single shared_lock, and switch the handler to use it. Discovery-manager fallback preserves the old shape (no atomic guarantee, but rare path used when the entity isn't cached yet). Seed the demo manifest with three HATEOAS edge-case fixtures (hateoas-edge-area, hateoas-component-no-area, hateoas-app-standalone, hateoas-app-on-area-less-component) and add the matching integration tests so the standalone-app and component-without-area branches of /belongs-to and /is-located-on are exercised end-to-end, not just in unit tests. The 'unresolved component' and 'dangling area' branches are not seeded in the integration manifest because validators R007 (app -> missing component) and R006 (component -> missing area) reject the manifest and trigger a fallback to runtime_only discovery, which disables the rest of the suite. Both branches remain covered by the unit tests AppBelongsToReturnsMissingItemWhenParentComponentUnresolved and AppBelongsToReturnsMissingItemWhenAreaUnresolved.
… count assertions The new hateoas-edge-area, hateoas-component-no-area, hateoas-app-standalone and hateoas-app-on-area-less-component manifest entries (added to support the standalone-app / area-less integration tests in test_hateoas) tripped the exact-count assertions in test_hybrid_suppression and the allow-list-style checks in test_discovery_gap_fill. Both suites load the same demo_nodes_manifest.yaml. Extend MANIFEST_ALL_AREAS, MANIFEST_TOP_LEVEL_AREAS, MANIFEST_COMPONENTS and MANIFEST_APPS in test_hybrid_suppression to include the new ids, plus add MANIFEST_APPS_WITH_BINDING so the test_manifest_apps_are_online check skips the manifest-only fixtures (no ros_binding means they never reach is_online=True). Mirror the allow-list updates in test_discovery_gap_fill.
…fixtures test_scenario_discovery_manifest::test_25_no_topic_components_in_manifest_only hard-codes the set of expected component IDs and rejects any others, so the hateoas-component-no-area fixture (added in dfc4dae to support the new integration tests for the standalone / area-less branches) tripped it under manifest_only mode. Add the fixture id to the allow-list.
ASan job hit 'ld: final link failed: No space left on device' three runs in a row on the c220212 CI cycle, deterministically aborting at the test_plugin_entity_routing / test_plugin_manager link step. The github-hosted runner gives the container ~14 GB on /, and the combination of Debug (-O0 -g) plus ASan + UBSan instrumentation produces enough object-file/debug-info bloat to exhaust that even with ccache warm. - Switch -DCMAKE_BUILD_TYPE=Debug to -DCMAKE_BUILD_TYPE=RelWithDebInfo for sanitizer-asan, matching what sanitizer-tsan already uses. The ROS2MedkitSanitizers cmake module overrides optimisation to -O1 in both cases, so this only drops the -O0 expansion and keeps debug info. ASan/UBSan stack traces stay readable. - Extend the Free disk space step with /var/cache/apt, /var/lib/apt/lists, /usr/share/doc, /usr/share/man, /usr/share/locale cleanup inside the container, plus df -h before and after the build for diagnostics. No source change - the test/build behaviour is preserved; only the disk footprint shrinks.
…apt install The aggressive disk cleanup in the prior commit rm'd /usr/share/doc, /usr/share/man and /usr/share/locale. dpkg postinstall scripts for ros-jazzy-desktop, openjdk-21, libvtk9 and libpcl-dev fail when those trees are missing - the ASan job aborted after 5 min in 'Set up ROS 2 Jazzy' with dpkg returning 100 on every dependency. Roll those three rm -rf paths back. Keep the RelWithDebInfo build-type change (which was the actual fix for the ASan disk-space failure) and the df -h diagnostic.
mfaferek93
approved these changes
May 14, 2026
bburda
added a commit
that referenced
this pull request
May 14, 2026
…head The gateway shutdown sequence (mdns stop, REST server stop, transport teardown, plugin manager shutdown, member destructors) completes well under a second normally, but under TSan/ASan overhead on CI runners it can exceed the launch_ros defaults (5s sigterm_timeout / 5s sigkill_timeout). When that happens launch_testing escalates SIGINT -> SIGTERM -> SIGKILL and the TestShutdown.test_exit_codes assertion reports exit code -9 even though the gateway is mid-shutdown. The TSan run for PR #398 hit exactly this on test_scenario_thermal_protection: the gateway logged TriggerTopicSubscriber shutdown complete and other steps but was still cleaning up when the SIGKILL timer fired. Set sigterm_timeout=30 and sigkill_timeout=15 in create_gateway_node so every integration test that goes through this factory gets the longer windows. The bigger numbers do not slow down healthy runs since the process exits as soon as shutdown actually finishes.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Implements
GET /api/v1/apps/{app-id}/belongs-to. Returns the area thatcontains an app, resolved via the app's parent component and that
component's area assignment.
The
belongs-toresource was already listed in app capabilities; this PRwires it up with a real handler, route, capability URI on
GET /apps/{id},unit + integration tests, and the matching
REQ_INTEROP_106spec entry.Pattern mirrors the existing relationship endpoints (
is-located-on,depends-on): collection wrapper with 0 or 1 element,x-medkit.missing=trueon the item when an area id cannot be resolved.
Issue
GET /apps/{app-id}/belongs-toendpoint #196Type
Testing
test_discovery_handlers.cppcovering happy path,no-component, no-area, missing-area, invalid id, unknown app. Plus
belongs-toURI assertion in the existingGetAppReturnsLinksAndCapabilitiesand
supports_resource("belongs-to")for the APP capability set.test_hateoas.test.py(happy path againstdemo manifest, 404 for unknown app).
changed files: clean.
verification.rstpath.Also bumps the
TriggerManagerTest.LifetimeExpirysleep margin in aseparate commit. The test created a 1 s trigger and slept 1500 ms; under
parallel CTest load the 500 ms margin past expiry was occasionally short
and the trigger still reported active. Bumped to 2500 ms.
Checklist