@Jackylee2233
ADR 001: Relationship State Machine

Status

Proposed

Context

The current TSP SDK implementation lacks a formal state machine for managing relationship lifecycles. This leads to several issues:

  1. Undefined States: The ReverseUnidirectional status is defined but rarely used, leading to ambiguity when a node receives a relationship request.
  2. Concurrency Issues: If two nodes request a relationship with each other simultaneously, both end up in a Unidirectional state, with no clear resolution path.
  3. No Timeouts: There is no mechanism to handle lost messages or unresponsive peers during the handshake process.
  4. Idempotency: Duplicate control messages are not handled consistently.

Decision

We will implement a formal RelationshipMachine to govern state transitions.

1. State Machine Definition

The state machine will transition based on RelationshipEvents.

Current State         | Event          | New State             | Action/Notes
Unrelated             | SendRequest    | Unidirectional        | Store thread_id
Unrelated             | ReceiveRequest | ReverseUnidirectional | Store thread_id
Unidirectional        | ReceiveAccept  | Bidirectional         | Verify thread_id matches.
ReverseUnidirectional | SendAccept     | Bidirectional         | Verify thread_id matches.
Bidirectional         | SendCancel     | Unrelated             |
Bidirectional         | ReceiveCancel  | Unrelated             |
Unidirectional        | SendRequest    | Unidirectional        | Idempotent (retransmission)
Unidirectional        | ReceiveRequest | Conflict Resolution   | See Concurrency Handling
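
The transitions listed above could be expressed as a single pure function. This is a minimal sketch, not the SDK's actual code: the names State, Event, Outcome, and step are hypothetical, and the thread_id storage and verification mentioned in the notes column are left out to keep the example short.

```rust
/// Relationship states named in the table (hypothetical type, not the SDK's).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Unrelated,
    Unidirectional,
    ReverseUnidirectional,
    Bidirectional,
}

/// Events named in the table (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    SendRequest,
    ReceiveRequest,
    ReceiveAccept,
    SendAccept,
    SendCancel,
    ReceiveCancel,
}

/// Result of feeding one event to the machine.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    /// Move to (or stay in) the given state.
    Transition(State),
    /// Both sides sent a request; defer to the concurrency rule.
    ResolveConflict,
    /// The table has no row for this (state, event) pair.
    Invalid,
}

/// Look up the table row for `(state, event)`.
pub fn step(state: State, event: Event) -> Outcome {
    use Event::*;
    use State::*;
    match (state, event) {
        (Unrelated, SendRequest) => Outcome::Transition(Unidirectional),
        (Unrelated, ReceiveRequest) => Outcome::Transition(ReverseUnidirectional),
        (Unidirectional, ReceiveAccept) => Outcome::Transition(Bidirectional),
        (ReverseUnidirectional, SendAccept) => Outcome::Transition(Bidirectional),
        (Bidirectional, SendCancel | ReceiveCancel) => Outcome::Transition(Unrelated),
        // Retransmitting a request leaves the state unchanged.
        (Unidirectional, SendRequest) => Outcome::Transition(Unidirectional),
        (Unidirectional, ReceiveRequest) => Outcome::ResolveConflict,
        _ => Outcome::Invalid,
    }
}
```

Treating every pair that is not in the table as Invalid is an assumption of this sketch; the ADR does not say how unlisted pairs should be handled.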

2. Concurrency Handling

When a node in Unidirectional state (sent a request) receives a RequestRelationship from the target (meaning they also sent a request):

  • Compare thread_ids: The request with the lower thread_id (lexicographically) wins.
  • If my thread_id < their thread_id: I ignore their request (or reject it). I expect them to accept my request.
  • If my thread_id > their thread_id: I accept their request. I cancel my pending request state and transition to ReverseUnidirectional (effectively accepting their flow).
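
The tie-break above can be sketched as a byte-wise lexicographic comparison. The names Resolution and resolve_collision are hypothetical, and treating thread_ids as byte slices is an assumption. The ADR does not cover the case of equal thread_ids, so the SameThread variant is also an assumption of this sketch.

```rust
use std::cmp::Ordering;

/// What the local node does when its own request collides with the peer's.
#[derive(Debug, PartialEq, Eq)]
pub enum Resolution {
    /// My thread_id is lower: keep my request and wait for the peer to accept it.
    KeepMine,
    /// Their thread_id is lower: drop my pending request and move to
    /// ReverseUnidirectional.
    YieldToTheirs,
    /// Identical thread_ids (not covered by the ADR; assumed here).
    SameThread,
}

/// Compare the two thread_ids lexicographically; the lower one wins.
/// Slice comparison in Rust is already lexicographic, element by element.
pub fn resolve_collision(my_thread_id: &[u8], their_thread_id: &[u8]) -> Resolution {
    match my_thread_id.cmp(their_thread_id) {
        Ordering::Less => Resolution::KeepMine,
        Ordering::Greater => Resolution::YieldToTheirs,
        Ordering::Equal => Resolution::SameThread,
    }
}
```

Because both nodes run the same comparison on the same pair of values, they reach complementary decisions without exchanging any further messages.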

3. Timeout & Retry

  • Timeout: A request_timeout field will be added to VidContext. If a Unidirectional state persists beyond the timeout (e.g., 60s), it transitions back to Unrelated.
  • Retry: Before timing out, the system may attempt retransmissions.
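
One possible shape for the timeout and retry check is sketched below. Everything except the request_timeout name is hypothetical: the PendingTimer struct, the retry_interval field, and the poll method are illustration only and do not reflect the fields of VidContext.

```rust
use std::time::{Duration, Instant};

/// What the caller should do with a pending request at a given moment.
#[derive(Debug, PartialEq, Eq)]
pub enum TimerAction {
    /// Nothing to do yet.
    Wait,
    /// Resend the request; the overall timeout has not elapsed.
    Retransmit,
    /// The overall timeout elapsed: fall back to Unrelated.
    Expire,
}

/// Timing data for one pending request (hypothetical layout).
pub struct PendingTimer {
    /// When the request was first sent.
    pub first_sent: Instant,
    /// When the request was most recently sent or resent.
    pub last_sent: Instant,
    /// Total time allowed before giving up (e.g. 60s).
    pub request_timeout: Duration,
    /// Minimum gap between retransmissions (assumed parameter).
    pub retry_interval: Duration,
}

impl PendingTimer {
    /// Decide what to do at time `now`. Expiry takes priority over retry.
    pub fn poll(&self, now: Instant) -> TimerAction {
        if now.saturating_duration_since(self.first_sent) >= self.request_timeout {
            TimerAction::Expire
        } else if now.saturating_duration_since(self.last_sent) >= self.retry_interval {
            TimerAction::Retransmit
        } else {
            TimerAction::Wait
        }
    }
}
```

Passing `now` as a parameter instead of reading the clock inside poll keeps the check deterministic and easy to test.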

4. Idempotency

  • Duplicate Request: If a node in ReverseUnidirectional or Bidirectional receives the same RequestRelationship (same thread_id), it ignores the message or resends its previous response.
  • Duplicate Accept: If a node in Bidirectional receives an AcceptRelationship with the same thread_id, it ignores the message.
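
The two duplicate rules above could be checked with one predicate. This is a sketch under assumptions: State, MessageKind, and is_duplicate are hypothetical names, and the stored thread_id is modeled as an optional byte slice.

```rust
/// Relationship states (hypothetical type, not the SDK's).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Unrelated,
    Unidirectional,
    ReverseUnidirectional,
    Bidirectional,
}

/// The two control messages the duplicate rules mention.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MessageKind {
    Request,
    Accept,
}

/// Returns true when an incoming control message repeats one that was
/// already processed, so it can be ignored (or answered again).
pub fn is_duplicate(
    state: State,
    stored_thread_id: Option<&[u8]>,
    kind: MessageKind,
    incoming_thread_id: &[u8],
) -> bool {
    // A duplicate must carry the thread_id already on record.
    let same_thread = stored_thread_id == Some(incoming_thread_id);
    match (kind, state) {
        // Duplicate Request: already received it, or already accepted it.
        (MessageKind::Request, State::ReverseUnidirectional | State::Bidirectional) => same_thread,
        // Duplicate Accept: the relationship is already bidirectional.
        (MessageKind::Accept, State::Bidirectional) => same_thread,
        _ => false,
    }
}
```

A message with a different thread_id is never classified as a duplicate here; how such a message should be handled is outside this sketch.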

Consequences

  • Robustness: Relationship establishment will be reliable under network jitter and concurrency.
  • Complexity: The store.rs logic will become more complex.
  • Breaking Changes: Existing tests that manually manipulate state might fail and need updating to respect the state machine.

pohlm01 and others added 6 commits December 11, 2025 22:23
Signed-off-by: Jackylee2233 <[email protected]>
v0.3.5 is a direct dependency of didwebvh-rs v0.1.9
v0.4.0 is a dependency of affinidi-data-integrity v0.2.4, which is in turn a dependency of didwebvh-rs v0.1.9

The project compiles and all tests pass:

cargo test
   Compiling serde_json v1.0.145
   Compiling sqlx-core v0.8.6
   Compiling serde_with_macros v3.16.0
   Compiling quinn v0.11.9
   Compiling serde_with v3.16.0
   Compiling affinidi-secrets-resolver v0.4.0
   Compiling serde_json_canonicalizer v0.3.1
   Compiling reqwest v0.12.24
   Compiling affinidi-data-integrity v0.2.4
   Compiling sqlx-sqlite v0.8.6
   Compiling didwebvh-rs v0.1.10
   Compiling sqlx v0.8.6
   Compiling askar-storage v0.2.4
   Compiling aries-askar v0.4.6
   Compiling tsp_sdk v0.9.0-alpha2 (/home/qaoo8/Jobs/tsp/tsp_sdk)
    Finished `test` profile [unoptimized + debuginfo] target(s) in 11.23s
     Running unittests src/lib.rs (/home/qaoo8/Jobs/tsp/target/debug/deps/tsp_sdk-acbb1369351abb06)

running 56 tests
test cesr::packet::test::test_message_to_parts ... ignored
test cesr::packet::test::envelope_without_confidential_data ... ok
test cesr::packet::test::envelope_without_nonconfidential_data ... ok
test cesr::packet::test::envelope_with_nonconfidential_data ... ok
test cesr::packet::test::envelope_failure ... ok
test cesr::packet::test::s_envelope_with_confidential_data_failure ... ok
test cesr::packet::test::test_nested_msg ... ok
test cesr::packet::test::test_decode_send_recv ... ok
test cesr::packet::test::test_3p_refer_rel ... ok
test cesr::packet::test::test_par_refer_rel ... ok
test cesr::packet::test::mut_envelope_with_nonconfidential_data ... ok
test cesr::packet::test::trailing_data ... ok
test cesr::packet::test::test_relation_forming ... ok
test cesr::packet::test::test_routed_msg ... ok
test cesr::test::demo_example ... ok
test cesr::test::decode_and_encode ... ok
test cesr::test::dont_gen_overlong_encoding ... ok
test cesr::test::encode_and_decode ... ok
test cesr::test::identifier_failure_3 - should panic ... ok
test cesr::test::identifier_failure_1 - should panic ... ok
test cesr::test::identifier_failure_variable - should panic ... ok
test cesr::test::identifier_failure_2 - should panic ... ok
test cesr::test::index_failure - should panic ... ok
test cesr::test::long_variable_data ... ok
test cesr::test::test_primitives ... ok
test store::test::test_add_verified_vid ... ok
test store::test::test_add_private_vid ... ok
test store::test::test_remove ... ok
test crypto::tests::seal_open_message ... ok
test store::test::test_make_relationship_request ... ok
test store::test::test_open_seal ... ok
test store::test::test_make_new_identity ... ok
test store::test::test_make_referral ... ok
test store::test::test_make_relationship_accept ... ok
test store::test::test_nested_manual ... ok
test secure_storage::test::test_vault ... ok
test vid::deserialize::test::deserialize ... ok
test store::test::test_make_relationship_cancel ... ok
test vid::did::web::tests::test_resolve_document ... ok
test vid::did::peer::test::encode_decode ... ok
test vid::did::web::tests::test_resolve_url ... ok
test vid::did::webvh::tests::test_create_webvh_success ... ok
test store::test::test_routed ... ok
test store::test::test_nested_automatic_setup ... ok
test transport::tls::tests::test_tls_transport ... ok
test transport::quic::tests::test_quic_transport ... ok
test test::test_large_messages ... ok
test test::test_anycast ... ok
test test::test_direct_mode ... ok
test test::test_nested_mode ... ok
test test::test_relation_forming ... ok
test test::test_routed_mode ... ok
test test::test_unverified_receiver_in_direct_mode ... ok
test transport::tcp::test::test_tcp_transport ... ok
test cesr::test::too_long_data_failure - should panic ... ok
test test::attack_failures ... ok

test result: ok. 55 passed; 0 failed; 1 ignored; 0 measured; 0 filtered out; finished in 1.52s

   Doc-tests tsp_sdk

running 4 tests
test tsp_sdk/src/lib.rs - (line 29) ... ok
test tsp_sdk/src/async_store.rs - async_store::AsyncSecureStore (line 22) ... ok
test tsp_sdk/src/async_store.rs - async_store::AsyncSecureStore::send_relationship_request (line 282) ... ok
test tsp_sdk/src/async_store.rs - async_store::AsyncSecureStore::send (line 213) ... ok

test result: ok. 4 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 1.22s

all doctests ran in 2.49s; merged doctests compilation took 1.21s

Signed-off-by: Jackylee2233 <[email protected]>
Signed-off-by: Jackylee2233 <[email protected]>
… test wallet files, and add a `dead_code` property to the `PendingRequest` struct / Increase the CLI test timeout to improve stability, clean up old test wallet files, and add the `dead_code` attribute to the `PendingRequest` struct.

Signed-off-by: Jackylee2233 <[email protected]>
@Jackylee2233 Jackylee2233 marked this pull request as draft December 20, 2025 12:17
@Jackylee2233 Jackylee2233 marked this pull request as ready for review December 20, 2025 12:18
@Jackylee2233 Jackylee2233 marked this pull request as draft December 20, 2025 12:18
@wenjing
Contributor

wenjing commented Dec 23, 2025

Thanks @Jackylee2233 for the PR. Here is my preliminary feedback on each of the issues. Let's assume the local endpoint is Alice and the remote endpoint is Bob.

(1) the state "ReverseUnidirectional".
This state means Alice received a relationship request from Bob and the request is valid. The SDK passes this up to the application layer to decide what to do next. The application layer may decide to accept, thereby creating a bidirectional relationship, but it can also stay silent for a long time (while it decides) or forever. It is not an invalid state to be in. (Note: relationships are NOT connections.) The SDK does not provide a convenient function for the application to transition to this state; that can be improved. The SDK does, however, provide a generic state-setting function that the application can use. Another problem is that the test code does not exercise this path, which is also worth improving. Bottom line: this state is valid, but the two improvements are good to have.

(2) Concurrency when two relationship requests collide <A,B> and <B,A>, if both use the same VID pair exactly.
This is a case where the current code deviates from the spec.
The official TSP Specification handles this in Section 7.1.3: Race Condition of TSP_RFI:
"The rule is that the TSP_RFI with the lower value of Digest using lexicographical comparison. Both endpoints will keep the TSP_RFI with lower Digest and discard the other."
So fixing this is important.

(3) Timeout from Unidirectional to Unrelated
If I'm reading your writeup correctly, the timeout automatically moves the relationship from the Unidirectional state to the Unrelated state. If so, that is a misunderstanding. A Unidirectional state is not necessarily temporary; it can be long-term or even permanent, so such a timeout is not correct. If the application layer wants such a timeout, it can of course implement one, but the SDK cannot know what the application wants and should not implement it. By the way, a relationship is just a local metadata entry; it does NOT consume connection resources, assuming HTTP or TCP has a timeout to tear down unused connections. That is a transport-layer issue we could double-check.

(4) Idempotency in handling TSP relationship messages
It seems to me the only potential issue is when the local endpoint receives AcceptRelationship while it is already in the Bidirectional state. If so, it is not a big issue. The current code returns an error code to the application, which is OK; it is up to the application to handle it. The SDK does not do anything else. We could drop that error code, downgrade it to an alert, or silently drop the message. Alerting the application to duplicate accepts does not seem too bad, unless we see more serious issues.

Final note: Could you please rewrite the commit messages in English to be consistent with the rest of the project? AI translations are perfectly OK.
