Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multichain-e2e: minimize ci flakes #9815

Closed
0xpatrickdev opened this issue Jul 31, 2024 · 3 comments
Closed

multichain-e2e: minimize ci flakes #9815

0xpatrickdev opened this issue Jul 31, 2024 · 3 comments
Labels
bug Something isn't working enhancement New feature or request flake flakey test

Comments

@0xpatrickdev
Copy link
Member

0xpatrickdev commented Jul 31, 2024

What is the Problem Being Solved?

When performing integration testing with networked applications, making all the sync points explicit and reliable is particularly challenging. Nonetheless, strategies can be employed to limit flakiness. In its current form this issue is intended to be a catch-all for flakes related to multichain-testing and the Multichain E2E CI Job.

The main retry mechanism that's current employed is a rudimentary retryUntilCondition helper that attempts maxRetries on retryIntervalMs.

Description of the Design

Security Considerations

Scaling Considerations

Test Plan

Upgrade Considerations

@0xpatrickdev 0xpatrickdev added the enhancement New feature or request label Jul 31, 2024
@0xpatrickdev
Copy link
Member Author

A recent motivating failure can be found here: https://github.com/Agoric/agoric-sdk/actions/runs/10154140014/job/28078772380?pr=9735#step:12:1574.

The condition we are waiting for is staking rewards to become available after delegating. This failure only happens in one of the scenarios (chains) which suggests we need to increase the retry attempts/interval logic for this particular condition.

0xpatrickdev added a commit that referenced this issue Jul 31, 2024
- effectively, increases the total timeout window from 21 seconds to 40 seconds
- see #9815 (comment) for details
@0xpatrickdev 0xpatrickdev added the bug Something isn't working label Jul 31, 2024
0xpatrickdev added a commit that referenced this issue Jul 31, 2024
- effectively, increases the total timeout window from 21 seconds to 40 seconds
- see #9815 (comment) for details
0xpatrickdev added a commit that referenced this issue Jul 31, 2024
- effectively, increases the total timeout window from 21 seconds to 40 seconds
- see #9815 (comment) for details
mergify bot added a commit that referenced this issue Jul 31, 2024
refs: #9815

## Description

- allow `retryUntilCondition` callers to override default `RetryOptions = { maxRetries?: number; retryIntervalMs?: number; }` parameter
- increase attempts and retry interval for "staking rewards available" condition in `stake-ica.test.ts`.

### Security Considerations
n/a

### Scaling Considerations
n/a

### Documentation Considerations
n/a

### Testing Considerations
The goal of this PR is reduce observed CI flakes.

### Upgrade Considerations
n/a
@turadg turadg added the flake flakey test label Jul 31, 2024
@0xpatrickdev
Copy link
Member Author

0xpatrickdev commented Aug 27, 2024

More recently, we are seeing failures for the makeAccount step (the first part of most flows):

auto-stake-it › auto-stake-it on osmosis
  Rejected promise returned by test. Reason:

  Error {
    message: 'osmosis-makeAccountsInvitation-1724785332378 continuing invitation is in vstorage condition failed after 6 retries.',
  }

  Error: osmosis-makeAccountsInvitation-1724785332378 continuing invitation is in vstorage condition failed after 6 retries.
    at retryUntilCondition (file:///home/runner/work/agoric-sdk/agoric-sdk/agoric-sdk/multichain-testing/tools/sleep.ts:28:11)
    at async exec (file:///home/runner/work/agoric-sdk/agoric-sdk/agoric-sdk/multichain-testing/test/auto-stake-it.test.ts:112:37)



  basic-flows › Create account on osmosis
  Rejected promise returned by test. Reason:

  Error {
    message: 'osmosis-makeAccount-1724785470505 continuing invitation is in vstorage condition failed after 6 retries.',
  }

  Error: osmosis-makeAccount-1724785470505 continuing invitation is in vstorage condition failed after 6 retries.
    at retryUntilCondition (file:///home/runner/work/agoric-sdk/agoric-sdk/agoric-sdk/multichain-testing/tools/sleep.ts:28:11)
    at async exec (file:///home/runner/work/agoric-sdk/agoric-sdk/agoric-sdk/multichain-testing/test/basic-flows.test.ts:50:37)



  stake-ica › send wallet offers to stakeOsmo contract
  Rejected promise returned by test. Reason:

  Error {
    message: 'rewards available on osmosis condition failed after 8 retries.',
  }

  Error: rewards available on osmosis condition failed after 8 retries.
    at retryUntilCondition (file:///home/runner/work/agoric-sdk/agoric-sdk/agoric-sdk/multichain-testing/tools/sleep.ts:28:11)
    at async file:///home/runner/work/agoric-sdk/agoric-sdk/agoric-sdk/multichain-testing/test/stake-ica.test.ts:99:23

Tying these to swingset logs, I see:

2024-08-27T19:13:55.3712126Z 2024-08-27T19:02:13.188Z SwingSet: vat: v32: walletFactory.fromBridge: { blockHeight: 323, blockTime: 1724785332, owner: 'agoric1zdzzfta2t2w5vvsugwgcrpdh4jskxqy8ny7s86', spendAction: '{"body":"#{\\"method\\":\\"executeOffer\\",\\"offer\\":{\\"id\\":\\"osmosis-makeAccountsInvitation-1724785332378\\",\\"invitationSpec\\":{\\"callPipe\\":[[\\"makeAccountsInvitation\\"]],\\"instancePath\\":[\\"autoAutoStakeIt\\"],\\"source\\":\\"agoricContract\\"},\\"offerArgs\\":{\\"chainName\\":\\"osmosis\\",\\"localDenom\\":\\"ibc/ED07A3391A112B175915CD8FAF43A2DA8E4790EDE12566649D0C2F97716B8518\\",\\"validator\\":{\\"chainId\\":\\"osmosislocal\\",\\"encoding\\":\\"bech32\\",\\"value\\":\\"osmovaloper1qjtcxl86z0zua2egcsz4ncff2gzlcndzs93m43\\"}},\\"proposal\\":{}}}","slots":[]}', type: 'WALLET_SPEND_ACTION' }
2024-08-27T19:13:55.3715489Z 2024-08-27T19:02:13.189Z SwingSet: vat: v32: walletFactory: { wallet: Object [Alleged: SmartWallet self] {}, actionCapData: { body: '#{"method":"executeOffer","offer":{"id":"osmosis-makeAccountsInvitation-1724785332378","invitationSpec":{"callPipe":[["makeAccountsInvitation"]],"instancePath":["autoAutoStakeIt"],"source":"agoricContract"},"offerArgs":{"chainName":"osmosis","localDenom":"ibc/ED07A3391A112B175915CD8FAF43A2DA8E4790EDE12566649D0C2F97716B8518","validator":{"chainId":"osmosislocal","encoding":"bech32","value":"osmovaloper1qjtcxl86z0zua2egcsz4ncff2gzlcndzs93m43"}},"proposal":{}}}', slots: [] } }
2024-08-27T19:13:55.3716541Z 2024-08-27T19:02:13.201Z SwingSet: vat: v32: wallet agoric1zdzzfta2t2w5vvsugwgcrpdh4jskxqy8ny7s86 starting executeOffer osmosis-makeAccountsInvitation-1724785332378

2024-08-27T19:13:55.3729232Z 2024-08-27T19:02:13.987Z SwingSet: vat: v32: wallet agoric1zdzzfta2t2w5vvsugwgcrpdh4jskxqy8ny7s86 osmosis-makeAccountsInvitation-1724785332378 seated
2024-08-27T19:13:55.3732838Z 2024-08-27T19:02:14.599Z SwingSet: vat: v32: wallet agoric1zdzzfta2t2w5vvsugwgcrpdh4jskxqy8ny7s86 offerStatus { id: 'osmosis-makeAccountsInvitation-1724785332378', invitationSpec: { callPipe: [ [ 'makeAccountsInvitation' ] ], instancePath: [ 'autoAutoStakeIt' ], source: 'agoricContract' }, offerArgs: { chainName: 'osmosis', localDenom: 'ibc/ED07A3391A112B175915CD8FAF43A2DA8E4790EDE12566649D0C2F97716B8518', validator: { chainId: 'osmosislocal', encoding: 'bech32', value: 'osmovaloper1qjtcxl86z0zua2egcsz4ncff2gzlcndzs93m43' } }, proposal: {}, numWantsSatisfied: 1 }
2024-08-27T19:13:55.3736799Z 2024-08-27T19:02:14.772Z SwingSet: vat: v32: wallet agoric1zdzzfta2t2w5vvsugwgcrpdh4jskxqy8ny7s86 offerStatus { id: 'osmosis-makeAccountsInvitation-1724785332378', invitationSpec: { callPipe: [ [ 'makeAccountsInvitation' ] ], instancePath: [ 'autoAutoStakeIt' ], source: 'agoricContract' }, numWantsSatisfied: 1, offerArgs: { chainName: 'osmosis', localDenom: 'ibc/ED07A3391A112B175915CD8FAF43A2DA8E4790EDE12566649D0C2F97716B8518', validator: { chainId: 'osmosislocal', encoding: 'bech32', value: 'osmovaloper1qjtcxl86z0zua2egcsz4ncff2gzlcndzs93m43' } }, proposal: {}, payouts: {} }

2024-08-27T19:13:55.3859781Z 2024-08-27T19:02:21.404Z SwingSet: vat: v16: IBC fromBridge { blockHeight: 332, blockTime: 1724785340, channelID: 'channel-3', connectionHops: [ 'connection-0' ], counterparty: { channel_id: 'channel-2', port_id: 'icahost' }, counterpartyVersion: '{"version":"ics27-1","controller_connection_id":"connection-0","host_connection_id":"connection-1","address":"osmo18n0szrn4nwfc87w2dqt7uaf2sz3ml65q9rawcsljld5dftqrfycsdgax6p","encoding":"proto3","tx_type":"sdk_multi_msg"}', event: 'channelOpenAck', portID: 'icacontroller-1', type: 'IBC_EVENT' }
2024-08-27T19:13:55.3863242Z 2024-08-27T19:02:21.576Z SwingSet: vat: v19: ----- IcaAccountKit.2  2 ICA Channel Opened for /ibc-port/icacontroller-1/ordered/{"version":"ics27-1","controller_connection_id":"connection-0","host_connection_id":"connection-1","address":"osmo18n0szrn4nwfc87w2dqt7uaf2sz3ml65q9rawcsljld5dftqrfycsdgax6p","encoding":"proto3","tx_type":"sdk_multi_msg"}/ibc-channel/channel-3 at /ibc-hop/connection-0/ibc-port/icahost/ordered/{"version":"ics27-1","controller_connection_id":"connection-0","host_connection_id":"connection-1","address":"osmo18n0szrn4nwfc87w2dqt7uaf2sz3ml65q9rawcsljld5dftqrfycsdgax6p","encoding":"proto3","tx_type":"sdk_multi_msg"}/ibc-channel/channel-2

This seems to indicate we should have an offer result in ~9 seconds. The default timeout for retryWithCondition is 21 seconds (6x 3500ms), so this is surprising 🤔

Additional relevant context - with #9927, the first call to osmosisChain.makeAccount() will result in a network request to establish an ICQ channel. This will add some additional latency to the overall flow.

kriskowal pushed a commit that referenced this issue Aug 27, 2024
- effectively, increases the total timeout window from 21 seconds to 40 seconds
- see #9815 (comment) for details
mergify bot added a commit that referenced this issue Aug 29, 2024
refs: #9815

## Description
Ensures we are passing `controllerConnectionId`, and not `hostConnectionId`, to `provideICQConnection`. _Was previously passing in CI since there's a 50/50 chance controllerConnectionId will equal hostConnectionId._

### Security Considerations
n/a

### Scaling Considerations
n/a

### Documentation Considerations
n/a

### Testing Considerations
Addresses a "flake" that's really a bug.

### Upgrade Considerations
n/a, unreleased code
@0xpatrickdev
Copy link
Member Author

Closing as a dupe of #9934

@0xpatrickdev 0xpatrickdev closed this as not planned Won't fix, can't repro, duplicate, stale Oct 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request flake flakey test
Projects
None yet
Development

No branches or pull requests

2 participants