feat_: hash based query for outgoing messages. #5217

Merged
kaichaosun merged 19 commits into develop from message-hash-query on Jun 11, 2024

Conversation

@kaichaosun
Contributor

@kaichaosun kaichaosun commented May 23, 2024

For outgoing messages, only mark a message as sent after it has been successfully found in the store node; messages that are not found are marked as expired and resent.

Important changes:

  • Find outgoing messages that were sent 5s ago and query the store node with the hashes of those messages.
  • Messages that are found trigger EventEnvelopeSent; messages that are missing trigger EventEnvelopeExpired.

Relates to #5234

@status-im-auto
Member

status-im-auto commented May 23, 2024

Jenkins Builds

Click to see older builds (106)
Commit #️⃣ Finished (UTC) Duration Platform Result
✖️ a179d51 #1 2024-05-23 10:47:00 ~1 min tests 📄log
✔️ a179d51 #1 2024-05-23 10:50:00 ~4 min linux 📦zip
✔️ a179d51 #1 2024-05-23 10:50:53 ~4 min ios 📦zip
✔️ a179d51 #1 2024-05-23 10:51:54 ~6 min android 📦aar
✖️ 27167d2 #2 2024-05-23 11:30:32 ~1 min tests 📄log
✔️ 27167d2 #2 2024-05-23 11:31:59 ~2 min android 📦aar
✔️ 27167d2 #2 2024-05-23 11:32:47 ~3 min ios 📦zip
✔️ 27167d2 #2 2024-05-23 11:33:31 ~4 min linux 📦zip
✖️ e719178 #3 2024-05-24 00:50:17 ~1 min tests 📄log
✔️ e719178 #3 2024-05-24 00:52:51 ~3 min ios 📦zip
✔️ e719178 #3 2024-05-24 00:53:11 ~4 min linux 📦zip
✔️ e719178 #3 2024-05-24 00:55:07 ~5 min android 📦aar
✖️ 1d2a81a #4 2024-05-24 03:33:26 ~1 min tests 📄log
✔️ 1d2a81a #4 2024-05-24 03:34:12 ~1 min android 📦aar
✔️ 1d2a81a #4 2024-05-24 03:34:40 ~2 min linux 📦zip
✔️ 1d2a81a #4 2024-05-24 03:35:48 ~3 min ios 📦zip
✖️ 79399ce #5 2024-05-24 04:20:05 ~1 min tests 📄log
✖️ 79399ce #6 2024-05-24 04:22:37 ~1 min tests 📄log
✔️ 79399ce #5 2024-05-24 04:21:03 ~2 min linux 📦zip
✔️ 79399ce #5 2024-05-24 04:21:16 ~2 min android 📦aar
✔️ 79399ce #5 2024-05-24 04:21:45 ~3 min ios 📦zip
✔️ 6b56b45 #6 2024-05-24 04:34:39 ~2 min android 📦aar
✔️ 6b56b45 #6 2024-05-24 04:34:47 ~2 min linux 📦zip
✔️ 6b56b45 #6 2024-05-24 04:35:35 ~3 min ios 📦zip
✖️ 6b56b45 #7 2024-05-24 04:39:29 ~6 min tests 📄log
✖️ 6b56b45 #8 2024-05-24 07:27:49 ~5 min tests 📄log
✖️ 6b56b45 #9 2024-05-27 03:21:45 ~7 min tests 📄log
✔️ d305d7f #7 2024-05-29 08:39:34 ~3 min ios 📦zip
✔️ d305d7f #7 2024-05-29 08:39:48 ~4 min linux 📦zip
✔️ d305d7f #7 2024-05-29 08:40:47 ~5 min android 📦aar
✖️ d305d7f #10 2024-05-29 08:42:29 ~6 min tests 📄log
✔️ 9b924a4 #8 2024-05-31 06:34:09 ~4 min linux 📦zip
✔️ 9b924a4 #8 2024-05-31 06:34:59 ~5 min ios 📦zip
✔️ 9b924a4 #8 2024-05-31 06:36:16 ~6 min android 📦aar
✖️ 9b924a4 #11 2024-05-31 06:36:46 ~6 min tests 📄log
✔️ 3cdc7b8 #9 2024-05-31 08:26:38 ~3 min ios 📦zip
✔️ 3cdc7b8 #9 2024-05-31 08:27:19 ~4 min linux 📦zip
✔️ 3cdc7b8 #9 2024-05-31 08:28:56 ~5 min android 📦aar
✖️ 3cdc7b8 #12 2024-05-31 08:29:31 ~6 min tests 📄log
✔️ 86a0f31 #10 2024-06-03 03:46:46 ~2 min linux 📦zip
✔️ 86a0f31 #10 2024-06-03 03:46:59 ~2 min android 📦aar
✔️ 86a0f31 #10 2024-06-03 03:47:43 ~3 min ios 📦zip
✖️ 86a0f31 #13 2024-06-03 03:49:22 ~5 min tests 📄log
✔️ d78064f #11 2024-06-03 08:03:11 ~2 min linux 📦zip
✔️ d78064f #11 2024-06-03 08:04:08 ~3 min android 📦aar
✔️ d78064f #11 2024-06-03 08:04:20 ~3 min ios 📦zip
✖️ d78064f #14 2024-06-03 08:07:27 ~6 min tests 📄log
✔️ d1a2e5f #12 2024-06-04 07:06:03 ~2 min linux 📦zip
✔️ d1a2e5f #12 2024-06-04 07:06:07 ~2 min android 📦aar
✔️ d1a2e5f #12 2024-06-04 07:07:01 ~3 min ios 📦zip
✖️ d1a2e5f #15 2024-06-04 07:09:01 ~5 min tests 📄log
✖️ d1a2e5f #16 2024-06-04 07:34:56 ~4 min tests 📄log
✔️ fab0642 #13 2024-06-04 11:04:34 ~2 min linux 📦zip
✔️ fab0642 #13 2024-06-04 11:05:58 ~4 min ios 📦zip
✔️ fab0642 #13 2024-06-04 11:07:29 ~5 min android 📦aar
✖️ fab0642 #17 2024-06-04 11:10:21 ~8 min tests 📄log
✖️ fab0642 #18 2024-06-04 11:20:13 ~6 min tests 📄log
✔️ 6931b53 #14 2024-06-05 03:18:28 ~2 min linux 📦zip
✔️ 6931b53 #14 2024-06-05 03:18:41 ~2 min android 📦aar
✔️ 6931b53 #14 2024-06-05 03:19:40 ~3 min ios 📦zip
✖️ 6931b53 #19 2024-06-05 03:47:36 ~31 min tests 📄log
✖️ 6931b53 #20 2024-06-05 05:53:52 ~30 min tests 📄log
✔️ bef78cd #15 2024-06-05 06:03:10 ~2 min linux 📦zip
✔️ bef78cd #15 2024-06-05 06:03:25 ~2 min android 📦aar
✔️ bef78cd #15 2024-06-05 06:04:28 ~3 min ios 📦zip
✖️ bef78cd #21 2024-06-05 06:34:33 ~33 min tests 📄log
✔️ dc29345 #16 2024-06-05 08:20:57 ~2 min linux 📦zip
✔️ dc29345 #16 2024-06-05 08:21:04 ~2 min android 📦aar
✔️ dc29345 #16 2024-06-05 08:21:39 ~3 min ios 📦zip
✔️ dc29345 #22 2024-06-05 08:58:15 ~40 min tests 📄log
✔️ 56e61b7 #17 2024-06-05 10:22:15 ~2 min android 📦aar
✔️ 56e61b7 #17 2024-06-05 10:23:06 ~3 min linux 📦zip
✔️ 56e61b7 #17 2024-06-05 10:23:27 ~3 min ios 📦zip
✔️ 56e61b7 #23 2024-06-05 11:00:18 ~40 min tests 📄log
✖️ 969890d #24 2024-06-07 00:50:55 ~1 min tests 📄log
✔️ 969890d #18 2024-06-07 00:52:04 ~2 min android 📦aar
✔️ 969890d #18 2024-06-07 00:52:38 ~3 min ios 📦zip
✔️ 969890d #18 2024-06-07 00:54:49 ~5 min linux 📦zip
✔️ 634b97e #19 2024-06-07 01:02:03 ~2 min ios 📦zip
✔️ 634b97e #19 2024-06-07 01:02:11 ~2 min linux 📦zip
✔️ 634b97e #19 2024-06-07 01:02:16 ~2 min android 📦aar
✔️ 634b97e #25 2024-06-07 01:40:17 ~40 min tests 📄log
✔️ 8b4aa3a #20 2024-06-07 13:13:36 ~2 min linux 📦zip
✔️ 8b4aa3a #20 2024-06-07 13:13:55 ~2 min android 📦aar
✔️ 8b4aa3a #20 2024-06-07 13:14:13 ~3 min ios 📦zip
✔️ eef98ef #21 2024-06-07 13:16:12 ~2 min linux 📦zip
✔️ eef98ef #21 2024-06-07 13:16:31 ~2 min android 📦aar
✔️ eef98ef #21 2024-06-07 13:17:41 ~3 min ios 📦zip
✔️ 8b4aa3a #26 2024-06-07 13:51:43 ~40 min tests 📄log
✖️ eef98ef #27 2024-06-07 13:57:34 ~5 min tests 📄log
✔️ eef98ef #28 2024-06-09 08:50:08 ~40 min tests 📄log
✔️ 741250a #22 2024-06-11 00:35:35 ~2 min android 📦aar
✔️ 741250a #22 2024-06-11 00:35:50 ~2 min linux 📦zip
✔️ 741250a #22 2024-06-11 00:36:14 ~3 min ios 📦zip
✔️ 741250a #29 2024-06-11 01:13:48 ~40 min tests 📄log
✔️ 56d69e3 #23 2024-06-11 04:32:25 ~2 min android 📦aar
✔️ 56d69e3 #23 2024-06-11 04:32:59 ~3 min ios 📦zip
✔️ 56d69e3 #23 2024-06-11 04:33:21 ~3 min linux 📦zip
✖️ 56d69e3 #30 2024-06-11 04:35:18 ~5 min tests 📄log
✖️ 56d69e3 #31 2024-06-11 05:44:30 ~5 min tests 📄log
✔️ 17dafb5 #24 2024-06-11 06:01:35 ~2 min linux 📦zip
✔️ 17dafb5 #24 2024-06-11 06:01:52 ~2 min android 📦aar
✔️ 17dafb5 #24 2024-06-11 06:02:29 ~3 min ios 📦zip
✔️ 61e2d71 #25 2024-06-11 06:17:50 ~2 min linux 📦zip
✔️ 61e2d71 #25 2024-06-11 06:18:32 ~2 min android 📦aar
✔️ 61e2d71 #25 2024-06-11 06:18:41 ~3 min ios 📦zip
Commit #️⃣ Finished (UTC) Duration Platform Result
✖️ 17dafb5 #32 2024-06-11 06:33:07 ~34 min tests 📄log
✔️ 61e2d71 #33 2024-06-11 07:14:14 ~40 min tests 📄log

@kaichaosun kaichaosun force-pushed the message-hash-query branch from 27167d2 to e719178 May 24, 2024 00:48
@kaichaosun kaichaosun changed the title from "feat: hash based query for outgoing messages." to "feat_: hash based query for outgoing messages." May 24, 2024
@kaichaosun kaichaosun force-pushed the message-hash-query branch 2 times, most recently from 79399ce to 6b56b45 May 24, 2024 04:32
Comment thread wakuv2/waku.go Outdated
Comment thread wakuv2/waku.go Outdated
Comment thread wakuv2/waku.go
	messageHashes[i] = pb.ToMessageHash(hash.Bytes())
}

result, err := w.node.Store().QueryByHash(ctx, messageHashes, opts...)
Contributor

We should consider the store query limit on how many hashes can be queried at once, and probably batch these requests in parallel across multiple store nodes.

Contributor Author

Yeah, added a limit. Batching seems like overkill since there are not many messages in a few seconds.
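
For reference, the batching alternative raised in this thread (which the PR did not adopt, settling on a plain limit instead) could look roughly like the sketch below. The `chunk` helper and the string hashes are hypothetical; dispatching the batches in parallel to several store nodes is deliberately left out.

```go
package main

import "fmt"

// chunk splits hashes into consecutive batches of at most size elements.
// It returns nil when size is not positive.
func chunk(hashes []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var batches [][]string
	for start := 0; start < len(hashes); start += size {
		end := start + size
		if end > len(hashes) {
			end = len(hashes)
		}
		batches = append(batches, hashes[start:end])
	}
	return batches
}

func main() {
	hashes := []string{"h1", "h2", "h3", "h4", "h5"}
	// Each batch would become one store query in a batched design.
	for i, batch := range chunk(hashes, 2) {
		fmt.Println(i, batch)
	}
}
```

With a batch size of 2, the five hashes above end up in three batches, the last one holding a single hash.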

Comment thread wakuv2/waku.go
Comment thread wakuv2/waku.go
Comment thread wakuv2/waku.go Outdated
Comment thread wakuv2/waku.go Outdated
@kaichaosun kaichaosun force-pushed the message-hash-query branch 2 times, most recently from d305d7f to 9b924a4 May 31, 2024 06:29
Comment thread wakuv2/waku.go Outdated
Comment thread wakuv2/waku.go
pubsubMessageIds := make([][]gethcommon.Hash, 0, len(w.sendMsgIDs))
for pubsubTopic, subMsgs := range w.sendMsgIDs {
	var queryMsgIds []gethcommon.Hash
	for msgID, sendTime := range subMsgs {
Contributor

Just to make sure: here you are taking at most 20 random messages from the ones sent to check against the store? Maybe we should use a sorted map (or any sorted struct).

Contributor Author

For each pubsub topic, it will choose 20 random outgoing messages to check against the store.
After a quick search, there is no sorted map built into Go; adding extra logic for this feature seems like overkill considering the frequency of outgoing messages. @cammellos

Comment thread wakuv2/waku.go Outdated
Comment thread wakuv2/waku.go Outdated
Comment thread wakuv2/waku.go Outdated
Comment thread wakuv2/waku.go
Comment thread wakuv2/waku.go Outdated
Contributor

@qfrank qfrank left a comment

Overall LGTM. FYI, I just saw that sendDataSync invokes transport.TrackMany, so the packet sent via MVDS should also be monitored; it will retry with handleEnvelopeFailure until it hits maxAttempts. It would be cool if you could add a test. @kaichaosun

@kaichaosun kaichaosun force-pushed the message-hash-query branch from d78064f to d1a2e5f June 4, 2024 07:03
@kaichaosun kaichaosun marked this pull request as ready for review June 5, 2024 09:51
@chaitanyaprem
Contributor

Wondering if there is an overlap between this and https://github.com/status-im/status-go/pull/5281/files.

Contributor

@chaitanyaprem chaitanyaprem left a comment

LGTM other than a few minor comments.

Would love to see the effectiveness and how much additional bandwidth is consumed during dogfooding.

Comment thread protocol/messenger_peersyncing.go Outdated
Comment thread wakuv2/waku.go Outdated
@kaichaosun
Contributor Author

This PR is for outgoing messages, #5281 is for incoming messages if I'm not mistaken. @chaitanyaprem

Contributor

@qfrank qfrank left a comment

i.e if a message is acknowledged e2e, it should be removed
if the message is acknowledged by the other peer, we should stop resending, we do that for datasync messages already, but we want to do the same in kaichao's PR

It seems we haven't reached this goal with this PR yet? watchExpiredMessages checks the raw_messages table every second, so there's still a chance it will resend a message that was already sent? @kaichaosun

@kaichaosun
Contributor Author

@qfrank messages acked through MVDS are deleted from the query queue, so they won't be marked as expired. I'm not sure whether there is any ack mechanism other than MVDS.

Comment thread wakuv2/waku.go
@kaichaosun
Contributor Author

@qfrank mentioned there is an edge case where a raw message resend could be triggered just before the message is marked as sent, likely within a window of a few milliseconds. This is possible because the coordination depends on the database table raw_message. It can be mitigated by watching the message sent event within the raw message resend logic (the watchExpiredMessages method), but that does not seem necessary for this kind of failover logic IMO. I'd appreciate more input or ideas.
cc @cammellos

@qfrank
Contributor

qfrank commented Jun 11, 2024

@qfrank mentioned there is an edge case that raw message resend could be triggered just before the message marked as sent, likely happen in a few milliseconds. This is possible because the coordination depends on the database table raw_message, it can be mitigated by watching message sent event within resend raw message (watchExpiredMessages method), it seems not necessary for this kind of fail over logic IMO. Appreciate if there are more inputs or ideas. cc @cammellos

Hi @kaichaosun, I just had a DM with @cammellos. We can deal with it at a later time: in the worst case we send a message twice, but according to this we won't process the same message twice on the receiver side, so it is just wasteful. Thank you for your PR!
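
Why a duplicate send is only wasteful can be illustrated with a generic receiver-side deduplication sketch. The logic the comment links to is not shown on this page, so this is not status-go's actual implementation; the `dedup` type, its `firstTime` method, and the string message IDs are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// dedup remembers which message IDs have already been processed.
type dedup struct {
	mu   sync.Mutex
	seen map[string]struct{}
}

func newDedup() *dedup {
	return &dedup{seen: make(map[string]struct{})}
}

// firstTime reports whether id has not been seen before, and records it.
// It is safe to call from multiple goroutines.
func (d *dedup) firstTime(id string) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	if _, ok := d.seen[id]; ok {
		return false
	}
	d.seen[id] = struct{}{}
	return true
}

func main() {
	d := newDedup()
	// "msg-1" arrives twice, as it would after a spurious resend.
	for _, id := range []string{"msg-1", "msg-1", "msg-2"} {
		if d.firstTime(id) {
			fmt.Println("processing", id)
		} else {
			fmt.Println("skipping duplicate", id)
		}
	}
}
```

Under this assumption the second copy of a resent message costs bandwidth but has no further effect on the receiver.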

@kaichaosun kaichaosun merged commit 47899fd into develop Jun 11, 2024
@kaichaosun kaichaosun deleted the message-hash-query branch June 11, 2024 07:45
@cammellos
Contributor

@kaichaosun has this been tested in the clients? I think at least running e2e on mobile should have been done before merging it, unless the feature is disabled, but I don't see any flag

@kaichaosun
Contributor Author

I have tested it with:
DM between status-desktop <-> status-desktop
DM between status-mobile <-> status-desktop
These are the downstream PRs for testing: status-im/status-app#15130, status-im/status-legacy#20387.

Should we halt the changes for more QA? @cammellos

@cammellos
Contributor

@kaichaosun it's probably ok, maybe next time ping QA so they can run e2e tests on the build

Comment thread protocol/messenger_messages_tracking_test.go

7 participants