Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stream reset in deal handling after 30 seconds #1850

Open
6 of 12 tasks
RobQuistNL opened this issue Dec 20, 2023 · 6 comments
Open
6 of 12 tasks

Stream reset in deal handling after 30 seconds #1850

RobQuistNL opened this issue Dec 20, 2023 · 6 comments
Labels

Comments

@RobQuistNL
Copy link
Contributor

Checklist

  • This is not a question or a support request. If you have any boost related questions, please ask in the discussion forum.
  • This is not a new feature request. If it is, please file a feature request instead.
  • This is not an enhancement request. If it is, please file a improvement suggestion instead.
  • I have searched on the issue tracker and the discussion forum, and there is no existing related issue or discussion.
  • I am running the Latest release, or the most recent RC(release canadiate) for the upcoming release or the dev branch(master), or have an issue updating to any of these.
  • I did not make any code changes to boost.

Boost component

  • boost daemon - storage providers
  • boost client
  • boost UI
  • boost data-transfer
  • boost index-provider
  • Other

Boost Version

boostd v2.1.1

Describe the Bug

When deals come in and they get handled, this can take > 30 seconds (see #1747)

When that happens, the underlaying stream connection gets killed by something.
Not the deal handler itself.

Logging Information

no

Repo Steps

  • Add a sleep of > 30 seconds in your deal filter handling.
  • Make a deal
  • See that you get a stream reset on the client
  • See that once your provider succeeds, it shows an err: stream reset on closing it too (here)

Note that upgrading the timeouts here does not help. (I've changed them to 60, and it still happens)

@RobQuistNL
Copy link
Contributor Author

Also pinging @ribasushi as Spade is very much affected by this bug when clients have those slow databases as described here (or if they have a slow deal filter, or something else)

@cryptowhizzard
Copy link
Contributor

cryptowhizzard commented Dec 20, 2023

I have been on this issue for a while. I don’t think it is Boost related. The stream vanishes because the client who is proposing the deal is waiting longer then 30 seconds. In this case it is the full node Boost is using to propose the deal running Lotus or Venus who is dropping off

I must say that I think that it a good idea to have some flexibility in the timeout setting, however when Boost “behaves” right we are talking milliseconds. Moving to 300 seconds or higher does have some risks with it.

@LexLuthr
Copy link
Collaborator

LexLuthr commented Dec 22, 2023

@RobQuistNL I don't think this is boostd. I would recommend checking your client.

Screenshot 2023-12-22 at 12 36 38 PM
2023-12-22T08:35:15.028Z        DEBUG   boost-net       lp2pimpl/net.go:256     processed deal proposal accept  {"reqlog-uuid": "1bad5c14-754e-4780-8093-3dcfe554c438", "client-peer": "12D3KooWLjV7gpsJY2wGHHWeJQHxWwdnwq2zRQCPTsgVNQfMSjXB", "id": "83839011-d81d-49ad-a27c-9a91538e7ab3"}
2023-12-22T08:35:15.028Z        INFO    boost-prop      lp2pimpl/net.go:262     send deal proposal response     {"id": "83839011-d81d-49ad-a27c-9a91538e7ab3", "accepted": true, "msg": "", "peer id": "12D3KooWLjV7gpsJY2wGHHWeJQHxWwdnwq2zRQCPTsgVNQfMSjXB", "client address": "t3shsquipzctlsbk6wp37aj6rtsefyi5a4k5tx2cy3h4fvzq4epgvy3t543wysde7nxr26anzywk77ks77v7eq", "provider address": "t01000", "piece cid": "baga6ea4seaqafukelbiytuppsptonwmgqyjknsvhx3mguazrxqz46d2k5og4qnq", "piece size": 8388608, "verified": false, "label": "bafykbzacecbu7tfbcouw3lumuol3lhanatlboe3rvcbof7yjh6figf3d3t342", "start epoch": "5861", "end epoch": "524261", "price per epoch": "156250000", "duration": "40.060123643s"}

@RobQuistNL
Copy link
Contributor Author

Did your client receive that response? Also, what version are you running? Because I (we) think it might be a deeper level timeout set somewhere to 30s. The responses that take over 30 seconds never make it back to the requester. I also see "err: stream reset" in the debug logs, BUT I do see that send deal proposal response info log.

Please note that I have seen this happen for 2 out of 2 clients so far; Spade and Julien (TwinQuasar);

{"level":"info","ts":"2023-12-20T06:44:22.650Z","logger":"boost-prop","caller":"lp2pimpl/net.go:262","msg":"send deal proposal response","id":"db31d7ea-e07f-47d8-ac33-a2f2c694432c","accepted":true,"msg":"","peer id":"12D3KooWSMTGAgmByESrZWa7X7fdnrENXSQajmeGbfcioPSyyLMu","client address":"f1zo3n62q6j5b3rfc4drj67d5e7hr5pqtygxy6s6i","provider address":"xxxx","piece cid":"baga6ea4seaqxxxx","piece size":34359738368,"verified":true,"label":"bafkreixxxx","start epoch":"xxx","end epoch":"xxx","price per epoch":"0","duration":"41.290329433s"}
{"level":"warn","ts":"2023-12-20T06:44:24.430Z","logger":"boost-net","caller":"lp2pimpl/net.go:287","msg":"writing deal response","reqlog-uuid":"330fad56-07a9-425e-b320-d7057f21ebab","client-peer":"12D3KooWSMTGAgmByESrZWa7X7fdnrENXSQajmeGbfcioPSyyLMu","id":"db31d7ea-e07f-47d8-ac33-a2f2c694432c","err":"stream reset"}
{"level":"info","ts":"2023-12-20T06:44:24.430Z","logger":"boost-net","caller":"lp2pimpl/net.go:231","msg":"closing stream","reqlog-uuid":"330fad56-07a9-425e-b320-d7057f21ebab","client-peer":"12D3KooWSMTGAgmByESrZWa7X7fdnrENXSQajmeGbfcioPSyyLMu","id":"db31d7ea-e07f-47d8-ac33-a2f2c694432c","err":"stream reset"}
{"level":"debug","ts":"2023-12-20T06:44:24.430Z","logger":"boost-net","caller":"lp2pimpl/net.go:233","msg":"handled deal proposal request","reqlog-uuid":"330fad56-07a9-425e-b320-d7057f21ebab","client-peer":"12D3KooWSMTGAgmByESrZWa7X7fdnrENXSQajmeGbfcioPSyyLMu","id":"db31d7ea-e07f-47d8-ac33-a2f2c694432c","duration":"43.070655464s"}

@RobQuistNL
Copy link
Contributor Author

The code in net.go clearly tries to use other timeouts, so if thats not working, there might be a bug in libp2p or some adapter / service module

@RobQuistNL
Copy link
Contributor Author

In this case Riba also tested extensively and is 100% sure he does not disconnect the setup connection. I don't see it happening in the code on the surface on my end, so there's likely something deeper. Can you try again with 2 minutes or something? just to test?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: No status
Development

No branches or pull requests

3 participants