
SDK Sends Repetitive Requests (no replays allowed) #3

Open

cheeseandcereal opened this issue Sep 15, 2019 · 26 comments

Comments

@cheeseandcereal
Contributor

  • SDK Version: 4.0.0
  • Node Version: 10.16.0
  • Platform: Linux

For some reason, the JavaScript SDK occasionally sends repeated requests, which trigger the replay protection on Dragonchain (Previous matching request found (no replays allowed)). This is a bug that only occurs with this SDK.

The cause is still unknown at this time. If any details are discovered, this ticket should be updated.

@WhiteRavenTechnology

WhiteRavenTechnology commented Sep 16, 2019

  • SDK Version: 3.2.0
  • Node Version: 10.16.3
  • Platform: Windows & Linux

I have experienced this with queryBlocks calls similar to:
client.queryBlocks({ luceneQuery: `block_id:>${start_block_id} AND timestamp:>=${start_timestamp}`, limit: 50, sort: 'block_id:asc' })

  • Calls run in a loop until there are no blocks remaining to pull (a rough sketch of the loop follows this list)
  • The next request in the loop does not occur until the previous request is complete
  • Each request is made on a "fresh" client instance (local webpage makes ajax call -> intermediary nodejs express server -> invokes createClient({credentials}) then client.queryBlocks())
  • The EXACT query should never be repeated (intentionally) with my code
  • Adding up to 500ms forced delay between calls has no effect on the frequency of this error
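
A rough sketch of that loop (illustrative only, not the actual explorer code; the credentials object and startBlockId/startTimestamp are placeholders, and the response shape — response.results with block headers — is an assumption based on the Dragonchain block query API):

```js
// Illustrative sketch only, not the actual application code.
// Assumes the dragonchain-sdk package's createClient as referenced above
// (awaited here on the assumption it returns a Promise) and that each query
// response exposes response.results containing blocks with header.block_id.
const sdk = require('dragonchain-sdk');

async function pullAllBlocks(credentials, startBlockId, startTimestamp) {
  let lastBlockId = startBlockId;
  while (true) {
    // A "fresh" client instance per request, mirroring the express flow above
    const client = await sdk.createClient(credentials);
    const result = await client.queryBlocks({
      luceneQuery: `block_id:>${lastBlockId} AND timestamp:>=${startTimestamp}`,
      limit: 50,
      sort: 'block_id:asc',
    });
    const blocks = (result.response && result.response.results) || [];
    if (blocks.length === 0) break; // no blocks remaining to pull
    lastBlockId = blocks[blocks.length - 1].header.block_id;
  }
}
```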

Interestingly, I've ONLY experienced it when querying a specific managed L2 node (public ID: cj4i7y1ucMtuz1jqTfa6edZ8x1XH85DU3gL8KaJZ1e2c). Once it errors, if I try to run the exact query (starting from the same block number) it continues to fail. If I bump the block ID by one, it "clears the pipes" and I can resume where I left off (with the original query) until a few more pulls later it errors again.

Other managed L3s (21mPVWRyyJHNTTQZEgy1hqefUpzsgCLLt9fhkRWQQrTTt) and unmanaged L2s (21aVsbEYCfSpe3aBaoT3J4Kp1Q8mjF7A6BSZF8yBGRJTD) work fine.

In fact, on the managed L3, I'm able to rapid-fire requests to pull all 15000+ blocks 50 at a time and have done so perhaps 15-20 times in testing with no error at any time. The managed L2, though, errors within the first 5 calls or so every time.

The application experiencing this issue can be found here:
https://github.com/Dragonchain-Community/dragonchain-uvn-block-explorer/tree/dev

@deanshelton913
Contributor

deanshelton913 commented Sep 16, 2019

The problem area, I think, is this line: https://github.com/dragonchain/dragonchain-sdk-javascript/blob/master/src/services/dragonchain-client/DragonchainClient.ts#L1324

I suggest that we use more precise Node.js timing functions.
Suggestion: https://nodejs.org/api/process.html#process_process_hrtime_bigint

process.hrtime.bigint()
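
For illustration, hrtime.bigint() (Node >= 10.7) returns monotonic nanoseconds, so back-to-back calls don't collide the way millisecond Date strings do:

```js
// Nanosecond-resolution monotonic clock; values are BigInts and strictly increase
for (let i = 0; i < 5; i++) {
  console.log(process.hrtime.bigint()); // e.g. 1170244568192842n
}
```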

@cheeseandcereal
Contributor Author

cheeseandcereal commented Sep 16, 2019

The date has to be sent as an ISO 8601 time string, so I don't know if getting a bigint unix timestamp would help. Additionally, the function you linked isn't compatible with Node 8, which we support, so we can't use it anyway.

I'm also not 100% convinced that line is the issue, because I've seen this error happen when performing a query manually in the node repl for the very first time. Could it maybe be the http request library doing some weird re-request?

Either way, this warrants more investigation.

@deanshelton913
Contributor

for (let i = 0; i < 100; i++) { console.log(new Date().toISOString()); }

yields:


2019-09-16T17:36:26.226Z
2019-09-16T17:36:26.226Z
2019-09-16T17:36:26.226Z
2019-09-16T17:36:26.226Z
2019-09-16T17:36:26.226Z
2019-09-16T17:36:26.226Z
2019-09-16T17:36:26.227Z
2019-09-16T17:36:26.227Z
2019-09-16T17:36:26.227Z
2019-09-16T17:36:26.227Z
2019-09-16T17:36:26.227Z
2019-09-16T17:36:26.227Z
2019-09-16T17:36:26.227Z
2019-09-16T17:36:26.227Z
2019-09-16T17:36:26.227Z
2019-09-16T17:36:26.227Z
2019-09-16T17:36:26.227Z
2019-09-16T17:36:26.227Z
2019-09-16T17:36:26.227Z
2019-09-16T17:36:26.227Z
2019-09-16T17:36:26.227Z

@deanshelton913
Contributor

If we had more precision in that string we could solve this, but after reading further on it, I actually agree that bigint might not be exactly what we need.

@cheeseandcereal
Contributor Author

cheeseandcereal commented Sep 16, 2019

Yeah, of course you can get a raw for loop to generate the same date string when there's no other logic. But I've also seen this error occur when performing a single request, so although getting the date iso string may be a problem, I'm still not convinced that's the root cause.

@WhiteRavenTechnology

WhiteRavenTechnology commented Sep 16, 2019

#begindumbness
In my case, I'm running into the issue when rapid-firing queryBlocks requests: doesn't the replay protection also depend on the actual query, and not just the timing of requests? Two different requests, even at the same instant, should theoretically go through fine. In that case, I'd think we're back to some kind of poor error-condition handling in the http request library.

I shouldn't ever be intentionally sending the same request through twice at all, and it continues to error if I DO reissue the same request, even seconds or minutes apart, until I issue a new (different) request.
#enddumbness

@cheeseandcereal
Contributor Author

Yeah, the replay protection only comes into play if the signature of the request is identical, which means all parts of the request, including query parameters and timestamp, would have to be identical to trigger it.
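
(For readers following along, here is a purely conceptual sketch of that kind of signature-based replay check — not the actual Dragonchain implementation, just an illustration of why every part of the request, timestamp included, has to match before this error can fire:)

```js
// Conceptual illustration only; Dragonchain's real scheme differs in detail.
const crypto = require('crypto');
const seenSignatures = new Set();

function checkForReplay(secret, method, path, queryString, timestamp, body) {
  // The signature covers every part of the request, so two requests only
  // "match" when method, path, query string, body, AND timestamp are identical.
  const signature = crypto
    .createHmac('sha256', secret)
    .update([method, path, queryString, timestamp, body].join('\n'))
    .digest('base64');
  if (seenSignatures.has(signature)) {
    throw new Error('Previous matching request found (no replays allowed)');
  }
  seenSignatures.add(signature);
}
```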

@WhiteRavenTechnology

Okay, pure speculation here: is it possible that the SDK client is reusing previous fetch options somehow if the query parameters match a previous failed request? If timestamp is enough to get past the replay protection, I'd think reissuing the same failed request a few seconds or more later would DEFINITELY get it through, but it doesn't.

@cheeseandcereal
Contributor Author

cheeseandcereal commented Sep 16, 2019

I'm not sure why that would happen, but I wouldn't immediately rule it out until I see proof otherwise. There's no code to 'cache' any sort of request data or anything, but yeah, if you're making a request seconds later, it absolutely should work fine.

@WhiteRavenTechnology

Let me see if I can force-break it again and prove that idea.

@WhiteRavenTechnology

WhiteRavenTechnology commented Sep 16, 2019

Disregard that thought. It was a red herring. Once I get it to error, I can rerun that request over and over (with a few seconds between each request), and whether it succeeds appears to be random (no specific trigger from my end).

I'll leave this to the experts. 😄

It IS still ONLY happening on that specific managed L2 for me though. Not a single issue with other nodes.

@deanshelton913
Contributor

I am fairly confident that the timestamp precision is the issue. Python gives 6 digits of sub-second precision (microseconds); Node gives 3 (milliseconds).
Even in a for-loop within Python, 6 digits is sufficient:

# Python
import datetime
for x in range(0, 100): print(datetime.datetime.utcnow().isoformat() + 'Z')
2019-09-16T18:28:09.086199Z
2019-09-16T18:28:09.086205Z
2019-09-16T18:28:09.086211Z
2019-09-16T18:28:09.086216Z
2019-09-16T18:28:09.086221Z
2019-09-16T18:28:09.086226Z
2019-09-16T18:28:09.086232Z
2019-09-16T18:28:09.086237Z
2019-09-16T18:28:09.086242Z
2019-09-16T18:28:09.086248Z
2019-09-16T18:28:09.086253Z
2019-09-16T18:28:09.086259Z
2019-09-16T18:28:09.086264Z
2019-09-16T18:28:09.086269Z
2019-09-16T18:28:09.086275Z
2019-09-16T18:28:09.086280Z

@cheeseandcereal
Contributor Author

I know that the node sdk has less precision, BUT I have seen this happen when performing a SINGLE request in a repl, which is why even if the timestamp was an issue, there is at least one other thing also causing this problem.

@deanshelton913
Contributor

Check this out --> https://github.com/wadey/node-microtime
This guy built a C extension to the N-API which gets more precise time.
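
A tiny usage sketch, assuming the API shown in that project's README (microtime.now() returning integer microseconds since the epoch):

```js
const microtime = require('microtime'); // npm install microtime (builds a native addon)
console.log(microtime.now()); // e.g. 1568658489086199 — microseconds since the epoch
```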

@deanshelton913
Contributor

I know that the node sdk has less precision, BUT I have seen this happen when performing a SINGLE request in a repl, which is why even if the timestamp was an issue, there is at least one other thing also causing this problem.

ahh. gotcha

@WhiteRavenTechnology

Check this out:
expressjs/express#2512 (comment)

I have crap for internet, and I’ve been having even more trouble with connection timeouts, etc., in recent days, and usually only to certain sites/addresses.

I wonder if I’m causing the problem myself with failed requests that node/my browser is automatically repeating, but only to that endpoint?

@WhiteRavenTechnology

If it’s failed request handling on the nodejs/expressjs side from my end, that would explain how I could be sending two identical queries, but I’d still think it wouldn’t be sending them in the same thousandth of a second.

@cheeseandcereal
Contributor Author

cheeseandcereal commented Sep 16, 2019

Yeah, I've seen that problem before with express as the webserver, but dragonchain uses python flask+gunicorn for the webserver, so I highly doubt that's the issue. I currently suspect some sort of hidden retry logic within node's HTTP request code itself, or maybe in node-fetch, which is the package we're using to perform http requests in this SDK.

@cheeseandcereal
Contributor Author

Yeah, unfortunately I haven't come across a consistent way to repro. Any consistent repro steps would be highly appreciated.

@cheeseandcereal
Contributor Author

Attempted a fix with #5.
Will leave this issue open until we confirm this actually fixes the problem.

Note this was released as version 4.0.0-rc.2 under npm, so it can be installed before the production release for testing.
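
For anyone testing the RC, the idea behind the adjusted timestamp (described in the next comment as three random trailing digits appended to the millisecond ISO string) is roughly the following sketch; the exact change lives in #5:

```js
// Approximation of the adjusted timestamp idea, not the exact code from #5.
// Keeps the ISO 8601 shape (and works on Node 8) while making collisions
// between back-to-back requests far less likely.
function timestampWithRandomSuffix() {
  const iso = new Date().toISOString(); // e.g. 2019-09-17T21:12:09.404Z
  const suffix = String(Math.floor(Math.random() * 1000)).padStart(3, '0');
  return iso.replace('Z', suffix + 'Z'); // e.g. 2019-09-17T21:12:09.404669Z
}
```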

@WhiteRavenTechnology

The same problem is occurring both with and without the adjusted timestamp format. I manually added the new timestamp format (with 3 random trailing digits) to SDK version 3.2.0 and ran the same method that was causing the problem.

In my tests, I've added code that should log ANY http/https request that's made.
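
(Roughly this kind of hook, shown here as an illustrative sketch rather than the exact code; it patches http.request and https.request so the options of every outgoing request get printed:)

```js
// Illustrative sketch: log the options of every outgoing http/https request.
const http = require('http');
const https = require('https');

for (const mod of [http, https]) {
  const originalRequest = mod.request;
  mod.request = function (...args) {
    console.log(args[0]); // node-fetch passes a parsed-options object here
    return originalRequest.apply(this, args);
  };
}
```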

Sample requests with old timestamp format:

{ protocol: 'https:',
  slashes: true,
  auth: null,
  host: 'redacted',
  port: null,
  hostname: 'redacted',
  hash: null,
  search:
   '?q=block_id%3A%3E7447+AND+timestamp%3A%3E%3D1554076800&sort=block_id%3Aasc&offset=0&limit=50',
  query:
   'q=block_id%3A%3E7447+AND+timestamp%3A%3E%3D1554076800&sort=block_id%3Aasc&offset=0&limit=50',
  pathname: '/v1/block',
  path:
   '/v1/block?q=block_id%3A%3E7447+AND+timestamp%3A%3E%3D1554076800&sort=block_id%3Aasc&offset=0&limit=50',
  href:
   'https://redacted.com/v1/block?q=block_id%3A%3E7447+AND+timestamp%3A%3E%3D1554076800&sort=block_id%3Aasc&offset=0&limit=50',
  method: 'GET',
  headers:
   [Object: null prototype] {
     dragonchain: [ 'redacted' ],
     Authorization:
      [ 'redacted' ],
     timestamp: [ '2019-09-17T21:10:56.318Z' ],
     Accept: [ '*/*' ],
     'User-Agent': [ 'node-fetch/1.0 (+https://github.com/bitinn/node-fetch)' ],
     'Accept-Encoding': [ 'gzip,deflate' ],
     Connection: [ 'close' ] },
  agent: undefined }



{ protocol: 'https:',
  slashes: true,
  auth: null,
  host: 'redacted',
  port: null,
  hostname: 'redacted',
  hash: null,
  search:
   '?q=block_id%3A%3E7497+AND+timestamp%3A%3E%3D1554076800&sort=block_id%3Aasc&offset=0&limit=50',
  query:
   'q=block_id%3A%3E7497+AND+timestamp%3A%3E%3D1554076800&sort=block_id%3Aasc&offset=0&limit=50',
  pathname: '/v1/block',
  path:
   '/v1/block?q=block_id%3A%3E7497+AND+timestamp%3A%3E%3D1554076800&sort=block_id%3Aasc&offset=0&limit=50',
  href:
   'https://redacted.com/v1/block?q=block_id%3A%3E7497+AND+timestamp%3A%3E%3D1554076800&sort=block_id%3Aasc&offset=0&limit=50',
  method: 'GET',
  headers:
   [Object: null prototype] {
     dragonchain: [ 'redacted' ],
     Authorization:
      [ 'redacted' ],
     timestamp: [ '2019-09-17T21:10:58.464Z' ],
     Accept: [ '*/*' ],
     'User-Agent': [ 'node-fetch/1.0 (+https://github.com/bitinn/node-fetch)' ],
     'Accept-Encoding': [ 'gzip,deflate' ],
     Connection: [ 'close' ] },
  agent: undefined }
{ status: 401,
  response:
   { error:
      { type: 'AUTHENTICATION_ERROR',
        details: 'Previous matching request found (no replays allowed)' } },
  ok: false }

Sample requests with new timestamp format:

{ protocol: 'https:',
  slashes: true,
  auth: null,
  host: 'redacted',
  port: null,
  hostname: 'redacted',
  hash: null,
  search:
   '?q=block_id%3A%3E7497+AND+timestamp%3A%3E%3D1554076800&sort=block_id%3Aasc&offset=0&limit=50',
  query:
   'q=block_id%3A%3E7497+AND+timestamp%3A%3E%3D1554076800&sort=block_id%3Aasc&offset=0&limit=50',
  pathname: '/v1/block',
  path:
   '/v1/block?q=block_id%3A%3E7497+AND+timestamp%3A%3E%3D1554076800&sort=block_id%3Aasc&offset=0&limit=50',
  href:
   'https://redacted.com/v1/block?q=block_id%3A%3E7497+AND+timestamp%3A%3E%3D1554076800&sort=block_id%3Aasc&offset=0&limit=50',
  method: 'GET',
  headers:
   [Object: null prototype] {
     dragonchain: [ 'redacted' ],
     Authorization:
      [ 'redacted' ],
     timestamp: [ '2019-09-17T21:12:09.404669Z' ],
     Accept: [ '*/*' ],
     'User-Agent': [ 'node-fetch/1.0 (+https://github.com/bitinn/node-fetch)' ],
     'Accept-Encoding': [ 'gzip,deflate' ],
     Connection: [ 'close' ] },
  agent: undefined }



{ protocol: 'https:',
  slashes: true,
  auth: null,
  host: 'redacted',
  port: null,
  hostname: 'redacted',
  hash: null,
  search:
   '?q=block_id%3A%3E7547+AND+timestamp%3A%3E%3D1554076800&sort=block_id%3Aasc&offset=0&limit=50',
  query:
   'q=block_id%3A%3E7547+AND+timestamp%3A%3E%3D1554076800&sort=block_id%3Aasc&offset=0&limit=50',
  pathname: '/v1/block',
  path:
   '/v1/block?q=block_id%3A%3E7547+AND+timestamp%3A%3E%3D1554076800&sort=block_id%3Aasc&offset=0&limit=50',
  href:
   'https://redacted.com/v1/block?q=block_id%3A%3E7547+AND+timestamp%3A%3E%3D1554076800&sort=block_id%3Aasc&offset=0&limit=50',
  method: 'GET',
  headers:
   [Object: null prototype] {
     dragonchain: [ 'redacted' ],
     Authorization:
      [ 'redacted' ],
     timestamp: [ '2019-09-17T21:12:11.053559Z' ],
     Accept: [ '*/*' ],
     'User-Agent': [ 'node-fetch/1.0 (+https://github.com/bitinn/node-fetch)' ],
     'Accept-Encoding': [ 'gzip,deflate' ],
     Connection: [ 'close' ] },
  agent: undefined }
{ status: 401,
  response:
   { error:
      { type: 'AUTHENTICATION_ERROR',
        details: 'Previous matching request found (no replays allowed)' } },
  ok: false }

@cheeseandcereal
Contributor Author

Yeah, this strengthens my suspicion that there must be some silent retry logic going on that we don't see. Warrants further investigation.

@WhiteRavenTechnology

WhiteRavenTechnology commented Sep 17, 2019

Could even be an http protocol-level matter (https://blogs.oracle.com/ravello/beware-http-requests-automatic-retries), but that seems to normally affect long-running requests, and I have no idea if node would see that as a separate request (and thus log it in my code) or not.

In fact, I'm inclined to think I WOULDN'T see the retry. From the article:

By capturing the requests on the server side, we saw that the repeating requests were identical and that there is no way to actually know (from server perspective) that they are part of a retry. Also, from the browser side, there’s no trace for this (e.g., opening the network view doesn’t show the retry).

Edit: though the actual requests are being made within my nodejs app (inside an expressjs request), not directly through the client SDK in the browser.

The weirdest part is that this is only happening against the one managed L2 for me in my testing, and it happens extremely consistently (every attempt to pull a large number of blocks, and within the first 5-10 requests).

@cheeseandcereal
Contributor Author

Yeah, the even weirder part is that I've never seen this issue in the python SDK. At some point when I get some time, I'll proxy my requests through something that lets me watch my http traffic, so I can see the actual packets going in/out and confirm whether this is happening or not.

@WhiteRavenTechnology

Quick update on this issue: tried loading from the problematic managed L2 I originally noticed the problem on, and since the 4.0 update, it looks like it's working fine now. Pulled 38k+ blocks with no trouble at all.

I'm still blaming elasticsearch. Will update here if I run into it again.
