S3 GetObject fails with large files #209

Open
x64-latacora opened this issue Mar 30, 2022 · 13 comments

Comments

@x64-latacora

Dependencies

{:dependencies [[org.clojure/clojure "1.10.2-alpha1"]
                 [com.cognitect.aws/api "0.8.539"]
                 [com.cognitect.aws/endpoints "1.1.11.692"]
                 [com.cognitect.aws/s3 "820.2.1083.0"]
                 [com.cognitect.aws/sns "811.2.834.0"]
                 [com.amazonaws/aws-lambda-java-core "1.2.1"]
                 [org.clojure/tools.cli "1.0.206"]
                 ...
}

Description with failing test case

Similar issue to #97, but the call fails when making a GetObject API call.

For reference, the file being fetched is 3.6 GB.

Stack traces

(:Body (awsi/invoke @s3 {:op :GetObject
                         :request {:Bucket bucket-name
                                   :Key    object-key}}))

2022-03-30 14:14:35.028:INFO::nRepl-session-014b242b-3674-43b2-ba09-c6e46b421b54: Logging initialized @52196ms to org.eclipse.jetty.util.log.StdErrLog
2022-03-30 14:15:38.907:INFO:oejc.ResponseNotifier:qtp1253317297-38: Exception while notifying listener org.eclipse.jetty.client.HttpRequest$10@2d6a60a1
java.lang.NegativeArraySizeException: -463848656
	at clojure.lang.Numbers.byte_array(Numbers.java:1394)
	at cognitect.http_client$empty_bbuf.invokeStatic(http_client.clj:37)
	at cognitect.http_client$empty_bbuf.invoke(http_client.clj:34)
	at cognitect.http_client$on_headers.invokeStatic(http_client.clj:132)
	at cognitect.http_client$on_headers.invoke(http_client.clj:111)
	at clojure.lang.Atom.swap(Atom.java:51)
	at clojure.core$swap_BANG_.invokeStatic(core.clj:2355)
	at clojure.core$swap_BANG_.invoke(core.clj:2347)
	at cognitect.http_client.Client$fn$reify__27175.onHeaders(http_client.clj:232)
	at org.eclipse.jetty.client.HttpRequest$10.onHeaders(HttpRequest.java:528)
	at org.eclipse.jetty.client.ResponseNotifier.notifyHeaders(ResponseNotifier.java:100)
	at org.eclipse.jetty.client.ResponseNotifier.notifyHeaders(ResponseNotifier.java:92)
	at org.eclipse.jetty.client.HttpReceiver.responseHeaders(HttpReceiver.java:296)
	at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.headerComplete(HttpReceiverOverHTTP.java:310)
	at org.eclipse.jetty.http.HttpParser.parseFields(HttpParser.java:1245)
	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:1528)
	at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.parse(HttpReceiverOverHTTP.java:204)
	at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.process(HttpReceiverOverHTTP.java:144)
	at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.receive(HttpReceiverOverHTTP.java:79)
	at org.eclipse.jetty.client.http.HttpChannelOverHTTP.receive(HttpChannelOverHTTP.java:131)
	at org.eclipse.jetty.client.http.HttpConnectionOverHTTP.onFillable(HttpConnectionOverHTTP.java:172)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
	at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:555)
	at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:410)
	at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:164)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
	at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
	at java.base/java.lang.Thread.run(Thread.java:829)
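
The negative size in the exception above can be read backwards to recover the object's byte count. The sketch below assumes (it is not confirmed anywhere in this thread) that the HTTP client narrows a 64-bit Content-Length to a 32-bit int before allocating its buffer; it only shows that the numbers in the trace are consistent with that reading. The names `reported-size`, `implied-length`, and `narrow-to-int` are illustrative and not part of any library.

```clojure
;; Assumption: the size passed to byte-array was a long that got narrowed
;; to 32 bits. Nothing here calls the AWS or HTTP client libraries.

(def reported-size -463848656)   ; from the NegativeArraySizeException
(def two-pow-32 4294967296)      ; 2^32

;; Undo the 32-bit wrap-around to get the length the server likely reported.
(def implied-length (+ reported-size two-pow-32))

(defn narrow-to-int
  "Keeps only the low 32 bits of n, as a Java (int) cast would."
  [n]
  (unchecked-int n))

(narrow-to-int implied-length)        ; => -463848656
(> implied-length Integer/MAX_VALUE)  ; => true

;; The checked cast `int` refuses the same value instead of wrapping.
(try
  (int implied-length)
  (catch IllegalArgumentException e
    (.getMessage e)))
;; => "Value out of range for int: 3831118640"
```

The implied length, 3831118640 bytes, is roughly 3.6 GB, which matches the file size given in the description.
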
@fogus
Contributor

fogus commented Apr 4, 2022

Hello! Thank you for the report. We are actively investigating and will update ASAP.

@fogus
Contributor

fogus commented Apr 4, 2022

Hello again. Quick question: the HTTP client gets the size from the Content-Length header. Can you let us know what that value is?

@x64-latacora
Author

@fogus I'm unable to get the value because of the exception. Here's what I get when trying to get a file of 3831118640 bytes:

(awsi/invoke @s3 {:op :GetObject
                  :request {:Bucket "..."
                            :Key    "..."}})

2022-04-05 12:14:20.325:INFO:oejc.ResponseNotifier:qtp1881928864-52: Exception while notifying listener org.eclipse.jetty.client.HttpRequest$10@4df75b0d
java.lang.NegativeArraySizeException: -463087440
	at clojure.lang.Numbers.byte_array(Numbers.java:1394)
	at cognitect.http_client$empty_bbuf.invokeStatic(http_client.clj:37)
	at cognitect.http_client$empty_bbuf.invoke(http_client.clj:34)
	at cognitect.http_client$on_headers.invokeStatic(http_client.clj:132)
	at cognitect.http_client$on_headers.invoke(http_client.clj:111)
	at clojure.lang.Atom.swap(Atom.java:51)
	at clojure.core$swap_BANG_.invokeStatic(core.clj:2355)
	at clojure.core$swap_BANG_.invoke(core.clj:2347)
	at cognitect.http_client.Client$fn$reify__27187.onHeaders(http_client.clj:232)
	at org.eclipse.jetty.client.HttpRequest$10.onHeaders(HttpRequest.java:528)
	at org.eclipse.jetty.client.ResponseNotifier.notifyHeaders(ResponseNotifier.java:100)
	at org.eclipse.jetty.client.ResponseNotifier.notifyHeaders(ResponseNotifier.java:92)
	at org.eclipse.jetty.client.HttpReceiver.responseHeaders(HttpReceiver.java:296)
	at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.headerComplete(HttpReceiverOverHTTP.java:310)
	at org.eclipse.jetty.http.HttpParser.parseFields(HttpParser.java:1245)
	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:1528)
	at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.parse(HttpReceiverOverHTTP.java:204)
	at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.process(HttpReceiverOverHTTP.java:144)
	at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.receive(HttpReceiverOverHTTP.java:79)
	at org.eclipse.jetty.client.http.HttpChannelOverHTTP.receive(HttpChannelOverHTTP.java:131)
	at org.eclipse.jetty.client.http.HttpConnectionOverHTTP.onFillable(HttpConnectionOverHTTP.java:172)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
	at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:555)
	at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:410)
	at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:164)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
	at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
	at java.base/java.lang.Thread.run(Thread.java:829)

=>
{:cognitect.anomalies/category :cognitect.anomalies/fault,
 :cognitect.anomalies/message "Value out of range for int: 2354315264",
 :cognitect.http-client/throwable #error{:cause "Value out of range for int: 2354315264",
                                         :via [{:type java.lang.IllegalArgumentException,
                                                :message "Value out of range for int: 2354315264",
                                                :at [clojure.lang.RT intCast "RT.java" 1248]}],
                                         :trace [[clojure.lang.RT intCast "RT.java" 1248]
                                                 [cognitect.http_client$expand_buffer invokeStatic "http_client.clj" 57]
                                                 [cognitect.http_client$expand_buffer invokePrim "http_client.clj" -1]
                                                 [cognitect.http_client$append_buffer invokeStatic "http_client.clj" 65]
                                                 [cognitect.http_client$append_buffer invoke "http_client.clj" 61]
                                                 [cognitect.http_client$on_content$fn__27143
                                                  invoke
                                                  "http_client.clj"
                                                  139]
                                                 [clojure.core$update invokeStatic "core.clj" 6198]
                                                 [clojure.core$update invoke "core.clj" 6190]
                                                 [cognitect.http_client$on_content invokeStatic "http_client.clj" 139]
                                                 [cognitect.http_client$on_content invoke "http_client.clj" 136]
                                                 [clojure.lang.Atom swap "Atom.java" 51]
                                                 [clojure.core$swap_BANG_ invokeStatic "core.clj" 2355]
                                                 [clojure.core$swap_BANG_ invoke "core.clj" 2347]
                                                 [cognitect.http_client.Client$fn$reify__27189
                                                  onContent
                                                  "http_client.clj"
                                                  236]
                                                 [org.eclipse.jetty.client.HttpRequest$11
                                                  onContent
                                                  "HttpRequest.java"
                                                  542]
                                                 [org.eclipse.jetty.client.api.Response$ContentListener
                                                  onContent
                                                  "Response.java"
                                                  158]
                                                 [org.eclipse.jetty.client.api.Response$AsyncContentListener
                                                  onContent
                                                  "Response.java"
                                                  189]
                                                 [org.eclipse.jetty.client.ResponseNotifier
                                                  notifyContent
                                                  "ResponseNotifier.java"
                                                  155]
                                                 [org.eclipse.jetty.client.ResponseNotifier
                                                  notifyContent
                                                  "ResponseNotifier.java"
                                                  139]
                                                 [org.eclipse.jetty.client.HttpReceiver$ContentListeners
                                                  notifyContent
                                                  "HttpReceiver.java"
                                                  693]
                                                 [org.eclipse.jetty.client.HttpReceiver$ContentListeners
                                                  access$500
                                                  "HttpReceiver.java"
                                                  655]
                                                 [org.eclipse.jetty.client.HttpReceiver
                                                  plainResponseContent
                                                  "HttpReceiver.java"
                                                  369]
                                                 [org.eclipse.jetty.client.HttpReceiver
                                                  responseContent
                                                  "HttpReceiver.java"
                                                  352]
                                                 [org.eclipse.jetty.client.http.HttpReceiverOverHTTP
                                                  content
                                                  "HttpReceiverOverHTTP.java"
                                                  323]
                                                 [org.eclipse.jetty.http.HttpParser parseContent "HttpParser.java" 1716]
                                                 [org.eclipse.jetty.http.HttpParser parseNext "HttpParser.java" 1551]
                                                 [org.eclipse.jetty.client.http.HttpReceiverOverHTTP
                                                  parse
                                                  "HttpReceiverOverHTTP.java"
                                                  204]
                                                 [org.eclipse.jetty.client.http.HttpReceiverOverHTTP
                                                  process
                                                  "HttpReceiverOverHTTP.java"
                                                  144]
                                                 [org.eclipse.jetty.client.http.HttpReceiverOverHTTP
                                                  receive
                                                  "HttpReceiverOverHTTP.java"
                                                  79]
                                                 [org.eclipse.jetty.client.http.HttpChannelOverHTTP
                                                  receive
                                                  "HttpChannelOverHTTP.java"
                                                  131]
                                                 [org.eclipse.jetty.client.http.HttpConnectionOverHTTP
                                                  onFillable
                                                  "HttpConnectionOverHTTP.java"
                                                  172]
                                                 [org.eclipse.jetty.io.AbstractConnection$ReadCallback
                                                  succeeded
                                                  "AbstractConnection.java"
                                                  311]
                                                 [org.eclipse.jetty.io.FillInterest fillable "FillInterest.java" 105]
                                                 [org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint
                                                  onFillable
                                                  "SslConnection.java"
                                                  555]
                                                 [org.eclipse.jetty.io.ssl.SslConnection
                                                  onFillable
                                                  "SslConnection.java"
                                                  410]
                                                 [org.eclipse.jetty.io.ssl.SslConnection$2
                                                  succeeded
                                                  "SslConnection.java"
                                                  164]
                                                 [org.eclipse.jetty.io.FillInterest fillable "FillInterest.java" 105]
                                                 [org.eclipse.jetty.io.ChannelEndPoint$1 run "ChannelEndPoint.java" 104]
                                                 [org.eclipse.jetty.util.thread.strategy.EatWhatYouKill
                                                  runTask
                                                  "EatWhatYouKill.java"
                                                  338]
                                                 [org.eclipse.jetty.util.thread.strategy.EatWhatYouKill
                                                  doProduce
                                                  "EatWhatYouKill.java"
                                                  315]
                                                 [org.eclipse.jetty.util.thread.strategy.EatWhatYouKill
                                                  tryProduce
                                                  "EatWhatYouKill.java"
                                                  173]
                                                 [org.eclipse.jetty.util.thread.strategy.EatWhatYouKill
                                                  run
                                                  "EatWhatYouKill.java"
                                                  131]
                                                 [org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread
                                                  run
                                                  "ReservedThreadExecutor.java"
                                                  409]
                                                 [org.eclipse.jetty.util.thread.QueuedThreadPool
                                                  runJob
                                                  "QueuedThreadPool.java"
                                                  883]
                                                 [org.eclipse.jetty.util.thread.QueuedThreadPool$Runner
                                                  run
                                                  "QueuedThreadPool.java"
                                                  1034]
                                                 [java.lang.Thread run "Thread.java" 829]]}}

(meta *1)

=>
{:http-request {:request-method :get,
                :scheme :https,
                :server-port 443,
                :uri "...",
                :headers {...},
                :body nil,
                :server-name "s3.us-east-2.amazonaws.com"},
 :http-response {:cognitect.anomalies/category :cognitect.anomalies/fault,
                 :cognitect.anomalies/message "Value out of range for int: 2354315264",
                 :cognitect.http-client/throwable #error{:cause "Value out of range for int: 2354315264",
                                                         :via [{:type java.lang.IllegalArgumentException,
                                                                :message "Value out of range for int: 2354315264",
                                                                :at [clojure.lang.RT intCast "RT.java" 1248]}],
                                                         :trace [[clojure.lang.RT intCast "RT.java" 1248]
                                                                 [cognitect.http_client$expand_buffer
                                                                  invokeStatic
                                                                  "http_client.clj"
                                                                  57]
                                                                 [cognitect.http_client$expand_buffer
                                                                  invokePrim
                                                                  "http_client.clj"
                                                                  -1]
                                                                 [cognitect.http_client$append_buffer
                                                                  invokeStatic
                                                                  "http_client.clj"
                                                                  65]
                                                                 [cognitect.http_client$append_buffer
                                                                  invoke
                                                                  "http_client.clj"
                                                                  61]
                                                                 [cognitect.http_client$on_content$fn__27143
                                                                  invoke
                                                                  "http_client.clj"
                                                                  139]
                                                                 [clojure.core$update invokeStatic "core.clj" 6198]
                                                                 [clojure.core$update invoke "core.clj" 6190]
                                                                 [cognitect.http_client$on_content
                                                                  invokeStatic
                                                                  "http_client.clj"
                                                                  139]
                                                                 [cognitect.http_client$on_content
                                                                  invoke
                                                                  "http_client.clj"
                                                                  136]
                                                                 [clojure.lang.Atom swap "Atom.java" 51]
                                                                 [clojure.core$swap_BANG_ invokeStatic "core.clj" 2355]
                                                                 [clojure.core$swap_BANG_ invoke "core.clj" 2347]
                                                                 [cognitect.http_client.Client$fn$reify__27189
                                                                  onContent
                                                                  "http_client.clj"
                                                                  236]
                                                                 [org.eclipse.jetty.client.HttpRequest$11
                                                                  onContent
                                                                  "HttpRequest.java"
                                                                  542]
                                                                 [org.eclipse.jetty.client.api.Response$ContentListener
                                                                  onContent
                                                                  "Response.java"
                                                                  158]
                                                                 [org.eclipse.jetty.client.api.Response$AsyncContentListener
                                                                  onContent
                                                                  "Response.java"
                                                                  189]
                                                                 [org.eclipse.jetty.client.ResponseNotifier
                                                                  notifyContent
                                                                  "ResponseNotifier.java"
                                                                  155]
                                                                 [org.eclipse.jetty.client.ResponseNotifier
                                                                  notifyContent
                                                                  "ResponseNotifier.java"
                                                                  139]
                                                                 [org.eclipse.jetty.client.HttpReceiver$ContentListeners
                                                                  notifyContent
                                                                  "HttpReceiver.java"
                                                                  693]
                                                                 [org.eclipse.jetty.client.HttpReceiver$ContentListeners
                                                                  access$500
                                                                  "HttpReceiver.java"
                                                                  655]
                                                                 [org.eclipse.jetty.client.HttpReceiver
                                                                  plainResponseContent
                                                                  "HttpReceiver.java"
                                                                  369]
                                                                 [org.eclipse.jetty.client.HttpReceiver
                                                                  responseContent
                                                                  "HttpReceiver.java"
                                                                  352]
                                                                 [org.eclipse.jetty.client.http.HttpReceiverOverHTTP
                                                                  content
                                                                  "HttpReceiverOverHTTP.java"
                                                                  323]
                                                                 [org.eclipse.jetty.http.HttpParser
                                                                  parseContent
                                                                  "HttpParser.java"
                                                                  1716]
                                                                 [org.eclipse.jetty.http.HttpParser
                                                                  parseNext
                                                                  "HttpParser.java"
                                                                  1551]
                                                                 [org.eclipse.jetty.client.http.HttpReceiverOverHTTP
                                                                  parse
                                                                  "HttpReceiverOverHTTP.java"
                                                                  204]
                                                                 [org.eclipse.jetty.client.http.HttpReceiverOverHTTP
                                                                  process
                                                                  "HttpReceiverOverHTTP.java"
                                                                  144]
                                                                 [org.eclipse.jetty.client.http.HttpReceiverOverHTTP
                                                                  receive
                                                                  "HttpReceiverOverHTTP.java"
                                                                  79]
                                                                 [org.eclipse.jetty.client.http.HttpChannelOverHTTP
                                                                  receive
                                                                  "HttpChannelOverHTTP.java"
                                                                  131]
                                                                 [org.eclipse.jetty.client.http.HttpConnectionOverHTTP
                                                                  onFillable
                                                                  "HttpConnectionOverHTTP.java"
                                                                  172]
                                                                 [org.eclipse.jetty.io.AbstractConnection$ReadCallback
                                                                  succeeded
                                                                  "AbstractConnection.java"
                                                                  311]
                                                                 [org.eclipse.jetty.io.FillInterest
                                                                  fillable
                                                                  "FillInterest.java"
                                                                  105]
                                                                 [org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint
                                                                  onFillable
                                                                  "SslConnection.java"
                                                                  555]
                                                                 [org.eclipse.jetty.io.ssl.SslConnection
                                                                  onFillable
                                                                  "SslConnection.java"
                                                                  410]
                                                                 [org.eclipse.jetty.io.ssl.SslConnection$2
                                                                  succeeded
                                                                  "SslConnection.java"
                                                                  164]
                                                                 [org.eclipse.jetty.io.FillInterest
                                                                  fillable
                                                                  "FillInterest.java"
                                                                  105]
                                                                 [org.eclipse.jetty.io.ChannelEndPoint$1
                                                                  run
                                                                  "ChannelEndPoint.java"
                                                                  104]
                                                                 [org.eclipse.jetty.util.thread.strategy.EatWhatYouKill
                                                                  runTask
                                                                  "EatWhatYouKill.java"
                                                                  338]
                                                                 [org.eclipse.jetty.util.thread.strategy.EatWhatYouKill
                                                                  doProduce
                                                                  "EatWhatYouKill.java"
                                                                  315]
                                                                 [org.eclipse.jetty.util.thread.strategy.EatWhatYouKill
                                                                  tryProduce
                                                                  "EatWhatYouKill.java"
                                                                  173]
                                                                 [org.eclipse.jetty.util.thread.strategy.EatWhatYouKill
                                                                  run
                                                                  "EatWhatYouKill.java"
                                                                  131]
                                                                 [org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread
                                                                  run
                                                                  "ReservedThreadExecutor.java"
                                                                  409]
                                                                 [org.eclipse.jetty.util.thread.QueuedThreadPool
                                                                  runJob
                                                                  "QueuedThreadPool.java"
                                                                  883]
                                                                 [org.eclipse.jetty.util.thread.QueuedThreadPool$Runner
                                                                  run
                                                                  "QueuedThreadPool.java"
                                                                  1034]
                                                                 [java.lang.Thread run "Thread.java" 829]]},
                 :body nil}}

If there's another way I can get the content-length please do tell.

@chance-latacora

chance-latacora commented Apr 6, 2022

Took a look at this and I'm pretty sure the issue is:

  • cognitect.http-client/on-headers parses content length as a long
  • cognitect.http-client/empty-bbuf is called with that long to create a body buffer
  • empty-bbuf calls clojure.core/byte-array, which calls clojure.lang.Numbers/byte_array, which casts its size argument to a Number and then calls .intValue() on it (so you get the "narrowing primitive conversion" behavior of Long#intValue())

You can reproduce this just by calling (clojure.lang.Numbers/byte_array 3831118640)
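To make the narrowing concrete, here is a minimal REPL sketch using the value from the reproduction above. The var names `content-length`, `narrowed`, and `allocation-error` are illustrative only and do not appear in cognitect.http-client.

```clojure
;; The size used in the reproduction; it does not fit in a 32-bit int.
(def content-length 3831118640)

(> content-length Integer/MAX_VALUE)
;; => true

;; Long#intValue keeps only the low 32 bits, so the value wraps negative.
(def narrowed (.intValue ^Long content-length))
narrowed
;; => -463848656

;; byte-array narrows the same way before allocating, so the JVM is asked
;; for an array of negative length.
(def allocation-error
  (try
    (byte-array content-length)
    (catch NegativeArraySizeException e e)))
```

The wrapped value matches the `-463848656` reported in the NegativeArraySizeException at the top of this issue.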

@chance-latacora

Of course, Java indexes arrays with ints, so I'm not sure what the actual fix here would be :)

@x64-latacora
Author

I'm not sure what the actual fix here would be

Grab large files in chunks?

@x64-latacora
Author

@fogus thanks to some clojure wizardry by @CR-Latacora:

Content-Length: 3831191098

@fogus
Contributor

fogus commented Apr 25, 2022

I suspect that there may be a way to address this in user-space using the iteration function and by specifying increasing :Range slices -- concatenating all of the slices at the end. If so, this would be the preferred way, since it's unlikely that we'll have a client fix in hand in a usable timeframe.
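As a rough sketch of what "increasing :Range slices" could look like, the function below computes the Range header values for successive slices. `byte-ranges` is a hypothetical helper, not part of aws-api, and it assumes the object size is already known. HTTP byte ranges are inclusive at both ends, hence the `dec`.

```clojure
(defn byte-ranges
  "Returns a lazy seq of HTTP Range header values that together cover
  object-size bytes in slices of at most chunk-size bytes."
  [object-size chunk-size]
  (for [start (range 0 object-size chunk-size)]
    ;; The last slice is clamped to the final byte of the object.
    (let [end (dec (min object-size (+ start chunk-size)))]
      (str "bytes=" start "-" end))))

(byte-ranges 12 5)
;; => ("bytes=0-4" "bytes=5-9" "bytes=10-11")
```

Each value would be passed as `:Range` in a separate GetObject request, so every individual response body stays well below the int limit.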

@dchelimsky
Contributor

@x64-latacora were you able to resolve this with iteration?

@x64-latacora
Author

We did not. We switched to using amazonica to get objects instead of cognitect's s3-api.

@lowecg

lowecg commented Mar 20, 2023

@dchelimsky
I have this working with Range slices.

@fogus thanks for the tip on using :Range

The following was tested with an object size of 2243897556 bytes (larger than max int). With VisualVM attached to the REPL, memory use stayed stable during the download, remaining between 70 and 100 MiB.

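(require '[clojure.java.io :as io]
         '[cognitect.aws.client.api :as aws])
(import java.io.SequenceInputStream)

;; Assumption: `s3` (used below) is an aws-api S3 client; its definition was not shown in the original snippet.
(def s3 (aws/client {:api :s3}))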

(defn parse-content-range
  "Extract the object size from the ContentRange response attribute and convert to Long type
  e.g. \"bytes 0-5242879/2243897556\"
  
  Returns 2243897556"
  [content-range]
  (when-let [object-size (re-find #"[0-9]+$" content-range)]
    (Long/parseLong object-size)))

(def chunk-size-bytes (* 1024 1024 5)) ;; chosen arbitrarily

(defn get-object-chunks [{:keys [bucket, key]}]
  (iteration (fn [range-byte-pos]
               (let [to-byte-pos (+ range-byte-pos chunk-size-bytes)
                     range (str "bytes=" range-byte-pos "-" (dec to-byte-pos))
                     op-map {:op :GetObject :request {:Bucket bucket :Key key :Range range}}
                     {:keys [ContentRange] :as response} (aws/invoke s3 op-map)]
                 (println :range range :response response)

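                 ;; Fail fast on an anomaly response: it carries no ContentRange,
                 ;; so parse-content-range would otherwise throw on nil.
                 (when (:cognitect.anomalies/category response)
                   (throw (ex-info "GetObject range request failed"
                                   {:range range :response response})))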

                 (assoc response :range-byte-pos to-byte-pos
                                 :object-size (parse-content-range ContentRange))))
             :initk 0
             :kf (fn [{:keys [range-byte-pos, object-size]}]
                   (when (< range-byte-pos object-size)
                     range-byte-pos))
             :vf :Body))

(defn seq-enumeration
  "Returns a java.util.Enumeration on a seq"
  [coll]
  (clojure.lang.SeqEnumeration. coll))

(time
  (let [s3-address {:bucket "your bucket"
                    :key    "your key"}
        target-file "/path/to/some-file"]
    (with-open [target (io/output-stream (io/file target-file))]
      (io/copy (SequenceInputStream. (seq-enumeration (sequence (get-object-chunks s3-address))))
               target))))

@lowecg

lowecg commented Mar 20, 2023

There is a caveat on the previous example: the ranged requests don't play well with objects that have a content encoding (e.g. gzip) set.

Each chunk is treated as self-contained gzip content, which fails because a slice taken from the middle of a gzip stream is not itself a valid gzip stream.

I'm not sure if there's a way to reach the Jetty client from aws-api and disable its GZip decoder.
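The failure can be reproduced without S3 or Jetty. The sketch below uses only java.util.zip; the helper names `gzip-bytes`, `gunzips?`, and `slice-from` are made up for illustration. It shows that a gzip stream decodes as a whole, but not once its leading bytes are missing, which is the situation for every ranged response after the first.

```clojure
(import '(java.io ByteArrayInputStream ByteArrayOutputStream IOException)
        '(java.util Arrays)
        '(java.util.zip GZIPInputStream GZIPOutputStream))

(defn gzip-bytes
  "Gzip-compresses a byte array in memory."
  ^bytes [^bytes data]
  (let [out (ByteArrayOutputStream.)]
    (with-open [gz (GZIPOutputStream. out)]
      (.write gz data))
    (.toByteArray out)))

(defn gunzips?
  "True when data can be read to the end as a single gzip stream."
  [^bytes data]
  (try
    (with-open [in (GZIPInputStream. (ByteArrayInputStream. data))]
      (let [buf (byte-array 4096)]
        ;; Drain the stream; a decoding error surfaces as an IOException.
        (while (not= -1 (.read in buf))))
      true)
    (catch IOException _ false)))

(defn slice-from
  "Returns the bytes of data from offset to the end, like a ranged GET."
  ^bytes [^bytes data offset]
  (Arrays/copyOfRange data (int offset) (alength data)))

(def gzipped
  (gzip-bytes (.getBytes (apply str (repeat 1000 "large S3 object ")) "UTF-8")))

(gunzips? gzipped)
;; => true

;; Any slice that does not start at byte 0 has lost the gzip header.
(gunzips? (slice-from gzipped 1))
;; => false
```

So the chunks have to be concatenated as raw bytes first and decompressed afterwards, which requires the HTTP client to leave the body undecoded.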

:Range "bytes=5242880-10485759"}}, :response {:cognitect.anomalies/category :cognitect.anomalies/fault, :cognitect.anomalies/message "java.util.zip.ZipException: Invalid gzip bytes", :cognitect.http-client/throwable #error {
 :cause "Invalid gzip bytes"
 :via
 [{:type java.lang.RuntimeException
   :message "java.util.zip.ZipException: Invalid gzip bytes"
   :at [org.eclipse.jetty.http.GZIPContentDecoder decodeChunks "GZIPContentDecoder.java" 402]}
  {:type java.util.zip.ZipException
   :message "Invalid gzip bytes"
   :at [org.eclipse.jetty.http.GZIPContentDecoder decodeChunks "GZIPContentDecoder.java" 272]}]
 :trace
 [[org.eclipse.jetty.http.GZIPContentDecoder decodeChunks "GZIPContentDecoder.java" 272]
  [org.eclipse.jetty.http.GZIPContentDecoder decode "GZIPContentDecoder.java" 90]
  [org.eclipse.jetty.client.HttpReceiver$Decoder decodeChunk "HttpReceiver.java" 819]
  [org.eclipse.jetty.client.HttpReceiver$Decoder decode "HttpReceiver.java" 788]
  [org.eclipse.jetty.client.HttpReceiver$Decoder decode "HttpReceiver.java" 768]
  [org.eclipse.jetty.client.HttpReceiver$Decoder access$600 "HttpReceiver.java" 744]
  [org.eclipse.jetty.client.HttpReceiver decodeResponseContent "HttpReceiver.java" 386]
  [org.eclipse.jetty.client.HttpReceiver responseContent "HttpReceiver.java" 354]
  [org.eclipse.jetty.client.http.HttpReceiverOverHTTP content "HttpReceiverOverHTTP.java" 332]
  [org.eclipse.jetty.http.HttpParser parseContent "HttpParser.java" 1716]
  [org.eclipse.jetty.http.HttpParser parseNext "HttpParser.java" 1551]
  [org.eclipse.jetty.client.http.HttpReceiverOverHTTP parse "HttpReceiverOverHTTP.java" 208]
  [org.eclipse.jetty.client.http.HttpReceiverOverHTTP process "HttpReceiverOverHTTP.java" 148]
  [org.eclipse.jetty.client.http.HttpReceiverOverHTTP receive "HttpReceiverOverHTTP.java" 80]
  [org.eclipse.jetty.client.http.HttpChannelOverHTTP receive "HttpChannelOverHTTP.java" 131]
  [org.eclipse.jetty.client.http.HttpConnectionOverHTTP onFillable "HttpConnectionOverHTTP.java" 172]
  [org.eclipse.jetty.io.AbstractConnection$ReadCallback succeeded "AbstractConnection.java" 311]
  [org.eclipse.jetty.io.FillInterest fillable "FillInterest.java" 105]
  [org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint onFillable "SslConnection.java" 555]
  [org.eclipse.jetty.io.ssl.SslConnection onFillable "SslConnection.java" 410]
  [org.eclipse.jetty.io.ssl.SslConnection$2 succeeded "SslConnection.java" 164]
  [org.eclipse.jetty.io.FillInterest fillable "FillInterest.java" 105]
  [org.eclipse.jetty.io.ChannelEndPoint$1 run "ChannelEndPoint.java" 104]
  [org.eclipse.jetty.util.thread.strategy.EatWhatYouKill runTask "EatWhatYouKill.java" 338]
  [org.eclipse.jetty.util.thread.strategy.EatWhatYouKill doProduce "EatWhatYouKill.java" 315]
  [org.eclipse.jetty.util.thread.strategy.EatWhatYouKill tryProduce "EatWhatYouKill.java" 173]
  [org.eclipse.jetty.util.thread.strategy.EatWhatYouKill run "EatWhatYouKill.java" 131]
  [org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread run "ReservedThreadExecutor.java" 409]
  [org.eclipse.jetty.util.thread.QueuedThreadPool runJob "QueuedThreadPool.java" 883]
  [org.eclipse.jetty.util.thread.QueuedThreadPool$Runner run "QueuedThreadPool.java" 1034]
  [java.lang.Thread run "Thread.java" 829]]}}, :line 76}

@lowecg

lowecg commented Oct 16, 2024

With respect to the above situation (reading chunks of gzip-compressed content), I used the following hack to disable the Jetty GZip decoder for my S3 client.

CAUTION: since AWS API uses a shared http-client for all services, this will affect ALL AWS calls. It is best to create a dedicated client/http-client for input-stream-large.

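;; Imports/aliases this snippet relies on. The `log` alias is whatever logging
;; library the poster uses (not shown in the thread); the protocol namespace is
;; assumed to be cognitect.aws.client.protocol.
(import '(java.lang.reflect Field)
        '(org.eclipse.jetty.client HttpClient))
(require '[cognitect.aws.client.protocol :as client.protocol])

(defn private-field [^Object obj ^String field-name]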
  (when obj
    (when-let [^Field f (some
                          (fn [^Class c]
                            (try (.getDeclaredField c field-name)
                                 (catch NoSuchFieldException _ nil)))
                          (take-while some? (iterate (fn [^Class c] (.getSuperclass c)) (.getClass obj))))]
      (. f (setAccessible true))
      (. f (get obj)))))

(defn disable-jetty-content-decoders
  "Prevent Jetty client from attempting to decode GZip compressed content.
   When downloading gzip encoded content as range slices, each chunk will be treated as self-contained gzip content, which doesn't work for obvious reasons.

   CAUTION: since AWS API uses a shared http-client for all services, this will affect ALL AWS calls. It is best to create a dedicated client/http-client for input-stream-large."
  ([]
   (disable-jetty-content-decoders @s3))
  ([client]
   (if-let [^HttpClient jetty-client (-> (client.protocol/-get-info client)
                                         :http-client
                                         (private-field "c")
                                         (private-field "jetty_client"))]
     (do
       (log/debug :in 'disable-decoder-factories :message "Disabling Jetty content decoders for client" :client client :jetty-client jetty-client)
       (.clear (.getContentDecoderFactories jetty-client)))
     (log/error :in 'disable-decoder-factories :message "Failed to disable Jetty client decoders. Subsequent get-object operations may fail"))))
