S3 GetObject fails with large files #209
Comments
Hello! Thank you for the report. We are actively investigating and will update ASAP.
Hello again. Quick question: the http client gets the size from the `Content-Length` response header. What value is coming back for this object?
@fogus I'm unable to get the value because of the exception. Here's what I get when trying to get a file of that size:

If there's another way I can get the `Content-Length` value, let me know.
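For reference, one way to read the size without downloading the body is a HeadObject call; the bucket and key below are placeholders:

```clojure
(require '[cognitect.aws.client.api :as aws])

(def s3 (aws/client {:api :s3}))

;; Returns the object's metadata only; :ContentLength is the size in bytes.
(:ContentLength
 (aws/invoke s3 {:op :HeadObject
                 :request {:Bucket "some-bucket"          ;; placeholder
                           :Key    "some-large-object"}}))  ;; placeholder
```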
Took a look at this and I'm pretty sure the issue is that the response body gets buffered into a single array sized from the content length, and that size has to fit in an int. You can reproduce this just by calling `byte-array` with a value larger than `Integer/MAX_VALUE`. Of course, Java indexes arrays with ints, so I'm not sure what the actual fix here would be :)
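For illustration, the limit shows up directly at a REPL; the size here is just an arbitrary value above `Integer/MAX_VALUE`:

```clojure
;; Java array lengths are ints, so any length above Integer/MAX_VALUE
;; (2147483647, i.e. ~2 GiB of bytes) cannot be allocated; this throws
;; before any S3 call is even involved (the exact exception depends on
;; the Clojure/JVM version).
(byte-array 3000000000)
```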
Grab large files in chunks?
@fogus thanks to some clojure wizardry by @CR-Latacora:
I suspect that there may be a way to address this in user-space using the `:Range` request parameter.

@x64-latacora were you able to resolve this with `:Range`?
We did not. We switched to using amazonica to get objects instead of cognitect's s3-api. |
@dchelimsky @fogus thanks for the tip on using `:Range`. The following was tested with an object size of 2243897556 bytes (larger than max int). Using VisualVM attached to the REPL, memory stayed stable between 70 and 100 MiB for the duration of the download.

```clojure
(require '[clojure.java.io :as io]
         '[cognitect.aws.client.api :as aws])
(import java.io.SequenceInputStream)

;; S3 client (region/credentials resolved from the environment)
(def s3 (aws/client {:api :s3}))

(defn parse-content-range
  "Extract the object size from the ContentRange response attribute and convert to Long,
   e.g. \"bytes 0-5242879/2243897556\" returns 2243897556."
  [content-range]
  (when-let [object-size (re-find #"[0-9]+$" content-range)]
    (Long/parseLong object-size)))

(def chunk-size-bytes (* 1024 1024 5)) ;; chosen arbitrarily

(defn get-object-chunks
  "Returns an `iteration` (Clojure 1.11+) of :Body streams, one per ranged GetObject call."
  [{:keys [bucket key]}]
  (iteration (fn [range-byte-pos]
               (let [to-byte-pos (+ range-byte-pos chunk-size-bytes)
                     range (str "bytes=" range-byte-pos "-" (dec to-byte-pos))
                     op-map {:op :GetObject :request {:Bucket bucket :Key key :Range range}}
                     {:keys [ContentRange] :as response} (aws/invoke s3 op-map)]
                 (println :range range :response response)
                 ;; TODO: check the response for errors
                 (assoc response
                        :range-byte-pos to-byte-pos
                        :object-size (parse-content-range ContentRange))))
             :initk 0
             :kf (fn [{:keys [range-byte-pos object-size]}]
                   (when (< range-byte-pos object-size)
                     range-byte-pos))
             :vf :Body))

(defn seq-enumeration
  "Returns a java.util.Enumeration on a seq"
  {:static true}
  [coll]
  (clojure.lang.SeqEnumeration. coll))

(time
 (let [s3-address {:bucket "your bucket"
                   :key "your key"}
       target-file "/path/to/some-file"]
   (with-open [target (io/output-stream (io/file target-file))]
     (io/copy (SequenceInputStream. (seq-enumeration (sequence (get-object-chunks s3-address))))
              target))))
```
There is a caveat on the previous example: the ranging doesn't play well when an object has a Content-Encoding set. Each chunk gets treated as self-contained gzip content, which doesn't work for obvious reasons. I'm not sure if there's a way to get into the Jetty client and disable the GZip decoder from the outside.
With respect to the above situation (reading chunks of gzip-compressed content), I used the following hack to disable the Jetty GZip decoder for my S3 client. CAUTION: since aws-api uses a shared http-client for all services, this will affect ALL AWS calls. It is best to create a dedicated client/http-client for `input-stream-large`.
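A rough sketch of that kind of hack, assuming you can get a handle on the underlying `org.eclipse.jetty.client.HttpClient` instance backing the http client (how you reach it depends on the http-client implementation's internals):

```clojure
(import 'org.eclipse.jetty.client.HttpClient)

(defn disable-gzip-decoding!
  "Sketch only: clears Jetty's content-decoder factories so ranged chunks of an
   object with gzip Content-Encoding come back as raw bytes instead of being
   run through the GZip decoder chunk by chunk."
  [^HttpClient jetty-client]
  (.clear (.getContentDecoderFactories jetty-client)))
```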
Dependencies
Description with failing test case
Similar issue to #97, but the call fails when making a GetObject API call. For reference, the file being fetched is 3.6 GB.
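Roughly, the failing call has this shape; the bucket and key are placeholders, and `s3` is a client created with `(aws/client {:api :s3})`:

```clojure
;; Per the diagnosis in the thread, the whole body gets buffered into a
;; single int-sized allocation, so objects over ~2 GiB fail.
(aws/invoke s3 {:op :GetObject
                :request {:Bucket "some-bucket"           ;; placeholder
                          :Key    "some-3.6GB-object"}})  ;; placeholder
```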
Stack traces