Cache byte range requests #215

Merged 6 commits into master from issue-188-byte-range-request on Mar 6, 2024
Conversation

@4141done (Collaborator) commented Feb 23, 2024

What

A potential fix for #188

When the Range header is supplied:

  • NGINX will perform subrequests to S3 in byte ranges of PROXY_CACHE_SLICE_SIZE until the requested range is satisfied
  • Cache will be populated in slices of PROXY_CACHE_SLICE_SIZE.
  • Only the requested byte range will be cached

When the Range header is not supplied:

  • Normal behavior: files will be cached in their entirety
  • For large files, proxy_cache_lock ensures that concurrent requests for the same file do not each trigger a separate cache write. Requests received after the initial MISS queue until the initial request's cache write completes and they can be served from the cache (see the location sketch after this list).
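To make the two cases concrete, here is a minimal sketch of the two handler locations, assuming NGINX's ngx_http_slice_module. The location names (@s3, @s3_sliced) and cache zone names (s3_proxy, s3_proxy_slices) follow this PR, but the directive values and the upstream name (storage_backend) are illustrative, not the gateway's actual template:

# Whole-file path: used when no Range header is present.
location @s3 {
    proxy_cache        s3_proxy;
    proxy_cache_valid  200 1h;
    # Concurrent MISSes for the same key wait for the first request to
    # finish writing the cache entry instead of each fetching from S3.
    proxy_cache_lock   on;
    proxy_pass         https://storage_backend;
}

# Sliced path: used when a Range header is present.
location @s3_sliced {
    # Fetch and cache the object in fixed-size byte ranges.
    slice              1m;                              # PROXY_CACHE_SLICE_SIZE
    proxy_cache        s3_proxy_slices;
    proxy_cache_key    $uri$is_args$args$slice_range;   # one cache entry per slice
    proxy_set_header   Range $slice_range;              # subrequest range sent upstream
    proxy_http_version 1.1;
    proxy_cache_valid  200 206 1h;
    proxy_pass         https://storage_backend;
}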

I think it's good for the gateway to serve and cache byte range requests efficiently by default. We could put this behind a config option, but the overhead is low and the behavior makes the gateway more flexible.

Implementation Details

  • This solution takes advantage of the existing redirectToS3 function to change the target NGINX conf location based on the presence of the Range header

  • The main configuration for the s3 proxy action has been broken out into common/etc/nginx/templates/gateway/s3_location_common.conf.template

  • A separate cache is defined for the slice-based caching

  • In the slice caching location, the http_slice_module is configured and other caching options are overridden as necessary (see the cache definition sketch after this list).
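As a rough illustration of the separate on-disk cache (the paths match the directories shown in the Examples section below; zone sizes and retention values are placeholders, not the template's actual settings):

# Main cache for whole objects, plus a second cache dedicated to slices.
proxy_cache_path /var/cache/nginx/s3_proxy        levels=1:2 keys_zone=s3_proxy:10m        max_size=10g inactive=60m;
proxy_cache_path /var/cache/nginx/s3_proxy_slices levels=1:2 keys_zone=s3_proxy_slices:10m max_size=10g inactive=60m;

In the gateway itself, the choice between the whole-file and sliced locations is made by the existing redirectToS3 njs function based on whether the request carries a Range header, with the shared proxy settings broken out into s3_location_common.conf.template as described above.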

Open questions:

  • Do we want to be able to control the slice cache separately from the main cache?
  • Do we need to disable proxy_cache_lock in the @s3_sliced location?
  • Do we necessarily need to separate the cache storage on disk?

Examples

Normal Request

curl -o foo.txt localhost:8989/a/5mb.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 5120k  100 5120k    0     0   111M      0 --:--:-- --:--:-- --:--:--  113M

A single cache file is created:

root@f339daeb2d44:/var/cache/nginx/s3_proxy# tree .
.
`-- 5
    `-- 9e
        `-- 447b5a707c18a4c0e90344925e6b39e5

The size of the cache file is equal to the size of the requested file:

root@f339daeb2d44:/var/cache/nginx/s3_proxy# du -h .
5.1M    ./5/9e
5.1M    ./5
5.1M    .

Byte Range Request

In this example, I'm requesting a byte range of a 5 MB file, and the PROXY_CACHE_SLICE_SIZE option has been set to 1000k (1000 kilobytes).

curl -o foo.txt -r 1000000-4000000 localhost:8989/a/5mb.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 2929k  100 2929k    0     0  66.8M      0 --:--:-- --:--:-- --:--:-- 68.1M

Cache files are created in chunks:

root@f339daeb2d44:/var/cache/nginx/s3_proxy_slices# tree .
.
|-- 0
|   `-- 5c
|       `-- 18f94c01f7a1beed3afe0aa92baf05c0
|-- 4
|   `-- 30
|       `-- 9fac913edc79622fdcc2975d91e4f304
|-- b
|   `-- 5b
|       `-- 91bfb9ef86136be4b07cdc2eb51025bb
`-- d
    `-- 82
        `-- 339384e3e9840cf7f8fe4e54fdc8182d

The size of each cache file is roughly equal to the configured chunk size (PROXY_CACHE_SLICE_SIZE):

root@f339daeb2d44:/var/cache/nginx/s3_proxy_slices# du -h .
1008K   ./d/82
1012K   ./d
1008K   ./0/5c
1012K   ./0
1008K   ./b/5b
1012K   ./b
1008K   ./4/30
1012K   ./4
4.0M    .

@nginxinc nginxinc deleted 6 comments from github-actions bot Feb 26, 2024
@4141done 4141done force-pushed the issue-188-byte-range-request branch from 95da3a5 to 4f8836f Compare February 26, 2024 23:25
@4141done (Collaborator, Author) commented:
/dev_build

@4141done (Collaborator, Author) commented:
/build_dev

Build and Push Dev Preview Image
The long task is done!

You can find the workflow here:
https://github.com/nginxinc/nginx-s3-gateway/actions/runs/8058196483

@alessfg (Collaborator) commented Feb 27, 2024

The general gist of the PR looks good to me! Some thoughts re your open-ended questions:

Do we want to be able to control the slice cache separately from the main cache?

I see the sliced cache as an additional feature of the main cache, so as a user, I don't think you'd need to control the sliced cache separately. Insofar as NGINX implementation details go, a separate cache might help fine-tune the necessary slice cache settings (but I would still consider using the same cache until such a point comes).

Do we need to disable proxy_cache_lock in the slice-cached handler?

My go-to would be to use a default value at which proxy_cache_lock gets enabled for both normal and sliced requests, yet allow the user to tweak that value.
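One way this could look in the gateway's environment-variable-driven templates (PROXY_CACHE_LOCK and PROXY_CACHE_LOCK_TIMEOUT are hypothetical variable names used only for illustration, not existing gateway options):

# Hypothetical: let users tune the lock behavior for both the whole-file
# and sliced locations via environment variables.
proxy_cache_lock         ${PROXY_CACHE_LOCK};          # e.g. "on" or "off"
proxy_cache_lock_timeout ${PROXY_CACHE_LOCK_TIMEOUT};  # e.g. "5s" (the NGINX default)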

Do we necessarily need to separate the cache storage on disk?

I think this should be the default. I can see stitching the data back together on disk being useful for some use cases, but doing so should be an optional toggle.

@4141done 4141done requested a review from dekobon February 27, 2024 23:30
@4141done (Collaborator, Author) commented:

Sorry @alessfg for the thumbs down. I have a rampant bot that I had to fix. Thank you for your comments; I'll take them into account as we work on this feature more.

@4141done 4141done force-pushed the issue-188-byte-range-request branch from 4f8836f to f091808 Compare February 28, 2024 23:01
@4141done 4141done marked this pull request as ready for review March 6, 2024 19:23
@4141done 4141done changed the title from "[WIP] scratch code for serving byte range requests" to "Cache byte range requests" Mar 6, 2024
@4141done 4141done merged commit 3d971df into master Mar 6, 2024
7 checks passed