Cache byte range requests #215

Merged 6 commits into master from issue-188-byte-range-request on Mar 6, 2024
Conversation

@4141done (Collaborator) commented Feb 23, 2024

What

A potential fix for #188

When the Range header is supplied:

  • NGINX will perform subrequests to S3 in byte ranges of PROXY_CACHE_SLICE_SIZE until the requested range is satisfied
  • Cache will be populated in slices of PROXY_CACHE_SLICE_SIZE.
  • Only the requested byte range will be cached

When the Range header is not supplied:

  • Normal behavior: files will be cached in their entirety
  • For large files, proxy_cache_lock ensures that concurrent requests for the same file do not each trigger a separate cache write. Requests received after the initial MISS queue until the initial request's cache write completes and they can be served from the cache (see the location sketch after this list).
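To make the two cases concrete, here is a minimal sketch of the two handler locations, assuming NGINX's ngx_http_slice_module. The location names (@s3, @s3_sliced) and cache zone names (s3_proxy, s3_proxy_slices) follow this PR, but the directive values and the upstream name (storage_backend) are illustrative, not the gateway's actual template:

# Whole-file path: used when no Range header is present.
location @s3 {
    proxy_cache        s3_proxy;
    proxy_cache_valid  200 1h;
    # Concurrent MISSes for the same key wait for the first request to
    # finish writing the cache entry instead of each fetching from S3.
    proxy_cache_lock   on;
    proxy_pass         https://storage_backend;
}

# Sliced path: used when a Range header is present.
location @s3_sliced {
    # Fetch and cache the object in fixed-size byte ranges.
    slice              1m;                              # PROXY_CACHE_SLICE_SIZE
    proxy_cache        s3_proxy_slices;
    proxy_cache_key    $uri$is_args$args$slice_range;   # one cache entry per slice
    proxy_set_header   Range $slice_range;              # subrequest range sent upstream
    proxy_http_version 1.1;
    proxy_cache_valid  200 206 1h;
    proxy_pass         https://storage_backend;
}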

I think it's good for the gateway to serve and cache byte range requests efficiently by default. We could put this behind a config option, but the overhead is low and the behavior makes the gateway more flexible.

Implementation Details

  • This solution takes advantage of the existing redirectToS3 function to change the target NGINX conf location based on the presence of the Range header

  • The main configuration for the s3 proxy action has been broken out into common/etc/nginx/templates/gateway/s3_location_common.conf.template

  • A separate cache is defined for the slice-based caching

  • In the slice caching location, the http_slice_module is configured and other caching options are overridden as necessary (see the cache definition sketch after this list).
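As a rough illustration of the separate on-disk cache (the paths match the directories shown in the Examples section below; zone sizes and retention values are placeholders, not the template's actual settings):

# Main cache for whole objects, plus a second cache dedicated to slices.
proxy_cache_path /var/cache/nginx/s3_proxy        levels=1:2 keys_zone=s3_proxy:10m        max_size=10g inactive=60m;
proxy_cache_path /var/cache/nginx/s3_proxy_slices levels=1:2 keys_zone=s3_proxy_slices:10m max_size=10g inactive=60m;

In the gateway itself, the choice between the whole-file and sliced locations is made by the existing redirectToS3 njs function based on whether the request carries a Range header, with the shared proxy settings broken out into s3_location_common.conf.template as described above.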

Open questions:

  • Do we want to be able to control the slice cache separately from the main cache?
  • Do we need to disable proxy_cache_lock in the @s3_sliced location?
  • Do we necessarily need to separate the cache storage on disk?

Examples

Normal Request

curl -o foo.txt localhost:8989/a/5mb.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 5120k  100 5120k    0     0   111M      0 --:--:-- --:--:-- --:--:--  113M

A single cache file is created:

root@f339daeb2d44:/var/cache/nginx/s3_proxy# tree .
.
`-- 5
    `-- 9e
        `-- 447b5a707c18a4c0e90344925e6b39e5

The size of the cache file is equal to the size of the requested file:

root@f339daeb2d44:/var/cache/nginx/s3_proxy# du -h .
5.1M    ./5/9e
5.1M    ./5
5.1M    .

Byte Range Request

In this example, I'm requesting a byte range of a 5 MB file, and the PROXY_CACHE_SLICE_SIZE option has been set to 1000k (1000 kilobytes).

curl -o foo.txt -r 1000000-4000000 localhost:8989/a/5mb.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 2929k  100 2929k    0     0  66.8M      0 --:--:-- --:--:-- --:--:-- 68.1M

Cache files are created in chunks:

root@f339daeb2d44:/var/cache/nginx/s3_proxy_slices# tree .
.
|-- 0
|   `-- 5c
|       `-- 18f94c01f7a1beed3afe0aa92baf05c0
|-- 4
|   `-- 30
|       `-- 9fac913edc79622fdcc2975d91e4f304
|-- b
|   `-- 5b
|       `-- 91bfb9ef86136be4b07cdc2eb51025bb
`-- d
    `-- 82
        `-- 339384e3e9840cf7f8fe4e54fdc8182d

The size of each cache file is roughly equal to the configured chunk size (PROXY_CACHE_SLICE_SIZE):

root@f339daeb2d44:/var/cache/nginx/s3_proxy_slices# du -h .
1008K   ./d/82
1012K   ./d
1008K   ./0/5c
1012K   ./0
1008K   ./b/5b
1012K   ./b
1008K   ./4/30
1012K   ./4
4.0M    .

@nginxinc nginxinc deleted 6 comments from github-actions bot Feb 26, 2024
@4141done 4141done force-pushed the issue-188-byte-range-request branch from 95da3a5 to 4f8836f Compare February 26, 2024 23:25
@4141done (Collaborator, Author) commented:
/dev_build

@4141done (Collaborator, Author) commented:
/build_dev

Build and Push Dev Preview Image
The long task is done!

You can find the workflow here:
https://github.com/nginxinc/nginx-s3-gateway/actions/runs/8058196483

@alessfg (Collaborator) commented Feb 27, 2024

The general gist of the PR looks good to me! Some thoughts re your open-ended questions:

Do we want to be able to control the slice cache separately from the main cache?

I see the sliced cache as an additional feature of the main cache, so as a user, I don't think you'd need to control the sliced cache separately. Insofar as NGINX implementation details go, a separate cache might help fine-tune the necessary slice cache settings (but I would still consider using the same cache until such a point comes).

Do we need to disable proxy_cache_lock in the slice-cached handler?

My go-to would be to use a default value at which proxy_cache_lock gets enabled for both normal and sliced requests, yet allow the user to tweak that value.
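One way this could look in the gateway's environment-variable-driven templates (PROXY_CACHE_LOCK and PROXY_CACHE_LOCK_TIMEOUT are hypothetical variable names used only for illustration, not existing gateway options):

# Hypothetical: let users tune the lock behavior for both the whole-file
# and sliced locations via environment variables.
proxy_cache_lock         ${PROXY_CACHE_LOCK};          # e.g. "on" or "off"
proxy_cache_lock_timeout ${PROXY_CACHE_LOCK_TIMEOUT};  # e.g. "5s" (the NGINX default)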

Do we necessarily need to separate the cache storage on disk?

I think this should be the default. I can see stitching the data back together on disk being useful for some use cases, but doing so should be an optional toggle.

@4141done 4141done requested a review from dekobon February 27, 2024 23:30
@4141done (Collaborator, Author) commented:

Sorry @alessfg for the thumbs down. I have a rampant bot that I had to fix. Thank you for your comments; I'll take them into account as we work on this feature more.

@4141done 4141done force-pushed the issue-188-byte-range-request branch from 4f8836f to f091808 Compare February 28, 2024 23:01
@4141done 4141done marked this pull request as ready for review March 6, 2024 19:23
@4141done 4141done changed the title from "[WIP] scratch code for serving byte range requests" to "Cache byte range requests" Mar 6, 2024
@4141done 4141done merged commit 3d971df into master Mar 6, 2024
7 checks passed