[feature] Add caching mechanism #61

Open
jszendre opened this issue Aug 1, 2023 · 1 comment

jszendre commented Aug 1, 2023

At Intuit we are considering adding independent S3 proxy servers for various AWS roles. One use case would be fielding requests from Spark's S3A connector and returning a more consistent view of the data after create / rename / delete / update operations.

Could we add a pattern for users to specify a caching mechanism? A user could provide a struct that implements an interface for interacting with the cache. One example would be a cache with a MinIO backend.
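To make the proposal concrete, here is a minimal sketch of what such an interface might look like. Everything in it is an assumption for illustration: the `Cache` interface, its method names, and the `MemoryCache` type are hypothetical and are not part of s3proxy. The in-memory map only stands in for a real backend such as MinIO.

```go
package main

import (
	"fmt"
	"sync"
)

// Cache is a hypothetical interface a user could implement to plug a
// caching backend into the proxy. It is not an existing s3proxy API.
type Cache interface {
	// Get returns the cached bytes for bucket/key and whether they were present.
	Get(bucket, key string) ([]byte, bool)
	// Put stores data for bucket/key, replacing any previous entry.
	Put(bucket, key string, data []byte)
	// Delete drops the entry for bucket/key, e.g. after a delete or rename.
	Delete(bucket, key string)
}

// MemoryCache is a toy in-process implementation used only for illustration.
// A MinIO-backed type could satisfy the same interface.
type MemoryCache struct {
	mu      sync.RWMutex
	entries map[string][]byte
}

// NewMemoryCache returns an empty, ready-to-use MemoryCache.
func NewMemoryCache() *MemoryCache {
	return &MemoryCache{entries: make(map[string][]byte)}
}

// cacheKey joins bucket and key; bucket names cannot contain "/",
// so the combined key is unambiguous.
func cacheKey(bucket, key string) string { return bucket + "/" + key }

func (c *MemoryCache) Get(bucket, key string) ([]byte, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	data, ok := c.entries[cacheKey(bucket, key)]
	return data, ok
}

func (c *MemoryCache) Put(bucket, key string, data []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	// Copy so later changes to the caller's slice do not alter the cache.
	c.entries[cacheKey(bucket, key)] = append([]byte(nil), data...)
}

func (c *MemoryCache) Delete(bucket, key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.entries, cacheKey(bucket, key))
}

func main() {
	var cache Cache = NewMemoryCache()
	cache.Put("my-bucket", "reports/a.csv", []byte("hello"))
	if data, ok := cache.Get("my-bucket", "reports/a.csv"); ok {
		fmt.Printf("hit: %s\n", data)
	}
	cache.Delete("my-bucket", "reports/a.csv")
	_, ok := cache.Get("my-bucket", "reports/a.csv")
	fmt.Println("present after delete:", ok)
}
```

With an interface along these lines, the proxy would only depend on `Cache`, and each deployment could choose its own backend.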

Thanks

Contributor

jawher commented Aug 2, 2023

Hi @jszendre,

I'm not sure I understood your request.

But here is how I see it, as far as I understand:

This repository is a proxy that lets apps interact with S3 without needing the AWS SDK or any credentials.

Here's a diagram showing how this works at Mirakl (my company):

sequenceDiagram
	participant app as your app
	participant lib as lib-xfiles
	participant xfiles as s3proxy
	participant s3 as S3 Bucket

	app ->> lib: XFilesService.fetchBlob(bucket, key)
	lib ->> xfiles: GET /api/v1/presigned/url/<bucket>/<key>
	xfiles -->> lib: 200 OK {url: "https://google/...."}
	lib ->> s3: GET <url>
	s3 -->> lib: OK <data>
	lib -->> app: OK <data>

As you can see, this is a custom flow we "invented", and we have an in-house library, lib-xfiles, to handle it.
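For readers without access to lib-xfiles, the two-step flow in the diagram can be sketched roughly as follows. This is not the lib-xfiles implementation: `fetchBlob` and `startStandIn` are hypothetical names, and the local test server only stands in for both s3proxy and the bucket so the example runs on its own. The endpoint path and the `{url: ...}` response shape are taken from the diagram.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// presignedResponse mirrors the {url: "..."} body shown in the diagram.
type presignedResponse struct {
	URL string `json:"url"`
}

// fetchBlob (hypothetical) follows the two-step flow: ask the proxy for a
// presigned URL, then download the object from that URL.
// The key is used as-is; real code may need to escape it.
func fetchBlob(client *http.Client, proxyBase, bucket, key string) ([]byte, error) {
	resp, err := client.Get(proxyBase + "/api/v1/presigned/url/" + bucket + "/" + key)
	if err != nil {
		return nil, fmt.Errorf("request presigned URL: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("proxy returned %s", resp.Status)
	}
	var pr presignedResponse
	if err := json.NewDecoder(resp.Body).Decode(&pr); err != nil {
		return nil, fmt.Errorf("decode proxy response: %w", err)
	}

	// Second step: fetch the object directly, with no AWS SDK or credentials.
	blob, err := client.Get(pr.URL)
	if err != nil {
		return nil, fmt.Errorf("download object: %w", err)
	}
	defer blob.Body.Close()
	if blob.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("object download returned %s", blob.Status)
	}
	return io.ReadAll(blob.Body)
}

// startStandIn starts a local server that plays both roles for this demo:
// the proxy endpoint and the presigned object URL. It is not s3proxy.
func startStandIn() *httptest.Server {
	mux := http.NewServeMux()
	srv := httptest.NewServer(mux)
	mux.HandleFunc("/api/v1/presigned/url/", func(w http.ResponseWriter, r *http.Request) {
		json.NewEncoder(w).Encode(presignedResponse{URL: srv.URL + "/object"})
	})
	mux.HandleFunc("/object", func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "hello from the bucket")
	})
	return srv
}

func main() {
	srv := startStandIn()
	defer srv.Close()

	data, err := fetchBlob(srv.Client(), srv.URL, "my-bucket", "path/to/key")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s\n", data)
}
```

The point of the sketch is that the client has to know about the presigned-URL step, which is why a tool that speaks the S3 API directly cannot use this flow unchanged.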

However, your use case, Spark's S3A connector, uses the AWS SDK directly to talk to the S3 API.
As such, it cannot transparently use s3proxy.

But maybe I misunderstood your request entirely, in which case do not hesitate to explain it in more detail.

Thanks.
