Implement Remote Storage outside of GAE #3534
base: master
This PR is ready for review. I know it's a long one, and probably open to a lot of discussion. I will leave some comments regarding minor possible changes. The one major callout I do have concerns support for uploading AIAs through Remote Storage: it is more complicated than I initially thought, because the upload happens client-side. The code is partially ready (barreeeiroo/appinventor-sources@external-build-artifacts...support-remote-uploads), but multipart uploads with presigned URLs work a bit differently from a "standalone" PUT: new fields have to be added to the form, and an extra policy has to be computed instead of a presigned URL. So I left it undone, to discuss it here as well.
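For context on the form-based approach described above: with an S3-compatible browser POST upload, the server signs a Base64-encoded JSON policy (an expiration plus conditions such as bucket, key, and size range) instead of a URL, and the browser submits the resulting fields together with the file. The following is a minimal sketch assuming AWS Signature Version 4, placeholder bucket, key, and credentials, and values that need no JSON escaping; it is not the code in the branch linked above, and `buildFormFields` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Base64;
import java.util.HexFormat;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public final class PostPolicySketch {
  private static final DateTimeFormatter AMZ_DATE =
      DateTimeFormatter.ofPattern("yyyyMMdd'T'HHmmss'Z'").withZone(ZoneOffset.UTC);
  private static final DateTimeFormatter POLICY_EXPIRATION =
      DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'").withZone(ZoneOffset.UTC);

  private static byte[] hmacSha256(final byte[] key, final String data) throws GeneralSecurityException {
    final Mac mac = Mac.getInstance("HmacSHA256");
    mac.init(new SecretKeySpec(key, "HmacSHA256"));
    return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
  }

  /**
   * Hypothetical helper: builds the extra form fields for a browser POST upload.
   * The file itself must be the last field of the multipart form. Bucket and key
   * are assumed to contain no characters that need JSON escaping.
   */
  public static Map<String, String> buildFormFields(final String bucket, final String key,
      final long maxBytes, final String region, final String accessKey, final String secretKey,
      final Instant now, final Instant expiration) throws GeneralSecurityException {
    final String amzDate = AMZ_DATE.format(now);
    final String date = amzDate.substring(0, 8);
    final String credential = accessKey + "/" + date + "/" + region + "/s3/aws4_request";
    // The policy lists the conditions the storage provider enforces on the upload.
    final String policyJson = "{\"expiration\":\"" + POLICY_EXPIRATION.format(expiration) + "\","
        + "\"conditions\":[{\"bucket\":\"" + bucket + "\"},{\"key\":\"" + key + "\"},"
        + "[\"content-length-range\",0," + maxBytes + "],"
        + "{\"x-amz-algorithm\":\"AWS4-HMAC-SHA256\"},"
        + "{\"x-amz-credential\":\"" + credential + "\"},"
        + "{\"x-amz-date\":\"" + amzDate + "\"}]}";
    final String policy = Base64.getEncoder().encodeToString(policyJson.getBytes(StandardCharsets.UTF_8));
    // SigV4 signing key derivation: date -> region -> service -> "aws4_request".
    byte[] signingKey = hmacSha256(("AWS4" + secretKey).getBytes(StandardCharsets.UTF_8), date);
    for (final String part : new String[] {region, "s3", "aws4_request"}) {
      signingKey = hmacSha256(signingKey, part);
    }
    final Map<String, String> fields = new LinkedHashMap<>();
    fields.put("key", key);
    fields.put("policy", policy);
    fields.put("x-amz-algorithm", "AWS4-HMAC-SHA256");
    fields.put("x-amz-credential", credential);
    fields.put("x-amz-date", amzDate);
    // Unlike a presigned URL, the string to sign is the Base64-encoded policy itself.
    fields.put("x-amz-signature", HexFormat.of().formatHex(hmacSha256(signingKey, policy)));
    return fields;
  }

  public static void main(final String[] args) throws GeneralSecurityException {
    final Instant now = Instant.parse("2024-01-01T00:00:00Z");
    buildFormFields("example-bucket", "exports/user-1/MyApp.aia", 100_000_000L, "us-east-1",
        "EXAMPLE-ACCESS-KEY", "EXAMPLE-SECRET-KEY", now, now.plusSeconds(900))
        .forEach((name, value) -> System.out.println(name + " = " + value));
  }
}
```

The browser would then POST a `multipart/form-data` body to the bucket endpoint with these fields followed by a `file` field, which is the extra form handling the comment refers to.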
 */
public final String getProjectExportObjectKey(final String downloadKind, final String userId, final String fileName) {
  final String timestamp = String.valueOf(Instant.now().getEpochSecond());
  final String timestampHash = generateFieldsHash(userId, downloadKind, fileName, timestamp);
I'm introducing this as part of the object key to ensure we don't overwrite previous exports (in case the user still has the previous URL).
We can either keep it as is, or align it with how build outputs are handled and just overwrite the object. The previous presigned URL would still be valid, but would then "point" to the new object.
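A minimal sketch of how a timestamp hash in the object key keeps successive exports from overwriting each other. The body of `generateFieldsHash` and the `exports/<userId>/<hash>/<fileName>` layout are assumptions for illustration (the method's return statement is not shown above), and the clock is passed in only to make the sketch testable.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.Instant;
import java.util.HexFormat;

public final class ExportKeySketch {
  // Hypothetical stand-in for generateFieldsHash: SHA-256 over the fields, joined
  // with a NUL separator so that ("ab", "c") and ("a", "bc") hash differently.
  static String generateFieldsHash(final String... fields) {
    try {
      final byte[] digest = MessageDigest.getInstance("SHA-256")
          .digest(String.join("\0", fields).getBytes(StandardCharsets.UTF_8));
      return HexFormat.of().formatHex(digest);
    } catch (final NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-256 is required to be available in every JDK", e);
    }
  }

  // Hypothetical key layout; the clock is a parameter only to keep the sketch testable.
  static String getProjectExportObjectKey(final String downloadKind, final String userId,
      final String fileName, final Instant now) {
    final String timestamp = String.valueOf(now.getEpochSecond());
    final String timestampHash = generateFieldsHash(userId, downloadKind, fileName, timestamp);
    // The file name stays as the last path segment, so the download keeps its name.
    return "exports/" + userId + "/" + timestampHash + "/" + fileName;
  }

  public static void main(final String[] args) {
    final Instant first = Instant.ofEpochSecond(1_700_000_000L);
    System.out.println(getProjectExportObjectKey("project", "user-1", "MyApp.aia", first));
    System.out.println(getProjectExportObjectKey("project", "user-1", "MyApp.aia", first.plusSeconds(1)));
  }
}
```

Note that the timestamp has one-second resolution, so two exports of the same file by the same user within the same second would still map to the same key.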
if (providerName.equals("gcp")) {
  try {
    return RemoteStorageProviderGCS.getInstance();
  } catch (UnsupportedOperationException e) {
    LOG.severe("Could not initialize Remote GCP Storage in non-Production environment!");
    return null;
  }
}

if (providerName.equals("s3")) {
  return RemoteStorageProviderS3.getInstance();
}
I could technically keep another map to avoid initializing the same provider twice for different usages, but that would be an unlikely situation. The point of making the provider customizable is to save costs: GCP is "available by default", and the main use case is setting S3 for build outputs.
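The alternative mentioned above, one shared instance per provider name, could look like the following minimal sketch. `RemoteStorageProvider`, `SimpleProvider`, and the factory map are hypothetical placeholders rather than the PR's classes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

public final class ProviderCacheSketch {
  // Hypothetical minimal provider type; the PR's provider classes are not reproduced here.
  interface RemoteStorageProvider {
    String name();
  }

  record SimpleProvider(String name) implements RemoteStorageProvider {}

  // Hypothetical factories keyed by the configured provider name.
  private static final Map<String, Supplier<RemoteStorageProvider>> FACTORIES = Map.of(
      "gcp", () -> new SimpleProvider("gcp"),
      "s3", () -> new SimpleProvider("s3"));

  // Initialized providers, shared by every usage (build outputs, project exports, ...).
  private static final Map<String, RemoteStorageProvider> INSTANCES = new ConcurrentHashMap<>();

  static RemoteStorageProvider getProvider(final String providerName) {
    final Supplier<RemoteStorageProvider> factory = FACTORIES.get(providerName);
    if (factory == null) {
      return null; // unknown or unset provider: the caller keeps using GAE
    }
    // ConcurrentHashMap.computeIfAbsent applies the factory at most once per name.
    return INSTANCES.computeIfAbsent(providerName, name -> factory.get());
  }

  public static void main(final String[] args) {
    System.out.println(getProvider("s3") == getProvider("s3")); // true
    System.out.println(getProvider("azure"));                    // null
  }
}
```

Because the map is keyed by provider name, configuring both usages with `s3` would reuse one instance, while `gcp` and `s3` would still get separate ones.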
// whether we are going for Android or Apple, as well as in the BarcodeAction.
final String target = BuildOutputFiles.getTargetName();
final String objectKey = remoteStorage.getBuildOutputObjectKey(target, userId, projectId, outputFileName);
final String remoteUrl = remoteStorage.generateRetrieveUrl(objectKey);
Another option here would be to modify the Nonce so that, along with the user and project, it stores a pre-calculated presigned URL, valid for as long as the Nonce is valid. This would avoid generating the URL at runtime, and would "make the URL consistent" if the same Nonce is accessed twice.
The only reason I went with generating the URL on the fly is to avoid modifying the current datastore schema; since this is a local operation, it should be fine to spend a few extra milliseconds here. Accessing the same Nonce twice is also an unlikely situation, IMO.
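To illustrate why generating the URL on the fly is cheap: with AWS Signature Version 4 query-string signing, a presigned URL is derived purely from the credentials, the current time, and the object name through SHA-256 and HMAC-SHA256, with no request to the storage provider. The sketch below assumes path-style addressing, an S3-compatible service, and placeholder host and credentials; it is not the PR's implementation, which is not shown here, and `presignGet` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.HexFormat;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public final class PresignSketch {
  private static final DateTimeFormatter AMZ_DATE =
      DateTimeFormatter.ofPattern("yyyyMMdd'T'HHmmss'Z'").withZone(ZoneOffset.UTC);

  private static byte[] hmacSha256(final byte[] key, final String data) throws GeneralSecurityException {
    final Mac mac = Mac.getInstance("HmacSHA256");
    mac.init(new SecretKeySpec(key, "HmacSHA256"));
    return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
  }

  // SigV4 URI encoding: keep unreserved characters (and '/' in object keys), percent-encode the rest.
  private static String uriEncode(final String value, final boolean keepSlash) {
    final StringBuilder out = new StringBuilder();
    for (final byte b : value.getBytes(StandardCharsets.UTF_8)) {
      final char c = (char) (b & 0xFF);
      final boolean unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
          || (c >= '0' && c <= '9') || c == '-' || c == '_' || c == '.' || c == '~';
      if (unreserved || (keepSlash && c == '/')) {
        out.append(c);
      } else {
        out.append(String.format("%%%02X", b & 0xFF));
      }
    }
    return out.toString();
  }

  /** Path-style presigned GET URL, computed entirely in memory. Bucket names are assumed DNS-safe. */
  public static String presignGet(final String host, final String bucket, final String key,
      final String region, final String accessKey, final String secretKey,
      final Instant now, final long expiresSeconds) throws GeneralSecurityException {
    final String amzDate = AMZ_DATE.format(now);
    final String date = amzDate.substring(0, 8);
    final String scope = date + "/" + region + "/s3/aws4_request";
    final String path = "/" + bucket + "/" + uriEncode(key, true);
    // Query parameters must appear sorted by name in the canonical request.
    final String query = "X-Amz-Algorithm=AWS4-HMAC-SHA256"
        + "&X-Amz-Credential=" + uriEncode(accessKey + "/" + scope, false)
        + "&X-Amz-Date=" + amzDate
        + "&X-Amz-Expires=" + expiresSeconds
        + "&X-Amz-SignedHeaders=host";
    final String canonicalRequest = "GET\n" + path + "\n" + query + "\n"
        + "host:" + host + "\n\n" + "host\n" + "UNSIGNED-PAYLOAD";
    final String stringToSign = "AWS4-HMAC-SHA256\n" + amzDate + "\n" + scope + "\n"
        + HexFormat.of().formatHex(MessageDigest.getInstance("SHA-256")
            .digest(canonicalRequest.getBytes(StandardCharsets.UTF_8)));
    byte[] signingKey = hmacSha256(("AWS4" + secretKey).getBytes(StandardCharsets.UTF_8), date);
    for (final String part : new String[] {region, "s3", "aws4_request"}) {
      signingKey = hmacSha256(signingKey, part);
    }
    final String signature = HexFormat.of().formatHex(hmacSha256(signingKey, stringToSign));
    return "https://" + host + path + "?" + query + "&X-Amz-Signature=" + signature;
  }

  public static void main(final String[] args) throws GeneralSecurityException {
    System.out.println(presignGet("s3.us-east-1.amazonaws.com", "example-bucket",
        "builds/user-1/MyApp.apk", "us-east-1", "EXAMPLE-ACCESS-KEY", "EXAMPLE-SECRET-KEY",
        Instant.parse("2024-01-01T00:00:00Z"), 600));
  }
}
```

Since only the timestamp and expiry vary between calls, regenerating the URL on each Nonce access yields a different, but equally valid, URL for the same object.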
// Only use remote downloads for specific kind of downloads (and avoid using for other
// files like assets, even if "larger", although impossible).
private static final Set<String> REMOTE_DOWNLOAD_KINDS = Set.of(
I didn't have this initially; I was just proxying all files through Remote Storage if they exceeded the size threshold. However, the purpose here is only to allow exporting projects that exceed 32MB, so I decided to limit it to project-related download kinds.
if (!isInline) {
  final String downloadObjectUrl = shouldUseRemoteStorageDownload(downloadKind, userId, downloadableFile);
I couldn't really find where the inline mode is used, but I think it's safe to assume that if something should render "inline", it's better not to serve it from a foreign origin or anything similar.
// configured.
// If unconfigured, still use GAE, but it may fail for large files due to response
// payload limit.
private static final int DIRECT_DOWNLOAD_MAX_FILE_SIZE = 20_000_000;
Technically, the GAE payload limit is 32MB, but I wanted to have some buffer. Maybe it's too much; should this be set to 30MB?
Actually, when using GCP we also save costs: GAE sends the AIA to GCS, and the user downloads it from GCS. Egress traffic from GCS to the Internet is much cheaper than from GAE, so there is also a point in storing larger files in GCS.
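The three conditions discussed in these comments (project-related download kinds only, no inline responses, and a size threshold) can be combined into one decision, sketched below under stated assumptions: the kind names are hypothetical, and the rule that only files larger than `DIRECT_DOWNLOAD_MAX_FILE_SIZE` go to remote storage is inferred from the comments. The PR's `shouldUseRemoteStorageDownload` returns a URL rather than a boolean, so this is a simplification, not its implementation.

```java
import java.util.Set;

public final class RemoteDownloadDecisionSketch {
  // Hypothetical kind names; the actual REMOTE_DOWNLOAD_KINDS values are not shown in this review.
  static final Set<String> REMOTE_DOWNLOAD_KINDS = Set.of("project-source", "all-projects-source");

  // Value from the PR: 20 MB, leaving headroom below GAE's 32 MB response limit.
  static final int DIRECT_DOWNLOAD_MAX_FILE_SIZE = 20_000_000;

  /** Assumed rule: remote storage only for large, non-inline, project-related downloads. */
  static boolean shouldUseRemoteStorage(final String downloadKind, final boolean isInline,
      final long fileSizeBytes, final boolean remoteStorageConfigured) {
    if (isInline || !remoteStorageConfigured) {
      return false; // inline content stays on the app's own origin; otherwise GAE is the only option
    }
    return REMOTE_DOWNLOAD_KINDS.contains(downloadKind)
        && fileSizeBytes > DIRECT_DOWNLOAD_MAX_FILE_SIZE;
  }

  public static void main(final String[] args) {
    System.out.println(shouldUseRemoteStorage("project-source", false, 25_000_000L, true)); // true
    System.out.println(shouldUseRemoteStorage("project-source", true, 25_000_000L, true));  // false
    System.out.println(shouldUseRemoteStorage("asset", false, 25_000_000L, true));          // false
  }
}
```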
Implement remote storage capability using presigned URLs. Allow build outputs to be sent from the buildserver to remote storage, and GAE to redirect the user directly to the presigned download URLs. S3-compatible providers are supported as of now.
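On the buildserver side, uploading to a presigned URL is a plain HTTP PUT of the file body. A minimal sketch using the JDK's `java.net.http` client, assuming the URL arrives as the `uploadUrl` parameter and that any 2xx status means success; the class and method names are hypothetical.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public final class PresignedUploadSketch {
  // The presigned URL already carries its signature in the query string,
  // so the buildserver needs no storage credentials to perform the upload.
  static HttpRequest buildUploadRequest(final String uploadUrl, final Path outputFile)
      throws FileNotFoundException {
    return HttpRequest.newBuilder(URI.create(uploadUrl))
        .PUT(HttpRequest.BodyPublishers.ofFile(outputFile))
        .build();
  }

  // Uploads the build output; returns true on a 2xx response so the caller can
  // decide what to report back to GAE.
  static boolean upload(final String uploadUrl, final Path outputFile)
      throws IOException, InterruptedException {
    final HttpClient client = HttpClient.newHttpClient();
    final HttpResponse<Void> response = client.send(
        buildUploadRequest(uploadUrl, outputFile), HttpResponse.BodyHandlers.discarding());
    return response.statusCode() / 100 == 2;
  }

  public static void main(final String[] args) throws IOException, InterruptedException {
    if (args.length != 2) {
      System.err.println("usage: PresignedUploadSketch <presigned-put-url> <file>");
      return;
    }
    System.out.println(upload(args[0], Path.of(args[1])) ? "uploaded" : "upload failed");
  }
}
```

If the URL was signed with specific headers (for example a content type), the request must send exactly those headers, or the provider rejects the signature.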
To save bandwidth and avoid the GAE payload limit, send the "to be downloaded" files to remote storage. Then redirect the user to the download URL, so the file is downloaded from remote storage rather than from GAE.
Allow using Google Cloud Storage buckets with presigned URLs as Remote Storage.
Ensure files downloaded from Remote Storage preserve the same filename.
Some providers don't support virtual-hosted-style endpoints, so we should allow falling back to path-style endpoints.
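The two addressing styles differ only in where the bucket name goes. A minimal sketch with placeholder host and bucket names; the `pathStyle` flag is a hypothetical stand-in for whatever configuration option the PR uses.

```java
import java.net.URI;

public final class EndpointStyleSketch {
  /**
   * Virtual-hosted style puts the bucket in the host name; path style puts it
   * in the first path segment. The key is assumed to be URI-encoded already.
   */
  static URI objectUri(final String endpointHost, final String bucket, final String encodedKey,
      final boolean pathStyle) {
    return pathStyle
        ? URI.create("https://" + endpointHost + "/" + bucket + "/" + encodedKey)
        : URI.create("https://" + bucket + "." + endpointHost + "/" + encodedKey);
  }

  public static void main(final String[] args) {
    System.out.println(objectUri("s3.example.com", "my-bucket", "builds/app.apk", false));
    System.out.println(objectUri("s3.example.com", "my-bucket", "builds/app.apk", true));
  }
}
```

Because the host and path are part of what a presigned URL signs, the style has to be fixed before signing; switching styles afterwards invalidates the signature.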
Variable outputApk will be null if the build does not succeed, potentially causing an NPE and the buildserver not calling back to GAE with the specific error.
f3b42d2 to bf3b3b9

Background
App Inventor currently uses Google Cloud Storage (GCS) as the main storage solution for "large" objects, which are mainly build outputs. These objects are built outside of Google App Engine (GAE) and sent back to it to be stored in GCS. GAE then proxies them to the user for download.
However, this implies quite a lot of network hops and increases egress traffic costs: from the buildserver to GAE (if there is an egress network charge), from GAE to GCS to store the file, from GCS back to GAE when downloading it, and from GAE to the user. This last hop is the most expensive, at $0.139 per GB transferred to the Internet.
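For a sense of scale, at the quoted rate an illustrative 1,000 GB of build outputs proxied from GAE to users (a hypothetical volume, not a measured one) would cost:

$$
C = V \times r = 1000\ \text{GB} \times \$0.139/\text{GB} = \$139
$$

This is the hop that direct downloads from remote storage take off GAE; the storage provider's own egress pricing applies instead.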
With that in mind, it makes sense to let users download the files directly instead of going through GAE (which also spends resources loading that data into memory).
Description
This PR implements support for "Remote Storage" in App Inventor. It allows GAE to generate presigned URLs for a third-party (3P) storage provider and interact with files stored there. It is mainly targeted at storing build outputs outside of GAE, as these are the largest artifacts and account for the majority of the egress traffic.
This PR also removes the "build size" limitation. By letting the user download the output file directly from storage, we are no longer bound by GAE's request/response payload limits, so users can effectively "download" larger projects.
AIA files can now also be downloaded beyond the GAE payload limit.
Current Capabilities
uploadUrl. If provided, the output file (APK or AAB) is uploaded there instead of being sent back to GAE.
Pending Items
Allow uploading exported projects to remote storage without going through GAE. See comments.
Testing
Appendix
Example Configurations
For AWS S3:
For Hetzner Object Storage (S3-compatible; note that the specific values vary per provider):
Testing Screenshots
GCS Remote Storage when Exporting AIA
S3 Compatible Remote Storage when Exporting AIA
S3 Compatible Remote Storage when Downloading APK through Nonce