Conversation

@barreeeiroo barreeeiroo commented Jul 31, 2025

Background

App Inventor currently uses Google Cloud Storage (GCS) as the main storage solution for "large" objects, which are mainly build outputs. These large objects are built outside of Google App Engine (GAE) and sent back to be stored in GCS. GAE then proxies them to the user for download.

This, however, adds several network hops and increases egress traffic costs: from the buildserver to GAE (if that leg is charged for egress), from GAE to GCS for storage, from GCS back to GAE on download, and from GAE to the user. This last hop is the most expensive, at $0.139 per GB transferred to the Internet.

With that in mind, it makes sense to let users download the files directly instead of going through GAE (which also spends resources loading that data into memory).

Description

This PR implements support for "Remote Storage" in App Inventor. It allows GAE to generate presigned URLs for a third-party storage provider and interact with files stored there. It is mainly targeted at storing build outputs outside of GAE, as these are the largest artifacts and account for most of the egress traffic.

This PR also removes the "build size" limitation. By letting the user download the output file directly from storage, we are no longer bound by GAE's request/response payload limits, so larger projects can effectively be "downloaded".
AIA files can now also be downloaded beyond the GAE payload limit.

Current Capabilities

  • GAE now supports a configurable "remote storage", which is interacted with through presigned URLs.
  • The buildserver can now optionally receive an extra argument, uploadUrl. If provided, the output file (APK or AAB) is uploaded there instead of being sent back to GAE.
    • It will still send the ZIP file with the full structure back to GAE, but the output file is truncated to 0 bytes. This is needed to "acknowledge" the build and to make sure the target file in remote storage can be identified.
  • App Engine, when serving project files exceeding 20 MB, now stores them in remote storage and provides a presigned URL for the user to download them.
  • The implementation is cloud-provider agnostic without any vendor libraries: as of now, only S3 is implemented, but it is S3-compatible rather than vendor-locked. Any S3-compatible storage provider can be used, as long as it supports presigned URLs. Some examples are:
    • Cloud: MinIO, Backblaze, Cloudflare R2, Hetzner
    • Self-hosted: MinIO, SeaweedFS
  • Different providers can be used for different purposes (e.g., S3 for build outputs, GCS for project exports).
    • The main reason for this is to optimize costs and reduce egress traffic as much as possible.
    • Build outputs are already produced outside of GCP, so it is fine to send them to an S3-compatible storage outside of GCP, ensuring egress traffic charges are avoided.
    • Project exports work differently: GAE has to collect and produce the AIA file. If GCP sends the AIA to an external S3-compatible storage for download, it would still incur egress traffic (only removing the 32 MB limit). As such, it might be better to keep this use case in GCS.
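To make the presigned-URL mechanism concrete, here is a minimal, self-contained sketch of SigV4 query-string presigning for a GET request. This is not the PR's actual code: the class and method names are hypothetical, the canonical-request handling is simplified (no extra query encoding of the key), and a real deployment would rely on the provider implementation in the PR or an SDK.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class PresignSketch {

  private static byte[] hmac(byte[] key, String data) {
    try {
      Mac mac = Mac.getInstance("HmacSHA256");
      mac.init(new SecretKeySpec(key, "HmacSHA256"));
      return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

  private static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) sb.append(String.format("%02x", b));
    return sb.toString();
  }

  private static String sha256Hex(String s) {
    try {
      return hex(MessageDigest.getInstance("SHA-256")
          .digest(s.getBytes(StandardCharsets.UTF_8)));
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

  /** Presigned GET URL for s3://bucket/key, valid for expirySeconds from `now`. */
  public static String presignGet(String accessKey, String secretKey, String region,
      String bucket, String key, int expirySeconds, ZonedDateTime now) {
    String host = bucket + ".s3." + region + ".amazonaws.com";
    String amzDate = now.format(DateTimeFormatter.ofPattern("yyyyMMdd'T'HHmmss'Z'"));
    String dateStamp = amzDate.substring(0, 8);
    String scope = dateStamp + "/" + region + "/s3/aws4_request";
    // Query parameters must be sorted by name; the signature is appended last.
    String query = "X-Amz-Algorithm=AWS4-HMAC-SHA256"
        + "&X-Amz-Credential=" + URLEncoder.encode(accessKey + "/" + scope, StandardCharsets.UTF_8)
        + "&X-Amz-Date=" + amzDate
        + "&X-Amz-Expires=" + expirySeconds
        + "&X-Amz-SignedHeaders=host";
    String canonicalRequest = "GET\n/" + key + "\n" + query + "\nhost:" + host
        + "\n\nhost\nUNSIGNED-PAYLOAD";
    String stringToSign = "AWS4-HMAC-SHA256\n" + amzDate + "\n" + scope + "\n"
        + sha256Hex(canonicalRequest);
    // SigV4 key derivation chain: date -> region -> service -> "aws4_request".
    byte[] signingKey = hmac(hmac(hmac(hmac(
        ("AWS4" + secretKey).getBytes(StandardCharsets.UTF_8), dateStamp),
        region), "s3"), "aws4_request");
    return "https://" + host + "/" + key + "?" + query
        + "&X-Amz-Signature=" + hex(hmac(signingKey, stringToSign));
  }
}
```

Because the URL carries its own signature and expiry, the user's browser can fetch the object straight from the storage provider with no GAE involvement.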

Pending Items

  • Implement GCS support, to avoid leaving Google Cloud.
  • Allow sending exported projects (AIA) to remote storage.
  • Allow setting different providers for different purposes (i.e., S3 for builds, GCS for AIAs).
  • Allow uploading exported projects to remote storage, without going to GAE. See comments.

Testing

  • Presigned URL generation works as intended.
  • Locally built some apps, which were sent to and downloaded from remote storage.
  • Tested remote storage with both AWS S3 buckets ("native" S3) and Hetzner Object Storage buckets (S3-compatible).
  • Locally created an app with assets above 20 MB, whose AIA file was downloaded from remote storage. Below the threshold, files were still downloaded from localhost.

Appendix

Example Configurations

For AWS S3:

<!-- Remote Storage Provider -->
<property name="remotestorage" value="s3" />

<!-- S3 Compatible Storage Configuration -->
<property name="remotestorage.s3.endpoint" value="" />  <!-- Not Required -->
<property name="remotestorage.s3.bucketname" value="ai2-remotestorage" />
<property name="remotestorage.s3.bucketregion" value="eu-west-1" />
<property name="remotestorage.s3.accesskeyid" value="[REDACTED]" />
<property name="remotestorage.s3.secretaccesskey" value="[REDACTED]" />

For Hetzner Object Storage (S3-compatible; note that the specific values vary per provider):

<!-- Remote Storage Provider -->
<property name="remotestorage" value="s3" />

<!-- S3 Compatible Storage Configuration -->
<property name="remotestorage.s3.endpoint" value="ai2-remotestorage.hel1.your-objectstorage.com" />
<property name="remotestorage.s3.bucketname" value="ai2-remotestorage" />
<property name="remotestorage.s3.bucketregion" value="hel1" />
<property name="remotestorage.s3.accesskeyid" value="[REDACTED]" />
<property name="remotestorage.s3.secretaccesskey" value="[REDACTED]" />
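For context, a minimal sketch of how such properties could be loaded into a typed configuration object. The class and field names here are hypothetical, not the PR's actual classes; only the property keys match the examples above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class S3Config {
  // Hypothetical holder for the remotestorage.s3.* properties shown above.
  public final String endpoint;
  public final String bucketName;
  public final String bucketRegion;
  public final String accessKeyId;
  public final String secretAccessKey;

  public S3Config(Properties props) {
    // Empty endpoint means "use the provider's default endpoint" (the AWS case).
    endpoint = props.getProperty("remotestorage.s3.endpoint", "");
    bucketName = props.getProperty("remotestorage.s3.bucketname");
    bucketRegion = props.getProperty("remotestorage.s3.bucketregion");
    accessKeyId = props.getProperty("remotestorage.s3.accesskeyid");
    secretAccessKey = props.getProperty("remotestorage.s3.secretaccesskey");
  }

  public static S3Config fromString(String text) {
    try {
      Properties props = new Properties();
      props.load(new StringReader(text));
      return new S3Config(props);
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }
}
```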

Testing Screenshots

GCS Remote Storage when Exporting AIA

(screenshot)

S3 Compatible Remote Storage when Exporting AIA

(screenshot)

S3 Compatible Remote Storage when Downloading APK through Nonce

(screenshot)

@barreeeiroo barreeeiroo changed the title Implement Remote Storage outside of GCP Implement Remote Storage outside of GAE Jul 31, 2025
@barreeeiroo (Member Author):

This PR is ready for review. I know it's a long one and probably open to a lot of discussion; I will leave some comments regarding minor possible changes.

The main callout I do have is about adding support for uploading AIAs through Remote Storage, as it would be more complicated than I initially thought, because that happens client-side.
I would have to create a new servlet to generate the presigned URL, and modify the client to first retrieve the URL, then use that URL for the multipart upload, and finally call back the upload servlet with an extra flag so it retrieves the file from remote storage rather than expecting it as a multipart upload.
An alternative users have right now, considering they can already export AIAs larger than 32MB, is to temporarily replace the assets with 0-size ones and then re-upload them individually.

The code is partially ready (barreeeiroo/appinventor-sources@external-build-artifacts...support-remote-uploads), but multipart upload with presigned URLs works a bit differently from a "standalone" PUT: new fields have to be added to the form, and an extra policy has to be computed rather than a presigned URL.
Because of this, the upload form would become more complex, and I don't think it's going to work the same way as for GCS...

So, I left it undone to also discuss this.
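To illustrate the difference described above: a browser-based POST upload carries a base64-encoded policy document (and a signature computed over that base64 string) as form fields, rather than a presigned PUT URL. A rough sketch of the policy side only, with hypothetical names, field layout following the S3 POST policy format, and the signing step omitted:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class PostPolicySketch {
  // Build the base64-encoded POST policy the upload form must carry.
  // The matching "x-amz-signature" form field would be SigV4 over this
  // base64 string; that computation is omitted here.
  public static String encodePolicy(String expirationIso, String bucket,
      String keyPrefix, String credential, String amzDate) {
    String json = "{"
        + "\"expiration\":\"" + expirationIso + "\","
        + "\"conditions\":["
        + "{\"bucket\":\"" + bucket + "\"},"
        + "[\"starts-with\",\"$key\",\"" + keyPrefix + "\"],"
        + "{\"x-amz-algorithm\":\"AWS4-HMAC-SHA256\"},"
        + "{\"x-amz-credential\":\"" + credential + "\"},"
        + "{\"x-amz-date\":\"" + amzDate + "\"}"
        + "]}";
    return Base64.getEncoder().encodeToString(json.getBytes(StandardCharsets.UTF_8));
  }
}
```

Every condition in the policy must be echoed as a form field in the multipart body, which is why the upload form grows more complex than a plain presigned PUT.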

@barreeeiroo barreeeiroo marked this pull request as ready for review August 1, 2025 22:16
*/
public final String getProjectExportObjectKey(final String downloadKind, final String userId, final String fileName) {
  final String timestamp = String.valueOf(Instant.now().getEpochSecond());
  final String timestampHash = generateFieldsHash(userId, downloadKind, fileName, timestamp);
@barreeeiroo (Member Author):

I'm introducing this as part of the object key to ensure we don't overwrite previous exports (in case the user still has the previous URL).

We can either keep it as it is, or sync it with how it's done for the build output, and just overwrite it. The previous presigned URL will still be valid, but will "point" to the new object already.
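A minimal sketch of the idea (names are hypothetical; the real generateFieldsHash isn't shown in this hunk, so SHA-256 over the joined fields is assumed here):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.time.Instant;

public class ObjectKeySketch {
  // Salt the object key with a hash over (userId, kind, fileName, timestamp)
  // so a re-export never overwrites the object behind an older presigned URL.
  public static String exportObjectKey(String downloadKind, String userId,
      String fileName, Instant now) {
    String timestamp = String.valueOf(now.getEpochSecond());
    String joined = String.join("/", userId, downloadKind, fileName, timestamp);
    try {
      byte[] digest = MessageDigest.getInstance("SHA-256")
          .digest(joined.getBytes(StandardCharsets.UTF_8));
      // Truncate to the first 8 bytes (16 hex chars) to keep keys short.
      StringBuilder hash = new StringBuilder();
      for (int i = 0; i < 8; i++) hash.append(String.format("%02x", digest[i]));
      return "exports/" + userId + "/" + hash + "/" + fileName;
    } catch (java.security.NoSuchAlgorithmException e) {
      throw new RuntimeException(e);
    }
  }
}
```

With the timestamp folded into the hash, two exports in different seconds get distinct keys, while the filename at the end keeps the downloaded file's name intact.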

Comment on lines +68 to +79
if (providerName.equals("gcp")) {
  try {
    return RemoteStorageProviderGCS.getInstance();
  } catch (UnsupportedOperationException e) {
    LOG.severe("Could not initialize Remote GCP Storage in non-Production environment!");
    return null;
  }
}

if (providerName.equals("s3")) {
  return RemoteStorageProviderS3.getInstance();
}
@barreeeiroo (Member Author):

I could technically add another map and prevent initializing the same provider twice for different usages, but that would be an unlikely situation. The point of allowing provider customization is to save costs: GCP is "available by default", and the point is to set S3 for build outputs.
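If that per-name caching were ever wanted, a hypothetical memoizing factory could look like this (sketch only; `Object` stands in for the PR's provider interface):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

public class ProviderCache {
  // One provider instance per name, shared across usages (build vs. export),
  // so "s3" configured for both purposes is only initialized once.
  private static final Map<String, Object> INSTANCES = new ConcurrentHashMap<>();

  public static Object get(String providerName, Supplier<Object> factory) {
    return INSTANCES.computeIfAbsent(providerName, name -> factory.get());
  }
}
```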

// whether we are going for Android or Apple, as well as in the BarcodeAction.
final String target = BuildOutputFiles.getTargetName();
final String objectKey = remoteStorage.getBuildOutputObjectKey(target, userId, projectId, outputFileName);
final String remoteUrl = remoteStorage.generateRetrieveUrl(objectKey);
@barreeeiroo (Member Author):

Another option here would be to modify the Nonce and, along with the user and project, store a pre-calculated presigned URL valid for as long as the Nonce is. This would avoid generating the URL at runtime and "make the URL consistent" if the same Nonce is accessed twice.

The only reason I went with generating the URL on the fly is to avoid modifying the current datastore schema; as this is a local operation, it should be fine to spend a few extra ms here. Accessing the same nonce twice is an unlikely situation too, imo.


// Only use remote downloads for specific kinds of downloads (and avoid using them for
// other files like assets, even if "larger", although that should be impossible).
private static final Set<String> REMOTE_DOWNLOAD_KINDS = Set.of(
@barreeeiroo (Member Author):

I didn't have this initially, and was just proxying all files through Remote Storage if the size was larger. However, the purpose here is just to allow exporting projects that exceed 32MB, so I decided to limit it to project-related download kinds.

Comment on lines +293 to +296
if (!isInline) {
final String downloadObjectUrl = shouldUseRemoteStorageDownload(downloadKind, userId, downloadableFile);
@barreeeiroo (Member Author):

I couldn't really find where the inline mode is used, but I think it's safe to assume that if something should render "inline", it's better not to use foreign origins or anything similar...

// configured.
// If unconfigured, still use GAE, but it may fail for large files due to response
// payload limit.
private static final int DIRECT_DOWNLOAD_MAX_FILE_SIZE = 20_000_000;
@barreeeiroo (Member Author):

Technically the GAE payload limit is 32MB, but I wanted to have some buffer. Maybe it's too much; should this be set to 30MB?

If actually using GCP, we also save costs: GAE sends the AIA to GCS, and the user downloads it from GCS. Egress traffic from GCS to the Internet is much cheaper than from GAE, so there's also a point in storing larger files in GCS.

@barreeeiroo (Member Author):

I actually forgot that some providers, like MinIO, do not support virtual-hosted-style endpoints and use path-style ones instead. I have now added support for both.

Tested by running MinIO locally with these flags:

    <!-- Remote Storage Provider -->
    <property name="remotestorage.build" value="s3" />
    <property name="remotestorage.export" value="s3" />

    <!-- GCP Remote Storage Configuration -->
    <property name="remotestorage.gcp.bucketname" value="" />

    <!-- S3 Compatible Remote Storage Configuration -->
    <property name="remotestorage.s3.protocol" value="http" />
    <property name="remotestorage.s3.endpoint" value="localhost:9000" />
    <property name="remotestorage.s3.pathlike" value="true" />
    <property name="remotestorage.s3.bucketname" value="remotestorage" />
    <property name="remotestorage.s3.bucketregion" value="us-east-1" />
    <property name="remotestorage.s3.accesskeyid" value="minioadmin" />
    <property name="remotestorage.s3.secretaccesskey" value="minioadmin123" />
(screenshot)
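For reference, the difference between the two endpoint styles can be sketched as follows (hypothetical helper, not the PR's code):

```java
public class S3UrlStyle {
  // Virtual-hosted style: <protocol>://<bucket>.<endpoint>/<key>
  // Path style:           <protocol>://<endpoint>/<bucket>/<key>
  // MinIO, by default, only serves the path-style form.
  public static String objectUrl(String protocol, String endpoint,
      String bucket, String key, boolean pathStyle) {
    return pathStyle
        ? protocol + "://" + endpoint + "/" + bucket + "/" + key
        : protocol + "://" + bucket + "." + endpoint + "/" + key;
  }
}
```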

Implement remote storage capability using presigned URLs. Allow build outputs to be sent from the buildserver to remote storage, and GAE to redirect the user directly to the presigned download URLs. S3-compatible providers are supported as of now.
To save bandwidth and avoid the GAE payload limitation, send the "to be downloaded" files to remote storage, then redirect the user to the download URL so the file is fetched from remote storage rather than from GAE.
Allow using Google Cloud Storage buckets with presigned URLs as Remote Storage.
Ensure files downloaded from Remote Storage preserve the same filename.
Some providers don't support virtual-hosted-style endpoints, hence we should allow falling back to path-style endpoints.
Variable outputApk will be null if the build does not succeed, potentially causing an NPE and preventing the buildserver from calling back to GAE with the specific error.
@barreeeiroo barreeeiroo force-pushed the external-build-artifacts branch from f3b42d2 to bf3b3b9 Compare August 20, 2025 21:19
