@barreeeiroo
Member

Background

ObjectifyStorageIo is an implementation of StorageIo built on top of Google App Engine services. App Inventor 2 uses this implementation to persist all the state in either objects or the database.

This implementation, however, is something of a monolith: it interacts with several different persistence services.
These can be summarized as three layers:

  • Memcache: used as a cache layer to avoid hitting the database continuously.
  • Datastore/Objectify: used as the database, also part of Google App Engine.
  • GCS Service: to store larger objects, like assets, outputs or larger source files.

Despite being three different services, they are all accessed from, and entangled within, the single ObjectifyStorageIo class.

Description

This PR aims to break down this implementation and provide a clear separation between the services. This achieves two main benefits:

  • Better separation of responsibilities: rather than having mixed calls between the services in a single method, this proposal has service-agnostic calls to the "database", the "filesystem", and the "cache".
  • Decoupling from vendor-locked services: since each persistence layer is now fully decoupled, it becomes possible to write implementations for different technologies without being locked into a specific stack (more on this below).

Implementation Details

A new ModularizedStorageIo implementation has been created for StorageIo. Additionally, three new services inside storage were created:

  • database.DatabaseService: provides an interface to interact with whichever service acts as a database.
  • filesystem.FilesystemService: the same idea as the database service, but for storing large files or blobs.
  • cache.CacheService: again the same idea, but for storing key-value pairs on faster-access storage.
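As a rough illustration of that split (not taken from the PR's code), the three services could look something like the interfaces below. Every method name and signature here is an assumption made for the sketch; only the three interface names come from the description above. `InMemoryCacheService` is a hypothetical backend added to show that any class implementing the interface can be swapped in.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Structured records such as users, projects, or file metadata. Methods are assumed. */
interface DatabaseService {
  void putRecord(String kind, String id, Map<String, String> fields);
  Optional<Map<String, String>> getRecord(String kind, String id);
  void deleteRecord(String kind, String id);
}

/** Large files or blobs. Methods are assumed. */
interface FilesystemService {
  void write(String path, byte[] content);
  Optional<byte[]> read(String path);
  void delete(String path);
}

/** Key-value pairs on faster-access storage. Methods are assumed. */
interface CacheService {
  void put(String key, String value);
  Optional<String> get(String key);
  void invalidate(String key);
}

/** Hypothetical in-memory backend; a Memcache or Redis backend would implement the same interface. */
final class InMemoryCacheService implements CacheService {
  private final Map<String, String> entries = new ConcurrentHashMap<>();

  @Override public void put(String key, String value) {
    entries.put(key, value);
  }

  @Override public Optional<String> get(String key) {
    return Optional.ofNullable(entries.get(key));
  }

  @Override public void invalidate(String key) {
    entries.remove(key);
  }
}
```

Callers would hold a `CacheService` reference and never name the concrete class, which is what allows the backend to change without touching the calling code.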

ModularizedStorageIo then orchestrates the interactions with each service, but does not depend on the technology underneath. For example, the database could be PostgreSQL, the cache layer could be Redis, and the object storage could still be Google Cloud Storage.
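To make the orchestration idea concrete, here is a minimal sketch of a read-through cache in front of a database, written only against interfaces. It is not the PR's implementation: `ModularizedStorageIoSketch`, the trimmed interfaces, the `"UserData"` kind, the cache key format, and the in-memory backends are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Trimmed, hypothetical service interfaces: only the methods this sketch needs.
interface CacheService {
  Optional<String> get(String key);
  void put(String key, String value);
  void invalidate(String key);
}

interface DatabaseService {
  Optional<Map<String, String>> getRecord(String kind, String id);
  void putRecord(String kind, String id, Map<String, String> fields);
}

/** Coordinates cache and database without knowing which backend implements either. */
final class ModularizedStorageIoSketch {
  private final CacheService cache;
  private final DatabaseService database;

  ModularizedStorageIoSketch(CacheService cache, DatabaseService database) {
    this.cache = cache;
    this.database = database;
  }

  /** Read-through: try the cache, fall back to the database, then populate the cache. */
  String loadSettings(String userId) {
    String cacheKey = "settings:" + userId;
    Optional<String> cached = cache.get(cacheKey);
    if (cached.isPresent()) {
      return cached.get();
    }
    String settings = database.getRecord("UserData", userId)
        .map(fields -> fields.getOrDefault("settings", ""))
        .orElse("");
    cache.put(cacheKey, settings);
    return settings;
  }

  /** Write to the database, then drop the cached copy so the next read is fresh. */
  void storeSettings(String userId, String settings) {
    Map<String, String> fields = new HashMap<>();
    fields.put("settings", settings); // replaces the whole record, for brevity
    database.putRecord("UserData", userId, fields);
    cache.invalidate("settings:" + userId);
  }
}

/** In-memory stand-in for a cache backend such as Redis. */
final class MapCache implements CacheService {
  private final Map<String, String> entries = new HashMap<>();
  @Override public Optional<String> get(String key) { return Optional.ofNullable(entries.get(key)); }
  @Override public void put(String key, String value) { entries.put(key, value); }
  @Override public void invalidate(String key) { entries.remove(key); }
}

/** In-memory stand-in for a database backend such as PostgreSQL. */
final class MapDatabase implements DatabaseService {
  private final Map<String, Map<String, String>> records = new HashMap<>();
  int reads = 0; // counts database reads so the effect of the cache is observable

  @Override public Optional<Map<String, String>> getRecord(String kind, String id) {
    reads++;
    return Optional.ofNullable(records.get(kind + "/" + id));
  }

  @Override public void putRecord(String kind, String id, Map<String, String> fields) {
    records.put(kind + "/" + id, new HashMap<>(fields));
  }
}
```

Swapping `MapCache` or `MapDatabase` for another implementation would not require any change to `ModularizedStorageIoSketch`, which is the property the description is after.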

Current Implementations

The main goal is, at a minimum, to maintain the status quo with the existing ObjectifyStorageIo. As such, each of the three services has a Google App Engine implementation by default, plus an additional one for demonstration purposes:

  • DatabaseService
    • Google Cloud Datastore/Objectify
    • AWS DynamoDB
  • CacheService
    • Google App Engine Memcache
    • Redis
  • FilesystemService
    • Google Cloud Storage
    • AWS S3 Compatible

However, modularizing this comes with a caveat: it is no longer possible to run a transaction with calls to other services in between. For example, the existing implementation could start a transaction, write something, put an object in the object store, write again, and commit. If the object store failed, the transaction would be rolled back. This is not possible out of the box with the new design, as some providers may not support such transactions (it would require some additional work).
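One possible shape for that additional work, which this PR does not claim to implement, is a compensating action: perform the blob write first and undo it by hand if the later database write fails. The sketch below assumes hypothetical trimmed interfaces, a hypothetical `"FileData"` kind, and an in-memory filesystem stand-in.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Trimmed, hypothetical service interfaces: only the methods this sketch needs.
interface FilesystemService {
  void write(String path, byte[] content);
  Optional<byte[]> read(String path);
  void delete(String path);
}

interface DatabaseService {
  void putRecord(String kind, String id, Map<String, String> fields);
}

final class CompensatingFileWriter {
  private final FilesystemService filesystem;
  private final DatabaseService database;

  CompensatingFileWriter(FilesystemService filesystem, DatabaseService database) {
    this.filesystem = filesystem;
    this.database = database;
  }

  /** Writes the blob, then its metadata; undoes the blob write if the metadata write fails. */
  void saveFile(String path, byte[] content) {
    filesystem.write(path, content);
    try {
      database.putRecord("FileData", path,
          Collections.singletonMap("size", String.valueOf(content.length)));
    } catch (RuntimeException e) {
      // Compensating action: there is no shared transaction to roll back,
      // so the blob that was just written is removed explicitly.
      filesystem.delete(path);
      throw e;
    }
  }
}

/** In-memory stand-in for an object store. */
final class MapFilesystem implements FilesystemService {
  private final Map<String, byte[]> blobs = new HashMap<>();
  @Override public void write(String path, byte[] content) { blobs.put(path, content.clone()); }
  @Override public Optional<byte[]> read(String path) { return Optional.ofNullable(blobs.get(path)); }
  @Override public void delete(String path) { blobs.remove(path); }
}
```

This is weaker than a real transaction: if the compensating delete itself fails, an orphaned blob remains, and overwriting an existing file would lose the previous version on failure. A fuller design would need to handle both cases.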

Testing

  • AI2 still runs using default GAE services.
  • AI2 can now run locally using all the non-GAE services.
  • AI2 can also run using a mix of GAE and non-GAE services.
  • Orchestration between cache, database, and filesystem works as intended. Files are created and deleted across them as expected (on file size changes, etc.).
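The file-size case in the last item can be pictured with the sketch below. It assumes, purely for illustration, that small files are stored inline in the database and larger ones in the filesystem, and that a file moves between the two when it crosses a threshold. The class name, the threshold value, and the use of plain maps as backends are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class SizeRoutedFileStore {
  // Hypothetical threshold; the actual cut-off is not stated in this thread.
  static final int INLINE_LIMIT_BYTES = 1024;

  // Plain maps standing in for a DatabaseService and a FilesystemService backend.
  final Map<String, byte[]> database = new HashMap<>();
  final Map<String, byte[]> filesystem = new HashMap<>();

  /** Stores the content in the layer that matches its size and clears the other layer. */
  void save(String path, byte[] content) {
    if (content.length > INLINE_LIMIT_BYTES) {
      filesystem.put(path, content.clone());
      database.remove(path);   // drop the stale inline copy if the file grew
    } else {
      database.put(path, content.clone());
      filesystem.remove(path); // drop the stale blob if the file shrank
    }
  }

  /** Looks in the database first, then in the filesystem; returns null if absent. */
  byte[] load(String path) {
    byte[] inline = database.get(path);
    return inline != null ? inline : filesystem.get(path);
  }
}
```

A test of this path would save the same file below and above the threshold and check that exactly one layer holds it each time.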

Not all code paths have been tested yet, as this is a huge change to the entire StorageIo.

Running Example

[Image omitted]

@barreeeiroo
Member Author

barreeeiroo commented Aug 25, 2025

So, I was going crazy with the unit test failures and ended up giving up for now, so that the PR can get reviewed.

The tests were originally failing because the same kind was being registered with Objectify twice: first through ModularizedStorageIo, and then through ObjectifyStorageIo when setting up the tests. Both models are the same (well, technically similar, as I cleaned up some unused attributes and changed the visibility of all fields and classes to package level).
So then I thought: let's just ignore the error and proceed, right? Objectify would be smart enough to detect that the kind, by class name, is already registered. Well, apparently not. It won't let me register the same kind twice from a different package, and yet it then complains that the package does not match the registered kind name.

So, I think it's fine to ignore those errors for now and decide later whether to go with the existing models in the existing package or the new ones, and potentially remove one of the StorageIo implementations to avoid duplicate registrations.

@barreeeiroo barreeeiroo marked this pull request as ready for review August 25, 2025 22:26
@jisqyv
Member

jisqyv commented Aug 27, 2025

@barreeeiroo As you are likely aware, we have 4 different backends for StorageIo. Only ObjectifyStorageIo is in the open source. The other three are LocalStorageIo (uses the filesystem and SQLite3), RadosStorageIo (uses CEPH) and PostgreSQLStorageIo (guess what it uses!).

When we update ai2, we also merge changes into branches that use these other backends, and have other features as well. Before we would merge this change, I would need to understand the impact on these other systems. Ideally, this should make my life easier, but I would need a good block of time to deal with it.

Also note: before we go to Java 21, we will need to upgrade the version of Objectify that we use. The more modern version has a different interface from the version we are using, so there will be a bit of work there. The driving issue here is the renaming of the "javax" package to "jakarta".

@barreeeiroo
Member Author

@jisqyv that makes sense, thank you. If you need any help, either with the Objectify upgrade or with importing the other three implementations, let me know. I can work on adding feature parity across the implementations, and probably do it as part of this PR.

@ewpatton
Member

What I would personally like to see is if we could keep the StorageIo layer somewhat simple. Right now there is a very blurry line between what the implementations do that is storage versus application level logic. Take exporting projects as an example:

  ProjectSourceZip exportProjectSourceZip(String userId, long projectId,
    boolean includeProjectHistory,
    boolean includeAndroidKeystore,
    @Nullable String zipName,
    final boolean includeYail,
    final boolean includeScreenShots,
    final boolean forGallery,
    final boolean fatalError,
    boolean forAppStore,
    boolean locallyCachedApp) throws IOException;

In reality, the structure of a project is an application-level concern. If the storage layer were just a key-value store for project data, then how we filter files for export could live in YoungAndroidProjectService. As we've added new features, each one has required changing this API and every implementation, yet the actual change has nothing to do with storage!
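A minimal sketch of that separation, under stated assumptions: `ProjectFileStore`, `ProjectExportFilter`, and `MapProjectFileStore` are hypothetical names, and the filter rules (a `.yail` suffix and a `screenshots/` prefix) are invented for illustration. Only the `includeYail` and `includeScreenShots` flags come from the signature above. Zipping the selected files is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Storage layer reduced to a per-project key-value store; it knows nothing about export rules. */
interface ProjectFileStore {
  List<String> listFiles(long projectId);
  byte[] readFile(long projectId, String path);
}

/** Application-level export logic, the kind of code that could live in a project service. */
final class ProjectExportFilter {
  // Hypothetical rules: the ".yail" suffix and "screenshots/" prefix are assumptions.
  static boolean shouldExport(String path, boolean includeYail, boolean includeScreenShots) {
    if (!includeYail && path.endsWith(".yail")) {
      return false;
    }
    if (!includeScreenShots && path.startsWith("screenshots/")) {
      return false;
    }
    return true;
  }

  /** Reads only the files selected by the flags; the store is never told why. */
  static Map<String, byte[]> collectForExport(ProjectFileStore store, long projectId,
      boolean includeYail, boolean includeScreenShots) {
    Map<String, byte[]> selected = new LinkedHashMap<>();
    for (String path : store.listFiles(projectId)) {
      if (shouldExport(path, includeYail, includeScreenShots)) {
        selected.put(path, store.readFile(projectId, path));
      }
    }
    return selected;
  }
}

/** In-memory store holding a single project; the projectId argument is ignored. */
final class MapProjectFileStore implements ProjectFileStore {
  private final Map<String, byte[]> files = new LinkedHashMap<>();

  void put(String path, byte[] content) {
    files.put(path, content);
  }

  @Override public List<String> listFiles(long projectId) {
    return new ArrayList<>(files.keySet());
  }

  @Override public byte[] readFile(long projectId, String path) {
    return files.get(path);
  }
}
```

With this split, adding a new export flag would change `ProjectExportFilter` only, and no storage backend would need to be touched.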

@barreeeiroo
Member Author

barreeeiroo commented Aug 27, 2025

@ewpatton totally agreed. Also, you picked a very good example with exportProjectSourceZip. When I was re-implementing it with the new system, it was actually the last one I did because it was a massive method. However, when I analyzed it, I realized that it does not make any storage calls of its own; it just duplicates many of the existing calls, so I reused the existing methods instead. See commit e269a4f for how it has been implemented without any additional calls to any sublayer. That makes this method business logic rather than an actual StorageIo concern, as it did not require implementing any new calls.
