[Separation Refactoring] File Storage Layer Proposal #2514

Open
bobbai00 opened this issue Mar 27, 2024 · 0 comments
Collaborator

bobbai00 commented Mar 27, 2024

Objective

Several components in the Texera backend server and Amber rely on different kinds of storage services. For example:

  • Dataset component: reads and writes files via the GitVersionControlLocalFileStorage service. This service issues reads and writes to the local file system through the file system API and calls JGit for version control.
  • Time Travel component: reads and writes log records via storage instances of SequentialRecordStorage. It is built on Apache VFS, a more general abstraction layer over file systems, and returns readers and writers based on file streams.
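
To make the stream-based reader/writer pattern concrete, here is a minimal sketch of sequential record storage over plain `java.io` streams. It is not the actual SequentialRecordStorage implementation: `RecordWriter` and `RecordReader` are hypothetical names, raw byte arrays stand in for serialized log records, and an in-memory or file stream stands in for an Apache VFS stream.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical writer: appends length-prefixed records to any output stream.
final class RecordWriter implements AutoCloseable {
    private final DataOutputStream out;

    RecordWriter(OutputStream stream) {
        this.out = new DataOutputStream(stream);
    }

    void write(byte[] record) throws IOException {
        out.writeInt(record.length); // 4-byte length prefix
        out.write(record);           // record payload
    }

    @Override
    public void close() throws IOException {
        out.close();
    }
}

// Hypothetical reader: reads records back in the order they were written.
final class RecordReader implements AutoCloseable {
    private final DataInputStream in;

    RecordReader(InputStream stream) {
        this.in = new DataInputStream(stream);
    }

    List<byte[]> readAll() throws IOException {
        List<byte[]> records = new ArrayList<>();
        while (true) {
            int length;
            try {
                length = in.readInt();
            } catch (EOFException endOfLog) {
                // No further length prefix can be read: treat as end of log.
                return records;
            }
            byte[] record = new byte[length];
            in.readFully(record); // throws EOFException on a truncated payload
            records.add(record);
        }
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}
```

Because both classes accept any `InputStream` or `OutputStream`, the caller decides where the bytes live, which is the property a general storage layer would need to preserve.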

From the perspective of the separation refactoring, a more general and stateless file storage layer is needed. Other components, e.g. workflow, will provide a URI to this layer, and this layer is responsible for resolving that URI into the actual file metadata object.
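
As a rough illustration of the URI-to-metadata step, the sketch below resolves a `file:` URI into a small metadata object. Everything named here is an assumption made for illustration: `FileMetadata`, `FileStorageResolver`, the chosen metadata fields, and the restriction to the `file` scheme are not taken from the Texera codebase.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical metadata object returned by the storage layer.
final class FileMetadata {
    final URI uri;
    final long sizeInBytes;
    final boolean directory;

    FileMetadata(URI uri, long sizeInBytes, boolean directory) {
        this.uri = uri;
        this.sizeInBytes = sizeInBytes;
        this.directory = directory;
    }
}

// Hypothetical resolver. It holds no fields, so every call depends only on
// the URI it is given, which is what "stateless" means in this sketch.
final class FileStorageResolver {
    private FileStorageResolver() {}

    static FileMetadata resolve(URI uri) {
        // Assumption: only local files are supported in this sketch.
        if (!"file".equals(uri.getScheme())) {
            throw new IllegalArgumentException("Unsupported scheme: " + uri.getScheme());
        }
        Path path = Paths.get(uri);
        try {
            BasicFileAttributes attrs = Files.readAttributes(path, BasicFileAttributes.class);
            return new FileMetadata(uri, attrs.size(), attrs.isDirectory());
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot resolve " + uri, e);
        }
    }
}
```

A caller such as the workflow component would then hold only the URI and call `FileStorageResolver.resolve(uri)` when it needs the metadata; supporting further schemes would mean adding branches or a scheme-to-handler lookup in `resolve`.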

To design such a general layer, we need to discuss which data types we currently deal with, where each type is stored, and which APIs are used to read and write it.

Data Type & Storage & API

| Type | Examples | Current Storage Location | API | Needs to be abstracted by this layer |
| --- | --- | --- | --- | --- |
| Structured Data | Tables in texera_db | MySQL | POJOs and DAOs generated by jOOQ | No |
| Semi-Structured Data | Workflow results | MongoDB/Memory | MongoDB SDK | ? |
| Plain Record Object | Log records in Time Travel | File System | Kryo as the serializer/deserializer; file system APIs based on Apache VFS | Yes |
| File Object | User-uploaded data files | File System | File system APIs based on the Java/Scala native file I/O library | Yes |
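
One way to picture the abstraction for the two rows marked "Yes" is a single read/write contract with one implementation per data type. The sketch below is only an assumption about what such a contract could look like: `StoredDocument`, `LocalFileDocument`, and `LineRecordDocument` are hypothetical names, and newline-separated UTF-8 text stands in for Kryo-serialized records.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical contract shared by every data type the layer abstracts.
interface StoredDocument<T> {
    void write(T content);

    T read();
}

// "File Object" row: opaque bytes on the local file system.
final class LocalFileDocument implements StoredDocument<byte[]> {
    private final Path path;

    LocalFileDocument(Path path) {
        this.path = path;
    }

    @Override
    public void write(byte[] content) {
        try {
            Files.write(path, content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public byte[] read() {
        try {
            return Files.readAllBytes(path);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// "Plain Record Object" row: an ordered list of records, one per line.
// Assumption: records contain no line breaks.
final class LineRecordDocument implements StoredDocument<List<String>> {
    private final Path path;

    LineRecordDocument(Path path) {
        this.path = path;
    }

    @Override
    public void write(List<String> records) {
        try {
            Files.write(path, records, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public List<String> read() {
        try {
            return Files.readAllLines(path, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a contract like this, callers depend on `StoredDocument<T>` alone, so the storage location behind a given type can change without touching them.
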
bobbai00 self-assigned this on Mar 27, 2024
bobbai00 changed the title from "[Separation Refactoring][WIP] Introduce the File Storage Layer" to "[Separation Refactoring][WIP] File Storage Layer Proposal" on Mar 28, 2024
bobbai00 changed the title from "[Separation Refactoring][WIP] File Storage Layer Proposal" to "[Separation Refactoring] File Storage Layer Proposal" on Apr 10, 2024
bobbai00 added a commit that referenced this issue Apr 14, 2024
This PR adds two abstract classes, `TexeraCollection` and
`TexeraDocument`, and one plain class, `TexeraURI`, to provide a unified file
layer in Texera. With these abstract classes, any new resources that
need to be stored in the system should follow the protocol and implement
them.

For background discussion, refer to #2514.