[Separation Refactoring] File Storage Layer Proposal #2514

bobbai00 · 2024-03-27T06:28:15Z

Objective

Several components in the Texera backend server and Amber rely on different kinds of storage services. For example:

Dataset component: reads and writes files via the GitVersionControlLocalFileStorage service. This service issues reads and writes to the local file system through the file system API, and calls JGit for version control.
Time Travel component: reads and writes log records via storage instances of SequentialRecordStorage. It is built on Apache VFS, a more general file system abstraction, and returns readers/writers based on file streams.

From the perspective of separation refactoring, a more general and stateless file storage layer is needed. Other components, e.g. workflow, will provide a URI to this layer, and this layer is responsible for resolving that URI into the actual file metadata object.
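As a rough illustration of what "stateless" resolution could look like, here is a minimal sketch in Java. The names `FileMetadata`, `FileResolver`, and `resolve`, as well as the metadata fields, are assumptions made for this example and are not part of the proposal. Only the `file` scheme is handled; other backends would need their own branch.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical metadata object; the proposal does not specify its fields.
final class FileMetadata {
    final URI uri;
    final Path localPath;
    final boolean exists;
    final long sizeInBytes; // -1 when the file does not exist

    FileMetadata(URI uri, Path localPath, boolean exists, long sizeInBytes) {
        this.uri = uri;
        this.localPath = localPath;
        this.exists = exists;
        this.sizeInBytes = sizeInBytes;
    }
}

// Hypothetical stateless resolver: everything it needs arrives in the URI,
// and nothing is cached between calls.
final class FileResolver {
    private FileResolver() {}

    static FileMetadata resolve(URI uri) throws IOException {
        String scheme = uri.getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("URI has no scheme: " + uri);
        }
        if ("file".equals(scheme)) {
            Path path = Paths.get(uri);
            boolean exists = Files.isRegularFile(path);
            long size = exists ? Files.size(path) : -1L;
            return new FileMetadata(uri, path, exists, size);
        }
        // Other storage backends would be dispatched here.
        throw new UnsupportedOperationException("No resolver for scheme: " + scheme);
    }
}

final class FileResolverDemo {
    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("texera-sketch", ".txt");
        FileMetadata meta = FileResolver.resolve(tmp.toUri());
        System.out.println(meta.localPath + " exists=" + meta.exists);
        Files.delete(tmp);
    }
}
```

Because the resolver holds no state, a caller such as the workflow component would only need to keep the URI and could ask for metadata again at any time.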

To design such a general layer, we need to discuss which data types we currently deal with, where each data type is stored, and which APIs are used to read and write it.

Data Type & Storage & API

Type	Examples	Current Storage Location	API	Needs to be abstracted by this layer
Structured Data	Tables in `texera_db`	MySQL	POJOs and DAOs generated by jOOQ	No
Semi-Structured Data	Workflow results	MongoDB/Memory	MongoDB SDK	?
Plain Record Object	Log records in Time Travel	File System	Kryo as the serializer/deserializer; file system APIs based on Apache VFS	Yes
File Object	User-uploaded data files	File System	File system APIs based on the Java/Scala native file IO libraries	Yes


This PR adds two abstract classes, `TexeraCollection` and `TexeraDocument`, and one plain class, `TexeraURI`, to provide a unified file layer in Texera. Any new resource that needs to be stored in the system should follow this protocol by implementing these abstract classes. For background discussion, see #2514.
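To make the relationship between the three classes more concrete, the following Java sketch shows one possible shape. Only the class names `TexeraCollection`, `TexeraDocument`, and `TexeraURI` come from the PR description. Every method name and signature, the `texera` URI scheme, and the in-memory implementations are assumptions for illustration; the actual definitions in #2593 may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Plain class wrapping a URI (assumed shape).
final class TexeraURI {
    private final URI value;

    private TexeraURI(URI value) { this.value = value; }

    static TexeraURI of(String raw) { return new TexeraURI(URI.create(raw)); }

    URI toJavaUri() { return value; }
}

// A single addressable resource (assumed members).
abstract class TexeraDocument {
    abstract TexeraURI getUri();
    abstract InputStream asInputStream();
    abstract void write(byte[] content);
}

// A container of documents (assumed members).
abstract class TexeraCollection {
    abstract TexeraURI getUri();
    abstract List<TexeraDocument> getChildren();
}

// Toy implementation that keeps bytes in memory, only to show the protocol.
final class InMemoryDocument extends TexeraDocument {
    private final TexeraURI uri;
    private byte[] content = new byte[0];

    InMemoryDocument(TexeraURI uri) { this.uri = uri; }

    @Override TexeraURI getUri() { return uri; }
    @Override InputStream asInputStream() { return new ByteArrayInputStream(content); }
    @Override void write(byte[] newContent) { this.content = newContent.clone(); }
}

final class InMemoryCollection extends TexeraCollection {
    private final TexeraURI uri;
    private final List<TexeraDocument> children = new ArrayList<>();

    InMemoryCollection(TexeraURI uri) { this.uri = uri; }

    void add(TexeraDocument doc) { children.add(doc); }

    @Override TexeraURI getUri() { return uri; }
    @Override List<TexeraDocument> getChildren() {
        return Collections.unmodifiableList(children);
    }
}

final class FileLayerDemo {
    public static void main(String[] args) {
        InMemoryCollection dataset = new InMemoryCollection(TexeraURI.of("texera:///datasets/1"));
        InMemoryDocument file = new InMemoryDocument(TexeraURI.of("texera:///datasets/1/a.txt"));
        file.write(new byte[] {1, 2, 3});
        dataset.add(file);
        System.out.println(dataset.getChildren().size() + " document(s) in " + dataset.getUri().toJavaUri());
    }
}
```

Under this reading, a new storage backend would subclass the two abstract classes, and callers would depend only on the abstract types and the URI.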

bobbai00 added the engine label Mar 27, 2024

bobbai00 self-assigned this Mar 27, 2024

bobbai00 changed the title ~~[Separation Refactoring][WIP] Introduce the File Storage Layer~~ [Separation Refactoring][WIP] File Storage Layer Proposal Mar 28, 2024

bobbai00 mentioned this issue Apr 10, 2024

Introduce File Layer Abstraction #2593

Merged

bobbai00 changed the title ~~[Separation Refactoring][WIP] File Storage Layer Proposal~~ [Separation Refactoring] File Storage Layer Proposal Apr 10, 2024

shengquan-ni mentioned this issue May 2, 2024

File Layer Abstraction and Implementation #2636

Open
