Several components in the Texera backend server and Amber rely on different kinds of storage services. For example:
- Dataset component: reads and writes files via the `GitVersionControlLocalFileStorage` service. This service issues reads and writes to the local file system through the file system API, and calls JGit for version control.
- Time Travel component: reads and writes log records via storage instances of `SequentialRecordStorage`. This storage is built on Apache VFS, a more general file system abstraction layer, and returns readers and writers based on file streams.
From the perspective of separation refactoring, a more general and stateless file storage layer is needed. Other components, e.g. workflow, will provide a URI to this layer, and this layer is responsible for resolving that URI into the actual File metadata object.
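The URI-to-metadata resolution described above could look roughly like the following Java sketch. It is illustrative only and is not taken from the Texera codebase: `FileMetadata`, `FileResolver`, and their fields and methods are hypothetical names, and only the `file` scheme is handled. The resolver holds no state of its own, so every call depends solely on the URI it receives.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical metadata object; the fields are illustrative assumptions.
final class FileMetadata {
    final URI uri;
    final Path path;
    final long sizeInBytes;

    FileMetadata(URI uri, Path path, long sizeInBytes) {
        this.uri = uri;
        this.path = path;
        this.sizeInBytes = sizeInBytes;
    }
}

// Hypothetical stateless resolver: no fields, no caches, only static methods.
final class FileResolver {
    private FileResolver() {}

    // Resolves a file: URI into metadata; other schemes are rejected in this sketch.
    static FileMetadata resolve(URI uri) throws IOException {
        if (!"file".equals(uri.getScheme())) {
            throw new IllegalArgumentException("Unsupported scheme: " + uri.getScheme());
        }
        Path path = Paths.get(uri);
        // Files.size throws an IOException if the file does not exist.
        return new FileMetadata(uri, path, Files.size(path));
    }
}
```

A caller such as the workflow component would pass only the URI and get the metadata back, without knowing which storage service backs it. Supporting further schemes would mean dispatching on `uri.getScheme()` to other backends.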
To design such a general layer, we need to discuss which data types we currently deal with, where each of them is stored, and which APIs are used to read and write them.
Data Type & Storage & API
| Type | Examples | Current Storage Location | API | Needs to be abstracted by this layer |
| --- | --- | --- | --- | --- |
| Structured Data | Tables in texera_db | MySQL | POJOs and DAOs generated by jOOQ | No |
| Semi-Structured Data | Workflow results | MongoDB/Memory | MongoDB SDK | ? |
| Plain Record Object | Log record in Time Travel | File System | Kryo as the serializer/deserializer, File System APIs based on Apache VFS | Yes |
| File Object | User-uploaded data file | File System | File System APIs based on Java/Scala native File IO library | Yes |
bobbai00 changed the title from "[Separation Refactoring][WIP] Introduce the File Storage Layer" to "[Separation Refactoring][WIP] File Storage Layer Proposal" on Mar 28, 2024
This PR adds two abstract classes, `TexeraCollection` and
`TexeraDocument`, and one plain class, `TexeraURI`, to provide a unified
file layer in Texera. Any new resource that needs to be stored in the
system should follow this protocol and implement these abstract classes.
For background discussion, refer to #2514.
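To illustrate the document/collection protocol, here is a minimal Java sketch. It does not show the PR's actual signatures. `DocumentSketch` and `CollectionSketch` are hypothetical stand-ins for `TexeraDocument` and `TexeraCollection`, and all of their methods are assumptions. `java.net.URI` is used in place of `TexeraURI`, and the in-memory implementations and the `mem` scheme exist only for this example.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a document: one addressable, readable, writable resource.
abstract class DocumentSketch<T> {
    abstract URI getURI();
    abstract T read();
    abstract void write(T content);
}

// Hypothetical stand-in for a collection: a named group of documents.
abstract class CollectionSketch<T> {
    abstract URI getURI();
    abstract DocumentSketch<T> createDocument(String name);
    abstract Optional<DocumentSketch<T>> getDocument(String name);
    abstract List<DocumentSketch<T>> listDocuments();
}

// A new resource type follows the protocol by implementing both abstractions.
final class InMemoryDocument extends DocumentSketch<String> {
    private final URI uri;
    private String content = "";

    InMemoryDocument(URI uri) { this.uri = uri; }

    @Override URI getURI() { return uri; }
    @Override String read() { return content; }
    @Override void write(String newContent) { content = newContent; }
}

final class InMemoryCollection extends CollectionSketch<String> {
    private final URI uri;
    // Insertion-ordered map from document name to document.
    private final Map<String, DocumentSketch<String>> documents = new LinkedHashMap<>();

    InMemoryCollection(URI uri) { this.uri = uri; }

    @Override URI getURI() { return uri; }

    @Override DocumentSketch<String> createDocument(String name) {
        // The document URI is derived by resolving the name against the collection URI.
        DocumentSketch<String> doc = new InMemoryDocument(uri.resolve(name));
        documents.put(name, doc);
        return doc;
    }

    @Override Optional<DocumentSketch<String>> getDocument(String name) {
        return Optional.ofNullable(documents.get(name));
    }

    @Override List<DocumentSketch<String>> listDocuments() {
        return new ArrayList<>(documents.values());
    }
}
```

Under these assumptions, a caller would create a collection with `new InMemoryCollection(URI.create("mem://results/"))`, add a document with `createDocument("run-1")`, and read or write it through the abstract type only. Swapping in a file-system or MongoDB-backed implementation would then not change the calling code.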