OPFS filesystem transparency for AccessHandlePoolVFS? #99
Replies: 5 comments 20 replies
-
Here's a visualization of a sample OPFS tree:
Files not under the special The
When a VFS attaches its OPFS filesystem, it does the following things:
To transform the OPFS filesystem for transparency, we need to make two types of changes:
The tricky part is being able to recover if any error occurs during these changes. I think something like this will work for each change (basically journalling a metadata transaction):
Then on restart the VFS should check IndexedDB for saved metadata and finish any outstanding steps.
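A minimal sketch of what journaling a single move could look like. The exact steps are not reproduced above, so this assumes a three-step shape (record the intent, perform the move, clear the record). `FakeStore`, `MoveIntent`, and the paths are hypothetical; plain Maps stand in for both OPFS and an IndexedDB object store.

```typescript
// Hypothetical stand-ins: `files` plays the role of OPFS and `journal`
// the role of an IndexedDB object store. Neither is a real API.
type MoveIntent = { from: string; to: string };

class FakeStore {
  files = new Map<string, string>();       // path -> contents
  journal = new Map<number, MoveIntent>(); // id -> pending move
  private nextId = 1;

  // Step 1: record what we are about to do before touching any file.
  begin(intent: MoveIntent): number {
    const id = this.nextId++;
    this.journal.set(id, intent);
    return id;
  }

  // Step 2: perform the move. Safe to repeat after a crash.
  apply(intent: MoveIntent): void {
    const data = this.files.get(intent.from);
    if (data !== undefined) {
      this.files.set(intent.to, data);
      this.files.delete(intent.from);
    }
    // If `from` is already gone, the move happened earlier and
    // there is nothing left to do.
  }

  // Step 3: forget the intent once the move is complete.
  commit(id: number): void {
    this.journal.delete(id);
  }

  // On restart: finish anything that was begun but never committed.
  recover(): void {
    for (const [id, intent] of Array.from(this.journal.entries())) {
      this.apply(intent);
      this.commit(id);
    }
  }
}

// Simulate a crash between step 1 and step 2.
const store = new FakeStore();
store.files.set(".ahp/abc123", "db bytes");
store.begin({ from: ".ahp/abc123", to: "myDB.db" });
// ...crash here; on the next start:
store.recover();
```

Because `apply` checks whether the source still exists, running `recover()` a second time changes nothing, which is the property the restart check relies on.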
-
My current thinking (subject to change) for the new
-
The past day i've been considering, "on paper," (as opposed to "in
Issue 1: Arbitrarily large init workloads
When we traverse the FS at startup we have to filter out any file
Fundamentally that's not a problem at all. In the VFS we only need to
and it will turn out that that game includes a few thousand asset
(It's only a matter of time before developers start using OPFS as local
This problem does not bug me just yet because we "could" later,
Stomach ache level: low
Issue 2: Pre-existing files getting transformed to opaque ones
This may well be a purely academic problem, not a real one, but here
Let's say we open up
Stomach ache level: the high end of low/the low end of medium
Issue 3: Holding directory handles
From the top post:
In order to be able to move files back and forth, we have to hold the
Moving might also require creation of subdirs, and we'd end up leaving
Again, maybe this isn't a real problem, but the uphill ice-skater will
Stomach ache level: medium
Issue 4: Multiple VFS copies with different dirs
It's possible, in both our impl and the original, to provide a
This last point is what's bugging me the most. On the surface, it
Stomach ache level: high
That said...
Perhaps i'm either over- or under-thinking this.
-
Here's an attempt at pseudo-code for the transparency transform (and temporary file removal):
Prerequisites: metadata file access handle (no other access handles)
if no metadata copy exists:
  make atomic and durable metadata copy
truncate metadata file
for each record in metadata copy:
  # a valid record has an OPFS path and a correct digest.
  if record is not valid continue
  if record has an SQLite path:
    if record file type is not temporary:
      if record OPFS path is not transparent:
        set new OPFS path to match SQLite path
        if file exists at old OPFS path:
          # move from opaque to transparent (may require directory creation)
          move file from old OPFS path to new OPFS path
        elif file exists at new OPFS path:
          no move needed
        else: # file missing
          # This error case is a problem. If we just continue and lose the file
          # then we have reduced capacity. If we create a file with the old opaque
          # name and crash, the new empty file will be considered valid which
          # is unacceptable. If we create a file with a new opaque name and crash,
          # we have increased capacity. This should be so rare I'm going to
          # go with just continuing - applications can check capacity and adjust.
          log error
          continue?
      else: # record OPFS path is transparent
        no move needed
    else: # record file type is temporary
      # remove unneeded temporary file
      assert OPFS path is not transparent
      unset SQLite path
  else: # record has no SQLite path:
    if record OPFS path is not transparent:
      no move needed
    else: # record OPFS path is transparent
      set new OPFS path to nonce
      if file exists at old OPFS path:
        # move from transparent to opaque
        move file from old OPFS path to new OPFS path
      elif file exists at new OPFS path:
        no move needed
      else: # file missing
        log error
        create file at new OPFS path
  append possibly updated metadata record to metadata file
# finished iterating records
flush metadata file
delete metadata copy atomically
This procedure needs to be idempotent, including if any prior invocations are interrupted. That is why this code is effectively journaling the metadata file changes, and why there is a nonce in the metadata record.
On VFS start, this should be run before scanning for new or deleted transparent files.
In normal operation, files should not go missing. The mechanism where this can occur is if a user deletes a file after a crash during a transparency transform that moved that file and before the next transparency transform fixes it. This should be exceedingly rare and can probably be considered mis-usage. Applications can check capacity on start and adjust if necessary.
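To make the idempotency argument concrete, here is a rough in-memory model of the pseudo-code. Everything in it is an assumption for illustration: `State`, `MetaRecord`, and the `.ahp/` prefix are hypothetical names, a Set of paths stands in for the OPFS tree, and the digest validity check is left out.

```typescript
// In-memory model only; none of these names come from the actual VFS.
interface MetaRecord {
  opfsPath: string;    // where the file currently lives
  nonce: string;       // fixed per record, so the opaque name is repeatable
  sqlitePath?: string; // name SQLite knows the file by, if any
  temporary?: boolean; // true for SQLite temporary files
}

interface State {
  files: Set<string>;          // stands in for the OPFS tree
  metadata: MetaRecord[];      // stands in for the metadata file
  metadataCopy?: MetaRecord[]; // stands in for the journal copy
}

const OPAQUE_DIR = ".ahp/";
const isTransparent = (path: string): boolean => !path.startsWith(OPAQUE_DIR);

// Move `from` to `to` if `from` exists; report whether a file ends up at `to`.
function moveIfNeeded(files: Set<string>, from: string, to: string): boolean {
  if (files.has(from)) {
    files.delete(from);
    files.add(to);
    return true;
  }
  return files.has(to); // already moved by an interrupted earlier run
}

function transform(state: State): void {
  if (!state.metadataCopy) {
    state.metadataCopy = state.metadata.map((r) => ({ ...r }));
  }
  state.metadata = []; // truncate, then rebuild from the copy
  for (const old of state.metadataCopy) {
    const rec: MetaRecord = { ...old };
    const sqlitePath = rec.sqlitePath;
    if (sqlitePath !== undefined && rec.temporary) {
      // Temporary file: drop the association, keep the opaque file.
      delete rec.sqlitePath;
    } else if (sqlitePath !== undefined && !isTransparent(rec.opfsPath)) {
      // Opaque -> transparent.
      if (!moveIfNeeded(state.files, rec.opfsPath, sqlitePath)) {
        console.error(`missing file for ${sqlitePath}`);
        continue; // record is dropped, capacity shrinks by one
      }
      rec.opfsPath = sqlitePath;
    } else if (sqlitePath === undefined && isTransparent(rec.opfsPath)) {
      // Transparent -> opaque, using the nonce as the opaque name.
      const target = OPAQUE_DIR + rec.nonce;
      if (!moveIfNeeded(state.files, rec.opfsPath, target)) {
        console.error(`missing file at ${rec.opfsPath}`);
        state.files.add(target); // create an empty replacement
      }
      rec.opfsPath = target;
    }
    state.metadata.push(rec);
  }
  delete state.metadataCopy;
}

const state: State = {
  files: new Set([".ahp/a1", "deleted.db"]),
  metadata: [
    { opfsPath: ".ahp/a1", nonce: "a1", sqlitePath: "myDB.db" },
    { opfsPath: "deleted.db", nonce: "b2" }, // SQLite deleted this one
  ],
};
transform(state);
transform(state); // a second run finds nothing left to change
```

An interrupted run corresponds to a `State` that still has `metadataCopy` set and only some files moved; calling `transform` on it replays every record from the copy and skips the moves that already happened.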
-
Sorry to rudely butt in without anything useful to contribute, but can you confirm my understanding that this work is needed to be able to import/open an existing sqlite
-
One difference (of many) between the original OriginPrivateFileSystemVFS I wrote and the newer AccessHandlePoolVFS is filesystem transparency: in the original VFS, the file structure in OPFS is exposed to SQLite as-is. That is, if you open/create the file "myDB.db" in SQLite, that opens/creates a file named "myDB.db" in OPFS with the expected contents, no more and no less. This is a nice feature to have - it's really easy to understand how your data is stored, and you can use the OPFS API directly for import/export.
AccessHandlePoolVFS doesn't have filesystem transparency. It uses randomly generated filenames in a single directory, and it prepends its own metadata header to the data that SQLite reads and writes. The metadata includes a somewhat obscure digest function, and so dealing with all this from outside SQLite is a bit of a hassle. I describe this VFS as implementing a filesystem using OPFS as a device, instead of exposing a filesystem. There are good reasons for all of this, but no filesystem transparency is an unfortunate drawback.
Can this be fixed or mitigated? I think the answer is yes. Here's a sketch:
There's no getting around the issue that filesystem transparency must be violated while SQLite is using AccessHandlePoolVFS. This VFS has to open all the files it uses before SQLite calls are made, which is before it knows what filenames SQLite is going to use. That is because opening an OPFS access handle is asynchronous and any AccessHandlePoolVFS methods that SQLite calls must be synchronous. So during use, in general the SQLite filenames won't necessarily match the OPFS filenames.
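A rough sketch of why the names can't match during use. `FakeHandle` and `openAsync` are hypothetical stand-ins for an OPFS sync access handle and the asynchronous `createSyncAccessHandle()` step; `HandlePool` is not the actual VFS class, just an illustration of the two phases.

```typescript
// Hypothetical stand-in for an already-open OPFS sync access handle.
interface FakeHandle {
  opfsName: string;
}

// Stands in for the asynchronous step of opening an access handle.
async function openAsync(opfsName: string): Promise<FakeHandle> {
  return { opfsName };
}

class HandlePool {
  private free: FakeHandle[];
  private inUse = new Map<string, FakeHandle>(); // SQLite name -> handle

  constructor(handles: FakeHandle[]) {
    this.free = handles.slice();
  }

  // Asynchronous phase: runs before any SQLite call is made, so the
  // SQLite filenames are not known yet.
  static async create(opfsNames: string[]): Promise<HandlePool> {
    const handles: FakeHandle[] = [];
    for (const name of opfsNames) {
      handles.push(await openAsync(name));
    }
    return new HandlePool(handles);
  }

  // Synchronous phase: what a VFS open method could do. It can only
  // hand out a handle that was opened earlier under some other name.
  openSync(sqliteName: string): FakeHandle {
    const existing = this.inUse.get(sqliteName);
    if (existing) return existing;
    const handle = this.free.pop();
    if (!handle) throw new Error("pool exhausted: no spare access handle");
    this.inUse.set(sqliteName, handle);
    return handle;
  }

  get capacity(): number {
    return this.free.length + this.inUse.size;
  }
}
```

After `await HandlePool.create(["f0", "f1"])`, a call like `pool.openSync("myDB.db")` returns a handle whose `opfsName` is one of the pre-opened names, not "myDB.db", which is exactly the mismatch described above.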
But what about before and after SQLite is using the VFS? Is there a way to transform the OPFS file structure so it is effectively transparent when the VFS is inactive but supports everything it needs to when the VFS is active? I think...mostly yes, to a practically useful extent.
Here's what we need to do:
For (1), if the metadata aren't attached to their files they will have to go into another OPFS file or files. Storing all the metadata (which now will each have to also contain the OPFS path) in one special file, say $ROOT/.ahp/metadata, should work as long as each metadata record is a multiple of the sector size, so that any write error when updating one record won't damage adjacent records.
For (2), the VFS already scans all the files in a specific directory (call it $ROOT) to acquire its access handles so changing that to a recursive scan shouldn't be difficult. We would likely add a special directory, e.g. $ROOT/.ahp/, exempt from the scan, to store VFS-specific files such as metadata and as yet unassociated filesystem files (OPFS files that aren't being used as SQLite files). Unlike the current scan, which assumes unrecognized files aren't associated with any SQLite file, the new scan would create a metadata record matching a new file with its path.
For (3), it seems straightforward but is a little tricky. First of all, although some browsers do support a FileSystemHandle.move() method (Chrome, Safari) to move/rename a file, that is not yet in the OPFS spec so the fallback is copy-and-delete (Update 7/11/23: All major browsers seem to have a working move() method).
Second, as previously noted it will be common for SQLite filenames and OPFS filenames not to match. Under particularly convoluted sequences of file operations, it is possible for a SQLite filename "A" to map to OPFS filename "B" and SQLite filename "B" to map to OPFS filename "A". A naive renaming implementation could accidentally overwrite one of these files.
On reflection, it will be much simpler if an OPFS file outside $ROOT/.ahp can only be associated to a SQLite file with the same path. That avoids the weird cases completely, and probably won't have any negative impact under typical usage.
For (4), hmm, that sounds like a job for...a database. It's turtles all the way down! Actually, while a special OPFS file written in sector-size chunks with careful flush() calls could be made to work, IndexedDB should do fine here if needed to ensure that an incomplete transformation can be detected and completed.
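For the single metadata file in (1), a sketch of fixed-size records might look like the following. The 4096-byte sector size, the JSON payload, and the 4-byte length prefix are all assumptions made for illustration; the discussion doesn't specify a record format, and a real record would also carry the digest.

```typescript
// Assumed for illustration: one record per 4096-byte slot.
const SECTOR_SIZE = 4096;

// Hypothetical record shape: the OPFS path plus the SQLite path, if any.
interface PathRecord {
  opfsPath: string;
  sqlitePath: string | null;
}

// Encode a record into exactly one zero-padded, sector-sized slot.
function encodeRecord(rec: PathRecord): Uint8Array {
  const payload = new TextEncoder().encode(JSON.stringify(rec));
  if (payload.length + 4 > SECTOR_SIZE) {
    throw new Error("record too large for one sector");
  }
  const slot = new Uint8Array(SECTOR_SIZE);
  new DataView(slot.buffer).setUint32(0, payload.length); // length prefix
  slot.set(payload, 4);
  return slot;
}

// Decode one slot; returns null for an empty (all-zero) or oversized slot.
function decodeRecord(slot: Uint8Array): PathRecord | null {
  const view = new DataView(slot.buffer, slot.byteOffset, slot.byteLength);
  const length = view.getUint32(0);
  if (length === 0 || length + 4 > slot.byteLength) return null;
  const json = new TextDecoder().decode(slot.subarray(4, 4 + length));
  return JSON.parse(json) as PathRecord;
}

// Record i always starts at i * SECTOR_SIZE, so rewriting one record
// never touches the bytes of its neighbours.
const slotOffset = (index: number): number => index * SECTOR_SIZE;

// A three-slot "file" with only the middle slot populated.
const metadataFile = new Uint8Array(3 * SECTOR_SIZE);
metadataFile.set(
  encodeRecord({ opfsPath: ".ahp/abc123", sqlitePath: "myDB.db" }),
  slotOffset(1)
);
const middle = decodeRecord(metadataFile.subarray(slotOffset(1), slotOffset(2)));
```

With a real access handle the same offsets would be passed as the `at` option of `read()` and `write()`, one whole slot per call.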
So I think there are some interesting problems here and some important details to get right, likely including some I have missed, but nothing seems insurmountable. Applications would probably do the transparency transformation (along with SQLite temporary file removal) both at startup and at shutdown - shutdown is really when you want it to happen but web apps aren't necessarily exited cleanly.
I don't have a need for filesystem transparency myself so I have no immediate plans to implement this. Anyone interested is welcome to give it a try.