
feat: Incremental dir upload v2 #745

Draft · wants to merge 2 commits into main

Conversation

@Gozala (Contributor) commented Apr 11, 2023

No description provided.

@alanshaw (Member) commented Apr 26, 2023

Could we do something simpler where we just pipe Files into an encoder?

Currently we pass an array of File, but what if we could just pipe files into the stream and end up with (dream) code like:

// BYO stream that yields File(s)
class FileStream extends ReadableStream { /* ... */ }
// We'd build this class and use it internally with the existing stuff as well
class UnixFSDirectoryEncoderStream extends TransformStream { /* ... */ }

await new FileStream() // read File(s)
  .pipeThrough(new UnixFSDirectoryEncoderStream()) // write File(s), read Blocks
  .pipeThrough(new ShardingStream(options)) // write Blocks, read CARs
  .pipeThrough(new ShardStoringStream(conf, options)) // write CARs, read CARMeta
  .pipeTo(new WritableStream({ write: meta => console.log(meta.cid) }))

You have to keep directories open until the end because you don't know when you've finished a directory, but that isn't different to what we have now. However, you can write files asynchronously without buffering them in memory.
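For illustration, a rough sketch of the shape UnixFSDirectoryEncoderStream could take (the encodeFile and encodeDirectory helpers below are placeholders for real UnixFS encoders, not an actual API):

// Placeholder: a real encoder would chunk the bytes and derive a CID.
const encodeFile = async (file) => ({
  cid: file.name,
  bytes: new Uint8Array(await file.arrayBuffer())
})

// Placeholder: a real encoder would build a UnixFS directory node.
const encodeDirectory = (entries) => ({ cid: 'dir', entries: new Map(entries) })

class UnixFSDirectoryEncoderStream extends TransformStream {
  constructor () {
    const entries = new Map()
    super({
      // Encode each incoming File to a block immediately...
      async transform (file, controller) {
        const block = await encodeFile(file)
        // ...but remember its link so the directory can be built at the end.
        entries.set(file.name, block.cid)
        controller.enqueue(block)
      },
      // The directory block can only be emitted once all files have been seen.
      flush (controller) {
        controller.enqueue(encodeDirectory(entries))
      }
    })
  }
}

Piped through, this yields file blocks as files arrive and the directory block last, which is the "keep directories open until the end" behaviour described above.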

@Gozala (Contributor, Author) commented Apr 26, 2023

@alanshaw I think it is perhaps possible to build something like what you've suggested on top of the API implemented here, or some other API that gives you more control. I personally don't like a stream-piping API as a baseline, for the following reasons:

  1. I get very little control over it. In fact, that is what I disliked most about the unixfs-importer API. More specifically:
    1. I have files I want to encode and get CIDs for, but I can only hope that files going in will produce CIDs in the same order coming out, and even if they do there is no good way to map the two. In comparison, file.close() //=> cid gives you exactly that.
    2. I have no control over concurrency: whether multiple files get built up concurrently or sequentially is out of my hands. You could add more config options for that, but it is still not ideal because it is all-or-nothing, either concurrent or sequential. With the implemented API I get to decide which files I write concurrently and which sequentially, by arbitrary logic (see the sketch after the snippet below).
    3. Sometimes I know when I've exhausted a directory and may want to finish it before moving on to the next one. With the implemented API that is straightforward to do; with stream pipelines you just don't know. You could add some instructions to the pipeline input to control that, but at that point you're better off with the implemented API, IMO.
  2. I don't actually find the proposed API clearer or simpler. I'm sure this is subjective, yet I had to re-read the snippet 5 times and consider all the comments before I finally grokked it. I'm sure some will find it a lot more intuitive, but not everyone, so I'm not sure it's a net win. On the other hand, I think the following is really straightforward (am I biased?): you just get a filesystem writer and you create and write files / directories as you see fit. Once the function resolves, the writer is closed and you're done.
 const task = async (writer) => {
   const onetxt = writer.createFile(files[0].name)
   onetxt.write(new Uint8Array(await files[0].arrayBuffer()))
   await onetxt.close()

   const twotxt = writer.createFile(files[1].name)
   await twotxt.write(new Uint8Array(await files[1].arrayBuffer()))
   await twotxt.close()
 }

It does not concern itself with how you obtain files, in which order, or from where, nor what representation they are in, which gives you flexibility there. UnixFSDirectoryEncoderStream, on the other hand, really wants you to do a reduce over the incoming files, which is great when that is what you want but really sucks whenever you need something slightly different.
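To make points 1.1 and 1.2 concrete, here is a sketch assuming the writer API from the snippet above (and that close() resolves to the file's CID): per file, I choose sequential or concurrent writes, and each CID maps unambiguously to the file it was produced for.

 const upload = async (writer, files) => {
   // Sequential: each file is fully written and closed before the next begins.
   for (const file of files.slice(0, 2)) {
     const fd = writer.createFile(file.name)
     await fd.write(new Uint8Array(await file.arrayBuffer()))
     const cid = await fd.close() // this CID maps directly to this file
     console.log(file.name, cid)
   }

   // Concurrent: the remaining files are written in parallel, by my choice.
   const cids = await Promise.all(files.slice(2).map(async (file) => {
     const fd = writer.createFile(file.name)
     await fd.write(new Uint8Array(await file.arrayBuffer()))
     return fd.close()
   }))
   console.log(cids)
 }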

All that said, I'm not going to die on this hill, and if you're convinced the proposed API is the way to go, I'm fine with that too. However, that is not the type of API I would personally design, for the reasons outlined.

P.S.: As a side note, for years (predating React and friends) I pursued FRP and even gave an unconf talk about it at the first JSConf. Ultimately I and a few others pursuing this came to the conclusion that FRP / stream pipelining distracts you from the actual business logic; Evan Czaplicki captured why perfectly in his “A Farewell to FRP” post.

@alanshaw (Member) commented:

I did expect you might say something like that. I'm not wedded to my alternative proposal and what's here will definitely work well.
