
feat: Incremental dir upload v2 #745

Draft · wants to merge 2 commits into main
Conversation

@Gozala (Contributor) commented Apr 11, 2023

No description provided.

@alanshaw (Member) commented Apr 26, 2023

Could we do something simpler, where we just pipe Files into an encoder?

Currently we pass an array of File, but what if we could just pipe files into a stream and end up with (dream) code like:

```js
// BYO stream that yields File(s)
class FileStream extends ReadableStream { /* ... */ }
// We'd build this class and use it internally with the existing stuff as well
class UnixFSDirectoryEncoderStream extends TransformStream { /* ... */ }

await new FileStream() // read File(s)
  .pipeThrough(new UnixFSDirectoryEncoderStream()) // write File(s), read Blocks
  .pipeThrough(new ShardingStream(options)) // write Blocks, read CARs
  .pipeThrough(new ShardStoringStream(conf, options)) // write CARs, read CARMeta
  .pipeTo(new WritableStream({ write: meta => console.log(meta.cid) }))
```

You have to keep directories open until the end because you don't know when a directory is finished, but that's no different from what we have now. However, you can write files asynchronously and not buffer them in memory.
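
For illustration, the directory encoder might be shaped roughly like this — a minimal sketch only, with the actual UnixFS block encoding elided (this shape is an assumption, not code from this PR):

```js
// A minimal sketch — the real UnixFS encoding is elided and the exact
// shape here is an assumption, not code from this PR.
class UnixFSDirectoryEncoderStream extends TransformStream {
  constructor () {
    const seen = []
    super({
      // Files can be encoded into leaf blocks as they arrive, so their
      // contents never need to be buffered in memory.
      transform (file, controller) {
        seen.push(file.name)
        // controller.enqueue(/* encoded file blocks */)
      },
      // Directory nodes must stay open until the stream ends, because
      // only then do we know no more entries will arrive for them.
      flush (controller) {
        // controller.enqueue(/* encoded directory blocks for `seen` */)
      }
    })
  }
}
```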

@Gozala (Contributor, Author) commented Apr 26, 2023

@alanshaw I think it is perhaps possible to build something like what you've suggested on top of the API implemented here, or some other API that gives you more control. I personally do not like a stream-piping API as a baseline, for the following reasons:

  1. I get very little control over it. In fact, that is what I disliked most about the unixfs-importer API. More specifically:
    1. I have files I want to encode and get CIDs for, but I can only hope that files going in will produce CIDs in the same order coming out, and even if they do, there is no good way to map the two. In comparison, file.close() //=> cid gives you exactly that.
    2. I have no control over concurrency: whether multiple files get built up concurrently or sequentially is out of my hands. You could possibly add more config options for that, but it's still not ideal because it's either all concurrent or all sequential. With the implemented API I get to decide which files I write concurrently and which sequentially, by arbitrary logic (see the sketch further below).
    3. Sometimes I know when I've exhausted a directory and perhaps want to finish it before moving on to the next one. With the implemented API that's straightforward to do; with stream pipelines you just don't know. You could add some instructions into the pipeline input to control that, but at that point you're better off with the implemented API, IMO.
  2. I don't actually find the proposed API clearer or simpler. I'm sure this is subjective, yet I had to re-read the snippet 5 times and consider all the comments before I finally grokked it. I'm sure some will find it a lot more intuitive, but not everyone, so I'm not sure it's a net win. On the other hand, I think the following is really straightforward (am I biased?): you just get a filesystem writer and you create and write files / directories as you see fit. Once the function resolves, the writer is closed and you're done:
```js
const task = async (writer) => {
  // Create a file, write its bytes, then close it to finalize
  const onetxt = writer.createFile(files[0].name)
  await onetxt.write(new Uint8Array(await files[0].arrayBuffer()))
  await onetxt.close()

  const twotxt = writer.createFile(files[1].name)
  await twotxt.write(new Uint8Array(await files[1].arrayBuffer()))
  await twotxt.close()
}
```

It does not concern itself with how you obtain the files, in which order, or where from, nor with what representation they are in, which gives you flexibility there. UnixFSDirectoryEncoderStream, on the other hand, really wants you to do a reduce over the incoming files, which is great when that is what you want, but really sucks whenever it's something slightly different.
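
As a concrete illustration of points 1.1 and 1.2 above — a hypothetical usage of the writer API, assuming close() resolves to the file's CID as described. The caller decides what runs concurrently and gets each file's CID back directly:

```js
const task = async (writer) => {
  // The caller decides: these two files are written concurrently...
  const a = writer.createFile('a.txt')
  const b = writer.createFile('b.txt')
  await Promise.all([
    a.write(new TextEncoder().encode('first file')),
    b.write(new TextEncoder().encode('second file'))
  ])
  // ...and close() maps each file to its CID directly, with no
  // guessing about output ordering.
  const [aCid, bCid] = [await a.close(), await b.close()]
  console.log(aCid, bCid)
}
```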

All that said, I'm not going to die on this hill, and if you're convinced the proposed API is the way to go, I'm fine with that too. However, that is not the type of API I would personally design, for the reasons outlined.

P.S.: As a side note, for years (predating React and friends) I pursued FRP and even gave an unconf talk about it at the first JSConf. Ultimately, myself and a few others pursuing this came to the conclusion that FRP / stream pipelining distracts you from the actual business logic, and Evan has captured why perfectly in his "A Farewell to FRP" post.

@alanshaw (Member) commented:

I did expect you might say something like that. I'm not wedded to my alternative proposal, and what's here will definitely work well.
