xet upload: parallelize xorb/shard creation #1704

@coyotte508


Xorb and shard creation can be parallelized, either within the current execution context (to offset network latency) or in separate workers (to offset both network costs and chunking/hashing costs).

Current unparallelized code:

// Every object the server asks us to upload is fed, one at a time, into a single uploadShards pipeline
for await (const event of uploadShards(
  (async function* () {
    for (const obj of json.objects) {
      const op = shaToOperation.get(obj.oid);
      // Skip objects with no matching local operation or no upload action (e.g. already on the server)
      if (!op || !obj.actions?.upload) {
        continue;
      }
      abortSignal?.throwIfAborted();
      yield { content: op.content, path: op.path, sha256: obj.oid };
    }
  })(),
  {
    customFetch: params.fetch ?? fetch,
    accessToken,
    hubUrl: params.hubUrl ?? HUB_URL,
    repo: repoId,
    // todo: maybe leave empty if PR?
    rev: params.branch ?? "main",
  }
)) {
  if (event.event === "file") {
    // File fully processed: report it as 100% uploaded
    yield {
      event: "fileProgress",
      path: event.path,
      progress: 1,
      state: "uploading",
    };
  } else if (event.event === "fileProgress") {
    // Forward intermediate progress as-is
    yield {
      event: "fileProgress",
      path: event.path,
      progress: event.progress,
      state: "uploading",
    };
  }
}
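One way to parallelize within the current execution context is to run several independent pipelines that pull files from the same source and merge their events. The sketch below is a minimal illustration, assuming a pipeline function shaped like `uploadShards` (files in, events out). `uploadInParallel`, `mergeAsyncIterables`, `fakeUploadShards` and the simplified event types are hypothetical names, not part of huggingface.js. It does not cover the worker-based variant, and it does not cancel the remaining pipelines if one of them throws.

```typescript
// Simplified stand-ins for the real input/event shapes (assumptions for illustration).
type FileInput = { path: string; sha256: string; content: Uint8Array };
type UploadEvent =
  | { event: "fileProgress"; path: string; progress: number }
  | { event: "file"; path: string; sha256: string };

// Merge several async iterables, yielding values as soon as any of them produces one.
async function* mergeAsyncIterables<T>(iterables: AsyncIterable<T>[]): AsyncGenerator<T> {
  const iterators = iterables.map((it) => it[Symbol.asyncIterator]());
  const pull = (index: number) =>
    iterators[index].next().then((result) => ({ index, result }));
  const pending = new Map(iterators.map((_, index) => [index, pull(index)] as const));
  while (pending.size > 0) {
    const { index, result } = await Promise.race(pending.values());
    if (result.done) {
      pending.delete(index);
    } else {
      // Request the next value before yielding so that pipeline keeps working meanwhile.
      pending.set(index, pull(index));
      yield result.value;
    }
  }
}

// Hypothetical helper: run `concurrency` independent pipelines pulling from one shared source.
// Assumes `files` tolerates concurrent next() calls (async generators queue them).
async function* uploadInParallel(
  files: AsyncIterable<FileInput>,
  concurrency: number,
  uploadPipeline: (files: AsyncIterable<FileInput>) => AsyncIterable<UploadEvent>
): AsyncGenerator<UploadEvent> {
  const source = files[Symbol.asyncIterator]();
  // Each pipeline takes the next file only when it is ready for more work.
  async function* takeFromSource(): AsyncGenerator<FileInput> {
    while (true) {
      const next = await source.next();
      if (next.done) return;
      yield next.value;
    }
  }
  const pipelines = Array.from({ length: Math.max(1, concurrency) }, () =>
    uploadPipeline(takeFromSource())
  );
  yield* mergeAsyncIterables(pipelines);
}

// Hypothetical stand-in for uploadShards: simulates network latency per file.
async function* fakeUploadShards(files: AsyncIterable<FileInput>): AsyncGenerator<UploadEvent> {
  for await (const file of files) {
    await new Promise((resolve) => setTimeout(resolve, 10));
    yield { event: "fileProgress", path: file.path, progress: 0.5 };
    yield { event: "file", path: file.path, sha256: file.sha256 };
  }
}

async function* demoFiles(): AsyncGenerator<FileInput> {
  for (let i = 0; i < 5; i++) {
    yield { path: `file${i}.bin`, sha256: `sha${i}`, content: new Uint8Array(8) };
  }
}

(async () => {
  // Events from the three pipelines arrive interleaved, in completion order.
  for await (const event of uploadInParallel(demoFiles(), 3, fakeUploadShards)) {
    console.log(event.event, event.path);
  }
})();
```

In the loop above, the call could then become something like `uploadInParallel(files, 4, (files) => uploadShards(files, options))`, where `options` is the object currently passed as the second argument. This wiring is hypothetical and untested against the real code; the existing `for await` body would stay the same.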

The downside is that this will create more xorbs/shards, but the Xet team is working on shard consolidation (cc @assafvayner).
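A rough way to see the xorb overhead: if each pipeline packs chunks into its own xorbs, every pipeline ends with its own partially filled xorb. The sketch below assumes a fixed per-xorb cap (64 MiB here, used only as an illustrative value), no sharing between pipelines, and ignores deduplication and compression; `xorbCount` is a hypothetical helper.

```typescript
// Illustrative assumption: each pipeline fills its own xorbs up to a fixed cap
// and never shares a partially filled xorb with another pipeline.
const MiB = 1024 * 1024;
const ASSUMED_MAX_XORB_BYTES = 64 * MiB;

// Number of xorbs produced when each pipeline uploads the given number of bytes.
function xorbCount(bytesPerPipeline: number[], maxXorbBytes = ASSUMED_MAX_XORB_BYTES): number {
  return bytesPerPipeline.reduce((total, bytes) => total + Math.ceil(bytes / maxXorbBytes), 0);
}

console.log(xorbCount([100 * MiB])); // 2 (one full xorb plus one partial)
console.log(xorbCount([25 * MiB, 25 * MiB, 25 * MiB, 25 * MiB])); // 4 (every xorb partial)
```

Under the same assumptions, each pipeline would presumably also upload at least one shard of its own, which is where shard consolidation would help.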
