fix: Correctly map deduplicated chunk indices when setting content #54

Open · wants to merge 5 commits into master
41 changes: 22 additions & 19 deletions src/EncryptedModels.ts
@@ -309,31 +309,34 @@ class EncryptedRevision<CM extends CollectionItemCryptoManager> {
       chunks.push([hash, buf]);
     }
 
-    // Shuffle the items and save the ordering if we have more than one
-    if (chunks.length > 0) {
+    // Shuffle the items, deduplicate and save the ordering
+    if (chunks.length > 1) {
       const indices = shuffle(chunks);
+      const uniqueChunksMap = new Map<string, [base64, Uint8Array]>();
 
       // Filter duplicates and construct the indice list.
-      const uidIndices = new Map<string, number>();
-      chunks = chunks.filter((chunk, i) => {
+      chunks.forEach((chunk) => {
         const uid = chunk[0];
-        const previousIndex = uidIndices.get(uid);
-        if (previousIndex !== undefined) {
-          indices[i] = previousIndex;
-          return false;
-        } else {
-          uidIndices.set(uid, i);
-          return true;
+        if (!uniqueChunksMap.has(uid)) {
+          uniqueChunksMap.set(uid, chunk);
         }
       });
 
-      // If we have more than one chunk we need to encode the mapping header in the last chunk
-      if (indices.length > 1) {
-        // We encode it in an array so we can extend it later on if needed
-        const buf = msgpackEncode([indices]);
-        const hash = toBase64(cryptoManager.calculateMac(buf));
-        chunks.push([hash, buf]);
-      }
+      const chunkKeys = [...uniqueChunksMap.keys()];
+
+      // Change the original (shuffled) indices to point at the deduplicated chunks
+      const newIndices = indices.map((i) => {
+        const [id] = chunks[i];
+        return chunkKeys.indexOf(id);
+      });
+
+      chunks = [...uniqueChunksMap.values()];
A maintainer (Member) commented on lines +324 to +332:
Sorry it took me so long to review this. It's a very intrusive change in a very sensitive place.

I think the change is overall correct, though I'm very concerned about the highlighted code I'm responding to. I'm not sure that, per spec, a Map's keys() and values() are guaranteed to iterate in the same order, or that the order is deterministic. It could very well be, though I'm unsure. I'd much rather we created both in one go.

I'm also not sure what uniqueChunksMap actually does to the ordering.

Going back to the Rust implementation, this is how it's done there; do you think that makes sense?

            // Filter duplicates and construct the indice list.
            let mut uid_indices: HashMap<String, usize> = HashMap::new();
            chunks = chunks
                .into_iter()
                .enumerate()
                .filter_map(|(i, chunk)| {
                    let uid = &chunk.0;
                    match uid_indices.get(uid) {
                        Some(previous_index) => {
                            indices[i] = *previous_index;
                            None
                        }
                        None => {
                            uid_indices.insert(uid.to_string(), i);
                            Some(chunk)
                        }
                    }
                })
                .collect();

            // If we have more than one chunk we need to encode the mapping header in the last chunk
            if indices.len() > 1 {
                // We encode it in an array so we can extend it later on if needed
                let buf = rmp_serde::to_vec_named(&(indices,))?;
                let hash = to_base64(&crypto_manager.0.calculate_mac(&buf)?)?;
                chunks.push(ChunkArrayItem(hash, Some(buf)));
            }
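
This Rust snippet mirrors the pre-PR TypeScript above, whose index mapping is exactly what this PR fixes. A single-pass variant can still be written correctly by tracking each chunk's position in the deduplicated list as that list is built; it also sidesteps the keys()/values() question, since the chunk list and the mapping come out of one pass. (For what it's worth, the ECMAScript spec does define Map iteration as insertion order, consistently across keys() and values().) Below is a minimal TypeScript sketch reusing chunks and indices from the diff above; positionInDeduped and dedupedChunks are illustrative names, not from the codebase:

            // Sketch only: assumes `chunks: [base64, Uint8Array][]` has already
            // been shuffled and `indices` holds the shuffle ordering, as above.
            const positionInDeduped = new Map<string, number>(); // uid -> slot in dedupedChunks
            const dedupedChunks: [base64, Uint8Array][] = [];
            for (const chunk of chunks) {
              const uid = chunk[0];
              if (!positionInDeduped.has(uid)) {
                positionInDeduped.set(uid, dedupedChunks.length);
                dedupedChunks.push(chunk);
              }
            }
            // Point each original position at its chunk's slot in the deduplicated
            // list; the Map lookup is O(1), versus O(n) for chunkKeys.indexOf(id).
            const newIndices = indices.map((i) => positionInDeduped.get(chunks[i][0])!);
            chunks = dedupedChunks;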


+      // We encode it in an array so we can extend it later on if needed
+      const buf = msgpackEncode([newIndices]);
+      const hash = toBase64(cryptoManager.calculateMac(buf));
+
+      // Append a chunk with the mapping
+      chunks.push([hash, buf]);
     }
 
     // Encrypt all of the chunks
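
To make the remapping concrete, here is a small worked example with toy string uids standing in for real [hash, buffer] chunk pairs; the data and the particular shuffle outcome are made up for illustration:

    // Original content splits into chunks with uids A, B, A, C, in that order.
    const shuffled = ["B", "A", "C", "A"]; // the same chunks after shuffle()
    const indices = [1, 0, 3, 2];          // original position j -> shuffled position

    // Deduplicate, keeping the first occurrence of each uid (a Set preserves
    // insertion order, mirroring uniqueChunksMap in the diff above).
    const chunkKeys = [...new Set(shuffled)]; // ["B", "A", "C"]

    // Remap: original position j -> slot in the deduplicated list.
    const newIndices = indices.map((i) => chunkKeys.indexOf(shuffled[i]));
    // newIndices == [1, 0, 1, 2]

    // A decoder can now rebuild the original order from the deduplicated chunks:
    const rebuilt = newIndices.map((i) => chunkKeys[i]);
    // rebuilt == ["A", "B", "A", "C"], the original content order.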
35 changes: 35 additions & 0 deletions src/Etebase.test.ts
@@ -1700,3 +1700,38 @@ it.skip("Login and password change", async () => {
   const etebase4 = await Etebase.Account.login(USER2.email, USER2.password, testApiBase);
   await etebase4.logout();
 }, 30000);
+
+describe("Chunking files", () => {
+  it("Duplicate Chunks", async () => {
+    const collectionManager = etebase.getCollectionManager();
+    const col = await collectionManager.create(colType, {}, "");
+    const buf = randomBytesDeterministic(10 * 1024, new Uint8Array(32)); // 10 KiB of pseudorandom data
+    const content = new Uint8Array([...buf, ...buf, ...buf, ...buf]);
+    await col.setContent(content);
+    await collectionManager.transaction(col);
+    const out = await col.getContent();
+    expect(out).toEqual(content);
+  });
+
+  it("Regular Chunks", async () => {
+    const collectionManager = etebase.getCollectionManager();
+    const col = await collectionManager.create(colType, {}, "");
+    const buf = randomBytesDeterministic(100 * 1024, new Uint8Array(32));
+    const content = new Uint8Array(buf);
+    await col.setContent(content);
+    await collectionManager.transaction(col);
+    const out = await col.getContent();
+    expect(out).toEqual(content);
+  });
+
+  it("Small file, no chunks", async () => {
+    const collectionManager = etebase.getCollectionManager();
+    const col = await collectionManager.create(colType, {}, "");
+
+    const content = new Uint8Array([1]);
+    await col.setContent(content);
+    await collectionManager.transaction(col);
+    const out = await col.getContent();
+    expect(out).toEqual(content);
+  });
+});
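
These tests only exercise the setContent/getContent round trip, so for completeness, here is what the decode side implied by the format looks like: when there is more than one chunk, the last chunk carries the msgpack-encoded [indices] header, and the content is reassembled by walking that index list over the deduplicated chunks. The sketch below is an illustration of the format, not the library's actual getContent implementation; msgpackDecode is assumed as the counterpart of the msgpackEncode helper used in the diff:

    // Illustrative decode-side counterpart (assumed, not etebase's actual code).
    // Takes already-decrypted chunk buffers, in the stored (deduplicated) order.
    declare function msgpackDecode(buf: Uint8Array): unknown; // assumed helper

    function reassembleChunks(chunks: Uint8Array[]): Uint8Array {
      if (chunks.length <= 1) {
        // Small content: a single chunk and no mapping header.
        return chunks[0] ?? new Uint8Array();
      }
      // The mapping header was appended as the final chunk.
      const [indices] = msgpackDecode(chunks[chunks.length - 1]) as [number[]];
      const dataChunks = chunks.slice(0, -1);
      const ordered = indices.map((i) => dataChunks[i]);

      // Concatenate the (possibly repeated) chunks back into one buffer.
      const out = new Uint8Array(ordered.reduce((n, c) => n + c.length, 0));
      let offset = 0;
      for (const c of ordered) {
        out.set(c, offset);
        offset += c.length;
      }
      return out;
    }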