
Strange slowness in crypto.webcrypto.subtle.deriveBits when called a second time with identical inputs #56635

Open
fabiospampinato opened this issue Jan 17, 2025 · 6 comments

Comments


fabiospampinato commented Jan 17, 2025

Version

22.2.0

Platform

Darwin MBP.local 21.6.0 Darwin Kernel Version 21.6.0: Wed Aug 10 14:28:23 PDT 2022; root:xnu-8020.141.5~2/RELEASE_ARM64_T6000 arm64

Subsystem

No response

What steps will reproduce the bug?

Execute this:

import webcrypto from 'tiny-webcrypto';
import encryptor from 'tiny-encryptor';

const _deriveBits = webcrypto.subtle.deriveBits.bind(webcrypto.subtle);
webcrypto.subtle.deriveBits = async function ( ...args ) {
  // console.log(args);
  console.time('deriveBits');
  const res = await _deriveBits(...args);
  console.timeEnd('deriveBits');
  return res;
};

const SECRET = 'P@ssword!';
const SAMPLE_1GB = new Uint8Array ( 1024 * 1024 * 1024 );
const enc = await encryptor.encrypt ( SAMPLE_1GB, SECRET );
const dec = await encryptor.decrypt ( enc, SECRET );

I see the following output:

~/Desktop/repro ❯ node --version
v22.2.0
~/Desktop/repro ❯ node index.mjs
deriveBits: 1.262ms
deriveBits: 69.418ms
~/Desktop/repro ❯ bun index.mjs
[0.43ms] deriveBits
[0.70ms] deriveBits

I also tried bundling this little repro for the browser and the problem doesn't manifest there either.

Basically, over the course of executing that code we end up calling webcrypto.subtle.deriveBits twice with identical arguments, reported below (you can also log them yourself by uncommenting the console.log in the repro). We're asking Node to do very little work to begin with (just one iteration of the derivation function, not a million), and crucially, as far as I can see, nothing else should be running concurrently that blocks the main thread. Still, the second call in this specific scenario is always way slower than the first one, which seems symptomatic of some underlying issue in Node.

[
  {
    name: "PBKDF2",
    salt: Uint8Array(32) [ 242, 78, 191, 112, 241, 109, 103, 131, 247, 218, 234, 20, 139, 106, 24, 50, 87, 41, 33, 23, 250, 89, 1, 228, 230, 71, 135, 106, 133, 145, 86, 63 ],
    iterations: 1,
    hash: {
      name: "SHA-256",
    },
  }, CryptoKey {
    type: "secret",
    extractable: false,
    algorithm: {
      name: "PBKDF2",
    },
    usages: [ "deriveBits" ],
  }, 256
]

I think this is worth a look just because of the wild difference in performance between the two calls, but also since we are dealing with crypto stuff it's probably worth making sure we aren't messing up something important internally.

How often does it reproduce? Is there a required condition?

Always, just execute the code.

What is the expected behavior? Why is that the expected behavior?

The expected behavior is that calling the same function twice with identical inputs takes roughly the same amount of time.

Could the GC be getting triggered during the second call for some reason? If that were the problem, it seems unlikely it would reproduce almost identically every single time, and ~70ms spent on GC for what? There are relatively few objects getting allocated here in the first place, at least in userland as far as I can see.

What do you see instead?

I see the second call always taking much longer, which shouldn't be happening.

Additional information

No response

@marco-ippolito marco-ippolito added crypto Issues and PRs related to the crypto subsystem. webcrypto and removed crypto Issues and PRs related to the crypto subsystem. labels Jan 17, 2025

pandeykushagra51 commented Feb 14, 2025

I tried on the platform below:
Darwin Mac.lan 24.3.0 Darwin Kernel Version 24.3.0: Thu Jan 2 20:24:24 PST 2025; root:xnu-11215.81.4~3/RELEASE_ARM64_T6030 arm64
and was able to reproduce the issue.

I instrumented Node's garbage collector and logged the time spent in garbage collection, but it was very small compared to the decryption time.

GC was called 9 times in total, and this is the log produced:

GC Time taken: 0.410 ms
GC Time taken: 1.451 ms
GC Time taken: 0.167 ms
GC Time taken: 0.466 ms

deriveBits: 1.264 ms

GC Time taken: 0.439 ms
GC Time taken: 2.239 ms

deriveBits: 42.817 ms

GC Time taken: 0.182 ms
GC Time taken: 0.432 ms
GC Time taken: 0.466 ms

So only about 2.5 ms is spent on garbage collection between the end of encryption and the start of decryption.
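
For reference, GC pause timings like these can be captured with a perf_hooks PerformanceObserver subscribed to 'gc' entries (a sketch; the exact instrumentation used above wasn't shared):

import { PerformanceObserver } from 'node:perf_hooks';

// Log the duration of every garbage collection pass in the process.
const obs = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    console.log(`GC Time taken: ${entry.duration.toFixed(3)} ms`);
  }
});
obs.observe({ entryTypes: ['gc'] });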


pandeykushagra51 commented Feb 17, 2025

After digging further, I found that it is not the decrypt operation itself that is taking this long.

I benchmarked deriveBits in two modes:

  1. With a timeout added before the deriveBits operation, to ensure all earlier pending work has completed and neither the event loop nor the main thread is busy.
  2. With no timeout or delay.

Below is the code I used to test this:

import webcrypto from "tiny-webcrypto";
import encryptor from "tiny-encryptor";

const _deriveBits = webcrypto.subtle.deriveBits.bind(webcrypto.subtle);
webcrypto.subtle.deriveBits = async function (...args) {
  console.time("deriveBits_work");

  const result = await _deriveBits(...args);

  console.timeEnd("deriveBits_work");

  return result;
};

async function runTest() {
  const SECRET = "P@ssword!";
  const SAMPLE_1GB = new Uint8Array(1024 * 1024 * 1024);

  console.log("\n=== Starting encryption ===");
  console.time("encryption");
  const enc = await encryptor.encrypt(SAMPLE_1GB, SECRET);
  console.timeEnd("encryption");

  console.log("\n=== Starting decryption ===");
  // Log event loop state
  console.log("Event loop state after encryption:");
  console.log(process.memoryUsage());

  // Test both with and without delay
  for (const withDelay of [false, true]) {
    if (withDelay) {
      console.log("\n--- Testing with delay ---");
      await new Promise((resolve) => setTimeout(resolve, 100));
    } else {
      console.log("\n--- Testing without delay ---");
    }

    // Log event loop state before decryption
    console.log("Event loop state before decryption:");
    console.log(process.memoryUsage());

    console.time("decryption");
    const dec = await encryptor.decrypt(enc, SECRET);
    console.timeEnd("decryption");
  }
}

await runTest().catch(console.error);

and the following are the logs produced:


=== Starting encryption ===
deriveBits_work: 0.372ms
encryption: 800.944ms

=== Starting decryption ===
Event loop state after encryption:
{
  rss: 2453848064,
  heapTotal: 6569984,
  heapUsed: 4589744,
  external: 3223105059,
  arrayBuffers: 2147500384
}

--- Testing without delay ---
Event loop state before decryption:
{
  rss: 2453880832,
  heapTotal: 6569984,
  heapUsed: 4623072,
  external: 3223105099,
  arrayBuffers: 2147500384
}
deriveBits_work: 93.769ms
webcrypto.subtle.importKey: 0.067ms
webcrypto.subtle.decrypt: 963.601ms
decryption: 1.059s

--- Testing with delay ---
Event loop state before decryption:
{
  rss: 2200682496,
  heapTotal: 7618560,
  heapUsed: 4696704,
  external: 4296846932,
  arrayBuffers: 2147500352
}
deriveBits_work: 0.183ms
webcrypto.subtle.importKey: 0.107ms
webcrypto.subtle.decrypt: 439.814ms
decryption: 440.564ms

So the important observation is:

deriveBits without delay: 93.769 ms
deriveBits with delay: 0.183 ms

So it is the cleanup overhead from the encryption job that is causing the actual delay.
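
One way to confirm that the main thread is blocked here (hypothetical instrumentation on my part, not part of the original repro) is to record the event-loop delay around the decrypt call with perf_hooks.monitorEventLoopDelay; a large maximum pause would indicate the thread was busy with cleanup:

import { monitorEventLoopDelay } from 'node:perf_hooks';

// If post-encryption cleanup blocks the main thread, the histogram's
// maximum delay around the decrypt call should be large.
const histogram = monitorEventLoopDelay({ resolution: 10 });
histogram.enable();
const dec = await encryptor.decrypt(enc, SECRET);
histogram.disable();
console.log(`max event-loop delay: ${(histogram.max / 1e6).toFixed(1)} ms`);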

If we look at the process as a whole, the overall decryption time in Node.js is actually smaller than in Bun.

Also, I would like to add that for the operation above, the synchronous version of the key derivation completes much faster than the async one.
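
(WebCrypto itself has no synchronous deriveBits, so this presumably refers to Node's crypto.pbkdf2Sync. A minimal comparison sketch, assuming that's the API meant:)

import { webcrypto, pbkdf2Sync } from 'node:crypto';

const password = 'P@ssword!';
const salt = webcrypto.getRandomValues(new Uint8Array(32));

// Synchronous PBKDF2: 1 iteration, SHA-256, 32 bytes (256 bits).
console.time('pbkdf2Sync');
pbkdf2Sync(password, salt, 1, 32, 'sha256');
console.timeEnd('pbkdf2Sync');

// Async WebCrypto equivalent of the same derivation.
const key = await webcrypto.subtle.importKey(
  'raw', new TextEncoder().encode(password), 'PBKDF2', false, ['deriveBits']
);
console.time('subtle.deriveBits');
await webcrypto.subtle.deriveBits(
  { name: 'PBKDF2', salt, iterations: 1, hash: 'SHA-256' }, key, 256
);
console.timeEnd('subtle.deriveBits');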

I would love to contribute further, as I got a good understanding of the codebase while digging into the details.
@fabiospampinato @marco-ippolito please let me know if you need any further information; I will be happy to respond.
Also, please let me know if you want me to implement any specific optimisation here or any related work.


fabiospampinato commented Feb 17, 2025

So it is the cleanup overhead from the encryption job that is causing the actual delay.

I think that may be its own problem 🤔 what are we even cleaning up here? Are we cloning the input buffer just to process it asynchronously super safely or something?

@pandeykushagra51

I will check and let you know


pandeykushagra51 commented Feb 18, 2025

I got some more data. After checking the performance profile, there is some idle time on the main thread just before the decrypt operation starts, and the majority of the deriveBits timing is accounted for by that idle time.
Below are the observations from 4 test runs:

idle time just before decrypt operation    deriveBits time    actual deriveBits time
                218.08 ms                     218.216 ms       218.216 - 218.08 = 0.18 ms
                100.54 ms                     100.84 ms        100.84 - 100.54 = 0.3 ms
                113.08 ms                     113.312 ms       113.312 - 113.08 = 0.25 ms
                340.42 ms                     340.801 ms       340.801 - 340.42 = 0.39 ms
[Four profiler screenshots attached, one per test run.]


pandeykushagra51 commented Feb 23, 2025

@fabiospampinato after further debugging, I found the root cause:
Your library tiny-encryptor uses a concat operation that is quite costly and blocks the main thread.
Even though the deriveBits operation itself completes on a worker thread in less than 1 ms, the concat work inside your library keeps the main thread busy, which prevents the deriveBits callback from being executed on the main thread, and that is why deriveBits appears to take longer.
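
(The effect is easy to reproduce in isolation. A minimal sketch of my own, not tiny-encryptor's actual code: a timer wrapped around an awaited async operation also measures any time the main thread spends blocked before the continuation can run.)

import { webcrypto } from 'node:crypto';

const key = await webcrypto.subtle.importKey(
  'raw', new TextEncoder().encode('P@ssword!'), 'PBKDF2', false, ['deriveBits']
);
const params = {
  name: 'PBKDF2',
  salt: webcrypto.getRandomValues(new Uint8Array(32)),
  iterations: 1,
  hash: 'SHA-256',
};

console.time('deriveBits (main thread free)');
await webcrypto.subtle.deriveBits(params, key, 256);
console.timeEnd('deriveBits (main thread free)');

console.time('deriveBits (main thread blocked)');
const pending = webcrypto.subtle.deriveBits(params, key, 256);
// Simulate the blocking concat: a large synchronous allocation + copy on the main thread.
const big = new Uint8Array(512 * 1024 * 1024);
new Uint8Array(big.length).set(big);
await pending;
console.timeEnd('deriveBits (main thread blocked)');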

Also, if we take the overall execution time of the JS program above, Node.js completes much faster than Bun:
average completion time with Node.js: 1.552 s
average completion time with Bun: 2.8 s
This is the result on the device I shared above.

I tried moving the concat operation to a worker thread, and here are the resulting deriveBits completion times:

deriveBits: 1.274ms
deriveBits: 0.207ms

deriveBits: 1.28ms

deriveBits: 1.238ms
deriveBits: 0.19ms

deriveBits: 1.338ms
deriveBits: 0.173ms

deriveBits: 1.253ms
deriveBits: 0.216ms

You can use this code to reproduce:

// concatWorker.js: concatenates the message chunks into one buffer off the main thread
import { parentPort } from 'node:worker_threads';

parentPort.on('message', (data) => {
    const [version, salt, rounds, iv, encrypted] = data;
    const result = new Uint8Array(version.length + salt.length + rounds.length + iv.length + encrypted.length);
    let offset = 0;

    result.set(version, offset);
    offset += version.length;
    result.set(salt, offset);
    offset += salt.length;
    result.set(rounds, offset);
    offset += rounds.length;
    result.set(iv, offset);
    offset += iv.length;
    result.set(encrypted, offset);

    parentPort.postMessage(result);
});

and inside your encrypt operation, replace this code:
const archive = concat([version, salt, Int32.encode(rounds), iv, encryptedUint8]);

with

import { Worker } from 'node:worker_threads';

const worker = new Worker('./node_modules/tiny-encryptor/dist/concatWorker.js');

const archive = await new Promise((resolve, reject) => {
    worker.on('message', (result) => {
        // console.timeEnd("encrypt worker");
        worker.terminate();
        resolve(result);
    });

    worker.on('error', (error) => {
        worker.terminate();
        reject(error);
    });

    // console.time("encrypt worker");
    worker.postMessage([version, salt, Int32.encode(rounds), iv, encryptedUint8]);
});
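
(A side note on the sketch above, my observation rather than something the patch does: worker postMessage structured-clones typed arrays, so both the input chunks and the concatenated result are copied between threads. If the buffers aren't needed again on the sending side, passing them in the transfer list avoids those copies:)

// In the worker: transfer the result's ArrayBuffer instead of cloning it.
parentPort.postMessage(result, [result.buffer]);

// On the main thread: transfer each chunk's buffer to the worker
// (assumes every chunk owns its own ArrayBuffer and isn't reused later).
const chunks = [version, salt, Int32.encode(rounds), iv, encryptedUint8];
worker.postMessage(chunks, chunks.map((chunk) => chunk.buffer));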

But this also raises one more question:
even though the concat operation itself completes much earlier, why is it still blocking the main thread? What kind of post-concat work is causing the main thread to stay blocked?
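
(One way to dig into that, a suggestion rather than something tried in this thread, would be to capture a CPU profile of the run and look at what the main thread is doing after concat returns:)

node --cpu-prof --cpu-prof-dir=./profiles index.mjs

The generated .cpuprofile file can then be loaded into Chrome DevTools to inspect the main-thread call stacks over time.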
