
Conversation

@CyanChanges (Contributor) commented Jul 20, 2025

Supersedes #1158

Waiting for

Summary

22% more performant async ops (by resolving Promises natively), and less memory usage (no need to prepare an array of results).

const ops = Deno.core.ops;
// ops.op_print(Object.keys(ops).join('\n')+'\n')

// Races the op's promise against an already-fulfilled sentinel. If the op
// promise is already fulfilled when Promise.any runs, its reaction is queued
// first and `check` sees the op's value instead of the sentinel, so `imm`
// is true.
function tImm(tag, p) {
  const symbol = Symbol('k.notImm');
  function check(v) {
    const imm = v !== symbol;
    // ops.op_print(`@${tag} ${imm}\n`)
  }
  const p1 = Promise.any([p, Promise.resolve(symbol)]);
  p1.then(check, check);
}
for (let i = 0; i < 50000; i++) {
  tImm("op_void_async", ops.op_void_async());
  tImm("op_void_async_deferred", ops.op_void_async_deferred());
  tImm("op_error_async", ops.op_error_async());
}
// ops.op_print("Hello World\n")
// ops.op_print("Hello World\n")
[screenshot: test output]
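
For context, "resolving Promises natively" means completing an op's JS promise directly from Rust, instead of shipping results back in an array for JS to resolve. Below is a minimal, standalone rusty_v8 sketch of that mechanism, not this PR's actual code (the setup boilerplate is only there to make it runnable):

use deno_core::v8; // rusty_v8, as re-exported by deno_core

fn main() {
  let platform = v8::new_default_platform(0, false).make_shared();
  v8::V8::initialize_platform(platform);
  v8::V8::initialize();
  let isolate = &mut v8::Isolate::new(Default::default());
  let scope = &mut v8::HandleScope::new(isolate);
  let context = v8::Context::new(scope, Default::default());
  let scope = &mut v8::ContextScope::new(scope, context);

  // Create the promise that JS would receive as the op's return value.
  let resolver = v8::PromiseResolver::new(scope).unwrap();
  let promise = resolver.get_promise(scope);
  assert!(matches!(promise.state(), v8::PromiseState::Pending));

  // Resolve it directly from Rust: no result array, no JS-side resolve call.
  let value = v8::undefined(scope);
  resolver.resolve(scope, value.into());
  assert!(matches!(promise.state(), v8::PromiseState::Fulfilled));
}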

@CyanChanges (Contributor, Author) commented Jul 20, 2025

Help wanted

The async stack trace for errors is gone right now:

[screenshot]

I need to do the same thing, but natively:

[screenshot]

A makeshift solution (obviously bad, but it kind of works):

[screenshot]

There's no binding for this, and it's private, I guess: https://github.com/v8/v8/blob/e3529d092163dcfbbb454257dc4103bdebfeda48/src/execution/messages.h#L150

This one should be public, but there's still no binding for it either, I guess: https://github.com/v8/v8/blob/e3529d092163dcfbbb454257dc4103bdebfeda48/include/v8-exception.h#L69

I haven't found a workable solution; maybe I need to add bindings to rusty_v8 first?
UPDATE: denoland/rusty_v8#1818
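
If the binding from denoland/rusty_v8#1818 lands, the native fix could look roughly like the sketch below. The `capture_stack_trace` name and signature are assumptions modeled on V8's `Exception::CaptureStackTrace(context, object)` from the header linked above; the rest uses existing rusty_v8 APIs.

use deno_core::v8; // rusty_v8, as re-exported by deno_core

// Sketch only, not this PR's code: re-capture a stack trace on the error
// object before rejecting the op promise natively, so the async frames
// are not lost.
fn reject_with_stack<'s>(
  scope: &mut v8::HandleScope<'s>,
  resolver: v8::Local<'s, v8::PromiseResolver>,
  error: v8::Local<'s, v8::Value>,
) {
  if let Ok(obj) = v8::Local::<v8::Object>::try_from(error) {
    // Hypothetical binding from denoland/rusty_v8#1818 (assumed signature).
    v8::Exception::capture_stack_trace(scope, obj);
  }
  resolver.reject(scope, error);
}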

op_unref_op(promiseId);
}

function refOpPromise(promise) {
Member:

Note to self, verify purpose of refOp and refOpPromise and if they can be de-duped

Contributor (Author):

Since promiseId is now private, it can no longer be retrieved directly from the JS side, so I added ops to ref/unref promises directly.
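
As a rough illustration of the idea (the names and data shapes below are assumptions for illustration, not this PR's actual bookkeeping): the runtime can keep an "unrefed" set keyed by the private promise id, and the new ops simply flip membership in it.

use std::cell::RefCell;
use std::collections::HashSet;

// Sketch: tracks which pending op promises have been unrefed, keyed by
// promise id. The event loop only stays alive for refed pending ops.
#[derive(Default)]
struct UnrefedOps(RefCell<HashSet<i32>>);

impl UnrefedOps {
  fn ref_op(&self, promise_id: i32) {
    self.0.borrow_mut().remove(&promise_id);
  }
  fn unref_op(&self, promise_id: i32) {
    self.0.borrow_mut().insert(promise_id);
  }
  fn has_refed(&self, pending: &[i32]) -> bool {
    let unrefed = self.0.borrow();
    pending.iter().any(|id| !unrefed.contains(id))
  }
}

fn main() {
  let ops = UnrefedOps::default();
  ops.unref_op(1);
  assert!(!ops.has_refed(&[1])); // an unrefed op doesn't keep the loop alive
  ops.ref_op(1);
  assert!(ops.has_refed(&[1]));
}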

self.promises.borrow().get(promise_id as usize).is_some()
}

fn _get_promise<'s>(
Member:
Note to self, remove if not needed before landing

queue,
completed_waker,
arena: Default::default(),
promises: Default::default(),
Member:
This needs to be pre-allocated
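
For instance, a pre-allocated slot table might look like the sketch below (the element type and the initial capacity are placeholders, not this PR's actual types):

use std::cell::RefCell;

// Sketch: a promise table keyed by promise id. Reserving capacity up front
// avoids reallocating the backing storage while many ops are in flight.
struct PromiseTable<T> {
  slots: RefCell<Vec<Option<T>>>,
}

impl<T> PromiseTable<T> {
  fn with_capacity(n: usize) -> Self {
    Self { slots: RefCell::new(Vec::with_capacity(n)) }
  }
}

fn main() {
  // Assumed starting size; tune to typical in-flight op counts.
  let table: PromiseTable<u32> = PromiseTable::with_capacity(1024);
  assert!(table.slots.borrow().capacity() >= 1024);
}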

@bartlomieju (Member) commented:
@CyanChanges can you please provide results from running this code in your PR and on main?

// bench.js
const ops = Deno.core.ops;
// ops.op_print(Object.keys(ops).join('\n')+'\n')

async function bench(name, op) {
  const start = new Date();
  for (let i = 0; i < 500000; i++) {
    await op();
  }
  const end = new Date();
  const duration = end - start;
  ops.op_print(`${name} took ${duration}ms\n`);
}

await bench("op_void_async", ops.op_void_async)
await bench("op_void_async_deferred", ops.op_void_async_deferred)
target/release/dcore bench.js
// bench_batched.js
const ops = Deno.core.ops;
// ops.op_print(Object.keys(ops).join('\n')+'\n')

async function bench(name, op) {
  const start = new Date();
  const batched = new Array(500000).fill(op);
  await Promise.all(batched.map((fn) => fn()));
  const end = new Date();
  const duration = end - start;
  ops.op_print(`${name} took ${duration}ms\n`);
}

await bench("op_void_async", ops.op_void_async)
await bench("op_void_async_deferred", ops.op_void_async_deferred)
target/release/dcore bench_batched.js

From our benchmarks this PR appears to be significantly slower than main:

// this PR
target/release/dcore bench.js
🛑 deno_core binary is meant for development and testing purposes.
Run bench.js
op_void_async took 198ms
op_void_async_deferred took 1166ms
// main

./dcore_main bench.js
🛑 deno_core binary is meant for development and testing purposes.
Run bench.js
op_void_async took 35ms
op_void_async_deferred took 974ms

@CyanChanges (Contributor, Author) replied, quoting the benchmark request above:

I just switched to NixOS recently, so I need to configure the dev environment and rebuild it. It may take a while, though.

@CyanChanges (Contributor, Author) replied with results, quoting the benchmark request above:
[screenshot: benchmark results]

@CyanChanges (Contributor, Author) commented Aug 4, 2025

[screenshot]

Yeah, it seems the improvements from this PR are mainly in `op_error_async`.

@littledivy (Member) left a comment:

I tried using internalized strings and re-using the v8::Private. It cuts down some overhead, but it's still not as fast as main:

diff --git a/core/runtime/op_driver/mod.rs b/core/runtime/op_driver/mod.rs
index 4a9eaa8..fe66a8f 100644
--- a/core/runtime/op_driver/mod.rs
+++ b/core/runtime/op_driver/mod.rs
@@ -21,7 +21,6 @@ pub use self::op_results::OpResult;
 use self::op_results::PendingOpInfo;
 pub use self::op_results::V8OpMappingContext;
 pub use self::op_results::V8RetValMapper;
-use crate::runtime::v8_static_strings::INTERNAL_PROMISE_ID;

 #[derive(Default)]
 /// Returns a set of stats on inflight ops.
@@ -37,6 +36,10 @@ pub enum OpScheduling {
   Deferred,
 }

+thread_local! {
+static PRIVATE_PROMISE_ID: std::cell::OnceCell<v8::Global<v8::Private>> = std::cell::OnceCell::new();
+}
+
 /// `OpDriver` encapsulates the interface for handling operations within Deno's runtime.
 ///
 /// This trait defines methods for submitting ops and polling readiness inside of the
@@ -55,8 +58,21 @@ pub(crate) trait OpDriver<C: OpMappingContext = V8OpMappingContext>:
     &self,
     scope: &mut v8::HandleScope<'s>,
   ) -> v8::Local<'s, v8::Private> {
-    let name = INTERNAL_PROMISE_ID.v8_string(scope).unwrap();
-    v8::Private::for_api(scope, Some(name))
+    PRIVATE_PROMISE_ID.with(move |cell| {
+      let global = cell.get_or_init(|| {
+        let internalized = v8::String::new_from_one_byte(
+          scope,
+          b"a",
+          v8::NewStringType::Internalized,
+        )
+        .unwrap();
+
+        let private = v8::Private::for_api(scope, Some(internalized));
+        v8::Global::new(scope, private)
+      });
+
+      v8::Local::new(scope, global)
+    })
   }

   fn _get_promise<'s>(
diff --git a/core/runtime/v8_static_strings.rs b/core/runtime/v8_static_strings.rs
index 1bee095..c5c0528 100644
--- a/core/runtime/v8_static_strings.rs
+++ b/core/runtime/v8_static_strings.rs
@@ -33,7 +33,6 @@ v8_static_strings!(
   NAME = "name",
   OPS = "ops",
   RESOLVE = "resolve",
-  INTERNAL_PROMISE_ID = "Promise#Deno.core.internalPromiseId",
   STACK = "stack",
   URL = "url",
   WASM_INSTANCE = "WasmInstance",
$ ./dcore_main test.mjs
🛑 deno_core binary is meant for development and testing purposes.
Run test.mjs
op_void_async took 45ms
op_void_async_deferred took 1270ms
$ target/release/dcore test.mjs
🛑 deno_core binary is meant for development and testing purposes.
Run test.mjs
op_void_async took 174ms
op_void_async_deferred took 1394ms
