4 changes: 4 additions & 0 deletions .jules/bolt.md
@@ -13,3 +13,7 @@ Action: Apply loop unrolling for max reductions in high-frequency typed array op
## 2024-11-20 - Softmax math.exp 8x unrolling with local var cache
Learning: Unrolling the `Math.exp` accumulation loop to 8x and caching the multiplication `(tokenLogits[i] - maxLogit) * invTemp` into local variables before passing to `Math.exp` yields a measurable performance improvement (~4%) over the previous 4x unrolled implementation in the V8 engine, by reducing property access and allowing better instruction-level parallelism.
Action: Utilize 8x loop unrolling paired with local variable caching for tight floating-point accumulation loops over TypedArrays.

+## 2024-11-20 - BigInt64Array initialization optimization
+Learning: Using `BigInt64Array.from([BigInt(val)])` or `new BigInt64Array([BigInt(val)])` is noticeably slower in V8 than manually allocating an array with `new BigInt64Array(1)` and then setting the value `arr[0] = BigInt(val)`.
+Action: Prefer manual array allocation and assignment over `.from()` or array literal initialization for typed arrays in performance critical paths.

nitpick (typo): Consider hyphenating "performance-critical" as a compound adjective.

The phrase modifies “paths” as a single unit, so it should read “performance-critical paths”.
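
The "Learning" entry above gives no numbers, so here is a minimal sketch of how the three initialization styles could be compared. The function names, the warm-up count, and the iteration count are illustrative assumptions, not part of this PR. Results will vary by engine version and hardware, so treat the output as a rough indication rather than a definitive measurement.

```javascript
// Hypothetical micro-benchmark: names and iteration counts are illustrative only.

// Style 1: build from a temporary array via the static from() method.
function viaFrom(val) {
  return BigInt64Array.from([BigInt(val)]);
}

// Style 2: pass a temporary array to the constructor.
function viaLiteral(val) {
  return new BigInt64Array([BigInt(val)]);
}

// Style 3: allocate a fixed-length array, then assign (the style this PR adopts).
function viaManual(val) {
  const arr = new BigInt64Array(1);
  arr[0] = BigInt(val);
  return arr;
}

function timeIt(fn, iterations = 200_000) {
  // Warm up so the JIT has a chance to optimize before timing starts.
  for (let i = 0; i < 10_000; i++) fn(i);
  let checksum = 0n; // accumulated so the calls cannot be treated as dead code
  const start = performance.now();
  for (let i = 0; i < iterations; i++) checksum += fn(i)[0];
  const elapsedMs = performance.now() - start;
  return { elapsedMs, checksum };
}

for (const fn of [viaFrom, viaLiteral, viaManual]) {
  const { elapsedMs } = timeIt(fn);
  console.log(`${fn.name}: ${elapsedMs.toFixed(1)} ms`);
}
```

All three functions return a one-element `BigInt64Array` holding the same value, so the comparison isolates construction cost only.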

4 changes: 3 additions & 1 deletion src/parakeet.js
@@ -676,7 +676,9 @@ export class ParakeetModel {
// count of *valid* frames. For the JS preprocessor T === validLength;
// for the ONNX preprocessor T may be validLength+1.
const encoderLength = validLength ?? T;
-const lenTensor = new this.ort.Tensor('int64', BigInt64Array.from([BigInt(encoderLength)]), [1]);
+const lenArr = new BigInt64Array(1);
+lenArr[0] = BigInt(encoderLength);
+const lenTensor = new this.ort.Tensor('int64', lenArr, [1]);
let enc;
try {
if (perfEnabled) {
3 changes: 2 additions & 1 deletion src/preprocessor.js
@@ -97,7 +97,8 @@ export class OnnxPreprocessor {

const waveforms = new this.ort.Tensor('float32', buffer, [1, buffer.length]);

-const lenArr = new BigInt64Array([BigInt(buffer.length)]);
+const lenArr = new BigInt64Array(1);
+lenArr[0] = BigInt(buffer.length);
const waveforms_lens = new this.ort.Tensor('int64', lenArr, [1]);

const feeds = { waveforms, waveforms_lens };
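
Both call sites in this diff now repeat the same allocate-then-assign pattern. One way to keep them in sync would be a small shared helper, sketched below. The name `int64Scalar` and the integer check are assumptions for illustration and do not appear in the PR.

```javascript
// Hypothetical helper: `int64Scalar` is not part of the PR.
function int64Scalar(value) {
  // BigInt() throws a RangeError for non-integer numbers,
  // so fail early with a message that names the actual problem.
  if (!Number.isInteger(value)) {
    throw new TypeError(`expected an integer length, got ${value}`);
  }
  const arr = new BigInt64Array(1); // allocate once, no temporary array
  arr[0] = BigInt(value);
  return arr;
}

// Usage, mirroring the call sites in the diff (`ort` stands in for `this.ort`):
// const lenTensor = new ort.Tensor('int64', int64Scalar(encoderLength), [1]);
// const waveforms_lens = new ort.Tensor('int64', int64Scalar(buffer.length), [1]);
```

Whether the extra function call is acceptable on these paths would need to be measured; the inline form in the diff avoids that question entirely.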