feat(benchmark): Add memory regression tests #24092
WayneFerrao merged 29 commits into microsoft:main
Conversation
```ts
try {
	return JSON.parse(fs.readFileSync(baselineFilePath, "utf8")) as Record<string, number>;
} catch {
	return {};
```
Add a console error? `console.error("Error loading baselines:", error);`
Based on the function docs, it is okay for the baseline to be missing. Might be worth returning undefined in that case though, rather than an empty record.
Also, this will fail if the file is missing, but it can also fail for other reasons (malformed file contents, for example). It might be better to first check if the file exists and return early if it doesn't. If it does exist, we probably don't want to eat errors that occur while reading the file.
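A sketch of what that suggestion could look like, assuming hypothetical names (`loadBaselines` taking a `filePath` parameter) rather than the PR's actual code:

```typescript
import * as fs from "node:fs";

/**
 * Loads recorded baselines from disk. A missing file is an expected state
 * and yields `undefined`; any other failure (e.g. malformed JSON in a file
 * that does exist) is deliberately NOT swallowed, so it surfaces to the caller.
 */
function loadBaselines(filePath: string): Record<string, number> | undefined {
	// Return early when no baseline file exists -- that is a normal state.
	if (!fs.existsSync(filePath)) {
		return undefined;
	}
	// No try/catch here: read/parse errors on an existing file indicate real problems.
	return JSON.parse(fs.readFileSync(filePath, "utf8")) as Record<string, number>;
}
```

This keeps the "missing baseline is fine" behavior while letting genuine I/O or parse errors fail the run loudly.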
```ts
const baselines = loadBaselines();
baselines[testTitle] = memoryUsage;
// eslint-disable-next-line unicorn/no-null
fs.writeFileSync(baselineFilePath, JSON.stringify(baselines, null, 2));
```
Specify the encoding as `"utf8"` in the `fs.writeFileSync()` call to ensure consistent file writing, like: `fs.writeFileSync(baselineFilePath, JSON.stringify(baselines, null, 2), "utf8");`
```ts
public readonly title = "Create empty map";
public readonly minSampleCount = 500;

public baselineMemoryUsage = loadBaselines()[this.title] ?? 0;
```
```ts
/**
 * The baseline memory usage to compare against for the test, which is used to determine if the test regressed.
 */
baselineMemoryUsage?: number;
```
Cool, I didn't know you could specify that in the interface
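For readers unfamiliar with the pattern being discussed, here is a minimal, self-contained sketch (all names illustrative, not the actual benchmark-tool types): an optional property declared on the interface, with the implementing class supplying its value via a property initializer.

```typescript
interface IMemoryTestObject {
	readonly title: string;
	/**
	 * The baseline memory usage to compare against, in bytes.
	 * Optional: when omitted, no regression comparison is performed.
	 */
	baselineMemoryUsage?: number;
}

class CreateEmptyMapBenchmark implements IMemoryTestObject {
	public readonly title = "Create empty map";
	// The class's property initializer satisfies (and effectively provides a
	// default for) the optional interface member.
	public baselineMemoryUsage = 0;
}
```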
```ts
	category: testObject.category ?? "",
};
```

```ts
const ALLOWED_DEVIATION = 5;
```

```ts
if (avgHeapUsed > allowedMemoryUsage) {
	throw new Error(
		`Memory Regression detected for ${testObject.title}: Used ${avgHeapUsed} bytes, exceeding the baseline of ${allowedMemoryUsage} bytes.`,
```
Nit: `allowedMemoryUsage` isn't the actual baseline, it's the baseline plus the allowed deviation. It might be better to express this like:

```ts
`Memory Regression detected for ${testObject.title}: Used ${avgHeapUsed} bytes, exceeding the baseline of ${args.baselineMemoryUsage} bytes, with an allowed tolerance of ${tolerance}.`,
```
Good catch, will fix!
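A small sketch of the suggested fix, reporting the baseline and the tolerance as separate numbers (the function name and parameters here are stand-ins for the PR's actual variables):

```typescript
// Builds the regression error message so the raw baseline and the allowed
// tolerance are reported separately, instead of only their sum.
function buildRegressionMessage(
	title: string,
	avgHeapUsed: number,
	baselineMemoryUsage: number,
	tolerance: number,
): string {
	return `Memory Regression detected for ${title}: Used ${avgHeapUsed} bytes, exceeding the baseline of ${baselineMemoryUsage} bytes, with an allowed tolerance of ${tolerance} bytes.`;
}
```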
```diff
@@ -0,0 +1 @@
+{}
```
Will there be 1 of these files per test? Or a single file with all of the test benchmarks?
Josmithr left a comment:
A couple of high-level points of feedback:
- We should definitely have at least some usages of this in place before merging
- We should document (in the benchmark package README) how to use the new kinds of benchmarks in tests.
alexvy86 left a comment:
I don't feel too comfortable with the current approach. Did we consider other alternatives for how to keep track of the baselines? That each test file needs to have (or import) functions to read/write baseline files feels weird to me, and I wonder how things would look if each test could just pass its baseline as an optional parameter to benchmarkMemory(), and the tool would take care of the necessary comparisons. Having baselines be part of source control also seems convenient.
Along the same lines, maybe having the allowed deviation be a parameter that each test can define (with a reasonable default) would give us more flexibility? Some tests might be able to live with tighter targets, but others might need wider ones. Memory measurements are notoriously finicky, so I'd be concerned about having a global variability threshold that could make some tests flaky with little recourse other than making changes in benchmark tool again.
To @Josmithr 's point:
We should definitely have at least some usages of this in place before merging
I agree, but it'll have to be a 2-step process anyway because changes to benchmark tool need to be published and consumed in the client release group separately. Testing those changes before merging them by locally linking benchmark-tool is a good idea though. One other advantage of having all this be parameters on benchmarkMemory() is that we might be able to write unit tests for it in benchmark-tool itself.
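The parameter-driven alternative described above could look roughly like this; everything here (the option names, the default value, the helper function) is an assumption for illustration, not the actual benchmark-tool API:

```typescript
interface MemoryBenchmarkOptions {
	title: string;
	/** Expected heap usage in bytes; omit to only measure, never compare. */
	baselineMemoryUsage?: number;
	/** Per-test tolerance in bytes; falls back to a tool-provided default. */
	allowedDeviationBytes?: number;
}

// Assumed default; in practice the tool would pick something sensible.
const DEFAULT_ALLOWED_DEVIATION_BYTES = 1024;

// Returns a diagnostic string when usage falls outside the allowed band,
// or undefined when the test is within tolerance (or has no baseline).
function compareToBaseline(
	avgHeapUsed: number,
	options: MemoryBenchmarkOptions,
): string | undefined {
	const { baselineMemoryUsage, allowedDeviationBytes = DEFAULT_ALLOWED_DEVIATION_BYTES } =
		options;
	if (baselineMemoryUsage === undefined) {
		return undefined; // no baseline provided: measurement-only mode
	}
	if (avgHeapUsed > baselineMemoryUsage + allowedDeviationBytes) {
		return `regression: ${options.title}`;
	}
	if (avgHeapUsed < baselineMemoryUsage - allowedDeviationBytes) {
		return `improvement: ${options.title}`;
	}
	return undefined;
}
```

With this shape, the baseline and tolerance live next to the test definition in source control, and the comparison logic in the tool becomes straightforward to unit test.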
…deviation as optional param
Still not super convinced about the idea of the file, to be honest 😅. One immediate problem is that if we key the entries in the file just by test title, we could run into conflicts if two tests in different suites have the same title.

In general, I don't like the idea of needing side effects (disk writes) for tests to work if not strictly necessary. Having it be partially driven by an env variable seems to me like it introduces more complexity and things one needs to know (and that are not super discoverable) in order to work on this kind of test. When trying to update the baseline for a given test, one needs to be careful to only run that test, because otherwise we would potentially be updating baselines for all memory tests; it would probably get caught during PR review, but it still doesn't seem ideal to me.

I think I'd like to see an argument for why the file is necessary or better than other options. To me, the "locality" of being able to look at a test in the source file and right there see (and be able to adjust) its expected baseline and allowed variance feels super useful in comparison.
alexvy86 left a comment:
Looking better :). Next batch of comments. It's hard to unit-test changes to the benchmark tool, but I'd like to see evidence of using the changes in a test in the client release group (the one for map that I ask below to be split into a separate PR is a good candidate for that). What does the output look like when it's within the threshold, when it's above, and when it's below (just tweak the baseline and deviation to force it above/below), with and without the env variable?
```ts
/**
 * The baseline memory usage to compare against for the test, which is used to determine if the test regressed.
 * If not specified, the test will not be compared against a baseline and will only be run to measure the memory usage.
 * @remarks Should be specified in bytes.
```
This is what I had in mind for documenting the env variable. People writing memory tests and using the API cannot see the docs for the const in this file, but they can see the ones for these properties, so this is where we can best communicate to them how to use ENABLE_MEM_REGRESSION.
```diff
- * @remarks Should be specified in bytes.
+ * @remarks
+ * Should be specified in bytes.
+ * If `ENABLE_MEM_REGRESSION=1` in the environment, a test whose memory usage falls outside `baselineMemoryUsage +- allowedDeviationBytes` will be marked as failed.
+ * Otherwise, a warning is printed to the console.
```
Re-opening this comment. I see the updated docs in the property below but not in this one.
```ts
	formattedValue: prettyNumber(runs, 0),
};

if (baselineMemoryUsage >= 0 && allowedDeviationBytes >= 0) {
```
Looks like there is some duplication. A more readable way would be:
```ts
if (baselineMemoryUsage >= 0 && allowedDeviationBytes >= 0) {
	if (avgHeapUsed > upperBound) {
		reportMemoryIssue(
			`Memory regression detected for test '${testTitle}': Used '${avgHeapUsed.toPrecision(
				6,
			)}' bytes, baseline '${baselineMemoryUsage}', tolerance '${allowedDeviationBytes}' bytes.\n`,
		);
	} else if (avgHeapUsed < lowerBound) {
		reportMemoryIssue(
			`Possible memory improvement detected for test '${testTitle}': Used '${avgHeapUsed.toPrecision(
				6,
			)}' bytes, baseline '${baselineMemoryUsage}', tolerance '${allowedDeviationBytes}' bytes. Consider updating the baseline.\n`,
		);
	}
```
```ts
function reportMemoryIssue(message: string): void {
	if (ENABLE_MEM_REGRESSION) {
		throw new Error(message);
	} else {
		process.stdout.write(chalk.yellow(message));
	}
}
```
Great idea, thanks!
alexvy86 left a comment:
🚀 . Once it merges, let's make sure to do a release of this package and consume it in the client release group so we can start leveraging the feature :)



AB#32591
Description
This WIP PR introduces memory regression testing to ensure that memory usage remains stable across test runs. This is in line with goals to strengthen the overall reliability of the DDSes by ensuring memory-related issues are proactively caught in testing.
The new logic detects regressions and allows baselines to be set dynamically. Regression checking is handled inside `benchmarkMemory`. Individual test objects that pass in `baselineMemoryUsage` and `allowedDeviationBytes` do not need to manually check for regressions.

Key Changes