
feat(benchmark): Add memory regression tests#24092

Merged
WayneFerrao merged 29 commits into microsoft:main from WayneFerrao:memoryRegression
May 8, 2025

Conversation

@WayneFerrao
Contributor

@WayneFerrao WayneFerrao commented Mar 19, 2025

AB#32591

Description

This WIP PR introduces memory regression testing to ensure that memory usage remains stable across test runs. This is in line with goals to strengthen the overall reliability of the DDSes by ensuring memory-related issues are proactively caught in testing.
The new logic detects regressions and allows baselines to be set dynamically. Regression checking is handled inside benchmarkMemory: individual test objects pass in baselineMemoryUsage and allowedDeviationBytes and do not need to manually check for regressions.

Key Changes

  • Memory Regression Detection:
  • Compares current memory usage against a predefined baseline.
  • Throws an error if memory usage exceeds the allowed threshold.

@WayneFerrao WayneFerrao requested review from Josmithr, alexvy86 and Copilot and removed request for Copilot March 19, 2025 00:50
@github-actions github-actions Bot added the area: dds (Issues related to distributed data structures) and base: main (PRs targeted against main branch) labels Mar 19, 2025
@WayneFerrao WayneFerrao requested a review from tylerbutler March 19, 2025 00:53
@WayneFerrao WayneFerrao changed the title WIP: feat(benchmark): Add memory regression tests feat(benchmark): Add memory regression tests Mar 24, 2025
try {
	return JSON.parse(fs.readFileSync(baselineFilePath, "utf8")) as Record<string, number>;
} catch {
	return {};
}
Contributor

Add a console error? console.error("Error loading baselines:", error);

Contributor

Based on the function docs, it is okay for the baseline to be missing. Might be worth returning undefined in that case though, rather than an empty record.

Contributor

Also, this will fail if the file is missing, but it can also fail for other reasons (malformed file contents, for example). It might be better to first check if the file exists and return early if it doesn't. If it does exist, we probably don't want to eat errors that occur while reading the file.

const baselines = loadBaselines();
baselines[testTitle] = memoryUsage;
// eslint-disable-next-line unicorn/no-null
fs.writeFileSync(baselineFilePath, JSON.stringify(baselines, null, 2));
Contributor

@chentong7 chentong7 Mar 26, 2025

Specified the encoding as "utf8" in the fs.writeFileSync() method to ensure consistent file writing. Like: fs.writeFileSync(baselineFilePath, JSON.stringify(baselines, null, 2), "utf8");

public readonly title = "Create empty map";
public readonly minSampleCount = 500;

public baselineMemoryUsage = loadBaselines()[this.title] ?? 0;
Contributor

readonly?

/**
* The baseline memory usage to compare against for the test, which is used to determine if the test regressed.
*/
baselineMemoryUsage?: number;
Contributor

readonly?

Contributor Author

Cool, I didn't know you could specify that in the interface

Comment thread tools/benchmark/src/mocha/memoryTestRunner.ts
category: testObject.category ?? "",
};

const ALLOWED_DEVIATION = 5;
Contributor

Docs please 🙂


if (avgHeapUsed > allowedMemoryUsage) {
	throw new Error(
		`Memory Regression detected for ${testObject.title}: Used ${avgHeapUsed} bytes, exceeding the baseline of ${allowedMemoryUsage} bytes.`,
	);
}
Contributor

@Josmithr Josmithr Mar 26, 2025

Nit: allowedMemoryUsage isn't the actual baseline, it's the baseline + the allowed deviation. It might be better to express this like

`Memory Regression detected for ${testObject.title}: Used ${avgHeapUsed} bytes, exceeding the baseline of ${args.baselineMemoryUsage} bytes, with an allowed tolerance of ${tolerance} bytes.`,

Contributor Author

Good catch, will fix!

@@ -0,0 +1 @@
{}
Contributor

Will there be 1 of these files per test? Or a single file with all of the test benchmarks?

Contributor

@Josmithr Josmithr left a comment

A couple of high-level points of feedback:

  1. We should definitely have at least some usages of this in place before merging
  2. We should document (in the benchmark package README) how to use the new kinds of benchmarks in tests.

Contributor

@alexvy86 alexvy86 left a comment

I don't feel too comfortable with the current approach. Did we consider other alternatives for how to keep track of the baselines? That each test file needs to have (or import) functions to read/write baseline files feels weird to me, and I wonder how things would look if each test could just pass its baseline as an optional parameter to benchmarkMemory(), and the tool would take care of the necessary comparisons. Having baselines be part of source control also seems convenient.

Along the same lines, maybe having the allowed deviation be a parameter that each test can define (with a reasonable default) would give us more flexibility? Some tests might be able to live with tighter targets, but others might need wider ones. Memory measurements are notoriously finicky, so I'd be concerned about having a global variability threshold that could make some tests flaky with little recourse other than making changes in benchmark tool again.

To @Josmithr 's point:

We should definitely have at least some usages of this in place before merging

I agree, but it'll have to be a 2-step process anyway because changes to benchmark tool need to be published and consumed in the client release group separately. Testing those changes before merging them by locally linking benchmark-tool is a good idea though. One other advantage of having all this be parameters on benchmarkMemory() is that we might be able to write unit tests for it in benchmark-tool itself.

@WayneFerrao
Contributor Author

@alexvy86 @Josmithr I like the idea of having a central file with all the tests and their baselines. Also, moving the saving/loading baselines logic to benchmarkMemory makes sense to avoid duplication. I also added an env variable for gating when we want to save/overwrite the baseline. Thoughts?

@alexvy86
Contributor

> @alexvy86 @Josmithr I like the idea of having a central file with all the tests and their baselines. Also, moving the saving/loading baselines logic to benchmarkMemory makes sense to avoid duplication. I also added an env variable for gating when we want to save/overwrite the baseline. Thoughts?

Still not super convinced about the idea of the file, to be honest 😅 . One immediate problem is that if we key the entries in the file just by test title, we could run into conflicts if two tests in different suites have the same title; maybe testObject.title is the "fully qualified name" with all the suites and everything, which would take care of that problem, but then my next question is what will happen if a test changes its name (because of a typo, or its purpose changes). The entry in the file with the old name won't get cleaned up so it'll live in the file forever, I think?

In general, I don't like the idea of needing side-effects (disk writes) for tests to work if not strictly necessary. Having it be partially driven by an env variable seems to me like it introduces more complexity and things one needs to know (that are not super discoverable) in order to work on this kind of test. When trying to update the baseline for a given test, one needs to be careful to only run that test, because otherwise we would potentially be updating baselines for all memory tests; it would probably get caught during PR review, but it still doesn't seem ideal to me.

I think I'd like to see an argument for why the file is necessary or better than other options. To me, the "locality" of being able to look at a test in the source file and right there see (and be able to adjust) its expected baseline and allowed variance feels super useful in comparison.

@github-actions github-actions Bot removed the area: framework (Framework is a tag for issues involving the developer framework, e.g. Aqueduct) and area: dds: tree labels Apr 22, 2025
Contributor

@alexvy86 alexvy86 left a comment

Looking better :) . Next batch of comments. It's hard to unit-test changes to the benchmark tool, but I'd like to see evidence of using the changes in a test in the client release group (the map test that I ask below to be split into a separate PR is a good candidate for that). What does the output look like when it's within threshold, when it's above, and when it's below (just tweak the baseline and deviation to force it above/below), with and without the ENV variable?

Comment thread packages/dds/map/src/test/memory/map.spec.ts
Comment thread tools/benchmark/src/mocha/memoryTestRunner.ts Outdated
Comment thread tools/benchmark/src/mocha/memoryTestRunner.ts Outdated
Comment thread tools/benchmark/src/mocha/memoryTestRunner.ts Outdated
Comment thread tools/benchmark/src/mocha/memoryTestRunner.ts
Comment thread tools/benchmark/src/mocha/memoryTestRunner.ts Outdated
Comment thread tools/benchmark/src/mocha/memoryTestRunner.ts Outdated
Comment thread tools/benchmark/src/mocha/memoryTestRunner.ts Outdated
Comment thread tools/benchmark/src/mocha/memoryTestRunner.ts Outdated
Comment thread tools/benchmark/api-report/benchmark Outdated
@github-actions github-actions Bot removed the area: dds (Issues related to distributed data structures) label Apr 24, 2025
@WayneFerrao
Contributor Author

Here are some screenshots with example usage in the SharedMap memory tests.

ENABLE_MEM_REGRESSION set to true with high values passed in
(screenshot)

ENABLE_MEM_REGRESSION set to true with low values passed in
(screenshot)

ENABLE_MEM_REGRESSION set to false with high values passed in. Prints warning and continues with test.
(screenshot)
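A test object supplying the new options might look like the sketch below. The class shape loosely follows the "Create empty map" excerpt earlier in the thread, but the baseline and tolerance values here are made up for illustration, and the body of run is elided.

```typescript
// Illustrative sketch of a memory-test object that opts into regression
// checking by supplying a baseline and tolerance; values are hypothetical.
class CreateEmptyMapBenchmark {
	public readonly title = "Create empty map";
	public readonly minSampleCount = 500;
	public readonly baselineMemoryUsage = 50_000; // bytes (illustrative)
	public readonly allowedDeviationBytes = 5_000; // bytes (illustrative)

	public async run(): Promise<void> {
		// ... create the SharedMap under measurement ...
	}
}

const benchmark = new CreateEmptyMapBenchmark();
```

With ENABLE_MEM_REGRESSION set, exceeding baselineMemoryUsage + allowedDeviationBytes fails the test; otherwise a warning is printed and the test continues, as the screenshots show.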

/**
* The baseline memory usage to compare against for the test, which is used to determine if the test regressed.
* If not specified, the test will not be compared against a baseline and will only be run to measure the memory usage.
* @remarks Should be specified in bytes.
Contributor

This is what I had in mind for documenting the env variable. People writing memory tests and using the API cannot see the docs for the const in this file, but they can see the ones for these properties, so this is where we can best communicate to them how to use ENABLE_MEM_REGRESSION.

Suggested change
* @remarks Should be specified in bytes.
* @remarks
* Should be specified in bytes.
* If `ENABLE_MEM_REGRESSION=1` in the environment, a test whose memory usage falls outside `baselineMemoryUsage +- allowedDeviationBytes` will be marked as failed.
* Otherwise, a warning is printed to the console.

Contributor

Re-opening this comment. I see the updated docs in the property below but not in this one.

Comment thread tools/benchmark/src/mocha/memoryTestRunner.ts Outdated
Comment thread tools/benchmark/src/mocha/memoryTestRunner.ts Outdated
Comment thread tools/benchmark/src/mocha/runner.ts Outdated
Comment thread tools/benchmark/src/mocha/runner.ts Outdated
Comment thread tools/benchmark/src/mocha/memoryTestRunner.ts Outdated
Comment thread tools/benchmark/src/mocha/memoryTestRunner.ts Outdated
Comment thread tools/benchmark/src/mocha/memoryTestRunner.ts Outdated
Comment thread tools/benchmark/src/mocha/memoryTestRunner.ts Outdated
Comment thread tools/benchmark/package.json Outdated
formattedValue: prettyNumber(runs, 0),
};

if (baselineMemoryUsage >= 0 && allowedDeviationBytes >= 0) {
Contributor

Looks like there are some duplications. More readable way is like:

Suggested change
if (baselineMemoryUsage >= 0 && allowedDeviationBytes >= 0) {
if (avgHeapUsed > upperBound) {
reportMemoryIssue(
`Memory regression detected for test '${testTitle}': Used '${avgHeapUsed.toPrecision(
6,
)}' bytes, baseline '${baselineMemoryUsage}', tolerance '${allowedDeviationBytes}' bytes.\n`,
);
} else if (avgHeapUsed < lowerBound) {
reportMemoryIssue(
`Possible memory improvement detected for test '${testTitle}': Used '${avgHeapUsed.toPrecision(
6,
)}' bytes, baseline '${baselineMemoryUsage}', tolerance '${allowedDeviationBytes}' bytes. Consider updating the baseline.\n`,
);
}

Contributor

Suggested change
if (baselineMemoryUsage >= 0 && allowedDeviationBytes >= 0) {
function reportMemoryIssue(message: string): void {
if (ENABLE_MEM_REGRESSION) {
throw new Error(message);
} else {
process.stdout.write(chalk.yellow(message));
}
}

Contributor Author

Great idea, thanks!

@github-actions
Contributor

github-actions Bot commented May 8, 2025

🔗 No broken links found! ✅

Your attention to detail is admirable.

linkcheck output


> fluid-framework-docs-site@0.0.0 ci:check-links /home/runner/work/FluidFramework/FluidFramework/docs
> start-server-and-test "npm run serve -- --no-open" 3000 check-links

1: starting server using command "npm run serve -- --no-open"
and when url "[ 'http://127.0.0.1:3000' ]" is responding with HTTP status code 200
running tests using command "npm run check-links"


> fluid-framework-docs-site@0.0.0 serve
> docusaurus serve --no-open

[SUCCESS] Serving "build" directory at: http://localhost:3000/

> fluid-framework-docs-site@0.0.0 check-links
> linkcheck http://localhost:3000 --skip-file skipped-urls.txt

Crawling...

Stats:
  195689 links
    1565 destination URLs
    1797 URLs ignored
       0 warnings
       0 errors


Contributor

@alexvy86 alexvy86 left a comment

🚀 . Once it merges, let's make sure to do a release of this package and consume it in the client release group so we can start leveraging the feature :)

@WayneFerrao WayneFerrao merged commit 71975c1 into microsoft:main May 8, 2025
31 checks passed

Labels

base: main (PRs targeted against main branch), public api change (Changes to a public API)


4 participants