feat(benchmark): Add memory regression tests #24092
WayneFerrao merged 29 commits into microsoft:main
Conversation
```ts
try {
	return JSON.parse(fs.readFileSync(baselineFilePath, "utf8")) as Record<string, number>;
} catch {
	return {};
```
Add a console error? `console.error("Error loading baselines:", error);`
Based on the function docs, it is okay for the baseline to be missing. Might be worth returning undefined in that case though, rather than an empty record.
Also, this will fail if the file is missing, but it can also fail for other reasons (malformed file contents, for example). It might be better to first check if the file exists and return early if it doesn't. If it does exist, we probably don't want to eat errors that occur while reading the file.
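A sketch of what that suggestion could look like, assuming hypothetical names (`loadBaselines` taking a `filePath` parameter) rather than the PR's actual code:

```typescript
import * as fs from "node:fs";

/**
 * Loads recorded baselines from disk. A missing file is an expected state
 * and yields `undefined`; any other failure (e.g. malformed JSON in a file
 * that does exist) is deliberately NOT swallowed, so it surfaces to the caller.
 */
function loadBaselines(filePath: string): Record<string, number> | undefined {
	// Return early when no baseline file exists -- that is a normal state.
	if (!fs.existsSync(filePath)) {
		return undefined;
	}
	// No try/catch here: read/parse errors on an existing file indicate real problems.
	return JSON.parse(fs.readFileSync(filePath, "utf8")) as Record<string, number>;
}
```

This keeps the "missing baseline is fine" behavior while letting genuine I/O or parse errors fail the run loudly.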
```ts
const baselines = loadBaselines();
baselines[testTitle] = memoryUsage;
// eslint-disable-next-line unicorn/no-null
fs.writeFileSync(baselineFilePath, JSON.stringify(baselines, null, 2));
```
Specify the encoding as `"utf8"` in the `fs.writeFileSync()` call to ensure consistent file writing, like: `fs.writeFileSync(baselineFilePath, JSON.stringify(baselines, null, 2), "utf8");`
```ts
public readonly title = "Create empty map";
public readonly minSampleCount = 500;

public baselineMemoryUsage = loadBaselines()[this.title] ?? 0;
```
```ts
/**
 * The baseline memory usage to compare against for the test, which is used to determine if the test regressed.
 */
baselineMemoryUsage?: number;
```
Cool, I didn't know you could specify that in the interface
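For readers unfamiliar with the pattern being discussed, here is a minimal, self-contained sketch (all names illustrative, not the actual benchmark-tool types): an optional property declared on the interface, with the implementing class supplying its value via a property initializer.

```typescript
interface IMemoryTestObject {
	readonly title: string;
	/**
	 * The baseline memory usage to compare against, in bytes.
	 * Optional: when omitted, no regression comparison is performed.
	 */
	baselineMemoryUsage?: number;
}

class CreateEmptyMapBenchmark implements IMemoryTestObject {
	public readonly title = "Create empty map";
	// The class's property initializer satisfies (and effectively provides a
	// default for) the optional interface member.
	public baselineMemoryUsage = 0;
}
```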
```ts
	category: testObject.category ?? "",
};
```

```ts
const ALLOWED_DEVIATION = 5;
```

```ts
if (avgHeapUsed > allowedMemoryUsage) {
	throw new Error(
		`Memory Regression detected for ${testObject.title}: Used ${avgHeapUsed} bytes, exceeding the baseline of ${allowedMemoryUsage} bytes.`,
```
Nit: `allowedMemoryUsage` isn't the actual baseline, it's the baseline plus the allowed deviation. It might be better to express this like:

```ts
`Memory Regression detected for ${testObject.title}: Used ${avgHeapUsed} bytes, exceeding the baseline of ${args.baselineMemoryUsage} bytes, with an allowed tolerance of ${tolerance}.`,
```
Good catch, will fix!
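A small sketch of the suggested fix, reporting the baseline and the tolerance as separate numbers (the function name and parameters here are stand-ins for the PR's actual variables):

```typescript
// Builds the regression error message so the raw baseline and the allowed
// tolerance are reported separately, instead of only their sum.
function buildRegressionMessage(
	title: string,
	avgHeapUsed: number,
	baselineMemoryUsage: number,
	tolerance: number,
): string {
	return `Memory Regression detected for ${title}: Used ${avgHeapUsed} bytes, exceeding the baseline of ${baselineMemoryUsage} bytes, with an allowed tolerance of ${tolerance} bytes.`;
}
```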
```diff
@@ -0,0 +1 @@
+{}
```
Will there be 1 of these files per test? Or a single file with all of the test benchmarks?
Josmithr left a comment:
A couple of high-level points of feedback:
- We should definitely have at least some usages of this in place before merging
- We should document (in the benchmark package README) how to use the new kinds of benchmarks in tests.
alexvy86 left a comment:
I don't feel too comfortable with the current approach. Did we consider other alternatives for how to keep track of the baselines? That each test file needs to have (or import) functions to read/write baseline files feels weird to me, and I wonder how things would look if each test could just pass its baseline as an optional parameter to benchmarkMemory(), and the tool would take care of the necessary comparisons. Having baselines be part of source control also seems convenient.
Along the same lines, maybe having the allowed deviation be a parameter that each test can define (with a reasonable default) would give us more flexibility? Some tests might be able to live with tighter targets, but others might need wider ones. Memory measurements are notoriously finicky, so I'd be concerned about having a global variability threshold that could make some tests flaky with little recourse other than making changes in benchmark tool again.
To @Josmithr 's point:
We should definitely have at least some usages of this in place before merging
I agree, but it'll have to be a 2-step process anyway because changes to benchmark tool need to be published and consumed in the client release group separately. Testing those changes before merging them by locally linking benchmark-tool is a good idea though. One other advantage of having all this be parameters on benchmarkMemory() is that we might be able to write unit tests for it in benchmark-tool itself.
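The parameter-driven alternative described above could look roughly like this; everything here (the option names, the default value, the helper function) is an assumption for illustration, not the actual benchmark-tool API:

```typescript
interface MemoryBenchmarkOptions {
	title: string;
	/** Expected heap usage in bytes; omit to only measure, never compare. */
	baselineMemoryUsage?: number;
	/** Per-test tolerance in bytes; falls back to a tool-provided default. */
	allowedDeviationBytes?: number;
}

// Assumed default; in practice the tool would pick something sensible.
const DEFAULT_ALLOWED_DEVIATION_BYTES = 1024;

// Returns a diagnostic string when usage falls outside the allowed band,
// or undefined when the test is within tolerance (or has no baseline).
function compareToBaseline(
	avgHeapUsed: number,
	options: MemoryBenchmarkOptions,
): string | undefined {
	const { baselineMemoryUsage, allowedDeviationBytes = DEFAULT_ALLOWED_DEVIATION_BYTES } =
		options;
	if (baselineMemoryUsage === undefined) {
		return undefined; // no baseline provided: measurement-only mode
	}
	if (avgHeapUsed > baselineMemoryUsage + allowedDeviationBytes) {
		return `regression: ${options.title}`;
	}
	if (avgHeapUsed < baselineMemoryUsage - allowedDeviationBytes) {
		return `improvement: ${options.title}`;
	}
	return undefined;
}
```

With this shape, the baseline and tolerance live next to the test definition in source control, and the comparison logic in the tool becomes straightforward to unit test.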
…deviation as optional param
Still not super convinced about the idea of the file, to be honest 😅. One immediate problem is that if we key the entries in the file just by test title, we could run into conflicts if two tests in different suites have the same title.

In general, I don't like the idea of needing side effects (disk writes) for tests to work if not strictly necessary. Having it be partially driven by an env variable seems to me like it introduces more complexity and things one needs to know (and that are not super discoverable) in order to work on this kind of test. When trying to update the baseline for a given test, one needs to be careful to only run that test, because otherwise we would potentially be updating baselines for all memory tests; it would probably get caught during PR review, but it still doesn't seem ideal to me.

I think I'd like to see an argument for why the file is necessary or better than other options. To me, the "locality" of being able to look at a test in the source file and right there see (and be able to adjust) its expected baseline and allowed variance feels super useful in comparison.
alexvy86 left a comment:
Looking better :). Next batch of comments. It's hard to unit-test changes to the benchmark tool, but I'd like to see evidence of using the changes in a test in the client release group (the one for map that I ask below to be split into a separate PR is a good candidate for that). What does the output look like when it's within the threshold, when it's above, and when it's below (just tweak the baseline and deviation to force it above/below), with and without the env variable?
```ts
/**
 * The baseline memory usage to compare against for the test, which is used to determine if the test regressed.
 * If not specified, the test will not be compared against a baseline and will only be run to measure the memory usage.
 * @remarks Should be specified in bytes.
```
This is what I had in mind for documenting the env variable. People writing memory tests and using the API cannot see the docs for the const in this file, but they can see the ones for these properties, so this is where we can best communicate to them how to use ENABLE_MEM_REGRESSION.
```diff
- * @remarks Should be specified in bytes.
+ * @remarks
+ * Should be specified in bytes.
+ * If `ENABLE_MEM_REGRESSION=1` in the environment, a test whose memory usage falls outside `baselineMemoryUsage +- allowedDeviationBytes` will be marked as failed.
+ * Otherwise, a warning is printed to the console.
```
Re-opening this comment. I see the updated docs in the property below but not in this one.
```ts
	formattedValue: prettyNumber(runs, 0),
};

if (baselineMemoryUsage >= 0 && allowedDeviationBytes >= 0) {
```
Looks like there is some duplication. A more readable way would be:
```ts
if (baselineMemoryUsage >= 0 && allowedDeviationBytes >= 0) {
	if (avgHeapUsed > upperBound) {
		reportMemoryIssue(
			`Memory regression detected for test '${testTitle}': Used '${avgHeapUsed.toPrecision(
				6,
			)}' bytes, baseline '${baselineMemoryUsage}', tolerance '${allowedDeviationBytes}' bytes.\n`,
		);
	} else if (avgHeapUsed < lowerBound) {
		reportMemoryIssue(
			`Possible memory improvement detected for test '${testTitle}': Used '${avgHeapUsed.toPrecision(
				6,
			)}' bytes, baseline '${baselineMemoryUsage}', tolerance '${allowedDeviationBytes}' bytes. Consider updating the baseline.\n`,
		);
	}
```
```ts
function reportMemoryIssue(message: string): void {
	if (ENABLE_MEM_REGRESSION) {
		throw new Error(message);
	} else {
		process.stdout.write(chalk.yellow(message));
	}
}
```
Great idea, thanks!
alexvy86 left a comment:
🚀 . Once it merges, let's make sure to do a release of this package and consume it in the client release group so we can start leveraging the feature :)



AB#32591
Description
This WIP PR introduces memory regression testing to ensure that memory usage remains stable across test runs. This is in line with goals to strengthen the overall reliability of the DDSes by ensuring memory-related issues are proactively caught in testing.
The new logic detects regressions and allows baselines to be set dynamically. Regression checking is handled inside `benchmarkMemory`. Individual test objects that pass in `baselineMemoryUsage` and `allowedDeviationBytes` do not need to manually check for regressions.

Key Changes