-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add performance report #1
base: main
Are you sure you want to change the base?
Conversation
wesley-weiming
commented
Dec 6, 2023
- We submitted a performance report
- Supports automatically generating project-impact-graph.yaml for a given number of projects and dependencies, and then you can test it yourself
- In the performance test, we found that using "minimatch" for glob matching would lead to performance degradation (For example, in the case of 2,000 projects and 10,000 dependencies, the time to calculate 100 paths reaches minutes). So we need to change the path matching method of the algorithm and use string matching instead of glob matching. This means that fields such as includeGlobs and excludeGlobs in the file schema must also discard the representation of glob.
projects: | ||
A: | ||
includedGlobs: | ||
- projects/folder_A/** | ||
- projects/folder_A/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you elaborate the syntax used here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
projects:
A:
includedGlobs:
- projects/folder_A/
excludedGlobs:
- projects/folder_A/README.md
dependentProjects:
- G
This semantics describes a project named 'A', and 'includedGlobs' is used to specify the files that should be included in project 'A'. 'excludedGlobs' is a subset of 'includedGlobs', used to specify which paths in project 'A' need to be filtered. 'dependentProjects' indicates which projects directly depend on 'A'
projects/folder_A/
, Because we replaced glob match with startsWith
, this folder path is used here to represent project A.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like it is no longer included/excludedGlobs
. It's included/excluedPrefix
now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, it is no longer appropriate to continue using 'glob'
@@ -0,0 +1,148 @@ | |||
import path from 'path'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a REAMD to teach us how to run the performance?
My guess now is run this file with node
. Is it true?
Even better, you can put the performance result with the running environment info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, let me add a new commit
@@ -2,11 +2,11 @@ globalExcludedGlobs: | |||
- OWNERS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is these file names still working? The implementation has been changed to match with startsWith
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OWNERS
build.sh
bootstrap.sh
These represent public configuration files in the root directory, OWNERS means repoRootDir/OWNERS
, which is still available for startsWith
@@ -4,7 +4,6 @@ | |||
import fs from 'fs'; | |||
import yaml from 'yaml'; | |||
import _ from 'lodash'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is lodash
used in any inner loops? Some time ago a perf investigation showed that Lodash's algorithms are often extremely inefficient due to so many layers of abstractions in its code base.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not used in loops, lodash
is only used twice in a complete calculation process (using its cloneDeep API to clone the graph structure)
| 3000 | 100000 | 1 | 1 | 9.46s | | ||
| 3000 | 100000 | 10 | 10 | 10.533s | | ||
| 3000 | 100000 | 100 | 100 | 11.029s | | ||
| 3000 | 100000 | 1000 | 1000 | 11.984s | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In https://github.com/tiktok/project-impact-graph/pull/3/files I've added some instrumentation to count the number of times each part of the loop is executed.
Here's one of the test cases:
[
{
nodeCount: 2000,
edgeCount: 10000,
pathCountA: 1000,
pathCountB: 1000,
hasImpactIntersection: true,
executeTime: '1.513s'
}
]
{
_integrateExcludedGlobs: 1,
_integrateExcludedGlobs2: 2000,
_validatePaths: 2,
_validatePaths2: 4008000,
lookUpProjectNamesByPathList: 2,
lookUpProjectNamesByPathList2: 4000000,
lookUpProjectNamesByPathList3: 4000000,
lookUpProjectNamesByPathList4: 5780,
getProjectImpactByProjectNames: 2,
getProjectImpactByProjectNames2: 4000,
getProjectImpactByProjectNames3: 24000,
hasImpactIntersection: 1
}
The nodeCount
is the number of projects, and pathCountA
and pathCountB
are the "before" and "after" lists of paths from the diff. The bottleneck of this algorithm seems to be 4,000,000 which is O(numberOfPaths * numberOfProjects)
.
But notice that our project paths have a well-behaved structure, for example:
INCLUDE
apps/my-app/**
EXCLUDE
apps/my-app/README.md
INCLUDE
apps/my-app2/**
EXCLUDE
apps/my-app2/README.md
INCLUDE
libraries/my-lib3/**
EXCLUDE
libraries/my-lib3/README.md
INCLUDE
libraries/my-lib4/**
EXCLUDE
libraries/my-lib4/README.md
INCLUDE
libraries/my-lib4/bad-nested-project/**
EXCLUDE
libraries/my-lib4/bad-nested-project/README.md
Even if we permit projects to be nested under other project folders, the prefixes of these globs still form a tree. For an example input path libraries/my-lib3/src/index.ts
, imagine an O(n*log(n))
algorithm that would cheaply walk down libraries
-> my-lib3
, and then need to test only 2 globs **
and README.md
.
This idea is similar to rush-lib/src/logic/LookupByPath.ts.