Improve performance in environments with long search paths #17948
See #17948

There's one call site which has varargs that I leave as os.path.join; it doesn't show up on my profile. I do see the `endswith` on the profile; we could try `path[-1] == '/'` instead (could save a few dozen milliseconds).

In my work environment, this is about a 10% speedup:

```
λ hyperfine -w 1 -M 3 '/tmp/mypy_primer/timer_mypy_6eddd3ab1/venv/bin/mypy -c "import torch" --no-incremental --python-executable /opt/oai/bin/python'
Benchmark 1: /tmp/mypy_primer/timer_mypy_6eddd3ab1/venv/bin/mypy -c "import torch" --no-incremental --python-executable /opt/oai/bin/python
  Time (mean ± σ):     30.842 s ±  0.119 s    [User: 26.383 s, System: 4.396 s]
  Range (min … max):   30.706 s … 30.927 s    3 runs
```

Compared to:

```
λ hyperfine -w 1 -M 3 '/tmp/mypy_primer/timer_mypy_88ae62b4a/venv/bin/mypy -c "import torch" --no-incremental --python-executable /opt/oai/bin/python'
Benchmark 1: /tmp/mypy_primer/timer_mypy_88ae62b4a/venv/bin/mypy -c "import torch" --no-incremental --python-executable /opt/oai/bin/python
  Time (mean ± σ):     34.161 s ±  0.163 s    [User: 29.818 s, System: 4.289 s]
  Range (min … max):   34.013 s … 34.336 s    3 runs
```

In the toy "long" environment mentioned in the issue, this is about a 7% speedup:

```
λ hyperfine -w 1 -M 3 '/tmp/mypy_primer/timer_mypy_6eddd3ab1/venv/bin/mypy -c "import torch" --no-incremental --python-executable long/bin/python'
Benchmark 1: /tmp/mypy_primer/timer_mypy_6eddd3ab1/venv/bin/mypy -c "import torch" --no-incremental --python-executable long/bin/python
  Time (mean ± σ):     23.177 s ±  0.317 s    [User: 20.265 s, System: 2.873 s]
  Range (min … max):   22.815 s … 23.407 s    3 runs
```

Compared to:

```
λ hyperfine -w 1 -M 3 '/tmp/mypy_primer/timer_mypy_88ae62b4a/venv/bin/mypy -c "import torch" --python-executable=long/bin/python --no-incremental'
Benchmark 1: /tmp/mypy_primer/timer_mypy_88ae62b4a/venv/bin/mypy -c "import torch" --python-executable=long/bin/python --no-incremental
  Time (mean ± σ):     24.838 s ±  0.237 s    [User: 22.038 s, System: 2.750 s]
  Range (min … max):   24.598 s … 25.073 s    3 runs
```

In the "clean" environment, this is a 1% speedup, but below the noise floor.
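For illustration, a minimal sketch of the kind of change this describes; the function names and exact form here are assumptions for the sketch, not mypy's actual code:

```python
# Sketch only: a hot-path join, assuming '/'-separated, non-empty base paths.
import os


def join_original(base: str, name: str) -> str:
    # General-purpose, but its bookkeeping shows up on profiles when
    # called millions of times across a long search path.
    return os.path.join(base, name)


def join_fast(base: str, name: str) -> str:
    # Plain concatenation; indexing base[-1] also avoids the slightly
    # more expensive str.endswith call mentioned above.
    if base[-1] == "/":
        return base + name
    return base + "/" + name
```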
What about filtering the module search path based on the first component(s) of the target module name? We could create a dict that maps a module name prefix to the search path entries where that prefix can be found. If many search path entries have the same directory/namespace package, those entries would all still need to be searched. To determine the effective search path for a module, we'd look up prefixes of length 2 and 1, falling back to the full search path when neither is in the dict.
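A rough sketch of what that could look like; this version indexes only length-1 prefixes and uses hypothetical names (`build_prefix_index`, `effective_search_path`), so it illustrates the idea rather than mypy's implementation:

```python
# Sketch of the prefix-filtering idea, assuming a POSIX-style search path.
import os
from collections import defaultdict


def build_prefix_index(search_path: list[str]) -> dict[str, list[str]]:
    """Map each top-level name found in a search path entry to the entries
    that contain it, so module lookup only scans a handful of entries."""
    index: dict[str, list[str]] = defaultdict(list)
    for entry in search_path:
        try:
            names = os.listdir(entry)
        except OSError:
            continue  # entry doesn't exist or isn't readable
        for name in names:
            # 'foo.py', 'foo.pyi', and 'foo/' all provide the prefix 'foo'.
            prefix = name.removesuffix(".pyi").removesuffix(".py")
            index[prefix].append(entry)
    return dict(index)


def effective_search_path(
    index: dict[str, list[str]], search_path: list[str], module: str
) -> list[str]:
    # Only length-1 prefixes are indexed in this sketch; the comment above
    # suggests also looking up length-2 prefixes for finer filtering.
    top_level = module.split(".")[0]
    return index.get(top_level, search_path)
```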
See #17948

This is about 1.06x faster on `mypy -c 'import torch'` (in both the clean and openai environments):

- 19.094 -> 17.896
- 34.161 -> 32.214

```
λ hyperfine -w 1 -M 3 '/tmp/mypy_primer/timer_mypy_36738b392/venv/bin/mypy -c "import torch" --no-incremental --python-executable clean/bin/python'
Benchmark 1: /tmp/mypy_primer/timer_mypy_36738b392/venv/bin/mypy -c "import torch" --no-incremental --python-executable clean/bin/python
  Time (mean ± σ):     17.896 s ±  0.130 s    [User: 16.472 s, System: 1.408 s]
  Range (min … max):   17.757 s … 18.014 s    3 runs

λ hyperfine -w 1 -M 3 '/tmp/mypy_primer/timer_mypy_36738b392/venv/bin/mypy -c "import torch" --no-incremental --python-executable /opt/oai/bin/python'
Benchmark 1: /tmp/mypy_primer/timer_mypy_36738b392/venv/bin/mypy -c "import torch" --no-incremental --python-executable /opt/oai/bin/python
  Time (mean ± σ):     32.214 s ±  0.106 s    [User: 29.468 s, System: 2.722 s]
  Range (min … max):   32.098 s … 32.305 s    3 runs
```
Recording new baseline numbers here for eb816b0 (after a few of the PRs above have been merged):
Compared to bd9200b we are:
See python#17948. Haven't run the benchmark yet, but the profile indicates that this could save 0.5s on both incremental and non-incremental builds in environments with long search paths.
See #17948

This is starting to show up on profiles:

- 1.01x faster on clean (below noise)
- 1.02x faster on long
- 1.02x faster on openai
- 1.01x faster on openai incremental

I had a dumb bug that was preventing the optimisation for a while; I'll see if I can make it even faster. Currently it's a small improvement. We could also get rid of the legacy stuff in mypy 2.0.
See #17948

- 1.01x faster on clean
- 1.06x faster on long
- 1.04x faster on openai
- 1.26x faster on openai incremental
New numbers for c201a18 (with orjson installed):
The openai environment I was using previously got mutated, so not posting raw numbers for that. In the following, I re-ran the bd9200b baseline in a similar environment to get fair openai comparisons. Compared to bd9200b we are:
@hauntsaninja Are you interested in looking into filtering the search path (see my comment above)? If not, I might have a look at it at some point.
Yup, I'm interested in looking into it.
Posting more numbers:
Okay, with #18038 and the follow-up #18045, we're down to within noise. See timings here: #18045 (comment). So I think we can call this complete! :-)
In my work environment, we editably install most Python packages. This leads to long search paths, e.g. 200 entries is common. I think it should be possible to significantly improve mypy's performance in this case.
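For a rough sense of scale, counting `sys.path` entries approximates this; mypy's actual module search path also includes stubs and config-driven entries, so treat this as an approximation:

```python
# Rough gauge of an environment's search path length.
import sys

print(len(sys.path), "entries")
```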
My benchmark workload is `mypy -c "import torch"` on a mypyc-compiled mypy with compile level 3. I'll run it in the following environments:
- clean
- long
- openai: This is my main dev environment. I'll see if I can make an artificial environment that matches its performance characteristics more closely (this is pretty easy; I just need to install a bunch of third-party libraries).
- bd9200b is my baseline commit
- #17920 has already provided a big win here
- 88ae62b was the commit I measured
You can see that mypy in my environment is still 1.8x slower than it could be (and 1.3x slower in the reproducible toy environment).
Some ideas for things to experiment with: