[RC] Release candidate for version 0.3.1 #442
Commits on Mar 5, 2024
Fixes to make the simplest conv work (#22)
A simple model with a single conv2d failed.
- Fix the signatures of the conv* ops to correspond to torch.nn.functional.
- Add the missing padding normalization.

After these changes the model works.
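The padding-normalization fix is easiest to see in code. A minimal sketch, assuming the usual convention where an integer padding applies to every spatial dimension (the helper name is illustrative, not this commit's exact code):

```python
from typing import Sequence, Tuple, Union

def normalize_padding(padding: Union[int, Sequence[int]], dims: int = 2) -> Tuple[int, ...]:
    """Expand an int into one padding value per spatial dimension."""
    if isinstance(padding, int):
        return (padding,) * dims
    padding = tuple(padding)
    if len(padding) != dims:
        raise ValueError(f"expected {dims} padding values, got {len(padding)}")
    return padding

# normalize_padding(1) == (1, 1), matching torch.nn.functional.conv2d(padding=1)
```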
Commit: 5169410
[Torch][Operator] Some operator support (#49)
Partial changes related to #18
Commit: 8cccced
Add .vscode to .gitignore (#61)
Commit: 75ca9e1
[CI] Shut down on demand runners on failure (#64)
Previously, if a performance regression run failed due to an exception, the job that stops the runner VM instances would be skipped, leaving the instances running. This change makes the stop_instances job run even when previous jobs fail. It is unclear whether always() will override the inputs.shutdown_instances flag; if it does, we can move it into the step scope.
Commit: 3dab1f2
Commit: c15fbac
Commits on Mar 6, 2024
[Graph] Add GroupNorm module (#70)
A module wrapper around the groupnorm operator; supports compiled app development.
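As a rough sketch of this wrapper pattern (written against plain torch so it is self-contained; the hidet module's actual fields may differ), the module owns the affine parameters and delegates the math to the underlying operator:

```python
import torch

class GroupNorm(torch.nn.Module):
    """Illustrative wrapper: parameters live here, the operator does the work."""

    def __init__(self, num_groups: int, num_channels: int, eps: float = 1e-5):
        super().__init__()
        self.num_groups = num_groups
        self.eps = eps
        self.weight = torch.nn.Parameter(torch.ones(num_channels))
        self.bias = torch.nn.Parameter(torch.zeros(num_channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.nn.functional.group_norm(
            x, self.num_groups, self.weight, self.bias, self.eps
        )
```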
Commit: 6b02878
Commits on Mar 13, 2024
[App] Resnet Compiled App - Modeling (1/2) (#47)
Adds ResNet model functionality and a model hierarchy for compiled apps. Some comments in the files are artifacts left for the pipeline interface (part 2 of this PR). See the huggingface implementation for the original API inspiration. Resolves #59
Commit: abec2ee
Commits on Mar 20, 2024
CI perf tests refactoring (#85)
- Move scripts from `.github/scripts` to `tests/benchmarks`.
- Move `run_configs.json` (which describes what perf tests we run) from the hidet-ci repo to this repo.
- Add individual operators' benches via the torch API (not added to the CI run yet).
- Unify the scripts so they can run either hidet or inductor as the backend (see the sketch below).
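A minimal sketch of benchmarking an operator via the torch API with a switchable backend (the timing helper is illustrative; `backend="hidet"` assumes hidet is installed and registers its torch.compile backend):

```python
import time
import torch

def bench_bmm(backend: str, warmup: int = 10, iters: int = 100) -> float:
    """Return average milliseconds per batched matmul with the given backend."""
    a = torch.randn(1, 4096, 4096, dtype=torch.float16, device="cuda")
    b = torch.randn(1, 4096, 4096, dtype=torch.float16, device="cuda")
    matmul = torch.compile(torch.bmm, backend=backend)  # "hidet" or "inductor"
    for _ in range(warmup):
        matmul(a, b)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        matmul(a, b)
    torch.cuda.synchronize()
    return (time.time() - start) / iters * 1e3

# e.g. compare bench_bmm("hidet") against bench_bmm("inductor")
```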
Commit: 30f299e
Commits on Mar 21, 2024
Increase batch size for vision benchmarks (#86)
Increase the batch size for vision benchmarks from 1 to 128 to be closer to a real-life workload and to decrease timing fluctuation.
Commit: 5034b31
Commits on Mar 28, 2024
Add bias to the Conv2d module. Defaults to False for backward compatibility; **this is different from the torch default**. Towards #57
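To make the difference concrete (the second call below is illustrative, not quoted from the commit):

```python
import torch

# torch's Conv2d enables bias by default:
conv = torch.nn.Conv2d(3, 16, kernel_size=3)
print(conv.bias is not None)  # True

# The Conv2d module in this commit defaults to bias=False instead, so a
# torch-equivalent layer must opt in explicitly (illustrative call):
# conv = Conv2d(in_channels=3, out_channels=16, kernel_size=3, bias=True)
```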
Commit: 487aada
[Fixbug] Mark slow compiled app test (#89)
Flag slow tests caused by the huggingface dependency (2 hrs), to be debugged on private CI runs. Resolves #87.
Commit: 01b7b92
[Graph] Add basic UNet module components (#93)
Add some necessary module components used frequently in Stable Diffusion's UNet. Includes fixes to module attribute access from the LLM branch, and workarounds for torch weight copying. Towards #57.
Commit: 6566437
Commits on Mar 29, 2024
[Dynamo] Refactor get_wrapper and pickling compiled graph (#78)
The CentML compilation backend I am working on wants to wrap the CompiledGraph's forward function (the one returned by get_wrapper) in a torch.fx.GraphModule. This GraphModule would then be pickled and sent from a server to a client. However, it isn't possible to pickle the lambda/local function returned by get_wrapper.

Therefore, I am turning get_wrapper into a class, CompiledForwardFunction, whose forward function behaves like the wrapper returned by get_wrapper. Additionally, in order to pickle CompiledForwardFunction, I have defined pickling and unpickling behaviour for CompiledGraph using __getstate__ and __setstate__ respectively. These just call CompiledGraph's existing save and load functions.
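A hedged sketch of that __getstate__/__setstate__ pattern (the buffer-based save/load signatures are assumptions here, not hidet's confirmed API):

```python
import io

class CompiledGraph:
    # ... existing fields, plus the existing save()/load() serializers ...

    def __getstate__(self):
        # Reuse the existing serializer: dump the compiled graph to bytes.
        buffer = io.BytesIO()
        self.save(buffer)  # assumed: save() accepts a file-like object
        return {"payload": buffer.getvalue()}

    def __setstate__(self, state):
        # Reuse the existing loader to rebuild this instance from bytes.
        loaded = CompiledGraph.load(io.BytesIO(state["payload"]))  # assumed signature
        self.__dict__.update(loaded.__dict__)
```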
Commit: 53ade32
Commits on Apr 1, 2024
[IR] IR data types: add @cached_property for constants; 15% improvement (#104)
Add `@cached_property` for constants in the IR data types to improve compilation time. Measured with `$ python bench_op.py matmul_f16 --params 1x4096x4096,1x4096x4096 --dtype float16` and `hidet.option.parallel_tune(max_parallel_jobs=1)`. **Before: 152.5 sec; after: 132.5 sec; roughly a 15% improvement.**
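The underlying pattern is functools.cached_property: compute the constant once on first access and store it on the instance. A minimal sketch with illustrative names (not the exact hidet IR classes):

```python
from functools import cached_property

class DataType:
    def __init__(self, name: str, nbytes: int):
        self.name = name
        self.nbytes = nbytes

    @cached_property
    def one(self) -> "Constant":
        # Built on first access, then cached on the instance, so hot paths in
        # compilation stop reconstructing the same constant repeatedly.
        return Constant(1, self)

class Constant:
    def __init__(self, value, dtype: DataType):
        self.value = value
        self.dtype = dtype
```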
Commit: ac6b8bd
Commits on Apr 2, 2024
[Graph] Cross Attention Module Support (#94)
Add a graph module for using flash attention, and clarify some differences between flash attention and torch sdpa. **Attention (pun intended):** softmax has a temperature-scaling option that divides its inputs by a scalar; a good explanation of the numerical effects is [here](https://medium.com/@harshit158/softmax-temperature-5492e4007f71). It is used when the softmax inputs QK are too big for float16 (absolute value > 65504). This usually means the numbers are so large that dividing by a small (< 4) scalar has little effect. Stable Diffusion does not use this, as torch sdpa supports float32 (or somehow avoids NaNs from large values). No visual or significant numeric differences were noticed in this output layer. Towards #57.
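A quick sketch of that temperature option in plain torch (for illustration; not the module's exact code):

```python
import torch

def softmax_with_temperature(scores: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    # Dividing by the temperature shrinks the logits before exponentiation,
    # which keeps float16 inputs below the ~65504 overflow threshold.
    return torch.softmax(scores / temperature, dim=-1)

# Raw scores above 65504 would overflow a float16 softmax to inf; dividing
# by a scalar first brings the inputs back into representable range.
scores = torch.tensor([[70000.0, 69990.0]])
print(softmax_with_temperature(scores, temperature=4.0))
```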
Commit: 3d5122a
Commits on Apr 3, 2024
Commit: 3dd9826
Commit: df05f83