
[RC] Release candidate for version 0.3.1 #442

Merged: 17 commits, Apr 3, 2024

Commits on Mar 5, 2024

  1. Fixes to make the simplest conv work (#22)

    A simple model with a single conv2d failed.
    - fix the signatures of the conv* ops to correspond to torch.nn.functional
    - add the missing padding normalization

    After these fixes, the model works.
    vadiklyutiy authored and hjjq committed Mar 5, 2024 · 5169410
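
    A minimal sketch of the kind of padding normalization involved, assuming
    torch-style padding arguments; `normalize_padding` is a hypothetical
    helper, not the actual hidet function:

    ```python
    from typing import List, Union

    def normalize_padding(padding: Union[int, List[int]], dims: int = 2) -> List[int]:
        # Expand scalar or per-dimension padding to one explicit value per
        # side, mirroring how torch.nn.functional.conv2d interprets it.
        if isinstance(padding, int):
            return [padding] * (2 * dims)
        if len(padding) == dims:
            return list(padding) * 2
        if len(padding) == 2 * dims:
            return list(padding)
        raise ValueError(f'unexpected padding: {padding}')
    ```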
  2. [Torch][Operator] Some operator support (#49)

    Partial changes related to #18
    hjjq authored and hjjq committed Mar 5, 2024 · 8cccced
  3. Add .vscode to .gitignore (#61)

    vadiklyutiy authored and hjjq committed Mar 5, 2024 · 75ca9e1
  4. [CI] Shut down on-demand runners on failure (#64)

    Previously, if a performance regression run failed due to an exception,
    the job that stops the runner VM instances would be skipped, leaving the
    instances on. This change makes the stop_instances job run even when
    previous jobs failed. It is not clear whether always() will override the
    inputs.shutdown_instances flag; if it does, we can move the condition
    into the step scope.
    hjjq authored and hjjq committed Mar 5, 2024 · 3dab1f2
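
    A hedged sketch of the condition described above, as a GitHub Actions
    fragment; the job layout, input name, and script path are assumptions:

    ```yaml
    stop_instances:
      # always() forces this job to run even when earlier jobs failed or
      # were skipped; the workflow input still gates the actual shutdown.
      if: ${{ always() && inputs.shutdown_instances }}
      needs: [benchmarks]
      runs-on: ubuntu-latest
      steps:
        - name: Stop runner VM instances
          run: ./scripts/stop_instances.sh
    ```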
  5. [Compilation] Optimization of the compilation process (#65)

    See details: #426
    maxyanghu authored and hjjq committed Mar 5, 2024 · c15fbac

Commits on Mar 6, 2024

  1. [Graph] Add GroupNorm module (#70)

    A Module wrapper around the groupnorm operator. Supports compiled app
    development.
    KTong821 committed Mar 6, 2024 · 6b02878
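
    A minimal sketch of such a wrapper, assuming hidet-style module and
    operator names (the exact API may differ):

    ```python
    from hidet.graph import nn, ops

    class GroupNorm(nn.Module):
        # Thin Module wrapper delegating to the group-norm operator.
        def __init__(self, num_groups: int, num_channels: int, eps: float = 1e-5):
            super().__init__()
            self.num_groups = num_groups
            self.num_channels = num_channels
            self.eps = eps

        def forward(self, x):
            return ops.group_norm(x, self.num_groups, self.eps)
    ```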

Commits on Mar 13, 2024

  1. [App] Resnet Compiled App - Modeling (1/2) (#47)

    Adds ResNet model functionality and model hierarchy for compiled apps.
    
    Some comments in the files are artifacts left for the pipeline interface
    (part 2 of this PR).

    See the huggingface implementation for the original API inspiration.
    
    Resolves #59
    KTong821 committed Mar 13, 2024 · abec2ee
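
    A hedged sketch of the huggingface-inspired API shape; the class name,
    method, and checkpoint id are assumptions for illustration only:

    ```python
    # Hypothetical usage mirroring huggingface's from_pretrained pattern:
    # load pretrained weights, then call the model inside a compiled app.
    model = ResNet.from_pretrained('microsoft/resnet-50')
    logits = model(images)
    ```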

Commits on Mar 20, 2024

  1. CI perf tests refactoring (#85)

    - move the scripts from `.github/scripts` to `tests/benchmarks`
    - move `run_configs.json` (which describes what perf tests we run) from
    the hidet-ci repo to this repo
    - add benchmarks for individual operators via the torch API (not yet
    added to the CI run)
    - unify the scripts to run either hidet or inductor as the backend
    vadiklyutiy committed Mar 20, 2024 · 30f299e
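
    For the operator benchmarks via the torch API, the standard dynamo entry
    point applies; a sketch (the harness around it is an assumption, though
    `backend='hidet'` is hidet's documented backend registration):

    ```python
    import torch

    def matmul(a, b):
        return a @ b

    # Swap backend='hidet' for backend='inductor' to compare the two.
    compiled = torch.compile(matmul, backend='hidet')

    a = torch.randn(1, 4096, 4096, dtype=torch.float16, device='cuda')
    b = torch.randn(1, 4096, 4096, dtype=torch.float16, device='cuda')
    compiled(a, b)  # first call triggers compilation; time later calls only
    ```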

Commits on Mar 21, 2024

  1. Increase batch size for vision benchmarks (#86)

    Increase the batch size for vision benchmarks from 1 to 128 to
     - be closer to a real-life example
     - decrease fluctuation in the timings
    vadiklyutiy committed Mar 21, 2024 · 5034b31

Commits on Mar 28, 2024

  1. [Graph] Conv2d Bias (#92)

    Add bias to the Conv2d Module.

    Defaults to false for backward compatibility; **this is different from
    the torch default**.
    
    Towards #57
    KTong821 committed Mar 28, 2024 · 487aada
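
    A sketch of the resulting behaviour; the signature shown is an
    assumption, kept deliberately close to torch.nn.Conv2d:

    ```python
    # hidet-style module: bias defaults to False for backward compatibility.
    conv = Conv2d(in_channels=3, out_channels=16, kernel_size=3)              # no bias
    conv_b = Conv2d(in_channels=3, out_channels=16, kernel_size=3, bias=True)

    # torch.nn.Conv2d, by contrast, defaults to bias=True.
    ```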
  2. [Fixbug] Mark slow compiled app test (#89)

    Flag tests that are slow as a result of the huggingface dependency
    (2 hrs), to be debugged on private CI runs.
    
    Resolves #87.
    KTong821 committed Mar 28, 2024 · 01b7b92
  3. [Graph] Add basic UNet module components (#93)

    Add some necessary module components used frequently in Stable
    Diffusion's UNet.
    
    Includes fixes to module attribute access from the LLM branch and
    workarounds for torch weight copying.
    
    Towards #57.
    KTong821 committed Mar 28, 2024 · 6566437

Commits on Mar 29, 2024

  1. [Dynamo] Refactor get_wrapper and pickling compiled graph (#78)

    The CentML compilation backend I am working on wants to wrap the
    CompiledGraph's forward function (the one returned by get_wrapper) in a
    torch.fx.GraphModule. This GraphModule would then be pickled and sent
    from a server to a client.
    
    However, it isn't possible to pickle the lambda/local function returned
    by get_wrapper. Therefore, I am turning get_wrapper into a class
    CompiledForwardFunction whose forward function behaves like the wrapper
    returned by get_wrapper.
    
    Additionally, in order to pickle CompiledForwardFunction, I have defined
    pickling and unpickling behaviour for CompiledGraph using __getstate__
    and __setstate__ respectively. These just call CompiledGraph's existing
    save and load functions.
    destefy committed Mar 29, 2024 · 53ade32
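
    A minimal sketch of the shape of this refactor; the call convention and
    the method bodies on CompiledGraph are simplified assumptions:

    ```python
    class CompiledForwardFunction:
        # Picklable class replacing the local function returned by
        # get_wrapper: plain attributes pickle, closures do not.
        def __init__(self, compiled_graph):
            self.compiled_graph = compiled_graph

        def __call__(self, *args):
            return self.compiled_graph(*args)

    # On CompiledGraph itself, pickling delegates to the existing
    # serialization, roughly:
    #
    #   def __getstate__(self):          # serialize via self.save(...)
    #       ...                          # and return the resulting bytes
    #   def __setstate__(self, state):   # restore via CompiledGraph.load(...)
    #       ...
    ```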

Commits on Apr 1, 2024

  1. [IR] IR data types: add @cached_property for constants. 15% improvement (#104)

    Add `@cached_property` to the constants in the IR data types to improve
    compilation time.

    Measured with
    `$ python bench_op.py matmul_f16 --params 1x4096x4096,1x4096x4096 --dtype float16`
    and `hidet.option.parallel_tune(max_parallel_jobs=1)`:

    **before: 152.5 sec
    after: 132.5 sec
    improvement is 15%**
    vadiklyutiy committed Apr 1, 2024 · ac6b8bd
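
    The mechanism is the standard functools one; a self-contained sketch with
    an illustrative stand-in for the IR constant node:

    ```python
    from functools import cached_property

    class Constant:
        # Stand-in for the IR constant node; illustrative only.
        def __init__(self, value, dtype):
            self.value, self.dtype = value, dtype

    class DataType:
        def __init__(self, name: str):
            self.name = name

        @cached_property
        def one(self) -> Constant:
            # Built once per DataType instance on first access and cached on
            # the instance, sparing hot compilation paths the re-creation.
            return Constant(1, self)
    ```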

Commits on Apr 2, 2024

  1. [Graph] Cross Attention Module Support (#94)

    Add a graph module for using flash attention and clarify some differences
    between flash attention and torch sdpa.

    **Attention: (pun intended)**

    Softmax has a temperature-scaling option that divides the inputs by a
    scalar; a good explanation of the numerical effects is
    [here](https://medium.com/@harshit158/softmax-temperature-5492e4007f71).

    It is used when the softmax inputs QK are too big for float16 (abs value >
    65504). This usually means the numbers are so large that dividing by a
    small (< 4) scalar has little effect.

    Stable diffusion does not use this, as torch sdpa supports float32 (or
    somehow avoids NaNs from large values). No visual or significant numeric
    differences were noticed in this output layer.
    
    Towards #57.
    KTong821 committed Apr 2, 2024 · 3d5122a
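
    A sketch of temperature scaling under float16 constraints, in plain torch
    for illustration (not the flash-attention kernel itself):

    ```python
    import torch

    def scaled_softmax(scores: torch.Tensor, temperature: float) -> torch.Tensor:
        # Dividing by the temperature shrinks the inputs before softmax.
        # float16 overflows above 65504, so if QK products are near that
        # limit, a small temperature (< 4) barely changes the result.
        return torch.softmax(scores / temperature, dim=-1)
    ```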

Commits on Apr 3, 2024

  1. 3dd9826
  2. bump version to 0.3.1

    yaoyaoding committed Apr 3, 2024 · df05f83