* Jiaruifang/update readme (#14): update readme; set PyTorch version as 1.4.0; use 0.2.0 as turbo version
* Jiaruifang/update readme (#16): upgrade onnxrt to v1.2.0 in dev cpu docker; add how to use dockerhub to get a CPU version
* Jiaruifang/update readme (#17)
  * update readme
  * update readme
  * add benchmark compared with gemmlowp
  * return hidden_states from BertModel
  * update p40 speedup fig
  * add github action hook for branch develop
  * revert a matmul benchmark unit test
  * set PyTorch cpu version as 1.4.0
  * fix a typo
  * rm torchvision in docker_ci
  * update readme and use 0.2.0 as version
  * upgrade onnxrt to v1.2.0 in dev cpu docker; add how to use dockerhub to get a CPU version
  * fix a typo
* delete turbotransformer and add a blank line in readme (#20)
  * remove duplicated license comments; update readme, more accurately describing variable-length for onnxruntime
* Jiaruifang/polish (#30): remove duplicated license comments; update readme, more accurately describing variable-length for onnxruntime
* because a hidden state was added in the BERT layer, fix it in sequence classification (#36)
* Jiaruifang/amd blis (#69): add BLIS support for AMD CPUs
* Jiaruifang/decoder gpu allocator (#85)
  * Jiaruifang/multi head attn (#29): add a more functional MultiHeadedAttention; add positionwise feed-forward; add MultiHeadedAttention
  * Jiaruifang/transformer decoder layer (#32): add TransformerDecoderLayer
  * Jiaruifang/transformer decoder layer (#33)
    * add TransformerDecoderLayer
    * check that multi-headed attn's max_relative_positions is 0
  * Jiaruifang/transformer decoder (#35): fix multi_headed_attention_test.py bug
  * Jiaruifang/fixbug multiheadedattn (#40)
    * add attns as return values for decoder
    * check attns in decoder_transformer_decoder_layer_test
    * fix multi_headed_attention_test.py bug
  * add set_stderr_verbose_level python interface
  * add profiling method for decoder_multi_headed_attn_test
  * fix bugs in multiheadedattn caused by mask
  * set WITH_PROFILER option in CMakeLists to OFF
  * fix bug in profiler
  * Jiaruifang/weight trans ffn (#43)
    * profile ffn; tune weight transpose for Intel 61xx
    * fine-tune multi_headed_attention layer
    * fix some bugs
  * Jiaruifang/merge bert multiheaded attn (#49): use multi-headed attn to do BERT attention
  * Jiaruifang/gpu decoder (#51): add GPU transformer decoder implementation; using cub::CachingAllocator still has some bugs to be fixed; performance to be tuned
  * add layernorm support for multi-headed attn from_torch
  * fix a bug in from_torch of MultiHeadedAttention
  * fix bugs from attn masks in transformer decoder layer (#64)
    * fix bugs from attn masks in transformer decoder layer
    * polish code
  * Jiaruifang/debug decoder layer mask (#68): change transformer decoder mask from float to bool; make multi-headed attn able to take layer_cache as an input parameter; add layer_cache for self attn
  * softmax supports 3D mask (#72): GPU softmax supports 3D mask
  * Develop (#74): add BLIS support for AMD CPUs
  * init best-fit cuda allocator
  * fix a bug in GetInstance
  * TODO: remove temp tensor
  * remove temp tensor
  * fix a bug
  * add cuda allocator unit tests
  * fix a bug in best-fit cuda allocator
  * more unit tests for cuda allocator
  * a wrong version; all GPU unit tests fail
  * add comments for best fit and upgrade release version
  * merge decoder and best-fit cuda memory allocator
  * update readme
* Jiaruifang/cpu allocator (#88)
  * Develop (#74): add BLIS support for AMD CPUs
  * add cpu best-fit allocator
* Jiaruifang/debug decoder layer mask (#89)
  * add cpu best-fit allocator
  * fix a bug in allocator test
  * fix tgt_pad_mask bug
  * update README
  * revert back to cub allocator
* Jiaruifang/benchmark amd blas (#90)
  * Develop (#74): add BLIS support for AMD CPUs
  * polish the benchmark code for BLAS on AMD CPU
  * add general GEMM benchmark
  * show BLAS type in matmul_benchmark
* Jiaruifang/gpu timer (#91)
  * add GPU profiler
  * fix a bug caused by attn_score in BERT attention
  * fix attn_score bug
* Jiaruifang/gpu concat (#92)
  * add GPU profiler
  * fix a bug caused by attn_score in BERT attention
  * fix attn_score bug
  * accelerate GPU concat
  * add loss file
* Jiaruifang/profiler kernels (#97)
  * add GPU profiler
  * fix a bug caused by attn_score in BERT attention
  * fix attn_score bug
  * accelerate GPU concat
  * add loss file
  * print profiling results in increasing order; fix the best-fit cuda allocator bug
  * move profiler into functions
* Jiaruifang/fix bestfit bug (#98)
  * Develop (#74): add BLIS support for AMD CPUs
  * fix a bug in cpp mask (#95)
  * fix bestfit allocator bug
  * update readme
* Jiaruifang/fix bestfit bug (#99)
  * Develop (#74): add BLIS support for AMD CPUs
  * fix a bug in cpp mask (#95)
  * fix bestfit allocator bug
  * update readme
  * add a missing file
* update readme, and fix attn score bug in bert_attn (#100)
  * update readme, and fix attn score bug in bert_attn
  * fix shared ptr bug
  * fix cuda c++11 bug
* Jiaruifang/decoder readme (#101)
  * update readme, and fix attn score bug in bert_attn
  * fix shared ptr bug
  * fix cuda c++11 bug
  * update Readme

Co-authored-by: shicheng <[email protected]>
Commit 72097bf (1 parent: af84878) — 74 changed files with 3,788 additions and 630 deletions.