What is new in 0.2.0
Bug/Issue Fixes
- Fixed incorrect use of double constants in some operators
- Fixed crash when loading models that were saved on OCL devices
- Fixed default parameter of torch.ocl.synchronize
- Fixed printing failure on Intel devices that lack fp64 support
New nets validated
- Vision transformers: vit_x_NN nets validated
New operators implemented:
- resize_, arange, mm, bmm, amin, amax, addmm
- _native_multi_head_attention and transform_bias_rescale_qkv
- round, maximum, minimum, prod, atan, dropout_native
- lt, le, gt, ge, eq, ne for tensors
- bitwise ^, |, &, ~
- upsample_2d: bilinear, nearest and nearest-exact, forward and backward
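A minimal sketch exercising a few of the newly implemented operators. It runs on CPU for illustration; with the OpenCL backend installed, the same calls would be issued on an "ocl" device (an assumption, not verified here).

```python
import torch

# Assumption: on a build with the OpenCL backend these tensors would be
# created with device="ocl"; CPU is used here purely for illustration.
a = torch.arange(6, dtype=torch.float32).reshape(2, 3)  # arange
b = torch.ones(3, 2)

m = torch.mm(a, b)                               # matrix multiply -> (2, 2)
bm = torch.bmm(a.unsqueeze(0), b.unsqueeze(0))   # batched matrix multiply
c = torch.addmm(torch.zeros(2, 2), a, b)         # 0 + a @ b

lo = torch.amin(a, dim=1)                        # per-row minimum
hi = torch.amax(a, dim=1)                        # per-row maximum
p = torch.prod(torch.tensor([2.0, 3.0, 4.0]))    # product of elements
```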
Fixed operators
- Fixed softmax and log softmax support of dim that is not last dim
- Fixed view operator and set_ storage
- cat now supports mixed types
- Fixed handling of empty tensors with non-empty storage
- Very limited half-precision (fp16) tensor handling
- Fixed tensor >, <, ==, != scalar ops
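A short sketch of the fixed behaviours, again shown on CPU; with the OpenCL backend the same calls are expected to work on an "ocl" device.

```python
import torch

x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

# softmax over a dim that is NOT the last one (previously broken)
s = torch.softmax(x, dim=0)
col_sums = s.sum(dim=0)  # each column should now sum to 1

# cat with mixed dtypes promotes to a common type (int64 + float32 -> float32)
mixed = torch.cat([torch.tensor([1, 2]), torch.tensor([0.5])])

# tensor-vs-scalar comparison ops
mask = x > 2.0
```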
New features:
- Added support for profiling via the torch.ocl.profile API
- Improved benchmark scripts
Performance improvements
- Intel Arc and UHD: enabled Winograd convolution, added support for OpenCL 3.0 floating-point add atomics, enabled k-reduction for GEMM operators
- NVIDIA: added use of native atomic float add (via PTX assembly)
- GELU: major improvement after fixing faulty use of double instead of float
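A small sketch illustrating why dropping double precision in GELU is safe: the float32 result agrees with a float64 reference to well within float32 precision, so there is no accuracy reason to pay the cost of fp64 arithmetic on the GPU (comparison done on CPU here).

```python
import torch

x = torch.linspace(-4.0, 4.0, 101)           # float32 inputs
g32 = torch.nn.functional.gelu(x)            # float32 GELU
g64 = torch.nn.functional.gelu(x.double())   # float64 reference

# maximum absolute deviation of the float32 path from the fp64 reference
max_err = (g32.double() - g64).abs().max().item()
```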