What is new in 0.2.0
Bug/Issue Fixes
- Fixed incorrect use of double constants in some operators
- Fixed crash when loading models that were saved on OCL devices
- Fixed default parameter of torch.ocl.synchronize
- Fixed printing failure on Intel devices that lack fp64 support
New nets validated
- Vision transformers: vit_x_NN nets validated
New operators implemented:
- resize_, arange, mm, bmm, amin, amax, addmm
- _native_multi_head_attention and transform_bias_rescale_qkv
- round, maximum, minimum, prod, atan, dropout_native
- lt, le, gt, ge, eq, ne for tensors
- bitwise ^, |, &, ~
- upsample_2d: bilinear, nearest and nearest-exact, forward and backward
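A minimal sketch exercising a few of the newly implemented operators. It runs on CPU for illustration; with the OpenCL backend installed, the same calls would be issued on an "ocl" device (an assumption, not verified here).

```python
import torch

# Assumption: on a build with the OpenCL backend these tensors would be
# created with device="ocl"; CPU is used here purely for illustration.
a = torch.arange(6, dtype=torch.float32).reshape(2, 3)  # arange
b = torch.ones(3, 2)

m = torch.mm(a, b)                               # matrix multiply -> (2, 2)
bm = torch.bmm(a.unsqueeze(0), b.unsqueeze(0))   # batched matrix multiply
c = torch.addmm(torch.zeros(2, 2), a, b)         # 0 + a @ b

lo = torch.amin(a, dim=1)                        # per-row minimum
hi = torch.amax(a, dim=1)                        # per-row maximum
p = torch.prod(torch.tensor([2.0, 3.0, 4.0]))    # product of elements
```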
Fixed operators
- Fixed softmax and log softmax support of dim that is not last dim
- Fixed view operator and set_ storage
- cat now supports mixed types
- Fixed handling of empty tensors with non-empty storage
- Very limited half-precision (fp16) tensor handling
- Fixed tensor >, <, ==, != scalar ops
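A short sketch of the fixed behaviours, again shown on CPU; with the OpenCL backend the same calls are expected to work on an "ocl" device.

```python
import torch

x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

# softmax over a dim that is NOT the last one (previously broken)
s = torch.softmax(x, dim=0)
col_sums = s.sum(dim=0)  # each column should now sum to 1

# cat with mixed dtypes promotes to a common type (int64 + float32 -> float32)
mixed = torch.cat([torch.tensor([1, 2]), torch.tensor([0.5])])

# tensor-vs-scalar comparison ops
mask = x > 2.0
```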
New features:
- Added support for profiling via the torch.ocl.profile API
- Improved benchmark scripts
Performance improvements
- Intel Arc and UHD: enabled Winograd convolution, added support for OpenCL 3.0 floating-point add atomics, enabled k-reduction for GEMM operators
- NVIDIA: added use of native atomic float add (via PTX assembly)
- GELU: major improvement after fixing faulty use of double instead of float
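A small sketch illustrating why dropping double precision in GELU is safe: the float32 result agrees with a float64 reference to well within float32 precision, so there is no accuracy reason to pay the cost of fp64 arithmetic on the GPU (comparison done on CPU here).

```python
import torch

x = torch.linspace(-4.0, 4.0, 101)           # float32 inputs
g32 = torch.nn.functional.gelu(x)            # float32 GELU
g64 = torch.nn.functional.gelu(x.double())   # float64 reference

# maximum absolute deviation of the float32 path from the fp64 reference
max_err = (g32.double() - g64).abs().max().item()
```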