Releases: NVIDIA/warp
v1.3.2
[1.3.2] - 2024-08-30
- Bug fixes
  - Fix accuracy of 3x3 SVD `wp.svd3` with fp64 numbers (GH-281)
  - Fix module hashing when a kernel argument contained a struct array (GH-287)
  - Fix a bug in `wp.bvh_query_ray()` where the direction instead of the reciprocal direction was used (GH-288)
  - Fix errors when launching a CUDA graph after a module is reloaded. Modules that were used during graph capture will no longer be unloaded before the graph is released.
  - Fix a bug in `wp.sim.collide.triangle_closest_point_barycentric()` where the returned barycentric coordinates may be incorrect when the closest point lies on an edge
  - Fix 32-bit overflow when array shape is specified using `np.int32`
  - Fix handling of integer indices in the `input_output_mask` argument to `autograd.jacobian` and `autograd.jacobian_fd` (GH-289)
  - Fix `ModelBuilder.collapse_fixed_joints()` to correctly update the body centers of mass and the `ModelBuilder.articulation_start` array
  - Fix precedence of closure constants over global constants
  - Fix quadrature point indexing in `wp.fem.ExplicitQuadrature` (regression from 1.3.0)
- Documentation improvements
  - Add missing return types for built-in functions
  - Clarify that atomic operations also return the previous value (see the sketch after this list)
  - Clarify that `wp.bvh_query_aabb()` returns parts that overlap the bounding volume
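The clarified behavior of atomic operations returning the previous value can be seen in a short kernel. This is a minimal sketch; the `histogram` kernel and its array names are invented for illustration, and only the documented return value of `wp.atomic_add()` is the point being shown.

```python
import warp as wp


@wp.kernel
def histogram(values: wp.array(dtype=int), counts: wp.array(dtype=int), order: wp.array(dtype=int)):
    tid = wp.tid()
    # wp.atomic_add() returns the value stored *before* the addition,
    # i.e. how many earlier threads incremented this bin.
    prev = wp.atomic_add(counts, values[tid], 1)
    order[tid] = prev


values = wp.array([0, 1, 0, 2, 0], dtype=int)
counts = wp.zeros(3, dtype=int)
order = wp.zeros(values.shape[0], dtype=int)
wp.launch(histogram, dim=values.shape[0], inputs=[values, counts, order])
print(counts.numpy(), order.numpy())
```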
v1.3.1
[1.3.1] - 2024-07-27
- Remove `wp.synchronize()` from PyTorch autograd function example
- `Tape.check_kernel_array_access()` and `Tape.reset_array_read_flags()` are now private methods
- Fix reporting unmatched argument types
v1.3.0
[1.3.0] - 2024-07-25
- Warp Core improvements
  - Update to CUDA 12.x by default (requires NVIDIA driver 525 or newer); please see README.md for commands to install CUDA 11.x binaries for older drivers
  - Add information to the module load printouts to indicate whether a module was compiled `(compiled)`, loaded from the cache `(cached)`, or was unable to be loaded `(error)`
  - `wp.config.verbose = True` now also prints out a message upon entry to a `wp.ScopedTimer`
  - Add `wp.clear_kernel_cache()` to the public API. This is equivalent to `wp.build.clear_kernel_cache()`.
  - Add code-completion support for `wp.config` variables
  - Remove usage of a static task (thread) index for CPU kernels to address multithreading concerns (GH-224)
  - Improve error messages for unsupported Python operations such as sequence construction in kernels
  - Update `wp.matmul()` CPU fallback to use dtype explicitly in the `np.matmul()` call
  - Add support for PEP 563's `from __future__ import annotations` (GH-256)
  - Allow passing external arrays/tensors to `wp.launch()` directly via `__cuda_array_interface__` and `__array_interface__`, up to 2.5x faster conversion from PyTorch
  - Add faster Torch interop path using the `return_ctype` argument to `wp.from_torch()`
  - Handle incompatible CUDA driver versions gracefully
  - Add `wp.abs()` and `wp.sign()` for vector types
  - Expose scalar arithmetic operators to Python's runtime (e.g. `wp.float16(1.23) * wp.float16(2.34)`)
  - Add support for creating volumes with anisotropic transforms
  - Allow users to pass function arguments by keyword in a kernel using standard Python calling semantics (see the first sketch after this list)
  - Add additional documentation and examples demonstrating `wp.copy()`, `wp.clone()`, and `array.assign()` differentiability
  - Add `__new__()` methods for all classes with `__del__()` methods to handle when a class instance is created but not instantiated before garbage collection
  - Implement the assignment operator for `wp.quat`
  - Make the geometry-related built-ins available only from within kernels
  - Rename the API-facing query types to remove their `_t` suffix: `wp.BVHQuery`, `wp.HashGridQuery`, `wp.MeshQueryAABB`, `wp.MeshQueryPoint`, and `wp.MeshQueryRay`
  - Add `wp.array(ptr=...)` to allow initializing arrays from pointer addresses inside of kernels (GH-206)
- `warp.autograd` improvements:
  - New `warp.autograd` module with utility functions `gradcheck()`, `jacobian()`, and `jacobian_fd()` for debugging kernel Jacobians (docs)
  - Add array overwrite detection: if `wp.config.verify_autograd_array_access` is true, in-place operations on arrays on the Tape that could break gradient computation will be detected (docs)
  - Fix bug where modification of `@wp.func_replay` functions and native snippets would not trigger module recompilation
  - Add documentation for dynamic loop autograd limitations
- `warp.sim` improvements:
  - Improve memory usage and performance for rigid body contact handling when `self.rigid_mesh_contact_max` is zero (default behavior)
  - The `mask` argument to `wp.sim.eval_fk()` now accepts both integer and boolean arrays to mask articulations
  - Fix handling of `ModelBuilder.joint_act` in `ModelBuilder.collapse_fixed_joints()` (affected floating-base systems)
  - Fix and improve implementation of `ModelBuilder.plot_articulation()` to visualize the articulation tree of a rigid-body mechanism
  - Fix `ShapeInstancer.__new__()` method (missing instance return and `*args` parameter)
  - Fix handling of the `upaxis` variable in `ModelBuilder` and the rendering thereof in `OpenGLRenderer`
- `warp.sparse` improvements:
  - Sparse matrix allocations (from `bsr_from_triplets()`, `bsr_axpy()`, etc.) can now be captured in CUDA graphs; the exact number of non-zeros can be optionally requested asynchronously
  - `bsr_assign()` now supports changing block shape (including CSR/BSR conversions)
  - Add Python operator overloads for common sparse matrix operations, e.g. `A += 0.5 * B`, `y = x @ C` (see the second sketch after this list)
- `warp.fem` new features and fixes:
  - Support for variable number of nodes per element
  - Global `wp.fem.lookup()` operator now supports `wp.fem.Tetmesh` and `wp.fem.Trimesh2D` geometries
  - Simplified defining custom subdomains (`wp.fem.Subdomain`), free-slip boundary conditions
  - New field types: `wp.fem.UniformField`, `wp.fem.ImplicitField`, and `wp.fem.NonconformingField`
  - New `streamlines`, `magnetostatics`, and `nonconforming_contact` examples; updated `mixed_elasticity` to use a nonlinear model
  - Function spaces can now export VTK-compatible cells for visualization
  - Fixed edge cases with NanoVDB function spaces
  - Fixed differentiability of `wp.fem.PicQuadrature` w.r.t. positions and measures
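First sketch: keyword-argument calls inside kernels. This is a minimal example; the `mix()` helper and the array names are invented for illustration, and only the keyword-call syntax is the 1.3.0 feature being shown.

```python
import warp as wp


@wp.func
def mix(a: float, b: float, t: float) -> float:
    return a + (b - a) * t


@wp.kernel
def blend(xs: wp.array(dtype=float), ys: wp.array(dtype=float), out: wp.array(dtype=float)):
    tid = wp.tid()
    # As of 1.3.0, Warp functions can be called with keyword arguments inside kernels.
    out[tid] = mix(a=xs[tid], b=ys[tid], t=0.25)


xs = wp.array([0.0, 1.0, 2.0], dtype=float)
ys = wp.array([10.0, 11.0, 12.0], dtype=float)
out = wp.zeros(3, dtype=float)
wp.launch(blend, dim=3, inputs=[xs, ys, out])
print(out.numpy())
```

The same release also exposes scalar arithmetic operators to Python's runtime, e.g. `wp.float16(1.23) * wp.float16(2.34)` can now be evaluated outside of kernels.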
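Second sketch: the new sparse-matrix operator overloads. This is only a sketch under assumptions: the positional argument order for `bsr_from_triplets()` (block-row/column counts followed by the triplet arrays) is recalled from memory, and whether every overload shown (`+=`, scalar scaling, `@`) applies exactly as written should be checked against the `warp.sparse` documentation; only the `A += 0.5 * B` style usage is quoted directly from the notes.

```python
import warp as wp
import warp.sparse as sparse

# Two 2x2 sparse matrices with scalar (1x1) blocks, built from COO triplets.
rows = wp.array([0, 1], dtype=int)
cols = wp.array([0, 1], dtype=int)
vals = wp.array([1.0, 2.0], dtype=float)

A = sparse.bsr_from_triplets(2, 2, rows, cols, vals)
B = sparse.bsr_from_triplets(2, 2, rows, cols, vals)

# New in 1.3.0: Python operator overloads for common sparse operations.
A += 0.5 * B   # axpy-style in-place update (cf. bsr_axpy())
C = A @ B      # sparse-sparse product (cf. bsr_mm())
```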
v1.2.2
[1.2.2] - 2024-07-04
- Support for NumPy >= 2.0
v1.2.1
[1.2.1] - 2024-06-14
- Fix generic function caching
- Fix Warp not being initialized when constructing arrays with `wp.array()`
- Fix `wp.is_mempool_access_supported()` not resolving the provided device arguments to `wp.context.Device`
v1.2.0
[1.2.0] - 2024-06-06
- Add a not-a-number floating-point constant that can be used as `wp.NAN` or `wp.nan`
- Add `wp.isnan()`, `wp.isinf()`, and `wp.isfinite()` for scalars, vectors, matrices, etc. (see the sketch after this list)
- Improve kernel cache reuse by hashing just the local module constants. Previously, a module's hash was affected by all `wp.constant()` variables declared in a Warp program.
- Revised module compilation process to allow multiple processes to use the same kernel cache directory. Cached kernels will now be stored in a hash-specific subdirectory.
- Add runtime checks for `wp.MarchingCubes` on field dimensions and size
- Fix memory leak in `wp.Mesh` BVH (GH-225)
- Use C++17 when building the Warp library and user kernels
- Increase PTX target architecture up to `sm_75` (from `sm_70`), enabling Turing ISA features
- Extended NanoVDB support (see `warp.Volume`):
  - Add support for data-agnostic index grids, allocation at voxel granularity
  - New `wp.volume_lookup_index()`, `wp.volume_sample_index()`, and generic `wp.volume_sample()` / `wp.volume_lookup()` / `wp.volume_store()` kernel-level functions
  - Zero-copy aliasing of in-memory grids, support for multi-grid buffers
  - Grid introspection and blind data access capabilities
  - `warp.fem` can now work directly on NanoVDB grids using `warp.fem.Nanogrid`
  - Fixed `wp.volume_sample_v()` and `wp.volume_store_*()` adjoints
  - Prevent `wp.volume_store()` from overwriting grid background values
- Improve validation of user-provided fields and values in `warp.fem`
- Support headless rendering of `wp.render.OpenGLRenderer` via `pyglet.options["headless"] = True`
- `wp.render.RegisteredGLBuffer` can fall back to CPU-bound copying if CUDA/OpenGL interop is not available
- Clarify terms for external contributions, please see CONTRIBUTING.md for details
- Improve performance of `wp.sparse.bsr_mm()` by ~5x on benchmark problems
- Fix for XPBD incorrectly indexing into joint actuation `joint_act` arrays
- Fix for mass matrix gradient computation in `wp.sim.FeatherstoneIntegrator()`
- Fix for handling of `--msvc_path` in build scripts
- Fix for `wp.copy()` to record dest and src offset parameters on `wp.Tape()`
- Fix for `wp.randn()` to ensure return values are finite
- Fix for slicing of arrays with gradients in kernels
- Fix for function overload caching; ensure module is rebuilt if any function overloads are modified
- Fix for handling of `bool` types in generic kernels
- Publish CUDA 12.5 binaries for Hopper support, see https://github.com/nvidia/warp?tab=readme-ov-file#installing for details
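A small sketch of the new NaN helpers. The `sanitize` kernel and its data are invented for illustration; only the builtins named in the notes (`wp.isfinite()`, `wp.nan`) are relied upon.

```python
import warp as wp


@wp.kernel
def sanitize(data: wp.array(dtype=float)):
    tid = wp.tid()
    # wp.isnan()/wp.isinf()/wp.isfinite() were added in 1.2.0;
    # wp.nan (or wp.NAN) is the new not-a-number constant.
    if not wp.isfinite(data[tid]):
        data[tid] = 0.0


data = wp.array([1.0, wp.nan, float("inf"), -2.0], dtype=float)
wp.launch(sanitize, dim=data.shape[0], inputs=[data])
print(data.numpy())
```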
[1.1.1] - 2024-05-24
- `wp.init()` is no longer required to be called explicitly and will be performed on first call to the API (illustrated below)
- Speed up `omni.warp.core`'s startup time
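A trivial sketch of the implicit initialization:

```python
import warp as wp

# As of 1.1.1 there is no need to call wp.init() explicitly;
# initialization happens on the first API call.
a = wp.zeros(8, dtype=wp.float32)
print(a.device, a.numpy())
```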
v1.1.0
[1.1.0] - 2024-05-09
- Support returning a value from `@wp.func_native` CUDA functions using type hints
- Improved differentiability of the `wp.sim.FeatherstoneIntegrator`
- Fix gradient propagation for rigid body contacts in `wp.sim.collide()`
- Added support for event-based timing, see `wp.ScopedTimer()` (see the sketch after this list)
- Added Tape visualization and debugging functions, see `wp.Tape.visualize()`
- Support constructing Warp arrays from objects that define the `__cuda_array_interface__` attribute
- Support copying a struct to another device, use `struct.to(device)` to migrate struct arrays
- Allow rigid shapes to not have any collisions with other shapes in `wp.sim.Model`
- Change default test behavior to test redundant GPUs (up to 2x)
- Test each example in an individual subprocess
- Polish and optimize various examples and tests
- Allow non-contiguous point arrays to be passed to `wp.HashGrid.build()`
- Upgrade LLVM to 18.1.3 for from-source builds and Linux x86-64 builds
- Build DLL source code as C++17 and require GCC 9.4 as a minimum
- Array clone, assign, and copy are now differentiable
- Use Ruff for formatting and linting
- Various documentation improvements (infinity, math constants, etc.)
- Improve URDF importer, handle joint armature
- Allow builtins.bool to be used in Warp data structures
- Use external gradient arrays in backward passes when passed to `wp.launch()`
- Add Conjugate Residual linear solver, see `wp.optim.linear.cr()`
- Fix propagation of gradients on aliased copies of variables in kernels
- Facilitate debugging and speed up `import warp` by not raising exceptions during import
- Improve support for nested vec/mat assignments in structs
- Recommend Python 3.9 or higher, which is required for JAX and soon PyTorch
- Support gradient propagation for indexing sliced multi-dimensional arrays, i.e. `a[i][j]` vs. `a[i, j]`
- Provide an informative message if setting DLL C-types failed, instructing to try rebuilding the library
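A minimal sketch of wrapping work in `wp.ScopedTimer()`. Only the label argument is used here; the event-based timing options mentioned above are additional keyword arguments described in the documentation. The `scale` kernel is invented for illustration.

```python
import warp as wp


@wp.kernel
def scale(a: wp.array(dtype=float), s: float):
    tid = wp.tid()
    a[tid] = a[tid] * s


a = wp.zeros(1_000_000, dtype=float)

# ScopedTimer prints the elapsed time for the enclosed block when it exits.
with wp.ScopedTimer("scale kernel"):
    wp.launch(scale, dim=a.shape[0], inputs=[a, 2.0])
    wp.synchronize()
```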
[1.0.3] - 2024-04-17
- Add a `support_level` entry to the configuration file of the extensions
v1.0.2
v1.0.1
[1.0.1] - 2024-03-15
- Document Device `total_memory` and `free_memory`
- Documentation for allocators, streams, peer access, and generics
- Changed example output directory to current working directory
- Added `python -m warp.examples.browse` for browsing the examples folder
- Print where the USD stage file is being saved
- Added `examples/optim/example_walker.py` sample
- Make the drone example not specific to USD
- Reduce the time taken to run some examples
- Optimise rendering points with a single colour
- Clarify an error message around needing USD
- Raise exception when module is unloaded during graph capture
- Added `wp.synchronize_event()` for blocking the host thread until a recorded event completes (see the sketch after this list)
- Flush C print buffers when ending `stdout` capture
- Remove more unneeded CUTLASS files
- Allow setting mempool release threshold as a fractional value
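A sketch of blocking on a recorded event with `wp.synchronize_event()`. This assumes a CUDA device is available and that `wp.Event()` / `wp.record_event()` behave as recalled here (creating an event and recording it on the current stream); treat those details as assumptions and only `wp.synchronize_event()` itself as the feature from the notes.

```python
import warp as wp


@wp.kernel
def fill(a: wp.array(dtype=float), value: float):
    a[wp.tid()] = value


a = wp.zeros(1024, dtype=float, device="cuda")
wp.launch(fill, dim=a.shape[0], inputs=[a, 3.0], device="cuda")

# Record an event on the current CUDA stream, then block the host
# thread until the recorded work has completed.
ev = wp.Event()
wp.record_event(ev)
wp.synchronize_event(ev)
print(a.numpy()[:4])
```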
v1.0.0
[1.0.0] - 2024-03-07
- Add `FeatherstoneIntegrator` which provides more stable simulation of articulated rigid body dynamics in generalized coordinates (`State.joint_q` and `State.joint_qd`)
- Introduce `warp.sim.Control` struct to store control inputs for simulations (optional, by default the `Model` control inputs are used as before); integrators now have a different simulation signature: `integrator.simulate(model: Model, state_in: State, state_out: State, dt: float, control: Control)` (see the sketch after this list)
- `joint_act` can now behave in 3 modes: with `joint_axis_mode` set to `JOINT_MODE_FORCE` it behaves as a force/torque, with `JOINT_MODE_VELOCITY` it behaves as a velocity target, and with `JOINT_MODE_POSITION` it behaves as a position target; `joint_target` has been removed
- Add adhesive contact to Euler integrators via `Model.shape_materials.ka` which controls the contact distance at which the adhesive force is applied
- Improve handling of visual/collision shapes in URDF importer so visual shapes are not involved in contact dynamics
- Experimental JAX kernel callback support
- Improve module load exception message
- Add `wp.ScopedCapture`
- Removing `enable_backward` warning for callables
- Copy docstrings and annotations from wrapped kernels, functions, structs