Releases: NVIDIA/warp
v1.0.0-beta.6
[1.0.0-beta.6] - 2024-01-10
- Do not create CPU copy of grad array when calling `array.numpy()`
- Fix `assert_np_equal()` bug
- Support Linux AArch64 platforms, including Jetson/Tegra devices
- Add parallel testing runner (invoke with `python -m warp.tests`, use `warp/tests/unittest_serial.py` for serial testing)
- Fix support for function calls in `range()`
- `matmul` adjoints now accumulate
- Expand available operators (e.g. vector @ matrix, scalar as dividend) and improve support for calling native built-ins
- Fix multi-GPU synchronization issue in `sparse.py`
- Add depth rendering to `OpenGLRenderer`, document `warp.render`
- Make `atomic_min`, `atomic_max` differentiable
- Fix error reporting using the exact source segment
- Add user-friendly mesh query overloads, returning a struct instead of overwriting parameters
- Address multiple differentiability issues
- Fix backpropagation for returning array element references
- Support passing the return value to adjoints
- Add point basis space and explicit point-based quadrature for `warp.fem`
- Support overriding the LLVM project source directory path using `build_lib.py --build_llvm --llvm_source_path=`
- Fix the error message for accessing non-existing attributes
- Flatten faces array for Mesh constructor in URDF parser
v1.0.0-beta.5
[1.0.0-beta.5] - 2023-11-22
- Fix for kernel caching when function argument types change
- Fix code-gen ordering of dependent structs
- Fix for `wp.Mesh` build on MGPU systems
- Fix for name clash bug with adjoint code: #154
- Add `wp.frac()` for returning the fractional part of a floating point value
- Add support for custom native CUDA snippets using `@wp.func_native` decorator
- Add support for batched matmul with batch size > 2^16-1
- Add support for transposed CUTLASS `wp.matmul()` and additional error checking
- Add support for quad and hex meshes in `wp.fem`
- Detect and warn when C++ runtime doesn't match compiler during build, e.g.: ``libstdc++.so.6: version `GLIBCXX_3.4.30' not found``
- Documentation update for `wp.BVH`
- Documentation and simplified API for runtime kernel specialization `wp.Kernel`
[1.0.0-beta.4] - 2023-11-01
- Add `wp.cbrt()` for cube root calculation
- Add `wp.mesh_furthest_point_no_sign()` to compute furthest point on a surface from a query point
- Add support for GPU BVH builds, 10-100x faster than CPU builds for large meshes
- Add support for chained comparisons, e.g.: `0 < x < 2`
- Add support for running `warp.fem` examples headless
- Fix for unit test determinism
- Fix for possible GC collection of array during graph capture
- Fix for `wp.utils.array_sum()` output initialization when used with vector types
- Coverage and documentation updates
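Warp compiles kernels to C++/CUDA, so the chained-comparison support deserves a note. In C and C++, `0 < x < 2` parses as `(0 < x) < 2`, which is true for every `x`. The plain-Python sketch below contrasts that naive left-to-right reading with Python's conjunction semantics, which the new support presumably follows. It does not use Warp, and the function names are illustrative only.

```python
def python_chain(x: float) -> bool:
    # Python reads `0 < x < 2` as `(0 < x) and (x < 2)`
    return 0 < x < 2


def naive_c_chain(x: float) -> bool:
    # A literal C/C++ reading: the bool from `0 < x` becomes 0 or 1,
    # and both of those are less than 2, so the result is always true
    return int(0 < x) < 2


for x in (-1.0, 1.0, 5.0):
    print(x, python_chain(x), naive_c_chain(x))
```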
[1.0.0-beta.3] - 2023-10-19
- Add support for code coverage scans (`test_coverage.py`), coverage at 85% in `omni.warp.core`
- Add support for named component access for vector types, e.g.: `a = v.x`
- Add support for lvalue expressions, e.g.: `array[i] += b`
- Add casting constructors for matrix and vector types
- Add support for `type()` operator that can be used to return type inside kernels
- Add support for grid-stride kernels to support kernels with > 2^31-1 thread blocks
- Fix for multi-process initialization warnings
- Fix alignment issues with empty `wp.struct`
- Fix for return statement warning with tuple-returning functions
- Fix for `wp.batched_matmul()` registering the wrong function in the Tape
- Fix and document `wp.sim` forward + inverse kinematics
- Fix for `wp.func` to return a default value if function does not return on all control paths
- Refactor `wp.fem` support for new basis functions, decoupled function spaces
- Optimizations for `wp.noise` functions, up to 10x faster in most cases
- Optimizations for `type_size_in_bytes()` used in array construction
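Grid-stride looping is a general CUDA technique. A bounded number of threads is launched, and each thread loops over the index space with a stride equal to the total thread count, so the element count is no longer capped by the launch size. The plain-Python sketch below simulates that index assignment on the CPU. It illustrates the general pattern and is not Warp's generated code; the function names are hypothetical.

```python
def grid_stride_indices(thread_id: int, num_threads: int, n: int) -> list:
    """Element indices handled by one thread in a grid-stride loop over n elements."""
    # start at this thread's id and jump by the total number of launched threads
    return list(range(thread_id, n, num_threads))


def covers_exactly_once(num_threads: int, n: int) -> bool:
    """Check that all threads together touch every index in [0, n) exactly once."""
    visited = []
    for t in range(num_threads):
        visited.extend(grid_stride_indices(t, num_threads, n))
    return sorted(visited) == list(range(n))


print(grid_stride_indices(1, 4, 10))  # [1, 5, 9]
print(covers_exactly_once(4, 10))     # True
```

The same coverage holds when there are more threads than elements. The surplus threads simply get an empty range.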
[1.0.0-beta.2] - 2023-09-01
- Fix for passing bool into `wp.func` functions
- Fix for deprecation warnings appearing on `stderr`, now redirected to `stdout`
- Fix for using `for i in wp.hash_grid_query(..)` syntax
[1.0.0-beta.1] - 2023-08-29
- Fix for `wp.float16` being passed as kernel arguments
- Fix for compile errors with kernels using structs in backward pass
- Fix for `wp.Mesh.refit()` not being CUDA graph capturable due to synchronous temp. allocs
- Fix for dynamic texture example flickering / MGPU crashes demo in Kit by reusing `ui.DynamicImageProvider` instances
- Fix for a regression that disabled bundle change tracking in samples
- Fix for incorrect surface velocities when meshes are deforming in `OgnClothSimulate`
- Fix for incorrect lower-case when setting USD stage "up_axis" in examples
- Fix for incompatible gradient types when wrapping PyTorch tensor as a vector or matrix type
- Fix for adding open edges when building cloth constraints from meshes in `wp.sim.ModelBuilder.add_cloth_mesh()`
- Add support for `wp.fabricarray` to directly access Fabric data from Warp kernels, see https://omniverse.gitlab-master-pages.nvidia.com/usdrt/docs/usdrt_prim_selection.html for examples
- Add support for user defined gradient functions, see `@wp.func_replay` and `@wp.func_grad` decorators
- Add support for more OG attribute types in `omni.warp.from_omni_graph()`
- Add support for creating NanoVDB `wp.Volume` objects from dense NumPy arrays
- Add support for `wp.volume_sample_grad_f()` which returns the value + gradient efficiently from an NVDB volume
- Add support for LLVM fp16 intrinsics for half-precision arithmetic
- Add implementation of stochastic gradient descent, see `wp.optim.SGD`
- Add `warp.fem` framework for solving weak-form PDE problems (see https://nvidia.github.io/warp/_build/html/modules/fem.html)
- Optimizations for `omni.warp` extension load time (2.2s to 625ms cold start)
- Make all `omni.ui` dependencies optional so that Warp unit tests can run headless
- Deprecation of `wp.tid()` outside of kernel functions, users should pass `tid()` values to `wp.func` functions explicitly
- Deprecation of `wp.sim.Model.flatten()` for returning all contained tensors from the model
- Add support for clamping particle max velocity in `wp.sim.Model.particle_max_velocity`
- Remove dependency on `urdfpy` package, improve MJCF parser handling of default values
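`wp.optim.SGD` is named after the stochastic gradient descent update `p <- p - lr * g`. The plain-Python sketch below applies that rule to a toy quadratic. In the stochastic setting, `g` would come from a sampled mini-batch; here it is exact to keep things simple. The function names, the loss, and the learning rate are illustrative assumptions. The sketch does not reproduce the Warp class's constructor or any extra options it may expose.

```python
def sgd_step(params, grads, lr):
    """Plain SGD update, no momentum: p <- p - lr * g for each parameter."""
    return [p - lr * g for p, g in zip(params, grads)]


def grad_f(params):
    # gradient of the toy loss f(x, y) = (x - 3)^2 + (y + 1)^2
    x, y = params
    return [2.0 * (x - 3.0), 2.0 * (y + 1.0)]


params = [0.0, 0.0]
for _ in range(200):
    params = sgd_step(params, grad_f(params), lr=0.1)
print(params)  # converges toward the minimum at [3.0, -1.0]
```

With `lr=0.1`, each step shrinks the distance to the minimum by a factor of 0.8, which is why 200 steps are plenty here.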
v0.10.1
[0.10.1] - 2023-07-25
- Fix for large multidimensional kernel launches (> 2^32 threads)
- Fix for module hashing with generics
- Fix for unrolling loops with break or continue statements (will skip unrolling)
- Fix for passing boolean arguments to `build_lib.py` (previously ignored)
- Fix build warnings on Linux
- Fix for creating array of structs from NumPy structured array
- Fix for regression on kernel load times in Kit when using `warp.sim`
- Update `warp.array.reshape()` to handle `-1` dimensions
- Update margin used for mesh queries when using `wp.sim.create_soft_body_contacts()`
- Improvements to gradient handling with `warp.from_torch()`, `warp.to_torch()` plus documentation
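A `-1` dimension in a reshape follows NumPy's rule. At most one entry may be `-1`, and it is inferred so that the total element count is unchanged. The plain-Python sketch below spells out that rule. The helper name `resolve_shape` is hypothetical and not part of Warp.

```python
import math


def resolve_shape(size, shape):
    """Return `shape` with a single -1 entry replaced so the element count equals `size`."""
    shape = tuple(shape)
    if shape.count(-1) > 1:
        raise ValueError("only one dimension can be -1")
    # product of the explicitly given dimensions
    known = math.prod(d for d in shape if d != -1)
    if -1 not in shape:
        if known != size:
            raise ValueError(f"cannot reshape {size} elements into {shape}")
        return shape
    if known == 0 or size % known != 0:
        raise ValueError(f"cannot infer -1 for {size} elements and shape {shape}")
    # the inferred extent fills in whatever the known dimensions leave over
    return tuple(size // known if d == -1 else d for d in shape)


print(resolve_shape(24, (2, -1, 4)))  # (2, 3, 4)
print(resolve_shape(12, [-1]))        # (12,)
```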
[0.10.0] - 2023-07-05
- Add support for macOS universal binaries (x86 + aarch64) for M1+ support
- Add additional methods for SDF generation, please see the following new methods:
  - `wp.mesh_query_point_nosign()` - closest point query with no sign determination
  - `wp.mesh_query_point_sign_normal()` - closest point query with sign from angle-weighted normal
  - `wp.mesh_query_point_sign_winding_number()` - closest point query with fast winding number sign determination
- Add CSR/BSR sparse matrix support, see `warp.sparse` module:
  - `wp.sparse.BsrMatrix`
  - `wp.sparse.bsr_zeros()`, `wp.sparse.bsr_set_from_triplets()` for construction
  - `wp.sparse.bsr_mm()`, `wp.sparse.bsr_mv()` for matrix-matrix and matrix-vector products respectively
- Add array-wide utilities:
  - `wp.utils.array_scan()` - prefix sum (inclusive or exclusive)
  - `wp.utils.array_sum()` - sum across array
  - `wp.utils.radix_sort_pairs()` - in-place radix sort (key,value) pairs
- Add support for calling `@wp.func` functions from Python (outside of kernel scope)
- Add support for recording kernel launches using a `wp.Launch` object that can be replayed with low overhead, use `wp.launch(..., record_cmd=True)` to generate a command object
- Optimizations for `wp.struct` kernel arguments, up to 20x faster launches for kernels with large structs or number of params
- Refresh USD samples to use bundle based workflow + change tracking
- Add Python API for manipulating mesh and point bundle data in OmniGraph, see `omni.warp.nodes` module
  - See `omni.warp.nodes.mesh_create_bundle()`, `omni.warp.nodes.mesh_get_points()`, etc.
- Improvements to `wp.array`:
  - Fix a number of array methods misbehaving with empty arrays
  - Fix a number of bugs and memory leaks related to gradient arrays
  - Fix array construction when creating arrays in pinned memory from a data source in pageable memory
  - `wp.empty()` no longer zeroes-out memory and returns an uninitialized array, as intended
  - `array.zero_()` and `array.fill_()` work with non-contiguous arrays
  - Support wrapping non-contiguous NumPy arrays without a copy
  - Support preserving the outer dimensions of NumPy arrays when wrapping them as Warp arrays of vector or matrix types
  - Improve PyTorch and DLPack interop with Warp arrays of arbitrary vectors and matrices
  - `array.fill_()` can now take lists or other sequences when filling arrays of vectors or matrices, e.g. `arr.fill_([[1, 2], [3, 4]])`
  - `array.fill_()` now works with arrays of structs (pass a struct instance)
  - `wp.copy()` gracefully handles copying between non-contiguous arrays on different devices
  - Add `wp.full()` and `wp.full_like()`, e.g., `a = wp.full(shape, value)`
  - Add optional `device` argument to `wp.empty_like()`, `wp.zeros_like()`, `wp.full_like()`, and `wp.clone()`
  - Add `indexedarray` methods `.zero_()`, `.fill_()`, and `.assign()`
  - Fix `indexedarray` methods `.numpy()` and `.list()`
  - Fix `array.list()` to work with arrays of any Warp data type
  - Fix `array.list()` synchronization issue with CUDA arrays
  - `array.numpy()` called on an array of structs returns a structured NumPy array with named fields
  - Improve the performance of creating arrays
- Fix for `Error: No module named 'omni.warp.core'` when running some Kit configurations (e.g.: stubgen)
- Fix for `wp.struct` instance address being included in module content hash
- Fix codegen with overridden function names
- Fix for kernel hashing so it occurs after code generation and before loading to fix a bug with stale kernel cache
- Fix for `wp.BVH.refit()` when executed on the CPU
- Fix adjoint of `wp.struct` constructor
- Fix element accessors for `wp.float16` vectors and matrices in Python
- Fix `wp.float16` members in structs
- Remove deprecated `wp.ScopedCudaGuard()`, please use `wp.ScopedDevice()` instead
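`wp.utils.array_scan()` computes an inclusive or exclusive prefix sum on the device. A small CPU reference can help when checking its output. The sketch below builds both variants with `itertools.accumulate`. The function name `prefix_sum` and its `inclusive` keyword belong to this sketch only and make no claim about the Warp function's signature.

```python
from itertools import accumulate


def prefix_sum(values, inclusive=True):
    """CPU reference prefix sum.

    inclusive: out[i] = values[0] + ... + values[i]
    exclusive: out[i] = values[0] + ... + values[i - 1], with out[0] = 0
    """
    running = list(accumulate(values))
    if inclusive or not running:
        return running
    # shift right by one and start from the additive identity
    return [0] + running[:-1]


data = [3, 1, 4, 1, 5]
print(prefix_sum(data))                   # [3, 4, 8, 9, 14]
print(prefix_sum(data, inclusive=False))  # [0, 3, 4, 8, 9]
```

The exclusive form is the one typically used to turn per-item counts into output offsets, since `out[i]` is where item `i` starts writing.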
v0.9.0
[0.9.0] - 2023-06-01
- Add support for in-place modifications to vector, matrix, and struct types inside kernels (will warn during backward pass with `wp.verbose` if using gradients)
- Add support for step-through VSCode debugging of kernel code with standalone LLVM compiler, see `wp.breakpoint()`, and `test_debug.py`
- Add support for default values on built-in functions
- Add support for multi-valued `@wp.func` functions
- Add support for `pass`, `continue`, and `break` statements
- Add missing `__sincos_stret` symbol for macOS
- Add support for gradient propagation through `wp.Mesh.points`, and other cases where arrays are passed to native functions
- Add support for Python `@` operator as an alias for `wp.matmul()`
- Add XPBD support for particle-particle collision
- Add support for individual particle radii: `ModelBuilder.add_particle` has a new `radius` argument, `Model.particle_radius` is now a Warp array
- Add per-particle flags as a `Model.particle_flags` Warp array, introduce `PARTICLE_FLAG_ACTIVE` to define whether a particle is being simulated and participates in contact dynamics
- Add support for Python bitwise operators `&`, `|`, `~`, `<<`, `>>`
- Switch to using standalone LLVM compiler by default for `cpu` devices
- Split `omni.warp` into `omni.warp.core` for Omniverse applications that want to use the Warp Python module with minimal additional dependencies
- Disable kernel gradient generation by default inside Omniverse for improved compile times
- Fix for bounds checking on element access of vector/matrix types
- Fix for stream initialization when a custom (non-primary) external CUDA context has been set on the calling thread
- Fix for duplicate `@wp.struct` registration during hot reload
- Fix for array `unot()` operator so kernel writers can use `if not array:` syntax
- Fix for case where dynamic loops are nested within unrolled loops
- Change `wp.hash_grid_point_id()` to return -1 if the `wp.HashGrid` has not been reserved before
- Deprecate `wp.Model.soft_contact_distance` which is now replaced by `wp.Model.particle_radius`
- Deprecate single scalar particle radius (should be a per-particle array)
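The new bitwise operators pair naturally with the per-particle flags introduced in the same release. The plain-Python sketch below tests and clears a single flag bit. The name `FLAG_ACTIVE` and its value `1 << 0` are hypothetical stand-ins and are not claimed to match Warp's `PARTICLE_FLAG_ACTIVE`.

```python
FLAG_ACTIVE = 1 << 0  # hypothetical bit assignment, for illustration only


def is_active(flags: int) -> bool:
    # bitwise AND isolates the bit
    return (flags & FLAG_ACTIVE) != 0


def set_active(flags: int, active: bool) -> int:
    # OR sets the bit; AND with the complement (~) clears it
    return (flags | FLAG_ACTIVE) if active else (flags & ~FLAG_ACTIVE)


flags = [0b11, 0b10, 0b01]
print([is_active(f) for f in flags])          # [True, False, True]
print([set_active(f, False) for f in flags])  # [2, 2, 0]
```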
v0.7.0
[0.7.0] - 2023-02-13
- Add support for arbitrary length / type vector and matrices e.g.: `wp.vec(length=7, dtype=wp.float16)`, see `wp.vec()`, and `wp.mat()`
- Add support for `array.flatten()`, `array.reshape()`, and `array.view()` with NumPy semantics
- Add support for slicing `wp.array` types in Python
- Add `wp.from_ptr()` helper to construct arrays from an existing allocation
- Add support for `break` statements in ranged-for and while loops (backward pass support currently not implemented)
- Add built-in mathematical constants, see `wp.pi`, `wp.e`, `wp.log2e`, etc.
- Add built-in conversion between degrees and radians, see `wp.degrees()`, `wp.radians()`
- Add security pop-up for Kernel Node
- Improve error handling for kernel return values
v0.5.0
[0.5.0] - 2022-10-31
- Add smoothed particle hydrodynamics (SPH) example, see `example_sph.py`
- Add support for accessing `array.shape` inside kernels, e.g.: `width = arr.shape[0]`
- Add dependency tracking to hot-reload modules if dependencies were modified
- Add lazy acquisition of CUDA kernel contexts (save ~300 MB of GPU memory in MGPU environments)
- Add BVH object, see `wp.Bvh` and `bvh_query_ray()`, `bvh_query_aabb()` functions
- Add component index operations for `spatial_vector`, `spatial_matrix` types
- Add `wp.lerp()` and `wp.smoothstep()` builtins
- Add `wp.optim` module with implementation of the Adam optimizer for float and vector types
- Add support for transient Python modules (fix for Houdini integration)
- Add `wp.length_sq()`, `wp.trace()` for vector / matrix types respectively
- Add missing adjoints for `wp.quat_rpy()`, `wp.determinant()`
- Add `wp.atomic_min()`, `wp.atomic_max()` operators
- Add vectorized version of `warp.sim.model.add_cloth_mesh()`
- Add NVDB volume allocation API, see `wp.Volume.allocate()`, and `wp.Volume.allocate_by_tiles()`
- Add NVDB volume write methods, see `wp.volume_store_i()`, `wp.volume_store_f()`, `wp.volume_store_v()`
- Add MGPU documentation
- Add example showing how to compute Jacobian of multiple environments in parallel, see `example_jacobian_ik.py`
- Add `wp.Tape.zero()` support for `wp.struct` types
- Make SampleBrowser an optional dependency for Kit extension
- Make `wp.Mesh` object accept both 1d and 2d arrays of face vertex indices
- Fix for reloading of class member kernel / function definitions using `importlib.reload()`
- Fix for hashing of `wp.constants()` not invalidating kernels
- Fix for reload when multiple `.ptx` versions are present
- Improved error reporting during code-gen
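For reference, the plain-Python sketch below gives the conventional (GLSL-style) definitions of linear interpolation and `smoothstep`. The argument order `smoothstep(edge0, edge1, x)` mirrors that convention and is an assumption here, not a statement of Warp's exact signature. The sketch does not call Warp.

```python
def lerp(a: float, b: float, t: float) -> float:
    # t = 0 gives a, t = 1 gives b
    return a * (1.0 - t) + b * t


def smoothstep(edge0: float, edge1: float, x: float) -> float:
    # normalize x into [0, 1], then apply the cubic 3t^2 - 2t^3
    t = min(max((x - edge0) / (edge1 - edge0), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)


print(lerp(0.0, 10.0, 0.25))      # 2.5
print(smoothstep(0.0, 1.0, 0.5))  # 0.5
print(smoothstep(0.0, 1.0, 2.0))  # 1.0
```

Unlike `lerp`, `smoothstep` clamps its input and has zero slope at both edges, which is why it is used for soft transitions.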
v0.4.3
[0.4.3] - 2022-09-20
- Update all samples to use GPU interop path by default
- Fix for arrays > 2GB in length
- Add support for per-vertex USD mesh colors with warp.render class
[0.4.2] - 2022-09-07
- Register Warp samples to the sample browser in Kit
- Add NDEBUG flag to release mode kernel builds
- Fix for particle solver node when using a large number of particles
- Fix for broken cameras in Warp sample scenes
[0.4.1] - 2022-08-30
- Add geometry sampling methods, see `wp.sample_unit_cube()`, `wp.sample_unit_disk()`, etc.
- Add `wp.lower_bound()` for searching sorted arrays
- Add an option for disabling code-gen of backward pass to improve compilation times, see `wp.set_module_options({"enable_backward": False})`, True by default
- Fix for using Warp from Script Editor or when module does not have a `__file__` attribute
- Fix for hot reload of modules containing `wp.func()` definitions
- Fix for debug flags not being set correctly on CUDA when `wp.config.mode == "debug"`, this enables bounds checking on CUDA kernels in debug mode
- Fix for code gen of functions that do not return a value
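C++ `std::lower_bound` and Python's `bisect.bisect_left` define a lower bound as the first position whose element is not less than the query. The binary-search sketch below follows that convention, on the assumption that `wp.lower_bound()` does too. It is plain Python, not Warp's implementation.

```python
import bisect


def lower_bound(sorted_values, value):
    """Index of the first element >= value (len(sorted_values) if there is none)."""
    lo, hi = 0, len(sorted_values)
    while lo < hi:
        mid = (lo + hi) // 2
        if sorted_values[mid] < value:
            lo = mid + 1  # answer lies strictly to the right of mid
        else:
            hi = mid      # mid is a candidate; keep searching to the left
    return lo


data = [1, 3, 3, 7]
print([lower_bound(data, v) for v in (0, 3, 4, 8)])  # [0, 1, 3, 4]
# Agrees with the standard library's bisect_left
assert all(lower_bound(data, v) == bisect.bisect_left(data, v) for v in range(10))
```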
[0.4.0] - 2022-08-09
- Fix for FP16 conversions on GPUs without hardware support
- Fix for `runtime = None` errors when reloading the Warp module
- Fix for PTX architecture version when running with older drivers, see `wp.config.ptx_target_arch`
- Fix for USD imports from `__init__.py`, defer them to individual functions that need them
- Fix for robustness issues with sign determination for `wp.mesh_query_point()`
- Fix for `wp.HashGrid` memory leak when creating/destroying grids
- Add CUDA version checks for toolkit and driver
- Add support for cross-module `@wp.struct` references
- Support running even if CUDA initialization failed, use `wp.is_cuda_available()` to check availability
- Statically link with the CUDA runtime library to avoid deployment issues
v0.3.1
[0.3.1] - 2022-07-12
- Fix for marching cubes reallocation after initialization
- Add support for closest-point tests between line segments, see `wp.closest_point_edge_edge()` builtin
- Add support for per-triangle elasticity coefficients in simulation, see `wp.sim.ModelBuilder.add_cloth_mesh()`
[0.3.0] - 2022-07-08
- Add support for FP16 storage type, see `wp.float16`
- Add support for per-dimension byte strides, see `wp.array.strides`
- Add support for passing Python classes as kernel arguments, see `@wp.struct` decorator
- Add additional bounds checks for builtin matrix types
- Add additional floating point checks, see `wp.config.verify_fp`
- Add interleaved user source with generated code to aid debugging
- Add generalized GPU marching cubes implementation, see `wp.MarchingCubes` class
- Add additional scalar*matrix vector operators
- Add support for retrieving a single row from builtin types, e.g.: `r = m33[i]`
- Add `wp.log2()` and `wp.log10()` builtins
- Add support for quickly instancing `wp.sim.ModelBuilder` objects to improve env. creation performance for RL
- Remove custom CUB version and improve compatibility with CUDA 11.7
- Fix to preserve external user-gradients when calling `wp.Tape.zero()`
- Fix to only allocate gradient of a Torch tensor if `requires_grad=True`
- Fix for missing `wp.mat22` constructor adjoint
- Fix for ray-cast precision in edge case on GPU (watertightness issue)
- Fix for kernel hot-reload when definition changes
- Fix for NVCC warnings on Linux
- Fix for generated function names when kernels are defined as class functions
- Fix for reload of generated CPU kernel code on Linux
- Fix for example scripts to output USD at 60 timecodes per-second (better Kit compatibility)
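With per-dimension byte strides, element `(i, j, k)` sits at byte offset `i*s0 + j*s1 + k*s2` from the start of the allocation. The plain-Python sketch below computes row-major (C-order) strides, the convention NumPy uses for contiguous arrays. It also shows that a transposed view is just a different stride tuple over the same bytes. The helper names are hypothetical, and the sketch illustrates the addressing rule rather than Warp's code.

```python
def contiguous_strides(shape, itemsize):
    """Row-major (C-order) byte strides for a densely packed array."""
    strides = []
    step = itemsize
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return tuple(reversed(strides))


def byte_offset(index, strides):
    """Byte offset of a multi-index: the dot product of index and strides."""
    return sum(i * s for i, s in zip(index, strides))


strides = contiguous_strides((2, 3, 4), itemsize=4)  # e.g. float32 elements
print(strides)                          # (48, 16, 4)
print(byte_offset((1, 2, 3), strides))  # 92
# Swapping the last two strides views the same memory as its transpose
print(byte_offset((1, 3, 2), (48, 4, 16)))  # 92
```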
v0.2.3
[0.2.3] - 2022-06-13
- Fix for incorrect 4d array bounds checking
- Fix for `wp.constant` changes not updating module hash
- Fix for stale CUDA kernel cache when CPU kernels launched first
- Array gradients are now allocated along with the arrays and accessible as `wp.array.grad`, users should take care to always call `wp.Tape.zero()` to clear gradients between different invocations of `wp.Tape.backward()`
- Added `wp.array.fill_()` to set all entries to a scalar value (4-byte values only currently)
Breaking Changes
- Tape `capture` option has been removed, users can now capture tapes inside existing CUDA graphs (e.g.: inside Torch)
- Scalar loss arrays should now explicitly set `requires_grad=True` at creation time
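Gradients are accumulated into their buffers rather than overwritten, which is why `wp.Tape.zero()` should be called between `wp.Tape.backward()` invocations. The toy class below only mimics that accumulate-then-zero behavior for the scalar function f(x) = x^2. `ToyTape` is hypothetical and is not Warp's `Tape`.

```python
class ToyTape:
    """Hypothetical stand-in for a gradient tape that accumulates into a buffer."""

    def __init__(self):
        self.grad = 0.0

    def backward(self, x):
        # d/dx (x^2) = 2x, added to (not written over) the stored gradient
        self.grad += 2.0 * x

    def zero(self):
        # reset the accumulated gradient before the next independent pass
        self.grad = 0.0


tape = ToyTape()
tape.backward(3.0)
tape.backward(3.0)
print(tape.grad)  # 12.0: two passes summed, not the true gradient 6.0
tape.zero()
tape.backward(3.0)
print(tape.grad)  # 6.0
```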
[0.2.2] - 2022-05-30
- Fix for `from import *` inside Warp initialization
- Fix for body space velocity when using deforming Mesh objects with scale
- Fix for noise gradient discontinuities affecting `wp.curlnoise()`
- Fix for `wp.from_torch()` to correctly preserve shape
- Fix for URDF parser incorrectly passing density to scale parameter
- Optimizations for startup time from 3s -> 0.3s
- Add support for custom kernel cache location, Warp will now store generated binaries in the user's application directory
- Add support for cross-module function references, e.g.: call another module's `@wp.func` functions
- Add support for overloading `@wp.func` functions based on argument type
- Add support for calling built-in functions directly from Python interpreter outside kernels (experimental)
- Add support for auto-complete and docstring lookup for builtins in IDEs like VSCode, PyCharm, etc.
- Add support for doing partial array copies, see `wp.copy()` for details
- Add support for accessing mesh data directly in kernels, see `wp.mesh_get_point()`, `wp.mesh_get_index()`, `wp.mesh_eval_face_normal()`
- Change to only compile for targets where kernel is launched (e.g.: will not compile CPU unless explicitly requested)
Breaking Changes
- Builtin methods such as `wp.quat_identity()` now call the Warp native implementation directly and will return a `wp.quat` object instead of NumPy array
- NumPy implementations of many builtin methods have been moved to `warp.utils` and will be deprecated
- Local `@wp.func` functions should not be namespaced when called, e.g.: previously `wp.myfunc()` would work even if `myfunc()` was not a builtin
- Removed `wp.rpy2quat()`, please use `wp.quat_rpy()` instead
[0.2.1] - 2022-05-11
- Fix for unit tests in Kit
v0.2.0
Warp Core
- Fix for unrolling loops with negative bounds
- Fix for unresolved symbol `hash_grid_build_device()` not found when lib is compiled without CUDA support
- Fix for failure to load nvrtc-builtins64_113.dll when user has a newer CUDA toolkit installed on their machine
- Fix for conversion of Torch tensors to `wp.array` with a vector dtype (incorrect row count)
- Fix for `warp.dll` not found on some Windows installations
- Fix for macOS builds on Clang 13.x
- Fix for step-through debugging of kernels on Linux
- Add argument type checking for user defined `@wp.func` functions
- Add support for custom iterable types, supports ranges, hash grid, and mesh query objects
- Add support for multi-dimensional arrays, for example use `x = array[i,j,k]` syntax to address a 3-dimensional array
- Add support for multi-dimensional kernel launches, use `launch(kernel, dim=(i,j,k), ...)` and `i,j,k = wp.tid()` to obtain thread indices
- Add support for bounds-checking array memory accesses in debug mode, use `wp.config.mode = "debug"` to enable
- Add support for differentiating through dynamic and nested for-loops
- Add support for evaluating MLP neural network layers inside kernels with custom activation functions, see `wp.mlp()`
- Add additional NVDB sampling methods and adjoints, see `wp.volume_sample_i()`, `wp.volume_sample_f()`, and `wp.volume_sample_vec()`
- Add support for loading zlib compressed NVDB volumes, see `wp.Volume.load_from_nvdb()`
- Add support for triangle intersection testing, see `wp.intersect_tri_tri()`
- Add support for NVTX profile zones in `wp.ScopedTimer()`
- Add support for additional transform and quaternion math operations, see `wp.inverse()`, `wp.quat_to_matrix()`, `wp.quat_from_matrix()`
- Add fast math (`--fast-math`) to kernel compilation by default
- Add `warp.torch` import by default (if PyTorch is installed)
Warp Kit
- Add Kit menu for browsing Warp documentation and example scenes under 'Window->Warp'
- Fix for OgnParticleSolver.py example when collider is coming from Read Prim into Bundle node
Warp Sim
- Fix for joint attachment forces
- Fix for URDF importer and floating base support
- Add examples showing how to use differentiable forward kinematics to solve inverse kinematics
- Add examples for URDF cartpole and quadruped simulation
Breaking Changes
- `wp.volume_sample_world()` is now replaced by `wp.volume_sample_f/i/vec()` which operate in index (local) space. Users should use `wp.volume_world_to_index()` to transform points from world space to index space before sampling.
- `wp.mlp()` expects multi-dimensional arrays instead of one-dimensional arrays for inference, all other semantics remain the same as earlier versions of this API.
- `wp.array.length` member has been removed, please use `wp.array.shape` to access array dimensions, or use `wp.array.size` to get total element count
- Marking `dense_gemm()`, `dense_chol()`, etc. methods as experimental until we revisit them
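A launch with `dim=(i,j,k)` gives each thread a 3D index through `wp.tid()`. One common way to derive such an index from a flat thread number is a row-major unravel. The plain-Python sketch below shows that mapping and checks that it visits every triple exactly once. Whether Warp computes `wp.tid()` exactly this way is an assumption, and the helper `unravel` is hypothetical.

```python
from itertools import product


def unravel(linear, dim):
    """Map a linear thread index to a row-major (i, j, k, ...) index for launch shape `dim`."""
    index = []
    for extent in reversed(dim):
        index.append(linear % extent)
        linear //= extent
    return tuple(reversed(index))


dim = (2, 3, 4)
total = 2 * 3 * 4
indices = [unravel(t, dim) for t in range(total)]
print(indices[0], indices[5], indices[-1])  # (0, 0, 0) (0, 1, 1) (1, 2, 3)
# Every (i, j, k) triple is produced exactly once
assert sorted(indices) == list(product(range(2), range(3), range(4)))
```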