Releases: NVIDIA/warp
v1.3.2
[1.3.2] - 2024-08-30
- Bug fixes
  - Fix accuracy of 3x3 SVD `wp.svd3` with fp64 numbers (GH-281)
  - Fix module hashing when a kernel argument contained a struct array (GH-287)
  - Fix a bug in `wp.bvh_query_ray()` where the direction instead of the reciprocal direction was used (GH-288)
  - Fix errors when launching a CUDA graph after a module is reloaded. Modules that were used during graph capture will no longer be unloaded before the graph is released.
  - Fix a bug in `wp.sim.collide.triangle_closest_point_barycentric()` where the returned barycentric coordinates may be incorrect when the closest point lies on an edge
  - Fix 32-bit overflow when array shape is specified using `np.int32`
  - Fix handling of integer indices in the `input_output_mask` argument to `autograd.jacobian` and `autograd.jacobian_fd` (GH-289)
  - Fix `ModelBuilder.collapse_fixed_joints()` to correctly update the body centers of mass and the `ModelBuilder.articulation_start` array
  - Fix precedence of closure constants over global constants
  - Fix quadrature point indexing in `wp.fem.ExplicitQuadrature` (regression from 1.3.0)
- Documentation improvements
  - Add missing return types for built-in functions
  - Clarify that atomic operations also return the previous value (see the sketch after this list)
  - Clarify that `wp.bvh_query_aabb()` returns parts that overlap the bounding volume
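The clarified behavior of atomic operations returning the previous value can be seen in a short kernel. This is a minimal sketch; the `histogram` kernel and its array names are invented for illustration, and only the documented return value of `wp.atomic_add()` is the point being shown.

```python
import warp as wp


@wp.kernel
def histogram(values: wp.array(dtype=int), counts: wp.array(dtype=int), order: wp.array(dtype=int)):
    tid = wp.tid()
    # wp.atomic_add() returns the value stored *before* the addition,
    # i.e. how many earlier threads incremented this bin.
    prev = wp.atomic_add(counts, values[tid], 1)
    order[tid] = prev


values = wp.array([0, 1, 0, 2, 0], dtype=int)
counts = wp.zeros(3, dtype=int)
order = wp.zeros(values.shape[0], dtype=int)
wp.launch(histogram, dim=values.shape[0], inputs=[values, counts, order])
print(counts.numpy(), order.numpy())
```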
v1.3.1
[1.3.1] - 2024-07-27
- Remove `wp.synchronize()` from PyTorch autograd function example
- `Tape.check_kernel_array_access()` and `Tape.reset_array_read_flags()` are now private methods
- Fix reporting unmatched argument types
v1.3.0
[1.3.0] - 2024-07-25
- Warp Core improvements
  - Update to CUDA 12.x by default (requires NVIDIA driver 525 or newer); please see README.md for commands to install CUDA 11.x binaries for older drivers
  - Add information to the module load printouts to indicate whether a module was compiled `(compiled)`, loaded from the cache `(cached)`, or was unable to be loaded `(error)`
  - `wp.config.verbose = True` now also prints out a message upon entry to a `wp.ScopedTimer`
  - Add `wp.clear_kernel_cache()` to the public API. This is equivalent to `wp.build.clear_kernel_cache()`.
  - Add code-completion support for `wp.config` variables
  - Remove usage of a static task (thread) index for CPU kernels to address multithreading concerns (GH-224)
  - Improve error messages for unsupported Python operations such as sequence construction in kernels
  - Update `wp.matmul()` CPU fallback to use dtype explicitly in the `np.matmul()` call
  - Add support for PEP 563's `from __future__ import annotations` (GH-256)
  - Allow passing external arrays/tensors to `wp.launch()` directly via `__cuda_array_interface__` and `__array_interface__`, up to 2.5x faster conversion from PyTorch
  - Add faster Torch interop path using the `return_ctype` argument to `wp.from_torch()`
  - Handle incompatible CUDA driver versions gracefully
  - Add `wp.abs()` and `wp.sign()` for vector types
  - Expose scalar arithmetic operators to Python's runtime (e.g. `wp.float16(1.23) * wp.float16(2.34)`)
  - Add support for creating volumes with anisotropic transforms
  - Allow users to pass function arguments by keyword in a kernel using standard Python calling semantics (see the first sketch after this list)
  - Add additional documentation and examples demonstrating `wp.copy()`, `wp.clone()`, and `array.assign()` differentiability
  - Add `__new__()` methods for all classes with `__del__()` methods to handle when a class instance is created but not instantiated before garbage collection
  - Implement the assignment operator for `wp.quat`
  - Make the geometry-related built-ins available only from within kernels
  - Rename the API-facing query types to remove their `_t` suffix: `wp.BVHQuery`, `wp.HashGridQuery`, `wp.MeshQueryAABB`, `wp.MeshQueryPoint`, and `wp.MeshQueryRay`
  - Add `wp.array(ptr=...)` to allow initializing arrays from pointer addresses inside of kernels (GH-206)
- `warp.autograd` improvements:
  - New `warp.autograd` module with utility functions `gradcheck()`, `jacobian()`, and `jacobian_fd()` for debugging kernel Jacobians (docs)
  - Add array overwrite detection: if `wp.config.verify_autograd_array_access` is true, in-place operations on arrays on the Tape that could break gradient computation will be detected (docs)
  - Fix bug where modification of `@wp.func_replay` functions and native snippets would not trigger module recompilation
  - Add documentation for dynamic loop autograd limitations
- `warp.sim` improvements:
  - Improve memory usage and performance for rigid body contact handling when `self.rigid_mesh_contact_max` is zero (default behavior)
  - The `mask` argument to `wp.sim.eval_fk()` now accepts both integer and boolean arrays to mask articulations
  - Fix handling of `ModelBuilder.joint_act` in `ModelBuilder.collapse_fixed_joints()` (affected floating-base systems)
  - Fix and improve implementation of `ModelBuilder.plot_articulation()` to visualize the articulation tree of a rigid-body mechanism
  - Fix `ShapeInstancer.__new__()` method (missing instance return and `*args` parameter)
  - Fix handling of the `upaxis` variable in `ModelBuilder` and the rendering thereof in `OpenGLRenderer`
- `warp.sparse` improvements:
  - Sparse matrix allocations (from `bsr_from_triplets()`, `bsr_axpy()`, etc.) can now be captured in CUDA graphs; the exact number of non-zeros can be optionally requested asynchronously
  - `bsr_assign()` now supports changing block shape (including CSR/BSR conversions)
  - Add Python operator overloads for common sparse matrix operations, e.g. `A += 0.5 * B`, `y = x @ C` (see the second sketch after this list)
- `warp.fem` new features and fixes:
  - Support for variable number of nodes per element
  - Global `wp.fem.lookup()` operator now supports `wp.fem.Tetmesh` and `wp.fem.Trimesh2D` geometries
  - Simplified defining custom subdomains (`wp.fem.Subdomain`), free-slip boundary conditions
  - New field types: `wp.fem.UniformField`, `wp.fem.ImplicitField`, and `wp.fem.NonconformingField`
  - New `streamlines`, `magnetostatics`, and `nonconforming_contact` examples; updated `mixed_elasticity` to use a nonlinear model
  - Function spaces can now export VTK-compatible cells for visualization
  - Fixed edge cases with NanoVDB function spaces
  - Fixed differentiability of `wp.fem.PicQuadrature` w.r.t. positions and measures
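First sketch: keyword-argument calls inside kernels. This is a minimal example; the `mix()` helper and the array names are invented for illustration, and only the keyword-call syntax is the 1.3.0 feature being shown.

```python
import warp as wp


@wp.func
def mix(a: float, b: float, t: float) -> float:
    return a + (b - a) * t


@wp.kernel
def blend(xs: wp.array(dtype=float), ys: wp.array(dtype=float), out: wp.array(dtype=float)):
    tid = wp.tid()
    # As of 1.3.0, Warp functions can be called with keyword arguments inside kernels.
    out[tid] = mix(a=xs[tid], b=ys[tid], t=0.25)


xs = wp.array([0.0, 1.0, 2.0], dtype=float)
ys = wp.array([10.0, 11.0, 12.0], dtype=float)
out = wp.zeros(3, dtype=float)
wp.launch(blend, dim=3, inputs=[xs, ys, out])
print(out.numpy())
```

The same release also exposes scalar arithmetic operators to Python's runtime, e.g. `wp.float16(1.23) * wp.float16(2.34)` can now be evaluated outside of kernels.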
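Second sketch: the new sparse-matrix operator overloads. This is only a sketch under assumptions: the positional argument order for `bsr_from_triplets()` (block-row/column counts followed by the triplet arrays) is recalled from memory, and whether every overload shown (`+=`, scalar scaling, `@`) applies exactly as written should be checked against the `warp.sparse` documentation; only the `A += 0.5 * B` style usage is quoted directly from the notes.

```python
import warp as wp
import warp.sparse as sparse

# Two 2x2 sparse matrices with scalar (1x1) blocks, built from COO triplets.
rows = wp.array([0, 1], dtype=int)
cols = wp.array([0, 1], dtype=int)
vals = wp.array([1.0, 2.0], dtype=float)

A = sparse.bsr_from_triplets(2, 2, rows, cols, vals)
B = sparse.bsr_from_triplets(2, 2, rows, cols, vals)

# New in 1.3.0: Python operator overloads for common sparse operations.
A += 0.5 * B   # axpy-style in-place update (cf. bsr_axpy())
C = A @ B      # sparse-sparse product (cf. bsr_mm())
```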
v1.2.2
[1.2.2] - 2024-07-04
- Support for NumPy >= 2.0
v1.2.1
[1.2.1] - 2024-06-14
- Fix generic function caching
- Fix Warp not being initialized when constructing arrays with `wp.array()`
- Fix `wp.is_mempool_access_supported()` not resolving the provided device arguments to `wp.context.Device`
v1.2.0
[1.2.0] - 2024-06-06
- Add a not-a-number floating-point constant that can be used as `wp.NAN` or `wp.nan`
- Add `wp.isnan()`, `wp.isinf()`, and `wp.isfinite()` for scalars, vectors, matrices, etc. (see the sketch after this list)
- Improve kernel cache reuse by hashing just the local module constants. Previously, a module's hash was affected by all `wp.constant()` variables declared in a Warp program.
- Revised module compilation process to allow multiple processes to use the same kernel cache directory. Cached kernels will now be stored in a hash-specific subdirectory.
- Add runtime checks for `wp.MarchingCubes` on field dimensions and size
- Fix memory leak in `wp.Mesh` BVH (GH-225)
- Use C++17 when building the Warp library and user kernels
- Increase PTX target architecture up to `sm_75` (from `sm_70`), enabling Turing ISA features
- Extended NanoVDB support (see `warp.Volume`):
  - Add support for data-agnostic index grids, allocation at voxel granularity
  - New `wp.volume_lookup_index()`, `wp.volume_sample_index()`, and generic `wp.volume_sample()` / `wp.volume_lookup()` / `wp.volume_store()` kernel-level functions
  - Zero-copy aliasing of in-memory grids, support for multi-grid buffers
  - Grid introspection and blind data access capabilities
  - `warp.fem` can now work directly on NanoVDB grids using `warp.fem.Nanogrid`
  - Fixed `wp.volume_sample_v()` and `wp.volume_store_*()` adjoints
  - Prevent `wp.volume_store()` from overwriting grid background values
- Improve validation of user-provided fields and values in `warp.fem`
- Support headless rendering of `wp.render.OpenGLRenderer` via `pyglet.options["headless"] = True`
- `wp.render.RegisteredGLBuffer` can fall back to CPU-bound copying if CUDA/OpenGL interop is not available
- Clarify terms for external contributions, please see CONTRIBUTING.md for details
- Improve performance of `wp.sparse.bsr_mm()` by ~5x on benchmark problems
- Fix for XPBD incorrectly indexing into joint actuation `joint_act` arrays
- Fix for mass matrix gradient computation in `wp.sim.FeatherstoneIntegrator()`
- Fix for handling of `--msvc_path` in build scripts
- Fix for `wp.copy()` to record dest and src offset parameters on `wp.Tape()`
- Fix for `wp.randn()` to ensure return values are finite
- Fix for slicing of arrays with gradients in kernels
- Fix for function overload caching; ensure module is rebuilt if any function overloads are modified
- Fix for handling of `bool` types in generic kernels
- Publish CUDA 12.5 binaries for Hopper support, see https://github.com/nvidia/warp?tab=readme-ov-file#installing for details
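A small sketch of the new NaN helpers. The `sanitize` kernel and its data are invented for illustration; only the builtins named in the notes (`wp.isfinite()`, `wp.nan`) are relied upon.

```python
import warp as wp


@wp.kernel
def sanitize(data: wp.array(dtype=float)):
    tid = wp.tid()
    # wp.isnan()/wp.isinf()/wp.isfinite() were added in 1.2.0;
    # wp.nan (or wp.NAN) is the new not-a-number constant.
    if not wp.isfinite(data[tid]):
        data[tid] = 0.0


data = wp.array([1.0, wp.nan, float("inf"), -2.0], dtype=float)
wp.launch(sanitize, dim=data.shape[0], inputs=[data])
print(data.numpy())
```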
[1.1.1] - 2024-05-24
- `wp.init()` is no longer required to be called explicitly and will be performed on first call to the API (illustrated below)
- Speed up `omni.warp.core`'s startup time
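A trivial sketch of the implicit initialization:

```python
import warp as wp

# As of 1.1.1 there is no need to call wp.init() explicitly;
# initialization happens on the first API call.
a = wp.zeros(8, dtype=wp.float32)
print(a.device, a.numpy())
```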
v1.1.0
[1.1.0] - 2024-05-09
- Support returning a value from `@wp.func_native` CUDA functions using type hints
- Improved differentiability of the `wp.sim.FeatherstoneIntegrator`
- Fix gradient propagation for rigid body contacts in `wp.sim.collide()`
- Added support for event-based timing, see `wp.ScopedTimer()` (see the sketch after this list)
- Added Tape visualization and debugging functions, see `wp.Tape.visualize()`
- Support constructing Warp arrays from objects that define the `__cuda_array_interface__` attribute
- Support copying a struct to another device, use `struct.to(device)` to migrate struct arrays
- Allow rigid shapes to not have any collisions with other shapes in `wp.sim.Model`
- Change default test behavior to test redundant GPUs (up to 2x)
- Test each example in an individual subprocess
- Polish and optimize various examples and tests
- Allow non-contiguous point arrays to be passed to `wp.HashGrid.build()`
- Upgrade LLVM to 18.1.3 for from-source builds and Linux x86-64 builds
- Build DLL source code as C++17 and require GCC 9.4 as a minimum
- Array clone, assign, and copy are now differentiable
- Use Ruff for formatting and linting
- Various documentation improvements (infinity, math constants, etc.)
- Improve URDF importer, handle joint armature
- Allow builtins.bool to be used in Warp data structures
- Use external gradient arrays in backward passes when passed to `wp.launch()`
- Add Conjugate Residual linear solver, see `wp.optim.linear.cr()`
- Fix propagation of gradients on aliased copies of variables in kernels
- Facilitate debugging and speed up `import warp` by not raising exceptions during import
- Improve support for nested vec/mat assignments in structs
- Recommend Python 3.9 or higher, which is required for JAX and soon PyTorch
- Support gradient propagation for indexing sliced multi-dimensional arrays, i.e. `a[i][j]` vs. `a[i, j]`
- Provide an informative message if setting DLL C-types failed, instructing to try rebuilding the library
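A minimal sketch of wrapping work in `wp.ScopedTimer()`. Only the label argument is used here; the event-based timing options mentioned above are additional keyword arguments described in the documentation. The `scale` kernel is invented for illustration.

```python
import warp as wp


@wp.kernel
def scale(a: wp.array(dtype=float), s: float):
    tid = wp.tid()
    a[tid] = a[tid] * s


a = wp.zeros(1_000_000, dtype=float)

# ScopedTimer prints the elapsed time for the enclosed block when it exits.
with wp.ScopedTimer("scale kernel"):
    wp.launch(scale, dim=a.shape[0], inputs=[a, 2.0])
    wp.synchronize()
```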
[1.0.3] - 2024-04-17
- Add a `support_level` entry to the configuration file of the extensions
v1.0.2
v1.0.1
[1.0.1] - 2024-03-15
- Document Device `total_memory` and `free_memory`
- Documentation for allocators, streams, peer access, and generics
- Changed example output directory to current working directory
- Added `python -m warp.examples.browse` for browsing the examples folder
- Print where the USD stage file is being saved
- Added `examples/optim/example_walker.py` sample
- Make the drone example not specific to USD
- Reduce the time taken to run some examples
- Optimise rendering points with a single colour
- Clarify an error message around needing USD
- Raise exception when module is unloaded during graph capture
- Added `wp.synchronize_event()` for blocking the host thread until a recorded event completes (see the sketch after this list)
- Flush C print buffers when ending `stdout` capture
- Remove more unneeded CUTLASS files
- Allow setting mempool release threshold as a fractional value
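A sketch of blocking on a recorded event with `wp.synchronize_event()`. This assumes a CUDA device is available and that `wp.Event()` / `wp.record_event()` behave as recalled here (creating an event and recording it on the current stream); treat those details as assumptions and only `wp.synchronize_event()` itself as the feature from the notes.

```python
import warp as wp


@wp.kernel
def fill(a: wp.array(dtype=float), value: float):
    a[wp.tid()] = value


a = wp.zeros(1024, dtype=float, device="cuda")
wp.launch(fill, dim=a.shape[0], inputs=[a, 3.0], device="cuda")

# Record an event on the current CUDA stream, then block the host
# thread until the recorded work has completed.
ev = wp.Event()
wp.record_event(ev)
wp.synchronize_event(ev)
print(a.numpy()[:4])
```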
v1.0.0
[1.0.0] - 2024-03-07
- Add `FeatherstoneIntegrator` which provides more stable simulation of articulated rigid body dynamics in generalized coordinates (`State.joint_q` and `State.joint_qd`)
- Introduce `warp.sim.Control` struct to store control inputs for simulations (optional, by default the `Model` control inputs are used as before); integrators now have a different simulation signature: `integrator.simulate(model: Model, state_in: State, state_out: State, dt: float, control: Control)` (see the sketch after this list)
- `joint_act` can now behave in 3 modes: with `joint_axis_mode` set to `JOINT_MODE_FORCE` it behaves as a force/torque, with `JOINT_MODE_VELOCITY` it behaves as a velocity target, and with `JOINT_MODE_POSITION` it behaves as a position target; `joint_target` has been removed
- Add adhesive contact to Euler integrators via `Model.shape_materials.ka` which controls the contact distance at which the adhesive force is applied
- Improve handling of visual/collision shapes in URDF importer so visual shapes are not involved in contact dynamics
- Experimental JAX kernel callback support
- Improve module load exception message
- Add `wp.ScopedCapture`
- Removing `enable_backward` warning for callables
- Copy docstrings and annotations from wrapped kernels, functions, structs