## 🌴 Undo The Flattening (d4f82fc)

A previous `ort` release "flattened" all exports, such that everything was exported at the crate root - `ort::{TensorElementType, Session, Value}`. This was done at a time when `ort` didn't export much, but now it exports a lot, so this was leading to some big, ugly `use` blocks.
`rc.9` now has most exports behind their respective modules - `Session` is now imported as `ort::session::Session`, `Tensor` as `ort::value::Tensor`, etc. rust-analyzer and some quick searches on docs.rs can help you find the right paths to import.
## 📦 Tensor `extract` optimization (1dbad54)

Previously, calling any of the `extract_tensor_*` methods would have to call back to ONNX Runtime to determine the value's `ValueType` to ensure it was OK to extract. This involved a lot of FFI calls and a few allocations, which could have a notable performance impact in hot loops.

Since a value's type never changes after it is created, the `ValueType` is now created when the `Value` is constructed (i.e. via `Tensor::from_array`, or when returned from a session). This makes `extract_tensor_*` a lot cheaper!
Note that this does come with some breaking changes:

- Raw tensor extract methods return `&[i64]` for their dimensions instead of `Vec<i64>`.
- `Value::dtype()` and `Tensor::memory_info()` now return `&ValueType` and `&MemoryInfo` respectively, instead of their non-borrowed counterparts.
- `ValueType::Tensor` now has an extra field for symbolic dimensions, `dimension_symbols`, so you might have to update `match`es on `ValueType`.
## 🚥 Threading management (87577ef)

`2.0.0-rc.9` introduces a new trait: `ThreadManager`. This allows you to define custom thread create & join functions for session & environment thread pools! See the `thread_manager.rs` test for an example of how to create your own `ThreadManager` and apply it to a session, or to an environment's `GlobalThreadPoolOptions` (previously `EnvironmentGlobalThreadPoolOptions`).

Additionally, sessions may now opt out of the environment's global thread pool if one is configured.
## 🧠 Shape inference for custom operators (87577ef)

`ort` now provides `ShapeInferenceContext`, an interface for custom operators to provide a hint to ONNX Runtime about the shape of the operator's output tensors based on its inputs, which may open the doors to memory optimizations. See the updated `custom_operators.rs` example to see how it works.
## 📃 Session output refactor (8a16adb)

`SessionOutputs` has been slightly refactored to reduce memory usage and slightly increase performance. Most notably, it no longer derefs to a `&BTreeMap`.

The new `SessionOutputs` interface closely mirrors `BTreeMap`'s API, so most applications require no changes unless you were explicitly dereferencing to a `&BTreeMap`.
## 🛠️ LoRA Adapters (d877fb3)

ONNX Runtime v1.20.0 introduces a new `Adapter` format for supporting LoRA-like weight adapters, and now `ort` has it too!

An `Adapter` essentially functions as a map of tensors, loaded from disk or memory and copied to a device (typically whichever device the session resides on). When you add an `Adapter` to `RunOptions`, those tensors are automatically added as inputs (except faster, because they don't need to be copied anywhere!)

With some modification to your ONNX graph, you can add LoRA layers using optional inputs which `Adapter` can then override. (Hopefully ONNX Runtime will provide some documentation on how this can be done soon, but until then, it's ready to use in `ort`!)
```rust
let model = Session::builder()?.commit_from_file("tests/data/lora_model.onnx")?;
let lora = Adapter::from_file("tests/data/adapter.orl", None)?;

// Attach the adapter to this run; its tensors are fed as inputs automatically.
let mut run_options = RunOptions::new()?;
run_options.add_adapter(&lora)?;

let outputs = model.run_with_options(
    ort::inputs![Tensor::<f32>::from_array(([4, 4], vec![1.0; 16]))?]?,
    &run_options
)?;
```
## 🗂️ Prepacked weights (87577ef)

`PrepackedWeights` allows multiple sessions to share the same weights. If you create multiple `Session`s from one model file, they can all share the same memory!

Currently, ONNX Runtime only supports prepacked weights for the CPU execution provider.
## ‼️ Dynamic dimension overrides (87577ef)

You can now override dynamic dimensions in a graph using `SessionBuilder::with_dimension_override`, allowing ONNX Runtime to do more optimizations.
## 🪶 Customizable workload type (87577ef)

Not all workloads need full performance all the time! If you're using `ort` to perform background tasks, you can now set a session's workload type to prioritize either efficiency (by lowering scheduling priority or utilizing more efficient CPU cores on some architectures), or performance (the default).

```rust
let session = Session::builder()?.commit_from_file("tests/data/upsample.onnx")?;
session.set_workload_type(WorkloadType::Efficient)?;
```
## Other features

- 28e00e3 Update to ONNX Runtime v1.20.0.
- 552727e Expose the `ortsys!` macro.
  - Note that this commit also made `ort::api()` return `&ort_sys::OrtApi` instead of `NonNull<ort_sys::OrtApi>`.
- 82dcf84 Add `AsPointer` trait.
  - Structs that previously had a `ptr()` method now have an `AsPointer` implementation instead.
- b51f60c Add config entries to `RunOptions`.
- 67fe38c Introduce the `ORT_CXX_STDLIB` environment variable (mirroring `CXXSTDLIB`) to allow changing the C++ standard library `ort` links to.
## Fixes

- c1c736b Fix `ValueRef` & `ValueRefMut` leaking value memory.
- 2628378 Query `MemoryInfo`'s `DeviceType` instead of its allocation device to determine whether `Tensor`s can be extracted.
- e220795 Allow `ORT_PREFER_DYNAMIC_LINK` to work even when `cuda` or `tensorrt` are enabled.
- 1563c13 Add missing downcast implementations for `Sequence<T>`.
- Returned Ferris to the docs.rs page 🦀
If you have any questions about this release, we're here to help!
Thank you to Thomas, Johannes Laier, Yunho Cho, Phu Tran, Bartek, Noah, Matouš Kučera, Kevin Lacker, and Okabintaro, whose support made this release possible. If you'd like to support `ort` as well, consider contributing on Open Collective 💖

🩷💜🩷💜