adopt mlx 0.29.1 and related mlx-c #273
Conversation
"mlx/mlx/backend/metal/no_metal.cpp",

// special handling for cuda -- we need to keep one file:
// mlx/mlx/backend/cuda/no_cuda.cpp
This is a little more complicated than I wish, but we can't exclude the directory + include one file, so I need to just list them.
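To illustrate why the files have to be listed one by one: SwiftPM's `exclude` is a flat subtraction from a target's sources, with no way to exclude a directory and then re-include a single file inside it. A hedged sketch of the resulting `Package.swift` shape (the sibling file names here are illustrative, not the actual list):

```swift
// Hypothetical Package.swift fragment -- a sketch, not the actual target
// definition from this PR. Excluding "mlx/mlx/backend/cuda" wholesale
// would also drop no_cuda.cpp, so each unwanted sibling is listed
// explicitly instead (names below are placeholders).
.target(
    name: "Cmlx",
    exclude: [
        "mlx/mlx/backend/cuda/allocator.cpp",   // illustrative
        "mlx/mlx/backend/cuda/device.cpp",      // illustrative
        // ... every other cuda file except no_cuda.cpp
    ]
)
```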
/// - ``asArray(_:)``
/// - ``asData(access:)``
public func asMTLBuffer(device: any MTLDevice, noCopy: Bool = false) -> (any MTLBuffer)? {
    let data = asData(access: noCopy ? .noCopyIfContiguous : .copy)
From #259 -- this line is unused.
  // If it's just a simple slice, just do a slice update and return
- if operations.count == 1, case let .slice(slice) = operations[0] {
+ if operations.count == 1, case .slice(let slice) = operations[0] {
Just the new swift-format.
/// - values: values with shape `[B, N_kv, T_kv, D]`
/// - scale: scale for queries, typically `1 / sqrt(q.dim(-1))`
/// - mask: mask array
/// - sinks: optional array of attention sinks
New optional argument
  }
- let x = MLXArray(1)
+ let x = MLXArray([1])
This was incorrect before -- a dimensionless parameter is not the same as a shaped array. Now it throws as the back end rejects it.
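For context, a minimal sketch of the scalar-vs-shaped distinction (assuming the usual `MLXArray` initializers; not part of this diff):

```swift
import MLX

// A dimensionless scalar: ndim == 0, shape []
let scalar = MLXArray(1)

// A 1-element vector: ndim == 1, shape [1]
let vector = MLXArray([1])

// These are not interchangeable -- an op that expects a shaped
// parameter now throws when handed the dimensionless form.
assert(scalar.ndim == 0)
assert(vector.shape == [1])
```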
Source/MLX/Ops.swift
Outdated
/// MX (Microscaling) FP4 quantization format.
///
/// MXFP4 is a specialized 4-bit floating-point format designed for neural network inference.
/// It uses a shared exponent across a block of values with individual 3-bit mantissas plus sign bits.
The individual elements are e2m1 (so 1 sign bit, 2 exponent, 1 mantissa)
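A standalone sketch of the e2m1 element format described in this comment (1 sign bit, 2 exponent bits, 1 mantissa bit, giving the magnitudes {0, 0.5, 1, 1.5, 2, 3, 4, 6}); pure Swift, not MLX code:

```swift
// Decode a single 4-bit e2m1 code: 1 sign, 2 exponent, 1 mantissa bit.
func decodeE2M1(_ nibble: UInt8) -> Float {
    let sign: Float = (nibble & 0b1000) != 0 ? -1 : 1
    let exp = Int((nibble >> 1) & 0b11)   // 2 exponent bits
    let man = Float(nibble & 0b1)         // 1 mantissa bit
    if exp == 0 {
        return sign * man * 0.5           // subnormal: 0 or ±0.5
    }
    return sign * Float(1 << (exp - 1)) * (1 + man / 2)
}

// All positive codes, in order:
let values = (0..<8).map { decodeE2M1(UInt8($0)) }
assert(values == [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
```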
Source/MLX/Ops.swift
Outdated
///
/// MXFP4 is a specialized 4-bit floating-point format designed for neural network inference.
/// It uses a shared exponent across a block of values with individual 3-bit mantissas plus sign bits.
/// This format can provide better accuracy than standard 4-bit integer quantization for certain
I would just remove that, as it's not usually right (MLX Q4 is probably more accurate for most cases). We support this mostly because of GPT OSS (and probably future models), which were trained in mxfp4 (since the hardware has native support for it).
Source/MLX/Ops.swift
Outdated
///
/// The format consists of:
/// - Shared 8-bit exponent per block
/// - Individual 3-bit mantissas + 1 sign bit per element
Update as comment above.
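Putting the shared exponent and the per-element codes together, a hedged sketch of decoding one block, assuming the OCP Microscaling layout (one 8-bit power-of-two scale per block, bias 127, plus e2m1 element nibbles); pure Swift, not the MLX kernel:

```swift
import Foundation

// Positive e2m1 magnitudes, indexed by the low 3 bits of a nibble.
let e2m1Table: [Float] = [0, 0.5, 1, 1.5, 2, 3, 4, 6]

// Decode a block: each element is sign * magnitude * sharedScale,
// where the shared scale is 2^(sharedExponent - 127).
func decodeBlock(sharedExponent: UInt8, nibbles: [UInt8]) -> [Float] {
    let scale = exp2(Float(Int(sharedExponent) - 127))
    return nibbles.map { n in
        let sign: Float = (n & 0b1000) != 0 ? -1 : 1
        return sign * e2m1Table[Int(n & 0b0111)] * scale
    }
}

// sharedExponent 128 -> scale 2.0
assert(decodeBlock(sharedExponent: 128, nibbles: [0b0010, 0b1011]) == [2.0, -3.0])
```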
/// - Parameters:
///   - w: The quantized weight matrix to dequantize
///   - scales: Scaling factors used during quantization. Should have shape compatible with the quantized groups
///   - biases: Bias values used during quantization. Should have shape compatible with the quantized groups
Worth commenting that it is optional for some modes?
The type is already marked as optional so we are covered there
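To make the "optional for some modes" point concrete, a small sketch of the per-group arithmetic (plain Swift, not the MLX implementation): affine mode reconstructs `x ≈ q * scale + bias`, while mxfp4 has no bias term, so only a scale is produced.

```swift
// Dequantize a single element; `bias` is nil for bias-free modes
// such as mxfp4, present for affine quantization.
func dequantizeElement(_ q: Float, scale: Float, bias: Float?) -> Float {
    q * scale + (bias ?? 0)
}

// affine: quantized value 3, scale 0.5, bias -1.0
assert(dequantizeElement(3, scale: 0.5, bias: -1.0) == 0.5)
// mxfp4-style: no bias term
assert(dequantizeElement(3, scale: 0.5, bias: nil) == 1.5)
```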
Source/MLX/Ops.swift
Outdated
///   - bits: The number of bits occupied by each element of `w` in the returned quantized matrix. Default is `4`
///   - mode: The quantization mode. Default is `.affine`
///   - stream: Stream or device to evaluate on
/// - Returns: A tuple containing the quantized weights (`wq`), scaling factors (`scales`), and bias values (`biases`)
How does it work if the mode is `mxfp4`? Is the bias null?
That is a good question -- as written the values are not optional. Let me write a test and see what shows up.
Very crashy in that case. Hrm, this is going to change the signature of the method slightly.
Very nice, thanks for the update! Left a few comments / questions on the new quantization stuff.
Co-authored-by: Awni Hannun <[email protected]>
NOTE
This change contains some breaking API changes in the area of quantization. Specifically:

- the `quantized`/`dequantized` methods now take a `mode` parameter (not breaking)
- the `biases` result from `quantized` is now optional, e.g. `(wq: MLXArray, scales: MLXArray, biases: MLXArray?)`

We are keeping the same semver here to match with python mlx. Although the change is breaking, it will likely be limited to implementations of quantized layers, e.g. `QuantizedLinear`, or other code that uses quantization directly. `mlx-swift-examples` will have a synchronized release to reflect this change.

If you need to make a similar change, consider the changes from `QuantizedLinear`:

The properties changed from this:

to:

A `mode` parameter with a default value was added where needed: `mode: QuantizationMode = .affine`, and the `mode` parameter was used in calls to the quantization APIs. And the `Quantizable` protocol was updated to have a `mode` parameter (protocol methods can't have default values):
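A hedged sketch of what that migration looks like for a custom quantized layer, following the description above (the type and property names here are illustrative, not the actual `QuantizedLinear` source):

```swift
import MLX

// Hypothetical custom layer, updated per this PR:
//   - `biases` becomes optional (nil for bias-free modes like .mxfp4)
//   - a `mode` parameter with a default is threaded through
class MyQuantizedLayer {
    let weight: MLXArray
    let scales: MLXArray
    let biases: MLXArray?          // was non-optional before this change
    let mode: QuantizationMode

    init(weight: MLXArray, groupSize: Int = 64, bits: Int = 4,
         mode: QuantizationMode = .affine) {
        // `quantized` now takes `mode` and returns an optional `biases`
        (self.weight, self.scales, self.biases) =
            quantized(weight, groupSize: groupSize, bits: bits, mode: mode)
        self.mode = mode
    }
}
```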