
Conversation

@davidkoski (Collaborator) commented Sep 24, 2025

NOTE

This change contains some breaking API changes in the area of quantization. Specifically:

  • the quantized / dequantized methods now take a mode parameter (not breaking)
  • the biases result from quantized is now optional, e.g. (wq: MLXArray, scales: MLXArray, biases: MLXArray?)

We are keeping the same semver here to match Python mlx. Although the change is breaking, its impact will likely be limited to implementations of quantized layers, e.g. QuantizedLinear, or other code that uses quantization directly. mlx-swift-examples will have a synchronized release to reflect this change.

If you need to make a similar change, consider the changes from QuantizedLinear:

The properties changed from this:

    public let scales: MLXArray
    public let biases: MLXArray

to:

    public let mode: QuantizationMode
    public let scales: MLXArray
    public let biases: MLXArray?

A mode parameter with a default value (mode: QuantizationMode = .affine) was added where needed, and the mode parameter was used in calls to the quantization APIs:

        var x = quantizedMatmul(
            x,
            weight,
            scales: scales,
            biases: biases,
            transpose: true,
            groupSize: groupSize,
            bits: bits,
            mode: mode
        )

and the Quantizable protocol was updated to have a mode parameter (protocol methods can't have default values):

    /// Return the module as a quantized representation
    func toQuantized(groupSize: Int, bits: Int, mode: QuantizationMode) -> Module
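
As a rough sketch of what a conforming custom layer can look like after this change (MyLayer and MyQuantizedLayer are hypothetical; quantized, Quantizable, QuantizationMode, and Module are the APIs discussed above, and the optional biases mirrors the property change shown for QuantizedLinear):

    import MLX
    import MLXNN

    // Hypothetical quantized counterpart of a custom layer: it stores the
    // result of `quantized` along with the parameters needed to use it later.
    class MyQuantizedLayer: Module {
        let wq: MLXArray
        let scales: MLXArray
        let biases: MLXArray?
        let groupSize: Int
        let bits: Int
        let mode: QuantizationMode

        init(weight: MLXArray, groupSize: Int, bits: Int, mode: QuantizationMode) {
            let q = quantized(weight, groupSize: groupSize, bits: bits, mode: mode)
            self.wq = q.wq
            self.scales = q.scales
            self.biases = q.biases
            self.groupSize = groupSize
            self.bits = bits
            self.mode = mode
            super.init()
        }
    }

    // The full precision layer adopts Quantizable and forwards the new mode.
    class MyLayer: Module, Quantizable {
        let weight: MLXArray

        init(weight: MLXArray) {
            self.weight = weight
            super.init()
        }

        func toQuantized(groupSize: Int, bits: Int, mode: QuantizationMode) -> Module {
            MyQuantizedLayer(weight: weight, groupSize: groupSize, bits: bits, mode: mode)
        }
    }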

@davidkoski requested a review from @awni, September 24, 2025 20:48
"mlx/mlx/backend/metal/no_metal.cpp",

// special handling for cuda -- we need to keep one file:
// mlx/mlx/backend/cuda/no_cuda.cpp

@davidkoski (Collaborator, author):
This is a little more complicated than I wish, but we can't exclude the directory + include one file, so I need to just list them.
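
Roughly what the workaround looks like in the manifest (the target name and the cuda file names other than no_cuda.cpp are illustrative; only the two paths above come from the diff):

    // SwiftPM cannot exclude a directory and then add one file back from it,
    // so each cuda source is excluded individually and no_cuda.cpp is simply
    // left off the list.
    .target(
        name: "Cmlx",
        exclude: [
            "mlx/mlx/backend/metal/no_metal.cpp",
            "mlx/mlx/backend/cuda/allocator.cpp",  // illustrative entry
            "mlx/mlx/backend/cuda/device.cpp",     // illustrative entry
            // ... every other file under backend/cuda except no_cuda.cpp
        ]
    )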

/// - ``asArray(_:)``
/// - ``asData(access:)``
public func asMTLBuffer(device: any MTLDevice, noCopy: Bool = false) -> (any MTLBuffer)? {
    let data = asData(access: noCopy ? .noCopyIfContiguous : .copy)

@davidkoski (Collaborator, author):
From #259 -- this line is unused.


// If it's just a simple slice, just do a slice update and return
if operations.count == 1, case let .slice(slice) = operations[0] {
if operations.count == 1, case .slice(let slice) = operations[0] {

@davidkoski (Collaborator, author):
Just the new swift-format.

/// - values: values with shape `[B, N_kv, T_kv, D]`
/// - scale: scale for queries, typically `1 / sqrt(q.dim(-1))`
/// - mask: mask array
/// - sinks: optional array of attention sinks

@davidkoski (Collaborator, author):
New optional argument
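
A rough sketch of passing the new argument (the sinks: label comes from the doc comment above; the MLXFast.scaledDotProductAttention entry point and the one-sink-per-head shape are assumptions, not taken from this diff):

    import MLX
    import MLXFast
    import MLXRandom

    let (B, H, T, D) = (1, 8, 16, 64)
    let q = MLXRandom.normal([B, H, T, D])
    let k = MLXRandom.normal([B, H, T, D])
    let v = MLXRandom.normal([B, H, T, D])

    // Assumed shape: one learned sink logit per head.
    let sinks = MLXRandom.normal([H])

    let out = MLXFast.scaledDotProductAttention(
        queries: q, keys: k, values: v,
        scale: 1 / Float(D).squareRoot(),
        sinks: sinks
    )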

}

let x = MLXArray(1)
let x = MLXArray([1])

@davidkoski (Collaborator, author):
This was incorrect before -- a dimensionless parameter is not the same as a shaped array. Now it throws as the back end rejects it.
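
Concretely, the two initializers produce different shapes:

    let scalar = MLXArray(1)    // dimensionless scalar: shape []
    let vector = MLXArray([1])  // shaped array: shape [1]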

/// MX (Microscaling) FP4 quantization format.
///
/// MXFP4 is a specialized 4-bit floating-point format designed for neural network inference.
/// It uses a shared exponent across a block of values with individual 3-bit mantissas plus sign bits.

@awni (Member):
The individual elements are e2m1 (so 1 sign bit, 2 exponent bits, 1 mantissa bit).
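
For illustration only (not part of the MLX API), a nibble in that layout decodes to the eight magnitudes an MXFP4 element can represent:

    // e2m1: sign in bit 3, exponent in bits 2-1 (bias 1), mantissa in bit 0.
    func decodeE2M1(_ nibble: UInt8) -> Float {
        let sign: Float = (nibble & 0b1000) != 0 ? -1 : 1
        let exp = Int((nibble >> 1) & 0b11)
        let man = Float(nibble & 0b1)
        // exp == 0 is the subnormal range
        let magnitude = exp == 0 ? man * 0.5 : (1 + man * 0.5) * Float(1 << (exp - 1))
        return sign * magnitude
    }

    // (0...7).map { decodeE2M1(UInt8($0)) } -> [0, 0.5, 1, 1.5, 2, 3, 4, 6]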

///
/// MXFP4 is a specialized 4-bit floating-point format designed for neural network inference.
/// It uses a shared exponent across a block of values with individual 3-bit mantissas plus sign bits.
/// This format can provide better accuracy than standard 4-bit integer quantization for certain

@awni (Member):
I would just remove that, as it's not usually right (MLX Q4 is probably more accurate for most cases). We support this mostly because of GPT OSS (and probably future models), which were trained in mxfp4 (since the hardware has native support for it).

///
/// The format consists of:
/// - Shared 8-bit exponent per block
/// - Individual 3-bit mantissas + 1 sign bit per element

@awni (Member):
Update as comment above.

/// - Parameters:
/// - w: The quantized weight matrix to dequantize
/// - scales: Scaling factors used during quantization. Should have shape compatible with the quantized groups
/// - biases: Bias values used during quantization. Should have shape compatible with the quantized groups

@awni (Member):
Worth commenting that it is optional for some modes?

@davidkoski (Collaborator, author):
The type is already marked as optional so we are covered there

/// - bits: The number of bits occupied by each element of `w` in the returned quantized matrix. Default is `4`
/// - mode: The quantization mode. Default is `.affine`
/// - stream: Stream or device to evaluate on
/// - Returns: A tuple containing the quantized weights (`wq`), scaling factors (`scales`), and bias values (`biases`)

@awni (Member):
How does it work if the mode is mxfp4? Is the bias null?

@davidkoski (Collaborator, author):
That is a good question -- as written the values are not optional. Let me write a test and see what shows up.

@davidkoski (Collaborator, author):
Very crashy in that case. Hrm, this is going to change the signature of the method slightly.
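
Roughly what such a test probes, sketched against the post-change signature (the .mxfp4 case name and its group size of 32 are assumptions here, not taken from the PR):

    import MLX
    import MLXRandom

    let w = MLXRandom.normal([128, 256])

    // Affine mode (the default) still produces scales and biases.
    let affine = quantized(w, groupSize: 64, bits: 4, mode: .affine)
    assert(affine.biases != nil)

    // MXFP4 carries no affine bias, so `biases` is expected to be nil...
    let mx = quantized(w, groupSize: 32, bits: 4, mode: .mxfp4)
    assert(mx.biases == nil)

    // ...and nil is what flows back through dequantized.
    let back = dequantized(
        mx.wq, scales: mx.scales, biases: mx.biases,
        groupSize: 32, bits: 4, mode: .mxfp4)
    print(back.shape)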

@awni (Member) left a comment:
Very nice, thanks for the update! Left a few comments / questions on the new quantization stuff.

@davidkoski merged commit 072b684 into main Oct 16, 2025
1 check passed
@davidkoski deleted the mlx-0291 branch October 16, 2025 17:34