let text = "A group of friends enjoy a barbecue on a sandy beach, with one person grilling over a large black grill, while the other sits nearby, laughing and enjoying the camaraderie."
let textEmbedding: Embedding = try textModel.encode(text)
let textVector: [Float32] = textEmbedding.asFloats()
### Image Embeddings
```swift
let imageModel = try await ImageEncoder(modelName: "unum-cloud/uform3-image-text-english-small")
let imageURL = "https://github.com/ashvardanian/ashvardanian/blob/master/demos/bbq-on-beach.jpg?raw=true"
guard let url = URL(string: imageURL),
    let imageSource = CGImageSourceCreateWithURL(url as CFURL, nil),
    let cgImage = CGImageSourceCreateImageAtIndex(imageSource, 0, nil)
else {
    fatalError("Failed to fetch and decode the image")
}

var imageEmbedding: Embedding = try imageModel.encode(cgImage)
var imageVector: [Float32] = imageEmbedding.asFloats()
```
### Choosing Target Device
Apple chips provide several functional units capable of high-throughput matrix multiplication and AI inference.
Those `computeUnits` include the CPU, GPU, and Neural Engine.
For maximum compatibility, the `.all` option is used by default.
Sadly, Apple's scheduler is not always optimal, and it can be beneficial to specify the target device explicitly, especially if the models are pre-compiled for the Apple Neural Engine, as that may yield significant performance gains.
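As a sketch of how the device choice could be pinned, Core ML exposes it through `MLModelConfiguration.computeUnits`; whether and how the UForm encoder initializers forward such a configuration is an assumption here, not confirmed API:

```swift
import CoreML

// Core ML's device-selection knob. `.all` is the default;
// `.cpuAndNeuralEngine` keeps inference off the GPU entirely.
let configuration = MLModelConfiguration()
configuration.computeUnits = .cpuAndNeuralEngine
```

The same `MLComputeUnits` enum also offers `.cpuOnly` and `.cpuAndGPU`, which can be handy when benchmarking the table below on your own hardware.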
| Model               | GPU Text E. | ANE Text E. | GPU Image E. | ANE Image E. |
| :------------------ | ----------: | ----------: | -----------: | -----------: |
| `english-small`     |     2.53 ms |     0.53 ms |      6.57 ms |      1.23 ms |
| `english-base`      |     2.54 ms |     0.61 ms |     18.90 ms |      3.79 ms |
| `english-large`     |     2.30 ms |     0.61 ms |     79.68 ms |     20.94 ms |
| `multilingual-base` |     2.34 ms |     0.50 ms |     18.98 ms |      3.77 ms |
> On an Apple M4 iPad, running iOS 18.2.
> Batch size is 1, and the model is pre-loaded into memory.
> The original encoders use `f32` single-precision numbers for maximum compatibility, and mostly rely on the __GPU__ for computation.
> The quantized encoders use a mixture of `i8`, `f16`, and `f32` numbers for maximum performance, and mostly rely on the Apple Neural Engine (__ANE__) for computation.
> The median latency is reported.
### Computing Distances
There are several ways to compute distances between embeddings, once you have them.
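For instance, cosine similarity can be computed directly over the `[Float32]` vectors returned by `asFloats()`. This is a minimal hand-rolled sketch, not the library's built-in distance API, which may be faster:

```swift
// Cosine similarity between two equally-sized embedding vectors.
// Returns a value in [-1, 1]; higher means more similar.
func cosineSimilarity(_ a: [Float32], _ b: [Float32]) -> Float32 {
    precondition(a.count == b.count, "Embeddings must share dimensionality")
    var dot: Float32 = 0
    var normA: Float32 = 0
    var normB: Float32 = 0
    for i in 0 ..< a.count {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return dot / (normA.squareRoot() * normB.squareRoot())
}
```

With the snippets above, `cosineSimilarity(textVector, imageVector)` scores how well the caption matches the image.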