+
+
+
## Supported variants
Scribe v1 is available in two variants:
diff --git a/fern/docs/pages/capabilities/text-to-speech.mdx b/fern/docs/pages/capabilities/text-to-speech.mdx
index bac9e6a4e..0ebdd7b2a 100644
--- a/fern/docs/pages/capabilities/text-to-speech.mdx
+++ b/fern/docs/pages/capabilities/text-to-speech.mdx
@@ -37,7 +37,7 @@ Explore our [voice library](https://elevenlabs.io/community) to find the perfect
For real-time applications, Flash v2.5 provides ultra-low 75ms latency, while Multilingual v2 delivers the highest quality audio with more nuanced expression.
-
+
[Explore all](/docs/models)
diff --git a/fern/docs/pages/models.mdx b/fern/docs/pages/models.mdx
index d12952541..0f63752d8 100644
--- a/fern/docs/pages/models.mdx
+++ b/fern/docs/pages/models.mdx
@@ -120,6 +120,21 @@ With its lower price point and 75ms latency, Flash v2.5 is the cost-effective op
+## Character limits
+
+The maximum number of characters supported in a single text-to-speech request varies by model.
+
+| Model ID | Character limit | Approximate audio duration |
+| ------------------------ | --------------- | -------------------------- |
+| `eleven_flash_v2_5` | 40,000 | ~40 minutes |
+| `eleven_flash_v2` | 30,000 | ~30 minutes |
+| `eleven_multilingual_v2` | 10,000 | ~10 minutes |
+| `eleven_multilingual_v1` | 10,000 | ~10 minutes |
+| `eleven_english_sts_v2` | 10,000 | ~10 minutes |
+| `eleven_english_sts_v1` | 10,000 | ~10 minutes |
+
+For longer content, consider splitting the input into multiple requests.
+
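As a rough illustration of the splitting suggestion in the added section, the sketch below breaks text at sentence boundaries so that each piece stays under a model's character limit. The `split_text` helper and the `CHAR_LIMITS` dict are hypothetical names, not part of any ElevenLabs SDK; the limits are copied from the table in this hunk, and the sentence regex is a simplifying assumption.

```python
import re

# Limits copied from the "Character limits" table; the dict itself is illustrative.
CHAR_LIMITS = {
    "eleven_flash_v2_5": 40_000,
    "eleven_flash_v2": 30_000,
    "eleven_multilingual_v2": 10_000,
}


def split_text(text: str, limit: int) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Prefers sentence boundaries; a single sentence longer than `limit`
    is hard-split at the limit as a fallback.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")

    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Fallback for a sentence that cannot fit in one chunk on its own.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue

        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            # Adding this sentence would exceed the limit: start a new chunk.
            chunks.append(current)
            current = sentence

    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk could then be sent as its own text-to-speech request, for example with `split_text(long_text, CHAR_LIMITS["eleven_multilingual_v2"])`. Splitting at sentence boundaries rather than at a fixed offset avoids cutting words in half, though how the resulting audio segments are joined is left to the caller.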
## Scribe v1
Scribe v1 is our state-of-the-art speech recognition model designed for accurate transcription across 99 languages. It provides precise word-level timestamps and advanced features like speaker diarization and dynamic audio tagging.
@@ -140,21 +155,6 @@ Key features:
Read more about Scribe v1 [here](/docs/capabilities/speech-to-text).
-## Character limits
-
-The maximum number of characters supported in a single text-to-speech request varies by model.
-
-| Model ID | Character limit | Approximate audio duration |
-| ------------------------ | --------------- | -------------------------- |
-| `eleven_flash_v2_5` | 40,000 | ~40 minutes |
-| `eleven_flash_v2` | 30,000 | ~30 minutes |
-| `eleven_multilingual_v2` | 10,000 | ~10 minutes |
-| `eleven_multilingual_v1` | 10,000 | ~10 minutes |
-| `eleven_english_sts_v2` | 10,000 | ~10 minutes |
-| `eleven_english_sts_v1` | 10,000 | ~10 minutes |
-
-For longer content, consider splitting the input into multiple requests.
-
## Concurrency and priority
Your subscription plan determines how many requests can be processed simultaneously and the priority level of your requests in the queue.
diff --git a/fern/docs/pages/product-guides/playground/text-to-speech.mdx b/fern/docs/pages/product-guides/playground/text-to-speech.mdx
index a2f327712..ee3cc2b99 100644
--- a/fern/docs/pages/product-guides/playground/text-to-speech.mdx
+++ b/fern/docs/pages/product-guides/playground/text-to-speech.mdx
@@ -66,7 +66,7 @@ Not all voices are equal, and a lot depends on the source audio used to create t
ElevenLabs offers two families of models: standard (high-quality) models and Flash models, which are optimized for low latency. Each family includes both English-only and multilingual models, tailored for specific use cases with strengths in either speed, accuracy, or language diversity.
-
+
[Learn more about our models](/docs/models)
diff --git a/fern/snippets/stt-models.mdx b/fern/snippets/stt-models.mdx
new file mode 100644
index 000000000..5ff4a631c
--- /dev/null
+++ b/fern/snippets/stt-models.mdx
@@ -0,0 +1,11 @@
+
+
+ State-of-the-art speech recognition model
+
+
Accurate transcription in 99 languages
+
Precise word-level timestamps
+
Speaker diarization
+
Dynamic audio tagging
+
+
+
diff --git a/fern/snippets/tts-models.mdx b/fern/snippets/tts-models.mdx
new file mode 100644
index 000000000..cdaffc21f
--- /dev/null
+++ b/fern/snippets/tts-models.mdx
@@ -0,0 +1,20 @@
+
+
+ Our most lifelike, emotionally rich speech synthesis model
+
+
Most natural-sounding output
+
29 languages supported
+
10,000 character limit
+
Rich emotional expression
+
+
+
+ Our fast, affordable speech synthesis model
+