5 changes: 5 additions & 0 deletions .changeset/add-yolo-keybinding.md
@@ -0,0 +1,5 @@
---
"@kilocode/cli": patch
---

feat(cli): add Ctrl+Y keybinding to toggle YOLO mode
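
For context, if the CLI's TUI is built on Ink (an assumption, not confirmed by this diff), a global toggle like this is typically registered through Ink's `useInput` hook. A minimal sketch of the pattern; the hook and state names are illustrative, not the CLI's actual code:

```tsx
import { useState } from "react"
import { useInput } from "ink"

// Flip a boolean "YOLO mode" flag whenever the user presses Ctrl+Y.
function useYoloToggle(): boolean {
	const [yoloMode, setYoloMode] = useState(false)
	useInput((input, key) => {
		if (key.ctrl && input === "y") {
			setYoloMode((enabled) => !enabled)
		}
	})
	return yoloMode
}
```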
5 changes: 0 additions & 5 deletions .changeset/humble-points-care.md

This file was deleted.

5 changes: 0 additions & 5 deletions .changeset/pink-gorillas-breathe.md

This file was deleted.

5 changes: 0 additions & 5 deletions .changeset/pretty-memes-lose.md

This file was deleted.

6 changes: 0 additions & 6 deletions .changeset/smart-otters-smell.md

This file was deleted.

5 changes: 0 additions & 5 deletions .changeset/tidy-agent-manager-errors.md

This file was deleted.

22 changes: 22 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,27 @@
# kilo-code

## 4.137.0

### Minor Changes

- [#4394](https://github.com/Kilo-Org/kilocode/pull/4394) [`01b968b`](https://github.com/Kilo-Org/kilocode/commit/01b968ba4635a162c787169bffe1809fc1ab973a) Thanks [@hassoncs](https://github.com/hassoncs)! - Add Speech-To-Text experiment for the chat input powered by ffmpeg and the OpenAI Whisper API

- [#4388](https://github.com/Kilo-Org/kilocode/pull/4388) [`af93318`](https://github.com/Kilo-Org/kilocode/commit/af93318e3648c235721ba58fe9caab9429608241) Thanks [@iscekic](https://github.com/iscekic)! - send org id and last mode with session data

### Patch Changes

- [#4412](https://github.com/Kilo-Org/kilocode/pull/4412) [`d56879c`](https://github.com/Kilo-Org/kilocode/commit/d56879c58f65c8da1419c9840816720279bec4e6) Thanks [@quantizoor](https://github.com/quantizoor)! - Added support for xhigh reasoning effort

- [#4415](https://github.com/Kilo-Org/kilocode/pull/4415) [`5e670d1`](https://github.com/Kilo-Org/kilocode/commit/5e670d14047054a2f92a9057391286402076b5a5) Thanks [@kevinvandijk](https://github.com/kevinvandijk)! - Fix: bottom controls no longer overlap with create mode button

- [#4416](https://github.com/Kilo-Org/kilocode/pull/4416) [`026da65`](https://github.com/Kilo-Org/kilocode/commit/026da65fdb9f16d23216197412e06ca2ed208639) Thanks [@marius-kilocode](https://github.com/marius-kilocode)! - fix: resolve AbortSignal memory leak in CLI (MaxListenersExceededWarning)

- [#4392](https://github.com/Kilo-Org/kilocode/pull/4392) [`73681e9`](https://github.com/Kilo-Org/kilocode/commit/73681e9002af4c5aa3fec3bc2a86e8008dc926af) Thanks [@markijbema](https://github.com/markijbema)! - Split autocomplete suggestion in current line and next lines in most cases

- [#4426](https://github.com/Kilo-Org/kilocode/pull/4426) [`fdc0c0a`](https://github.com/Kilo-Org/kilocode/commit/fdc0c0a07d49c4726997121ad540d6c855965e7b) Thanks [@kevinvandijk](https://github.com/kevinvandijk)! - Fix API request errors with MCP functions incompatible with OpenAI strict mode

- [#4373](https://github.com/Kilo-Org/kilocode/pull/4373) [`a80ec02`](https://github.com/Kilo-Org/kilocode/commit/a80ec02db75c061163100ce91d099f4fd3846a99) Thanks [@marius-kilocode](https://github.com/marius-kilocode)! - Handle different cli authentication errors when using agent manager

## 4.136.0

### Minor Changes
Expand Down
84 changes: 21 additions & 63 deletions apps/kilocode-docs/docs/basic-usage/model-selection-guide.md
@@ -2,83 +2,41 @@
 sidebar_label: "Model Selection Guide"
 ---

-# Kilo Code Model Selection Guide
+# Model Selection Guide

-Last updated: September 3, 2025.
+Here's the honest truth about AI model recommendations: by the time I write them down, they're probably already outdated. New models drop every few weeks, existing ones get updated, prices shift, and yesterday's champion becomes today's budget option.

-The AI model landscape evolves rapidly, so this guide focuses on what's delivering excellent results with Kilo Code right now. We update this regularly as new models emerge and performance shifts.
+Instead of maintaining a static list that's perpetually behind, we built something better — a real-time leaderboard showing which models Kilo Code users are actually having success with right now.

-## Kilo Code Top Performers
+## Check the Live Models List

-| Model                | Context Window | SWE-Bench Verified | Human Eval | LiveCodeBench | Input Price\* | Output Price\* | Best For                                     |
-| -------------------- | -------------- | ------------------ | ---------- | ------------- | ------------- | -------------- | -------------------------------------------- |
-| **GPT-5**            | 400K tokens    | 74.9%              | 96.3%      | 68.2%         | $1.25         | $10            | Latest capabilities, multi-modal coding      |
-| **Claude Sonnet 4**  | 1M tokens      | 72.7%              | 94.8%      | 65.9%         | $3-6          | $15-22.50      | Enterprise code generation, complex systems  |
-| **Grok Code Fast 1** | 256K tokens    | 70.8%              | 92.1%      | 63.4%         | $0.20         | $1.50          | Rapid development, cost-performance balance  |
-| **Qwen3 Coder**      | 256K tokens    | 68.4%              | 91.7%      | 61.8%         | $0.20         | $0.80          | Pure coding tasks, rapid prototyping         |
-| **Gemini 2.5 Pro**   | 1M+ tokens     | 67.2%              | 89.9%      | 59.3%         | TBD           | TBD            | Massive codebases, architectural planning    |
+**[👉 See what's working today at kilo.ai/models](https://kilo.ai/models)**

-\*Per million tokens
+This isn't benchmarks from some lab. It's real usage data from developers like you, updated continuously. You'll see which models people are choosing for different tasks, what's delivering results, and how the landscape is shifting in real-time.

-## Budget-Conscious Options
+## General Guidance

-| Model            | Context Window | SWE-Bench Verified | Human Eval | LiveCodeBench | Input Price\* | Output Price\* | Notes                                 |
-| ---------------- | -------------- | ------------------ | ---------- | ------------- | ------------- | -------------- | ------------------------------------- |
-| **DeepSeek V3**  | 128K tokens    | 64.1%              | 87.3%      | 56.7%         | $0.14         | $0.28          | Exceptional value for daily coding    |
-| **DeepSeek R1**  | 128K tokens    | 62.8%              | 85.9%      | 54.2%         | $0.55         | $2.19          | Advanced reasoning at budget prices   |
-| **Qwen3 32B**    | 128K tokens    | 60.3%              | 83.4%      | 52.1%         | Varies        | Varies         | Open source flexibility               |
-| **Z AI GLM 4.5** | 128K tokens    | 58.7%              | 81.2%      | 49.8%         | TBD           | TBD            | MIT license, hybrid reasoning system  |
+While the specifics change constantly, some principles stay consistent:

-\*Per million tokens
+**For complex coding tasks**: Premium models (Claude Sonnet/Opus, GPT-5 class, Gemini Pro) typically handle nuanced requirements, large refactors, and architectural decisions better.

-## Comprehensive Evaluation Framework
+**For everyday coding**: Mid-tier models often provide the best balance of speed, cost, and quality. They're fast enough to keep your flow state intact and capable enough for most tasks.

-### Latency Performance
+**For budget-conscious work**: Newer efficient models keep surprising us with price-to-performance ratios. DeepSeek, Qwen, and similar models can handle more than you'd expect.

-Response times significantly impact development flow and productivity:
+**For local/private work**: Ollama and LM Studio let you run models locally. The tradeoff is usually speed and capability for privacy and zero API costs.

-- **Ultra-Fast (< 2s)**: Grok Code Fast 1, Qwen3 Coder
-- **Fast (2-4s)**: DeepSeek V3, GPT-5
-- **Moderate (4-8s)**: Claude Sonnet 4, DeepSeek R1
-- **Slower (8-15s)**: Gemini 2.5 Pro, Z AI GLM 4.5
+## Context Windows Matter

-**Impact on Development**: Ultra-fast models enable real-time coding assistance and immediate feedback loops. Models with 8+ second latency can disrupt flow state but may be acceptable for complex architectural decisions.
+One thing that doesn't change: context window size matters for your workflow.

-### Throughput Analysis
+- **Small projects** (scripts, components): 32-64K tokens works fine
+- **Standard applications**: 128K tokens handles most multi-file context
+- **Large codebases**: 256K+ tokens helps with cross-system understanding
+- **Massive systems**: 1M+ token models exist but effectiveness degrades at the extremes

-Token generation rates affect large codebase processing:
+Check [our provider docs](/docs/providers/openrouter) for specific context limits on each model.

-- **High Throughput (150+ tokens/s)**: GPT-5, Grok Code Fast 1
-- **Medium Throughput (100-150 tokens/s)**: Claude Sonnet 4, Qwen3 Coder
-- **Standard Throughput (50-100 tokens/s)**: DeepSeek models, Gemini 2.5 Pro
-- **Variable Throughput**: Open source models depend on infrastructure
+## Stay Current

-**Scaling Factors**: High throughput models excel when generating extensive documentation, refactoring large files, or batch processing multiple components.
-
-### Reliability & Availability
-
-Enterprise considerations for production environments:
-
-- **Enterprise Grade (99.9%+ uptime)**: Claude Sonnet 4, GPT-5, Gemini 2.5 Pro
-- **Production Ready (99%+ uptime)**: Qwen3 Coder, Grok Code Fast 1
-- **Developing Reliability**: DeepSeek models, Z AI GLM 4.5
-- **Self-Hosted**: Qwen3 32B (reliability depends on your infrastructure)
-
-**Success Rates**: Enterprise models maintain consistent output quality and handle edge cases more gracefully, while budget options may require additional validation steps.
-
-### Context Window Strategy
-
-Optimizing for different project scales:
-
-| Size             | Word Count      | Typical Use Case                      | Recommended Models                      | Strategy                                         |
-| ---------------- | --------------- | ------------------------------------- | --------------------------------------- | ------------------------------------------------ |
-| **32K tokens**   | ~24,000 words   | Individual components, scripts        | DeepSeek V3, Qwen3 Coder                | Focus on single-file optimization                |
-| **128K tokens**  | ~96,000 words   | Standard applications, most projects  | All budget models, Grok Code Fast 1     | Multi-file context, moderate complexity          |
-| **256K tokens**  | ~192,000 words  | Large applications, multiple services | Qwen3 Coder, Grok Code Fast 1           | Full feature context, service integration        |
-| **400K+ tokens** | ~300,000+ words | Enterprise systems, full stack apps   | GPT-5, Claude Sonnet 4, Gemini 2.5 Pro  | Architectural overview, system-wide refactoring  |
-
-**Performance Degradation**: Model effectiveness typically drops significantly beyond 400-500K tokens, regardless of advertised limits. Plan context usage accordingly.
-
-## Community Choice
-
-The AI model landscape changes quicky to stay up to date [**👉 check Kilo Code Community Favorites on OpenRouter**](https://openrouter.ai/apps?url=https%3A%2F%2Fkilocode.ai%2F)
+The AI model space moves fast. Bookmark [kilo.ai/models](https://kilo.ai/models) and check back when you're evaluating options. What's best today might not be best next month — and that's actually exciting.
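
As an aside to the context-window guidance in this rewritten doc: the fit of a file against a context limit can be sanity-checked mechanically. A rough sketch using the common ~0.75 words-per-token heuristic for English text; this is an illustrative approximation, not a real tokenizer:

```typescript
// Rough context-fit check. English prose averages ~0.75 words per token,
// so tokens ≈ words / 0.75. Use a real tokenizer for anything precise.
function estimateTokens(text: string): number {
	const words = text.trim().split(/\s+/).length
	return Math.ceil(words / 0.75)
}

const fitsContext = (text: string, contextTokens: number) =>
	estimateTokens(text) < contextTokens * 0.8 // leave headroom for the reply
```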
@@ -24,7 +24,13 @@ When enabled, native JSON function calling improves reliability via explicit sig

It replaces brittle XML-style prompts that risk mixed prose/markup, missing fields, and regex-heavy cleanup, yielding more deterministic tool use and clearer error handling.

-[More Details are available](native-function-calling)
+[More details are available](native-function-calling)
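
For readers unfamiliar with the distinction the paragraph above draws, here is a schematic illustration of why structured calls are easier to handle, using a hypothetical `read_file` tool; this is not Kilo Code's exact wire format:

```typescript
// XML-style prompting asks the model to emit markup such as
// "<read_file><path>src/app.ts</path></read_file>" and recovers the
// fields with regex-heavy parsing. A native function call instead
// arrives as structured JSON that can be checked against a schema:
const toolCall = {
	name: "read_file",
	arguments: { path: "src/app.ts" },
}

// Validation becomes a type/schema check rather than string surgery.
const isValid = typeof toolCall.arguments.path === "string"
```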

## Voice Transcription

When enabled, voice transcription lets you dictate messages using speech-to-text in the chat interface, powered by OpenAI's Whisper API with FFmpeg handling audio capture.

[More details are available](voice-transcription)

## Concurrent file edits

@@ -0,0 +1,81 @@
# Voice Transcription

Kilo Code now includes experimental support for voice input in the chat interface. This feature allows you to dictate your messages using speech-to-text (STT) technology powered by OpenAI's Whisper API.

## Prerequisites

Voice transcription requires two components to be set up:

### 1. FFmpeg Installation

FFmpeg is required for audio capture and processing. Install it for your platform:

**macOS:**

```bash
brew install ffmpeg
```

**Linux (Ubuntu/Debian):**

```bash
sudo apt update
sudo apt install ffmpeg
```

**Windows:**
Download from [ffmpeg.org/download.html](https://ffmpeg.org/download.html) and add to your system PATH.
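
On any platform, you can confirm the install by running `ffmpeg -version` in a new terminal; if the command prints a version banner, Kilo Code should be able to find it.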

### 2. OpenAI API Key

Voice transcription uses OpenAI's Whisper API for speech recognition. You need an OpenAI API configuration in Kilo Code:

1. Configure an OpenAI provider profile in Kilo Code settings
2. Add your OpenAI API key to the profile
3. Either **OpenAI** or **OpenAI Native** provider types will work
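
Conceptually, a profile pairs a provider type with a key. The sketch below is purely illustrative; the field names are hypothetical, and the real profile is created through the Kilo Code settings UI:

```typescript
// Hypothetical profile shape, for orientation only.
const sttProfile = {
	name: "openai-stt",
	apiProvider: "openai", // per the docs, "openai-native" also works
	apiKey: process.env.OPENAI_API_KEY,
}
```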

## Enabling Voice Transcription

Voice transcription is an experimental feature that must be enabled:

1. Open Kilo Code settings
2. Navigate to **Experimental Features**
3. Enable the **Speech to Text** experiment

## Using Voice Input

Once configured and enabled, a microphone button will appear in the chat input area:

1. Click the microphone button to start recording
2. Speak your message clearly
3. Click again to stop recording
4. Your speech will be automatically transcribed into text

The feature includes real-time audio level visualization and voice activity detection, so it can tell when you're speaking.
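
Audio-level meters and simple voice-activity checks like these are commonly driven by a root-mean-square (RMS) measure over each buffer of samples. A minimal sketch of that idea, not Kilo Code's actual implementation:

```typescript
// Compute a 0-1 volume level from a buffer of 16-bit PCM samples.
// RMS tracks perceived loudness better than peak amplitude alone.
function volumeLevel(samples: Int16Array): number {
	let sumSquares = 0
	for (const s of samples) {
		sumSquares += s * s
	}
	const rms = Math.sqrt(sumSquares / samples.length)
	return Math.min(1, rms / 32768) // normalize by the int16 maximum
}

// A naive voice-activity check: sustained level above a threshold
// counts as speech. Real VAD is considerably more involved.
const isSpeaking = (level: number, threshold = 0.05) => level > threshold
```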

## Technical Details

- **Audio Processing**: Uses FFmpeg for system audio capture
- **Voice Recognition**: OpenAI Whisper API for transcription
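
Under stated assumptions (the `openai` npm package, a macOS `avfoundation` input device, and `OPENAI_API_KEY` set in the environment), the capture-then-transcribe flow looks roughly like this. It is a sketch of the general pipeline, not the extension's actual code:

```typescript
import { spawn } from "node:child_process"
import { createReadStream } from "node:fs"
import OpenAI from "openai"

// Capture ~5 seconds of 16 kHz mono audio with ffmpeg. Device flags vary
// by platform: avfoundation (macOS), pulse/alsa (Linux), dshow (Windows).
function recordClip(path: string): Promise<void> {
	return new Promise((resolve, reject) => {
		const ff = spawn("ffmpeg", [
			"-y",
			"-f", "avfoundation", // assumption: macOS default input device
			"-i", ":0",
			"-t", "5",
			"-ar", "16000",
			"-ac", "1",
			path,
		])
		ff.on("error", reject)
		ff.on("close", (code) =>
			code === 0 ? resolve() : reject(new Error(`ffmpeg exited with ${code}`)),
		)
	})
}

// Send the clip to OpenAI's transcription endpoint.
async function transcribe(path: string): Promise<string> {
	const client = new OpenAI() // reads OPENAI_API_KEY from the environment
	const result = await client.audio.transcriptions.create({
		file: createReadStream(path),
		model: "whisper-1",
	})
	return result.text
}

await recordClip("clip.wav")
console.log(await transcribe("clip.wav"))
```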

## Troubleshooting

**Microphone button not appearing:**

- Ensure the Speech to Text experiment is enabled
- Verify FFmpeg is installed and in your PATH
- Check that you have an OpenAI provider configured with a valid API key

**Transcription errors:**

- Verify your OpenAI API key is valid and has available credits
- Check your internet connection
- Try speaking more clearly or adjusting your microphone settings

## Limitations

This feature is currently experimental and may have limitations:

- Requires active internet connection
- Uses OpenAI API credits based on audio duration
- Transcription accuracy depends on audio quality and speech clarity
38 changes: 38 additions & 0 deletions apps/storybook/stories/ChatView.stories.tsx
@@ -94,6 +94,25 @@ export const Default: Story = {
apiModelId: "claude-3-5-sonnet-20241022",
apiKey: "mock-key",
},
currentApiConfigName: "Claude 3.5 Sonnet",
listApiConfigMeta: [
{
id: "config-1",
name: "Claude 3.5 Sonnet",
profileType: "chat",
apiProvider: "anthropic",
apiModelId: "claude-3-5-sonnet-20241022",
},
{
id: "config-2",
name: "GPT-4",
profileType: "chat",
apiProvider: "openai",
apiModelId: "gpt-4-turbo-preview",
},
],
pinnedApiConfigs: {},
togglePinnedApiConfig: fn(),
mcpServers: [],
allowedCommands: [],
mode: "code",
@@ -212,6 +231,25 @@ export const EmptyWithNotificationsAndHistory: Story = {
apiModelId: "claude-3-5-sonnet-20241022",
apiKey: "mock-key",
},
currentApiConfigName: "Claude 3.5 Sonnet",
listApiConfigMeta: [
{
id: "config-1",
name: "Claude 3.5 Sonnet",
profileType: "chat",
apiProvider: "anthropic",
apiModelId: "claude-3-5-sonnet-20241022",
},
{
id: "config-2",
name: "GPT-4",
profileType: "chat",
apiProvider: "openai",
apiModelId: "gpt-4-turbo-preview",
},
],
pinnedApiConfigs: {},
togglePinnedApiConfig: fn(),
mcpServers: [],
allowedCommands: [],
mode: "code",
37 changes: 37 additions & 0 deletions apps/storybook/stories/VolumeVisualizer.stories.tsx
@@ -0,0 +1,37 @@
import type { Meta, StoryObj } from "@storybook/react-vite"
import { VolumeVisualizer } from "@/components/chat/VolumeVisualizer"

const meta = {
title: "Components/VolumeVisualizer",
component: VolumeVisualizer,
parameters: {
layout: "centered",
},
argTypes: {
volume: {
control: { type: "range", min: 0, max: 1, step: 0.01 },
description: "Volume level from 0 to 1",
},
isActive: {
control: "boolean",
description: "Whether recording is active (affects color)",
},
},
} satisfies Meta<typeof VolumeVisualizer>

export default meta
type Story = StoryObj<typeof meta>

export const Default: Story = {
args: {
volume: 0.5,
isActive: true,
},
}

export const Inactive: Story = {
args: {
volume: 0.3,
isActive: false,
},
}
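
The stories exercise only the two documented props (`volume` in [0, 1] and `isActive`). For orientation, a plausible minimal implementation consistent with that interface might look like the sketch below; the markup and styling are assumptions, not the component's actual source:

```tsx
import React from "react"

interface VolumeVisualizerProps {
	/** Volume level from 0 to 1 */
	volume: number
	/** Whether recording is active (affects color) */
	isActive: boolean
}

// Renders a horizontal level bar whose fill width tracks the volume.
export function VolumeVisualizer({ volume, isActive }: VolumeVisualizerProps) {
	const clamped = Math.min(1, Math.max(0, volume))
	return (
		<div style={{ width: 120, height: 8, background: "#333", borderRadius: 4 }}>
			<div
				style={{
					width: `${Math.round(clamped * 100)}%`,
					height: "100%",
					borderRadius: 4,
					background: isActive ? "#e51400" : "#888", // red while recording
					transition: "width 60ms linear",
				}}
			/>
		</div>
	)
}
```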
6 changes: 6 additions & 0 deletions cli/CHANGELOG.md
@@ -1,5 +1,11 @@
# @kilocode/cli

## 0.16.0

### Minor Changes

- [#4388](https://github.com/Kilo-Org/kilocode/pull/4388) [`af93318`](https://github.com/Kilo-Org/kilocode/commit/af93318e3648c235721ba58fe9caab9429608241) Thanks [@iscekic](https://github.com/iscekic)! - send org id and last mode with session data

## 0.15.0

### Minor Changes
2 changes: 1 addition & 1 deletion cli/package.dist.json
@@ -1,6 +1,6 @@
{
"name": "@kilocode/cli",
"version": "0.15.0",
"version": "0.16.0",
"description": "Terminal User Interface for Kilo Code",
"type": "module",
"main": "index.js",
2 changes: 1 addition & 1 deletion cli/package.json
@@ -1,6 +1,6 @@
{
"name": "@kilocode/cli",
"version": "0.15.0",
"version": "0.16.0",
"description": "Terminal User Interface for Kilo Code",
"type": "module",
"main": "dist/index.js",
3 changes: 3 additions & 0 deletions cli/src/__tests__/config-command.test.ts
@@ -26,6 +26,9 @@ vi.mock("fs/promises", async () => {
}
})

// Mock environment variables to avoid ephemeral mode
vi.stubEnv("KILOCODE_EPHEMERAL", "false")
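
One caveat worth noting: `vi.stubEnv` persists for the rest of the file unless restored. Vitest's standard cleanup is `vi.unstubAllEnvs()`, typically in an `afterEach` hook, as in this sketch:

```typescript
import { afterEach, vi } from "vitest"

afterEach(() => {
	// Undo every vi.stubEnv call so later tests see the real environment
	vi.unstubAllEnvs()
})
```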

describe("Config Command", () => {
let testDir: string
let testConfigFile: string