1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Added

- **Voyage AI Provider**: Add Voyage AI as a new cloud embedding provider (`voyageai`) with code-optimized models like `voyage-code-3` (1024 dims), batch embedding, adaptive rate limiting, and full integration across CLI, MCP server, and workspaces
- **`.grepaiignore` Support**: New `.grepaiignore` file allows overriding `.gitignore` rules for grepai indexing. Supports negation patterns (`!`) to re-include files excluded by `.gitignore`, with directory-level precedence for nested files (#107)

## [0.34.0] - 2026-02-24
2 changes: 1 addition & 1 deletion README.md
@@ -50,7 +50,7 @@ curl -sSL https://raw.githubusercontent.com/yoanbernabeu/grepai/main/install.sh
irm https://raw.githubusercontent.com/yoanbernabeu/grepai/main/install.ps1 | iex
```

Requires an embedding provider — [Ollama](https://ollama.ai) (default), [LM Studio](https://lmstudio.ai), or OpenAI.
Requires an embedding provider — [Ollama](https://ollama.ai) (default), [LM Studio](https://lmstudio.ai), OpenAI, or [Voyage AI](https://voyageai.com).

**Ollama (recommended):**
```bash
24 changes: 18 additions & 6 deletions cli/init.go
@@ -32,14 +32,14 @@ var initCmd = &cobra.Command{

This command will:
- Create .grepai/config.yaml with default settings
- Prompt for embedding provider (Ollama or OpenAI)
- Prompt for embedding provider (Ollama, LM Studio, OpenAI, or Voyage AI)
- Prompt for storage backend (GOB file or PostgreSQL)
- Add .grepai/ to .gitignore if present`,
RunE: runInit,
}

func init() {
initCmd.Flags().StringVarP(&initProvider, "provider", "p", "", "Embedding provider (ollama, lmstudio, openai, synthetic, or openrouter)")
initCmd.Flags().StringVarP(&initProvider, "provider", "p", "", "Embedding provider (ollama, lmstudio, openai, voyageai, synthetic, or openrouter)")
initCmd.Flags().StringVarP(&initModel, "model", "m", "", "Embedding model (for openrouter: text-embedding-3-small, text-embedding-3-large, qwen3-embedding-8b)")
initCmd.Flags().StringVarP(&initBackend, "backend", "b", "", "Storage backend (gob, postgres, or qdrant)")
initCmd.Flags().BoolVar(&initNonInteractive, "yes", false, "Use defaults without prompting")
@@ -123,6 +123,7 @@ func runInit(cmd *cobra.Command, args []string) error {
fmt.Println(" 3) openai (cloud, requires API key)")
fmt.Println(" 4) synthetic (cloud, free embedding API)")
fmt.Println(" 5) openrouter (cloud, multi-provider gateway)")
fmt.Println(" 6) voyageai (cloud, optimized for code, requires API key)")
fmt.Print("Choice [1]: ")

input, _ := reader.ReadString('\n')
@@ -145,7 +146,7 @@ func runInit(cmd *cobra.Command, args []string) error {
cfg.Embedder.Provider = "openai"
cfg.Embedder.Model = "text-embedding-3-small"
cfg.Embedder.Endpoint = "https://api.openai.com/v1"
// OpenAI: leave Dimensions nil to use model's native dimensions
cfg.Embedder.Dimensions = nil // use model's native dimensions
case "4", "synthetic":
cfg.Embedder.Provider = "synthetic"
cfg.Embedder.Model = "hf:nomic-ai/nomic-embed-text-v1.5"
@@ -155,7 +156,7 @@ func runInit(cmd *cobra.Command, args []string) error {
case "5", "openrouter":
cfg.Embedder.Provider = "openrouter"
cfg.Embedder.Endpoint = "https://openrouter.ai/api/v1"
// OpenRouter: leave Dimensions nil to use model's native dimensions
cfg.Embedder.Dimensions = nil // use model's native dimensions

// Model selection for OpenRouter
fmt.Println("\nSelect OpenRouter embedding model:")
@@ -175,6 +176,11 @@ func runInit(cmd *cobra.Command, args []string) error {
default:
cfg.Embedder.Model = "openai/text-embedding-3-small"
}
case "6", "voyageai":
cfg.Embedder.Provider = "voyageai"
cfg.Embedder.Model = "voyage-code-3"
cfg.Embedder.Endpoint = "https://api.voyageai.com/v1"
cfg.Embedder.Dimensions = nil // use model's native dimensions (1024)
default:
cfg.Embedder.Provider = "ollama"
fmt.Print("Ollama endpoint [http://localhost:11434]: ")
@@ -196,7 +202,11 @@ func runInit(cmd *cobra.Command, args []string) error {
case "openai":
cfg.Embedder.Model = "text-embedding-3-small"
cfg.Embedder.Endpoint = "https://api.openai.com/v1"
// OpenAI: leave Dimensions nil to use model's native dimensions
cfg.Embedder.Dimensions = nil // use model's native dimensions
case "voyageai":
cfg.Embedder.Model = "voyage-code-3"
cfg.Embedder.Endpoint = "https://api.voyageai.com/v1"
cfg.Embedder.Dimensions = nil // use model's native dimensions (1024)
case "synthetic":
cfg.Embedder.Model = "hf:nomic-ai/nomic-embed-text-v1.5"
cfg.Embedder.Endpoint = "https://api.synthetic.new/openai/v1"
@@ -205,7 +215,7 @@ func runInit(cmd *cobra.Command, args []string) error {
case "openrouter":
cfg.Embedder.Model = "openai/text-embedding-3-small"
cfg.Embedder.Endpoint = "https://openrouter.ai/api/v1"
// OpenRouter: leave Dimensions nil to use model's native dimensions
cfg.Embedder.Dimensions = nil // use model's native dimensions
}
}

@@ -339,6 +349,8 @@ func runInit(cmd *cobra.Command, args []string) error {
fmt.Printf(" Endpoint: %s\n", cfg.Embedder.Endpoint)
case "openai":
fmt.Println("\nMake sure OPENAI_API_KEY is set in your environment.")
case "voyageai":
fmt.Println("\nMake sure VOYAGE_API_KEY is set in your environment.")
case "synthetic":
fmt.Println("\nMake sure SYNTHETIC_API_KEY or OPENAI_API_KEY is set in your environment.")
fmt.Println(" Get your free API key at: https://api.synthetic.new")
1 change: 1 addition & 0 deletions cli/watch.go
@@ -424,6 +424,7 @@ func initializeEmbedder(ctx context.Context, cfg *config.Config) (embedder.Embed
return nil, fmt.Errorf("cannot connect to Ollama: %w\nMake sure Ollama is running and has the %s model", err, cfg.Embedder.Model)
}
}

case "lmstudio":
if p, ok := emb.(pinger); ok {
if err := p.Ping(ctx); err != nil {
18 changes: 15 additions & 3 deletions config/config.go
@@ -24,10 +24,12 @@ const (
DefaultOpenAIEmbeddingModel = "text-embedding-3-small"
DefaultSyntheticEmbeddingModel = "hf:nomic-ai/nomic-embed-text-v1.5"
DefaultOpenRouterEmbeddingModel = "openai/text-embedding-3-small"
DefaultVoyageAIEmbeddingModel = "voyage-code-3"

DefaultOllamaEndpoint = "http://localhost:11434"
DefaultLMStudioEndpoint = "http://127.0.0.1:1234"
DefaultOpenAIEndpoint = "https://api.openai.com/v1"
DefaultVoyageAIEndpoint = "https://api.voyageai.com/v1"
DefaultSyntheticEndpoint = "https://api.synthetic.new/openai/v1"
DefaultOpenRouterEndpoint = "https://openrouter.ai/api/v1"

@@ -93,16 +95,17 @@ type BoostRule struct {
}

type EmbedderConfig struct {
Provider string `yaml:"provider"` // ollama | lmstudio | openai | synthetic | openrouter
Provider string `yaml:"provider"` // ollama | lmstudio | openai | voyageai | synthetic | openrouter
Model string `yaml:"model"`
Endpoint string `yaml:"endpoint,omitempty"`
APIKey string `yaml:"api_key,omitempty"`
Dimensions *int `yaml:"dimensions,omitempty"`
Parallelism int `yaml:"parallelism"` // Number of parallel workers for batch embedding (default: 4)
Parallelism int `yaml:"parallelism,omitempty"` // Number of parallel workers for batch embedding (default: 4)
}

// GetDimensions returns the configured dimensions or a default value.
// For OpenAI/OpenRouter, defaults to 1536 (text-embedding-3-small).
// For Voyage AI, defaults to 1024 (voyage-code-3).
// For Ollama/LMStudio/Synthetic, defaults to 768 (nomic-embed-text-v1.5).
func (e *EmbedderConfig) GetDimensions() int {
if e.Dimensions != nil {
@@ -111,13 +114,22 @@ func (e *EmbedderConfig) GetDimensions() int {
switch e.Provider {
case "openai", "openrouter":
return DefaultOpenAIDimensions
case "voyageai":
return 1024
default:
return DefaultLocalEmbeddingDimensions
}
}

func DefaultEmbedderForProvider(provider string) EmbedderConfig {
switch provider {
case "voyageai":
return EmbedderConfig{
Provider: "voyageai",
Model: DefaultVoyageAIEmbeddingModel,
Endpoint: DefaultVoyageAIEndpoint,
Dimensions: nil, // Voyage AI uses native dimensions (1024)
}
case "synthetic":
dim := DefaultLocalEmbeddingDimensions
return EmbedderConfig{
@@ -448,7 +460,7 @@ func (c *Config) applyDefaults() {
}
}

// Parallelism default (only used by OpenAI embedder)
// Parallelism default (used by OpenAI and Voyage AI embedders)
if c.Embedder.Parallelism <= 0 {
c.Embedder.Parallelism = 4
}
8 changes: 8 additions & 0 deletions config/config_test.go
@@ -83,6 +83,14 @@ func TestDefaultEmbedderForProvider(t *testing.T) {
if openai.Dimensions != nil {
t.Fatalf("openai dimensions should be nil, got %v", openai.Dimensions)
}

voyageai := DefaultEmbedderForProvider("voyageai")
if voyageai.Endpoint != DefaultVoyageAIEndpoint || voyageai.Model != DefaultVoyageAIEmbeddingModel {
t.Fatalf("unexpected voyageai defaults: %+v", voyageai)
}
if voyageai.Dimensions != nil {
t.Fatalf("voyageai dimensions should be nil, got %v", voyageai.Dimensions)
}
}

func TestDefaultStoreForBackend(t *testing.T) {
54 changes: 54 additions & 0 deletions docs/src/content/docs/backends/embedders.md
@@ -12,6 +12,7 @@ Embedders convert text (code chunks) into vector representations that enable sem
| Ollama | Local | Privacy, free, no internet | Requires local resources |
| LM Studio | Local | Privacy, OpenAI-compatible API, GUI | Requires local resources |
| OpenAI | Cloud | High quality, fast | Costs money, sends code to cloud |
| Voyage AI | Cloud | Optimized for code, high quality | Costs money, sends code to cloud |

## Ollama (Local)

@@ -233,6 +234,59 @@ For a typical codebase:
- Initial index: ~$0.001 with `text-embedding-3-small`
- Ongoing updates: negligible

## Voyage AI (Cloud)

[Voyage AI](https://voyageai.com) provides embedding models specifically optimized for code search and retrieval.

### Setup

1. Get an API key from [Voyage AI Dashboard](https://dash.voyageai.com/api-keys)

2. Set the environment variable:

```bash
export VOYAGE_API_KEY=pa-...
```

### Configuration

```yaml
embedder:
provider: voyageai
model: voyage-code-3
endpoint: https://api.voyageai.com/v1
api_key: ${VOYAGE_API_KEY}
```

### Available Models

| Model | Dimensions | Context | Notes |
|-------|------------|---------|-------|
| `voyage-code-3` | 1024 | 32K | Optimized for code retrieval (recommended) |
| `voyage-4-large` | 1024 | 32K | Best general-purpose retrieval quality |
| `voyage-4` | 1024 | 32K | Balanced quality and performance |
| `voyage-4-lite` | 1024 | 32K | Optimized for latency and cost |

All Voyage 4 series models support flexible dimensions (256, 512, 1024, 2048) and share a compatible embedding space.
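To request a smaller vector size, set `dimensions` explicitly in the config (a sketch assuming the Voyage 4 series honors the same `dimensions` key grepai uses for other providers; `voyage-4-lite` is just one example model):

```yaml
embedder:
  provider: voyageai
  model: voyage-4-lite
  api_key: ${VOYAGE_API_KEY}
  dimensions: 512   # one of 256, 512, 1024, or 2048 for Voyage 4 series models
```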

### Parallelism & Rate Limiting

Voyage AI embeddings support parallel batch processing with adaptive rate limiting:

```yaml
embedder:
provider: voyageai
model: voyage-code-3
api_key: ${VOYAGE_API_KEY}
parallelism: 4 # Concurrent API requests (default: 4)
```

Rate limiting works the same way as with OpenAI: on 429 responses, grepai automatically reduces parallelism and retries with exponential backoff.

### Cost Estimation

See [Voyage AI Pricing](https://docs.voyageai.com/docs/pricing) for current rates.

## Changing Embedding Models

You can use any embedding model available on your provider. Two parameters matter:
4 changes: 2 additions & 2 deletions docs/src/content/docs/commands/grepai_init.md
@@ -13,7 +13,7 @@ Initialize grepai by creating a .grepai directory with configuration.

This command will:
- Create .grepai/config.yaml with default settings
- Prompt for embedding provider (Ollama or OpenAI)
- Prompt for embedding provider (Ollama, OpenAI, Voyage AI, etc.)
- Prompt for storage backend (GOB file or PostgreSQL)
- Add .grepai/ to .gitignore if present

@@ -27,7 +27,7 @@ grepai init [flags]
-b, --backend string Storage backend (gob, postgres, or qdrant)
-h, --help help for init
--inherit Inherit configuration from main worktree (for git worktrees)
-p, --provider string Embedding provider (ollama, lmstudio, or openai)
-p, --provider string Embedding provider (ollama, lmstudio, openai, voyageai, synthetic, or openrouter)
--ui Run interactive Bubble Tea UI wizard
--yes Use defaults without prompting
```
2 changes: 1 addition & 1 deletion docs/src/content/docs/configuration.md
@@ -17,7 +17,7 @@ version: 1

# Embedder configuration
embedder:
# Provider: "ollama" (local), "lmstudio" (local), or "openai" (cloud)
# Provider: "ollama" (local), "lmstudio" (local), "openai" (cloud), or "voyageai" (cloud)
provider: ollama
# Model name (depends on provider)
model: nomic-embed-text
4 changes: 3 additions & 1 deletion docs/src/content/docs/contributing.md
@@ -86,8 +86,10 @@ grepai/
├── config/ # Configuration loading
├── embedder/ # Embedding providers
│ ├── embedder.go # Interface
│ ├── factory.go # Provider factory
│ ├── ollama.go
│ └── openai.go
│ ├── openai.go
│ └── voyageai.go
├── store/ # Vector storage
│ ├── store.go # Interface
│ ├── gob.go
2 changes: 1 addition & 1 deletion docs/src/content/docs/installation.md
@@ -5,7 +5,7 @@ description: How to install grepai

## Prerequisites

- **Ollama** (for local embeddings) or an **OpenAI API key** (for cloud embeddings)
- **Ollama** (for local embeddings) or a cloud API key (**OpenAI**, **Voyage AI**, etc.)

## Homebrew (macOS)

1 change: 1 addition & 0 deletions docs/src/content/docs/skills.md
@@ -105,6 +105,7 @@ Install skills by category:
| `grepai-embeddings-ollama` | Configure Ollama for local, private embeddings |
| `grepai-embeddings-openai` | Configure OpenAI for cloud embeddings |
| `grepai-embeddings-lmstudio` | Configure LM Studio with GUI interface |
| `grepai-embeddings-voyageai` | Configure Voyage AI for code-optimized cloud embeddings |

### Storage Backends
| Skill | Description |
2 changes: 1 addition & 1 deletion docs/src/content/docs/workspace.md
@@ -53,7 +53,7 @@ grepai workspace create my-fullstack --from workspace-config.yaml
| Flag | Description | Default |
|------|-------------|---------|
| `--backend` | Storage backend (`qdrant` or `postgres`) | Required (or `--yes`) |
| `--provider` | Embedding provider (`ollama`, `openai`, `lmstudio`) | `ollama` with `--yes` |
| `--provider` | Embedding provider (`ollama`, `openai`, `lmstudio`, `voyageai`, `synthetic`, `openrouter`) | `ollama` with `--yes` |
| `--model` | Embedding model name | Provider default |
| `--endpoint` | Embedder endpoint URL | Provider default |
| `--dsn` | PostgreSQL connection string | Required for postgres |