Vector database SDK for JavaScript/TypeScript with built-in semantic search
Works seamlessly with seekdb and OceanBase
For complete usage, see the official documentation.
Why seekdb-js?
Packages
Installation
Running Modes
Quick Start
Usage Guide
Examples
Development
License
- Auto Vectorization - Automatic embedding generation, no manual vector calculation needed
- Semantic Search - Vector-based similarity search for natural language queries
- Hybrid Search - Combine keyword matching with semantic search
- Multiple Embedding Functions - Built-in support for local and cloud embedding providers
- TypeScript Native - Full TypeScript support with complete type definitions
This is a monorepo containing:
| Package | Description |
|---|---|
seekdb |
Core SDK for seekdb operations |
@seekdb/default-embed |
Local embedding function using Xenova/all-MiniLM-L6-v2 model (default) |
@seekdb/qwen |
DashScope/Tongyi Qianwen cloud embedding service |
@seekdb/openai |
OpenAI cloud embedding service |
@seekdb/jina |
Jina AI multimodal embedding service |
@seekdb/bm25 |
BM25 sparse embedding function for efficient text-based keyword search |
@seekdb/prisma-adapter |
Prisma driver adapter for seekdb Embedded (use Prisma with seekdb Embedded) |
npm install seekdb @seekdb/default-embed- Embedded mode: No seekdb server deployment required; use locally after install.
- Server mode: Deploy seekdb or OceanBase first; see official deployment docs.
The SDK supports two modes; the constructor arguments to SeekdbClient determine which is used. For database management (create/list/get/delete database), use SeekdbAdminClient() which returns a SeekdbClient instance.
| Mode | Parameter | Description |
|---|---|---|
| Embedded | path (database directory path) |
Runs locally with no separate seekdb server; data is stored under the given path (e.g. ./seekdb.db). Requires native addon @seekdb/js-bindings. |
| Server | host (and port, user, password, etc.) |
Connects to a remote seekdb or OceanBase instance. |
OceanBase and seekdb: OceanBase is compatible with seekdb and can be understood as its distributed, multi-tenant, etc. version. seekdb-js therefore supports OceanBase server mode with the same API: use the same SeekdbClient / SeekdbAdminClient and connection parameters; when connecting to OceanBase, additionally pass tenant (e.g. "sys" or your tenant name). See OceanBase mode below.
- SeekdbClient: Pass
pathfor embedded mode, orhost(and port, user, password, etc.) for server mode. - SeekdbAdminClient(): For admin operations only; pass
pathfor embedded orhostfor server. In embedded mode you do not specify a database name.
Embedded mode (local file, no server):
import { SeekdbClient } from "seekdb";
// 1. Connect
const client = new SeekdbClient({
path: "./seekdb.db",
database: "test",
});
// 2. Create collection
const collection = await client.createCollection({ name: "my_collection" });
// 3. Add data (auto-vectorized using @seekdb/default-embed)
await collection.add({
ids: ["1", "2"],
documents: ["Hello world", "seekdb is fast"],
});
// 4. Search
const results = await collection.query({ queryTexts: "Hello", nResults: 5 });
console.log("query results", results);Server mode (connect to a deployed seekdb):
import { SeekdbClient } from "seekdb";
// 1. Connect
const client = new SeekdbClient({
host: "127.0.0.1",
port: 2881,
user: "root",
password: "",
database: "test",
});
// 2. Create collection
const collection = await client.createCollection({ name: "my_collection" });
// 3. Add data (auto-vectorized using @seekdb/default-embed)
await collection.add({
ids: ["1", "2"],
documents: ["Hello world", "seekdb is fast"],
});
// 4. Search
const results = await collection.query({ queryTexts: "Hello", nResults: 5 });
console.log("query results", results);This section covers basic usage. See the official SDK documentation for full details.
Embedded mode (local database file):
import { SeekdbClient } from "seekdb";
const client = new SeekdbClient({
path: "./seekdb.db", // database file path
database: "test",
});Server mode:
import { SeekdbClient } from "seekdb";
const client = new SeekdbClient({
host: "127.0.0.1",
port: 2881,
user: "root",
password: "",
database: "test",
});OceanBase mode (server mode with tenant): OceanBase is compatible with seekdb (distributed, multi-tenant, etc.). Use the same server-mode connection; when the backend is OceanBase, pass tenant (e.g. "sys" or your tenant name):
const client = new SeekdbClient({
host: "127.0.0.1",
port: 2881,
user: "root",
password: "",
database: "test",
tenant: "sys", // or your OceanBase tenant
});You can create a collection without any configuration; the default embedding function will be used for vectorization. Ensure @seekdb/default-embed is installed first.
const collection = await client.createCollection({
name: "my_collection",
});A schema defines which indexes are available on a collection:
- FulltextIndexConfig - For keyword-based full-text search
- VectorIndexConfig - For dense vector similarity search
- SparseVectorIndexConfig - For sparse vector similarity search
Examples for creating the three index types:
FulltextIndexConfig
const schema = new Schema({
fulltextIndex: new FulltextIndexConfig("ik", { ik_mode: "smart" }),
});
const collection = await client.createCollection({
name: "ft_collection",
schema,
});VectorIndexConfig
If you do not set embeddingFunction, the default embedding function is used. Ensure @seekdb/default-embed is installed.
const schema = new Schema({
vectorIndex: new VectorIndexConfig({
hnsw: { dimension: 384, distance: "cosine" },
}),
});
const collection = await client.createCollection({
name: "vec_collection",
schema,
});To use a custom embedding function, install one of the provided packages or implement your own. See the official SDK documentation for details.
Take @seekdb/qwen as an example:
npm install @seekdb/qwenimport { QwenEmbeddingFunction } from "@seekdb/qwen";
const qwenEF = new QwenEmbeddingFunction();
const schema = new Schema({
vectorIndex: new VectorIndexConfig({
hnsw: { dimension: 384, distance: "cosine" },
embeddingFunction: qwenEF,
}),
});
const collection = await client.createCollection({
name: "my_collection",
schema,
});If you don't need an embedding function, set embeddingFunction to null.
const schema = new Schema({
vectorIndex: new VectorIndexConfig({
hnsw: { dimension: 384, distance: "cosine" },
embeddingFunction: null,
}),
});
const collection = await client.createCollection({
name: "vec_collection",
schema,
});SparseVectorIndexConfig
You can use the provided @seekdb/bm25 as the sparse embedding function or implement your own.
npm install @seekdb/bm25import { Bm25EmbeddingFunction } from "@seekdb/bm25";
import { K } from "seekdb";
const schema = new Schema({
sparseVectorIndex: new SparseVectorIndexConfig({
sourceKey: K.DOCUMENT,
embeddingFunction: new Bm25EmbeddingFunction(),
}),
});
const collection = await client.createCollection({
name: "sparse_collection",
schema,
});The embedding function defined in Schema is used automatically for vectorization. No need to set it again.
await collection.add({
ids: ["1", "2"],
documents: ["Hello world", "seekdb is fast"],
metadatas: [{ category: "test" }, { category: "db" }],
});You can also pass a vector or an array of vectors directly.
const qwenEF = new QwenEmbeddingFunction();
await collection.add({
ids: ["1", "2"],
documents: ["Hello world", "seekdb is fast"],
metadatas: [{ category: "test" }, { category: "db" }],
embeddings: [
[0.1, 0.2, 0.3],
[0.2, 0.3, 0.4],
],
});Get Data
The get() method is used to retrieve documents from a collection without performing vector similarity search.
const results = await collection.get({
ids: ["1", "2"],
});Semantic Search
The query() method is used to execute vector similarity search to find documents most similar to the query vector.
The embedding function defined in Schema is used automatically for vectorization. No need to set it again.
const results = await collection.query({
queryTexts: "Hello",
nResults: 5,
});You can also pass a vector or an array of vectors directly.
const results = await collection.query({
queryEmbeddings: [
[0.1, 0.2, 0.3],
[0.2, 0.3, 0.4],
],
nResults: 5,
});Specify queryKey to run sparse vector search. If a sparse embedding function is defined, queryTexts will be vectorized automatically.
import
import { K } from "seekdb";
const results = await collection.query({
queryTexts: "artificial intelligence",
// Use sparse vector index, default by K.EMBEDDING
queryKey: K.EMBEDDING,
nResults: 3,
});You can also supply your own sparse vectors for search.
const queryVector: SparseVector = { 1234: 0.5, 5678: 0.8 };
const results = await collection.query({
queryEmbeddings: queryVector,
queryKey: K.SPARSE_EMBEDDING,
nResults: 5,
});Hybrid Search (Keyword + Semantic)
The hybridSearch() combines full-text search and vector similarity search with ranking.
const hybridResults = await collection.hybridSearch({
query: { whereDocument: { $contains: "seekdb" } },
knn: { queryTexts: ["fast database"] },
nResults: 5,
});You can also pass a vector or an array of vectors directly.
const hybridResults = await collection.hybridSearch({
query: { whereDocument: { $contains: "seekdb" } },
knn: {
queryEmbeddings: [
[0.1, 0.2, 0.3],
[0.2, 0.3, 0.4],
],
},
nResults: 5,
});The SDK supports multiple Embedding Functions for generating vectors locally or in the cloud.
For complete usage, see the official documentation.
Uses a local model (Xenova/all-MiniLM-L6-v2) by default. No API Key required. Suitable for quick development and testing.
No configuration is needed to use the default model.
First install the built-in model:
npm install @seekdb/default-embedThen use it as-is; it will auto-vectorize:
const collection = await client.createCollection({
name: "local_embed_collection",
});Uses DashScope's cloud Embedding service (Qwen/Tongyi Qianwen). Suitable for production environments.
npm install @seekdb/qwenimport { QwenEmbeddingFunction } from "@seekdb/qwen";
const qwenEmbed = new QwenEmbeddingFunction({
// Your DashScope environment variable name, defaults to 'DASHSCOPE_API_KEY'
apiKeyEnvVar: 'DASHSCOPE_API_KEY'
// Optional, defaults to 'text-embedding-v4'
modelName: "text-embedding-v4",
});
const collection = await client.createCollection({
name: "qwen_embed_collection",
embeddingFunction: qwenEmbed,
});Uses OpenAI's embedding API. Suitable for production environments with OpenAI integration.
npm install @seekdb/openaiimport { OpenAIEmbeddingFunction } from "@seekdb/openai";
const openaiEmbed = new OpenAIEmbeddingFunction({
// Your openai environment variable name, defaults to 'OPENAI_API_KEY'
apiKeyEnvVar: 'OPENAI_API_KEY'
// Optional, defaults to 'text-embedding-3-small'
modelName: "text-embedding-3-small",
});
const collection = await client.createCollection({
name: "openai_embed_collection",
embeddingFunction: openaiEmbed,
});Uses Jina AI's embedding API. Supports multimodal embeddings.
npm install @seekdb/jinaimport { JinaEmbeddingFunction } from "@seekdb/jina";
const jinaEmbed = new JinaEmbeddingFunction({
// Your jina environment variable name, defaults to 'JINA_API_KEY'
apiKeyEnvVar: 'JINA_API_KEY'
// Optional, defaults to jina-clip-v2
modelName: "jina-clip-v2",
});
const collection = await client.createCollection({
name: "jina_embed_collection",
embeddingFunction: jinaEmbed,
});You can also use your own custom embedding function.
First, implement the EmbeddingFunction interface:
import type { EmbeddingFunction } from "seekdb";
import { registerEmbeddingFunction } from "seekdb";
interface MyCustomEmbeddingConfig {
apiKeyEnv: string;
}
class MyCustomEmbeddingFunction implements EmbeddingFunction {
// The name of the `embeddingFunction`, must be unique.
readonly name = "my_custom_embedding";
private apiKeyEnv: string;
dimension: number;
constructor(config: MyCustomEmbeddingConfig) {
this.apiKeyEnv = config.apiKeyEnv;
this.dimension = 384;
}
// Implement your vector generation code here
async generate(texts: string[]): Promise<number[][]> {
const embeddings: number[][] = [];
return embeddings;
}
// The configuration of the current `embeddingFunction` instance, used to restore this instance
getConfig(): MyCustomEmbeddingConfig {
return {
apiKeyEnv: this.apiKeyEnv,
};
}
// Create a new instance of the current `embeddingFunction` based on the provided configuration
static buildFromConfig(config: MyCustomEmbeddingConfig): EmbeddingFunction {
return new MyCustomEmbeddingFunction(config);
}
}
// Register the constructor
registerEmbeddingFunction("my_custom_embedding", MyCustomEmbeddingFunction);Then use it:
const customEmbed = new MyCustomEmbeddingFunction({
apiKeyEnv: "MY_CUSTOM_API_KEY_ENV",
});
const collection = await client.createCollection({
name: "custom_embed_collection",
configuration: {
dimension: 384,
distance: "cosine",
},
embeddingFunction: customEmbed,
});BM25 (Best Matching 25) is a sparse embedding function that uses term frequency and document length normalization for efficient text search. It's particularly useful for keyword-based search scenarios.
npm install @seekdb/bm25import { SeekdbClient, Schema, SparseVectorIndexConfig, K } from "seekdb";
import { Bm25EmbeddingFunction } from "@seekdb/bm25";
const bm25 = new Bm25EmbeddingFunction({
k: 1.2, // Term frequency saturation
b: 0.75, // Document length normalization
avgDocLength: 256, // Average document length
tokenMaxLength: 40, // Maximum token length
stopwords: ["a", "an", "the"], // Custom stopwords
});
const collection = await client.createCollection({
name: "bm25_collection",
schema: new Schema({
sparseVectorIndex: new SparseVectorIndexConfig({
sourceKey: K.DOCUMENT,
embeddingFunction: bm25,
}),
}),
});For more details, see BM25 Embedding Guide.
Implement your own sparse embedding function:
import {
SparseEmbeddingFunction,
SparseVector,
registerSparseEmbeddingFunction,
EmbeddingConfig,
} from "seekdb";
interface MySparseConfig {
vocabSize: number;
}
class MySparseEmbeddingFunction implements SparseEmbeddingFunction {
readonly name = "my_sparse";
private vocabSize: number;
constructor(config: MySparseConfig) {
this.vocabSize = config.vocabSize;
}
// Implement your vector generation code here
async generate(texts: string[]): Promise<SparseVector[]> {
const embeddings: number[][] = [];
return embeddings;
}
// Generate sparse vectors for queries (can be different)
async generateForQueries(texts: string[]): Promise<SparseVector[]> {
return this.generate(texts);
}
// Return configuration for persistence
getConfig(): EmbeddingConfig {
return { vocabSize: this.vocabSize };
}
// Static factory method
static buildFromConfig(config: EmbeddingConfig): SparseEmbeddingFunction {
return new MySparseEmbeddingFunction(config as MySparseConfig);
}
}
// Register the function
registerSparseEmbeddingFunction("my_sparse", MySparseEmbeddingFunction);
// Use it
const collection = await client.createCollection({
name: "my_sparse_collection",
schema: {
sparseVectorIndex: new SparseVectorIndexConfig({
sourceKey: K.DOCUMENT,
embeddingFunction: new MySparseEmbeddingFunction({ vocabSize: 100000 }),
}),
},
});You can combine vector (or hybrid) search with relational tables: run collection.query() or collection.hybridSearch() to get ids, then query your relational table by those ids. For type-safe relational queries, prefer an ORM (see Integration with ORM); here is a raw-SQL recipe.
Recipe
- Get ids (and optional metadata) from vector/hybrid search.
- Query the relational table with
client.execute(). ForWHERE id IN (...)with MySQL/mysql2, use parameterized placeholders to avoid SQL injection: one?per id and pass the ids array as params.
Example (TypeScript) — hybrid search, then fetch users by id and merge:
const client = new SeekdbClient({
host: "127.0.0.1",
port: 2881,
user: "root",
password: "",
database: "test",
});
const collection = await client.getCollection({ name: "my_collection" });
// 1. Hybrid search → get ids
const hybridResult = await collection.hybridSearch({
query: { whereDocument: { $contains: "seekdb" } },
knn: { queryTexts: ["fast database"] },
nResults: 5,
});
const ids = hybridResult.ids?.flat().filter(Boolean) ?? []; // e.g. string[]
// 2. Query relational table by ids (parameterized)
type UserRow = { id: string; name: string };
let users: UserRow[] = [];
if (ids.length > 0) {
const placeholders = ids.map(() => "?").join(",");
const rows = await client.execute(
`SELECT id, name FROM users WHERE id IN (${placeholders})`,
ids
);
users = (rows ?? []) as UserRow[];
}
// 3. Merge: e.g. map ids to (vector result + user row)
const merged = ids.map((id) => ({
id,
user: users.find((u) => u.id === id) ?? null,
}));Transactions: There is no explicit transaction API on the client. For transactions that span both vector and relational operations, use a separate mysql2 connection (same DB config) and run beginTransaction() / commit() / rollback() on that connection (see Integration with ORM).
Use seekdb-js for vector/full-text/hybrid search and an ORM (Drizzle or Prisma) for type-safe relational tables. seekdb is MySQL-compatible in both modes. Server mode: same database, two connections (SeekdbClient + ORM). Embedded mode: Drizzle uses drizzle-orm/mysql-proxy with a callback around client.execute(); Prisma uses @seekdb/prisma-adapter.
Server mode: Create a mysql2 connection with the same DB config as SeekdbClient and pass it to drizzle(conn).
import { createConnection } from "mysql2/promise";
import { SeekdbClient } from "seekdb";
import { drizzle } from "drizzle-orm/mysql2";
import { inArray } from "drizzle-orm";
import { users } from "./schema"; // mysqlTable, relational only; vector tables via Collection
const dbConfig = {
host: "127.0.0.1",
port: 2881,
user: "root",
password: "",
database: "test",
};
const client = new SeekdbClient(dbConfig);
const conn = await createConnection(dbConfig);
const db = drizzle(conn);
const collection = await client.getCollection({ name: "docs" });
const result = await collection.hybridSearch({
query: { whereDocument: { $contains: "seekdb" } },
knn: { queryTexts: ["database"] },
nResults: 5,
});
const ids = result.ids?.flat().filter(Boolean) ?? [];
const usersList = await db.select().from(users).where(inArray(users.id, ids));
// when done: await conn.end(); await client.close();Embedded mode: Use drizzle-orm/mysql-proxy with a callback that runs SQL via client.execute() and returns { rows }. See seekdb-drizzle for a runnable sample.
Server mode: Use SeekdbClient and PrismaClient with DATABASE_URL pointing to the same database.
import { SeekdbClient } from "seekdb";
import { PrismaClient } from "@prisma/client";
const client = new SeekdbClient({
host: "127.0.0.1",
port: 2881,
user: "root",
password: "",
database: "test",
});
const prisma = new PrismaClient(); // DATABASE_URL="mysql://root:@127.0.0.1:2881/test"
const collection = await client.getCollection({ name: "docs" });
const result = await collection.hybridSearch({
query: { whereDocument: { $contains: "seekdb" } },
knn: { queryTexts: ["database"] },
nResults: 5,
});
const ids = result.ids?.flat().filter(Boolean) ?? [];
const users = await prisma.user.findMany({ where: { id: { in: ids } } });
// merge as needed; when done: await client.close(); await prisma.$disconnect();Set DATABASE_URL to the same host/port/user/password/database (e.g. mysql://user:password@host:port/database). OceanBase: see Prisma/MySQL tenant docs.
Embedded mode: Use @seekdb/prisma-adapter with PrismaClient({ adapter }) so Prisma runs SQL via client.execute(). See seekdb-prisma; run pnpm run start:embedded in that example.
Use SeekdbAdminClient for database management. It supports both embedded and server modes.
Embedded mode (local database file):
import { SeekdbAdminClient } from "seekdb";
const admin = new SeekdbAdminClient({ path: "./seekdb.db" });
await admin.createDatabase("new_database");
const databases = await admin.listDatabases();
const db = await admin.getDatabase("new_database");
await admin.deleteDatabase("new_database");
await admin.close();Server mode:
import { SeekdbAdminClient } from "seekdb";
const admin = new SeekdbAdminClient({
host: "127.0.0.1",
port: 2881,
user: "root",
password: "",
});
await admin.createDatabase("new_database");
const databases = await admin.listDatabases();
const db = await admin.getDatabase("new_database");
await admin.deleteDatabase("new_database");
await admin.close();OceanBase mode (server mode with tenant): add tenant (e.g. "sys" or your tenant name) to the config:
const admin = new SeekdbAdminClient({
host: "127.0.0.1",
port: 2881,
user: "root",
password: "",
tenant: "sys", // or your OceanBase tenant
});
await admin.createDatabase("new_database");
const databases = await admin.listDatabases();
const db = await admin.getDatabase("new_database");
await admin.deleteDatabase("new_database");
await admin.close();AdminClient() is available as a factory and returns a SeekdbClient configured for admin operations.
Check out the examples directory for complete usage examples:
Basic examples (root examples/):
- simple-example.ts - Basic usage
- complete-example.ts - All features
- hybrid-search-example.ts - Hybrid search
ORM integration (vector + relational tables):
- seekdb-drizzle - Drizzle ORM (Server: same DB two connections; Embedded: mysql-proxy)
- seekdb-prisma - Prisma ORM (Server: DATABASE_URL; Embedded: @seekdb/prisma-adapter)
To run the examples, see Run Examples in the development guide.
See DEVELOP.md for details on development, testing, and contributing.
This package is licensed under Apache 2.0.