Skip to content

oceanbase/seekdb-js

seekdb-js

npm version npm downloads TypeScript Node.js PRs Welcome

Vector database SDK for JavaScript/TypeScript with built-in semantic search
Works seamlessly with seekdb and OceanBase

For complete usage, see the official documentation.

Table of contents

Why seekdb-js?
Packages
Installation
Running Modes
Quick Start
Usage Guide
Examples
Development
License

Why seekdb-js?

  • Auto Vectorization - Automatic embedding generation, no manual vector calculation needed
  • Semantic Search - Vector-based similarity search for natural language queries
  • Hybrid Search - Combine keyword matching with semantic search
  • Multiple Embedding Functions - Built-in support for local and cloud embedding providers
  • TypeScript Native - Full TypeScript support with complete type definitions

Packages

This is a monorepo containing:

Package Description
seekdb Core SDK for seekdb operations
@seekdb/default-embed Local embedding function using Xenova/all-MiniLM-L6-v2 model (default)
@seekdb/qwen DashScope/Tongyi Qianwen cloud embedding service
@seekdb/openai OpenAI cloud embedding service
@seekdb/jina Jina AI multimodal embedding service
@seekdb/bm25 BM25 sparse embedding function for efficient text-based keyword search
@seekdb/prisma-adapter Prisma driver adapter for seekdb Embedded (use Prisma with seekdb Embedded)

Installation

npm install seekdb @seekdb/default-embed
  • Embedded mode: No seekdb server deployment required; use locally after install.
  • Server mode: Deploy seekdb or OceanBase first; see official deployment docs.

Running Modes

The SDK supports two modes; the constructor arguments to SeekdbClient determine which is used. For database management (create/list/get/delete database), use SeekdbAdminClient() which returns a SeekdbClient instance.

Mode Parameter Description
Embedded path (database directory path) Runs locally with no separate seekdb server; data is stored under the given path (e.g. ./seekdb.db). Requires native addon @seekdb/js-bindings.
Server host (and port, user, password, etc.) Connects to a remote seekdb or OceanBase instance.

OceanBase and seekdb: OceanBase is compatible with seekdb and can be understood as its distributed, multi-tenant, etc. version. seekdb-js therefore supports OceanBase server mode with the same API: use the same SeekdbClient / SeekdbAdminClient and connection parameters; when connecting to OceanBase, additionally pass tenant (e.g. "sys" or your tenant name). See OceanBase mode below.

  • SeekdbClient: Pass path for embedded mode, or host (and port, user, password, etc.) for server mode.
  • SeekdbAdminClient(): For admin operations only; pass path for embedded or host for server. In embedded mode you do not specify a database name.

Quick Start

Embedded mode (local file, no server):

import { SeekdbClient } from "seekdb";

// 1. Connect
const client = new SeekdbClient({
  path: "./seekdb.db",
  database: "test",
});

// 2. Create collection
const collection = await client.createCollection({ name: "my_collection" });

// 3. Add data (auto-vectorized using @seekdb/default-embed)
await collection.add({
  ids: ["1", "2"],
  documents: ["Hello world", "seekdb is fast"],
});

// 4. Search
const results = await collection.query({ queryTexts: "Hello", nResults: 5 });
console.log("query results", results);

Server mode (connect to a deployed seekdb):

import { SeekdbClient } from "seekdb";

// 1. Connect
const client = new SeekdbClient({
  host: "127.0.0.1",
  port: 2881,
  user: "root",
  password: "",
  database: "test",
});

// 2. Create collection
const collection = await client.createCollection({ name: "my_collection" });

// 3. Add data (auto-vectorized using @seekdb/default-embed)
await collection.add({
  ids: ["1", "2"],
  documents: ["Hello world", "seekdb is fast"],
});

// 4. Search
const results = await collection.query({ queryTexts: "Hello", nResults: 5 });
console.log("query results", results);

Usage Guide

This section covers basic usage. See the official SDK documentation for full details.

Client Connection

Embedded mode (local database file):

import { SeekdbClient } from "seekdb";

const client = new SeekdbClient({
  path: "./seekdb.db", // database file path
  database: "test",
});

Server mode:

import { SeekdbClient } from "seekdb";

const client = new SeekdbClient({
  host: "127.0.0.1",
  port: 2881,
  user: "root",
  password: "",
  database: "test",
});

OceanBase mode (server mode with tenant): OceanBase is compatible with seekdb (distributed, multi-tenant, etc.). Use the same server-mode connection; when the backend is OceanBase, pass tenant (e.g. "sys" or your tenant name):

const client = new SeekdbClient({
  host: "127.0.0.1",
  port: 2881,
  user: "root",
  password: "",
  database: "test",
  tenant: "sys", // or your OceanBase tenant
});

Create Collection

You can create a collection without any configuration; the default embedding function will be used for vectorization. Ensure @seekdb/default-embed is installed first.

const collection = await client.createCollection({
  name: "my_collection",
});

Schema API

A schema defines which indexes are available on a collection:

  • FulltextIndexConfig - For keyword-based full-text search
  • VectorIndexConfig - For dense vector similarity search
  • SparseVectorIndexConfig - For sparse vector similarity search

Examples for creating the three index types:

FulltextIndexConfig

const schema = new Schema({
  fulltextIndex: new FulltextIndexConfig("ik", { ik_mode: "smart" }),
});
const collection = await client.createCollection({
  name: "ft_collection",
  schema,
});

VectorIndexConfig

If you do not set embeddingFunction, the default embedding function is used. Ensure @seekdb/default-embed is installed.

const schema = new Schema({
  vectorIndex: new VectorIndexConfig({
    hnsw: { dimension: 384, distance: "cosine" },
  }),
});
const collection = await client.createCollection({
  name: "vec_collection",
  schema,
});

To use a custom embedding function, install one of the provided packages or implement your own. See the official SDK documentation for details.

Take @seekdb/qwen as an example:

npm install @seekdb/qwen
import { QwenEmbeddingFunction } from "@seekdb/qwen";

const qwenEF = new QwenEmbeddingFunction();

const schema = new Schema({
  vectorIndex: new VectorIndexConfig({
    hnsw: { dimension: 384, distance: "cosine" },
    embeddingFunction: qwenEF,
  }),
});

const collection = await client.createCollection({
  name: "my_collection",
  schema,
});

If you don't need an embedding function, set embeddingFunction to null.

const schema = new Schema({
  vectorIndex: new VectorIndexConfig({
    hnsw: { dimension: 384, distance: "cosine" },
    embeddingFunction: null,
  }),
});
const collection = await client.createCollection({
  name: "vec_collection",
  schema,
});

SparseVectorIndexConfig

You can use the provided @seekdb/bm25 as the sparse embedding function or implement your own.

npm install @seekdb/bm25
import { Bm25EmbeddingFunction } from "@seekdb/bm25";
import { K } from "seekdb";

const schema = new Schema({
  sparseVectorIndex: new SparseVectorIndexConfig({
    sourceKey: K.DOCUMENT,
    embeddingFunction: new Bm25EmbeddingFunction(),
  }),
});

const collection = await client.createCollection({
  name: "sparse_collection",
  schema,
});

Add Data

The embedding function defined in Schema is used automatically for vectorization. No need to set it again.

await collection.add({
  ids: ["1", "2"],
  documents: ["Hello world", "seekdb is fast"],
  metadatas: [{ category: "test" }, { category: "db" }],
});

You can also pass a vector or an array of vectors directly.

const qwenEF = new QwenEmbeddingFunction();
await collection.add({
  ids: ["1", "2"],
  documents: ["Hello world", "seekdb is fast"],
  metadatas: [{ category: "test" }, { category: "db" }],
  embeddings: [
    [0.1, 0.2, 0.3],
    [0.2, 0.3, 0.4],
  ],
});

Query Data

Get Data

The get() method is used to retrieve documents from a collection without performing vector similarity search.

const results = await collection.get({
  ids: ["1", "2"],
});

Semantic Search

The query() method is used to execute vector similarity search to find documents most similar to the query vector.

The embedding function defined in Schema is used automatically for vectorization. No need to set it again.

const results = await collection.query({
  queryTexts: "Hello",
  nResults: 5,
});

You can also pass a vector or an array of vectors directly.

const results = await collection.query({
  queryEmbeddings: [
    [0.1, 0.2, 0.3],
    [0.2, 0.3, 0.4],
  ],
  nResults: 5,
});

Specify queryKey to run sparse vector search. If a sparse embedding function is defined, queryTexts will be vectorized automatically.

import
import { K } from "seekdb";

const results = await collection.query({
  queryTexts: "artificial intelligence",
  // Use sparse vector index, default by K.EMBEDDING
  queryKey: K.EMBEDDING,
  nResults: 3,
});

You can also supply your own sparse vectors for search.

const queryVector: SparseVector = { 1234: 0.5, 5678: 0.8 };

const results = await collection.query({
  queryEmbeddings: queryVector,
  queryKey: K.SPARSE_EMBEDDING,
  nResults: 5,
});

Hybrid Search (Keyword + Semantic)

The hybridSearch() combines full-text search and vector similarity search with ranking.

const hybridResults = await collection.hybridSearch({
  query: { whereDocument: { $contains: "seekdb" } },
  knn: { queryTexts: ["fast database"] },
  nResults: 5,
});

You can also pass a vector or an array of vectors directly.

const hybridResults = await collection.hybridSearch({
  query: { whereDocument: { $contains: "seekdb" } },
  knn: {
    queryEmbeddings: [
      [0.1, 0.2, 0.3],
      [0.2, 0.3, 0.4],
    ],
  },
  nResults: 5,
});

Embedding Functions

The SDK supports multiple Embedding Functions for generating vectors locally or in the cloud.

For complete usage, see the official documentation.

Default Embedding

Uses a local model (Xenova/all-MiniLM-L6-v2) by default. No API Key required. Suitable for quick development and testing.

No configuration is needed to use the default model.

First install the built-in model:

npm install @seekdb/default-embed

Then use it as-is; it will auto-vectorize:

const collection = await client.createCollection({
  name: "local_embed_collection",
});

Qwen Embedding

Uses DashScope's cloud Embedding service (Qwen/Tongyi Qianwen). Suitable for production environments.

npm install @seekdb/qwen
import { QwenEmbeddingFunction } from "@seekdb/qwen";

const qwenEmbed = new QwenEmbeddingFunction({
  // Your DashScope environment variable name, defaults to 'DASHSCOPE_API_KEY'
  apiKeyEnvVar: 'DASHSCOPE_API_KEY'
  // Optional, defaults to 'text-embedding-v4'
  modelName: "text-embedding-v4",
});

const collection = await client.createCollection({
  name: "qwen_embed_collection",
  embeddingFunction: qwenEmbed,
});

OpenAI Embedding

Uses OpenAI's embedding API. Suitable for production environments with OpenAI integration.

npm install @seekdb/openai
import { OpenAIEmbeddingFunction } from "@seekdb/openai";

const openaiEmbed = new OpenAIEmbeddingFunction({
  // Your openai environment variable name, defaults to 'OPENAI_API_KEY'
  apiKeyEnvVar: 'OPENAI_API_KEY'
  // Optional, defaults to 'text-embedding-3-small'
  modelName: "text-embedding-3-small",
});

const collection = await client.createCollection({
  name: "openai_embed_collection",
  embeddingFunction: openaiEmbed,
});

Jina Embedding

Uses Jina AI's embedding API. Supports multimodal embeddings.

npm install @seekdb/jina
import { JinaEmbeddingFunction } from "@seekdb/jina";

const jinaEmbed = new JinaEmbeddingFunction({
  // Your jina environment variable name, defaults to 'JINA_API_KEY'
  apiKeyEnvVar: 'JINA_API_KEY'
  // Optional, defaults to jina-clip-v2
  modelName: "jina-clip-v2",
});

const collection = await client.createCollection({
  name: "jina_embed_collection",
  embeddingFunction: jinaEmbed,
});

Custom Embedding Function

You can also use your own custom embedding function.

First, implement the EmbeddingFunction interface:

import type { EmbeddingFunction } from "seekdb";
import { registerEmbeddingFunction } from "seekdb";

interface MyCustomEmbeddingConfig {
  apiKeyEnv: string;
}
class MyCustomEmbeddingFunction implements EmbeddingFunction {
  // The name of the `embeddingFunction`, must be unique.
  readonly name = "my_custom_embedding";
  private apiKeyEnv: string;
  dimension: number;
  constructor(config: MyCustomEmbeddingConfig) {
    this.apiKeyEnv = config.apiKeyEnv;
    this.dimension = 384;
  }
  // Implement your vector generation code here
  async generate(texts: string[]): Promise<number[][]> {
    const embeddings: number[][] = [];
    return embeddings;
  }
  // The configuration of the current `embeddingFunction` instance, used to restore this instance
  getConfig(): MyCustomEmbeddingConfig {
    return {
      apiKeyEnv: this.apiKeyEnv,
    };
  }
  // Create a new instance of the current `embeddingFunction` based on the provided configuration
  static buildFromConfig(config: MyCustomEmbeddingConfig): EmbeddingFunction {
    return new MyCustomEmbeddingFunction(config);
  }
}

// Register the constructor
registerEmbeddingFunction("my_custom_embedding", MyCustomEmbeddingFunction);

Then use it:

const customEmbed = new MyCustomEmbeddingFunction({
  apiKeyEnv: "MY_CUSTOM_API_KEY_ENV",
});
const collection = await client.createCollection({
  name: "custom_embed_collection",
  configuration: {
    dimension: 384,
    distance: "cosine",
  },
  embeddingFunction: customEmbed,
});

BM25 Sparse Embedding

BM25 (Best Matching 25) is a sparse embedding function that uses term frequency and document length normalization for efficient text search. It's particularly useful for keyword-based search scenarios.

npm install @seekdb/bm25
import { SeekdbClient, Schema, SparseVectorIndexConfig, K } from "seekdb";
import { Bm25EmbeddingFunction } from "@seekdb/bm25";

const bm25 = new Bm25EmbeddingFunction({
  k: 1.2, // Term frequency saturation
  b: 0.75, // Document length normalization
  avgDocLength: 256, // Average document length
  tokenMaxLength: 40, // Maximum token length
  stopwords: ["a", "an", "the"], // Custom stopwords
});

const collection = await client.createCollection({
  name: "bm25_collection",
  schema: new Schema({
    sparseVectorIndex: new SparseVectorIndexConfig({
      sourceKey: K.DOCUMENT,
      embeddingFunction: bm25,
    }),
  }),
});

For more details, see BM25 Embedding Guide.

Custom Sparse Embedding Function

Implement your own sparse embedding function:

import {
  SparseEmbeddingFunction,
  SparseVector,
  registerSparseEmbeddingFunction,
  EmbeddingConfig,
} from "seekdb";

interface MySparseConfig {
  vocabSize: number;
}

class MySparseEmbeddingFunction implements SparseEmbeddingFunction {
  readonly name = "my_sparse";
  private vocabSize: number;

  constructor(config: MySparseConfig) {
    this.vocabSize = config.vocabSize;
  }

  // Implement your vector generation code here
  async generate(texts: string[]): Promise<SparseVector[]> {
    const embeddings: number[][] = [];
    return embeddings;
  }

  // Generate sparse vectors for queries (can be different)
  async generateForQueries(texts: string[]): Promise<SparseVector[]> {
    return this.generate(texts);
  }

  // Return configuration for persistence
  getConfig(): EmbeddingConfig {
    return { vocabSize: this.vocabSize };
  }

  // Static factory method
  static buildFromConfig(config: EmbeddingConfig): SparseEmbeddingFunction {
    return new MySparseEmbeddingFunction(config as MySparseConfig);
  }
}

// Register the function
registerSparseEmbeddingFunction("my_sparse", MySparseEmbeddingFunction);

// Use it
const collection = await client.createCollection({
  name: "my_sparse_collection",
  schema: {
    sparseVectorIndex: new SparseVectorIndexConfig({
      sourceKey: K.DOCUMENT,
      embeddingFunction: new MySparseEmbeddingFunction({ vocabSize: 100000 }),
    }),
  },
});

Vector search + relational tables

You can combine vector (or hybrid) search with relational tables: run collection.query() or collection.hybridSearch() to get ids, then query your relational table by those ids. For type-safe relational queries, prefer an ORM (see Integration with ORM); here is a raw-SQL recipe.

Recipe

  1. Get ids (and optional metadata) from vector/hybrid search.
  2. Query the relational table with client.execute(). For WHERE id IN (...) with MySQL/mysql2, use parameterized placeholders to avoid SQL injection: one ? per id and pass the ids array as params.

Example (TypeScript) — hybrid search, then fetch users by id and merge:

const client = new SeekdbClient({
  host: "127.0.0.1",
  port: 2881,
  user: "root",
  password: "",
  database: "test",
});
const collection = await client.getCollection({ name: "my_collection" });

// 1. Hybrid search → get ids
const hybridResult = await collection.hybridSearch({
  query: { whereDocument: { $contains: "seekdb" } },
  knn: { queryTexts: ["fast database"] },
  nResults: 5,
});
const ids = hybridResult.ids?.flat().filter(Boolean) ?? []; // e.g. string[]

// 2. Query relational table by ids (parameterized)
type UserRow = { id: string; name: string };
let users: UserRow[] = [];
if (ids.length > 0) {
  const placeholders = ids.map(() => "?").join(",");
  const rows = await client.execute(
    `SELECT id, name FROM users WHERE id IN (${placeholders})`,
    ids
  );
  users = (rows ?? []) as UserRow[];
}

// 3. Merge: e.g. map ids to (vector result + user row)
const merged = ids.map((id) => ({
  id,
  user: users.find((u) => u.id === id) ?? null,
}));

Transactions: There is no explicit transaction API on the client. For transactions that span both vector and relational operations, use a separate mysql2 connection (same DB config) and run beginTransaction() / commit() / rollback() on that connection (see Integration with ORM).

Integration with ORM

Use seekdb-js for vector/full-text/hybrid search and an ORM (Drizzle or Prisma) for type-safe relational tables. seekdb is MySQL-compatible in both modes. Server mode: same database, two connections (SeekdbClient + ORM). Embedded mode: Drizzle uses drizzle-orm/mysql-proxy with a callback around client.execute(); Prisma uses @seekdb/prisma-adapter.

Drizzle

Server mode: Create a mysql2 connection with the same DB config as SeekdbClient and pass it to drizzle(conn).

import { createConnection } from "mysql2/promise";
import { SeekdbClient } from "seekdb";
import { drizzle } from "drizzle-orm/mysql2";
import { inArray } from "drizzle-orm";
import { users } from "./schema"; // mysqlTable, relational only; vector tables via Collection

const dbConfig = {
  host: "127.0.0.1",
  port: 2881,
  user: "root",
  password: "",
  database: "test",
};
const client = new SeekdbClient(dbConfig);
const conn = await createConnection(dbConfig);
const db = drizzle(conn);

const collection = await client.getCollection({ name: "docs" });
const result = await collection.hybridSearch({
  query: { whereDocument: { $contains: "seekdb" } },
  knn: { queryTexts: ["database"] },
  nResults: 5,
});
const ids = result.ids?.flat().filter(Boolean) ?? [];
const usersList = await db.select().from(users).where(inArray(users.id, ids));
// when done: await conn.end(); await client.close();

Embedded mode: Use drizzle-orm/mysql-proxy with a callback that runs SQL via client.execute() and returns { rows }. See seekdb-drizzle for a runnable sample.

Prisma

Server mode: Use SeekdbClient and PrismaClient with DATABASE_URL pointing to the same database.

import { SeekdbClient } from "seekdb";
import { PrismaClient } from "@prisma/client";

const client = new SeekdbClient({
  host: "127.0.0.1",
  port: 2881,
  user: "root",
  password: "",
  database: "test",
});
const prisma = new PrismaClient(); // DATABASE_URL="mysql://root:@127.0.0.1:2881/test"

const collection = await client.getCollection({ name: "docs" });
const result = await collection.hybridSearch({
  query: { whereDocument: { $contains: "seekdb" } },
  knn: { queryTexts: ["database"] },
  nResults: 5,
});
const ids = result.ids?.flat().filter(Boolean) ?? [];
const users = await prisma.user.findMany({ where: { id: { in: ids } } });
// merge as needed; when done: await client.close(); await prisma.$disconnect();

Set DATABASE_URL to the same host/port/user/password/database (e.g. mysql://user:password@host:port/database). OceanBase: see Prisma/MySQL tenant docs.

Embedded mode: Use @seekdb/prisma-adapter with PrismaClient({ adapter }) so Prisma runs SQL via client.execute(). See seekdb-prisma; run pnpm run start:embedded in that example.

Database Management

Use SeekdbAdminClient for database management. It supports both embedded and server modes.

Embedded mode (local database file):

import { SeekdbAdminClient } from "seekdb";

const admin = new SeekdbAdminClient({ path: "./seekdb.db" });
await admin.createDatabase("new_database");
const databases = await admin.listDatabases();
const db = await admin.getDatabase("new_database");
await admin.deleteDatabase("new_database");
await admin.close();

Server mode:

import { SeekdbAdminClient } from "seekdb";

const admin = new SeekdbAdminClient({
  host: "127.0.0.1",
  port: 2881,
  user: "root",
  password: "",
});

await admin.createDatabase("new_database");
const databases = await admin.listDatabases();
const db = await admin.getDatabase("new_database");
await admin.deleteDatabase("new_database");
await admin.close();

OceanBase mode (server mode with tenant): add tenant (e.g. "sys" or your tenant name) to the config:

const admin = new SeekdbAdminClient({
  host: "127.0.0.1",
  port: 2881,
  user: "root",
  password: "",
  tenant: "sys", // or your OceanBase tenant
});

await admin.createDatabase("new_database");
const databases = await admin.listDatabases();
const db = await admin.getDatabase("new_database");
await admin.deleteDatabase("new_database");
await admin.close();

AdminClient() is available as a factory and returns a SeekdbClient configured for admin operations.

Examples

Check out the examples directory for complete usage examples:

Basic examples (root examples/):

ORM integration (vector + relational tables):

To run the examples, see Run Examples in the development guide.

Development

See DEVELOP.md for details on development, testing, and contributing.

License

This package is licensed under Apache 2.0.

About

The JavaScript/TypeScript SDK for OceanBase or OceanBase seekdb.

Topics

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Packages

 
 
 

Contributors