Skip to content

Latest commit

 

History

History
631 lines (473 loc) · 13.9 KB

File metadata and controls

631 lines (473 loc) · 13.9 KB

Index Management Guide

Declarative index management via decorators for optimal query performance


Quick Start

Problem: Slow Chat Queries (30,000+ messages)

// BEFORE: Slow query (50-200ms with OFFSET scanning)
SELECT * FROM chat_messages
WHERE room_id = ?
ORDER BY timestamp DESC
LIMIT 50 OFFSET 1000;

Solution: Add Composite Index

import { CompositeIndex } from '../decorators/FieldDecorators';

@CompositeIndex({
  name: 'idx_chat_messages_room_timestamp',
  fields: ['roomId', 'timestamp'],
  direction: 'DESC'
})
export class ChatMessageEntity extends BaseEntity {
  @TextField({ index: true })
  roomId: UUID;

  @DateField({ index: true })
  timestamp: Date;
}

Result: 10-100x faster queries (5-20ms with index scan)


Index Types

1. Single-Column Index

Use when: Filtering or sorting on ONE column

export class UserEntity extends BaseEntity {
  @TextField({ index: true })  // ✅ Creates index automatically
  displayName: string;

  @DateField({ index: true })  // ✅ Index on date for sorting
  lastActiveAt: Date;
}

Generated SQL:

CREATE INDEX IF NOT EXISTS idx_users_display_name ON users(display_name);
CREATE INDEX IF NOT EXISTS idx_users_last_active_at ON users(last_active_at);

2. Composite (Multi-Column) Index

Use when: Filtering/sorting on MULTIPLE columns together

@CompositeIndex({
  name: 'idx_chat_messages_room_timestamp',
  fields: ['roomId', 'timestamp'],
  direction: 'DESC'
})
export class ChatMessageEntity extends BaseEntity {
  @TextField({ index: true })
  roomId: UUID;

  @DateField({ index: true })
  timestamp: Date;
}

Generated SQL:

CREATE INDEX IF NOT EXISTS idx_chat_messages_room_timestamp
ON chat_messages(room_id, timestamp DESC);

Optimizes queries like:

-- ✅ FAST: Uses composite index
SELECT * FROM chat_messages
WHERE room_id = ?
ORDER BY timestamp DESC
LIMIT 50;

-- ✅ FAST: Uses composite index (prefix match)
SELECT * FROM chat_messages
WHERE room_id = ?;

3. Unique Composite Index

Use when: Enforcing uniqueness across multiple columns

@CompositeIndex({
  name: 'idx_user_room_unique',
  fields: ['userId', 'roomId'],
  unique: true  // ✅ UNIQUE constraint
})
export class RoomMemberEntity extends BaseEntity {
  @TextField()
  userId: UUID;

  @TextField()
  roomId: UUID;
}

Generated SQL:

CREATE UNIQUE INDEX IF NOT EXISTS idx_user_room_unique
ON room_members(user_id, room_id);

Prevents:

  • Duplicate room memberships
  • Race conditions in concurrent inserts

How Indexes Are Created

Automatic Schema Management

Indexes are created automatically when:

  1. Table creation (first time entity is used)
  2. Schema validation (on startup)
  3. Manual rebuild (see below)

No manual SQL required! Just add the decorator and deploy.

Deployment Process

# 1. Add @CompositeIndex decorator to entity
# 2. Compile TypeScript
npm run build:ts

# 3. Deploy (indexes created automatically on startup)
npm start

# Indexes are created via:
#   SqliteSchemaManager.ensureSchema()
#   → generateCreateTableSql()
#   → generateCreateIndexSql()  # ✅ Includes composite indexes

Manual Index Rebuild (If Needed)

If you need to rebuild indexes immediately:

// Option 1: Via data command (recommended)
await Commands.execute('data/rebuild-indexes', {
  collection: 'chat_messages'
});

// Option 2: Direct SQL (if command doesn't exist yet)
await Commands.execute('data/execute', {
  sql: 'DROP INDEX IF EXISTS idx_chat_messages_room_timestamp',
  collection: 'chat_messages'
});

await Commands.execute('data/execute', {
  sql: 'CREATE INDEX idx_chat_messages_room_timestamp ON chat_messages(room_id, timestamp DESC)',
  collection: 'chat_messages'
});

Performance Impact

Before vs After (30,000 Messages)

Operation Without Index With Composite Index Speedup
Load recent 50 messages 50-200ms 5-20ms 10-40x
Pagination (OFFSET 1000) 200-500ms 5-20ms 40-100x
Filter by room 30-100ms 2-10ms 15-30x

Query Plan Verification

# Check if index is being used
sqlite3 .continuum/jtag/data/database.sqlite \
  "EXPLAIN QUERY PLAN SELECT * FROM chat_messages WHERE room_id='general' ORDER BY timestamp DESC LIMIT 50"

# Expected output (WITH index):
# SEARCH chat_messages USING INDEX idx_chat_messages_room_timestamp (room_id=?)

# Bad output (WITHOUT index):
# SCAN chat_messages
# USE TEMP B-TREE FOR ORDER BY

Index Design Patterns

Pattern 1: Equality + Sort (Most Common)

Query:

SELECT * FROM table WHERE col1 = ? ORDER BY col2 DESC

Index:

@CompositeIndex({
  name: 'idx_table_col1_col2',
  fields: ['col1', 'col2'],
  direction: 'DESC'
})

Examples:

  • Chat messages: WHERE room_id = ? ORDER BY timestamp DESC
  • User posts: WHERE user_id = ? ORDER BY created_at DESC
  • Task list: WHERE assignee = ? ORDER BY priority DESC

Pattern 2: Multiple Equality Filters

Query:

SELECT * FROM table WHERE col1 = ? AND col2 = ?

Index:

@CompositeIndex({
  name: 'idx_table_col1_col2',
  fields: ['col1', 'col2']
})

Examples:

  • Room members: WHERE room_id = ? AND user_id = ?
  • Training examples: WHERE dataset_id = ? AND type = ?

Pattern 3: Range Queries

Query:

SELECT * FROM table WHERE col1 = ? AND col2 > ?

Index:

@CompositeIndex({
  name: 'idx_table_col1_col2',
  fields: ['col1', 'col2'],
  direction: 'ASC'
})

Examples:

  • Recent activity: WHERE user_id = ? AND timestamp > ?
  • Pagination: WHERE room_id = ? AND id > ? (cursor-based)

Pattern 4: Covering Index (Advanced)

Query:

SELECT col1, col2, col3 FROM table WHERE col1 = ?

Index:

@CompositeIndex({
  name: 'idx_table_col1_col2_col3',
  fields: ['col1', 'col2', 'col3']  // ✅ Include SELECT columns
})

Benefit: SQLite can answer query from index alone (no table lookup)


Common Mistakes to Avoid

❌ Wrong Column Order

// BAD: Sort column first
@CompositeIndex({
  fields: ['timestamp', 'roomId']  // ❌ Wrong order
})

// Query: WHERE room_id = ? ORDER BY timestamp
// Result: Index not used (can't filter on roomId)

Rule: Filter columns first, sort columns last

❌ Too Many Indexes

// BAD: Index every field
export class UserEntity {
  @TextField({ index: true })
  displayName: string;

  @TextField({ index: true })
  email: string;

  @TextField({ index: true })
  status: string;

  @DateField({ index: true })
  createdAt: Date;

  @DateField({ index: true })
  lastActiveAt: Date;
}

Problems:

  • Slower writes (maintain 5 indexes)
  • Wasted disk space
  • Marginal query improvement

Rule: Only index columns actually used in WHERE/ORDER BY

❌ Redundant Indexes

// BAD: Both single and composite on same column
@CompositeIndex({
  fields: ['roomId', 'timestamp']  // ✅ Composite index
})
export class ChatMessageEntity {
  @TextField({ index: true })  // ❌ Redundant! Composite index covers this
  roomId: UUID;
}

Rule: Composite index (A, B) also works for queries on just A

❌ Wrong Direction

// BAD: ASC index for DESC query
@CompositeIndex({
  fields: ['roomId', 'timestamp'],
  direction: 'ASC'  // ❌ Query uses DESC
})

// Query: ORDER BY timestamp DESC
// Result: Index scan backwards (slower)

Rule: Match direction to most common query pattern


Rust Adapter Support

Automatic Support (No Changes Needed)

Both TypeScript and Rust adapters use the same schema generation from decorators:

Entity Definition (TypeScript)
  ↓
Decorators (@CompositeIndex)
  ↓
SqlStorageAdapterBase.generateCreateIndexSql()
  ↓
SQL String Generated
  ↓
┌─────────────────────┬─────────────────────┐
│ TypeScript Adapter  │  Rust Adapter       │
│ (SqliteRawExecutor) │  (RustSqliteExecutor)│
│                     │                     │
│ CREATE INDEX ...    │  CREATE INDEX ...   │
└─────────────────────┴─────────────────────┘

Both adapters execute identical SQL → indexes work the same!

Verification

# 1. Check indexes exist
sqlite3 .continuum/jtag/data/database.sqlite \
  "SELECT name, sql FROM sqlite_master WHERE type='index' AND tbl_name='chat_messages'"

# 2. Should see:
# idx_chat_messages_room_timestamp | CREATE INDEX idx_chat_messages_room_timestamp ON chat_messages(room_id, timestamp DESC)

Migration Strategy

For Existing Tables

  1. Add decorator to entity

    @CompositeIndex({
      name: 'idx_chat_messages_room_timestamp',
      fields: ['roomId', 'timestamp'],
      direction: 'DESC'
    })
  2. Deploy with npm start

    • Indexes created automatically via ensureSchema()
  3. Verify index exists

    sqlite3 .continuum/jtag/data/database.sqlite \
      "PRAGMA index_list('chat_messages')"
  4. Measure performance improvement

    # Before
    time sqlite3 .continuum/jtag/data/database.sqlite \
      "SELECT * FROM chat_messages WHERE room_id='general' ORDER BY timestamp DESC LIMIT 50"
    
    # After (should be 10-100x faster)

For New Tables

Indexes are created automatically on first use. No migration needed!


Troubleshooting

Index Not Being Used

Check 1: Does index exist?

PRAGMA index_list('chat_messages');

Check 2: Is query using it?

EXPLAIN QUERY PLAN SELECT * FROM chat_messages WHERE room_id=? ORDER BY timestamp DESC;

Check 3: Statistics up to date?

ANALYZE chat_messages;

Slow Query Despite Index

Possible causes:

  1. OFFSET pagination - Use cursor-based instead:

    // BAD
    WHERE room_id = ? ORDER BY timestamp DESC LIMIT 50 OFFSET 1000
    
    // GOOD
    WHERE room_id = ? AND timestamp < ? ORDER BY timestamp DESC LIMIT 50
  2. Low selectivity - Index not useful if many rows match:

    // May not use index if 90% of rows have status='active'
    WHERE status = 'active'
  3. Complex expressions - Index doesn't work with functions:

    // BAD: Index not used
    WHERE LOWER(display_name) = ?
    
    // GOOD: Store lowercase version
    @TextField({ index: true })
    displayNameLower: string;

Index Creation Failed

Check logs:

tail -f .continuum/sessions/*/logs/server.log | grep "CREATE INDEX"

Common errors:

  • Duplicate index name (change name in decorator)
  • Invalid column name (typo in fields array)
  • Syntax error (check generated SQL)

Best Practices

✅ DO

  1. Profile before indexing - Measure query time, identify bottlenecks

  2. Match query patterns - Index columns actually used in WHERE/ORDER BY

  3. Use composite indexes - Single index better than multiple single-column indexes

  4. Document intent - Comment why index exists:

    // Optimizes chat room pagination (30k+ messages)
    @CompositeIndex({
      name: 'idx_chat_messages_room_timestamp',
      fields: ['roomId', 'timestamp'],
      direction: 'DESC'
    })
  5. Verify with EXPLAIN - Check query plan uses index

❌ DON'T

  1. Index every column - Wastes space, slows writes
  2. Duplicate indexes - Composite (A,B) covers queries on A
  3. Ignore write performance - Each index slows INSERT/UPDATE/DELETE
  4. Use OFFSET - Cursor-based pagination much faster
  5. Skip testing - Always verify index improves performance

Examples

Chat Pagination (Real-World)

Before:

// No composite index, 30k messages
// Query time: 50-200ms

SELECT * FROM chat_messages
WHERE room_id = 'general'
ORDER BY timestamp DESC
LIMIT 50 OFFSET 1000;

After:

@CompositeIndex({
  name: 'idx_chat_messages_room_timestamp',
  fields: ['roomId', 'timestamp'],
  direction: 'DESC'
})
export class ChatMessageEntity extends BaseEntity {
  // Fields...
}

// Query time: 5-20ms (10-40x faster!)

User Activity Tracking

@CompositeIndex({
  name: 'idx_activities_user_timestamp',
  fields: ['userId', 'timestamp'],
  direction: 'DESC'
})
export class ActivityEntity extends BaseEntity {
  @TextField()
  userId: UUID;

  @DateField()
  timestamp: Date;

  @TextField()
  activityType: string;
}

// Optimizes: "Show recent activities for user X"

Training Dataset Queries

@CompositeIndex({
  name: 'idx_examples_dataset_type',
  fields: ['datasetId', 'exampleType']
})
export class TrainingExampleEntity extends BaseEntity {
  @TextField()
  datasetId: UUID;

  @EnumField()
  exampleType: 'positive' | 'negative';
}

// Optimizes: "Get all positive examples from dataset X"

Summary

What You Get

Declarative index management - Define in entity, SQL generated automatically ✅ TypeScript + Rust support - Works with both adapters (same SQL) ✅ Composite indexes - Multi-column optimization for complex queries ✅ 10-100x speedup - Proven performance improvements (30k messages) ✅ Zero manual SQL - Decorators handle everything

Next Steps

  1. Identify slow queries - Profile with EXPLAIN QUERY PLAN
  2. Add @CompositeIndex - Target most common query patterns
  3. Deploy - npm start creates indexes automatically
  4. Verify - Measure performance improvement
  5. Iterate - Add more indexes as needed

Remember: Indexes are cheap to add but expensive to maintain. Only index what you actually query!


Last Updated: 2025-12-13 Related Docs:

  • RUST-DATA-WORKER-ARCHITECTURE.md - Performance characteristics
  • DECORATOR-DRIVEN-SCHEMA.md - Entity system overview