
Overview

The LocalDatabase class provides a SQL.js-based client for managing the complete semantic search index. It handles repository metadata, text chunking, embedding storage, and vector similarity search.
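This page does not say how similarity is scored. As a rough illustration only, the sketch below assumes cosine similarity with a brute-force scan over in-memory vectors. The names `cosineSimilarity`, `topK`, `Candidate`, and `ScoredChunk` are hypothetical and are not part of the documented LocalDatabase API.

```typescript
// Hypothetical helpers for illustration; LocalDatabase's actual search may differ.
type Candidate = { chunkId: string; vector: Float32Array };
type ScoredChunk = { chunkId: string; score: number };

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for zero-length vectors.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Score every candidate against the query and keep the k best matches.
function topK(query: Float32Array, candidates: Candidate[], k: number): ScoredChunk[] {
  return candidates
    .map((c) => ({ chunkId: c.chunkId, score: cosineSimilarity(query, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

A linear scan like this is usually adequate for a few thousand chunks; larger indexes may call for an approximate nearest-neighbor structure instead.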

Initialization

getDb()

Get or create the singleton database instance.
import { getDb } from "./db/client";

const db = await getDb();
Returns
Promise<LocalDatabase>
Singleton LocalDatabase instance
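How getDb() implements the singleton is not shown here. One common approach, sketched below under that assumption, is to cache the initialization promise so concurrent callers share a single instance. `createSingleton` is a hypothetical helper, not an export of "./db/client".

```typescript
// Hypothetical helper illustrating the pattern; getDb()'s real internals may differ.
function createSingleton<T>(factory: () => Promise<T>): () => Promise<T> {
  let instance: Promise<T> | null = null;
  return () => {
    if (instance === null) {
      // Cache the promise itself, not the resolved value, so callers that
      // arrive while initialization is still running share the same work.
      instance = factory();
    }
    return instance;
  };
}
```

Caching the promise rather than the resolved value avoids a race where two early callers would each start their own initialization.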

Constructor

Create a new LocalDatabase instance (typically not called directly).
import { LocalDatabase } from "./db/client";
import initSqlJs from "sql.js";

const SQL = await initSqlJs({ locateFile: () => wasmUrl });
const rawDb = new SQL.Database();
const db = new LocalDatabase({
  sql: SQL,
  db: rawDb,
  storageMode: "opfs",
  embeddingCheckpointPolicy: {
    everyEmbeddings: 256,
    everyMs: 3000
  }
});
args.sql
SqlJsStatic
required
SQL.js instance
args.db
Database
required
SQL.js Database instance
args.storageMode
StorageMode
required
Storage backend: "opfs" | "local-storage" | "memory"
args.embeddingCheckpointPolicy
EmbeddingCheckpointPolicy
Optional checkpoint configuration

Repository Management

upsertRepos()

Insert or update repository records.
await db.upsertRepos([{
  id: 123456,
  fullName: "owner/repo",
  name: "repo",
  description: "A semantic search library",
  topics: ["vector-search", "embeddings"],
  language: "TypeScript",
  htmlUrl: "https://github.com/owner/repo",
  stars: 1500,
  forks: 42,
  updatedAt: "2026-03-01T00:00:00Z",
  readmeUrl: "https://github.com/owner/repo/blob/main/README.md",
  readmeText: "# Project\n\nDescription...",
  readmeEtag: "abc123",
  readmeLastModified: "2026-02-28T12:00:00Z",
  checksum: "sha256-...",
  lastSyncedAt: Date.now()
}]);
repos
RepoRecord[]
required
Array of repository records to upsert
Returns
Promise<void>
Resolves when all repos are persisted

listRepos()

Retrieve all stored repositories.
const repos = db.listRepos();
console.log(`Found ${repos.length} repositories`);
Returns
RepoRecord[]
Array of all repository records, ordered by ID ascending

listRepoSyncState()

Get lightweight sync state for all repositories.
const states = db.listRepoSyncState();
for (const state of states) {
  console.log(`${state.fullName}: ${state.checksum}`);
}
Returns
RepoSyncState[]
Array of sync state records (id, fullName, checksum, etags, updatedAt)
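A typical use of the sync state is to decide which repositories need re-fetching. The sketch below assumes a checksum comparison against a remote listing; `planSync`, `SyncStateSubset`, and `RemoteRepo` are hypothetical names, and only a subset of the sync state fields is modeled.

```typescript
// Subset of the sync state fields listed above; the remote shape is an assumption.
type SyncStateSubset = { id: number; fullName: string; checksum: string | null };
type RemoteRepo = { id: number; checksum: string };

// Compare local checksums with a remote listing and report what changed.
function planSync(
  local: SyncStateSubset[],
  remote: RemoteRepo[]
): { toUpsert: number[]; toDelete: number[] } {
  const localById = new Map<number, string | null>();
  for (const state of local) {
    localById.set(state.id, state.checksum);
  }

  const remoteIds = new Set<number>();
  const toUpsert: number[] = [];
  for (const repo of remote) {
    remoteIds.add(repo.id);
    // New repos and repos whose checksum differs both need an upsert.
    if (!localById.has(repo.id) || localById.get(repo.id) !== repo.checksum) {
      toUpsert.push(repo.id);
    }
  }

  // Anything stored locally but missing remotely is a deletion candidate.
  const toDelete = local.filter((s) => !remoteIds.has(s.id)).map((s) => s.id);
  return { toUpsert, toDelete };
}
```

The `toDelete` list could then be passed to deleteReposByIds(), and the `toUpsert` IDs fetched and passed to upsertRepos().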

getRepoCount()

Count total repositories in the database.
const count = db.getRepoCount();
console.log(`Total repos: ${count}`);
Returns
number
Total number of repositories

deleteReposByIds()

Delete repositories along with their chunks and embeddings.
await db.deleteReposByIds([123, 456, 789]);
repoIds
number[]
required
Array of repository IDs to delete
Returns
Promise<void>
Resolves when deletion is complete

Chunk Management

upsertChunks()

Insert or update text chunks.
await db.upsertChunks([{
  id: "123:0",
  repoId: 123,
  chunkId: "123:0",
  text: "Repository: owner/repo\nDescription: semantic search...",
  source: "metadata+readme",
  createdAt: Date.now()
}]);
chunks
ChunkRecord[]
required
Array of chunk records to upsert
Returns
Promise<void>
Resolves when chunks are persisted and the vector cache is invalidated
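The example above uses the ID "123:0" for a chunk of repository 123, which suggests a `repoId:index` scheme. Assuming that scheme, and assuming simple fixed-size splitting (the real chunker is not documented here), records could be built as follows. `buildChunks` is a hypothetical helper.

```typescript
// Shape copied from the Types section of this page.
type ChunkRecord = {
  id: string;
  repoId: number;
  chunkId: string;
  text: string;
  source: string;
  createdAt: number;
};

// Hypothetical helper: split text into fixed-size pieces and assign
// "repoId:index" IDs. Both the splitting rule and the ID scheme are assumptions.
function buildChunks(
  repoId: number,
  text: string,
  maxChars: number,
  source: string,
  now: number
): ChunkRecord[] {
  if (maxChars <= 0) {
    throw new Error("maxChars must be positive");
  }
  const chunks: ChunkRecord[] = [];
  for (let start = 0, index = 0; start < text.length; start += maxChars, index++) {
    const id = `${repoId}:${index}`;
    chunks.push({
      id,
      repoId,
      chunkId: id,
      text: text.slice(start, start + maxChars),
      source,
      createdAt: now,
    });
  }
  return chunks;
}
```

Because the IDs are deterministic, re-chunking the same repository produces the same keys, which is what lets an upsert overwrite stale chunks in place.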

getChunkCount()

Count total chunks in the database.
const count = db.getChunkCount();
console.log(`Total chunks: ${count}`);
Returns
number
Total number of chunks

deleteChunksByRepoIds()

Delete all chunks for specified repositories.
await db.deleteChunksByRepoIds([123, 456]);
repoIds
number[]
required
Array of repository IDs
Returns
Promise<void>
Resolves when deletion is complete and caches are cleared

Embedding Management

upsertEmbeddings()

Insert or update embedding vectors.
import { float32ToBlob } from "./embeddings/vector";

const vector = new Float32Array([0.1, 0.2, 0.3 /* ... 384 values in total */]);
await db.upsertEmbeddings([{
  id: "emb-123:0",
  chunkId: "123:0",
  model: "Xenova/all-MiniLM-L6-v2",
  dimension: 384,
  vectorBlob: float32ToBlob(vector),
  createdAt: Date.now()
}]);
embeddings
EmbeddingRecord[]
required
Array of embedding records to upsert
Returns
Promise<void>
Resolves when embeddings are persisted. May trigger automatic checkpoint.
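The `vectorBlob` field is a `Uint8Array`, so the vector has to be serialized to bytes. The implementation of `float32ToBlob` is not shown on this page; the sketch below assumes it is a raw byte copy of the `Float32Array` in platform byte order. `encodeVector` and `decodeVector` are hypothetical stand-ins, not the real helpers.

```typescript
// Hypothetical stand-ins for the helpers in "./embeddings/vector"; the real ones may differ.

// Copy the vector's bytes (4 per float, platform byte order) into a new Uint8Array.
function encodeVector(vector: Float32Array): Uint8Array {
  return new Uint8Array(vector.buffer, vector.byteOffset, vector.byteLength).slice();
}

// Rebuild a Float32Array from bytes produced by encodeVector.
function decodeVector(blob: Uint8Array): Float32Array {
  if (blob.byteLength % 4 !== 0) {
    throw new Error(`Blob length ${blob.byteLength} is not a multiple of 4`);
  }
  // Copy first so the backing buffer starts at offset 0 and is 4-byte aligned,
  // whatever offset the incoming blob had within its own buffer.
  const copy = blob.slice();
  return new Float32Array(copy.buffer, copy.byteOffset, copy.byteLength / 4);
}
```

A 384-dimensional vector encodes to 1536 bytes under this scheme, so the `dimension` field can be cross-checked against `vectorBlob.byteLength / 4`.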

getEmbeddingCount()

Count total embeddings in the database.
const count = db.getEmbeddingCount();
const pending = db.getPendingEmbeddingChunkCount();
console.log(`${count} embedded, ${pending} pending`);
Returns
number
Total number of embeddings

getPendingEmbeddingChunkCount()

Count chunks without embeddings.
const pending = db.getPendingEmbeddingChunkCount();
Returns
number
Number of chunks that need embedding

getChunksToEmbed()

Get next batch of chunks to embed.
const chunks = db.getChunksToEmbed(50);
for (const chunk of chunks) {
  const vector = await embedder.embed(chunk.text);
  // ... store embedding
}
limit
number
required
Maximum number of chunks to return
Returns
ChunkRecord[]
Array of chunks without embeddings, ordered by created_at ascending
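The loop in the example stops short of storing anything. A fuller sketch of a drain loop follows, written against a minimal interface so it stays self-contained. `EmbeddingStore`, `embedPendingChunks`, and the `embed` callback are hypothetical; the `emb-` ID prefix mirrors the upsertEmbeddings() example, and the raw byte copy for `vectorBlob` is an assumption about the blob format. The loop relies on embedded chunks no longer being returned by getChunksToEmbed().

```typescript
// Shapes copied from the Types section of this page.
type ChunkRecord = {
  id: string; repoId: number; chunkId: string;
  text: string; source: string; createdAt: number;
};
type EmbeddingRecord = {
  id: string; chunkId: string; model: string;
  dimension: number; vectorBlob: Uint8Array; createdAt: number;
};

// Only the two methods this loop needs; LocalDatabase is assumed to satisfy it.
interface EmbeddingStore {
  getChunksToEmbed(limit: number): ChunkRecord[];
  upsertEmbeddings(embeddings: EmbeddingRecord[]): Promise<void>;
}

// Hypothetical driver: embed pending chunks batch by batch until none remain.
// Returns the number of embeddings written.
async function embedPendingChunks(
  store: EmbeddingStore,
  embed: (text: string) => Promise<Float32Array>,
  model: string,
  batchSize = 50
): Promise<number> {
  let total = 0;
  for (;;) {
    const chunks = store.getChunksToEmbed(batchSize);
    if (chunks.length === 0) break;

    const records: EmbeddingRecord[] = [];
    for (const chunk of chunks) {
      const vector = await embed(chunk.text);
      records.push({
        id: `emb-${chunk.chunkId}`,
        chunkId: chunk.chunkId,
        model,
        dimension: vector.length,
        // Assumed blob format: a raw copy of the float bytes.
        vectorBlob: new Uint8Array(vector.buffer, vector.byteOffset, vector.byteLength).slice(),
        createdAt: Date.now(),
      });
    }

    // One upsert per batch keeps write overhead low.
    await store.upsertEmbeddings(records);
    total += records.length;
  }
  return total;
}
```

Writing one batch at a time means an interrupted run loses at most the in-flight batch; the next call picks up the chunks that are still pending.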

listPendingChunksForEmbedding()

Get pending chunks with optional filtering.
const chunks = db.listPendingChunksForEmbedding({
  limit: 100,
  repoIds: [123, 456]
});
args.limit
number
Maximum number of chunks to return
args.repoIds
number[]
Filter to specific repository IDs
Returns
ChunkRecord[]
Filtered chunks without embeddings

clearEmbeddings()

Delete all embeddings and reset checkpoint state.
await db.clearEmbeddings();
Returns
Promise<void>
Resolves when all embeddings are deleted and caches are cleared

Checkpoint Management

getEmbeddingCheckpointStatus()

Get current checkpoint state.
const status = db.getEmbeddingCheckpointStatus();
console.log(`${status.pendingEmbeddings} embeddings pending checkpoint`);
Returns
object
Checkpoint status object
lastCheckpointAt
number | null
Timestamp of last checkpoint, or null if never checkpointed
pendingEmbeddings
number
Number of embeddings written since last checkpoint
everyEmbeddings
number
Checkpoint threshold (number of embeddings)
everyMs
number
Checkpoint threshold (milliseconds)
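The exact rule that triggers an automatic checkpoint is not spelled out here. A plausible reading, sketched below as an assumption, is that a checkpoint is due once either threshold is reached. `isCheckpointDue` is a hypothetical helper, and treating a `null` `lastCheckpointAt` as "count threshold only" is a choice made for this sketch.

```typescript
// Shape of the status object described above.
type CheckpointStatus = {
  lastCheckpointAt: number | null;
  pendingEmbeddings: number;
  everyEmbeddings: number;
  everyMs: number;
};

// Hypothetical rule: due when either the count or the time threshold is reached.
function isCheckpointDue(status: CheckpointStatus, now: number): boolean {
  // Nothing written since the last checkpoint, so nothing to persist.
  if (status.pendingEmbeddings === 0) return false;
  // Count threshold reached.
  if (status.pendingEmbeddings >= status.everyEmbeddings) return true;
  // Assumption: with no prior checkpoint there is no baseline for the timer.
  if (status.lastCheckpointAt === null) return false;
  // Time threshold reached.
  return now - status.lastCheckpointAt >= status.everyMs;
}
```

With the constructor example's policy (256 embeddings, 3000 ms), a slow trickle of writes would still be persisted within about three seconds of the previous checkpoint under this rule.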

flushPendingEmbeddingCheckpoint()

Force an immediate checkpoint.
const flushed = await db.flushPendingEmbeddingCheckpoint();
if (flushed) {
  console.log("Database checkpointed");
}
Returns
Promise<boolean>
true if checkpoint was performed, false if no pending embeddings

Metadata

upsertIndexMeta()

Store key-value metadata.
await db.upsertIndexMeta({
  key: "embedding_model",
  value: "Xenova/all-MiniLM-L6-v2",
  updatedAt: Date.now()
});
record.key
string
required
Metadata key
record.value
string
required
Metadata value
record.updatedAt
number
required
Timestamp (milliseconds since epoch)
Returns
Promise<void>
Resolves when metadata is persisted

getIndexMetaValue()

Retrieve metadata value by key.
const model = db.getIndexMetaValue("embedding_model");
if (model) {
  console.log(`Using model: ${model}`);
}
key
string
required
Metadata key to retrieve
Returns
string | null
Metadata value, or null if key not found
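One use for the stored model name is detecting a model change, since vectors from different models are not comparable. The sketch below combines getIndexMetaValue(), clearEmbeddings(), and upsertIndexMeta() behind a minimal interface. `MetaStore` and `ensureEmbeddingModel` are hypothetical, and whether the application actually handles model changes this way is an assumption.

```typescript
// Only the methods this check needs; LocalDatabase is assumed to satisfy it.
interface MetaStore {
  getIndexMetaValue(key: string): string | null;
  upsertIndexMeta(record: { key: string; value: string; updatedAt: number }): Promise<void>;
  clearEmbeddings(): Promise<void>;
}

// Hypothetical guard: returns true if the stored model name was changed.
async function ensureEmbeddingModel(store: MetaStore, model: string): Promise<boolean> {
  const stored = store.getIndexMetaValue("embedding_model");
  if (stored === model) return false;

  // A different model was used before, so its vectors must be discarded.
  if (stored !== null) {
    await store.clearEmbeddings();
  }
  await store.upsertIndexMeta({
    key: "embedding_model",
    value: model,
    updatedAt: Date.now(),
  });
  return true;
}
```

After a model switch, every chunk becomes pending again, so a subsequent embedding pass rebuilds the vectors with the new model.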

Storage

storageMode (property)

Get current storage backend.
const mode = db.storageMode;
console.log(`Storage: ${mode}`); // "opfs", "local-storage", or "memory"
Returns
StorageMode
Current storage mode: "opfs" | "local-storage" | "memory"

Types

RepoRecord

type RepoRecord = {
  id: number;
  fullName: string;
  name: string;
  description: string | null;
  topics: string[];
  language: string | null;
  htmlUrl: string;
  stars: number;
  forks: number;
  updatedAt: string;
  readmeUrl: string | null;
  readmeText: string | null;
  readmeEtag?: string | null;
  readmeLastModified?: string | null;
  checksum: string | null;
  lastSyncedAt: number;
};

ChunkRecord

type ChunkRecord = {
  id: string;
  repoId: number;
  chunkId: string;
  text: string;
  source: string;
  createdAt: number;
};

EmbeddingRecord

type EmbeddingRecord = {
  id: string;
  chunkId: string;
  model: string;
  dimension: number;
  vectorBlob: Uint8Array;
  createdAt: number;
};

StorageMode

type StorageMode = "opfs" | "local-storage" | "memory";

EmbeddingCheckpointPolicy

type EmbeddingCheckpointPolicy = {
  everyEmbeddings: number;  // Checkpoint after N embeddings
  everyMs: number;           // Checkpoint after N milliseconds
};