Overview
The LocalDatabase class is a SQL.js-based client that manages the complete semantic search index: repository metadata, text chunks, embedding vectors, and vector similarity search.
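The overview mentions vector similarity search, but this page does not document the search method itself. The sketch below is an illustration only: brute-force cosine similarity with top-k selection over in-memory vectors. The `StoredVector` type and the `cosineSimilarity` and `topK` functions are hypothetical and are not part of LocalDatabase.

```typescript
// Hypothetical shape of a decoded embedding; not LocalDatabase's actual type.
type StoredVector = { chunkId: string; vector: Float32Array };

// Cosine similarity of two equal-length vectors; returns 0 if either has zero norm.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force top-k: score every stored vector, sort by descending score, keep k.
function topK(query: Float32Array, items: StoredVector[], k: number): { chunkId: string; score: number }[] {
  return items
    .map((item) => ({ chunkId: item.chunkId, score: cosineSimilarity(query, item.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

A full scan costs O(n·d) per query for n vectors of dimension d. That is usually acceptable for a few thousand chunks in the browser, and it explains why a cache of decoded vectors is worth keeping.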
Initialization
getDb()
Get or create the singleton database instance.
import { getDb } from "./db/client";
const db = await getDb();
Singleton LocalDatabase instance
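getDb() is described as returning a singleton, and callers await it. A common way to build such an accessor is to cache the initialization promise, so concurrent callers share one in-flight setup. The sketch below assumes nothing about getDb()'s internals. `FakeDatabase`, `initDatabase`, and `getDbSketch` are hypothetical names used only to show the pattern.

```typescript
// Hypothetical stand-in for LocalDatabase; only used to show the pattern.
class FakeDatabase {
  readonly instanceId: number;
  constructor(instanceId: number) {
    this.instanceId = instanceId;
  }
}

let dbPromise: Promise<FakeDatabase> | null = null;
let initCount = 0; // how many times initialization actually ran

// Hypothetical async setup (real code would load sql.js and open storage).
async function initDatabase(): Promise<FakeDatabase> {
  initCount += 1;
  await Promise.resolve(); // simulate asynchronous work
  return new FakeDatabase(initCount);
}

// Memoize the promise rather than the resolved value, so callers that arrive
// while initialization is still running share the same in-flight promise.
function getDbSketch(): Promise<FakeDatabase> {
  if (!dbPromise) {
    dbPromise = initDatabase().catch((err: unknown) => {
      dbPromise = null; // allow a retry after a failed initialization
      throw err;
    });
  }
  return dbPromise;
}
```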
Constructor
Create a new LocalDatabase instance. Application code normally calls getDb() instead of using the constructor directly.
import { LocalDatabase } from "./db/client";
import initSqlJs from "sql.js";

// wasmUrl: the URL where your app serves the sql.js WebAssembly binary (defined elsewhere)
const SQL = await initSqlJs({ locateFile: () => wasmUrl });
const rawDb = new SQL.Database();
const db = new LocalDatabase({
  sql: SQL,
  db: rawDb,
  storageMode: "opfs",
  embeddingCheckpointPolicy: {
    everyEmbeddings: 256,
    everyMs: 3000
  }
});
args.storageMode (StorageMode): storage backend, one of "opfs" | "local-storage" | "memory"
args.embeddingCheckpointPolicy (EmbeddingCheckpointPolicy, optional): checkpoint configuration for embedding writes
Repository Management
upsertRepos()
Insert or update repository records.
await db.upsertRepos([{
  id: 123456,
  fullName: "owner/repo",
  name: "repo",
  description: "A semantic search library",
  topics: ["vector-search", "embeddings"],
  language: "TypeScript",
  htmlUrl: "https://github.com/owner/repo",
  stars: 1500,
  forks: 42,
  updatedAt: "2026-03-01T00:00:00Z",
  readmeUrl: "https://github.com/owner/repo/blob/main/README.md",
  readmeText: "# Project\n\nDescription...",
  readmeEtag: "abc123",
  readmeLastModified: "2026-02-28T12:00:00Z",
  checksum: "sha256-...",
  lastSyncedAt: Date.now()
}]);
Array of repository records to upsert
Resolves when all repos are persisted
listRepos()
Retrieve all stored repositories.
const repos = db.listRepos();
console.log(`Found ${repos.length} repositories`);
Array of all repository records, ordered by ID ascending
listRepoSyncState()
Get lightweight sync state for all repositories.
const states = db.listRepoSyncState();
for (const state of states) {
  console.log(`${state.fullName}: ${state.checksum}`);
}
Array of sync state records (id, fullName, checksum, etags, updatedAt)
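Sync state is lightweight because it is meant for change detection. The sketch below compares stored state against freshly fetched repositories by id and checksum. It is one plausible way to use these records, assuming the checksums on both sides are computed the same way. The `RemoteRepo` type and the `planSync` function are hypothetical. The resulting `deletedIds` could be passed to deleteReposByIds().

```typescript
// Assumed subset of the fields listRepoSyncState() returns.
type SyncState = { id: number; fullName: string; checksum: string | null };

// Hypothetical shape of freshly fetched remote repository data.
type RemoteRepo = { id: number; checksum: string };

type SyncPlan = { changedIds: number[]; deletedIds: number[] };

// Compare local sync state with remote repos by id and checksum.
function planSync(local: SyncState[], remote: RemoteRepo[]): SyncPlan {
  const localById = new Map(local.map((s) => [s.id, s] as const));
  const remoteIds = new Set(remote.map((r) => r.id));

  // Repos not stored yet, or whose checksum differs, need to be upserted again.
  const changedIds = remote
    .filter((r) => localById.get(r.id)?.checksum !== r.checksum)
    .map((r) => r.id);

  // Stored repos that no longer appear remotely are candidates for deletion.
  const deletedIds = local.filter((s) => !remoteIds.has(s.id)).map((s) => s.id);

  return { changedIds, deletedIds };
}
```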
getRepoCount()
Count total repositories in the database.
const count = db.getRepoCount();
console.log(`Total repos: ${count}`);
Total number of repositories
deleteReposByIds()
Delete repositories and cascade to chunks/embeddings.
await db.deleteReposByIds([123, 456, 789]);
Array of repository IDs to delete
Resolves when deletion is complete
Chunk Management
upsertChunks()
Insert or update text chunks.
await db.upsertChunks([{
  id: "123:0",
  repoId: 123,
  chunkId: "123:0",
  text: "Repository: owner/repo\nDescription: semantic search...",
  source: "metadata+readme",
  createdAt: Date.now()
}]);
Array of chunk records to upsert
Resolves when the chunks are persisted and the vector cache is invalidated
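The upsertChunks example suggests a chunk layout: ids of the form `<repoId>:<index>`, a "Repository: … / Description: …" header, and `source: "metadata+readme"`. The sketch below builds ChunkRecords in that shape by splitting README text into fixed-size character slices. The splitting strategy, the `RepoInput` type, and `buildChunks` are assumptions for illustration, not the library's chunker.

```typescript
// Mirrors the ChunkRecord type documented under Types.
type ChunkRecord = { id: string; repoId: number; chunkId: string; text: string; source: string; createdAt: number };

// Hypothetical minimal repo input (only the fields used here).
type RepoInput = { id: number; fullName: string; description: string | null; readmeText: string | null };

// Prefix every chunk with the repo metadata header, then append a fixed-size README slice.
// maxChars is a soft limit: the header is kept whole, so a very long header can exceed it.
function buildChunks(repo: RepoInput, maxChars = 1000, now = Date.now()): ChunkRecord[] {
  const header = `Repository: ${repo.fullName}\nDescription: ${repo.description ?? ""}`;
  const readme = repo.readmeText ?? "";
  const bodySize = Math.max(1, maxChars - header.length - 1); // -1 for the joining newline

  const slices: string[] = [];
  for (let i = 0; i < readme.length; i += bodySize) slices.push(readme.slice(i, i + bodySize));
  if (slices.length === 0) slices.push(""); // always emit at least the metadata chunk

  return slices.map((slice, index) => {
    const id = `${repo.id}:${index}`; // same "<repoId>:<index>" pattern as the example above
    return {
      id,
      repoId: repo.id,
      chunkId: id,
      text: slice ? `${header}\n${slice}` : header,
      source: "metadata+readme",
      createdAt: now,
    };
  });
}
```

Repeating the header in every chunk costs some space. The benefit is that each chunk's embedding carries the repository's identity, so a README fragment can still match queries about the project.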
getChunkCount()
Count total chunks in the database.
const count = db.getChunkCount();
console.log(`Total chunks: ${count}`);
Total number of chunks
deleteChunksByRepoIds()
Delete all chunks for specified repositories.
await db.deleteChunksByRepoIds([123, 456]);
Array of repository IDs whose chunks should be deleted
Resolves when deletion is complete and caches are cleared
Embedding Management
upsertEmbeddings()
Insert or update embedding vectors.
import { float32ToBlob } from "./embeddings/vector";

// Placeholder values; in practice this is the model's 384-dimensional output
const vector = new Float32Array(384);
await db.upsertEmbeddings([{
  id: "emb-123:0",
  chunkId: "123:0",
  model: "Xenova/all-MiniLM-L6-v2",
  dimension: 384,
  vectorBlob: float32ToBlob(vector),
  createdAt: Date.now()
}]);
embeddings (EmbeddingRecord[], required): Array of embedding records to upsert
Resolves when the embeddings are persisted. May trigger an automatic checkpoint, depending on the configured EmbeddingCheckpointPolicy.
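EmbeddingRecord stores vectors as a `Uint8Array` produced by `float32ToBlob`. The sketch below shows one way such a conversion and its inverse could work. `float32ToBlobSketch` and `blobToFloat32Sketch` are hypothetical re-implementations. The real helpers in ./embeddings/vector may differ, for example in how they handle byte order, which here is simply the platform's (little-endian on common hardware).

```typescript
// Copy exactly this vector's bytes. A Float32Array may be a window onto a larger
// buffer, so respect byteOffset/byteLength rather than copying the whole buffer.
function float32ToBlobSketch(vector: Float32Array): Uint8Array {
  return new Uint8Array(vector.buffer, vector.byteOffset, vector.byteLength).slice();
}

// Reverse the conversion. Copy first: a Float32Array view needs a byte offset
// that is a multiple of 4, which an arbitrary Uint8Array slice may not have.
function blobToFloat32Sketch(blob: Uint8Array): Float32Array {
  if (blob.byteLength % 4 !== 0) {
    throw new Error(`blob length ${blob.byteLength} is not a multiple of 4`);
  }
  const copy = new Uint8Array(blob.byteLength);
  copy.set(blob);
  return new Float32Array(copy.buffer);
}
```

With this layout, `dimension` equals `vector.length`, so a 384-dimensional vector occupies 1536 bytes.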
getEmbeddingCount()
Count total embeddings in the database.
const count = db.getEmbeddingCount();
const pending = db.getPendingEmbeddingChunkCount();
console.log(`${count} embedded, ${pending} pending`);
Total number of embeddings
getPendingEmbeddingChunkCount()
Count chunks without embeddings.
const pending = db.getPendingEmbeddingChunkCount();
Number of chunks that need embedding
getChunksToEmbed()
Get next batch of chunks to embed.
const chunks = db.getChunksToEmbed(50);
for (const chunk of chunks) {
  // embedder: your embedding model wrapper (not part of LocalDatabase)
  const vector = await embedder.embed(chunk.text);
  // ... store the vector with db.upsertEmbeddings()
}
Maximum number of chunks to return
Array of chunks without embeddings, ordered by created_at ascending
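Because getChunksToEmbed() returns only chunks that still lack embeddings, a driver can keep requesting batches until the result is empty. The sketch below assumes the returned chunks expose `chunkId` and `text`. The batch embedder `EmbedFn`, the `EmbeddingStore` interface, and `embedAllPending` are hypothetical. The loop terminates only because upserted embeddings remove their chunks from later batches.

```typescript
// Assumed shape of a pending chunk, plus the EmbeddingRecord shape from Types.
type PendingChunk = { chunkId: string; text: string };
type EmbeddingRow = { id: string; chunkId: string; model: string; dimension: number; vectorBlob: Uint8Array; createdAt: number };

// Structural subset of LocalDatabase that this loop needs.
interface EmbeddingStore {
  getChunksToEmbed(limit: number): PendingChunk[];
  upsertEmbeddings(rows: EmbeddingRow[]): Promise<void>;
}

// Hypothetical batch embedder: one vector per input text, in the same order.
type EmbedFn = (texts: string[]) => Promise<Float32Array[]>;

// Embed pending chunks batch by batch; returns how many chunks were embedded.
async function embedAllPending(db: EmbeddingStore, embed: EmbedFn, model: string, batchSize = 50): Promise<number> {
  let total = 0;
  for (;;) {
    const batch = db.getChunksToEmbed(batchSize);
    if (batch.length === 0) break; // nothing left without an embedding

    const vectors = await embed(batch.map((c) => c.text));
    if (vectors.length !== batch.length) throw new Error("embedder returned wrong number of vectors");

    const now = Date.now();
    await db.upsertEmbeddings(
      batch.map((chunk, i) => ({
        id: `emb-${chunk.chunkId}`, // mirrors the "emb-123:0" id in the upsertEmbeddings example
        chunkId: chunk.chunkId,
        model,
        dimension: vectors[i].length,
        vectorBlob: new Uint8Array(vectors[i].buffer, vectors[i].byteOffset, vectors[i].byteLength).slice(),
        createdAt: now,
      })),
    );
    total += batch.length;
  }
  return total;
}
```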
listPendingChunksForEmbedding()
Get pending chunks with optional filtering.
const chunks = db.listPendingChunksForEmbedding({
  limit: 100,
  repoIds: [123, 456]
});
limit: Maximum number of chunks to return
repoIds: Restrict results to chunks from these repository IDs
Filtered chunks without embeddings
clearEmbeddings()
Delete all embeddings and reset checkpoint state.
await db.clearEmbeddings();
Resolves when all embeddings are deleted and caches cleared
Checkpoint Management
getEmbeddingCheckpointStatus()
Get current checkpoint state.
const status = db.getEmbeddingCheckpointStatus();
console.log(`${status.pendingEmbeddings} embeddings pending checkpoint`);
Returns an object describing the checkpoint state:
Timestamp of the last checkpoint, or null if never checkpointed
pendingEmbeddings: Number of embeddings written since the last checkpoint
Checkpoint threshold (number of embeddings)
Checkpoint threshold (milliseconds)
flushPendingEmbeddingCheckpoint()
Force an immediate checkpoint.
const flushed = await db.flushPendingEmbeddingCheckpoint();
if (flushed) {
  console.log("Database checkpointed");
}
Resolves to true if a checkpoint was performed, or false if there were no pending embeddings
Index Metadata
upsertIndexMeta()
Store key-value metadata.
await db.upsertIndexMeta({
  key: "embedding_model",
  value: "Xenova/all-MiniLM-L6-v2",
  updatedAt: Date.now()
});
updatedAt: Timestamp (milliseconds since epoch)
Resolves when metadata is persisted
getIndexMetaValue()
Retrieve a metadata value by key.
const model = db.getIndexMetaValue("embedding_model");
if (model) {
  console.log(`Using model: ${model}`);
}
Metadata value, or null if key not found
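The examples store the embedding model name under the key "embedding_model". A natural use is detecting a model change, because vectors from different models are not comparable. The sketch below combines getIndexMetaValue(), clearEmbeddings(), and upsertIndexMeta() for that purpose. The `IndexMetaStore` interface and `ensureEmbeddingModel` are hypothetical. The `string | null` value type is an assumption based on the return description above.

```typescript
// Structural subset of LocalDatabase used here (value type assumed to be string | null).
interface IndexMetaStore {
  getIndexMetaValue(key: string): string | null;
  upsertIndexMeta(entry: { key: string; value: string; updatedAt: number }): Promise<void>;
  clearEmbeddings(): Promise<void>;
}

const MODEL_KEY = "embedding_model";

// If the recorded model differs from the one in use (or none is recorded),
// drop existing embeddings and record the new model. Returns true if the index was reset.
async function ensureEmbeddingModel(db: IndexMetaStore, model: string): Promise<boolean> {
  if (db.getIndexMetaValue(MODEL_KEY) === model) return false;
  await db.clearEmbeddings();
  await db.upsertIndexMeta({ key: MODEL_KEY, value: model, updatedAt: Date.now() });
  return true;
}
```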
Storage
storageMode (property)
Get current storage backend.
const mode = db.storageMode;
console.log(`Storage: ${mode}`); // "opfs", "local-storage", or "memory"
Current storage mode: "opfs" | "local-storage" | "memory"
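The three storage modes suggest a fallback order, from persistent OPFS to localStorage to in-memory. The sketch below feature-detects in that order. It is not necessarily how LocalDatabase chooses its mode. `detectStorageMode` and `EnvLike` are hypothetical. The OPFS probe checks for the standard `navigator.storage.getDirectory()` entry point, which does not by itself guarantee that sql.js persistence works in every context. The environment is passed in so it can be stubbed.

```typescript
type StorageMode = "opfs" | "local-storage" | "memory";

// Minimal view of the globals being probed; defaults to globalThis.
type EnvLike = {
  navigator?: { storage?: { getDirectory?: unknown } };
  localStorage?: { setItem?: unknown };
};

function detectStorageMode(env: EnvLike = globalThis as unknown as EnvLike): StorageMode {
  // OPFS is exposed through StorageManager.getDirectory().
  if (typeof env.navigator?.storage?.getDirectory === "function") return "opfs";
  try {
    // Accessing localStorage can throw, e.g. when blocked by privacy settings.
    if (typeof env.localStorage?.setItem === "function") return "local-storage";
  } catch {
    // fall through to in-memory storage
  }
  return "memory";
}
```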
Types
RepoRecord
type RepoRecord = {
  id: number;
  fullName: string;
  name: string;
  description: string | null;
  topics: string[];
  language: string | null;
  htmlUrl: string;
  stars: number;
  forks: number;
  updatedAt: string;
  readmeUrl: string | null;
  readmeText: string | null;
  readmeEtag?: string | null;
  readmeLastModified?: string | null;
  checksum: string | null;
  lastSyncedAt: number;
};
ChunkRecord
type ChunkRecord = {
  id: string;
  repoId: number;
  chunkId: string;
  text: string;
  source: string;
  createdAt: number;
};
EmbeddingRecord
type EmbeddingRecord = {
  id: string;
  chunkId: string;
  model: string;
  dimension: number;
  vectorBlob: Uint8Array;
  createdAt: number;
};
StorageMode
type StorageMode = "opfs" | "local-storage" | "memory";
EmbeddingCheckpointPolicy
type EmbeddingCheckpointPolicy = {
  everyEmbeddings: number; // Checkpoint after N embeddings
  everyMs: number; // Checkpoint after N milliseconds
};
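EmbeddingCheckpointPolicy has two thresholds, one by count and one by time. The sketch below assumes a checkpoint becomes due when either threshold is reached and at least one embedding is pending. That rule is an assumption, and the library's actual rule may differ. `isCheckpointDue` and its `msSinceFirstPending` parameter are hypothetical.

```typescript
type EmbeddingCheckpointPolicy = { everyEmbeddings: number; everyMs: number };

// Assumed rule: checkpoint once EITHER threshold is reached, and only if
// there is something pending to flush.
function isCheckpointDue(
  pendingEmbeddings: number,
  msSinceFirstPending: number, // time since the oldest un-checkpointed write (hypothetical input)
  policy: EmbeddingCheckpointPolicy,
): boolean {
  if (pendingEmbeddings <= 0) return false;
  return pendingEmbeddings >= policy.everyEmbeddings || msSinceFirstPending >= policy.everyMs;
}
```

With the constructor example's policy (`everyEmbeddings: 256`, `everyMs: 3000`), a burst of 256 writes checkpoints immediately. A slow trickle checkpoints at most about 3 seconds after its first pending write, which bounds how much work a crash could lose.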