GitStarRecall generates embeddings entirely in your browser using WebGPU or WASM backends. No cloud API calls, no usage costs, and complete privacy.

Architecture

Embeddings are generated using a worker pool architecture:

1. Worker Pool: multiple Web Workers process embeddings in parallel.
2. Backend Selection: WebGPU (fast) or WASM (compatible) is chosen automatically based on browser support.
3. Batch Processing: chunks are processed in batches of 8 for efficiency.
4. Checkpointing: progress is saved periodically to prevent data loss.
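The four stages above compose into a single indexing loop. The sketch below shows the shape of that loop; `embedBatch` and `saveCheckpoint` are hypothetical stand-ins for the real worker-pool and persistence APIs:

```typescript
// Minimal sketch of the indexing loop: batch, embed, checkpoint.
type Chunk = { id: string; text: string };

const BATCH_SIZE = 8;
const CHECKPOINT_EVERY = 256;

async function indexChunks(
  chunks: Chunk[],
  embedBatch: (texts: string[]) => Promise<Float32Array[]>,
  saveCheckpoint: (done: number) => Promise<void>
): Promise<number> {
  let processed = 0;
  let sinceCheckpoint = 0;

  for (let i = 0; i < chunks.length; i += BATCH_SIZE) {
    const batch = chunks.slice(i, i + BATCH_SIZE);
    await embedBatch(batch.map((c) => c.text));
    processed += batch.length;
    sinceCheckpoint += batch.length;

    // Checkpoint after every CHECKPOINT_EVERY embeddings.
    if (sinceCheckpoint >= CHECKPOINT_EVERY) {
      await saveCheckpoint(processed);
      sinceCheckpoint = 0;
    }
  }
  // Final checkpoint for any remainder.
  if (sinceCheckpoint > 0) {
    await saveCheckpoint(processed);
  }
  return processed;
}
```

The real pipeline also checkpoints on a timer (see Checkpointing below), so long-running batches still save regularly.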

Backend Selection

GitStarRecall automatically selects the best embedding backend:

WebGPU (Preferred)

WebGPU provides 3-5x faster embedding generation compared to WASM.
WebGPU Probe
export async function probeWebGpuSupport(
  navigatorLike: unknown
): Promise<WebGpuProbeResult> {
  const gpu = (navigatorLike as { gpu?: GpuLike })?.gpu;
  if (!gpu || typeof gpu.requestAdapter !== "function") {
    return {
      ok: false,
      reason: "navigator.gpu unavailable"
    };
  }

  try {
    const adapter = await gpu.requestAdapter();
    if (!adapter) {
      return {
        ok: false,
        reason: "no WebGPU adapter available"
      };
    }
    await adapter.requestDevice();
    return { ok: true };
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    return {
      ok: false,
      reason: `webgpu probe error: ${message}`
    };
  }
}
Fallback Logic:
Backend Resolution
export function resolvePreferredBackend(
  preferredBackend: EmbeddingBackendPreference,
  probeResult: WebGpuProbeResult
): { backend: EmbeddingBackendPreference; fallbackReason: string | null } {
  if (preferredBackend === "wasm") {
    return { backend: "wasm", fallbackReason: null };
  }

  if (probeResult.ok) {
    return { backend: "webgpu", fallbackReason: null };
  }

  return {
    backend: "wasm",
    fallbackReason: probeResult.reason
  };
}

WASM (Fallback)

Used when WebGPU is unavailable:
  • Compatibility: Works in all modern browsers
  • Performance: Slower than WebGPU but still runs locally
  • Memory: Lower memory usage than WebGPU
Safari on iOS uses WASM due to limited WebGPU support.

Worker Pool

Embeddings are generated using a configurable worker pool:
Worker Pool Configuration
const DEFAULT_POOL_SIZE = 2;
const DEFAULT_MAX_POOL_SIZE = 2;
const DEFAULT_WORKER_BATCH_SIZE = 8;

class EmbeddingWorkerPool {
  private configuredPoolSize: number;
  private activePoolSize: number;
  private workerBatchSize: number;
  private embedders: EmbedderLike[] = [];

  constructor(options: EmbeddingWorkerPoolOptions = {}) {
    this.configuredPoolSize = options.poolSize ?? DEFAULT_POOL_SIZE;
    this.activePoolSize = Math.min(
      this.configuredPoolSize,
      options.maxPoolSize ?? DEFAULT_MAX_POOL_SIZE
    );
    this.workerBatchSize = options.workerBatchSize ?? DEFAULT_WORKER_BATCH_SIZE;
  }
}

Concurrency Management

The pool adjusts concurrency based on runtime conditions:
WebGPU backend runs with single worker concurrency:
if (status.selectedBackend === "webgpu") {
  this.setConcurrency(1);
  return;
}
This prevents GPU contention and ensures optimal performance.
WASM backend can use multiple workers (default: 2):
  • Parallelizes across CPU cores
  • Each worker processes batches independently
  • Dynamically adjusts on memory pressure
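The multi-worker case can be sketched as a work-stealing loop: each worker repeatedly claims the next unclaimed batch until none remain. This is a simplified illustration with an assumed `EmbedderLike` interface; the real pool adds retries and downshifting:

```typescript
interface EmbedderLike {
  embedBatch(texts: string[]): Promise<Float32Array[]>;
}

async function embedAcrossPool(
  embedders: EmbedderLike[],
  batches: string[][]
): Promise<Float32Array[][]> {
  const results: Float32Array[][] = new Array(batches.length);
  let next = 0; // shared cursor; safe because JS workers here run on one event loop

  await Promise.all(
    embedders.map(async (embedder) => {
      // Each embedder claims the next batch index, embeds it, repeats.
      while (next < batches.length) {
        const index = next;
        next += 1;
        results[index] = await embedder.embedBatch(batches[index]);
      }
    })
  );
  return results;
}
```

Because batches are claimed dynamically rather than pre-partitioned, a slow worker naturally ends up with fewer batches.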
Pool reduces to single worker on errors:
if (isMemoryPressureError(message) || 
    this.errorCount >= this.downshiftErrorThreshold) {
  this.downshiftToSingle(message);
}
Prevents cascade failures during memory constraints.

Embedding Generation

The Embedder class manages individual worker instances:
Embedder Class
export class Embedder {
  private worker: Worker;
  private pending = new Map<string, PendingJob>();
  private preferredBackend: EmbeddingBackendPreference;
  private selectedBackend: EmbeddingBackendPreference | null = null;

  async embedBatch(texts: string[]): Promise<BatchEmbeddingResultItem[]> {
    if (texts.length === 0) {
      return [];
    }

    const id = crypto.randomUUID();
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.worker.postMessage({ 
        id, 
        texts, 
        preferredBackend: this.preferredBackend 
      });
    });
  }

  async embed(text: string): Promise<Float32Array> {
    const results = await this.embedBatch([text]);
    const first = results[0];
    if (!first || first.error || !first.embedding) {
      throw new Error(first?.error ?? "Embedding worker returned empty vector");
    }
    return first.embedding;
  }
}

Vector Format

Embeddings are 384-dimensional L2-normalized vectors:
Vector Operations
export function l2Normalize(vec: Float32Array): Float32Array {
  let sumSquares = 0;
  for (let i = 0; i < vec.length; i += 1) {
    sumSquares += vec[i] * vec[i];
  }

  const norm = Math.sqrt(sumSquares);
  if (norm === 0) {
    return vec.slice();
  }

  const out = new Float32Array(vec.length);
  for (let i = 0; i < vec.length; i += 1) {
    out[i] = vec[i] / norm;
  }
  return out;
}

export function float32ToBlob(vec: Float32Array): Uint8Array {
  const normalized = l2Normalize(vec);
  const bytes = new Uint8Array(normalized.length * Float32Array.BYTES_PER_ELEMENT);
  new Float32Array(bytes.buffer).set(normalized);
  return bytes;
}
Vectors are stored as BLOBs in SQLite for efficient storage and retrieval.
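Reading a vector back is the inverse view over the same bytes, and because stored vectors are L2-normalized, cosine similarity reduces to a plain dot product. The helper names below (`blobToFloat32`, `dot`) are illustrative, not necessarily the codebase's own:

```typescript
// Decode a stored BLOB back into a Float32Array view.
export function blobToFloat32(bytes: Uint8Array): Float32Array {
  return new Float32Array(
    bytes.buffer,
    bytes.byteOffset,
    bytes.byteLength / Float32Array.BYTES_PER_ELEMENT
  );
}

// Dot product; equals cosine similarity for unit-length vectors.
export function dot(a: Float32Array, b: Float32Array): number {
  let sum = 0;
  for (let i = 0; i < a.length; i += 1) {
    sum += a[i] * b[i];
  }
  return sum;
}
```

A normalized vector dotted with itself is 1 (up to float32 rounding), which is a cheap sanity check after decoding.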

Checkpointing

Progress is saved periodically to prevent data loss:
Checkpoint Policy
const DEFAULT_EMBEDDING_CHECKPOINT_EVERY_EMBEDDINGS = 256;
const DEFAULT_EMBEDDING_CHECKPOINT_EVERY_MS = 3000;

type EmbeddingCheckpointPolicy = {
  everyEmbeddings: number;  // Checkpoint every N embeddings
  everyMs: number;           // Checkpoint every N milliseconds
};

private shouldCheckpointEmbeddings(now: number): boolean {
  if (this.pendingEmbeddingsSinceCheckpoint <= 0) {
    return false;
  }

  if (this.pendingEmbeddingsSinceCheckpoint >= this.embeddingCheckpointPolicy.everyEmbeddings) {
    return true;
  }

  const checkpointBaseline = this.lastEmbeddingCheckpointAt ?? this.pendingEmbeddingsStartedAt;
  return now - checkpointBaseline >= this.embeddingCheckpointPolicy.everyMs;
}

Count-Based

Checkpoint every 256 embeddings by default. Ensures regular saves during bulk processing.

Time-Based

Checkpoint every 3 seconds by default. Prevents long gaps without saves.
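The two policies combine as "whichever fires first". A standalone version of the decision (the real method lives on the pipeline class; names mirror the snippet above):

```typescript
type EmbeddingCheckpointPolicy = {
  everyEmbeddings: number;
  everyMs: number;
};

function shouldCheckpoint(
  policy: EmbeddingCheckpointPolicy,
  pendingSinceCheckpoint: number,
  lastCheckpointAt: number | null,
  pendingStartedAt: number,
  now: number
): boolean {
  // Nothing pending → nothing to save.
  if (pendingSinceCheckpoint <= 0) {
    return false;
  }
  // Count-based trigger.
  if (pendingSinceCheckpoint >= policy.everyEmbeddings) {
    return true;
  }
  // Time-based trigger, measured from the last checkpoint (or run start).
  const baseline = lastCheckpointAt ?? pendingStartedAt;
  return now - baseline >= policy.everyMs;
}
```

With the defaults (256 embeddings / 3000 ms), a fast run checkpoints on count and a slow run checkpoints on time, so neither starves.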

Performance Metrics

Real-time metrics track embedding performance:
Embedding Metrics
type EmbeddingRunMetrics = {
  backendIdentity: string;              // "webgpu" or "wasm"
  configuredPoolSize: number;           // Configured workers
  activePoolSize: number;               // Active workers
  poolDownshifted: boolean;             // Reduced to 1 worker?
  batchCount: number;                   // Batches processed
  embeddingsProcessed: number;          // Total embeddings
  embeddingsPerSecond: number;          // Throughput
  avgBatchEmbedLatencyMs: number;       // Average batch time
  lastBatchEmbedLatencyMs: number;      // Last batch time
  avgDbCheckpointMs: number;            // Average save time
  queueDepth: number;                   // Pending chunks
  peakQueueDepth: number;               // Max queue depth
};
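The throughput and latency fields are simple derivations from a running tally. A simplified sketch (reduced relative to the full `EmbeddingRunMetrics` shape above):

```typescript
class MetricsTally {
  private embeddings = 0;
  private batchLatencies: number[] = [];

  constructor(private startedAt: number) {}

  recordBatch(count: number, latencyMs: number): void {
    this.embeddings += count;
    this.batchLatencies.push(latencyMs);
  }

  snapshot(now: number): {
    embeddingsPerSecond: number;
    avgBatchEmbedLatencyMs: number;
  } {
    // Guard against a zero-length interval.
    const elapsedSec = Math.max((now - this.startedAt) / 1000, 1e-9);
    const totalLatency = this.batchLatencies.reduce((a, b) => a + b, 0);
    return {
      embeddingsPerSecond: this.embeddings / elapsedSec,
      avgBatchEmbedLatencyMs:
        this.batchLatencies.length === 0
          ? 0
          : totalLatency / this.batchLatencies.length
    };
  }
}
```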

Expected Performance

Backend   Workers   Chunks/Second   1000 Chunks
WebGPU    1         40-60           ~20s
WASM      2         15-25           ~50s
WASM      1         8-12            ~100s
Performance varies significantly based on device hardware and browser.

Storage

Embeddings are stored in SQLite with optimized schema:
Embeddings Table
CREATE TABLE embeddings (
  id TEXT PRIMARY KEY,
  chunk_id TEXT NOT NULL,
  model TEXT NOT NULL,
  dimension INTEGER NOT NULL,
  vector_blob BLOB NOT NULL,
  created_at INTEGER NOT NULL,
  FOREIGN KEY (chunk_id) REFERENCES chunks(id) ON DELETE CASCADE
);

CREATE INDEX idx_embeddings_chunk_id ON embeddings(chunk_id);

Storage Backends

Origin Private File System
  • Fast file-based storage
  • Supports large databases
  • Available in modern browsers
  • Better performance than localStorage

Error Handling

Robust error handling prevents cascade failures:
function isMemoryPressureError(errorMessage: string | null): boolean {
  if (!errorMessage) return false;
  const normalized = errorMessage.toLowerCase();
  return (
    normalized.includes("out of memory") ||
    normalized.includes("oom") ||
    normalized.includes("allocation failed")
  );
}
Response: Downshift to single worker, reduce batch size
Model download failures are reported with actionable guidance:
  • Check internet connection
  • Verify CSP allows Hugging Face CDN
  • Retry with exponential backoff
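A retry-with-backoff wrapper for downloads might look like the sketch below. This is an illustration, not the project's actual retry code; the injectable `sleep` parameter just makes the delay schedule testable:

```typescript
async function retryWithBackoff<T>(
  downloadFn: () => Promise<T>,
  maxAttempts = 4,
  baseDelayMs = 500,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise((resolve) => setTimeout(resolve, ms))
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
    try {
      return await downloadFn();
    } catch (error) {
      lastError = error;
      // Delay doubles each failed attempt: 500ms, 1s, 2s, ...
      if (attempt < maxAttempts - 1) {
        await sleep(baseDelayMs * 2 ** attempt);
      }
    }
  }
  throw lastError;
}
```

Exponential backoff keeps transient CDN hiccups from failing a run while avoiding a retry storm against a struggling endpoint.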
Individual worker failures don’t stop the pool:
  • Failed batches are retried
  • Error count tracked per worker
  • Pool downshifts after threshold

Alternative: Ollama Embeddings

For users who prefer local models with more control:
Ollama Client
class OllamaEmbeddingClient {
  constructor(
    private baseUrl: string,
    private model: string
  ) {}

  async embed(text: string): Promise<Float32Array> {
    const response = await fetch(`${this.baseUrl}/api/embeddings`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: this.model,
        prompt: text
      })
    });

    if (!response.ok) {
      throw new Error(`Ollama embedding request failed: ${response.status}`);
    }

    const data = await response.json();
    return Float32Array.from(data.embedding);
  }
}

Browser Embeddings

Pros:
  • No setup required
  • Works immediately
  • Cross-platform
Cons:
  • Fixed model
  • Browser limitations

Ollama Embeddings

Pros:
  • Choose your model
  • Better performance
  • More control
Cons:
  • Requires installation
  • Desktop only

Best Practices

1. Batch Processing: process repositories in batches rather than one by one. This reduces overhead and improves throughput.
2. Monitor Metrics: watch embedding speed and queue depth. Slow speeds may indicate memory pressure or network issues.
3. Checkpoint Regularly: use the default checkpoint policy, or checkpoint more frequently for extra safety. This prevents losing hours of indexing work.
4. Clear on Model Change: delete existing embeddings when switching embedding models. Different models produce incompatible vectors.