GitStarRecall generates embeddings entirely in your browser using WebGPU or WASM backends. No cloud API calls, no usage costs, and complete privacy.

Architecture

Embeddings are generated using a worker pool architecture:

1. Worker Pool: multiple Web Workers process embeddings in parallel.
2. Backend Selection: WebGPU (fast) or WASM (compatible) is chosen automatically based on browser support.
3. Batch Processing: chunks are processed in batches of 8 for efficiency.
4. Checkpointing: progress is saved periodically to prevent data loss.
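The four stages above compose into a single indexing loop. The sketch below shows the shape of that loop; `embedBatch` and `saveCheckpoint` are hypothetical stand-ins for the real worker-pool and persistence APIs:

```typescript
// Minimal sketch of the indexing loop: batch, embed, checkpoint.
type Chunk = { id: string; text: string };

const BATCH_SIZE = 8;
const CHECKPOINT_EVERY = 256;

async function indexChunks(
  chunks: Chunk[],
  embedBatch: (texts: string[]) => Promise<Float32Array[]>,
  saveCheckpoint: (done: number) => Promise<void>
): Promise<number> {
  let processed = 0;
  let sinceCheckpoint = 0;

  for (let i = 0; i < chunks.length; i += BATCH_SIZE) {
    const batch = chunks.slice(i, i + BATCH_SIZE);
    await embedBatch(batch.map((c) => c.text));
    processed += batch.length;
    sinceCheckpoint += batch.length;

    // Checkpoint after every CHECKPOINT_EVERY embeddings.
    if (sinceCheckpoint >= CHECKPOINT_EVERY) {
      await saveCheckpoint(processed);
      sinceCheckpoint = 0;
    }
  }
  // Final checkpoint for any remainder.
  if (sinceCheckpoint > 0) {
    await saveCheckpoint(processed);
  }
  return processed;
}
```

The real pipeline also checkpoints on a timer (see Checkpointing below), so long-running batches still save regularly.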

Backend Selection

GitStarRecall automatically selects the best embedding backend:

WebGPU (Preferred)

WebGPU provides 3-5x faster embedding generation compared to WASM.
WebGPU Probe
export async function probeWebGpuSupport(
  navigatorLike: unknown
): Promise<WebGpuProbeResult> {
  const gpu = (navigatorLike as { gpu?: GpuLike })?.gpu;
  if (!gpu || typeof gpu.requestAdapter !== "function") {
    return {
      ok: false,
      reason: "navigator.gpu unavailable"
    };
  }

  try {
    const adapter = await gpu.requestAdapter();
    if (!adapter) {
      return {
        ok: false,
        reason: "no WebGPU adapter available"
      };
    }
    await adapter.requestDevice();
    return { ok: true };
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    return {
      ok: false,
      reason: `webgpu probe error: ${message}`
    };
  }
}
Fallback Logic:
Backend Resolution
export function resolvePreferredBackend(
  preferredBackend: EmbeddingBackendPreference,
  probeResult: WebGpuProbeResult
): { backend: EmbeddingBackendPreference; fallbackReason: string | null } {
  if (preferredBackend === "wasm") {
    return { backend: "wasm", fallbackReason: null };
  }

  if (probeResult.ok) {
    return { backend: "webgpu", fallbackReason: null };
  }

  return {
    backend: "wasm",
    fallbackReason: probeResult.reason
  };
}

WASM (Fallback)

Used when WebGPU is unavailable:
  • Compatibility: Works in all modern browsers
  • Performance: Slower than WebGPU but still runs locally
  • Memory: Lower memory usage than WebGPU
Safari on iOS uses WASM due to limited WebGPU support.

Worker Pool

Embeddings are generated using a configurable worker pool:
Worker Pool Configuration
const DEFAULT_POOL_SIZE = 2;
const DEFAULT_MAX_POOL_SIZE = 2;
const DEFAULT_WORKER_BATCH_SIZE = 8;

class EmbeddingWorkerPool {
  private configuredPoolSize: number;
  private activePoolSize: number;
  private workerBatchSize: number;
  private embedders: EmbedderLike[] = [];

  constructor(options: EmbeddingWorkerPoolOptions = {}) {
    this.configuredPoolSize = options.poolSize ?? DEFAULT_POOL_SIZE;
    this.activePoolSize = Math.min(
      this.configuredPoolSize,
      options.maxPoolSize ?? DEFAULT_MAX_POOL_SIZE
    );
    this.workerBatchSize = options.workerBatchSize ?? DEFAULT_WORKER_BATCH_SIZE;
  }
}

Concurrency Management

The pool adjusts concurrency based on runtime conditions:
WebGPU backend runs with single worker concurrency:
if (status.selectedBackend === "webgpu") {
  this.setConcurrency(1);
  return;
}
This prevents GPU contention and ensures optimal performance.
WASM backend can use multiple workers (default: 2):
  • Parallelizes across CPU cores
  • Each worker processes batches independently
  • Dynamically adjusts on memory pressure
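The multi-worker case can be sketched as a work-stealing loop: each worker repeatedly claims the next unclaimed batch until none remain. This is a simplified illustration with an assumed `EmbedderLike` interface; the real pool adds retries and downshifting:

```typescript
interface EmbedderLike {
  embedBatch(texts: string[]): Promise<Float32Array[]>;
}

async function embedAcrossPool(
  embedders: EmbedderLike[],
  batches: string[][]
): Promise<Float32Array[][]> {
  const results: Float32Array[][] = new Array(batches.length);
  let next = 0; // shared cursor; safe because JS workers here run on one event loop

  await Promise.all(
    embedders.map(async (embedder) => {
      // Each embedder claims the next batch index, embeds it, repeats.
      while (next < batches.length) {
        const index = next;
        next += 1;
        results[index] = await embedder.embedBatch(batches[index]);
      }
    })
  );
  return results;
}
```

Because batches are claimed dynamically rather than pre-partitioned, a slow worker naturally ends up with fewer batches.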
Pool reduces to single worker on errors:
if (isMemoryPressureError(message) || 
    this.errorCount >= this.downshiftErrorThreshold) {
  this.downshiftToSingle(message);
}
Prevents cascade failures during memory constraints.

Embedding Generation

The Embedder class manages individual worker instances:
Embedder Class
export class Embedder {
  private worker: Worker;
  private pending = new Map<string, PendingJob>();
  private preferredBackend: EmbeddingBackendPreference;
  private selectedBackend: EmbeddingBackendPreference | null = null;

  async embedBatch(texts: string[]): Promise<BatchEmbeddingResultItem[]> {
    if (texts.length === 0) {
      return [];
    }

    const id = crypto.randomUUID();
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.worker.postMessage({ 
        id, 
        texts, 
        preferredBackend: this.preferredBackend 
      });
    });
  }

  async embed(text: string): Promise<Float32Array> {
    const results = await this.embedBatch([text]);
    const first = results[0];
    if (!first || first.error || !first.embedding) {
      throw new Error(first?.error ?? "Embedding worker returned empty vector");
    }
    return first.embedding;
  }
}

Vector Format

Embeddings are 384-dimensional L2-normalized vectors:
Vector Operations
export function l2Normalize(vec: Float32Array): Float32Array {
  let sumSquares = 0;
  for (let i = 0; i < vec.length; i += 1) {
    sumSquares += vec[i] * vec[i];
  }

  const norm = Math.sqrt(sumSquares);
  if (norm === 0) {
    return vec.slice();
  }

  const out = new Float32Array(vec.length);
  for (let i = 0; i < vec.length; i += 1) {
    out[i] = vec[i] / norm;
  }
  return out;
}

export function float32ToBlob(vec: Float32Array): Uint8Array {
  const normalized = l2Normalize(vec);
  const bytes = new Uint8Array(normalized.length * Float32Array.BYTES_PER_ELEMENT);
  new Float32Array(bytes.buffer).set(normalized);
  return bytes;
}
Vectors are stored as BLOBs in SQLite for efficient storage and retrieval.
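Reading a vector back is the inverse view over the same bytes, and because stored vectors are L2-normalized, cosine similarity reduces to a plain dot product. The helper names below (`blobToFloat32`, `dot`) are illustrative, not necessarily the codebase's own:

```typescript
// Decode a stored BLOB back into a Float32Array view.
export function blobToFloat32(bytes: Uint8Array): Float32Array {
  return new Float32Array(
    bytes.buffer,
    bytes.byteOffset,
    bytes.byteLength / Float32Array.BYTES_PER_ELEMENT
  );
}

// Dot product; equals cosine similarity for unit-length vectors.
export function dot(a: Float32Array, b: Float32Array): number {
  let sum = 0;
  for (let i = 0; i < a.length; i += 1) {
    sum += a[i] * b[i];
  }
  return sum;
}
```

A normalized vector dotted with itself is 1 (up to float32 rounding), which is a cheap sanity check after decoding.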

Checkpointing

Progress is saved periodically to prevent data loss:
Checkpoint Policy
const DEFAULT_EMBEDDING_CHECKPOINT_EVERY_EMBEDDINGS = 256;
const DEFAULT_EMBEDDING_CHECKPOINT_EVERY_MS = 3000;

type EmbeddingCheckpointPolicy = {
  everyEmbeddings: number;  // Checkpoint every N embeddings
  everyMs: number;           // Checkpoint every N milliseconds
};

private shouldCheckpointEmbeddings(now: number): boolean {
  if (this.pendingEmbeddingsSinceCheckpoint <= 0) {
    return false;
  }

  if (this.pendingEmbeddingsSinceCheckpoint >= this.embeddingCheckpointPolicy.everyEmbeddings) {
    return true;
  }

  const checkpointBaseline = this.lastEmbeddingCheckpointAt ?? this.pendingEmbeddingsStartedAt;
  return now - checkpointBaseline >= this.embeddingCheckpointPolicy.everyMs;
}

Count-Based

Checkpoint every 256 embeddings by default. Ensures regular saves during bulk processing.

Time-Based

Checkpoint every 3 seconds by default. Prevents long gaps without saves.
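The two policies combine as "whichever fires first". A standalone version of the decision (the real method lives on the pipeline class; names mirror the snippet above):

```typescript
type EmbeddingCheckpointPolicy = {
  everyEmbeddings: number;
  everyMs: number;
};

function shouldCheckpoint(
  policy: EmbeddingCheckpointPolicy,
  pendingSinceCheckpoint: number,
  lastCheckpointAt: number | null,
  pendingStartedAt: number,
  now: number
): boolean {
  // Nothing pending → nothing to save.
  if (pendingSinceCheckpoint <= 0) {
    return false;
  }
  // Count-based trigger.
  if (pendingSinceCheckpoint >= policy.everyEmbeddings) {
    return true;
  }
  // Time-based trigger, measured from the last checkpoint (or run start).
  const baseline = lastCheckpointAt ?? pendingStartedAt;
  return now - baseline >= policy.everyMs;
}
```

With the defaults (256 embeddings / 3000 ms), a fast run checkpoints on count and a slow run checkpoints on time, so neither starves.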

Performance Metrics

Real-time metrics track embedding performance:
Embedding Metrics
type EmbeddingRunMetrics = {
  backendIdentity: string;              // "webgpu" or "wasm"
  configuredPoolSize: number;           // Configured workers
  activePoolSize: number;               // Active workers
  poolDownshifted: boolean;             // Reduced to 1 worker?
  batchCount: number;                   // Batches processed
  embeddingsProcessed: number;          // Total embeddings
  embeddingsPerSecond: number;          // Throughput
  avgBatchEmbedLatencyMs: number;       // Average batch time
  lastBatchEmbedLatencyMs: number;      // Last batch time
  avgDbCheckpointMs: number;            // Average save time
  queueDepth: number;                   // Pending chunks
  peakQueueDepth: number;               // Max queue depth
};
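The throughput and latency fields are simple derivations from a running tally. A simplified sketch (reduced relative to the full `EmbeddingRunMetrics` shape above):

```typescript
class MetricsTally {
  private embeddings = 0;
  private batchLatencies: number[] = [];

  constructor(private startedAt: number) {}

  recordBatch(count: number, latencyMs: number): void {
    this.embeddings += count;
    this.batchLatencies.push(latencyMs);
  }

  snapshot(now: number): {
    embeddingsPerSecond: number;
    avgBatchEmbedLatencyMs: number;
  } {
    // Guard against a zero-length interval.
    const elapsedSec = Math.max((now - this.startedAt) / 1000, 1e-9);
    const totalLatency = this.batchLatencies.reduce((a, b) => a + b, 0);
    return {
      embeddingsPerSecond: this.embeddings / elapsedSec,
      avgBatchEmbedLatencyMs:
        this.batchLatencies.length === 0
          ? 0
          : totalLatency / this.batchLatencies.length
    };
  }
}
```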

Expected Performance

Backend   Workers   Chunks/Second   1000 Chunks
WebGPU    1         40-60           ~20s
WASM      2         15-25           ~50s
WASM      1         8-12            ~100s
Performance varies significantly based on device hardware and browser.

Storage

Embeddings are stored in SQLite with optimized schema:
Embeddings Table
CREATE TABLE embeddings (
  id TEXT PRIMARY KEY,
  chunk_id TEXT NOT NULL,
  model TEXT NOT NULL,
  dimension INTEGER NOT NULL,
  vector_blob BLOB NOT NULL,
  created_at INTEGER NOT NULL,
  FOREIGN KEY (chunk_id) REFERENCES chunks(id) ON DELETE CASCADE
);

CREATE INDEX idx_embeddings_chunk_id ON embeddings(chunk_id);

Storage Backends

Origin Private File System
  • Fast file-based storage
  • Supports large databases
  • Available in modern browsers
  • Better performance than localStorage

Error Handling

Robust error handling prevents cascade failures:
function isMemoryPressureError(errorMessage: string | null): boolean {
  if (!errorMessage) return false;
  const normalized = errorMessage.toLowerCase();
  return (
    normalized.includes("out of memory") ||
    normalized.includes("oom") ||
    normalized.includes("allocation failed")
  );
}
Response: Downshift to single worker, reduce batch size
Model download failures are reported with actionable guidance:
  • Check internet connection
  • Verify CSP allows Hugging Face CDN
  • Retry with exponential backoff
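A retry-with-backoff wrapper for downloads might look like the sketch below. This is an illustration, not the project's actual retry code; the injectable `sleep` parameter just makes the delay schedule testable:

```typescript
async function retryWithBackoff<T>(
  downloadFn: () => Promise<T>,
  maxAttempts = 4,
  baseDelayMs = 500,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise((resolve) => setTimeout(resolve, ms))
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
    try {
      return await downloadFn();
    } catch (error) {
      lastError = error;
      // Delay doubles each failed attempt: 500ms, 1s, 2s, ...
      if (attempt < maxAttempts - 1) {
        await sleep(baseDelayMs * 2 ** attempt);
      }
    }
  }
  throw lastError;
}
```

Exponential backoff keeps transient CDN hiccups from failing a run while avoiding a retry storm against a struggling endpoint.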
Individual worker failures don’t stop the pool:
  • Failed batches are retried
  • Error count tracked per worker
  • Pool downshifts after threshold

Alternative: Ollama Embeddings

For users who prefer local models with more control:
Ollama Client
class OllamaEmbeddingClient {
  constructor(
    private baseUrl: string,
    private model: string
  ) {}

  async embed(text: string): Promise<Float32Array> {
    const response = await fetch(`${this.baseUrl}/api/embeddings`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: this.model,
        prompt: text
      })
    });

    if (!response.ok) {
      throw new Error(`Ollama embedding request failed: ${response.status}`);
    }

    const data = await response.json();
    return Float32Array.from(data.embedding);
  }
}

Browser Embeddings

Pros:
  • No setup required
  • Works immediately
  • Cross-platform
Cons:
  • Fixed model
  • Browser limitations

Ollama Embeddings

Pros:
  • Choose your model
  • Better performance
  • More control
Cons:
  • Requires installation
  • Desktop only

Best Practices

1. Batch Processing: process repositories in batches rather than one by one. This reduces overhead and improves throughput.
2. Monitor Metrics: watch embedding speed and queue depth. Slow speeds may indicate memory pressure or network issues.
3. Checkpoint Regularly: use the default checkpoint policy, or checkpoint more frequently for extra safety. This prevents losing hours of indexing work.
4. Clear on Model Change: delete existing embeddings when switching embedding models. Different models produce incompatible vectors.