# Embedding Performance Overview

Embedding generation is the most compute-intensive operation in the app. The system uses a multi-layered optimization approach:

- WebGPU acceleration with automatic WASM fallback
- Worker pool parallelism with adaptive downshifting
- Micro-batch processing to balance throughput and memory
- Checkpointed persistence to reduce write overhead
## Performance Metrics

Target performance (1,000 stars on a modern laptop):
- Time to first searchable chunks: < 120 seconds
- Embedding throughput improvement: 30%+ over baseline
- Query response time: < 2 seconds after indexing
## Embedding Pool Size

Control the number of parallel embedding workers. `VITE_EMBEDDING_POOL_SIZE` sets the number of concurrent embedding workers (clamped to 1..2).

### Configuration Guide
| Device Type | RAM | Recommended Pool Size | Notes |
|---|---|---|---|
| Budget laptop | < 8GB | 1 | Avoid memory pressure |
| Standard laptop | 8-16GB | 1-2 | Start with 1, increase to 2 if stable |
| High-end desktop | 16GB+ | 2 | Maximum parallelism |
| Mobile/tablet | < 4GB | 1 | Always use single worker |
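For example, a standard laptop could start conservatively in `.env.local` (a sketch; the variable name is taken from the troubleshooting section of this guide):

```
VITE_EMBEDDING_POOL_SIZE=1
```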
### Adaptive Downshifting

The system automatically reduces pool size when it detects:

- Memory pressure errors
- Worker initialization failures
- Repeated embedding failures
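The downshifting logic can be sketched roughly as follows. All names and the failure threshold here are illustrative assumptions, not the app's actual API:

```typescript
// Hypothetical sketch of adaptive pool downshifting.
type PoolState = { size: number; failures: number };

const MIN_POOL_SIZE = 1;
const FAILURE_THRESHOLD = 2; // assumed: downshift after repeated failures

function recordFailure(state: PoolState): PoolState {
  const failures = state.failures + 1;
  if (failures >= FAILURE_THRESHOLD && state.size > MIN_POOL_SIZE) {
    // Memory pressure or repeated embedding failures:
    // drop to fewer workers and reset the failure counter.
    return { size: state.size - 1, failures: 0 };
  }
  return { ...state, failures };
}
```

The key design point is that the pool only shrinks, never grows back automatically, so a run that hit memory pressure stays conservative for its remainder.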
## Worker Batch Size

Control how many texts are processed per worker batch. `VITE_EMBEDDING_WORKER_BATCH_SIZE` sets the number of texts per batch (clamped to 1..32).

### Configuration Guide
| Scenario | Recommended Batch Size | Rationale |
|---|---|---|
| Low memory (< 8GB RAM) | 8 | Reduce memory footprint |
| Standard (8-16GB RAM) | 12-16 | Balance throughput and stability |
| High performance (16GB+ RAM) | 16-24 | Maximize batch efficiency |
| Debugging/stability issues | 4 | Isolate problematic texts |
The system adaptively reduces batch size on failures. Start with the default (12) and increase gradually while monitoring stability.

## Large Library Mode

Optimizations specifically for libraries of 500+ starred repositories. `VITE_EMBEDDING_LARGE_LIBRARY_MODE` enables the optimizations (0 = disabled, 1 = enabled); a companion threshold setting controls the minimum number of repositories required to trigger them.
### Large Library Optimizations

When enabled, the system applies:

- Priority ordering - Processes recently updated repositories first
- Resumable cursors - Saves indexing position to recover from crashes
- Adaptive batching - Dynamically adjusts batch size based on available chunks
- Checkpoint coordination - Reduces DB write frequency for large batches
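A minimal `.env.local` sketch for enabling this mode (only the flag named in this guide is shown; other variables may also apply):

```
VITE_EMBEDDING_LARGE_LIBRARY_MODE=1
```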
## Database Write Optimization

Reduce write overhead with batched database commits. A buffer setting controls the number of embeddings buffered before each SQLite write.
### Write Strategy

- Small libraries (< 100 repos): Use the default of 512
- Large libraries (500+ repos): Consider increasing to 1024 for fewer writes
- Crash recovery concern: Reduce to 256 to minimize the loss window
Checkpoints occur both at record-count intervals and time intervals. A final flush ensures all data is persisted on completion.
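The dual count-and-time checkpointing described above can be sketched like this. The class name, the write callback, and the time interval are assumptions for illustration; only the 512-record default comes from this guide:

```typescript
// Illustrative checkpoint buffer: flushes on record count, staleness, or completion.
class CheckpointBuffer<T> {
  private buf: T[] = [];
  private lastFlush = Date.now();

  constructor(
    private write: (items: T[]) => void,
    private maxRecords = 512,   // record-count interval (doc default)
    private maxAgeMs = 5_000,   // time interval (assumed value)
  ) {}

  push(item: T): void {
    this.buf.push(item);
    const stale = Date.now() - this.lastFlush >= this.maxAgeMs;
    if (this.buf.length >= this.maxRecords || stale) this.flush();
  }

  // Called at intervals and once more at completion (the "final flush").
  flush(): void {
    if (this.buf.length > 0) this.write(this.buf.splice(0));
    this.lastFlush = Date.now();
  }
}
```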
## README Fetching Performance

Optimize GitHub API usage during the initial sync. `VITE_README_BATCH_SIZE` sets the number of READMEs to fetch concurrently, and `VITE_README_BATCH_PIPELINE_V2` enables the experimental v2 pipeline with adaptive concurrency.
### Fetch Strategy
| Scenario | Batch Size | Pipeline | Notes |
|---|---|---|---|
| Standard (< 500 stars) | 40 | 0 (default) | Balanced approach |
| Large (500-1000 stars) | 30-40 | 1 (v2) | Adaptive cooldown prevents rate limits |
| Very large (1000+ stars) | 20-30 | 1 (v2) | Conservative to avoid 429 errors |
| Rate limit issues | 10-20 | 1 (v2) | Prioritize reliability |
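For a large library, the table above translates to an `.env.local` sketch like:

```
VITE_README_BATCH_SIZE=30
VITE_README_BATCH_PIPELINE_V2=1
```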
## UI Update Throttling

Prevent main-thread pressure during long indexing runs. A throttle interval setting (in milliseconds) controls how often progress updates reach the UI.
### Configuration Guide

- Fast UI (high CPU): 250-300 ms - more responsive updates
- Balanced: 350 ms - the default, good for most cases
- Slow device: 500-1000 ms - reduces main-thread load
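The throttling pattern itself is simple; a minimal sketch (function names are illustrative, the 350 ms default comes from this guide):

```typescript
// Wrap a progress callback so it fires at most once per interval.
function makeThrottled(fn: (progress: number) => void, intervalMs = 350) {
  let last = -Infinity;
  // `now` is injectable to make the behavior testable; defaults to wall clock.
  return (progress: number, now: number = Date.now()): void => {
    if (now - last >= intervalMs) {
      last = now;
      fn(progress);
    }
  };
}
```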
## WebGPU vs WASM Backend

Choose the optimal compute backend for your hardware. `VITE_EMBEDDING_BACKEND_PREFERRED` sets the preference: `webgpu` (GPU-first) or `wasm` (CPU-only).

### Backend Comparison
| Backend | Speed | Compatibility | Memory | Use Case |
|---|---|---|---|---|
| WebGPU | ⚡⚡⚡ Fast | Modern browsers only | Higher | Default, best performance |
| WASM | ⚡⚡ Moderate | Universal | Lower | Fallback, older devices |
### Platform Support
- Windows: WebGPU → Direct3D backend
- macOS: WebGPU → Metal backend
- Linux: WebGPU → Vulkan backend
- All platforms: WASM CPU fallback
The app automatically falls back to WASM if WebGPU is unavailable. Fallback reason is logged and visible in diagnostics.
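The GPU-first selection with a logged fallback reason can be sketched as follows. The function, the boolean probe parameter, and the reason string are assumptions, not the app's real implementation:

```typescript
type Backend = "webgpu" | "wasm";

// Pick the backend: honor a WebGPU preference only when the GPU probe
// succeeds; otherwise fall back to WASM and record why.
function pickBackend(
  preferred: Backend,
  gpuAvailable: boolean, // stand-in for a real WebGPU adapter probe
): { backend: Backend; fallbackReason?: string } {
  if (preferred === "webgpu") {
    return gpuAvailable
      ? { backend: "webgpu" }
      : { backend: "wasm", fallbackReason: "webgpu-unavailable" };
  }
  return { backend: "wasm" };
}
```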
### Force WASM Mode

If you experience GPU-related issues, force the WASM backend via `.env.local`.
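For example (using the variable named in the troubleshooting section):

```
VITE_EMBEDDING_BACKEND_PREFERRED=wasm
```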
## Chunking Configuration

Control text chunking behavior for embedding generation. Two settings apply: the maximum chunk size in characters, and the minimum number of pending chunks before a batch is triggered.
### Chunking Strategy

- Window size (512): Balances semantic context and retrieval granularity
- Trigger threshold (256): Prevents frequent small batches during sync
- READMEs are truncated to 100,000 characters before chunking
- Chunks have 80-120 character overlap to preserve context at boundaries
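A sliding-window chunker with overlap can be sketched like this. It is a simplification of what the list above describes (character-based, no sentence awareness); the 100-character overlap is picked from the 80-120 range, and the function name is illustrative:

```typescript
// Split text into fixed-size windows that overlap to preserve
// context at chunk boundaries.
function chunkText(text: string, window = 512, overlap = 100): string[] {
  const chunks: string[] = [];
  const step = window - overlap; // advance less than a full window
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + window));
    if (start + window >= text.length) break; // last window reached the end
  }
  return chunks;
}
```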
## Monitoring and Diagnostics

The app tracks and displays detailed performance metrics.

### Embedding Run Metadata
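A run-metadata record might look like the following. The field names come from the diagnostics steps in this guide; the overall shape and values are an illustration, not the app's actual schema:

```json
{
  "backend": "webgpu",
  "downshift": false,
  "fallbackReason": null
}
```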
### Access Diagnostics

- Open the app after completing an embedding run
- Check the status panel for “Last embedding run” metrics
- Review `backend`, `downshift`, and `fallbackReason` for issues
- Adjust configuration based on observed behavior
## Recommended Configurations

### Budget Setup (< 8GB RAM)
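A plausible `.env.local` for this tier, assembled from the tables above (a starting sketch, not a canonical config):

```
VITE_EMBEDDING_POOL_SIZE=1
VITE_EMBEDDING_WORKER_BATCH_SIZE=8
# Optional: lower memory use at the cost of speed
# VITE_EMBEDDING_BACKEND_PREFERRED=wasm
```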
### Standard Setup (8-16GB RAM)
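A sketch for this tier based on the tables above; increase the pool to 2 only if runs stay stable:

```
VITE_EMBEDDING_POOL_SIZE=1
VITE_EMBEDDING_WORKER_BATCH_SIZE=12
```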
### High-Performance Setup (16GB+ RAM, 1000+ stars)
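A sketch for this tier, combining the embedding and README-fetch recommendations above:

```
VITE_EMBEDDING_POOL_SIZE=2
VITE_EMBEDDING_WORKER_BATCH_SIZE=16
VITE_EMBEDDING_LARGE_LIBRARY_MODE=1
VITE_README_BATCH_PIPELINE_V2=1
VITE_README_BATCH_SIZE=30
```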
## Troubleshooting Performance Issues

### Slow Indexing
- Check backend: Verify WebGPU is active (not the WASM fallback)
- Increase pool size: Try `VITE_EMBEDDING_POOL_SIZE=2`
- Increase batch size: Try `VITE_EMBEDDING_WORKER_BATCH_SIZE=16`
- Enable large library mode: Set `VITE_EMBEDDING_LARGE_LIBRARY_MODE=1`
### Memory Crashes

- Reduce pool size: Set `VITE_EMBEDDING_POOL_SIZE=1`
- Reduce batch size: Set `VITE_EMBEDDING_WORKER_BATCH_SIZE=8`
- Force WASM: Set `VITE_EMBEDDING_BACKEND_PREFERRED=wasm`
- Check downshift events: Review diagnostics for adaptive reductions
### Rate Limit Errors

- Enable v2 pipeline: Set `VITE_README_BATCH_PIPELINE_V2=1`
- Reduce batch size: Set `VITE_README_BATCH_SIZE=20`
- Wait and retry: GitHub rate limits reset hourly