Overview
GitStarRecall uses a multi-layered storage strategy designed for privacy, performance, and browser compatibility. All data is stored client-side by default.
Storage Layers
Primary Storage: SQLite WASM
The main database uses sql.js, a SQLite WASM implementation that runs entirely in the browser.
Key Features:
- Full SQL database in browser memory
- ACID transactions
- Foreign key constraints
- Efficient indexing
- Export/import capability
Persistence Layer: OPFS
Origin Private File System (OPFS) provides fast, private file storage when available.
Benefits:
- Persistent across sessions
- Better performance than localStorage
- Larger storage quota
- Private to origin
Implementation:
- Database file: gitstarrecall.sqlite
- Written on checkpoint (periodic + on completion)
- Read on app initialization
- Atomic write with createWritable()
Fallback: localStorage
When OPFS is unavailable, the database is persisted to localStorage as Base64-encoded bytes.
Key: gitstarrecall.sqlite.base64
Limitations:
- ~5-10MB quota (browser-dependent)
- Slower than OPFS
- Can fail on quota exceeded
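The quota limit bites earlier than the raw database size suggests, because Base64 turns every 3 bytes into 4 characters. The sketch below estimates the encoded size. The helper names are hypothetical, and how a browser counts characters against its quota is an assumption that varies between browsers.

```typescript
// Base64 encodes every 3 input bytes as 4 output characters (padding included).
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

// Hypothetical helper: would a database of `dbBytes` bytes fit into a quota
// expressed in characters? Real quota accounting is browser-dependent.
function fitsInLocalStorage(dbBytes: number, quotaChars: number): boolean {
  return base64Length(dbBytes) <= quotaChars;
}
```

Under these assumptions, a 4 MB database no longer fits a 5-million-character quota, while a 3 MB one does.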
Fallback: Memory-Only Mode
When neither OPFS nor localStorage can be used (for example, because the quota is exceeded), the database runs in memory only.
Behavior:
- Data persists within tab session
- Lost on tab close/refresh
- No write errors
- Storage mode indicated in UI
Storage Mode Priority
- OPFS (preferred)
- localStorage (fallback)
- memory (last resort)
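The priority order above can be sketched as a small selection function. This is a minimal illustration, not the project's code: `chooseStorageMode` and `downgrade` are hypothetical names, and the availability flags are assumed to come from feature probes run elsewhere.

```typescript
type StorageMode = "opfs" | "localStorage" | "memory";

const STORAGE_MODE_PRIORITY: StorageMode[] = ["opfs", "localStorage", "memory"];

// Pick the first mode, in priority order, that was reported usable.
// "memory" is always usable, so a mode is always returned.
function chooseStorageMode(usable: { opfs: boolean; localStorage: boolean }): StorageMode {
  for (const mode of STORAGE_MODE_PRIORITY) {
    if (mode === "memory" || usable[mode]) return mode;
  }
  return "memory";
}

// Assumed behavior: after a failed write (for example, quota exceeded),
// step down one level instead of surfacing an error.
function downgrade(mode: StorageMode): StorageMode {
  if (mode === "opfs") return "localStorage";
  return "memory";
}
```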
Database Schema
Tables
repos
Stores GitHub repository metadata and README content.
- id: GitHub repository ID
- checksum: SHA-256 hash of metadata + README for diff-based sync
- readme_etag: HTTP ETag for conditional README fetch
- readme_last_modified: HTTP Last-Modified header
- topics_json: JSON array of repository topics
chunks
Stores text chunks generated from README content.
- Simple tokenizer with overlap
- Target: 500-800 characters per chunk
- Overlap: 80-120 characters
- Source: readme or metadata
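Chunking with overlap can be illustrated with a fixed-size sliding window. This is a simplified sketch: it cuts on raw character offsets and ignores word boundaries, which the actual tokenizer may handle differently. The function name and the defaults (650 and 100, picked from inside the ranges above) are assumptions.

```typescript
// Split text into chunks of at most `size` characters. Each chunk after the
// first begins with the last `overlap` characters of the previous chunk.
function chunkText(text: string, size = 650, overlap = 100): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("require 0 <= overlap < size");
  }
  const chunks: string[] = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // this chunk reached the end
  }
  return chunks;
}
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of some duplicated text.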
embeddings
Stores vector embeddings for semantic search.
- Format: Float32Array stored as BLOB
- Dimension: 384 (for all-MiniLM-L6-v2)
- Model: Tracked for compatibility checks
- Normalization: L2-normalized before storage
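The storage format above can be sketched as a normalize, serialize, deserialize round trip. The function names are hypothetical. The byte layout is the platform's native float layout, which is an assumption about how the BLOB is written.

```typescript
// Scale a vector to unit length (L2 norm of 1). A zero vector is copied as-is.
function l2Normalize(v: Float32Array): Float32Array {
  let sumSq = 0;
  for (let i = 0; i < v.length; i++) sumSq += v[i] * v[i];
  const norm = Math.sqrt(sumSq);
  if (norm === 0) return v.slice();
  return v.map((x) => x / norm);
}

// Copy the vector's raw bytes into a Uint8Array suitable for a BLOB column.
function toBlob(v: Float32Array): Uint8Array {
  return new Uint8Array(v.buffer.slice(v.byteOffset, v.byteOffset + v.byteLength));
}

// Rebuild the vector from BLOB bytes. Copying first guarantees a buffer that
// starts at offset 0, so the 4-byte alignment Float32Array needs is satisfied.
function fromBlob(bytes: Uint8Array): Float32Array {
  const copy = bytes.slice();
  return new Float32Array(copy.buffer, 0, copy.byteLength / 4);
}
```

A 384-dimensional vector occupies 384 × 4 = 1536 bytes in this layout.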
chat_sessions
Stores chat session metadata.
- One session per initial query
- Can have multiple follow-up messages
- Sorted by updated_at DESC in UI
- Updated on new message
chat_messages
Stores individual chat messages.
- Primary sort: created_at ASC
- Secondary sort: sequence ASC (for same timestamp)
- Role: user, assistant, or system
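Reading the created_at ASC and sequence ASC entries as sort keys, the ordering can be sketched as a comparator. The row shape is hypothetical, and created_at is assumed to be a numeric timestamp such as epoch milliseconds.

```typescript
interface ChatMessageRow {
  id: string;
  created_at: number; // assumed: epoch milliseconds
  sequence: number;   // tie-breaker for messages with the same timestamp
  role: "user" | "assistant" | "system";
}

// Order by created_at ascending, then by sequence ascending.
function compareMessages(a: ChatMessageRow, b: ChatMessageRow): number {
  return a.created_at - b.created_at || a.sequence - b.sequence;
}
```

The tie-breaker matters when a user message and its reply are written within the same timestamp resolution.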
index_meta
Stores indexing metadata and resume state.
- embedding_backend: webgpu or wasm
- embedding_pool_size: Number of workers used
- checkpoint_policy_version: Tracking policy changes
- last_checkpoint_at: Timestamp of last checkpoint
- embedding_perf_last_run: JSON performance summary
- large_library_cursor: Resume position for interrupted indexing
Chat Backup System
Chat sessions and messages are additionally backed up to IndexedDB (with localStorage fallback) to provide extra durability.
IndexedDB Structure
Database: gitstarrecall-chat-backup (version 1)
Object Stores:
- chat_sessions
  - keyPath: id
  - Stores: ChatSessionRecord[]
- chat_messages
  - keyPath: id
  - Index: by_session_id on sessionId (non-unique)
  - Stores: ChatMessageRecord[]
Backup Strategy
On Session Write:
- Try IndexedDB
- Fall back to localStorage
- Silent failure (chat still in SQLite)
Backup Limits
- Sessions: Max 200 (keep most recent)
- Messages: Max 5000 (keep most recent by created_at)
- Auto-pruning: After each backup write
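The limits above amount to "keep the N newest records". A minimal sketch follows, assuming numeric timestamps. `pruneMostRecent` is a hypothetical helper, and the text does not say which timestamp field ranks sessions.

```typescript
const MAX_SESSIONS = 200;
const MAX_MESSAGES = 5000;

// Return the `max` most recent records, newest first, without mutating the input.
function pruneMostRecent<T>(records: T[], max: number, timestampOf: (r: T) => number): T[] {
  return [...records]
    .sort((a, b) => timestampOf(b) - timestampOf(a))
    .slice(0, max);
}
```

For example, `pruneMostRecent(messages, MAX_MESSAGES, (m) => m.created_at)` would run after each backup write.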
localStorage Backup Keys
- Sessions: gitstarrecall.chat.backup.sessions.v1
- Messages: gitstarrecall.chat.backup.messages.v1
Recovery Flow
- On app load, check IndexedDB for backup
- If IndexedDB has data, use it
- Otherwise, check localStorage
- Merge backup into SQLite if SQLite is empty or corrupt
Checkpointing Strategy
Embedding writes are batched and checkpointed periodically to balance performance and durability.
Policy
Checkpoint Triggers:
- Every 256 embeddings (configurable)
- Every 3000ms (configurable)
- On completion of indexing run
- On manual flush
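The count and time triggers above can be sketched as a small tracker that the embedding writer consults after each batch. This is an illustration only: the class and field names are hypothetical, and the clock is passed in explicitly so the logic stays easy to test.

```typescript
interface CheckpointPolicy {
  maxPendingEmbeddings: number; // 256 in the triggers above
  maxIntervalMs: number;        // 3000 in the triggers above
}

class CheckpointTracker {
  private pending = 0;
  private lastCheckpointAt: number;
  private policy: CheckpointPolicy;

  constructor(policy: CheckpointPolicy, now: number) {
    this.policy = policy;
    this.lastCheckpointAt = now;
  }

  // Record newly written embeddings and report whether a checkpoint is due.
  record(count: number, now: number): boolean {
    this.pending += count;
    const countDue = this.pending >= this.policy.maxPendingEmbeddings;
    const timeDue =
      this.pending > 0 && now - this.lastCheckpointAt >= this.policy.maxIntervalMs;
    return countDue || timeDue;
  }

  // Call once the database has actually been persisted.
  markCheckpointed(now: number): void {
    this.pending = 0;
    this.lastCheckpointAt = now;
  }
}
```

Completion and manual flush would bypass the tracker and persist unconditionally.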
Implementation
Benefits
- Performance: Fewer disk writes
- Durability: Regular checkpoints limit data loss
- Tunability: Configurable based on device capabilities
Vector Search Implementation
GitStarRecall uses brute-force cosine similarity search with an in-memory cache.
Vector Index Cache
- On new embeddings written
- On chunks deleted
- When embedding count changes
Similarity Search
Cosine Similarity
- All embeddings are L2-normalized before storage
- Query vectors are L2-normalized before search
- With both sides at unit length, cosine similarity reduces to a plain dot product
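A brute-force search over normalized vectors can be sketched as follows. For unit-length vectors the cosine similarity equals the dot product, so no division by norms is needed. The types and function names are hypothetical, and the sketch assumes every vector in the index is already L2-normalized.

```typescript
interface IndexedVector {
  chunkId: string;      // hypothetical chunk identifier
  vector: Float32Array; // assumed to be L2-normalized
}

interface SearchHit {
  chunkId: string;
  score: number; // cosine similarity in [-1, 1]
}

function dot(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Score every indexed vector against the query and return the k best hits.
function searchTopK(query: Float32Array, index: IndexedVector[], k: number): SearchHit[] {
  return index
    .map((entry) => ({ chunkId: entry.chunkId, score: dot(query, entry.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

The cost is linear in the number of embeddings per query, which is why keeping the vectors cached in memory matters.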
Sync and Diff Strategy
Checksum Generation
Diff-Based Sync
On Fetch Stars:
- Get current repo sync state from DB
- Fetch latest stars from GitHub
- Compute checksums for fetched repos
- Compare:
  - New: Not in local DB
  - Changed: Checksum differs
  - Unchanged: Checksum matches
  - Removed: In local DB but not in fetched stars
- Update/insert changed and new repos
- Delete removed repos (cascades to chunks and embeddings)
- Generate chunks only for new/changed repos
- Queue chunks for embedding
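The comparison step above can be sketched as a pure function over two maps from repository ID to checksum. The names are hypothetical, and the checksums are treated as opaque strings.

```typescript
interface SyncDiff {
  added: number[];     // not in local DB
  changed: number[];   // checksum differs
  unchanged: number[]; // checksum matches
  removed: number[];   // in local DB but not in fetched stars
}

// Classify repositories by comparing local and freshly fetched checksums.
function diffRepos(local: Map<number, string>, fetched: Map<number, string>): SyncDiff {
  const diff: SyncDiff = { added: [], changed: [], unchanged: [], removed: [] };
  fetched.forEach((checksum, id) => {
    const previous = local.get(id);
    if (previous === undefined) diff.added.push(id);
    else if (previous !== checksum) diff.changed.push(id);
    else diff.unchanged.push(id);
  });
  local.forEach((_checksum, id) => {
    if (!fetched.has(id)) diff.removed.push(id);
  });
  return diff;
}
```

Only the `added` and `changed` sets need re-chunking and re-embedding, which is what keeps repeat syncs cheap.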
README Caching
ETag and Last-Modified Headers:
- Reduces GitHub API calls
- Faster sync for unchanged READMEs
- Respects rate limits
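Conditional fetching with the stored validators can be sketched with the standard HTTP request headers If-None-Match and If-Modified-Since. The function names are hypothetical. The field names mirror the repos columns, but the actual request code may differ.

```typescript
interface ReadmeCacheState {
  readme_etag: string | null;
  readme_last_modified: string | null;
}

// Build conditional request headers from whatever validators were stored.
function conditionalHeaders(state: ReadmeCacheState): Record<string, string> {
  const headers: Record<string, string> = {};
  if (state.readme_etag) headers["If-None-Match"] = state.readme_etag;
  if (state.readme_last_modified) headers["If-Modified-Since"] = state.readme_last_modified;
  return headers;
}

// HTTP 304 Not Modified means the cached README is still current.
function isNotModified(httpStatus: number): boolean {
  return httpStatus === 304;
}
```

On a 304 response the stored README and its chunks can be reused as they are.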
Data Cleanup
Clear All Data
- “Delete local data” button in settings
- Clears all repos, chunks, embeddings, and chat data
- Does not clear GitHub token
Clear Token
- “Clear token” button in settings
- Logs user out
- Does not clear local data
Storage Diagnostics
Storage Mode Detection
- Settings page shows current storage mode
- Warning if in memory-only mode
Quota Estimation
Database Size
Related Documentation
- Architecture - System architecture and tech stack
- Troubleshooting - Common storage issues