GitStarRecall uses semantic search to help you find repositories based on meaning rather than exact keyword matches. Ask questions in natural language and get relevant results from your starred repositories.

How It Works

Semantic search transforms your repositories into vector embeddings that capture their meaning, enabling intelligent search across your GitHub stars.

Vector Embeddings

Each repository is converted into a mathematical representation (embedding) that captures its semantic meaning:
  1. Repository Metadata: Name, description, language, topics
  2. README Content: Normalized and chunked documentation
  3. Embedding Generation: Converted to 384-dimensional vectors using local models
// Adaptive chunk sizing based on README length
function resolveChunkConfig(textLength: number) {
  if (textLength <= 3_000) {
    return { size: 900, overlap: 140 };  // Short docs
  }
  if (textLength <= 15_000) {
    return { size: 760, overlap: 110 };  // Medium docs
  }
  return { size: 640, overlap: 90 };     // Long docs
}
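To show how a size/overlap pair like the ones above might be applied, here is a minimal sketch of overlapping-window chunking. The `chunkText` helper and the `ChunkConfig` type are hypothetical names, not part of GitStarRecall. The sketch assumes sizes are measured in characters and that chunks are cut at fixed offsets; the real implementation may split on sentence or heading boundaries instead.

```typescript
// Hypothetical type mirroring the shape returned by resolveChunkConfig.
type ChunkConfig = { size: number; overlap: number };

// Hypothetical helper: splits text into fixed-size windows where each
// window repeats the last `overlap` characters of the previous one.
function chunkText(text: string, config: ChunkConfig): string[] {
  const { size, overlap } = config;
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("Expected size > 0 and 0 <= overlap < size");
  }

  const step = size - overlap; // how far each window advances
  const chunks: string[] = [];

  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    // Stop once a window has reached the end of the text
    if (start + size >= text.length) break;
  }

  return chunks;
}
```

With `{ size: 900, overlap: 140 }`, each chunk starts 760 characters after the previous one, so text near a chunk boundary appears in both neighbouring chunks.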
When you search, GitStarRecall:
  1. Converts your query into an embedding vector
  2. Computes cosine similarity against all repository embeddings
  3. Returns the most semantically similar repositories
Cosine Similarity
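function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) {
    throw new Error("Embedding dimensions must match");
  }

  let dot = 0;
  let normA = 0;
  let normB = 0;

  // Accumulate the dot product and both squared norms in one pass
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }

  // A zero vector has no direction; return 0 instead of NaN
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}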
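The three search steps can be sketched as a brute-force ranking over every stored embedding. This is an illustration under stated assumptions, not GitStarRecall's actual code: `IndexedChunk`, `ScoredChunk`, and `topK` are hypothetical names, and the sketch assumes the query has already been embedded into a vector of the same dimension.

```typescript
// Hypothetical shapes for the stored index and the ranked output.
type IndexedChunk = { chunkId: string; embedding: Float32Array };
type ScoredChunk = { chunkId: string; score: number };

// Local copy of cosine similarity so the sketch is self-contained.
function cosine(a: Float32Array, b: Float32Array): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}

// Hypothetical helper: scores every chunk against the query and keeps
// the k most similar ones, highest score first.
function topK(query: Float32Array, index: IndexedChunk[], k: number): ScoredChunk[] {
  return index
    .map((chunk) => ({ chunkId: chunk.chunkId, score: cosine(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, Math.max(0, k));
}
```

A linear scan like this is simple and exact. Its cost grows with the number of embeddings.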

Natural Language Queries

Search using conversational language instead of keywords:

Traditional Keyword

react hooks state management

Natural Language

“What are good libraries for managing state in React applications?”

Query Examples

  • “Command line tools for database management”
  • “Libraries for parsing configuration files”
  • “Tools to improve developer productivity”
  • “Lightweight alternatives to Webpack”
  • “Modern CSS frameworks similar to Tailwind”
  • “GraphQL clients for React”
  • “How to handle file uploads in Express”
  • “Testing frameworks for TypeScript”
  • “Authentication libraries for Node.js”

Search Results

Results include context from matching repository chunks:
Search Result Structure
type SearchResult = {
  chunkId: string;           // Unique chunk identifier
  score: number;             // Cosine similarity score (higher is more similar; typically 0-1)
  text: string;              // Matching chunk text
  repoId: number;
  repoName: string;
  repoFullName: string;
  repoDescription: string | null;
  repoUrl: string;
  language: string | null;
  topics: string[];
  updatedAt: string;
};
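Because results are returned per chunk, the same repository can appear more than once. If you want one entry per repository, a sketch like the following keeps only the best-scoring chunk for each `repoId`. `ChunkHit` and `bestHitPerRepo` are hypothetical names, and the source does not say whether GitStarRecall deduplicates this way.

```typescript
// Minimal subset of the SearchResult fields needed for deduplication.
type ChunkHit = { chunkId: string; score: number; repoId: number };

// Hypothetical helper: keeps the highest-scoring chunk per repository
// and returns the survivors sorted by score, best first.
function bestHitPerRepo<T extends ChunkHit>(hits: T[]): T[] {
  const best = new Map<number, T>();

  for (const hit of hits) {
    const current = best.get(hit.repoId);
    if (current === undefined || hit.score > current.score) {
      best.set(hit.repoId, hit);
    }
  }

  return Array.from(best.values()).sort((a, b) => b.score - a.score);
}
```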

Filtering Results

Refine search results using filters:
  • Language: Filter by programming language
  • Topics: Filter by repository topics
  • Recency: Filter by last updated date
Filters are applied after semantic search to preserve ranking quality.
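Post-search filtering can be sketched as a pass over the already-ranked results. `Filters`, `FilterableResult`, and `applyFilters` are hypothetical names. The sketch assumes language matching is case-insensitive, that a result must carry every selected topic, and that `updatedAt` is an ISO 8601 timestamp; none of these details is confirmed by the source.

```typescript
// Minimal subset of the SearchResult fields that the filters inspect.
type FilterableResult = {
  language: string | null;
  topics: string[];
  updatedAt: string; // assumed to be an ISO 8601 timestamp
};

// Hypothetical filter options; every field is optional.
type Filters = { language?: string; topics?: string[]; updatedAfter?: Date };

// Array.prototype.filter keeps the input order, so the semantic
// ranking of the surviving results is unchanged.
function applyFilters<T extends FilterableResult>(results: T[], filters: Filters): T[] {
  return results.filter((result) => {
    if (
      filters.language !== undefined &&
      (result.language === null ||
        result.language.toLowerCase() !== filters.language.toLowerCase())
    ) {
      return false;
    }
    if (
      filters.topics !== undefined &&
      !filters.topics.every((topic) => result.topics.indexOf(topic) !== -1)
    ) {
      return false;
    }
    if (
      filters.updatedAfter !== undefined &&
      new Date(result.updatedAt).getTime() < filters.updatedAfter.getTime()
    ) {
      return false;
    }
    return true;
  });
}
```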

Performance

Semantic search is optimized for browser environments:
  1. In-Memory Cache: Embeddings are cached in memory for instant repeated queries
  2. Indexed Storage: Vector index is stored in SQLite with OPFS or localStorage backend
  3. Efficient Similarity: Cosine similarity computed using optimized Float32Array operations
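One common way to make similarity cheap on `Float32Array` data is to normalize each vector once, when it is indexed, so that cosine similarity at query time reduces to a plain dot product. The sketch below illustrates that general technique. `normalize` and `dotProduct` are hypothetical names, and the source does not confirm that GitStarRecall uses this particular optimization.

```typescript
// Hypothetical helper: returns a unit-length copy of the vector.
// A zero vector stays all zeros, since it has no direction to preserve.
function normalize(v: Float32Array): Float32Array {
  let sumSquares = 0;
  for (let i = 0; i < v.length; i++) {
    sumSquares += v[i] * v[i];
  }

  const norm = Math.sqrt(sumSquares);
  const out = new Float32Array(v.length);
  if (norm === 0) return out;

  for (let i = 0; i < v.length; i++) {
    out[i] = v[i] / norm;
  }
  return out;
}

// For two unit-length vectors, the dot product equals their cosine similarity.
function dotProduct(a: Float32Array, b: Float32Array): number {
  let dot = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
  }
  return dot;
}
```

Normalizing up front removes both square roots and the division from the per-comparison work.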

Search Performance Characteristics

Repositories    Embeddings    Search Time
100             ~200          <50ms
500             ~1000         <200ms
1000            ~2000         <400ms
All computations run locally in your browser. No data is sent to external servers.

Best Practices

Be Specific

Include relevant details about your use case.
  • Good: “Python libraries for processing CSV files”
  • Better: “Fast Python libraries for parsing large CSV files with data validation”

Use Filters

Combine semantic search with filters for precision.
  • Search: “web frameworks”
  • Filter: Language = JavaScript, Updated within 1 year

Multiple Attempts

Try different phrasings if results aren’t relevant:
“state management” → “managing application state” → “global state libraries”

Check Context

Review the chunk text to understand why a repository matched