
Overview

Remote LLM providers enable chat functionality through external APIs. The system supports any OpenAI-compatible API (OpenAI, Anthropic via proxies, OpenRouter, etc.) and local LM Studio servers.

Provider Definitions

OpenAI-Compatible

From src/llm/providers.ts:155-162:
{
  id: "openai-compatible",
  label: "Remote (OpenAI-compatible)",
  kind: "remote",
  defaultBaseUrl: "https://api.openai.com",
  defaultModel: "gpt-4o-mini",
  requiresApiKey: true,
}
Fields:
  • id (string): Provider identifier: "openai-compatible"
  • label (string): Display label: "Remote (OpenAI-compatible)"
  • kind (string): Provider type: "remote"
  • defaultBaseUrl (string): Default: "https://api.openai.com"
  • defaultModel (string): Default: "gpt-4o-mini"
  • requiresApiKey (boolean): true; an API key is required

LM Studio

From src/llm/providers.ts:171-178:
{
  id: "lmstudio",
  label: "Local (LM Studio)",
  kind: "local",
  defaultBaseUrl: "http://localhost:1234",
  defaultModel: "local-model",
  requiresApiKey: false,
}
Fields:
  • id (string): Provider identifier: "lmstudio"
  • label (string): Display label: "Local (LM Studio)"
  • kind (string): Provider type: "local" (runs on localhost)
  • defaultBaseUrl (string): Default: "http://localhost:1234"
  • defaultModel (string): Default: "local-model"
  • requiresApiKey (boolean): false; no API key is required

Configuration

LLMProviderConfig

From src/llm/types.ts:22-27:
type LLMProviderConfig = {
  baseUrl: string;
  model: string;
  apiKey?: string;
  allowModelDownload?: boolean; // WebLLM only
};
Fields:
  • baseUrl (string, required): API base URL (e.g., "https://api.openai.com")
  • model (string, required): Model identifier (e.g., "gpt-4o-mini", "claude-3-5-sonnet-20241022")
  • apiKey (string, optional): API key for authentication. Required for remote providers.
  • allowModelDownload (boolean, optional): Only used by the WebLLM provider; ignored by remote and LM Studio providers.

LLMStreamRequest

From src/llm/types.ts:14-20:
type LLMStreamRequest = {
  prompt: string;
  contextSnippets: string[];
  signal: AbortSignal;
  onToken: (token: string) => void;
  onInitProgress?: (progress: number, text: string) => void;
};
Fields:
  • prompt (string, required): User query or prompt
  • contextSnippets (string[], required): Context chunks to include (at most 8 are used; see TOP_K_LIMIT)
  • signal (AbortSignal, required): Abort signal to cancel the request
  • onToken ((token: string) => void, required): Callback invoked for each streamed token
  • onInitProgress ((progress: number, text: string) => void, optional): Progress callback (WebLLM only; ignored by remote providers)

Streaming API

OpenAI-Compatible Stream

From src/llm/providers.ts:190-212:
import { getProviderById } from './llm/providers';
import type { LLMProviderConfig, LLMStreamRequest } from './llm/types';

const provider = getProviderById('openai-compatible');

const config: LLMProviderConfig = {
  baseUrl: 'https://api.openai.com',
  model: 'gpt-4o-mini',
  apiKey: process.env.OPENAI_API_KEY,
};

const controller = new AbortController();

const request: LLMStreamRequest = {
  prompt: 'Explain semantic search',
  contextSnippets: [
    'Semantic search uses embeddings...',
    'Vector databases store embeddings...',
  ],
  signal: controller.signal,
  onToken: (token) => {
    process.stdout.write(token);
  },
};

await provider.stream(config, request);
Endpoint: POST {baseUrl}/v1/chat/completions
Request Headers:
{
  "Content-Type": "application/json",
  "Authorization": "Bearer YOUR_API_KEY"
}
Request Body:
{
  "model": "gpt-4o-mini",
  "stream": true,
  "messages": [
    {
      "role": "system",
      "content": "You are a recommendation assistant for GitHub starred repositories. Use only provided context and be concise."
    },
    {
      "role": "user",
      "content": "Explain semantic search\n\nContext 1:\nSemantic search uses embeddings...\n\nContext 2:\nVector databases store embeddings..."
    }
  ]
}
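The endpoint, headers, and body above can be assembled with a small helper. The following is a minimal sketch of the request shape; `buildChatRequest` is a hypothetical illustration, not a function exported by `src/llm/providers.ts`:

```typescript
type ProviderMessage = { role: "system" | "user"; content: string };

// Hypothetical helper illustrating the request shape the provider sends.
function buildChatRequest(
  baseUrl: string,
  model: string,
  messages: ProviderMessage[],
  apiKey?: string,
): { url: string; headers: Record<string, string>; body: string } {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (apiKey) {
    // Remote providers require the Bearer token; LM Studio omits it.
    headers["Authorization"] = `Bearer ${apiKey}`;
  }
  return {
    url: `${baseUrl}/v1/chat/completions`,
    headers,
    body: JSON.stringify({ model, stream: true, messages }),
  };
}

const req = buildChatRequest(
  "https://api.openai.com",
  "gpt-4o-mini",
  [{ role: "user", content: "Explain semantic search" }],
  "YOUR_API_KEY",
);
// req.url === "https://api.openai.com/v1/chat/completions"
```

The result maps directly onto a `fetch(req.url, { method: "POST", headers: req.headers, body: req.body, signal })` call.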
Response: Server-Sent Events (SSE), for example:
data: {"choices":[{"delta":{"content":"Semantic"}}]}

data: {"choices":[{"delta":{"content":" search"}}]}

data: {"choices":[{"delta":{"content":" is"}}]}

data: [DONE]

LM Studio Stream

From src/llm/providers.ts:237-259:
const provider = getProviderById('lmstudio');

const config: LLMProviderConfig = {
  baseUrl: 'http://localhost:1234',
  model: 'local-model', // Model name from LM Studio
  // No apiKey needed
};

const request: LLMStreamRequest = {
  prompt: 'What are vector embeddings?',
  contextSnippets: [],
  signal: new AbortController().signal,
  onToken: (token) => console.log(token),
};

await provider.stream(config, request);
Endpoint: POST http://localhost:1234/v1/chat/completions
Request Format: Same as OpenAI-compatible (no Authorization header)
Response: Server-Sent Events (SSE), parsed with the same logic

Message Building

From src/llm/providers.ts:135-147:
function buildMessages(prompt: string, snippets: string[]): ProviderMessage[] {
  return [
    {
      role: "system",
      content:
        "You are a recommendation assistant for GitHub starred repositories. Use only provided context and be concise.",
    },
    {
      role: "user",
      content: `${prompt}\n\n${buildContextBlock(snippets)}`,
    },
  ];
}

Context Formatting

From src/llm/providers.ts:16-21:
const TOP_K_LIMIT = 8;

function buildContextBlock(snippets: string[]): string {
  return snippets
    .slice(0, TOP_K_LIMIT)
    .map((snippet, index) => `Context ${index + 1}:\n${snippet}`)
    .join("\n\n");
}
Example Output:
Context 1:
First snippet content here...

Context 2:
Second snippet content here...

Context 3:
Third snippet content here...
Only the first 8 snippets are included.

Stream Parsing

SSE Parser

From src/llm/providers.ts:31-80:
async function parseSseStream(
  response: Response,
  onToken: (token: string) => void,
): Promise<void> {
  if (!response.body) {
    throw new Error("Streaming response body is not available");
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? "";

    for (const line of lines) {
      const trimmed = line.trim();
      if (!trimmed.startsWith("data:")) continue;

      const raw = trimmed.slice(5).trim();
      if (raw === "[DONE]") return;
      if (!raw) continue;

      try {
        const payload = JSON.parse(raw);
        const token =
          payload.choices?.[0]?.delta?.content ??
          payload.choices?.[0]?.message?.content;

        if (token) {
          onToken(token);
        }
      } catch {
        // Skip malformed SSE lines without killing the stream
      }
    }
  }
}
Extracts tokens from:
  • choices[0].delta.content (streaming format)
  • choices[0].message.content (alternative format)
Handles:
  • data: [DONE] - End marker
  • Malformed JSON - Skips silently
  • Empty data lines - Ignores
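The extraction rules above can be exercised in isolation. The sketch below applies the same `delta`/`message` fallback and `[DONE]` handling to a few illustrative SSE lines (the sample payloads are made up for the example, not taken from the source):

```typescript
// Standalone illustration of the token-extraction rules used by parseSseStream.
function extractTokens(lines: string[]): string[] {
  const tokens: string[] = [];
  for (const line of lines) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue;
    const raw = trimmed.slice(5).trim();
    if (raw === "[DONE]") break; // end marker stops parsing
    if (!raw) continue; // empty data lines are ignored
    try {
      const payload = JSON.parse(raw);
      const token =
        payload.choices?.[0]?.delta?.content ?? // streaming format
        payload.choices?.[0]?.message?.content; // alternative format
      if (token) tokens.push(token);
    } catch {
      // Malformed JSON is skipped silently, matching parseSseStream.
    }
  }
  return tokens;
}

const tokens = extractTokens([
  'data: {"choices":[{"delta":{"content":"Semantic"}}]}',
  "data: not-json", // skipped silently
  'data: {"choices":[{"message":{"content":" search"}}]}',
  "data: [DONE]",
]);
// tokens → ["Semantic", " search"]
```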

Error Handling

Format Provider Error

From src/llm/providers.ts:295-319:
import { formatProviderError } from './llm/providers';

try {
  await provider.stream(config, request);
} catch (error) {
  const userMessage = formatProviderError(error, provider.definition.kind);
  console.error(userMessage);
}
Error Messages:
  • AbortError: "Generation cancelled."
  • "Failed to fetch" (local providers): "Local provider unreachable. Ensure Ollama/LM Studio is running and CORS/network access allows localhost calls."
  • NetworkError (local providers): same message as above
  • Other errors: the original error message is returned

HTTP Error Handling

From src/llm/providers.ts:207-209:
if (!response.ok) {
  throw new Error(`Provider request failed (${response.status})`);
}
No automatic retry logic - errors propagate immediately.
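If retries are wanted, the caller must add them. A minimal exponential-backoff sketch is shown below; `withRetry` is a hypothetical helper, not part of `src/llm/providers.ts`. Note that retrying after tokens have already streamed would repeat partial output, so in practice this is most useful for failures on the initial connection:

```typescript
// Hypothetical retry wrapper with exponential backoff.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      // Never retry a user cancellation.
      if (error instanceof Error && error.name === "AbortError") throw error;
      if (attempt < maxAttempts - 1) {
        // Waits 500ms, 1000ms, 2000ms, ... between attempts.
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}

// Usage sketch:
// await withRetry(() => provider.stream(config, request));
```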

Provider Selection

Get Provider Definitions

import { getProviderDefinitions } from './llm/providers';

const providers = getProviderDefinitions();
// LLMProviderDefinition[]

providers.forEach((provider) => {
  console.log(`${provider.label}: ${provider.id}`);
  console.log(`  Kind: ${provider.kind}`);
  console.log(`  Default: ${provider.defaultModel}`);
  console.log(`  Requires API Key: ${provider.requiresApiKey}`);
});
Output:
Remote (OpenAI-compatible): openai-compatible
  Kind: remote
  Default: gpt-4o-mini
  Requires API Key: true

Local (Ollama): ollama
  Kind: local
  Default: llama3.1:8b
  Requires API Key: false

Local (LM Studio): lmstudio
  Kind: local
  Default: local-model
  Requires API Key: false

Get Provider by ID

import { getProviderById } from './llm/providers';
import type { LLMStreamProvider } from './llm/types';

const provider: LLMStreamProvider = getProviderById('openai-compatible');

console.log(provider.definition.label);
// "Remote (OpenAI-compatible)"

await provider.stream(config, request);

Usage Examples

OpenAI

const config: LLMProviderConfig = {
  baseUrl: 'https://api.openai.com',
  model: 'gpt-4o',
  apiKey: process.env.OPENAI_API_KEY,
};

Anthropic (via OpenAI SDK)

const config: LLMProviderConfig = {
  baseUrl: 'https://api.anthropic.com',
  model: 'claude-3-5-sonnet-20241022',
  apiKey: process.env.ANTHROPIC_API_KEY,
};
Note: Requires a proxy that translates requests to the OpenAI chat-completions format

OpenRouter

const config: LLMProviderConfig = {
  baseUrl: 'https://openrouter.ai/api',
  model: 'anthropic/claude-3.5-sonnet',
  apiKey: process.env.OPENROUTER_API_KEY,
};

Azure OpenAI

const config: LLMProviderConfig = {
  baseUrl: 'https://YOUR_RESOURCE.openai.azure.com',
  model: 'gpt-4o',
  apiKey: process.env.AZURE_OPENAI_API_KEY,
};
Note: Azure OpenAI uses deployment-based endpoint paths and an api-version query parameter, so the default /v1/chat/completions path may need adjustment

LM Studio Local

const config: LLMProviderConfig = {
  baseUrl: 'http://localhost:1234',
  model: 'lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF',
};

Cancellation

const controller = new AbortController();

const request: LLMStreamRequest = {
  prompt: 'Long generation task...',
  contextSnippets: [],
  signal: controller.signal,
  onToken: (token) => console.log(token),
};

// Start streaming
const streamPromise = provider.stream(config, request);

// Cancel after 5 seconds
setTimeout(() => {
  controller.abort();
  console.log('Generation cancelled');
}, 5000);

try {
  await streamPromise;
} catch (error) {
  if (error instanceof Error && error.name === 'AbortError') {
    console.log('User cancelled generation');
  }
}

Type Definitions

LLMProviderId

From src/llm/types.ts:3:
type LLMProviderId = "openai-compatible" | "ollama" | "lmstudio" | "webllm";

LLMProviderDefinition

From src/llm/types.ts:5-12:
type LLMProviderDefinition = {
  id: LLMProviderId;
  label: string;
  kind: LLMProviderKind;
  defaultBaseUrl: string;
  defaultModel: string;
  requiresApiKey: boolean;
};

LLMStreamProvider

From src/llm/types.ts:29-32:
type LLMStreamProvider = {
  definition: LLMProviderDefinition;
  stream: (config: LLMProviderConfig, request: LLMStreamRequest) => Promise<void>;
};

ProviderMessage

From src/llm/providers.ts:130-133:
type ProviderMessage = {
  role: "system" | "user";
  content: string;
};
Note: Assistant messages are not used; responses are only streamed back, never appended to the message history

Best Practices

  1. Always validate API keys before making requests
  2. Use AbortController for user cancellation
  3. Handle network errors gracefully (connection failures, timeouts)
  4. Respect rate limits - implement backoff for production
  5. Validate response status before parsing SSE
  6. Sanitize context snippets - limit to 8 most relevant
  7. Test with different models - behavior varies by provider
  8. Set appropriate timeouts - default fetch timeout is browser-dependent
  9. Monitor token usage - track costs for metered APIs
  10. Fall back to smaller models on errors (e.g., gpt-4o → gpt-4o-mini)
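Points 2 and 8 can be combined: pair a user-driven AbortController with a hard timeout. A minimal sketch, assuming a runtime with AbortSignal.any and AbortSignal.timeout (Node 20+ or a modern browser):

```typescript
const userController = new AbortController();

// Aborts when either the user cancels or 30 seconds elapse,
// whichever happens first.
const signal = AbortSignal.any([
  userController.signal,
  AbortSignal.timeout(30_000),
]);

// Pass the combined signal in the stream request, e.g.:
// const request: LLMStreamRequest = { prompt, contextSnippets: [], signal, onToken };

// userController.abort() still works for manual cancellation.
```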

Common Issues

CORS Errors (LM Studio)

Ensure LM Studio CORS settings allow your origin:
{
  "cors.allowedOrigins": ["http://localhost:5173"]
}

Invalid API Key

if (!response.ok) {
  if (response.status === 401) {
    throw new Error('Invalid API key');
  }
  if (response.status === 403) {
    throw new Error('API key lacks required permissions');
  }
}

Rate Limiting

if (response.status === 429) {
  const retryAfter = response.headers.get('Retry-After');
  throw new Error(`Rate limited. Retry after ${retryAfter}s`);
}

Model Not Found

if (response.status === 404) {
  throw new Error(`Model "${config.model}" not found`);
}