
Overview

Remote LLM providers enable chat functionality through external APIs. The system supports any OpenAI-compatible API (OpenAI, Anthropic via proxies, OpenRouter, etc.) and local LM Studio servers.

Provider Definitions

OpenAI-Compatible

From src/llm/providers.ts:155-162:
{
  id: "openai-compatible",
  label: "Remote (OpenAI-compatible)",
  kind: "remote",
  defaultBaseUrl: "https://api.openai.com",
  defaultModel: "gpt-4o-mini",
  requiresApiKey: true,
}
Fields:
  • id (string): Provider identifier: "openai-compatible"
  • label (string): Display label: "Remote (OpenAI-compatible)"
  • kind (string): Provider type: "remote"
  • defaultBaseUrl (string): Default: "https://api.openai.com"
  • defaultModel (string): Default: "gpt-4o-mini"
  • requiresApiKey (boolean): true; an API key is required

LM Studio

From src/llm/providers.ts:171-178:
{
  id: "lmstudio",
  label: "Local (LM Studio)",
  kind: "local",
  defaultBaseUrl: "http://localhost:1234",
  defaultModel: "local-model",
  requiresApiKey: false,
}
Fields:
  • id (string): Provider identifier: "lmstudio"
  • label (string): Display label: "Local (LM Studio)"
  • kind (string): Provider type: "local" (runs on localhost)
  • defaultBaseUrl (string): Default: "http://localhost:1234"
  • defaultModel (string): Default: "local-model"
  • requiresApiKey (boolean): false; no API key is required

Configuration

LLMProviderConfig

From src/llm/types.ts:22-27:
type LLMProviderConfig = {
  baseUrl: string;
  model: string;
  apiKey?: string;
  allowModelDownload?: boolean; // WebLLM only
};
Fields:
  • baseUrl (string, required): API base URL (e.g., "https://api.openai.com")
  • model (string, required): Model identifier (e.g., "gpt-4o-mini", "claude-3-5-sonnet-20241022")
  • apiKey (string, optional): API key for authentication. Required for remote providers.
  • allowModelDownload (boolean, optional): Only used by the WebLLM provider; ignored by remote and LM Studio providers.

LLMStreamRequest

From src/llm/types.ts:14-20:
type LLMStreamRequest = {
  prompt: string;
  contextSnippets: string[];
  signal: AbortSignal;
  onToken: (token: string) => void;
  onInitProgress?: (progress: number, text: string) => void;
};
Fields:
  • prompt (string, required): User query or prompt
  • contextSnippets (string[], required): Context chunks to include (at most 8 are used; see TOP_K_LIMIT)
  • signal (AbortSignal, required): Abort signal to cancel the request
  • onToken ((token: string) => void, required): Callback invoked for each streamed token
  • onInitProgress ((progress: number, text: string) => void, optional): Progress callback (WebLLM only; ignored by remote providers)

Streaming API

OpenAI-Compatible Stream

From src/llm/providers.ts:190-212:
import { getProviderById } from './llm/providers';
import type { LLMProviderConfig, LLMStreamRequest } from './llm/types';

const provider = getProviderById('openai-compatible');

const config: LLMProviderConfig = {
  baseUrl: 'https://api.openai.com',
  model: 'gpt-4o-mini',
  apiKey: process.env.OPENAI_API_KEY,
};

const controller = new AbortController();

const request: LLMStreamRequest = {
  prompt: 'Explain semantic search',
  contextSnippets: [
    'Semantic search uses embeddings...',
    'Vector databases store embeddings...',
  ],
  signal: controller.signal,
  onToken: (token) => {
    process.stdout.write(token);
  },
};

await provider.stream(config, request);
Endpoint: POST {baseUrl}/v1/chat/completions
Request Headers:
{
  "Content-Type": "application/json",
  "Authorization": "Bearer YOUR_API_KEY"
}
Request Body:
{
  "model": "gpt-4o-mini",
  "stream": true,
  "messages": [
    {
      "role": "system",
      "content": "You are a recommendation assistant for GitHub starred repositories. Use only provided context and be concise."
    },
    {
      "role": "user",
      "content": "Explain semantic search\n\nContext 1:\nSemantic search uses embeddings...\n\nContext 2:\nVector databases store embeddings..."
    }
  ]
}
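The endpoint, headers, and body above can be assembled with a small helper. The following is a minimal sketch of the request shape; `buildChatRequest` is a hypothetical illustration, not a function exported by `src/llm/providers.ts`:

```typescript
type ProviderMessage = { role: "system" | "user"; content: string };

// Hypothetical helper illustrating the request shape the provider sends.
function buildChatRequest(
  baseUrl: string,
  model: string,
  messages: ProviderMessage[],
  apiKey?: string,
): { url: string; headers: Record<string, string>; body: string } {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (apiKey) {
    // Remote providers require the Bearer token; LM Studio omits it.
    headers["Authorization"] = `Bearer ${apiKey}`;
  }
  return {
    url: `${baseUrl}/v1/chat/completions`,
    headers,
    body: JSON.stringify({ model, stream: true, messages }),
  };
}

const req = buildChatRequest(
  "https://api.openai.com",
  "gpt-4o-mini",
  [{ role: "user", content: "Explain semantic search" }],
  "YOUR_API_KEY",
);
// req.url === "https://api.openai.com/v1/chat/completions"
```

The result maps directly onto a `fetch(req.url, { method: "POST", headers: req.headers, body: req.body, signal })` call.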
Response: Server-Sent Events (SSE), for example:
data: {"choices":[{"delta":{"content":"Semantic"}}]}

data: {"choices":[{"delta":{"content":" search"}}]}

data: {"choices":[{"delta":{"content":" is"}}]}

data: [DONE]

LM Studio Stream

From src/llm/providers.ts:237-259:
const provider = getProviderById('lmstudio');

const config: LLMProviderConfig = {
  baseUrl: 'http://localhost:1234',
  model: 'local-model', // Model name from LM Studio
  // No apiKey needed
};

const request: LLMStreamRequest = {
  prompt: 'What are vector embeddings?',
  contextSnippets: [],
  signal: new AbortController().signal,
  onToken: (token) => console.log(token),
};

await provider.stream(config, request);
Endpoint: POST http://localhost:1234/v1/chat/completions
Request Format: Same as OpenAI-compatible (no Authorization header)
Response: Server-Sent Events (SSE), parsed with the same logic

Message Building

From src/llm/providers.ts:135-147:
function buildMessages(prompt: string, snippets: string[]): ProviderMessage[] {
  return [
    {
      role: "system",
      content:
        "You are a recommendation assistant for GitHub starred repositories. Use only provided context and be concise.",
    },
    {
      role: "user",
      content: `${prompt}\n\n${buildContextBlock(snippets)}`,
    },
  ];
}

Context Formatting

From src/llm/providers.ts:16-21:
const TOP_K_LIMIT = 8;

function buildContextBlock(snippets: string[]): string {
  return snippets
    .slice(0, TOP_K_LIMIT)
    .map((snippet, index) => `Context ${index + 1}:\n${snippet}`)
    .join("\n\n");
}
Example Output:
Context 1:
First snippet content here...

Context 2:
Second snippet content here...

Context 3:
Third snippet content here...
Only the first 8 snippets are included.

Stream Parsing

SSE Parser

From src/llm/providers.ts:31-80:
async function parseSseStream(
  response: Response,
  onToken: (token: string) => void,
): Promise<void> {
  if (!response.body) {
    throw new Error("Streaming response body is not available");
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? "";

    for (const line of lines) {
      const trimmed = line.trim();
      if (!trimmed.startsWith("data:")) continue;

      const raw = trimmed.slice(5).trim();
      if (raw === "[DONE]") return;
      if (!raw) continue;

      try {
        const payload = JSON.parse(raw);
        const token =
          payload.choices?.[0]?.delta?.content ??
          payload.choices?.[0]?.message?.content;

        if (token) {
          onToken(token);
        }
      } catch {
        // Skip malformed SSE lines without killing the stream
      }
    }
  }
}
Extracts tokens from:
  • choices[0].delta.content (streaming format)
  • choices[0].message.content (alternative format)
Handles:
  • data: [DONE] - End marker
  • Malformed JSON - Skips silently
  • Empty data lines - Ignores
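The extraction rules above can be exercised in isolation. The sketch below applies the same `delta`/`message` fallback and `[DONE]` handling to a few illustrative SSE lines (the sample payloads are made up for the example, not taken from the source):

```typescript
// Standalone illustration of the token-extraction rules used by parseSseStream.
function extractTokens(lines: string[]): string[] {
  const tokens: string[] = [];
  for (const line of lines) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue;
    const raw = trimmed.slice(5).trim();
    if (raw === "[DONE]") break; // end marker stops parsing
    if (!raw) continue; // empty data lines are ignored
    try {
      const payload = JSON.parse(raw);
      const token =
        payload.choices?.[0]?.delta?.content ?? // streaming format
        payload.choices?.[0]?.message?.content; // alternative format
      if (token) tokens.push(token);
    } catch {
      // Malformed JSON is skipped silently, matching parseSseStream.
    }
  }
  return tokens;
}

const tokens = extractTokens([
  'data: {"choices":[{"delta":{"content":"Semantic"}}]}',
  "data: not-json", // skipped silently
  'data: {"choices":[{"message":{"content":" search"}}]}',
  "data: [DONE]",
]);
// tokens → ["Semantic", " search"]
```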

Error Handling

Format Provider Error

From src/llm/providers.ts:295-319:
import { formatProviderError } from './llm/providers';

try {
  await provider.stream(config, request);
} catch (error) {
  const userMessage = formatProviderError(error, provider.definition.kind);
  console.error(userMessage);
}
Error Messages:
  • AbortError: "Generation cancelled."
  • "Failed to fetch" (local providers): "Local provider unreachable. Ensure Ollama/LM Studio is running and CORS/network access allows localhost calls."
  • NetworkError (local providers): same message as above
  • Other errors: the original error message is returned

HTTP Error Handling

From src/llm/providers.ts:207-209:
if (!response.ok) {
  throw new Error(`Provider request failed (${response.status})`);
}
No automatic retry logic - errors propagate immediately.
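If retries are wanted, the caller must add them. A minimal exponential-backoff sketch is shown below; `withRetry` is a hypothetical helper, not part of `src/llm/providers.ts`. Note that retrying after tokens have already streamed would repeat partial output, so in practice this is most useful for failures on the initial connection:

```typescript
// Hypothetical retry wrapper with exponential backoff.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      // Never retry a user cancellation.
      if (error instanceof Error && error.name === "AbortError") throw error;
      if (attempt < maxAttempts - 1) {
        // Waits 500ms, 1000ms, 2000ms, ... between attempts.
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}

// Usage sketch:
// await withRetry(() => provider.stream(config, request));
```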

Provider Selection

Get Provider Definitions

import { getProviderDefinitions } from './llm/providers';

const providers = getProviderDefinitions();
// LLMProviderDefinition[]

providers.forEach((provider) => {
  console.log(`${provider.label}: ${provider.id}`);
  console.log(`  Kind: ${provider.kind}`);
  console.log(`  Default: ${provider.defaultModel}`);
  console.log(`  Requires API Key: ${provider.requiresApiKey}`);
});
Output:
Remote (OpenAI-compatible): openai-compatible
  Kind: remote
  Default: gpt-4o-mini
  Requires API Key: true

Local (Ollama): ollama
  Kind: local
  Default: llama3.1:8b
  Requires API Key: false

Local (LM Studio): lmstudio
  Kind: local
  Default: local-model
  Requires API Key: false

Get Provider by ID

import { getProviderById } from './llm/providers';
import type { LLMStreamProvider } from './llm/types';

const provider: LLMStreamProvider = getProviderById('openai-compatible');

console.log(provider.definition.label);
// "Remote (OpenAI-compatible)"

await provider.stream(config, request);

Usage Examples

OpenAI

const config: LLMProviderConfig = {
  baseUrl: 'https://api.openai.com',
  model: 'gpt-4o',
  apiKey: process.env.OPENAI_API_KEY,
};

Anthropic (via OpenAI SDK)

const config: LLMProviderConfig = {
  baseUrl: 'https://api.anthropic.com',
  model: 'claude-3-5-sonnet-20241022',
  apiKey: process.env.ANTHROPIC_API_KEY,
};
Note: Requires a proxy that translates requests to the OpenAI chat-completions format

OpenRouter

const config: LLMProviderConfig = {
  baseUrl: 'https://openrouter.ai/api',
  model: 'anthropic/claude-3.5-sonnet',
  apiKey: process.env.OPENROUTER_API_KEY,
};

Azure OpenAI

const config: LLMProviderConfig = {
  baseUrl: 'https://YOUR_RESOURCE.openai.azure.com',
  model: 'gpt-4o',
  apiKey: process.env.AZURE_OPENAI_API_KEY,
};
Note: Azure OpenAI uses deployment-based endpoint paths and an api-version query parameter, so the default /v1/chat/completions path may need adjustment

LM Studio Local

const config: LLMProviderConfig = {
  baseUrl: 'http://localhost:1234',
  model: 'lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF',
};

Cancellation

const controller = new AbortController();

const request: LLMStreamRequest = {
  prompt: 'Long generation task...',
  contextSnippets: [],
  signal: controller.signal,
  onToken: (token) => console.log(token),
};

// Start streaming
const streamPromise = provider.stream(config, request);

// Cancel after 5 seconds
setTimeout(() => {
  controller.abort();
  console.log('Generation cancelled');
}, 5000);

try {
  await streamPromise;
} catch (error) {
  if (error instanceof Error && error.name === 'AbortError') {
    console.log('User cancelled generation');
  }
}

Type Definitions

LLMProviderId

From src/llm/types.ts:3:
type LLMProviderId = "openai-compatible" | "ollama" | "lmstudio" | "webllm";

LLMProviderDefinition

From src/llm/types.ts:5-12:
type LLMProviderDefinition = {
  id: LLMProviderId;
  label: string;
  kind: LLMProviderKind;
  defaultBaseUrl: string;
  defaultModel: string;
  requiresApiKey: boolean;
};

LLMStreamProvider

From src/llm/types.ts:29-32:
type LLMStreamProvider = {
  definition: LLMProviderDefinition;
  stream: (config: LLMProviderConfig, request: LLMStreamRequest) => Promise<void>;
};

ProviderMessage

From src/llm/providers.ts:130-133:
type ProviderMessage = {
  role: "system" | "user";
  content: string;
};
Note: Assistant messages are not used; responses are only streamed back, never appended to the message history

Best Practices

  1. Always validate API keys before making requests
  2. Use AbortController for user cancellation
  3. Handle network errors gracefully (connection failures, timeouts)
  4. Respect rate limits - implement backoff for production
  5. Validate response status before parsing SSE
  6. Sanitize context snippets - limit to 8 most relevant
  7. Test with different models - behavior varies by provider
  8. Set appropriate timeouts - default fetch timeout is browser-dependent
  9. Monitor token usage - track costs for metered APIs
  10. Fall back to smaller models on errors (e.g., gpt-4o → gpt-4o-mini)
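Points 2 and 8 can be combined: pair a user-driven AbortController with a hard timeout. A minimal sketch, assuming a runtime with AbortSignal.any and AbortSignal.timeout (Node 20+ or a modern browser):

```typescript
const userController = new AbortController();

// Aborts when either the user cancels or 30 seconds elapse,
// whichever happens first.
const signal = AbortSignal.any([
  userController.signal,
  AbortSignal.timeout(30_000),
]);

// Pass the combined signal in the stream request, e.g.:
// const request: LLMStreamRequest = { prompt, contextSnippets: [], signal, onToken };

// userController.abort() still works for manual cancellation.
```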

Common Issues

CORS Errors (LM Studio)

Ensure LM Studio CORS settings allow your origin:
{
  "cors.allowedOrigins": ["http://localhost:5173"]
}

Invalid API Key

if (!response.ok) {
  if (response.status === 401) {
    throw new Error('Invalid API key');
  }
  if (response.status === 403) {
    throw new Error('API key lacks required permissions');
  }
}

Rate Limiting

if (response.status === 429) {
  const retryAfter = response.headers.get('Retry-After');
  throw new Error(`Rate limited. Retry after ${retryAfter}s`);
}

Model Not Found

if (response.status === 404) {
  throw new Error(`Model "${config.model}" not found`);
}