Overview
Remote LLM providers enable chat functionality through external APIs. The system supports any OpenAI-compatible API (OpenAI, Anthropic via proxies, OpenRouter, etc.) and local LM Studio servers.
Provider Definitions
OpenAI-Compatible
From src/llm/providers.ts:155-162:
{
  id: "openai-compatible",
  label: "Remote (OpenAI-compatible)",
  kind: "remote",
  defaultBaseUrl: "https://api.openai.com",
  defaultModel: "gpt-4o-mini",
  requiresApiKey: true,
}
Provider identifier: "openai-compatible"
Provider type: "remote"
Default base URL: "https://api.openai.com"
LM Studio
From src/llm/providers.ts:171-178:
{
  id: "lmstudio",
  label: "Local (LM Studio)",
  kind: "local",
  defaultBaseUrl: "http://localhost:1234",
  defaultModel: "local-model",
  requiresApiKey: false,
}
Provider identifier: "lmstudio"
Provider type: "local" (runs on localhost)
Default base URL: "http://localhost:1234"
Configuration
LLMProviderConfig
From src/llm/types.ts:22-27:
type LLMProviderConfig = {
  baseUrl: string;
  model: string;
  apiKey?: string;
  allowModelDownload?: boolean; // WebLLM only
};
baseUrl - API base URL (e.g., "https://api.openai.com")
model - Model identifier (e.g., "gpt-4o-mini", "claude-3-5-sonnet-20241022")
apiKey - API key for authentication. Required for remote providers.
allowModelDownload - Used only by the WebLLM provider; ignored for remote/LM Studio.
LLMStreamRequest
From src/llm/types.ts:14-20:
type LLMStreamRequest = {
  prompt: string;
  contextSnippets: string[];
  signal: AbortSignal;
  onToken: (token: string) => void;
  onInitProgress?: (progress: number, text: string) => void;
};
prompt - The user prompt text
contextSnippets - Context chunks to include (max 8 used, see TOP_K_LIMIT)
signal - Abort signal to cancel the request
onToken - (token: string) => void, required. Callback invoked for each streamed token
onInitProgress - (progress: number, text: string) => void, optional. Initialization progress callback (WebLLM only, ignored by remote providers)
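Because onToken receives incremental tokens, a common pattern is to accumulate them into the full response string. A minimal sketch; makeCollector is an illustrative helper, not part of the documented API:

```typescript
// Illustrative pattern: accumulate streamed tokens into the full response.
// makeCollector is hypothetical, not part of the module's API.
function makeCollector() {
  let text = "";
  return {
    // Pass this function as LLMStreamRequest.onToken
    onToken: (token: string): void => {
      text += token;
    },
    // Read the accumulated response once provider.stream() resolves
    result: (): string => text,
  };
}
```

After `await provider.stream(config, { ...request, onToken: collector.onToken })` resolves, `collector.result()` holds the complete completion.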
Streaming API
OpenAI-Compatible Stream
From src/llm/providers.ts:190-212:
import { getProviderById } from './llm/providers';
import type { LLMProviderConfig, LLMStreamRequest } from './llm/types';

const provider = getProviderById('openai-compatible');

const config: LLMProviderConfig = {
  baseUrl: 'https://api.openai.com',
  model: 'gpt-4o-mini',
  apiKey: process.env.OPENAI_API_KEY,
};

const controller = new AbortController();

const request: LLMStreamRequest = {
  prompt: 'Explain semantic search',
  contextSnippets: [
    'Semantic search uses embeddings...',
    'Vector databases store embeddings...',
  ],
  signal: controller.signal,
  onToken: (token) => {
    process.stdout.write(token);
  },
};

await provider.stream(config, request);
Endpoint: POST {baseUrl}/v1/chat/completions
Request Headers:
{
  "Content-Type": "application/json",
  "Authorization": "Bearer YOUR_API_KEY"
}
Request Body:
{
  "model": "gpt-4o-mini",
  "stream": true,
  "messages": [
    {
      "role": "system",
      "content": "You are a recommendation assistant for GitHub starred repositories. Use only provided context and be concise."
    },
    {
      "role": "user",
      "content": "Explain semantic search\n\nContext 1:\nSemantic search uses embeddings...\n\nContext 2:\nVector databases store embeddings..."
    }
  ]
}
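The documented endpoint, headers, and body shape can be sketched as a plain request builder. This is illustrative only; buildChatRequest is a hypothetical helper, and the real request construction lives in src/llm/providers.ts:

```typescript
// Sketch of the documented request shape; not the actual implementation.
type ChatMessage = { role: "system" | "user"; content: string };

function buildChatRequest(
  config: { baseUrl: string; model: string; apiKey?: string },
  messages: ChatMessage[],
): { url: string; headers: Record<string, string>; body: string } {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
  };
  if (config.apiKey) {
    // Remote providers expect a bearer token; LM Studio needs no auth header
    headers.Authorization = `Bearer ${config.apiKey}`;
  }
  return {
    url: `${config.baseUrl}/v1/chat/completions`,
    headers,
    body: JSON.stringify({ model: config.model, stream: true, messages }),
  };
}
```

The result maps directly onto `fetch(url, { method: "POST", headers, body, signal })`.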
Response: Server-Sent Events (SSE)
From src/llm/providers.ts:31-80:
data: {"choices":[{"delta":{"content":"Semantic"}}]}
data: {"choices":[{"delta":{"content":" search"}}]}
data: {"choices":[{"delta":{"content":" is"}}]}
data: [DONE]
LM Studio Stream
From src/llm/providers.ts:237-259:
const provider = getProviderById('lmstudio');

const config: LLMProviderConfig = {
  baseUrl: 'http://localhost:1234',
  model: 'local-model', // Model name from LM Studio
  // No apiKey needed
};

const request: LLMStreamRequest = {
  prompt: 'What are vector embeddings?',
  contextSnippets: [],
  signal: new AbortController().signal,
  onToken: (token) => console.log(token),
};

await provider.stream(config, request);
Endpoint: POST http://localhost:1234/v1/chat/completions
Request Format: Same as OpenAI-compatible (no Authorization header)
Response: Server-Sent Events (SSE) - same parsing logic
Message Building
From src/llm/providers.ts:135-147:
function buildMessages(prompt: string, snippets: string[]): ProviderMessage[] {
  return [
    {
      role: "system",
      content:
        "You are a recommendation assistant for GitHub starred repositories. Use only provided context and be concise.",
    },
    {
      role: "user",
      content: `${prompt}\n\n${buildContextBlock(snippets)}`,
    },
  ];
}
Context Formatting
From src/llm/providers.ts:16-21:
const TOP_K_LIMIT = 8;

function buildContextBlock(snippets: string[]): string {
  return snippets
    .slice(0, TOP_K_LIMIT)
    .map((snippet, index) => `Context ${index + 1}:\n${snippet}`)
    .join("\n\n");
}
Example Output:
Context 1:
First snippet content here...
Context 2:
Second snippet content here...
Context 3:
Third snippet content here...
Only the first 8 snippets are included.
Stream Parsing
SSE Parser
From src/llm/providers.ts:31-80:
async function parseSseStream(
  response: Response,
  onToken: (token: string) => void,
): Promise<void> {
  if (!response.body) {
    throw new Error("Streaming response body is not available");
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? "";

    for (const line of lines) {
      const trimmed = line.trim();
      if (!trimmed.startsWith("data:")) continue;

      const raw = trimmed.slice(5).trim();
      if (raw === "[DONE]") return;
      if (!raw) continue;

      try {
        const payload = JSON.parse(raw);
        const token =
          payload.choices?.[0]?.delta?.content ??
          payload.choices?.[0]?.message?.content;
        if (token) {
          onToken(token);
        }
      } catch {
        // Skip malformed SSE lines without killing the stream
      }
    }
  }
}
Extracts tokens from:
choices[0].delta.content (streaming format)
choices[0].message.content (alternative format)
Handles:
data: [DONE] - End marker
Malformed JSON - Skipped silently
Empty data lines - Ignored
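This parsing behavior can be exercised without a live provider by constructing a Response from a string. parseSse below is a condensed, illustrative copy of the parsing logic (delta-content path only), and it assumes a runtime with the global Response, e.g. Node 18+ or a browser:

```typescript
// Condensed copy of the SSE parsing logic, for illustration only.
async function parseSse(
  response: Response,
  onToken: (token: string) => void,
): Promise<void> {
  if (!response.body) throw new Error("Streaming response body is not available");
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? "";
    for (const line of lines) {
      const trimmed = line.trim();
      if (!trimmed.startsWith("data:")) continue;
      const raw = trimmed.slice(5).trim();
      if (raw === "[DONE]") return;
      if (!raw) continue;
      try {
        const token = JSON.parse(raw).choices?.[0]?.delta?.content;
        if (token) onToken(token);
      } catch {
        // Malformed lines are skipped, matching the real parser
      }
    }
  }
}

// Feed the parser a synthetic SSE body
const sse =
  'data: {"choices":[{"delta":{"content":"Semantic"}}]}\n' +
  'data: {"choices":[{"delta":{"content":" search"}}]}\n' +
  "data: not-json\n" + // skipped silently
  "data: [DONE]\n";
let output = "";
await parseSse(new Response(sse), (t) => (output += t));
// output === "Semantic search"
```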
Error Handling
From src/llm/providers.ts:295-319:
import { formatProviderError } from './llm/providers';

try {
  await provider.stream(config, request);
} catch (error) {
  const userMessage = formatProviderError(error, selectedProvider.kind);
  console.error(userMessage);
}
Error Messages:
Local provider failures: "Local provider unreachable. Ensure Ollama/LM Studio is running and CORS/network access allows localhost calls."
Other errors: the original error message is returned.
HTTP Error Handling
From src/llm/providers.ts:207-209:
if (!response.ok) {
  throw new Error(`Provider request failed (${response.status})`);
}
No automatic retry logic - errors propagate immediately.
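Because errors propagate immediately, callers that need resilience must add it themselves. A minimal sketch with exponential backoff; withRetry and its parameters are illustrative, not part of the module's API:

```typescript
// Hypothetical retry wrapper with exponential backoff; the error from the
// final attempt propagates, matching the no-retry behavior described above.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt < attempts - 1) {
        // Back off: 500ms, 1000ms, 2000ms, ...
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}
```

Usage: `await withRetry(() => provider.stream(config, request));`. Note that retrying a partially streamed response re-emits tokens already delivered to onToken, so retries are safest when applied before any tokens have arrived.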
Provider Selection
Get Provider Definitions
import { getProviderDefinitions } from './llm/providers';

const providers = getProviderDefinitions();
// LLMProviderDefinition[]

providers.forEach((provider) => {
  console.log(`${provider.label}: ${provider.id}`);
  console.log(`  Kind: ${provider.kind}`);
  console.log(`  Default: ${provider.defaultModel}`);
  console.log(`  Requires API Key: ${provider.requiresApiKey}`);
});
Output:
Remote (OpenAI-compatible): openai-compatible
  Kind: remote
  Default: gpt-4o-mini
  Requires API Key: true
Local (Ollama): ollama
  Kind: local
  Default: llama3.1:8b
  Requires API Key: false
Local (LM Studio): lmstudio
  Kind: local
  Default: local-model
  Requires API Key: false
Get Provider by ID
import { getProviderById } from './llm/providers';
import type { LLMStreamProvider } from './llm/types';

const provider: LLMStreamProvider = getProviderById('openai-compatible');
console.log(provider.definition.label);
// "Remote (OpenAI-compatible)"

await provider.stream(config, request);
Usage Examples
OpenAI
const config: LLMProviderConfig = {
  baseUrl: 'https://api.openai.com',
  model: 'gpt-4o',
  apiKey: process.env.OPENAI_API_KEY,
};
Anthropic (via OpenAI SDK)
const config: LLMProviderConfig = {
  baseUrl: 'https://api.anthropic.com',
  model: 'claude-3-5-sonnet-20241022',
  apiKey: process.env.ANTHROPIC_API_KEY,
};
Note: Anthropic's native API is not OpenAI-compatible, so this requires a proxy or compatibility layer that accepts OpenAI-format requests
OpenRouter
const config: LLMProviderConfig = {
  baseUrl: 'https://openrouter.ai/api',
  model: 'anthropic/claude-3.5-sonnet',
  apiKey: process.env.OPENROUTER_API_KEY,
};
Azure OpenAI
const config: LLMProviderConfig = {
  baseUrl: 'https://YOUR_RESOURCE.openai.azure.com',
  model: 'gpt-4o',
  apiKey: process.env.AZURE_OPENAI_API_KEY,
};
Note: Azure uses deployment-based endpoint paths and an api-version query parameter, so the default /v1/chat/completions path may need adjustment
LM Studio Local
const config: LLMProviderConfig = {
  baseUrl: 'http://localhost:1234',
  model: 'lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF',
};
Cancellation
const controller = new AbortController();

const request: LLMStreamRequest = {
  prompt: 'Long generation task...',
  contextSnippets: [],
  signal: controller.signal,
  onToken: (token) => console.log(token),
};

// Start streaming
const streamPromise = provider.stream(config, request);

// Cancel after 5 seconds
setTimeout(() => {
  controller.abort();
  console.log('Generation cancelled');
}, 5000);

try {
  await streamPromise;
} catch (error) {
  if (error instanceof Error && error.name === 'AbortError') {
    console.log('User cancelled generation');
  }
}
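User cancellation can also be combined with a hard timeout by merging signals. This sketch assumes a runtime that provides AbortSignal.any and AbortSignal.timeout (Node 20+, recent browsers); makeRequestSignal is an illustrative helper, not part of the API:

```typescript
// Merge user cancellation with an overall generation timeout.
// makeRequestSignal is hypothetical, not part of the module's API.
function makeRequestSignal(userSignal: AbortSignal, timeoutMs: number): AbortSignal {
  // Aborts when EITHER the user cancels or the timeout elapses
  return AbortSignal.any([userSignal, AbortSignal.timeout(timeoutMs)]);
}

const user = new AbortController();
const signal = makeRequestSignal(user.signal, 30_000); // 30s hard cap
// Pass `signal` as LLMStreamRequest.signal; call user.abort() for manual cancel.
```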
Type Definitions
LLMProviderId
From src/llm/types.ts:3:
type LLMProviderId = "openai-compatible" | "ollama" | "lmstudio" | "webllm";
LLMProviderDefinition
From src/llm/types.ts:5-12:
type LLMProviderDefinition = {
  id: LLMProviderId;
  label: string;
  kind: LLMProviderKind;
  defaultBaseUrl: string;
  defaultModel: string;
  requiresApiKey: boolean;
};
LLMStreamProvider
From src/llm/types.ts:29-32:
type LLMStreamProvider = {
  definition: LLMProviderDefinition;
  stream: (config: LLMProviderConfig, request: LLMStreamRequest) => Promise<void>;
};
ProviderMessage
From src/llm/providers.ts:130-133:
type ProviderMessage = {
  role: "system" | "user";
  content: string;
};
Note: Assistant messages are not used; responses are streamed to the caller rather than appended to the message history
Best Practices
- Always validate API keys before making requests
- Use AbortController for user cancellation
- Handle network errors gracefully (connection failures, timeouts)
- Respect rate limits - implement backoff for production
- Validate response status before parsing SSE
- Sanitize context snippets - limit to 8 most relevant
- Test with different models - behavior varies by provider
- Set appropriate timeouts - fetch has no built-in request timeout, so enforce one with an AbortSignal
- Monitor token usage - track costs for metered APIs
- Fallback to smaller models on errors (e.g., gpt-4o → gpt-4o-mini)
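The fallback practice above can be sketched as a loop over candidate models. streamWithFallback and the model names are illustrative, not part of the module's API:

```typescript
// Try each model in order until one stream succeeds; throw the last error
// if every candidate fails. Purely illustrative.
async function streamWithFallback(
  models: string[],
  run: (model: string) => Promise<void>,
): Promise<string> {
  let lastError: unknown = new Error("No models configured");
  for (const model of models) {
    try {
      await run(model);
      return model; // report which model served the request
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}
```

Usage: `await streamWithFallback(['gpt-4o', 'gpt-4o-mini'], (model) => provider.stream({ ...config, model }, request));`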
Common Issues
CORS Errors (LM Studio)
Ensure LM Studio CORS settings allow your origin:
{
  "cors.allowedOrigins": ["http://localhost:5173"]
}
Invalid API Key
if (!response.ok) {
  if (response.status === 401) {
    throw new Error('Invalid API key');
  }
  if (response.status === 403) {
    throw new Error('API key lacks required permissions');
  }
}
Rate Limiting
if (response.status === 429) {
  const retryAfter = response.headers.get('Retry-After');
  throw new Error(`Rate limited. Retry after ${retryAfter}s`);
}
Model Not Found
if (response.status === 404) {
  throw new Error(`Model "${config.model}" not found`);
}
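The status checks above can be folded into a single helper that runs before SSE parsing. describeHttpError is illustrative; the real code throws the generic error shown under HTTP Error Handling:

```typescript
// Map common HTTP status codes to the user-facing messages shown above.
// Illustrative only; not part of the module's API.
function describeHttpError(status: number, model: string, retryAfter?: string): string {
  switch (status) {
    case 401:
      return "Invalid API key";
    case 403:
      return "API key lacks required permissions";
    case 404:
      return `Model "${model}" not found`;
    case 429:
      return `Rate limited. Retry after ${retryAfter ?? "unknown"}s`;
    default:
      // Fall back to the generic message the real code throws
      return `Provider request failed (${status})`;
  }
}
```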