Overview
The Ollama client provides local embedding generation and LLM chat capabilities through the Ollama server. It supports both the /api/embed (batch) and /api/embeddings (single) endpoints with automatic fallback detection.
OllamaEmbeddingClient
Constructor
From src/embeddings/ollamaClient.ts:97-116:
import { OllamaEmbeddingClient } from './embeddings/ollamaClient';
const client = new OllamaEmbeddingClient({
baseUrl: 'http://localhost:11434',
model: 'nomic-embed-text',
timeoutMs: 30000,
});
baseUrl - Ollama server URL. Must use localhost, 127.0.0.1, or [::1] for security.
model - Embedding model name (e.g., "nomic-embed-text", "mxbai-embed-large").
timeoutMs - Request timeout in milliseconds. Minimum: 1000ms.
Throws:
Error("Ollama endpoint must use localhost / 127.0.0.1 / [::1]") - Non-local URL
Error("Ollama embedding model is required") - Empty model name
Embedding Methods
embedBatch
From src/embeddings/ollamaClient.ts:189-242:
const texts = [
'First document content',
'Second document content',
'Third document content',
];
const embeddings = await client.embedBatch(texts);
// Float32Array[] - one vector per input text
console.log(embeddings.length); // 3
console.log(embeddings[0]); // Float32Array [0.123, -0.456, ...]
texts - Array of text strings to embed. Returns an empty array if the input is empty.
Returns: Promise<Float32Array[]>
Behavior:
- Automatically detects the endpoint (/api/embed vs /api/embeddings)
- /api/embed: batches all texts in a single request
- /api/embeddings: sends one request per text sequentially
- Validates response vector count matches input count
Throws:
Error("Ollama request timed out after Xms")
Error("Ollama /api/embed failed: <reason>")
Error("Ollama /api/embeddings failed: <reason>")
Error("invalid embedding vector format")
Error("embedding vector contains non-numeric values")
Health & Probing
healthCheck
From src/embeddings/ollamaClient.ts:167-176:
const availableModels = await client.healthCheck();
// ['nomic-embed-text', 'llama3.1:8b', 'mxbai-embed-large']
Returns: Promise<string[]> - List of available model names
Queries /api/tags and parses the models array.
Throws:
Error("Ollama health check failed: <reason>")
probeRuntime
From src/embeddings/ollamaClient.ts:178-187:
const runtime = await client.probeRuntime();
// {
// baseUrl: 'http://localhost:11434',
// model: 'nomic-embed-text',
// endpoint: 'embed',
// availableModels: ['nomic-embed-text', ...]
// }
Returns: Promise<OllamaEmbeddingRuntimeInfo>
endpoint - Detected endpoint type (batch vs single)
availableModels - Models discovered via /api/tags
Performs both health check and endpoint detection.
Endpoint Detection
From src/embeddings/ollamaClient.ts:136-165:
The client automatically detects which endpoint to use:
- First request: probes /api/embed with {"model": "...", "input": ["probe"]}
- HTTP 404 or 405: falls back to /api/embeddings
- HTTP 200: uses /api/embed (batch endpoint)
- Other errors: throws with reason
Endpoint is cached after first detection.
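The fallback rule above can be sketched as a small decision function. This is an illustrative reconstruction of the behavior described, not the actual implementation in ollamaClient.ts; the function name is hypothetical.

```typescript
type OllamaEmbeddingEndpoint = "embed" | "embeddings";

// Decide which embedding endpoint to use from the probe's HTTP status
// (hypothetical helper mirroring the documented fallback rule).
function chooseEndpoint(probeStatus: number): OllamaEmbeddingEndpoint {
  // 404/405 means /api/embed is not available on this server version;
  // fall back to the older single-text /api/embeddings endpoint.
  if (probeStatus === 404 || probeStatus === 405) return "embeddings";
  // 200 confirms the batch endpoint works.
  if (probeStatus === 200) return "embed";
  // Anything else is a real failure and should surface to the caller.
  throw new Error(`Ollama endpoint probe failed with HTTP ${probeStatus}`);
}
```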
/api/embed (Batch)
{
"model": "nomic-embed-text",
"input": ["text1", "text2", "text3"]
}
Response:
{
"embeddings": [
[0.1, 0.2, 0.3],
[0.4, 0.5, 0.6],
[0.7, 0.8, 0.9]
]
}
/api/embeddings (Single)
{
"model": "nomic-embed-text",
"prompt": "single text"
}
Response:
{
"embedding": [0.1, 0.2, 0.3]
}
Ollama LLM Provider
Provider Definition
From src/llm/providers.ts:163-170:
{
id: "ollama",
label: "Local (Ollama)",
kind: "local",
defaultBaseUrl: "http://localhost:11434",
defaultModel: "llama3.1:8b",
requiresApiKey: false,
}
Chat Streaming
From src/llm/providers.ts:214-235:
import { getProviderById } from './llm/providers';
import type { LLMProviderConfig, LLMStreamRequest } from './llm/types';
const provider = getProviderById('ollama');
const config: LLMProviderConfig = {
baseUrl: 'http://localhost:11434',
model: 'llama3.1:8b',
};
const request: LLMStreamRequest = {
prompt: 'Explain vector embeddings',
contextSnippets: ['Vector embeddings are...', 'Semantic search uses...'],
signal: new AbortController().signal,
onToken: (token) => process.stdout.write(token),
};
await provider.stream(config, request);
Endpoint: POST /api/chat
Request Format:
{
"model": "llama3.1:8b",
"stream": true,
"messages": [
{
"role": "system",
"content": "You are a recommendation assistant..."
},
{
"role": "user",
"content": "Explain vector embeddings\n\nContext 1:\nVector embeddings are..."
}
]
}
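Judging by the request example above, the user message concatenates the prompt with numbered context snippets. A sketch of that assembly, with the exact separator text inferred from the example (the "Context N:" labeling is an assumption, as is the function name):

```typescript
// Combine the prompt and context snippets into one user message,
// following the "Context N:" layout shown in the request example.
function buildUserContent(prompt: string, contextSnippets: string[]): string {
  const context = contextSnippets
    .map((snippet, i) => `Context ${i + 1}:\n${snippet}`)
    .join("\n\n");
  return context ? `${prompt}\n\n${context}` : prompt;
}
```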
Response Format: JSON Lines (NDJSON)
From src/llm/providers.ts:82-127:
{"message":{"content":"Vector"},"done":false}
{"message":{"content":" embeddings"},"done":false}
{"message":{"content":" are"},"done":false}
{"done":true}
The stream parser extracts:
message.content - Token text
response - Alternative token field
done - End of stream flag
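Each NDJSON line can be handled with a small parser over those three fields. A minimal sketch of the per-line extraction (illustrative, not the actual providers.ts code):

```typescript
// Parse one NDJSON stream line into a token and a done flag.
// Prefers message.content, falling back to the alternative `response` field.
function parseStreamLine(line: string): { token: string; done: boolean } {
  const parsed = JSON.parse(line) as {
    message?: { content?: string };
    response?: string;
    done?: boolean;
  };
  const token = parsed.message?.content ?? parsed.response ?? "";
  return { token, done: parsed.done === true };
}
```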
Usage Examples
Basic Embedding
From test: src/embeddings/ollamaClient.test.ts:20-73:
const client = new OllamaEmbeddingClient({
baseUrl: 'http://localhost:11434',
model: 'nomic-embed-text',
timeoutMs: 10000,
});
// Verify connection
const runtime = await client.probeRuntime();
console.log(`Using ${runtime.endpoint} endpoint`);
console.log(`Available models: ${runtime.availableModels.join(', ')}`);
// Generate embeddings
const vectors = await client.embedBatch(['hello', 'world']);
console.log(`Generated ${vectors.length} embeddings`);
console.log(`Dimension: ${vectors[0].length}`);
Connection Testing
From src/pages/UsagePage.tsx:1161-1181:
import { OllamaEmbeddingClient } from './embeddings/ollamaClient';
async function testOllamaConnection(
baseUrl: string,
model: string,
): Promise<string> {
try {
const client = new OllamaEmbeddingClient({
baseUrl: baseUrl.trim() || 'http://localhost:11434',
model: model.trim() || 'nomic-embed-text',
timeoutMs: 30000,
});
const runtime = await client.probeRuntime();
return `Connected (${runtime.endpoint}) · model ${runtime.model} · ${runtime.availableModels.length} models detected`;
} catch (error) {
const message = error instanceof Error ? error.message : String(error);
if (message.toLowerCase().includes('localhost')) {
return 'Ollama URL must be localhost, 127.0.0.1, or [::1].';
}
return message;
}
}
Batch Processing with Timeout
const client = new OllamaEmbeddingClient({
baseUrl: 'http://127.0.0.1:11434',
model: 'mxbai-embed-large',
timeoutMs: 60000, // 60 second timeout for large batches
});
const chunks = [
'Document chunk 1...',
'Document chunk 2...',
// ... hundreds of chunks
];
const embeddings = await client.embedBatch(chunks);
// Store in database
for (let i = 0; i < embeddings.length; i++) {
await db.insert({
text: chunks[i],
embedding: Array.from(embeddings[i]),
});
}
Type Definitions
OllamaEmbeddingRuntimeInfo
From src/embeddings/ollamaClient.ts:3-8:
type OllamaEmbeddingRuntimeInfo = {
baseUrl: string;
model: string;
endpoint: OllamaEmbeddingEndpoint;
availableModels: string[];
};
OllamaEmbeddingEndpoint
type OllamaEmbeddingEndpoint = "embed" | "embeddings";
Security Constraints
From src/embeddings/ollamaClient.ts:14-18:
const LOCAL_ENDPOINT_PATTERN = /^https?:\/\/(localhost|127\.0\.0\.1|\[::1\])(?::\d+)?$/i;
The client only accepts localhost URLs:
http://localhost:11434
http://127.0.0.1:11434
http://[::1]:11434
https://localhost:8443
This prevents accidental exposure to remote servers.
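Applying the pattern above as a standalone guard looks like this (a sketch; the validation function name is hypothetical, the regex is the one quoted from the source):

```typescript
// Only loopback origins pass; any other host is rejected before a request is made.
const LOCAL_ENDPOINT_PATTERN =
  /^https?:\/\/(localhost|127\.0\.0\.1|\[::1\])(?::\d+)?$/i;

function assertLocalEndpoint(baseUrl: string): void {
  if (!LOCAL_ENDPOINT_PATTERN.test(baseUrl)) {
    throw new Error("Ollama endpoint must use localhost / 127.0.0.1 / [::1]");
  }
}
```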
Error Handling
Timeout Errors
From src/embeddings/ollamaClient.ts:118-133:
try {
const embeddings = await client.embedBatch(texts);
} catch (error) {
if (error instanceof Error && error.message.includes('timed out')) {
console.error('Ollama request exceeded timeout');
// Retry with larger timeout or smaller batch
}
}
From src/embeddings/ollamaClient.ts:83-95:
The client parses JSON error responses:
{"error": "model not found"}
or
{"message": "invalid request"}
Falls back to the HTTP status code if parsing fails.
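That extraction behavior can be sketched as follows: try the JSON error or message fields, then fall back to the HTTP status. The function name is illustrative, not the actual helper in ollamaClient.ts:

```typescript
// Pull a human-readable reason out of an Ollama error response body,
// falling back to the HTTP status when the body is not usable JSON.
function extractErrorReason(body: string, status: number): string {
  try {
    const parsed = JSON.parse(body) as { error?: string; message?: string };
    const reason = parsed.error ?? parsed.message;
    if (typeof reason === "string" && reason.length > 0) return reason;
  } catch {
    // Body was not JSON; fall through to the status fallback.
  }
  return `HTTP ${status}`;
}
```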
Best Practices
- Always use localhost URLs for security
- Set appropriate timeouts based on batch size
- Call probeRuntime() before processing to verify connection
- Handle both endpoint types - client does this automatically
- Monitor available models via healthCheck for debugging
- Use Float32Array directly - already optimized for storage
- Batch when possible - /api/embed is more efficient than individual requests
Common Models
Embedding Models:
nomic-embed-text - General purpose embeddings
mxbai-embed-large - Higher quality, slower
all-minilm - Lightweight embeddings
Chat Models:
llama3.1:8b - Balanced performance
llama3.2:3b - Faster, lower memory
mistral:7b - Alternative architecture
qwen2.5:7b - Multilingual support
Install models via:
ollama pull nomic-embed-text
ollama pull llama3.1:8b