2025-07-29

Using OpenAI Python SDK with Local Ollama Models (and When to Opt for Alternatives)

I've been diving into how to use the official openai Python package to talk to local Ollama models—and when it makes sense to bring in abstraction layers like LiteLLM. Let me walk you through what I learned.

1. Can I use the OpenAI Python package for local Ollama models?

Yes! Since early February 2024, Ollama has supported the OpenAI Chat Completions API, exposing compatible endpoints locally. You can simply point the OpenAI client at "http://localhost:11434/v1", pass a dummy API key (the client requires the field, but Ollama never checks it), and call completions just as you would against OpenAI’s hosted API (see the Ollama blog).

from openai import OpenAI
client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused-key")

response = client.chat.completions.create(
  model="llama2",
  messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What’s the capital of France?"}
  ],
)
print(response.choices[0].message.content)

You can also do embeddings similarly:

resp = client.embeddings.create(model="llama2", input="Hello world!")
print(resp.data[0].embedding)
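
Streaming works through the same endpoint as well. Here’s a quick sketch, assuming the same client as above:

# Stream tokens as they arrive instead of waiting for the full reply
stream = client.chat.completions.create(
    model="llama2",
    messages=[{"role": "user", "content": "Write a haiku about Paris."}],
    stream=True,
)
for chunk in stream:
    # Some chunks carry no content (e.g. the final one), so guard against None
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()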

So for fairly simple local projects, the OpenAI SDK works perfectly with Ollama.

2. When should I use LiteLLM instead?

LiteLLM is a lightweight Python SDK (and optional proxy server) that provides a unified API for over 100 LLM providers, including OpenAI, Anthropic, Hugging Face, and, crucially, Ollama/local models.

Here are some benefits of using LiteLLM:
- It standardizes completions, embeddings, streaming, retries, and fallback logic (there’s a small fallback sketch just after this list)
- You can swap providers (e.g. openai/gpt-4, anthropic/claude, ollama/llama2) with no code changes
- Proxy server mode offers observability, logging, rate limiting, and cost tracking across providers (see the LiteLLM documentation)
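
To make the fallback/provider-swap idea concrete, here is a minimal sketch of a manual fallback loop. The model list is just an example (you’d need the corresponding API keys, or a running Ollama, for each entry), and the try/except loop is my own plain-Python illustration rather than LiteLLM’s built-in retry machinery:

from litellm import completion

# Candidate models in preference order; all share the same call signature.
CANDIDATE_MODELS = ["ollama/llama2", "openai/gpt-4o", "anthropic/claude-3-haiku-20240307"]

def ask_with_fallback(prompt: str) -> str:
    last_error = None
    for model in CANDIDATE_MODELS:
        try:
            resp = completion(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            return resp.choices[0].message.content
        except Exception as exc:  # any provider failure: move on to the next model
            last_error = exc
    raise RuntimeError(f"All providers failed; last error: {last_error}")

print(ask_with_fallback("Name one French cheese."))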

3. Example: using LiteLLM Python SDK

First install:

pip install litellm

Then in Python:

from litellm import completion, embedding

# Point LiteLLM at the local Ollama server (no /v1 suffix is needed with the ollama/ prefix).
# No OpenAI API key is required for local ollama/ models.
OLLAMA_BASE = "http://localhost:11434"

# Completion via Ollama
resp = completion(
    model="ollama/llama2",
    messages=[{"role": "user", "content": "Hello!"}],
    api_base=OLLAMA_BASE,
)
print(resp.choices[0].message.content)

# Embeddings via Ollama
emb = embedding(model="ollama/llama2", input=["Hello world"], api_base=OLLAMA_BASE)
print(emb.data[0]["embedding"])

Later you can switch to openai/gpt-4o or another provider by changing only the model string. You keep the same completion(...) call, with no branching logic in your app.
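
As a small illustration, the model name can simply come from configuration; the MODEL_NAME environment variable below is just something I made up for this sketch, not a LiteLLM convention:

import os
from litellm import completion

# Pick the provider/model from configuration; the calling code never changes.
model_name = os.getenv("MODEL_NAME", "ollama/llama2")  # e.g. "openai/gpt-4o"

resp = completion(model=model_name, messages=[{"role": "user", "content": "Hello!"}])
print(resp.choices[0].message.content)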

4. Alternatives to LiteLLM

There are several other frameworks you may consider, such as LangChain for more structured, agent-style pipelines, or platforms like TrueFoundry; I touch on both in the summary below.

Summary

I like using the OpenAI Python SDK with Ollama—it’s quick, reliable, and simple for local use cases. But as soon as I need to add other providers, handle retries/fallbacks, standardize embeddings across providers, or manage observability and switching logic, LiteLLM becomes more convenient. And if I’m building complex agent pipelines or need more structure, then libraries like LangChain or TrueFoundry fit right in.