
How to Create an OpenAI-Compatible Wrapper for Ollama

  • Writer: Adrian Araya
  • Apr 28
  • 3 min read

As large language models become more integral to modern applications, developers often face the challenge of switching between providers like OpenAI and local solutions such as Ollama. The OpenAI Python client has emerged as a standard for interacting with chat models, providing a simple interface and wide community support. By merely changing the base URL and API key, you can seamlessly switch between OpenAI and Ollama models while maintaining the same interface.


This is where an Ollama OpenAI wrapper becomes especially useful. It allows developers to leverage the familiar OpenAI API structure while running models locally through Ollama, reducing dependencies on external services and increasing flexibility for experimentation and deployment.


Installing Ollama


To get started with Ollama, install it using the following command:

curl -fsSL https://ollama.com/install.sh | sh

Once installed, you can pull a model by running the following (here, the llama3.2 model):

ollama pull llama3.2
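
If you want to confirm the download, you can list the models stored locally:

ollama list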

A complete list of available models can be found at https://ollama.com/search. To test a model interactively, try:

ollama run llama3.2
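
On most systems the installer also sets up the Ollama server, which listens on http://localhost:11434 by default. Assuming a default local install, you can verify that the OpenAI-compatible endpoint is reachable before writing any code:

curl http://localhost:11434/v1/models

This should return a JSON list of the models you have pulled.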

Installing OpenAI Python Client


Install the official OpenAI Python client with pip:

pip install openai
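
Before building a wrapper, it is worth a quick sanity check that the client can reach your local Ollama server. This is a minimal sketch that assumes Ollama is running on its default port; it lists the models exposed by the OpenAI-compatible endpoint:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # required by the client but ignored by Ollama
)

# Print the IDs of the locally available models
for model in client.models.list():
    print(model.id)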

Creating an Ollama Wrapper with the OpenAI Client (Python)


You can create a class that points to the local Ollama server but uses the same interface as the OpenAI client. This lets you interact with Ollama models just like OpenAI models:


from openai import OpenAI

class ChatModel:
    def __init__(self, base_url, key):
        self.client = OpenAI(
            base_url=base_url,
            api_key=key,
        )

    def chat_completion(self, model, messages):
        response = self.client.chat.completions.create(
            model=model,
            messages=messages
        )
        return response

BASE_URL = "http://localhost:11434/v1"  # Default local URL for Ollama
chatModel = ChatModel(base_url=BASE_URL, key="fake-key")  # Key is required but not used by Ollama

messages = [
    {"role": "system", "content": "You are a Jetson-based assistant."},
    {"role": "user", "content": "How can I optimize GPU usage on a Jetson Nano?"},
    {"role": "assistant", "content": "Use TensorRT for inference and disable services you don't need."},
    {"role": "user", "content": "Got it, thanks!"}
]

response = chatModel.chat_completion(model="llama3.2", messages=messages)
print(response.choices[0].message.content)
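
The same wrapper can also stream tokens as they arrive, which is useful for interactive applications. Below is a minimal sketch of a streaming method you could add to the ChatModel class above; the name stream_chat_completion is just illustrative, but the stream=True flag and the chunk format follow the standard OpenAI chat completions interface, which Ollama's endpoint mirrors:

    def stream_chat_completion(self, model, messages):
        # Request a streaming response and print tokens as they arrive
        stream = self.client.chat.completions.create(
            model=model,
            messages=messages,
            stream=True
        )
        for chunk in stream:
            if chunk.choices and chunk.choices[0].delta.content:
                print(chunk.choices[0].delta.content, end="", flush=True)
        print()

Calling chatModel.stream_chat_completion(model="llama3.2", messages=messages) then prints the reply incrementally instead of waiting for the full response.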

Using the Same Wrapper with OpenAI (Python)


To switch to OpenAI's hosted models, just change the base URL and provide a valid OpenAI API key:


from openai import OpenAI

class ChatModel:
    def __init__(self, base_url, key):
        self.client = OpenAI(
            base_url=base_url,
            api_key=key,
        )

    def chat_completion(self, model, messages):
        response = self.client.chat.completions.create(
            model=model,
            messages=messages
        )
        return response

BASE_URL = "https://api.openai.com/v1"  # OpenAI API URL
chatModel = ChatModel(base_url=BASE_URL, key="your-openai-key")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the World Series in 2020?"},
    {"role": "assistant", "content": "The LA Dodgers won in 2020."},
    {"role": "user", "content": "Where was it played?"}
]

response = chatModel.chat_completion(model="gpt-4o", messages=messages)
print(response.choices[0].message.content)

Setting Model Parameters


The chat.completions.create method supports several parameters to fine-tune model behavior. Some commonly used parameters include:


  • max_completion_tokens: Limits the number of tokens in the completion.

  • temperature: Controls randomness (higher is more random).


For example, inside the chat_completion method of the wrapper above:

response = self.client.chat.completions.create(
    model=model,
    messages=messages,
    max_completion_tokens=200,
    temperature=0.7
)

Refer to the official OpenAI documentation for a complete list of parameters.


Using Ollama with OpenAI’s JavaScript Library


First, install the OpenAI package:

npm install openai

Then, you can interact with Ollama by pointing the OpenAI library at the local Ollama server:

import OpenAI from 'openai'

const openai = new OpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'ollama', // required but unused
})

const completion = await openai.chat.completions.create({
  model: 'llama3.2',
  messages: [{ role: 'user', content: 'Why is the sky blue?' }],
})

console.log(completion.choices[0].message.content)

This approach keeps the same workflow, allowing you to switch between Ollama and OpenAI with minimal changes to your application.
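
If you want switching providers to be purely a configuration change, one common pattern is to read the base URL, API key, and model name from environment variables instead of hard-coding them. The sketch below uses illustrative variable names (CHAT_BASE_URL, CHAT_API_KEY, CHAT_MODEL) that are not part of any library:

import os
from openai import OpenAI

# Defaults point at a local Ollama install; override the environment variables to use OpenAI instead
BASE_URL = os.getenv("CHAT_BASE_URL", "http://localhost:11434/v1")
API_KEY = os.getenv("CHAT_API_KEY", "ollama")  # ignored by Ollama, required by OpenAI
MODEL = os.getenv("CHAT_MODEL", "llama3.2")

client = OpenAI(base_url=BASE_URL, api_key=API_KEY)
response = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)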


