
Support for AI Assistants other than OpenAI?

Are we going to get support for other AI Assistants like Gemini and Claude?

Comments

  • While open-source models are not as robust as hosted ones like Gemini and Claude, you can run them locally with servers that expose an OpenAI-compatible API interface. Two back-ends you can use for this are LMStudio and Ollama, and I am very partial to Ollama myself.
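
    To make "OpenAI-compatible" a bit more concrete: once one of these servers is running, anything that can speak the OpenAI REST shape only needs its base URL pointed at localhost. The rough C# sketch below assumes Ollama's default port 11434 and its OpenAI-style /v1/models listing (adjust if your server differs) and simply prints which models the local server has available; you can drop it into a scratch console app or adapt it for a LINQPad query.

        // Quick sanity check against a locally running OpenAI-compatible server.
        // Assumes Ollama's default address (http://localhost:11434) and its
        // OpenAI-style /v1/models listing endpoint - adjust if your setup differs.
        using System;
        using System.Net.Http;

        using var http = new HttpClient { BaseAddress = new Uri("http://localhost:11434") };
        string json = await http.GetStringAsync("/v1/models");   // JSON list of locally pulled models
        Console.WriteLine(json);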

    You will have the best experience if you have a powerful GPU with a lot of VRAM (an RTX 3090 with 24GB is good). Alternatively, you can run inference on the CPU if you have a decent processor and a good chunk of RAM (I would recommend 32GB or more).

    The benefits of doing this are:
    1. Your model runs locally, so everything sent to the LLM stays on your system and never goes out to the internet.
    2. You can choose from many different models, some of which are better at certain tasks than others.

    The drawbacks are:
    1. It does take significant computing resources to get good performance.
    2. Most high-end PCs can only reasonably run smaller models, so the quality is not on par with GPT-4 or Claude.

    However, even the smaller models are getting much better. In order of size (largest to smallest), Mistral AI's Codestral 22B, Microsoft's Phi-3 14B and Meta's Llama 3.1 8B models are not bad. I have played with each of them using LINQPad.

    The quick & dirty way to get started:
    1. Download and install the Ollama server beta for Windows: https://ollama.com/
    2. Run ollama pull <modelname> with the name of the model you want to use. You can search for models on the Ollama page above and copy the 'run' command, replacing 'run' with 'pull' to just download the model (similar to pulling a Docker image).
    3. Configure LINQPad settings as shown below. (Note that by default, Ollama stores models in an ".ollama" folder in your user's AppData folder. You can set a system environment variable OLLAMA_MODELS to another path if you want them placed somewhere else, such as a separate SSD.)

    Note that I am using the Llama 3.1 8B parameter model with 8-bit quantization in that screenshot, so you can pull the same one with ollama pull llama3.1:8b-instruct-q8_0. Ollama listens on port 11434 by default and supports an OpenAI-compatible chat completions interface; a minimal call looks like the sketch below.
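
    For reference, here is a rough C# sketch of what a call against that interface looks like, going straight at the endpoint with HttpClient. The /v1/chat/completions path, the port and the model tag are just the defaults described above, so adjust them to whatever you pulled; LINQPad's own integration does not need any of this, it is only to show the wire format in play.

        // Minimal chat-completion request to a local Ollama server through its
        // OpenAI-compatible endpoint. The port, path and model tag below match the
        // setup described above - change them to suit your own configuration.
        using System;
        using System.Net.Http;
        using System.Text;

        using var http = new HttpClient { BaseAddress = new Uri("http://localhost:11434") };

        string requestJson = """
        {
          "model": "llama3.1:8b-instruct-q8_0",
          "messages": [ { "role": "user", "content": "Write a C# one-liner that reverses a string." } ]
        }
        """;

        using var response = await http.PostAsync("/v1/chat/completions",
            new StringContent(requestJson, Encoding.UTF8, "application/json"));

        response.EnsureSuccessStatusCode();
        Console.WriteLine(await response.Content.ReadAsStringAsync());   // raw JSON reply

    The reply comes back in the same JSON shape an OpenAI chat completion would, which is what lets LINQPad (or any other OpenAI-aware client) point at the local server without code changes.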
