Local AI Models
SupaSidebar supports two ways to run AI models locally for tag suggestions - no cloud API key required.
Option 1: Built-in Models (MLX)
The simplest option. Models download and run directly inside SupaSidebar using Apple’s MLX framework.
Requirements: Apple Silicon Mac (M1, M2, M3, or M4)
Setup:
- Open Preferences → AI Tags
- Set AI Provider to Local AI (MLX)
- Click Download on your preferred model
- Once downloaded, click Use This to activate it
Available Models:
| Model | Size | Best For |
|---|---|---|
| Gemma 2 2B | ~1.2GB | Compact & lightweight |
| Phi-4 Mini 3.8B | ~2.2GB | Best for tagging (Recommended) |
| Qwen3 4B | ~2.2GB | Good all-round |
| Qwen 2.5 7B | ~4.3GB | Most accurate (16GB+ RAM) |
RAM Guide:
- 8GB Mac - Use models up to 3B parameters (Gemma 2 2B, Phi-4 Mini)
- 16GB+ Mac - Can use all models including Qwen 2.5 7B
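The download sizes above track a rough rule of thumb. As an illustration (our assumption, not an official SupaSidebar figure), a 4-bit-quantized model occupies roughly 0.57 bytes per parameter:

```python
# Rough rule of thumb (an assumption for illustration, not a figure from
# SupaSidebar): a 4-bit-quantized model takes roughly 0.57 bytes per
# parameter, which lines up with the download sizes in the table above.
def approx_size_gb(params_billions: float, bytes_per_param: float = 0.57) -> float:
    """Estimate the footprint of a quantized model, in GB."""
    return round(params_billions * bytes_per_param, 1)

# Phi-4 Mini has ~3.8B parameters; Qwen 2.5 7B has ~7.6B.
print(approx_size_gb(3.8))  # ≈ 2.2 GB, matching the table
print(approx_size_gb(7.6))  # ≈ 4.3 GB
```

Leave a few GB of headroom on top of the model size for the rest of the system, which is why the 7B model is only recommended on 16GB+ Macs.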
Option 2: Ollama (Any Local Model)
Use any model you want through Ollama. This gives you access to thousands of models and works on both Apple Silicon and Intel Macs.
Step 1: Install Ollama
- Go to ollama.com and download the macOS app
- Open Ollama - it runs quietly in the background
Step 2: Download a Model
Open Terminal and run:
ollama pull qwen2.5:3b

This downloads the Qwen 2.5 3B model, which we recommend for tag suggestions. You can also pull other models - any text generation model works.
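Under the hood, tag suggestions are just a request to Ollama's local HTTP API. As a sketch of the general shape (the prompt and helper below are hypothetical, not SupaSidebar's actual code), a tag-suggestion request to the `/api/generate` endpoint might look like:

```python
import json

# Hypothetical sketch - SupaSidebar's real prompt is not public. This shows
# the general shape of a tag-suggestion request against Ollama's
# /api/generate HTTP endpoint (the same server the CLI talks to).
def build_tag_request(model: str, text: str, max_tags: int = 5) -> dict:
    """Build a JSON payload asking a local model to suggest tags."""
    prompt = (
        f"Suggest up to {max_tags} short tags for the following note. "
        f"Reply with a JSON array of strings only.\n\n{text}"
    )
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,   # return one complete response instead of chunks
        "format": "json",  # ask Ollama to constrain output to valid JSON
    }

payload = build_tag_request("qwen2.5:3b", "Quarterly budget review notes")
# POST this to http://localhost:11434/api/generate to get suggestions.
print(json.dumps(payload, indent=2))
```

Any model you pull works with this same request shape; only the `model` field changes.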
Step 3: Configure in SupaSidebar
- Open Preferences → AI Tags
- Set AI Provider to Ollama (Local)
- Click Check to verify the connection
- Select your model from the dropdown (it auto-populates from Ollama)
- Enable Auto-suggest tags if you want automatic suggestions
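The dropdown can auto-populate because Ollama exposes its installed models over HTTP: a GET to `http://localhost:11434/api/tags` returns them as JSON. An illustrative sketch (not SupaSidebar's actual code) of parsing that response:

```python
import json

# Illustrative sketch: Ollama's /api/tags endpoint lists installed models.
# The sample below mimics the shape of its JSON response body.
sample_response = json.dumps({
    "models": [
        {"name": "qwen2.5:3b", "size": 1929912432},
        {"name": "phi4-mini:latest", "size": 2176178401},
    ]
})

def installed_models(raw: str) -> list[str]:
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in json.loads(raw).get("models", [])]

print(installed_models(sample_response))
# ['qwen2.5:3b', 'phi4-mini:latest']
```

You can run the same check by hand with `curl http://localhost:11434/api/tags` - if that returns your models, the Check button should succeed too.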
Recommended Ollama Models
| Model | Command | Size | Best For |
|---|---|---|---|
| Qwen 2.5 3B | ollama pull qwen2.5:3b | 1.9 GB | Best for tagging (Recommended) |
| Phi-4 Mini 3.8B | ollama pull phi4-mini | 2.2 GB | Excellent at structured output |
| Llama 3.2 3B | ollama pull llama3.2 | 2.0 GB | Balanced speed and accuracy |
| Gemma 2 2B | ollama pull gemma2:2b | 1.6 GB | Google’s efficient model |
| Qwen 2.5 1.5B | ollama pull qwen2.5:1.5b | 1.0 GB | Smallest, fastest option |
You can also use larger models for better accuracy:
ollama pull qwen2.5:7b # 4.4 GB, very accurate
ollama pull llama3.1:8b # 4.7 GB, excellent quality
Tips
- Model names are not case sensitive - qwen2.5:3b and Qwen2.5:3B both work
- Any text generation model works - if you already have models in Ollama, they’ll appear in the dropdown
- The endpoint defaults to http://localhost:11434 - only change this if you’re running Ollama on another machine
- You can use different models for Voice (in Voice settings) and Tags (in AI Tags settings)
MLX vs Ollama
| | MLX (Built-in) | Ollama |
|---|---|---|
| Setup | One-click download | Install Ollama first |
| Mac compatibility | Apple Silicon only | All Macs |
| Model choice | 4 curated models | Thousands of models |
| Best for | Quick setup | Power users, custom models |
| Memory usage | Loaded in app memory | Runs as separate process |
Our recommendation: Start with MLX + Phi-4 Mini for the easiest setup. Switch to Ollama if you want more model choices or have an Intel Mac.