Guides

A short list of guides, each one written from real time on the hardware. If you are new here, start with "Beginner's guide to local AI" or go straight to "What can I run on my GPU?"

Start here

Beginner's guide to local AI
New to local LLMs? Start here. What "local" means, what hardware you need, and how to pick a first model.
How to run LLMs locally
Three ways to get a model running on your machine: Ollama, LM Studio, llama.cpp. Walks through install + first chat.
What can I run on my GPU?
Use this when you already have a GPU and want to know which models fit.
How much VRAM do I need?
The simple formula: parameters × bytes per parameter × overhead. Worked through for Q4, Q5, Q8, and FP16.
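
A minimal sketch of that VRAM formula, for a quick estimate before opening the full guide. The bytes-per-parameter values are nominal (bits divided by 8), and the 1.2 overhead factor is an assumption for illustration, not a measured figure. Real quantized files carry extra metadata, and long contexts need more room for the KV cache. The function name and table are hypothetical.

```python
# Nominal bytes per parameter for each precision (bits / 8).
# Real quant formats use slightly more than this.
BYTES_PER_PARAM = {
    "Q4": 0.5,
    "Q5": 0.625,
    "Q8": 1.0,
    "FP16": 2.0,
}


def estimate_vram_gb(params_billion: float, quant: str, overhead: float = 1.2) -> float:
    """Estimate VRAM in GB: parameters x bytes per parameter x overhead.

    overhead=1.2 is an assumed placeholder for runtime buffers and context.
    """
    if quant not in BYTES_PER_PARAM:
        raise ValueError(f"unknown quantization: {quant}")
    # Billions of parameters times bytes per parameter gives GB directly
    # (1 GB taken as 1e9 bytes).
    return params_billion * BYTES_PER_PARAM[quant] * overhead


if __name__ == "__main__":
    for quant in BYTES_PER_PARAM:
        print(f"8B at {quant}: ~{estimate_vram_gb(8, quant):.1f} GB")
```

Treat the result as a floor rather than a guarantee: if the estimate lands right at your card's VRAM limit, expect to need a smaller quant or a shorter context.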

Best GPU for LLMs (overall)
My picks by budget and use case. Real tokens-per-second numbers where I have them.
Best 24GB VRAM GPU for LLMs
RTX 3090, 4090, 7900 XTX, A5000. Which one makes sense in 2026.
Best budget GPU for LLMs
Under $500 picks. RTX 3060 12GB still leads on $/GB VRAM.
RTX 4090 LLM guide
What it actually runs, what it does not, and how thermals behave in a real room.
RTX 3090 LLM guide
The used 3090 is the best $/GB you can buy. Caveats included.
RTX 4070 vs 4080 vs 4090
Which tier is the right break-point for you. Not always the 4090.
RTX 5090 vs 4090 for LLMs
Is the Blackwell jump worth it for local inference? Short answer: depends on workload.

Apple Silicon for LLMs
How unified memory actually performs vs discrete GPUs. M3 and M4 series.
Mac vs PC for LLMs
When a Mac wins, when a PC wins. Different workloads land in different places.

Ollama cheat sheet
Every command I actually use, with the flags I always forget.
LM Studio vs Ollama
Pick the right tool for the job. They are not interchangeable.
How to run LLMs on Windows
Native Windows + WSL2 routes. Driver gotchas included.
How to run LLMs on Mac
Ollama, LM Studio, and llama.cpp on Apple Silicon. What works best for which model size.

Quantization explained
Why Q4 fits where FP16 cannot, and where quality actually drops.
GGUF vs GPTQ
The two formats you keep seeing. Which to use when.
LLM system requirements
CPU, RAM, storage, and PSU notes most guides skip.

Llama 3.1 hardware requirements
What 8B, 70B, and 405B need. With Q4/Q8/FP16 breakdowns.
Qwen3 hardware requirements
Qwen3 is the most VRAM-efficient open model right now. Sizes that actually fit.
DeepSeek hardware requirements
DeepSeek R1 and V3, including realistic options for the 671B beast.
Phi-4 hardware requirements
Microsoft Phi-4 14B is the easiest mid-size model to run well.
Gemma 3 hardware requirements
Google Gemma 3, including the 27B that fits in 24GB VRAM at Q4.
Mistral hardware requirements
Mistral 7B, Mixtral 8x7B, and Mistral Small / Large.

Best local LLMs to run
The shortlist that actually holds up across coding, writing, and reasoning tasks.
Best LLM for coding
Which open model gets closest to GPT-4-class code completion.
Best LLM for writing locally
For drafting, editing, and tone control. Not the same as coding picks.
Best LLM for data analysis
Models that handle tabular reasoning and structured outputs well.