Guides
A short list of guides, each one written from real time on the hardware. If you are new here, start with the beginner's guide, or go to "What can I run on my GPU?" to see which models fit your card.
Start here
- Beginner's guide to local AI
New to local LLMs? Start here. What "local" means, what hardware you need, and how to pick a first model.
- How to run LLMs locally
Three ways to get a model running on your machine: Ollama, LM Studio, llama.cpp. Walks through install + first chat.
- What can I run on my GPU?
Use this when you already have a GPU and want to know which models fit.
- How much VRAM do I need?
The simple formula: parameters x bytes per parameter x overhead. Worked through for Q4, Q5, Q8, and FP16.
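As a quick illustration of the VRAM formula above, here is a minimal Python sketch. The bytes-per-parameter values are nominal (bits divided by 8), and the 1.2 overhead factor is an assumption chosen for illustration, not a number taken from the guide. Real quantized files store extra scale data, so actual usage tends to run somewhat higher. The function name `estimate_vram_gb` is hypothetical.

```python
# Rough VRAM estimate: parameters x bytes per parameter x overhead.
# Assumption: nominal bytes per parameter (bits / 8). Real quantized
# formats carry extra scale/metadata bits, so treat these as lower bounds.
BYTES_PER_PARAM = {"Q4": 0.5, "Q5": 0.625, "Q8": 1.0, "FP16": 2.0}


def estimate_vram_gb(params_billions: float, quant: str, overhead: float = 1.2) -> float:
    """Return an approximate VRAM need in GB (1 GB = 1e9 bytes).

    `overhead` is an assumed multiplier covering KV cache and runtime
    buffers; 1.2 is an illustrative default, not a measured value.
    """
    if quant not in BYTES_PER_PARAM:
        raise ValueError(f"unknown quantization: {quant!r}")
    # Billions of parameters times bytes per parameter gives GB directly.
    return params_billions * BYTES_PER_PARAM[quant] * overhead


if __name__ == "__main__":
    for quant in BYTES_PER_PARAM:
        print(f"8B at {quant}: ~{estimate_vram_gb(8, quant):.1f} GB")
```

With these assumed values, an 8B model at Q4 comes out to roughly 4.8 GB and at FP16 to roughly 19.2 GB. Longer context windows push the overhead factor up, so the default here is best read as a starting point.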
GPU buying guides
- Best GPU for LLMs (overall)
My picks by budget and use case. Real tokens-per-second numbers where I have them.
- Best 24GB VRAM GPU for LLMs
RTX 3090, 4090, 7900 XTX, A5000. Which one makes sense in 2026.
- Best budget GPU for LLMs
Picks under $500. RTX 3060 12GB still leads this bracket on $/GB of VRAM.
- RTX 4090 LLM guide
What it actually runs, what it does not, and how thermals behave in a real room.
- RTX 3090 LLM guide
The used 3090 is the best $/GB of VRAM you can buy. Caveats included.
- RTX 4070 vs 4080 vs 4090
Which tier is the right break-point for you. Not always the 4090.
- RTX 5090 vs 4090 for LLMs
Is the Blackwell jump worth it for local inference? Short answer: depends on workload.
Apple Silicon and laptops
- Apple Silicon for LLMs
How unified memory actually performs vs discrete GPUs. M3 and M4 series.
- Mac vs PC for LLMs
When a Mac wins, when a PC wins. Different workloads land in different places.
Software setup
- Ollama cheat sheet
Every command I actually use, with the flags I always forget.
- LM Studio vs Ollama
Pick the right tool for the job. They are not interchangeable.
- How to run LLMs on Windows
Native Windows + WSL2 routes. Driver gotchas included.
- How to run LLMs on Mac
Ollama, LM Studio, and llama.cpp on Apple Silicon. What works best for which model size.
Concepts
- Quantization explained
Why Q4 fits where FP16 cannot, and where quality actually drops.
- GGUF vs GPTQ
The two formats you keep seeing. Which to use when.
- LLM system requirements
CPU, RAM, storage, and PSU notes most guides skip.
Model-specific
- Llama 3.1 hardware requirements
What 8B, 70B, and 405B need. With Q4/Q8/FP16 breakdowns.
- Qwen3 hardware requirements
Qwen3 is the most VRAM-efficient open model right now. Sizes that actually fit.
- DeepSeek hardware requirements
DeepSeek R1 and V3, including realistic options for the 671B beast.
- Phi-4 hardware requirements
Microsoft Phi-4 14B is the easiest mid-size model to run well.
- Gemma 3 hardware requirements
Google Gemma 3, including the 27B that fits in 24GB VRAM at Q4.
- Mistral hardware requirements
Mistral 7B, Mixtral 8x7B, and Mistral Small / Large.
Use cases
- Best local LLMs to run
The shortlist that actually holds up across coding, writing, and reasoning tasks.
- Best LLM for coding
Which open model gets closest to GPT-4-class code completion.
- Best LLM for writing locally
For drafting, editing, and tone control. Not the same as coding picks.
- Best LLM for data analysis
Models that handle tabular reasoning and structured outputs well.