Almost every week a founder tells us: “We want to fine-tune a model on our data.” Almost every week, after looking at the actual use case, the right answer is something cheaper, faster, and easier to maintain. Here's the decision framework we use on real client projects.
What each one actually does
RAG (retrieval-augmented generation) gives the model your knowledge at question time. Your documents are indexed in a vector database; when a question comes in, the relevant passages are retrieved and handed to the model alongside the question. The model never “learns” your data — it reads the right page at the right moment.
Fine-tuning changes the model's weights using your examples. It doesn't teach the model new facts reliably — it teaches it new behaviour: a tone, a format, a classification scheme, a house style.
The decision table we use
Choose RAG when: answers must cite current documents, your knowledge changes weekly, you need audit trails showing where an answer came from, or you handle compliance-sensitive content. Updating knowledge = re-indexing a file. Done in minutes.
Choose fine-tuning when: you need consistent output structure at very high volume, a specific brand voice across millions of generations, or a small fast model to mimic an expensive one for one narrow task.
Choose neither when: a well-engineered prompt with a few examples solves it. This is the honest answer for the majority of business workflows we scope — and it ships in days, not weeks.
What we actually deploy
Nearly every production agent we ship uses RAG over pgvector or Pinecone, strong system prompting, and tool calling. Fine-tuning enters the picture in fewer than one in ten projects — usually months after launch, once real usage data shows a narrow, high-volume pattern worth optimizing. Starting with fine-tuning is starting with the expensive, rigid option before you know what your users actually ask.
The architecture decision should take one conversation, not one quarter. Bring us your use case and we'll tell you which bucket it falls into — including when the answer is the boring one.