When Should You Fine-Tune?
Fine-tuning isn't always the answer. Consider it when:
- You need consistent output formatting that prompting can't achieve
- You want to encode domain-specific knowledge or behavior
- You need to reduce token usage (shorter prompts after fine-tuning)
- You require faster inference with smaller specialized models
Fine-Tuning Techniques
Full Fine-Tuning
Updates all model parameters. It demands significant compute and memory but generally yields the highest quality. Use it for critical production models where quality is paramount.
LoRA and QLoRA
Low-Rank Adaptation adds small trainable matrices alongside frozen model weights, so only a tiny fraction of parameters is updated. QLoRA additionally quantizes the base model to 4-bit, making fine-tuning possible on consumer GPUs. This is the most popular approach for typical use cases.
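The core arithmetic behind LoRA can be sketched in plain Python. This is an illustrative toy, not a real training loop: it only shows how the effective weight is formed from a frozen matrix W plus a scaled low-rank update B @ A. The variable names (W, A, B, alpha, rank r) follow the usual LoRA convention, and the naive list-based matmul is purely for self-containment:

```python
def matmul(A, B):
    # Naive matrix multiply for small illustrative matrices.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def lora_effective_weight(W, A, B, alpha):
    # LoRA: W' = W + (alpha / r) * (B @ A)
    # A is r x d_in, B is d_out x r; only A and B would be trained.
    r = len(A)
    scale = alpha / r
    delta = matmul(B, A)  # d_out x d_in low-rank update
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (2 x 2)
A = [[1.0, 2.0]]               # trainable, rank 1 (1 x 2)
B = [[1.0], [1.0]]             # trainable, rank 1 (2 x 1)
print(lora_effective_weight(W, A, B, alpha=1.0))  # [[2.0, 2.0], [1.0, 3.0]]
```

Because r is much smaller than the weight dimensions in practice, the trainable parameter count drops from d_out * d_in to r * (d_out + d_in).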
DPO (Direct Preference Optimization)
Aligns model outputs with human preferences without needing a separate reward model. Simpler than RLHF and increasingly the preferred approach for behavior alignment.
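The DPO loss for a single preference pair can be written in a few lines. This sketch assumes you already have summed log-probabilities of the chosen (winning) and rejected (losing) responses under both the policy being trained and the frozen reference model; the beta value of 0.1 is a common but illustrative default:

```python
import math

def dpo_loss(pi_logp_w, pi_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    # DPO per-pair loss: -log sigmoid(beta * [(pi_w - ref_w) - (pi_l - ref_l)])
    # Rewards the policy for raising the chosen response's log-prob
    # relative to the reference more than the rejected response's.
    margin = beta * ((pi_logp_w - ref_logp_w) - (pi_logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# If the policy has not moved from the reference, the margin is 0
# and the loss is log(2) ~= 0.693.
print(dpo_loss(-1.0, -1.0, -1.0, -1.0))
```

Note there is no reward model anywhere: the reference model's log-probabilities play that role implicitly, which is what makes DPO simpler to run than RLHF.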
Data Preparation
The quality of your fine-tuning data matters far more than quantity. Focus on:
- Diverse, representative examples (1,000-10,000 is usually sufficient)
- Consistent formatting and style
- Edge cases and failure modes
- Deduplication and quality filtering
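The deduplication and quality-filtering step can be as simple as hashing normalized text. A minimal sketch, assuming examples are plain strings and that whitespace/case differences should count as duplicates (the `min_chars` threshold is an illustrative quality filter, not a recommended value):

```python
import hashlib
import re

def normalize(text):
    # Collapse whitespace and lowercase so trivial variants hash identically.
    return re.sub(r"\s+", " ", text.strip().lower())

def dedupe_and_filter(examples, min_chars=20):
    seen, kept = set(), []
    for ex in examples:
        norm = normalize(ex)
        if len(norm) < min_chars:
            continue  # quality filter: drop near-empty examples
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(ex)  # keep the first occurrence verbatim
    return kept

data = [
    "Hello world, this is an example.",
    "hello  world, this is an example.",  # duplicate after normalization
    "hi",                                 # too short
]
print(len(dedupe_and_filter(data)))  # 1
```

Exact-hash dedup only catches verbatim repeats; for near-duplicates you would layer on fuzzy methods such as MinHash, but the pipeline shape stays the same.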
Evaluation
Always maintain a held-out test set. Use both automated metrics (perplexity, BLEU, exact match) and human evaluation. A/B testing in production is the gold standard for measuring real-world impact.
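The held-out split and an exact-match metric are straightforward to implement. A minimal sketch, assuming string examples and predictions; the fixed seed keeps the split reproducible across runs:

```python
import random

def train_test_split(examples, test_frac=0.1, seed=0):
    # Shuffle a copy with a fixed seed, then carve off the held-out set.
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_frac))
    return shuffled[n_test:], shuffled[:n_test]

def exact_match(predictions, references):
    # Fraction of predictions that match the reference exactly
    # (ignoring surrounding whitespace).
    assert len(predictions) == len(references)
    return sum(p.strip() == r.strip()
               for p, r in zip(predictions, references)) / len(references)

train, test = train_test_split([f"example {i}" for i in range(20)])
print(len(train), len(test))              # 18 2
print(exact_match(["a ", "b"], ["a", "c"]))  # 0.5
```

Automated metrics like this are cheap sanity checks; they complement, rather than replace, human evaluation and production A/B tests.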