v0.1 proved that compressing conversations into a knowledge graph works. But the pipeline was complex: two models, four LLM calls per turn, and 14GB of VRAM. In v0.2 I fine-tuned my own model to handle both chat and extraction with a single prompt: the pipeline was cut in half, VRAM dropped to 6GB, and Acervo became stateless. In this post: the fine-tuning process, the lessons I couldn't find in any tutorial, and why a single model changed everything.