

  • v0.1 proved that compressing conversations into a knowledge graph works. But the pipeline was complex: two models, four LLM calls per turn, 14 GB of VRAM. In v0.2 I fine-tuned my own model to handle chat and extraction in a single prompt — the pipeline was cut in half, VRAM dropped to 6 GB, and Acervo became stateless. In this post: the fine-tuning process, the lessons I couldn't find in any tutorial, and why a single model changed everything.
  • Every time you talk to an LLM, you send the entire previous conversation. Turn 1: 200 tokens. Turn 100: the context fills up and the model starts forgetting. RAG helps, but it hauls in kilobytes of raw text. Acervo takes a different approach: extract structured knowledge from each conversation, compress it into a graph, and reconstruct context on demand — regardless of the session. In this post: the problem, why RAG isn't enough, how I came up with the idea, and what we built in the first version.
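The extract–compress–reconstruct loop described above can be sketched in a few lines. This is an illustrative toy, not Acervo's actual code: all names here (`KnowledgeGraph`, `add`, `context_for`) are invented, and in the real pipeline the triples would be produced by an LLM extraction step rather than hard-coded.

```python
# Toy sketch of the idea: store conversation facts as graph triples,
# then rebuild a compact context string on demand instead of resending
# the whole transcript. All names are hypothetical.

class KnowledgeGraph:
    def __init__(self):
        # adjacency map: subject -> list of (relation, object) facts
        self.facts = {}

    def add(self, subject, relation, obj):
        # In the real pipeline, an LLM would extract these triples per turn.
        self.facts.setdefault(subject, []).append((relation, obj))

    def context_for(self, subject):
        # Reconstruct a compact context for one entity,
        # regardless of which session the facts came from.
        return "; ".join(f"{subject} {rel} {obj}"
                         for rel, obj in self.facts.get(subject, []))

graph = KnowledgeGraph()
graph.add("user", "prefers", "dark mode")
graph.add("user", "works on", "a Rust project")
print(graph.context_for("user"))
# → user prefers dark mode; user works on a Rust project
```

The point of the design is that the graph grows with *facts*, not with tokens: turn 100 costs the same to contextualize as turn 1, because only the relevant slice of the graph is rehydrated into the prompt.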