

  • v0.1 proved that compressing conversations into a knowledge graph works. But the pipeline was complex: two models, four LLM calls per turn, 14 GB of VRAM. In v0.2 I fine-tuned my own model to handle chat and extraction in a single prompt — the pipeline was cut in half, VRAM dropped to 6 GB, and Acervo became stateless. In this post: the fine-tuning process, the lessons I couldn't find in any tutorial, and why a single model changed everything.
  • Every time you talk to an LLM, you send the entire previous conversation. Turn 1: 200 tokens. Turn 100: the context fills up and the model starts forgetting. RAG helps, but it hauls in kilobytes of raw text. Acervo takes a different approach: extract structured knowledge from each conversation, compress it into a graph, and reconstruct context on demand — regardless of the session. In this post: the problem, why RAG isn't enough, how I came up with the idea, and what we built in the first version.
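The extract–compress–reconstruct loop described above can be sketched in a few lines. This is an illustrative toy, not Acervo's actual code: all names here (`KnowledgeGraph`, `add`, `context_for`) are invented, and in the real pipeline the triples would be produced by an LLM extraction step rather than hard-coded.

```python
# Toy sketch of the idea: store conversation facts as graph triples,
# then rebuild a compact context string on demand instead of resending
# the whole transcript. All names are hypothetical.

class KnowledgeGraph:
    def __init__(self):
        # adjacency map: subject -> list of (relation, object) facts
        self.facts = {}

    def add(self, subject, relation, obj):
        # In the real pipeline, an LLM would extract these triples per turn.
        self.facts.setdefault(subject, []).append((relation, obj))

    def context_for(self, subject):
        # Reconstruct a compact context for one entity,
        # regardless of which session the facts came from.
        return "; ".join(f"{subject} {rel} {obj}"
                         for rel, obj in self.facts.get(subject, []))

graph = KnowledgeGraph()
graph.add("user", "prefers", "dark mode")
graph.add("user", "works on", "a Rust project")
print(graph.context_for("user"))
# → user prefers dark mode; user works on a Rust project
```

The point of the design is that the graph grows with *facts*, not with tokens: turn 100 costs the same to contextualize as turn 1, because only the relevant slice of the graph is rehydrated into the prompt.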