v0.1 proved that compressing conversations into a knowledge graph works. But the pipeline was complex: two models, four LLM calls per turn, and 14GB of VRAM. In v0.2 I fine-tuned my own model to handle both chat and extraction with a single prompt: the pipeline was cut in half, VRAM dropped to 6GB, and Acervo became stateless. In this post: the fine-tuning process, the lessons I couldn't find in any tutorial, and why a single model changed everything.