AI/ML · 2025

Local AI Knowledge Management: RAG in Bahasa Indonesia, No Cloud Required

Python · Ollama · qwen2.5:3b · nomic-embed-text · RAG · CLI

My notes don't speak English. Most AI models do. I built a local RAG system with a model that actually understands Bahasa Indonesia, because querying your own knowledge shouldn't require translating your thoughts first.

The Problem

After two years of accumulating notes (lecture slides, research papers, personal summaries), I had a retrieval problem. Full-text search found keywords. I needed semantics. "Apa dampak kebijakan moneter terhadap inflasi?" ("What is the impact of monetary policy on inflation?") should work even if I never wrote those exact words anywhere in my notes.

The obvious fix was to send everything to GPT. That fell apart on three counts:

  1. Privacy. University notes, personal analysis, half-formed ideas. Not comfortable sending any of this to a third-party API.
  2. Cost. At scale, API usage compounds. A system I use daily needs to cost nothing to run.
  3. Language. Most of my notes are in Bahasa Indonesia. Cloud models handle this poorly: nuance flattens, responses drift toward translated-sounding English patterns, and the Indonesian framing I wrote in gets lost.

I needed a local solution that actually understood Indonesian.

What I Built

A RAG pipeline that runs entirely on my laptop. Ingest a directory of markdown files, build a vector index, query in natural language, get relevant answers. No internet connection required.

# Ingest a knowledge domain
sa ingest ~/notes/ekonomi/

# Query in Bahasa Indonesia
sa query "bagaimana kebijakan moneter mempengaruhi inflasi?"

# Switch to a different domain
sa ingest ~/notes/machine-learning/
sa query "perbedaan antara RAG dan fine-tuning?"

The system is organized around knowledge domains. Each directory is an independent knowledge base: economics, machine learning, personal journal, university courses. Switch contexts by switching the active directory.
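One way to picture the domain layout, as a sketch rather than the actual implementation (the `index_path_for` helper and the `~/.sa/indexes` directory are my illustration, not the tool's real on-disk format):

```python
import hashlib
from pathlib import Path

# Hypothetical layout: each ingested domain directory gets its own
# index file, keyed by a hash of its absolute path. Switching domains
# is just switching which index file gets loaded at query time.
INDEX_ROOT = Path.home() / ".sa" / "indexes"

def index_path_for(domain_dir: str) -> Path:
    """Map a knowledge-domain directory to its private index file."""
    resolved = Path(domain_dir).expanduser().resolve()
    key = hashlib.sha256(str(resolved).encode()).hexdigest()[:16]
    return INDEX_ROOT / f"{key}.json"

# Two domains never share an index, so economics answers can't be
# contaminated by machine-learning chunks.
eco = index_path_for("~/notes/ekonomi")
ml = index_path_for("~/notes/machine-learning")
```

Keying the index by directory path means "switching domains" costs nothing: no re-ingest, just a different file load.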

How It Works

Ingest time:

    Markdown files (directory)
             ↓
        Text chunking
             ↓
     nomic-embed-text       ← builds dense vector index
             ↓
     Vector index (local)

Query time:

    User query (Bahasa Indonesia)
             ↓
     nomic-embed-text       ← embeds the query
             ↓
     Similarity search      ← retrieves top-k relevant chunks
             ↓
     qwen2.5:3b             ← generates answer from retrieved context
             ↓
        Response

The pipeline runs via Ollama, which handles model management and inference. No GPU required. Both models run on CPU, which is slower but entirely functional for daily use on a laptop.
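The query-time path can be sketched end to end. This is a minimal illustration, not the real code: `embed()` here is a toy bag-of-words stand-in for nomic-embed-text, and the final generation step is shown only as a commented-out Ollama call.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for nomic-embed-text: a bag-of-words "vector".
    # The real system would call the embedding model via Ollama.
    cleaned = text.lower().replace("?", " ").replace(".", " ")
    return Counter(cleaned.split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Similarity search: rank every chunk against the query embedding,
    # keep the top-k as context for the generator.
    qv = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(qv, embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "Kebijakan moneter ketat menekan inflasi melalui suku bunga.",
    "RAG menggabungkan retrieval dengan generasi jawaban.",
    "Inflasi dipengaruhi oleh jumlah uang beredar.",
]
context = retrieve("bagaimana kebijakan moneter mempengaruhi inflasi?", chunks)
# The retrieved context would then be handed to qwen2.5:3b, roughly:
# ollama.chat(model="qwen2.5:3b", messages=[{"role": "user",
#     "content": f"Jawab berdasarkan konteks:\n{context}\n\nPertanyaan: ..."}])
```

The monetary-policy chunks outrank the RAG chunk for this query even with the toy embedder; a real dense embedding model does the same thing, but on meaning rather than shared words.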

Why qwen2.5:3b

Model selection was the hardest decision.

The hard constraint: it had to run on a laptop with no dedicated GPU. That caps usable parameter count at roughly 3–7B before inference becomes too slow. Most capable small models in that range (Phi-3, Gemma 2B, TinyLlama) treat Bahasa Indonesia as an afterthought. Responses come back grammatically correct but tonally wrong. Indonesian concepts get flattened to their English equivalents.

qwen2.5 was different.

I tested five models before settling on it. The others could technically respond in Indonesian; qwen2.5 could think in it.

Qwen is trained on multilingual data with genuine Southeast Asian language representation. At 3B parameters it stays within the hardware constraint. The Indonesian responses felt native, not translated.

Model           Size    Indonesian quality    Runs on CPU
qwen2.5:3b      3B      Native-level          ✓
Phi-3 mini      3.8B    Functional            ✓
Gemma 2B        2B      Poor                  ✓
Llama 3.2 3B    3B      Functional            ✓
GPT-4o          N/A     Excellent             ✗ (cloud only)

For embeddings, nomic-embed-text was the clear choice in the local/open-weight space: high retrieval quality, fast on CPU, and model-agnostic, so you can swap the LLM without rebuilding your index.
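That "swap the LLM without rebuilding" property falls out of what the index actually stores. A sketch of a hypothetical index record (the field names are illustrative, not the tool's real format):

```python
import json

# Hypothetical on-disk index record. Only the EMBEDDING model is baked
# into the index; the generation model is chosen at query time, so
# swapping qwen2.5:3b for another LLM never invalidates the vectors.
index = {
    "embedding_model": "nomic-embed-text",  # changing THIS forces a rebuild
    "chunks": [
        {"text": "Kebijakan moneter ...", "vector": [0.12, -0.03, 0.88]},
    ],
}

def needs_rebuild(index: dict, embedding_model: str) -> bool:
    """Rebuild only when the embedder changes, never when the LLM does."""
    return index["embedding_model"] != embedding_model

serialized = json.dumps(index)
```

Vectors are only comparable to other vectors from the same embedder, which is why the embedder name is the one piece of model identity worth persisting with the index.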

Key Design Decisions

Domain isolation over one big index. A single monolithic index of all my notes sounds convenient until you ask a question about economics and get answers contaminated by machine learning notes. Each domain gets its own index. Context stays clean.

Manual re-indexing. When content in a domain updates, you run sa ingest again. This is explicit, not automatic. Background sync that silently gets out of date is worse than a manual trigger you control and understand.
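Manual re-ingest also doesn't have to mean re-embedding everything: a content hash per file makes the second run cheap. A minimal sketch (the hash-manifest scheme is my illustration, not necessarily what the tool does):

```python
import hashlib

def file_digest(content: bytes) -> str:
    # Content hash, so a file is re-embedded only when its bytes change.
    return hashlib.sha256(content).hexdigest()

def plan_reingest(files: dict[str, bytes], manifest: dict[str, str]) -> list[str]:
    """Return the files whose content changed since the last ingest."""
    return [name for name, content in files.items()
            if manifest.get(name) != file_digest(content)]

# Manifest from the previous run: a.md was already ingested unchanged.
manifest = {"a.md": file_digest(b"old notes")}
files = {"a.md": b"old notes", "b.md": b"new note"}
changed = plan_reingest(files, manifest)  # only b.md needs embedding
```

Since embedding dominates ingest time, skipping unchanged files turns a slow bulk re-index into a fast incremental one.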

Markdown as the source format. Plain .md files, no proprietary format, no lock-in. The knowledge base was useful before I built this system; this is just a query layer on top of files that already existed.

CLI interface. Consistent with everything else I build. Runs anywhere without a GUI dependency. The bonus: scriptability. Chain queries in a pipeline when needed.

Limitations Worth Being Honest About

This is a personal tool, not a production system. A few real constraints:

  • Re-indexing large directories is slow. A 500-note domain takes a few minutes. Acceptable for daily use; annoying for bulk updates.
  • qwen2.5:3b makes mistakes. It's a 3B model. Complex multi-hop reasoning isn't reliable. I use this for retrieval and summary, not for analysis I'd trust without verification.
  • No session memory. Each query is stateless. Follow-up questions don't carry context from the previous answer. Useful for lookup; not a replacement for a proper chat interface.

These are deliberate trade-offs. The goal was a usable daily tool, not a research prototype. On that measure, it delivers.