How to Build a RAG Pipeline in 100 Lines of Python

Retrieval-Augmented Generation is one of the most practical AI patterns of 2025. Here's a minimal implementation using LangChain, ChromaDB, and the OpenAI API that you can harden for production.

Muunsparks

2025-02-28

What Is RAG and Why Should You Care?

RAG (Retrieval-Augmented Generation) solves the most common problem with LLMs in production: they don't know about your data.

The pattern is simple: before sending a query to the LLM, retrieve relevant context from your own document store and include it in the prompt.
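To make the retrieve-then-prompt flow concrete, here is a toy sketch with no external dependencies. The word-overlap `retrieve` function, the `build_prompt` helper, and the sample passages are all hypothetical stand-ins for illustration; a real pipeline would use embeddings and a vector store, as in the steps that follow.

```python
def retrieve(query: str, store: list[str], k: int = 2) -> list[str]:
    """Toy retriever (assumption): rank passages by how many query words they share."""
    query_words = set(query.lower().split())
    ranked = sorted(
        store,
        key=lambda passage: len(query_words & set(passage.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Place the retrieved passages ahead of the question."""
    context_block = "\n".join(f"- {passage}" for passage in context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {query}"
    )


# Hypothetical document store
store = [
    "Refunds are processed within 14 days of the return request.",
    "Our office is closed on public holidays.",
    "Shipping to Europe takes 3 to 5 business days.",
]

question = "How long do refunds take?"
prompt = build_prompt(question, retrieve(question, store, k=1))
print(prompt)  # this string is what you would send to the LLM
```

The LLM never sees the whole store, only the passages the retriever picked, so retrieval quality sets the ceiling on answer quality.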

Step 1: Install Dependencies

pip install langchain langchain-community langchain-openai chromadb pypdf

Step 2: Load and Chunk Your Documents

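# In current LangChain releases the loaders live in the langchain-community package
from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load every PDF under ./docs (PyPDFLoader yields one Document per page)
loader = DirectoryLoader('./docs', glob='**/*.pdf', loader_cls=PyPDFLoader)
documents = loader.load()

# Split into chunks of up to 1000 characters; the 200-character overlap
# keeps sentences that straddle a boundary visible in both chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(documents)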
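If `chunk_size` and `chunk_overlap` feel abstract, the sketch below shows the basic idea with a plain sliding window. This is a simplification, not how `RecursiveCharacterTextSplitter` works internally (it prefers to break on separators such as paragraphs and sentences); the `sliding_window_chunks` function is a hypothetical helper written only for this illustration.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(text, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
# abcdefghij
# ghijklmnop
# mnopqrstuv
# stuvwxyz
```

Each chunk repeats the last 4 characters of the previous one. Under this simplified model, the settings above (1000 and 200) would start a new chunk every 800 characters.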

Step 3: Create the Vector Store

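from langchain_openai import OpenAIEmbeddings
# Chroma integration from the langchain-community package
from langchain_community.vectorstores import Chroma

# Requires OPENAI_API_KEY to be set in the environment
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
# Embed every chunk and index it in an in-memory Chroma collection
vectorstore = Chroma.from_documents(documents=chunks, embedding=embeddings)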
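Conceptually, a vector store answers one question: which stored vectors are closest to the query vector? The sketch below illustrates that with brute-force cosine similarity over hand-made 3-dimensional vectors. Everything here is an assumption for illustration: real embeddings have hundreds or thousands of dimensions, and Chroma uses its own index and configurable distance metric rather than this loop.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], index: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the k texts whose vectors are most similar to the query vector."""
    ranked = sorted(
        index,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Hypothetical (text, embedding) pairs; the vectors are made up
index = [
    ("refund policy", [0.9, 0.1, 0.0]),
    ("shipping times", [0.1, 0.9, 0.0]),
    ("office hours", [0.0, 0.1, 0.9]),
]

query_vec = [0.8, 0.2, 0.0]  # pretend this is the embedded user question
print(top_k(query_vec, index, k=2))  # ['refund policy', 'shipping times']
```

The texts returned by this lookup are what gets pasted into the prompt as context.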

Production Considerations

Before going to production, add a reranking step over the retrieved chunks, hybrid search (BM25 keyword matching combined with vector similarity), and evaluation with RAGAS.
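Hybrid search needs some way to merge the BM25 ranking with the vector ranking. One common option is reciprocal rank fusion (RRF), sketched below. The choice of RRF, the constant `k=60`, and the document IDs are assumptions for illustration; the article does not prescribe a fusion method, and weighted score blending is another option.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists: each document earns 1 / (k + rank) per list."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest combined score first
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical results from a keyword search and a vector search
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that appear in both lists (`doc_b`, `doc_a`) rise to the top, which is the behavior you want from hybrid search: agreement between keyword and semantic matching counts as a strong signal.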