Enterprise RAG in Java: From Theory to Production
RAG (Retrieval-Augmented Generation) enables LLMs to answer questions using your own data. Learn how to implement enterprise-grade solutions in Java.
Why RAG?
LLMs have important limitations:
- Static knowledge: They don't know your internal data
- Hallucinations: They make up information when they don't know the answer
- No updates: Their knowledge has a cutoff date
RAG solves these problems by connecting the LLM to your knowledge base.
RAG Architecture
┌──────────────┐      ┌──────────────┐      ┌──────────────┐
│     User     │─────▶│  Retriever   │─────▶│ Vector Store │
└──────────────┘      └──────────────┘      └──────────────┘
        │                                           │
        │                                           ▼
        │                                  ┌──────────────┐
        │                                  │   Relevant   │
        │                                  │  Documents   │
        │                                  └──────────────┘
        │                                           │
        ▼                                           ▼
┌──────────────────────────────────────────────────────────┐
│                           LLM                            │
│                  (question + context)                    │
└──────────────────────────────────────────────────────────┘
                             │
                             ▼
                    ┌──────────────┐
                    │   Grounded   │
                    │   Response   │
                    └──────────────┘
Tech Stack
- LangChain4j: AI framework for Java
- PostgreSQL + pgvector: Scalable vector store
- Quarkus: Cloud-native framework
- Ollama: Local models (privacy)
pgvector Setup
-- Enable extension
CREATE EXTENSION vector;

-- Embeddings table
CREATE TABLE document_embeddings (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    content TEXT NOT NULL,
    metadata JSONB,
    embedding vector(1536),  -- must match your embedding model's dimension
    created_at TIMESTAMP DEFAULT NOW()
);

-- Index for efficient search (build after loading data for better clustering)
CREATE INDEX ON document_embeddings
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);
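The vector_cosine_ops operator class above ranks rows by cosine distance. To make the ranking concrete, here is a plain-Java sketch of cosine similarity (the class and method names are illustrative, not part of pgvector):

```java
// Illustrative helper: the similarity measure that vector_cosine_ops ranks by.
public class CosineSimilarity {

    // Returns a value in [-1, 1]; 1 means the vectors point in the same direction.
    public static double similarity(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] v1 = {1.0, 0.0};
        double[] v2 = {1.0, 0.0};
        double[] v3 = {0.0, 1.0};
        System.out.println(similarity(v1, v2)); // 1.0 (identical direction)
        System.out.println(similarity(v1, v3)); // 0.0 (orthogonal)
    }
}
```

Note that pgvector stores cosine *distance* (1 - similarity), which is why lower distance means a closer match.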
Implementation with LangChain4j
EmbeddingStore Configuration
@ApplicationScoped
public class VectorStoreConfig {

    @ConfigProperty(name = "pgvector.host")
    String host;

    @ConfigProperty(name = "pgvector.database")
    String database;

    @Produces
    @ApplicationScoped
    EmbeddingStore<TextSegment> embeddingStore() {
        return PgVectorEmbeddingStore.builder()
                .host(host)
                .port(5432)
                .database(database)
                .user("app")
                .password("secret") // in production, inject credentials via config or a secrets manager
                .table("document_embeddings")
                .dimension(1536)
                .build();
    }
}
Document Ingestion
@ApplicationScoped
public class DocumentIngestionService {

    @Inject
    EmbeddingStore<TextSegment> store;

    @Inject
    EmbeddingModel embeddingModel;

    public void ingest(Path documentPath) {
        // 1. Load document
        Document document = FileSystemDocumentLoader.loadDocument(
                documentPath,
                new ApachePdfBoxDocumentParser()
        );

        // 2. Split into chunks
        DocumentSplitter splitter = DocumentSplitters.recursive(
                500, // max chunk size
                50   // overlap
        );
        List<TextSegment> segments = splitter.split(document);

        // 3. Generate embeddings
        List<Embedding> embeddings = embeddingModel.embedAll(segments).content();

        // 4. Store
        store.addAll(embeddings, segments);
    }
}
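The size/overlap mechanics in step 2 can be illustrated with a simplified sketch. LangChain4j's recursive splitter is smarter than this (it respects paragraph and sentence boundaries); the class below only shows how the 500/50 parameters interact, and all names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of fixed-size chunking with overlap.
// Overlap keeps context that straddles a chunk boundary retrievable from both sides.
public class ChunkingSketch {

    public static List<String> split(String text, int maxSize, int overlap) {
        List<String> chunks = new ArrayList<>();
        int step = maxSize - overlap; // each chunk starts `overlap` chars before the previous one ended
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + maxSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) break;
        }
        return chunks;
    }

    public static void main(String[] args) {
        String doc = "abcdefghij".repeat(120); // 1200 characters
        List<String> chunks = split(doc, 500, 50);
        System.out.println(chunks.size()); // 3 chunks: [0,500), [450,950), [900,1200)
    }
}
```

The last 50 characters of each chunk are repeated at the start of the next, which is exactly what the overlap parameter buys you.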
RAG Service
@ApplicationScoped
public class RagService {

    private final Assistant assistant;

    @Inject
    public RagService(
            ChatLanguageModel model,
            EmbeddingStore<TextSegment> store,
            EmbeddingModel embeddingModel
    ) {
        ContentRetriever retriever = EmbeddingStoreContentRetriever.builder()
                .embeddingStore(store)
                .embeddingModel(embeddingModel)
                .maxResults(5)
                .minScore(0.7)
                .build();

        this.assistant = AiServices.builder(Assistant.class)
                .chatLanguageModel(model)
                .contentRetriever(retriever)
                .build();
    }

    public String query(String question) {
        return assistant.answer(question);
    }

    interface Assistant {
        @SystemMessage("""
                You are an expert assistant. Answer ONLY based on the provided context.
                If you don't find the information, say "I don't have information about that".
                Cite sources when possible.
                """)
        String answer(@UserMessage String question);
    }
}
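The retriever above keeps at most 5 segments whose similarity score is at least 0.7. The filtering logic it applies can be sketched in plain Java (the record and method names are illustrative, not the LangChain4j API):

```java
import java.util.Comparator;
import java.util.List;

// Illustrative sketch of the maxResults/minScore filtering a content retriever performs.
public class RetrievalFilter {

    record ScoredSegment(String text, double score) {}

    public static List<ScoredSegment> topMatches(
            List<ScoredSegment> candidates, int maxResults, double minScore) {
        return candidates.stream()
                .filter(s -> s.score() >= minScore)                                // drop weak matches
                .sorted(Comparator.comparingDouble(ScoredSegment::score).reversed()) // best first
                .limit(maxResults)                                                 // keep at most k
                .toList();
    }

    public static void main(String[] args) {
        List<ScoredSegment> candidates = List.of(
                new ScoredSegment("Config section", 0.95),
                new ScoredSegment("Install guide", 0.72),
                new ScoredSegment("Changelog", 0.40) // below minScore, filtered out
        );
        List<ScoredSegment> result = topMatches(candidates, 5, 0.7);
        System.out.println(result.size()); // 2
    }
}
```

Raising minScore trades recall for precision: fewer irrelevant chunks reach the prompt, but borderline matches are lost.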
Production Improvements
1. Smart Chunking
DocumentSplitter splitter = DocumentSplitters.recursive(
        500,
        50,
        new OpenAiTokenizer("gpt-4") // count real tokens instead of characters
);
2. Enriched Metadata
TextSegment segment = TextSegment.from(
        content,
        Metadata.from("source", "manual-v2.pdf")
                .add("page", "15")
                .add("section", "Configuration")
);
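This metadata pays off at query time: you can restrict retrieval to a given document, page range, or section before anything reaches the LLM. A real implementation would push the filter into the vector store query (pgvector's JSONB column makes this a WHERE clause); the plain-Java sketch below, with stand-in types rather than the LangChain4j API, just shows the idea:

```java
import java.util.List;
import java.util.Map;

// Illustrative sketch: filtering retrieved segments by metadata.
public class MetadataFilterSketch {

    record Segment(String text, Map<String, String> metadata) {}

    // Keep only segments that came from the given source document.
    public static List<Segment> bySource(List<Segment> segments, String source) {
        return segments.stream()
                .filter(s -> source.equals(s.metadata().get("source")))
                .toList();
    }

    public static void main(String[] args) {
        List<Segment> segments = List.of(
                new Segment("Install steps", Map.of("source", "manual-v2.pdf", "page", "15")),
                new Segment("Old notes", Map.of("source", "manual-v1.pdf", "page", "3"))
        );
        System.out.println(bySource(segments, "manual-v2.pdf").size()); // 1
    }
}
```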
Conclusion
RAG transforms LLMs from curiosities into real enterprise tools. With Java, pgvector, and LangChain4j, you can build robust systems that respect your data privacy and scale with your organization.