Most RAG tutorials show you the same thing: embed a query, do a cosine similarity search, return the top-K results. That works for demos. It falls apart in production.
Real users ask vague questions. They misspell things. They ask compound questions that need information from multiple documents. They expect results that are actually relevant, not just mathematically similar.
This guide walks you through building a production-grade RAG pipeline on Convex using Memcity -- one that handles all of these cases with a 16-step retrieval pipeline.
What You'll Build
By the end of this guide, you'll have:
- A Convex backend that ingests documents and makes them searchable
- Hybrid search combining semantic vectors and BM25 keyword matching
- Reciprocal Rank Fusion to merge results from both search methods
- Jina Reranker v3 for second-pass precision scoring
- Knowledge graph traversal for finding related concepts across documents
- A working `getContext` endpoint your frontend can call
The whole thing runs on Convex's serverless infrastructure. No vector database to manage, no infrastructure to maintain.
Prerequisites
You'll need:
- A Convex project (`npm create convex@latest` if you don't have one)
- A Jina AI API key (free tier available) for embeddings and reranking
- An OpenRouter API key for LLM-powered pipeline steps (query routing, entity extraction, HyDE)
Step 1: Install Memcity
Memcity is distributed as source code through the shadcn registry. You own the code and can customize it.
Add the registry to your `components.json`:

```json
{
  "registries": {
    "@memcity": {
      "url": "https://memcity.dev/r/{name}.json",
      "headers": {
        "X-License-Key": "${MEMCITY_LICENSE_KEY}"
      }
    }
  }
}
```

Then install:
```bash
# Community (free) -- hybrid vector search, BM25, RRF fusion
npx shadcn@latest add @memcity/community

# Pro ($79) -- full 16-step pipeline, GraphRAG, file processing
npx shadcn@latest add @memcity/pro
```

This drops the full Memcity source into `convex/memcity/` along with a pre-configured client at `convex/memory.ts`.
Step 2: Register the Component
Convex components are self-contained backend modules with their own tables and functions. Register Memcity in your app config:
```typescript
// convex/convex.config.ts
import { defineApp } from "convex/server";
import memory from "./memcity/convex.config";

const app = defineApp();
app.use(memory);

export default app;
```

Step 3: Set Environment Variables
```bash
# Jina v4 embeddings (1024-dimensional vectors) + Reranker v3
npx convex env set JINA_API_KEY your-jina-key

# OpenRouter gateway for LLM reasoning (Pro+ tiers)
npx convex env set OPENROUTER_API_KEY your-openrouter-key
```

Step 4: Configure the Memory Client
The installed `convex/memory.ts` gives you a working default, but you can tune it:

```typescript
// convex/memory.ts
import { Memory } from "./memcity/client";
import { components } from "./_generated/api";

export const memory = new Memory(components.memory, {
  tier: "pro",
  ai: {
    gateway: "openrouter",
    model: "google/gemini-2.0-flash-001",
  },
  search: {
    maxResults: 10,
    minScore: 0.1,
    weights: {
      semantic: 0.7, // Meaning-based search
      bm25: 0.3, // Keyword-based search
    },
  },
});
```

The weights control how hybrid search blends results. A 70/30 semantic/BM25 split works well for most documentation and knowledge base use cases. If your users search with exact terms (product IDs, error codes), increase `bm25`. If they ask natural language questions, increase `semantic`.
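To make the blending concrete, here's a minimal sketch of weighted score fusion. This is an illustration of the idea, not Memcity's actual implementation; the `blendScores` helper and its result shapes are hypothetical:

```typescript
// Illustrative weighted hybrid scoring: each document's final score is a
// weighted sum of its semantic and BM25 scores (missing entries count as 0).
type Scored = { id: string; score: number };

function blendScores(
  semantic: Scored[],
  bm25: Scored[],
  weights = { semantic: 0.7, bm25: 0.3 },
): Scored[] {
  const combined = new Map<string, number>();
  for (const r of semantic) {
    combined.set(r.id, (combined.get(r.id) ?? 0) + weights.semantic * r.score);
  }
  for (const r of bm25) {
    combined.set(r.id, (combined.get(r.id) ?? 0) + weights.bm25 * r.score);
  }
  // Sort descending so the strongest blended match comes first
  return [...combined.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

With the default 70/30 split, a document that scores 1.0 on semantics alone (0.7 blended) still beats one that scores 0.5 semantic plus 1.0 BM25 (0.65 blended) -- which is why the weights matter for exact-term queries.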
Step 5: Ingest Documents
Create an action that ingests content into a knowledge base:
```typescript
// convex/ingest.ts
import { action } from "./_generated/server";
import { v } from "convex/values";
import { memory } from "./memory";

export const ingestDocument = action({
  args: {
    orgId: v.string(),
    knowledgeBaseId: v.string(),
    text: v.string(),
    source: v.string(),
  },
  handler: async (ctx, args) => {
    const result = await memory.ingestText(ctx, {
      orgId: args.orgId,
      knowledgeBaseId: args.knowledgeBaseId,
      text: args.text,
      source: args.source,
    });
    // result.chunkCount tells you how many chunks were created
    return result;
  },
});
```

When you call `ingestText`, Memcity runs a multi-phase pipeline:
- Chunking -- Splits your text into ~512 token overlapping chunks
- Embedding -- Sends each chunk to Jina v4 to generate 1024-dimensional vectors
- BM25 indexing -- Indexes the raw text for keyword search
- Entity extraction (Pro+) -- Identifies people, companies, concepts and their relationships
- Enrichment (Team) -- RLM-powered enrichment that generates summaries, Q&A pairs, and cross-references
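The chunking phase is easiest to picture as a sliding window. Here's an illustrative sketch -- Memcity chunks by tokens (~512 with overlap), while this hypothetical `chunkText` helper approximates the same idea with a character budget:

```typescript
// Illustrative sliding-window chunker. Each chunk shares `overlap` trailing
// characters with the next one, so sentences that straddle a boundary still
// appear whole in at least one chunk.
function chunkText(text: string, chunkSize = 2000, overlap = 200): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
    start += chunkSize - overlap; // advance, keeping `overlap` chars shared
  }
  return chunks;
}
```

The overlap is what keeps retrieval robust at chunk boundaries: without it, a fact split across two chunks could be half-missing from both.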
Step 6: Build the Search Endpoint
This is where the 16-step pipeline runs:
```typescript
// convex/search.ts
import { action } from "./_generated/server";
import { v } from "convex/values";
import { memory } from "./memory";

export const search = action({
  args: {
    orgId: v.string(),
    knowledgeBaseId: v.string(),
    query: v.string(),
  },
  handler: async (ctx, args) => {
    const results = await memory.getContext(ctx, {
      orgId: args.orgId,
      knowledgeBaseId: args.knowledgeBaseId,
      query: args.query,
    });
    return results;
  },
});
```

A single call to `getContext` triggers up to 16 steps:
| Step | What It Does | Tier |
|---|---|---|
| 1. Quota Check | Rate limiting per org | Team |
| 2. Cache | Reuse embeddings for repeated queries | All |
| 3. Query Routing | Classify query complexity (simple/moderate/complex) | Pro+ |
| 4. Decomposition | Break compound queries into sub-questions | Pro+ |
| 5. HyDE | Generate hypothetical answer to improve recall | Pro+ |
| 6. Embed | Convert query to 1024-dim vector via Jina v4 | All |
| 7. Dual Search | Run semantic + BM25 search in parallel | All |
| 8. RRF Fusion | Merge results using Reciprocal Rank Fusion | All |
| 9. ACL Filter | Enforce document-level access control | Team |
| 10. Dedup | Remove duplicate chunks | All |
| 11. GraphRAG | Traverse knowledge graph for related entities | Pro+ |
| 12. Rerank | Re-score results with Jina Reranker v3 | Pro+ |
| 13. Chunk Expansion | Fetch surrounding context for top results | Pro+ |
| 14. Confidence | Score result confidence for UI display | All |
| 15. Episodic Memory | Incorporate user-specific context | Pro+ |
| 16. Format | Format results with citations, cache, and audit log | All |
Simple queries skip expensive steps (routing classifies them and takes the fast path). Complex queries get the full treatment. You don't write any conditional logic -- the pipeline adapts automatically.
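Step 8 is worth a closer look even though you never call it directly. In standard Reciprocal Rank Fusion, each document's fused score is the sum of 1/(k + rank) over every ranked list it appears in, with k conventionally set to 60. A minimal sketch of that textbook formula (Memcity's exact constants may differ):

```typescript
// Standard Reciprocal Rank Fusion: reward documents that rank well in
// multiple lists (here, semantic and BM25) without comparing raw scores.
function rrfFuse(rankings: string[][], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const list of rankings) {
    list.forEach((id, i) => {
      // rank is 1-based, so position i contributes 1 / (k + i + 1)
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

Because RRF uses only ranks, it sidesteps the problem that cosine similarities and BM25 scores live on incompatible scales -- a document near the top of both lists wins even if its raw scores look modest.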
Step 7: Wire It Up to Your Frontend
```tsx
// app/search/page.tsx
"use client";
import { useAction } from "convex/react";
import { api } from "@/convex/_generated/api";
import { useState } from "react";

export default function SearchPage() {
  const [query, setQuery] = useState("");
  const [results, setResults] = useState<any>(null);
  const search = useAction(api.search.search);

  const handleSearch = async () => {
    const res = await search({
      orgId: "your-org-id",
      knowledgeBaseId: "your-kb-id",
      query,
    });
    setResults(res);
  };

  return (
    <div>
      <input
        value={query}
        onChange={(e) => setQuery(e.target.value)}
        placeholder="Ask a question..."
        onKeyDown={(e) => e.key === "Enter" && handleSearch()}
      />
      {results?.results?.map((r: any, i: number) => (
        <div key={i}>
          <p>{r.text}</p>
          <span>Score: {r.score.toFixed(2)}</span>
        </div>
      ))}
    </div>
  );
}
```

Performance: What to Expect
Typical latencies on Convex serverless:
| Query Type | Pipeline Steps | Latency |
|---|---|---|
| Simple (cached) | 2 steps | ~50ms |
| Simple (cold) | 5-6 steps | ~200ms |
| Moderate | 10-12 steps | ~400ms |
| Complex | All 16 steps | ~600-800ms |
These are competitive with dedicated vector databases, and you're getting a full RAG pipeline -- not just similarity search.
Beyond Basic RAG
Once you have the basics working, Memcity offers several ways to improve result quality:
- Knowledge Graph -- Entity extraction connects concepts across documents. "Who reports to the VP of Engineering?" works even if no single document contains the full answer.
- Enrichment Pipeline -- RLM-powered enrichment generates summaries, Q&A pairs, and cross-references at ingestion time, dramatically improving retrieval quality.
- File Ingestion -- Process PDFs, DOCX, images, audio, video, and 25+ file types. Memcity handles extraction and OCR.
- Episodic Memory -- Remember user preferences and conversation history so returning users get personalized results.
- Access Control -- Document-level permissions so search results respect who can see what.
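To see why the knowledge graph can answer questions no single document covers, picture extracted entities as nodes and relationships as edges: retrieval walks outward from the entities matched in the query. A toy sketch of that traversal (the `Edge` shape and `relatedEntities` helper are hypothetical illustrations, not Memcity's API -- the real implementation is in convex/memcity/):

```typescript
// Toy knowledge-graph walk: breadth-first expansion from seed entities,
// collecting everything reachable within `maxHops` relationship edges.
type Edge = { from: string; to: string; relation: string };

function relatedEntities(edges: Edge[], seeds: string[], maxHops = 2): Set<string> {
  const found = new Set(seeds);
  let frontier = new Set(seeds);
  for (let hop = 0; hop < maxHops; hop++) {
    const next = new Set<string>();
    for (const e of edges) {
      // Treat edges as bidirectional for discovery purposes
      if (frontier.has(e.from) && !found.has(e.to)) next.add(e.to);
      if (frontier.has(e.to) && !found.has(e.from)) next.add(e.from);
    }
    next.forEach((n) => found.add(n));
    frontier = next;
  }
  return found;
}
```

With edges like `alice reports_to bob` and `bob reports_to vp-eng` extracted from different documents, a two-hop walk from "vp-eng" reaches "alice" -- the multi-document answer that pure similarity search misses.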
Source Code
Memcity installs as source code into your project. There's no black box -- you can read every line of the pipeline in `convex/memcity/`. If you need to customize a step, you can.
Get started:
```bash
npx shadcn@latest add @memcity/community
```

Full documentation at memcity.dev/docs.