Most RAG tutorials show you the same thing: embed a query, do a cosine similarity search, return the top-K results. That works for demos. It falls apart in production.
Real users ask vague questions. They misspell things. They ask compound questions that need information from multiple documents. They expect results that are actually relevant, not just mathematically similar.
This guide walks you through building a production-grade RAG pipeline on Convex using Memcity -- one that handles all of these cases with a 16-step retrieval pipeline.
What You'll Build
By the end of this guide, you'll have:
- A Convex backend that ingests documents and makes them searchable
- Hybrid search combining semantic vectors and BM25 keyword matching
- Reciprocal Rank Fusion to merge results from both search methods
- Jina Reranker v3 for second-pass precision scoring
- Knowledge graph traversal for finding related concepts across documents
- A working `getContext` endpoint your frontend can call
The whole thing runs on Convex's serverless infrastructure. No vector database to manage, no infrastructure to maintain.
Prerequisites
You'll need:
- A Convex project (`npm create convex@latest` if you don't have one)
- A Jina AI API key (free tier available) for embeddings and reranking
- An OpenRouter API key for LLM-powered pipeline steps (query routing, entity extraction, HyDE)
Step 1: Install Memcity
Memcity is distributed as source code through the shadcn registry. You own the code and can customize it.
Add the registry to your `components.json`:

```json
{
  "registries": {
    "@memcity": {
      "url": "https://memcity.dev/r/{name}.json",
      "headers": {
        "X-License-Key": "${MEMCITY_LICENSE_KEY}"
      }
    }
  }
}
```

Then install:
```bash
# Community (free) -- hybrid vector search, BM25, RRF fusion
npx shadcn@latest add @memcity/community

# Pro ($79) -- full 16-step pipeline, GraphRAG, file processing
npx shadcn@latest add @memcity/pro
```

This drops the full Memcity source into `convex/memcity/` along with a pre-configured client at `convex/memory.ts`.
Step 2: Register the Component
Convex components are self-contained backend modules with their own tables and functions. Register Memcity in your app config:
```typescript
// convex/convex.config.ts
import { defineApp } from "convex/server";
import memory from "./memcity/convex.config";

const app = defineApp();
app.use(memory);

export default app;
```

Step 3: Set Environment Variables
```bash
# Jina v4 embeddings (1024-dimensional vectors) + Reranker v3
npx convex env set JINA_API_KEY your-jina-key

# OpenRouter gateway for LLM reasoning (Pro+ tiers)
npx convex env set OPENROUTER_API_KEY your-openrouter-key
```

Step 4: Configure the Memory Client
The installed `convex/memory.ts` gives you a working default, but you can tune it:

```typescript
// convex/memory.ts
import { Memory } from "./memcity/client";
import { components } from "./_generated/api";

export const memory = new Memory(components.memory, {
  tier: "pro",
  ai: {
    gateway: "openrouter",
    model: "google/gemini-2.0-flash-001",
  },
  search: {
    maxResults: 10,
    minScore: 0.1,
    weights: {
      semantic: 0.7, // Meaning-based search
      bm25: 0.3, // Keyword-based search
    },
  },
});
```

The weights control how hybrid search blends results. A 70/30 semantic/BM25 split works well for most documentation and knowledge base use cases. If your users search with exact terms (product IDs, error codes), increase `bm25`. If they ask natural language questions, increase `semantic`.
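To make the blending concrete, here's a minimal sketch of weighted score fusion. This is an illustration of the idea, not Memcity's actual implementation; the `blendScores` helper and its result shapes are hypothetical:

```typescript
// Illustrative weighted hybrid scoring: each document's final score is a
// weighted sum of its semantic and BM25 scores (missing entries count as 0).
type Scored = { id: string; score: number };

function blendScores(
  semantic: Scored[],
  bm25: Scored[],
  weights = { semantic: 0.7, bm25: 0.3 },
): Scored[] {
  const combined = new Map<string, number>();
  for (const r of semantic) {
    combined.set(r.id, (combined.get(r.id) ?? 0) + weights.semantic * r.score);
  }
  for (const r of bm25) {
    combined.set(r.id, (combined.get(r.id) ?? 0) + weights.bm25 * r.score);
  }
  // Sort descending so the strongest blended match comes first
  return [...combined.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

With the default 70/30 split, a document that scores 1.0 on semantics alone (0.7 blended) still beats one that scores 0.5 semantic plus 1.0 BM25 (0.65 blended) -- which is why the weights matter for exact-term queries.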
Step 5: Ingest Documents
Create an action that ingests content into a knowledge base:
```typescript
// convex/ingest.ts
import { action } from "./_generated/server";
import { v } from "convex/values";
import { memory } from "./memory";

export const ingestDocument = action({
  args: {
    orgId: v.string(),
    knowledgeBaseId: v.string(),
    text: v.string(),
    source: v.string(),
  },
  handler: async (ctx, args) => {
    const result = await memory.ingestText(ctx, {
      orgId: args.orgId,
      knowledgeBaseId: args.knowledgeBaseId,
      text: args.text,
      source: args.source,
    });
    // result.chunkCount tells you how many chunks were created
    return result;
  },
});
```

When you call `ingestText`, Memcity runs a multi-phase pipeline:
- Chunking -- Splits your text into ~512 token overlapping chunks
- Embedding -- Sends each chunk to Jina v4 to generate 1024-dimensional vectors
- BM25 indexing -- Indexes the raw text for keyword search
- Entity extraction (Pro+) -- Identifies people, companies, concepts and their relationships
- Enrichment (Team) -- RLM-powered enrichment that generates summaries, Q&A pairs, and cross-references
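The chunking phase is easiest to picture as a sliding window. Here's an illustrative sketch -- Memcity chunks by tokens (~512 with overlap), while this hypothetical `chunkText` helper approximates the same idea with a character budget:

```typescript
// Illustrative sliding-window chunker. Each chunk shares `overlap` trailing
// characters with the next one, so sentences that straddle a boundary still
// appear whole in at least one chunk.
function chunkText(text: string, chunkSize = 2000, overlap = 200): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
    start += chunkSize - overlap; // advance, keeping `overlap` chars shared
  }
  return chunks;
}
```

The overlap is what keeps retrieval robust at chunk boundaries: without it, a fact split across two chunks could be half-missing from both.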
Step 6: Build the Search Endpoint
This is where the 16-step pipeline runs:
```typescript
// convex/search.ts
import { action } from "./_generated/server";
import { v } from "convex/values";
import { memory } from "./memory";

export const search = action({
  args: {
    orgId: v.string(),
    knowledgeBaseId: v.string(),
    query: v.string(),
  },
  handler: async (ctx, args) => {
    const results = await memory.getContext(ctx, {
      orgId: args.orgId,
      knowledgeBaseId: args.knowledgeBaseId,
      query: args.query,
    });
    return results;
  },
});
```

A single call to `getContext` triggers up to 16 steps:
| Step | What It Does | Tier |
|---|---|---|
| 1. Quota Check | Rate limiting per org | Team |
| 2. Cache | Reuse embeddings for repeated queries | All |
| 3. Query Routing | Classify query complexity (simple/moderate/complex) | Pro+ |
| 4. Decomposition | Break compound queries into sub-questions | Pro+ |
| 5. HyDE | Generate hypothetical answer to improve recall | Pro+ |
| 6. Embed | Convert query to 1024-dim vector via Jina v4 | All |
| 7. Dual Search | Run semantic + BM25 search in parallel | All |
| 8. RRF Fusion | Merge results using Reciprocal Rank Fusion | All |
| 9. ACL Filter | Enforce document-level access control | Team |
| 10. Dedup | Remove duplicate chunks | All |
| 11. GraphRAG | Traverse knowledge graph for related entities | Pro+ |
| 12. Rerank | Re-score results with Jina Reranker v3 | Pro+ |
| 13. Chunk Expansion | Fetch surrounding context for top results | Pro+ |
| 14. Confidence | Score result confidence for UI display | All |
| 15. Episodic Memory | Incorporate user-specific context | Pro+ |
| 16. Format | Format results with citations, cache, and audit log | All |
Simple queries skip expensive steps (routing classifies them and takes the fast path). Complex queries get the full treatment. You don't write any conditional logic -- the pipeline adapts automatically.
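Step 8 is worth a closer look even though you never call it directly. In standard Reciprocal Rank Fusion, each document's fused score is the sum of 1/(k + rank) over every ranked list it appears in, with k conventionally set to 60. A minimal sketch of that textbook formula (Memcity's exact constants may differ):

```typescript
// Standard Reciprocal Rank Fusion: reward documents that rank well in
// multiple lists (here, semantic and BM25) without comparing raw scores.
function rrfFuse(rankings: string[][], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const list of rankings) {
    list.forEach((id, i) => {
      // rank is 1-based, so position i contributes 1 / (k + i + 1)
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

Because RRF uses only ranks, it sidesteps the problem that cosine similarities and BM25 scores live on incompatible scales -- a document near the top of both lists wins even if its raw scores look modest.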
Step 7: Wire It Up to Your Frontend
```tsx
// app/search/page.tsx
"use client";
import { useAction } from "convex/react";
import { api } from "@/convex/_generated/api";
import { useState } from "react";

export default function SearchPage() {
  const [query, setQuery] = useState("");
  const [results, setResults] = useState<any>(null);
  const search = useAction(api.search.search);

  const handleSearch = async () => {
    const res = await search({
      orgId: "your-org-id",
      knowledgeBaseId: "your-kb-id",
      query,
    });
    setResults(res);
  };

  return (
    <div>
      <input
        value={query}
        onChange={(e) => setQuery(e.target.value)}
        placeholder="Ask a question..."
        onKeyDown={(e) => e.key === "Enter" && handleSearch()}
      />
      {results?.results?.map((r: any, i: number) => (
        <div key={i}>
          <p>{r.text}</p>
          <span>Score: {r.score.toFixed(2)}</span>
        </div>
      ))}
    </div>
  );
}
```

Performance: What to Expect
Typical latencies on Convex serverless:
| Query Type | Pipeline Steps | Latency |
|---|---|---|
| Simple (cached) | 2 steps | ~50ms |
| Simple (cold) | 5-6 steps | ~200ms |
| Moderate | 10-12 steps | ~400ms |
| Complex | All 16 steps | ~600-800ms |
These are competitive with dedicated vector databases, and you're getting a full RAG pipeline -- not just similarity search.
Beyond Basic RAG
Once you have the basics working, Memcity offers several ways to improve result quality:
- Knowledge Graph -- Entity extraction connects concepts across documents. "Who reports to the VP of Engineering?" works even if no single document contains the full answer.
- Enrichment Pipeline -- RLM-powered enrichment generates summaries, Q&A pairs, and cross-references at ingestion time, dramatically improving retrieval quality.
- File Ingestion -- Process PDFs, DOCX, images, audio, video, and 25+ file types. Memcity handles extraction and OCR.
- Episodic Memory -- Remember user preferences and conversation history so returning users get personalized results.
- Access Control -- Document-level permissions so search results respect who can see what.
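To see why the knowledge graph can answer questions no single document covers, picture extracted entities as nodes and relationships as edges: retrieval walks outward from the entities matched in the query. A toy sketch of that traversal (the `Edge` shape and `relatedEntities` helper are hypothetical illustrations, not Memcity's API -- the real implementation is in convex/memcity/):

```typescript
// Toy knowledge-graph walk: breadth-first expansion from seed entities,
// collecting everything reachable within `maxHops` relationship edges.
type Edge = { from: string; to: string; relation: string };

function relatedEntities(edges: Edge[], seeds: string[], maxHops = 2): Set<string> {
  const found = new Set(seeds);
  let frontier = new Set(seeds);
  for (let hop = 0; hop < maxHops; hop++) {
    const next = new Set<string>();
    for (const e of edges) {
      // Treat edges as bidirectional for discovery purposes
      if (frontier.has(e.from) && !found.has(e.to)) next.add(e.to);
      if (frontier.has(e.to) && !found.has(e.from)) next.add(e.from);
    }
    next.forEach((n) => found.add(n));
    frontier = next;
  }
  return found;
}
```

With edges like `alice reports_to bob` and `bob reports_to vp-eng` extracted from different documents, a two-hop walk from "vp-eng" reaches "alice" -- the multi-document answer that pure similarity search misses.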
Source Code
Memcity installs as source code into your project. There's no black box -- you can read every line of the pipeline in `convex/memcity/`. If you need to customize a step, you can.
Get started:
```bash
npx shadcn@latest add @memcity/community
```

Full documentation at memcity.dev/docs.