Tutorial · March 4, 2026

How to Build a RAG Pipeline with Convex

Step-by-step guide to building a production-grade RAG pipeline with vector search, BM25, reranking, and knowledge graph traversal -- all running on Convex.

By Memcity Team

Most RAG tutorials show you the same thing: embed a query, do a cosine similarity search, return the top-K results. That works for demos. It falls apart in production.

Real users ask vague questions. They misspell things. They ask compound questions that need information from multiple documents. They expect results that are actually relevant, not just mathematically similar.

This guide walks you through building a production-grade RAG pipeline on Convex using Memcity -- one that handles all of these cases with a 16-step retrieval pipeline.

What You'll Build

By the end of this guide, you'll have:

  • A Convex backend that ingests documents and makes them searchable
  • Hybrid search combining semantic vectors and BM25 keyword matching
  • Reciprocal Rank Fusion to merge results from both search methods
  • Jina Reranker v3 for second-pass precision scoring
  • Knowledge graph traversal for finding related concepts across documents
  • A working getContext endpoint your frontend can call

The whole thing runs on Convex's serverless infrastructure. No vector database to manage, no infrastructure to maintain.

Prerequisites

You'll need:

  1. A Convex project (npm create convex@latest if you don't have one)
  2. A Jina AI API key (free tier available) for embeddings and reranking
  3. An OpenRouter API key for LLM-powered pipeline steps (query routing, entity extraction, HyDE)

Step 1: Install Memcity

Memcity is distributed as source code through the shadcn registry. You own the code and can customize it.

Add the registry to your components.json:

```json
{
  "registries": {
    "@memcity": {
      "url": "https://memcity.dev/r/{name}.json",
      "headers": {
        "X-License-Key": "${MEMCITY_LICENSE_KEY}"
      }
    }
  }
}
```

Then install:

```bash
# Community (free) -- hybrid vector search, BM25, RRF fusion
npx shadcn@latest add @memcity/community

# Pro ($79) -- full 16-step pipeline, GraphRAG, file processing
npx shadcn@latest add @memcity/pro
```

This drops the full Memcity source into convex/memcity/ along with a pre-configured client at convex/memory.ts.

Step 2: Register the Component

Convex components are self-contained backend modules with their own tables and functions. Register Memcity in your app config:

```ts
// convex/convex.config.ts
import { defineApp } from "convex/server";
import memory from "./memcity/convex.config";

const app = defineApp();
app.use(memory);

export default app;
```

Step 3: Set Environment Variables

```bash
# Jina v4 embeddings (1024-dimensional vectors) + Reranker v3
npx convex env set JINA_API_KEY your-jina-key

# OpenRouter gateway for LLM reasoning (Pro+ tiers)
npx convex env set OPENROUTER_API_KEY your-openrouter-key
```

Step 4: Configure the Memory Client

The installed convex/memory.ts gives you a working default, but you can tune it:

```ts
// convex/memory.ts
import { Memory } from "./memcity/client";
import { components } from "./_generated/api";

export const memory = new Memory(components.memory, {
  tier: "pro",
  ai: {
    gateway: "openrouter",
    model: "google/gemini-2.0-flash-001",
  },
  search: {
    maxResults: 10,
    minScore: 0.1,
    weights: {
      semantic: 0.7, // Meaning-based search
      bm25: 0.3,     // Keyword-based search
    },
  },
});
```

The weights control how hybrid search blends results. A 70/30 semantic/BM25 split works well for most documentation and knowledge base use cases. If your users search with exact terms (product IDs, error codes), increase bm25. If they ask natural language questions, increase semantic.
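To make the blending concrete, here is a minimal sketch of how a weighted merge of the two result lists could work. This is illustrative only, not Memcity's internal implementation; it assumes both search methods return scores already normalized to [0, 1], and the `blendScores` name is hypothetical.

```typescript
// Hypothetical sketch of weighted hybrid-score blending (not Memcity's actual code).
// Assumes semantic and BM25 scores are each normalized to the [0, 1] range.
type Scored = { id: string; score: number };

function blendScores(
  semantic: Scored[],
  bm25: Scored[],
  weights = { semantic: 0.7, bm25: 0.3 }
): Scored[] {
  const combined = new Map<string, number>();
  // A chunk found by both methods accumulates a weighted contribution from each.
  for (const r of semantic) {
    combined.set(r.id, (combined.get(r.id) ?? 0) + weights.semantic * r.score);
  }
  for (const r of bm25) {
    combined.set(r.id, (combined.get(r.id) ?? 0) + weights.bm25 * r.score);
  }
  // Sort best-first by combined score.
  return [...combined.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

Note how a chunk matched by both methods outranks one matched by only a single method, even if its individual scores are lower; that reward for agreement is the core intuition behind hybrid search.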

Step 5: Ingest Documents

Create an action that ingests content into a knowledge base:

```ts
// convex/ingest.ts
import { action } from "./_generated/server";
import { v } from "convex/values";
import { memory } from "./memory";

export const ingestDocument = action({
  args: {
    orgId: v.string(),
    knowledgeBaseId: v.string(),
    text: v.string(),
    source: v.string(),
  },
  handler: async (ctx, args) => {
    const result = await memory.ingestText(ctx, {
      orgId: args.orgId,
      knowledgeBaseId: args.knowledgeBaseId,
      text: args.text,
      source: args.source,
    });

    // result.chunkCount tells you how many chunks were created
    return result;
  },
});
```

When you call ingestText, Memcity runs a multi-phase pipeline:

  1. Chunking -- Splits your text into overlapping chunks of ~512 tokens
  2. Embedding -- Sends each chunk to Jina v4 to generate 1024-dimensional vectors
  3. BM25 indexing -- Indexes the raw text for keyword search
  4. Entity extraction (Pro+) -- Identifies people, companies, concepts and their relationships
  5. Enrichment (Team) -- RLM-powered enrichment that generates summaries, Q&A pairs, and cross-references
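The chunking phase above can be sketched as follows. This is an illustrative approximation, not Memcity's actual chunker: real chunkers count subword tokens with a tokenizer, while this sketch approximates tokens with whitespace-separated words, and the `chunkText` name and the 64-word overlap are assumptions.

```typescript
// Illustrative sketch of overlapping chunking (not Memcity's implementation).
// Approximates "~512 tokens" with 512 words; real pipelines use a tokenizer.
function chunkText(text: string, chunkSize = 512, overlap = 64): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  // Advance by (chunkSize - overlap) so consecutive chunks share `overlap` words,
  // preserving context that straddles a chunk boundary.
  for (let start = 0; start < words.length; start += chunkSize - overlap) {
    chunks.push(words.slice(start, start + chunkSize).join(" "));
    if (start + chunkSize >= words.length) break;
  }
  return chunks;
}
```

The overlap is what keeps a sentence split across a boundary retrievable: it appears whole in at least one chunk.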

Step 6: Build the Search Endpoint

This is where the 16-step pipeline runs:

```ts
// convex/search.ts
import { action } from "./_generated/server";
import { v } from "convex/values";
import { memory } from "./memory";

export const search = action({
  args: {
    orgId: v.string(),
    knowledgeBaseId: v.string(),
    query: v.string(),
  },
  handler: async (ctx, args) => {
    const results = await memory.getContext(ctx, {
      orgId: args.orgId,
      knowledgeBaseId: args.knowledgeBaseId,
      query: args.query,
    });

    return results;
  },
});
```

A single call to getContext triggers up to 16 steps:

| Step | What It Does | Tier |
| --- | --- | --- |
| 1. Quota Check | Rate limiting per org | Team |
| 2. Cache | Reuse embeddings for repeated queries | All |
| 3. Query Routing | Classify query complexity (simple/moderate/complex) | Pro+ |
| 4. Decomposition | Break compound queries into sub-questions | Pro+ |
| 5. HyDE | Generate hypothetical answer to improve recall | Pro+ |
| 6. Embed | Convert query to 1024-dim vector via Jina v4 | All |
| 7. Dual Search | Run semantic + BM25 search in parallel | All |
| 8. RRF Fusion | Merge results using Reciprocal Rank Fusion | All |
| 9. ACL Filter | Enforce document-level access control | Team |
| 10. Dedup | Remove duplicate chunks | All |
| 11. GraphRAG | Traverse knowledge graph for related entities | Pro+ |
| 12. Rerank | Re-score results with Jina Reranker v3 | Pro+ |
| 13. Chunk Expansion | Fetch surrounding context for top results | Pro+ |
| 14. Confidence | Score result confidence for UI display | All |
| 15. Episodic Memory | Incorporate user-specific context | Pro+ |
| 16. Format | Format results with citations, cache, and audit log | All |

Simple queries skip expensive steps (routing classifies them and takes the fast path). Complex queries get the full treatment. You don't write any conditional logic -- the pipeline adapts automatically.
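Step 8's Reciprocal Rank Fusion deserves a closer look, since it is what lets semantic and BM25 results merge without comparable scores. A minimal sketch of the standard RRF formula follows; `k = 60` is the conventional constant from the RRF literature, and Memcity's exact parameters and `rrfFuse` name are assumptions here, not its actual code.

```typescript
// Minimal Reciprocal Rank Fusion sketch (illustrative, not Memcity's code).
// Each input list is ordered best-first. A document's fused score is the sum
// over all lists of 1 / (k + rank), so rank position matters, raw scores don't.
function rrfFuse(rankings: string[][], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      // index is 0-based, so rank = index + 1.
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

Because RRF only looks at rank positions, it sidesteps the problem that cosine similarities and BM25 scores live on incompatible scales; a document that appears in both lists gets two contributions and rises to the top.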

Step 7: Wire It Up to Your Frontend

```tsx
// app/search/page.tsx
"use client";

import { useAction } from "convex/react";
import { api } from "@/convex/_generated/api";
import { useState } from "react";

export default function SearchPage() {
  const [query, setQuery] = useState("");
  const [results, setResults] = useState<any>(null);
  const search = useAction(api.search.search);

  const handleSearch = async () => {
    const res = await search({
      orgId: "your-org-id",
      knowledgeBaseId: "your-kb-id",
      query,
    });
    setResults(res);
  };

  return (
    <div>
      <input
        value={query}
        onChange={(e) => setQuery(e.target.value)}
        placeholder="Ask a question..."
        onKeyDown={(e) => e.key === "Enter" && handleSearch()}
      />
      {results?.results?.map((r: any, i: number) => (
        <div key={i}>
          <p>{r.text}</p>
          <span>Score: {r.score.toFixed(2)}</span>
        </div>
      ))}
    </div>
  );
}
```

Performance: What to Expect

Typical latencies on Convex serverless:

| Query Type | Pipeline Steps | Latency |
| --- | --- | --- |
| Simple (cached) | 2 steps | ~50ms |
| Simple (cold) | 5-6 steps | ~200ms |
| Moderate | 10-12 steps | ~400ms |
| Complex | All 16 steps | ~600-800ms |

These are competitive with dedicated vector databases, and you're getting a full RAG pipeline -- not just similarity search.

Beyond Basic RAG

Once you have the basics working, Memcity offers several ways to improve result quality:

  • Knowledge Graph -- Entity extraction connects concepts across documents. "Who reports to the VP of Engineering?" works even if no single document contains the full answer.
  • Enrichment Pipeline -- RLM-powered enrichment generates summaries, Q&A pairs, and cross-references at ingestion time, dramatically improving retrieval quality.
  • File Ingestion -- Process PDFs, DOCX, images, audio, video, and 25+ file types. Memcity handles extraction and OCR.
  • Episodic Memory -- Remember user preferences and conversation history so returning users get personalized results.
  • Access Control -- Document-level permissions so search results respect who can see what.
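The knowledge graph bullet above is easiest to see with a one-hop traversal sketch. This is illustrative only, assuming a simple edge list; the `Edge` shape and `relatedEntities` function are hypothetical, not Memcity's GraphRAG internals.

```typescript
// Illustrative one-hop knowledge graph expansion (not Memcity's actual code).
// Given entities found in the top-ranked chunks, collect neighboring entities
// so chunks mentioning them can also be pulled into the context window.
type Edge = { from: string; to: string; relation: string };

function relatedEntities(seed: string[], edges: Edge[]): string[] {
  const seen = new Set(seed);
  const related = new Set<string>();
  for (const e of edges) {
    // Traverse edges in both directions from any seed entity.
    if (seen.has(e.from) && !seen.has(e.to)) related.add(e.to);
    if (seen.has(e.to) && !seen.has(e.from)) related.add(e.from);
  }
  return [...related];
}
```

This is how a query like "Who reports to the VP of Engineering?" can surface a chunk that only mentions "Alice", as long as an extracted `manages` edge connects the two entities.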

Source Code

Memcity installs as source code into your project. There's no black box -- you can read every line of the pipeline in convex/memcity/. If you need to customize a step, you can.

Get started:

```bash
npx shadcn@latest add @memcity/community
```

Full documentation at memcity.dev/docs.

Try Memcity

Add AI memory to your Convex app in under 5 minutes. Vector search, knowledge graphs, episodic memory, and a 16-step RAG pipeline -- all in one component.

```bash
npx shadcn@latest add @memcity/community
```

Tags: rag, convex, vector-search, tutorial, ai-memory