Configuration

Every configuration option explained with examples, defaults, and tier requirements.

How Configuration Works

When you create a Memory instance, you pass a configuration object. Memcity deep-merges your config with sensible defaults — you only need to specify what you want to change.

ts
const memory = new Memory(components.memcity, {
  tier: "pro",
  ai: { gateway: "openrouter" },
  // Everything else uses defaults
});
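The merge is recursive: nested objects are combined key by key, so a partial `search` block overrides only the keys you set and keeps the rest of the defaults. A minimal sketch of that behavior (illustrative only, not Memcity's actual implementation):

```ts
// Illustrative deep merge: user values win, missing keys fall back to defaults.
function deepMerge(defaults: any, overrides: any): any {
  const out = { ...defaults };
  for (const key of Object.keys(overrides)) {
    const value = overrides[key];
    out[key] =
      value && typeof value === "object" && !Array.isArray(value)
        ? deepMerge(defaults?.[key] ?? {}, value) // recurse into nested objects
        : value;                                  // primitives replace outright
  }
  return out;
}

const defaultConfig = { search: { maxResults: 10, minScore: 0.1 } };
const merged = deepMerge(defaultConfig, { search: { maxResults: 5 } });
// merged is { search: { maxResults: 5, minScore: 0.1 } }
```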

Tier enforcement is automatic. If you're on the Community tier and try to enable a Pro feature, Memcity silently overrides it to the default value. No errors, no crashes — it just uses what your tier supports.

ts
// On Community tier, this config:
const memory = new Memory(components.memcity, {
  tier: "community",
  search: { reranking: true }, // Pro+ feature
});
// Behaves exactly like this:
const memory = new Memory(components.memcity, {
  tier: "community",
  search: { reranking: false }, // Silently overridden
});

The Full MemoryConfig Interface

Here's the complete TypeScript interface showing every option:

ts
interface MemoryConfig {
  tier: "community" | "pro" | "team";
 
  ai: {
    gateway: "openrouter" | "vercel";
    model: string;
  };
 
  search: {
    maxResults: number;
    minScore: number;
    weights: {
      semantic: number;
      bm25: number;
    };
    enableQueryRouting: boolean;      // Pro+
    enableQueryDecomposition: boolean; // Pro+
    enableHyde: boolean;              // Pro+
    reranking: boolean;               // Pro+
    maxQueryExpansions: number;       // Pro+
    maxChunkExpansions: number;       // Pro+
  };
 
  chunking: {
    strategy: "recursive" | "fixed";
    chunkSize: number;
    chunkOverlap: number;
  };
 
  graph: {                            // Pro+
    enabled: boolean;
    traversalStrategy: "breadth_first" | "best_first" | "hybrid";
    maxDepth: number;
    maxNodes: number;
  };
 
  enterprise: {                       // Team only
    acl: boolean;
    auditLog: boolean;
    quotas: boolean;
  };
}

AI Configuration

Gateway: OpenRouter vs Vercel

The ai.gateway option controls how Memcity accesses language models:

| | OpenRouter | Vercel AI Gateway |
|---|---|---|
| Setup | Set `OPENROUTER_API_KEY` env var | Uses Vercel's built-in credentials |
| Models | 200+ models from all providers | OpenAI, Anthropic, Google |
| Pricing | Pay-per-token via OpenRouter | Pay-per-token via Vercel |
| Best for | Most users, widest model selection | Vercel-deployed apps wanting simplicity |
| Fallbacks | Automatic model fallbacks | Limited fallback support |

ts
// OpenRouter (recommended for most users)
ai: {
  gateway: "openrouter",
  model: "google/gemini-2.0-flash-001",
}
 
// Vercel AI Gateway
ai: {
  gateway: "vercel",
  model: "gpt-4o-mini",
}

Model Selection

The ai.model option selects the model used for reasoning tasks — query routing, entity extraction, HyDE generation, query decomposition. It is not used for embeddings (those always use Jina v4).

| Model | Cost | Quality | Speed | Best For |
|---|---|---|---|---|
| google/gemini-2.0-flash-001 | Low | Good | Fast | Default choice, good balance |
| gpt-4o-mini | Low | Good | Fast | If you prefer OpenAI |
| anthropic/claude-3.5-haiku | Low | Good | Fast | If you prefer Anthropic |
| google/gemini-2.5-pro-preview | High | Excellent | Slow | Maximum quality entity extraction |
| anthropic/claude-sonnet-4 | High | Excellent | Medium | Complex reasoning tasks |

Recommendation: Start with google/gemini-2.0-flash-001. It's fast, cheap, and good enough for most use cases. Only upgrade if you need better entity extraction or query understanding.

Search Configuration

maxResults

How many results to return from a search. Default: 10.

When to change: If you're building a chat interface, 3-5 results is usually enough context. If you're building a search results page, 10-20 gives users more to browse.

ts
search: {
  maxResults: 5,  // For chat: fewer but more focused results
}

minScore

The minimum relevance score (0-1) a result must have to be included. Default: 0.1.

When to change: If you're getting too many low-quality results, raise this to 0.3 or 0.5. If you're getting too few results, lower it to 0.05.

ts
search: {
  minScore: 0.3,  // Only return results with a relevance score of 0.3 or higher
}

weights: semantic vs bm25

These control how much weight to give semantic (meaning-based) search vs BM25 (keyword-based) search. They must sum to 1.0. Default: 0.7 semantic, 0.3 BM25.

What's the difference?

  • Semantic search understands meaning. "How do I cancel my subscription?" matches "To terminate your plan, visit account settings" even though the words are different.
  • BM25 search matches keywords. "error code 4012" matches documents containing exactly "error code 4012". It's precise but doesn't understand synonyms.

| Use Case | Semantic | BM25 | Why |
|---|---|---|---|
| Natural language Q&A | 0.8 | 0.2 | Users ask in their own words |
| Technical documentation | 0.6 | 0.4 | Function names and codes matter |
| Code search | 0.3 | 0.7 | Exact identifiers are critical |
| Legal/compliance docs | 0.5 | 0.5 | Both exact terms and concepts matter |

ts
search: {
  weights: {
    semantic: 0.6,  // Understanding matters
    bm25: 0.4,      // But exact terms also matter
  },
}
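Conceptually, the combined relevance score is a weighted sum of the two component scores (a simplified sketch — Memcity's exact fusion step isn't shown here):

```ts
// Simplified hybrid scoring: a weighted sum of the two scores.
// Because the weights sum to 1.0, the result stays on the same 0-1 scale.
const weights = { semantic: 0.6, bm25: 0.4 };

function hybridScore(semanticScore: number, bm25Score: number): number {
  return weights.semantic * semanticScore + weights.bm25 * bm25Score;
}

hybridScore(0.9, 0.2); // strong semantic match, weak keyword match ≈ 0.62
```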

enableQueryRouting (Pro+)

When enabled, Memcity classifies each query as simple, moderate, or complex before processing. This determines which pipeline steps activate:

  • Simple queries ("What is X?") skip decomposition and HyDE — they're fast.
  • Moderate queries ("How does X compare to Y?") use query expansion but skip decomposition.
  • Complex queries ("What are the implications of X on Y and Z?") use the full pipeline.

Default: false. When to enable: When your users ask a mix of simple and complex questions and you want to optimize for both speed and quality.

ts
search: {
  enableQueryRouting: true,
}
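The classification itself is done by the LLM, but its effect on the pipeline can be sketched as a simple mapping (illustrative only — the step names here are placeholders, not Memcity internals):

```ts
// Illustrative mapping from query complexity to pipeline steps.
type Complexity = "simple" | "moderate" | "complex";

function pipelineFor(complexity: Complexity): string[] {
  switch (complexity) {
    case "simple":
      return ["search"];                              // fast path, no extras
    case "moderate":
      return ["expand", "search"];                    // expansion, no decomposition
    case "complex":
      return ["decompose", "expand", "hyde", "search"]; // full pipeline
  }
}
```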

enableQueryDecomposition (Pro+)

Breaks complex queries into simpler sub-queries that are searched independently, then results are merged.

Before decomposition:

"Compare the vacation policy with the sick leave policy and explain which is more generous"

After decomposition:

  1. "What is the vacation policy?"
  2. "What is the sick leave policy?"
  3. "How do vacation days compare to sick days in terms of quantity?"

Each sub-query gets its own search, and results are merged. This dramatically improves recall for complex questions.

Default: false. When to enable: When users ask multi-part or comparative questions.
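The merge step is conceptually a dedupe-and-best-score pass over the sub-query results (a hypothetical sketch — the result shape and merge rule here are assumptions, not Memcity's API):

```ts
// Hypothetical result shape; Memcity's actual types may differ.
interface Hit { chunkId: string; score: number; }

// Merge per-sub-query results: dedupe by chunk, keep each chunk's best score,
// and return the combined list sorted by score.
function mergeResults(perSubQuery: Hit[][]): Hit[] {
  const best = new Map<string, number>();
  for (const hits of perSubQuery) {
    for (const { chunkId, score } of hits) {
      best.set(chunkId, Math.max(score, best.get(chunkId) ?? 0));
    }
  }
  return [...best.entries()]
    .map(([chunkId, score]) => ({ chunkId, score }))
    .sort((a, b) => b.score - a.score);
}
```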

enableHyde (Pro+)

HyDE stands for Hypothetical Document Embeddings. Instead of just embedding the query, Memcity asks the LLM: "If a document existed that perfectly answered this question, what would it say?" Then it embeds that hypothetical answer and searches for real documents similar to it.

Why it works: Queries are short ("refund policy?") but answers are long and detailed. A hypothetical answer is more similar to the actual document than the short query is.

Example:

  • Query: "refund policy"
  • HyDE generates: "Our refund policy allows customers to return products within 30 days of purchase for a full refund. Items must be unused and in original packaging..."
  • This hypothetical text matches the real refund policy document much better than the two-word query would.

Default: false. When to enable: When users ask short questions about topics with detailed documentation.

ts
search: {
  enableHyde: true,
}

reranking (Pro+)

After the initial search retrieves candidates, a reranker (Jina Reranker v3) re-scores them using a cross-encoder model that looks at the query and each candidate together.

Why initial ranking isn't enough: The initial search uses separate embeddings for the query and documents. A reranker directly compares each pair, which is more accurate but slower (you can't rerank thousands of results, only the top candidates).

Think of it like a hiring process: the initial search is the resume screening (fast, approximate), and the reranker is the interview (slower, more accurate).

Default: false. When to enable: Almost always. This is the single most impactful quality improvement for most use cases. The latency cost (~100ms) is usually worth it.

ts
search: {
  reranking: true,
}

maxQueryExpansions (Pro+)

How many semantic variations of the query to generate. Default: 3.

Example: For the query "Python web frameworks", expansions might be:

  1. "Django Flask FastAPI web development Python"
  2. "Building web applications with Python"
  3. "Python HTTP server frameworks comparison"

More expansions improve recall but increase latency and cost. Range: 1-5.
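As with the other search options, you set this in the config:

```ts
search: {
  maxQueryExpansions: 2,  // Fewer expansions: lower latency and cost
}
```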

maxChunkExpansions (Pro+)

For top results, how many surrounding chunks to fetch for additional context. Default: 2.

Think of it like reading a book — if a sentence matches your query, you probably want to read the paragraph (or page) around it. Chunk expansion gives you that context.

ts
search: {
  maxChunkExpansions: 3, // Fetch 3 chunks before and after each result
}

Chunking Configuration

What is Chunking?

When you ingest a document, Memcity splits it into "chunks" — smaller pieces of text. Each chunk gets its own embedding and can be retrieved independently.

Why not just embed the whole document? Because embeddings work best on focused pieces of text. A 50-page document embedded as one vector loses detail. But a 512-token chunk about "refund policy" creates a precise, searchable vector.
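The sliding-window arithmetic behind chunkSize and chunkOverlap is simple: each chunk starts `chunkSize - chunkOverlap` tokens after the previous one. A sketch of fixed-size splitting (the recursive strategy additionally respects paragraph and sentence boundaries):

```ts
// Fixed-size chunking sketch: consecutive chunks share `chunkOverlap` tokens,
// so information at a boundary always appears whole in at least one chunk.
function chunkBounds(totalTokens: number, chunkSize = 512, chunkOverlap = 50) {
  const stride = chunkSize - chunkOverlap;
  const bounds: Array<[number, number]> = [];
  for (let start = 0; start < totalTokens; start += stride) {
    bounds.push([start, Math.min(start + chunkSize, totalTokens)]);
    if (start + chunkSize >= totalTokens) break; // last chunk reached the end
  }
  return bounds;
}

chunkBounds(1000); // [[0, 512], [462, 974], [924, 1000]]
```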

Strategy

| Strategy | Description | Tier |
|---|---|---|
| recursive | Splits on paragraph → sentence → word boundaries, preserving structure | All |
| fixed | Splits at a fixed token count regardless of structure | All |

Use recursive (the default) in almost all cases. It produces more natural chunks that respect paragraph boundaries.

ts
chunking: {
  strategy: "recursive",
  chunkSize: 512,       // Target tokens per chunk
  chunkOverlap: 50,     // Overlap between consecutive chunks
}

chunkSize and chunkOverlap

  • chunkSize (default: 512): How many tokens per chunk. Smaller chunks (256) are more precise but lose context. Larger chunks (1024) have more context but are less focused.
  • chunkOverlap (default: 50): How many tokens overlap between consecutive chunks. This prevents information at chunk boundaries from being lost.

| Content Type | Chunk Size | Overlap | Why |
|---|---|---|---|
| FAQ / short answers | 256 | 25 | Each Q&A pair should be one chunk |
| Technical docs | 512 | 50 | Good balance for most content |
| Long-form articles | 1024 | 100 | Preserve more narrative context |
| Legal documents | 512 | 100 | Higher overlap prevents clause splitting |

Graph Configuration (Pro+)

The knowledge graph automatically extracts entities and relationships from your documents. See Knowledge Graph for a deep dive.

traversalStrategy

How the graph is traversed when searching for related entities:

  • breadth_first — Explore all neighbors at each depth level before going deeper. Like exploring a building floor by floor.
  • best_first — Always follow the highest-scoring connection. Like a detective following the hottest lead.
  • hybrid (default) — BFS for the first hop, then best-first. Gets the best of both strategies.
ts
graph: {
  enabled: true,
  traversalStrategy: "hybrid",
  maxDepth: 3,     // How many hops to traverse (default: 3)
  maxNodes: 50,    // Max nodes to visit (default: 50)
}
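The maxDepth and maxNodes caps bound the traversal whichever strategy is chosen. A breadth-first sketch (illustrative only — the adjacency-map shape is an assumption, not Memcity's graph API):

```ts
// Illustrative BFS over an adjacency map, honoring maxDepth and maxNodes.
function traverse(
  graph: Map<string, string[]>, // entity -> related entities
  start: string,
  maxDepth = 3,
  maxNodes = 50,
): string[] {
  const visited = new Set<string>([start]);
  let frontier = [start];
  for (let depth = 0; depth < maxDepth && visited.size < maxNodes; depth++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const neighbor of graph.get(node) ?? []) {
        if (visited.size >= maxNodes) break; // hard cap on nodes visited
        if (!visited.has(neighbor)) {
          visited.add(neighbor);
          next.push(neighbor);
        }
      }
    }
    frontier = next; // one hop deeper per iteration
  }
  return [...visited];
}
```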

Enterprise Configuration (Team)

These features are available on the Team tier only. Each has a dedicated documentation page:

ts
enterprise: {
  acl: true,        // Enable per-document access control
  auditLog: true,   // Enable immutable audit logging
  quotas: true,     // Enable usage quotas and rate limiting
}

Configuration Recipes

"Fast and Cheap" — Minimize Costs

Best for: prototypes, low-traffic apps, simple Q&A.

ts
const memory = new Memory(components.memcity, {
  tier: "community",
  ai: {
    gateway: "openrouter",
    model: "google/gemini-2.0-flash-001",
  },
  search: {
    maxResults: 5,
    weights: { semantic: 0.7, bm25: 0.3 },
    // All advanced features disabled by default on Community
  },
  chunking: {
    strategy: "recursive",
    chunkSize: 512,
    chunkOverlap: 50,
  },
});

"Maximum Quality" — Best Possible Results

Best for: customer-facing search, support bots, enterprise apps.

ts
const memory = new Memory(components.memcity, {
  tier: "pro",
  ai: {
    gateway: "openrouter",
    model: "google/gemini-2.5-pro-preview",
  },
  search: {
    maxResults: 10,
    minScore: 0.2,
    weights: { semantic: 0.7, bm25: 0.3 },
    enableQueryRouting: true,
    enableQueryDecomposition: true,
    enableHyde: true,
    reranking: true,
    maxQueryExpansions: 5,
    maxChunkExpansions: 3,
  },
  graph: {
    enabled: true,
    traversalStrategy: "hybrid",
    maxDepth: 3,
    maxNodes: 50,
  },
});

"Enterprise Secure" — Full Compliance

Best for: regulated industries, multi-tenant SaaS, enterprise deployments.

ts
const memory = new Memory(components.memcity, {
  tier: "team",
  ai: {
    gateway: "openrouter",
    model: "google/gemini-2.0-flash-001",
  },
  search: {
    maxResults: 10,
    enableQueryRouting: true,
    reranking: true,
  },
  enterprise: {
    acl: true,        // Per-document access control
    auditLog: true,   // Immutable operation logging
    quotas: true,     // Usage limits per organization
  },
});