
Building a Production-Ready MCP Server with OAuth, TypeScript, and Our Battle Scars

Dive into our journey building a secure, multi-tenant MCP server using OAuth 2.0, TypeScript, and Firestore. Learn from our architectural pivots, security deep-dives, and real-world lessons in enabling AI agent capabilities.

Jesse Neumann

The dream of "AI Native" applications, where intelligent agents autonomously manage tasks and even orchestrate other agents, is rapidly materializing. For us, the Model Context Protocol (MCP) is the key—the universal translator enabling our AI to wield a powerful, secure toolkit of backend services. Our mission was ambitious: build an MCP server allowing AI agents to manage their own ecosystem, including other agents and their data integrations, all within a multi-tenant environment.

This article chronicles our journey: the architectural pivots forced by evolving standards, the security deep-dives that led us to OAuth 2.0, the pragmatic shift from Python to type-safe TypeScript, and the hard-won lessons building this system on Google Firestore. Expect not just the "what" and "how," but the crucial "why," complete with the dead-ends, "aha!" moments, and a few battle scars that shaped our path.

Companion Code: Many of the architectural patterns, security measures, and utility functions discussed in this article are demonstrated in a simplified open-source MCP server repository.

OAuth as Our Cornerstone

When AI agents can modify data across third-party apps and manage each other, security isn't a feature; it's the absolute bedrock.

The Initial Stumble: API Keys & SDK Hurdles

Our first foray used API keys in request headers. While seemingly simple, it quickly exposed limitations. The Python MCP SDK at the time didn't offer straightforward access to these headers or the underlying Starlette request object within tool handlers—a critical blocker for implementing proper authorization logic. This pushed us towards experimenting with a forked version of the SDK that was still waiting to be merged into the main branch, a clear sign of an unconventional path we might later regret going down.

Interoperability, Standards, and OAuth 2.0

We briefly considered token query parameters (less secure), but the real turning point came when envisioning external MCP clients. How could they securely and reliably integrate if we relied on a custom, patched authentication scheme? The answer became crystal clear: OAuth 2.0. It's the only authentication method specified in the official MCP standard, inherently supports dynamic client registration (vital for scaling), and guarantees the broadest compatibility. This wasn't just a security upgrade; it was a strategic commitment on our part to the MCP ecosystem's health and our server's accessibility.

The OAuth 2.0 Version

Simplified OAuth 2.0 Authorization Code Flow. This diagram illustrates how Portal One (acting as the MCP Client application) obtains an Access Token on behalf of the User. It details the user's redirection to our TypeScript Authorization Server, the authentication and consent steps, the secure exchange of an authorization code for an access token via a backend channel, and finally, Portal One using this token to make authenticated calls to the MCP Resource Server. This flow is foundational to securing all subsequent tool interactions.
Security Note: Implementing OAuth demands diligence. Key lessons: enforce the principle of least privilege with scopes; ensure secure token storage on client-sides; strictly validate redirect_uris; and always use PKCE for public clients to prevent authorization code interception.
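As an illustration, here's a minimal TypeScript sketch of two of those checks: exact-match redirect_uri validation and S256 PKCE verification. The helper names are ours, not from any SDK.

import { createHash } from 'node:crypto';

// Redirect URIs must match a registered value exactly; prefix or
// substring matching opens the door to token-stealing redirects.
function isAllowedRedirectUri(registeredUris: string[], redirectUri: string): boolean {
  return registeredUris.includes(redirectUri);
}

// PKCE (S256): the code_challenge stored during /authorize must equal
// the base64url-encoded SHA-256 of the code_verifier sent to /token.
function verifyPkce(codeVerifier: string, storedChallenge: string): boolean {
  const computed = createHash('sha256').update(codeVerifier).digest('base64url');
  return computed === storedChallenge;
}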

Architectural Evolution: From Python Dreams to TypeScript Realities

Our server architecture wasn't a pre-defined blueprint; it was an iterative response to technical challenges, SDK capabilities, and our own evolving understanding.

The Python Starting Line and the User Session Issue

We initially envisioned a rapid deployment using the Python MCP SDK and FastMCP, with the OAuth authorization server co-located with the resource server (our MCP server). The critical roadblock: our Python MCP server, as a distinct service, had no access to our main application's (Portal One) user sessions. This made linking an authorizing user to their Portal One account and, crucially, their workspace permissions, nearly impossible without re-implementing session management in Python—a significant and undesirable detour.

The TypeScript Pivot: Embracing a More Mature Ecosystem

The pragmatic solution was to host the OAuth endpoints on our existing TypeScript application server, which already managed user sessions and authentication. After a bit of "hacking" the Python SDK's OAuthServerProvider to work as a proxy to correctly target these new TypeScript auth routes, we achieved a fragile success. However, the next hurdle was accessing the validated token's associated user ID within the Python tool handlers.

As we researched solutions, we discovered the TypeScript MCP SDK had matured significantly for our use case. It offered:

  1. An authInfo object directly in the tool context, providing clean access to token details.
  2. A ProxyOAuthServerProvider class, specifically designed for architectures where the OAuth authorization server is separate from the MCP resource server (a configuration sketch follows below).

This was the decisive moment: we migrated the entire MCP server to TypeScript.
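To make the proxy pattern concrete, here's a minimal sketch of wiring a ProxyOAuthServerProvider to a separate authorization server. The endpoint URLs and the Firestore-backed helpers (lookupToken, lookupClient) are hypothetical stand-ins; the key move is that verifyAccessToken stashes the resolved userId in authInfo.extra, where tool handlers can read it later.

import express from 'express';
import { ProxyOAuthServerProvider } from '@modelcontextprotocol/sdk/server/auth/providers/proxyProvider.js';
import { mcpAuthRouter } from '@modelcontextprotocol/sdk/server/auth/router.js';
import { lookupToken, lookupClient } from './firestoreAuth.js'; // hypothetical helpers

const app = express();

const proxyProvider = new ProxyOAuthServerProvider({
  // The OAuth endpoints live on our main app server, not the MCP server.
  endpoints: {
    authorizationUrl: 'https://app.example.com/oauth/authorize',
    tokenUrl: 'https://app.example.com/oauth/token',
    revocationUrl: 'https://app.example.com/oauth/revoke',
  },
  // Resolve the bearer token to a user; extra.userId is what
  // withWorkspaceAccess reads later via req.authInfo.
  verifyAccessToken: async (token) => {
    const record = await lookupToken(token);
    if (!record) throw new Error('Invalid or expired token.');
    return {
      token,
      clientId: record.clientId,
      scopes: record.scopes,
      expiresAt: record.expiresAt,
      extra: { userId: record.userId },
    };
  },
  getClient: async (clientId) => {
    const client = await lookupClient(clientId);
    if (!client) return undefined;
    return { client_id: clientId, redirect_uris: client.redirectUris };
  },
});

// Mounts the standard OAuth discovery/registration routes on the MCP server.
app.use(mcpAuthRouter({
  provider: proxyProvider,
  issuerUrl: new URL('https://app.example.com'),
  baseUrl: new URL('https://mcp.example.com'),
}));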

Our Core Technology Stack

  • MCP TypeScript SDK: The backbone for protocol implementation.
  • TypeScript & Node.js: For type-safe, efficient server-side logic.
  • Google Firestore: Our existing, scalable, and secure NoSQL database for user profiles, OAuth tokens, tool definitions, and all workspace-specific data. Its real-time capabilities and flexible data model were invaluable.
  • Zod: Already integral to our Portal One codebase, Zod was the natural choice for defining input/output schemas for tools. This enabled robust runtime validation and, critically, automatic JSON schema generation for LLM consumption via the MCP SDK.

Bringing It All Together

Let's walk through a concrete example: An AI agent, operating within "Workspace Alpha" for "User Jane," needs to retrieve details for a specific memory item.

Step 1 - LLM Intent & Tool Selection

  • User Jane asks the agent: "Can you remind me about the 'Q3 Marketing Plan' memory?"
  • The LLM, having access to the agent's tools (including their JSON schemas), determines it needs to use the get_memory tool. It knows the get_memory tool requires both workspace_id and memory_id arguments.
  • The workspace_id might be defaulted to "Workspace Alpha" in the schema provided to the LLM (more on this later). The LLM extracts "Q3 Marketing Plan" and, through prior knowledge or another tool call (e.g., find_entities_by_title), resolves it to memory_id: "mem_123".
// Headers: { "Authorization": "Bearer <JANE_OAUTH_TOKEN>" }
{
  "name": "get_memory",
  "arguments": {
    "workspace_id": "Workspace Alpha", // or defaulted
    "memory_id": "mem_123"
  }
}

Step 2 - MCP Server: Authentication & Authorization

  • The request hits our Node.js/TypeScript MCP server.
  • The MCP SDK middleware validates the Authorization: Bearer <JANE_OAUTH_TOKEN> header format and extracts the token. It makes this token available in req.authInfo.token.
import { requireBearerAuth } from '@modelcontextprotocol/sdk/server/auth/middleware/bearerAuth.js';

app.use(
  '/mcp',
  requireBearerAuth({
    provider: proxyProvider,
    requiredScopes: ['default'],
  }),
  mcpRoutes,
);
  • The get_memory tool's handler is wrapped by our withWorkspaceAccess HOC.
  • Token Validation: The HOC reads the userId that our ProxyOAuthServerProvider implementation attached to the request's authInfo when it validated the token. It then fetches Jane's user document, which includes workspaces: ["Workspace Alpha", "Workspace Beta"].
import { z } from 'zod';
import { Firestore } from '@google-cloud/firestore';
import { RequestHandlerExtra } from '@modelcontextprotocol/sdk/shared/protocol.js';
import {
  CallToolResult,
  ServerNotification,
  ServerRequest,
} from '@modelcontextprotocol/sdk/types.js';

export function withWorkspaceAccess<T extends z.ZodTypeAny>(
  db: Firestore,
  inputSchema: T,
  handler: (
    args: z.infer<T>,
    req: RequestHandlerExtra<ServerRequest, ServerNotification>,
  ) => Promise<CallToolResult>,
) {
  return async (
    args: z.infer<T>,
    req: RequestHandlerExtra<ServerRequest, ServerNotification>,
  ) => {
    // We placed this here in our ProxyOAuthServerProvider implementation:
    const userId = (req.authInfo?.extra?.userId as string) || '';
    if (!userId) throw new Error('No user ID found in token.');
    const hasAccess = await checkWorkspaceAccess(db, userId, args.workspace_id);
    if (!hasAccess)
      throw new Error('You do not have access to this workspace.');
    return handler(args, req);
  };
}
  • Workspace Access Check: The HOC retrieves args.workspace_id ("Workspace Alpha") and calls checkWorkspaceAccess(db, "jane_doe", "Workspace Alpha"). Since "Workspace Alpha" is in Jane's workspaces list, this returns true.
export async function checkWorkspaceAccess(
  db: Firestore,
  userId: string,
  workspaceId: string,
): Promise<boolean> {
  const userDoc = await db.collection('users').doc(userId).get();
  if (!userDoc.exists) return false;
  const userData = userDoc.data();
  return (
    Array.isArray(userData?.workspaces) &&
    userData.workspaces.includes(workspaceId)
  );
}
  • All checks pass. The HOC now calls the actual get_memory handler with the validated args and request context; Jane's userId remains available to it via req.authInfo.
// Simplified get_memory handler logic
async function getMemoryHandler(args, req) {
  const memoryRef = db
    .collection('workspaces')
    .doc(args.workspace_id) // "Workspace Alpha"
    .collection('memories')
    .doc(args.memory_id);    // "mem_123"
  const memorySnap = await memoryRef.get();
  if (!memorySnap.exists) {
    throw new Error(`Memory ${args.memory_id} not found in workspace ${args.workspace_id}.`);
  }
  const memoryData = memorySnap.data();
  // Convert Firestore Timestamps to JS Dates
  const output = { 
    ...memoryData, 
    created_at: memoryData.created_at.toDate(),
    updated_at: memoryData.updated_at.toDate()
  };
  return { content: [{ type: 'text', text: JSON.stringify(output) }] };
}
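For completeness, here's roughly how such a wrapped handler can be registered with the SDK's McpServer. The schema shape and server metadata are illustrative; db is the Firestore instance from earlier.

import { z } from 'zod';
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';

const getMemoryShape = {
  workspace_id: z.string().describe('Workspace containing the memory'),
  memory_id: z.string().describe('ID of the memory to fetch'),
};

const server = new McpServer({ name: 'portal-one-mcp', version: '1.0.0' });

// Every invocation now passes through the auth and workspace checks first.
server.tool(
  'get_memory',
  getMemoryShape,
  withWorkspaceAccess(db, z.object(getMemoryShape), getMemoryHandler),
);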

Step 3 - LLM Processes Result & Responds to User

The LLM receives the JSON, parses the memory details, and might respond to Jane: "Okay, Jane, the 'Q3 Marketing Plan' memory contains details about our upcoming initiatives, created on March 15th. It includes sections on..."

{
  "content": [{
    "type": "text",
    "text": "{\n  \"id\": \"mem_123\",\n  \"title\": \"Q3 Marketing Plan\",\n  \"created_at\": \"2025-03-15T10:00:00.000Z\",\n  \"updated_at\": \"2025-03-16T14:30:00.000Z\"\n}"
  }]
}

The Complete Flow

This end-to-end flow, secured by OAuth and contextualized by workspace_id, forms the backbone of our agent interactions.

Detailed Request Flow for an MCP Tool Call. This sequence diagram breaks down the journey of a tool invocation, from the LLM's decision to the MCP Client's request, through the MCP Server's withWorkspaceAccess Higher-Order Component (HOC) performing authentication and workspace authorization checks against Firestore, to the specific get_memory tool handler executing its logic and querying the database, and finally the response returning to the LLM. It highlights the critical security and context validation steps within our server architecture.

Our Toolkit for Robustness

Building reliable tools required a multi-faceted approach.

Zod: Our Schema and Validation Powerhouse

Zod was already indispensable to us for form validation in our frontend. In the MCP server, it now does triple duty:

  1. LLM Schema Generation: The MCP TypeScript SDK leverages Zod schemas to automatically produce the JSON Schemas LLMs require to understand tool parameters and structure calls correctly.
  2. Robust Runtime Validation: Incoming tool arguments are validated against these Zod schemas. If validation fails, our error handling allows the agent to potentially correct its call.
  3. Type Safety & Inference: TypeScript types derived from Zod schemas ensure consistency between expected and actual data structures, catching potential issues at compile time.
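Here's that triple duty in miniature, with an illustrative schema:

import { z } from 'zod';

// This schema becomes the JSON Schema the LLM sees (generated by the SDK).
const createMemoryInput = z.object({
  workspace_id: z.string(),
  title: z.string().min(1),
  content: z.string(),
});

// The static type is derived from the same schema, so the two can't drift.
type CreateMemoryInput = z.infer<typeof createMemoryInput>;

// Runtime validation: surfacing Zod's issues lets the agent retry with corrected arguments.
function parseArgs(raw: unknown): CreateMemoryInput {
  const result = createMemoryInput.safeParse(raw);
  if (!result.success) {
    throw new Error(`Invalid arguments: ${result.error.message}`);
  }
  return result.data;
}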

Error Handling: throw Error() for Clarity and SDK Synergy

We transitioned from returning error strings (a common pattern in simpler Python scripts) to consistently using throw new Error("Descriptive message here") for all exceptional cases (e.g., "Entity not found," "Invalid input," "Authorization failed"). The MCP SDK gracefully catches these, packaging them into a standard CallToolResult object with an isError: true flag. This simplified our tool logic significantly and provided a uniform error reporting mechanism to the client/LLM.
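Conceptually (simplified; not the SDK's actual source), the wrapping behaves like this:

import type { CallToolResult } from '@modelcontextprotocol/sdk/types.js';

async function invokeTool(handler: () => Promise<CallToolResult>): Promise<CallToolResult> {
  try {
    return await handler();
  } catch (error) {
    // A thrown Error becomes a uniform, LLM-readable error result.
    return {
      content: [{ type: 'text', text: error instanceof Error ? error.message : String(error) }],
      isError: true,
    };
  }
}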

Firestore Timestamps: The .toDate() Discipline

A hard-learned lesson: Firestore Timestamp objects are not directly JSON-serializable as JavaScript Date objects. Early on, this caused frustrating frontend issues and required downstream services to perform conversions. Our unwavering rule now: always call .toDate() on timestamp fields immediately after fetching from Firestore and before including them in any API response or tool output. This prevents leaky abstractions and ensures predictable data formats.
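A small recursive helper keeps the discipline uniform; here's a sketch of such a converter:

import { Timestamp } from '@google-cloud/firestore';

// Recursively replace Firestore Timestamps with JS Dates so results
// are safely JSON-serializable downstream.
export function convertTimestamps(value: unknown): unknown {
  if (value instanceof Timestamp) return value.toDate();
  if (Array.isArray(value)) return value.map(convertTimestamps);
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([key, v]) => [key, convertTimestamps(v)]),
    );
  }
  return value;
}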

Utility Functions: DRY Principles in Action

Tools like get_agent or get_conversation often need to fetch multiple related entities (memories, forms, tables, etc.). To avoid rampant code duplication, we developed generic utilities:

  • fetchResourceList: Takes a workspace ID, collection name, array of document IDs, and a list of fields to return. It handles the batch fetching and projection efficiently.
  • fetchModel: Fetches model details and, crucially, abstracts the logic for masking sensitive data like api_keys within completion_params.

These utilities significantly slimmed down our tool handlers (often by 5-10 lines per related resource in complex tools) and centralized common data access patterns, making maintenance far easier.
// Example: Using fetchResourceList in a tool
// Inside a handler wrapped by withWorkspaceAccess:
const agentData = await agentRef.get().then(snap => snap.data());
if (agentData) {
  const [memories, forms] = await Promise.all([
    fetchResourceList(db, args.workspace_id, 'memories', agentData.memories || [], ['id', 'title', 'description']),
    fetchResourceList(db, args.workspace_id, 'forms', agentData.forms || [], ['id', 'table_id'])
  ]);
  result.memories = memories;
  result.forms = forms;
}
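For reference, here's a sketch of a fetchResourceList implementation consistent with those call sites, using Firestore's getAll with a field mask for server-side projection:

import { Firestore } from '@google-cloud/firestore';

export async function fetchResourceList(
  db: Firestore,
  workspaceId: string,
  collectionName: string,
  ids: string[],
  fields: string[],
): Promise<Record<string, unknown>[]> {
  if (ids.length === 0) return [];
  const refs = ids.map((id) =>
    db.collection('workspaces').doc(workspaceId).collection(collectionName).doc(id),
  );
  // getAll batches the reads into one round trip; fieldMask projects
  // only the requested fields server-side.
  const snaps = await db.getAll(...refs, { fieldMask: fields });
  return snaps
    .filter((snap) => snap.exists)
    .map((snap) => ({ id: snap.id, ...(snap.data() ?? {}) }));
}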

Standardizing Entity Types

For shared concepts like ResourceType, EntityType, Status, and FieldType (for table schemas), we adopted a consistent pattern using TypeScript's string literal unions (via as const) paired with Zod's z.enum():

// utils/types.ts (example for FieldType)
import { z } from 'zod';

export const FieldTypes = ['string', 'number', 'date', 'boolean'] as const;
export type FieldType = (typeof FieldTypes)[number]; // Creates "string" | "number" | ...
export const fieldTypeSchema = z.enum(FieldTypes); // Zod schema for validation

export interface Field {
  name: string;
  type: FieldType;
  id: string;
}
export const fieldSchema = z.object({
  name: z.string(),
  type: fieldTypeSchema,
  id: z.string(),
});

This provides a single source of truth for both TypeScript's static type checking (catching typos at compile time) and Zod's runtime validation, significantly improving developer experience and code robustness.

Agent Context: Smarter Agents, Shorter Prompts

With workspace_id now a required argument for most tools, how could agents seamlessly operate within their designated workspace without cluttering their system prompts? System prompts can be unreliable for such specific, recurring data; LLM adherence varies, and long prompts can degrade performance. Tool schemas, however, offer a more structured and reliable way to provide this context.

Our solution: When adding the available tools to the LLM call for each agent response, we dynamically patch the JSON schema of each tool. If a tool's inputSchema includes a workspace_id property, we inject a "default": "AGENT_NATIVE_WORKSPACE_ID" value into the inputSchema, where AGENT_NATIVE_WORKSPACE_ID is the actual ID of the workspace the requesting agent itself belongs to.
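In code, the patch is a small, pure transformation over the tool list. This sketch assumes each tool carries its JSON Schema under inputSchema (type names here are illustrative):

interface ToolDef {
  name: string;
  inputSchema: {
    type: 'object';
    properties?: Record<string, Record<string, unknown>>;
    required?: string[];
  };
}

function injectWorkspaceDefault(tools: ToolDef[], agentWorkspaceId: string): ToolDef[] {
  return tools.map((tool) => {
    const props = tool.inputSchema.properties;
    if (!props?.workspace_id) return tool; // tool doesn't take a workspace_id
    return {
      ...tool,
      inputSchema: {
        ...tool.inputSchema,
        properties: {
          ...props,
          workspace_id: { ...props.workspace_id, default: agentWorkspaceId },
        },
      },
    };
  });
}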

Impact on Agent Behavior

Injecting default values into the tool schemas prior to adding them to the LLM call gives us the opportunity to add context in a structured way.

Before (No Schema Default):

  • Agent is in "Project Zeta."
  • User: "List tables in Project Zeta." -> Agent correctly calls list_tables({ workspace_id: "Project Zeta" }).
  • User (later): "Now, list memories." -> Agent, having "forgotten" the explicit context or not seeing it as relevant for this new request, might call list_memories({}).
  • Result: Tool call failed: workspace_id is required. The agent might then try to ask the user for the workspace or give up.

After (Schema Default Injected):

  • Agent is in "Project Zeta." The list_memories tool schema it received has properties.workspace_id.default: "Project Zeta".
  • User: "List tables in Project Zeta." -> Agent calls list_tables({ workspace_id: "Project Zeta" }).
  • User (later): "Now, list memories." -> This time, the list_memories schema the LLM receives already carries "Project Zeta" as the default workspace_id.
  • The LLM sees the default workspace_id in its tool schema and chooses to include it in the generated tool call arguments.
  • MCP Server receives: list_memories({ workspace_id: "Project Zeta" }).
  • Result: The withWorkspaceAccess HOC validates Jane's access to "Project Zeta," and the tool executes successfully. The agent responds: "Here are the memories in Project Zeta..."

This significantly improved the reliability of contextual operations and reduced frustrating "I need more information" loops from the agent.
Our MCP server, anchored by OAuth security, stands as a central hub in a connected AI ecosystem, processing and routing intelligent agent interactions.

Lessons From the Trenches

No significant engineering effort is without its share of "learning opportunities":

  • The Evolving Standard: We initially built our server using the /sse (Server-Sent Events) transport for streaming, only for it to be deprecated from the MCP standard less than six months later! This underscored the need to stay agile, build with abstractions that can accommodate change (see the transport sketch after this list), and be prepared for breaking changes in a nascent ecosystem.
  • Python SDK Growing Pains: Needing a fork of the Python SDK just to access request headers for API key auth was a significant early hurdle. It was a major motivator for exploring and ultimately adopting the TypeScript SDK, which, for our specific auth-server separation needs, was more mature at the time.
  • The OAuth Chicken-and-Egg: Setting up the separate OAuth server (on our main TypeScript app) and getting the resource server (our MCP server) to correctly proxy/validate tokens involved a lot of trial and error with configurations, understanding the nuances of the SDK's ProxyOAuthServerProvider, and ensuring secure communication between the two.
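On the transport point: keeping server construction separate from transport wiring confines such breaking changes to a few lines. A sketch with the current Streamable HTTP transport (stateless mode; the HTTP route that forwards requests to transport.handleRequest is omitted):

import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StreamableHTTPServerTransport } from '@modelcontextprotocol/sdk/server/streamableHttp.js';

// Tools and handlers are registered on the server, independent of transport...
const server = new McpServer({ name: 'portal-one-mcp', version: '1.0.0' });

// ...so a transport swap is confined to these lines.
const transport = new StreamableHTTPServerTransport({
  sessionIdGenerator: undefined, // stateless mode
});
await server.connect(transport);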

Key Learnings:

  1. Embrace the Bleeding Edge (with Caution): Working with new standards like MCP is exciting but requires vigilance. Actively monitor GitHub issues, participate in community discussions, and design for adaptability.
  2. Type Safety is Non-Negotiable for Complexity: TypeScript + Zod saved us countless hours by catching errors statically and ensuring data integrity at runtime, especially as the number of tools and their parameters grew.
  3. Iterate, Abstract, Refine: Start with the simplest thing that could work, then refactor to create abstractions (like HOCs and utility functions) as patterns emerge. Don't over-engineer upfront.
  4. Client-Side Feedback is King: Using our own Portal One app as the primary MCP client provided an incredibly fast and realistic development and testing loop. Dogfooding is invaluable.
  5. Performance & Scalability (Initial Thoughts): While not yet a bottleneck for our current load, we're mindful of Firestore query patterns (e.g., avoiding large, unindexed scans by ensuring queries are highly selective). For future scaling, we'd consider Firestore indexing strategies, optimizing data structures, and potentially horizontal scaling of the Node.js MCP server instances. Cold starts for serverless deployments (if we were to go that route for some tools) would also be a key performance consideration.

What We'd Do Differently

Adopt TypeScript SDK Sooner for Auth Needs: Had we fully grasped the TypeScript SDK's capabilities for separate auth/resource server patterns earlier, we might have saved the initial Python development cycle for the MCP server itself.

The Path Forward

Our MCP server is now a powerful, secure, and extensible platform. Immediate next steps include:

  • Dynamic Tool Updates: Implementing support for MCP server-initiated events, so Portal One can automatically sync tool changes without requiring manual user action, making the tool ecosystem truly dynamic.
  • Expanding the Toolkit: Continuously adding new, sophisticated tools to enhance agent capabilities based on user needs and emerging AI use cases.
  • Granular Permissions & Scopes: Exploring more fine-grained permissions within workspaces for tool usage, potentially tied to more specific OAuth scopes, to further enhance security and control.

Conclusion: Powering the AI Native Future, Securely and Scalably

The journey to this MCP server—from Python experiments and API key workarounds to a type-safe TypeScript implementation with robust OAuth 2.0—has been transformative. The most significant outcome is a flexible architecture supporting multiple, specialized MCP servers, all delegating authentication to a central OAuth provider. This empowers any Portal One user to securely leverage a diverse and growing set of AI-driven tools.

The ability for users to seamlessly and securely "plug in" their own tools, or for us to rapidly deploy new ones, solves a foundational challenge we've faced since Portal One's inception. It's a critical step towards our vision of truly AI Native applications, where intelligent agents are not just assistants, but empowered, autonomous actors in a secure and interconnected digital world.


Full Example Code

We believe in sharing knowledge, so we've extracted key components like our withWorkspaceAccess HOC, utility functions, and example tool structures into a public demo repository. Check out the GitHub repository and star it if you find it helpful!