

Website Inspector

Overview

The Website Inspector is an MCP server that bridges the gap between AI agents and content on the web. It provides a robust tool that can crawl a website, process its content, and perform a semantic search to find the information most relevant to a user's natural language query.

This server acts as a specialized Retrieval-Augmented Generation (RAG) tool, allowing an AI to "read" a website and answer questions about it without having to process the entire site from scratch for every query.

Key Features

  • Deep Crawling: Can navigate and index content across multiple pages of a target website, up to a configurable depth.
  • Intelligent Caching: Automatically caches website content. If the cached content is fresh, it serves results instantly; if it's outdated, it re-crawls to ensure the AI has the most current information (see the sketch after this list).
  • Semantic Search: Uses advanced vector embeddings (text-embedding-005) to understand the meaning behind a query, not just keywords, leading to highly relevant results.
  • Content Extraction: Focuses on the main content of pages, stripping away irrelevant boilerplate like navbars, footers, and ads.
  • Configurable & Secure: Offers control over crawl depth, page limits, and cache duration. Maintained by Portal One and supports standard MCP authentication protocols.
  • Clear Progress Notifications: Provides detailed, real-time feedback on the progress of a request, from crawling and embedding to searching.
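
The caching behavior above comes down to a simple age check against maxAge. The sketch below is conceptual only and not the server's actual implementation; the cachedAt value and isCacheFresh helper are hypothetical names used for illustration.

```typescript
// Conceptual sketch of the cache-freshness check described above.
// `cachedAt` and `isCacheFresh` are hypothetical; they are not part of the server's API.
function isCacheFresh(cachedAt: number, maxAgeMs: number = 3_600_000): boolean {
  // Content counts as fresh if it was indexed less than `maxAgeMs` milliseconds ago.
  return Date.now() - cachedAt < maxAgeMs;
}

// Example: content indexed 30 minutes ago is still fresh under the default 1-hour maxAge,
// so cached results would be served instead of triggering a re-crawl.
const indexedThirtyMinutesAgo = Date.now() - 30 * 60 * 1000;
console.log(isCacheFresh(indexedThirtyMinutesAgo)); // true
```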

Use Cases

  • AI-Powered Research: An AI agent can use this tool to research topics from specific, authoritative websites.
  • Customer Support Automation: An AI can answer user questions by consulting a company's official documentation or help center.
  • Content Summarization: An agent can find the most relevant sections of a long article or blog post to create a concise summary.
  • Competitive Analysis: An AI can be tasked to find information about a competitor's products or services from their website.

Server Details

  • Maintainer: Portal One
  • Maintainer Site: portal.one
  • MCP URL: https://mcp.website-inspector.portal.one/mcp
  • Authentication: OAuth2 Dynamic Client Registration

Tool: website_search

This is the primary tool provided by the Website Inspector server.

Description: Performs a semantic search of a given website and returns the most relevant text chunks. It will crawl the site if the content is not already cached or if the cached content is expired.
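
For orientation, calling website_search from a client built with the official TypeScript MCP SDK might look roughly like the sketch below. This is a minimal example under assumptions: it uses the @modelcontextprotocol/sdk package with its Streamable HTTP transport and omits the OAuth2 Dynamic Client Registration flow that the server requires, so treat it as a starting point rather than a complete client.

```typescript
// Minimal sketch: call website_search over MCP using the TypeScript SDK.
// Assumes @modelcontextprotocol/sdk; the required OAuth2 Dynamic Client
// Registration flow is omitted for brevity.
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

async function main() {
  const transport = new StreamableHTTPClientTransport(
    new URL("https://mcp.website-inspector.portal.one/mcp"),
  );
  const client = new Client({ name: "example-client", version: "1.0.0" });
  await client.connect(transport);

  // Only the two required parameters are set here; optional ones use their defaults.
  const result = await client.callTool({
    name: "website_search",
    arguments: {
      url: "https://portal.one/faq",
      query: "How do I sign up?",
    },
  });

  console.log(JSON.stringify(result, null, 2));
  await client.close();
}

main().catch(console.error);
```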

Input Parameters

  • url (string, required): The URL of the website to crawl and search.
  • query (string, required): The natural language query to run against the website content.
  • k (integer, optional, default 3): The number of search results (text chunks) to return.
  • maxDepth (integer, optional, default 2): Maximum depth to crawl. 0 scrapes only the entered URL.
  • limit (integer, optional, default 50): The maximum number of pages to crawl during the indexing process.
  • maxAge (integer, optional, default 3600000): Maximum age in milliseconds (1 hour) for cached content before re-crawling.
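
As an illustration, an arguments object that overrides the optional parameters alongside the required ones might look like the following; the specific values are examples, not recommendations.

```typescript
// Illustrative website_search arguments; defaults are k=3, maxDepth=2, limit=50, maxAge=3600000.
const args = {
  url: "https://portal.one",                      // required: site to crawl and search
  query: "How does AI workflow scheduling work?", // required: natural language question
  k: 5,                   // return the 5 most relevant chunks
  maxDepth: 1,            // follow links one level from the entered URL
  limit: 20,              // index at most 20 pages
  maxAge: 15 * 60 * 1000, // accept cached content up to 15 minutes old
};
```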

Output Schema

On a successful run, the tool returns a structuredContent object with the following format:

```json
{
  "results": [
    {
      "pageContent": "A string containing the relevant text chunk found on the page.",
      "metadata": {
        "title": "The title of the source page.",
        "description": "The meta description of the source page.",
        "url": "The exact URL where the text chunk was found.",
        "loc": { "lines": { "from": 1, "to": 10 } }
      }
    }
  ]
}
```
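
Expressed as TypeScript types, the documented shape looks roughly like this; the interface names are hypothetical, and only the field names and nesting come from the schema above.

```typescript
// Hypothetical type names describing the structuredContent shape documented above.
interface WebsiteSearchOutput {
  results: WebsiteSearchResult[];
}

interface WebsiteSearchResult {
  pageContent: string; // the relevant text chunk found on the page
  metadata: {
    title: string;       // title of the source page
    description: string; // meta description of the source page
    url: string;         // exact URL where the chunk was found
    loc: { lines: { from: number; to: number } }; // line range of the chunk
  };
}
```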

Example Workflow

Scenario: A user asks an AI, "How do I sign up for Portal One?" The AI uses the website_search tool to find the answer on the portal.one website.

1. Tool Call (Request from AI):

```json
{
  "tool": "website_search",
  "args": {
    "url": "https://portal.one/faq",
    "query": "How do I sign up?"
  }
}
```

2. Progress Notifications (from Server):

The user would see a series of clear progress steps:

[1/6] Initializing search for https://portal.one/faq...
[2/6] Cached content was not found. Crawling fresh data...
[3/6] Processing 1 page from https://portal.one/faq...
[4/6] Embedding content for search...
[5/6] Searching for "How do I sign up?"...
[6/6] Successfully completed search.

3. Successful Response (from Server):

```json
{
  "structuredContent": {
    "results": [
      {
        "pageContent": "### How do I sign up or get started with Portal One?\n\nGetting started is easy! Visit our website at [portal.one](https://portal.one/) and look for the \"Get Started Now\" button. The process is quick, and you'll be on your way to commanding your AI agents.",
        "metadata": {
          "title": "Portal One - Frequently Asked Questions",
          "description": "Find answers to common questions about Portal One, AI agent management, LLMs, MCP servers, AI workflow scheduling, and human oversight.",
          "url": "https://portal.one/faq/",
          "loc": {
            "lines": {
              "from": 21,
              "to": 27
            }
          }
        }
      }
    ]
  }
}
```
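
A client might turn a response like this into an answer with sources by combining pageContent and metadata from each result. The sketch below is illustrative; the SearchResult type and formatAnswer helper are not part of the server's API.

```typescript
// Sketch: format search results into an answer followed by its sources.
type SearchResult = {
  pageContent: string;
  metadata: { title: string; url: string };
};

function formatAnswer(results: SearchResult[]): string {
  return results
    .map((r) => `${r.pageContent}\n\nSource: ${r.metadata.title} (${r.metadata.url})`)
    .join("\n\n---\n\n");
}

// With the example response above, this yields the sign-up instructions followed by
// "Source: Portal One - Frequently Asked Questions (https://portal.one/faq/)".
```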

Error Handling

  • If the target website cannot be crawled (e.g., due to a firewall or if it's offline), the tool will return a clear error message like: Could not crawl any content from [URL].
  • If no relevant results are found for the query, it will return: No results found for query "[query]" at [URL].
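
How these messages surface depends on the client; the loose sketch below assumes the error text is available as a plain string and simply matches the documented prefixes. Both the helper and this delivery assumption are illustrative rather than part of the server's contract.

```typescript
// Loose sketch: classify the documented error messages by their prefixes.
// Assumes the error text arrives as a plain string; the actual delivery mechanism
// (error content vs. thrown exception) is not specified here.
function classifySearchError(message: string): "crawl_failed" | "no_results" | "unknown" {
  if (message.startsWith("Could not crawl any content from")) return "crawl_failed";
  if (message.startsWith("No results found for query")) return "no_results";
  return "unknown";
}
```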