How to Configure Crawling for Your Site

Different websites need different crawling approaches. This guide gives you ready-to-use recipes for the most common site types. Pick the one that matches your site, apply the settings, and you’re done.

Not sure what type of site you have? Start with Standard mode and the default settings. It works for most websites. Only change things if your content comes back empty or noisy.

For a complete list of every setting and what it does, see Crawling Settings Reference.


| Your website type | Mode | Key settings | Speed |
| --- | --- | --- | --- |
| Static docs (Docusaurus, MkDocs, Hugo, Jekyll) | Standard | Static Site Mode ON | Fastest |
| WordPress / CMS blog | Standard | Reader Mode ON | Fast |
| React / Vue / Angular SPA | Advanced | Reader Mode ON, Enhanced SPA Detection ON | Slower |
| Confluence / Wiki | Advanced | Reader Mode ON, Content Selector, Concurrency 2-3 | Slowest |
| Any site without a sitemap | Advanced | Adjust per site type above | Varies |
| Large site (1,000+ pages) | Standard (if sitemap exists) | Tune Max Pages and Concurrency | Varies |
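If you manage several sites, it can help to write the recipes down as data so they can be reviewed and reused. The sketch below is illustrative only: `CrawlRecipe`, `RECIPES`, and the field names are hypothetical and are not a ChatbotIQ API. The values are copied from the recipes in this guide.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CrawlRecipe:
    """Hypothetical container for the settings named in this guide."""
    mode: str                                    # "Standard" or "Advanced"
    reader_mode: bool
    static_site_mode: Optional[bool] = None      # None = not specified by the recipe
    enhanced_spa_detection: bool = False
    content_selector: Optional[str] = None
    content_load_delay_s: Optional[int] = None
    concurrency: Optional[Tuple[int, int]] = None  # (low, high) recommended range
    max_depth: Optional[Tuple[int, int]] = None    # (low, high) recommended range


# Keys are illustrative labels, not identifiers used by the product.
RECIPES = {
    "static_docs": CrawlRecipe(
        mode="Standard", reader_mode=False, static_site_mode=True,
        concurrency=(6, 8),
    ),
    "cms_blog": CrawlRecipe(
        mode="Standard", reader_mode=True, static_site_mode=False,
        concurrency=(5, 8),
    ),
    "spa": CrawlRecipe(
        mode="Advanced", reader_mode=True, enhanced_spa_detection=True,
        content_load_delay_s=2, concurrency=(5, 5), max_depth=(10, 15),
    ),
    "confluence": CrawlRecipe(
        mode="Advanced", reader_mode=True, content_selector="#main-content",
        content_load_delay_s=3, concurrency=(2, 3), max_depth=(15, 20),
    ),
}

if __name__ == "__main__":
    for name, recipe in RECIPES.items():
        print(name, recipe.mode, recipe.concurrency)
```

You still enter these values in the crawler settings by hand; the structure only keeps them in one reviewable place.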

Static docs

Examples: Docusaurus, MkDocs, ReadTheDocs, Hugo, Jekyll

| Setting | Value |
| --- | --- |
| Mode | Standard |
| Static Site Mode | ON |
| Reader Mode | OFF |
| Concurrency | 6-8 |

Why this works: These tools generate clean HTML with sitemaps. No browser rendering or content cleaning needed. This is the fastest possible configuration.


WordPress / CMS blog

Examples: WordPress, Ghost, Webflow, Squarespace

| Setting | Value |
| --- | --- |
| Mode | Standard |
| Static Site Mode | OFF |
| Reader Mode | ON |
| Concurrency | 5-8 |

Why this works: CMS sites usually have sitemaps but include sidebars, related posts, ads, and navigation that pollute your knowledge base. Reader Mode strips all of that out and keeps just the article content.


React / Vue / Angular SPA

Examples: Custom docs portals, Storybook, SPA-based knowledge bases

| Setting | Value |
| --- | --- |
| Mode | Advanced |
| Reader Mode | ON |
| Enhanced SPA Detection | ON |
| Content Load Delay | 2 seconds |
| Max Depth | 10-15 |
| Concurrency | 5 |

Why this works: Modern web apps (SPAs) build pages dynamically with JavaScript and often hide navigation links from standard crawlers. Enhanced SPA Detection finds these hidden links, and the browser renders the full page content. Many chatbot tools can’t handle these sites at all — ChatbotIQ detects them automatically and adapts.
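If you are unsure whether your site is an SPA, a rough check is to look at the raw HTML the server returns before any JavaScript runs. The sketch below is a simple heuristic of our own, not how Enhanced SPA Detection works: it flags a page that loads scripts but contains very little visible text. The function name and the 200-character threshold are assumptions chosen for illustration.

```python
from html.parser import HTMLParser


class _ShellProbe(HTMLParser):
    """Counts <script> tags and visible text characters in raw HTML."""

    def __init__(self):
        super().__init__()
        self.script_tags = 0
        self.text_chars = 0
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_tags += 1
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore script and style bodies; count everything else as visible text.
        if not self._skip_depth:
            self.text_chars += len(data.strip())


def looks_like_spa_shell(raw_html: str, min_text_chars: int = 200) -> bool:
    """Heuristic: scripts present but almost no text in the server response."""
    probe = _ShellProbe()
    probe.feed(raw_html)
    probe.close()
    return probe.script_tags > 0 and probe.text_chars < min_text_chars


if __name__ == "__main__":
    shell = (
        '<html><head><title>App</title></head><body>'
        '<div id="root"></div><script src="/static/js/main.js"></script>'
        '</body></html>'
    )
    print(looks_like_spa_shell(shell))
```

A page flagged this way is a candidate for the Advanced mode recipe above; a page with plenty of text in the raw HTML usually is not.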


Confluence / Wiki

Examples: Confluence Cloud, Confluence Server, MediaWiki

| Setting | Value |
| --- | --- |
| Mode | Advanced |
| Reader Mode | ON |
| Content Selector | #main-content (ask your developer if unsure) |
| Content Load Delay | 3 seconds |
| Concurrency | 2-3 |
| Max Depth | 15-20 |

Why this works: Confluence loads content in the background after the page appears, so it needs extra wait time. Low concurrency prevents Confluence from blocking the crawler (it’s aggressive about rate limiting). The content selector targets the article area and skips Confluence’s menus and sidebar.

Important: High concurrency on Confluence will trigger 429 rate-limit errors. Keep it at 2-3.


Any site without a sitemap

Examples: Legacy sites, hand-coded HTML, miscellaneous web apps

| Setting | Value |
| --- | --- |
| Mode | Advanced |
| Static Site Mode | ON if static HTML, OFF if JavaScript |
| Reader Mode | ON |
| Max Pages | Start with 100, increase after review |
| Max Depth | 5-10 |
| Concurrency | 5-8 |

Why this works: Without a sitemap, Advanced mode is your only option — it discovers pages by following links. Start with a smaller page limit to verify the crawler is finding the right pages before scaling up.
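To find out whether a site advertises a sitemap, you can look for `Sitemap:` lines in its robots.txt file (a convention from the sitemaps.org protocol) in addition to trying `/sitemap.xml` directly. The sketch below only parses robots.txt text you have already downloaded; the function names are hypothetical, and the mode suggestion simply restates this guide's rule of thumb.

```python
def sitemaps_in_robots(robots_txt: str) -> list[str]:
    """Return the sitemap URLs declared in the text of a robots.txt file."""
    urls = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()   # drop trailing comments
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep and key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


def suggested_mode(robots_txt: str, sitemap_xml_exists: bool = False) -> str:
    """Standard if any sitemap is known, otherwise Advanced (link-following)."""
    if sitemap_xml_exists or sitemaps_in_robots(robots_txt):
        return "Standard"
    return "Advanced"


if __name__ == "__main__":
    robots = (
        "User-agent: *\n"
        "Disallow: /admin\n"
        "Sitemap: https://example.com/sitemap.xml\n"
    )
    print(sitemaps_in_robots(robots))
    print(suggested_mode(robots))
```

Pass `sitemap_xml_exists=True` if you have confirmed by hand that `/sitemap.xml` loads, even when robots.txt does not mention it.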


Large site (1,000+ pages)

Examples: Enterprise documentation, large knowledge bases

| Setting | Value |
| --- | --- |
| Mode | Standard (if sitemap exists) |
| Reader Mode | ON (if needed) |
| Max Pages | Your plan’s page limit |
| Concurrency | 6-8 |

Why this works: Standard mode handles large sites efficiently because discovery is instant via the sitemap. High concurrency speeds up the crawl phase. If the site has no sitemap, use Advanced mode but expect longer discovery times.


| Configuration | Speed | Best for |
| --- | --- | --- |
| Standard + Static Site Mode | ~100+ pages/min | Static HTML with sitemap |
| Standard + Browser (default) | ~20-30 pages/min | CMS/WordPress with sitemap |
| Advanced + Browser | ~10-20 pages/min | SPAs, no-sitemap sites |
| Advanced + Reader Mode | ~5-15 pages/min | JS-heavy sites needing clean extraction |
| Advanced + Reader Mode + Low Concurrency | ~2-5 pages/min | Confluence, rate-limited sites |
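The speeds above translate directly into a rough crawl-time estimate: divide the page count by the pages-per-minute figure. The sketch below does that arithmetic using the approximate ranges from the table; the function and dictionary names are hypothetical, and real crawl times will vary with the target site.

```python
# Approximate pages/min ranges from the table: (slowest, fastest).
# None means the guide gives no upper figure ("~100+ pages/min").
SPEEDS = {
    "Standard + Static Site Mode": (100, None),
    "Standard + Browser (default)": (20, 30),
    "Advanced + Browser": (10, 20),
    "Advanced + Reader Mode": (5, 15),
    "Advanced + Reader Mode + Low Concurrency": (2, 5),
}


def estimate_minutes(pages: int, configuration: str):
    """Return (best_case, worst_case) crawl time in minutes.

    best_case is None when the table gives no upper speed.
    """
    slowest, fastest = SPEEDS[configuration]
    worst_case = pages / slowest
    best_case = pages / fastest if fastest else None
    return best_case, worst_case


if __name__ == "__main__":
    best, worst = estimate_minutes(500, "Advanced + Reader Mode + Low Concurrency")
    print(f"500 Confluence pages: about {best:.0f} to {worst:.0f} minutes")
```

For example, 500 Confluence pages at 2 to 5 pages per minute works out to roughly 100 to 250 minutes, which is why the low-concurrency recipe is worth using only where the site requires it.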

Standard mode + Reader Mode ON covers ~70% of websites. It works for any site with a sitemap and server-rendered content.

If you need maximum coverage, Advanced mode + Reader Mode + Enhanced SPA Detection handles ~95% of sites — just slower.

  1. Start fast: Use Standard mode with defaults. If pages come back with good content, you’re done.
  2. Escalate if needed: If content is missing or noisy, enable Reader Mode. If pages are missing entirely, switch to Advanced mode.
  3. Only go slow when necessary: Low concurrency and high Content Load Delay are only needed for specific sites like Confluence.
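The escalation steps above can be sketched as a small decision function. This is only an illustration of the logic: the setting keys are hypothetical, and the assumption that Reader Mode is off by default is ours, so check your own defaults.

```python
# Assumed starting point: Standard mode with Reader Mode off.
DEFAULTS = {"mode": "Standard", "reader_mode": False}


def next_settings(current: dict, content_missing_or_noisy: bool,
                  pages_missing: bool) -> dict:
    """Apply the escalation rules to the result of the previous crawl."""
    updated = dict(current)  # never mutate the caller's settings
    if content_missing_or_noisy:
        updated["reader_mode"] = True   # step 2: clean up page content
    if pages_missing:
        updated["mode"] = "Advanced"    # step 2: discover pages by following links
    return updated


if __name__ == "__main__":
    # First crawl found every page, but the content was noisy.
    second_try = next_settings(DEFAULTS, content_missing_or_noisy=True,
                               pages_missing=False)
    print(second_try)
```

If neither problem shows up, the function returns the settings unchanged, which matches step 1: when the defaults produce good content, stop there.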

If your crawl isn’t working as expected, see the crawling troubleshooting section for common issues and fixes.