How to Configure Crawling for Your Site
Different websites need different crawling approaches. This guide gives you ready-to-use recipes for the most common site types. Pick the one that matches your site, apply the settings, and you’re done.
Not sure what type of site you have? Start with Standard mode and the default settings. It works for most websites. Only change things if your content comes back empty or noisy.
For a complete list of every setting and what it does, see Crawling Settings Reference.
Quick decision table
| Your website type | Mode | Key settings | Speed |
|---|---|---|---|
| Static docs (Docusaurus, MkDocs, Hugo, Jekyll) | Standard | Static Site Mode ON | Fastest |
| WordPress / CMS blog | Standard | Reader Mode ON | Fast |
| React / Vue / Angular SPA | Advanced | Reader Mode ON, Enhanced SPA Detection ON | Slower |
| Confluence / Wiki | Advanced | Reader Mode ON, Content Selector, Concurrency 2–3 | Slowest |
| Any site without a sitemap | Advanced | Adjust per site type above | Varies |
| Large site (1,000+ pages) | Standard (if sitemap exists) | Tune Max Pages and Concurrency | Varies |
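The table can also be read as a lookup from site type to settings. The sketch below is only an illustration of that idea: the `CrawlRecipe` class, the key names, and the `recipe_for` helper are hypothetical and are not part of any ChatbotIQ API. The values are copied from the recipes in this guide.

```python
from dataclasses import dataclass, field


@dataclass
class CrawlRecipe:
    """Hypothetical container for one row of the decision table."""
    mode: str
    settings: dict = field(default_factory=dict)


# Keys are illustrative names, not real setting identifiers.
RECIPES = {
    "static_docs": CrawlRecipe("Standard", {
        "static_site_mode": True, "reader_mode": False, "concurrency": (6, 8)}),
    "cms_blog": CrawlRecipe("Standard", {
        "static_site_mode": False, "reader_mode": True, "concurrency": (5, 8)}),
    "spa": CrawlRecipe("Advanced", {
        "reader_mode": True, "enhanced_spa_detection": True,
        "content_load_delay_s": 2, "max_depth": (10, 15), "concurrency": (5, 5)}),
    "confluence": CrawlRecipe("Advanced", {
        "reader_mode": True, "content_selector": "#main-content",
        "content_load_delay_s": 3, "max_depth": (15, 20), "concurrency": (2, 3)}),
}


def recipe_for(site_type: str) -> CrawlRecipe:
    """Return the matching recipe, or Standard mode with defaults if unknown."""
    return RECIPES.get(site_type, CrawlRecipe("Standard"))


print(recipe_for("confluence").settings["concurrency"])  # (2, 3)
print(recipe_for("something_else").mode)                 # Standard
```

The fallback mirrors the advice above: when the site type is unclear, start with Standard mode and the default settings.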
Recipe: Static documentation site
Examples: Docusaurus, MkDocs, ReadTheDocs, Hugo, Jekyll
| Setting | Value |
|---|---|
| Mode | Standard |
| Static Site Mode | ON |
| Reader Mode | OFF |
| Concurrency | 6–8 |
Why this works: These tools generate clean HTML with sitemaps. No browser rendering or content cleaning needed. This is the fastest possible configuration.
Recipe: WordPress or CMS blog
Examples: WordPress, Ghost, Webflow, Squarespace
| Setting | Value |
|---|---|
| Mode | Standard |
| Static Site Mode | OFF |
| Reader Mode | ON |
| Concurrency | 5–8 |
Why this works: CMS sites usually have sitemaps but include sidebars, related posts, ads, and navigation that pollute your knowledge base. Reader Mode strips all of that out and keeps just the article content.
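To make the idea of content cleaning concrete, here is a rough sketch that drops text inside common page-chrome tags and keeps the rest. It is an assumption-laden illustration only: the tag list and the `readable_text` helper are hypothetical, and this is not how Reader Mode is actually implemented.

```python
from html.parser import HTMLParser

# Tags treated as page chrome in this sketch (an assumption, not Reader Mode's rules).
SKIP_TAGS = {"nav", "aside", "footer", "header", "script", "style"}


class ArticleText(HTMLParser):
    """Collects text that is not nested inside any of SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.skip_depth == 0 and text:
            self.chunks.append(text)


def readable_text(html: str) -> str:
    parser = ArticleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


PAGE = (
    "<nav>Home | About</nav>"
    "<article><h1>Post title</h1><p>Body text.</p></article>"
    "<aside>Related posts</aside>"
)
print(readable_text(PAGE))  # Post title Body text.
```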
Recipe: React / Vue / Angular SPA
Examples: Custom docs portals, Storybook, SPA-based knowledge bases
| Setting | Value |
|---|---|
| Mode | Advanced |
| Reader Mode | ON |
| Enhanced SPA Detection | ON |
| Content Load Delay | 2 seconds |
| Max Depth | 10–15 |
| Concurrency | 5 |
Why this works: Single-page applications (SPAs) build pages dynamically with JavaScript and often hide navigation links from standard crawlers. Enhanced SPA Detection finds these hidden links, and the browser renders the full page content. Many chatbot tools can’t handle these sites at all; ChatbotIQ detects them automatically and adapts.
Recipe: Confluence or wiki
Examples: Confluence Cloud, Confluence Server, MediaWiki
| Setting | Value |
|---|---|
| Mode | Advanced |
| Reader Mode | ON |
| Content Selector | #main-content (ask your developer if unsure) |
| Content Load Delay | 3 seconds |
| Concurrency | 2–3 |
| Max Depth | 15–20 |
Why this works: Confluence loads content in the background after the page appears, so it needs extra wait time. Low concurrency prevents Confluence from blocking the crawler (it’s aggressive about rate limiting). The content selector targets the article area and skips Confluence’s menus and sidebar.
Important: High concurrency on Confluence will trigger 429 rate-limit errors. Keep it at 2–3.
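A concurrency setting caps how many requests are in flight at the same time. The sketch below shows the general technique with an `asyncio.Semaphore`; `crawl_all` and `FakeServer` are hypothetical names, the fetch is simulated rather than a real HTTP call, and nothing here describes the crawler's actual internals.

```python
import asyncio


class FakeServer:
    """Stand-in for a rate-limited site; records the peak number of parallel requests."""

    def __init__(self):
        self.in_flight = 0
        self.peak = 0

    async def fetch(self, url: str) -> str:
        self.in_flight += 1
        self.peak = max(self.peak, self.in_flight)
        await asyncio.sleep(0.01)  # simulated network and render time
        self.in_flight -= 1
        return f"content of {url}"


async def crawl_all(urls, fetch, concurrency=3):
    """Fetch every URL, never running more than `concurrency` fetches at once."""
    limit = asyncio.Semaphore(concurrency)

    async def bounded(url):
        async with limit:
            return await fetch(url)

    return await asyncio.gather(*(bounded(u) for u in urls))


server = FakeServer()
urls = [f"/page/{i}" for i in range(10)]
pages = asyncio.run(crawl_all(urls, server.fetch, concurrency=3))
print(len(pages), "pages fetched, peak parallel requests:", server.peak)
```

With a cap of 3, the simulated server never sees more than three simultaneous requests, which is the behavior that keeps a rate-limited site from returning 429 responses.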
Recipe: Site without a sitemap
Examples: Legacy sites, hand-coded HTML, miscellaneous web apps
| Setting | Value |
|---|---|
| Mode | Advanced |
| Static Site Mode | ON if static HTML, OFF if JavaScript |
| Reader Mode | ON |
| Max Pages | Start with 100, increase after review |
| Max Depth | 5–10 |
| Concurrency | 5–8 |
Why this works: Without a sitemap, Advanced mode is your only option because it discovers pages by following links. Start with a smaller page limit to verify the crawler is finding the right pages before scaling up.
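Link-following discovery with page and depth limits can be sketched as a breadth-first traversal. The example runs over an in-memory link map instead of real HTTP requests; the `discover` function, its parameter names, and the `SITE` data are hypothetical and only illustrate how Max Pages and Max Depth bound the crawl.

```python
from collections import deque


def discover(start, links, max_pages=100, max_depth=5):
    """Breadth-first link discovery, capped by page count and link depth."""
    order = [start]
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        for nxt in links.get(url, []):
            if nxt in seen:
                continue
            if len(order) >= max_pages:
                return order  # page limit reached
            seen.add(nxt)
            order.append(nxt)
            queue.append((nxt, depth + 1))
    return order


# Hypothetical site: each page maps to the links found on it.
SITE = {
    "/": ["/docs", "/blog"],
    "/docs": ["/docs/install", "/docs/api"],
    "/docs/install": ["/docs/install/linux"],
    "/blog": ["/"],
}

print(discover("/", SITE, max_depth=1))  # ['/', '/docs', '/blog']
print(discover("/", SITE, max_pages=2))  # ['/', '/docs']
```

Running with a small limit first, then reviewing which pages were found, is the same check the recipe recommends before raising Max Pages.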
Recipe: Large site (1,000+ pages)
Examples: Enterprise documentation, large knowledge bases
| Setting | Value |
|---|---|
| Mode | Standard (if sitemap exists) |
| Reader Mode | ON (if needed) |
| Max Pages | Your plan’s page limit |
| Concurrency | 6–8 |
Why this works: Standard mode handles large sites efficiently because discovery is instant via the sitemap. High concurrency speeds up the crawl phase. If the site has no sitemap, use Advanced mode but expect longer discovery times.
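Sitemap-based discovery is fast because the full URL list comes from a single XML file. A minimal sketch using Python's standard library, assuming a plain `urlset` sitemap in the sitemaps.org format (sitemap index files and compressed sitemaps are not handled); `urls_from_sitemap` and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return every <loc> URL listed in a urlset sitemap."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/docs/intro</loc></url>
</urlset>"""

print(urls_from_sitemap(SAMPLE))
```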
Speed vs. coverage tradeoffs
Fastest to slowest
| Configuration | Speed | Best for |
|---|---|---|
| Standard + Static Site Mode | ~100+ pages/min | Static HTML with sitemap |
| Standard + Browser (default) | ~20–30 pages/min | CMS/WordPress with sitemap |
| Advanced + Browser | ~10–20 pages/min | SPAs, no-sitemap sites |
| Advanced + Reader Mode | ~5–15 pages/min | JS-heavy sites needing clean extraction |
| Advanced + Reader Mode + Low Concurrency | ~2–5 pages/min | Confluence, rate-limited sites |
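These rates translate directly into rough crawl times: divide the page count by the pages-per-minute range. The helper below is a hypothetical back-of-the-envelope calculator using the approximate figures from the table; real crawl times will vary with the site.

```python
# Approximate pages/min ranges (slow, fast) taken from the table above.
RATES = {
    "Standard + Browser": (20, 30),
    "Advanced + Browser": (10, 20),
    "Advanced + Reader Mode": (5, 15),
    "Advanced + Reader Mode + Low Concurrency": (2, 5),
}


def estimate_minutes(pages: int, pages_per_min: tuple) -> tuple:
    """Return (best_case, worst_case) crawl time in minutes."""
    slow, fast = pages_per_min
    return (pages / fast, pages / slow)


for name, rate in RATES.items():
    best, worst = estimate_minutes(1000, rate)
    print(f"{name}: {best:.0f}-{worst:.0f} min for 1,000 pages")
```

For example, a 1,000-page site at 20 to 30 pages/min takes roughly 33 to 50 minutes, while the same site at 2 to 5 pages/min takes roughly 200 to 500 minutes.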
Best all-rounder
Standard mode + Reader Mode ON covers ~70% of websites. It works for any site with a sitemap and server-rendered content.
If you need maximum coverage, Advanced mode + Reader Mode + Enhanced SPA Detection handles ~95% of sites, though it is slower.
When to trade speed for coverage
- Start fast: Use Standard mode with defaults. If pages come back with good content, you’re done.
- Escalate if needed: If content is missing or noisy, enable Reader Mode. If pages are missing entirely, switch to Advanced mode.
- Only go slow when necessary: Low concurrency and high Content Load Delay are only needed for specific sites like Confluence.
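The escalation steps above can be sketched as a small decision function. `Config` and `escalate` are hypothetical names used only for illustration; the logic simply encodes the three bullets and is not an official workflow.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Config:
    """Hypothetical minimal crawl configuration."""
    mode: str = "Standard"
    reader_mode: bool = False


def escalate(config: Config, content_noisy: bool, pages_missing: bool) -> Config:
    """Return the next configuration to try, changing one thing at a time."""
    if pages_missing and config.mode != "Advanced":
        return replace(config, mode="Advanced")    # pages missing entirely
    if content_noisy and not config.reader_mode:
        return replace(config, reader_mode=True)   # content missing or noisy
    return config                                  # nothing left to escalate


start = Config()
print(escalate(start, content_noisy=False, pages_missing=False))  # unchanged defaults
print(escalate(start, content_noisy=True, pages_missing=False))   # Reader Mode enabled
print(escalate(start, content_noisy=False, pages_missing=True))   # Advanced mode
```

Changing one setting per attempt keeps it clear which change fixed the crawl, and avoids paying the speed cost of Advanced mode when Reader Mode alone is enough.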
If your crawl isn’t working as expected, see the crawling troubleshooting section for common issues and fixes.
Related
- Crawling Settings Reference: every setting with type, default, and description
- How Web Crawling Works: understand Standard vs. Advanced mode
- Add a Website Source: the basics of adding a source
- Keep Content Up to Date: re-crawling and refresh strategies