Crawling Settings Reference

This page documents every crawling and indexing setting available when adding or configuring a website source. For practical recipes by site type, see Configure Crawling for Your Site.


| Mode | Discovery method | When to use |
| --- | --- | --- |
| Standard | Reads your sitemap.xml | Most websites. Fast discovery, separate crawl step. |
| Advanced | Follows links from the starting URL | Sites without a sitemap, SPAs, or when the sitemap is missing pages. Single-pass: content is captured during discovery. |

| Setting | Available in | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Static Site Mode | Both | Toggle | Off | Uses direct HTTP requests instead of a browser. ~30x faster for static HTML sites. Disables all JavaScript-dependent settings. |
| Wait for CSS Selector | Both (browser only) | Text | (empty) | Wait until a specific page element appears before extracting content. Useful for pages that show a loading spinner first. Your developer can provide the right value (e.g., #main-content). Not available when Static Site Mode is on. |
| Content Load Delay | Both (browser only) | Number (0–10 seconds) | 0 | Extra seconds to wait before capturing the page. The crawler already waits for content to stabilize, but slow sites may need more time. Not available when Static Site Mode is on. |
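
The interaction between Static Site Mode and the browser-only settings can be sketched as a small validation step. This is an illustration only: the class and field names are hypothetical, and resetting unavailable settings to their defaults is an assumption about behavior, not a description of how the product stores its configuration. Enhanced SPA Detection is included because this page lists it as unavailable in Static Site Mode as well.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PageLoadSettings:
    # Hypothetical field names that mirror the settings documented on this page.
    static_site_mode: bool = False
    wait_for_css_selector: str = ""
    content_load_delay: int = 0  # seconds, documented range 0 to 10
    enhanced_spa_detection: bool = False


def effective_settings(s: PageLoadSettings) -> PageLoadSettings:
    """Return the settings that would apply to a crawl under the assumptions above."""
    if not 0 <= s.content_load_delay <= 10:
        raise ValueError("Content Load Delay must be between 0 and 10 seconds")
    if s.static_site_mode:
        # No browser is used, so browser-only settings fall back to their defaults.
        return replace(
            s,
            wait_for_css_selector="",
            content_load_delay=0,
            enhanced_spa_detection=False,
        )
    return s


requested = PageLoadSettings(
    static_site_mode=True,
    wait_for_css_selector="#main-content",
    content_load_delay=3,
)
applied = effective_settings(requested)
```

Here `applied` keeps Static Site Mode on and carries an empty selector and a zero delay, which matches the "Not available when Static Site Mode is on" notes in the table.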

| Setting | Available in | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Reader Mode | Both | Toggle | On | Strips navigation, sidebars, footers, and other UI chrome. Extracts only the main article content, similar to your browser’s Reader View. Essential for CMS platforms, wikis, and sites with heavy page layout. |
| Content Selector | Both (Reader Mode only) | Text | (empty) | Narrows extraction to a specific part of the page. Only content inside that section is indexed. Available when Reader Mode is on. Your developer can help you find the right value for your site (common examples: #main-content for Confluence, article for blogs). |
| Include Images | Both | Toggle | Off | Preserve image references in extracted content. When enabled, your bot can include relevant images in responses. |

| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| Max Depth | Number | 10 | How many links deep to follow from the starting URL. Depth 1 = only links on the starting page. Increase for deeply nested documentation. |
| Preserve Query Strings | Toggle | Off | When off, example.com/page and example.com/page?v=2 are treated as the same page. Enable only when different URL parameters lead to different content (e.g., versioned docs). |
| Enhanced SPA Detection | Toggle | Off | Finds hidden navigation links in modern web apps (React, Vue, Angular) that normal link discovery misses. Essential for single-page applications. Not available when Static Site Mode is on. |
| Manual URL Seeding | Text (one URL per line) | (empty) | Additional starting URLs for the crawler. Useful when your docs have multiple unlinked sections. |
| Exclude Navigation Links | Toggle | Off | Ignores links in navigation areas (header, footer, sidebar menus). Only follows links in the main content. Available when Manual URL Seeding is configured. |
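
To make Max Depth and Preserve Query Strings concrete, here is a minimal sketch of depth-limited link discovery over an in-memory link map. It is not the crawler's actual implementation: the function names, the `links` dictionary standing in for real pages, and the choice to drop URL fragments are all assumptions made for illustration.

```python
from collections import deque
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str, preserve_query: bool = False) -> str:
    """Drop the fragment, and the query string unless preserve_query is set."""
    parts = urlsplit(url)
    query = parts.query if preserve_query else ""
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


def discover(start, links, max_depth=10, preserve_query=False):
    """Breadth-first discovery over `links`, a dict of page URL -> outgoing URLs."""
    start = normalize(start, preserve_query)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth >= max_depth:
            continue  # links on this page are beyond the depth limit
        for link in links.get(page, []):
            url = normalize(link, preserve_query)
            if url not in seen:
                seen.add(url)
                queue.append((url, depth + 1))
    return seen


# Hypothetical three-page site used only for this example.
site = {
    "https://example.com/": [
        "https://example.com/page",
        "https://example.com/page?v=2",
    ],
    "https://example.com/page": ["https://example.com/deep"],
}

shallow = discover("https://example.com/", site, max_depth=1)
with_query = discover("https://example.com/", site, max_depth=1, preserve_query=True)
deeper = discover("https://example.com/", site, max_depth=2)
```

With a depth of 1 and query strings ignored, `shallow` holds two URLs because `/page` and `/page?v=2` collapse into one. Preserving query strings yields three URLs in `with_query`, and raising the depth to 2 lets `deeper` reach `/deep`.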

| Setting | Available in | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Max Pages | Both | Number (1–10,000) | 1,000 | Maximum pages to discover. Discovery stops at this limit. Start with 100 for your first crawl, then increase. |
| Concurrency | Both | Number (1–8) | 5 | Pages processed simultaneously. Higher = faster but may trigger rate limiting. Use 2–3 for Confluence and rate-sensitive sites. |
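
The effect of the Concurrency and Max Pages limits can be sketched with a semaphore that caps how many pages are in flight at once. This is a simplified model, not the crawler's real scheduler: the `crawl` function is hypothetical, and a short sleep stands in for fetching and extracting a page.

```python
import asyncio


async def crawl(urls, concurrency=5, max_pages=1000):
    """Process at most `max_pages` URLs with at most `concurrency` in flight."""
    if not 1 <= concurrency <= 8:
        raise ValueError("Concurrency must be between 1 and 8")
    limit = asyncio.Semaphore(concurrency)
    in_flight = 0
    peak = 0

    async def process(url):
        nonlocal in_flight, peak
        async with limit:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for fetching and extracting the page
            in_flight -= 1
            return url

    done = await asyncio.gather(*(process(u) for u in urls[:max_pages]))
    return done, peak


# Hypothetical URLs; a low concurrency such as 3 suits rate-sensitive sites.
urls = [f"https://example.com/docs/{i}" for i in range(20)]
pages, peak = asyncio.run(crawl(urls, concurrency=3))
```

In this run all 20 pages are processed, but `peak` never exceeds 3, which is the trade-off the table describes: a lower value is slower but puts less load on the target site.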

Website sources are automatically re-crawled on a schedule to keep your knowledge base up to date.

| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| Auto-Recrawl Enabled | Toggle | On | Automatically re-crawl this source on a schedule. Disable to only crawl manually. |

The recrawl frequency depends on your plan:

| Plan | Frequency |
| --- | --- |
| Free | Not available |
| Personal | Not available |
| Standard | Weekly (every 7 days) |
| Business | Daily (every 24 hours) |

Auto-recrawl only applies to website sources. PDF, QA, and file sources are not recrawled. The source must have completed its initial discovery before scheduled recrawls begin.
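
As a rough illustration of the plan-based schedule, the sketch below computes when the next recrawl would be due. The intervals come from the plan table; the function name and the assumption that the interval is measured from the previous crawl are hypothetical, since this page does not say how the schedule is anchored.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Intervals from the plan table; None means auto-recrawl is not available.
RECRAWL_INTERVAL = {
    "Free": None,
    "Personal": None,
    "Standard": timedelta(days=7),
    "Business": timedelta(hours=24),
}


def next_recrawl(plan: str, last_crawl: datetime, enabled: bool = True) -> Optional[datetime]:
    """Return the next due time, or None if no scheduled recrawl applies."""
    interval = RECRAWL_INTERVAL[plan]
    if not enabled or interval is None:
        return None
    # Assumption: the interval is counted from the previous crawl.
    return last_crawl + interval


last = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
standard_due = next_recrawl("Standard", last)
business_due = next_recrawl("Business", last)
free_due = next_recrawl("Free", last)
```

Under these assumptions `standard_due` falls on 8 January 2024 at 09:00 UTC, `business_due` on 2 January 2024 at 09:00 UTC, and `free_due` is `None`.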


| Configuration | Approximate speed | Best for |
| --- | --- | --- |
| Standard + Static Site Mode | ~100+ pages/min | Static HTML with sitemap |
| Standard + Browser (default) | ~20–30 pages/min | CMS/WordPress with sitemap |
| Advanced + Browser | ~10–20 pages/min | SPAs, no-sitemap sites |
| Advanced + Reader Mode | ~5–15 pages/min | JS-heavy sites, clean extraction |
| Advanced + Reader Mode + Low Concurrency | ~2–5 pages/min | Confluence, rate-limited sites |
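
These rates translate into a rough crawl-time estimate by dividing the page count by the slow and fast ends of a range. The helper below is a hypothetical back-of-the-envelope calculation; the figures in the table are approximate, so the result is a planning aid rather than a guarantee.

```python
import math


def estimate_minutes(pages: int, slow_rate: float, fast_rate: float):
    """Return (best_case, worst_case) crawl time in whole minutes."""
    if pages < 0 or not 0 < slow_rate <= fast_rate:
        raise ValueError("expected pages >= 0 and 0 < slow_rate <= fast_rate")
    best = math.ceil(pages / fast_rate)
    worst = math.ceil(pages / slow_rate)
    return best, worst


# 1,000 pages (the default Max Pages) at ~20-30 pages/min, Standard + Browser.
default_crawl = estimate_minutes(1000, 20, 30)   # (34, 50)

# The same 1,000 pages at ~2-5 pages/min, the low-concurrency configuration.
confluence_crawl = estimate_minutes(1000, 2, 5)  # (200, 500)
```

So a default-sized crawl in the default configuration takes roughly 34 to 50 minutes, while the same page count on a rate-limited site could take between about 3 and 8 hours.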