This page documents every crawling and indexing setting available when adding or configuring a website source. For practical recipes by site type, see Configure Crawling for Your Site.

| Mode | Discovery method | When to use |
|---|---|---|
| Standard | Reads your sitemap.xml | Most websites. Fast discovery, separate crawl step. |
| Advanced | Follows links from the starting URL | Sites without a sitemap, SPAs, or when the sitemap is missing pages. Single-pass: content is captured during discovery. |

| Setting | Available in | Type | Default | Description |
|---|---|---|---|---|
| Static Site Mode | Both | Toggle | Off | Uses direct HTTP requests instead of a browser. ~30x faster for static HTML sites. Disables all JavaScript-dependent settings. |
| Wait for CSS Selector | Both (browser only) | Text | (empty) | Wait until a specific page element appears before extracting content. Useful for pages that show a loading spinner first. Your developer can provide the right value (e.g., #main-content). Not available when Static Site Mode is on. |
| Content Load Delay | Both (browser only) | Number (0–10 seconds) | 0 | Extra seconds to wait before capturing the page. The crawler already waits for content to stabilize, but slow sites may need more time. Not available when Static Site Mode is on. |
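
The interaction between Static Site Mode and the browser-only settings in the table above can be sketched as a small normalization step. This is an illustration only: `RenderSettings`, its field names, and `effective_settings` are hypothetical and not part of the product's API, and resetting the browser-only values to their defaults is an assumption about how "not available" is applied.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RenderSettings:
    """Hypothetical container; field names are illustrative only."""
    static_site_mode: bool = False
    wait_for_css_selector: str = ""
    content_load_delay: int = 0  # seconds


def effective_settings(settings: RenderSettings) -> RenderSettings:
    """Return the settings that would plausibly apply to a crawl."""
    if settings.static_site_mode:
        # Static Site Mode uses direct HTTP requests instead of a browser,
        # so browser-only settings fall back to their defaults (assumption).
        return replace(settings, wait_for_css_selector="", content_load_delay=0)
    return settings
```

With Static Site Mode off, the selector and delay pass through unchanged. With it on, both are ignored.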

| Setting | Available in | Type | Default | Description |
|---|---|---|---|---|
| Reader Mode | Both | Toggle | On | Strips navigation, sidebars, footers, and other UI chrome. Extracts only the main article content, similar to your browser’s Reader View. Essential for CMS platforms, wikis, and layout-heavy sites. |
| Content Selector | Both (Reader Mode only) | Text | (empty) | Narrows extraction to a specific part of the page. Only content inside that section is indexed. Available when Reader Mode is on. Your developer can help you find the right value for your site (common examples: #main-content for Confluence, article for blogs). |
| Include Images | Both | Toggle | Off | Preserve image references in extracted content. When enabled, your bot can include relevant images in responses. |

| Setting | Type | Default | Description |
|---|---|---|---|
| Max Depth | Number | 10 | How many links deep to follow from the starting URL. Depth 1 = only links on the starting page. Increase for deeply nested documentation. |
| Preserve Query Strings | Toggle | Off | When off, example.com/page and example.com/page?v=2 are treated as the same page. Enable only when different URL parameters lead to different content (e.g., versioned docs). |
| Enhanced SPA Detection | Toggle | Off | Finds hidden navigation links in modern web apps (React, Vue, Angular) that normal link discovery misses. Essential for single-page applications. Not available when Static Site Mode is on. |
| Manual URL Seeding | Text (one URL per line) | (empty) | Additional starting URLs for the crawler. Useful when your docs have multiple unlinked sections. |
| Exclude Navigation Links | Toggle | Off | Ignores links in navigation areas (header, footer, sidebar menus). Only follows links in the main content. Available when Manual URL Seeding is configured. |
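
The effect of Preserve Query Strings on page identity can be illustrated with a short sketch. The function name `page_key` is hypothetical, and the crawler's real normalization may differ (for example, in how it treats fragments or trailing slashes). The sketch only shows the behavior described in the table: with the setting off, the query string is dropped before URLs are compared.

```python
from urllib.parse import urlsplit


def page_key(url: str, preserve_query_strings: bool = False) -> str:
    """Return the key under which a URL would be deduplicated (illustrative)."""
    parts = urlsplit(url)
    if not preserve_query_strings:
        # Drop the query so /page and /page?v=2 collapse to one page.
        parts = parts._replace(query="")
    return parts.geturl()


same = page_key("https://example.com/page?v=2") == page_key("https://example.com/page")
distinct = page_key("https://example.com/page?v=2", True) != page_key("https://example.com/page", True)
```

Here `same` is true with the default setting, and `distinct` is true once query strings are preserved, which is the case for versioned docs.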

| Setting | Available in | Type | Default | Description |
|---|---|---|---|---|
| Max Pages | Both | Number (1–10,000) | 1,000 | Maximum pages to discover. Discovery stops at this limit. Start with 100 for your first crawl, then increase. |
| Concurrency | Both | Number (1–8) | 5 | Pages processed simultaneously. Higher = faster but may trigger rate limiting. Use 2–3 for Confluence and rate-sensitive sites. |
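
The documented ranges for these two limits can be checked before a crawl is submitted. The sketch below is a minimal illustration. `validate_limits` and the constant names are hypothetical, and only the ranges and defaults come from the table above.

```python
MAX_PAGES_RANGE = (1, 10_000)
CONCURRENCY_RANGE = (1, 8)


def validate_limits(max_pages: int = 1_000, concurrency: int = 5) -> tuple[int, int]:
    """Raise ValueError if either limit falls outside its documented range."""
    checks = (
        ("Max Pages", max_pages, MAX_PAGES_RANGE),
        ("Concurrency", concurrency, CONCURRENCY_RANGE),
    )
    for name, value, (low, high) in checks:
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")
    return max_pages, concurrency


# A cautious first crawl of a rate-sensitive site, per the guidance above.
first_crawl = validate_limits(max_pages=100, concurrency=3)
```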

Website sources are automatically re-crawled on a schedule to keep your knowledge base up to date.

| Setting | Type | Default | Description |
|---|---|---|---|
| Auto-Recrawl Enabled | Toggle | On | Automatically re-crawl this source on a schedule. Disable to only crawl manually. |

The recrawl frequency depends on your plan:

| Plan | Frequency |
|---|---|
| Free | Not available |
| Personal | Not available |
| Standard | Weekly (every 7 days) |
| Business | Daily (every 24 hours) |

Auto-recrawl only applies to website sources. PDF, QA, and file sources are not recrawled. The source must have completed its initial discovery before scheduled recrawls begin.
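
The scheduling rules above can be summarized in a short sketch. The function `next_recrawl` and its parameters are hypothetical. Measuring the interval from the previous crawl is an assumption, since the page states only the frequency per plan.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Intervals from the plan table; None means auto-recrawl is not available.
RECRAWL_INTERVALS = {
    "Free": None,
    "Personal": None,
    "Standard": timedelta(days=7),
    "Business": timedelta(hours=24),
}


def next_recrawl(
    plan: str,
    last_crawl: datetime,
    auto_recrawl_enabled: bool = True,
    discovery_completed: bool = True,
) -> Optional[datetime]:
    """Return when the next scheduled recrawl would fall due, or None."""
    interval = RECRAWL_INTERVALS[plan]
    if interval is None or not auto_recrawl_enabled or not discovery_completed:
        return None
    # Assumption: the interval is counted from the previous crawl.
    return last_crawl + interval


last = datetime(2024, 1, 1, tzinfo=timezone.utc)
weekly = next_recrawl("Standard", last)
daily = next_recrawl("Business", last)
```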

| Configuration | Approximate speed | Best for |
|---|---|---|
| Standard + Static Site Mode | ~100+ pages/min | Static HTML with sitemap |
| Standard + Browser (default) | ~20–30 pages/min | CMS/WordPress with sitemap |
| Advanced + Browser | ~10–20 pages/min | SPAs, no-sitemap sites |
| Advanced + Reader Mode | ~5–15 pages/min | JS-heavy sites, clean extraction |
| Advanced + Reader Mode + Low Concurrency | ~2–5 pages/min | Confluence, rate-limited sites |
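
The approximate speeds above translate into a rough crawl-time estimate. The sketch below is back-of-the-envelope arithmetic, not a guarantee: `estimate_minutes` is a hypothetical helper, and the Static Site Mode row is omitted because the table gives only a lower bound for it.

```python
# (slowest, fastest) pages per minute, copied from the table above.
SPEED_RANGES = {
    "Standard + Browser (default)": (20, 30),
    "Advanced + Browser": (10, 20),
    "Advanced + Reader Mode": (5, 15),
    "Advanced + Reader Mode + Low Concurrency": (2, 5),
}


def estimate_minutes(pages: int, configuration: str) -> tuple[float, float]:
    """Return (best case, worst case) crawl duration in minutes."""
    slowest, fastest = SPEED_RANGES[configuration]
    return pages / fastest, pages / slowest


best, worst = estimate_minutes(600, "Advanced + Browser")
```

For a 600-page site crawled with Advanced + Browser, this gives roughly 30 to 60 minutes.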