Crawling Settings Reference

This page documents every crawling and indexing setting available when adding or configuring a website source. For practical recipes by site type, see Configure Crawling for Your Site.

Where these settings live: Most settings appear only when you choose Manual configuration mode in the Add Source wizard (the default mode, AI-assisted, picks settings for you). In Manual mode, open Advanced Options to reveal the Performance, Content Extraction, and Discovery Tuning sections. Page limit and Include Images are shown in both AI-assisted and Manual modes.

Discovery Mode

In the wizard, “Discovery Mode” selects how pages are found.

UI label	Discovery method	When to use
Sitemap Discovery (Recommended)	Reads your sitemap.xml	Most websites. Discovery is fast, and pages are crawled in a separate step.
Link Crawling	Follows links from the starting URL	Sites without a sitemap, SPAs, or when the sitemap is missing pages. Single-pass: content is captured during discovery.
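
To illustrate what Sitemap Discovery works from, the following minimal Python sketch pulls page URLs out of a sitemap.xml document. It is not the crawler's actual implementation: the function name `urls_from_sitemap` and the sample XML are hypothetical, and the sketch ignores sitemap index files.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return the <loc> value of every <url> entry in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sample sitemap, for illustration only.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/docs/setup</loc></url>
</urlset>"""

print(urls_from_sitemap(SAMPLE))
# ['https://example.com/', 'https://example.com/docs/setup']
```

Pages that are missing from the sitemap never show up in this list, which is why Link Crawling is the better choice when the sitemap is incomplete.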

Site Presets

In Manual mode, Advanced Options opens with a Site Preset picker that fills in all settings for a common site type. You can still adjust any value afterward (which switches the preset to Custom).

Preset (UI label)	Best for	What it sets
Standard Website	Most websites	Browser rendering, Reader Mode on, Concurrency 10, no wait delay
SPA / JavaScript App	React, Vue, Angular	Advanced Link Detection on, Reader Mode off, Concurrency 5, 3-second content wait
Confluence / Wiki	Confluence, wikis	Reader Mode on, Concurrency 2 (avoids rate limits), 3-second content wait
Static Site / Blog	Plain HTML, blogs	Simple Mode on (no browser), Concurrency 20, fast HTTP crawling
Custom	Manual control	Whatever you set by hand

Performance

These settings live under Advanced Options → Performance.

Setting (UI label)	Available in	Type	Default	Description
Max Depth	Link Crawling only	Number (1-100)	10	How many links deep to follow from the starting URL. A depth of 1 follows only the links found on the starting page. Increase for deeply nested documentation.
Max Pages	Both	Number (1-10,000)	Your remaining plan budget	Maximum pages to discover. Discovery stops at this limit. Start with 100 for your first crawl, then increase. (Also shown as the Page limit field in both modes.)
Concurrency	Both	Number (1-50)	5	Number of pages processed simultaneously. Higher values are faster but may trigger rate limiting. Use 2-3 for Confluence and other rate-sensitive sites.
Simple Mode	Both	Toggle	Off	Uses direct HTTP requests instead of a browser. ~30x faster for static HTML sites. Disables all JavaScript-dependent settings.
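
The interaction between Max Depth and Max Pages can be sketched as a breadth-first traversal over a link graph. This is a simplified model, not the crawler's real algorithm: the `discover` function and the in-memory `SITE` graph are hypothetical, and counting the starting page toward the page limit is an assumption.

```python
from collections import deque


def discover(links: dict[str, list[str]], start: str,
             max_depth: int, max_pages: int) -> list[str]:
    """Breadth-first discovery limited by link depth and page count."""
    seen = {start}
    order = [start]              # assumption: the start page counts toward max_pages
    queue = deque([(start, 0)])  # the starting page sits at depth 0
    while queue and len(order) < max_pages:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue             # do not follow links beyond Max Depth
        for target in links.get(url, []):
            if target not in seen and len(order) < max_pages:
                seen.add(target)
                order.append(target)
                queue.append((target, depth + 1))
    return order


# Hypothetical site: each key lists the links found on that page.
SITE = {"/": ["/a", "/b"], "/a": ["/a/1"], "/a/1": ["/a/1/x"], "/b": []}

print(discover(SITE, "/", max_depth=1, max_pages=100))  # ['/', '/a', '/b']
print(discover(SITE, "/", max_depth=2, max_pages=100))  # ['/', '/a', '/b', '/a/1']
```

With a depth of 1, only the links on the starting page are discovered; raising the depth reaches the nested page `/a/1`, and lowering `max_pages` stops discovery early regardless of depth.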

Content Extraction

These settings live under Advanced Options → Content Extraction.

Setting (UI label)	Available in	Type	Default	Description
Reader Mode	Both	Toggle	On	Strips navigation, sidebars, footers, and other UI chrome, and extracts only the main article content, similar to your browser’s Reader View. Essential for CMS platforms, wikis, and sites with heavy page layouts.
Content Area Selector	Both (Reader Mode only)	Text	(empty)	Narrows extraction to a specific part of the page. Only content inside that section is indexed. Available when Reader Mode is on. Your developer can help you find the right value for your site (common examples: `#main-content` for Confluence, `article` for blogs).
Wait for Page Element (Optional)	Both (browser only)	Text	(empty)	Wait until a specific page element appears before extracting content. Useful for pages that show a loading spinner first. Your developer can provide the right value (e.g., `#main-content`). Not available when Simple Mode is on.
Wait Time for Page Content (seconds)	Both (browser only)	Number (0-10 seconds, in 0.5-second steps)	0	Extra seconds to wait before capturing the page. The crawler already waits for content to stabilize, but slow sites may need more time. Not available when Simple Mode is on.

Discovery Tuning (Link Crawling only)

These settings live under Advanced Options → Discovery Tuning and only appear in Link Crawling mode.

Setting (UI label)	Type	Default	Description
Keep URL Parameters	Toggle	Off	When off, `example.com/page` and `example.com/page?v=2` are treated as the same page. Enable only when different URL parameters lead to different content (e.g., versioned docs).
Advanced Link Detection	Toggle	Off	Finds hidden navigation links in modern web apps (React, Vue, Angular) that normal link discovery misses. Essential for single-page applications. Not available when Simple Mode is on.
Add Specific Pages (Optional)	Text (one URL per line)	(empty)	Additional starting URLs for the crawler. Useful when your docs have multiple unlinked sections.
Skip Menu and Footer Links	Toggle	Off	Ignores links in navigation areas (header, footer, sidebar menus). Only follows links in the main content. Available when Add Specific Pages is configured.
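
The effect of Keep URL Parameters on deduplication can be shown with a short Python sketch built on the standard `urllib.parse` module. The function name `dedup_key` is hypothetical, and dropping the `#fragment` in both cases is an assumption; the crawler's actual normalization rules may differ.

```python
from urllib.parse import urlsplit, urlunsplit


def dedup_key(url: str, keep_params: bool = False) -> str:
    """Return the key used to decide whether two URLs are the same page."""
    parts = urlsplit(url)
    # With Keep URL Parameters off, the query string is ignored.
    query = parts.query if keep_params else ""
    # Assumption: the fragment never identifies a separate page.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


a = "https://example.com/page"
b = "https://example.com/page?v=2"

print(dedup_key(a) == dedup_key(b))                                      # True
print(dedup_key(a, keep_params=True) == dedup_key(b, keep_params=True))  # False
```

With the toggle off, both URLs collapse to one key and the page is crawled once. With it on, the versioned URL is kept as a separate page.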

Shown in both configuration modes

Setting (UI label)	Type	Default	Description
Page limit	Number (1-10,000)	Your remaining plan budget	Maximum pages to discover. Same as Max Pages, but shown at the top level in both AI-assisted and Manual modes.
Include Images	Toggle	Off	Preserve image references in extracted content. When enabled, your bot can include relevant images in responses.

Auto-recrawl

Website sources are automatically re-crawled on a schedule to keep your knowledge base up to date.

Setting	Type	Default	Description
Auto-Recrawl Enabled	Toggle	On	Automatically re-crawl this source on a schedule. Disable to only crawl manually.

The recrawl frequency depends on your plan:

Plan	Frequency
Free	Not available
Personal	Not available
Standard	Weekly (every 7 days)
Business	Daily (every 24 hours)

Auto-recrawl applies only to website sources. PDF, QA, and file sources are not recrawled. Scheduled recrawls begin only after a source has completed its initial discovery.
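
The plan table above translates into a simple interval lookup. The Python sketch below is illustrative only: `next_recrawl` and `RECRAWL_INTERVAL` are hypothetical names, and measuring the interval from the previous crawl time is an assumption about how the schedule is anchored.

```python
from datetime import datetime, timedelta

# Intervals taken from the plan table; None means auto-recrawl is not available.
RECRAWL_INTERVAL = {
    "Free": None,
    "Personal": None,
    "Standard": timedelta(days=7),
    "Business": timedelta(hours=24),
}


def next_recrawl(plan: str, last_crawl: datetime, enabled: bool = True):
    """Return the next scheduled recrawl time, or None if nothing is scheduled."""
    interval = RECRAWL_INTERVAL[plan]
    if not enabled or interval is None:
        return None
    # Assumption: the interval is counted from the previous crawl.
    return last_crawl + interval


last = datetime(2024, 1, 1, 9, 0)
print(next_recrawl("Standard", last))  # 2024-01-08 09:00:00
print(next_recrawl("Business", last))  # 2024-01-02 09:00:00
print(next_recrawl("Free", last))      # None
```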

Speed benchmarks

Configuration	Approximate speed	Best for
Sitemap Discovery + Simple Mode	~100+ pages/min	Static HTML with sitemap
Sitemap Discovery + Browser (default)	~20-30 pages/min	CMS/WordPress with sitemap
Link Crawling + Browser	~10-20 pages/min	SPAs, no-sitemap sites
Link Crawling + Reader Mode	~5-15 pages/min	JS-heavy sites, clean extraction
Link Crawling + Reader Mode + Low Concurrency	~2-5 pages/min	Confluence, rate-limited sites
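
To turn these rates into a rough duration, divide the page count by the low and high ends of the range. The helper below is a hypothetical back-of-the-envelope calculation, not a guarantee of actual crawl time, and the 1,000-page figure is an example value.

```python
def estimate_minutes(pages: int, low_rate: float, high_rate: float) -> tuple[float, float]:
    """Return (fastest, slowest) crawl time in minutes for a pages/min range."""
    return pages / high_rate, pages / low_rate


# Example: 1,000 pages with Sitemap Discovery + Browser (~20-30 pages/min).
fastest, slowest = estimate_minutes(1000, low_rate=20, high_rate=30)
print(f"{fastest:.0f}-{slowest:.0f} minutes")  # 33-50 minutes
```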

Related pages

Configure Crawling for Your Site - practical recipes by site type
How Web Crawling Works - conceptual overview
Add a Website Source - getting started with sources