URLs & Feeds
Keep your knowledge base up to date with web content. Add individual URLs for one-time indexing or RSS/Atom feeds for automatic updates.
Add a URL
Index the content of a single web page.
Crawl a Website
Index multiple pages from a website by following its links.
Crawl Options
| Option | Description | Default |
|---|---|---|
| `max_pages` | Maximum number of pages to index | 100 |
| `max_depth` | How many links deep to follow from the starting URL | 3 |
| `include_patterns` | Glob patterns for URLs to include | `["*"]` |
| `exclude_patterns` | Glob patterns for URLs to skip | `[]` |
| `respect_robots` | Honor the site's robots.txt | `true` |
| `wait_time_ms` | Delay between requests, in milliseconds | 1000 |
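
To make the interaction between these options concrete, the Python sketch below simulates a crawl locally over an in-memory link graph instead of fetching real pages. It is an illustration under stated assumptions, not this service's crawler: `CrawlOptions`, `allowed`, and `plan_crawl` are hypothetical names, and only the field names and defaults come from the table. It assumes include patterns are checked before exclude patterns, robots.txt rules are evaluated with Python's standard `urllib.robotparser`, and links are followed breadth-first.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from urllib.robotparser import RobotFileParser


@dataclass
class CrawlOptions:
    # Hypothetical container; field names and defaults mirror the Crawl Options table.
    max_pages: int = 100
    max_depth: int = 3
    include_patterns: list[str] = field(default_factory=lambda: ["*"])
    exclude_patterns: list[str] = field(default_factory=list)
    respect_robots: bool = True
    wait_time_ms: int = 1000


def allowed(url: str, opts: CrawlOptions, robots: RobotFileParser | None) -> bool:
    """Assumed order: include globs, then exclude globs, then robots.txt."""
    if not any(fnmatchcase(url, p) for p in opts.include_patterns):
        return False
    if any(fnmatchcase(url, p) for p in opts.exclude_patterns):
        return False
    if opts.respect_robots and robots is not None:
        return robots.can_fetch("*", url)
    return True


def plan_crawl(start: str, links: dict[str, list[str]], opts: CrawlOptions,
               robots: RobotFileParser | None = None) -> list[str]:
    """Breadth-first walk that returns the URLs a crawl would index, in order.

    `links` maps each page to its outgoing links and stands in for fetching
    and parsing real pages.
    """
    indexed: list[str] = []
    seen = {start}
    queue = deque([(start, 0)])  # (url, link depth from the start URL)
    while queue and len(indexed) < opts.max_pages:
        url, depth = queue.popleft()
        if not allowed(url, opts, robots):
            continue  # filtered pages are neither indexed nor expanded
        indexed.append(url)
        # A real crawler would pause opts.wait_time_ms / 1000 seconds between fetches.
        if depth >= opts.max_depth:
            continue  # do not follow links beyond max_depth
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return indexed
```

For example, with `max_depth=2`, `include_patterns=["https://docs.example.com/*"]`, and `exclude_patterns=["*/blog/*"]`, a crawl starting at `https://docs.example.com/` over a graph where `/` links to `/a` and `/blog/x`, `/a` links to `/a/b`, and `/a/b` links to `/a/b/c` indexes `/`, `/a`, and `/a/b`: the blog page is excluded and `/a/b/c` lies beyond the depth limit. Breadth-first order means that when `max_pages` is reached, the pages kept are the ones closest to the starting URL.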
Check Crawl Status
Add an RSS/Atom Feed
Subscribe to a feed for automatic updates.
Feed Options
| Option | Description | Default |
|---|---|---|
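| `check_interval` | How often to check the feed for new items | `"1h"` |
| `max_items` | Maximum number of feed items to keep indexed | 100 |
| `index_full_content` | Fetch and index the full linked article, not just the feed entry | `true` |

Supported `check_interval` values: `15m`, `30m`, `1h`, `6h`, `12h`, `24h`.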
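
A feed subscription mostly comes down to scheduling and retention. The Python sketch below shows one way to turn a `check_interval` string into a next-check time and to enforce `max_items`. The helper names (`parse_interval`, `next_check`, `retain_newest`, `FeedItem`) are hypothetical, and the policy of dropping the oldest items first when `max_items` is exceeded is an assumption; this page does not say which items are evicted.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta

# check_interval values listed in the Feed Options section.
ALLOWED_INTERVALS = ("15m", "30m", "1h", "6h", "12h", "24h")


def parse_interval(value: str) -> timedelta:
    """Convert a check_interval string such as "30m" or "6h" into a timedelta."""
    if value not in ALLOWED_INTERVALS:
        raise ValueError(f"unsupported check_interval: {value!r}")
    amount, unit = int(value[:-1]), value[-1]
    return timedelta(minutes=amount) if unit == "m" else timedelta(hours=amount)


def next_check(last_checked: datetime, check_interval: str = "1h") -> datetime:
    """Hypothetical scheduler: the next poll is one interval after the last one."""
    return last_checked + parse_interval(check_interval)


@dataclass
class FeedItem:
    # Hypothetical minimal item record; real feed entries carry many more fields.
    link: str
    published: datetime


def retain_newest(items: list[FeedItem], max_items: int = 100) -> list[FeedItem]:
    """Keep the max_items most recently published items (assumed eviction policy)."""
    return sorted(items, key=lambda item: item.published, reverse=True)[:max_items]
```

Restricting `parse_interval` to the listed values, rather than accepting any number of minutes or hours, mirrors the fixed set of intervals the feature supports and rejects typos such as `"2h"` early.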
List URLs
List Feeds
Refresh a URL
Re-index a URL to pick up updated content.
Refresh a Feed
Check a feed for new items immediately, without waiting for the next scheduled check.
Delete a URL
Delete a Feed
Bulk Add URLs
Add multiple URLs at once.
Content Extraction
Web pages are processed to extract their meaningful content:

- HTML parsing - Extract text content
- Boilerplate removal - Remove navigation, footers, ads
- Structure preservation - Keep headings, lists, tables
- Metadata extraction - Title, description, publish date
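
The extraction steps above can be approximated with Python's standard-library `html.parser`. This is a minimal sketch, not the service's actual extractor: the set of tags treated as boilerplate (`SKIP_TAGS`), the Markdown-style `#` and `-` prefixes used to preserve headings and list items, and reading the publish date from an Open Graph `article:published_time` meta tag are all assumptions for illustration. Tables are not handled.

```python
from __future__ import annotations

from html.parser import HTMLParser

# Assumed boilerplate containers; production extractors use richer heuristics.
SKIP_TAGS = {"nav", "footer", "header", "aside", "script", "style"}
# Assumed Markdown-style markers used to keep heading structure.
HEADINGS = {"h1": "#", "h2": "##", "h3": "###"}


class ContentExtractor(HTMLParser):
    """Hypothetical extractor: collects title, metadata, and text blocks."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0      # > 0 while inside a boilerplate element
        self.in_title = False
        self.prefix = ""         # marker for the current heading or list item
        self.title = ""
        self.description = ""
        self.published = ""
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "title":
            self.in_title = True
        elif tag == "meta":
            a = dict(attrs)
            if a.get("name") == "description":
                self.description = a.get("content") or ""
            elif a.get("property") == "article:published_time":
                self.published = a.get("content") or ""
        elif tag in HEADINGS:
            self.prefix = HEADINGS[tag] + " "
        elif tag == "li":
            self.prefix = "- "

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag == "title":
            self.in_title = False
        elif tag in HEADINGS or tag == "li":
            self.prefix = ""

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self.in_title:
            self.title += text
        elif self.skip_depth == 0:
            # Each text run becomes its own block; inline markup splits runs.
            self.blocks.append(self.prefix + text)
```

Feeding it a page whose body contains a `<nav>`, an `<h1>`, a paragraph, a list item, a `<footer>`, and a `<script>` keeps only the heading, paragraph, and list text (as `# ...`, plain text, and `- ...`), while the title and meta tags populate `title`, `description`, and `published`.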

