OnPage Crawl & Technical Audits
Overview
The OnPage crawl family submits a full-site crawl to DataForSEO, then provides slice-and-dice tools for finding specific technical issues once the crawl finishes. The workflow is asynchronous by design: submit-onpage-crawl kicks off the job, and the retrieval tools take the returned task_id to pull filtered results.
For full-site audits with issue counts and health scoring, see the Site Audits page. It uses the same underlying crawl but stores results per site and gives you a scored health report.
submit-onpage-crawl
Kicks off an async crawl of a domain. Returns a task_id, which you pass to the retrieval tools below once the crawl finishes (usually 2-15 minutes, depending on site size).
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| target | string | Yes | The domain to crawl (e.g. "example.com") |
| max_crawl_pages | integer | No | Maximum pages to crawl (default 100) |
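The parameters above could be assembled and validated before submission along these lines. This is a minimal sketch: `build_crawl_args` is a hypothetical helper, not part of the tool set, and the only names taken from this page are `target` and `max_crawl_pages`.

```python
from typing import Any, Dict, Optional


def build_crawl_args(target: str, max_crawl_pages: Optional[int] = None) -> Dict[str, Any]:
    """Assemble arguments for submit-onpage-crawl (hypothetical helper)."""
    domain = target.strip()
    if not domain:
        raise ValueError("target is required")

    args: Dict[str, Any] = {"target": domain}

    # max_crawl_pages is optional; leaving it out keeps the documented default of 100.
    if max_crawl_pages is not None:
        if max_crawl_pages < 1:
            raise ValueError("max_crawl_pages must be a positive integer")
        args["max_crawl_pages"] = max_crawl_pages
    return args


# Crawl example.com and cap the crawl at 500 pages.
args = build_crawl_args("example.com", max_crawl_pages=500)
```

Omitting `max_crawl_pages` from the arguments, rather than sending a placeholder value, lets the documented default apply.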
"Crawl example.com and audit the first 500 pages."
get-onpage-crawl-summary
Aggregate summary of a completed crawl: pages crawled, pages with issues, broken links, redirect chains, duplicate content counts, and onpage_score. Run this first after a crawl to get the high-level scorecard.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| task_id | string | Yes | ID from submit-onpage-crawl |
get-onpage-crawl-pages
Page-level results from a crawl: URL, status code, response time, size, title, meta description, word count, and per-page checks. Use to drill into specific issues after viewing the summary.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| task_id | string | Yes | ID from submit-onpage-crawl |
| limit | integer | No | Max pages to return (default 100) |
get-onpage-duplicate-tags
Pages with duplicate title or meta description tags - a classic SEO cleanup target. Duplicate tags signal weak on-page targeting and confuse search engines about which page should rank for which query.
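To make "duplicate title" concrete, the sketch below groups page records by normalized title and keeps the groups with more than one URL. It illustrates the idea only and is not the crawler's own detection logic. The record shape (dicts with `url` and `title` keys) is an assumption; the actual response format is not described on this page.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def find_duplicate_titles(pages: Iterable[Mapping[str, str]]) -> Dict[str, List[str]]:
    """Map each title shared by 2+ pages to the URLs that use it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for page in pages:
        # Normalize so "Home" and "home " count as the same title.
        title = (page.get("title") or "").strip().lower()
        if title:
            groups[title].append(page["url"])
    return {title: urls for title, urls in groups.items() if len(urls) > 1}


# Hypothetical page records, for illustration only.
sample_pages = [
    {"url": "/a", "title": "Home"},
    {"url": "/b", "title": "home "},
    {"url": "/c", "title": "About"},
]
duplicates = find_duplicate_titles(sample_pages)
```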
get-onpage-duplicate-content
Pages with substantially identical body content. Common culprits: faceted navigation, session IDs, printer-friendly variants, and staging bleed-through. Duplicate content dilutes ranking signals across URLs.
get-onpage-redirect-chains
URLs that redirect more than once before reaching a final destination. Multi-hop chains waste crawl budget and leak link equity at each step. Flag any chain with more than one redirect for cleanup.
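The "more than one redirect" rule can be expressed as a small filter. This sketch assumes each chain is represented as a list of URLs ordered from the first requested URL to the final destination. That representation is an assumption for illustration, not the tool's documented output.

```python
from typing import List, Sequence


def redirect_count(chain: Sequence[str]) -> int:
    """Number of redirects in a chain listed from first URL to final destination."""
    return max(len(chain) - 1, 0)


def chains_needing_cleanup(chains: Sequence[Sequence[str]]) -> List[List[str]]:
    """Keep only chains that redirect more than once."""
    return [list(chain) for chain in chains if redirect_count(chain) > 1]


# Hypothetical chains: the first has one redirect, the second has two.
sample_chains = [
    ["http://example.com/a", "https://example.com/a"],
    ["http://example.com/b", "https://example.com/b", "https://example.com/b/"],
]
flagged = chains_needing_cleanup(sample_chains)
```

A chain with a single redirect is usually fine; the cleanup for a flagged chain is typically to point the first URL straight at the final destination.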
get-onpage-non-indexable
Pages the crawler found that can't be indexed (blocked by robots.txt, noindex meta tag, canonical conflicts, or 4xx/5xx responses). Use it to audit whether deindexing is intentional.
get-onpage-keyword-density
Top keywords and phrases extracted from crawled pages, with frequency counts. Useful for understanding what a competitor's pages are topically dense around, or auditing your own pages for keyword cannibalization.
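As a rough illustration of what "frequency counts" means here, the sketch below counts single-word terms in a block of text. It is a simplified stand-in and not the extraction method the tool uses; the tokenizer, the `min_length` cutoff, and the function name are all assumptions.

```python
import re
from collections import Counter
from typing import List, Tuple


def keyword_counts(text: str, top_n: int = 5, min_length: int = 3) -> List[Tuple[str, int]]:
    """Return the top_n most frequent words of at least min_length characters."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(word for word in words if len(word) >= min_length)
    return counts.most_common(top_n)


top = keyword_counts("Red shoes and blue shoes. Shoes on sale.", top_n=1)
```

If two pages on the same site share nearly the same top terms, that overlap is a reasonable prompt to check them for keyword cannibalization.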
get-onpage-links
All internal and external links discovered during the crawl. Returns source page, target URL, anchor text, and link type (dofollow/nofollow). Use to audit internal link structure or find broken outbound links.
get-onpage-raw-html
Raw HTML as the crawler saw it for a specific URL. Use when rendered HTML (via get-content-parsing) isn't what you need - e.g., to check server-side rendering, meta tag placement, or structured data as-served.
Crawl Workflow
submit-onpage-crawl- kick off the crawl, save thetask_id.- Wait 2-15 minutes for completion. For tracked sites,
sync-site-audithandles the timing for you. get-onpage-crawl-summary- high-level scorecard and issue counts.- Drill into issues:
get-onpage-duplicate-tags,get-onpage-duplicate-content,get-onpage-redirect-chains,get-onpage-non-indexable. get-onpage-crawl-pages- full page-level data when you need specifics.get-onpage-raw-html- diagnose individual problem pages.
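The workflow above could be scripted roughly as follows. This is a sketch under stated assumptions: `call_tool` is a hypothetical callable supplied by your own client that takes a tool name and an arguments dict, the submit response is assumed to be a dict with a `task_id` key, and `is_ready` is a caller-supplied check because this page does not say how a finished crawl is signaled.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical client signature: (tool_name, arguments) -> response dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]

DRILL_DOWN_TOOLS = (
    "get-onpage-duplicate-tags",
    "get-onpage-duplicate-content",
    "get-onpage-redirect-chains",
    "get-onpage-non-indexable",
)


def run_crawl_audit(
    call_tool: ToolCaller,
    target: str,
    is_ready: Callable[[Dict[str, Any]], bool],
    max_crawl_pages: int = 100,
    poll_seconds: float = 60.0,
    timeout_seconds: float = 900.0,  # upper end of the 2-15 minute range
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Submit a crawl, wait for it to finish, then collect the issue reports."""
    submitted = call_tool(
        "submit-onpage-crawl",
        {"target": target, "max_crawl_pages": max_crawl_pages},
    )
    task_id = submitted["task_id"]  # assumed response key

    waited = 0.0
    while True:
        summary = call_tool("get-onpage-crawl-summary", {"task_id": task_id})
        if is_ready(summary):
            break
        if waited >= timeout_seconds:
            raise TimeoutError(f"crawl {task_id} not ready after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds

    report: Dict[str, Any] = {"task_id": task_id, "summary": summary}
    for tool in DRILL_DOWN_TOOLS:
        report[tool] = call_tool(tool, {"task_id": task_id})
    return report
```

Passing `sleep` and `call_tool` in as parameters keeps the function easy to exercise with a stub before pointing it at a real client. Page-level data and raw HTML are left out of the loop on purpose, since the workflow treats them as follow-ups for specific problem pages.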