OnPage Crawl & Technical Audits

Overview

The OnPage crawl family submits a full-site crawl to DataForSEO, then provides slice-and-dice tools for finding specific technical issues once the crawl finishes. The flow is async by design: submit-onpage-crawl kicks off the job; once it completes, pass the returned task_id to the retrieval tools below to pull filtered results.

For full-site audits with issue counts and health scoring see the Site Audits page - that uses the same underlying crawl but stores results per-site and gives you a scored health report.

submit-onpage-crawl

Kicks off an async crawl of a domain. Returns a task_id which you pass to the retrieval tools below once the crawl finishes (usually 2-15 minutes depending on site size).

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| target | string | Yes | The domain to crawl (e.g. "example.com") |
| max_crawl_pages | integer | No | Maximum pages to crawl (default 100) |

"Crawl example.com and audit the first 500 pages."

get-onpage-crawl-summary

Aggregate summary of a completed crawl: pages crawled, pages with issues, broken links, redirect chains, duplicate content counts, and onpage_score. Run this first after a crawl to get the high-level scorecard.

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| task_id | string | Yes | ID from submit-onpage-crawl |

get-onpage-crawl-pages

Page-level results from a crawl: URL, status code, response time, size, title, meta description, word count, and per-page checks. Use to drill into specific issues after viewing the summary.

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| task_id | string | Yes | ID from submit-onpage-crawl |
| limit | integer | No | Max pages to return (default 100) |

get-onpage-duplicate-tags

Pages with duplicate title or meta description tags - a classic SEO cleanup target. Duplicate tags signal weak on-page targeting and confuse search engines about which page should rank for which query.
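The same check can be reproduced locally from page-level crawl data. A sketch assuming page dicts shaped like get-onpage-crawl-pages output; the grouping helper is hypothetical, not part of the tool:

```python
from collections import defaultdict

def duplicate_title_groups(pages):
    """Group URLs by title and keep only titles shared by 2+ pages."""
    groups = defaultdict(list)
    for page in pages:
        groups[page.get("title", "")].append(page["url"])
    return {title: urls for title, urls in groups.items() if len(urls) > 1}

pages = [
    {"url": "/a", "title": "Widgets"},
    {"url": "/b", "title": "Widgets"},
    {"url": "/c", "title": "About"},
]
clusters = duplicate_title_groups(pages)  # → {"Widgets": ["/a", "/b"]}
```

The same grouping works for meta descriptions by swapping the key.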

get-onpage-duplicate-content

Pages with substantially identical body content. Common culprits: faceted navigation, session IDs, printer-friendly variants, and staging bleed-through. Dilutes ranking signals across URLs.
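The underlying API scores near-duplicate similarity; a far simpler exact-match approximation is a checksum over normalized body text, which already catches printer-friendly and session-ID variants that serve byte-identical copy. This sketch is illustrative, not the tool's detection logic:

```python
import hashlib

def content_fingerprint(body_text: str) -> str:
    """Hash whitespace- and case-normalized text so trivial formatting
    differences collapse to the same fingerprint."""
    normalized = " ".join(body_text.split()).lower()
    return hashlib.sha256(normalized.encode()).hexdigest()
```

Two URLs with equal fingerprints are duplicate candidates worth canonicalizing.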

get-onpage-redirect-chains

URLs that redirect more than once before reaching a final destination. Multi-hop chains waste crawl budget and leak link equity at each step. Flag chains >1 redirect for cleanup.
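Chain detection reduces to following source-to-target hops until a non-redirecting URL is reached. A minimal sketch with a hypothetical mapping of redirect sources to targets:

```python
def flatten_chain(redirects: dict, start: str, max_hops: int = 10) -> list:
    """Follow source→target redirect mappings and return the hop list.
    max_hops guards against redirect loops."""
    chain = [start]
    while chain[-1] in redirects and len(chain) <= max_hops:
        chain.append(redirects[chain[-1]])
    return chain

redirects = {"/old": "/interim", "/interim": "/final"}
chain = flatten_chain(redirects, "/old")  # → ["/old", "/interim", "/final"]
hops = len(chain) - 1  # 2 hops: flag for cleanup, point /old straight at /final
```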

get-onpage-non-indexable

Pages the crawler found but that can't be indexed (robots.txt blocked, noindex meta, canonical conflicts, 4xx/5xx responses). Use to audit whether deindexing is intentional.

get-onpage-keyword-density

Top keywords and phrases extracted from crawled pages, with frequency counts. Useful for understanding what a competitor's pages are topically dense around, or auditing your own pages for keyword cannibalization.
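Conceptually, density is just term frequency over total word count. A simplified single-word sketch (the tool also extracts multi-word phrases; the tokenization here is an assumption):

```python
import re
from collections import Counter

def keyword_density(text: str, top_n: int = 5) -> list:
    """Return (word, frequency-share) pairs for the top_n words."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    return [(w, c / total) for w, c in Counter(words).most_common(top_n)]

top = keyword_density("seo tools for seo audits", top_n=1)  # → [("seo", 0.4)]
```

A real audit would also strip stopwords before counting.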

get-onpage-links

All internal and external links discovered during the crawl. Returns source page, target URL, anchor text, and link type (dofollow/nofollow). Use to audit internal link structure or find broken outbound links.

get-onpage-raw-html

Raw HTML as the crawler saw it for a specific URL. Use when rendered HTML (via get-content-parsing) isn't what you need - e.g., to check server-side rendering, meta tag placement, or structured data as-served.
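One common use of the raw HTML is checking meta tags exactly as served, before any client-side JavaScript runs. A sketch using Python's stdlib parser; the class is illustrative, not part of the tools:

```python
from html.parser import HTMLParser

class MetaTagAudit(HTMLParser):
    """Collect <meta name=...> tags from raw HTML as served."""
    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            d = dict(attrs)
            if "name" in d:
                self.meta[d["name"]] = d.get("content", "")

parser = MetaTagAudit()
parser.feed('<head><meta name="robots" content="noindex"></head>')
# parser.meta → {"robots": "noindex"}: this page is noindexed server-side
```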

Crawl Workflow

  1. submit-onpage-crawl - kick off the crawl, save the task_id.
  2. Wait 2-15 minutes for completion. For tracked sites, sync-site-audit handles the timing for you.
  3. get-onpage-crawl-summary - high-level scorecard and issue counts.
  4. Drill into issues: get-onpage-duplicate-tags, get-onpage-duplicate-content, get-onpage-redirect-chains, get-onpage-non-indexable.
  5. get-onpage-crawl-pages - full page-level data when you need specifics.
  6. get-onpage-raw-html - diagnose individual problem pages.
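Steps 1-3 above can be sketched as a submit-then-poll loop. The function names stand in for the MCP tool calls, and the `crawl_progress` field is an assumption about the summary shape; the in-memory stubs exist only so the sketch runs without a live crawl:

```python
import time

def run_audit(submit, get_summary, poll_seconds=30, timeout=900):
    """Submit a crawl, then poll the summary until it reports finished."""
    task_id = submit(target="example.com", max_crawl_pages=500)
    deadline = time.time() + timeout
    while time.time() < deadline:
        summary = get_summary(task_id=task_id)
        if summary.get("crawl_progress") == "finished":
            return summary
        time.sleep(poll_seconds)
    raise TimeoutError(f"crawl {task_id} did not finish in {timeout}s")

# Fake stand-ins so the sketch runs without a network call:
def submit(**kwargs): return "task-1"
def get_summary(task_id): return {"crawl_progress": "finished"}

summary = run_audit(submit, get_summary, poll_seconds=0)
```

From the returned summary, proceed to the drill-down tools in steps 4-6.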