OnPage Crawl & Technical Audits

Overview

The OnPage crawl family submits a full-site crawl to DataForSEO, then provides slice-and-dice tools for finding specific technical issues once the crawl finishes. The flow is async by design: submit-onpage-crawl kicks off the job; once it completes, pass the returned task_id to the retrieval tools below to pull filtered results.

For full-site audits with issue counts and health scoring see the Site Audits page - that uses the same underlying crawl but stores results per-site and gives you a scored health report.

submit-onpage-crawl

Kicks off an async crawl of a domain. Returns a task_id which you pass to the retrieval tools below once the crawl finishes (usually 2-15 minutes depending on site size).

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| target | string | Yes | The domain to crawl (e.g. "example.com") |
| max_crawl_pages | integer | No | Maximum pages to crawl (default 100) |

"Crawl example.com and audit the first 500 pages."

get-onpage-crawl-summary

Aggregate summary of a completed crawl: pages crawled, pages with issues, broken links, redirect chains, duplicate content counts, and onpage_score. Run this first after a crawl to get the high-level scorecard.

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| task_id | string | Yes | ID from submit-onpage-crawl |

get-onpage-crawl-pages

Page-level results from a crawl: URL, status code, response time, size, title, meta description, word count, and per-page checks. Use to drill into specific issues after viewing the summary.

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| task_id | string | Yes | ID from submit-onpage-crawl |
| limit | integer | No | Max pages to return (default 100) |

get-onpage-duplicate-tags

Pages with duplicate title or meta description tags - a classic SEO cleanup target. Duplicate tags signal weak on-page targeting and confuse search engines about which page should rank for which query.
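The same check can be reproduced locally from page-level crawl data. A sketch assuming page dicts shaped like get-onpage-crawl-pages output; the grouping helper is hypothetical, not part of the tool:

```python
from collections import defaultdict

def duplicate_title_groups(pages):
    """Group URLs by title and keep only titles shared by 2+ pages."""
    groups = defaultdict(list)
    for page in pages:
        groups[page.get("title", "")].append(page["url"])
    return {title: urls for title, urls in groups.items() if len(urls) > 1}

pages = [
    {"url": "/a", "title": "Widgets"},
    {"url": "/b", "title": "Widgets"},
    {"url": "/c", "title": "About"},
]
clusters = duplicate_title_groups(pages)  # → {"Widgets": ["/a", "/b"]}
```

The same grouping works for meta descriptions by swapping the key.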

get-onpage-duplicate-content

Pages with substantially identical body content. Common culprits: faceted navigation, session IDs, printer-friendly variants, and staging bleed-through. Dilutes ranking signals across URLs.
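The underlying API scores near-duplicate similarity; a far simpler exact-match approximation is a checksum over normalized body text, which already catches printer-friendly and session-ID variants that serve byte-identical copy. This sketch is illustrative, not the tool's detection logic:

```python
import hashlib

def content_fingerprint(body_text: str) -> str:
    """Hash whitespace- and case-normalized text so trivial formatting
    differences collapse to the same fingerprint."""
    normalized = " ".join(body_text.split()).lower()
    return hashlib.sha256(normalized.encode()).hexdigest()
```

Two URLs with equal fingerprints are duplicate candidates worth canonicalizing.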

get-onpage-redirect-chains

URLs that redirect more than once before reaching a final destination. Multi-hop chains waste crawl budget and leak link equity at each step. Flag chains >1 redirect for cleanup.
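Chain detection reduces to following source-to-target hops until a non-redirecting URL is reached. A minimal sketch with a hypothetical mapping of redirect sources to targets:

```python
def flatten_chain(redirects: dict, start: str, max_hops: int = 10) -> list:
    """Follow source→target redirect mappings and return the hop list.
    max_hops guards against redirect loops."""
    chain = [start]
    while chain[-1] in redirects and len(chain) <= max_hops:
        chain.append(redirects[chain[-1]])
    return chain

redirects = {"/old": "/interim", "/interim": "/final"}
chain = flatten_chain(redirects, "/old")  # → ["/old", "/interim", "/final"]
hops = len(chain) - 1  # 2 hops: flag for cleanup, point /old straight at /final
```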

get-onpage-non-indexable

Pages the crawler found but that can't be indexed (robots.txt blocked, noindex meta, canonical conflicts, 4xx/5xx responses). Use to audit whether deindexing is intentional.

get-onpage-keyword-density

Top keywords and phrases extracted from crawled pages, with frequency counts. Useful for understanding what a competitor's pages are topically dense around, or auditing your own pages for keyword cannibalization.
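Conceptually, density is just term frequency over total word count. A simplified single-word sketch (the tool also extracts multi-word phrases; the tokenization here is an assumption):

```python
import re
from collections import Counter

def keyword_density(text: str, top_n: int = 5) -> list:
    """Return (word, frequency-share) pairs for the top_n words."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    return [(w, c / total) for w, c in Counter(words).most_common(top_n)]

top = keyword_density("seo tools for seo audits", top_n=1)  # → [("seo", 0.4)]
```

A real audit would also strip stopwords before counting.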

get-onpage-links

All internal and external links discovered during the crawl. Returns source page, target URL, anchor text, and link type (dofollow/nofollow). Use to audit internal link structure or find broken outbound links.

get-onpage-raw-html

Raw HTML as the crawler saw it for a specific URL. Use when rendered HTML (via get-content-parsing) isn't what you need - e.g., to check server-side rendering, meta tag placement, or structured data as-served.
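One common use of the raw HTML is checking meta tags exactly as served, before any client-side JavaScript runs. A sketch using Python's stdlib parser; the class is illustrative, not part of the tools:

```python
from html.parser import HTMLParser

class MetaTagAudit(HTMLParser):
    """Collect <meta name=...> tags from raw HTML as served."""
    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            d = dict(attrs)
            if "name" in d:
                self.meta[d["name"]] = d.get("content", "")

parser = MetaTagAudit()
parser.feed('<head><meta name="robots" content="noindex"></head>')
# parser.meta → {"robots": "noindex"}: this page is noindexed server-side
```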

Crawl Workflow

  1. submit-onpage-crawl - kick off the crawl, save the task_id.
  2. Wait 2-15 minutes for completion. For tracked sites, sync-site-audit handles the timing for you.
  3. get-onpage-crawl-summary - high-level scorecard and issue counts.
  4. Drill into issues: get-onpage-duplicate-tags, get-onpage-duplicate-content, get-onpage-redirect-chains, get-onpage-non-indexable.
  5. get-onpage-crawl-pages - full page-level data when you need specifics.
  6. get-onpage-raw-html - diagnose individual problem pages.
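Steps 1-3 above can be sketched as a submit-then-poll loop. The function names stand in for the MCP tool calls, and the `crawl_progress` field is an assumption about the summary shape; the in-memory stubs exist only so the sketch runs without a live crawl:

```python
import time

def run_audit(submit, get_summary, poll_seconds=30, timeout=900):
    """Submit a crawl, then poll the summary until it reports finished."""
    task_id = submit(target="example.com", max_crawl_pages=500)
    deadline = time.time() + timeout
    while time.time() < deadline:
        summary = get_summary(task_id=task_id)
        if summary.get("crawl_progress") == "finished":
            return summary
        time.sleep(poll_seconds)
    raise TimeoutError(f"crawl {task_id} did not finish in {timeout}s")

# Fake stand-ins so the sketch runs without a network call:
def submit(**kwargs): return "task-1"
def get_summary(task_id): return {"crawl_progress": "finished"}

summary = run_audit(submit, get_summary, poll_seconds=0)
```

From the returned summary, proceed to the drill-down tools in steps 4-6.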