POST /v1/scrape
Scrape a URL (ephemeral browser)
curl --request POST \
  --url https://api.webcompute.dev/v1/scrape \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "url": "<string>",
  "proxy": "<string>",
  "waitFor": "<string>",
  "timeout": 30000,
  "policy": {},
  "format": "markdown",
  "selector": "<string>",
  "iframe": "<string>",
  "solveCaptcha": false,
  "mode": "structured",
  "extract": "tree",
  "maxChars": 20000,
  "startFromChar": 5000000,
  "readFrom": "start",
  "include": [],
  "includeLinks": false,
  "iframeText": false
}
'
{
  "v": 1,
  "url": "https://example.com/articles",
  "title": "Example Domain",
  "format": "markdown",
  "content": "# Example Domain\n\nThis domain is for use in illustrative examples.",
  "provenance": {},
  "diagnostics": {},
  "completeness": {
    "ratio": 0.97,
    "complete": true,
    "capLimited": false,
    "pruningLimited": false
  },
  "statusCode": 200,
  "captchaSolved": false,
  "captchaDetected": "cloudflare-turnstile",
  "captchaState": "solved",
  "elapsedMs": 842,
  "tree": [
    {
      "depth": 0,
      "text": "Section heading",
      "children": "<array>"
    }
  ],
  "description": "Short summary surfaced under mode=summary.",
  "links": [
    {
      "text": "Pricing",
      "href": "https://example.com/pricing"
    }
  ],
  "forms": [
    {
      "valuePresent": false,
      "name": "email",
      "label": "Email address",
      "placeholder": "you@example.com",
      "type": "email"
    }
  ],
  "headings": [
    {
      "level": 2,
      "text": "Features"
    }
  ],
  "captcha": [
    {
      "type": "cloudflare-turnstile",
      "category": "blocking",
      "state": "solved",
      "sitekey": "0x4AAAAAAABbbbcccc",
      "pageUrl": "https://example.com/login",
      "frameId": "7A2A1B9DDEF4B8B9A6E99D8A7A65DCEE",
      "detectedAt": 1713890000123,
      "resolvedAt": 1713890001456,
      "method": "auto-wait",
      "elapsedMs": 333
    }
  ],
  "warnings": [
    "scrape-truncated:recommendedMaxChars=30000"
  ],
  "truncated": true,
  "charsOmitted": 1530,
  "nodesOmitted": 42,
  "pruningRatio": 0.38
}

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
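
As a rough illustration of the header described above, the following Python sketch builds (but does not send) the same request as the curl example using only the standard library. The helper name `build_scrape_request` and the placeholder token are assumptions made for this example; they are not part of the API.

```python
import json
import urllib.request

# Endpoint taken from the curl example on this page.
API_URL = "https://api.webcompute.dev/v1/scrape"


def build_scrape_request(token: str, body: dict) -> urllib.request.Request:
    """Build a POST request carrying the bearer token and a JSON body.

    Hypothetical helper: it only constructs the request object.
    """
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_scrape_request("YOUR_TOKEN", {"url": "https://example.com/articles"})
# Sending is left to the caller, for example:
#   with urllib.request.urlopen(req, timeout=60) as resp:
#       result = json.load(resp)
```

Keeping construction separate from sending makes it easy to inspect the headers and body before any network call is made.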

Body

application/json
url
string
required

Target URL to navigate to

proxy
string

Proxy URL (protocol://user:pass@host:port)

waitFor
string

CSS selector to wait for before action

timeout
number
default:30000

Action timeout in milliseconds (1-60000)

policy
object

Browser navigation policy for this action

format
enum<string>
default:markdown

Content output format

Available options: html, markdown, text
selector
string

CSS selector scoping deterministic content reads. Omit to let the scraper pick a semantic landmark or fall back to body with landmark pruning.

iframe
string

CSS selector of a specific iframe to scrape.

solveCaptcha
boolean
default:false

Request passive Phase 1 CAPTCHA auto-resolution before extracting content

mode
enum<string>
default:structured

Output shape. 'summary' returns short content (at most 1200 characters) plus description; 'structured' returns the visible content payload, with an optional tree[] when extract='tree'; 'raw' opts out of the maxChars clamp.

Available options: summary, structured, raw
extract
enum<string> | null

Opt-in content extractor. 'tree' emits hierarchical visible content (comment threads, file trees, nested menus). Use runtime extract for repeated records.

Available options: tree
maxChars
integer
default:20000

Soft cap on the response content character count. Clamps above the default set truncated=true and report charsOmitted in the response.

Required range: 1 <= x <= 50000
startFromChar
integer

Zero-based content character offset for continuing a truncated read.

Required range: 0 <= x <= 10000000
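
A truncated read can, in principle, be continued by advancing startFromChar. The sketch below shows one way to loop until truncated is no longer set. It assumes that the next offset equals the previous offset plus the length of the returned content, which this page does not confirm, and it takes a caller-supplied `fetch` function (hypothetical) that posts a body and returns the parsed response.

```python
from typing import Callable


def read_all(fetch: Callable[[dict], dict], url: str,
             max_chars: int = 20000, max_pages: int = 10) -> str:
    """Concatenate successive windows of a page's content.

    `fetch` is a hypothetical callable: request body in, response dict out.
    Assumption: offsets advance by len(content) of each response.
    """
    parts = []
    offset = 0
    for _ in range(max_pages):  # hard stop so a bad response cannot loop forever
        resp = fetch({"url": url, "maxChars": max_chars, "startFromChar": offset})
        chunk = resp.get("content", "")
        parts.append(chunk)
        if not resp.get("truncated") or not chunk:
            break
        offset += len(chunk)
    return "".join(parts)
```

The `max_pages` guard is a design choice for the sketch: it bounds the number of requests when a page is very long or the response never clears truncated.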
readFrom
enum<string>
default:start

Read window direction. Use end for latest/bottom content; startFromChar is ignored in end-window mode.

Available options: start, end
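
Because startFromChar is ignored in end-window mode, a client may prefer not to send it at all in that case. A minimal sketch of that choice follows; the helper name `window_params` is an assumption for illustration.

```python
from typing import Optional


def window_params(read_from: str = "start",
                  start_from_char: Optional[int] = None) -> dict:
    """Return the read-window fields for a request body.

    Hypothetical helper: drops startFromChar when readFrom is "end",
    since the description above says it is ignored in that mode.
    """
    if read_from not in ("start", "end"):
        raise ValueError("readFrom must be 'start' or 'end'")
    params = {"readFrom": read_from}
    if read_from == "start" and start_from_char is not None:
        params["startFromChar"] = start_from_char
    return params
```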
include
enum<string>[]

Opt-in extras added to the lean default scrape response. When supplied, this exact list controls optional categories; include: [] explicitly opts out of all extras. Security signals (captchaDetected/captchaState/captchaSolved), structural metadata, and diagnostics (truncated/etc.) are always populated regardless. Duplicates are collapsed.

Available options: links, forms, headings, captcha
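
The description of include distinguishes between omitting the field and sending an empty list. The sketch below keeps that distinction explicit on the client side. The helper name `include_field` is hypothetical, and the client-side de-duplication is optional, since the description states that duplicates are collapsed anyway.

```python
from typing import List, Optional

# Options listed for the include field on this page.
ALLOWED_EXTRAS = ("links", "forms", "headings", "captcha")


def include_field(extras: Optional[List[str]]) -> dict:
    """Return the include portion of a request body.

    None -> field omitted entirely.
    []   -> explicit opt-out of all extras.
    list -> validated and de-duplicated, preserving first-seen order.
    """
    if extras is None:
        return {}
    unknown = [e for e in extras if e not in ALLOWED_EXTRAS]
    if unknown:
        raise ValueError(f"unsupported include values: {unknown}")
    return {"include": list(dict.fromkeys(extras))}
```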

includeLinks
boolean

Include deterministic anchor links in read output.

iframeText
boolean
default:false

When true and format is not 'html', append iframe text to content.
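
The ranges and enumerations listed for the body can be checked before a request is sent. The following sketch collects them into a small validator. The function name `validate_body` and its messages are assumptions for this example, and the ranges are treated as inclusive, as the "Required range" lines suggest.

```python
from typing import List

# Numeric ranges documented above (assumed inclusive).
RANGES = {
    "timeout": (1, 60000),
    "maxChars": (1, 50000),
    "startFromChar": (0, 10000000),
}
# Enumerated string fields documented above.
ENUMS = {
    "format": ("html", "markdown", "text"),
    "mode": ("summary", "structured", "raw"),
    "readFrom": ("start", "end"),
}


def validate_body(body: dict) -> List[str]:
    """Return a list of problems found in a request body (empty if none)."""
    problems = []
    if not isinstance(body.get("url"), str) or not body.get("url"):
        problems.append("url is required")
    for name, (low, high) in RANGES.items():
        if name in body and not (low <= body[name] <= high):
            problems.append(f"{name} must be between {low} and {high}")
    for name, options in ENUMS.items():
        if name in body and body[name] not in options:
            problems.append(f"{name} must be one of {', '.join(options)}")
    if body.get("extract") not in (None, "tree"):
        problems.append("extract must be 'tree' or null")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a single pass.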

Response

Scrape result with deterministic content. Response always includes completeness.
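
Since completeness is always present, a client can branch on it after each call. The sketch below is one possible reading of the fields shown in the example response (complete, capLimited, pruningLimited). Their exact semantics are not documented on this page, so the mapping to follow-up actions is an assumption, and the returned labels are invented for illustration.

```python
def next_step(resp: dict) -> str:
    """Suggest a follow-up action from the completeness object.

    Assumed interpretation: capLimited means the character cap cut the
    content short; pruningLimited means landmark pruning removed content.
    """
    completeness = resp.get("completeness", {})
    if completeness.get("complete"):
        return "done"
    if completeness.get("capLimited"):
        return "continue-read"      # e.g. raise maxChars or use startFromChar
    if completeness.get("pruningLimited"):
        return "adjust-selector"    # e.g. pass an explicit selector
    return "inspect"                # incomplete for a reason not covered here
```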

v
enum<number>
required
Available options: 1
Example:

1

url
string
required
Example:

"https://example.com/articles"

title
string
required
Example:

"Example Domain"

format
enum<string>
required
Available options: html, markdown, text
Example:

"markdown"

content
string
required
Example:

"# Example Domain\n\nThis domain is for use in illustrative examples."

provenance
object
required
diagnostics
object
required
completeness
object
required
statusCode
number
required
Example:

200

captchaSolved
boolean
required
Example:

false

captchaDetected
enum<string> | null
required
Available options: recaptcha-v2, recaptcha-v3, hcaptcha, cloudflare-turnstile, cloudflare-challenge, datadome-captcha, aws-waf-captcha, geetest, waf, unknown
Example:

"cloudflare-turnstile"

captchaState
enum<string> | null
required
Available options: detected, solving, solved, failed, timeout, cancelled, expired, interactive_required
Example:

"solved"

elapsedMs
number
required
Example:

842

tree
object[]
description
string
Example:

"Short summary surfaced under mode=summary."

links
object[]
forms
object[]
headings
object[]
captcha
object[]
warnings
string[]
Example:
[
"scrape-truncated:recommendedMaxChars=30000"
]
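
The example warning appears to carry a suggested maxChars value. The sketch below extracts it, assuming the `scrape-truncated:recommendedMaxChars=<integer>` shape inferred from that single example; other warning formats are not documented here and are ignored.

```python
from typing import List, Optional

# Prefix inferred from the example warning on this page (assumption).
WARNING_PREFIX = "scrape-truncated:recommendedMaxChars="


def recommended_max_chars(warnings: List[str]) -> Optional[int]:
    """Return the first recommended maxChars found in warnings, if any."""
    for warning in warnings:
        if warning.startswith(WARNING_PREFIX):
            value = warning[len(WARNING_PREFIX):]
            if value.isdigit():
                return int(value)
    return None
```

A caller could retry with the returned value as maxChars, provided it stays within the documented 1 to 50000 range.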
truncated
boolean
Example:

true

charsOmitted
number
Example:

1530

nodesOmitted
number
Example:

42

pruningRatio
number
Example:

0.38