POST /v1/browsers/{id}/scrape
Scrape content using a persistent browser
curl --request POST \
  --url https://api.webcompute.dev/v1/browsers/{id}/scrape \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "url": "<string>",
  "tabId": "<string>",
  "expectedRuntimeEpoch": 123,
  "expectedSessionRevision": 123,
  "format": "markdown",
  "selector": "<string>",
  "iframe": "<string>",
  "waitFor": "<string>",
  "waitTimeout": 5000,
  "solveCaptcha": false,
  "timeout": 30000,
  "mode": "structured",
  "extract": "tree",
  "maxChars": 20000,
  "startFromChar": 5000000,
  "readFrom": "start",
  "include": [],
  "includeLinks": false,
  "iframeText": false,
  "policy": {}
}
'
{
  "v": 1,
  "url": "https://example.com/articles",
  "title": "Example Domain",
  "format": "markdown",
  "content": "# Example Domain\n\nThis domain is for use in illustrative examples.",
  "provenance": {},
  "diagnostics": {},
  "completeness": {
    "ratio": 0.97,
    "complete": true,
    "capLimited": false,
    "pruningLimited": false
  },
  "statusCode": 200,
  "captchaSolved": false,
  "captchaDetected": "cloudflare-turnstile",
  "captchaState": "solved",
  "elapsedMs": 842,
  "tree": [
    {
      "depth": 0,
      "text": "Section heading",
      "children": "<array>"
    }
  ],
  "description": "Short summary surfaced under mode=summary.",
  "links": [
    {
      "text": "Pricing",
      "href": "https://example.com/pricing"
    }
  ],
  "forms": [
    {
      "valuePresent": false,
      "name": "email",
      "label": "Email address",
      "placeholder": "you@example.com",
      "type": "email"
    }
  ],
  "headings": [
    {
      "level": 2,
      "text": "Features"
    }
  ],
  "captcha": [
    {
      "type": "cloudflare-turnstile",
      "category": "blocking",
      "state": "solved",
      "sitekey": "0x4AAAAAAABbbbcccc",
      "pageUrl": "https://example.com/login",
      "frameId": "7A2A1B9DDEF4B8B9A6E99D8A7A65DCEE",
      "detectedAt": 1713890000123,
      "resolvedAt": 1713890001456,
      "method": "auto-wait",
      "elapsedMs": 333
    }
  ],
  "warnings": [
    "scrape-truncated:recommendedMaxChars=30000"
  ],
  "truncated": true,
  "charsOmitted": 1530,
  "nodesOmitted": 42,
  "pruningRatio": 0.38
}

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
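As a rough sketch of how the bearer header and the id path parameter fit together, the snippet below builds the request object without sending it. The base URL and header names come from the curl example above; the helper name `build_scrape_request` and the placeholder id and token are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

API_BASE = "https://api.webcompute.dev/v1"  # base URL taken from the curl example


def build_scrape_request(browser_id: str, token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the scrape endpoint."""
    return urllib.request.Request(
        url=f"{API_BASE}/browsers/{quote(browser_id, safe='')}/scrape",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # "my-browser-id" and "my-token" are placeholders, not real credentials.
    req = build_scrape_request("my-browser-id", "my-token", {"url": "https://example.com"})
    # Sending is left to the caller, e.g. urllib.request.urlopen(req).
    print(req.get_method(), req.full_url)
```

Keeping construction separate from sending makes it easy to inspect the URL and headers before any network call is made.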

Path Parameters

id
string
required

Body

application/json
url
string

Target URL to navigate to before scraping. Omit to read the current page DOM without navigation or reload.

tabId
string

Runtime tab expected to remain active for current-page reads

expectedRuntimeEpoch
number

Expected runtime epoch for current-page stale-state protection

expectedSessionRevision
number

Expected browser-server session revision for current-page stale-state protection
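The three fields above are described as applying to current-page reads, so one plausible way to use them is to omit url and pass all three together. The sketch below only assembles such a body. The helper name is hypothetical, and where the epoch and revision values come from is an assumption (presumably an earlier response from the same browser session), since this reference does not say.

```python
from typing import Optional


def build_current_page_read(
    tab_id: str,
    runtime_epoch: int,
    session_revision: int,
    selector: Optional[str] = None,
) -> dict:
    """Body for reading the current page DOM without navigating.

    `url` is deliberately absent: per the `url` parameter description,
    omitting it reads the current page without navigation or reload.
    """
    body = {
        "tabId": tab_id,
        # ASSUMPTION: both values were captured earlier from the same session.
        "expectedRuntimeEpoch": runtime_epoch,
        "expectedSessionRevision": session_revision,
    }
    if selector is not None:
        body["selector"] = selector
    return body


if __name__ == "__main__":
    print(build_current_page_read("tab-1", 3, 7, selector="main"))
```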

format
enum<string>
default:markdown

Content output format

Available options:
html,
markdown,
text
selector
string

CSS selector scoping deterministic content reads. Omit to let the scraper pick a semantic landmark or fall back to body with landmark pruning.

iframe
string

CSS selector of the target iframe element

waitFor
string

CSS selector to wait for before reading content

waitTimeout
integer
default:5000

Max time in ms to wait for the waitFor selector.

Required range: 0 <= x <= 60000
solveCaptcha
boolean
default:false

Request passive Phase 1 CAPTCHA auto-resolution before extracting content

timeout
number
default:30000

Scrape operation timeout in milliseconds (1-60000)

mode
enum<string>
default:structured

Output shape. 'summary' emits short content (<=1200 chars) plus description; 'structured' emits the visible content payload with optional tree[] when extract='tree'; 'raw' opts out of the maxChars clamp.

Available options:
summary,
structured,
raw
extract
enum<string> | null

Opt-in content extractor. 'tree' emits hierarchical visible content. Use runtime extract for repeated records.

Available options:
tree
maxChars
integer
default:20000

Soft cap on the number of characters in the response content. Content that exceeds the cap is clamped, and the response then sets truncated=true and reports charsOmitted.

Required range: 1 <= x <= 50000
startFromChar
integer

Zero-based content character offset for continuing a truncated read.

Required range: 0 <= x <= 10000000
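A continuation loop for truncated reads might look like the sketch below. It takes the HTTP call as an injected `fetch` function so the logic can be exercised without a network. Note the loud assumption: the next offset is computed as the previous offset plus the length of the returned content, which this reference does not confirm. The helper name `read_all` and the `max_requests` guard are hypothetical.

```python
from typing import Callable


def read_all(fetch: Callable[[dict], dict], base_body: dict, max_requests: int = 10) -> str:
    """Concatenate content windows until the response is no longer truncated.

    ASSUMPTION: next offset = previous offset + len(content). The reference
    does not state how to compute the continuation offset.
    """
    parts = []
    offset = int(base_body.get("startFromChar", 0))
    for _ in range(max_requests):
        response = fetch({**base_body, "startFromChar": offset})
        content = response.get("content", "")
        parts.append(content)
        # Stop when the server reports a complete read or returns nothing new.
        if not response.get("truncated") or not content:
            break
        offset += len(content)
    return "".join(parts)


if __name__ == "__main__":
    page = "abcdefghij"

    def fake_fetch(body: dict) -> dict:
        # Stand-in for the real endpoint: returns 4-character windows.
        start = body["startFromChar"]
        return {"content": page[start:start + 4], "truncated": start + 4 < len(page)}

    print(read_all(fake_fetch, {"format": "text"}))
```

The `max_requests` bound keeps a misbehaving page or a wrong offset assumption from looping indefinitely.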
readFrom
enum<string>
default:start

Read window direction. Use end for latest/bottom content; startFromChar is ignored in end-window mode.

Available options:
start,
end
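The documented ranges and enums can be checked on the client before a request is sent. The following is a minimal sketch, not an official validator: the limits are copied from the parameter descriptions on this page, while the function name and message wording are hypothetical.

```python
# Limits copied from the parameter descriptions in this reference.
RANGES = {
    "waitTimeout": (0, 60000),
    "timeout": (1, 60000),
    "maxChars": (1, 50000),
    "startFromChar": (0, 10000000),
}
ENUMS = {
    "format": {"html", "markdown", "text"},
    "mode": {"summary", "structured", "raw"},
    "readFrom": {"start", "end"},
}


def check_body(body: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for name, (low, high) in RANGES.items():
        if name in body and not (low <= body[name] <= high):
            problems.append(f"{name}={body[name]} is outside {low}..{high}")
    for name, allowed in ENUMS.items():
        if name in body and body[name] not in allowed:
            problems.append(f"{name}={body[name]!r} is not one of {sorted(allowed)}")
    # Not an error on the server side per the docs, but worth flagging to the caller.
    if body.get("readFrom") == "end" and "startFromChar" in body:
        problems.append("startFromChar is ignored when readFrom is 'end'")
    return problems


if __name__ == "__main__":
    print(check_body({"waitTimeout": 70000, "readFrom": "end", "startFromChar": 5}))
```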
include
enum<string>[]

Opt-in extras added to the lean default scrape response. When supplied, this exact list controls optional categories; include: [] explicitly opts out of all extras. Unlisted categories are omitted.

Available options:
links,
forms,
headings,
captcha
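Because an omitted include and an explicit include: [] are different requests, a small helper can make the distinction hard to get wrong. This is an illustrative sketch only; `with_include` is a hypothetical name, and the allowed categories are the four listed above.

```python
from typing import Iterable, Optional

INCLUDE_OPTIONS = ("links", "forms", "headings", "captcha")


def with_include(body: dict, extras: Optional[Iterable[str]]) -> dict:
    """Return a copy of body with `include` set according to `extras`.

    None -> leave `include` out of the request entirely.
    []   -> send include: [] to explicitly opt out of all extras.
    """
    result = {k: v for k, v in body.items() if k != "include"}
    if extras is None:
        return result
    chosen = list(dict.fromkeys(extras))  # de-duplicate while keeping order
    unknown = [name for name in chosen if name not in INCLUDE_OPTIONS]
    if unknown:
        raise ValueError(f"unknown include categories: {unknown}")
    result["include"] = chosen
    return result


if __name__ == "__main__":
    print(with_include({"url": "https://example.com"}, ["links", "headings"]))
    print(with_include({"url": "https://example.com"}, []))
```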

includeLinks
boolean

Include deterministic anchor links in read output.

iframeText
boolean
default:false

When true and format is not 'html', append iframe text to content.

policy
object

Browser navigation policy for navigations performed by this scrape

Response

Scraped content with deterministic read diagnostics.

v
enum<number>
required
Available options:
1
Example:

1

url
string
required
Example:

"https://example.com/articles"

title
string
required
Example:

"Example Domain"

format
enum<string>
required
Available options:
html,
markdown,
text
Example:

"markdown"

content
string
required
Example:

"# Example Domain\n\nThis domain is for use in illustrative examples."

provenance
object
required
diagnostics
object
required
completeness
object
required
statusCode
number
required
Example:

200

captchaSolved
boolean
required
Example:

false

captchaDetected
enum<string> | null
required
Available options:
recaptcha-v2,
recaptcha-v3,
hcaptcha,
cloudflare-turnstile,
cloudflare-challenge,
datadome-captcha,
aws-waf-captcha,
geetest,
waf,
unknown
Example:

"cloudflare-turnstile"

captchaState
enum<string> | null
required
Available options:
detected,
solving,
solved,
failed,
timeout,
cancelled,
expired,
interactive_required
Example:

"solved"

elapsedMs
number
required
Example:

842

tree
object[]
description
string
Example:

"Short summary surfaced under mode=summary."

links
object[]
forms
object[]
headings
object[]
captcha
object[]
warnings
string[]
Example:
[
"scrape-truncated:recommendedMaxChars=30000"
]
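The example warning reads like a code followed by a key=value hint, so a caller could parse it and retry with the recommended cap. The sketch below assumes that `code:key=value` layout, which is inferred from this single example and may not hold for other warnings; `parse_warning` is a hypothetical helper.

```python
def parse_warning(warning: str) -> tuple:
    """Split 'code:key=value' into (code, {key: value}).

    ASSUMPTION: the layout is inferred from one documented example;
    warnings without a ':' or '=' are returned with empty params.
    """
    code, _, rest = warning.partition(":")
    params = {}
    if "=" in rest:
        key, _, value = rest.partition("=")
        # Numeric hints such as recommendedMaxChars are converted to int.
        params[key] = int(value) if value.isdigit() else value
    return code, params


if __name__ == "__main__":
    code, params = parse_warning("scrape-truncated:recommendedMaxChars=30000")
    if code == "scrape-truncated" and "recommendedMaxChars" in params:
        # A retry could pass this value as maxChars (documented range 1..50000).
        print("retry with maxChars =", params["recommendedMaxChars"])
```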
truncated
boolean
Example:

true

charsOmitted
number
Example:

1530

nodesOmitted
number
Example:

42

pruningRatio
number
Example:

0.38