Scrape content using a persistent browser

curl --request POST \
  --url https://api.webcompute.dev/v1/browsers/{id}/scrape \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "url": "<string>",
  "tabId": "<string>",
  "expectedRuntimeEpoch": 123,
  "expectedSessionRevision": 123,
  "format": "markdown",
  "selector": "<string>",
  "iframe": "<string>",
  "waitFor": "<string>",
  "waitTimeout": 5000,
  "solveCaptcha": false,
  "timeout": 30000,
  "mode": "structured",
  "extract": "tree",
  "maxChars": 20000,
  "startFromChar": 5000000,
  "readFrom": "start",
  "include": [],
  "includeLinks": false,
  "iframeText": false,
  "policy": {}
}
'

{
  "v": 1,
  "url": "https://example.com/articles",
  "title": "Example Domain",
  "format": "markdown",
  "content": "# Example Domain\n\nThis domain is for use in illustrative examples.",
  "provenance": {},
  "diagnostics": {},
  "completeness": {
    "ratio": 0.97,
    "complete": true,
    "capLimited": false,
    "pruningLimited": false
  },
  "statusCode": 200,
  "captchaSolved": false,
  "captchaDetected": "cloudflare-turnstile",
  "captchaState": "solved",
  "elapsedMs": 842,
  "tree": [
    {
      "depth": 0,
      "text": "Section heading",
      "children": "<array>"
    }
  ],
  "description": "Short summary surfaced under mode=summary.",
  "links": [
    {
      "text": "Pricing",
      "href": "https://example.com/pricing"
    }
  ],
  "forms": [
    {
      "valuePresent": false,
      "name": "email",
      "label": "Email address",
      "placeholder": "you@example.com",
      "type": "email"
    }
  ],
  "headings": [
    {
      "level": 2,
      "text": "Features"
    }
  ],
  "captcha": [
    {
      "type": "cloudflare-turnstile",
      "category": "blocking",
      "state": "solved",
      "sitekey": "0x4AAAAAAABbbbcccc",
      "pageUrl": "https://example.com/login",
      "frameId": "7A2A1B9DDEF4B8B9A6E99D8A7A65DCEE",
      "detectedAt": 1713890000123,
      "resolvedAt": 1713890001456,
      "method": "auto-wait",
      "elapsedMs": 333
    }
  ],
  "warnings": [
    "scrape-truncated:recommendedMaxChars=30000"
  ],
  "truncated": true,
  "charsOmitted": 1530,
  "nodesOmitted": 42,
  "pruningRatio": 0.38
}

POST /v1/browsers/{id}/scrape

Authorizations

Authorization

string

header

required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
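As a rough illustration of the header described above, the following Python sketch builds (but does not send) the same request as the curl example. The base URL and header names are taken from that example; the helper name `build_scrape_request` and the sample argument values are hypothetical.

```python
import json
import urllib.request

# Base URL copied from the curl example on this page.
API_BASE = "https://api.webcompute.dev/v1"


def build_scrape_request(browser_id: str, token: str, body: dict) -> urllib.request.Request:
    """Build a POST request for the scrape endpoint without sending it."""
    return urllib.request.Request(
        url=f"{API_BASE}/browsers/{browser_id}/scrape",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Bearer scheme, as described in the Authorizations section.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; error handling and retries are left out of this sketch.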

Path Parameters

id

string

required

Body

application/json

url

string

Target URL to navigate to before scraping. Omit to read the current page DOM without navigation or reload.

tabId

string

ID of the runtime tab expected to remain active during a current-page read.

expectedRuntimeEpoch

number

Expected runtime epoch for current-page stale-state protection

expectedSessionRevision

number

Expected browser-server session revision for current-page stale-state protection
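The three fields above only apply to current-page reads, that is, requests that omit url. A minimal sketch of assembling such a body follows. The helper name is hypothetical, and where the tab ID, runtime epoch, and session revision come from is not stated on this page, so the caller is assumed to already hold them.

```python
def current_page_read_body(tab_id=None, runtime_epoch=None, session_revision=None,
                           fmt="markdown"):
    """Body for reading the current page DOM.

    "url" is deliberately left out, which per the field description means
    no navigation or reload happens before the read.
    """
    body = {"format": fmt}
    # Only send the stale-state guards the caller actually has values for.
    if tab_id is not None:
        body["tabId"] = tab_id
    if runtime_epoch is not None:
        body["expectedRuntimeEpoch"] = runtime_epoch
    if session_revision is not None:
        body["expectedSessionRevision"] = session_revision
    return body
```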

format

enum<string>

default:markdown

Content output format

Available options:

html,

markdown,

text

selector

string

CSS selector scoping deterministic content reads. Omit to let the scraper pick a semantic landmark or fall back to body with landmark pruning.

iframe

string

CSS selector of the target iframe element

waitFor

string

CSS selector to wait for before reading content

waitTimeout

integer

default:5000

Max time in ms to wait for the waitFor selector.

Required range: 0 <= x <= 60000

solveCaptcha

boolean

default:false

Request passive Phase 1 CAPTCHA auto-resolution before extracting content

timeout

number

default:30000

Scrape operation timeout in milliseconds.

Required range: 1 <= x <= 60000
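The documented ranges for waitTimeout and timeout can be checked on the client before a request is sent. The sketch below assumes the defaults and ranges listed on this page; the helper name `validate_timeouts` is hypothetical, and the server may well apply its own validation regardless.

```python
def validate_timeouts(body: dict) -> dict:
    """Raise ValueError if waitTimeout or timeout fall outside the documented ranges."""
    wait_timeout = body.get("waitTimeout", 5000)  # documented default
    timeout = body.get("timeout", 30000)          # documented default

    # waitTimeout is documented as an integer in 0..60000 (bool is excluded explicitly,
    # because bool is a subclass of int in Python).
    if isinstance(wait_timeout, bool) or not isinstance(wait_timeout, int):
        raise ValueError("waitTimeout must be an integer")
    if not 0 <= wait_timeout <= 60000:
        raise ValueError("waitTimeout must satisfy 0 <= x <= 60000")

    # timeout is documented as a number in 1..60000.
    if isinstance(timeout, bool) or not isinstance(timeout, (int, float)):
        raise ValueError("timeout must be a number")
    if not 1 <= timeout <= 60000:
        raise ValueError("timeout must satisfy 1 <= x <= 60000")
    return body
```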

mode

enum<string>

default:structured

Output shape. 'summary' emits short content (<=1200 chars) plus description; 'structured' emits the visible content payload with optional tree[] when extract='tree'; 'raw' opts out of the maxChars clamp.

Available options:

summary,

structured,

raw

extract

enum<string> | null

Opt-in content extractor. 'tree' emits hierarchical visible content. Use runtime extract for repeated records.

Available options:

tree

maxChars

integer

default:20000

Soft cap on the character count of the response content. When content is clamped to this cap, the response sets truncated=true and reports charsOmitted.

Required range: 1 <= x <= 50000

startFromChar

integer

Zero-based content character offset for continuing a truncated read.

Required range: 0 <= x <= 10000000

readFrom

enum<string>

default:start

Read window direction. Use end for latest/bottom content; startFromChar is ignored in end-window mode.

Available options:

start,

end
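One way to continue a truncated read with startFromChar is sketched below. It assumes, without confirmation from this page, that the next offset equals the previous offset plus the length of the returned content. The `fetch` argument is a hypothetical callable that posts a body to the scrape endpoint and returns the parsed response.

```python
def read_all(fetch, max_chars=20000, max_pages=10):
    """Concatenate successive read windows until the response is no longer truncated.

    fetch: hypothetical callable taking a request body dict and returning
    the parsed JSON response as a dict.
    """
    parts = []
    offset = 0
    for _ in range(max_pages):  # hard stop so a misbehaving server cannot loop forever
        resp = fetch({
            "maxChars": max_chars,
            "startFromChar": offset,
            "readFrom": "start",  # startFromChar is ignored in end-window mode
        })
        chunk = resp.get("content", "")
        parts.append(chunk)
        if not resp.get("truncated") or not chunk:
            break
        # Assumption: offsets count characters of the returned content.
        offset += len(chunk)
    return "".join(parts)
```

The `max_pages` guard is a design choice for the sketch, not an API limit.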

include

enum<string>[]

Opt-in extras added to the lean default scrape response. When supplied, this exact list controls optional categories; include: [] explicitly opts out of all extras. Unlisted categories are omitted.

Available options:

links,

forms,

headings,

captcha
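The difference between omitting include and sending an empty list is easy to lose when building bodies programmatically. The sketch below keeps the two cases apart; the helper name `with_include` is hypothetical, and the allowed values are the four options listed above.

```python
ALLOWED_EXTRAS = ("links", "forms", "headings", "captcha")


def with_include(body: dict, extras=None) -> dict:
    """Return a copy of body with the include list applied.

    extras=None leaves "include" out entirely (the lean default response);
    extras=[] sends include: [] and so explicitly opts out of all extras.
    """
    out = dict(body)
    if extras is None:
        return out
    unknown = [e for e in extras if e not in ALLOWED_EXTRAS]
    if unknown:
        raise ValueError(f"unsupported include values: {unknown}")
    out["include"] = list(extras)
    return out
```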

includeLinks

boolean

default:false

Include deterministic anchor links in read output.

iframeText

boolean

default:false

When true and format is not 'html', append iframe text to content.

policy

object

Browser navigation policy for navigations performed by this scrape

Response

Scraped content with deterministic read diagnostics.

v

enum<number>

required

Available options:

1

Example:

1

url

string

required

Example:

"https://example.com/articles"

title

string

required

Example:

"Example Domain"

format

enum<string>

required

Available options:

html,

markdown,

text

Example:

"markdown"

content

string

required

Example:

"# Example Domain\n\nThis domain is for use in illustrative examples."

provenance

object

required

diagnostics

object

required

completeness

object

required

statusCode

number

required

Example:

200

captchaSolved

boolean

required

Example:

false

captchaDetected

enum<string> | null

required

Available options:

recaptcha-v2,

recaptcha-v3,

hcaptcha,

cloudflare-turnstile,

cloudflare-challenge,

datadome-captcha,

aws-waf-captcha,

geetest,

waf,

unknown

Example:

"cloudflare-turnstile"

captchaState

enum<string> | null

required

Available options:

detected,

solving,

solved,

failed,

timeout,

cancelled,

expired,

interactive_required

Example:

"solved"

elapsedMs

number

required

Example:

842

tree

object[]

description

string

Example:

"Short summary surfaced under mode=summary."

links

object[]

forms

object[]

headings

object[]

captcha

object[]

warnings

string[]

Example:

[
  "scrape-truncated:recommendedMaxChars=30000"
]
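The example warning appears to carry a suggested maxChars value for a retry. The sketch below extracts it. The string format is inferred from this single example and is not otherwise specified on this page. The helper name is hypothetical, and the result is capped at 50000, the documented maximum for maxChars.

```python
import re

# Pattern inferred from the one example warning shown on this page.
_RECOMMENDED = re.compile(r"^scrape-truncated:recommendedMaxChars=(\d+)$")
MAX_CHARS_LIMIT = 50000  # documented upper bound for maxChars


def recommended_max_chars(resp: dict):
    """Return the recommended maxChars from warnings, or None if absent."""
    for warning in resp.get("warnings") or []:
        match = _RECOMMENDED.match(warning)
        if match:
            return min(int(match.group(1)), MAX_CHARS_LIMIT)
    return None
```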

truncated

boolean

Example:

true

charsOmitted

number

Example:

1530

nodesOmitted

number

Example:

42

pruningRatio

number

Example:

0.38
