Browser Automation

Advanced headless browser automation skill for OpenClaw agents. Enables intelligent web navigation, form filling, data extraction, and UI testing with structured commands and semantic element targeting.

Install
$ clawhub install browserautomation-skill

OpenClaw Browser Automation Skill

IMPORTANT: Before You Start

READ THE PROJECT README.md FIRST!

Before executing any browser automation tasks:

1. Read the project's README.md to understand the context and requirements
2. Identify target URLs, credentials (if any), and expected outcomes
3. Plan your automation workflow before running commands
4. Check if authentication/session state exists that can be reused
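
Step 4 can be scripted as a small guard. This is a minimal sketch, assuming the state file lives at a path you choose and that `state load` behaves as shown in the recipes in this document. The `restore_session` helper and the `AGENT_BROWSER` override variable are hypothetical names introduced for illustration, not part of the CLI.

```shell
# Hypothetical override: point this at another binary or a test stub.
AGENT_BROWSER="${AGENT_BROWSER:-agent-browser}"

# Load saved session state if the file exists; otherwise report that a
# fresh login is needed. restore_session is an illustrative name.
restore_session() {
  state_file="$1"
  if [ -f "$state_file" ]; then
    "$AGENT_BROWSER" state load "$state_file"
  else
    echo "no saved state at $state_file; log in and save one first" >&2
    return 1
  fi
}
```

A caller might run `restore_session session.json` and fall back to the login flow when it returns non-zero.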

Installation

npm install -g agent-browser
agent-browser install --with-deps

Verify Installation

agent-browser --version

If installation fails, try:

npx agent-browser install --with-deps

Core Workflow Pattern

Every browser automation task follows this pattern:

1. OPEN    -> Navigate to target URL
2. SNAPSHOT -> Analyze page structure, get element refs
3. INTERACT -> Click, fill, select using refs
4. VERIFY   -> Re-snapshot to confirm changes
5. REPEAT   -> Continue until task complete
6. CLOSE    -> Clean up browser session
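
The six steps above can be wrapped so that CLOSE still runs when an intermediate step fails. The sketch below is one possible shape, not a prescribed one: `with_page` and `AGENT_BROWSER` are hypothetical names, and the caller supplies the INTERACT and VERIFY steps as a command or shell function.

```shell
# Hypothetical override: point this at another binary or a test stub.
AGENT_BROWSER="${AGENT_BROWSER:-agent-browser}"

# Usage: with_page <url> <command> [args...]
# Opens the URL, prints an interactive snapshot, runs the caller's
# command, then closes the session and returns the command's status.
with_page() {
  page_url="$1"; shift
  "$AGENT_BROWSER" open "$page_url" || return 1   # 1. OPEN (give up early on failure)
  "$AGENT_BROWSER" snapshot -i                    # 2. SNAPSHOT
  step_status=0
  "$@" || step_status=$?                          # 3-5. INTERACT / VERIFY / REPEAT
  "$AGENT_BROWSER" close                          # 6. CLOSE, even if a step failed
  return "$step_status"
}
```

Because refs are only known after the snapshot is read, the caller's steps are a natural place for semantic locators or for logic that parses the snapshot output first.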

Quick Reference

Navigation

| Command | Description |
| --- | --- |
| `agent-browser open <url>` | Navigate to URL |
| `agent-browser back` | Go back in history |
| `agent-browser forward` | Go forward in history |
| `agent-browser reload` | Reload current page |
| `agent-browser close` | Close browser session |

Page Analysis

| Command | Description |
| --- | --- |
| `agent-browser snapshot -i` | Interactive elements with refs (most used) |
| `agent-browser snapshot -i -c` | Compact interactive snapshot |
| `agent-browser snapshot -s "#main"` | Scope to specific container |
| `agent-browser snapshot -d 3` | Limit tree depth |

Element Interaction (use @refs from snapshot)

| Command | Description |
| --- | --- |
| `agent-browser click @e1` | Click element |
| `agent-browser fill @e1 "text"` | Clear field and type (preferred for inputs) |
| `agent-browser type @e1 "text"` | Type without clearing |
| `agent-browser press Enter` | Press keyboard key |
| `agent-browser press Control+a` | Key combination |
| `agent-browser select @e1 "value"` | Select dropdown option |
| `agent-browser check @e1` | Check checkbox |
| `agent-browser uncheck @e1` | Uncheck checkbox |
| `agent-browser hover @e1` | Hover over element |
| `agent-browser upload @e1 file.pdf` | Upload file |

Data Extraction

| Command | Description |
| --- | --- |
| `agent-browser get text @e1` | Get element text content |
| `agent-browser get html @e1` | Get inner HTML |
| `agent-browser get value @e1` | Get input field value |
| `agent-browser get attr @e1 href` | Get specific attribute |
| `agent-browser get title` | Get page title |
| `agent-browser get url` | Get current URL |
| `agent-browser get count ".selector"` | Count matching elements |

Waiting (Critical for Reliability)

| Command | Description |
| --- | --- |
| `agent-browser wait @e1` | Wait for element to appear |
| `agent-browser wait 2000` | Wait for a fixed number of milliseconds |
| `agent-browser wait --text "Success"` | Wait for text to appear |
| `agent-browser wait --url "/dashboard"` | Wait for URL change |
| `agent-browser wait --load networkidle` | Wait for network idle |

Screenshots & PDF

| Command | Description |
| --- | --- |
| `agent-browser screenshot out.png` | Save screenshot |
| `agent-browser screenshot --full out.png` | Full page screenshot |
| `agent-browser pdf output.pdf` | Save page as PDF |

Common Task Recipes

Recipe 1: Login Flow

# 1. Open login page
agent-browser open https://example.com/login

# 2. Get interactive elements
agent-browser snapshot -i
# Output: textbox "Email" [ref=e1], textbox "Password" [ref=e2], button "Sign in" [ref=e3]

# 3. Fill credentials
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "secure_password"

# 4. Submit
agent-browser click @e3

# 5. Wait for redirect
agent-browser wait --url "/dashboard"
agent-browser wait --load networkidle

# 6. Save session for reuse
agent-browser state save session.json

# 7. Verify success
agent-browser snapshot -i

Recipe 2: Data Extraction Loop

# Navigate to listing page
agent-browser open https://example.com/products

# Get initial snapshot
agent-browser snapshot -i

# Extract data from each item
agent-browser get text @e1
agent-browser get attr @e2 href

# Check for pagination
agent-browser snapshot -s ".pagination"

# Click next if exists
agent-browser click @e5
agent-browser wait --load networkidle

# Re-snapshot for new content
agent-browser snapshot -i
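
The recipe above shows a single pass. A loop over several pages might look like the sketch below, which uses selectors instead of refs because refs change after each navigation. Several things here are assumptions rather than documented behavior: that `get count` prints a bare integer, that `find nth` is zero-based, and that a `find ... click` exits non-zero when nothing matches. `scrape_listing`, its parameters, and `AGENT_BROWSER` are hypothetical names.

```shell
# Hypothetical override: point this at another binary or a test stub.
AGENT_BROWSER="${AGENT_BROWSER:-agent-browser}"

# Usage: scrape_listing <item-selector> <next-link-text> [max-pages]
# Prints the text of every matching item, page by page.
scrape_listing() {
  item_selector="$1"      # e.g. ".product-title" (illustrative)
  next_label="$2"         # e.g. "Next" (illustrative)
  max_pages="${3:-10}"    # safety cap so the loop always terminates
  page=1
  while [ "$page" -le "$max_pages" ]; do
    # Assumption: `get count` prints only the number of matches.
    count=$("$AGENT_BROWSER" get count "$item_selector") || return 1
    i=0
    while [ "$i" -lt "$count" ]; do
      # Assumption: nth indexing starts at 0.
      "$AGENT_BROWSER" find nth "$i" "$item_selector" text
      i=$((i + 1))
    done
    # Assumption: the click fails (non-zero exit) on the last page.
    "$AGENT_BROWSER" find text "$next_label" click >/dev/null 2>&1 || break
    "$AGENT_BROWSER" wait --load networkidle
    page=$((page + 1))
  done
}
```

If any of these assumptions does not hold for your version of the CLI, adjust the index base or the stop condition before relying on the loop.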

Recipe 3: Form Submission with Validation

# Open form
agent-browser open https://example.com/contact

# Analyze form structure
agent-browser snapshot -i

# Fill all required fields
agent-browser fill @e1 "John Doe"
agent-browser fill @e2 "user@example.com"
agent-browser fill @e3 "Hello, this is my message."

# Select dropdown if present
agent-browser select @e4 "support"

# Check required checkbox
agent-browser check @e5

# Submit form
agent-browser click @e6

# Wait and verify submission
agent-browser wait --text "Thank you"
agent-browser snapshot -i

Recipe 4: Session Persistence

# First time: Login and save state
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "username"
agent-browser fill @e2 "password"
agent-browser click @e3
agent-browser wait --url "/home"
agent-browser state save auth-session.json

# Later: Restore session and continue
agent-browser state load auth-session.json
agent-browser open https://app.example.com/dashboard
# Already logged in!

Recipe 5: Multi-Tab Workflow

# Open first site
agent-browser open https://site-a.com

# Open second tab
agent-browser tab new https://site-b.com

# List tabs
agent-browser tab
# Output: Tab 1: site-a.com, Tab 2: site-b.com (active)

# Switch between tabs
agent-browser tab 1
agent-browser snapshot -i
# Work on tab 1...

agent-browser tab 2
agent-browser snapshot -i
# Work on tab 2...

Recipe 6: Debugging Failed Automation

# Enable headed mode to see what's happening
agent-browser open https://example.com --headed

# Check for JavaScript errors
agent-browser errors

# View console output
agent-browser console

# Highlight element to verify selection
agent-browser highlight @e1

# Take screenshot for debugging
agent-browser screenshot debug.png

# Start trace for detailed analysis
agent-browser trace start
# ... perform actions ...
agent-browser trace stop trace.zip
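
When a run fails unattended, it can help to capture the same diagnostics in one place. The helper below is a sketch under the assumption that `errors`, `console`, and `get url` write their results to standard output; `collect_diagnostics` and `AGENT_BROWSER` are hypothetical names.

```shell
# Hypothetical override: point this at another binary or a test stub.
AGENT_BROWSER="${AGENT_BROWSER:-agent-browser}"

# Usage: collect_diagnostics <output-dir>
# Saves a screenshot plus the error log, console log, and current URL.
collect_diagnostics() {
  out_dir="$1"
  mkdir -p "$out_dir" || return 1
  "$AGENT_BROWSER" screenshot "$out_dir/page.png"
  "$AGENT_BROWSER" errors  > "$out_dir/errors.txt"  2>&1
  "$AGENT_BROWSER" console > "$out_dir/console.txt" 2>&1
  "$AGENT_BROWSER" get url > "$out_dir/url.txt"     2>&1
}
```

Calling it from a failure branch, for example `some_step || collect_diagnostics ./debug`, keeps the evidence from the moment of failure.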

Semantic Locators (Alternative to Refs)

When refs are unstable or you need more readable selectors:

# Find by role
agent-browser find role button click --name "Submit"

# Find by text content
agent-browser find text "Sign In" click

# Find by label
agent-browser find label "Email" fill "user@example.com"

# Find first matching selector
agent-browser find first ".item" click

# Find nth element
agent-browser find nth 2 "a" text

Network Control

# Mock API response
agent-browser network route "*/api/user" --body '{"name":"Test User"}'

# Block analytics/ads
agent-browser network route "*google-analytics*" --abort
agent-browser network route "*facebook.com*" --abort

# View captured requests
agent-browser network requests --filter api

# Remove routes
agent-browser network unroute

Browser Configuration

# Set viewport for responsive testing
agent-browser set viewport 1920 1080
agent-browser set viewport 375 667  # Mobile

# Emulate device
agent-browser set device "iPhone 14"
agent-browser set device "Pixel 5"

# Set geolocation
agent-browser set geo 40.7128 -74.0060  # New York

# Dark mode testing
agent-browser set media dark

Best Practices

1. Always Snapshot After Navigation

agent-browser open https://example.com
agent-browser wait --load networkidle
agent-browser snapshot -i  # ALWAYS do this after navigation

2. Use fill Instead of type for Inputs

# GOOD: Clears existing text first
agent-browser fill @e1 "new text"

# BAD: Appends to existing text
agent-browser type @e1 "new text"

3. Add Explicit Waits for Reliability

# After a click that triggers navigation
agent-browser click @e1
agent-browser wait --load networkidle

# After AJAX updates
agent-browser click @e1
agent-browser wait --text "Updated"

4. Handle Iframes Explicitly

# Switch to iframe before interacting
agent-browser frame "#iframe-id"
agent-browser snapshot -i
agent-browser click @e1

# Return to main frame
agent-browser frame main

5. Save Session State Early

# Save immediately after successful login
agent-browser state save session.json
# The saved state can be reloaded if something breaks later

Error Recovery

Element Not Found

# Re-snapshot to get updated refs
agent-browser snapshot -i

# Try semantic locator
agent-browser find text "Button Text" click

# Check if element is in iframe
agent-browser frame "#iframe"
agent-browser snapshot -i
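
The first two recovery steps can be chained into one helper. This sketch assumes that a failed `click` exits with a non-zero status, which is not confirmed here; `click_with_fallback` and `AGENT_BROWSER` are hypothetical names.

```shell
# Hypothetical override: point this at another binary or a test stub.
AGENT_BROWSER="${AGENT_BROWSER:-agent-browser}"

# Usage: click_with_fallback <ref> <visible-text>
# Tries the ref first; if that fails, refreshes the snapshot and
# targets the element by its visible text instead.
click_with_fallback() {
  element_ref="$1"     # e.g. @e3 from an earlier snapshot
  visible_text="$2"    # text of the same element, e.g. "Sign in"
  "$AGENT_BROWSER" click "$element_ref" && return 0
  # The ref may be stale: refresh the snapshot, then target by text.
  "$AGENT_BROWSER" snapshot -i >/dev/null
  "$AGENT_BROWSER" find text "$visible_text" click
}
```

The iframe check is left out on purpose, since switching frames changes context for every later command and is better done explicitly.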

Page Not Loading

# Increase timeout
agent-browser open https://slow-site.com --timeout 60000

# Wait explicitly
agent-browser wait --load networkidle
agent-browser wait 5000
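
For pages that load inconsistently, a generic retry wrapper is an alternative to a single long wait. The helper below is plain shell and makes no assumptions about the CLI beyond a non-zero exit status on failure; `retry` is a hypothetical name.

```shell
# Usage: retry <attempts> <delay-seconds> <command> [args...]
# Runs the command until it succeeds or the attempts are used up.
retry() {
  max_attempts="$1"; delay_seconds="$2"; shift 2
  attempt=1
  while :; do
    "$@" && return 0
    [ "$attempt" -ge "$max_attempts" ] && return 1
    sleep "$delay_seconds"
    attempt=$((attempt + 1))
  done
}
```

For example, `retry 3 2 agent-browser wait --text "Success"` would try the wait up to three times with a two-second pause between attempts.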

Session Lost

# Reload saved state
agent-browser state load session.json
agent-browser reload

Debug Mode

# Visual debugging
agent-browser open https://example.com --headed
agent-browser screenshot debug.png
agent-browser errors
agent-browser console

Parallel Sessions

To work with multiple isolated browser sessions:

# Session 1
agent-browser --session user1 open https://app.com
agent-browser --session user1 snapshot -i

# Session 2 (completely isolated)
agent-browser --session user2 open https://app.com
agent-browser --session user2 snapshot -i

# List all sessions
agent-browser session list

# Each session has separate cookies, storage, and state
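
Since sessions are isolated, they can in principle be driven concurrently from one script. The sketch below assumes that simultaneous CLI invocations against different `--session` names do not interfere with each other, which should be verified for your setup. `snapshot_sessions` and `AGENT_BROWSER` are hypothetical names.

```shell
# Hypothetical override: point this at another binary or a test stub.
AGENT_BROWSER="${AGENT_BROWSER:-agent-browser}"

# Usage: snapshot_sessions <output-dir> <url> <session-name>...
# Opens the URL in each named session in the background and writes
# each session's interactive snapshot to <output-dir>/<name>.txt.
snapshot_sessions() {
  out_dir="$1"; target_url="$2"; shift 2
  mkdir -p "$out_dir" || return 1
  for session_name in "$@"; do
    (
      "$AGENT_BROWSER" --session "$session_name" open "$target_url" >/dev/null &&
      "$AGENT_BROWSER" --session "$session_name" snapshot -i \
        > "$out_dir/$session_name.txt"
    ) &
  done
  wait   # block until every background session has finished
}
```

A call such as `snapshot_sessions ./snapshots https://app.com user1 user2` would leave one snapshot file per session.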

JSON Output for Parsing

Add the --json flag to get machine-readable output:

agent-browser snapshot -i --json | jq '.elements[]'
agent-browser get text @e1 --json
agent-browser get url --json

Video Recording

# Start recording from current page
agent-browser record start ./demo.webm

# Perform your automation
agent-browser click @e1
agent-browser fill @e2 "text"

# Stop and save
agent-browser record stop

Troubleshooting Checklist

  1. Command not found: Run npm install -g agent-browser, then agent-browser install --with-deps (or use npx agent-browser install --with-deps)
  2. Element not found: Run agent-browser snapshot -i to refresh refs
  3. Page timeout: Add --timeout 60000 for slow pages
  4. Can't see what's happening: Add --headed flag
  5. Login not persisting: Use agent-browser state save/load
  6. Refs changed: Always re-snapshot after navigation
  7. Iframe content: Use agent-browser frame to switch context