Desktop Control Skill
A desktop automation skill for OpenClaw. It provides mouse control, keyboard input, screen capture, window management, and clipboard operations.
🎯 Features
Mouse Control
✅ Absolute positioning - Move to exact coordinates
✅ Relative movement - Move from current position
✅ Smooth movement - Natural, human-like mouse paths
✅ Click types - Left, right, middle, double, triple clicks
✅ Drag & drop - Drag from point A to point B
✅ Scroll - Vertical and horizontal scrolling
✅ Position tracking - Get current mouse coordinates
Keyboard Control
✅ Text typing - Fast, accurate text input
✅ Hotkeys - Execute keyboard shortcuts (Ctrl+C, Win+R, etc.)
✅ Special keys - Enter, Tab, Escape, Arrow keys, F-keys
✅ Key combinations - Multi-key press combinations
✅ Hold & release - Manual key state control
✅ Typing speed - Configurable WPM (instant to human-like)
Screen Operations
✅ Screenshot - Capture entire screen or regions
✅ Image recognition - Find elements on screen (via OpenCV)
✅ Color detection - Get pixel colors at coordinates
✅ Multi-monitor - Support for multiple displays
Window Management
✅ Window list - Get all open windows
✅ Activate window - Bring window to front
✅ Window info - Get position, size, title
✅ Minimize/Maximize - Control window states
Safety Features
✅ Failsafe - Move mouse to corner to abort
✅ Pause control - Emergency stop mechanism
✅ Approval mode - Require confirmation for actions
✅ Bounds checking - Prevent out-of-screen operations
✅ Logging - Track all automation actions
🚀 Quick Start
Installation
First, install required dependencies:
pip install pyautogui pillow opencv-python pygetwindow
Basic Usage
from skills.desktop_control import DesktopController
# Initialize controller
dc = DesktopController(failsafe=True)
# Mouse operations
dc.move_mouse(500, 300) # Move to coordinates
dc.click() # Left click at current position
dc.click(100, 200, button="right") # Right click at position
# Keyboard operations
dc.type_text("Hello from OpenClaw!")
dc.hotkey("ctrl", "c") # Copy
dc.press("enter")
# Screen operations
screenshot = dc.screenshot()
position = dc.get_mouse_position()
📋 Complete API Reference
Mouse Functions
move_mouse(x, y, duration=0, smooth=True)
Move mouse to absolute screen coordinates.
Parameters:
x (int): X coordinate (pixels from left)
y (int): Y coordinate (pixels from top)
duration (float): Movement time in seconds (0 = instant, 0.5 = smooth)
smooth (bool): Use bezier curve for natural movement
Example:
# Instant movement
dc.move_mouse(1000, 500)
# Smooth 1-second movement
dc.move_mouse(1000, 500, duration=1.0)
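The smooth option is described as following a bezier curve, but how the skill builds that curve is not shown here. As a rough illustration only, the sketch below generates points along a quadratic Bezier curve between two positions. The function name bezier_path, the steps parameter, and the choice of control point are all assumptions made for this example, not part of the skill's API.

```python
def bezier_path(start, end, control, steps=20):
    """Return steps + 1 integer points along a quadratic Bezier curve."""
    (x0, y0), (x1, y1), (cx, cy) = start, end, control
    points = []
    for i in range(steps + 1):
        t = i / steps
        # Quadratic Bezier: (1-t)^2 * P0 + 2(1-t)t * C + t^2 * P1
        x = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * cx + t ** 2 * x1
        y = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * cy + t ** 2 * y1
        points.append((round(x), round(y)))
    return points


# Hypothetical control point that bends the path below the straight line.
path = bezier_path(start=(0, 0), end=(1000, 500), control=(300, 600))
```

The path always begins at the start point and ends at the end point; the control point only shapes the arc in between.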
move_relative(x_offset, y_offset, duration=0)
Move mouse relative to current position.
Parameters:
x_offset (int): Pixels to move horizontally (positive = right)
y_offset (int): Pixels to move vertically (positive = down)
duration (float): Movement time in seconds
Example:
# Move 100px right, 50px down
dc.move_relative(100, 50, duration=0.3)
click(x=None, y=None, button='left', clicks=1, interval=0.1)
Perform mouse click.
Parameters:
x, y (int, optional): Coordinates to click (None = current position)
button (str): 'left', 'right', 'middle'
clicks (int): Number of clicks (1 = single, 2 = double)
interval (float): Delay between multiple clicks
Example:
# Simple left click
dc.click()
# Double-click at specific position
dc.click(500, 300, clicks=2)
# Right-click
dc.click(button='right')
drag(start_x, start_y, end_x, end_y, duration=0.5, button='left')
Drag and drop operation.
Parameters:
start_x, start_y (int): Starting coordinates
end_x, end_y (int): Ending coordinates
duration (float): Drag duration
button (str): Mouse button to use
Example:
# Drag file from desktop to folder
dc.drag(100, 100, 500, 500, duration=1.0)
scroll(clicks, direction='vertical', x=None, y=None)
Scroll mouse wheel.
Parameters:
clicks (int): Scroll amount (positive = up/left, negative = down/right)
direction (str): 'vertical' or 'horizontal'
x, y (int, optional): Position to scroll at
Example:
# Scroll down 5 clicks
dc.scroll(-5)
# Scroll up 10 clicks
dc.scroll(10)
# Horizontal scroll
dc.scroll(5, direction='horizontal')
get_mouse_position()
Get current mouse coordinates.
Returns: (x, y) tuple
Example:
x, y = dc.get_mouse_position()
print(f"Mouse is at: {x}, {y}")
Keyboard Functions
type_text(text, interval=0, wpm=None)
Type text with configurable speed.
Parameters:
text (str): Text to type
interval (float): Delay between keystrokes (0 = instant)
wpm (int, optional): Words per minute (overrides interval)
Example:
# Instant typing
dc.type_text("Hello World")
# Human-like typing at 60 WPM
dc.type_text("Hello World", wpm=60)
# Slow typing with 0.1s between keys
dc.type_text("Hello World", interval=0.1)
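Since wpm overrides interval, the two are presumably related by a simple conversion. The sketch below shows one plausible mapping, using the common typing-speed convention of five characters per word. Whether the skill uses this exact convention is an assumption, and wpm_to_interval is a hypothetical helper, not part of the documented API.

```python
def wpm_to_interval(wpm, chars_per_word=5):
    """Convert words per minute to a per-keystroke delay in seconds.

    Assumes the common convention that one "word" is five characters.
    """
    if wpm <= 0:
        raise ValueError("wpm must be positive")
    return 60.0 / (wpm * chars_per_word)


# 60 WPM is 300 characters per minute, so one keystroke every 0.2 seconds.
interval = wpm_to_interval(60)
```

Under this convention, wpm=60 and interval=0.2 would describe the same typing speed.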
press(key, presses=1, interval=0.1)
Press and release a key.
Parameters:
key (str): Key name (see Key Names section)
presses (int): Number of times to press
interval (float): Delay between presses
Example:
# Press Enter
dc.press('enter')
# Press Space 3 times
dc.press('space', presses=3)
# Press Down arrow
dc.press('down')
hotkey(*keys, interval=0.05)
Execute keyboard shortcut.
Parameters:
*keys (str): Keys to press together
interval (float): Delay between key presses
Example:
# Copy (Ctrl+C)
dc.hotkey('ctrl', 'c')
# Paste (Ctrl+V)
dc.hotkey('ctrl', 'v')
# Open Run dialog (Win+R)
dc.hotkey('win', 'r')
# Save (Ctrl+S)
dc.hotkey('ctrl', 's')
# Select All (Ctrl+A)
dc.hotkey('ctrl', 'a')
key_down(key) / key_up(key)
Manually control key state.
Example:
# Hold Shift
dc.key_down('shift')
dc.type_text("hello") # Types "HELLO"
dc.key_up('shift')
# Hold Ctrl and click (for multi-select)
dc.key_down('ctrl')
dc.click(100, 100)
dc.click(200, 100)
dc.key_up('ctrl')
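With manual key state, an exception between key_down and key_up would leave the key held. One way to guard against that is a small context manager, sketched below. hold_key and RecordingController are hypothetical names introduced for this example; the stand-in controller only records calls so the sketch can run without sending real input.

```python
from contextlib import contextmanager


@contextmanager
def hold_key(controller, key):
    """Hold `key` for the duration of the block, releasing it even on error."""
    controller.key_down(key)
    try:
        yield
    finally:
        controller.key_up(key)


class RecordingController:
    """Stand-in controller that records calls instead of sending input."""

    def __init__(self):
        self.calls = []

    def key_down(self, key):
        self.calls.append(("key_down", key))

    def key_up(self, key):
        self.calls.append(("key_up", key))

    def click(self, x, y):
        self.calls.append(("click", x, y))


rc = RecordingController()
with hold_key(rc, "ctrl"):
    rc.click(100, 100)
    rc.click(200, 100)
```

The same helper should work with any object exposing key_down and key_up, for example with hold_key(dc, 'ctrl'): followed by the clicks.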
Screen Functions
screenshot(region=None, filename=None)
Capture screen or region.
Parameters:
region (tuple, optional): (left, top, width, height) for partial capture
filename (str, optional): Path to save image
Returns: PIL Image object
Example:
# Full screen
img = dc.screenshot()
# Save to file
dc.screenshot(filename="screenshot.png")
# Capture specific region
img = dc.screenshot(region=(100, 100, 500, 300))
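The region tuple uses (left, top, width, height), whereas Pillow's Image.crop expects a (left, upper, right, lower) box. When cropping a returned image further, the conversion is easy to get wrong, so a small sketch may help. region_to_box is a hypothetical helper name, not part of the skill.

```python
def region_to_box(region):
    """Convert (left, top, width, height) to (left, upper, right, lower)."""
    left, top, width, height = region
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return (left, top, left + width, top + height)


# The region from the example above covers x 100..600 and y 100..400.
box = region_to_box((100, 100, 500, 300))
```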
get_pixel_color(x, y)
Get color of pixel at coordinates.
Returns: RGB tuple (r, g, b)
Example:
r, g, b = dc.get_pixel_color(500, 300)
print(f"Color at (500, 300): RGB({r}, {g}, {b})")
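Exact RGB comparisons tend to be brittle because anti-aliasing and color profiles shift values slightly. A tolerance check like the sketch below is one common workaround. color_matches and its default tolerance are assumptions for illustration, not something the skill is documented to provide.

```python
def color_matches(actual, expected, tolerance=10):
    """Return True if every RGB channel is within `tolerance` of expected."""
    return all(abs(a - e) <= tolerance for a, e in zip(actual, expected))


# A slightly off red still counts as red; a much darker red does not.
near_red = color_matches((250, 5, 0), (255, 0, 0))
dark_red = color_matches((200, 0, 0), (255, 0, 0))
```

In use, the first argument would be the tuple returned by dc.get_pixel_color(x, y).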
find_on_screen(image_path, confidence=0.8)
Find image on screen (requires OpenCV).
Parameters:
image_path (str): Path to template image
confidence (float): Match threshold (0-1)
Returns: (x, y, width, height) or None
Example:
# Find button on screen
location = dc.find_on_screen("button.png")
if location:
    x, y, w, h = location
    # Click center of found image
    dc.click(x + w//2, y + h//2)
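Because find_on_screen returns None when nothing matches, a target that has not rendered yet looks the same as one that is absent. A polling wrapper, sketched below, is one way to wait for it. wait_for and center are hypothetical helpers; the finder is passed in as a callable so the sketch runs without a display, and the demo uses a fake finder that succeeds on the third attempt.

```python
import time


def wait_for(find, timeout=5.0, poll=0.25):
    """Call `find` until it returns a non-None value or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        location = find()
        if location is not None:
            return location
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)


def center(location):
    """Return the center point of an (x, y, width, height) box."""
    x, y, w, h = location
    return (x + w // 2, y + h // 2)


# Fake finder standing in for a real screen search.
results = iter([None, None, (40, 60, 20, 10)])
location = wait_for(lambda: next(results), timeout=1.0, poll=0.01)
target = center(location)
```

With the real controller, the call would look like wait_for(lambda: dc.find_on_screen("button.png")), followed by dc.click(*center(location)) when a location is returned.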
get_screen_size()
Get screen resolution.
Returns: (width, height) tuple
Example:
width, height = dc.get_screen_size()
print(f"Screen: {width}x{height}")
Window Functions
get_all_windows()
List all open windows.
Returns: List of window titles
Example:
windows = dc.get_all_windows()
for title in windows:
    print(f"Window: {title}")
activate_window(title_substring)
Bring window to front by title.
Parameters:
title_substring(str): Part of window title to match
Example:
# Activate Chrome
dc.activate_window("Chrome")
# Activate VS Code
dc.activate_window("Visual Studio Code")
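The description does not say whether title matching is case-sensitive or what happens when several windows match. To check in advance which windows a substring would hit, the titles from get_all_windows() can be filtered as in the sketch below. match_titles is a hypothetical helper, and the case-insensitive default is an assumption rather than documented behavior.

```python
def match_titles(titles, substring, case_sensitive=False):
    """Return the titles that contain `substring`, in their original order."""
    if case_sensitive:
        return [t for t in titles if substring in t]
    needle = substring.lower()
    return [t for t in titles if needle in t.lower()]


# Sample titles standing in for the result of get_all_windows().
titles = ["README.md - Visual Studio Code", "Inbox - Google Chrome", "Calculator"]
matches = match_titles(titles, "chrome")
```

If the list has more than one entry, a longer substring is probably safer than relying on whichever window is matched first.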
get_active_window()
Get currently focused window.
Returns: Window title (str)
Example:
active = dc.get_active_window()
print(f"Active window: {active}")
Clipboard Functions
copy_to_clipboard(text)
Copy text to clipboard.
Example:
dc.copy_to_clipboard("Hello from OpenClaw!")
get_from_clipboard()
Get text from clipboard.
Returns: str
Example:
text = dc.get_from_clipboard()
print(f"Clipboard: {text}")
⌨️ Key Names Reference
Alphabet Keys
'a' through 'z'
Number Keys
'0' through '9'
Function Keys
'f1' through 'f24'
Special Keys
'enter' / 'return'
'esc' / 'escape'
'space' / 'spacebar'
'tab'
'backspace'
'delete' / 'del'
'insert'
'home'
'end'
'pageup' / 'pgup'
'pagedown' / 'pgdn'
Arrow Keys
'up'/'down'/'left'/'right'
Modifier Keys
'ctrl' / 'control'
'shift'
'alt'
'win' / 'winleft' / 'winright'
'cmd' / 'command' (Mac)
Lock Keys
'capslock'
'numlock'
'scrolllock'
Punctuation
'.' / ',' / '?' / '!' / ';' / ':'
'[' / ']' / '{' / '}'
'(' / ')'
'+' / '-' / '*' / '/' / '='
🛡️ Safety Features
Failsafe Mode
Move mouse to any corner of the screen to abort all automation.
# Enable failsafe (enabled by default)
dc = DesktopController(failsafe=True)
Pause Control
# Pause all automation for 2 seconds
dc.pause(2.0)
# Check if automation is safe to proceed
if dc.is_safe():
    dc.click(500, 500)
Approval Mode
Require user confirmation before actions:
dc = DesktopController(require_approval=True)
# This will ask for confirmation
dc.click(500, 500) # Prompt: "Allow click at (500, 500)? [y/n]"
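How approval mode is implemented inside the skill is not described. As a rough mental model only, the sketch below gates an action behind a yes/no prompt. approve and guarded are hypothetical names, and the prompt function is injectable so the example runs without waiting for keyboard input.

```python
def approve(description, ask=input):
    """Ask for confirmation and return True only for an explicit yes."""
    answer = ask(f"Allow {description}? [y/n] ")
    return answer.strip().lower() in ("y", "yes")


def guarded(action, description, ask=input):
    """Run `action` only if the user approves; otherwise return None."""
    if approve(description, ask):
        return action()
    return None


# Demo with canned answers instead of real keyboard input.
clicked = []
guarded(lambda: clicked.append((500, 500)), "click at (500, 500)", ask=lambda _: "y")
guarded(lambda: clicked.append((10, 10)), "click at (10, 10)", ask=lambda _: "n")
```

Anything other than an explicit "y" or "yes" is treated as a refusal, which is the safer default for a gate like this.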
🎨 Advanced Examples
Example 1: Automated Form Filling
dc = DesktopController()
# Click name field
dc.click(300, 200)
dc.type_text("John Doe", wpm=80)
# Tab to next field
dc.press('tab')
dc.type_text("[email protected]", wpm=80)
# Tab to password
dc.press('tab')
dc.type_text("SecurePassword123", wpm=60)
# Submit form
dc.press('enter')
Example 2: Screenshot Region and Save
# Capture specific area
region = (100, 100, 800, 600) # left, top, width, height
img = dc.screenshot(region=region)
# Save with timestamp
import datetime
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
img.save(f"capture_{timestamp}.png")
Example 3: Multi-File Selection
# Hold Ctrl and click multiple files
dc.key_down('ctrl')
dc.click(100, 200) # First file
dc.click(100, 250) # Second file
dc.click(100, 300) # Third file
dc.key_up('ctrl')
# Copy selected files
dc.hotkey('ctrl', 'c')
Example 4: Window Automation
import time
# Activate Calculator
dc.activate_window("Calculator")
time.sleep(0.5)
# Type calculation
dc.type_text("5+3=", interval=0.2)
time.sleep(0.5)
# Take screenshot of result
dc.screenshot(filename="calculation_result.png")
Example 5: Drag & Drop File
# Drag file from source to destination
dc.drag(
    start_x=200, start_y=300,  # File location
    end_x=800, end_y=500,      # Folder location
    duration=1.0               # Smooth 1-second drag
)
⚡ Performance Tips
Use instant movements for speed: duration=0
Batch operations instead of individual calls
Cache screen positions instead of recalculating
Disable failsafe for maximum performance (use with caution)
Use hotkeys instead of menu navigation
⚠️ Important Notes
Screen coordinates start at (0, 0) in top-left corner
Multi-monitor setups may have negative coordinates for secondary displays
Windows DPI scaling may affect coordinate accuracy
Failsafe corners are: (0,0), (width-1, 0), (0, height-1), (width-1, height-1)
Some applications may block simulated input (games, secure apps)
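The corner list and the coordinate origin above lend themselves to a small pre-flight check. The sketch below computes the failsafe corners for a given resolution and clamps a target point so it stays off the screen edge. Both helpers are hypothetical, and the sketch assumes a single display whose origin is (0, 0), so it does not cover secondary monitors with negative coordinates.

```python
def failsafe_corners(width, height):
    """Return the four corner points listed in the notes above."""
    return {(0, 0), (width - 1, 0), (0, height - 1), (width - 1, height - 1)}


def clamp_to_screen(x, y, width, height, margin=1):
    """Clamp a point so it stays at least `margin` pixels inside the screen."""
    cx = min(max(x, margin), width - 1 - margin)
    cy = min(max(y, margin), height - 1 - margin)
    return (cx, cy)


corners = failsafe_corners(1920, 1080)
safe = clamp_to_screen(0, 0, 1920, 1080)
```

Here the requested (0, 0), which is a failsafe corner, is moved to (1, 1). The real dimensions would come from dc.get_screen_size().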
🔧 Troubleshooting
Mouse not moving to correct position
Check DPI scaling settings
Verify screen resolution matches expectations
Use get_screen_size() to confirm dimensions
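When DPI scaling is the cause, the offset is usually a constant factor between logical and physical pixels. The arithmetic is sketched below. Whether the skill reports logical or physical coordinates is not stated here, so which direction to convert is an assumption to verify on the target machine; logical_to_physical is a hypothetical helper.

```python
def logical_to_physical(x, y, scale):
    """Scale logical coordinates by a DPI factor such as 1.5 for 150%."""
    return (round(x * scale), round(y * scale))


# At 150% scaling, logical (500, 300) corresponds to physical (750, 450).
physical = logical_to_physical(500, 300, 1.5)
```

If the size reported by get_screen_size() differs from the display's native resolution by such a factor, scaling is the likely explanation.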
Keyboard input not working
Ensure target application has focus
Some apps require admin privileges
Try increasing interval for reliability
Failsafe triggering accidentally
Increase screen border tolerance
Move mouse away from corners during normal use
Disable if needed: DesktopController(failsafe=False)
Permission errors
Run Python with administrator privileges for some operations
Some secure applications block automation
📦 Dependencies
PyAutoGUI - Core automation engine
Pillow - Image processing
OpenCV (optional) - Image recognition
PyGetWindow - Window management
Install all:
pip install pyautogui pillow opencv-python pygetwindow
Built for OpenClaw - The ultimate desktop automation companion 🦞