Skill Auditor v3.1.0

Security scanner for OpenClaw/ClawHub skills. It runs 18 security checks, including prompt-injection detection that flags agent-manipulation attempts embedded in skill documentation. It also provides 5-dimension trust scoring, trend tracking, diff analysis, and benchmarking. The checks are tuned to avoid false positives on legitimate skills, and the test suite includes clean fixtures (such as a skill that documents credentials) as false-positive checks.

When to Activate

Installing a new skill from ClawHub - run inspect.sh for full pre-install validation
Auditing existing skills - use audit.sh to scan any skill directory
Generating trust scores - use trust_score.py for 0-100 rating across 5 dimensions
Comparing skills - use trust_score.py --compare for side-by-side analysis
Tracking improvements - use trust_score.py --save-trend to monitor score over time
Reviewing updates - use diff-audit.sh to compare before/after versions
Batch scanning - use audit-all.sh or benchmark.sh for fleet-wide analysis

Quick Start


# Audit a single skill
bash audit.sh /path/to/skill

# Trust score (0-100 across 5 dimensions)
python3 trust_score.py /path/to/skill

# Compare two skills side by side
python3 trust_score.py /path/to/skill1 --compare /path/to/skill2

# Track score over time
python3 trust_score.py /path/to/skill --save-trend
python3 trust_score.py /path/to/skill --trend

# Diff audit (before/after update)
bash diff-audit.sh /path/to/old-version /path/to/new-version

# Benchmark against a corpus
bash benchmark.sh /path/to/skills-dir

# Inspect a ClawHub skill before installing
bash inspect.sh skill-slug

# Audit all installed skills
bash audit-all.sh

# Generate a markdown report
bash report.sh

# Run test suite (28 assertions)
bash test.sh

Guardrails / Anti-Patterns

DO:

✓ Always audit skills before installing from untrusted sources
✓ Review trust scores - reject skills scoring below 60 (D or F grade)
✓ Use diff-audit.sh when updating skills to catch regressions
✓ Use --json output for CI/CD pipeline integration
✓ Run trust_score.py --save-trend periodically to track skill health

DON'T:

✗ Install skills scoring below 40 (F grade) without extensive manual review
✗ Ignore CRITICAL findings - they indicate potential security threats
✗ Blindly add skills to allowlist without understanding why they access credentials
✗ Skip audit because a skill is "popular" or "official"
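The DO/DON'T thresholds can be encoded as a pre-install gate. The sketch below is a hypothetical helper (install_decision is not part of the shipped tools) and assumes you already have the 0-100 trust score and the count of CRITICAL findings.

```python
from typing import Tuple

# Hypothetical install gate built from the guardrails above.
# The function name and inputs are illustrative, not the auditor's API.
def install_decision(score: int, criticals: int) -> Tuple[bool, str]:
    """Return (allowed, reason) for a candidate skill."""
    if criticals > 0:
        # CRITICAL findings indicate potential threats and must not be ignored.
        return False, f"{criticals} CRITICAL finding(s): investigate before installing"
    if score < 40:
        # F grade: only after extensive manual review.
        return False, "F grade: install only after extensive manual review"
    if score < 60:
        # D grade: below the acceptance threshold.
        return False, "D grade: below the 60-point acceptance threshold"
    return True, "meets the 60-point acceptance threshold"
```

CRITICAL findings are checked first so that a high score never masks a security threat.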

Security Checks (18 total)

#	Check	Severity	Description
1	credential-harvest	CRITICAL	Scripts reading API keys/tokens AND making network calls
2	exfiltration-url	CRITICAL	webhook.site, requestbin, ngrok URLs in scripts
3	obfuscated-payload	CRITICAL	Base64-encoded URLs or shell commands
4	sensitive-fs	CRITICAL	/etc/passwd, ~/.ssh, ~/.aws/credentials access
5	crypto-wallet	CRITICAL	Hardcoded ETH/BTC wallet addresses (drain attacks)
6	dependency-confusion	CRITICAL	Internal/private-scoped packages in public deps
7	typosquatting	CRITICAL	Misspelled package names (lodahs, requets, etc.)
8	symlink-attack	CRITICAL	Symlinks targeting sensitive system paths
9	code-execution	WARNING	eval(), exec(), subprocess patterns
10	time-bomb	WARNING	Date/time comparisons that could trigger delayed payloads
11	telemetry-detected	WARNING	Analytics SDKs, tracking pixels, phone-home behavior
12	excessive-permissions	WARNING	>15 bins/env/config items requested
13	unusual-ports	WARNING	Network calls to non-standard ports
14	prompt-injection	CRITICAL	Agent manipulation in docs: "ignore instructions", role hijacking, hidden HTML directives
15	download-execute	CRITICAL	Remote downloads piped into a shell (e.g., curl | bash)
16	hidden-file	WARNING	Suspicious dotfiles that may hide malicious content
17	env-exfiltration	CRITICAL	Reading sensitive env vars + outbound network calls
18	privilege-escalation	CRITICAL	sudo, chmod 777/setuid, writes to system paths
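As an illustration of the typosquatting check (7), one common approach compares each dependency name against known popular packages using an edit distance that counts adjacent swaps. This is a sketch of that general technique, not audit.sh's implementation; the POPULAR set and the distance-1 threshold are assumptions.

```python
from typing import Optional

# Illustrative reference set; the auditor's real package list is not documented here.
POPULAR = {"lodash", "requests", "express", "numpy"}

def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: Levenshtein plus adjacent transpositions."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # swap of adjacent letters
    return d[len(a)][len(b)]

def typosquat_target(name: str, popular=POPULAR) -> Optional[str]:
    """Return the popular package that `name` appears to imitate, or None."""
    if name in popular:
        return None  # exact match is the real package
    for target in sorted(popular):
        if osa_distance(name, target) == 1:
            return target
    return None
```

Counting transpositions as a single edit matters for names like lodahs, which plain Levenshtein scores as two edits away from lodash. Very short names produce more near-misses, so a real check would likely need an allowlist.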

Checks are context-aware: credential mentions in documentation are reported as INFO, not CRITICAL.
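A minimal sketch of how context-aware severity might combine with the credential-harvest rule (check 1): documentation files that mention credentials yield INFO, while a script that both references credentials and makes network calls yields CRITICAL. The regexes and the documentation suffix set are assumptions for illustration, not the patterns audit.sh uses.

```python
import re
from pathlib import PurePath
from typing import Optional

# Illustrative patterns only; the real scanner's rules may differ.
CREDENTIAL_RE = re.compile(r"API_KEY|SECRET|TOKEN|\.aws/credentials", re.IGNORECASE)
NETWORK_RE = re.compile(r"\bcurl\b|\bwget\b|requests\.(?:get|post)|fetch\(", re.IGNORECASE)
DOC_SUFFIXES = {".md", ".txt", ".rst"}

def credential_severity(path: str, text: str) -> Optional[str]:
    """Severity for credential mentions in one file, or None if the rule does not fire."""
    if not CREDENTIAL_RE.search(text):
        return None        # no credential mention at all
    if PurePath(path).suffix.lower() in DOC_SUFFIXES:
        return "INFO"      # documentation that merely mentions credentials
    if NETWORK_RE.search(text):
        return "CRITICAL"  # credential access AND an outbound network call
    return None            # credentials without network: this rule does not fire
```

The file type is checked before the network pattern, so a README that shows a curl example next to an API key stays at INFO.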

Trust Score (5 Dimensions)

Dimension	Max	What's Measured
Security	35	Audit findings (-18 per critical, -4 per warning)
Quality	22	Description, version, usage docs, examples, metadata, changelog
Structure	18	File organization, tests, README, reasonable scope
Transparency	15	License, no minified code, code comments
Behavioral	10	Rate limiting, error handling, input validation
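The dimension maximums sum to 100. The sketch below shows one way the total could be assembled. It assumes the Security deductions apply per finding and are floored at 0, and that the other four dimensions arrive already scored. These are assumptions, not trust_score.py's documented internals.

```python
from typing import Dict

# Maximum points per dimension, taken from the table above (sum = 100).
MAX_POINTS = {"security": 35, "quality": 22, "structure": 18,
              "transparency": 15, "behavioral": 10}

def security_points(criticals: int, warnings: int) -> int:
    """Assumed rule: -18 per CRITICAL, -4 per WARNING, never below 0."""
    return max(0, MAX_POINTS["security"] - 18 * criticals - 4 * warnings)

def total_score(criticals: int, warnings: int, other: Dict[str, int]) -> int:
    """Security from findings plus the other four dimensions, each clamped to [0, max]."""
    total = security_points(criticals, warnings)
    for dim in ("quality", "structure", "transparency", "behavioral"):
        total += min(max(other.get(dim, 0), 0), MAX_POINTS[dim])
    return total
```

Under these assumptions, one critical and one warning cut Security from 35 to 13, so a skill that is otherwise perfect scores 78.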

Grades: A (90+), B (75+), C (60+), D (40+), F (<40)
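Each grade boundary is inclusive: a score of exactly 90 is an A, and 89 is a B. A small sketch of the mapping follows (illustrative, not copied from trust_score.py):

```python
# Inclusive lower bounds from the grade line above, checked from highest to lowest.
GRADE_FLOORS = [(90, "A"), (75, "B"), (60, "C"), (40, "D")]

def grade(score: float) -> str:
    """Map a 0-100 trust score to a letter grade."""
    for floor, letter in GRADE_FLOORS:
        if score >= floor:
            return letter
    return "F"  # anything below 40
```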

Comparative Scoring

python3 trust_score.py /path/to/skill-a --compare /path/to/skill-b

Shows per-dimension deltas and the overall score difference.
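Conceptually, the comparison subtracts one skill's dimension scores from the other's. The sketch below assumes dimension-name-to-points dicts and a "second minus first" sign convention. Both are assumptions, since the tool's output format is not documented here.

```python
from typing import Dict

def compare(a: Dict[str, int], b: Dict[str, int]) -> Dict[str, int]:
    """Per-dimension delta (b minus a) plus the overall difference."""
    dims = sorted(set(a) | set(b))
    deltas = {d: b.get(d, 0) - a.get(d, 0) for d in dims}
    deltas["overall"] = sum(b.values()) - sum(a.values())
    return deltas
```

A negative value means the second skill scores lower on that dimension.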

Trend Tracking

python3 trust_score.py /path/to/skill --save-trend   # Record score
python3 trust_score.py /path/to/skill --trend         # View history

Stores up to 50 entries per skill in trust_trends.json.
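The 50-entry cap can be implemented as a rolling window that drops the oldest entries. This sketch assumes a JSON layout of skill name to a list of {timestamp, score} records. That layout is hypothetical; the real trust_trends.json schema may differ.

```python
import json
import time
from pathlib import Path
from typing import List, Optional

MAX_ENTRIES = 50  # per-skill cap stated above

def save_trend(store: Path, skill: str, score: int, now: Optional[float] = None) -> List[dict]:
    """Append a score for `skill` and keep only the newest MAX_ENTRIES records."""
    data = json.loads(store.read_text()) if store.exists() else {}
    history = data.get(skill, [])
    history.append({"timestamp": time.time() if now is None else now, "score": score})
    data[skill] = history[-MAX_ENTRIES:]  # drop the oldest entries beyond the cap
    store.write_text(json.dumps(data, indent=2))
    return data[skill]
```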

Tools

File	Purpose
audit.sh	Single skill security audit (18 checks)
audit-all.sh	Batch scan all installed skills
trust_score.py	Trust score calculator (5-dimension, 0-100)
diff-audit.sh	Compare skill versions for security regressions
benchmark.sh	Corpus-wide audit with aggregate statistics
inspect.sh	ClawHub pre-install workflow
report.sh	Markdown report generator
test.sh	Automated test suite (28 assertions, 12 test skills)
allowlist.json	Known-good credential skills

Test Suite

12 test skills (8 malicious, 4 clean) with 28 automated assertions:

bash test.sh

Malicious fixtures cover credential harvesting, obfuscated payloads, sensitive filesystem reads, crypto wallets, time bombs, symlink attacks, prompt injection, download-execute, and privilege escalation. Clean fixtures: a basic skill, credential docs (false-positive check), a network skill, and a dotfiles skill.

Exit Codes

0: PASS / safe to install
1: REVIEW / warnings found
2: FAIL / critical issues
3: Error / bad input
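For CI/CD, a wrapper can map these exit codes to verdicts and decide whether to fail the pipeline. This sketch assumes audit.sh is the script reporting these codes and that it sits in the working directory. The strict-mode policy of failing on REVIEW is an illustrative choice, not a documented default.

```python
import subprocess

# Exit codes documented above.
VERDICTS = {0: "PASS", 1: "REVIEW", 2: "FAIL", 3: "ERROR"}

def verdict(returncode: int) -> str:
    """Translate an exit code into a verdict string."""
    return VERDICTS.get(returncode, f"UNKNOWN({returncode})")

def audit(skill_dir: str, script: str = "audit.sh") -> str:
    """Run the audit script on one skill directory (assumed location) and return its verdict."""
    result = subprocess.run(["bash", script, skill_dir], capture_output=True, text=True)
    return verdict(result.returncode)

def ci_should_fail(v: str, strict: bool = False) -> bool:
    """Fail on FAIL, ERROR, or unknown codes; in strict mode, also on REVIEW."""
    if v in ("FAIL", "ERROR") or v.startswith("UNKNOWN"):
        return True
    return strict and v == "REVIEW"
```

Unknown exit codes are treated as failures, so a crashed or missing script never passes silently.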

Changelog

See CHANGELOG.md for full version history.

Yoder Skill Auditor