Skill Auditor v3.1.0
The definitive security scanner for OpenClaw/ClawHub skills. Best-in-class detection across 18 security checks including prompt injection detection — the first scanner to catch agent manipulation attacks in skill documentation. 5-dimension trust scoring, trend tracking, diff analysis, and benchmarking. Zero false positives on legitimate skills.
When to Activate
- Installing a new skill from ClawHub - run inspect.sh for full pre-install validation
- Auditing existing skills - use audit.sh to scan any skill directory
- Generating trust scores - use trust_score.py for a 0-100 rating across 5 dimensions
- Comparing skills - use trust_score.py --compare for side-by-side analysis
- Tracking improvements - use trust_score.py --save-trend to monitor score over time
- Reviewing updates - use diff-audit.sh to compare before/after versions
- Batch scanning - use audit-all.sh or benchmark.sh for fleet-wide analysis
Quick Start
# Audit a single skill
bash audit.sh /path/to/skill
# Trust score (0-100 across 5 dimensions)
python3 trust_score.py /path/to/skill
# Compare two skills side by side
python3 trust_score.py /path/to/skill1 --compare /path/to/skill2
# Track score over time
python3 trust_score.py /path/to/skill --save-trend
python3 trust_score.py /path/to/skill --trend
# Diff audit (before/after update)
bash diff-audit.sh /path/to/old-version /path/to/new-version
# Benchmark against a corpus
bash benchmark.sh /path/to/skills-dir
# Inspect a ClawHub skill before installing
bash inspect.sh skill-slug
# Audit all installed skills
bash audit-all.sh
# Generate a markdown report
bash report.sh
# Run test suite (28 assertions)
bash test.sh
Guardrails / Anti-Patterns
DO:
- ✓ Always audit skills before installing from untrusted sources
- ✓ Review trust scores - reject skills scoring below 60 (D grade or lower)
- ✓ Use diff-audit.sh when updating skills to catch regressions
- ✓ Use --json output for CI/CD pipeline integration
- ✓ Run --save-trend periodically to track skill health
DON'T:
- ✗ Install skills scoring below 40 (F grade) without extensive manual review
- ✗ Ignore CRITICAL findings - they indicate potential security threats
- ✗ Blindly add skills to allowlist without understanding why they access credentials
- ✗ Skip audit because a skill is "popular" or "official"
Security Checks (18 total)
| # | Check | Severity | Description |
|---|---|---|---|
| 1 | credential-harvest | CRITICAL | Scripts reading API keys/tokens AND making network calls |
| 2 | exfiltration-url | CRITICAL | webhook.site, requestbin, ngrok URLs in scripts |
| 3 | obfuscated-payload | CRITICAL | Base64-encoded URLs or shell commands |
| 4 | sensitive-fs | CRITICAL | /etc/passwd, ~/.ssh, ~/.aws/credentials access |
| 5 | crypto-wallet | CRITICAL | Hardcoded ETH/BTC wallet addresses (drain attacks) |
| 6 | dependency-confusion | CRITICAL | Internal/private-scoped packages in public deps |
| 7 | typosquatting | CRITICAL | Misspelled package names (lodahs, requets, etc.) |
| 8 | symlink-attack | CRITICAL | Symlinks targeting sensitive system paths |
| 9 | code-execution | WARNING | eval(), exec(), subprocess patterns |
| 10 | time-bomb | WARNING | Date/time comparisons that could trigger delayed payloads |
| 11 | telemetry-detected | WARNING | Analytics SDKs, tracking pixels, phone-home behavior |
| 12 | excessive-permissions | WARNING | >15 bins/env/config items requested |
| 13 | unusual-ports | WARNING | Network calls to non-standard ports |
| 14 | prompt-injection | CRITICAL | Agent manipulation in docs: "ignore instructions", role hijacking, hidden HTML directives |
| 15 | download-execute | CRITICAL | curl\ |
| 16 | hidden-file | WARNING | Suspicious dotfiles that may hide malicious content |
| 17 | env-exfiltration | CRITICAL | Reading sensitive env vars + outbound network calls |
| 18 | privilege-escalation | CRITICAL | sudo, chmod 777/setuid, writes to system paths |
Context-aware: credential mentions in documentation are INFO, not CRITICAL.
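The context-aware rule can be pictured with a small sketch. This is not the scanner's actual implementation: the regular expressions, the list of documentation suffixes, and the function name `classify_credential_finding` are all assumptions made for illustration. It shows only the idea stated above, that a credential mention in documentation is INFO, while a script that both reads credentials and makes network calls is CRITICAL (check 1).

```python
import re
from typing import Optional

# Hypothetical patterns: the real rules used by audit.sh are not shown in this document.
CREDENTIAL_RE = re.compile(r"(API_KEY|TOKEN|SECRET|\.aws/credentials)", re.IGNORECASE)
NETWORK_RE = re.compile(r"\b(curl|wget|requests\.(get|post)|urllib|fetch)\b")
DOC_SUFFIXES = (".md", ".txt", ".rst")  # assumed set of "documentation" file types


def classify_credential_finding(filename: str, text: str) -> Optional[str]:
    """Return a severity for credential-related content, or None if nothing is flagged."""
    if not CREDENTIAL_RE.search(text):
        return None
    # Documentation that merely mentions credentials is informational.
    if filename.lower().endswith(DOC_SUFFIXES):
        return "INFO"
    # Scripts are flagged only when credential access and network calls co-occur.
    if NETWORK_RE.search(text):
        return "CRITICAL"
    return None


script = "key = os.environ['API_KEY']\nrequests.post('https://example.com', data=key)"
print(classify_credential_finding("send.py", script))
print(classify_credential_finding("SKILL.md", "Set API_KEY before running curl."))
```

Requiring both signals in scripts is what keeps a skill that only documents its API key setup from being reported as a threat.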
Trust Score (5 Dimensions)
| Dimension | Max | What's Measured |
|---|---|---|
| Security | 35 | Audit findings (criticals = -18, warnings = -4) |
| Quality | 22 | Description, version, usage docs, examples, metadata, changelog |
| Structure | 18 | File organization, tests, README, reasonable scope |
| Transparency | 15 | License, no minified code, code comments |
| Behavioral | 10 | Rate limiting, error handling, input validation |
Grades: A (90+), B (75+), C (60+), D (40+), F (<40)
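As a rough illustration of how the table and grade bands fit together, here is a minimal sketch. The dimension maxima and grade thresholds come from the tables above. Two things are assumptions, not confirmed behavior of trust_score.py: that the Security deductions apply per finding, and that the dimension score is floored at zero.

```python
# Dimension maxima as listed in the Trust Score table; they sum to 100.
DIMENSION_MAX = {
    "security": 35,
    "quality": 22,
    "structure": 18,
    "transparency": 15,
    "behavioral": 10,
}


def security_points(criticals: int, warnings: int) -> int:
    """Assumed model: -18 per critical and -4 per warning, never below zero."""
    return max(0, DIMENSION_MAX["security"] - 18 * criticals - 4 * warnings)


def grade(score: int) -> str:
    """Map a 0-100 trust score to the letter grades listed above."""
    if score >= 90:
        return "A"
    if score >= 75:
        return "B"
    if score >= 60:
        return "C"
    if score >= 40:
        return "D"
    return "F"


print(security_points(criticals=1, warnings=2))  # 35 - 18 - 8 = 9
print(grade(59))  # D
```

Under this model a single critical finding removes more than half of the Security dimension, so a skill with one critical finding cannot score above 82 overall.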
Comparative Scoring
python3 trust_score.py /path/to/skill-a --compare /path/to/skill-b
Shows per-dimension deltas and overall score difference.
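The comparison boils down to subtracting one score breakdown from another. The sketch below assumes each skill's result is a plain dict keyed by dimension name. The function name `compare_scores` and the sample numbers are hypothetical and do not reflect the tool's real output format.

```python
from typing import Dict, Tuple


def compare_scores(a: Dict[str, int], b: Dict[str, int]) -> Tuple[Dict[str, int], int]:
    """Return (per-dimension deltas, overall delta), computed as b minus a."""
    dimensions = sorted(set(a) | set(b))
    deltas = {dim: b.get(dim, 0) - a.get(dim, 0) for dim in dimensions}
    return deltas, sum(b.values()) - sum(a.values())


# Made-up breakdowns for two skills, for illustration only.
skill_a = {"security": 35, "quality": 18, "structure": 12, "transparency": 10, "behavioral": 6}
skill_b = {"security": 17, "quality": 20, "structure": 15, "transparency": 15, "behavioral": 8}

deltas, overall = compare_scores(skill_a, skill_b)
print(deltas)   # security is -18, every other dimension is positive
print(overall)  # -6
```

The example shows why per-dimension deltas matter: skill-b is better in four dimensions yet scores lower overall because of its Security result.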
Trend Tracking
python3 trust_score.py /path/to/skill --save-trend # Record score
python3 trust_score.py /path/to/skill --trend # View history
Stores up to 50 entries per skill in trust_trends.json.
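A capped history like this could be kept as follows. This is a sketch only: the JSON layout (a dict mapping skill name to a list of `{"timestamp", "score"}` entries) and the function name `save_trend` are assumptions, since the actual schema of trust_trends.json is not documented here. Only the 50-entry cap comes from the text above.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

MAX_ENTRIES = 50  # cap per skill, as stated above


def save_trend(trend_file: Path, skill: str, score: int) -> list:
    """Append a score for `skill` and keep only the newest MAX_ENTRIES entries."""
    trends = json.loads(trend_file.read_text()) if trend_file.exists() else {}
    history = trends.get(skill, [])
    history.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "score": score,
    })
    # Drop the oldest entries once the cap is exceeded.
    trends[skill] = history[-MAX_ENTRIES:]
    trend_file.write_text(json.dumps(trends, indent=2))
    return trends[skill]
```

Trimming on every write keeps the file bounded, so periodic --save-trend runs do not need separate cleanup.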
Tools
| File | Purpose |
|---|---|
| audit.sh | Single skill security audit (18 checks) |
| audit-all.sh | Batch scan all installed skills |
| trust_score.py | Trust score calculator (5-dimension, 0-100) |
| diff-audit.sh | Compare skill versions for security regressions |
| benchmark.sh | Corpus-wide audit with aggregate statistics |
| inspect.sh | ClawHub pre-install workflow |
| report.sh | Markdown report generator |
| test.sh | Automated test suite (28 assertions, 12 test skills) |
| allowlist.json | Known-good credential skills |
Test Suite
12 test skills (8 malicious, 4 clean) with 28 automated assertions:
bash test.sh
Malicious fixtures: credential harvest, obfuscated payload, sensitive fs reads, crypto wallets, time bombs, symlink attacks, prompt injection, download-execute, privilege escalation.
Clean fixtures: basic skill, credential docs (false positive check), network skill, dotfiles skill.
Exit Codes
- 0: PASS / safe to install
- 1: REVIEW / warnings found
- 2: FAIL / critical issues
- 3: Error / bad input
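These exit codes make it straightforward to gate an install or a CI step. The wrapper below is a minimal sketch: the names `AuditResult`, `should_block`, and `audit_skill`, and the `strict` switch, are hypothetical. It assumes audit.sh is in the current directory and invoked as shown in Quick Start.

```python
import subprocess
from enum import IntEnum


class AuditResult(IntEnum):
    """Exit codes as documented above."""
    PASS = 0
    REVIEW = 1
    FAIL = 2
    ERROR = 3


def should_block(exit_code: int, strict: bool = False) -> bool:
    """Decide whether to stop an install or pipeline for a given exit code."""
    try:
        result = AuditResult(exit_code)
    except ValueError:
        return True  # unknown code: fail closed
    if result is AuditResult.PASS:
        return False
    if result is AuditResult.REVIEW:
        return strict  # warnings block only in strict mode
    return True  # FAIL and ERROR always block


def audit_skill(path: str) -> int:
    """Run audit.sh on a skill directory and return its exit code."""
    return subprocess.run(["bash", "audit.sh", path]).returncode
```

Treating exit code 3 and any undocumented code as blocking means a broken scan is never mistaken for a clean one.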
Changelog
See CHANGELOG.md for full version history.