Cron Retry Skill
Automatically detect and retry cron jobs that failed due to network/connection errors.
Quick Start (Heartbeat Integration)
Add this to your HEARTBEAT.md:
## Cron Recovery Check
Check for cron jobs with lastStatus: "error". If the error matches network patterns (connection error, sendMessage failed, fetch failed, ETIMEDOUT, ECONNREFUSED), retry the job using cron tool with action: "run" and the job ID. Report what was recovered.
That's it. On each heartbeat, failed network jobs get retried automatically.
How It Works
1. On heartbeat, check all cron jobs via cron list
2. Filter for jobs where lastStatus = "error" and enabled = true
3. Check if lastError matches network-related patterns
4. Re-run eligible jobs via cron run
5. Report results
Network Error Patterns (Retryable)
These errors indicate transient network issues worth retrying:
Network request.*failed
Connection error
ECONNREFUSED
ETIMEDOUT
ENOTFOUND
sendMessage.*failed
fetch failed
socket hang up
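As a rough illustration, the patterns above can be combined into a single regular expression and applied to a job's lastError text. This is a minimal Python sketch, not the skill's actual implementation: the names `NETWORK_ERROR_PATTERNS` and `is_retryable` are hypothetical, and case-insensitive matching anywhere in the string is an assumption the skill does not spell out.

```python
import re

# Patterns copied from the list above; all names here are illustrative.
NETWORK_ERROR_PATTERNS = [
    r"Network request.*failed",
    r"Connection error",
    r"ECONNREFUSED",
    r"ETIMEDOUT",
    r"ENOTFOUND",
    r"sendMessage.*failed",
    r"fetch failed",
    r"socket hang up",
]

# Assumption: match case-insensitively, anywhere in the error text.
_RETRYABLE = re.compile(
    "|".join(f"(?:{p})" for p in NETWORK_ERROR_PATTERNS),
    re.IGNORECASE,
)


def is_retryable(last_error):
    """Return True if the error text matches any network pattern."""
    return bool(last_error) and _RETRYABLE.search(last_error) is not None
```

With this sketch, `is_retryable("Network request for 'sendMessage' failed!")` is true, while an auth error such as `"401 Unauthorized"` matches nothing and is skipped.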
What Gets Retried vs Skipped
Retried:
Network timeouts
Connection refused
Message send failures
DNS lookup failures
Skipped (not retried):
Logic errors (bad config, missing data)
Auth failures
Disabled jobs
Jobs that just ran successfully
Manual Recovery Check
To check and retry failed jobs manually:
# List all jobs and their status
clawdbot cron list
# Find failed jobs
clawdbot cron list | jq '.jobs[] | select(.state.lastStatus == "error") | {name, error: .state.lastError}'
# Retry a specific job
clawdbot cron run --id <JOB_ID>
Agent Implementation
When implementing the heartbeat check:
1. Call cron tool with action: "list"
2. For each job in response.jobs:
- Skip if job.enabled !== true
- Skip if job.state.lastStatus !== "error"
- Check if job.state.lastError matches network patterns
- If retryable: call cron tool with action: "run", jobId: job.id
3. Report: "Recovered X jobs" or "No failed jobs to recover"
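The steps above can be sketched in Python as a single pass over the job list. This is a hedged illustration only: it assumes each job is a dict shaped like the list response (`id`, `name`, `enabled`, and a `state` object with `lastStatus` and `lastError`), and `recover_failed_jobs` and the `run_job` callback are hypothetical names standing in for the cron tool's "run" action.

```python
import re

# Network patterns from this document; case-insensitive matching is an assumption.
NETWORK_ERROR = re.compile(
    r"Network request.*failed|Connection error|ECONNREFUSED|ETIMEDOUT|ENOTFOUND"
    r"|sendMessage.*failed|fetch failed|socket hang up",
    re.IGNORECASE,
)


def recover_failed_jobs(jobs, run_job):
    """Retry enabled jobs whose last error looks network-related.

    jobs    -- iterable of job dicts (assumed shape, see lead-in)
    run_job -- hypothetical callback that re-runs a job given its ID
    """
    recovered = []
    for job in jobs:
        # Skip disabled jobs.
        if job.get("enabled") is not True:
            continue
        state = job.get("state") or {}
        # Skip jobs whose last run did not fail.
        if state.get("lastStatus") != "error":
            continue
        # Skip non-network failures (logic errors, auth failures, ...).
        if not NETWORK_ERROR.search(state.get("lastError") or ""):
            continue
        run_job(job["id"])
        recovered.append(job.get("name", job["id"]))

    if recovered:
        noun = "job" if len(recovered) == 1 else "jobs"
        return f"Recovered {len(recovered)} {noun}: {', '.join(recovered)}"
    return "No failed jobs to recover"
```

Passing the run action in as a callback keeps the filtering logic separate from however the job is actually re-run, so the sketch can be exercised with a plain list and a stub function.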
Example Scenario
7:00 PM: Evening briefing cron fires
Network hiccup: Telegram send fails
Job marked lastStatus: "error", lastError: "Network request for 'sendMessage' failed!"
7:15 PM: Connection restored, heartbeat runs
Skill detects the failed job, sees it's a network error
Retries the job → briefing delivered
Reports: "Recovered 1 job: evening-wrap-briefing"
Safety
Only retries transient network errors
Respects job enabled state
Won't create retry loops (checks lastRunAtMs)
Reports all recovery attempts
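The skill does not describe how the lastRunAtMs check works, so the following Python sketch shows only one possible guard: skip a job that ran within a cooldown window. It assumes lastRunAtMs is a Unix timestamp in milliseconds; the function name `recently_run` and the 10-minute `RETRY_COOLDOWN_MS` value are hypothetical.

```python
import time

# Hypothetical cooldown; the skill does not state a value.
RETRY_COOLDOWN_MS = 10 * 60 * 1000


def recently_run(last_run_at_ms, now_ms=None, cooldown_ms=RETRY_COOLDOWN_MS):
    """Return True if the job ran within the cooldown window.

    last_run_at_ms -- assumed to be epoch milliseconds, or None if never run
    now_ms         -- current time in epoch milliseconds (defaults to the clock)
    """
    if last_run_at_ms is None:
        return False
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return now_ms - last_run_at_ms < cooldown_ms
```

A heartbeat could call this before retrying, so that a job which was just re-run and failed again is left alone until the window has passed.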