The problem with cron jobs
When a cron job fails silently, you might not notice until it's too late. A script error, a crash, or a network issue can stop a job from working, and cron itself does nothing to alert you beyond mailing the job's output, if mail is configured at all.
What we’ll configure
- Monitoring for cron job success
- Retry logic on failure
- Alerting if all retries fail
1. Wrap your cron job in a shell script
This wrapper will retry the job up to 3 times if it fails:
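#!/bin/bash
# Retry the job up to MAX_RETRIES times, then report the result.
MAX_RETRIES=3
RETRY_DELAY=10   # seconds between attempts
COUNT=0
SUCCESS=0
while [ "$COUNT" -lt "$MAX_RETRIES" ]; do
    echo "Attempt $((COUNT+1))..."
    if /usr/bin/python3 /scripts/my_task.py; then
        SUCCESS=1
        break
    fi
    COUNT=$((COUNT+1))
    # Wait only if another attempt will follow
    if [ "$COUNT" -lt "$MAX_RETRIES" ]; then
        sleep "$RETRY_DELAY"
    fi
done
# Replace <api-key> with your key. The URLs are quoted so the shell
# does not treat "<", ">" or "?" as special characters.
if [ "$SUCCESS" -eq 1 ]; then
    curl -fsS "https://ev.okchecker.com/p/<api-key>/backup-db" > /dev/null
else
    curl -fsS "https://ev.okchecker.com/p/<api-key>/backup-db?status=fail" > /dev/null
fi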
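If several jobs need the same treatment, the retry loop can be factored into a function. The following is a minimal sketch, not part of the original wrapper: the function name retry and its argument order are assumptions chosen for illustration.

```shell
#!/bin/bash
# retry MAX_RETRIES DELAY_SECONDS COMMAND [ARGS...]
# (hypothetical helper) Runs COMMAND until it succeeds or MAX_RETRIES
# attempts have been made. Returns 0 on success, 1 if every attempt failed.
retry() {
    local max_retries=$1 delay=$2
    shift 2
    local attempt=1
    while [ "$attempt" -le "$max_retries" ]; do
        echo "Attempt $attempt of $max_retries..."
        if "$@"; then
            return 0
        fi
        # Sleep only if another attempt is coming
        if [ "$attempt" -lt "$max_retries" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Example with the same job and settings as the wrapper:
# retry 3 10 /usr/bin/python3 /scripts/my_task.py
```

The function's exit status can then drive the success or failure ping in the same way the SUCCESS variable does in the wrapper.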
2. Add it to your crontab
Open your crontab with crontab -e and add:
*/15 * * * * /scripts/task_wrapper.sh
This runs the wrapper every 15 minutes. On success, it sends a ping. If every retry fails, it sends a ping with status=fail instead, which can trigger an alert.
3. Configure alerts
In your monitoring service's dashboard, set this check up as a "cron monitor" and define the expected interval (15 minutes for the schedule above). If no ping arrives within that interval, or a failure ping is received, you'll get notified.
4. Security tips
- Store secrets such as the API key in environment variables rather than hardcoding them in the script
- Log failures so you can troubleshoot issues later
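Both tips can be sketched as two small helpers for the wrapper. This is only an illustration under stated assumptions: the variable names OKCHECKER_API_KEY and LOG_FILE, the default log path, and the function names are hypothetical and not defined by the monitoring service.

```shell
#!/bin/bash
# Assumptions (not from the article): the API key is provided in
# OKCHECKER_API_KEY, and failures are appended to the file named by LOG_FILE.
LOG_FILE="${LOG_FILE:-/var/log/task_wrapper.log}"

# Build the ping URL from the environment instead of hardcoding the key.
# Prints the URL, or returns 1 if the key is not set.
ping_url() {
    local suffix=${1:-}
    if [ -z "${OKCHECKER_API_KEY:-}" ]; then
        echo "OKCHECKER_API_KEY is not set" >&2
        return 1
    fi
    echo "https://ev.okchecker.com/p/${OKCHECKER_API_KEY}/backup-db${suffix}"
}

# Append a timestamped failure line to the log file.
log_failure() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') FAILED: $*" >> "$LOG_FILE"
}

# Example use inside the wrapper's failure branch:
# log_failure "my_task.py failed after $MAX_RETRIES attempts"
# curl -fsS "$(ping_url '?status=fail')" > /dev/null
```

Cron runs jobs with a minimal environment, so the variable has to be made available to the wrapper explicitly, for example by sourcing a file that only the job's user can read.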
5. Alternative: systemd with restart
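If you use systemd instead of cron, you can set Restart=on-failure in your service file. Add a curl ping to ExecStartPost to integrate with monitoring. For a Type=oneshot service, ExecStartPost runs only after the main command has exited successfully, so the ping signals a successful run.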
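A service file along these lines might look like the sketch below. The unit name and path, the restart delay, and the start limits are assumptions picked to mirror the cron wrapper (3 attempts, 10 seconds apart). Restart=on-failure combined with Type=oneshot needs a reasonably recent systemd; older releases may reject it.

```ini
# /etc/systemd/system/my-task.service (hypothetical name and path)
[Unit]
Description=Run my_task.py with restart on failure
# Allow at most 3 start attempts within 10 minutes
StartLimitIntervalSec=600
StartLimitBurst=3

[Service]
Type=oneshot
ExecStart=/usr/bin/python3 /scripts/my_task.py
# Runs only after ExecStart has exited successfully. The leading "-" tells
# systemd to ignore a failed ping so it does not trigger a restart of the job.
ExecStartPost=-/usr/bin/curl -fsS -o /dev/null https://ev.okchecker.com/p/<api-key>/backup-db
Restart=on-failure
RestartSec=10
```

Unlike the cron wrapper, this sketch only sends the success ping. A missed ping is then what raises the alert on the monitoring side.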
Conclusion
Adding monitoring and retry logic to your cron jobs helps prevent data loss and makes your scheduled tasks more reliable. Better safe than sorry!