Incident Lifecycle
Every incident follows a clear path: Triggered → Acknowledged → Resolved.
States
Triggered
The incident is new. Notifications are being sent according to the escalation policy. If no one acknowledges before the escalation timeout, the incident escalates.
Acknowledged
Someone has taken ownership. Escalation stops. The team knows who's handling it.
Resolved
The issue is fixed. The incident is closed. Metrics are recorded.
Triggered ──→ Acknowledged ──→ Resolved
    │               │
    │               └──→ Snoozed (auto-retriggers)
    │
    └──→ Escalated (no ack within timeout)
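One way to read the diagram is as a transition table. The Bash sketch below encodes only the edges drawn above, plus the snooze-expiry re-trigger described under Snoozing. The state and action names (ack_timeout, snooze_expired, and so on) are hypothetical, and the real service may allow transitions not shown here.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: the lifecycle diagram as a state transition table.
# Prints the next state on stdout; rejects any pair not drawn in the diagram.
next_state() {
  local state="$1" action="$2"
  case "${state}:${action}" in
    triggered:acknowledge)  echo "acknowledged" ;;
    triggered:ack_timeout)  echo "escalated" ;;    # no ack within timeout
    acknowledged:resolve)   echo "resolved" ;;
    acknowledged:snooze)    echo "snoozed" ;;
    snoozed:snooze_expired) echo "triggered" ;;    # auto-retrigger, escalation restarts
    *)
      echo "invalid transition: ${state} --${action}-->" >&2
      return 1
      ;;
  esac
}
```

For example, next_state acknowledged snooze prints snoozed, while next_state resolved acknowledge writes an error to stderr and returns a non-zero status.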
Triggering Incidents
Incidents are triggered by:
- Inbound events — from monitoring tool webhooks
- Manual creation — from the Dashboard or API
- AI grouping — multiple alerts grouped into one incident
Each incident gets:
- Unique incident number (e.g. #INC-1042)
- Severity level (critical, error, warning, info)
- Assigned service and escalation policy
- Full timeline from trigger to resolution
Acknowledging
Acknowledge from any surface:
- Dashboard — click Acknowledge
- Mobile app — swipe to ack
- Slack/Teams — click the Ack button on the notification
- Phone call — press 1
- API: PUT /v1/incidents/{id}/acknowledge
Acknowledging stops escalation and tells the team: "I'm on it."
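A minimal Bash sketch of the API option follows. The endpoint path comes from the list above; the host and Bearer-token header are copied from the resolve example later on this page. NH_API_TOKEN is a hypothetical environment variable, and the sketch assumes the endpoint needs no request body.

```bash
#!/usr/bin/env bash
# Sketch: acknowledge an incident via PUT /v1/incidents/{id}/acknowledge.
# NH_API_TOKEN (hypothetical) must hold your API key; no request body is sent.
nh_acknowledge() {
  local incident_id="$1"
  # --fail: exit non-zero on HTTP 4xx/5xx responses
  # --silent --show-error: hide the progress meter but still print errors
  curl --fail --silent --show-error -X PUT \
    "https://api.notifyhero.com/v1/incidents/${incident_id}/acknowledge" \
    -H "Authorization: Bearer ${NH_API_TOKEN:?NH_API_TOKEN is not set}"
}
```

Usage: export NH_API_TOKEN first, then run nh_acknowledge INC-1042.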
Auto-acknowledge: Configure services to auto-ack when someone joins the war room or posts in the incident Slack channel.
Resolving
Resolve when the issue is fixed:
- Manual — from Dashboard, mobile, Slack, or API
- Auto-resolve — when your monitoring tool sends a recovery event
- Timeout — auto-resolve after N hours if no further events (configurable per service)
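The Timeout rule reduces to a single comparison. This Bash sketch assumes the last event time is tracked as Unix epoch seconds; the function name and its arguments are hypothetical and not part of the NotifyHero API.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the per-service Timeout rule:
# succeed (exit 0) when no event has arrived for at least N hours.
# Arguments: last_event_epoch now_epoch timeout_hours (epoch seconds, integer hours)
should_auto_resolve() {
  local last_event_epoch="$1" now_epoch="$2" timeout_hours="$3"
  local quiet_seconds=$(( now_epoch - last_event_epoch ))
  (( quiet_seconds >= timeout_hours * 3600 ))
}
```

Usage: if should_auto_resolve "$last_event" "$(date +%s)" 4; then resolve the incident.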
# Resolve via API
curl -X PUT https://api.notifyhero.com/v1/incidents/INC-1042/resolve \
-H "Authorization: Bearer nh_live_abc123" \
-H "Content-Type: application/json" \
-d '{"resolution_note": "Scaled up API pods to handle traffic spike"}'
Priority Levels
Assign priority to control response urgency:
| Priority | Label | Expected Response |
|----------|-------|-------------------|
| P1 | Critical | Immediate (all hands on deck) |
| P2 | High | Within 15 minutes |
| P3 | Medium | Within 1 hour |
| P4 | Low | Next business day |
| P5 | Informational | No response required |
Priority can be set manually or auto-assigned based on severity and service tier.
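To illustrate auto-assignment from severity and service tier, the Bash sketch below maps the four severity levels and a hypothetical numeric service tier (1 = most critical) to P1 through P5. The mapping is invented for this example; it is not NotifyHero's built-in rule.

```bash
#!/usr/bin/env bash
# Illustrative only: an invented severity + tier -> priority mapping.
# severity: critical|error|warning|info; tier: integer, 1 = most critical service.
auto_priority() {
  local severity="$1" tier="$2"
  case "$severity" in
    critical) if (( tier == 1 )); then echo P1; else echo P2; fi ;;
    error)    if (( tier == 1 )); then echo P2; else echo P3; fi ;;
    warning)  echo P4 ;;
    info)     echo P5 ;;
    *)
      echo "unknown severity: $severity" >&2
      return 1
      ;;
  esac
}
```

Keeping tier as a separate input lets the same critical alert page everyone on a tier-1 service but land at P2 on a less important one.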
Merging Incidents
When multiple incidents are related (e.g., same root cause), merge them:
- Select incidents from the Dashboard
- Click Merge
- Choose the primary incident
- All alerts, timeline entries, and responders are consolidated into the primary incident
Merged incidents share a single timeline and resolution.
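Conceptually, merging interleaves each incident's timeline by timestamp. This Bash sketch assumes each timeline has been exported to a plain-text file in the HH:MM:SS format shown under Timeline, with all entries on the same day; the file layout and function name are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: merge exported timelines into one, ordered by time.
# Each input line looks like "09:00:00 Triggered by Datadog webhook".
merge_timelines() {
  # -k1,1 sorts on the HH:MM:SS field only.
  # -s keeps entries with identical timestamps in input order.
  # LC_ALL=C gives byte-wise, locale-independent ordering.
  LC_ALL=C sort -s -k1,1 "$@"
}
```

Usage: merge_timelines inc-1042.txt inc-1043.txt > merged.txt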
Snoozing
Temporarily dismiss an incident that can't be fixed right now:
- Snooze for 30 min / 1 hour / 4 hours / custom
- The incident moves to "Snoozed" state
- After the snooze expires, it re-triggers and restarts escalation
Use snooze for known issues with a scheduled fix window. Don't use it to avoid alerts — that's what suppression rules are for.
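The re-trigger time is the snooze start plus the chosen duration. The sketch below assumes epoch-second timestamps and a hypothetical duration syntax: 30m, 1h, 4h, or a custom value written as a number of minutes followed by m.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: compute when a snoozed incident re-triggers.
# Arguments: now_epoch duration (30m | 1h | 4h | <N>m for a custom snooze)
snooze_until() {
  local now_epoch="$1" duration="$2" minutes
  case "$duration" in
    30m) minutes=30 ;;
    1h)  minutes=60 ;;
    4h)  minutes=240 ;;
    *m)  minutes="${duration%m}" ;;   # custom, e.g. 90m
    *)
      echo "unsupported duration: $duration" >&2
      return 1
      ;;
  esac
  # Reject custom values that are not plain digits (e.g. "abcm")
  if ! [[ "$minutes" =~ ^[0-9]+$ ]]; then
    echo "unsupported duration: $duration" >&2
    return 1
  fi
  # 10# forces base 10 so inputs like "08m" are not read as octal
  echo $(( now_epoch + 10#$minutes * 60 ))
}
```

When the current time reaches the printed epoch, the incident moves back to Triggered and escalation restarts.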
Reassignment
Transfer ownership of an incident:
- Open the incident
- Click Reassign
- Select a person or team
- They receive an immediate notification
The original responder is released. Full reassignment history is logged in the timeline.
Timeline
Every incident has a complete, immutable timeline:
09:00:00 Triggered by Datadog webhook
09:00:01 Notification sent to Alice (push)
09:00:02 Notification sent to Alice (Slack)
09:01:01 Notification sent to Alice (phone)
09:01:45 Acknowledged by Alice
09:02:00 War room created: #inc-1042
09:15:00 Note added: "Identified — bad deploy at 08:55"
09:22:00 Resolved by Alice
09:22:01 Resolution note: "Rolled back deploy v2.4.1"
Every action, every notification, every escalation — logged with timestamps.
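Because every entry is timestamped, response metrics can be read straight off the timeline. This Bash sketch computes the elapsed seconds between two entries, assuming the timeline is available as a text file in the format above (same day, HH:MM:SS prefix). The helper names and the choice of time-to-acknowledge and time-to-resolve as the metrics are illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: derive response metrics from an exported timeline.

# Convert HH:MM:SS to seconds since midnight.
hms_to_seconds() {
  local h m s
  IFS=: read -r h m s <<< "$1"
  # 10# forces base 10 so "09" is not parsed as an invalid octal number
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Seconds from the first line matching $2 to the first line matching $3.
elapsed_between() {
  local file="$1" from_pat="$2" to_pat="$3" t0 t1
  t0=$(grep -m1 -- "$from_pat" "$file" | cut -d' ' -f1)
  t1=$(grep -m1 -- "$to_pat" "$file" | cut -d' ' -f1)
  [[ -n "$t0" && -n "$t1" ]] || return 1   # one of the entries is missing
  echo $(( $(hms_to_seconds "$t1") - $(hms_to_seconds "$t0") ))
}
```

For the timeline above, elapsed_between timeline.txt "Triggered by" "Acknowledged by" prints 105 (1 min 45 s to acknowledge), and "Triggered by" to "Resolved by" prints 1320 (22 min to resolve).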