Core Concepts
Understand the building blocks of NotifyHero's incident management platform.
Events, Alerts, and Incidents
These three levels form the core data model:
Events
Raw signals from your monitoring tools. A single Datadog monitor firing sends one event to NotifyHero via webhook.
```json
{
  "event_type": "trigger",
  "severity": "critical",
  "title": "Disk usage > 90%",
  "source": "db-primary-01",
  "dedup_key": "disk-db-primary-01"
}
```
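A minimal sketch of how a monitoring integration might deliver that payload. The endpoint URL is hypothetical (check your NotifyHero integration settings for the real one); the request is built but not sent:

```python
import json
from urllib import request

# Hypothetical webhook endpoint -- the real URL comes from your
# NotifyHero integration settings.
WEBHOOK_URL = "https://events.notifyhero.example/v1/enqueue"

def build_event_request(event: dict) -> request.Request:
    """Build the HTTP POST that would deliver one event."""
    body = json.dumps(event).encode("utf-8")
    return request.Request(
        WEBHOOK_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_event_request({
    "event_type": "trigger",
    "severity": "critical",
    "title": "Disk usage > 90%",
    "source": "db-primary-01",
    "dedup_key": "disk-db-primary-01",
})
# request.urlopen(req) would actually deliver it
```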
Alerts
Events are deduplicated and enriched into alerts. Multiple events with the same dedup_key collapse into a single alert. NotifyHero adds context: runbooks, past incidents, service ownership.
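The collapsing step can be sketched like this (an illustrative model, not NotifyHero's actual implementation):

```python
def deduplicate(events: list[dict]) -> dict[str, dict]:
    """Collapse events sharing a dedup_key into one alert,
    counting how many events each alert absorbed."""
    alerts: dict[str, dict] = {}
    for ev in events:
        key = ev["dedup_key"]
        if key not in alerts:
            alerts[key] = {"title": ev["title"], "event_count": 0}
        alerts[key]["event_count"] += 1
    return alerts

events = [
    {"dedup_key": "disk-db-primary-01", "title": "Disk usage > 90%"},
    {"dedup_key": "disk-db-primary-01", "title": "Disk usage > 90%"},
    {"dedup_key": "api-5xx", "title": "API error rate > 5%"},
]
alerts = deduplicate(events)
# 3 events collapse into 2 alerts
```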
Incidents
Related alerts are grouped into incidents — the unit your team actually responds to. NotifyHero's AI groups alerts automatically (e.g., 47 server alerts from one bad deploy become 1 incident).
The flow: Event → Alert → Incident → Notification → Response → Resolution
Severity Levels
Every event has a severity that determines routing and urgency:
| Severity | Behavior | Example |
|---|---|---|
| Critical | Immediate notification via high-urgency channels (phone, SMS) | Database down |
| Error | High-urgency notification | API error rate > 5% |
| Warning | Low-urgency notification (email, Slack) | CPU at 80% |
| Info | Logged, no notification by default | Deployment completed |
You can customize severity-to-urgency mapping per service.
Services
A service represents a system, application, or component your team owns. Every alert routes through a service.
Examples:
- checkout-service — owned by the Payments team
- api-gateway — owned by the Platform team
- cdn-edge — owned by the Infrastructure team
Each service has:
- An escalation policy (who gets notified and when)
- Integrations (which monitoring tools send events)
- Runbooks (how to respond)
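Those three attributes can be pictured as a simple record. Field names here are illustrative, not NotifyHero's API:

```python
from dataclasses import dataclass, field

@dataclass
class Service:
    # Illustrative shape of a service record, not the real schema.
    name: str
    team: str
    escalation_policy: str
    integrations: list[str] = field(default_factory=list)
    runbooks: list[str] = field(default_factory=list)

checkout = Service(
    name="checkout-service",
    team="Payments",
    escalation_policy="Payments Escalation Policy",
    integrations=["datadog", "sentry"],
    runbooks=["https://wiki.example/runbooks/checkout"],
)
```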
Escalation Policies
Escalation policies define who gets notified, when, and what happens if they don't respond.
Level 1: On-call engineer (immediate)
Level 2: Team lead (after 5 minutes)
Level 3: Engineering manager (after 15 minutes)
Level 4: VP Engineering (after 30 minutes)
If Level 1 acknowledges the incident, escalation stops. If not, it moves up. See Escalation Policies for full details.
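The escalation walk above can be sketched as a function of how long the incident has gone unacknowledged (a simplified model; real escalation stops the moment someone acknowledges):

```python
# (role, minutes without acknowledgment before this level is paged)
ESCALATION_LEVELS = [
    ("on-call engineer", 0),      # immediate
    ("team lead", 5),
    ("engineering manager", 15),
    ("VP Engineering", 30),
]

def notified_after(minutes_unacknowledged: float) -> list[str]:
    """Everyone paged after N minutes with no acknowledgment."""
    return [role for role, delay in ESCALATION_LEVELS
            if delay <= minutes_unacknowledged]
```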
Teams
Teams group people and services together. A team owns services and has its own schedules and escalation policies.
Platform Team
├── Members: Alice, Bob, Carol
├── Services: api-gateway, auth-service, rate-limiter
├── Schedule: Platform On-Call (weekly rotation)
└── Escalation: Platform Escalation Policy
Teams can be nested. A "Backend" team can contain "Platform" and "Payments" sub-teams.
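Nesting makes ownership compose: a parent team transitively owns everything its sub-teams own. A sketch of that structure (illustrative, not the real schema):

```python
from dataclasses import dataclass, field

@dataclass
class Team:
    name: str
    members: list[str] = field(default_factory=list)
    services: list[str] = field(default_factory=list)
    sub_teams: list["Team"] = field(default_factory=list)

    def all_services(self) -> list[str]:
        """Services owned by this team and all its sub-teams."""
        out = list(self.services)
        for sub in self.sub_teams:
            out.extend(sub.all_services())
        return out

backend = Team("Backend", sub_teams=[
    Team("Platform", members=["Alice", "Bob", "Carol"],
         services=["api-gateway", "auth-service", "rate-limiter"]),
    Team("Payments", services=["checkout-service"]),
])
```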
On-Call Schedules
Schedules define who is on-call and when. They support:
- Rotations — weekly, daily, or custom intervals
- Layers — primary, secondary, and override layers
- Handoff times — when shifts change
- Restrictions — weekday-only or weekend-only schedules
The person on-call at the time an incident triggers is the first to be notified. See Schedules.
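For the simplest case, a weekly rotation is just modular arithmetic over the weeks since the first handoff. A sketch (the rotation order and handoff time are hypothetical; layers and restrictions are omitted):

```python
from datetime import datetime, timedelta

ROTATION = ["Alice", "Bob", "Carol"]    # weekly rotation order (hypothetical)
FIRST_HANDOFF = datetime(2025, 1, 6, 9, 0)  # Monday 09:00 (hypothetical)

def on_call_at(when: datetime) -> str:
    """Who is on-call at a given moment, for a plain weekly rotation."""
    weeks = (when - FIRST_HANDOFF) // timedelta(weeks=1)
    return ROTATION[weeks % len(ROTATION)]
```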
How It All Fits Together
Monitoring Tool → Event → Service → Alert → Incident
                                               ↓
                                      Escalation Policy
                                               ↓
                                      On-Call Schedule
                                               ↓
                             Notification (phone/SMS/Slack/etc.)
                                               ↓
                            Acknowledge → Investigate → Resolve
                                               ↓
                                   Postmortem → Action Items
Every step is logged. Every action has a timestamp. Full audit trail from trigger to resolution.