# Core Concepts
Understand the building blocks of NotifyHero's incident management platform.
## Events, Alerts, and Incidents
These three levels form the core data model:
### Events
Raw signals from your monitoring tools. A single Datadog monitor firing sends one event to NotifyHero via webhook.
```json
{
  "event_type": "trigger",
  "severity": "critical",
  "title": "Disk usage > 90%",
  "source": "db-primary-01",
  "dedup_key": "disk-db-primary-01"
}
```
### Alerts
Events are deduplicated and enriched into alerts. Multiple events with the same dedup_key collapse into a single alert. NotifyHero adds context: runbooks, past incidents, service ownership.
### Incidents
Related alerts are grouped into incidents — the unit your team actually responds to. NotifyHero's AI groups alerts automatically (e.g., 47 server alerts from one bad deploy become 1 incident).
The flow: Event → Alert → Incident → Notification → Response → Resolution
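The event-to-alert-to-incident flow above can be sketched in a few lines of Python. This is an illustrative model, not NotifyHero's API: field names follow the webhook payload shown earlier, and the incident grouping here is a naive same-source rule standing in for NotifyHero's AI-based grouping.

```python
from collections import defaultdict

def events_to_alerts(events):
    """Deduplicate raw events: events sharing a dedup_key collapse into one alert."""
    alerts = {}
    for ev in events:
        key = ev["dedup_key"]
        if key not in alerts:
            alerts[key] = {**ev, "count": 0}
        alerts[key]["count"] += 1
    return list(alerts.values())

def alerts_to_incidents(alerts):
    """Group related alerts into incidents. NotifyHero's real grouping is
    AI-driven; grouping by source here is a simple stand-in."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[alert["source"]].append(alert)
    return [{"alerts": group} for group in groups.values()]

# The same monitor firing twice produces two events but only one alert.
events = [
    {"event_type": "trigger", "severity": "critical",
     "title": "Disk usage > 90%", "source": "db-primary-01",
     "dedup_key": "disk-db-primary-01"},
    {"event_type": "trigger", "severity": "critical",
     "title": "Disk usage > 90%", "source": "db-primary-01",
     "dedup_key": "disk-db-primary-01"},
]

alerts = events_to_alerts(events)
incidents = alerts_to_incidents(alerts)
print(len(alerts), alerts[0]["count"], len(incidents))  # 1 2 1
```

The key point the sketch captures: deduplication happens at the alert level (via `dedup_key`), while grouping happens at the incident level across many alerts.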
## Severity Levels
Every event has a severity that determines routing and urgency:
| Severity | Behavior | Example |
|----------|----------|---------|
| Critical | Immediate notification via high-urgency channels (phone, SMS) | Database down |
| Error | High-urgency notification | API error rate > 5% |
| Warning | Low-urgency notification (email, Slack) | CPU at 80% |
| Info | Logged, no notification by default | Deployment completed |
You can customize severity-to-urgency mapping per service.
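A per-service override on top of the default mapping might look like this. The names (`DEFAULT_URGENCY`, `SERVICE_OVERRIDES`, `urgency_for`) and the override shown are hypothetical, chosen to illustrate the idea rather than mirror NotifyHero's configuration format:

```python
# Default severity-to-urgency mapping, following the table above.
DEFAULT_URGENCY = {
    "critical": "high",  # phone, SMS
    "error": "high",
    "warning": "low",    # email, Slack
    "info": None,        # logged, no notification
}

# Hypothetical per-service overrides.
SERVICE_OVERRIDES = {
    "checkout-service": {"warning": "high"},  # page on warnings for payments
}

def urgency_for(service, severity):
    """Resolve urgency for an event, applying any per-service override."""
    override = SERVICE_OVERRIDES.get(service, {})
    return override.get(severity, DEFAULT_URGENCY[severity])

print(urgency_for("api-gateway", "warning"))       # low
print(urgency_for("checkout-service", "warning"))  # high
```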
## Services
A service represents a system, application, or component your team owns. Every alert routes through a service.
Examples:

- `checkout-service` — owned by the Payments team
- `api-gateway` — owned by the Platform team
- `cdn-edge` — owned by the Infrastructure team
Each service has:
- An escalation policy (who gets notified and when)
- Integrations (which monitoring tools send events)
- Runbooks (how to respond)
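As a mental model, a service bundles those three things together. The sketch below is illustrative, not NotifyHero's schema; the field names and the example values are assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class Service:
    name: str
    owner_team: str
    escalation_policy: str  # who gets notified and when
    integrations: list = field(default_factory=list)  # tools sending events
    runbooks: list = field(default_factory=list)      # how to respond

checkout = Service(
    name="checkout-service",
    owner_team="Payments",
    escalation_policy="Payments Escalation Policy",
    integrations=["datadog"],
    runbooks=["https://wiki.example.com/runbooks/checkout-disk-full"],
)
print(checkout.owner_team)  # Payments
```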
## Escalation Policies
Escalation policies define who gets notified, when, and what happens if they don't respond.
- Level 1: On-call engineer (immediate)
- Level 2: Team lead (after 5 minutes)
- Level 3: Engineering manager (after 15 minutes)
- Level 4: VP Engineering (after 30 minutes)
If Level 1 acknowledges the incident, escalation stops. If not, it moves up. See Escalation Policies for full details.
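The timing logic amounts to a simple threshold check. A minimal sketch, assuming no one has acknowledged yet (an acknowledgement halts further escalation); `POLICY` and `levels_notified` are illustrative names:

```python
# (level, delay in minutes before that level is paged)
POLICY = [
    ("On-call engineer", 0),
    ("Team lead", 5),
    ("Engineering manager", 15),
    ("VP Engineering", 30),
]

def levels_notified(elapsed_minutes):
    """Levels paged `elapsed_minutes` after the incident triggered,
    assuming no acknowledgement so far."""
    return [who for who, delay in POLICY if elapsed_minutes >= delay]

print(levels_notified(0))   # ['On-call engineer']
print(levels_notified(7))   # ['On-call engineer', 'Team lead']
```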
## Teams
Teams group people and services together. A team owns services and has its own schedules and escalation policies.
```
Platform Team
├── Members: Alice, Bob, Carol
├── Services: api-gateway, auth-service, rate-limiter
├── Schedule: Platform On-Call (weekly rotation)
└── Escalation: Platform Escalation Policy
```
Teams can be nested. A "Backend" team can contain "Platform" and "Payments" sub-teams.
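One consequence of nesting is that a parent team's ownership spans its sub-teams' services. A small recursive sketch (the team data and `all_services` helper are hypothetical, using the teams named in this page):

```python
def all_services(team, teams):
    """All services a team owns, including those of nested sub-teams."""
    owned = list(teams[team]["services"])
    for sub in teams[team].get("sub_teams", []):
        owned.extend(all_services(sub, teams))
    return owned

teams = {
    "Backend":  {"services": [], "sub_teams": ["Platform", "Payments"]},
    "Platform": {"services": ["api-gateway", "auth-service", "rate-limiter"]},
    "Payments": {"services": ["checkout-service"]},
}
print(all_services("Backend", teams))
# ['api-gateway', 'auth-service', 'rate-limiter', 'checkout-service']
```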
## On-Call Schedules
Schedules define who is on-call and when. They support:
- Rotations — weekly, daily, or custom intervals
- Layers — primary, secondary, and override layers
- Handoff times — when shifts change
- Restrictions — weekday-only or weekend-only schedules
The person on-call at the time an incident triggers is the first to be notified. See Schedules.
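For a single-layer weekly rotation, "who is on-call today" reduces to date arithmetic. A minimal sketch, ignoring layers, handoff times, and restrictions; the `on_call` helper and the rotation start date are assumptions for illustration:

```python
from datetime import date

def on_call(members, rotation_start, today):
    """Who is on-call `today`, for a weekly rotation that began on
    `rotation_start` with members[0] on duty in the first week."""
    weeks_elapsed = (today - rotation_start).days // 7
    return members[weeks_elapsed % len(members)]

members = ["Alice", "Bob", "Carol"]
start = date(2024, 1, 1)  # a Monday; handoff every Monday

print(on_call(members, start, date(2024, 1, 3)))   # Alice (week 0)
print(on_call(members, start, date(2024, 1, 10)))  # Bob   (week 1)
print(on_call(members, start, date(2024, 1, 24)))  # Alice (rotation wraps)
```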
## How It All Fits Together
```
Monitoring Tool → Event → Service → Alert → Incident
                                               ↓
                                      Escalation Policy
                                               ↓
                                      On-Call Schedule
                                               ↓
                             Notification (phone/SMS/Slack/etc.)
                                               ↓
                             Acknowledge → Investigate → Resolve
                                               ↓
                                  Postmortem → Action Items
```
Every step is logged. Every action has a timestamp. Full audit trail from trigger to resolution.