Core Concepts

Understand the building blocks of NotifyHero's incident management platform.


Events, Alerts, and Incidents

These three levels form the core data model:

Events

Raw signals from your monitoring tools. For example, a single Datadog monitor firing sends one event to NotifyHero via webhook:

{
  "event_type": "trigger",
  "severity": "critical",
  "title": "Disk usage > 90%",
  "source": "db-primary-01",
  "dedup_key": "disk-db-primary-01"
}

Alerts

Events are deduplicated and enriched into alerts. Multiple events with the same dedup_key collapse into a single alert. NotifyHero adds context: runbooks, past incidents, service ownership.
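
For illustration, an alert built from the disk-usage events above might look like the following. The field names are assumptions for the sake of the example, not NotifyHero's exact schema:

{
  "alert_id": "alt_8f2c1",
  "dedup_key": "disk-db-primary-01",
  "severity": "critical",
  "title": "Disk usage > 90%",
  "event_count": 3,
  "service": "db-primary",
  "runbook_url": "https://runbooks.example.com/disk-full",
  "related_incidents": ["inc_1977", "inc_2041"]
}

Here three events sharing a dedup_key collapsed into one alert (event_count: 3), and the runbook and incident history were attached as context.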

Incidents

Related alerts are grouped into incidents — the unit your team actually responds to. NotifyHero's AI groups alerts automatically (e.g., 47 server alerts from one bad deploy become 1 incident).
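
A hypothetical incident object, shown only to illustrate the grouping (the fields are assumed, not a documented schema):

{
  "incident_id": "inc_3182",
  "title": "Elevated errors after deploy",
  "status": "triggered",
  "severity": "critical",
  "alert_count": 47,
  "alerts": ["alt_8f2c1", "alt_8f2c2", "…"],
  "service": "checkout-service"
}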

The flow: Event → Alert → Incident → Notification → Response → Resolution


Severity Levels

Every event has a severity that determines routing and urgency:

Severity   Behavior                                                         Example
Critical   Immediate notification via high-urgency channels (phone, SMS)   Database down
Error      High-urgency notification                                        API error rate > 5%
Warning    Low-urgency notification (email, Slack)                          CPU at 80%
Info       Logged, no notification by default                               Deployment completed

You can customize severity-to-urgency mapping per service.
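
As a sketch, a per-service mapping could look like this (the configuration format is assumed for illustration):

{
  "service": "checkout-service",
  "severity_to_urgency": {
    "critical": "high",
    "error": "high",
    "warning": "low",
    "info": "none"
  }
}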


Services

A service represents a system, application, or component your team owns. Every alert routes through a service.

Examples:

  • checkout-service — owned by the Payments team
  • api-gateway — owned by the Platform team
  • cdn-edge — owned by the Infrastructure team

Each service has:

  • An escalation policy (who gets notified and when)
  • Integrations (which monitoring tools send events)
  • Runbooks (how to respond)
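
Put together, a service definition might resemble the following sketch (field names are illustrative, not a documented schema):

{
  "name": "checkout-service",
  "team": "payments",
  "escalation_policy": "payments-escalation",
  "integrations": ["datadog", "sentry"],
  "runbooks": ["https://runbooks.example.com/checkout"]
}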

Escalation Policies

Escalation policies define who gets notified, when, and what happens if they don't respond.

Level 1: On-call engineer (immediate)
Level 2: Team lead (after 5 minutes)
Level 3: Engineering manager (after 15 minutes)
Level 4: VP Engineering (after 30 minutes)

If Level 1 acknowledges the incident, escalation stops. If not, it moves up. See Escalation Policies for full details.
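
Expressed as configuration, the four-level policy above could look something like this (the format is assumed for illustration):

{
  "name": "platform-escalation",
  "levels": [
    { "notify": "on-call-engineer", "delay_minutes": 0 },
    { "notify": "team-lead", "delay_minutes": 5 },
    { "notify": "engineering-manager", "delay_minutes": 15 },
    { "notify": "vp-engineering", "delay_minutes": 30 }
  ]
}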


Teams

Teams group people and services together. A team owns services and has its own schedules and escalation policies.

Platform Team
├── Members: Alice, Bob, Carol
├── Services: api-gateway, auth-service, rate-limiter
├── Schedule: Platform On-Call (weekly rotation)
└── Escalation: Platform Escalation Policy

Teams can be nested. A "Backend" team can contain "Platform" and "Payments" sub-teams.
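
That nesting could be represented as, for example (purely illustrative):

{
  "name": "backend",
  "sub_teams": [
    { "name": "platform", "services": ["api-gateway", "auth-service", "rate-limiter"] },
    { "name": "payments", "services": ["checkout-service"] }
  ]
}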


On-Call Schedules

Schedules define who is on-call and when. They support:

  • Rotations — weekly, daily, or custom intervals
  • Layers — primary, secondary, and override layers
  • Handoff times — when shifts change
  • Restrictions — weekday-only or weekend-only schedules

The person on-call when an incident triggers is the first to be notified. See Schedules.
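
For example, a weekly rotation with primary and secondary layers might be configured like this (field names assumed for illustration):

{
  "name": "platform-on-call",
  "layers": [
    {
      "name": "primary",
      "rotation": "weekly",
      "handoff": "Monday 09:00 UTC",
      "members": ["alice", "bob", "carol"]
    },
    {
      "name": "secondary",
      "rotation": "weekly",
      "handoff": "Monday 09:00 UTC",
      "members": ["carol", "alice", "bob"]
    }
  ]
}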


How It All Fits Together

Monitoring Tool → Event → Service → Alert → Incident
                                        ↓
                                Escalation Policy
                                        ↓
                                On-Call Schedule
                                        ↓
                      Notification (phone/SMS/Slack/etc.)
                                        ↓
                      Acknowledge → Investigate → Resolve
                                        ↓
                           Postmortem → Action Items

Every step is logged. Every action has a timestamp. Full audit trail from trigger to resolution.
