Incident Response

Coordinate your team during active incidents with war rooms, roles, and structured workflows.


War Rooms

For P1 and P2 incidents, NotifyHero auto-creates a dedicated war room in Slack or Teams:

  • Channel name: #inc-1042-api-outage
  • Auto-invited: All responders, escalation targets, and stakeholders
  • Pinned context: Incident details, runbook links, recent deploys, similar past incidents
  • Auto-archived: 7 days after resolution (configurable)
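
The naming and archiving rules above can be sketched in a few lines of Python. This is a minimal illustration, not NotifyHero's implementation: the helper names `war_room_channel_name` and `archive_at` are hypothetical, and the 80-character cap reflects Slack's channel-name limit (Teams naming rules are not modeled here).

```python
import re
from datetime import datetime, timedelta, timezone

SLACK_MAX_CHANNEL_LEN = 80  # Slack limits channel names to 80 characters


def war_room_channel_name(incident_number: int, title: str) -> str:
    """Build a channel name such as 'inc-1042-api-outage' (hypothetical helper)."""
    # Lowercase the title and collapse every run of non-alphanumerics into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    name = f"inc-{incident_number}-{slug}" if slug else f"inc-{incident_number}"
    # Truncate to the channel-name limit and avoid ending on a dangling hyphen.
    return name[:SLACK_MAX_CHANNEL_LEN].rstrip("-")


def archive_at(resolved_at: datetime, retention_days: int = 7) -> datetime:
    """Return when a war room would be archived, given its resolution time."""
    return resolved_at + timedelta(days=retention_days)


if __name__ == "__main__":
    print(war_room_channel_name(1042, "API Outage"))
    print(archive_at(datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)))
```

The `retention_days` parameter stands in for the configurable archive window; 7 matches the default described above.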

Manual War Rooms

Create a war room for any incident from the Dashboard → Create War Room.

Tip: Configure auto-war-room triggers per service. Critical services get war rooms automatically; lower-priority services don't.
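
One way to picture a per-service trigger is as a small policy object that is checked when an incident opens. The sketch below is illustrative only: `ServiceWarRoomPolicy`, its fields, and the service names are assumptions, not NotifyHero configuration keys. The P1/P2 default mirrors the behavior described at the top of this section.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Priorities that open a war room by default, per the P1/P2 rule described above.
DEFAULT_TRIGGER_PRIORITIES: FrozenSet[str] = frozenset({"P1", "P2"})


@dataclass(frozen=True)
class ServiceWarRoomPolicy:
    """Hypothetical per-service trigger settings (names are assumptions)."""
    service: str
    auto_create: bool = True
    priorities: FrozenSet[str] = DEFAULT_TRIGGER_PRIORITIES


def should_open_war_room(policy: ServiceWarRoomPolicy, priority: str) -> bool:
    """Return True when an incident of this priority should get a war room."""
    return policy.auto_create and priority in policy.priorities


if __name__ == "__main__":
    checkout = ServiceWarRoomPolicy(service="checkout-api")
    wiki = ServiceWarRoomPolicy(service="internal-wiki", auto_create=False)
    print(should_open_war_room(checkout, "P1"))  # critical service, P1 incident
    print(should_open_war_room(wiki, "P1"))      # auto-creation disabled for this service
```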


Incident Roles

Assign roles to keep response organized:

| Role | Responsibility |
|------|---------------|
| Incident Commander | Owns the response. Makes decisions. Coordinates the team. |
| Communications Lead | Updates stakeholders, status page, and executives. |
| Operations Lead | Executes technical investigation and remediation. |
| Scribe | Documents actions, decisions, and timeline in real-time. |

Assign roles from the incident page or war room. NotifyHero tracks who held each role in the timeline.

Auto-assign: Set default role assignments per escalation policy. The first responder becomes Incident Commander; the second becomes Operations Lead.
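
The auto-assign rule amounts to pairing responders with roles in the order they join. A minimal sketch, assuming a hypothetical `auto_assign_roles` helper and a list of responders already sorted by join order:

```python
from typing import Dict, Sequence

# Roles handed out in join order, matching the default described above.
DEFAULT_ROLE_ORDER = ("Incident Commander", "Operations Lead")


def auto_assign_roles(
    responders_in_join_order: Sequence[str],
    role_order: Sequence[str] = DEFAULT_ROLE_ORDER,
) -> Dict[str, str]:
    """Map each role to a responder; roles stay unassigned if too few people joined."""
    # zip() stops at the shorter sequence, so extra responders get no default role
    # and missing responders leave the later roles unfilled.
    return dict(zip(role_order, responders_in_join_order))


if __name__ == "__main__":
    print(auto_assign_roles(["alice", "bob", "carol"]))
```

With three responders, only the first two receive default roles; the remaining roles (Communications Lead, Scribe) would still be assigned by hand.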


Task Checklists

Attach a checklist to any incident to track response steps. Checklists can start from a template, and you can add ad-hoc tasks as the incident unfolds.

From Templates

Pre-define checklists for common incident types:

Template: "Database Incident"
☐ Check replication lag
☐ Verify connection pool status
☐ Check slow query log
☐ Confirm backup status
☐ Notify DBA on-call
☐ Update status page

Ad-Hoc Tasks

Add tasks during an active incident:

☐ Roll back deploy v2.4.1
☐ Scale API pods to 10
☐ Clear Redis cache
☑ Confirmed: deploy was the root cause

Tasks can be assigned to specific responders and checked off from the war room or Dashboard.
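
As a rough model of how a template-based checklist and ad-hoc tasks fit together, consider the sketch below. The `Task` and `Checklist` classes are hypothetical and exist only for illustration; the template contents are copied from the "Database Incident" example above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence

DATABASE_INCIDENT_TEMPLATE = (
    "Check replication lag",
    "Verify connection pool status",
    "Check slow query log",
    "Confirm backup status",
    "Notify DBA on-call",
    "Update status page",
)


@dataclass
class Task:
    title: str
    assignee: Optional[str] = None
    done: bool = False


@dataclass
class Checklist:
    tasks: List[Task] = field(default_factory=list)

    @classmethod
    def from_template(cls, titles: Sequence[str]) -> "Checklist":
        """Create a fresh, unchecked task for every template entry."""
        return cls([Task(title) for title in titles])

    def add(self, title: str, assignee: Optional[str] = None) -> Task:
        """Append an ad-hoc task during the incident and return it."""
        task = Task(title, assignee)
        self.tasks.append(task)
        return task

    def render(self) -> str:
        """Render the checklist with the same box symbols used in this guide."""
        return "\n".join(f"{'☑' if t.done else '☐'} {t.title}" for t in self.tasks)


if __name__ == "__main__":
    checklist = Checklist.from_template(DATABASE_INCIDENT_TEMPLATE)
    rollback = checklist.add("Roll back deploy v2.4.1", assignee="bob")
    rollback.done = True
    print(checklist.render())
```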


Communication Templates

Use pre-written templates for stakeholder updates:

Internal Update

**Incident #INC-1042 — API Latency**
Status: Investigating
Impact: API response times elevated (p99 > 2s)
Next update in: 15 minutes
Commander: Alice

Customer-Facing (Status Page)

We are currently investigating elevated API response times.
Some requests may experience slower than usual responses.
We will provide an update within 15 minutes.

Executive Summary

Active P1: API latency affecting checkout flow.
Customer impact: ~5% of transactions timing out.
ETA to resolution: 30 minutes. Team is rolling back a deploy.

Templates are customizable per service and incident type. Use them from the war room or status page editor.
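
Filling in a template is plain placeholder substitution. The sketch below uses Python's standard `string.Template` to render an update shaped like the Internal Update example; the placeholder names and the `render_internal_update` helper are assumptions for illustration, not NotifyHero's template syntax.

```python
from string import Template

# Field layout follows the Internal Update example; placeholder names are hypothetical.
INTERNAL_UPDATE = Template(
    "**Incident #$incident_id: $title**\n"
    "Status: $status\n"
    "Impact: $impact\n"
    "Next update in: $next_update_minutes minutes\n"
    "Commander: $commander"
)


def render_internal_update(
    incident_id: str,
    title: str,
    status: str,
    impact: str,
    next_update_minutes: int,
    commander: str,
) -> str:
    """Fill every placeholder; substitute() raises KeyError if one is missing."""
    return INTERNAL_UPDATE.substitute(
        incident_id=incident_id,
        title=title,
        status=status,
        impact=impact,
        next_update_minutes=next_update_minutes,
        commander=commander,
    )


if __name__ == "__main__":
    print(render_internal_update(
        incident_id="INC-1042",
        title="API Latency",
        status="Investigating",
        impact="API response times elevated (p99 > 2s)",
        next_update_minutes=15,
        commander="Alice",
    ))
```

Using `substitute()` rather than `safe_substitute()` is a deliberate choice here: a missing field fails loudly instead of sending a half-filled update to stakeholders.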


Runbooks

Link runbooks to services so responders know exactly what to do:

  1. Go to Service → Runbooks → Add Runbook
  2. Add a title, description, and steps (Markdown supported)
  3. Link to external docs if needed (Notion, Confluence, GitHub wiki)

When an incident triggers for that service, the runbook is:

  • Attached to the incident automatically
  • Pinned in the war room
  • Included in the notification context


Conference Bridge

For voice coordination during major incidents:

  • Auto-bridge: Automatically start a Zoom/Google Meet/Teams call for P1 incidents
  • Dial-in: Include phone dial-in for people without laptop access
  • Recording: Auto-record for postmortem review (configurable)

Configure at Settings → Incident Response → Conference Bridge.


Stakeholder Notifications

Keep non-responders informed without pulling them into the war room:

  1. Add stakeholders to an incident (executives, PMs, customer success)
  2. They receive status updates but aren't part of the escalation chain
  3. Updates are sent at intervals you define (every 15 min, 30 min, etc.)

Stakeholders can subscribe to incident types or services they care about.
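
To make the interval setting concrete, the following sketch computes when stakeholder updates would fall due for a given cadence. The `update_times` generator is a hypothetical helper; it assumes the first update is due one full interval after the incident starts, which may differ from NotifyHero's actual scheduling.

```python
from datetime import datetime, timedelta
from itertools import islice
from typing import Iterator


def update_times(started_at: datetime, interval_minutes: int) -> Iterator[datetime]:
    """Yield successive due times, one interval apart, starting after started_at."""
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    step = timedelta(minutes=interval_minutes)
    next_at = started_at + step
    while True:
        yield next_at
        next_at += step


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 12, 0)
    # First three due times for a 15-minute cadence.
    for due in islice(update_times(start, 15), 3):
        print(due.isoformat())
```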


Best Practices

  • Assign an Incident Commander immediately — every incident needs one owner
  • Use templates — don't write updates from scratch during an outage
  • Keep the war room focused — side discussions go elsewhere
  • Update the status page early — customers would rather hear "investigating" than nothing
  • Record everything — your future postmortem self will thank you