This is a step-by-step guide to setting up and working with alerts in AppDynamics, starting from the Alert & Respond → Policies page.

0) How AppDynamics alerts are wired

Health Rules: “what to watch” (e.g., CPU > 85% for 5 min).
Policies: “when/where to notify” (tie events like Health Rule Violation – Opens/Closes to actions).
Actions: “how to notify/run something” (email, HTTP webhook for Slack/PagerDuty/Jira, diagnostic dump, script).
(Optional) Anomaly Detection: AppD’s ML events you can also alert on.
Templates & Digests: re-usable message bodies and scheduled summary emails.

You’ll usually create Actions → Health Rules → Policies (in that order).

1) Prepare notifications

1.1 Configure email/SMS (once)

Alert & Respond → Email / SMS Configuration

Add SMTP server (host, port, auth, From address).
(Optional) Add SMS provider if you’ll send SMS.

1.2 Create actions you’ll reuse

Alert & Respond → Actions → Create

Email Action: Name “Email—SRE On-Call”, add recipients (team list), subject like [AppD][${severity}] ${application} ${eventType}: ${summary} and use variables in the body (pick from UI).
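HTTP Request Action (for Slack/PagerDuty/Jira): paste the webhook URL, set the method and headers, and write the JSON payload using event variables (e.g., ${eventMessage}, ${nodeName}, ${healthRuleName}). The variable names in this guide are shorthand; pick the exact names your Controller version offers from the variable list in the UI.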
Diagnostic/Remediation Actions (optional): thread dump, heap dump, start CPU profiling, or run script (requires Machine Agent with permissions).

Tip: keep action names generic (Email—SRE, Webhook—Slack-#prod-alerts) so many policies can reuse them.
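
To make the HTTP Request Action settings concrete, here is a minimal Python sketch of the kind of request such an action ends up sending: a POST with a JSON body and a Content-Type header. It only builds the request and does not send it. The URL is a placeholder, and the single "text" field assumes a Slack-style incoming webhook; other targets expect different payload shapes.

```python
import json
import urllib.request


def build_webhook_request(url: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST like an HTTP Request Action would."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Placeholder URL (assumption): replace with the webhook URL from your chat tool.
req = build_webhook_request(
    "https://hooks.example.com/services/PLACEHOLDER",
    "[AppD][CRITICAL] motoshare: CPU Hot violated on node-1",
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(req, timeout=10)
```

Building the request separately from sending it makes it easy to inspect the method, headers and body before pointing the action at a production channel.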

2) Define what “bad” looks like (Health Rules)

Alert & Respond → Health Rules → Create

Scope: pick the application (e.g., motoshare), then the entity type:
- Applications / Tiers & Nodes / Business Transactions / Databases / Servers / Containers, etc.
Affects: choose All or a subset (tags/tiers/BTs).
Conditions:
- Choose metric (Browse or paste Metric Path).
- Set Critical and optionally Warning thresholds.
- Evaluation: “X of last Y minutes” (e.g., 5 of last 5).
- Static thresholds or dynamic/baseline (e.g., “> 3× baseline, min 50 req/min”).
Rolling up: choose whether the rule triggers when any affected entity breaches the condition or only when all of them do.
Re-arm: add a wait time (e.g., 10–15 min) to avoid flapping.
Save.
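
The "X of last Y minutes" evaluation can be pictured as a sliding window over one-minute samples. The sketch below is an illustration only, assuming the rule violates when at least X of the last Y samples exceed the threshold; the Controller's real evaluation logic may differ, and the class name and readings are made up.

```python
from collections import deque


class WindowedCondition:
    """Violates when at least `x` of the last `y` samples exceed `threshold`."""

    def __init__(self, threshold: float, x: int, y: int):
        if not 1 <= x <= y:
            raise ValueError("need 1 <= x <= y")
        self.threshold = threshold
        self.x = x
        self.samples = deque(maxlen=y)  # only the last y samples are kept

    def add(self, value: float) -> bool:
        """Record one sample and report whether the condition now violates."""
        self.samples.append(value)
        breaches = sum(1 for v in self.samples if v > self.threshold)
        return breaches >= self.x


# "CPU > 85% for 5 of last 5 min" with made-up one-minute readings.
cpu_hot = WindowedCondition(threshold=85.0, x=5, y=5)
readings = [90, 92, 70, 91, 95, 96, 97, 99]
results = [cpu_hot.add(r) for r in readings]
print(results)
```

A single dip (the 70 reading) keeps the rule quiet until five consecutive breaches have accumulated, which is why a strict "5 of 5" setting is slower to fire than "5 of 10".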

Fast starters (create a few common rules)

CPU Hot (Tiers & Nodes): Hardware Resources|CPU|%Busy > 85% for 5 of 5 min (Critical), > 75% (Warning).
Memory Pressure: Hardware Resources|Memory|Used % > 90% for 10 of 10 min.
BT Slow (Business Transactions): Average Response Time > baseline + 3σ, min load ≥ 30 cpm, for 5 of 10 min.
BT Errors: Errors per Minute > baseline + 2σ and Error % > 5% for 5 of 10 min.
Container Restarts (if Server/Container visibility enabled): restarts > 0 in 10 min.

Keep rules separate for infrastructure vs. application to route to the right teams.
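
The arithmetic behind a "baseline + 3σ with a minimum load" rule can be sketched in a few lines. This is not how AppDynamics computes its baselines (the Controller maintains those itself); it only illustrates the comparison, using the mean and population standard deviation of a made-up history as a stand-in for the baseline.

```python
from statistics import mean, pstdev


def is_slow(history_ms, current_ms, calls_per_min, sigmas=3.0, min_cpm=30):
    """True when latency exceeds mean + sigmas * stddev and load is high enough."""
    if calls_per_min < min_cpm:
        return False  # too little traffic to trust the signal
    baseline = mean(history_ms)
    spread = pstdev(history_ms)
    return current_ms > baseline + sigmas * spread


# Made-up response times in milliseconds standing in for a baseline.
history = [100, 110, 90, 105, 95]

print(is_slow(history, current_ms=130, calls_per_min=50))  # above the limit
print(is_slow(history, current_ms=120, calls_per_min=50))  # within the limit
print(is_slow(history, current_ms=500, calls_per_min=10))  # load guard applies
```

The load guard matters most for low-traffic BTs, where a single slow call can otherwise push the average far above the baseline.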

3) (Optional) Enable Anomaly Detection

Alert & Respond → Anomaly Detection

Turn on for your application/tiers/BTs.
Choose sensitivity and minimum traffic.
These generate “Anomaly Detected” events you can target in policies (useful when you don’t want to hand-tune thresholds).

4) Tie events to notifications (Policies)

You can use the Policy Setup Wizard (available from the Policies page) or build a policy manually.

4.1 Quick path: Policy Setup Wizard

Alert & Respond → Policies → Policy Setup Wizard

Pick Event Types:
- Health Rule Violation – Opens (and later add a second policy for Closes).
- (Optional) Anomaly Detected.
Scope: select Application = motoshare (and narrow to tiers/BTs if needed).
Action: attach “Email—SRE On-Call”, “Webhook—Slack-#prod-alerts”, etc.
Execution Settings:
- “Trigger only on Critical” (optional) to reduce noise.
- Add re-notification/cool-down (e.g., re-notify every 60 min while still violating).
Review → Create.

4.2 Manual path (more control)

Alert & Respond → Policies → Create Policy Manually

Section 1: Events to Match
Add:
- Health Rule Violation – Opens (filter to your exact Health Rules if you want).
- (Optional) Health Rule Violation – Closes in a separate “Recovery” policy.
- (Optional) Anomaly Detected / Server Crash / Resource Pool Limit Reached, etc.
Section 2: Evaluation Scope
Application = motoshare, restrict entities if needed.
Section 3: Actions
Add your email/webhook/diagnostic actions.
Set Do not execute more than once every N minutes and wait times to control noise.
Section 4: Schedule/Suppression
Exclude maintenance windows, or disable the policy while deploys are in progress.
Save.
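
The effect of "Do not execute more than once every N minutes" is easiest to see in a small simulation. The class below is a hypothetical stand-in for an action with that setting, written only to show which triggers get through; it is not AppDynamics code.

```python
class RateLimitedAction:
    """Runs at most once every `min_interval_min` minutes; extra triggers are dropped."""

    def __init__(self, min_interval_min: float):
        self.min_interval_s = min_interval_min * 60
        self.last_run = None  # timestamp (seconds) of the last executed trigger
        self.sent = []

    def fire(self, message: str, now_s: float) -> bool:
        """Return True if the action ran, False if it was suppressed."""
        if self.last_run is not None and now_s - self.last_run < self.min_interval_s:
            return False
        self.last_run = now_s
        self.sent.append(message)
        return True


# Triggers arriving at minutes 0, 5, 14, 15, 20 and 31 with a 15-minute limit.
email = RateLimitedAction(min_interval_min=15)
outcomes = [email.fire("CPU Hot", minute * 60) for minute in (0, 5, 14, 15, 20, 31)]
print(outcomes)
print(len(email.sent), "notifications sent")
```

Note that the interval is measured from the last executed trigger, not from the last attempt, so a steady stream of violations still produces one notification per interval.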

Best practice: create two policies per health rule family—Alert (Opens) and Recovery (Closes)—so teams know when things clear.

5) Test end-to-end

Pick a rule that’s easy to trigger (e.g., lower CPU threshold on a dev tier temporarily).
Use Alert & Respond → Policies → (policy) → Test/Preview if available, or apply load to trigger a BT rule.
Confirm: event appears in Alert & Respond → Events and notifications arrive in email/Slack/Jira.
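
If you prefer to confirm from a script rather than the UI, the Controller's REST API has an events endpoint. The sketch below only builds the request URL and leaves authentication and the actual call to you. The controller host is a placeholder, and the parameter and event-type names are written from memory of the public REST documentation, so treat them as assumptions and verify them against your Controller version.

```python
from urllib.parse import quote, urlencode


def events_url(controller: str, application: str, minutes: int = 60) -> str:
    """Build a URL for recent health rule violation events (names are assumptions)."""
    query = urlencode(
        {
            "time-range-type": "BEFORE_NOW",
            "duration-in-mins": minutes,
            "event-types": "POLICY_OPEN_CRITICAL,POLICY_CLOSE_CRITICAL",
            "severities": "INFO,WARN,ERROR",
            "output": "JSON",
        },
        safe=",",  # keep the comma-separated lists readable
    )
    base = controller.rstrip("/")
    return f"{base}/controller/rest/applications/{quote(application)}/events?{query}"


# Placeholder host (assumption); "motoshare" is the application used in this guide.
url = events_url("https://controller.example.com/", "motoshare", minutes=30)
print(url)
```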

6) Reduce noise and keep signals strong

Prefer baselines for BT latency/error rules; add minimum load guards.
Use “X of last Y minutes” and re-arm times to avoid flapping.
Route infra vs app alerts to different actions/recipients.
Add maintenance windows for deploys.
Use Email Digests for daily/weekly rollups; keep real-time policies for actionable issues.
Review which rules and policies fire most often (the top talkers) and tune them monthly.
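
The maintenance-window idea boils down to a time check before notifying. The Controller handles this for you once a window is configured; the sketch below only illustrates the logic, and the Wednesday 22:00 to 23:30 window is a made-up example.

```python
from datetime import datetime, time


def in_maintenance(now: datetime, windows) -> bool:
    """windows: list of (weekday, start, end) tuples; weekday 0 is Monday."""
    return any(
        now.weekday() == day and start <= now.time() < end
        for day, start, end in windows
    )


# Hypothetical weekly deploy window: Wednesdays 22:00 to 23:30.
windows = [(2, time(22, 0), time(23, 30))]

during = in_maintenance(datetime(2024, 1, 3, 22, 15), windows)   # a Wednesday
before = in_maintenance(datetime(2024, 1, 3, 21, 59), windows)   # same day, earlier
next_day = in_maintenance(datetime(2024, 1, 4, 22, 15), windows) # a Thursday
print(during, before, next_day)
```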

7) Example: three core policies you can add now

App – BT Slow (Critical) → Slack + Email
Event: Health Rule Violation – BT Slow (Critical) Opens → Actions: Slack + Email.
Infra – Node CPU (Critical) → Email only
Event: Health Rule Violation – CPU Hot Opens → Action: Email—SRE.
Recoveries – All Critical Closes → “green” message
Event: Health Rule Violation – Closes (filter to the two rules above) → Actions: Slack “recovered” + Email.

8) Nice-to-have add-ons

HTTP Request Templates & Alerting Templates: standardize payloads/messages once, reuse everywhere.
Mutual TLS Configuration: if your webhook target (e.g., internal gateway) requires client certs.
Runbook links: include a URL in email/webhook pointing to a wiki playbook per alert.
Auto-diagnostics: attach a thread dump or snapshot action to severe BT rules.

AppDynamics Health Rules + Policies (Starter Kit for motoshare)

1) Health Rules

🔹 Node CPU Hot

Scope: Tiers & Nodes → All nodes in motoshare
Critical: CPU % Busy > 85% for 5 of last 5 min
Warning: CPU % Busy > 75% for 5 of last 5 min
Re-arm: 10 min

🔹 Node Memory Pressure

Scope: Tiers & Nodes → All nodes in motoshare
Critical: Memory % Used > 90% for 10 of last 10 min
Warning: Memory % Used > 80%
Re-arm: 15 min

🔹 Business Transaction – Slow

Scope: Business Transactions → All BTs in motoshare
Critical: Avg Response Time > Baseline + 3σ for ≥ 30 cpm traffic, 5 of 10 min
Warning: Avg Response Time > Baseline + 2σ
Re-arm: 10 min

🔹 Business Transaction – Errors

Scope: Business Transactions → All BTs in motoshare
Critical: Error % > 5% AND Errors/min > 30, for 5 of 10 min
Warning: Error % > 2%
Re-arm: 10 min

🔹 Container Restarts (if Server/Container visibility enabled)

Scope: Containers in motoshare
Critical: Restart Count > 0 in 10 min
Re-arm: 20 min

2) Actions

Create these once under Alert & Respond → Actions:

Email—SRE On-Call
- To: sre-team@motoshare.com
- Subject: [AppD][${severity}] ${application} ${eventType}: ${summary}
- Body: include ${healthRuleName}, ${nodeName}, ${eventMessage}, ${incidentID}
Slack—#alerts
- Type: HTTP Request Action
- URL: Slack Webhook
- JSON payload: { "text": "*[AppDynamics Alert]*\nApp: ${application}\nEvent: ${eventType}\nRule: ${healthRuleName}\nNode: ${nodeName}\nMessage: ${eventMessage}\nLink: ${eventLink}" }
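
Before pasting a payload like the one above into an action, it can help to preview it locally. The sketch below uses Python's string.Template, which happens to share the ${name} placeholder syntax; it is not the templating engine AppDynamics uses, and the sample event values are invented. It shows why the message text should pass through a JSON encoder: quotes and line breaks in event messages must be escaped for the payload to stay valid.

```python
import json
from string import Template

# Shortened version of the Slack message, using the guide's shorthand variables.
PAYLOAD_TEXT = Template(
    "*[AppDynamics Alert]*\n"
    "App: ${application}\n"
    "Rule: ${healthRuleName}\n"
    "Message: ${eventMessage}"
)


def render_payload(event: dict) -> str:
    """Fill in the variables, then JSON-encode so quotes and newlines are escaped."""
    text = PAYLOAD_TEXT.substitute(event)  # raises KeyError if a variable is missing
    return json.dumps({"text": text})


# Invented sample values for a local preview.
sample = {
    "application": "motoshare",
    "healthRuleName": "Node CPU Hot",
    "eventMessage": 'CPU "busy" above 85%',
}
payload = render_payload(sample)
print(payload)
```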

3) Policies

🔹 Policy 1: Infra – Node CPU Critical

Event: Health Rule Violation Opens → “Node CPU Hot”
Action: Email—SRE On-Call
Rate Limit: Once every 15 min

🔹 Policy 2: Infra – Node Memory Critical

Event: Health Rule Violation Opens → “Node Memory Pressure”
Action: Email—SRE On-Call
Rate Limit: Once every 20 min

🔹 Policy 3: App – BT Slow

Event: Health Rule Violation Opens → “BT Slow”
Action: Slack—#alerts + Email—SRE On-Call
Rate Limit: Once every 30 min

🔹 Policy 4: App – BT Errors

Event: Health Rule Violation Opens → “BT Errors”
Action: Slack—#alerts + Email—SRE On-Call
Rate Limit: Once every 30 min

🔹 Policy 5: Recoveries

Event: Health Rule Violation Closes (all above rules)
Action: Slack—#alerts (Recovery message)
Rate Limit: Once every 15 min
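
One way to sanity-check the five policies is to write them down as data and ask which actions a given event would trigger. The sketch below is a plain-Python model of the routing, not an AppDynamics export format; "email_sre" and "slack_alerts" are shorthand keys standing for the email and Slack actions from the Actions section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    name: str
    event: str          # "OPENS" or "CLOSES"
    rules: frozenset    # health rule names the policy matches
    actions: tuple      # shorthand action keys (assumption, see lead-in)


ALL_RULES = frozenset({"Node CPU Hot", "Node Memory Pressure", "BT Slow", "BT Errors"})

POLICIES = [
    Policy("Infra - Node CPU Critical", "OPENS", frozenset({"Node CPU Hot"}), ("email_sre",)),
    Policy("Infra - Node Memory Critical", "OPENS", frozenset({"Node Memory Pressure"}), ("email_sre",)),
    Policy("App - BT Slow", "OPENS", frozenset({"BT Slow"}), ("slack_alerts", "email_sre")),
    Policy("App - BT Errors", "OPENS", frozenset({"BT Errors"}), ("slack_alerts", "email_sre")),
    Policy("Recoveries", "CLOSES", ALL_RULES, ("slack_alerts",)),
]


def actions_for(event: str, rule: str) -> list:
    """Return the de-duplicated actions of every policy matching the event and rule."""
    matched = []
    for policy in POLICIES:
        if policy.event == event and rule in policy.rules:
            for action in policy.actions:
                if action not in matched:
                    matched.append(action)
    return matched


print(actions_for("OPENS", "BT Slow"))
print(actions_for("CLOSES", "Node CPU Hot"))
print(actions_for("OPENS", "Container Restarts"))
```

The last call returns an empty list: as written, none of the five policies references the Container Restarts rule, so that rule would violate without notifying anyone until a policy is added for it.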

4) Testing & Validation

Temporarily lower thresholds (e.g., CPU > 5%) to trigger an alert.
Check Alert & Respond → Events for policy hits.
Confirm notifications in Slack and email.
Reset thresholds back to production values.

✅ With this pack:

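Infra alerts (CPU, memory) go to the SRE on-call email action. Container restarts are not covered by any of the five policies above, so add a matching policy if you want them routed there too.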
App-level issues (BT slow, BT errors) go to Slack + Email.
Recovery events let the team know when it’s safe.

Rajesh Kumar
