Datadog Agent Troubleshooting Guide

The Datadog Agent is a monitoring agent that collects metrics and logs from your infrastructure, applications, and services. If you are experiencing issues with the Datadog Agent, you can troubleshoot it using the following steps:

  1. Check the Agent Status: Check if the Datadog Agent is running on the machine by running the command “sudo service datadog-agent status” or “sudo systemctl status datadog-agent”. If the agent is not running, try restarting the agent by running the command “sudo service datadog-agent restart” or “sudo systemctl restart datadog-agent”.
  2. Check the Agent Logs: Check the logs for any error messages that may indicate issues with the agent. The agent logs can be found in the /var/log/datadog directory. Check for any error messages or stack traces and try to resolve the issue based on the error message.
  3. Check the Configuration: Verify the agent configuration to ensure that it is properly configured. The agent configuration file is located in /etc/datadog-agent/datadog.yaml. Check for any errors in the configuration file and ensure that it is properly configured to collect the metrics and logs that you need.
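  4. Check the Connectivity: Check whether the Agent can reach the Datadog backend. On Agent v6 and v7, run “sudo datadog-agent status” (the older “info” command applies only to Agent v5) and review the Forwarder and API key sections of the output. If the API key is reported as valid and transactions are succeeding, the Agent is connected to the backend. If the command reports that the Agent is not running, or the forwarder shows dropped or failing transactions, there may be an issue with the Agent configuration or with connectivity.
  5. Check for Firewalls: If a firewall runs on the machine or on the network path, ensure that the Agent is allowed to make outbound connections to the Datadog backend. Most Agent traffic goes out over HTTPS on port 443. Port 8125 is different: it is the local UDP port on which the Agent's DogStatsD server listens for metrics from your applications, so it only needs to be reachable by the clients that send to it. If outbound port 443 is blocked, the Agent cannot deliver data to the backend.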
  6. Check for Resource Constraints: Check if the agent is running into resource constraints, such as running out of disk space or memory. If the agent is running into resource constraints, try freeing up resources or increasing the resource limits for the agent.
  7. Contact Support: If you are still experiencing issues with the Datadog Agent, contact Datadog Support for assistance. Provide any relevant error messages or logs to help them diagnose and resolve the issue.
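
For the resource check in step 6, a small helper can flag file systems that are nearly full. This is a minimal sketch, not part of the Agent: the function name disk_usage_over is hypothetical, and it assumes the POSIX column layout printed by “df -P”.

```shell
# Print mount points whose usage is at or above a threshold (percent).
# Reads `df -P` output on stdin, so it also works on saved output.
# Assumption: mount point paths contain no spaces.
disk_usage_over() {
  awk -v limit="$1" 'NR > 1 {
    use = $5
    sub(/%/, "", use)                      # "91%" -> "91"
    if (use + 0 >= limit + 0) print $6 " " use "%"
  }'
}
```

A typical call would be “df -P /var/log/datadog /opt/datadog-agent | disk_usage_over 90”, assuming the Agent is installed under /opt/datadog-agent as it usually is on Linux.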

1. Check whether the Datadog Agent process is running

$ sudo systemctl status datadog-agent
$ sudo systemctl start datadog-agent
$ sudo systemctl stop datadog-agent
$ sudo systemctl restart datadog-agent

2. Check the Datadog Agent logs for errors

$ ls /var/log/datadog/
$ sudo more /etc/datadog-agent/datadog.yaml
$ more /var/log/datadog/agent.log
$ more /var/log/datadog/process-agent.log
$ more /var/log/datadog/trace-agent.log

$ sudo grep -i error /etc/datadog-agent/datadog.yaml
$ grep -i error /var/log/datadog/agent.log
$ grep -i error /var/log/datadog/process-agent.log
$ grep -i error /var/log/datadog/trace-agent.log
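
To get a quick overview before reading a log in full, the lines can be counted per log level. The sketch below uses a hypothetical helper, count_log_levels, and assumes the pipe-delimited layout the Agent normally writes (timestamp | component | LEVEL | (source) | message). Lines that do not follow that layout, such as stack traces, are skipped.

```shell
# Count log lines per level in one Agent log file.
# Assumed line layout: <timestamp> | <component> | <LEVEL> | (<source>) | <message>
count_log_levels() {
  awk -F ' [|] ' 'NF >= 3 { count[$3]++ }
    END { for (level in count) print level, count[level] }' "$1" | sort
}
```

For example, “count_log_levels /var/log/datadog/agent.log” prints one line per level with its count. A sudden jump in ERROR or WARN lines usually marks where to start reading.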

3. Check the Datadog Agent configuration file (datadog.yaml) for syntax errors
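
One frequent cause of YAML syntax errors is indentation with tab characters, which YAML does not allow. The helper below is a rough sketch with a hypothetical name, find_yaml_tabs. It catches only this one class of error and is not a full YAML validator.

```shell
# Print the numbered lines that are indented with a tab character.
# Returns 0 when the file has no tab-indented line, 1 otherwise.
find_yaml_tabs() {
  if grep -n "$(printf '^[ ]*\t')" "$1"; then
    return 1
  fi
  return 0
}
```

Running “sudo sh -c '. ./helpers.sh; find_yaml_tabs /etc/datadog-agent/datadog.yaml'” is one way to use it, where helpers.sh is an assumed file that holds the function. The same check can be pointed at any integration conf.yaml.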

4. Check the integration configuration files (for example /etc/datadog-agent/conf.d/apache.d/conf.yaml) for syntax errors

$ datadog-agent status
$ datadog-agent check apache
$ sudo -u dd-agent datadog-agent check apache
$ datadog-agent config
$ datadog-agent diagnose
$ datadog-agent health
$ datadog-agent integration
$ datadog-agent integration show datadog-apache

5. Check the Datadog Agent API key in datadog.yaml
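
A quick sanity check is to confirm that the key in datadog.yaml has the expected shape without printing it in full. The sketch below assumes that API keys are 32 hexadecimal characters and that the key sits on a single top-level “api_key:” line with no trailing comment. The function name masked_api_key is hypothetical.

```shell
# Print only the last five characters of the api_key in a datadog.yaml.
# Returns 1 if the key is missing or is not 32 hexadecimal characters.
masked_api_key() {
  key="$(sed -n 's/^api_key:[[:space:]]*//p' "$1" | head -n 1 | tr -d "\"'[:space:]")"
  case "$key" in
    ""|*[!0-9a-fA-F]*) return 1 ;;
  esac
  [ "${#key}" -eq 32 ] || return 1
  printf 'API key ending with %s\n' "$(printf '%s' "$key" | tail -c 5)"
}
```

The printed suffix can then be compared with the key shown in your Datadog organization settings and with the key suffix reported by “datadog-agent status”.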

6. Check whether environment variables set for the Datadog Agent conflict with datadog.yaml
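
Settings passed as DD_* environment variables take precedence over the same settings in datadog.yaml, so it helps to see what the running process actually received. The sketch below reads a NUL-separated environment file such as /proc/<pid>/environ on Linux. The function name dd_env_from_file is hypothetical.

```shell
# Print the DD_* variables found in a NUL-separated environ file, sorted.
# Note: the output can include secrets such as DD_API_KEY.
dd_env_from_file() {
  tr '\0' '\n' < "$1" | grep '^DD_' | sort
}
```

On a systemd host, one possible invocation as root is “dd_env_from_file /proc/$(systemctl show -p MainPID --value datadog-agent)/environ”. Compare each variable printed with the matching key in datadog.yaml.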

7. Print the runtime configuration of a running agent

$ datadog-agent config

8. Print all configurations loaded and resolved by a running agent

$ datadog-agent configcheck

9. Run connectivity diagnostics on your system

$ datadog-agent diagnose
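
Connectivity can also be tested independently of the Agent by calling Datadog's key validation endpoint with curl. This is a sketch under assumptions: the two function names are hypothetical, the site defaults to datadoghq.com, and the API host is assumed to be “api.” followed by your site name.

```shell
# Build the key-validation URL for a Datadog site (default: datadoghq.com).
dd_validate_url() {
  printf 'https://api.%s/api/v1/validate\n' "${1:-datadoghq.com}"
}

# Print the HTTP status code returned for an API key.
# $1 = API key, $2 = site (optional). Requires curl and outbound port 443.
dd_validate_key() {
  curl -sS -o /dev/null -w '%{http_code}\n' --max-time 10 \
    -H "DD-API-KEY: $1" "$(dd_validate_url "$2")"
}
```

A 200 response means the endpoint was reachable and accepted the key, and 403 means it was reachable but rejected the key. If curl prints 000 or an error, the request never got an HTTP response, which points to DNS, proxy, or firewall problems rather than to the key.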

10. Print the current agent health

$ datadog-agent health

11. Print basic statistics on the metrics processed by dogstatsd

$ datadog-agent dogstatsd-stats
Rajesh Kumar
Manoj Singh
8 months ago

Hi Rajesh, I wanted to reach out regarding an issue I’ve encountered after installing the Datadog agent (version 7.45) on my two Windows EC2 machines.
Specifically, I have one machine that is running with the AWS metadata version IMDSv1, and another machine running with IMDSv2. While setting up the Datadog agent, I noticed that the EC2 tags are not being synced properly on the machine utilizing IMDSv2.
I’ve reviewed my configuration to ensure that the setup is correct for both instances and that the appropriate IAM roles and permissions are in place for accessing EC2 metadata and tags. Additionally, I’ve checked the role permissions to make sure they are correctly configured.
I’ve examined the Datadog agent logs on both machines, but I haven’t been able to identify any error messages or warnings related to the issue. Furthermore, I confirmed that there are no network-related problems, such as firewall rules or security group settings, that might be impeding communication with the metadata endpoints.
Considering the steps I’ve taken so far, I’m uncertain about what might be causing this synchronization problem for the EC2 tags on the machine with IMDSv2. I’m wondering if you could provide any insights, suggestions, or further troubleshooting steps that could help me resolve this issue.
