This post walks through installing the AWS CloudWatch Agent on EC2 (Amazon Linux).
1. Goal
Install the CloudWatch Agent on EC2 and collect Disk and Memory metrics into CloudWatch.
AWS permissions come from an IAM Role, and the Agent service runs as the EC2 OS user root.
Use the CloudWatch metrics to monitor EC2 disk and memory.
2. Test environment
Service : EC2
OS : Amazon Linux 2023
The AWS configuration is done in the AWS Web Console.
3. AWS environment setup
3.1 Create the IAM Role
- Create Role
- Trusted entity type: AWS service
- Use Case: EC2
- Add permissions: CloudWatchAgentServerPolicy
- Everything else: default settings
3.2 Attach the IAM Role to the EC2 instance
- Right-click the EC2 instance, then Security > Modify IAM Role
- IAM Role: the IAM Role created above
- Click Update IAM Role
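The console steps in 3.1 and 3.2 can also be sketched with the AWS CLI. This is a minimal sketch, not the method used in this post: the role and instance profile names are hypothetical, the caller is assumed to have IAM and EC2 permissions, and a short wait may be needed before a newly created instance profile can be associated. Unlike the console, the CLI does not create the instance profile for you, so it is created explicitly here.

```shell
# Hypothetical names; replace them with your own.
ROLE_NAME="CloudWatchAgentRole"
PROFILE_NAME="CloudWatchAgentProfile"

# Write the trust policy that lets the EC2 service assume the role
# (what "Trusted entity type: AWS service / Use case: EC2" sets up).
write_trust_policy() {
    cat > "$1" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

# Create the role, attach CloudWatchAgentServerPolicy, and associate
# the role with the instance whose ID is passed as the first argument.
create_and_attach_role() {
    instance_id=$1
    policy_file=$(mktemp)
    write_trust_policy "$policy_file"
    aws iam create-role --role-name "$ROLE_NAME" \
        --assume-role-policy-document "file://$policy_file" &&
    aws iam attach-role-policy --role-name "$ROLE_NAME" \
        --policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy &&
    aws iam create-instance-profile --instance-profile-name "$PROFILE_NAME" &&
    aws iam add-role-to-instance-profile --instance-profile-name "$PROFILE_NAME" \
        --role-name "$ROLE_NAME" &&
    aws ec2 associate-iam-instance-profile --instance-id "$instance_id" \
        --iam-instance-profile "Name=$PROFILE_NAME"
    status=$?
    rm -f "$policy_file"
    return $status
}

# Usage (the instance ID is a placeholder):
#   create_and_attach_role i-0123456789abcdef0
```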
4. CloudWatch Agent installation commands
sudo su - root
sudo yum install amazon-cloudwatch-agent
find / -name amazon-cloudwatch-agent-config-wizard
Output: /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard
5. Create the config file for the CloudWatch Agent
cd /opt/aws/amazon-cloudwatch-agent/bin/
./amazon-cloudwatch-agent-config-wizard
================================================================
= Welcome to the Amazon CloudWatch Agent Configuration Manager =
=                                                              =
= CloudWatch Agent allows you to collect metrics and logs from =
= your host and send them to CloudWatch. Additional CloudWatch =
= charges may apply.                                           =
================================================================
On which OS are you planning to use the agent?
1. linux
2. windows
3. darwin
default choice: [1]:
Trying to fetch the default region based on ec2 metadata...
Are you using EC2 or On-Premises hosts?
1. EC2
2. On-Premises
default choice: [1]:
Which user are you planning to run the agent?
1. root
2. cwagent
3. others
default choice: [1]:
Do you want to turn on StatsD daemon?
1. yes
2. no
default choice: [1]:
Which port do you want StatsD daemon to listen to?
default choice: [8125]
What is the collect interval for StatsD daemon?
1. 10s
2. 30s
3. 60s
default choice: [1]:
What is the aggregation interval for metrics collected by StatsD daemon?
1. Do not aggregate
2. 10s
3. 30s
4. 60s
default choice: [4]:
Do you want to monitor metrics from CollectD? WARNING: CollectD must be installed or the Agent will fail to start
1. yes
2. no
default choice: [1]:
Do you want to monitor any host metrics? e.g. CPU, memory, etc.
1. yes
2. no
default choice: [1]:
Do you want to monitor cpu metrics per core?
1. yes
2. no
default choice: [1]:
2
Do you want to add ec2 dimensions (ImageId, InstanceId, InstanceType, AutoScalingGroupName) into all of your metrics if the info is available?
1. yes
2. no
default choice: [1]:
Do you want to aggregate ec2 dimensions (InstanceId)?
1. yes
2. no
default choice: [1]:
Would you like to collect your metrics at high resolution (sub-minute resolution)? This enables sub-minute resolution for all metrics, but you can customize for specific metrics in the output json file.
1. 1s
2. 10s
3. 30s
4. 60s
default choice: [4]:
Which default metrics config do you want?
1. Basic
2. Standard
3. Advanced
4. None
default choice: [1]:
Current config as follows:
{
"agent": {
"metrics_collection_interval": 60,
"run_as_user": "root"
},
"metrics": {
"aggregation_dimensions": [
[
"InstanceId"
]
],
"append_dimensions": {
"AutoScalingGroupName": "${aws:AutoScalingGroupName}",
"ImageId": "${aws:ImageId}",
"InstanceId": "${aws:InstanceId}",
"InstanceType": "${aws:InstanceType}"
},
"metrics_collected": {
"collectd": {
"metrics_aggregation_interval": 60
},
"disk": {
"measurement": [
"used_percent"
],
"metrics_collection_interval": 60,
"resources": [
"*"
]
},
"mem": {
"measurement": [
"mem_used_percent"
],
"metrics_collection_interval": 60
},
"statsd": {
"metrics_aggregation_interval": 60,
"metrics_collection_interval": 10,
"service_address": ":8125"
}
}
}
}
Are you satisfied with the above config? Note: it can be manually customized after the wizard completes to add additional items.
1. yes
2. no
default choice: [1]:
Do you have any existing CloudWatch Log Agent (http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AgentReference.html) configuration file to import for migration?
1. yes
2. no
default choice: [2]:
Do you want to monitor any log files?
1. yes
2. no
default choice: [1]:
2
Saved config file to /opt/aws/amazon-cloudwatch-agent/bin/config.json successfully.
Current config as follows:
{
"agent": {
"metrics_collection_interval": 60,
"run_as_user": "root"
},
"metrics": {
"aggregation_dimensions": [
[
"InstanceId"
]
],
"append_dimensions": {
"AutoScalingGroupName": "${aws:AutoScalingGroupName}",
"ImageId": "${aws:ImageId}",
"InstanceId": "${aws:InstanceId}",
"InstanceType": "${aws:InstanceType}"
},
"metrics_collected": {
"collectd": {
"metrics_aggregation_interval": 60
},
"disk": {
"measurement": [
"used_percent"
],
"metrics_collection_interval": 60,
"resources": [
"*"
]
},
"mem": {
"measurement": [
"mem_used_percent"
],
"metrics_collection_interval": 60
},
"statsd": {
"metrics_aggregation_interval": 60,
"metrics_collection_interval": 10,
"service_address": ":8125"
}
}
}
}
Please check the above content of the config.
The config file is also located at /opt/aws/amazon-cloudwatch-agent/bin/config.json.
Edit it manually if needed.
Do you want to store the config in the SSM parameter store?
1. yes
2. no
default choice: [1]:
2
Program exits now.
6. Notes and recommendations
E! [telegraf] Error running agent: Error parsing /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml, open /usr/share/collectd/types.db: no such file or directory
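If the error above occurs, run the commands below to create the collectd directory and the types.db file. The error appears to come from the collectd section of the config, which the wizard generates when its CollectD question is answered with the default (yes) while CollectD itself is not installed. In practice, it is simpler to create the directory and file up front, before the error ever shows up.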
mkdir /usr/share/collectd
touch /usr/share/collectd/types.db
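The same workaround can be written so that it is safe to run more than once. This is a small sketch; the function name is hypothetical, and the directory defaults to the path from the error message above.

```shell
# Create the collectd directory and an empty types.db if they are missing.
# "mkdir -p" does not fail when the directory already exists, and "touch"
# leaves an existing file's contents untouched.
ensure_types_db() {
    dir=${1:-/usr/share/collectd}
    mkdir -p "$dir" && touch "$dir/types.db"
}

# Usage (run as root for the default path):
#   ensure_types_db
```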
7. Check the config file
{
    "agent": {
        "metrics_collection_interval": 60,
        "run_as_user": "root"
    },
    "metrics": {
        "aggregation_dimensions": [
            [
                "InstanceId"
            ]
        ],
        "append_dimensions": {
            "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
            "ImageId": "${aws:ImageId}",
            "InstanceId": "${aws:InstanceId}",
            "InstanceType": "${aws:InstanceType}"
        },
        "metrics_collected": {
            "collectd": {
                "metrics_aggregation_interval": 60
            },
            "disk": {
                "measurement": [
                    "used_percent"
                ],
                "metrics_collection_interval": 60,
                "resources": [
                    "*"
                ]
            },
            "mem": {
                "measurement": [
                    "mem_used_percent"
                ],
                "metrics_collection_interval": 60
            },
            "statsd": {
                "metrics_aggregation_interval": 60,
                "metrics_collection_interval": 10,
                "service_address": ":8125"
            }
        }
    }
}
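Before applying the config, it can help to confirm that the disk and memory measurements are actually in the file. The helper below is a hypothetical sketch that only greps for the quoted measurement names; it does not validate the JSON itself.

```shell
# Succeed only if the config file contains every measurement name
# passed after the file path (matched as a quoted JSON string).
config_has_measurements() {
    config_file=$1
    shift
    if [ ! -r "$config_file" ]; then
        echo "cannot read $config_file" >&2
        return 1
    fi
    for name in "$@"; do
        if ! grep -q "\"$name\"" "$config_file"; then
            echo "missing measurement: $name" >&2
            return 1
        fi
    done
    return 0
}

# Usage:
#   config_has_measurements /opt/aws/amazon-cloudwatch-agent/bin/config.json \
#       used_percent mem_used_percent
```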
8. Apply the generated config file
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json -s
9. Status, Start, and Stop commands
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a status
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a start
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a stop
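For scripting, the status output can be checked instead of read by eye. This sketch assumes the status action prints JSON containing a "status" field whose value is "running" while the agent is up; verify that against the output on your own instance before relying on it. The function name is hypothetical.

```shell
# Read the status output on stdin and succeed only if it reports "running".
# Assumption: the output contains a line such as  "status": "running"
agent_is_running() {
    grep -Eq '"status"[[:space:]]*:[[:space:]]*"running"'
}

# Usage:
#   sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a status \
#       | agent_is_running && echo "agent is running"
```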
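10. Result
Wait about 15 minutes, then check the CloudWatch metrics in the AWS Console to confirm that they are being collected normally.
Appendix. When Disk and Memory metrics are not collected after installing the CloudWatch Agent
1) Check the logs
Check the logs at the paths below.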
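The same check can be sketched from the command line. The agent publishes to the CWAgent namespace by default, and with the config above the metric names should be disk_used_percent and mem_used_percent. The function name and the instance ID are placeholders, and the caller is assumed to have permission to list CloudWatch metrics.

```shell
# Print the names of the CWAgent metrics reported for one instance.
list_cwagent_metrics() {
    instance_id=$1
    aws cloudwatch list-metrics \
        --namespace CWAgent \
        --dimensions "Name=InstanceId,Value=$instance_id" \
        --query 'Metrics[].MetricName' \
        --output text
}

# Usage (the instance ID is a placeholder):
#   list_cwagent_metrics i-0123456789abcdef0
```

An empty result after the waiting period suggests the agent is not publishing, in which case the log check in the troubleshooting notes is the next step.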
/opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log
/var/log/amazon/amazon-cloudwatch-agent/amazon-cloudwatch-agent.log
The log below is an insufficient-permission error. In this case the request was being blocked by an IP restriction policy.
[processors.ec2tagger] ec2tagger: Unable to describe ec2 tags for initial retrieval: UnauthorizedOperation: You are not authorized to perform this operation.
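2) Check that a Role is attached to the EC2 instance
3) Check that the Role has sufficient permissions
If you use an IP restriction policy or an MFA enforcement policy, a Deny statement may be blocking the requests unintentionally.
4) Check that the EC2 instance is actually using the attached Role (check the credentials currently in use)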
aws sts get-caller-identity
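5) Check that the EC2 instance can reach monitoring.amazonaws.com on port 443 (the agent normally sends metrics to the regional endpoint, monitoring.<region>.amazonaws.com, so test that host as well)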
telnet monitoring.amazonaws.com 443
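telnet is often not installed on a fresh instance. As an alternative sketch, the check below uses bash's /dev/tcp redirection with a 5-second timeout. The function name is hypothetical, and it assumes bash and the coreutils timeout command are available.

```shell
# Succeed if a TCP connection to host:port can be opened within 5 seconds.
can_connect() {
    host=$1
    port=$2
    timeout 5 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Usage (the region is a placeholder):
#   can_connect monitoring.ap-northeast-2.amazonaws.com 443 && echo reachable
```

This only confirms that the TCP port is reachable. It says nothing about IAM permissions, so it complements the credential checks above rather than replacing them.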