Prometheus Installation Guide
2023-12-14

Installing Prometheus

# Assumes the working directory is /d/app/prometheus
# Download
wget https://github.com/prometheus/prometheus/releases/download/v2.45.1/prometheus-2.45.1.linux-amd64.tar.gz
# Extract (the version must match the archive downloaded above)
tar xvf prometheus-2.45.1.linux-amd64.tar.gz
# Rename so the path matches ExecStart below
mv prometheus-2.45.1.linux-amd64 prometheus
# Create the prometheus user (no home directory, no login shell)
useradd -M -s /usr/sbin/nologin prometheus
# Give the prometheus user ownership of the install directory and the data directory
chown prometheus:prometheus -R /d/app/prometheus
mkdir -p /d/data/prometheus
chown prometheus:prometheus -R /d/data/prometheus
# Create the systemd service
cat > /lib/systemd/system/prometheus.service << "EOF"
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network-online.target

[Service]
Type=simple
User=prometheus
Group=prometheus
Restart=on-failure
ExecStart=/d/app/prometheus/prometheus/prometheus \
  --config.file=/d/app/prometheus/prometheus/prometheus.yml \
  --storage.tsdb.path=/d/data/prometheus \
  --storage.tsdb.retention.time=30d \
  --web.enable-lifecycle

[Install]
WantedBy=multi-user.target
EOF
# Check
ls -l /lib/systemd/system/prometheus.service
# Reload systemd
systemctl daemon-reload
# Start now and enable at boot
systemctl start prometheus
systemctl enable prometheus
# Open the port
firewall-cmd --zone=public --add-port=9090/tcp --permanent
firewall-cmd --reload
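The original steps downloaded the 2.45.1 archive but then extracted a 2.45.0 one. Keeping the version in a single variable prevents that kind of drift. The sketch below assumes the release URL pattern used throughout this guide. `release_url` and `release_dir` are hypothetical helper names.

```shell
# Hypothetical helpers: derive the download URL and the extracted directory name
# from a single version string, so wget and tar always refer to the same release.
# The URL pattern matches the prometheus, alertmanager and node_exporter links in this guide.

# $1 = project name, $2 = version without the leading "v"
release_url() {
  printf 'https://github.com/prometheus/%s/releases/download/v%s/%s-%s.linux-amd64.tar.gz\n' \
    "$1" "$2" "$1" "$2"
}

# Directory created by extracting the archive from release_url
release_dir() {
  printf '%s-%s.linux-amd64\n' "$1" "$2"
}

# Usage on the target host (working directory /d/app/prometheus):
#   VERSION=2.45.1
#   wget "$(release_url prometheus "$VERSION")"
#   tar xvf "$(release_dir prometheus "$VERSION").tar.gz"
#   mv "$(release_dir prometheus "$VERSION")" prometheus
```

The same two helpers also cover the Alertmanager (0.26.0) and node_exporter (1.6.1) downloads.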
 

Installing Alertmanager

# Assumes the working directory is /d/app/prometheus
# Download
wget https://github.com/prometheus/alertmanager/releases/download/v0.26.0/alertmanager-0.26.0.linux-amd64.tar.gz
# Extract
tar -xvf alertmanager-0.26.0.linux-amd64.tar.gz
# Rename so the path matches ExecStart below
mv alertmanager-0.26.0.linux-amd64 alertmanager
# Give the prometheus user ownership of the install directory and the data directory
chown prometheus:prometheus -R /d/app/prometheus
mkdir -p /d/data/alertmanager
chown prometheus:prometheus -R /d/data/alertmanager
# Create the systemd service
cat > /lib/systemd/system/alertmanager.service << "EOF"
[Unit]
Description=Alert Manager
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=prometheus
Group=prometheus
ExecStart=/d/app/prometheus/alertmanager/alertmanager \
  --config.file=/d/app/prometheus/alertmanager/alertmanager.yml \
  --storage.path=/d/data/alertmanager
Restart=always

[Install]
WantedBy=multi-user.target
EOF
# Check
ls -l /lib/systemd/system/alertmanager.service
# Reload systemd
systemctl daemon-reload
# Start now and enable at boot
systemctl start alertmanager
systemctl enable alertmanager
# Open the port
firewall-cmd --zone=public --add-port=9093/tcp --permanent
firewall-cmd --reload
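The Prometheus, Alertmanager, Grafana and node_exporter units in this guide share one skeleton and differ mainly in Description and ExecStart. Below is a sketch of a hypothetical `write_unit` helper that renders that skeleton. It uses Restart=on-failure, as the Prometheus unit does. The output directory is a parameter, so you can try it in a scratch directory before writing to /lib/systemd/system.

```shell
# Hypothetical helper: write <name>.service with the shared skeleton used in this guide.
# Arguments: unit name, description, full ExecStart command line, output directory.
write_unit() {
  local name="$1" desc="$2" exec_start="$3" outdir="$4"
  cat > "$outdir/$name.service" <<EOF
[Unit]
Description=$desc
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=prometheus
Group=prometheus
Restart=on-failure
ExecStart=$exec_start

[Install]
WantedBy=multi-user.target
EOF
}

# Usage on the host (then run: systemctl daemon-reload):
#   write_unit node_exporter "node_exporter" \
#     "/d/app/prometheus/node_exporter/node_exporter" /lib/systemd/system
```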

Adding the Prometheus configuration

Now that Alertmanager is installed, Prometheus needs to be told where to send its alerts.
$ vi /d/app/prometheus/prometheus/prometheus.yml
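Change the line "# - alertmanager:9093" to "- localhost:9093" (uncomment it and replace the host).
Alertmanager is installed on the same machine here, so the target is localhost. If it runs on a different machine, use that machine's IP address instead.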
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - localhost:9093   <-- change this line, keep the indentation
Note: YAML is indentation-sensitive, so keep the existing layout intact when editing.
Under rule_files:, add - "alert.yml", indented by two spaces only.
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "alert.yml"   <-- add this line, indented two spaces
  # - "first_rules.yml"
  # - "second_rules.yml"
Adding the alerting rule file
Create a new file alert.yml with the following content, again minding the indentation:
cat > /d/app/prometheus/prometheus/alert.yml << "EOF"
groups:
  - name: Prometheus alert
    rules:
      # Alert when any instance has been unreachable for more than 30s
      - alert: ServiceAlert
        expr: up == 0
        for: 30s
        labels:
          severity: critical
        annotations:
          instance: "{{ $labels.instance }}"
          description: "{{ $labels.job }} service is down"
EOF
Check the configuration. If the output matches the following, it is valid:
$ cd /d/app/prometheus/prometheus
$ ./promtool check config prometheus.yml
Checking prometheus.yml
 SUCCESS: 1 rule files found
 SUCCESS: prometheus.yml is valid prometheus config file syntax

Checking alert.yml
 SUCCESS: 1 rules found
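Next, either restart Prometheus or reload its configuration. The reload endpoint is available because the service was started with --web.enable-lifecycle.
$ systemctl restart prometheus
# or (either one is enough)
$ curl -X POST http://localhost:9090/-/reload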
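If the edited prometheus.yml is invalid, the reload request fails. It is therefore convenient to run promtool first and reload only when the check passes. Below is a sketch of a hypothetical `safe_reload` helper. It assumes promtool sits next to the prometheus binary and that the lifecycle API is enabled as in the unit file. Override PROMTOOL and PROM_URL if your layout differs.

```shell
# Hypothetical helper: validate a Prometheus config, then hot-reload only if it is valid.
PROMTOOL=${PROMTOOL:-/d/app/prometheus/prometheus/promtool}
PROM_URL=${PROM_URL:-http://localhost:9090}

safe_reload() {
  local config="$1"
  if ! "$PROMTOOL" check config "$config"; then
    echo "config check failed, not reloading" >&2
    return 1
  fi
  # Needs --web.enable-lifecycle on the Prometheus process
  curl -fsS -X POST "$PROM_URL/-/reload"
}

# Usage on the host:
#   safe_reload /d/app/prometheus/prometheus/prometheus.yml
```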
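Then open http://localhost:9093/ again and check the Status page to confirm that everything is working. The new rule should also be listed at http://localhost:9090/alerts.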

Installing Grafana

# Assumes the working directory is /d/app/prometheus
wget https://dl.grafana.com/enterprise/release/grafana-enterprise-10.1.4.linux-amd64.tar.gz
tar -zxvf grafana-enterprise-10.1.4.linux-amd64.tar.gz
# Rename the extracted directory to grafana so it matches ExecStart below
# (check the actual directory name with ls; this glob matches grafana-10.1.4 or grafana-v10.1.4)
mv grafana-*10.1.4 grafana
# Give the prometheus user ownership of the folder
chown prometheus:prometheus -R /d/app/prometheus
# Create the systemd service
cat > /lib/systemd/system/grafana.service << "EOF"
[Unit]
Description=Grafana
Documentation=http://docs.grafana.org

[Service]
Type=simple
User=prometheus
Group=prometheus
Restart=on-failure
ExecStart=/d/app/prometheus/grafana/bin/grafana-server \
  --config=/d/app/prometheus/grafana/conf/defaults.ini \
  --homepath=/d/app/prometheus/grafana

[Install]
WantedBy=multi-user.target
EOF
# Check
ls -l /lib/systemd/system/grafana.service
# Reload systemd
systemctl daemon-reload
# Start now and enable at boot
systemctl start grafana
systemctl enable grafana
# Open the port
firewall-cmd --zone=public --add-port=3000/tcp --permanent
firewall-cmd --reload
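Grafana can be pointed at Prometheus from a provisioning file instead of through the UI. The sketch below assumes the provisioning directory is conf/provisioning under the Grafana home directory used above (the default in defaults.ini). `write_prom_datasource` is a hypothetical name.

```shell
# Hypothetical helper: provision a Prometheus datasource for Grafana.
# Argument: Grafana provisioning directory.
write_prom_datasource() {
  local dir="$1/datasources"
  mkdir -p "$dir"
  cat > "$dir/prometheus.yaml" <<'EOF'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
EOF
}

# Usage on the host:
#   write_prom_datasource /d/app/prometheus/grafana/conf/provisioning
#   chown prometheus:prometheus -R /d/app/prometheus
#   systemctl restart grafana
```

Provisioning files are read when Grafana starts, hence the restart in the usage lines.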

Installing node_exporter

# Assumes the working directory is /d/app/prometheus
wget https://github.com/prometheus/node_exporter/releases/download/v1.6.1/node_exporter-1.6.1.linux-amd64.tar.gz
tar -zxvf node_exporter-1.6.1.linux-amd64.tar.gz
# Rename so the path matches ExecStart below
mv node_exporter-1.6.1.linux-amd64 node_exporter
# Give the prometheus user ownership of the folder
chown prometheus:prometheus -R /d/app/prometheus
# Create the systemd service
cat > /lib/systemd/system/node_exporter.service << "EOF"
[Unit]
Description=node_exporter
Documentation=https://prometheus.io/
After=network.target

[Service]
User=prometheus
Group=prometheus
ExecStart=/d/app/prometheus/node_exporter/node_exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
# Check
ls -l /lib/systemd/system/node_exporter.service
# Reload systemd
systemctl daemon-reload
# Start now and enable at boot
systemctl start node_exporter
systemctl enable node_exporter
# Open the port
firewall-cmd --zone=public --add-port=9100/tcp --permanent
firewall-cmd --reload
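Once the service is running, http://localhost:9100/metrics should return node_* metrics. Below is a small sketch for checking that from a script. `has_metric` is a hypothetical helper that looks for a sample line of the given metric in the Prometheus text exposition format read from stdin.

```shell
# Hypothetical helper: succeed if metric $1 has at least one sample line on stdin.
# Sample lines start with the metric name followed by "{" (labels) or a space;
# "# HELP" / "# TYPE" comment lines never match.
has_metric() {
  grep -q "^$1[{ ]"
}

# Usage on the host:
#   curl -s http://localhost:9100/metrics | has_metric node_cpu_seconds_total && echo "node_exporter OK"
```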

Adding the Prometheus configuration

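After installation, Prometheus still has to be configured to scrape node_exporter. To avoid typing mistakes, the job is appended with a heredoc. This works because scrape_configs is the last section of the default prometheus.yml, so the appended job lands inside it.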
$ cat >> /d/app/prometheus/prometheus/prometheus.yml <<"EOF"
  # Added under the scrape_configs section
  - job_name: "node-exporter"
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:9100"]
        labels:
          instance: prometheus server
EOF
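The cat >> append is only safe while scrape_configs is the last top-level section of prometheus.yml. If another section follows it, the appended job ends up under the wrong key. Below is a sketch of a check to run before appending. `last_top_level_key` is a hypothetical helper, and it only recognises plain `name:` keys at column 0.

```shell
# Hypothetical helper: print the last top-level key of a YAML file
# (a line starting with "name:" at column 0; indented lines and comments are skipped).
last_top_level_key() {
  awk '/^[A-Za-z_][A-Za-z0-9_]*:/ { key = $0; sub(/:.*/, "", key) } END { print key }' "$1"
}

# Usage on the host:
#   cfg=/d/app/prometheus/prometheus/prometheus.yml
#   [ "$(last_top_level_key "$cfg")" = scrape_configs ] || echo "scrape_configs is not last, edit by hand" >&2
```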
Reload the Prometheus configuration:
$ curl -X POST http://localhost:9090/-/reload

File-based service discovery

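Create the target file (run from /d/app/prometheus/prometheus, the directory that holds prometheus.yml):
mkdir -p targets
cat > targets/targets.yml << "EOF"
- targets:
    - "localhost:9090"
  labels:
    job: prometheus
    env: ops
- targets:
    - "localhost:9100"
  labels:
    instance: Prometheus server
    job: node-exporter
    env: ops
EOF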
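With many hosts, it is easier to generate the target file than to type it. Below is a sketch of a hypothetical `sd_group` helper that prints one target group in the same layout as targets/targets.yml. Writing to a temporary file and then renaming it is an assumed precaution, meant to keep Prometheus from picking up a half-written file.

```shell
# Hypothetical helper: print one file_sd target group.
# Arguments: job label, env label, then one or more host:port targets.
sd_group() {
  local job="$1" env="$2" t
  shift 2
  echo "- targets:"
  for t in "$@"; do
    echo "    - \"$t\""
  done
  echo "  labels:"
  echo "    job: $job"
  echo "    env: $env"
}

# Usage on the host (in /d/app/prometheus/prometheus):
#   { sd_group prometheus ops localhost:9090
#     sd_group node-exporter ops localhost:9100 10.0.0.2:9100
#   } > targets/targets.yml.tmp && mv targets/targets.yml.tmp targets/targets.yml
```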
Then rewrite prometheus.yml so that the scrape jobs read their targets from files. The springboot-demo job expects its own targets/springboot.yml, in the same format as above.
cat > prometheus.yml << "EOF"
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - localhost:9093
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "alert.yml"
  - "rules/*.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "file-sd-target"
    file_sd_configs:
      - refresh_interval: 10s
        files:
          - "targets/targets.yml"

  - job_name: "springboot-demo"
    metrics_path: "/actuator/prometheus"
    file_sd_configs:
      - refresh_interval: 10s
        files:
          - "targets/springboot.yml"
EOF
# Give the prometheus user ownership of the folder
chown prometheus:prometheus -R /d/app/prometheus
# Validate
./promtool check config prometheus.yml
# Reload the Prometheus configuration
curl -X POST http://localhost:9090/-/reload
 
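Email alerts (alertmanager.yml):
global:
  # SMTP server, in host:port form (Alertmanager rejects a host without a port;
  # 465 is an assumed value here, adjust it for your mail provider)
  smtp_smarthost: 'smtp.mail.qq.com:465'
  # Sender address
  smtp_from: 'ops@qq.com'
  # SMTP login user name
  smtp_auth_username: 'ops@qq.com'
  # SMTP password
  smtp_auth_password: 'xxxx'
  # Whether to require TLS (STARTTLS)
  smtp_require_tls: false

route:
  group_by: ['alertname']  # Grouping label
  group_wait: 10s          # After an alert fires, wait 10s so alerts of the same group go out together
  group_interval: 10s      # Interval between notifications for the same group
  repeat_interval: 5m      # Interval before re-sending the same alert, which limits duplicate emails (5 minutes here for testing)
  receiver: 'mail'         # Default receiver
  routes:                  # Sub-routes that decide which groups go to which receivers

receivers:
  - name: 'mail'
    email_configs:
      - to: 'ops@qq.com'

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']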
# Validate the configuration
./amtool check-config alertmanager.yml
# Reload without restarting
curl -X POST http://localhost:9093/-/reload
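To test the email route without waiting for a real failure, you can push an alert to Alertmanager's v2 API (POST /api/v2/alerts). In the sketch below, `test_alert_json` and `send_test_alert` are hypothetical names and the label values are placeholders. Values are inserted without JSON escaping, so they must not contain double quotes.

```shell
# Hypothetical helper: build a JSON array holding one alert.
# Arguments: alertname, severity, instance, summary, description.
test_alert_json() {
  printf '[{"labels":{"alertname":"%s","severity":"%s","instance":"%s"},"annotations":{"summary":"%s","description":"%s"}}]\n' \
    "$1" "$2" "$3" "$4" "$5"
}

# Hypothetical helper: post the alert to Alertmanager (AM_URL defaults to the local instance).
send_test_alert() {
  test_alert_json "$@" | curl -fsS -X POST -H 'Content-Type: application/json' \
    --data-binary @- "${AM_URL:-http://localhost:9093}/api/v2/alerts"
}

# Usage on the host:
#   send_test_alert TestAlert critical test-host "test mail" "sent by hand to check email delivery"
```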
# Go time layouts must be written with the reference time 2006-01-02 15:04:05
mkdir -p /d/app/prometheus/alertmanager/template
cat > /d/app/prometheus/alertmanager/template/email.tmpl <<"EOF"
{{ define "email.html" }}
{{- if gt (len .Alerts.Firing) 0 -}}{{ range .Alerts.Firing }}
<h2>@Alert notification</h2>
Alert source: Prometheus alert <br>
Severity: {{ .Labels.severity }} <br>
Alert type: {{ .Labels.alertname }} <br>
Affected host: {{ .Labels.instance }} <br>
Summary: {{ .Annotations.summary }} <br>
Details: {{ .Annotations.description }} <br>
Fired at: {{ .StartsAt.Local.Format "2006-01-02 15:04:05" }} <br>
{{ end }}{{ end -}}
{{- if gt (len .Alerts.Resolved) 0 -}}{{ range .Alerts.Resolved }}
<h2>@Alert resolved</h2>
Alert source: Prometheus alert <br>
Alert type: {{ .Labels.alertname }} <br>
Affected host: {{ .Labels.instance }} <br>
Summary: {{ .Annotations.summary }} <br>
Details: {{ .Annotations.description }} <br>
Fired at: {{ .StartsAt.Local.Format "2006-01-02 15:04:05" }} <br>
Resolved at: {{ .EndsAt.Local.Format "2006-01-02 15:04:05" }} <br>
{{ end }}{{ end -}}
{{- end }}
EOF
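The template only takes effect once alertmanager.yml loads it and the mail receiver references it. In addition, email_configs does not send resolved notifications unless send_resolved is enabled. The sketch below prints the pieces to merge by hand into the existing alertmanager.yml, assuming the paths and the 'mail' receiver used above. `email_template_fragment` is a hypothetical name.

```shell
# Hypothetical helper: print the alertmanager.yml pieces that load email.tmpl.
# "templates" is a new top-level key; html and send_resolved go into the
# existing email_configs entry of the 'mail' receiver.
email_template_fragment() {
  cat <<'EOF'
templates:
  - '/d/app/prometheus/alertmanager/template/*.tmpl'
receivers:
  - name: 'mail'
    email_configs:
      - to: 'ops@qq.com'
        html: '{{ template "email.html" . }}'
        send_resolved: true
EOF
}
```

After merging, validate with ./amtool check-config alertmanager.yml and reload as before.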
 