Prometheus Series: Installing and Configuring Prometheus
Basic structure of the configuration file
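- global: global settings
  - scrape_interval: how often targets are scraped; defaults to 1m
  - evaluation_interval: how often alerting rules are evaluated; defaults to 1m
  - scrape_timeout: how long a single scrape may run before it times out; defaults to 10s and must not exceed scrape_interval. If scrapes fail with "context deadline exceeded", set this field under the affected job
  - external_labels: labels that the server attaches when it communicates with external systems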
# Example
global:
  scrape_interval: 15s      # scrape every 15 seconds
  evaluation_interval: 15s  # evaluate rules every 15 seconds
  scrape_timeout: 15s       # time out a scrape after 15 seconds
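The per-job override mentioned for scrape_timeout can be sketched as follows. This is a minimal illustration, not a recommended setting: the job name slow_exporter and the 60s/30s values are assumptions chosen for the example. The job-level values take precedence over the global ones for this job only.

```yaml
scrape_configs:
  - job_name: 'slow_exporter'   # hypothetical job whose scrapes hit "context deadline exceeded"
    scrape_interval: 60s        # job-level override of global.scrape_interval
    scrape_timeout: 30s         # job-level override; must not exceed scrape_interval
    static_configs:
      - targets: ['localhost:9100']
```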
- alerting: optional; alerting-related settings, where one or more Alertmanager instances can be listed
  - alert_relabel_configs: relabels alerts before they are sent
  - alertmanagers: the Alertmanager instances that Prometheus sends alerts to
# Example
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093  # assumes Alertmanager is running locally on port 9093
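As a sketch of alert_relabel_configs, the fragment below drops one label from every alert before it is sent. The label name replica is an assumption for illustration (a common choice when two identical Prometheus servers feed the same Alertmanager); substitute whatever label applies in your setup.

```yaml
alerting:
  alert_relabel_configs:
    - action: labeldrop   # remove matching label names from outgoing alerts
      regex: replica      # hypothetical label name
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093
```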
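- scrape_configs: defines how Prometheus scrapes its targets; each scrape_config block describes one group of targets and the settings that apply to them
  - metrics_path: HTTP path scraped on each target; defaults to /metrics
  - scheme: protocol used for scraping; defaults to http
  - job_name: name that identifies this group of targets
  - static_configs: statically configured list of targets
    - targets: list of target addresses that Prometheus scrapes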
# Example
scrape_configs:
  - job_name: 'node_exporter'
    metrics_path: /metrics
    scheme: http
    static_configs:
      - targets: ['localhost:9100']  # assumes node_exporter is running locally on port 9100
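For a target that does not use the defaults, metrics_path and scheme are set explicitly. The sketch below is illustrative only: the job name, host name, path, and env label are all assumptions, not values from this article.

```yaml
scrape_configs:
  - job_name: 'app'                        # hypothetical job name
    scheme: https                          # scrape over HTTPS instead of the default http
    metrics_path: /actuator/prometheus     # hypothetical non-default metrics path
    static_configs:
      - targets: ['app.example.internal:8443']   # hypothetical target
        labels:
          env: dev                         # extra label attached to every series from this target
```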
- remote_write: optional; settings for writing samples to remote storage
- remote_read: optional; settings for reading samples from remote storage
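A minimal sketch of the two remote storage blocks is shown here. The endpoint URLs are placeholders (the real paths depend on the remote storage adapter in use), and read_recent: false is simply the documented default written out.

```yaml
remote_write:
  - url: 'http://remote-storage.example.internal:9201/write'   # hypothetical endpoint
remote_read:
  - url: 'http://remote-storage.example.internal:9201/read'    # hypothetical endpoint
    read_recent: false   # do not query remote storage for time ranges the local TSDB fully covers
```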
- rule_files: list of rule files that Prometheus loads; these files define the rules that trigger alerts
# Example
rule_files:
  - "alert_rules.yml"  # assumes the alerting rules are defined in alert_rules.yml
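The contents of such a rule file might look like the sketch below. The group name, alert name, threshold, and labels are assumptions for illustration; only the overall groups/rules layout is fixed by Prometheus.

```yaml
# alert_rules.yml (illustrative content)
groups:
  - name: node_alerts              # hypothetical group name
    rules:
      - alert: InstanceDown        # hypothetical alert name
        expr: up == 0              # fires when a scrape target is unreachable
        for: 5m                    # condition must hold for 5 minutes before firing
        labels:
          severity: critical
        annotations:
          summary: 'Instance {{ $labels.instance }} is down'
```

If a file like this is deployed with Ansible, copy it as a plain file rather than rendering it as a template, because the `{{ $labels.instance }}` expression would otherwise be interpreted by Jinja2.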
Prometheus.service
Basic options
- --config.file="prometheus.yml": path to the Prometheus configuration file, which defines how Prometheus discovers targets, scrapes metrics, and so on
Alerting options
- --alertmanager.notification-queue-capacity=10000: capacity of the queue of pending Alertmanager notifications
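Query options
- --query.lookback-delta=5m: maximum lookback duration for retrieving metrics during expression evaluation and federation
- --query.timeout=2m: maximum time a query may run before it is aborted
- --query.max-concurrency=20: maximum number of queries executed concurrently
- --query.max-samples=50000000: maximum number of samples a single query can load into memory; this also limits the number of samples a query can return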
Logging options
- --log.level=info: only log messages at or above the given severity
- --log.format=logfmt: output format of log messages
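Storage options
- --storage.tsdb.path="data/": base path for metrics storage; server mode only
- --storage.tsdb.retention.time: how long samples are retained in storage
- --storage.tsdb.retention.size: maximum number of bytes of samples retained in storage
- --[no-]storage.tsdb.no-lockfile: do not create a lock file in the data directory
- --storage.tsdb.head-chunks-write-queue-size=0: size of the queue used to write head chunks to disk
- --storage.agent.path="data-agent/": base path for metrics storage; agent mode only
- --storage.agent.wal-compression: compress the agent's WAL (write-ahead log)
- --storage.agent.retention.min-time: minimum age of samples before they are considered for deletion when the WAL is truncated
- --storage.agent.retention.max-time: maximum age of samples before they are forcibly deleted when the WAL is truncated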
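Web options
- --web.listen-address="0.0.0.0:9090": address and port on which the Prometheus UI, API, and telemetry listen. By default Prometheus listens on port 9090 on all interfaces
- --web.config.file="": path to the configuration file that enables TLS or authentication
- --web.read-timeout=5m: maximum duration before a request read times out and idle connections are closed
- --web.max-connections=512: maximum number of simultaneous connections
- --web.external-url=<URL>: URL under which Prometheus is externally reachable, typically used behind a reverse proxy. It is used to generate relative and absolute links back to Prometheus itself
- --web.route-prefix=<path>: prefix for the internal routes of web endpoints. Defaults to the path part of --web.external-url
- --web.user-assets=<path>: path to the static asset directory, served at /user
- --[no-]web.enable-lifecycle: enable shutdown and reload via HTTP request
- --[no-]web.enable-admin-api: enable API endpoints for admin control actions
- --[no-]web.enable-remote-write-receiver: enable the API endpoint that accepts remote write requests
- --web.console.templates="consoles": path to the console template directory, served at /consoles
- --web.console.libraries="console_libraries": path to the console library directory
- --web.page-title="Prometheus Time Series Collection and Processing Server": document title of the Prometheus instance
- --web.cors.origin=".*": regular expression for the CORS origin, used for cross-origin resource sharing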
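The file passed to --web.config.file is itself YAML. A minimal sketch that turns on both TLS and basic authentication is shown below; the certificate paths and the user name are assumptions, and the password value is a placeholder that must be replaced with a bcrypt hash you generate yourself.

```yaml
# web-config.yml (illustrative; pass it with --web.config.file=web-config.yml)
tls_server_config:
  cert_file: /etc/prometheus/tls/server.crt   # hypothetical certificate path
  key_file: /etc/prometheus/tls/server.key    # hypothetical private key path
basic_auth_users:
  admin: '<bcrypt-hash-of-password>'          # placeholder: user name mapped to a bcrypt hash
```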
Feature flags
- --enable-feature= ...: enables specific feature flags, which can be used to turn on experimental or advanced features
Setting up the monitoring environment (Ansible version)
Download: https://prometheus.io/download/
Directory layout
> hostk
> monitor.yml
> roles
>   - monitor
>     - vars
>       - main.yml
>     - tasks
>       - main.yml
>       - install_prometheus.yml
>       - install_grafana.yml
>     - templates
>       - prometheus.service.j2
>       - prometheus.yml.j2
Download, installation, and configuration
# hostk
[monitorServer]
monitor-server ansible_host=172.16.13.212 ansible_ssh_port=22 ansible_ssh_user='root' ansible_ssh_pass='123456'
[monitorServer:vars]
gra_version=7.5.4-1
pro_version=2.53.3
# monitor.yml
---
- name: monitor
  hosts: monitorServer
  become: yes
  vars_files:
    - roles/monitor/vars/main.yml
  roles:
    - monitor
# vars/main.yml
GRAFANA_VERSION: "{{ hostvars['monitor-server']['gra_version'] }}"
PROMETHEUS_VERSION: "{{ hostvars['monitor-server']['pro_version'] }}"
MONITOR_IP: "{{ ansible_default_ipv4.address }}"
SOURCE_DIR: /data/tools
# tasks/main.yml
---
- import_tasks: install_prometheus.yml
- import_tasks: install_grafana.yml
# tasks/install_prometheus.yml
---
- name: Install multiple packages using yum module
  ansible.builtin.yum:
    name:
      - fontconfig
      - urw-fonts
    state: present

# - name: Copy source prometheus to remote server (copy variant)
#   ansible.builtin.copy:
#     src: "{{ item }}"
#     dest: "{{ SOURCE_DIR }}"
#   with_fileglob:
#     - "../files/prometheus-{{ PROMETHEUS_VERSION }}.linux-amd64.tar.gz"

# The tarball is published under the prometheus/prometheus repository, not node_exporter.
# "creates" skips the download on later runs; "register" feeds the rename task below.
- name: Download and unpack prometheus-{{ PROMETHEUS_VERSION }}.linux-amd64.tar.gz (download variant)
  ansible.builtin.unarchive:
    src: 'https://github.com/prometheus/prometheus/releases/download/v{{ PROMETHEUS_VERSION }}/prometheus-{{ PROMETHEUS_VERSION }}.linux-amd64.tar.gz'
    dest: '{{ SOURCE_DIR }}'
    remote_src: yes
    creates: "{{ SOURCE_DIR }}/prometheus-{{ PROMETHEUS_VERSION }}"
  register: unarchive_result

# - name: unarchive source prometheus package
#   ansible.builtin.unarchive:
#     src: "{{ SOURCE_DIR }}/prometheus-{{ PROMETHEUS_VERSION }}.linux-amd64.tar.gz"
#     dest: "{{ SOURCE_DIR }}"
#     remote_src: yes
#     creates: "{{ SOURCE_DIR }}/prometheus-{{ PROMETHEUS_VERSION }}"
#   register: unarchive_result

- name: Rename extracted directory if necessary
  ansible.builtin.command: mv {{ SOURCE_DIR }}/prometheus-{{ PROMETHEUS_VERSION }}.linux-amd64 {{ SOURCE_DIR }}/prometheus-{{ PROMETHEUS_VERSION }}
  when: unarchive_result.changed

- name: create prometheus directory
  ansible.builtin.file:
    path: "{{ item }}"
    state: directory
    mode: '0755'
  with_items:
    - "{{ SOURCE_DIR }}/prometheus-{{ PROMETHEUS_VERSION }}/data"
    - "{{ SOURCE_DIR }}/prometheus-{{ PROMETHEUS_VERSION }}/rules"

- name: copy template prometheus.yml to remote server
  ansible.builtin.template:
    src: "prometheus.yml.j2"
    dest: "{{ SOURCE_DIR }}/prometheus-{{ PROMETHEUS_VERSION }}/prometheus.yml"

- name: Copy template prometheus.service to remote server
  ansible.builtin.template:
    src: "prometheus.service.j2"
    dest: "/usr/lib/systemd/system/prometheus.service"
    owner: root
    group: root

- name: start prometheus
  ansible.builtin.systemd:
    name: prometheus
    state: started
    enabled: yes
    daemon_reload: yes  # pick up the unit file written by the previous task

- name: check prometheus port is already running
  ansible.builtin.wait_for:
    port: 9090
    state: started
    delay: 1
    timeout: 60

# tasks/install_grafana.yml
---
# - name: Copy source grafana to remote server
#   ansible.builtin.copy:
#     src: "{{ item }}"
#     dest: "{{ SOURCE_DIR }}"
#   with_fileglob:
#     - "../files/grafana-{{ GRAFANA_VERSION }}.x86_64.rpm"

# The URL uses GRAFANA_VERSION so that it matches the file name installed below.
- name: Download Grafana RPM package
  ansible.builtin.get_url:
    url: "https://mirrors.aliyun.com/grafana/yum/rpm/Packages/grafana-{{ GRAFANA_VERSION }}.x86_64.rpm"
    dest: "{{ SOURCE_DIR }}"

# The yum module replaces "rpm -ivh", which fails on a second run once the package is installed.
- name: Install grafana on the remote server
  ansible.builtin.yum:
    name: "{{ SOURCE_DIR }}/grafana-{{ GRAFANA_VERSION }}.x86_64.rpm"
    state: present
    disable_gpg_check: yes

- name: start grafana
  ansible.builtin.systemd:
    name: grafana-server
    state: started
    enabled: yes

- name: check grafana port is already running
  ansible.builtin.wait_for:
    port: 3000
    state: started
    delay: 1
    timeout: 60
# prometheus.service.j2
[Unit]
Description=prometheus
After=network.target

[Service]
Type=simple
User=root
ExecStart={{ SOURCE_DIR }}/prometheus-{{ PROMETHEUS_VERSION }}/prometheus --web.enable-lifecycle --config.file={{ SOURCE_DIR }}/prometheus-{{ PROMETHEUS_VERSION }}/prometheus.yml --storage.tsdb.path={{ SOURCE_DIR }}/prometheus-{{ PROMETHEUS_VERSION }}/data/ --storage.tsdb.retention.time=45d
Restart=on-failure

[Install]
WantedBy=multi-user.target
# prometheus.yml.j2
global:
  scrape_interval: 20s
  scrape_timeout: 15s
  evaluation_interval: 15s
  external_labels:
    monitor: 'ops-dev-prometheus'
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 127.0.0.1:9093
rule_files:
  - '{{ SOURCE_DIR }}/prometheus-{{ PROMETHEUS_VERSION }}/rules/*.yml'
scrape_configs:
  - job_name: 'node_exporter'
    metrics_path: /metrics
    static_configs:
      - targets: ['{{ MONITOR_IP }}:9100']
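Because the unit file starts Prometheus with --web.enable-lifecycle, later configuration changes can be applied without restarting the service. The two tasks below are a sketch of one way to do that, not part of the original role: the first re-renders prometheus.yml and validates it with the promtool binary shipped in the same tarball before replacing the live file, and the second asks the running server to reload. The task names and the choice of 127.0.0.1:9090 are assumptions.

```yaml
# Hypothetical extra tasks, e.g. appended to tasks/install_prometheus.yml
- name: Render prometheus.yml and validate it before replacing the live file
  ansible.builtin.template:
    src: "prometheus.yml.j2"
    dest: "{{ SOURCE_DIR }}/prometheus-{{ PROMETHEUS_VERSION }}/prometheus.yml"
    # %s is replaced by the path of the temporary rendered file
    validate: "{{ SOURCE_DIR }}/prometheus-{{ PROMETHEUS_VERSION }}/promtool check config %s"

- name: Reload Prometheus configuration via the lifecycle endpoint
  ansible.builtin.uri:
    url: "http://127.0.0.1:9090/-/reload"   # only available with --web.enable-lifecycle
    method: POST
    status_code: 200
```

If validation fails, the template task fails and the existing prometheus.yml is left untouched, so the reload request is never sent with a broken configuration.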