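Improve central and edge deployment, remote write, and monitoring documentation

- Add complete central and edge configuration and deployment scripts
- Introduce the VictoriaMetrics data source and remote_write troubleshooting notes
- Add an edge-agent configuration script, a self-built ONVIF exporter, and ping monitoring examples

Co-authored-by: Cursor <cursoragent@cursor.com>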
372
central-server/CONFIGURATION.md
Normal file
@@ -0,0 +1,372 @@
# Central Server Configuration Guide

This document describes how to configure the central server's parameters.

## Configuration File Layout

The central server manages all configurable parameters through a `.env` file. The main configuration files are:

- **`.env`** - Environment variable file (copy it from `env.example` and edit it)
- **`docker-compose.yml`** - Docker Compose service definitions (driven by environment variables)
- **`prometheus.yml.template`** - Prometheus configuration template (`prometheus.yml` is generated from it at deploy time)
- **`prometheus.yml`** - The actual Prometheus configuration file (generated from the template; do not edit it by hand)
- **`alert_rules.yml`** - Prometheus alerting rules
- **`alertmanager/alertmanager.yml`** - Alertmanager configuration

## Quick Start

1. **Copy the environment variable template**:
```bash
cd central-server
cp env.example .env
```

2. **Edit the `.env` file** and adjust the parameters to your environment.

3. **Run the deployment script**:
```bash
./deploy.sh
```

The deployment script automatically:
- Loads the environment variables from `.env`
- Generates `prometheus.yml` from `prometheus.yml.template`
- Creates the data directories and sets their permissions
- Starts all services

## Configuration Parameters

### Ports

| Parameter | Default | Description |
|------|--------|------|
| `PROMETHEUS_PORT` | 9091 | Host port of the Prometheus web UI (9091 avoids a conflict with Cockpit, which uses 9090) |
| `GRAFANA_PORT` | 3000 | Host port of the Grafana web UI |
| `ALERTMANAGER_PORT` | 9093 | Host port of the Alertmanager web UI |
| `VICTORIAMETRICS_PORT` | 8428 | VictoriaMetrics port (edge nodes push data to this port) |

**Notes**:
- If a port is already in use, change the corresponding port number in `.env`
- Re-run `./deploy.sh` after changing a port
- Make sure the firewall allows the relevant ports
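
Before re-running `./deploy.sh`, it can help to confirm that no two services were given the same host port. The sketch below assumes a plain `KEY=VALUE` `.env` file; `find_duplicate_ports` is a hypothetical helper, not something `deploy.sh` provides.

```bash
# Hypothetical helper (not part of deploy.sh): print every port number that is
# assigned to more than one *_PORT variable in an .env-style file.
find_duplicate_ports() {
  local env_file="${1:-.env}"
  # Keep uncommented KEY=VALUE lines whose key ends in _PORT, take the value,
  # and print the values that occur more than once.
  grep -E '^[[:alpha:]_][[:alnum:]_]*_PORT=' "$env_file" \
    | cut -d= -f2- \
    | sort \
    | uniq -d
}
```

`find_duplicate_ports .env` prints nothing when all ports are distinct. To see whether a port is already taken on the host, `ss -ltn 'sport = :9091'` lists any listener on that port.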
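
### Grafana

| Parameter | Default | Description |
|------|--------|------|
| `GRAFANA_ADMIN_PASSWORD` | admin123 | Grafana admin password (**change it immediately after the first deployment**) |
| `GRAFANA_DEFAULT_LANGUAGE` | zh-Hans | Default Grafana UI language (zh-Hans = Simplified Chinese, en-US = English) |
| `GRAFANA_DEFAULT_THEME` | light | Default Grafana theme (light or dark) |

**Security recommendations**:
- `GRAFANA_ADMIN_PASSWORD` must be changed in production
- Use a strong password (at least 12 characters, including uppercase and lowercase letters, digits, and special characters)
- Grafana applies this value only when it initializes its database on first start; to change the password of an existing installation, use the Grafana UI or `docker exec -it grafana grafana cli admin reset-admin-password <new-password>`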
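
A strong password can be generated locally instead of being invented by hand. This is a minimal Bash sketch; `gen_password` is a hypothetical helper and assumes `/dev/urandom` and `tr` are available. The symbol set deliberately leaves out `$`, `#`, quotes and spaces so that the value can be pasted into `.env` without quoting.

```bash
# Hypothetical helper (not part of deploy.sh): print a random password of the
# requested length (default 16, minimum 12) that contains lowercase and
# uppercase letters, digits and at least one symbol.
gen_password() {
  local len="${1:-16}" raw pw
  if (( len < 12 )); then
    echo "length must be at least 12" >&2
    return 1
  fi
  while :; do
    # Read a fixed amount of randomness and keep only the allowed characters.
    raw=$(head -c 1024 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9@%_+=-')
    pw=${raw:0:len}
    # Retry until the length is right and every character class is present.
    if (( ${#pw} == len )) && [[ $pw =~ [[:lower:]] && $pw =~ [[:upper:]] && $pw =~ [[:digit:]] && $pw =~ [^[:alnum:]] ]]; then
      printf '%s\n' "$pw"
      return 0
    fi
  done
}
```

Put the output into `.env` as `GRAFANA_ADMIN_PASSWORD=<generated value>` before the first deployment.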

### Prometheus

| Parameter | Default | Description |
|------|--------|------|
| `PROMETHEUS_RETENTION_TIME` | 30d | Prometheus data retention (e.g. 30d = 30 days, 7d = 7 days) |
| `PROMETHEUS_SCRAPE_INTERVAL` | 15 | Scrape interval (seconds) |
| `PROMETHEUS_EVALUATION_INTERVAL` | 15 | Alerting rule evaluation interval (seconds) |
| `PROMETHEUS_CLUSTER_NAME` | central-monitoring | Cluster identifier (used to tell clusters apart) |

**Performance tuning**:
- A smaller `PROMETHEUS_SCRAPE_INTERVAL` gives fresher data but increases system load
- Suggested values: 15-30 seconds for general monitoring, 60 seconds for low-frequency monitoring
- A larger `PROMETHEUS_RETENTION_TIME` keeps history longer but uses more disk space
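
To judge how a longer `PROMETHEUS_RETENTION_TIME` translates into disk usage, the Prometheus storage documentation gives the rule of thumb `disk ≈ retention_seconds * ingested_samples_per_second * bytes_per_sample`, with roughly 1-2 bytes per sample. The helpers below are a hypothetical sketch of that arithmetic; they only accept simple values such as `30d` or `12h`.

```bash
# Hypothetical helper: convert a single <number><unit> duration (s, m, h, d, w, y)
# to seconds. Compound values such as 1y2w are not supported.
duration_to_seconds() {
  local value="${1%?}" unit="${1: -1}"
  if [[ ! $value =~ ^[0-9]+$ ]]; then
    echo "unsupported duration: $1" >&2
    return 1
  fi
  case "$unit" in
    s) echo $(( 10#$value )) ;;
    m) echo $(( 10#$value * 60 )) ;;
    h) echo $(( 10#$value * 3600 )) ;;
    d) echo $(( 10#$value * 86400 )) ;;
    w) echo $(( 10#$value * 604800 )) ;;
    y) echo $(( 10#$value * 31536000 )) ;;  # Prometheus counts 1y as 365d
    *) echo "unsupported duration: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: estimated disk usage in bytes for
# RETENTION, SAMPLES_PER_SECOND and BYTES_PER_SAMPLE (default 2).
estimate_disk_bytes() {
  local secs
  secs=$(duration_to_seconds "$1") || return 1
  echo $(( secs * $2 * ${3:-2} ))
}
```

For example, `estimate_disk_bytes 90d 10000` (90 days at 10,000 samples/s and 2 bytes per sample) prints `155520000000`, roughly 156 GB. The actual ingestion rate can be read in Prometheus with `rate(prometheus_tsdb_head_samples_appended_total[5m])`.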
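
### Remote Write

These parameters are substituted into the remote-write queue settings of the central server's generated `prometheus.yml`, so they tune how the central Prometheus forwards samples through its outbound queue. They do not change any settings on the edge nodes.

| Parameter | Default | Description |
|------|--------|------|
| `PROMETHEUS_REMOTE_WRITE_MAX_SAMPLES` | 10000 | Maximum number of samples per send |
| `PROMETHEUS_REMOTE_WRITE_CAPACITY` | 20000 | Queue capacity per shard (samples) |
| `PROMETHEUS_REMOTE_WRITE_MAX_SHARDS` | 10 | Maximum number of shards (concurrent senders) |

**Tuning suggestions**:
- If the queue falls behind under high data volume, raise `MAX_SHARDS` to increase send concurrency
- On unstable networks, raise `CAPACITY` to buffer more samples
- The defaults are fine in most cases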
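
A quick sanity check of the three queue values can be scripted. The sketch below assumes the rule of thumb from the Prometheus remote-write tuning guide that `capacity` should be roughly 3-10 times `max_samples_per_send`; treat that range as guidance to verify against the documentation of your Prometheus version. `check_remote_write_queue` is a hypothetical helper.

```bash
# Hypothetical check (not part of deploy.sh).
# Arguments: MAX_SAMPLES CAPACITY MAX_SHARDS (the three .env values above).
check_remote_write_queue() {
  local max_samples="$1" capacity="$2" max_shards="$3" ratio
  ratio=$(( capacity / max_samples ))   # integer ratio, rounded down
  if (( ratio < 3 || ratio > 10 )); then
    echo "WARN: capacity/max_samples_per_send = ${ratio} (suggested 3-10)"
  else
    echo "OK: capacity/max_samples_per_send = ${ratio}"
  fi
  # Rough upper bound on samples held in memory across all shards.
  echo "max buffered samples (capacity x max_shards): $(( capacity * max_shards ))"
}
```

With the defaults in this document, `check_remote_write_queue 10000 20000 10` reports a ratio of 2, below the suggested range, and at most 200000 buffered samples.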

### VictoriaMetrics

| Parameter | Default | Description |
|------|--------|------|
| `VICTORIAMETRICS_RETENTION_PERIOD` | 30d | VictoriaMetrics data retention period |

**Notes**:
- VictoriaMetrics is the long-term store for edge node data
- The longer the retention, the more disk space is used
- Choose a value based on your available storage and requirements
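
Both retention settings are passed to the binaries as-is, so a typo only shows up when a container fails to start. The hypothetical `is_valid_retention` below accepts a positive integer followed by `h`, `d`, `w` or `y`. That is a deliberately conservative subset: Prometheus accepts more duration forms, and VictoriaMetrics interprets a bare number as months, which is why a unit is required here.

```bash
# Hypothetical validator for PROMETHEUS_RETENTION_TIME and
# VICTORIAMETRICS_RETENTION_PERIOD: <positive integer><h|d|w|y>.
is_valid_retention() {
  local re='^[1-9][0-9]*(h|d|w|y)$'
  [[ $1 =~ $re ]]
}
```

For example: `is_valid_retention "$VICTORIAMETRICS_RETENTION_PERIOD" || echo "check VICTORIAMETRICS_RETENTION_PERIOD"`.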
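
### Data Storage Paths

| Parameter | Default | Description |
|------|--------|------|
| `DATA_STORAGE_ROOT` | ./data | Root directory for data (relative to the central-server directory) |
| `PROMETHEUS_DATA_DIR` | ./data/prometheus-data | Prometheus data directory (relative path) |
| `GRAFANA_DATA_DIR` | ./data/grafana-data | Grafana data directory (relative path) |
| `VICTORIAMETRICS_DATA_DIR` | ./data/victoria-metrics-data | VictoriaMetrics data directory (relative path) |

**Storage notes**:
- Relative paths are used by default, so data is stored under `central-server/data/`
- The deployment script converts relative paths to absolute paths (relative to the `central-server` directory) before passing them to Docker Compose
- To store data elsewhere, use absolute paths (e.g. `/data/prometheus-data`)
- Make sure the directories are writable
- `deploy.sh` and `docker-compose.yml` only read the three `*_DATA_DIR` variables; changing `DATA_STORAGE_ROOT` alone does not move any data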
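
The relative-to-absolute conversion described above can be done with plain string handling, which also works before the directories exist (as on a first deployment). A minimal sketch, with `to_abs_path` as a hypothetical helper and `base_dir` standing for the `central-server` directory:

```bash
# Hypothetical helper: resolve a configured data path against base_dir.
# Absolute paths are returned unchanged; a leading "./" is dropped.
to_abs_path() {
  local path="$1" base_dir="$2"
  case "$path" in
    /*) printf '%s\n' "$path" ;;
    *)  printf '%s/%s\n' "${base_dir%/}" "${path#./}" ;;
  esac
}
```

For example, `to_abs_path ./data/prometheus-data /opt/central-server` prints `/opt/central-server/data/prometheus-data`.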
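
### Traefik Reverse Proxy

| Parameter | Default | Description |
|------|--------|------|
| `TRAEFIK_ENABLED` | true | Whether the Traefik reverse proxy is enabled (sets the `traefik.enable` label) |
| `TRAEFIK_NETWORK` | traefik | Name of the Traefik network |
| `GRAFANA_DOMAIN` | grafana.example.com | Grafana domain |
| `PROMETHEUS_DOMAIN` | prometheus.example.com | Prometheus domain |
| `ALERTMANAGER_DOMAIN` | alertmanager.example.com | Alertmanager domain |
| `TRAEFIK_ENTRYPOINT` | web | Traefik entrypoint name |
| `TRAEFIK_HTTPS_ENABLED` | false | Whether HTTPS is enabled (informational; not referenced by `docker-compose.yml`) |
| `GRAFANA_ROOT_URL` | http://localhost:3000 | Grafana root URL (set it to `https://<domain>` when using Traefik) |

**Traefik notes**:
- With Traefik enabled, the services are reachable by domain name. The `ports` mappings in `docker-compose.yml` stay in place, so the services also remain reachable on their host ports unless you remove those mappings or block them at the firewall
- The Traefik network must exist: `docker network create traefik` (if it does not already)
- DNS records must point the domains at the Traefik server
- VictoriaMetrics is normally not accessed through Traefik (edge nodes connect to it directly)
- For HTTPS, configure TLS certificates in Traefik and use a TLS-enabled entrypoint (e.g. `websecure`)
- `NETWORK_NAME` must match the actual Traefik network, otherwise Traefik cannot forward requests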
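
### Docker Network

| Parameter | Default | Description |
|------|--------|------|
| `NETWORK_NAME` | traefik | Compose network name (must match the actual Traefik network) |
| `EXTERNAL_NETWORK` | true | Whether the network already exists outside this Compose project (true/false) |
| `ENABLE_IPV6` | false | Whether to enable IPv6 (true/false; not referenced by the current `docker-compose.yml`) |

**Network notes**:
- `NETWORK_NAME` must match the actual Traefik network name, otherwise Traefik cannot reach the services
- For IPv6, set `ENABLE_IPV6=true` and make sure IPv6 is enabled in Docker and on the network itself
- To run without Traefik with local access only, set `NETWORK_NAME=central-server_default` and `EXTERNAL_NETWORK=false` (see `env.example`)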

## Configuration Examples

### Example 1: Change a Port to Avoid a Conflict

If port 9091 is already in use, edit `.env`:

```bash
# Change the Prometheus port to 9092
PROMETHEUS_PORT=9092
```

### Example 2: Extend Data Retention

To keep 90 days of history, edit `.env`:

```bash
PROMETHEUS_RETENTION_TIME=90d
VICTORIAMETRICS_RETENTION_PERIOD=90d
```

**Note**: Longer retention needs more storage. As a rough estimate, 90 days requires about:
- Per edge node: 5-10 GB (depending on the number of monitored targets)
- Central server: 2-5 GB

### Example 3: Use Custom Storage Paths

To use another partition or absolute paths:

```bash
# Option A: absolute paths (e.g. on a /data partition)
DATA_STORAGE_ROOT=/data
PROMETHEUS_DATA_DIR=/data/prometheus-data
GRAFANA_DATA_DIR=/data/grafana-data
VICTORIAMETRICS_DATA_DIR=/data/victoria-metrics-data

# Option B: relative paths (relative to the central-server directory)
# Use these instead of option A; if both sets are active, the later lines win
# PROMETHEUS_DATA_DIR=./data/prometheus-data
# GRAFANA_DATA_DIR=./data/grafana-data
# VICTORIAMETRICS_DATA_DIR=./data/victoria-metrics-data
```

### Example 4: Enable the Traefik Reverse Proxy

Access the services by domain name through Traefik:

```bash
# Enable Traefik
TRAEFIK_ENABLED=true
TRAEFIK_NETWORK=traefik
NETWORK_NAME=traefik
ENABLE_IPV6=false

# Domains (DNS records required)
GRAFANA_DOMAIN=grafana.yourdomain.com
PROMETHEUS_DOMAIN=prometheus.yourdomain.com
ALERTMANAGER_DOMAIN=alertmanager.yourdomain.com

# If using HTTPS
TRAEFIK_HTTPS_ENABLED=true
TRAEFIK_ENTRYPOINT=websecure
GRAFANA_ROOT_URL=https://grafana.yourdomain.com
```

**Notes**:
- Make sure the Traefik network exists: `docker network create traefik`
- Point the domains at the Traefik server in DNS
- For HTTPS, configure TLS certificates in Traefik

### Example 5: Production Security Settings

Recommended production settings:

```bash
# Use a strong password
GRAFANA_ADMIN_PASSWORD=YourStrongPassword123!

# Enable the Traefik reverse proxy (recommended)
TRAEFIK_ENABLED=true
TRAEFIK_NETWORK=traefik
NETWORK_NAME=traefik
GRAFANA_DOMAIN=grafana.yourdomain.com
PROMETHEUS_DOMAIN=prometheus.yourdomain.com
ALERTMANAGER_DOMAIN=alertmanager.yourdomain.com

# Or use non-default ports (if not using Traefik)
GRAFANA_PORT=3001
PROMETHEUS_PORT=9092

# Extend data retention
PROMETHEUS_RETENTION_TIME=90d
VICTORIAMETRICS_RETENTION_PERIOD=90d
```
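
Before deploying with production settings, a small check can catch the default password or one that misses the policy from the Grafana section. `is_strong_password` is a hypothetical helper; the rules it encodes (at least 12 characters with lowercase and uppercase letters, digits and symbols) are the ones this document recommends.

```bash
# Hypothetical pre-deployment check (not part of deploy.sh): succeed only for
# passwords that are not the default and meet the recommended policy.
is_strong_password() {
  local pw="$1"
  [[ $pw != "admin123" ]] &&
    (( ${#pw} >= 12 )) &&
    [[ $pw =~ [[:lower:]] && $pw =~ [[:upper:]] && $pw =~ [[:digit:]] && $pw =~ [^[:alnum:]] ]]
}
```

One way to run it against the current `.env`: `( set -a; . ./.env; is_strong_password "$GRAFANA_ADMIN_PASSWORD" ) || echo "GRAFANA_ADMIN_PASSWORD is too weak"`.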

## Changing the Configuration

1. **Stop the services**:
```bash
docker compose down
# or
docker-compose down
```

2. **Edit the `.env` file** and change the parameters you need.

3. **Re-run the deployment script**:
```bash
./deploy.sh
```

The deployment script automatically:
- Reloads the environment variables
- Regenerates `prometheus.yml` from the template
- Passes the updated values to Docker Compose
- Restarts all services (the script runs `down` itself, so step 1 is optional)
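
## Configuration Files

### docker-compose.yml

The Docker Compose file defines, for every service:
- The image version
- Port mappings (from environment variables)
- Volume mounts (from environment variables)
- Container environment variables (filled from `.env` values)
- Startup commands (from environment variables)

**Note**: This file is already parameterized with environment variables and normally does not need manual edits.

### prometheus.yml.template

The Prometheus configuration template, containing:
- Global settings (scrape interval, evaluation interval, cluster name)
- Remote write settings (forwarding samples to VictoriaMetrics)
- Scrape configuration (metrics of the local services)
- Alerting rule configuration
- Alertmanager configuration

**Notes**:
- This file is a template; do not use it directly
- The deployment script generates `prometheus.yml` from it automatically
- Change values through `.env` rather than in the template; edit the template only when its structure has to change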
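
`deploy.sh` renders the template with `envsubst`, which replaces every `${VAR}` reference it finds and substitutes an empty string for variables that are not exported. When only a known set of variables should be substituted, the list can be made explicit. The sketch below is a hypothetical pure-Bash `render_template`; GNU `envsubst` offers the same restriction through its SHELL-FORMAT argument, e.g. `envsubst '${PROMETHEUS_CLUSTER_NAME}'`.

```bash
# Hypothetical helper: render_template FILE VAR...
# Replaces each literal ${VAR} in FILE with the value of the shell variable VAR;
# any other ${...} text is left untouched.
render_template() {
  local content var pattern
  content="$(<"$1")"
  shift
  for var in "$@"; do
    pattern='${'"$var"'}'
    # Quoted pattern and replacement are matched and inserted literally.
    content=${content//"$pattern"/"${!var}"}
  done
  printf '%s\n' "$content"
}
```

Usage: `render_template prometheus.yml.template PROMETHEUS_SCRAPE_INTERVAL PROMETHEUS_EVALUATION_INTERVAL PROMETHEUS_CLUSTER_NAME > prometheus.yml`.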
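
### prometheus.yml

The actual Prometheus configuration file, generated automatically from `prometheus.yml.template`.

**Notes**:
- This file is generated by the deployment script; **do not edit it by hand**
- To change it, edit `prometheus.yml.template` and `.env`

### alert_rules.yml

Prometheus alerting rules, defining:
- ONVIF device alerts (device offline, high temperature, low storage)
- Network device alerts (device offline, high latency)

**Note**: This file is edited by hand. After changing it, restart Prometheus, or reload it without a restart with `curl -X POST http://localhost:${PROMETHEUS_PORT}/-/reload` (the container is started with `--web.enable-lifecycle`).

### alertmanager/alertmanager.yml

Alertmanager configuration, defining:
- Alert routing rules
- Alert grouping rules
- Inhibition rules
- Receivers (notification channels)

**Note**: This file is edited by hand. After changing it, restart Alertmanager or reload it with `curl -X POST http://localhost:${ALERTMANAGER_PORT}/-/reload`.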
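
## FAQ

### Q1: Configuration changes do not take effect?

**A**: Check that:
1. You re-ran `./deploy.sh`
2. The services started correctly: `docker compose ps`
3. The service logs show no errors: `docker compose logs`

### Q2: How do I view the current configuration?

**A**:
- Environment variables: `cat .env`
- Generated Prometheus configuration: `cat prometheus.yml`
- Docker Compose configuration: `cat docker-compose.yml`
- Docker Compose configuration with variables resolved: `docker compose config`

### Q3: A configuration error prevents the services from starting?

**A**:
1. Check the `.env` syntax (no spaces around `=`, no stray special characters)
2. Check whether a port is already in use: `netstat -tuln | grep <port>` (or `ss -tuln | grep <port>`)
3. Check the data directory permissions: `ls -ld ./data/*-data` (or the paths configured in `.env`)
4. Check the error logs: `docker compose logs`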
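
The first step of Q3 can be automated. The hypothetical `lint_env` below flags the mistakes that most often break `source .env` in `deploy.sh`: Windows line endings, lines that are not `KEY=VALUE`, and whitespace right after `=`. It is a heuristic sketch, not a full parser.

```bash
# Hypothetical .env linter (not part of deploy.sh). Prints one line per problem
# and returns non-zero if anything was found.
lint_env() {
  local file="${1:-.env}" line n=0 status=0
  local skip_re='^[[:space:]]*(#|$)'
  local assign_re='^[[:alpha:]_][[:alnum:]_]*='
  local space_re='^[^=]*=[[:space:]]'
  while IFS= read -r line || [[ -n $line ]]; do
    n=$((n + 1))
    # Skip blank lines and comments.
    [[ $line =~ $skip_re ]] && continue
    if [[ $line == *$'\r'* ]]; then
      echo "line $n: Windows line ending (CR)"
      status=1
    elif [[ ! $line =~ $assign_re ]]; then
      echo "line $n: not a KEY=VALUE assignment: $line"
      status=1
    elif [[ $line =~ $space_re ]]; then
      echo "line $n: whitespace after '=': $line"
      status=1
    fi
  done < "$file"
  return "$status"
}
```

Run it as `lint_env .env` from the `central-server` directory; no output and exit status 0 mean no obvious problems were found.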

### Q4: How do I back up the configuration?

**A**:
```bash
# Back up all configuration files
tar -czf central-server-config-backup-$(date +%Y%m%d).tar.gz \
  .env \
  prometheus.yml.template \
  alert_rules.yml \
  alertmanager/alertmanager.yml \
  docker-compose.yml
```

### Q5: How do I restore the default configuration?

**A**:
```bash
# Recreate .env from the template
cp env.example .env

# Redeploy
./deploy.sh
```

## Related Documents

- [Central Server Architecture](../doc/ARCHITECTURE.md)
- [Deployment Guide](../doc/DEPLOYMENT_GUIDE.md)
- [Alert Rules Explained](../doc/ALERT_RULES_EXPLANATION.md)
- [Alertmanager Configuration](../doc/ALERTMANAGER_CONFIG.md)
49
central-server/alert_rules.yml
Normal file
@@ -0,0 +1,49 @@
groups:
  - name: onvif_alerts
    rules:
      - alert: ONVIFDeviceDown
        expr: up{job="onvif-devices"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "ONVIF device offline"
          description: "ONVIF device {{ $labels.instance }} has been offline for more than 1 minute"

      - alert: ONVIFDeviceHighTemperature
        expr: onvif_device_temperature > 70
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "ONVIF device temperature too high"
          description: "Device {{ $labels.instance }} temperature has reached {{ $value }}°C"

      - alert: ONVIFDeviceLowStorage
        expr: onvif_storage_usage_percent > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "ONVIF device low on storage"
          description: "Device {{ $labels.instance }} storage usage has reached {{ $value }}%"

  - name: network_alerts
    rules:
      - alert: NetworkDeviceDown
        expr: probe_success{job="network-ping"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Network device offline"
          description: "Network device {{ $labels.instance }} does not respond to ping"

      - alert: HighNetworkLatency
        expr: probe_duration_seconds{job="network-ping"} > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High network latency"
          description: "Device {{ $labels.instance }} latency has reached {{ $value }} seconds"
22
central-server/alertmanager/alertmanager.yml
Normal file
@@ -0,0 +1,22 @@
global:
  smtp_smarthost: 'localhost:587'
  smtp_from: 'alertmanager@example.com'

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'web.hook'

receivers:
  - name: 'web.hook'
    webhook_configs:
      - url: 'http://127.0.0.1:5001/'

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
329
central-server/deploy.sh
Normal file
@@ -0,0 +1,329 @@
#!/bin/bash

# Deployment script for the distributed Prometheus central server
# For Linux systems

set -e

echo "=== Distributed Prometheus Central Server Deployment ==="
echo ""

# Run from the script's own directory so that .env, the templates, and relative
# data paths resolve correctly regardless of the caller's working directory
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "$SCRIPT_DIR" || exit 1

# Load environment variables
if [ -f ".env" ]; then
    echo "📝 Loading .env..."
    # Export every variable defined in .env (comments and blank lines are allowed)
    set -a
    source .env
    set +a
    echo "✅ Environment variables loaded"
elif [ -f "env.example" ]; then
    echo "⚠️  .env not found, creating it from env.example..."
    cp env.example .env
    echo "✅ .env created; adjust the settings as needed"
    echo "   and then run this script again"
    exit 0
else
    echo "⚠️  Neither .env nor env.example found, using default settings"
fi

# Default values (used when a variable is not set)
PROMETHEUS_PORT=${PROMETHEUS_PORT:-9091}
GRAFANA_PORT=${GRAFANA_PORT:-3000}
ALERTMANAGER_PORT=${ALERTMANAGER_PORT:-9093}
VICTORIAMETRICS_PORT=${VICTORIAMETRICS_PORT:-8428}
PROMETHEUS_DATA_DIR=${PROMETHEUS_DATA_DIR:-./data/prometheus-data}
GRAFANA_DATA_DIR=${GRAFANA_DATA_DIR:-./data/grafana-data}
VICTORIAMETRICS_DATA_DIR=${VICTORIAMETRICS_DATA_DIR:-./data/victoria-metrics-data}
GRAFANA_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASSWORD:-admin123}
PROMETHEUS_SCRAPE_INTERVAL=${PROMETHEUS_SCRAPE_INTERVAL:-15}
PROMETHEUS_EVALUATION_INTERVAL=${PROMETHEUS_EVALUATION_INTERVAL:-15}
PROMETHEUS_CLUSTER_NAME=${PROMETHEUS_CLUSTER_NAME:-central-monitoring}
PROMETHEUS_RETENTION_TIME=${PROMETHEUS_RETENTION_TIME:-30d}
VICTORIAMETRICS_RETENTION_PERIOD=${VICTORIAMETRICS_RETENTION_PERIOD:-30d}
PROMETHEUS_REMOTE_WRITE_MAX_SAMPLES=${PROMETHEUS_REMOTE_WRITE_MAX_SAMPLES:-10000}
PROMETHEUS_REMOTE_WRITE_CAPACITY=${PROMETHEUS_REMOTE_WRITE_CAPACITY:-20000}
PROMETHEUS_REMOTE_WRITE_MAX_SHARDS=${PROMETHEUS_REMOTE_WRITE_MAX_SHARDS:-10}
GRAFANA_DEFAULT_LANGUAGE=${GRAFANA_DEFAULT_LANGUAGE:-zh-Hans}
GRAFANA_DEFAULT_THEME=${GRAFANA_DEFAULT_THEME:-light}

# Convert relative data paths to absolute paths under SCRIPT_DIR.
# Plain string handling is used, so the directories do not have to exist yet.
if [[ "$PROMETHEUS_DATA_DIR" != /* ]]; then
    PROMETHEUS_DATA_DIR="$SCRIPT_DIR/${PROMETHEUS_DATA_DIR#./}"
fi
if [[ "$GRAFANA_DATA_DIR" != /* ]]; then
    GRAFANA_DATA_DIR="$SCRIPT_DIR/${GRAFANA_DATA_DIR#./}"
fi
if [[ "$VICTORIAMETRICS_DATA_DIR" != /* ]]; then
    VICTORIAMETRICS_DATA_DIR="$SCRIPT_DIR/${VICTORIAMETRICS_DATA_DIR#./}"
fi

echo ""

# Check that Docker is installed
if ! command -v docker &> /dev/null; then
    echo "❌ Docker is not installed; please install Docker first"
    exit 1
fi

# Check Docker Compose (prefer V2, fall back to V1)
DOCKER_COMPOSE_CMD=""
if docker compose version &> /dev/null; then
    DOCKER_COMPOSE_CMD="docker compose"
    echo "✅ Docker Compose V2 detected"
elif command -v docker-compose &> /dev/null; then
    DOCKER_COMPOSE_CMD="docker-compose"
    echo "✅ Docker Compose V1 detected"
else
    echo "❌ Docker Compose is not installed; please install Docker Compose first"
    exit 1
fi

echo "✅ Docker environment check passed"
echo ""

# Check disk space (on the partition that holds the script directory)
echo "💾 Checking disk space..."
DATA_PARTITION_AVAIL=$(df -BG "$SCRIPT_DIR" 2>/dev/null | awk 'NR==2 {print $4}' | sed 's/G//' || echo "0")
ROOT_AVAIL=$(df -BG / | awk 'NR==2 {print $4}' | sed 's/G//' || echo "0")

if [ -z "$DATA_PARTITION_AVAIL" ]; then
    DATA_PARTITION_AVAIL=0
fi
if [ -z "$ROOT_AVAIL" ]; then
    ROOT_AVAIL=0
fi

echo "   Available space on the script directory's partition: ${DATA_PARTITION_AVAIL}GB"
echo "   Available space on the root partition: ${ROOT_AVAIL}GB"

# Check the script directory's partition (at least 2GB recommended for data)
if [ "$DATA_PARTITION_AVAIL" -lt 2 ]; then
    echo ""
    echo "⚠️  Warning: not enough space on the script directory's partition!"
    echo "   Available space: ${DATA_PARTITION_AVAIL}GB"
    echo "   Data storage path: ${PROMETHEUS_DATA_DIR}"
    echo "   Keep at least 2GB free for monitoring data"
    echo ""
    read -p "Continue with the deployment? (y/N): " -n 1 -r
    echo
    if [[ ! $REPLY =~ ^[Yy]$ ]]; then
        echo "Deployment cancelled"
        exit 1
    fi
fi

# Check the root partition (at least 1GB recommended for containerd temporary files)
if [ "$ROOT_AVAIL" -lt 1 ]; then
    echo ""
    echo "⚠️  Warning: not enough space on the root partition!"
    echo "   Root partition available space: ${ROOT_AVAIL}GB"
    echo "   Keep at least 1GB free for containerd temporary files and normal system operation"
    echo ""
    echo "💡 Ways to free up space:"
    echo "   1. Clean up Docker resources (removes all unused images and volumes): docker system prune -a --volumes"
    echo "   2. Clean up system logs: journalctl --vacuum-time=3d"
    echo "   3. Clean package caches: dnf clean all or apt-get clean"
    echo "   4. Find large directories: du -h --max-depth=1 / | sort -hr | head -10"
    echo ""
    read -p "Continue with the deployment? (y/N): " -n 1 -r
    echo
    if [[ ! $REPLY =~ ^[Yy]$ ]]; then
        echo "Deployment cancelled"
        exit 1
    fi
fi
echo ""

# Generate prometheus.yml from the template
if [ -f "prometheus.yml.template" ]; then
    echo "📝 Generating prometheus.yml from the template..."
    # envsubst only substitutes exported variables; defaults assigned above are
    # not exported automatically, so export the variables the template uses
    export PROMETHEUS_SCRAPE_INTERVAL PROMETHEUS_EVALUATION_INTERVAL PROMETHEUS_CLUSTER_NAME \
        VICTORIAMETRICS_PORT PROMETHEUS_REMOTE_WRITE_MAX_SAMPLES \
        PROMETHEUS_REMOTE_WRITE_CAPACITY PROMETHEUS_REMOTE_WRITE_MAX_SHARDS
    # Use envsubst if it is available
    if command -v envsubst &> /dev/null; then
        envsubst < prometheus.yml.template > prometheus.yml
        echo "✅ prometheus.yml generated"
    else
        echo "⚠️  envsubst not found, falling back to sed..."
        # Simple variable substitution with sed
        sed -e "s/\${PROMETHEUS_SCRAPE_INTERVAL}/${PROMETHEUS_SCRAPE_INTERVAL}/g" \
            -e "s/\${PROMETHEUS_EVALUATION_INTERVAL}/${PROMETHEUS_EVALUATION_INTERVAL}/g" \
            -e "s/\${PROMETHEUS_CLUSTER_NAME}/${PROMETHEUS_CLUSTER_NAME}/g" \
            -e "s/\${VICTORIAMETRICS_PORT}/${VICTORIAMETRICS_PORT}/g" \
            -e "s/\${PROMETHEUS_REMOTE_WRITE_MAX_SAMPLES}/${PROMETHEUS_REMOTE_WRITE_MAX_SAMPLES}/g" \
            -e "s/\${PROMETHEUS_REMOTE_WRITE_CAPACITY}/${PROMETHEUS_REMOTE_WRITE_CAPACITY}/g" \
            -e "s/\${PROMETHEUS_REMOTE_WRITE_MAX_SHARDS}/${PROMETHEUS_REMOTE_WRITE_MAX_SHARDS}/g" \
            prometheus.yml.template > prometheus.yml
        echo "✅ prometheus.yml generated (using sed)"
    fi
elif [ ! -f "prometheus.yml" ]; then
    echo "❌ prometheus.yml does not exist and no template was found"
    exit 1
fi

# Check the configuration files
if [ ! -f "alert_rules.yml" ]; then
    echo "❌ Configuration file alert_rules.yml not found"
    exit 1
fi

if [ ! -f "alertmanager/alertmanager.yml" ]; then
    echo "❌ Configuration file alertmanager/alertmanager.yml not found"
    exit 1
fi

echo "✅ Configuration file check passed"
echo ""

# Export variables for Docker Compose (data paths are absolute at this point)
export PROMETHEUS_DATA_DIR
export GRAFANA_DATA_DIR
export VICTORIAMETRICS_DATA_DIR
export PROMETHEUS_PORT
export GRAFANA_PORT
export ALERTMANAGER_PORT
export VICTORIAMETRICS_PORT
export GRAFANA_ADMIN_PASSWORD
export GRAFANA_DEFAULT_LANGUAGE
export GRAFANA_DEFAULT_THEME
export GRAFANA_ROOT_URL
export PROMETHEUS_RETENTION_TIME
export VICTORIAMETRICS_RETENTION_PERIOD
export TRAEFIK_ENABLED
export TRAEFIK_NETWORK
export TRAEFIK_ENTRYPOINT
export GRAFANA_DOMAIN
export PROMETHEUS_DOMAIN
export ALERTMANAGER_DOMAIN

# Check the Traefik network (by default docker-compose.yml joins it as an
# external network, so it must exist whether or not Traefik is enabled)
echo "🔍 Checking the Traefik network..."
TRAEFIK_NET=${TRAEFIK_NETWORK:-traefik}
if ! docker network inspect "$TRAEFIK_NET" &> /dev/null; then
    echo "⚠️  Traefik network '$TRAEFIK_NET' does not exist, creating it..."
    if docker network create "$TRAEFIK_NET" 2>/dev/null; then
        echo "✅ Traefik network '$TRAEFIK_NET' created"
    else
        echo "❌ Could not create Traefik network '$TRAEFIK_NET'"
        echo "   Make sure that:"
        echo "   1. Traefik is running and has created the network"
        echo "   2. Or create the network manually: docker network create $TRAEFIK_NET"
        echo ""
        read -p "Continue with the deployment? (y/N): " -n 1 -r
        echo
        if [[ ! $REPLY =~ ^[Yy]$ ]]; then
            echo "Deployment cancelled"
            exit 1
        fi
    fi
else
    echo "✅ Traefik network '$TRAEFIK_NET' already exists"
fi
echo ""

# Create data directories (paths come from the environment configuration)
echo "📁 Creating data directories..."
mkdir -p "${PROMETHEUS_DATA_DIR}"
mkdir -p "${GRAFANA_DATA_DIR}"
mkdir -p "${VICTORIAMETRICS_DATA_DIR}"
mkdir -p grafana/dashboards
mkdir -p grafana/provisioning/datasources
mkdir -p grafana/provisioning/dashboards

# Set directory permissions
# Prometheus needs write access (its container does not run as root)
chmod 777 "${PROMETHEUS_DATA_DIR}" 2>/dev/null || true
# Grafana runs as UID 472 inside its container
chown -R 472:472 "${GRAFANA_DATA_DIR}" 2>/dev/null || chmod -R 777 "${GRAFANA_DATA_DIR}"
# VictoriaMetrics needs write access
chmod 777 "${VICTORIAMETRICS_DATA_DIR}" 2>/dev/null || true

echo "✅ Data directories created"
echo "   - Prometheus: ${PROMETHEUS_DATA_DIR}"
echo "   - Grafana: ${GRAFANA_DATA_DIR}"
echo "   - VictoriaMetrics: ${VICTORIAMETRICS_DATA_DIR}"
echo ""

# Stop existing services
echo "🛑 Stopping existing services..."
$DOCKER_COMPOSE_CMD down 2>/dev/null || true

# Pull the latest images
echo "📥 Pulling Docker images..."
if ! $DOCKER_COMPOSE_CMD pull; then
    echo ""
    echo "⚠️  Failed to pull images. Possible causes:"
    echo "   1. Network connectivity problems"
    echo "   2. Docker Hub rate limiting"
    echo "   3. A registry mirror needs to be configured"
    echo ""
    echo "💡 Suggestions:"
    echo "   1. Check the network connection"
    echo "   2. Configure a Docker registry mirror (e.g. Alibaba Cloud, Tencent Cloud)"
    echo "   3. Or try again later"
    echo ""
    echo "🔄 Trying to start anyway (in case the images already exist locally)..."
    echo ""
fi

# Start services
echo "🚀 Starting services..."
$DOCKER_COMPOSE_CMD up -d

# Wait for the services to start
echo "⏳ Waiting for services to start..."
sleep 15

# Check service status
echo ""
echo "📊 Service status:"
$DOCKER_COMPOSE_CMD ps

echo ""
echo "📋 Service logs:"
$DOCKER_COMPOSE_CMD logs --tail=20

echo ""
echo "✅ Deployment complete!"
echo ""

# Check whether Traefik is enabled
if [ "${TRAEFIK_ENABLED:-false}" = "true" ]; then
    echo "🔗 Access URLs (through the Traefik reverse proxy):"
    echo "   - Grafana dashboards: http://${GRAFANA_DOMAIN:-grafana.example.com} (admin/${GRAFANA_ADMIN_PASSWORD}) [${GRAFANA_DEFAULT_LANGUAGE} UI]"
    echo "   - Prometheus: http://${PROMETHEUS_DOMAIN:-prometheus.example.com} [English UI]"
    echo "   - Alertmanager: http://${ALERTMANAGER_DOMAIN:-alertmanager.example.com} [English UI]"
    echo "   - VictoriaMetrics: http://localhost:${VICTORIAMETRICS_PORT} [English UI, edge nodes connect directly]"
    echo ""
    echo "⚠️  Make sure that:"
    echo "   1. DNS resolves the domains to the Traefik server"
    echo "   2. The Traefik network (${TRAEFIK_NETWORK:-traefik}) has been created"
    echo "   3. Edge nodes can reach port ${VICTORIAMETRICS_PORT} on this server (VictoriaMetrics is not behind Traefik)"
else
    echo "🔗 Access URLs:"
    echo "   - Grafana dashboards: http://localhost:${GRAFANA_PORT} (admin/${GRAFANA_ADMIN_PASSWORD}) [${GRAFANA_DEFAULT_LANGUAGE} UI]"
    echo "   - Prometheus: http://localhost:${PROMETHEUS_PORT} [English UI]"
    echo "   - VictoriaMetrics: http://localhost:${VICTORIAMETRICS_PORT} [English UI]"
    echo "   - Alertmanager: http://localhost:${ALERTMANAGER_PORT} [English UI]"
    echo ""
    echo "⚠️  Make sure that:"
    echo "   1. The firewall allows the relevant ports (${GRAFANA_PORT}, ${PROMETHEUS_PORT}, ${VICTORIAMETRICS_PORT}, ${ALERTMANAGER_PORT})"
    echo "   2. Edge nodes can reach port ${VICTORIAMETRICS_PORT} on this server"
fi

echo "   3. Alert notification channels have been configured"
echo ""
echo "📝 Management commands:"
echo "   - View logs: $DOCKER_COMPOSE_CMD logs -f"
echo "   - Restart services: $DOCKER_COMPOSE_CMD restart"
echo "   - Stop services: $DOCKER_COMPOSE_CMD down"
echo ""
echo "🔧 Next steps:"
echo "   1. Log in to Grafana and configure the data sources"
echo "   2. Import the ONVIF monitoring dashboard"
echo "   3. Configure Alertmanager notifications"
echo ""
echo "💡 Tip: to change the configuration, edit .env and run this script again"
90
central-server/docker-compose.yml
Normal file
@@ -0,0 +1,90 @@
services:
  # Central Prometheus server
  prometheus-central:
    image: prom/prometheus:latest
    container_name: prometheus-central
    restart: unless-stopped
    ports:
      - "${PROMETHEUS_PORT:-9091}:9090"
    volumes:
      - ${PROMETHEUS_DATA_DIR:-./data/prometheus-data}:/prometheus
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./alert_rules.yml:/etc/prometheus/alert_rules.yml:ro
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - "--storage.tsdb.retention.time=${PROMETHEUS_RETENTION_TIME:-30d}"
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--web.enable-lifecycle'
    labels:
      - "traefik.enable=${TRAEFIK_ENABLED:-true}"
      - "traefik.http.routers.prometheus.rule=Host(`${PROMETHEUS_DOMAIN:-prometheus.example.com}`)"
      - "traefik.http.routers.prometheus.entrypoints=${TRAEFIK_ENTRYPOINT:-web}"
      - "traefik.http.routers.prometheus.service=prometheus"
      - "traefik.http.services.prometheus.loadbalancer.server.port=9090"
      - "traefik.docker.network=${TRAEFIK_NETWORK:-traefik}"

  # Grafana dashboards
  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: unless-stopped
    ports:
      - "${GRAFANA_PORT:-3000}:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASSWORD:-admin123}
      - GF_METRICS_ENABLED=true
      - GF_METRICS_BASIC_AUTH_ENABLED=false
      # Maps to default_language in the [users] section of grafana.ini
      - GF_USERS_DEFAULT_LANGUAGE=${GRAFANA_DEFAULT_LANGUAGE:-zh-Hans}
      - GF_USERS_DEFAULT_THEME=${GRAFANA_DEFAULT_THEME:-light}
      # Grafana root URL (used behind the Traefik reverse proxy)
      - GF_SERVER_ROOT_URL=${GRAFANA_ROOT_URL:-http://localhost:3000}
    volumes:
      - ${GRAFANA_DATA_DIR:-./data/grafana-data}:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning
      - ./grafana/dashboards:/var/lib/grafana/dashboards
    labels:
      - "traefik.enable=${TRAEFIK_ENABLED:-true}"
      - "traefik.http.routers.grafana.rule=Host(`${GRAFANA_DOMAIN:-grafana.example.com}`)"
      - "traefik.http.routers.grafana.entrypoints=${TRAEFIK_ENTRYPOINT:-web}"
      - "traefik.http.routers.grafana.service=grafana"
      - "traefik.http.services.grafana.loadbalancer.server.port=3000"
      - "traefik.docker.network=${TRAEFIK_NETWORK:-traefik}"

  # Alertmanager
  alertmanager:
    image: prom/alertmanager:latest
    container_name: alertmanager
    restart: unless-stopped
    ports:
      - "${ALERTMANAGER_PORT:-9093}:9093"
    volumes:
      - ./alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
    labels:
      - "traefik.enable=${TRAEFIK_ENABLED:-true}"
      - "traefik.http.routers.alertmanager.rule=Host(`${ALERTMANAGER_DOMAIN:-alertmanager.example.com}`)"
      - "traefik.http.routers.alertmanager.entrypoints=${TRAEFIK_ENTRYPOINT:-web}"
      - "traefik.http.routers.alertmanager.service=alertmanager"
      - "traefik.http.services.alertmanager.loadbalancer.server.port=9093"
      - "traefik.docker.network=${TRAEFIK_NETWORK:-traefik}"

  # Remote write receiver (VictoriaMetrics)
  victoria-metrics:
    image: victoriametrics/victoria-metrics:latest
    container_name: victoria-metrics
    restart: unless-stopped
    ports:
      # The container listens on VICTORIAMETRICS_PORT (see --httpListenAddr),
      # so both sides of the mapping must use the same port
      - "${VICTORIAMETRICS_PORT:-8428}:${VICTORIAMETRICS_PORT:-8428}"
    volumes:
      - ${VICTORIAMETRICS_DATA_DIR:-./data/victoria-metrics-data}:/victoria-metrics-data
    command:
      - '--storageDataPath=/victoria-metrics-data'
      - "--retentionPeriod=${VICTORIAMETRICS_RETENTION_PERIOD:-30d}"
      - "--httpListenAddr=:${VICTORIAMETRICS_PORT:-8428}"

# Network configuration (uses the existing traefik network by default)
networks:
  default:
    name: ${NETWORK_NAME:-traefik}
    external: ${EXTERNAL_NETWORK:-true}
128
central-server/env.example
Normal file
@@ -0,0 +1,128 @@
# Central server environment variables
# Copy this file to .env and adjust it to your environment

# ============================================
# Ports
# ============================================

# Prometheus port (9091 avoids a conflict with Cockpit on 9090)
PROMETHEUS_PORT=9091

# Grafana port
GRAFANA_PORT=3000

# Alertmanager port
ALERTMANAGER_PORT=9093

# VictoriaMetrics port (edge nodes push data to this port)
VICTORIAMETRICS_PORT=8428

# ============================================
# Grafana
# ============================================

# Grafana admin password
GRAFANA_ADMIN_PASSWORD=admin123

# Grafana default language
GRAFANA_DEFAULT_LANGUAGE=zh-Hans

# Grafana default theme
GRAFANA_DEFAULT_THEME=light

# Grafana root URL (for the Traefik reverse proxy; when using Traefik, set it to e.g. https://grafana.example.com)
GRAFANA_ROOT_URL=http://localhost:3000

# ============================================
# Prometheus
# ============================================

# Prometheus data retention
PROMETHEUS_RETENTION_TIME=30d

# Prometheus scrape interval (seconds)
PROMETHEUS_SCRAPE_INTERVAL=15

# Prometheus alerting rule evaluation interval (seconds)
PROMETHEUS_EVALUATION_INTERVAL=15

# Prometheus cluster identifier
PROMETHEUS_CLUSTER_NAME=central-monitoring

# Remote write queue settings
PROMETHEUS_REMOTE_WRITE_MAX_SAMPLES=10000
PROMETHEUS_REMOTE_WRITE_CAPACITY=20000
PROMETHEUS_REMOTE_WRITE_MAX_SHARDS=10

# ============================================
# VictoriaMetrics
# ============================================

# VictoriaMetrics data retention
VICTORIAMETRICS_RETENTION_PERIOD=30d

# ============================================
# Data storage paths
# ============================================

# Root directory for all data (relative to the central-server directory)
# With a relative path, data is stored in the data subdirectory of central-server
DATA_STORAGE_ROOT=./data

# Prometheus data directory
PROMETHEUS_DATA_DIR=./data/prometheus-data

# Grafana data directory
GRAFANA_DATA_DIR=./data/grafana-data

# VictoriaMetrics data directory
VICTORIAMETRICS_DATA_DIR=./data/victoria-metrics-data
# ============================================
|
||||
# Traefik 反向代理配置
|
||||
# ============================================
|
||||
|
||||
# 是否启用 Traefik 反向代理(true/false)
|
||||
# 如果启用,服务将通过 Traefik 访问,不再直接暴露端口
|
||||
TRAEFIK_ENABLED=true
|
||||
|
||||
# Traefik 网络名称(通常为 traefik)
|
||||
TRAEFIK_NETWORK=traefik
|
||||
|
||||
# 域名配置(需要配置 DNS 解析到 Traefik 服务器)
|
||||
# Grafana 域名
|
||||
GRAFANA_DOMAIN=grafana.example.com
|
||||
|
||||
# Prometheus 域名
|
||||
PROMETHEUS_DOMAIN=prometheus.example.com
|
||||
|
||||
# Alertmanager 域名
|
||||
ALERTMANAGER_DOMAIN=alertmanager.example.com
|
||||
|
||||
# VictoriaMetrics 域名(通常不需要通过 Traefik 访问,边缘节点直接连接)
|
||||
VICTORIAMETRICS_DOMAIN=vm.example.com
|
||||
|
||||
# Traefik EntryPoint(通常为 web 或 websecure)
|
||||
TRAEFIK_ENTRYPOINT=web
|
||||
|
||||
# 是否启用 HTTPS(需要配置 Traefik 的 TLS)
|
||||
TRAEFIK_HTTPS_ENABLED=false
|
||||
|
||||
# ============================================
|
||||
# Docker 网络配置
|
||||
# ============================================
|
||||
|
||||
# Compose 项目名称(用于区分同一目录下的多套部署,避免与旧状态冲突)
|
||||
COMPOSE_PROJECT_NAME=central
|
||||
|
||||
# 使用已存在的 traefik 网络(由 Traefik 创建,不在此项目中创建)
|
||||
# 保持以下两项即可接入现有 Traefik
|
||||
NETWORK_NAME=traefik
|
||||
EXTERNAL_NETWORK=true
|
||||
|
||||
# 仅在不使用 Traefik、仅本地直连时改为:
|
||||
# NETWORK_NAME=central-server_default
|
||||
# EXTERNAL_NETWORK=false
|
||||
|
||||
# 是否启用 IPv6(true/false)
|
||||
ENABLE_IPV6=false
|
||||
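The Traefik and Docker network sections above have to agree: when `TRAEFIK_ENABLED=true`, the comments say to keep `NETWORK_NAME` pointing at the existing Traefik network with `EXTERNAL_NETWORK=true`, and that network is created by Traefik rather than by this project. The sketch below turns that rule into a pre-flight check. `check_network_settings` and `require_external_network` are hypothetical helpers (not part of `deploy.sh`), and requiring `NETWORK_NAME` to equal `TRAEFIK_NETWORK` is an assumption read from the comments.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight checks for the network settings in .env.
# Assumption: with Traefik enabled, NETWORK_NAME must name the existing
# Traefik network (TRAEFIK_NETWORK) and EXTERNAL_NETWORK must be true.

check_network_settings() {
  local traefik=${TRAEFIK_ENABLED:-false}
  local external=${EXTERNAL_NETWORK:-false}

  if [[ "$traefik" != "true" ]]; then
    return 0   # Local direct-access mode: nothing to cross-check.
  fi
  if [[ "$external" != "true" ]]; then
    echo "TRAEFIK_ENABLED=true requires EXTERNAL_NETWORK=true" >&2
    return 1
  fi
  if [[ "${NETWORK_NAME:-}" != "${TRAEFIK_NETWORK:-}" ]]; then
    echo "NETWORK_NAME (${NETWORK_NAME:-unset}) should match TRAEFIK_NETWORK (${TRAEFIK_NETWORK:-unset})" >&2
    return 1
  fi
}

# The external network is created by Traefik, not by this project,
# so only verify that it already exists; never create it here.
require_external_network() {
  local name=$1
  if ! docker network inspect "$name" >/dev/null 2>&1; then
    echo "Docker network '$name' not found; start Traefik first" >&2
    return 1
  fi
}
```

For example, after loading `.env` with `set -a; . ./.env; set +a`, running `check_network_settings && require_external_network "$NETWORK_NAME"` before `docker compose up` would catch a Traefik-mode deployment whose network does not exist yet.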
76
central-server/grafana/dashboards/onvif-monitoring.json
Normal file
@@ -0,0 +1,76 @@
{
  "dashboard": {
    "id": null,
    "title": "ONVIF Device Monitoring",
    "tags": ["onvif", "camera", "monitoring"],
    "style": "dark",
    "timezone": "browser",
    "panels": [
      {
        "id": 1,
        "title": "ONVIF Device Status",
        "type": "stat",
        "targets": [
          {
            "expr": "up{job=\"onvif-devices\"}",
            "refId": "A"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "color": {
              "mode": "thresholds"
            },
            "thresholds": {
              "steps": [
                {
                  "color": "red",
                  "value": 0
                },
                {
                  "color": "green",
                  "value": 1
                }
              ]
            }
          }
        },
        "gridPos": {
          "h": 8,
          "w": 12,
          "x": 0,
          "y": 0
        }
      },
      {
        "id": 2,
        "title": "Device Online Rate",
        "type": "gauge",
        "targets": [
          {
            "expr": "sum(up{job=\"onvif-devices\"}) / count(up{job=\"onvif-devices\"}) * 100",
            "refId": "A"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "min": 0,
            "max": 100,
            "unit": "percent"
          }
        },
        "gridPos": {
          "h": 8,
          "w": 12,
          "x": 12,
          "y": 0
        }
      }
    ],
    "time": {
      "from": "now-1h",
      "to": "now"
    },
    "refresh": "30s"
  }
}
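Both panels select `up{job="onvif-devices"}`. The gauge divides the number of targets reporting `up == 1` by the number of scraped targets and multiplies by 100. The panels do not name a datasource, so they presumably use the default one (Prometheus), yet the central `prometheus.yml` shown in this commit has no `onvif-devices` job. If that job is scraped on edge nodes, its series will only exist in VictoriaMetrics. Running the same expression against both Prometheus-compatible query APIs shows where the data actually lives. The sketch below uses a hypothetical helper name and assumes the default host ports from `.env` (9091 for Prometheus, 8428 for VictoriaMetrics); use the Traefik domains instead when Traefik is enabled.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run an instant PromQL query against a
# Prometheus-compatible HTTP API (Prometheus or VictoriaMetrics).
prom_instant_query() {
  local base_url=$1 promql=$2
  # -G sends the --data-urlencode pairs as a URL-encoded GET query string.
  curl -fsS -G --data-urlencode "query=${promql}" "${base_url%/}/api/v1/query"
}

# Same expression as the "Device Online Rate" gauge panel.
ONVIF_ONLINE_RATE='sum(up{job="onvif-devices"}) / count(up{job="onvif-devices"}) * 100'
```

For example, `prom_instant_query http://localhost:9091 "$ONVIF_ONLINE_RATE"` queries Prometheus and `prom_instant_query http://localhost:8428 "$ONVIF_ONLINE_RATE"` queries VictoriaMetrics. An empty `"result":[]` in the response means that backend has no matching series.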
12
central-server/grafana/provisioning/dashboards/dashboard.yml
Normal file
@@ -0,0 +1,12 @@
apiVersion: 1

providers:
  - name: 'default'
    orgId: 1
    folder: ''
    type: file
    disableDeletion: false
    updateIntervalSeconds: 10
    allowUiUpdates: true
    options:
      path: /var/lib/grafana/dashboards
@@ -0,0 +1,20 @@
# Global datasource for administrators
# Lets administrators see all data (no label filtering applied)
# Files placed in provisioning/datasources/ are loaded automatically

apiVersion: 1

datasources:
  - name: Prometheus-All-Data
    type: prometheus
    access: proxy
    url: http://prometheus-central:9090
    isDefault: false
    editable: false
    jsonData:
      httpMethod: POST
      queryTimeout: 60s
      timeInterval: 15s
    # No orgId is set, so provisioning adds this datasource to org 1 only (not to every organization)
    # Administrators can query it without label filters to see all data
    # e.g. up instead of up{user_group="xxx"}
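The comments above contrast an unfiltered admin query (`up`) with a per-group query such as `up{user_group="xxx"}`. A small sketch of building such a group-scoped selector follows. It assumes the label is literally named `user_group`, as in the example, and escapes `\` and `"` because PromQL label values are double-quoted strings. The helper name is hypothetical, and the filter is only a query convention: the datasource definition itself does not restrict what a user can query.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a PromQL selector scoped to one user group.
# Assumption: the tenant label is named user_group (as in the comment above).
scoped_selector() {
  local metric=$1 group=$2
  # Escape backslashes first, then double quotes, for a PromQL string literal.
  group=${group//\\/\\\\}
  group=${group//\"/\\\"}
  printf '%s{user_group="%s"}\n' "$metric" "$group"
}
```

`scoped_selector up groupA` prints `up{user_group="groupA"}`, while an administrator using `Prometheus-All-Data` would query plain `up`.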
@@ -0,0 +1,9 @@
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus-central:9090
    isDefault: true
    editable: true
@@ -0,0 +1,16 @@
# VictoriaMetrics datasource (the data reported by edge nodes is stored here)
# Edge nodes push to the central VictoriaMetrics via remote_write; this datasource lets Grafana query that data
# Prerequisite: on each edge node, point remote_write at the central VictoriaMetrics, e.g. http://<central-server-ip>:8428/api/v1/write

apiVersion: 1

datasources:
  - name: VictoriaMetrics
    type: prometheus
    access: proxy
    url: http://victoria-metrics:8428
    isDefault: false
    editable: true
    jsonData:
      httpMethod: POST
      queryTimeout: 60s
      timeInterval: 15s
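On the edge side, the comment above asks for a `remote_write` entry that points at the central VictoriaMetrics write endpoint `/api/v1/write`. The sketch below prints such a fragment for an edge Prometheus and adds a reachability check against VictoriaMetrics' `/health` endpoint, which is useful when troubleshooting remote_write. Both function names are hypothetical, the host is a placeholder, and the port should match `VICTORIAMETRICS_PORT` in the central `.env` (8428 by default).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a remote_write fragment for an edge Prometheus
# that pushes samples to the central VictoriaMetrics.
edge_remote_write_fragment() {
  local central_host=$1 vm_port=${2:-8428}
  cat <<EOF
remote_write:
  - url: http://${central_host}:${vm_port}/api/v1/write
EOF
}

# Hypothetical helper: run from the edge node to confirm the central
# VictoriaMetrics is reachable (it answers "OK" on /health).
vm_health() {
  local central_host=$1 vm_port=${2:-8428}
  curl -fsS "http://${central_host}:${vm_port}/health"
}
```

If `vm_health` fails from an edge node, remote_write from that node will fail too, so firewall and port mapping are the first things to check.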
164
central-server/grafana/setup-users.sh
Normal file
@@ -0,0 +1,164 @@
#!/bin/bash

# Grafana multi-user and organization setup script
# Usage: ./setup-users.sh

set -e

GRAFANA_URL="${GRAFANA_URL:-http://localhost:3000}"
GRAFANA_ADMIN_USER="${GRAFANA_ADMIN_USER:-admin}"
GRAFANA_ADMIN_PASSWORD="${GRAFANA_ADMIN_PASSWORD:-admin123}"

echo "=== Grafana multi-user setup ==="
echo ""

# Check that jq is installed
if ! command -v jq &> /dev/null; then
    echo "❌ jq is not installed. Install it first:"
    echo "   Ubuntu/Debian: sudo apt-get install jq"
    echo "   CentOS/RHEL: sudo yum install jq"
    echo "   Fedora: sudo dnf install jq"
    exit 1
fi

# Check that Grafana is reachable before making any API calls.
# Every later request authenticates with HTTP basic auth (-u), so no separate
# login/token step is needed. -f makes curl fail on HTTP error statuses too.
if ! curl -fs "$GRAFANA_URL/api/health" > /dev/null; then
    echo "❌ Cannot connect to Grafana: $GRAFANA_URL"
    echo "   Make sure the Grafana service is running"
    exit 1
fi

echo "✅ Connected to Grafana"
echo ""

# Create an organization (skipped if it already exists)
create_organization() {
    local org_name=$1
    local org_name_enc
    # Org names may contain spaces or non-ASCII characters, so URL-encode them for the path
    org_name_enc=$(jq -rn --arg v "$org_name" '$v|@uri')

    echo "📁 Creating organization: $org_name"

    # Check whether the organization already exists
    ORG_EXISTS=$(curl -s -u "$GRAFANA_ADMIN_USER:$GRAFANA_ADMIN_PASSWORD" \
        "$GRAFANA_URL/api/orgs/name/$org_name_enc" | jq -r '.id // empty')

    if [ -n "$ORG_EXISTS" ]; then
        echo "   ⚠️  Organization $org_name already exists (ID: $ORG_EXISTS)"
        return
    fi

    # Create the organization (jq builds the JSON so special characters are escaped)
    ORG_RESPONSE=$(curl -s -X POST \
        -u "$GRAFANA_ADMIN_USER:$GRAFANA_ADMIN_PASSWORD" \
        -H "Content-Type: application/json" \
        -d "$(jq -n --arg name "$org_name" '{name: $name}')" \
        "$GRAFANA_URL/api/orgs")

    NEW_ORG_ID=$(echo "$ORG_RESPONSE" | jq -r '.orgId // empty')

    if [ -n "$NEW_ORG_ID" ]; then
        echo "   ✅ Organization created (ID: $NEW_ORG_ID)"
    else
        echo "   ❌ Failed to create organization: $ORG_RESPONSE"
    fi
}

# Create a user in the given organization (or add an existing user to it)
create_user() {
    local org_name=$1
    local username=$2
    local password=$3
    local email=$4
    local role=${5:-Viewer}

    echo "👤 Creating user: $username (organization: $org_name)"

    # Look up the target organization's ID (URL-encode the name for the path)
    ORG_ID=$(curl -s -u "$GRAFANA_ADMIN_USER:$GRAFANA_ADMIN_PASSWORD" \
        "$GRAFANA_URL/api/orgs/name/$(jq -rn --arg v "$org_name" '$v|@uri')" | jq -r '.id // empty')

    if [ -z "$ORG_ID" ]; then
        echo "   ❌ Organization $org_name does not exist"
        return
    fi

    # Switch the admin user's current organization to the target organization
    curl -s -X POST \
        -u "$GRAFANA_ADMIN_USER:$GRAFANA_ADMIN_PASSWORD" \
        "$GRAFANA_URL/api/user/using/$ORG_ID" > /dev/null

    # Check whether the user already exists
    USER_EXISTS=$(curl -s -u "$GRAFANA_ADMIN_USER:$GRAFANA_ADMIN_PASSWORD" \
        "$GRAFANA_URL/api/users/lookup?loginOrEmail=$(jq -rn --arg v "$email" '$v|@uri')" | jq -r '.id // empty')

    if [ -n "$USER_EXISTS" ]; then
        echo "   ⚠️  User $username already exists"
        # Add the existing user to the organization
        curl -s -X POST \
            -u "$GRAFANA_ADMIN_USER:$GRAFANA_ADMIN_PASSWORD" \
            -H "Content-Type: application/json" \
            -d "$(jq -n --arg login "$email" --arg role "$role" '{loginOrEmail: $login, role: $role}')" \
            "$GRAFANA_URL/api/orgs/$ORG_ID/users" > /dev/null
        echo "   ✅ User added to organization"
        return
    fi

    # Create the user (jq builds the JSON so passwords containing quotes are escaped safely)
    USER_RESPONSE=$(curl -s -X POST \
        -u "$GRAFANA_ADMIN_USER:$GRAFANA_ADMIN_PASSWORD" \
        -H "Content-Type: application/json" \
        -d "$(jq -n \
            --arg name "$username" \
            --arg email "$email" \
            --arg login "$username" \
            --arg password "$password" \
            --argjson orgId "$ORG_ID" \
            '{name: $name, email: $email, login: $login, password: $password, OrgId: $orgId}')" \
        "$GRAFANA_URL/api/admin/users")

    USER_ID=$(echo "$USER_RESPONSE" | jq -r '.id // empty')

    if [ -n "$USER_ID" ]; then
        echo "   ✅ User created (ID: $USER_ID)"
    else
        echo "   ❌ Failed to create user: $USER_RESPONSE"
    fi
}

# Example: create organizations and users
echo "📝 Creating organizations and users..."
echo ""

# Create sample organizations (Grafana assigns the organization IDs)
create_organization "UserGroupA"
create_organization "UserGroupB"

# Create sample users
create_user "UserGroupA" "usera1" "password123" "usera1@example.com" "Viewer"
create_user "UserGroupA" "usera2" "password123" "usera2@example.com" "Editor"
create_user "UserGroupB" "userb1" "password123" "userb1@example.com" "Viewer"
create_user "UserGroupB" "userb2" "password123" "userb2@example.com" "Editor"

echo ""
echo "✅ User setup complete!"
echo ""
echo "📋 Users created:"
echo "   UserGroupA:"
echo "     - usera1 (Viewer) - usera1@example.com / password123"
echo "     - usera2 (Editor) - usera2@example.com / password123"
echo "   UserGroupB:"
echo "     - userb1 (Viewer) - userb1@example.com / password123"
echo "     - userb2 (Editor) - userb2@example.com / password123"
echo ""
echo "💡 Next steps:"
echo "   1. Log in to Grafana and configure a datasource for each organization"
echo "   2. Create dashboards dedicated to each organization"
echo "   3. Configure datasource label filtering (via Prometheus labels)"
echo ""
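setup-users.sh checks `/api/health` once and exits if Grafana is not up yet, which can happen when it runs right after `deploy.sh`. A retry loop such as the sketch below could run first. `wait_for_grafana` and its defaults (30 attempts, 2 seconds apart) are assumptions, not part of the script.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll Grafana's /api/health until it answers with HTTP 2xx.
wait_for_grafana() {
  local url=${1:-http://localhost:3000} attempts=${2:-30} delay=${3:-2}
  local i
  for (( i = 1; i <= attempts; i++ )); do
    # -f makes curl fail on HTTP error statuses (e.g. 502 from a proxy),
    # not only on connection errors.
    if curl -fsS "${url%/}/api/health" >/dev/null 2>&1; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if (( i < attempts )); then
      sleep "$delay"
    fi
  done
  echo "Grafana at $url did not become healthy after $attempts attempts" >&2
  return 1
}
```

Calling `wait_for_grafana "$GRAFANA_URL" || exit 1` before the first API request would make the script tolerant of a slow Grafana start.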
52
central-server/prometheus.yml
Normal file
@@ -0,0 +1,52 @@
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: 'central-monitoring'

# Remote write: forward the samples scraped here to VictoriaMetrics (edge nodes push to VictoriaMetrics directly)
remote_write:
  - url: http://victoria-metrics:8428/api/v1/write
    queue_config:
      max_samples_per_send: 10000
      capacity: 20000
      max_shards: 10

# Scrape configs: mainly the local services
scrape_configs:
  # Central Prometheus itself
  - job_name: 'prometheus-central'
    scrape_interval: 15s
    static_configs:
      - targets: ['prometheus-central:9090']

  # VictoriaMetrics (exposes a /metrics endpoint)
  - job_name: 'victoria-metrics'
    scrape_interval: 15s
    metrics_path: '/metrics'
    static_configs:
      - targets: ['victoria-metrics:8428']

  # Alertmanager
  - job_name: 'alertmanager'
    scrape_interval: 15s
    static_configs:
      - targets: ['alertmanager:9093']

  # Grafana (serves /metrics when [metrics] enabled = true, the Grafana default)
  - job_name: 'grafana'
    scrape_interval: 15s
    metrics_path: '/metrics'
    static_configs:
      - targets: ['grafana:3000']

# Alerting rules
rule_files:
  - "alert_rules.yml"

# Alertmanager settings
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093
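Because `prometheus.yml` is generated rather than edited by hand, it is worth validating before (re)starting the stack. `promtool check config` also checks the rule files the config references (here `alert_rules.yml`). The sketch below runs `promtool` from the Prometheus image. It assumes Docker is available and that `prom/prometheus` is the image you deploy (pin the same tag as in `docker-compose.yml`); `check_prometheus_config` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: validate prometheus.yml (and the rule files it
# references) with promtool from the official Prometheus image.
check_prometheus_config() {
  local dir=${1:-.} image=${2:-prom/prometheus}
  # Mount the whole directory read-only so relative rule_files paths resolve.
  docker run --rm \
    -v "$(cd "$dir" && pwd):/config:ro" \
    --entrypoint promtool \
    "$image" check config /config/prometheus.yml
}
```

Running `check_prometheus_config ./central-server` after generating the config and before `docker compose up` surfaces YAML or rule errors without touching the running Prometheus.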
52
central-server/prometheus.yml.template
Normal file
@@ -0,0 +1,52 @@
global:
  scrape_interval: ${PROMETHEUS_SCRAPE_INTERVAL}s
  evaluation_interval: ${PROMETHEUS_EVALUATION_INTERVAL}s
  external_labels:
    cluster: '${PROMETHEUS_CLUSTER_NAME}'

# Remote write: forward the samples scraped here to VictoriaMetrics (edge nodes push to VictoriaMetrics directly)
remote_write:
  - url: http://victoria-metrics:${VICTORIAMETRICS_PORT}/api/v1/write
    queue_config:
      max_samples_per_send: ${PROMETHEUS_REMOTE_WRITE_MAX_SAMPLES}
      capacity: ${PROMETHEUS_REMOTE_WRITE_CAPACITY}
      max_shards: ${PROMETHEUS_REMOTE_WRITE_MAX_SHARDS}

# Scrape configs: mainly the local services
scrape_configs:
  # Central Prometheus itself
  - job_name: 'prometheus-central'
    scrape_interval: ${PROMETHEUS_SCRAPE_INTERVAL}s
    static_configs:
      - targets: ['prometheus-central:9090']

  # VictoriaMetrics (exposes a /metrics endpoint)
  - job_name: 'victoria-metrics'
    scrape_interval: ${PROMETHEUS_SCRAPE_INTERVAL}s
    metrics_path: '/metrics'
    static_configs:
      - targets: ['victoria-metrics:${VICTORIAMETRICS_PORT}']

  # Alertmanager
  - job_name: 'alertmanager'
    scrape_interval: ${PROMETHEUS_SCRAPE_INTERVAL}s
    static_configs:
      - targets: ['alertmanager:9093']

  # Grafana (serves /metrics when [metrics] enabled = true, the Grafana default)
  - job_name: 'grafana'
    scrape_interval: ${PROMETHEUS_SCRAPE_INTERVAL}s
    metrics_path: '/metrics'
    static_configs:
      - targets: ['grafana:3000']

# Alerting rules
rule_files:
  - "alert_rules.yml"

# Alertmanager settings
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093
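CONFIGURATION.md states that `deploy.sh` generates `prometheus.yml` from this template. The pure-Bash sketch below shows one way to do that step. `deploy.sh` may do it differently (for example with `envsubst`), and `render_prometheus_config` is a hypothetical name. Only the placeholders listed in `TEMPLATE_VARS` are replaced, and a missing value aborts instead of silently producing something like `scrape_interval: s`. Note also that the template reuses `VICTORIAMETRICS_PORT` for the in-network address `victoria-metrics:…`. If `docker-compose.yml` uses that variable only for the host-side port mapping, the container still listens on 8428 and a changed value would break the rendered URL.

```bash
#!/usr/bin/env bash
# Sketch: render prometheus.yml from prometheus.yml.template using .env values.
# Assumption: this mirrors, but is not, the logic inside deploy.sh.

# Placeholders used by prometheus.yml.template; nothing else is substituted.
TEMPLATE_VARS=(
  PROMETHEUS_SCRAPE_INTERVAL
  PROMETHEUS_EVALUATION_INTERVAL
  PROMETHEUS_CLUSTER_NAME
  VICTORIAMETRICS_PORT
  PROMETHEUS_REMOTE_WRITE_MAX_SAMPLES
  PROMETHEUS_REMOTE_WRITE_CAPACITY
  PROMETHEUS_REMOTE_WRITE_MAX_SHARDS
)

render_prometheus_config() {
  local env_file=$1 template=$2 output=$3
  local content var placeholder value

  # Load KEY=VALUE pairs from the .env file and export them.
  set -a
  # shellcheck disable=SC1090
  . "$env_file"
  set +a

  content=$(<"$template")
  for var in "${TEMPLATE_VARS[@]}"; do
    if [[ -z "${!var:-}" ]]; then
      echo "render_prometheus_config: $var is not set in $env_file" >&2
      return 1
    fi
    placeholder="\${${var}}"   # e.g. the literal text ${PROMETHEUS_CLUSTER_NAME}
    value=${!var}
    content=${content//"$placeholder"/"$value"}
  done
  printf '%s\n' "$content" > "$output"
}
```

Usage from the `central-server` directory: `render_prometheus_config ./.env prometheus.yml.template prometheus.yml`.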