# 05-05-Prometheus and Grafana
> Use `kube-prometheus-stack` to set up baseline observability.
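## Contract and source of truth
- **Example Helm values**: `ansible/files/05-05/kube-prometheus-stack-values.example.yaml` (see the `README.md` in the same directory).
- **Manual**: run the `helm` commands below, passing `-f ansible/files/05-05/kube-prometheus-stack-values.example.yaml` to `helm upgrade --install`.
- **Automated**: `./ansible/bin/verify.sh run 05-05` (reads the same source of truth as the path above, so manual and automated runs can be compared; for this step it currently performs a no-op plus the cluster baseline checks).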
## TL;DR
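- **Automated acceptance**: `./ansible/bin/verify.sh run 05-05`
- **Key prerequisites**: prepare the environment variables, Secrets, and ingress IP as described under "Prerequisites"
- **Success criteria**: the results under "Expected" are met and the playbook assertions pass
- **Troubleshooting**: see "Troubleshooting" below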
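A minimal sketch for keeping the acceptance output around, so the failing step can be located afterwards. It assumes `verify.sh` exits non-zero when an assertion fails, which this document does not state; the `run_verify` function, the `VERIFY` override, and the `verify-<step>.log` file name are hypothetical.

```bash
# Hypothetical helper: run the acceptance check for one step and keep its log.
# Assumption: verify.sh exits non-zero when an assertion fails.
VERIFY="${VERIFY:-./ansible/bin/verify.sh}"

run_verify() {
  step="$1"
  log="verify-${step}.log"
  # Capture stdout and stderr into the log, then save the exit code
  # before any other command can overwrite $?.
  "$VERIFY" run "$step" >"$log" 2>&1
  rc=$?
  cat "$log"
  if [ "$rc" -eq 0 ]; then
    echo "PASS: ${step} (log: ${log})"
  else
    echo "FAIL: ${step} exited with ${rc} (log: ${log})" >&2
  fi
  return "$rc"
}
```

Usage: `run_verify 05-05`. Writing to the log first and reading `$?` immediately avoids the common `cmd | tee file` pitfall, where the pipeline's status is that of `tee` rather than of the check.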
## Prerequisites
- The cluster is up and healthy
- Helm is installed
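The prerequisites can be checked up front with a small preflight sketch. `require_cmds` and `preflight` are hypothetical helper names; `kubectl cluster-info` and `helm version --short` are standard commands, used here only to fail fast when the cluster is unreachable or Helm is missing.

```bash
# Hypothetical preflight helper for the prerequisites above.
# Prints each missing command and returns non-zero if any is missing.
require_cmds() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

preflight() {
  require_cmds helm kubectl || return 1
  # Fails fast if the current kubeconfig context cannot reach the API server.
  kubectl cluster-info >/dev/null || return 1
  # Prints the Helm client version, e.g. to confirm Helm 3 is in use.
  helm version --short
}
```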
## Steps
1. Add the Helm repository and update the index
2. Create the `monitoring` namespace
3. Install `kube-prometheus-stack`
```bash
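# Add the chart repository and refresh the local index
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# Create the namespace; the dry-run + apply form also succeeds if it already exists
kubectl create namespace monitoring --dry-run=client -o yaml | kubectl apply -f -
# Install (or upgrade) the release "monitoring" with the example values
helm upgrade --install monitoring prometheus-community/kube-prometheus-stack -n monitoring \
  -f ansible/files/05-05/kube-prometheus-stack-values.example.yaml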
```
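The three steps can also be wrapped into a re-runnable function. This is a sketch under stated assumptions: `install_monitoring`, `run`, and the `DRY_RUN`, `RELEASE`, `NAMESPACE`, and `VALUES` variables are hypothetical. It lets Helm create the namespace with `--create-namespace` (Helm 3.2+) instead of a separate `kubectl` call, and adds `--wait --timeout 10m` so the command returns only once the release's resources are ready or the timeout expires.

```bash
# Hypothetical wrapper around the install steps so they can be re-run safely.
# DRY_RUN=1 prints the commands instead of executing them.
RELEASE="${RELEASE:-monitoring}"
NAMESPACE="${NAMESPACE:-monitoring}"
VALUES="${VALUES:-ansible/files/05-05/kube-prometheus-stack-values.example.yaml}"

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

install_monitoring() {
  # Fail early with a clear message instead of a Helm error about -f.
  if [ ! -f "$VALUES" ]; then
    echo "values file not found: $VALUES" >&2
    return 1
  fi
  run helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
  run helm repo update
  # --wait blocks until the release's resources are ready (or --timeout expires).
  run helm upgrade --install "$RELEASE" prometheus-community/kube-prometheus-stack \
    -n "$NAMESPACE" --create-namespace -f "$VALUES" --wait --timeout 10m
}
```

`DRY_RUN=1 install_monitoring` prints the commands without touching the cluster, which is a quick way to review them before a real run.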
## Verification commands
```bash
kubectl -n monitoring get pods
kubectl -n monitoring get svc
```
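## Expected
- The core component Pods are `Running` with all containers ready (with the chart defaults: Prometheus Operator, Prometheus, Alertmanager, Grafana, kube-state-metrics, and node-exporter)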
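The "Expected" criterion can be checked mechanically. A sketch, assuming the default `kubectl get pods --no-headers` column order (NAME, READY, STATUS, ...); `check_pods` is a hypothetical helper name, and treating `Completed` pods as healthy is a design choice for pods left behind by finished Jobs.

```bash
# Hypothetical checker: reads `kubectl get pods --no-headers` output on stdin,
# prints every pod that is not Running with all containers ready, and
# returns non-zero if any such pod (or no pod at all) is found.
check_pods() {
  awk '
    NF < 3 { next }                 # skip blank or malformed lines
    {
      seen++
      name = $1; ready = $2; status = $3
      if (status == "Completed") next
      n = split(ready, r, "/")      # READY looks like "ready/total"
      if (status != "Running" || n != 2 || r[1] != r[2]) {
        print "not ready: " name " (" ready ", " status ")"
        bad = 1
      }
    }
    END {
      if (seen == 0) { print "no pods found"; exit 1 }
      exit (bad ? 1 : 0)
    }'
}
```

Usage: `kubectl -n monitoring get pods --no-headers | check_pods`. `kubectl -n monitoring wait --for=condition=Ready pod --all --timeout=300s` is a built-in alternative, but it would likely time out on pods from completed Jobs, since those never report `Ready`; parsing the listing sidesteps that.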
## Next steps
- `06-02-运维小结.md` (operations summary)
## Troubleshooting
- **Check the playbook output first**: on failure, identify which step failed (deploy, wait, or http_check).
- **Cluster overview**: `kubectl get nodes -o wide`, `kubectl -n kube-system get pods -o wide`
- **Events and logs**: `kubectl -n <ns> describe ...`, `kubectl -n <ns> logs ... --tail=200`
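The troubleshooting commands above can be bundled into one collection step, so the output can be attached to an issue or compared between runs. A minimal sketch: `collect_diag`, the `KUBECTL` override, and the output file names are hypothetical; the namespace defaults to `monitoring` to match this step, and the sketch simply describes and logs every pod in that namespace.

```bash
# Hypothetical helper: collect the troubleshooting output into one directory.
# KUBECTL can be overridden, e.g. KUBECTL="kubectl --context my-ctx".
KUBECTL="${KUBECTL:-kubectl}"

collect_diag() {
  ns="${1:-monitoring}"
  out="${2:-diag-$(date +%Y%m%d-%H%M%S)}"
  mkdir -p "$out" || return 1
  $KUBECTL get nodes -o wide                 >"$out/nodes.txt" 2>&1
  $KUBECTL -n kube-system get pods -o wide   >"$out/kube-system-pods.txt" 2>&1
  $KUBECTL -n "$ns" get pods -o wide         >"$out/pods.txt" 2>&1
  $KUBECTL -n "$ns" describe pods            >"$out/describe-pods.txt" 2>&1
  $KUBECTL -n "$ns" get events --sort-by=.lastTimestamp >"$out/events.txt" 2>&1
  # Last 200 log lines from every container of every pod in the namespace.
  for pod in $($KUBECTL -n "$ns" get pods -o name); do
    f="$out/logs-$(basename "$pod").txt"
    $KUBECTL -n "$ns" logs "$pod" --all-containers --tail=200 >"$f" 2>&1
  done
  # Print the directory so callers can capture it.
  echo "$out"
}
```

`collect_diag monitoring` writes into a timestamped `diag-<timestamp>` directory and prints its path.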