# 05-05-Prometheus and Grafana

> Use `kube-prometheus-stack` to set up baseline observability.

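## TL;DR

- **Automated acceptance check**: `./scripts/verify.sh run 05-05`
- **Key prerequisites**: prepare the environment variables, Secrets, and ingress IP as described under "Prerequisites" in this document
- **Success criteria**: the state described under "Expected result" is reached and the playbook assertions pass
- **Troubleshooting**: see "Troubleshooting" in this document
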
## Prerequisites

- The cluster is up and running normally
- Helm is installed

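
The two prerequisites above can be checked before installing anything. The following is a minimal sketch, not part of the original playbook: `check_prereqs` is a hypothetical helper name, and it assumes that "running normally" means `kubectl get nodes` reports at least one `Ready` node.

```bash
#!/usr/bin/env bash
# Preflight sketch (hypothetical helper): returns non-zero if a prerequisite is missing.
check_prereqs() {
  local tool
  # Both CLIs must be available on PATH.
  for tool in kubectl helm; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      return 1
    fi
  done
  # Assumption: a healthy cluster shows at least one node whose STATUS is Ready.
  # -w keeps "NotReady" from matching.
  if ! kubectl get nodes --no-headers 2>/dev/null | grep -qw 'Ready'; then
    echo "no Ready node found" >&2
    return 1
  fi
  # Helm must be able to report its own version.
  helm version --short >/dev/null
}
```

Calling `check_prereqs && echo "prerequisites OK"` before the install steps makes a missing tool or an unreachable cluster fail early instead of halfway through the Helm install.
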
## Steps

1. Add the chart repository and update the index
2. Create the `monitoring` namespace
3. Install `kube-prometheus-stack`

```bash
# 1. Add the chart repository and refresh the local index
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# 2. Create the namespace for the monitoring stack
kubectl create namespace monitoring
# 3. Install (or upgrade) the chart as release "monitoring"
helm upgrade --install monitoring prometheus-community/kube-prometheus-stack -n monitoring
```

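
`kubectl create namespace` returns an error if the namespace already exists, so the steps above are not safe to rerun unchanged. The sketch below shows one possible re-runnable variant. `install_stack` is a hypothetical wrapper, and the `--wait --timeout 10m` values are assumptions rather than settings taken from this guide.

```bash
#!/usr/bin/env bash
# Re-runnable install sketch (hypothetical wrapper around the three steps).
install_stack() {
  local ns="${1:-monitoring}" release="${2:-monitoring}"
  # --force-update overwrites an existing repo entry instead of failing.
  helm repo add prometheus-community \
    https://prometheus-community.github.io/helm-charts --force-update &&
  helm repo update &&
  # Render the namespace client-side and apply it, which succeeds on reruns.
  { kubectl create namespace "$ns" --dry-run=client -o yaml | kubectl apply -f -; } &&
  # Assumed values: block until resources are ready, for at most 10 minutes.
  helm upgrade --install "$release" prometheus-community/kube-prometheus-stack \
    -n "$ns" --wait --timeout 10m
}
```

Running `install_stack` with no arguments uses the same namespace and release name as the commands above. `install_stack obs kps` would install release `kps` into a namespace called `obs`, both of which are illustrative names.
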
## Verification commands

```bash
# Pods of the monitoring stack and their status
kubectl -n monitoring get pods
# Services exposed by the monitoring stack
kubectl -n monitoring get svc
```

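## Expected result

- The Pods of the core components (for example the Prometheus Operator, Prometheus, Alertmanager, and Grafana) are in the `Running` state
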
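
The expected state can also be checked mechanically rather than by eye. The sketch below filters the default `kubectl get pods --no-headers` output (columns NAME, READY, STATUS, ...) and prints pods that look unhealthy. `unhealthy_pods` is a hypothetical helper, and treating `Completed` pods as healthy is an assumption.

```bash
#!/usr/bin/env bash
# Reads `kubectl get pods --no-headers` output on stdin and prints the names of
# pods that are not Running or whose containers are not all ready.
unhealthy_pods() {
  awk '
    $3 == "Completed" { next }            # assumption: finished Job pods are fine
    {
      split($2, ready, "/")               # READY column, e.g. "2/2"
      if ($3 != "Running" || ready[1] != ready[2]) print $1
    }'
}
```

For example, `kubectl -n monitoring get pods --no-headers | unhealthy_pods` prints nothing once every pod in the namespace is running with all containers ready.
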
## Next steps

- `06-02-运维小结.md` (operations summary)

## Troubleshooting

- **Check the playbook output first**: on failure, start by identifying which step failed: deploy, wait, or http_check.
- **Cluster-side overview**: `kubectl get nodes -o wide`, `kubectl -n kube-system get pods -o wide`.
- **Events and logs**: `kubectl -n <ns> describe ...`, `kubectl -n <ns> logs ... --tail=200`.

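
The events-and-logs step can be wrapped in a small helper so that the same information is collected for every suspect pod. This is a sketch under assumptions: `triage_pods` is a hypothetical name, and the choice to show the last 20 events is arbitrary.

```bash
#!/usr/bin/env bash
# Triage sketch (hypothetical helper): usage: triage_pods <namespace> <pod>...
triage_pods() {
  local ns="$1" pod
  shift
  echo "== recent events in $ns =="
  # Oldest first, so the tail shows the most recent events.
  kubectl -n "$ns" get events --sort-by=.lastTimestamp | tail -n 20
  for pod in "$@"; do
    echo "== $pod =="
    kubectl -n "$ns" describe pod "$pod"
    # Logs may be unavailable for pods that never started; keep going.
    kubectl -n "$ns" logs "$pod" --all-containers --tail=200 || true
  done
}
```

A call such as `triage_pods monitoring <pod-name>` prints the recent namespace events followed by the describe output and the last 200 log lines of each named pod.
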