Basic framework
docs/06-02-运维小结.md
@@ -0,0 +1,63 @@
# 06-02-Operations Summary

> Day-to-day operations advice: routine checks, change records, and backup strategy.

## Daily checks

- Whether every node listed by `kubectl get nodes` reports `Ready`
- Whether key services are reachable
- Whether certificates are still within their validity period and DNS resolution works
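
The node check can be scripted so that only problem nodes are printed. The following is a minimal sketch, assuming `kubectl` is already configured for the cluster; the function name `not_ready_nodes` is hypothetical and not part of the repository. It treats any status other than exactly `Ready` (for example `NotReady` or `Ready,SchedulingDisabled`) as worth a look.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of nodes whose STATUS column
# is not exactly "Ready".
# Input (stdin): output of `kubectl get nodes --no-headers`,
# where field 1 is NAME and field 2 is STATUS.
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}

# Usage (assumes a working kubeconfig):
#   kubectl get nodes --no-headers | not_ready_nodes
```

Empty output means every node reports `Ready`.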

## Troubleshooting command quick reference

- **Cluster and workloads**
  - `kubectl get pod,svc,ing -A -o wide`
  - `kubectl -n kube-system logs deploy/traefik --tail=100`
  - `kubectl -n kube-system get helmchart,helmchartconfig`
  - `kubectl -n kube-system describe pod <pod-name>`

- **Nodes and network**
  - `kubectl get node -o wide`
  - `watch -n 1 'ip addr; ip route'`
  - `ss -tulpn | grep ':80\|:443\|:6443'`
  - `sudo netstat -tulpn | grep ':80\|:443\|:6443'`
  - `sudo lsof -iTCP -sTCP:LISTEN -P -n | grep -E ':80|:443|:6443'`
  - `curl -vk https://<domain>/ --resolve "<domain>:443:<ingress-IP>" -o /dev/null`

- **Traefik / ACME**
  - `kubectl -n kube-system logs deploy/traefik --tail=200 | grep -i acme || true`
  - `kubectl get ingress -A`
  - `openssl s_client -connect <IP>:443 -servername <domain> </dev/null 2>/dev/null | openssl x509 -noout -text | grep -E "Subject:|DNS:"`

- **SSH and Ansible**
  - `bash scripts/ssh/test-ssh.sh`
  - `ssh -i ~/.ssh/id_ed25519_k3s_*.61 root@192.168.2.61`
  - `ansible-playbook -i ansible/inventory.ini ansible/playbooks/nginx-matrix-tls-deploy.yml`
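
The `grep` port filters in the network commands match by substring, so they also report ports that merely begin with the same digits, such as `:8080` or `:8443`. Where an exact match is wanted, a small filter can help. This is a sketch only: `filter_listen_ports` is a hypothetical helper name, and it assumes the `ss -tulpn` column layout in which the fifth field is the local `address:port`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only the lines of `ss -tulpn` output whose
# local port (text after the last ":" in field 5) is one of the given ports.
# Arguments: one or more port numbers.
filter_listen_ports() {
  local ports="$*"
  awk -v ports="$ports" '
    BEGIN {
      n = split(ports, p, " ")
      for (i = 1; i <= n; i++) want[p[i]] = 1
    }
    {
      # Splitting on ":" also handles IPv6 forms such as [::]:6443,
      # because only the last piece is compared.
      m = split($5, a, ":")
      if (a[m] in want) print
    }
  '
}

# Usage:
#   ss -tulpn | filter_listen_ports 80 443 6443
```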
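
## Recommended routine cleanup

- **Clean up one-off Job / install Pods**
  - `sudo kubectl -n kube-system get pod | grep 'helm-install-traefik'`
  - `sudo kubectl -n kube-system get pod -o name | grep 'helm-install-traefik' | xargs -r sudo kubectl -n kube-system delete --ignore-not-found=true` (`kubectl` does not expand `*` in resource names, so the matching names are piped in instead)
  - Rule: delete only one-off install Pods in states such as `Completed` or `ErrImagePull`; never delete long-running components (such as `traefik` or `coredns`).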
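
The rule above can be expressed as a filter that selects deletion candidates by both name and status before anything is removed. A minimal sketch, assuming the default column layout of `kubectl get pod --no-headers` (NAME, READY, STATUS, ...); `stale_install_pods` is a hypothetical name, and the two statuses are the ones named in the rule.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of helm-install-* Pods whose
# STATUS (field 3) is Completed or ErrImagePull.
# Input (stdin): output of `kubectl -n kube-system get pod --no-headers`.
stale_install_pods() {
  awk '$1 ~ /^helm-install-/ && ($3 == "Completed" || $3 == "ErrImagePull") { print $1 }'
}

# Usage (review the list before deleting anything):
#   sudo kubectl -n kube-system get pod --no-headers | stale_install_pods
```

Long-running Pods such as `traefik` and `coredns` never match, because the name prefix and the status must both qualify.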

## Change management

- Route all configuration changes through Git first
- After each change, run a minimal regression check (`curl` + `kubectl`)
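
One possible shape for the `curl` half of that regression check is shown below. It is a sketch under assumptions: `http_status_is` is a hypothetical helper, the expected status defaults to 200, and `-k` skips certificate verification in the same way as the `curl -vk` command in the quick reference, so it confirms reachability rather than certificate validity.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) when URL answers with the
# expected HTTP status code, fail otherwise.
# Arguments: URL [EXPECTED_STATUS, default 200]
http_status_is() {
  local url="$1" expected="${2:-200}" code
  # -s: silent, -k: skip TLS verification, -o /dev/null: discard the body,
  # -w '%{http_code}': print only the status code.
  code=$(curl -sk -o /dev/null -w '%{http_code}' --max-time 10 "$url") || return 1
  [ "$code" = "$expected" ]
}

# Usage (the domain is a placeholder):
#   http_status_is "https://<domain>/" 200 && echo "ingress OK"
```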

## Backup recommendations

- K3s key configuration and manifests
- Databases (e.g. GitLab)
- NFS and business data
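
For the first item, a timestamped archive is often enough. The sketch below is an assumption about how this could be done, not the repository's method: `backup_k3s_config` is a hypothetical helper, and the paths in the usage comment are the K3s default locations, which may differ on a customized install. Databases and NFS data need their own tooling and are not covered here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: archive the given paths into
# DEST_DIR/k3s-config-<timestamp>.tar.gz and print the archive path.
# Arguments: DEST_DIR PATH [PATH...]
backup_k3s_config() {
  local dest_dir="$1"; shift
  local archive="${dest_dir}/k3s-config-$(date +%Y%m%d-%H%M%S).tar.gz"
  mkdir -p "$dest_dir" || return 1
  tar -czf "$archive" "$@" || return 1
  printf '%s\n' "$archive"
}

# Usage (assumed K3s default paths; reading them usually requires root):
#   backup_k3s_config /var/backups/k3s /etc/rancher/k3s /var/lib/rancher/k3s/server/manifests
```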

## Troubleshooting entry points

- Network and policy: `06-01-k3s-networkpolicy-故障排查.md`
- Script entry point: `scripts/README.md`

## Next steps

- Return to 00-00-构建总览.md and continue by following the navigation.