Deploy-Laboratory/docs/06-02-运维小结.md
2026-03-21 04:36:06 +08:00

# 06-02 Operations Summary
> Day-to-day operations advice: check items, change records, and backup strategy.
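## Daily Checks
- Whether `kubectl get nodes` reports every node as `Ready`
- Whether key services are reachable
- Whether certificate validity periods and DNS resolution are normal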
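
The node check above can be scripted. The following is a minimal sketch, not part of the original notes: the function name `not_ready_nodes` is hypothetical, and it assumes the usual column layout of `kubectl get nodes --no-headers` (NAME first, STATUS second).

```shell
#!/usr/bin/env bash
# Sketch: print the names of nodes whose STATUS is not exactly "Ready".
# Input: output of `kubectl get nodes --no-headers` on stdin.
# Assumption: column 1 is NAME and column 2 is STATUS
# (e.g. Ready, NotReady, Ready,SchedulingDisabled).
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}
```

One way to use it is `kubectl get nodes --no-headers | not_ready_nodes`; empty output means every node reported plain `Ready`. Cordoned nodes (`Ready,SchedulingDisabled`) are listed on purpose so they are not forgotten.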
## Common Troubleshooting Commands (Quick Reference)
- **Cluster and workloads**
  - `kubectl get pod,svc,ing -A -o wide`
  - `kubectl -n kube-system logs deploy/traefik --tail=100`
  - `kubectl -n kube-system get helmchart,helmchartconfig`
  - `kubectl -n kube-system describe pod <pod-name>`
- **Nodes and network**
  - `kubectl get node -o wide`
  - `watch -n 1 'ip addr; ip route'`
  - `ss -tulpn | grep ':80\|:443\|:6443'`
  - `sudo netstat -tulpn | grep ':80\|:443\|:6443'`
  - `sudo lsof -iTCP -sTCP:LISTEN -P -n | grep -E ':80|:443|:6443'`
  - `curl -vk https://<domain>/ --resolve "<domain>:443:<ingress-IP>" -o /dev/null`
- **Traefik / ACME**
  - `kubectl -n kube-system logs deploy/traefik --tail=200 | grep -i acme || true`
  - `kubectl get ingress -A`
  - `openssl s_client -connect <IP>:443 -servername <domain> </dev/null 2>/dev/null | openssl x509 -noout -text | grep -E "Subject:|DNS:"`
- **SSH and Ansible**
  - `bash scripts/ssh/test-ssh.sh`
  - `ssh -i ~/.ssh/id_ed25519_k3s_*.61 root@192.168.2.61`
  - `ansible-playbook -i ansible/inventory.ini ansible/playbooks/nginx-matrix-tls-deploy.yml`
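
The port checks above use a loose `grep`, so `:80` also matches `:8080`. As a hedged sketch only, the helper below matches ports exactly. The name `missing_ports` is hypothetical, and it assumes the local address sits in field 4 or 5 of `ss -tln` / `ss -tuln` output, depending on whether the Netid column is shown.

```shell
#!/usr/bin/env bash
# Sketch: report which of the given ports have no listening socket.
# Usage: ss -tln | missing_ports 80 443 6443
# Assumption: the "Local Address:Port" value is field 4 (no Netid column)
# or field 5 (Netid column present); the peer field of a listener ends in
# ":*", so checking both fields does not produce false matches.
missing_ports() {
  local listing port
  listing="$(cat)"   # read the whole `ss` output once
  for port in "$@"; do
    if ! printf '%s\n' "$listing" |
      awk -v p="$port" '($4 ~ (":" p "$")) || ($5 ~ (":" p "$")) { found = 1 }
                        END { exit !found }'; then
      printf '%s\n' "$port"
    fi
  done
}
```

Empty output means every listed port has a listener; any printed port number is one to investigate.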
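## Recommended Routine Cleanup
- **Clean up one-off Job/install Pods**
  - `sudo kubectl -n kube-system get pod | grep 'helm-install-traefik'`
  - `sudo kubectl -n kube-system get pod -o name | grep 'helm-install-traefik' | xargs -r sudo kubectl -n kube-system delete --ignore-not-found=true` (`kubectl` does not expand `*` in resource names, so the matching names are piped in explicitly)
  - Principle: delete only one-off install Pods in states such as `Completed` or `ErrImagePull`; do not delete long-running components (`traefik`, `coredns`, etc.).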
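
The principle above can be enforced with a status filter before deleting anything. This is a minimal sketch under stated assumptions: the function name `stale_install_pods` is hypothetical, and it assumes the default `kubectl get pod --no-headers` columns (NAME, READY, STATUS, ...).

```shell
#!/usr/bin/env bash
# Sketch: select only one-off helm-install-traefik Pods that are finished or
# failed to pull their image.
# Input: output of `kubectl -n kube-system get pod --no-headers` on stdin.
# Assumption: column 1 is NAME and column 3 is STATUS.
stale_install_pods() {
  awk '$1 ~ /^helm-install-traefik/ &&
       ($3 == "Completed" || $3 == "ErrImagePull") { print $1 }'
}
```

A possible pipeline is `sudo kubectl -n kube-system get pod --no-headers | stale_install_pods | xargs -r sudo kubectl -n kube-system delete pod --ignore-not-found=true`. Running the filter alone first shows what would be deleted; long-running Pods such as `traefik` and `coredns` never match the name prefix.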
## Change Management
- Route all configuration changes through Git first
- After each change, run a minimal regression check (`curl` + `kubectl`)
## Backup Recommendations
- K3s key configuration and manifests
- Databases (e.g., GitLab)
- NFS and business data
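
For the first item, a timestamped archive is one simple option. The sketch below is illustrative only: the function name `backup_paths`, the archive name prefix, and the destination directory in the usage example are hypothetical, and the source paths in the example assume a default K3s install layout.

```shell
#!/usr/bin/env bash
# Sketch: archive the given paths into DEST_DIR as a timestamped tar.gz
# and print the archive path on success.
# Usage: backup_paths DEST_DIR PATH...
backup_paths() {
  local dest="$1"
  shift
  local stamp archive
  stamp="$(date +%Y%m%d-%H%M%S)"
  archive="${dest}/k3s-config-${stamp}.tar.gz"
  mkdir -p "$dest" || return 1
  tar -czf "$archive" "$@" || return 1
  printf '%s\n' "$archive"
}
```

For example, run as root: `backup_paths /backup/k3s /etc/rancher/k3s /var/lib/rancher/k3s/server/manifests`. Databases and NFS data usually need their own tooling (dumps, snapshots) and are not covered by this sketch.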
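## Troubleshooting Entry Points
- Network and policy: `06-01-k3s-networkpolicy-故障排查.md`
- Script entry point: `scripts/README.md`
## Next Steps
- Return to `00-00-构建总览.md` and continue by following the navigation.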