
# 06-02 Operations Summary
> Day-to-day operations guidance: checklist items, change records, and backup strategy.
## Daily checks
- Does `kubectl get nodes` show every node as `Ready`?
- Are the key services reachable?
- Are certificate validity periods and DNS resolution in order?
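
The node check above can be scripted. The following is a minimal sketch, not part of the original notes: `not_ready_nodes` is a hypothetical helper name, and it assumes the default column layout of `kubectl get nodes --no-headers` (NAME, STATUS, ROLES, AGE, VERSION).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the names of nodes whose STATUS is not Ready.
# Reads `kubectl get nodes --no-headers` output on stdin.
not_ready_nodes() {
  # A cordoned node reports "Ready,SchedulingDisabled"; it still counts as Ready here.
  awk '$2 != "Ready" && $2 !~ /^Ready,/ { print $1 }'
}

# Usage (requires cluster access):
#   kubectl get nodes --no-headers | not_ready_nodes
```

Empty output means every node is `Ready`; any printed name is worth a `kubectl describe node <name>`.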
## Common troubleshooting commands
- **Cluster and workloads**
  - `kubectl get pod,svc,ing -A -o wide`
  - `kubectl -n kube-system logs deploy/traefik --tail=100`
  - `kubectl -n kube-system get helmchart,helmchartconfig`
  - `kubectl -n kube-system describe pod <pod-name>`
- **Nodes and network**
  - `kubectl get node -o wide`
  - `watch -n 1 'ip addr; ip route'`
  - `ss -tulpn | grep ':80\|:443\|:6443'`
  - `sudo netstat -tulpn | grep ':80\|:443\|:6443'`
  - `sudo lsof -iTCP -sTCP:LISTEN -P -n | grep -E ':80|:443|:6443'`
  - `curl -vk https://<domain>/ --resolve "<domain>:443:<ingress-IP>" -o /dev/null`
- **Traefik / ACME**
  - `kubectl -n kube-system logs deploy/traefik --tail=200 | grep -i acme || true`
  - `kubectl get ingress -A`
  - `openssl s_client -connect <IP>:443 -servername <domain> </dev/null 2>/dev/null | openssl x509 -noout -text | grep -E "Subject:|DNS:"`
- **SSH and Ansible**
  - `bash scripts/ssh/test-ssh.sh`
  - `ssh -i ~/.ssh/id_ed25519_k3s_*.61 root@192.168.2.61`
  - `ansible-playbook -i ansible/inventory.ini ansible/playbooks/nginx-matrix-tls-deploy.yml`
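
The port filters in the list above are substring matches, so `:80` also matches listeners such as `:8080`. When an exact match matters, a small filter like the sketch below can help. `filter_listeners` is a hypothetical name, and the sketch assumes input in the shape of `ss -H -tuln` output, where the local `address:port` is the fifth column.

```bash
#!/usr/bin/env bash
# Hypothetical helper: keep only listener lines whose local port is exactly
# one of the ports given as arguments.
# Reads `ss -H -tuln` output on stdin (local address:port in column 5).
filter_listeners() {
  awk -v ports=" $* " '{
    port = $5
    sub(/.*:/, "", port)               # strip everything up to the last colon
    if (index(ports, " " port " ") > 0) print
  }'
}

# Usage:
#   ss -H -tuln | filter_listeners 80 443 6443
```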
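## Recommended routine cleanup
- **Clean up one-off Job / install Pods**
  - `sudo kubectl -n kube-system get pod | grep 'helm-install-traefik'`
  - `sudo kubectl -n kube-system get pod -o name | grep 'helm-install-traefik' | xargs -r sudo kubectl -n kube-system delete --ignore-not-found=true` (kubectl does not expand `*` in resource names, so list the names and pipe them to `delete`)
  - Rule: delete only one-off install Pods in states such as `Completed` or `ErrImagePull`; do not delete long-running components (`traefik`, `coredns`, etc.).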
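
The rule above can be applied mechanically by filtering on both name and status before deleting anything. This is a minimal sketch under stated assumptions: `finished_install_pods` is a hypothetical helper, install Pods are assumed to be named `helm-install-*`, and the input is assumed to use the default `kubectl get pod --no-headers` columns (NAME, READY, STATUS, RESTARTS, AGE).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print names of one-off install Pods in a finished or
# failed-pull state. Long-running Pods (traefik, coredns, ...) never match
# because of the name prefix check.
# Optional $1: space-separated list of statuses to select
# (defaults to "Completed ErrImagePull").
finished_install_pods() {
  awk -v statuses=" ${1:-Completed ErrImagePull} " '
    $1 ~ /^helm-install-/ && index(statuses, " " $3 " ") > 0 { print $1 }
  '
}

# Usage (review the list before piping it into delete):
#   sudo kubectl -n kube-system get pod --no-headers | finished_install_pods
#   sudo kubectl -n kube-system get pod --no-headers | finished_install_pods \
#     | xargs -r sudo kubectl -n kube-system delete pod --ignore-not-found=true
```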
## Change management
- Prefer making all configuration changes through Git.
- After each change, run a minimal regression check (`curl` + `kubectl`).
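
One possible shape for the minimal regression check is sketched below. The function names are hypothetical, and treating any 2xx or 3xx response as a pass is an assumption to adjust per service.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed for 2xx/3xx HTTP status codes (assumed pass criteria).
status_ok() {
  case "$1" in
    2??|3??) return 0 ;;
    *)       return 1 ;;
  esac
}

# Hypothetical helper: request https://<domain>/ through a specific ingress IP.
# $1 = domain, $2 = ingress IP. -k skips certificate verification, as in the
# curl command from the troubleshooting list.
smoke_check() {
  code="$(curl -sk -o /dev/null -w '%{http_code}' \
            --resolve "$1:443:$2" "https://$1/")" || code=000
  status_ok "$code"
}

# Usage (requires network and cluster access):
#   smoke_check <domain> <ingress-IP> && kubectl get nodes
```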
## Backup recommendations
- K3s key configuration and manifests
- Databases (e.g. GitLab)
- NFS and business data
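
For the first item, a timestamped archive is often enough. The sketch below is illustrative only: `backup_paths` is a hypothetical helper, and the paths in the usage comment are the default K3s install locations, which may differ on a customized setup. Databases and NFS data need their own tooling and are not covered here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: archive the given paths into a timestamped tarball.
# $1 = destination directory, remaining args = files or directories to back up.
# Prints the archive path on success.
backup_paths() {
  dest_dir="$1"
  shift
  mkdir -p "$dest_dir" || return 1
  archive="$dest_dir/k3s-config-$(date +%Y%m%d-%H%M%S).tar.gz"
  tar -czf "$archive" "$@" && printf '%s\n' "$archive"
}

# Usage (paths are assumed K3s defaults; run as root on a server node):
#   backup_paths /var/backups/k3s /etc/rancher/k3s /var/lib/rancher/k3s/server/manifests
```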
## Troubleshooting entry points
- Network and policy: `06-01-k3s-networkpolicy-故障排查.md`
- Script entry point: `scripts/README.md`
## Next steps
- Return to `00-00-构建总览.md` and continue from its navigation.