Basic framework

2026-03-21 04:36:06 +08:00
commit de1be1dbe5
125 changed files with 10302 additions and 0 deletions

# entrypath diagnostic script
`entrypath.sh` troubleshoots issues anywhere along the end-to-end `client -> worker:80 -> kube-proxy DNAT -> Traefik Pod` path.
## Commands
```bash
./scripts/diag/entrypath/entrypath.sh <command> [options]
```
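- `run`: full check (the default)
- `preflight`: only checks dependencies and the option/environment setup
- `capture`: force-enables packet capture and tracing, then performs `run`
- `analyze --log <path>`: analyzes a saved log offline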
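The subcommand behavior above can be pictured as a small dispatcher. This is a hypothetical sketch, not the code in `entrypath.sh`: the functions `cmd_run`, `cmd_preflight`, `cmd_analyze` and the `*_MODE` variables are illustrative stand-ins. It shows that `capture` is simply `run` with every capture/trace switch forced on, and that `analyze` refuses to start without `--log`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of subcommand dispatch. cmd_run / cmd_preflight /
# cmd_analyze and the *_MODE variables are illustrative stand-ins,
# not the real names used by entrypath.sh.

cmd_preflight() { echo "preflight"; }
cmd_run()       { echo "run capture=${CAPTURE_MODE:-n} nft=${NFT_TRACE_MODE:-n}"; }
cmd_analyze()   { echo "analyze $1"; }

entrypath_main() {
  local cmd="${1:-run}"              # no subcommand means "run"
  if (( $# > 0 )); then shift; fi
  case "$cmd" in
    run)       cmd_run "$@" ;;
    preflight) cmd_preflight "$@" ;;
    capture)
      # capture = run with every capture/trace switch forced on
      # (bash keeps these assignments only for the duration of the call)
      CAPTURE_MODE=y NFT_TRACE_MODE=y RETURN_TRACE_MODE=y POD_NETNS_TRACE_MODE=y \
        cmd_run "$@" ;;
    analyze)
      # analyze works offline and needs --log <path>
      if [[ "${1:-}" != "--log" || -z "${2:-}" ]]; then
        echo "usage: analyze --log <path>" >&2
        return 2
      fi
      cmd_analyze "$2" ;;
    *)
      echo "unknown command: $cmd" >&2
      return 2 ;;
  esac
}
```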
## Key options
- `--worker-host` / `--client-host`
- `--worker-ssh-key` / `--client-ssh-key`
- `--client-ip` / `--lb-ip`
- `--remote-check y|n`
- `--capture-mode y|n`
- `--nft-trace-mode y|n`
- `--return-trace-mode y|n`
- `--pod-netns-trace-mode y|n`
- `--non-interactive`
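Most of these switches take a strict `y|n` value. The following minimal sketch shows one way such options could be validated. `require_yn`, `parse_opts`, `CAPTURE_MODE` and `NON_INTERACTIVE` are hypothetical names, not taken from `lib/common.sh`, and only two of the flags are handled.

```bash
#!/usr/bin/env bash
# Hypothetical helper: accept only y|n for the *-mode / --remote-check options.
require_yn() {
  local opt="$1" val="$2"
  case "$val" in
    y|n) printf '%s\n' "$val" ;;
    *)   echo "invalid value for $opt: '$val' (expected y|n)" >&2; return 2 ;;
  esac
}

# Minimal option loop for two of the flags above (sketch only).
parse_opts() {
  CAPTURE_MODE=n NON_INTERACTIVE=n   # illustrative defaults
  while (( $# > 0 )); do
    case "$1" in
      --capture-mode)
        # the assignment's status is require_yn's status
        CAPTURE_MODE="$(require_yn "$1" "${2:-}")" || return 2
        shift 2 ;;
      --non-interactive)
        NON_INTERACTIVE=y
        shift ;;
      *)
        echo "unknown option: $1" >&2
        return 2 ;;
    esac
  done
}
```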
## Logs
- Running as root: `/root/netpol-diag-logs/entrypath-*.log`
- Running as non-root: `~/netpol-diag-logs/entrypath-*.log`
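To pick up the most recent log for `analyze`, the directory rule above can be wrapped in a couple of helpers. This is a sketch. `entrypath_log_dir` and `latest_entrypath_log` are illustrative names, and "newest" here means the most recently modified file.

```bash
#!/usr/bin/env bash
# Sketch: resolve the log directory (root vs non-root) and print the newest
# entrypath log. Function names are illustrative, not part of the script.
entrypath_log_dir() {
  if [[ "$(id -u)" -eq 0 ]]; then
    printf '%s\n' /root/netpol-diag-logs
  else
    printf '%s\n' "$HOME/netpol-diag-logs"
  fi
}

latest_entrypath_log() {
  local dir="${1:-$(entrypath_log_dir)}" newest="" f
  for f in "$dir"/entrypath-*.log; do
    [[ -e "$f" ]] || continue          # glob matched nothing
    if [[ -z "$newest" || "$f" -nt "$newest" ]]; then
      newest="$f"                      # keep the most recently modified file
    fi
  done
  [[ -n "$newest" ]] || return 1
  printf '%s\n' "$newest"
}
```

Usage: `./scripts/diag/entrypath/entrypath.sh analyze --log "$(latest_entrypath_log)"`.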
## Typical usage
### 1) Preflight check
```bash
./scripts/diag/entrypath/entrypath.sh preflight --non-interactive
```
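A dependency check in the spirit of `preflight` can be as simple as probing each tool with `command -v`. This is a sketch, and the tool list in the example invocation is an assumption: it is based on the capture module (tcpdump / nft / conntrack) plus `ssh` and `kubectl` for the remote and K8s checks. The real script may check a different set.

```bash
#!/usr/bin/env bash
# Sketch of a preflight-style dependency check; check_deps is an illustrative name.
check_deps() {
  local missing=() cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || missing+=("$cmd")
  done
  if (( ${#missing[@]} > 0 )); then
    echo "missing dependencies: ${missing[*]}" >&2
    return 1
  fi
}

# Example invocation (assumed tool list, not run automatically):
#   check_deps ssh kubectl tcpdump nft conntrack
```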
### 2) Full-featured online diagnosis (example with default values)
```bash
./scripts/diag/entrypath/entrypath.sh run \
--worker-host root@192.168.2.62 \
--client-host root@192.168.2.63 \
--worker-ssh-key ~/.ssh/id_ed25519_k3s_diag_worker \
--client-ssh-key ~/.ssh/id_ed25519_k3s_diag_client \
--client-ip 192.168.2.63 \
--lb-ip 192.168.2.62 \
--remote-check y \
--capture-mode y \
--capture-seconds 15 \
--nft-trace-mode y \
--nft-trace-seconds 10 \
--return-trace-mode y \
--return-trace-seconds 12 \
--pod-netns-trace-mode y \
--pod-netns-trace-seconds 12 \
--non-interactive
```
### 3) Offline log analysis
```bash
./scripts/diag/entrypath/entrypath.sh analyze \
--log ~/netpol-diag-logs/entrypath-20260310-195812.log
```
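## Common pitfalls and fixes
### 1) `62:80` is unreachable even though the worker has already DNATed to Traefik
If the log shows all of the following:
- `nft 观测到 KUBE-EXT DNAT: yes` (nft observed the KUBE-EXT DNAT)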
- `ylc61(any) SYN/SYN-ACK: N/0`
- `filter_FORWARD_POLICIES ... reject with icmpx admin-prohibited`
then the cause is usually the firewalld forwarding policy on `ylc61` blocking traffic from `flannel.1` to `cni0`.
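To spot this three-part signature in a saved log without reading it line by line, a grep-based check is enough. This is a sketch. It assumes the log lines look like the excerpts listed above (including the Chinese `nft 观测到 ...` text), and `has_forward_block_signature` is an illustrative name. Adjust the patterns if the real output differs.

```bash
#!/usr/bin/env bash
# Sketch: detect "worker DNATs, but ylc61 firewalld rejects flannel.1 -> cni0".
# Assumes log lines match the excerpts in this section.
has_forward_block_signature() {
  local log="$1"
  # 1) DNAT to Traefik did happen on the worker
  grep -qF 'nft 观测到 KUBE-EXT DNAT: yes' "$log" &&
  # 2) ylc61 saw SYNs (N > 0) but zero SYN-ACKs
  grep -Eq 'ylc61\(any\) SYN/SYN-ACK: [1-9][0-9]*/0([^0-9]|$)' "$log" &&
  # 3) firewalld's forward policy rejected the traffic
  grep -Eq 'filter_FORWARD_POLICIES.*reject with icmpx admin-prohibited' "$log"
}
```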
Fix (recommended), on `ylc61`:
```bash
sudo firewall-cmd --zone=trusted --add-interface=flannel.1
sudo firewall-cmd --zone=trusted --add-interface=cni0
sudo firewall-cmd --permanent --zone=trusted --add-interface=flannel.1
sudo firewall-cmd --permanent --zone=trusted --add-interface=cni0
sudo firewall-cmd --reload
```
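After applying the fix, it is worth confirming that both interfaces are bound to the `trusted` zone in the runtime and permanent configurations. In this sketch the check reads the output of `firewall-cmd --zone=trusted --list-interfaces` (a space-separated interface list) on stdin, so it can be exercised without firewalld. `trusted_has_cni_ifaces` is an illustrative name.

```bash
#!/usr/bin/env bash
# Sketch: verify flannel.1 and cni0 are in the trusted zone.
# Input: output of `firewall-cmd [--permanent] --zone=trusted --list-interfaces`.
trusted_has_cni_ifaces() {
  local -a ifaces=()
  local want
  read -r -a ifaces || true            # tolerate missing trailing newline / empty input
  for want in flannel.1 cni0; do
    # pad with spaces so "flannel.10" does not match "flannel.1"
    if [[ " ${ifaces[*]} " != *" $want "* ]]; then
      echo "not in trusted zone: $want" >&2
      return 1
    fi
  done
}

# On ylc61:
#   sudo firewall-cmd --zone=trusted --list-interfaces | trusted_has_cni_ifaces
#   sudo firewall-cmd --permanent --zone=trusted --list-interfaces | trusted_has_cni_ifaces
```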
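### 2) Is `Worker CNI hostport DNAT 计数未增长` (the worker's CNI hostport DNAT counter did not increase) a problem?
Not necessarily. If the nft trace clearly shows the packet going through `KUBE-EXT -> KUBE-SVC -> KUBE-SEP`, the traffic is being handled by kube-proxy's service DNAT rather than the CNI hostport rules. A flat CNI hostport counter is then an expected difference in path and should not be treated as the root cause.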
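Deciding whether the trace took the kube-proxy path comes down to checking that the three chain families appear in order. This is a sketch that reads an nft trace excerpt on stdin. It assumes chain names in the trace carry the `KUBE-EXT` / `KUBE-SVC` / `KUBE-SEP` prefixes (as with iptables-mode kube-proxy rules seen through nft). `trace_hits_kube_service_path` is an illustrative name.

```bash
#!/usr/bin/env bash
# Sketch: succeed only if KUBE-EXT, then KUBE-SVC, then KUBE-SEP appear in order.
# A single line may advance more than one step (e.g. "KUBE-EXT-... jump KUBE-SVC-...").
trace_hits_kube_service_path() {
  awk '
    BEGIN { step = 0 }
    /KUBE-EXT/ && step == 0 { step = 1 }
    /KUBE-SVC/ && step == 1 { step = 2 }
    /KUBE-SEP/ && step == 2 { step = 3 }
    END { exit (step == 3 ? 0 : 1) }
  '
}
```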
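### 3) Success criteria
At least one of the following groups must hold:
- The client gets an HTTP response such as `404/200/...` from `http://<lb-ip>:80` (anything other than a connection failure)
- The automatic analysis reports:
  - `ylc62(ens18) SYN/SYN-ACK` as `N/N`
  - `ylc61(any) SYN/SYN-ACK` as `N/N`
  - `ylc61(cni0) SYN/SYN-ACK` as `N/N`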
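Both criteria can be scripted. This is a sketch under stated assumptions: `curl` is available (it prints `000` as the status when no HTTP response arrived), and the analysis prints lines of the form `<label> SYN/SYN-ACK: <syn>/<synack>`. `N/N` is read here as "both counts non-zero", in contrast to the failing `N/0`. The function names are illustrative.

```bash
#!/usr/bin/env bash
# Criterion 1: any HTTP status (404, 200, ...) counts as reachable;
# curl reports 000 when it never got a response (refused, timeout, ...).
http_reachable() {
  local code
  code="$(curl -s -o /dev/null -m 5 -w '%{http_code}' "http://$1:80/")" || true
  [[ -n "$code" && "$code" != 000 ]]
}

# Criterion 2: a capture point saw SYNs and SYN-ACKs (N/N, not N/0).
# Uses the last matching line for the given label in the log.
syn_pair_ok() {
  local label="$1" log="$2" line syn ack
  local re='([0-9]+)/([0-9]+)[[:space:]]*$'
  line="$(grep -F "$label SYN/SYN-ACK:" "$log" | tail -n 1)"
  [[ "$line" =~ $re ]] || return 1
  syn="${BASH_REMATCH[1]}" ack="${BASH_REMATCH[2]}"
  (( syn > 0 && ack > 0 ))
}
```

Usage: `http_reachable 192.168.2.62`, or `syn_pair_ok 'ylc61(cni0)' <log>` for each capture point.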
## Module layout
- `lib/common.sh`: shared utilities and option defaults
- `lib/k8s_checks.sh`: local K8s baseline sampling
- `lib/remote_checks.sh`: remote worker sampling and re-testing
- `lib/capture.sh`: tcpdump / nft / conntrack / pod netns
- `lib/analyze.sh`: live and offline analysis
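One plausible way for the entry script to pull these modules in is sketched below. It assumes the modules live in a `lib/` directory next to `entrypath.sh` and are sourced with `common` first. The load order and the names `ENTRYPATH_MODULES` and `load_entrypath_modules` are assumptions, not the script's actual code.

```bash
#!/usr/bin/env bash
# Sketch: source the modules listed above from a lib/ directory, failing fast
# if any is missing. Order and names are illustrative assumptions.
ENTRYPATH_MODULES=(common k8s_checks remote_checks capture analyze)

load_entrypath_modules() {
  local lib_dir="$1" m
  for m in "${ENTRYPATH_MODULES[@]}"; do
    if [[ ! -r "$lib_dir/$m.sh" ]]; then
      echo "missing module: $lib_dir/$m.sh" >&2
      return 1
    fi
    # shellcheck source=/dev/null
    source "$lib_dir/$m.sh"
  done
}

# In entrypath.sh (sketch):
#   SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
#   load_entrypath_modules "$SCRIPT_DIR/lib" || exit 1
```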