# entrypath diagnostic script

`entrypath.sh` troubleshoots the full `client -> worker:80 -> kube-proxy DNAT -> Traefik Pod` path.

## Commands

```bash
./scripts/diag/entrypath/entrypath.sh <command> [options]
```

- `run`: full check (default)
- `preflight`: check dependencies and the parameter environment only
- `capture`: force-enable packet capture and tracing, then execute `run`
- `analyze --log <path>`: analyze a log offline

## Key parameters

- `--worker-host` / `--client-host`
- `--worker-ssh-key` / `--client-ssh-key`
- `--client-ip` / `--lb-ip`
- `--remote-check y|n`
- `--capture-mode y|n`
- `--nft-trace-mode y|n`
- `--return-trace-mode y|n`
- `--pod-netns-trace-mode y|n`
- `--non-interactive`

## Logs

- Run as root: `/root/netpol-diag-logs/entrypath-*.log`
- Run as non-root: `~/netpol-diag-logs/entrypath-*.log`

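
To feed the most recent log to `analyze`, a small helper can resolve the directory for the current user and pick the newest file. This is a minimal sketch, assuming the timestamp in the file name sorts chronologically (as in `entrypath-20260310-195812.log`); `entrypath_log_dir` and `latest_entrypath_log` are hypothetical names and are not part of `entrypath.sh`.

```bash
# Hypothetical helpers; not part of entrypath.sh.

# Print the log directory for the current user (root vs. non-root).
entrypath_log_dir() {
  if [ "$(id -u)" -eq 0 ]; then
    printf '%s\n' /root/netpol-diag-logs
  else
    printf '%s\n' "$HOME/netpol-diag-logs"
  fi
}

# Print the newest entrypath log in the given directory (default: the
# directory resolved above). Assumes the timestamp in the file name
# sorts chronologically. Returns 1 if no log exists.
latest_entrypath_log() {
  local dir="${1:-$(entrypath_log_dir)}"
  local f latest=""
  for f in "$dir"/entrypath-*.log; do
    [ -e "$f" ] || continue   # an unmatched glob stays literal; skip it
    latest="$f"               # glob expansion is sorted; the last one wins
  done
  [ -n "$latest" ] && printf '%s\n' "$latest"
}
```

With these defined, `./scripts/diag/entrypath/entrypath.sh analyze --log "$(latest_entrypath_log)"` would analyze the latest run.
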
## Typical usage

### 1) Preflight check

```bash
./scripts/diag/entrypath/entrypath.sh preflight --non-interactive
```

### 2) Full online diagnosis (example using the default values)

```bash
./scripts/diag/entrypath/entrypath.sh run \
  --worker-host root@192.168.2.62 \
  --client-host root@192.168.2.63 \
  --worker-ssh-key ~/.ssh/id_ed25519_k3s_diag_worker \
  --client-ssh-key ~/.ssh/id_ed25519_k3s_diag_client \
  --client-ip 192.168.2.63 \
  --lb-ip 192.168.2.62 \
  --remote-check y \
  --capture-mode y \
  --capture-seconds 15 \
  --nft-trace-mode y \
  --nft-trace-seconds 10 \
  --return-trace-mode y \
  --return-trace-seconds 12 \
  --pod-netns-trace-mode y \
  --pod-netns-trace-seconds 12 \
  --non-interactive
```

### 3) Offline log analysis

```bash
./scripts/diag/entrypath/entrypath.sh analyze \
  --log ~/netpol-diag-logs/entrypath-20260310-195812.log
```

## Common pitfalls and fixes

### 1) `62:80` is unreachable, but the worker already DNATs to Traefik

If the log contains all of the following:

- `nft 观测到 KUBE-EXT DNAT: yes` (nft observed the KUBE-EXT DNAT)
- `ylc61(any) SYN/SYN-ACK: N/0`
- `filter_FORWARD_POLICIES ... reject with icmpx admin-prohibited`

the cause is usually the firewalld forwarding policy on `ylc61` blocking `flannel.1 -> cni0`: SYNs reach the node, but no SYN-ACK comes back.

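
The three signatures above can be checked against a saved log in one step. The following is a rough sketch, not the logic in `lib/analyze.sh`: `looks_like_firewalld_forward_block` is a hypothetical helper, and it assumes that `N` is printed as a non-zero decimal count and that each signature appears on a single log line.

```bash
# Hypothetical helper; the patterns mirror the three log lines listed above.
# Assumption: N in "N/0" is a non-zero decimal count.
# Returns 0 only when all three signatures are present in the log file.
looks_like_firewalld_forward_block() {
  local log="$1"
  grep -qF 'KUBE-EXT DNAT: yes' "$log" &&
    grep -qE 'ylc61\(any\) SYN/SYN-ACK: [1-9][0-9]*/0([^0-9]|$)' "$log" &&
    grep -qE 'filter_FORWARD_POLICIES.*reject with icmpx admin-prohibited' "$log"
}

# Usage:
#   looks_like_firewalld_forward_block ~/netpol-diag-logs/entrypath-20260310-195812.log \
#     && echo "likely firewalld forward block on ylc61"
```
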
Fix (recommended):

```bash
# Apply to the runtime configuration (takes effect immediately)
sudo firewall-cmd --zone=trusted --add-interface=flannel.1
sudo firewall-cmd --zone=trusted --add-interface=cni0

# Persist the change, then reload so runtime matches the permanent config
sudo firewall-cmd --permanent --zone=trusted --add-interface=flannel.1
sudo firewall-cmd --permanent --zone=trusted --add-interface=cni0
sudo firewall-cmd --reload
```

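
After the reload, it can help to confirm that both interfaces actually ended up in the `trusted` zone. A minimal sketch: `missing_trusted_ifaces` is a hypothetical helper that takes the space-separated output of `firewall-cmd --zone=trusted --list-interfaces` and prints whichever of the two interfaces is absent.

```bash
# Hypothetical helper: given the interface list printed by
# `firewall-cmd --zone=trusted --list-interfaces`, print each required
# interface that is missing. Returns 0 only if none are missing.
missing_trusted_ifaces() {
  local listed=" $1 " iface missing=0
  for iface in flannel.1 cni0; do
    case "$listed" in
      *" $iface "*) ;;                          # present in the zone
      *) printf '%s\n' "$iface"; missing=1 ;;   # not listed
    esac
  done
  return "$missing"
}

# Usage on the affected node (requires firewalld):
#   missing_trusted_ifaces "$(sudo firewall-cmd --zone=trusted --list-interfaces)"
```

No output and a zero exit status mean both `flannel.1` and `cni0` are in the zone.
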
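### 2) Is `Worker CNI hostport DNAT 计数未增长` (CNI hostport DNAT counter did not increase) a problem?

Not necessarily. If the nft trace clearly shows the packet traversing `KUBE-EXT -> KUBE-SVC -> KUBE-SEP`, the traffic is being handled by kube-proxy rather than by the CNI hostport chain. In that case the unchanged CNI hostport counter is just a path difference and should not be treated as the root cause.
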
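
One way to confirm this path is to look for the three chain prefixes, in order, in saved trace output. The sketch below is an assumption-laden illustration: `trace_uses_kube_proxy_path` is a hypothetical helper, and it assumes the trace lines (for example from `nft monitor trace`) have been written to a text file in which the chain names appear verbatim.

```bash
# Hypothetical helper: succeed if the file contains KUBE-EXT, then
# KUBE-SVC, then KUBE-SEP, each on the same or a later line than the
# previous one. This only checks ordering of chain names, nothing else.
trace_uses_kube_proxy_path() {
  awk '
    /KUBE-EXT/ && !ext        { ext = NR }   # first KUBE-EXT hit
    /KUBE-SVC/ && ext && !svc { svc = NR }   # first KUBE-SVC at or after it
    /KUBE-SEP/ && svc && !sep { sep = NR }   # first KUBE-SEP at or after that
    END { exit (sep ? 0 : 1) }
  ' "$1"
}

# Usage:
#   trace_uses_kube_proxy_path /path/to/saved-trace.txt \
#     && echo "kube-proxy path; CNI hostport counter is expected to stay flat"
```
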
### 3) Success criteria

At least one of the following groups must hold:

- The client receives an HTTP status such as `404/200/...` from `http://<lb-ip>:80` (that is, anything other than a connection failure)
- In the automatic analysis:
  - `ylc62(ens18) SYN/SYN-ACK` is `N/N`
  - `ylc61(any) SYN/SYN-ACK` is `N/N`
  - `ylc61(cni0) SYN/SYN-ACK` is `N/N`

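
Both criteria can be expressed as small predicates. This is a sketch under stated assumptions: `http_path_ok` and `syn_synack_ok` are hypothetical helpers, `LB_IP` is a placeholder, and `N/N` is read here as "SYN and SYN-ACK counts are equal and non-zero", which is an interpretation rather than something the script documents. The first helper relies on curl's `%{http_code}` write-out variable, which is `000` when no HTTP response was received.

```bash
# Hypothetical helpers; not part of entrypath.sh.

# Classify curl's %{http_code}: "000" means no HTTP response (connection
# failure or timeout); any real status, including 404, means the entry
# path delivered the request and a response came back.
http_path_ok() {
  case "$1" in
    000|'') return 1 ;;
    [1-5][0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Check a "SYN/SYN-ACK" value such as "3/3".
# Assumption: N/N means both counts are equal and non-zero.
syn_synack_ok() {
  case "$1" in
    */*/*) return 1 ;;                       # more than one slash
    *[!0-9/]*|/*|*/|'') return 1 ;;          # non-digits or an empty side
    */*) ;;                                  # exactly "digits/digits"
    *) return 1 ;;                           # no slash at all
  esac
  local syn="${1%/*}" ack="${1#*/}"
  [ "$syn" -gt 0 ] && [ "$syn" -eq "$ack" ]
}

# Usage from the client (LB_IP is a placeholder):
#   code="$(curl -s -o /dev/null -w '%{http_code}' --connect-timeout 5 "http://$LB_IP:80/")"
#   http_path_ok "$code" && echo "entry path reachable (HTTP $code)"
```

A `404` counts as success here because it proves Traefik answered; only the absence of any HTTP response indicates a broken path.
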
## Module layout

- `lib/common.sh`: common utilities and parameter defaults
- `lib/k8s_checks.sh`: local K8s baseline sampling
- `lib/remote_checks.sh`: remote worker sampling and retesting
- `lib/capture.sh`: tcpdump / nft / conntrack / pod netns
- `lib/analyze.sh`: live and offline analysis