A tiny AI + Infra CLI: inspects your Kubernetes pods and asks an LLM to produce an SRE-style, actionable health summary.
- Showcases platform engineering + AI in a compact demo.
- Demo-friendly: works with a live cluster or a provided
sample_pods.json. - LinkedIn-ready: screenshot the deterministic + AI summaries side-by-side.
# 1) Create a venv and install deps
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# 2) (Option A) Run against your current K8s context
python ai_pod_explainer.py -A --provider openrouter --model "anthropic/claude-3.5-haiku"
# (Option B) No cluster? Use the sample data
python ai_pod_explainer.py --from-file sample_pods.json --no-ai
# 3) Configure AI (optional, for the AI summary)
# OpenRouter:
export OPENROUTER_API_KEY=YOUR_KEY
# OpenAI:
# export OPENAI_API_KEY=YOUR_KEY
# Re-run with AI enabled:
python ai_pod_explainer.py --from-file sample_pods.json --provider openrouter --model "anthropic/claude-3.5-haiku"Tip: If you don't want AI at all, just use
--no-aiand screenshot the deterministic summary.
- Executes
kubectl get pods -A -o json(or loadssample_pods.json). - Aggregates: phase counts, NotReady pods, CrashLoopBackOff, ImagePullBackOff, OOMKilled, high restarts.
- Highlights "hot" namespaces and example pods.
- (Optional) Sends a compact snapshot to your LLM provider for an SRE-style summary.
=== Deterministic Summary (no AI) ===
Quick health summary:
- Total pods: 42 (by phase: Running:39, Pending:2, Failed:1)
- Not Ready pods: 4
- CrashLoopBackOff: 2 (check config/env/image)
- Image pull issues: 1 (ImagePullBackOff/ErrImagePull)
- High restarts (>=3): 3
- Hot namespaces:
- payments: total=10, not_ready=3, crashloop=1, pending=0
- auth: total=6, not_ready=1, crashloop=1, pending=1
- Sample pods:
- payments/api-7d9f... phase=Running ready=False restarts=5 age=2h issues=CrashLoopBackOff
- auth/worker-9c8... phase=Pending ready=False restarts=0 age=12m
...
=== AI Summary ===
• Config regression in `payments` likely causing CrashLoopBackOff on `api`.
• One image pull failure suggests tag mismatch or registry access issue.
• Multiple pods restarting ≥3 times hint at unstable startup (env/secrets).
Next:
1) `kubectl -n payments describe pod <pod>` and check env/secrets diff.
2) Verify image tags & credentials; `kubectl -n <ns> get events | grep -i image`.
3) For OOMKilled, raise memory requests/limits and inspect heap.
Built on 2025-10-11.