Debugging Alpha Deployments at Pod Level: A Complete Guide
Introduction
Debugging issues in Kubernetes-deployed Alpha environments can be challenging. When something goes wrong in production or UAT, you need to quickly collect logs, identify errors, and understand what happened across multiple pods and services.
This guide introduces a script that helps you systematically collect, organize, and analyze logs from your entire Alpha deployment.
Quick Access: Get the script and documentation
What Problem Does This Solve?
When debugging Alpha deployments, you typically face these challenges:
- Too many pods to check manually - Alpha deployments have 10+ services running
- Scattered logs - Logs are spread across backend services, frontends, and infrastructure
- Lost context - Raw logs don't show what happened before/after an error
- Time-consuming - Manually collecting logs from each pod takes forever
- No overview - Hard to identify which service is actually failing
- Previous pod logs - When a pod restarts, you lose access to pre-restart logs
The Old Way
# Manually checking each pod
kubectl logs alpha-auth-service-xyz -n alpha
kubectl logs alpha-case-service-abc -n alpha
kubectl logs alpha-config-service-def -n alpha
# ... repeat 15 more times ...
# Did any pod restart? Check previous logs
kubectl logs alpha-auth-service-xyz -n alpha --previous
# ... check each pod again ...
# Find errors manually
kubectl logs alpha-auth-service-xyz -n alpha | grep -i error
# ... no context about what caused it ...
The New Way
# One command to collect everything
./collect-pod-logs.sh alpha --since 2025-12-01
# You get:
# - All logs organized by service type
# - Contextual error extraction with 100 lines before/after
# - Diagnostic summary with severity classification
# - Anomaly detection (OOM, connection issues, etc.)
# - Previous pod logs automatically collected
# - Everything in a zip file ready to share
The Solution: Automated Pod Log Collection
A bash script that automates the entire log collection and analysis process for Alpha deployments.
What You Get
The script provides:
- Organized Structure - Logs grouped by service type (backends/frontends/infra)
- Smart Error Extraction - Automatically finds errors and captures context
- Diagnostic Summary - Overview of all issues with severity levels
- Anomaly Detection - Identifies OOM, connection failures, auth issues
- File Index - Quick reference to find any log file
- Zip Archive - Everything packaged for easy sharing with team
- Date Filtering - Only collect logs from specific dates
- Previous Logs - Automatically collects pre-restart logs
Getting Started
Prerequisites
You need:
- kubectl installed and configured with cluster access
- Bash shell (works on Linux, macOS, Windows WSL/Git Bash)
- Access to the Kubernetes namespace you want to debug
Installation
What's in the repository:
- collect-pod-logs.sh - The main script
- README.md - Technical documentation
- example-alpha-logs.zip - Sample output for reference
- Download the script:
# Option 1: Clone the repository
git clone https://bitbucket.org/bhivedevs/alpha-debugging-things.git
cd alpha-debugging-things/alpha-deployment-debugging/collect-pod-logs/
- Verify kubectl access:
kubectl get namespaces
kubectl cluster-info
- You're ready to go!
Usage
Basic Usage
./collect-pod-logs.sh <namespace>
Example:
./collect-pod-logs.sh alpha
With Date Filter (Recommended for Production)
./collect-pod-logs.sh <namespace> --since YYYY-MM-DD
Example:
# Only collect logs from December 1st onwards
./collect-pod-logs.sh alpha-prod --since 2025-12-01
With Custom Output Directory
./collect-pod-logs.sh <namespace> [--since YYYY-MM-DD] <output-directory>
Example:
./collect-pod-logs.sh alpha --since 2025-12-01 ./incident-2025-12-02
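Under the hood, a plain `YYYY-MM-DD` filter has to be translated into something kubectl understands. A hypothetical sketch of that translation (the script's actual implementation may differ) uses kubectl's native `--since-time` flag, which expects an RFC3339 timestamp:

```shell
# Assumed approach: convert the date filter to midnight UTC in RFC3339
# form, which is what kubectl's --since-time flag expects.
SINCE_DATE="2025-12-01"
SINCE_TIME="${SINCE_DATE}T00:00:00Z"

# Echoed rather than executed here; the real script would run this
# against every discovered pod in the namespace.
echo kubectl logs alpha-auth-service-xyz -n alpha --since-time="$SINCE_TIME"
```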
Understanding the Output
Folder Structure
The script creates a well-organized directory structure:
pod-logs-alpha-20251202_143052/
├── backends/
│   └── logs/
│       ├── alpha-auth-service/
│       │   ├── current/      ← Current pod logs
│       │   │   └── alpha-auth-service_2025-12-02.log
│       │   ├── previous/     ← Pre-restart logs
│       │   │   └── alpha-auth-service_2025-12-02_previous.log
│       │   └── errors/       ← Extracted errors with context
│       │       ├── alpha-auth-service_2025-12-02_errors.log
│       │       └── alpha-auth-service_2025-12-02_previous_errors.log
│       ├── alpha-case-service/
│       ├── alpha-config-service/
│       ├── alpha-module-service/
│       └── gts/
├── frontends/
│   └── logs/
│       ├── alpha-admin-ui/
│       ├── alpha-case-manager-ui/
│       └── alpha-workflow-studio/
├── infra/
│   └── logs/
│       └── rabbitmq/
├── DIAGNOSTIC_SUMMARY.txt    ← Start here!
├── FILE_INDEX.txt            ← Find any log quickly
└── pod-logs-alpha-20251202_143052.zip
Service Classification
The script automatically classifies pods:
| Category | Services |
|---|---|
| Backends | auth-service, case-service, config-service, module-service, gts, etc. |
| Frontends | admin-ui, case-manager-ui, workflow-studio, delta-ui |
| Infrastructure | rabbitmq, redis, postgres, nginx, elasticsearch |
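The classification can be pictured as simple pattern matching on pod names. This is an illustrative sketch only (assumed patterns, not the script's exact rules):

```shell
# Classify a pod name into backends / frontends / infra by name pattern.
# Patterns here are assumptions based on the service table above.
classify_pod() {
  case "$1" in
    *rabbitmq*|*redis*|*postgres*|*nginx*|*elasticsearch*) echo "infra" ;;
    *-ui-*|*-ui|*workflow-studio*)                         echo "frontends" ;;
    *)                                                     echo "backends" ;;
  esac
}

classify_pod "alpha-auth-service-7d8f9c-xyz"   # backends
classify_pod "alpha-admin-ui-abc12"            # frontends
classify_pod "rabbitmq-0"                      # infra
```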
How to Debug with This Tool
Step 1: Collect Logs
./collect-pod-logs.sh alpha --since 2025-12-01
Step 2: Check the Diagnostic Summary
cd pod-logs-alpha-*/
cat DIAGNOSTIC_SUMMARY.txt
What to look for:
- CRITICAL pods (>100 errors)
- Anomalies (OOM_DETECTED, CONN_ISSUES, AUTH_FAILURES)
- Warning spikes (unusual number of warnings)
Example output:
Pod: alpha-auth-service-7d8f9c-xyz
Severity: CRITICAL
Current Logs: 245 errors, 89 warnings
Previous Logs: 156 errors, 45 warnings
Anomalies: ERROR_BURST, AUTH_FAILURES
Files:
Current: ./backends/logs/alpha-auth-service/current/alpha-auth-service_2025-12-02.log
Previous: ./backends/logs/alpha-auth-service/previous/alpha-auth-service_2025-12-02_previous.log
Errors (current): ./backends/logs/alpha-auth-service/errors/alpha-auth-service_2025-12-02_errors.log
Errors (previous): ./backends/logs/alpha-auth-service/errors/alpha-auth-service_2025-12-02_previous_errors.log
Step 3: Review Contextual Errors
Open the error files from the diagnostic summary:
# View errors with context
cat ./backends/logs/alpha-auth-service/errors/alpha-auth-service_2025-12-02_errors.log
What you see:
===============================================================================
CONTEXTUAL ERROR EXTRACTION
===============================================================================
Source File: ./backends/logs/alpha-auth-service/current/alpha-auth-service_2025-12-02.log
Context: 100 lines before and after each error
===============================================================================
──────────────────────────────────────────────────────────────────────────────
│ ERROR #1 - Line 5462
│ Context: Lines 5362 to 5562
──────────────────────────────────────────────────────────────────────────────
5362: 2025-12-02 14:23:10 INFO [auth-service] User login attempt: user@example.com
5363: 2025-12-02 14:23:10 DEBUG [auth-service] Validating credentials...
5364: 2025-12-02 14:23:10 DEBUG [auth-service] Connecting to database...
...
>>> 5462: 2025-12-02 14:23:15 ERROR [auth-service] Connection timeout: postgres-primary:5432
...
5562: 2025-12-02 14:23:20 INFO [auth-service] Retry attempt 1/3
This shows you:
- What the user was doing (login attempt)
- What the service tried (database connection)
- What went wrong (connection timeout)
- What happened after (retry attempt)
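The extraction idea itself is simple. Here is a minimal runnable sketch, assuming plain-text logs (the script's real output format is richer): locate each ERROR line number with `grep -n`, then print the surrounding lines with `sed`:

```shell
# Print each ERROR line from a log file together with N lines of
# context on either side (default 100, matching the script's claim).
extract_context() {
  log="$1"; ctx="${2:-100}"
  grep -n "ERROR" "$log" | cut -d: -f1 | while read -r line; do
    start=$(( line - ctx )); [ "$start" -lt 1 ] && start=1
    echo "--- ERROR at line $line (context from line $start) ---"
    sed -n "${start},$(( line + ctx ))p" "$log"
  done
}

# Tiny sample log so the sketch runs end to end
printf 'INFO start\nDEBUG connect\nERROR timeout\nINFO retry\n' > /tmp/sample.log
extract_context /tmp/sample.log 1
```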
Step 4: Investigate Specific Patterns
# Search for specific errors across all services
grep -r "NullPointerException" ./pod-logs-*/
# Check for database connection issues
grep -r "connection refused" ./backends/logs/
# Find authentication failures
grep -r "401\|403\|Unauthorized" ./backends/logs/
# Check RabbitMQ issues
cat ./infra/logs/rabbitmq/errors/*.log
Step 5: Share with Team
# The zip file is ready to share
ls -lh pod-logs-alpha-*.zip
# Upload to your team's incident tracking system
# Or attach to support tickets
Real-World Examples
Example 1: Service Down After Deployment
Scenario: After deploying to UAT, users report the case manager is not loading.
Debug process:
# Collect logs from the day of the deployment
./collect-pod-logs.sh alpha --since 2025-12-02
# Check diagnostic summary
cat pod-logs-*/DIAGNOSTIC_SUMMARY.txt
Example 2: Intermittent Auth Failures
Scenario: Users occasionally can't log in, but it works after refreshing.
Debug process:
./collect-pod-logs.sh alpha-prod --since 2025-11-30
# Check auth service errors
cat pod-logs-*/backends/logs/alpha-auth-service/errors/*_errors.log
Example 3: Post-Restart Investigation
Scenario: A pod restarted overnight and you need to know why.
Debug process:
./collect-pod-logs.sh alpha
# Check PREVIOUS logs (pre-restart)
cat pod-logs-*/DIAGNOSTIC_SUMMARY.txt
# Look for the pod's previous logs section
cat pod-logs-*/backends/logs/alpha-module-service/previous/*_previous.log
Anomaly Detection Explained
The script automatically detects these issues:
| Anomaly | What It Means | Typical Cause |
|---|---|---|
| ERROR_BURST | >100 errors in logs | Service crash, cascading failure |
| OOM_DETECTED | Out of memory errors | Memory leak, insufficient resources |
| CONN_ISSUES | >10 connection failures | Network issues, service unavailability |
| AUTH_FAILURES | >5 auth failures | Identity server issues, token expiry |
| TIMEOUTS | >10 timeout errors | Slow queries, unresponsive services |
| DISK_ISSUE | Disk full errors | Log rotation issue, data volume growth |
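A hedged sketch of how threshold-based detection like this can work, using the limits from the table above (the patterns and thresholds here are assumptions, not the script's exact regexes):

```shell
# Count suspicious patterns in a log file and report which anomaly
# thresholds were crossed.
detect_anomalies() {
  local log="$1" found=""
  [ "$(grep -ci 'error' "$log")" -gt 100 ] && found="$found ERROR_BURST"
  grep -qiE 'OOMKilled|OutOfMemory' "$log" && found="$found OOM_DETECTED"
  [ "$(grep -ciE 'connection (refused|reset|timed out)' "$log")" -gt 10 ] && found="$found CONN_ISSUES"
  echo "${found# }"
}

# Synthetic log containing an out-of-memory event
printf 'INFO ok\nWARN java.lang.OutOfMemoryError: heap\n' > /tmp/anomaly.log
detect_anomalies /tmp/anomaly.log   # OOM_DETECTED
```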
Severity Levels
| Severity | Criteria | Action Required |
|---|---|---|
| >100 errors | Immediate investigation needed | |
| >10 errors | Investigate soon | |
| 1-10 errors | Monitor and review | |
| >50 warnings | Check during maintenance | |
| No issues | All good! |
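Mapping counts to a severity level is a straightforward cascade. This sketch assumes the thresholds shown above (they are taken from the table, not from the script itself):

```shell
# Map an (error count, warning count) pair to a severity label.
severity() {
  local errors="$1" warnings="$2"
  if   [ "$errors" -gt 100 ];  then echo "CRITICAL"
  elif [ "$errors" -gt 10 ];   then echo "HIGH"
  elif [ "$errors" -ge 1 ];    then echo "MEDIUM"
  elif [ "$warnings" -gt 50 ]; then echo "WARNING"
  else                              echo "OK"
  fi
}

severity 245 89   # CRITICAL (matches the auth-service example earlier)
severity 0 3      # OK
```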
Troubleshooting
"kubectl is not installed or not in PATH"
Fix:
# Check if kubectl is installed
which kubectl
kubectl version --client
# If not installed, install it:
# macOS
brew install kubectl
# Linux
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
chmod +x kubectl
sudo mv kubectl /usr/local/bin/
"Cannot connect to Kubernetes cluster"
Fix:
# Check cluster connection
kubectl cluster-info
kubectl config current-context
# Switch to correct context if needed
kubectl config use-context your-cluster-context
"Namespace does not exist"
Fix:
# List available namespaces
kubectl get namespaces
# Use the correct namespace name
./collect-pod-logs.sh <correct-namespace-name>
"No previous container logs available"
This is normal - it means the pod hasn't restarted. Previous logs only exist after a pod restart.
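A collection script can probe for previous logs and skip quietly when none exist. This sketch assumes kubectl's usual behavior of exiting non-zero when a pod has no previous container:

```shell
# Try to save a pod's pre-restart logs; clean up and report if the pod
# has never restarted. Pod/namespace names here are placeholders.
collect_previous() {
  local pod="$1" ns="$2" out="$3"
  if kubectl logs "$pod" -n "$ns" --previous > "$out" 2>/dev/null; then
    echo "saved previous logs for $pod"
  else
    rm -f "$out"
    echo "no previous container for $pod (pod has not restarted)"
  fi
}
```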
Script is slow
For large deployments with many pods:
- Use --since to limit the date range
- Run during off-peak hours
- Ensure good network connection to cluster
Best Practices
### 1. Always Use Date Filters in Production
# Don't collect ALL logs from production
./collect-pod-logs.sh alpha-prod --since 2025-12-01
### 2. Create Incident Folders
# Organize by incident
./collect-pod-logs.sh alpha --since 2025-12-02 ./incident-auth-failures-2025-12-02
### 3. Check Diagnostic Summary First
Don't dive into raw logs immediately. The diagnostic summary tells you where to focus.
### 4. Share Context with Team
When reporting issues, share:
- The diagnostic summary
- Relevant error files
- Your analysis
---
## Platform Support
| Platform | Support | Notes |
|----------|---------|-------|
| **Linux** | ✅ Full support | Native bash |
| **macOS** | ✅ Full support | Native bash |
| **Windows (WSL)** | ✅ Full support | Use WSL1 or WSL2 |
| **Windows (Git Bash)** | ✅ Full support | Install Git for Windows |
| **Windows (Native CMD)** | ❌ Not supported | Use WSL or Git Bash |
---
## Pro Tips
### 1. Create Aliases
Add to your `~/.bashrc` or `~/.zshrc`:
```bash
alias collect-logs='~/scripts/collect-pod-logs.sh'
alias logs-uat='~/scripts/collect-pod-logs.sh alpha'
alias logs-prod='~/scripts/collect-pod-logs.sh alpha-prod'
```
### 2. Schedule Regular Health Checks
#!/bin/bash
# Daily health check: collect today's logs into a dated folder
./collect-pod-logs.sh alpha --since "$(date -u '+%Y-%m-%d')" "./daily-checks/$(date +%Y%m%d)"
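A health-check script like the one above can be run automatically with cron. The paths below are hypothetical placeholders; adjust them to wherever your script lives, then add the line via `crontab -e`:

```
# Run the daily health check at 06:00 UTC every day (hypothetical paths)
0 6 * * * /home/user/scripts/daily-health-check.sh >> /home/user/daily-checks/cron.log 2>&1
```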
Contributing
Found a bug? Want to improve the script?
Ways to contribute:
- Report issues with specific examples
- Suggest new anomaly detection patterns
- Share your debugging workflows
- Improve documentation
- Submit pull requests with enhancements
Additional Resources
- Script Repository: https://bitbucket.org/bhivedevs/alpha-debugging-things/src/master/alpha-deployment-debugging/collect-pod-logs/
- kubectl Cheat Sheet: https://kubernetes.io/docs/reference/kubectl/cheatsheet/
Next steps:
- Visit the repository and download the script
- Try it on your environment
- Review the diagnostic summary
- Explore the organized log structure
Questions?
Have questions or need help?
- Post in the community forum
- Report any issues
Happy debugging!
Last Updated: December 2025
Version: 1.0
Platform: Alpha
Repository: https://bitbucket.org/bhivedevs/alpha-debugging-things/src/master/alpha-deployment-debugging/collect-pod-logs/