Debugging Alpha Deployments at Pod Level: A Complete Guide

Introduction

Debugging issues in Kubernetes-deployed Alpha environments can be challenging. When something goes wrong in production or UAT, you need to quickly collect logs, identify errors, and understand what happened across multiple pods and services.

This guide introduces a script that helps you systematically collect, organize, and analyze logs from your entire Alpha deployment.

Quick Access: Get the script and documentation


What Problem Does This Solve?

When debugging Alpha deployments, you typically face these challenges:

  • Too many pods to check manually - Alpha deployments have 10+ services running
  • Scattered logs - Logs are spread across backend services, frontends, and infrastructure
  • Lost context - Raw logs don’t show what happened before/after an error
  • Time-consuming - Manually collecting logs from each pod takes forever
  • No overview - Hard to identify which service is actually failing
  • Previous pod logs - After a restart, pre-restart logs are easy to overlook and only the most recent restart’s logs are retained

❌ The Old Way

# Manually checking each pod
kubectl logs alpha-auth-service-xyz -n alpha
kubectl logs alpha-case-service-abc -n alpha
kubectl logs alpha-config-service-def -n alpha
# ... repeat 15 more times ...

# Did any pod restart? Check previous logs
kubectl logs alpha-auth-service-xyz -n alpha --previous
# ... check each pod again ...

# Find errors manually
kubectl logs alpha-auth-service-xyz -n alpha | grep -i error
# ... no context about what caused it ...

✅ The New Way

# One command to collect everything
./collect-pod-logs.sh alpha --since 2025-12-01

# You get:
# - All logs organized by service type
# - Contextual error extraction with 100 lines before/after
# - Diagnostic summary with severity classification
# - Anomaly detection (OOM, connection issues, etc.)
# - Previous pod logs automatically collected
# - Everything in a zip file ready to share

The Solution: Automated Pod Log Collection

A bash script that automates the entire log collection and analysis process for Alpha deployments.

What You Get

The script provides:

  1. Organized Structure - Logs grouped by service type (backends/frontends/infra)
  2. Smart Error Extraction - Automatically finds errors and captures context
  3. Diagnostic Summary - Overview of all issues with severity levels
  4. Anomaly Detection - Identifies OOM, connection failures, auth issues
  5. File Index - Quick reference to find any log file
  6. Zip Archive - Everything packaged for easy sharing with team
  7. Date Filtering - Only collect logs from specific dates
  8. Previous Logs - Automatically collects pre-restart logs
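Under the hood, the collection boils down to a per-pod loop. The sketch below shows the idea; `collect_pod` and the directory names are illustrative, not the script's actual internals:

```shell
# Hypothetical sketch of the per-pod collection step. The function name and
# directory layout are illustrative, not the script's real internals.
collect_pod() {
  local ns="$1" pod="$2" outdir="$3"
  mkdir -p "$outdir/current" "$outdir/previous"
  # Current container logs
  kubectl logs "$pod" -n "$ns" > "$outdir/current/${pod}.log"
  # Pre-restart logs; this fails harmlessly when the pod never restarted
  kubectl logs "$pod" -n "$ns" --previous \
    > "$outdir/previous/${pod}_previous.log" 2>/dev/null || true
}
```

The real script additionally runs error extraction over each collected file and feeds the results into the diagnostic summary.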

Getting Started

Prerequisites

You need:

  • kubectl installed and configured with cluster access
  • Bash shell (works on Linux, macOS, Windows WSL/Git Bash)
  • Access to the Kubernetes namespace you want to debug

Installation

Repository: 🔗 https://bitbucket.org/bhivedevs/alpha-debugging-things/src/master/alpha-deployment-debugging/collect-pod-logs/

What’s in the repository:

  • collect-pod-logs.sh - The main script
  • README.md - Technical documentation
  • example-alpha-logs.zip - Sample output for reference
  1. Download the script:
# Clone the repository
git clone https://bitbucket.org/bhivedevs/alpha-debugging-things.git
cd alpha-debugging-things/alpha-deployment-debugging/collect-pod-logs/

  2. Verify kubectl access:
kubectl get namespaces
kubectl cluster-info

  3. You’re ready to go!

Usage

Basic Usage

./collect-pod-logs.sh <namespace>

Example:

./collect-pod-logs.sh alpha

With Date Filter (Recommended for Production)

./collect-pod-logs.sh <namespace> --since YYYY-MM-DD

Example:

# Only collect logs from December 1st onwards
./collect-pod-logs.sh alpha-prod --since 2025-12-01

With Custom Output Directory

./collect-pod-logs.sh <namespace> [--since YYYY-MM-DD] <output-directory>

Example:

./collect-pod-logs.sh alpha --since 2025-12-01 ./incident-2025-12-02

Understanding the Output

Folder Structure

The script creates a well-organized directory structure:

pod-logs-alpha-20251202_143052/
├── backends/
│   └── logs/
│       ├── alpha-auth-service/
│       │   ├── current/               ← Current pod logs
│       │   │   └── alpha-auth-service_2025-12-02.log
│       │   ├── previous/              ← Pre-restart logs
│       │   │   └── alpha-auth-service_2025-12-02_previous.log
│       │   └── errors/                ← Extracted errors with context
│       │       ├── alpha-auth-service_2025-12-02_errors.log
│       │       └── alpha-auth-service_2025-12-02_previous_errors.log
│       │
│       ├── alpha-case-service/
│       ├── alpha-config-service/
│       ├── alpha-module-service/
│       └── gts/
│
├── frontends/
│   └── logs/
│       ├── alpha-admin-ui/
│       ├── alpha-case-manager-ui/
│       └── alpha-workflow-studio/
│
├── infra/
│   └── logs/
│       └── rabbitmq/
│
├── DIAGNOSTIC_SUMMARY.txt             ← Start here!
├── FILE_INDEX.txt                     ← Find any log quickly
└── pod-logs-alpha-20251202_143052.zip

Service Classification

The script automatically classifies pods:

| Category | Services |
|----------|----------|
| Backends | auth-service, case-service, config-service, module-service, gts, etc. |
| Frontends | admin-ui, case-manager-ui, workflow-studio, delta-ui |
| Infrastructure | rabbitmq, redis, postgres, nginx, elasticsearch |
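The classification can be approximated with a simple name-based pattern match. This is a sketch of the idea; the patterns shown are illustrative and may differ from the script's exact rules:

```shell
# Sketch of name-based pod classification; patterns are illustrative and may
# differ from the script's actual rules.
classify_pod() {
  case "$1" in
    *-ui*|*workflow-studio*)                                echo "frontends" ;;
    *rabbitmq*|*redis*|*postgres*|*nginx*|*elasticsearch*)  echo "infra" ;;
    *)                                                      echo "backends" ;;
  esac
}
```

For example, `classify_pod alpha-admin-ui-7d8f9c-xyz` prints `frontends`, while anything unrecognized falls through to `backends`.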

How to Debug with This Tool

Step 1: Collect Logs

./collect-pod-logs.sh alpha --since 2025-12-01

Step 2: Check the Diagnostic Summary

cd pod-logs-alpha-*/
cat DIAGNOSTIC_SUMMARY.txt

What to look for:

  • CRITICAL pods (>100 errors)
  • Anomalies (OOM_DETECTED, CONN_ISSUES, AUTH_FAILURES)
  • Warning spikes (unusual number of warnings)

Example output:

Pod: alpha-auth-service-7d8f9c-xyz
  Severity: 🔴 CRITICAL
  Current Logs: 245 errors, 89 warnings
  Previous Logs: 156 errors, 45 warnings
  Anomalies: ERROR_BURST, AUTH_FAILURES
  Files:
    Current: ./backends/logs/alpha-auth-service/current/alpha-auth-service_2025-12-02.log
    Previous: ./backends/logs/alpha-auth-service/previous/alpha-auth-service_2025-12-02_previous.log
    Errors (current): ./backends/logs/alpha-auth-service/errors/alpha-auth-service_2025-12-02_errors.log
    Errors (previous): ./backends/logs/alpha-auth-service/errors/alpha-auth-service_2025-12-02_previous_errors.log
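To jump straight to entries like the one above, a quick grep over the summary works well. The helper name here is just a convenience, not part of the tool:

```shell
# Convenience helper (not part of the tool): print each CRITICAL entry in the
# summary plus the following lines carrying its counts and file paths.
worst_pods() {
  grep -A 6 'CRITICAL' "${1:-DIAGNOSTIC_SUMMARY.txt}"
}
```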

Step 3: Review Contextual Errors

Open the error files from the diagnostic summary:

# View errors with context
cat ./backends/logs/alpha-auth-service/errors/alpha-auth-service_2025-12-02_errors.log

What you see:

===============================================================================
CONTEXTUAL ERROR EXTRACTION
===============================================================================
Source File: ./backends/logs/alpha-auth-service/current/alpha-auth-service_2025-12-02.log
Context: 100 lines before and after each error
===============================================================================

┌─────────────────────────────────────────────────────────────────────────────
│ ERROR #1 - Line 5462
│ Context: Lines 5362 to 5562
└─────────────────────────────────────────────────────────────────────────────
    5362: 2025-12-02 14:23:10 INFO  [auth-service] User login attempt: user@example.com
    5363: 2025-12-02 14:23:10 DEBUG [auth-service] Validating credentials...
    5364: 2025-12-02 14:23:10 DEBUG [auth-service] Connecting to database...
    ...
>>> 5462: 2025-12-02 14:23:15 ERROR [auth-service] Connection timeout: postgres-primary:5432
    ...
    5562: 2025-12-02 14:23:20 INFO  [auth-service] Retry attempt 1/3

This shows you:

  • What the user was doing (login attempt)
  • What the service tried (database connection)
  • What went wrong (connection timeout)
  • What happened after (retry attempt)
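If you only have a raw log file, roughly the same view can be produced by hand with grep's context flags. This is a minimal stand-in; the script's own output adds the headers and `>>>` markers shown above:

```shell
# Minimal manual equivalent: print N lines of context around each error
# (defaults to the 100 lines the tool uses). The script's output adds the
# headers and >>> markers shown above.
extract_errors_with_context() {
  local logfile="$1" context="${2:-100}"
  grep -n -i -B "$context" -A "$context" 'error' "$logfile"
}
```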

Step 4: Investigate Specific Patterns

# Search for specific errors across all services
grep -r "NullPointerException" ./pod-logs-*/

# Check for database connection issues
grep -r "connection refused" ./backends/logs/

# Find authentication failures
grep -r "401\|403\|Unauthorized" ./backends/logs/

# Check RabbitMQ issues
cat ./infra/logs/rabbitmq/errors/*.log

Step 5: Share with Team

# The zip file is ready to share
ls -lh pod-logs-alpha-*.zip

# Upload to your team's incident tracking system
# Or attach to support tickets

Real-World Examples

Example 1: Service Down After Deployment

Scenario: After deploying to UAT, users report the case manager is not loading.

Debug process:

# Collect logs from today onwards (the filter is by date, not by hour)
./collect-pod-logs.sh alpha --since 2025-12-02

# Check diagnostic summary
cat pod-logs-*/DIAGNOSTIC_SUMMARY.txt

Example 2: Intermittent Auth Failures

Scenario: Users occasionally can’t log in, but it works after refreshing.

Debug process:

./collect-pod-logs.sh alpha-prod --since 2025-11-30

# Check auth service errors
cat pod-logs-*/backends/logs/alpha-auth-service/errors/*_errors.log

Example 3: Post-Restart Investigation

Scenario: A pod restarted overnight and you need to know why.

Debug process:

./collect-pod-logs.sh alpha

# Check PREVIOUS logs (pre-restart)
cat pod-logs-*/DIAGNOSTIC_SUMMARY.txt
# Look for the pod's previous logs section

cat pod-logs-*/backends/logs/alpha-module-service/previous/*_previous.log

Anomaly Detection Explained

The script automatically detects these issues:

| Anomaly | What It Means | Typical Cause |
|---------|---------------|---------------|
| ERROR_BURST | >100 errors in logs | Service crash, cascading failure |
| OOM_DETECTED | Out of memory errors | Memory leak, insufficient resources |
| CONN_ISSUES | >10 connection failures | Network issues, service unavailability |
| AUTH_FAILURES | >5 auth failures | Identity server issues, token expiry |
| TIMEOUTS | >10 timeout errors | Slow queries, unresponsive services |
| DISK_ISSUE | Disk full errors | Log rotation issue, data volume growth |
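A rough sketch of how such checks can be implemented in bash, using the same thresholds as the table. The grep patterns here are illustrative; the script's real patterns may be more elaborate:

```shell
# Illustrative anomaly checks mirroring the thresholds above; the script's
# actual patterns may be more elaborate.
detect_anomalies() {
  local logfile="$1" errors conns
  errors=$(grep -c -i 'error' "$logfile" || true)
  conns=$(grep -c -i -E 'connection (refused|reset|timed out)' "$logfile" || true)
  if [ "$errors" -gt 100 ]; then echo "ERROR_BURST"; fi
  if grep -q -i -E 'out of memory|OOMKilled' "$logfile"; then echo "OOM_DETECTED"; fi
  if [ "$conns" -gt 10 ]; then echo "CONN_ISSUES"; fi
}
```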

Severity Levels

| Severity | Criteria | Action Required |
|----------|----------|-----------------|
| 🔴 CRITICAL | >100 errors | Immediate investigation needed |
| 🟠 HIGH | >10 errors | Investigate soon |
| 🟡 MEDIUM | 1-10 errors | Monitor and review |
| 🔵 LOW | >50 warnings | Check during maintenance |
| 🟢 OK | No issues | All good! |
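The mapping can be expressed as a simple threshold cascade. Cutoffs are taken from the table; the function name is made up for illustration:

```shell
# Hypothetical mapping from error/warning counts to the severity levels in
# the table above. Cutoffs match the table; the function name is made up.
severity_for() {
  local errors="$1" warnings="$2"
  if   [ "$errors" -gt 100 ];  then echo "CRITICAL"
  elif [ "$errors" -gt 10 ];   then echo "HIGH"
  elif [ "$errors" -ge 1 ];    then echo "MEDIUM"
  elif [ "$warnings" -gt 50 ]; then echo "LOW"
  else echo "OK"
  fi
}
```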

Troubleshooting

β€œkubectl is not installed or not in PATH”

Fix:

# Check if kubectl is installed
which kubectl
kubectl version --client

# If not installed, install it:
# macOS
brew install kubectl

# Linux
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
chmod +x kubectl
sudo mv kubectl /usr/local/bin/

β€œCannot connect to Kubernetes cluster”

Fix:

# Check cluster connection
kubectl cluster-info
kubectl config current-context

# Switch to correct context if needed
kubectl config use-context your-cluster-context

β€œNamespace does not exist”

Fix:

# List available namespaces
kubectl get namespaces

# Use the correct namespace name
./collect-pod-logs.sh <correct-namespace-name>

β€œNo previous container logs available”

This is normal - it means the pod hasn’t restarted. Previous logs only exist after a pod restart.
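To confirm whether a pod has restarted at all, the restart counters are the quickest check. A small wrapper (the name is illustrative) around kubectl's custom-columns output:

```shell
# Quick restart check (wrapper name is illustrative). A non-zero RESTARTS
# value means --previous logs exist. Assumes the first container; adjust the
# index for pods with sidecars.
show_restarts() {
  kubectl get pods -n "$1" \
    -o custom-columns=NAME:.metadata.name,RESTARTS:.status.containerStatuses[0].restartCount
}
```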

Script is slow

For large deployments with many pods:

  • Use --since to limit the date range
  • Run during off-peak hours
  • Ensure good network connection to cluster

Best Practices

1. Always Use Date Filters in Production

# Don't collect ALL logs from production
./collect-pod-logs.sh alpha-prod --since 2025-12-01

2. Create Incident Folders

# Organize by incident
./collect-pod-logs.sh alpha --since 2025-12-02 ./incident-auth-failures-2025-12-02

3. Check Diagnostic Summary First

Don’t dive into raw logs immediately. The diagnostic summary tells you where to focus.


4. Share Context with Team

When reporting issues, share:

  • The diagnostic summary
  • Relevant error files
  • Your analysis

Platform Support

| Platform | Support | Notes |
|----------|---------|-------|
| Linux | ✅ Full support | Native bash |
| macOS | ✅ Full support | Native bash |
| Windows (WSL) | ✅ Full support | Use WSL1 or WSL2 |
| Windows (Git Bash) | ✅ Full support | Install Git for Windows |
| Windows (Native CMD) | ❌ Not supported | Use WSL or Git Bash |

Pro Tips

1. Create Aliases

Add to your ~/.bashrc or ~/.zshrc:

alias collect-logs='~/scripts/collect-pod-logs.sh'
alias logs-uat='~/scripts/collect-pod-logs.sh alpha'
alias logs-prod='~/scripts/collect-pod-logs.sh alpha-prod'

2. Schedule Regular Health Checks

#!/bin/bash
# Daily health check script (the shebang must be the first line)
./collect-pod-logs.sh alpha --since $(date -u '+%Y-%m-%d') ./daily-checks/$(date +%Y%m%d)
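To run it unattended, a health-check script like the one above can be wired into cron. All paths in this example entry are placeholders:

```
# Example crontab entry (paths are placeholders): run the daily health
# check at 06:00 UTC
0 6 * * * /home/you/scripts/daily-check.sh >> /home/you/daily-check.log 2>&1
```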

Contributing

Found a bug? Want to improve the script?

Repository: 🔗 https://bitbucket.org/bhivedevs/alpha-debugging-things/src/master/alpha-deployment-debugging/collect-pod-logs/

Ways to contribute:

  • Report issues with specific examples
  • Suggest new anomaly detection patterns
  • Share your debugging workflows
  • Improve documentation
  • Submit pull requests with enhancements

Additional Resources


Next steps:

  1. Visit the 🔗 repository and download the script
  2. Try it on your environment
  3. Review the diagnostic summary
  4. Explore the organized log structure

Questions?

Have questions or need help?

  • Post in the community forum
  • Report any issues

Happy debugging!


Last Updated: December 2025
Version: 1.0
Platform: Alpha
Repository: https://bitbucket.org/bhivedevs/alpha-debugging-things/src/master/alpha-deployment-debugging/collect-pod-logs/
