Debugging Alpha Deployments at Pod Level: A Complete Guide

Introduction

Debugging issues in Kubernetes-deployed Alpha environments can be challenging. When something goes wrong in production or UAT, you need to quickly collect logs, identify errors, and understand what happened across multiple pods and services.

This guide introduces a script that helps you systematically collect, organize, and analyze logs from your entire Alpha deployment.

Quick Access: Get the script and documentation


What Problem Does This Solve?

When debugging Alpha deployments, you typically face these challenges:

  • Too many pods to check manually - Alpha deployments have 10+ services running
  • Scattered logs - Logs are spread across backend services, frontends, and infrastructure
  • Lost context - Raw logs don’t show what happened before/after an error
  • Time-consuming - Manually collecting logs from each pod takes forever
  • No overview - Hard to identify which service is actually failing
  • Previous pod logs - After a restart, pre-restart logs are easy to overlook and only the most recent restart’s logs are retained

❌ The Old Way

# Manually checking each pod
kubectl logs alpha-auth-service-xyz -n alpha
kubectl logs alpha-case-service-abc -n alpha
kubectl logs alpha-config-service-def -n alpha
# ... repeat 15 more times ...

# Did any pod restart? Check previous logs
kubectl logs alpha-auth-service-xyz -n alpha --previous
# ... check each pod again ...

# Find errors manually
kubectl logs alpha-auth-service-xyz -n alpha | grep -i error
# ... no context about what caused it ...

✅ The New Way

# One command to collect everything
./collect-pod-logs.sh alpha --since 2025-12-01

# You get:
# - All logs organized by service type
# - Contextual error extraction with 100 lines before/after
# - Diagnostic summary with severity classification
# - Anomaly detection (OOM, connection issues, etc.)
# - Previous pod logs automatically collected
# - Everything in a zip file ready to share

The Solution: Automated Pod Log Collection

A bash script that automates the entire log collection and analysis process for Alpha deployments.

What You Get

The script provides:

  1. Organized Structure - Logs grouped by service type (backends/frontends/infra)
  2. Smart Error Extraction - Automatically finds errors and captures context
  3. Diagnostic Summary - Overview of all issues with severity levels
  4. Anomaly Detection - Identifies OOM, connection failures, auth issues
  5. File Index - Quick reference to find any log file
  6. Zip Archive - Everything packaged for easy sharing with team
  7. Date Filtering - Only collect logs from specific dates
  8. Previous Logs - Automatically collects pre-restart logs
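Under the hood, the collection boils down to a per-pod loop. The sketch below shows the idea; `collect_pod` and the directory names are illustrative, not the script's actual internals:

```shell
# Hypothetical sketch of the per-pod collection step. The function name and
# directory layout are illustrative, not the script's real internals.
collect_pod() {
  local ns="$1" pod="$2" outdir="$3"
  mkdir -p "$outdir/current" "$outdir/previous"
  # Current container logs
  kubectl logs "$pod" -n "$ns" > "$outdir/current/${pod}.log"
  # Pre-restart logs; this fails harmlessly when the pod never restarted
  kubectl logs "$pod" -n "$ns" --previous \
    > "$outdir/previous/${pod}_previous.log" 2>/dev/null || true
}
```

The real script additionally runs error extraction over each collected file and feeds the results into the diagnostic summary.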

Getting Started

Prerequisites

You need:

  • kubectl installed and configured with cluster access
  • Bash shell (works on Linux, macOS, Windows WSL/Git Bash)
  • Access to the Kubernetes namespace you want to debug

Installation

Repository: 🔗 https://bitbucket.org/bhivedevs/alpha-debugging-things/src/master/alpha-deployment-debugging/collect-pod-logs/

What’s in the repository:

  • collect-pod-logs.sh - The main script
  • README.md - Technical documentation
  • example-alpha-logs.zip - Sample output for reference
  1. Download the script:
# Clone the repository
git clone https://bitbucket.org/bhivedevs/alpha-debugging-things.git
cd alpha-debugging-things/alpha-deployment-debugging/collect-pod-logs/

  2. Verify kubectl access:
kubectl get namespaces
kubectl cluster-info

  3. You’re ready to go!

Usage

Basic Usage

./collect-pod-logs.sh <namespace>

Example:

./collect-pod-logs.sh alpha

With Date Filter (Recommended for Production)

./collect-pod-logs.sh <namespace> --since YYYY-MM-DD

Example:

# Only collect logs from December 1st onwards
./collect-pod-logs.sh alpha-prod --since 2025-12-01

With Custom Output Directory

./collect-pod-logs.sh <namespace> [--since YYYY-MM-DD] <output-directory>

Example:

./collect-pod-logs.sh alpha --since 2025-12-01 ./incident-2025-12-02

Understanding the Output

Folder Structure

The script creates a well-organized directory structure:

pod-logs-alpha-20251202_143052/
├── backends/
│   └── logs/
│       ├── alpha-auth-service/
│       │   ├── current/               ← Current pod logs
│       │   │   └── alpha-auth-service_2025-12-02.log
│       │   ├── previous/              ← Pre-restart logs
│       │   │   └── alpha-auth-service_2025-12-02_previous.log
│       │   └── errors/                ← Extracted errors with context
│       │       ├── alpha-auth-service_2025-12-02_errors.log
│       │       └── alpha-auth-service_2025-12-02_previous_errors.log
│       │
│       ├── alpha-case-service/
│       ├── alpha-config-service/
│       ├── alpha-module-service/
│       └── gts/
│
├── frontends/
│   └── logs/
│       ├── alpha-admin-ui/
│       ├── alpha-case-manager-ui/
│       └── alpha-workflow-studio/
│
├── infra/
│   └── logs/
│       └── rabbitmq/
│
├── DIAGNOSTIC_SUMMARY.txt             ← Start here!
├── FILE_INDEX.txt                     ← Find any log quickly
└── pod-logs-alpha-20251202_143052.zip

Service Classification

The script automatically classifies pods:

| Category | Services |
|----------|----------|
| Backends | auth-service, case-service, config-service, module-service, gts, etc. |
| Frontends | admin-ui, case-manager-ui, workflow-studio, delta-ui |
| Infrastructure | rabbitmq, redis, postgres, nginx, elasticsearch |
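The classification can be approximated with a simple name-based pattern match. This is a sketch of the idea; the patterns shown are illustrative and may differ from the script's exact rules:

```shell
# Sketch of name-based pod classification; patterns are illustrative and may
# differ from the script's actual rules.
classify_pod() {
  case "$1" in
    *-ui*|*workflow-studio*)                                echo "frontends" ;;
    *rabbitmq*|*redis*|*postgres*|*nginx*|*elasticsearch*)  echo "infra" ;;
    *)                                                      echo "backends" ;;
  esac
}
```

For example, `classify_pod alpha-admin-ui-7d8f9c-xyz` prints `frontends`, while anything unrecognized falls through to `backends`.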

How to Debug with This Tool

Step 1: Collect Logs

./collect-pod-logs.sh alpha --since 2025-12-01

Step 2: Check the Diagnostic Summary

cd pod-logs-alpha-*/
cat DIAGNOSTIC_SUMMARY.txt

What to look for:

  • CRITICAL pods (>100 errors)
  • Anomalies (OOM_DETECTED, CONN_ISSUES, AUTH_FAILURES)
  • Warning spikes (unusual number of warnings)

Example output:

Pod: alpha-auth-service-7d8f9c-xyz
  Severity: 🔴 CRITICAL
  Current Logs: 245 errors, 89 warnings
  Previous Logs: 156 errors, 45 warnings
  Anomalies: ERROR_BURST, AUTH_FAILURES
  Files:
    Current: ./backends/logs/alpha-auth-service/current/alpha-auth-service_2025-12-02.log
    Previous: ./backends/logs/alpha-auth-service/previous/alpha-auth-service_2025-12-02_previous.log
    Errors (current): ./backends/logs/alpha-auth-service/errors/alpha-auth-service_2025-12-02_errors.log
    Errors (previous): ./backends/logs/alpha-auth-service/errors/alpha-auth-service_2025-12-02_previous_errors.log
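To jump straight to entries like the one above, a quick grep over the summary works well. The helper name here is just a convenience, not part of the tool:

```shell
# Convenience helper (not part of the tool): print each CRITICAL entry in the
# summary plus the following lines carrying its counts and file paths.
worst_pods() {
  grep -A 6 'CRITICAL' "${1:-DIAGNOSTIC_SUMMARY.txt}"
}
```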

Step 3: Review Contextual Errors

Open the error files from the diagnostic summary:

# View errors with context
cat ./backends/logs/alpha-auth-service/errors/alpha-auth-service_2025-12-02_errors.log

What you see:

===============================================================================
CONTEXTUAL ERROR EXTRACTION
===============================================================================
Source File: ./backends/logs/alpha-auth-service/current/alpha-auth-service_2025-12-02.log
Context: 100 lines before and after each error
===============================================================================

┌─────────────────────────────────────────────────────────────────────────────
│ ERROR #1 - Line 5462
│ Context: Lines 5362 to 5562
└─────────────────────────────────────────────────────────────────────────────
    5362: 2025-12-02 14:23:10 INFO  [auth-service] User login attempt: user@example.com
    5363: 2025-12-02 14:23:10 DEBUG [auth-service] Validating credentials...
    5364: 2025-12-02 14:23:10 DEBUG [auth-service] Connecting to database...
    ...
>>> 5462: 2025-12-02 14:23:15 ERROR [auth-service] Connection timeout: postgres-primary:5432
    ...
    5562: 2025-12-02 14:23:20 INFO  [auth-service] Retry attempt 1/3

This shows you:

  • What the user was doing (login attempt)
  • What the service tried (database connection)
  • What went wrong (connection timeout)
  • What happened after (retry attempt)
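If you only have a raw log file, roughly the same view can be produced by hand with grep's context flags. This is a minimal stand-in; the script's own output adds the headers and `>>>` markers shown above:

```shell
# Minimal manual equivalent: print N lines of context around each error
# (defaults to the 100 lines the tool uses). The script's output adds the
# headers and >>> markers shown above.
extract_errors_with_context() {
  local logfile="$1" context="${2:-100}"
  grep -n -i -B "$context" -A "$context" 'error' "$logfile"
}
```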

Step 4: Investigate Specific Patterns

# Search for specific errors across all services
grep -r "NullPointerException" ./pod-logs-*/

# Check for database connection issues
grep -r "connection refused" ./backends/logs/

# Find authentication failures
grep -r "401\|403\|Unauthorized" ./backends/logs/

# Check RabbitMQ issues
cat ./infra/logs/rabbitmq/errors/*.log

Step 5: Share with Team

# The zip file is ready to share
ls -lh pod-logs-alpha-*.zip

# Upload to your team's incident tracking system
# Or attach to support tickets

Real-World Examples

Example 1: Service Down After Deployment

Scenario: After deploying to UAT, users report the case manager is not loading.

Debug process:

# Collect logs from today onwards (the filter is by date, not by hour)
./collect-pod-logs.sh alpha --since 2025-12-02

# Check diagnostic summary
cat pod-logs-*/DIAGNOSTIC_SUMMARY.txt

Example 2: Intermittent Auth Failures

Scenario: Users occasionally can’t log in, but it works after refreshing.

Debug process:

./collect-pod-logs.sh alpha-prod --since 2025-11-30

# Check auth service errors
cat pod-logs-*/backends/logs/alpha-auth-service/errors/*_errors.log

Example 3: Post-Restart Investigation

Scenario: A pod restarted overnight and you need to know why.

Debug process:

./collect-pod-logs.sh alpha

# Check PREVIOUS logs (pre-restart)
cat pod-logs-*/DIAGNOSTIC_SUMMARY.txt
# Look for the pod's previous logs section

cat pod-logs-*/backends/logs/alpha-module-service/previous/*_previous.log

Anomaly Detection Explained

The script automatically detects these issues:

| Anomaly | What It Means | Typical Cause |
|---------|---------------|---------------|
| ERROR_BURST | >100 errors in logs | Service crash, cascading failure |
| OOM_DETECTED | Out of memory errors | Memory leak, insufficient resources |
| CONN_ISSUES | >10 connection failures | Network issues, service unavailability |
| AUTH_FAILURES | >5 auth failures | Identity server issues, token expiry |
| TIMEOUTS | >10 timeout errors | Slow queries, unresponsive services |
| DISK_ISSUE | Disk full errors | Log rotation issue, data volume growth |
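A rough sketch of how such checks can be implemented in bash, using the same thresholds as the table. The grep patterns here are illustrative; the script's real patterns may be more elaborate:

```shell
# Illustrative anomaly checks mirroring the thresholds above; the script's
# actual patterns may be more elaborate.
detect_anomalies() {
  local logfile="$1" errors conns
  errors=$(grep -c -i 'error' "$logfile" || true)
  conns=$(grep -c -i -E 'connection (refused|reset|timed out)' "$logfile" || true)
  if [ "$errors" -gt 100 ]; then echo "ERROR_BURST"; fi
  if grep -q -i -E 'out of memory|OOMKilled' "$logfile"; then echo "OOM_DETECTED"; fi
  if [ "$conns" -gt 10 ]; then echo "CONN_ISSUES"; fi
}
```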

Severity Levels

| Severity | Criteria | Action Required |
|----------|----------|-----------------|
| 🔴 CRITICAL | >100 errors | Immediate investigation needed |
| 🟠 HIGH | >10 errors | Investigate soon |
| 🟡 MEDIUM | 1-10 errors | Monitor and review |
| 🔵 LOW | >50 warnings | Check during maintenance |
| 🟢 OK | No issues | All good! |
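The mapping can be expressed as a simple threshold cascade. Cutoffs are taken from the table; the function name is made up for illustration:

```shell
# Hypothetical mapping from error/warning counts to the severity levels in
# the table above. Cutoffs match the table; the function name is made up.
severity_for() {
  local errors="$1" warnings="$2"
  if   [ "$errors" -gt 100 ];  then echo "CRITICAL"
  elif [ "$errors" -gt 10 ];   then echo "HIGH"
  elif [ "$errors" -ge 1 ];    then echo "MEDIUM"
  elif [ "$warnings" -gt 50 ]; then echo "LOW"
  else echo "OK"
  fi
}
```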

Troubleshooting

β€œkubectl is not installed or not in PATH”

Fix:

# Check if kubectl is installed
which kubectl
kubectl version --client

# If not installed, install it:
# macOS
brew install kubectl

# Linux
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
chmod +x kubectl
sudo mv kubectl /usr/local/bin/

β€œCannot connect to Kubernetes cluster”

Fix:

# Check cluster connection
kubectl cluster-info
kubectl config current-context

# Switch to correct context if needed
kubectl config use-context your-cluster-context

β€œNamespace does not exist”

Fix:

# List available namespaces
kubectl get namespaces

# Use the correct namespace name
./collect-pod-logs.sh <correct-namespace-name>

β€œNo previous container logs available”

This is normal - it means the pod hasn’t restarted. Previous logs only exist after a pod restart.
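To confirm whether a pod has restarted at all, the restart counters are the quickest check. A small wrapper (the name is illustrative) around kubectl's custom-columns output:

```shell
# Quick restart check (wrapper name is illustrative). A non-zero RESTARTS
# value means --previous logs exist. Assumes the first container; adjust the
# index for pods with sidecars.
show_restarts() {
  kubectl get pods -n "$1" \
    -o custom-columns=NAME:.metadata.name,RESTARTS:.status.containerStatuses[0].restartCount
}
```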

Script is slow

For large deployments with many pods:

  • Use --since to limit the date range
  • Run during off-peak hours
  • Ensure good network connection to cluster

Best Practices

1. Always Use Date Filters in Production

# Don't collect ALL logs from production
./collect-pod-logs.sh alpha-prod --since 2025-12-01

2. Create Incident Folders

# Organize by incident
./collect-pod-logs.sh alpha --since 2025-12-02 ./incident-auth-failures-2025-12-02

3. Check Diagnostic Summary First

Don’t dive into raw logs immediately. The diagnostic summary tells you where to focus.


4. Share Context with Team

When reporting issues, share:

  • The diagnostic summary
  • Relevant error files
  • Your analysis

Platform Support

| Platform | Support | Notes |
|----------|---------|-------|
| Linux | ✅ Full support | Native bash |
| macOS | ✅ Full support | Native bash |
| Windows (WSL) | ✅ Full support | Use WSL1 or WSL2 |
| Windows (Git Bash) | ✅ Full support | Install Git for Windows |
| Windows (Native CMD) | ❌ Not supported | Use WSL or Git Bash |

Pro Tips

1. Create Aliases

Add to your ~/.bashrc or ~/.zshrc:

alias collect-logs='~/scripts/collect-pod-logs.sh'
alias logs-uat='~/scripts/collect-pod-logs.sh alpha'
alias logs-prod='~/scripts/collect-pod-logs.sh alpha-prod'

2. Schedule Regular Health Checks

#!/bin/bash
# Daily health check script (the shebang must be the first line)
./collect-pod-logs.sh alpha --since $(date -u '+%Y-%m-%d') ./daily-checks/$(date +%Y%m%d)
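To run it unattended, a health-check script like the one above can be wired into cron. All paths in this example entry are placeholders:

```
# Example crontab entry (paths are placeholders): run the daily health
# check at 06:00 UTC
0 6 * * * /home/you/scripts/daily-check.sh >> /home/you/daily-check.log 2>&1
```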

Contributing

Found a bug? Want to improve the script?

Repository: 🔗 https://bitbucket.org/bhivedevs/alpha-debugging-things/src/master/alpha-deployment-debugging/collect-pod-logs/

Ways to contribute:

  • Report issues with specific examples
  • Suggest new anomaly detection patterns
  • Share your debugging workflows
  • Improve documentation
  • Submit pull requests with enhancements

Additional Resources


Next steps:

  1. Visit the 🔗 repository and download the script
  2. Try it on your environment
  3. Review the diagnostic summary
  4. Explore the organized log structure

Questions?

Have questions or need help?

  • Post in the community forum
  • Report any issues

Happy debugging!


Last Updated: December 2025
Version: 1.0
Platform: Alpha
Repository: https://bitbucket.org/bhivedevs/alpha-debugging-things/src/master/alpha-deployment-debugging/collect-pod-logs/
