Archival & Restore Use Cases

doomedCode · December 2, 2025, 8:28am

This document explains how to plan and execute archival and restore operations using the Alpha Archival service, and how to work on the service as a developer.

The service manages archival of jBPM-backed process data between a primary BPM database (source) and a backup database (target), with detailed audit trails and safety controls.

Overview

Key capabilities:

Count and find process instances by filters (process name, status, dates, task filters, variable filters, explicit piids)
Discover recursive child process instance IDs
Generate transaction-wrapped SQL backup scripts for a set of process IDs
Run full backup (source → backup DB) and restore (backup DB → source) in controlled batches with audit logs
Delete process instances in bulk (recursive children included)

Constraints and safety:

At most one backup/restore operation runs at a time; a concurrent request returns a message and an auditId you can poll
Backup deletes matching data from the source after copying to the target; restore deletes matching data from the backup after copying to the source
Every step writes detailed rows into the process_audit_logs table in the backup database for traceability

Base URL & Docs

All endpoint paths in this document are relative to /archivalservice/bpm unless noted otherwise.
The base path is configurable via BASE_PATH (default: archivalservice).
Swagger (local default): http://localhost:3000/archivalservice/api-docs
Health check (no auth): GET /archivalservice/ping → { status: 'ok' }

Prerequisites & Environment

Databases and schema:

Primary BPM DB (source): contains jBPM tables (e.g., processinstancelog, tasks, variables)
Backup BPM DB (target): receives backups and stores audit tables
Startup migrations create or update stored procedures/functions in both DBs and create backup tables in the backup DB

Configuration (service .env):

Primary DB: DB_HOST, DB_PORT, DB_USERNAME, DB_PASSWORD, BPM_DB_NAME, BPM_DB_SCHEMA_NAME, DB_SSL, optional DB_CA_CERT_PATH/DB_CA_CERT
Backup DB: BPM_BACKUP_DB_HOST, BPM_BACKUP_DB_PORT, BPM_BACKUP_DB_USERNAME, BPM_BACKUP_DB_PASSWORD, BPM_BACKUP_DB_NAME, BPM_BACKUP_DB_SCHEMA_NAME, BPM_BACKUP_DB_SSL, optional BPM_BACKUP_DB_CA_CERT_PATH/BPM_BACKUP_DB_CA_CERT
Service: BASE_PATH, PORT
Security: IDS_ENABLE, IDS_URL, IDS_CLIENT_ID, IDS_CLIENT_SECRET, ALPHA_AUTH_SERVICE_URL

Health check:

GET /archivalservice/ping returns { status: 'ok' } and is excluded from auth

Config validation:

On startup, the service validates that primary (bpm) and backup (bpm_backup) do not point to the same physical database (same host, port, and database name). If they do, the service remains unhealthy:
- GET /archivalservice/ping returns HTTP 503 with { status: 'error', errorCode: 'INVALID_DB_CONFIG', errorMessage: '...' }.
- Backup/restore endpoints respond with HTTP 503 until configuration is fixed.

Quick Start

Plan, back up, and track a run end-to-end.

Estimate volume

POST /processes/count
Body:

{ "processName": "customerOnboarding", "beforeNoOfDays": 90 }

Curl:

curl -X POST \
  "http://localhost:3000/archivalservice/bpm/processes/count" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{ "processName": "customerOnboarding", "beforeNoOfDays": 90 }'

Kick off backup (async)

POST /processes/backup
Body mirrors your filters (e.g., the one above). Best practice: target terminal states only (e.g., processStatus: "COMPLETED") and/or use beforeNoOfDays to avoid in‑flight processes. Response:

{ "message": "Backup initiated successfully", "auditId": "<uuid>" }

Curl:

curl -X POST \
  "http://localhost:3000/archivalservice/bpm/processes/backup" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "processName": "customerOnboarding",
    "processStatus": "COMPLETED",
    "beforeNoOfDays": 90
  }'

Track progress

Latest: GET /processes/audits/<auditId>/latest

Full timeline: GET /processes/audits/<auditId>
Curl (latest):

curl -X GET \
  "http://localhost:3000/archivalservice/bpm/processes/audits/<auditId>/latest" \
  -H "Authorization: Bearer <token>"

Curl (timeline):

curl -X GET \
  "http://localhost:3000/archivalservice/bpm/processes/audits/<auditId>?pageNumber=1&pageSize=50" \
  -H "Authorization: Bearer <token>"

(Optional) Restore similar scope later

POST /processes/restore with the same filters (applied against the backup DB).
Curl:

curl -X POST \
  "http://localhost:3000/archivalservice/bpm/processes/restore" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "processName": "customerOnboarding",
    "beforeNoOfDays": 90
  }'

Planning an Archival

Objective: select the right scope, estimate volume, understand relationships, and choose the proper execution strategy.

Define selection filters

Use any combination of:
- processName (exact match), processStatus (ACTIVE|COMPLETED|ABORTED), deploymentId
- beforeNoOfDays (date cutoff), task filters (taskName, taskStatus, taskAssignee)
- processVariables: array of { name, value } filters (AND semantics)
- piids: explicit list of process instance IDs
- Status guidance: Prefer terminal states (COMPLETED, ABORTED) for archival/deletion. Avoid ACTIVE to prevent removing in‑flight processes. When using explicit piids, combine with a status filter to guard against accidental inclusion.

Estimate volume

POST /processes/count with filters → returns totalCount
Use totalCount to decide batch windows and maintenance windows

Sample the IDs

POST /processes (paginated) with the same filters
Page through to manually spot-check a subset

Understand child processes (scope preview)

POST /processes/children/find with parentProcessInstanceIds to preview scope.
Note: Backup/restore automatically fetch and include recursive child processes. Delete-bulk also deletes recursive children automatically. Use children/find mainly for planning and verification, not as a required step before running.

Dry-run SQL generation (optional for safety)

POST /processes/generate-sql with processInstanceIds
Review the transaction-wrapped SQL before running full backup in production

Schedule the operation

Backups run in batches (up to 100 per iteration; hard cap). Total batches = ceil(totalCount / 100)
Expect the following high-level steps per batch (audit logs will reflect them): count → fetch → children → generate SQL → execute → delete → complete

Communicate recovery plan

Capture restore parameters mirroring the same filters; ensure you know how to trace and reverse if needed

Development Methodology (End Users)

Purpose: a practical, safe way to plan and execute archival and restore jobs so you sleep well after pressing “Run”.

Principles

Safety first: target terminal states (COMPLETED, ABORTED) and/or use beforeNoOfDays to avoid in‑flight data.
Predictability: estimate volume with /processes/count before starting; run in bounded batches (service caps at 100).
Observability: track each run by auditId and decide based on facts, not guesses.
Rehearsal: dry‑run with /processes/generate-sql and perform a lower‑env rehearsal before production.
Reversibility: capture filters you used so you can mirror them for restore if needed.
Concurrency: only one backup/restore runs at a time; avoid parallelism through external automation.

Pre‑flight checklist

Connectivity and readiness
- GET /archivalservice/ping returns ok.
- Swagger reachable locally: http://localhost:3000/archivalservice/api-docs.
- Auth token available; environments configured via .env.
Scope definition
- Choose filters with terminal processStatus and beforeNoOfDays that reflect your retention policy.
- Run POST /processes/count and record totalCount for the window.
- Optionally POST /processes to sample IDs and spot‑check.
- Optionally POST /processes/children/find to preview recursive scope.
Schedule
- Plan a window sized as ceil(totalCount / 100) batches plus margin.
- Communicate a recovery plan (the exact filters you would use to restore).

Execution ladder (backup)

Kick off POST /processes/backup with the selected filters (terminal states recommended). Capture auditId.
Monitor GET /processes/audits/{auditId}/latest every few seconds/minutes.
If failure occurs, inspect errorCode, errorSummary, errorResolve, and errorMessage in the latest audit row.
- For unique violations (23505) on the target DB, delete conflicts via POST /processes/delete-bulk on the target side and retry.
Wait until the final audit row shows auditStatus=completed.

Post‑run verification

Compare POST /processes/count on source vs backup (adjusting DB context) for the same filters.
Sample via POST /processes and verify expected records moved.
Confirm latest audit shows completed with no remaining count.

Execution ladder (restore)

Mirror the backup filters (applied against the backup DB) and POST /processes/restore.
Monitor audits as above until completed.
Verify counts and sample data back in the primary DB for the same scope.

Roll‑back and record‑keeping

Keep a record of the exact JSON filters used (and the returned auditId).
If you need to reverse a backup wave, reuse the same filters for restore or derive piids from the audit logs and restore those explicitly.

Do’s and Don’ts

Do use processStatus: COMPLETED and beforeNoOfDays for production backups.
Do validate “ACTIVE count == 0” when deleting specific piids.
Don’t hand‑pick IDs without a status guard.
Don’t attempt to run multiple archival jobs at once.

Backup Runbook (Async)

Kick off a backup:

POST /processes/backup with the selection filters
Typical response (201 Created) includes auditId to track progress

Track progress and outcomes:

GET /processes/audits/:auditId/latest → last status row
GET /processes/audits/:auditId → full timeline (paginated)
GET /processes/audits/:auditId/summary → total number of audit rows

Notes:

If another backup/restore is already in progress, the service returns a message indicating this and still provides an auditId
Data is deleted from the source DB after being applied to the backup DB for each batch
For a non-destructive rehearsal, use generate-sql on a sample set and run in a lower environment
Best practice: Scope backups to terminal states (e.g., processStatus: COMPLETED) and/or apply beforeNoOfDays to avoid capturing in‑flight data.

Restore Runbook (Async)

Kick off a restore:

POST /processes/restore with filters (applied against the backup DB)
Typical response (201 Created) includes auditId

Tracking mirrors backup:

Use the same audit endpoints to follow status and errors

Notes:

Restore applies data to the primary DB and deletes it from the backup DB
Make sure constraints and schemas on the primary DB match expectations (migrations run at startup and functions/procedures are prepared)
Validate with count and processes before and after
Best practice: Restore only processes that were previously archived in terminal states (e.g., COMPLETED), unless there is an explicit business need to reintroduce in‑flight data.

Delete Bulk Use Case

Endpoint:

POST /processes/delete-bulk with { processInstanceIds: number[] }

Options:

Pass IDs of parent processes; recursive child processes are automatically included in the deletion.
children/find is optional for planning/preview. Response returns operation status.

Safety:

Prefer to delete after you have validated that the matching data is safely backed up
Consider performing a smaller test deletion window before a large wave

Status strategy (preventing accidental deletion):

Delete-bulk operates on the exact IDs you pass and does NOT filter by status. Always ensure the parent process IDs you supply are in terminal states (COMPLETED or ABORTED).
Recommended guardrails:
- Pre-check for active processes among your candidate IDs:
  - POST /processes/count with { piids: [...], processStatus: "ACTIVE" } should return totalCount = 0.
- Prefer to derive the candidate list via filters that enforce terminal states (e.g., processStatus: "COMPLETED", and beforeNoOfDays), rather than hand-picking IDs.
- Children are deleted recursively; verify scope with /processes/children/find when needed.

Example safe delete flow:

Build candidate set (terminal only)

curl -X POST \
  "http://localhost:3000/archivalservice/bpm/processes" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "processName": "customerOnboarding",
    "processStatus": "COMPLETED",
    "beforeNoOfDays": 30,
    "pageSize": 100,
    "pageNumber": 1
  }'

Guard: ensure none of these are ACTIVE

curl -X POST \
  "http://localhost:3000/archivalservice/bpm/processes/count" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "piids": [/* candidate IDs here */],
    "processStatus": "ACTIVE"
  }'

Delete (includes children automatically)

curl -X POST \
  "http://localhost:3000/archivalservice/bpm/processes/delete-bulk" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{ "processInstanceIds": [/* candidate IDs here */] }'

Auditing, Monitoring & Troubleshooting

Audit discovery:

GET /processes/audits?auditType=&auditStatus=&processName=&deploymentId=&from=&to=&pageNumber=&pageSize=
- Returns the latest row for each auditId matching discovery filters
- X-Total-Count header includes total matches
  Curl (includes headers):
```
curl -i -X GET \
  "http://localhost:3000/archivalservice/bpm/processes/audits?auditType=backup&processName=customerOnboarding&pageNumber=1&pageSize=20" \
  -H "Authorization: Bearer <token>"
```

Summaries:

GET /processes/audits/summary?... → total number of audits matching discovery filters
Curl:

curl -X GET \
  "http://localhost:3000/archivalservice/bpm/processes/audits/summary?auditType=backup&from=2025-01-01T00:00:00Z&to=2025-02-01T00:00:00Z" \
  -H "Authorization: Bearer <token>"

Log details:

Server logs include compiled SQL with schema injections (e.g., "public".process_count_fn_v1(...))
Query failures surface Postgres codes and details in the log (e.g., function missing, permission issues)

Common issues:

Function missing: confirm startup migrations run and the configured schema names are correct
Permission errors: validate DB users have CREATE FUNCTION/PROCEDURE, SELECT/INSERT/DELETE where needed
Concurrency: only one backup/restore at a time; retry later if another is in progress
Unique violation (23505) during backup/restore apply: the target DB already contains conflicting data. Use POST /processes/delete-bulk to remove the conflicting data from the target DB as indicated by audit/log details, then retry the operation.

Minimal examples (selected):

Count

Request: POST /processes/count

{ "processName": "order-fulfillment" }

Curl:

curl -X POST \
  "http://localhost:3000/archivalservice/bpm/processes/count" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{ "processName": "order-fulfillment" }'

Response: { "totalCount": 123 }

Find (paginated)

Request: POST /processes

{ "processName": "customerOnboarding", "pageSize": 50, "pageNumber": 1 }

Curl:

curl -X POST \
  "http://localhost:3000/archivalservice/bpm/processes" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "processName": "customerOnboarding",
    "pageSize": 50,
    "pageNumber": 1
  }'

Response: { "processInstanceIds": [2001, 2002] }

Children preview

Request: POST /processes/children/find

{ "parentProcessInstanceIds": [201, 205, 210] }

Curl:

curl -X POST \
  "http://localhost:3000/archivalservice/bpm/processes/children/find" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{ "parentProcessInstanceIds": [201, 205, 210] }'

Response: { "childProcessInstanceIds": [301, 302, 303] }

Generate SQL (dry-run)

Request: POST /processes/generate-sql

{ "processInstanceIds": [101, 102, 103] }

Curl:

curl -X POST \
  "http://localhost:3000/archivalservice/bpm/processes/generate-sql" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{ "processInstanceIds": [101, 102, 103] }'

Response: { "backupSqlScript": ["BEGIN;", "INSERT ...", "COMMIT;"] }

Backup (async)

Request: POST /processes/backup

{ "processName": "customerOnboarding", "beforeNoOfDays": 90 }

Curl:

curl -X POST \
  "http://localhost:3000/archivalservice/bpm/processes/backup" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{ "processName": "customerOnboarding", "beforeNoOfDays": 90 }'

Response (201): { "message": "Backup initiated successfully", "auditId": "<uuid>" }
If busy: { "message": "Previous operation is still in progress, please wait for it to complete before retrying", "auditId": "<uuid>" }

Restore (async)

Request: POST /processes/restore

{ "processName": "customerOnboarding", "beforeNoOfDays": 90 }

Curl:

curl -X POST \
  "http://localhost:3000/archivalservice/bpm/processes/restore" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{ "processName": "customerOnboarding", "beforeNoOfDays": 90 }'

Response (201): { "message": "Restore initiated successfully", "auditId": "<uuid>" }

Delete-bulk (includes children)

Request: POST /processes/delete-bulk

{ "processInstanceIds": [12345, 67890] }

Curl:

curl -X POST \
  "http://localhost:3000/archivalservice/bpm/processes/delete-bulk" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{ "processInstanceIds": [12345, 67890] }'

Response: { "status": "SUCCESS" } or { "status": "FAILURE: <message>" }

Security & Authorization

Inbound:

Bearer tokens via IDS: Authorization: Bearer <token>
Configure via IDS_ENABLE, IDS_URL, IDS_CLIENT_ID, IDS_CLIENT_SECRET, ALPHA_AUTH_SERVICE_URL
/archivalservice/ping is excluded from auth for health checks

Outbound (service-to-service):

Not used by this service directly for archival operations, but the broader platform employs a centralized token service for outbound calls

Development Methodology

Security & docs:

Annotate new secured endpoints with @ApiBearerAuth() and group them via @ApiTags
Ensure RequestUtils is enforcing auth on all non-excluded routes

Operational considerations:

Keep batch sizes manageable (currently 100) and respect the one-operation-at-a-time rule
Ensure new SQL is idempotent and safe to re-run in migrations
Log failures with enough context to diagnose in production without leaking sensitive data

FAQ & Tips

Q: Can I run a dry-run of backup?
- A: Use generate-sql on a sample set in a non-production environment and review the generated statements
Q: How do I predict downtime?
- A: Use processes/count to estimate total volume and multiply by average per-batch duration to derive a window
Q: What if a batch fails mid-run?
- A: Inspect audits/:auditId for the last successful step, fix the underlying issue (e.g., a constraint), and re-run; operations are batched and audit trails are preserved
Q: How do I ensure complete deletion?
- A: For backup/restore, child processes are handled automatically. For deletions, /processes/delete-bulk also deletes recursive children; use children/find mainly to preview scope when planning.
Q: How do I verify after restore?
- A: Compare processes/count between environments before/after; sample processes endpoints to spot check

Archival & Restore Use Cases

Contents

Overview

Base URL & Docs

Prerequisites & Environment

Quick Start

Planning an Archival

Development Methodology (End Users)

Backup Runbook (Async)

Restore Runbook (Async)

Delete Bulk Use Case

Auditing, Monitoring & Troubleshooting

Security & Authorization

Development Methodology

FAQ & Tips