This document explains how to plan and execute archival and restore operations using the Alpha Archival service, and how to work on the service as a developer.
The service manages archival of jBPM-backed process data between a primary BPM database (source) and a backup database (target), with detailed audit trails and safety controls.
Contents

- Overview
- Base URL & Docs
- Prerequisites & Environment
- Quick Start
- Planning an Archival
- Development Methodology (End Users)
- Backup Runbook (Async)
- Restore Runbook (Async)
- Delete Bulk Use Case
- Auditing, Monitoring & Troubleshooting
- Security & Authorization
- Development Methodology
- FAQ & Tips
Overview

Key capabilities:

- Count and find process instances by filters (process name, status, dates, task filters, variable filters, explicit `piids`)
- Discover recursive child process instance IDs
- Generate transaction-wrapped SQL backup scripts for a set of process IDs
- Run full backup (source → backup DB) and restore (backup DB → source) in controlled batches with audit logs
- Delete process instances in bulk (recursive children included)

Constraints and safety:

- At most one backup/restore operation runs at a time; a concurrent request returns a message and an `auditId` you can poll
- Backup deletes matching data from the source after copying to the target; restore deletes matching data from the backup after copying to the source
- Every step writes detailed rows into the `process_audit_logs` table in the backup database for traceability
Base URL & Docs

- All endpoint paths in this document are relative to `/archivalservice/bpm` unless noted otherwise.
- The base path is configurable via `BASE_PATH` (default: `archivalservice`).
- Swagger (local default): `http://localhost:3000/archivalservice/api-docs`
- Health check (no auth): `GET /archivalservice/ping` → `{ status: 'ok' }`
Prerequisites & Environment

Databases and schema:

- Primary BPM DB (source): contains jBPM tables (e.g., `processinstancelog`, tasks, variables)
- Backup BPM DB (target): receives backups and stores audit tables
- Startup migrations create or update stored procedures/functions in both DBs and create backup tables in the backup DB

Configuration (service .env):

- Primary DB: `DB_HOST`, `DB_PORT`, `DB_USERNAME`, `DB_PASSWORD`, `BPM_DB_NAME`, `BPM_DB_SCHEMA_NAME`, `DB_SSL`, optional `DB_CA_CERT_PATH`/`DB_CA_CERT`
- Backup DB: `BPM_BACKUP_DB_HOST`, `BPM_BACKUP_DB_PORT`, `BPM_BACKUP_DB_USERNAME`, `BPM_BACKUP_DB_PASSWORD`, `BPM_BACKUP_DB_NAME`, `BPM_BACKUP_DB_SCHEMA_NAME`, `BPM_BACKUP_DB_SSL`, optional `BPM_BACKUP_DB_CA_CERT_PATH`/`BPM_BACKUP_DB_CA_CERT`
- Service: `BASE_PATH`, `PORT`
- Security: `IDS_ENABLE`, `IDS_URL`, `IDS_CLIENT_ID`, `IDS_CLIENT_SECRET`, `ALPHA_AUTH_SERVICE_URL`

Health check:

- `GET /archivalservice/ping` returns `{ status: 'ok' }` and is excluded from auth

Config validation:

- On startup, the service validates that the primary (`bpm`) and backup (`bpm_backup`) connections do not point to the same physical database (same host, port, and database name). If they do, the service remains unhealthy:
  - `GET /archivalservice/ping` returns HTTP 503 with `{ status: 'error', errorCode: 'INVALID_DB_CONFIG', errorMessage: '...' }`.
  - Backup/restore endpoints respond with HTTP 503 until the configuration is fixed.
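The same-database check described above can be sketched as a small comparison. This is an illustrative sketch, not the service's actual code; the dictionary keys (`host`, `port`, `database`) are hypothetical stand-ins for the env vars listed earlier.

```python
# Hypothetical sketch of the startup config check: the primary and backup
# connections must not resolve to the same physical database.
def is_same_physical_db(primary: dict, backup: dict) -> bool:
    """True when host, port, and database name all match (host compared case-insensitively)."""
    return (
        primary["host"].lower() == backup["host"].lower()
        and int(primary["port"]) == int(backup["port"])
        and primary["database"] == backup["database"]
    )

primary = {"host": "db1.internal", "port": 5432, "database": "bpm"}
same = {"host": "db1.internal", "port": 5432, "database": "bpm"}
assert is_same_physical_db(primary, same)  # this config would trigger INVALID_DB_CONFIG

backup = {"host": "db2.internal", "port": 5432, "database": "bpm_backup"}
assert not is_same_physical_db(primary, backup)
```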
Quick Start

Plan, back up, and track a run end-to-end.

1. Estimate volume

   `POST /processes/count`

   Body:

   ```json
   { "processName": "customerOnboarding", "beforeNoOfDays": 90 }
   ```

   Curl:

   ```bash
   curl -X POST \
     "http://localhost:3000/archivalservice/bpm/processes/count" \
     -H "Authorization: Bearer <token>" \
     -H "Content-Type: application/json" \
     -d '{ "processName": "customerOnboarding", "beforeNoOfDays": 90 }'
   ```

2. Kick off backup (async)

   `POST /processes/backup`

   Body mirrors your filters (e.g., the one above). Best practice: target terminal states only (e.g., `processStatus: "COMPLETED"`) and/or use `beforeNoOfDays` to avoid in-flight processes.

   Response:

   ```json
   { "message": "Backup initiated successfully", "auditId": "<uuid>" }
   ```

   Curl:

   ```bash
   curl -X POST \
     "http://localhost:3000/archivalservice/bpm/processes/backup" \
     -H "Authorization: Bearer <token>" \
     -H "Content-Type: application/json" \
     -d '{ "processName": "customerOnboarding", "processStatus": "COMPLETED", "beforeNoOfDays": 90 }'
   ```

3. Track progress

   - Latest: `GET /processes/audits/<auditId>/latest`
   - Full timeline: `GET /processes/audits/<auditId>`

   Curl (latest):

   ```bash
   curl -X GET \
     "http://localhost:3000/archivalservice/bpm/processes/audits/<auditId>/latest" \
     -H "Authorization: Bearer <token>"
   ```

   Curl (timeline):

   ```bash
   curl -X GET \
     "http://localhost:3000/archivalservice/bpm/processes/audits/<auditId>?pageNumber=1&pageSize=50" \
     -H "Authorization: Bearer <token>"
   ```

4. (Optional) Restore a similar scope later

   `POST /processes/restore` with the same filters (applied against the backup DB).

   Curl:

   ```bash
   curl -X POST \
     "http://localhost:3000/archivalservice/bpm/processes/restore" \
     -H "Authorization: Bearer <token>" \
     -H "Content-Type: application/json" \
     -d '{ "processName": "customerOnboarding", "beforeNoOfDays": 90 }'
   ```
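The tracking step above is naturally scripted as a polling loop. The sketch below is illustrative: the `fetch_latest` callable stands in for an HTTP GET to `/processes/audits/<auditId>/latest` with a bearer token, and the terminal status `"failed"` is an assumption (this document only names `completed` explicitly).

```python
# Poll the latest audit row until it reaches a terminal status.
# fetch_latest is injected so the sketch stays self-contained and testable.
import time

def wait_for_audit(fetch_latest, poll_seconds=5, max_polls=720):
    for _ in range(max_polls):
        row = fetch_latest()
        if row.get("auditStatus") in ("completed", "failed"):  # "failed" is assumed
            return row
        time.sleep(poll_seconds)
    raise TimeoutError("audit did not reach a terminal status in time")

# Example with a stubbed fetcher that completes on the third poll:
responses = iter([
    {"auditStatus": "in_progress"},
    {"auditStatus": "in_progress"},
    {"auditStatus": "completed"},
])
final = wait_for_audit(lambda: next(responses), poll_seconds=0)
assert final["auditStatus"] == "completed"
```

In practice you would wrap this around a real HTTP client and log each intermediate row for the run record.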
Planning an Archival

Objective: select the right scope, estimate volume, understand relationships, and choose the proper execution strategy.

1. Define selection filters
   - Use any combination of:
     - `processName` (exact match), `processStatus` (`ACTIVE`|`COMPLETED`|`ABORTED`), `deploymentId`
     - `beforeNoOfDays` (date cutoff), task filters (`taskName`, `taskStatus`, `taskAssignee`)
     - `processVariables`: array of `{ name, value }` filters (AND semantics)
     - `piids`: explicit list of process instance IDs
   - Status guidance: prefer terminal states (`COMPLETED`, `ABORTED`) for archival/deletion. Avoid `ACTIVE` to prevent removing in-flight processes. When using explicit `piids`, combine them with a status filter to guard against accidental inclusion.
2. Estimate volume
   - `POST /processes/count` with filters → returns `totalCount`
   - Use `totalCount` to decide batch windows and maintenance windows
3. Sample the IDs
   - `POST /processes` (paginated) with the same filters
   - Page through to manually spot-check a subset
4. Understand child processes (scope preview)
   - `POST /processes/children/find` with `parentProcessInstanceIds` to preview scope.
   - Note: backup/restore automatically fetch and include recursive child processes. Delete-bulk also deletes recursive children automatically. Use `children/find` mainly for planning and verification, not as a required step before running.
5. Dry-run SQL generation (optional, for safety)
   - `POST /processes/generate-sql` with `processInstanceIds`
   - Review the transaction-wrapped SQL before running a full backup in production
6. Schedule the operation
   - Backups run in batches (up to 100 per iteration; hard cap). Total batches = `ceil(totalCount / 100)`
   - Expect the following high-level steps per batch (audit logs will reflect them): count → fetch → children → generate SQL → execute → delete → complete
7. Communicate the recovery plan
   - Capture `restore` parameters mirroring the same filters; ensure you know how to trace and reverse if needed
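The batch arithmetic from step 6 is worth making concrete. A minimal sketch of the `ceil(totalCount / 100)` sizing rule, using the 100-row cap stated above:

```python
# Batches are capped at 100 process instances per iteration (service hard cap),
# so the number of batches for a run is ceil(totalCount / 100).
import math

BATCH_SIZE = 100

def batch_count(total_count: int) -> int:
    return math.ceil(total_count / BATCH_SIZE)

assert batch_count(100) == 1      # exactly one full batch
assert batch_count(101) == 2      # one extra row forces a second batch
assert batch_count(12345) == 124
```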
Development Methodology (End Users)

Purpose: a practical, safe way to plan and execute archival and restore jobs so you sleep well after pressing “Run”.

Principles

- Safety first: target terminal states (`COMPLETED`, `ABORTED`) and/or use `beforeNoOfDays` to avoid in-flight data.
- Predictability: estimate volume with `/processes/count` before starting; run in bounded batches (the service caps at 100).
- Observability: track each run by `auditId` and decide based on facts, not guesses.
- Rehearsal: dry-run with `/processes/generate-sql` and perform a lower-environment rehearsal before production.
- Reversibility: capture the filters you used so you can mirror them for restore if needed.
- Concurrency: only one backup/restore runs at a time; avoid parallelism through external automation.
Pre-flight checklist

- Connectivity and readiness
  - `GET /archivalservice/ping` returns ok.
  - Swagger reachable locally: `http://localhost:3000/archivalservice/api-docs`.
  - Auth token available; environments configured via `.env`.
- Scope definition
  - Choose filters with a terminal `processStatus` and a `beforeNoOfDays` that reflects your retention policy.
  - Run `POST /processes/count` and record `totalCount` for the window.
  - Optionally `POST /processes` to sample IDs and spot-check.
  - Optionally `POST /processes/children/find` to preview recursive scope.
- Schedule
  - Plan a window sized as `ceil(totalCount / 100)` batches plus margin.
  - Communicate a recovery plan (the exact filters you would use to restore).
Execution ladder (backup)

1. Kick off `POST /processes/backup` with the selected filters (terminal states recommended). Capture the `auditId`.
2. Monitor `GET /processes/audits/{auditId}/latest` every few seconds/minutes.
3. If a failure occurs, inspect `errorCode`, `errorSummary`, `errorResolve`, and `errorMessage` in the latest audit row.
   - For unique violations (23505) on the target DB, delete the conflicts via `POST /processes/delete-bulk` on the target side and retry.
4. Wait until the final audit row shows `auditStatus=completed`.

Post-run verification

- Compare `POST /processes/count` on source vs backup (adjusting DB context) for the same filters.
- Sample via `POST /processes` and verify the expected records moved.
- Confirm the latest audit shows `completed` with no remaining count.
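The post-run check above can be expressed as a small predicate. This is a sketch under an assumption: that after a fully completed backup the source reports zero matches for the filters and the backup holds the originally estimated total; the response shape (`totalCount`) comes from the count endpoint documented earlier.

```python
# Compare count responses from the source and backup DBs for the same filters.
def fully_archived(source_count: dict, backup_count: dict, expected: int) -> bool:
    """True when the source shows no remaining matches and the backup holds
    the total estimated before the run (assumed semantics)."""
    return (
        source_count.get("totalCount") == 0
        and backup_count.get("totalCount") == expected
    )

assert fully_archived({"totalCount": 0}, {"totalCount": 250}, 250)
assert not fully_archived({"totalCount": 3}, {"totalCount": 247}, 250)
```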
Execution ladder (restore)

1. Mirror the backup filters (applied against the backup DB) and `POST /processes/restore`.
2. Monitor audits as above until `completed`.
3. Verify counts and sample data back in the primary DB for the same scope.

Roll-back and record-keeping

- Keep a record of the exact JSON filters used (and the returned `auditId`).
- If you need to reverse a backup wave, reuse the same filters for restore, or derive `piids` from the audit logs and restore those explicitly.

Do’s and Don’ts

- Do use `processStatus: COMPLETED` and `beforeNoOfDays` for production backups.
- Do validate that the ACTIVE count is 0 when deleting specific `piids`.
- Don’t hand-pick IDs without a status guard.
- Don’t attempt to run multiple archival jobs at once.
Backup Runbook (Async)

Kick off a backup:

- `POST /processes/backup` with the selection filters
- Typical response (201 Created) includes an `auditId` to track progress

Track progress and outcomes:

- `GET /processes/audits/:auditId/latest` → last status row
- `GET /processes/audits/:auditId` → full timeline (paginated)
- `GET /processes/audits/:auditId/summary` → total number of audit rows

Notes:

- If another backup/restore is already in progress, the service returns a message indicating this and still provides an `auditId`
- Data is deleted from the source DB after being applied to the backup DB for each batch
- For a non-destructive rehearsal, use `generate-sql` on a sample set and run it in a lower environment
- Best practice: scope backups to terminal states (e.g., `processStatus: COMPLETED`) and/or apply `beforeNoOfDays` to avoid capturing in-flight data.
Restore Runbook (Async)

Kick off a restore:

- `POST /processes/restore` with filters (applied against the backup DB)
- Typical response (201 Created) includes an `auditId`

Tracking mirrors backup:

- Use the same audit endpoints to follow status and errors

Notes:

- Restore applies data to the primary DB and deletes it from the backup DB
- Make sure constraints and schemas on the primary DB match expectations (migrations run at startup and functions/procedures are prepared)
- Validate with `count` and `processes` before and after
- Best practice: restore only processes that were previously archived in terminal states (e.g., `COMPLETED`), unless there is an explicit business need to reintroduce in-flight data.
Delete Bulk Use Case

Endpoint:

- `POST /processes/delete-bulk` with `{ processInstanceIds: number[] }`

Options:

- Pass the IDs of parent processes; recursive child processes are automatically included in the deletion. `children/find` is optional for planning/preview. The response returns the operation status.

Safety:

- Prefer to delete only after you have validated that the matching data is safely backed up
- Consider performing a smaller test deletion window before a large wave

Status strategy (preventing accidental deletion):

- Delete-bulk operates on the exact IDs you pass and does NOT filter by status. Always ensure the parent process IDs you supply are in terminal states (`COMPLETED` or `ABORTED`).
- Recommended guardrails:
  - Pre-check for active processes among your candidate IDs: `POST /processes/count` with `{ "piids": [...], "processStatus": "ACTIVE" }` should return `totalCount = 0`.
  - Prefer to derive the candidate list via filters that enforce terminal states (e.g., `processStatus: "COMPLETED"` plus `beforeNoOfDays`), rather than hand-picking IDs.
  - Children are deleted recursively; verify scope with `/processes/children/find` when needed.

Example safe delete flow:

1. Build the candidate set (terminal states only)

   ```bash
   curl -X POST \
     "http://localhost:3000/archivalservice/bpm/processes" \
     -H "Authorization: Bearer <token>" \
     -H "Content-Type: application/json" \
     -d '{
       "processName": "customerOnboarding",
       "processStatus": "COMPLETED",
       "beforeNoOfDays": 30,
       "pageSize": 100,
       "pageNumber": 1
     }'
   ```

2. Guard: ensure none of these are ACTIVE

   ```bash
   curl -X POST \
     "http://localhost:3000/archivalservice/bpm/processes/count" \
     -H "Authorization: Bearer <token>" \
     -H "Content-Type: application/json" \
     -d '{
       "piids": [/* candidate IDs here */],
       "processStatus": "ACTIVE"
     }'
   ```

3. Delete (includes children automatically)

   ```bash
   curl -X POST \
     "http://localhost:3000/archivalservice/bpm/processes/delete-bulk" \
     -H "Authorization: Bearer <token>" \
     -H "Content-Type: application/json" \
     -d '{ "processInstanceIds": [/* candidate IDs here */] }'
   ```
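When the flow above is automated, the guard in step 2 reduces to one check: proceed only when the ACTIVE count over the candidate `piids` is exactly zero. A minimal sketch of that decision, taking the `/processes/count` response shape documented above:

```python
# Guard for delete-bulk: only proceed when the count of ACTIVE processes
# among the candidate IDs is exactly zero.
def safe_to_delete(candidate_ids: list, active_count_response: dict) -> bool:
    if not candidate_ids:
        return False  # empty candidate set: nothing to delete, fail closed
    # Missing or malformed totalCount also fails closed (defaults to -1).
    return active_count_response.get("totalCount", -1) == 0

assert safe_to_delete([2001, 2002], {"totalCount": 0})       # safe to delete
assert not safe_to_delete([2001, 2002], {"totalCount": 1})   # an ACTIVE process slipped in
assert not safe_to_delete([], {"totalCount": 0})             # nothing to do
```

Failing closed on empty or malformed input is deliberate: a bug in building the candidate list should stop the wave, not silently pass the guard.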
Auditing, Monitoring & Troubleshooting

Audit discovery:

- `GET /processes/audits?auditType=&auditStatus=&processName=&deploymentId=&from=&to=&pageNumber=&pageSize=`
  - Returns the latest row for each `auditId` matching the discovery filters
  - The `X-Total-Count` header includes the total number of matches

  Curl (includes headers):

  ```bash
  curl -i -X GET \
    "http://localhost:3000/archivalservice/bpm/processes/audits?auditType=backup&processName=customerOnboarding&pageNumber=1&pageSize=20" \
    -H "Authorization: Bearer <token>"
  ```

- Summaries: `GET /processes/audits/summary?...` → total number of audits matching the discovery filters

  Curl:

  ```bash
  curl -X GET \
    "http://localhost:3000/archivalservice/bpm/processes/audits/summary?auditType=backup&from=2025-01-01T00:00:00Z&to=2025-02-01T00:00:00Z" \
    -H "Authorization: Bearer <token>"
  ```
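When walking the paginated discovery results, the number of pages follows from the `X-Total-Count` header and the `pageSize` you requested. A small sketch of that arithmetic:

```python
# Derive the number of pages to fetch from the X-Total-Count response header.
import math

def total_pages(x_total_count: str, page_size: int) -> int:
    """Headers arrive as strings; pages = ceil(total / pageSize)."""
    return math.ceil(int(x_total_count) / page_size)

assert total_pages("123", 20) == 7  # pages 1..7 with pageSize=20
assert total_pages("0", 20) == 0    # nothing matched
```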
Log details:

- Server logs include the compiled SQL with schema injections (e.g., `"public".process_count_fn_v1(...)`)
- Query failures surface Postgres codes and details in the log (e.g., function missing, permission issues)

Common issues:

- Function missing: confirm that startup migrations ran and the configured schema names are correct
- Permission errors: validate that DB users have `CREATE FUNCTION/PROCEDURE` and `SELECT/INSERT/DELETE` where needed
- Concurrency: only one backup/restore at a time; retry later if another is in progress
- Unique violation (23505) during backup/restore apply: the target DB already contains conflicting data. Use `POST /processes/delete-bulk` to remove the conflicting data from the target DB as indicated by the audit/log details, then retry the operation.
Minimal examples (selected):

- Count
  - Request: `POST /processes/count`

    ```json
    { "processName": "order-fulfillment" }
    ```

    Curl:

    ```bash
    curl -X POST \
      "http://localhost:3000/archivalservice/bpm/processes/count" \
      -H "Authorization: Bearer <token>" \
      -H "Content-Type: application/json" \
      -d '{ "processName": "order-fulfillment" }'
    ```

  - Response: `{ "totalCount": 123 }`

- Find (paginated)
  - Request: `POST /processes`

    ```json
    { "processName": "customerOnboarding", "pageSize": 50, "pageNumber": 1 }
    ```

    Curl:

    ```bash
    curl -X POST \
      "http://localhost:3000/archivalservice/bpm/processes" \
      -H "Authorization: Bearer <token>" \
      -H "Content-Type: application/json" \
      -d '{ "processName": "customerOnboarding", "pageSize": 50, "pageNumber": 1 }'
    ```

  - Response: `{ "processInstanceIds": [2001, 2002] }`

- Children preview
  - Request: `POST /processes/children/find`

    ```json
    { "parentProcessInstanceIds": [201, 205, 210] }
    ```

    Curl:

    ```bash
    curl -X POST \
      "http://localhost:3000/archivalservice/bpm/processes/children/find" \
      -H "Authorization: Bearer <token>" \
      -H "Content-Type: application/json" \
      -d '{ "parentProcessInstanceIds": [201, 205, 210] }'
    ```

  - Response: `{ "childProcessInstanceIds": [301, 302, 303] }`

- Generate SQL (dry-run)
  - Request: `POST /processes/generate-sql`

    ```json
    { "processInstanceIds": [101, 102, 103] }
    ```

    Curl:

    ```bash
    curl -X POST \
      "http://localhost:3000/archivalservice/bpm/processes/generate-sql" \
      -H "Authorization: Bearer <token>" \
      -H "Content-Type: application/json" \
      -d '{ "processInstanceIds": [101, 102, 103] }'
    ```

  - Response: `{ "backupSqlScript": ["BEGIN;", "INSERT ...", "COMMIT;"] }`

- Backup (async)
  - Request: `POST /processes/backup`

    ```json
    { "processName": "customerOnboarding", "beforeNoOfDays": 90 }
    ```

    Curl:

    ```bash
    curl -X POST \
      "http://localhost:3000/archivalservice/bpm/processes/backup" \
      -H "Authorization: Bearer <token>" \
      -H "Content-Type: application/json" \
      -d '{ "processName": "customerOnboarding", "beforeNoOfDays": 90 }'
    ```

  - Response (201): `{ "message": "Backup initiated successfully", "auditId": "<uuid>" }`
  - If busy: `{ "message": "Previous operation is still in progress, please wait for it to complete before retrying", "auditId": "<uuid>" }`

- Restore (async)
  - Request: `POST /processes/restore`

    ```json
    { "processName": "customerOnboarding", "beforeNoOfDays": 90 }
    ```

    Curl:

    ```bash
    curl -X POST \
      "http://localhost:3000/archivalservice/bpm/processes/restore" \
      -H "Authorization: Bearer <token>" \
      -H "Content-Type: application/json" \
      -d '{ "processName": "customerOnboarding", "beforeNoOfDays": 90 }'
    ```

  - Response (201): `{ "message": "Restore initiated successfully", "auditId": "<uuid>" }`

- Delete-bulk (includes children)
  - Request: `POST /processes/delete-bulk`

    ```json
    { "processInstanceIds": [12345, 67890] }
    ```

    Curl:

    ```bash
    curl -X POST \
      "http://localhost:3000/archivalservice/bpm/processes/delete-bulk" \
      -H "Authorization: Bearer <token>" \
      -H "Content-Type: application/json" \
      -d '{ "processInstanceIds": [12345, 67890] }'
    ```

  - Response: `{ "status": "SUCCESS" }` or `{ "status": "FAILURE: <message>" }`
Security & Authorization

Inbound:

- Bearer tokens via IDS: `Authorization: Bearer <token>`
- Configure via `IDS_ENABLE`, `IDS_URL`, `IDS_CLIENT_ID`, `IDS_CLIENT_SECRET`, `ALPHA_AUTH_SERVICE_URL`
- `/archivalservice/ping` is excluded from auth for health checks

Outbound (service-to-service):

- Not used by this service directly for archival operations, but the broader platform employs a centralized token service for outbound calls
Development Methodology

Security & docs:

- Annotate new secured endpoints with `@ApiBearerAuth()` and group them via `@ApiTags`
- Ensure RequestUtils is enforcing auth on all non-excluded routes

Operational considerations:

- Keep batch sizes manageable (currently 100) and respect the one-operation-at-a-time rule
- Ensure new SQL is idempotent and safe to re-run in migrations
- Log failures with enough context to diagnose in production without leaking sensitive data
FAQ & Tips

- Q: Can I run a dry-run of a backup?
  - A: Use `generate-sql` on a sample set in a non-production environment and review the generated statements.
- Q: How do I predict downtime?
  - A: Use `processes/count` to estimate total volume, then multiply the batch count by the average per-batch duration to derive a window.
- Q: What if a batch fails mid-run?
  - A: Inspect `audits/:auditId` for the last successful step, fix the underlying issue (e.g., a constraint), and re-run; operations are batched and audit trails are preserved.
- Q: How do I ensure complete deletion?
  - A: For backup/restore, child processes are handled automatically. For deletions, `/processes/delete-bulk` also deletes recursive children; use `children/find` mainly to preview scope when planning.
- Q: How do I verify after restore?
  - A: Compare `processes/count` between environments before/after; sample the `processes` endpoint to spot-check.