Sharing a POC I developed for intelligent, large-scale medical document extraction using AIHub and Alpha UI.

The POC demonstrates how we can process large unstructured medical documents (30+, 60+, even 238+ pages) using AIHub, orchestrate extraction workflows efficiently, and provide a structured review experience inside Alpha UI.


The Challenge

Medical documents often arrive as massive unstructured PDFs containing:

  • Clinical notes

  • Reports

  • Summaries

  • Tables

  • Mixed document layouts

  • Semi-structured medical information

Traditional extraction approaches face several challenges:

  • Large files increase processing complexity

  • Extraction failures require manual intervention

  • API calls can become uncontrolled at scale

  • Users lack visibility into extraction progress

  • Reviewing extracted data alongside original documents is difficult


The Solution

I built an intelligent extraction orchestration flow using AIHub and Alpha UI that enables users to:

  • Upload large medical documents

  • Split and process documents efficiently

  • Handle retries and failed extractions gracefully

  • Generate summaries using AIHub

  • Create cases automatically

  • Review extracted content alongside original pages


Core Components Developed

:one: Document Extraction Manager (Lit Component)

Created a reusable Lit component responsible for orchestrating the complete extraction lifecycle.

Features

  • Drag-and-drop or file selection upload

  • Frontend-based PDF page splitting

  • Configurable split count for page batching

  • AIHub assistant integration using a configured access token

  • Automatic conversation creation

  • Extraction execution and result polling

  • Async generator–based orchestration for controlled API concurrency

  • Real-time progress tracking

  • Automatic retry handling for failed extractions

  • Configurable retry count

  • Failed extraction queue management

  • Manual retry/resume support

  • Emits the final extracted payload through component events for downstream workflows
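The configurable split count could translate into page ranges roughly like this. This is a minimal sketch, not the component's actual code: `planBatches`, `PageBatch`, and the `splitCount` parameter name are hypothetical, and the PDF splitting itself (which would need a PDF library) is left out.

```typescript
// Hypothetical shape for one batch of pages (1-indexed, inclusive on both ends).
interface PageBatch {
  index: number;
  startPage: number;
  endPage: number;
}

// Plan contiguous batches of at most `splitCount` pages each.
function planBatches(totalPages: number, splitCount: number): PageBatch[] {
  if (!Number.isInteger(totalPages) || totalPages < 0) {
    throw new RangeError("totalPages must be a non-negative integer");
  }
  if (!Number.isInteger(splitCount) || splitCount < 1) {
    throw new RangeError("splitCount must be a positive integer");
  }
  const batches: PageBatch[] = [];
  for (let start = 1; start <= totalPages; start += splitCount) {
    batches.push({
      index: batches.length,
      startPage: start,
      // The last batch may be shorter than splitCount.
      endPage: Math.min(start + splitCount - 1, totalPages),
    });
  }
  return batches;
}
```

For example, `planBatches(238, 10)` gives 24 batches, with the last one covering pages 231 to 238.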

Technical Highlights

The component uses async generators to control and limit concurrent API calls, improving stability and user experience while processing very large documents.
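One way to limit concurrency with an async generator is sketched below. It is an illustration of the general pattern under my own assumptions, not the component's implementation: `runLimited`, `Task`, and `Settled` are hypothetical names, and each task stands in for one extraction call.

```typescript
// A task is any function that starts work and returns a promise.
type Task<T> = () => Promise<T>;

// Outcome of one task, tagged with its position in the input array.
type Settled<T> =
  | { index: number; ok: true; value: T }
  | { index: number; ok: false; error: unknown };

// Run tasks with at most `limit` in flight, yielding each outcome as it settles.
async function* runLimited<T>(
  tasks: Task<T>[],
  limit: number,
): AsyncGenerator<Settled<T>> {
  if (limit < 1) throw new RangeError("limit must be at least 1");
  const inFlight = new Map<number, Promise<Settled<T>>>();
  let next = 0;

  const start = (index: number): void => {
    // Promise.resolve().then(...) also captures synchronous throws from the task.
    const settled = Promise.resolve()
      .then(tasks[index])
      .then(
        (value): Settled<T> => ({ index, ok: true, value }),
        (error): Settled<T> => ({ index, ok: false, error }),
      );
    inFlight.set(index, settled);
  };

  while (next < tasks.length || inFlight.size > 0) {
    // Top up the pool before waiting for the next completion.
    while (next < tasks.length && inFlight.size < limit) start(next++);
    const done = await Promise.race(Array.from(inFlight.values()));
    inFlight.delete(done.index);
    yield done;
  }
}
```

A consumer can iterate with `for await (const outcome of runLimited(tasks, 3))` and update a progress indicator on every yielded outcome. Because failures are yielded rather than thrown, one failed batch does not stop the others.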

The architecture was designed to be reusable and configurable so consumers can plug in custom workflows on extraction completion.


:two: Custom AIHub Summary & Case Creation Logic

For the POC workflow, I implemented custom orchestration logic on top of the extraction outputs.

Flow

  • Each extracted page result is summarized individually using AIHub

  • Page summaries are consolidated into a final medical summary

  • The final summary and extraction outputs are used to automatically create a case

  • The user is navigated directly into the generated case in Alpha UI

This demonstrated how extracted structured data can flow into downstream business processes with minimal manual effort.
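The summarize-then-consolidate flow above can be sketched as follows. The AIHub call is passed in as a function because its API is not shown in this post, so `Summarizer`, `PageResult`, and `summarizeDocument` are all hypothetical names used only for illustration.

```typescript
// Hypothetical stand-in for an AIHub summarization call.
type Summarizer = (text: string) => Promise<string>;

interface PageResult {
  page: number;
  extractedText: string;
}

interface DocumentSummary {
  pageSummaries: { page: number; summary: string }[];
  finalSummary: string;
}

async function summarizeDocument(
  pages: PageResult[],
  summarize: Summarizer,
): Promise<DocumentSummary> {
  const pageSummaries: { page: number; summary: string }[] = [];
  // Sort a copy so the caller's array is untouched and pages stay in order.
  const ordered = [...pages].sort((a, b) => a.page - b.page);
  // Sequential on purpose to keep the sketch simple.
  for (const p of ordered) {
    pageSummaries.push({ page: p.page, summary: await summarize(p.extractedText) });
  }
  // Consolidate: feed the labeled page summaries back through the summarizer.
  const combined = pageSummaries
    .map((s) => `Page ${s.page}: ${s.summary}`)
    .join("\n");
  const finalSummary = await summarize(combined);
  return { pageSummaries, finalSummary };
}
```

The returned object carries both the page-level summaries and the consolidated one, so a case-creation step could attach either.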


:three: Medical Review Viewer (Lit Component)

Built another reusable Lit component for reviewing extracted content alongside the original document.

Features

  • Displays original document pages on the left side

  • Displays extracted structured content on the right side

  • Retrieves files using AIHub file_id

  • Configurable path mappings for extraction payloads

  • Dynamic rendering based on extraction structure:

    • Tables

    • Key-value layouts

    • Structured summaries

    • Custom extraction formats

  • Pagination-based review experience

  • Page-level summary viewing

This allows users to validate extracted information directly against source medical documents.
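Configurable path mappings and structure-based rendering could work along these lines. This is a sketch under assumptions: the dotted-path syntax, `resolvePath`, `pickRenderer`, and the `RenderKind` values are hypothetical and only illustrate the idea of choosing a layout from the shape of the extracted value.

```typescript
type RenderKind = "table" | "key-value" | "text" | "custom";

// Follow a dotted path such as "result.patient.name" into a payload.
function resolvePath(payload: unknown, path: string): unknown {
  let current: unknown = payload;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

const isPlainObject = (v: unknown): v is Record<string, unknown> =>
  typeof v === "object" && v !== null && !Array.isArray(v);

// Choose a layout from the value's shape.
function pickRenderer(value: unknown): RenderKind {
  if (typeof value === "string") return "text";
  // A non-empty array of objects reads naturally as rows of a table.
  if (Array.isArray(value) && value.length > 0 && value.every(isPlainObject)) {
    return "table";
  }
  if (isPlainObject(value)) return "key-value";
  // Anything else is left to a consumer-supplied renderer.
  return "custom";
}
```

With this shape, a consumer would only configure the path for each section, and the viewer would decide the layout per value.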


POC Workflow

  1. User uploads large medical document

  2. Document is split into configurable page batches on the frontend

  3. AIHub processes extraction asynchronously

  4. Failed extractions retry automatically

  5. Remaining failures can be resumed manually

  6. Page-level summaries are generated

  7. Final consolidated medical summary is created

  8. Case is automatically created in Alpha UI

  9. User reviews extracted data alongside original pages
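Steps 4 and 5 (automatic retries, then manual resume of what is left) can be sketched like this. The names `extractWithRetries`, `RetryOutcome`, and `maxRetries` are hypothetical, and the `extract` callback stands in for whatever call runs one batch through AIHub.

```typescript
interface RetryOutcome<T> {
  succeeded: Map<number, T>;
  // Batch indexes that still failed; these form the queue for manual resume.
  failed: number[];
}

async function extractWithRetries<T>(
  batchIndexes: number[],
  extract: (batchIndex: number) => Promise<T>,
  maxRetries: number,
): Promise<RetryOutcome<T>> {
  const succeeded = new Map<number, T>();
  let pending = [...batchIndexes];
  // One initial pass plus up to `maxRetries` retry passes over the failures.
  for (let pass = 0; pass <= maxRetries && pending.length > 0; pass++) {
    const stillFailing: number[] = [];
    for (const index of pending) {
      try {
        succeeded.set(index, await extract(index));
      } catch {
        stillFailing.push(index);
      }
    }
    pending = stillFailing;
  }
  return { succeeded, failed: pending };
}
```

A manual resume is then just another call with the returned `failed` list as the new `batchIndexes`. The inner loop is sequential here for clarity; it could be combined with a concurrency limit.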


Platforms Orchestrated

| Platform | Role |
|---|---|
| AIHub | Document extraction, summarization |
| Alpha UI | Orchestration, case creation and review experience |

Key Outcomes Demonstrated

:check_box_with_check: Large document support — Successfully processed medical files with 30+, 60+, and 238+ pages

:check_box_with_check: Controlled AI orchestration — Async generator pattern ensured optimized and stable API execution

:check_box_with_check: Resilient extraction workflow — Automatic retries with resumable failed processing

:check_box_with_check: Reusable architecture — Built configurable Lit components for future integrations

:check_box_with_check: Structured review experience — Users can validate extracted data against original document pages

:check_box_with_check: AI-assisted summarization — Generated consolidated medical summaries from extracted page-level outputs

:check_box_with_check: End-to-end workflow automation — From raw document upload to case creation and review

:check_box_with_check: Flexible rendering support — Extraction output dynamically adapts based on AIHub model structure


This POC demonstrates how AIHub and Alpha UI can be composed into a scalable intelligent medical document processing workflow capable of handling large unstructured files while maintaining resiliency, visibility, and usability.

Happy to walk through the architecture, component design, or discuss possible enhancements and optimizations!
