How to Train & Consume Document Prediction or Classification on Neutrinos AI Hub

Gowtham_Balram · November 27, 2024, 7:20am

This guide provides detailed steps to create, train, and deploy a prediction model for classifying various document types on the Neutrinos AI Hub platform. With its intuitive interface and advanced capabilities, Neutrinos AI Hub simplifies document prediction while offering robust configurations.

Step 1: Logging into the Platform

Log in to Neutrinos AI Hub using the username and password provided by your organization.
Once logged in, navigate to the Prediction Tab on the dashboard.

Step 2: Choosing the Right Prediction Type

The platform provides flexibility for different prediction needs:

Document Tab: Select this tab for training models that classify document types.
Text Prediction Tab: Use this for predictive models focusing on textual data, such as fraud detection or churn analysis.

Step 3: Adding a New Prediction Model

Click the Add option to create a new model.
The platform displays an overview of the training workflow, explaining how the process works.

Preparing Your Training Data

If documents are already categorized offline, upload them in bulk to the platform.
Alternatively, upload uncategorized documents and use the platform to tag and annotate them.

Why Categorize Documents?

Categorizing documents ensures the model understands the distinctions between document types. Proper tagging and categorization improve the accuracy and reliability of predictions.

Step 4: Uploading Files for Training

Minimum Requirement: Upload at least 25 files per document category for medium accuracy.
Best Practice: Upload more than 100 files per category to achieve higher prediction accuracy.

Example:

If you are training the model to classify documents into categories like “Invoices,” “Purchase Orders,” and “Receipts,” upload 25–100+ files for each category.
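
Before uploading, it can help to check locally that each category meets these counts. The sketch below is only an illustration: it assumes the documents are already sorted into one folder per category, and the function name and status labels are made up for this example, not part of the platform.

```python
from pathlib import Path

MIN_FILES = 25    # minimum per category mentioned in this guide
BEST_FILES = 100  # the guide recommends more than this for higher accuracy


def audit_training_folder(root):
    """Return {category: (file_count, status)} for each subfolder of root.

    Assumes a hypothetical layout of one subfolder per document category.
    """
    report = {}
    for category_dir in sorted(Path(root).iterdir()):
        if not category_dir.is_dir():
            continue  # ignore loose files at the top level
        count = sum(1 for f in category_dir.iterdir() if f.is_file())
        if count < MIN_FILES:
            status = "too few"
        elif count <= BEST_FILES:
            status = "minimum met"
        else:
            status = "recommended"
        report[category_dir.name] = (count, status)
    return report
```

Calling `audit_training_folder("training_data")` on a folder containing `Invoices/`, `Purchase Orders/`, and `Receipts/` returns one entry per category, so under-filled categories are easy to spot before training.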

Step 5: Configuring Page Splitting

Some documents may consist of multiple pages that require individual classification. The platform provides the following options:

Yes: The system splits and classifies pages individually.
No: Use this if your documents are single-page or do not require splitting.

Example Use Case:

For multi-page documents like ID cards with separate front and back pages, enable splitting to classify them separately. Otherwise, you can group pages under a single document type.

Step 6: Setting Up the Model Details

Provide the necessary details for the model, including:

Model Name: A descriptive name for the model.
Description: A short summary of the model’s purpose.

Training Rules:

Feedback Loop Configuration:
- Always: Tag all inference requests for review, approval, or re-training.
- Never: Skip tagging (ideal for high-performing models in production).
- Confident: Tag requests only when confidence is below a specific threshold.

Why It Matters:

Feedback loops help improve model accuracy by identifying areas for re-training. This ensures the model evolves with user feedback and achieves higher reliability over time.
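
The three feedback options boil down to a simple decision per inference request. Here is a minimal sketch of that logic, written to illustrate the behavior described above rather than how the platform implements it. The function name, the lowercase mode strings, and the default threshold of 0.8 are all assumptions.

```python
def should_tag(mode, confidence, threshold=0.8):
    """Decide whether an inference request is tagged for review.

    mode: "always", "never", or "confident" (names mirror the options above).
    confidence: model confidence for the prediction, between 0 and 1.
    threshold: hypothetical cut-off; the real value is set in the platform.
    """
    if mode == "always":
        return True
    if mode == "never":
        return False
    if mode == "confident":
        # Only low-confidence predictions are sent for review.
        return confidence < threshold
    raise ValueError(f"unknown feedback mode: {mode!r}")
```

With the "confident" option, a prediction at 0.65 confidence would be tagged while one at 0.95 would pass straight through, which keeps the review queue focused on uncertain cases.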

Retention Settings:
- Define the retention period for data stored on Neutrinos Cloud.
- Note: Tagged data will remain until manually processed, ensuring compliance with review workflows.
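
To make the interaction between retention and tagging concrete, here is a small sketch of the rule as described above. It is an assumption-laden illustration: the function and parameter names are invented, and it assumes that tagged data follows the normal retention period once it has been processed, which this guide does not state.

```python
from datetime import datetime, timedelta


def is_purgeable(stored_at, retention_days, tagged, processed, now):
    """Return True if a stored item is past retention and may be removed.

    Tagged items that are still waiting for review are never purgeable,
    matching the note above. Everything else is an assumption.
    """
    if tagged and not processed:
        return False  # stays until someone handles it in review
    return now - stored_at > timedelta(days=retention_days)
```

For example, with a 30-day retention period, an untagged item stored 57 days ago would be eligible for removal, while a tagged item of the same age that is still awaiting review would be kept.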

Document Merging:
- Combine multi-page classifications into a single file when required. For example, group “EID Front” and “EID Back” into one document.
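
Conceptually, merging maps several page-level labels onto one document name and groups neighbouring pages that share it. The sketch below shows that idea only. The function name and the shape of `merge_groups` are hypothetical, and the platform's actual merging rules may differ.

```python
def merge_pages(page_labels, merge_groups):
    """Group consecutive pages that map to the same merged document name.

    page_labels: per-page labels in page order, e.g. ["EID Front", "EID Back"].
    merge_groups: hypothetical mapping of page label -> merged document name.
    Returns a list of (document_name, [page_indices]) in page order.
    """
    documents = []
    for index, label in enumerate(page_labels):
        # Labels without a merge rule keep their own name.
        name = merge_groups.get(label, label)
        if documents and documents[-1][0] == name:
            documents[-1][1].append(index)
        else:
            documents.append((name, [index]))
    return documents
```

Given pages labelled "EID Front", "EID Back", "Invoice" and a rule mapping both EID labels to "EID", the first two pages end up in one "EID" document and the third stays a separate "Invoice". Note that this sketch also groups consecutive pages that already share a label.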

Advanced Configurations:
- Improve document quality using features like:
  - Enhancing image contrast.
  - Rotating or resizing.
  - Removing watermarks.
  - Mirroring or flipping images.

Step 7: Starting the Training Process

Click Start Training to initiate the process.
The platform evaluates multiple models using an 80:20 data split for training and testing.
The platform shortlists the top 7 models suited to the incoming data and batch size, then trains and tests each one, ranking them by weighted average score.
It automatically selects the best-performing model based on metrics such as precision and F1 score.
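
For readers who want a feel for what happens behind Start Training, here is a rough sketch of an 80:20 split followed by picking a winner by weighted average score. It is not the platform's code: the function names, the fixed seed, and the metric weights are assumptions, since the guide does not say how the platform weights its metrics.

```python
import random


def split_80_20(items, seed=0):
    """Shuffle items and return (training, testing) with an 80:20 ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def pick_best(candidates, weights):
    """Return the model name with the highest weighted average score.

    candidates: {model_name: {metric_name: value}}
    weights: {metric_name: weight}; the weights here are hypothetical.
    """
    total = sum(weights.values())

    def score(metrics):
        return sum(metrics[m] * w for m, w in weights.items()) / total

    return max(candidates, key=lambda name: score(candidates[name]))
```

With 100 documents, `split_80_20` yields 80 for training and 20 for testing. `pick_best` then compares candidates on whichever metrics and weights are supplied, for example equal weights on precision and F1.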

Training Duration:

Training time depends on the number of document categories, the volume of data, and the selected quality level. It typically takes 15 minutes to 2 hours.

Step 8: Viewing and Testing the Trained Model

Once training is complete, navigate to the Prediction List and select the trained model.
Review model performance metrics, including:
- Confidence: Indicates how certain the model is about its predictions.
- Precision and F1 Score: Key indicators of model accuracy.
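
If you want to sanity-check these numbers against your own test set, precision and F1 for a single category can be computed from the standard definitions. The sketch below uses those textbook formulas; the function name and sample labels are illustrative and not taken from the platform.

```python
def precision_recall_f1(y_true, y_pred, label):
    """Compute precision, recall, and F1 for one category.

    y_true: actual categories; y_pred: predicted categories (same order).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Precision answers "of everything predicted as this category, how much was right", while F1 balances that against recall, so a model that rarely predicts a category cannot score well just by being cautious.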

Testing Options:

Single Test: Test individual documents for predictions.
Batch Test: Test multiple documents simultaneously to validate bulk predictions.

Step 9: Reviewing Tagged Inference Requests

The Review Hub allows users to:

Review inference requests tagged based on configured feedback rules.
Accept predictions or request re-training to improve accuracy further.

Step 10: Integrating the Model

Go to the Integrations section to find APIs for:
- Single inference requests.
- Batch inference requests.
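
As a rough idea of what calling a single-inference API from code might look like, here is a sketch that only builds the HTTP request without sending it. Everything specific in it is a placeholder: the `/predict` path, the payload field names, and the bearer-token header are assumptions made for illustration. Take the real URL, authentication scheme, and request body from the Integrations section of your own instance.

```python
import base64
import json
import urllib.request


def build_inference_request(base_url, model_id, api_token, file_bytes, filename):
    """Build (but do not send) a POST request for one document.

    The endpoint path, payload fields, and auth header are hypothetical.
    """
    payload = {
        "model_id": model_id,
        "filename": filename,
        # Binary files are base64-encoded so they fit in a JSON body.
        "content_base64": base64.b64encode(file_bytes).decode("ascii"),
    }
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Once the placeholders match what the Integrations section shows, the request could be sent with `urllib.request.urlopen(request)` and the JSON response parsed for the predicted category and confidence.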

Schedule jobs for bulk document predictions. The platform supports integrations with:
- CMIS-compliant DMS.
- Amazon S3.
- SFTP.
- Network Folders.
- And other sources.

Example:

Set up a scheduled job to classify documents stored in an S3 bucket and automatically organize results into folders.
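
The "organize results into folders" part of such a job comes down to deriving a destination path from each prediction. The sketch below shows one way that mapping could look. The function name, the `classified` prefix, and the folder-naming rule are assumptions, and the actual copy or move would be done by the scheduled job or an S3 client, which is not shown.

```python
import posixpath


def destination_key(source_key, predicted_label, output_prefix="classified"):
    """Map a source object key and predicted category to a destination key.

    S3 keys use forward slashes, so posixpath is used on every platform.
    """
    filename = posixpath.basename(source_key)
    # Replace spaces so the category works cleanly as a folder name.
    folder = predicted_label.strip().replace(" ", "_")
    return posixpath.join(output_prefix, folder, filename)
```

For instance, a file at `inbox/2024/scan_001.pdf` predicted as "Purchase Orders" would map to `classified/Purchase_Orders/scan_001.pdf`.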

Key Features of Neutrinos AI Hub

Multi-Model Training: The platform trains multiple models and selects the best performer for deployment.
Customizable Retention Policies: Ensures compliance with data privacy and retention regulations.
Advanced Document Enrichment: Offers features such as watermark removal and contrast enhancement to improve document quality.
Seamless Integrations: Supports real-time and batch predictions via APIs and popular data sources.

Conclusion

Neutrinos AI Hub simplifies the process of training, testing, and deploying document prediction models. By leveraging its feedback loops, advanced configurations, and robust integration capabilities, you can create powerful, scalable solutions tailored to your document classification needs.

Tags: Document Prediction, AI Hub, Machine Learning, Neutrinos, Document Classification
