Getting Started

Installation

Minimum System Requirements

Linux (Ubuntu 20.04+), Windows 10+, or macOS 11+
At least 2 CPU cores
At least 8 GB RAM
16 GB RAM or more is recommended for model training
NVIDIA GPU with CUDA support is recommended for acceleration
At least 4 GB VRAM is recommended for GPU execution

CPU-only execution is supported, but performance may be significantly lower than on GPU. Model training currently targets CUDA-capable NVIDIA GPUs and falls back to CPU when CUDA is unavailable.

Basic installation (inference only):

pip install manuscript-ocr

Installation with training support (includes PyTorch):

pip install "manuscript-ocr[dev]"

This installs additional dependencies for model training:

PyTorch and TorchVision
ONNX export tools
Training utilities (albumentations, tensorboard, etc.)
Development tools (pytest, black, flake8, etc.)
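
To confirm that the training extras were actually installed, a quick import check can help. The following is a minimal sketch using only the standard library. The helper name missing_modules is hypothetical, and the import names in TRAINING_MODULES are assumptions based on the package list above, so adjust them to your environment.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    missing = []
    for name in names:
        try:
            found = find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for broken or half-installed packages.
            found = False
        if not found:
            missing.append(name)
    return missing


# Assumed import names for some of the training dependencies listed above.
TRAINING_MODULES = ["torch", "torchvision", "albumentations", "tensorboard"]

if __name__ == "__main__":
    missing = missing_modules(TRAINING_MODULES)
    if missing:
        print("Missing training dependencies:", ", ".join(missing))
    else:
        print("All checked training dependencies are importable.")
```

The check only looks the modules up without importing them, so it stays fast even when PyTorch is installed.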

GPU acceleration (NVIDIA CUDA):

If you are switching an existing installation from CPU to GPU, remove the CPU version of ONNX Runtime and install the GPU version:

pip uninstall onnxruntime
pip install onnxruntime-gpu

If you are working in Jupyter Notebook, JupyterLab, VS Code notebooks, or Google Colab, restart the kernel or runtime after installation.

Reinstalling manuscript-ocr is not required.

You can select the device for each model or pipeline component explicitly with the device parameter, for example device="cuda" for an NVIDIA GPU or device="cpu" for CPU:

from manuscript.detectors import EAST
from manuscript.recognizers import TRBA
from manuscript.correctors import CharLM

detector = EAST(device="cuda")
recognizer = TRBA(device="cuda")
corrector = CharLM(device="cuda")
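
If the same script has to run on machines with and without a GPU, the device string can be chosen at runtime instead of being hard-coded. The sketch below is one possible approach, not part of manuscript-ocr: the helpers available_providers and pick_device are hypothetical, and the mapping assumes the device values "cuda", "coreml", and "cpu" mentioned in this guide correspond to the standard ONNX Runtime provider names.

```python
def available_providers():
    """Return ONNX Runtime's provider list, or [] if it is not installed."""
    try:
        import onnxruntime as ort
    except ImportError:
        return []
    return ort.get_available_providers()


def pick_device(providers):
    """Map ONNX Runtime provider names to a device string (hypothetical helper)."""
    if "CUDAExecutionProvider" in providers:
        return "cuda"
    if "CoreMLExecutionProvider" in providers:
        return "coreml"
    return "cpu"


if __name__ == "__main__":
    device = pick_device(available_providers())
    print("Selected device:", device)
    # The result can then be passed on, for example EAST(device=device).
```

Keeping pick_device free of imports makes the choice easy to check on a machine that has no GPU at all.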

Diagnostics

If the pipeline still does not switch to GPU, first run:

import onnxruntime as ort

print(ort.get_available_providers())

Case 1. "CUDAExecutionProvider" is missing

Install additional CUDA/cuDNN runtime packages:

pip install nvidia-cudnn-cu12 nvidia-cublas-cu12 nvidia-cuda-runtime-cu12 nvidia-cufft-cu12

Then restart the kernel or runtime and create the Pipeline again.

If ONNX Runtime appears to be installed but still behaves incorrectly in a notebook environment, perform a clean GPU reinstall:

pip uninstall -y onnxruntime
pip install --no-cache-dir --force-reinstall onnxruntime-gpu==1.24.4
pip install --no-cache-dir nvidia-cudnn-cu12 nvidia-cublas-cu12 nvidia-cuda-runtime-cu12 nvidia-cufft-cu12

After that, restart the kernel or runtime again and re-import manuscript.

Case 2. "CUDAExecutionProvider" is present, but the models still fall back to CPU

In some notebook environments, ONNX Runtime may require an explicit preload step before importing manuscript:

import onnxruntime as ort
ort.preload_dlls(directory="")

After that, import manuscript and create the Pipeline again.
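
The two cases above can be summarized in a small decision helper. This is only a sketch: the function diagnose is hypothetical, and how to obtain the list of providers a loaded manuscript model actually uses is not covered here. With a plain ONNX Runtime session, that list comes from InferenceSession.get_providers().

```python
def diagnose(available, active):
    """Classify a GPU setup using the two cases from the Diagnostics section.

    available: provider names reported by ort.get_available_providers()
    active:    provider names a loaded session actually uses, for example
               from onnxruntime.InferenceSession.get_providers()
    """
    cuda = "CUDAExecutionProvider"
    if cuda not in available:
        return "case 1: CUDAExecutionProvider is missing"
    if cuda not in active:
        return "case 2: CUDAExecutionProvider is present, but the session fell back to CPU"
    return "ok: CUDA is active"


if __name__ == "__main__":
    # Example values only; replace them with the lists from your environment.
    print(diagnose(["CPUExecutionProvider"], ["CPUExecutionProvider"]))
```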

Apple Silicon acceleration (CoreML):

pip install manuscript-ocr
pip install onnxruntime-silicon

Then use device="coreml" for the relevant models or pipeline components.

Quick Start

Basic usage example:

from manuscript import Pipeline

# Create pipeline
pipeline = Pipeline()

# Process image
result = pipeline.predict("document.jpg")

# Get recognized text
text = pipeline.get_text(result["page"])
print(text)
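
For more than one image, the same two calls can be wrapped in a loop. The helper below is a minimal sketch, not part of the library: ocr_many is a hypothetical name, and it assumes only the predict and get_text calls shown in the example above.

```python
def ocr_many(pipeline, image_paths):
    """Run the pipeline on each image and return {path: recognized text}."""
    texts = {}
    for path in image_paths:
        # Same two steps as the single-image example above.
        result = pipeline.predict(str(path))
        texts[str(path)] = pipeline.get_text(result["page"])
    return texts
```

With the library installed, a call might look like ocr_many(Pipeline(), ["page1.jpg", "page2.jpg"]). Creating the Pipeline once and reusing it for every image avoids reloading the models on each call.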

Example Notebooks

Example notebooks are available in the notebooks folder of the repository.

Main Components

Pipeline - High-level OCR pipeline
YOLO - ONNX text detector for YOLO-family models
EAST - Text detector
SimpleSorting - Layout ordering model
TRBA - Text recognizer
CharLM - Character-level text corrector
Page - Page data structure
Block - Block data structure
Line - Line data structure
TextSpan - Smallest OCR text region

Model Zoo

For the built-in presets and release artifacts covered by this version of the documentation, see Model Zoo.