Getting Started
===============

Installation
------------

Minimum System Requirements
^^^^^^^^^^^^^^^^^^^^^^^^^^^

- Linux (Ubuntu 20.04+), Windows 10+, or macOS 11+
- at least 2 CPU cores
- at least 8 GB RAM
- 16 GB RAM or more is recommended for model training
- NVIDIA GPU with CUDA support is recommended for acceleration
- at least 4 GB VRAM is recommended for GPU execution

CPU-only execution is supported, but performance may be significantly lower than on GPU.
Model training currently targets CUDA-capable NVIDIA GPUs and falls back to CPU when CUDA is unavailable.

**Basic installation** (inference only):

.. code-block:: bash

   pip install manuscript-ocr

**Installation with training support** (includes PyTorch):

.. code-block:: bash

   pip install manuscript-ocr[dev]

This installs additional dependencies for model training:

- PyTorch and TorchVision
- ONNX export tools
- Training utilities (albumentations, tensorboard, etc.)
- Development tools (pytest, black, flake8, etc.)

**GPU acceleration** (NVIDIA CUDA):

If you are switching an existing installation from CPU to GPU:

1. Remove the CPU version of ONNX Runtime and install the GPU version:

   .. code-block:: bash

      pip uninstall onnxruntime
      pip install onnxruntime-gpu

2. If you are working in Jupyter Notebook, JupyterLab, VS Code notebooks, or Google Colab,
   restart the kernel or runtime after installation.

Reinstalling ``manuscript-ocr`` is not required.

You can switch models and pipeline components explicitly with the ``device`` parameter,
for example ``device="cuda"`` for NVIDIA GPU or ``device="cpu"`` for CPU:

.. code-block:: python

   from manuscript.detectors import EAST
   from manuscript.recognizers import TRBA
   from manuscript.correctors import CharLM

   detector = EAST(device="cuda")
   recognizer = TRBA(device="cuda")
   corrector = CharLM(device="cuda")

Diagnostics
^^^^^^^^^^^

If the pipeline still does not switch to GPU, first run:

.. code-block:: python

   import onnxruntime as ort
   print(ort.get_available_providers())

Case 1. ``"CUDAExecutionProvider"`` is missing

Install additional CUDA/cuDNN runtime packages:

.. code-block:: bash

   pip install nvidia-cudnn-cu12 nvidia-cublas-cu12 nvidia-cuda-runtime-cu12 nvidia-cufft-cu12

Then restart the kernel or runtime and create the ``Pipeline`` again.

If ONNX Runtime appears to be installed but still behaves incorrectly in a notebook environment,
perform a clean GPU reinstall:

.. code-block:: bash

   pip uninstall -y onnxruntime
   pip install --no-cache-dir --force-reinstall onnxruntime-gpu==1.24.4
   pip install --no-cache-dir nvidia-cudnn-cu12 nvidia-cublas-cu12 nvidia-cuda-runtime-cu12 nvidia-cufft-cu12

After that, restart the kernel or runtime again and re-import ``manuscript``.

Case 2. ``"CUDAExecutionProvider"`` is present, but the models still fall back to CPU

In some notebook environments, ONNX Runtime may require an explicit preload step
before importing ``manuscript``:

.. code-block:: python

   import onnxruntime as ort
   ort.preload_dlls(directory="")

After that, import ``manuscript`` and create the ``Pipeline`` again.

**Apple Silicon acceleration** (CoreML):

.. code-block:: bash

   pip install manuscript-ocr
   pip install onnxruntime-silicon

Then use ``device="coreml"`` for the relevant models or pipeline components.

Quick Start
-----------

Basic usage example:

.. code-block:: python

   from manuscript import Pipeline

   # Create pipeline
   pipeline = Pipeline()

   # Process image
   result = pipeline.predict("document.jpg")

   # Get recognized text
   text = pipeline.get_text(result["page"])
   print(text)

Example Notebooks
-----------------

Current example notebooks are available in the repository ``notebooks`` folder:

- End-to-end inference
- Pipeline with YOLO detector
- Pipeline with TrOCR recognizer
- Pipeline with Yandex Speller
- Gradio demo launch
- Detector training launch
- Recognition training launch
- Corrector training launch

Main Components
---------------

- :class:`~manuscript.Pipeline` - High-level OCR pipeline
- :class:`~manuscript.detectors.YOLO` - ONNX text detector for YOLO-family models
- :class:`~manuscript.detectors.EAST` - Text detector
- :class:`~manuscript.layouts.SimpleSorting` - Layout ordering model
- :class:`~manuscript.recognizers.TRBA` - Text recognizer
- :class:`~manuscript.correctors.CharLM` - Character-level text corrector
- :class:`~manuscript.data.Page` - Page data structure
- :class:`~manuscript.data.Block` - Block data structure
- :class:`~manuscript.data.Line` - Line data structure
- :class:`~manuscript.data.TextSpan` - Smallest OCR text region

Model Zoo
---------

For the list of built-in presets and release artifacts documented for this documentation version,
see :doc:`model_zoo`.

Related Work
------------

For publications related to the project and its manuscript OCR experiments,
see :doc:`related_work`.