Pipeline Usage Guide

The pipeline is stage-based and passes one Page object through the stages:

detector -> layout -> recognizer -> corrector

By default, Pipeline() creates:

detector: YOLO(weights="yolo26x_obb_text_g1")
layout: SimpleSorting()
recognizer: TRBA(weights="trba_lite_g2")
corrector: None

Stage Contracts

Detector

A detector must implement:

def predict(self, image) -> Page:
    ...

Recognizer

A recognizer must implement:

def predict(self, page: Page, image: Optional[np.ndarray] = None) -> Page:
    ...

Layout

A layout model must implement:

def predict(self, page: Page, image: Optional[np.ndarray] = None) -> Page:
    ...

Corrector

A corrector must implement:

def predict(self, page: Page, image: Optional[np.ndarray] = None) -> Page:
    ...
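
As a minimal sketch of the corrector contract, the class below accepts a page plus an optional image and returns a page. The dataclasses are stand-ins so the example runs on its own; they are not the library's Page types. The text attribute on a span and the WhitespaceCorrector class are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Stand-in types for illustration only. The real Page classes come from the
# library; the `text` attribute on a span is an assumption.
@dataclass
class TextSpan:
    text: str = ""


@dataclass
class Line:
    text_spans: List[TextSpan] = field(default_factory=list)


@dataclass
class Block:
    lines: List[Line] = field(default_factory=list)


@dataclass
class Page:
    blocks: List[Block] = field(default_factory=list)


class WhitespaceCorrector:
    """Hypothetical corrector that collapses runs of whitespace in each span."""

    def predict(self, page: Page, image: Optional[object] = None) -> Page:
        # `image` is accepted to satisfy the contract but is not used here.
        for block in page.blocks:
            for line in block.lines:
                for span in line.text_spans:
                    span.text = " ".join(span.text.split())
        return page


page = Page(blocks=[Block(lines=[Line(text_spans=[TextSpan("  hello   world ")])])])
corrected = WhitespaceCorrector().predict(page)
print(corrected.blocks[0].lines[0].text_spans[0].text)
```

This sketch edits the page in place and returns the same object. Whether a real stage should copy the page first is not specified in this guide.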

Basic Usage

from manuscript import Pipeline

pipeline = Pipeline()
result = pipeline.predict("document.jpg")
text = pipeline.get_text(result["page"])
print(text)

Disable Stages

Disable an optional stage by passing None for it:

from manuscript import Pipeline

# Detection + layout only
pipeline = Pipeline(recognizer=None, corrector=None)

# Detection + recognition only
pipeline = Pipeline(layout=None, corrector=None)

# Detection + layout + recognition (no correction)
pipeline = Pipeline(corrector=None)

Layout Placement

Use layout_after to choose where layout runs:

"detector" (default)
"recognizer"
"corrector"

from manuscript import Pipeline
from manuscript.layouts import SimpleSorting

pipeline = Pipeline(
    layout=SimpleSorting(),
    layout_after="recognizer",
)

If the anchor stage is disabled (for example, recognizer=None with layout_after="recognizer"), layout still runs in the slot that stage would have occupied.
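
The slot rule can be pictured with a small stand-alone function. This is a sketch of the behavior described above, not the library's implementation; resolve_stage_order and its enabled mapping are hypothetical names.

```python
from typing import Dict, List

# Order of the non-layout stages, taken from the guide.
STAGE_ORDER = ["detector", "recognizer", "corrector"]


def resolve_stage_order(
    enabled: Dict[str, bool],
    layout_after: str = "detector",
) -> List[str]:
    """Return stage names in the order they would run (hypothetical helper)."""
    if layout_after not in STAGE_ORDER:
        raise ValueError(f"unknown layout_after: {layout_after!r}")
    order: List[str] = []
    for stage in STAGE_ORDER:
        if enabled.get(stage, True):
            order.append(stage)
        # Layout takes the slot after its anchor even when the anchor is disabled.
        if stage == layout_after and enabled.get("layout", True):
            order.append("layout")
    return order


print(resolve_stage_order({}, layout_after="detector"))
print(resolve_stage_order({"recognizer": False}, layout_after="recognizer"))
```

In the second call the recognizer is skipped, but layout still runs between the detector and the corrector.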

Built-in Components

from manuscript.detectors import EAST
from manuscript.layouts import SimpleSorting
from manuscript.recognizers import TRBA
from manuscript.correctors import CharLM
from manuscript import Pipeline

detector = EAST(weights="east_50_g1", score_thresh=0.8, iou_threshold=0.2)
layout = SimpleSorting(max_splits=10, use_columns=True)
recognizer = TRBA(weights="trba_lite_g1", device="cuda", min_text_size=5)
corrector = CharLM()

pipeline = Pipeline(
    detector=detector,
    layout=layout,
    recognizer=recognizer,
    corrector=corrector,
    layout_after="detector",
)

TRBA Region Preparation

TRBA supports configurable crop preparation before recognition.

Default settings:

region_preparer="bbox" extracts axis-aligned bounding boxes
rotate_threshold=1.5 auto-rotates tall crops before recognition
min_text_size=5 skips tiny detections
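
One way to picture how the two thresholds could interact is sketched below. The exact rules are assumptions: the guide does not say which side min_text_size is compared against, or how "tall" is measured. plan_crop is a hypothetical helper, not part of TRBA.

```python
def plan_crop(
    width: int,
    height: int,
    rotate_threshold: float = 1.5,
    min_text_size: int = 5,
) -> str:
    """Classify a crop as 'skip', 'rotate', or 'keep' under the assumed rules."""
    # Assumed rule: a detection is tiny when either side is below min_text_size.
    # Empty crops are skipped too, which also avoids dividing by zero below.
    if width <= 0 or height <= 0 or min(width, height) < min_text_size:
        return "skip"
    # Assumed rule: a crop is tall when height / width exceeds rotate_threshold.
    if height / width > rotate_threshold:
        return "rotate"
    return "keep"


for size in [(80, 30), (20, 60), (4, 60)]:
    print(size, plan_crop(*size))
```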

Built-in preparer presets:

"bbox": axis-aligned crop
"polygon_mask": tight crop with pixels outside the polygon masked to white
"quad_warp": perspective rectification for 4-point polygons, with bbox fallback

from manuscript.recognizers import TRBA

recognizer = TRBA(region_preparer="bbox")
recognizer = TRBA(region_preparer="polygon_mask")
recognizer = TRBA(region_preparer="quad_warp")
recognizer = TRBA(
    region_preparer="bbox",
    region_preparer_options={"pad": 2},
)
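
To make the pad option in the last call concrete, here is a rough sketch of padding an axis-aligned box and clamping it to the image. padded_bbox is a hypothetical helper. The library's own rounding and clamping may differ, and the sketch assumes polygons are sequences of (x, y) points.

```python
import math
from typing import Sequence, Tuple


def padded_bbox(
    polygon: Sequence[Tuple[float, float]],
    image_width: int,
    image_height: int,
    pad: int = 0,
) -> Tuple[int, int, int, int]:
    """Return (x0, y0, x1, y1) around the polygon, grown by pad pixels."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    # Round outward so the box fully contains the polygon, then clamp to the image.
    x0 = max(math.floor(min(xs)) - pad, 0)
    y0 = max(math.floor(min(ys)) - pad, 0)
    x1 = min(math.ceil(max(xs)) + pad, image_width)
    y1 = min(math.ceil(max(ys)) + pad, image_height)
    return x0, y0, x1, y1


quad = [(10, 10), (80, 12), (79, 40), (11, 38)]
print(padded_bbox(quad, image_width=100, image_height=50, pad=2))
```

The resulting box can be used as image[y0:y1, x0:x1] on a NumPy image.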

region_preparer_options is reserved for built-in preset configuration:

"bbox" / "polygon_mask": pad
"polygon_mask": background
"quad_warp": output_size=(width, height), fallback_to_bbox

For advanced cases, you can inject hooks into TRBA instead of writing a full custom recognizer:

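import numpy as np
from manuscript.recognizers import TRBA

def my_preparer(page, image, recognizer=None, options=None):
    # Return one crop per text span, cut from the span's bounding box.
    height, width = image.shape[:2]
    regions = []
    for block in page.blocks:
        for line in block.lines:
            for text_span in line.text_spans:
                poly = np.asarray(text_span.polygon, dtype=np.float32)
                # Assumes the polygon is a sequence of (x, y) points.
                x0, y0 = np.floor(poly.min(axis=0)).astype(int)
                x1, y1 = np.ceil(poly.max(axis=0)).astype(int)
                crop = image[max(y0, 0):min(y1, height), max(x0, 0):min(x1, width)]
                regions.append(
                    {"text_span": text_span, "image": crop, "polygon": poly}
                )
    return regions

recognizer = TRBA(region_preparer=my_preparer)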

If you need complete control over recognition logic, the simplest route is still to provide your own recognizer class with predict(page, image) -> Page.

Collapsing Text Spans

Some recognizers work on whole lines or blocks rather than on individual text-span crops. Use collapse_page_text_spans to merge a page's fine-grained text spans into one span per line or per block before recognition.

from manuscript.utils import collapse_page_text_spans

line_level_page = collapse_page_text_spans(
    page,
    level="line",
    method="bbox",
)

block_level_page = collapse_page_text_spans(
    page,
    level="block",
    method="convex_hull",
)

"line" keeps the same blocks and lines, but replaces each line with one merged TextSpan. "block" replaces each block with one line containing one merged TextSpan.

Lower-level helpers are also available:

merge_text_spans(text_spans, method="bbox")
collapse_line_text_spans(line, method="bbox")
collapse_block_text_spans(block, method="bbox")
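
The difference between the "bbox" and "convex_hull" methods can be sketched on raw polygons. This is an illustration of the two geometric ideas, not the library's code; merge_bbox and merge_convex_hull are hypothetical names, and the hull uses Andrew's monotone chain.

```python
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, float]


def merge_bbox(polygons: Sequence[Sequence[Point]]) -> List[Point]:
    """Smallest axis-aligned rectangle covering every polygon."""
    xs = [x for poly in polygons for x, _ in poly]
    ys = [y for poly in polygons for _, y in poly]
    x0, y0, x1, y1 = min(xs), min(ys), max(xs), max(ys)
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]


def merge_convex_hull(polygons: Sequence[Sequence[Point]]) -> List[Point]:
    """Convex hull of all polygon vertices (Andrew's monotone chain)."""
    points = sorted({(x, y) for poly in polygons for x, y in poly})
    if len(points) <= 2:
        return points

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(pts: Iterable[Point]) -> List[Point]:
        hull: List[Point] = []
        for p in pts:
            # Drop points that would make a non-left turn.
            while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
                hull.pop()
            hull.append(p)
        return hull

    lower = half_hull(points)
    upper = half_hull(reversed(points))
    # The last point of each half is the first point of the other half.
    return lower[:-1] + upper[:-1]


span_a = [(0, 0), (10, 0), (10, 5), (0, 5)]
span_b = [(12, 2), (20, 2), (20, 8), (12, 8)]
print(merge_bbox([span_a, span_b]))
print(merge_convex_hull([span_a, span_b]))
```

The bounding box is always a rectangle, while the hull follows the outline of the merged spans more tightly when they are offset from each other.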

Visualization and Profiling

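# With vis=True, predict also returns a visualization image.
result, vis_img = pipeline.predict("document.jpg", vis=True)
vis_img.save("output_visualization.jpg")

# With profile=True, profiling is enabled for the run.
result = pipeline.predict("document.jpg", profile=True)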

Intermediate Results

After each run, the pipeline keeps snapshots:

pipeline.last_detection_page
pipeline.last_layout_page
pipeline.last_recognition_page
pipeline.last_correction_page

For a skipped stage, the corresponding last_* value is None.
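
The snapshots give a simple way to check which stages ran. The helper below is a sketch: stages_that_ran is a hypothetical name, and a SimpleNamespace stands in for a real pipeline so the example runs without the library.

```python
from types import SimpleNamespace
from typing import List

# Snapshot attribute for each stage, as listed in the guide.
SNAPSHOT_ATTRS = {
    "detector": "last_detection_page",
    "layout": "last_layout_page",
    "recognizer": "last_recognition_page",
    "corrector": "last_correction_page",
}


def stages_that_ran(pipeline) -> List[str]:
    """Names of stages whose snapshot is not None after the last run."""
    return [
        stage
        for stage, attr in SNAPSHOT_ATTRS.items()
        if getattr(pipeline, attr, None) is not None
    ]


# Stand-in for a pipeline that ran with corrector=None.
fake_pipeline = SimpleNamespace(
    last_detection_page=object(),
    last_layout_page=object(),
    last_recognition_page=object(),
    last_correction_page=None,
)
print(stages_that_ran(fake_pipeline))
```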