Pipeline Usage Guide
The pipeline is stage-based and passes one Page object through the stages:
detector -> layout -> recognizer -> corrector
By default, Pipeline() creates:
- detector: YOLO(weights="yolo26x_obb_text_g1")
- layout: SimpleSorting()
- recognizer: TRBA(weights="trba_lite_g2")
- corrector: None
Stage Contracts
Detector
Detector must implement:
def predict(self, image) -> Page:
    ...
Recognizer
Recognizer must implement:
def predict(self, page: Page, image: Optional[np.ndarray] = None) -> Page:
    ...
Layout
Layout model must implement:
def predict(self, page: Page, image: Optional[np.ndarray] = None) -> Page:
    ...
Corrector
Corrector must implement:
def predict(self, page: Page, image: Optional[np.ndarray] = None) -> Page:
    ...
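Layout, recognizer, and corrector share one signature, so a custom stage only needs a matching predict method. The sketch below is illustrative rather than the library's own code: Page, Block, Line, and TextSpan are minimal stand-ins defined here so the example runs on its own, and the text field and the WhitespaceCorrector class are assumptions. Only the blocks -> lines -> text_spans nesting comes from this guide.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Protocol


# Hypothetical stand-in data model; the `text` field is an assumption.
@dataclass
class TextSpan:
    text: str = ""


@dataclass
class Line:
    text_spans: List[TextSpan] = field(default_factory=list)


@dataclass
class Block:
    lines: List[Line] = field(default_factory=list)


@dataclass
class Page:
    blocks: List[Block] = field(default_factory=list)


class PageStage(Protocol):
    """Shape shared by the layout, recognizer, and corrector contracts."""

    # `image` is Optional[np.ndarray] in the contracts above; Any keeps
    # this sketch free of third-party imports.
    def predict(self, page: Page, image: Optional[Any] = None) -> Page:
        ...


class WhitespaceCorrector:
    """Toy corrector: collapses runs of whitespace in every text span."""

    def predict(self, page: Page, image: Optional[Any] = None) -> Page:
        for block in page.blocks:
            for line in block.lines:
                for span in line.text_spans:
                    span.text = " ".join(span.text.split())
        return page


page = Page(blocks=[Block(lines=[Line(text_spans=[TextSpan("  hello   world ")])])])
corrected = WhitespaceCorrector().predict(page)
print(corrected.blocks[0].lines[0].text_spans[0].text)  # hello world
```

Because the contract is structural, the class does not need to inherit from anything; it only has to accept a page (and optionally the image) and return a page.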
Basic Usage
from manuscript import Pipeline
pipeline = Pipeline()
result = pipeline.predict("document.jpg")
text = pipeline.get_text(result["page"])
print(text)
Disable Stages
Disable an optional stage by passing None for it:
from manuscript import Pipeline
# Detection + layout only
pipeline = Pipeline(recognizer=None, corrector=None)
# Detection + recognition only
pipeline = Pipeline(layout=None, corrector=None)
# Detection + layout + recognition (no correction)
pipeline = Pipeline(corrector=None)
Layout Placement
Use layout_after to choose where layout runs:
- "detector" (default)
- "recognizer"
- "corrector"
from manuscript import Pipeline
from manuscript.layouts import SimpleSorting
pipeline = Pipeline(
    layout=SimpleSorting(),
    layout_after="recognizer",
)
If the anchor stage is disabled (for example, recognizer=None with
layout_after="recognizer"), layout still executes in that slot.
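The placement rule can be pictured with a small ordering function. This is a sketch of the behavior described above, not the pipeline's actual implementation; stage_order and STAGES are hypothetical names used only for illustration.

```python
from typing import Iterable, List

# Layout is not listed here because its position depends on layout_after.
STAGES = ("detector", "recognizer", "corrector")


def stage_order(layout_after: str = "detector", disabled: Iterable[str] = ()) -> List[str]:
    """Return the stages that would run, in order (illustrative only)."""
    if layout_after not in STAGES:
        raise ValueError(f"layout_after must be one of {STAGES}")
    disabled = set(disabled)
    order: List[str] = []
    for stage in STAGES:
        if stage not in disabled:
            order.append(stage)
        # Layout keeps its slot even when the anchor stage itself is disabled.
        if stage == layout_after and "layout" not in disabled:
            order.append("layout")
    return order


print(stage_order())
# ['detector', 'layout', 'recognizer', 'corrector']
print(stage_order("recognizer", disabled={"recognizer"}))
# ['detector', 'layout', 'corrector']
print(stage_order("corrector", disabled={"corrector"}))
# ['detector', 'recognizer', 'layout']
```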
Built-in Components
from manuscript.detectors import EAST
from manuscript.layouts import SimpleSorting
from manuscript.recognizers import TRBA
from manuscript.correctors import CharLM
from manuscript import Pipeline
detector = EAST(weights="east_50_g1", score_thresh=0.8, iou_threshold=0.2)
layout = SimpleSorting(max_splits=10, use_columns=True)
recognizer = TRBA(weights="trba_lite_g1", device="cuda", min_text_size=5)
corrector = CharLM()
pipeline = Pipeline(
    detector=detector,
    layout=layout,
    recognizer=recognizer,
    corrector=corrector,
    layout_after="detector",
)
TRBA Region Preparation
TRBA supports configurable crop preparation before recognition.
Default settings:
- region_preparer="bbox" extracts axis-aligned bounding boxes
- rotate_threshold=1.5 auto-rotates tall crops before recognition
- min_text_size=5 skips tiny detections
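To make the two numeric defaults concrete, here is a minimal sketch of how such filters could behave. It rests on assumptions this guide does not confirm: that min_text_size is checked against the shorter crop side in pixels, that "tall" means height / width exceeds rotate_threshold, and that rotation is 90 degrees counterclockwise. The function name filter_and_rotate is hypothetical.

```python
from typing import Optional

import numpy as np


def filter_and_rotate(
    crop: np.ndarray,
    min_text_size: int = 5,
    rotate_threshold: float = 1.5,
) -> Optional[np.ndarray]:
    """Return a crop ready for recognition, or None if it is too small.

    Assumed semantics (not confirmed by the library): the shorter side is
    compared with min_text_size, and a crop is "tall" when
    height / width > rotate_threshold.
    """
    height, width = crop.shape[:2]
    # max(..., 1) also rejects empty crops and avoids dividing by zero below.
    if min(height, width) < max(min_text_size, 1):
        return None  # skip tiny detections
    if height / width > rotate_threshold:
        return np.rot90(crop)  # tall crop becomes wide
    return crop


tall = np.zeros((60, 20, 3), dtype=np.uint8)
wide = np.zeros((20, 60, 3), dtype=np.uint8)
tiny = np.zeros((3, 40, 3), dtype=np.uint8)

print(filter_and_rotate(tall).shape)  # (20, 60, 3)
print(filter_and_rotate(wide).shape)  # (20, 60, 3)
print(filter_and_rotate(tiny))        # None
```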
Built-in preparer presets:
- "bbox": axis-aligned crop
- "polygon_mask": tight crop with pixels outside the polygon masked to white
- "quad_warp": perspective rectification for 4-point polygons, with bbox fallback
from manuscript.recognizers import TRBA
recognizer = TRBA(region_preparer="bbox")
recognizer = TRBA(region_preparer="polygon_mask")
recognizer = TRBA(region_preparer="quad_warp")
recognizer = TRBA(
    region_preparer="bbox",
    region_preparer_options={"pad": 2},
)
region_preparer_options is reserved for built-in preset configuration:
- "bbox" / "polygon_mask": pad
- "polygon_mask": background
- "quad_warp": output_size=(width, height), fallback_to_bbox
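As a rough picture of what an axis-aligned crop with pad involves, the sketch below grows the polygon's bounding box by pad pixels and clamps it to the image. It assumes polygons are (x, y) pixel coordinates and that pad is measured in pixels; crop_bbox is a hypothetical helper, not the preset's actual code.

```python
import numpy as np


def crop_bbox(image: np.ndarray, polygon, pad: int = 0) -> np.ndarray:
    """Axis-aligned crop around `polygon`, grown by `pad` pixels and clamped."""
    points = np.asarray(polygon, dtype=np.float32).reshape(-1, 2)  # (x, y) pairs
    height, width = image.shape[:2]
    x0 = max(int(np.floor(points[:, 0].min())) - pad, 0)
    y0 = max(int(np.floor(points[:, 1].min())) - pad, 0)
    x1 = min(int(np.ceil(points[:, 0].max())) + pad, width)
    y1 = min(int(np.ceil(points[:, 1].max())) + pad, height)
    return image[y0:y1, x0:x1]


image = np.zeros((100, 200, 3), dtype=np.uint8)
polygon = [(10, 20), (80, 22), (79, 40), (11, 38)]

print(crop_bbox(image, polygon).shape)          # (20, 70, 3)
print(crop_bbox(image, polygon, pad=2).shape)   # (24, 74, 3)
print(crop_bbox(image, polygon, pad=50).shape)  # (90, 130, 3), clamped at the top-left
```

A small pad is a common way to avoid clipping ascenders and descenders when detections are tight.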
For advanced cases, you can inject hooks into TRBA instead of writing a
full custom recognizer:
import numpy as np
def my_preparer(page, image, recognizer=None, options=None):
    regions = []
    for block in page.blocks:
        for line in block.lines:
            for text_span in line.text_spans:
                poly = np.asarray(text_span.polygon, dtype=np.float32)
                # Placeholder crop: replace with cropping logic based on poly.
                crop = image[10:40, 10:80]
                regions.append(
                    {"text_span": text_span, "image": crop, "polygon": poly}
                )
    return regions
recognizer = TRBA(region_preparer=my_preparer)
If you need complete control over recognition logic, the simplest route is
still to provide your own recognizer class with predict(page, image) -> Page.
Collapsing Text Spans
Some recognizers work on whole lines or blocks rather than on individual
text-span crops. Use collapse_page_text_spans to merge the text spans of each
line or block into a single span before recognition.
from manuscript.utils import collapse_page_text_spans
line_level_page = collapse_page_text_spans(
    page,
    level="line",
    method="bbox",
)
block_level_page = collapse_page_text_spans(
    page,
    level="block",
    method="convex_hull",
)
"line" keeps the same blocks and lines, but replaces the text spans of each
line with one merged TextSpan. "block" replaces the contents of each block
with a single line containing one merged TextSpan.
Lower-level helpers are also available:
- merge_text_spans(text_spans, method="bbox")
- collapse_line_text_spans(line, method="bbox")
- collapse_block_text_spans(block, method="bbox")
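The difference between the "bbox" and "convex_hull" methods is easiest to see on the polygon geometry. The sketch below is a plain-Python illustration, assuming "bbox" means the smallest axis-aligned rectangle around all span polygons and "convex_hull" means the convex hull of their vertices. merge_bbox and merge_convex_hull are hypothetical names, and the library's helpers may differ in vertex order or output type.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def merge_bbox(polygons: Sequence[Sequence[Point]]) -> List[Point]:
    """Smallest axis-aligned rectangle covering every polygon."""
    xs = [x for poly in polygons for x, _ in poly]
    ys = [y for poly in polygons for _, y in poly]
    return [(min(xs), min(ys)), (max(xs), min(ys)), (max(xs), max(ys)), (min(xs), max(ys))]


def merge_convex_hull(polygons: Sequence[Sequence[Point]]) -> List[Point]:
    """Convex hull of all polygon vertices (Andrew's monotone chain)."""
    points = sorted({(x, y) for poly in polygons for x, y in poly})
    if len(points) <= 2:
        return points

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(seq) -> List[Point]:
        hull: List[Point] = []
        for p in seq:
            # Drop points that would make a non-left turn.
            while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
                hull.pop()
            hull.append(p)
        return hull

    lower = half_hull(points)
    upper = half_hull(reversed(points))
    # The last point of each half is the first point of the other.
    return lower[:-1] + upper[:-1]


span_a = [(0, 0), (10, 0), (10, 5), (0, 5)]
span_b = [(12, 2), (20, 2), (20, 8), (12, 8)]

print(merge_bbox([span_a, span_b]))
# [(0, 0), (20, 0), (20, 8), (0, 8)]
print(merge_convex_hull([span_a, span_b]))
# [(0, 0), (10, 0), (20, 2), (20, 8), (12, 8), (0, 5)]
```

The rectangle is simpler but includes empty corners when spans are vertically offset; the hull follows the spans more closely.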
Visualization and Profiling
result, vis_img = pipeline.predict("document.jpg", vis=True)
vis_img.save("output_visualization.jpg")
result = pipeline.predict("document.jpg", profile=True)
Intermediate Results
After each run, the pipeline keeps snapshots:
- pipeline.last_detection_page
- pipeline.last_layout_page
- pipeline.last_recognition_page
- pipeline.last_correction_page
For a skipped stage, the corresponding last_* value stays None.
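One way to work with these snapshots is to gather the ones that are set. The helper below is a hypothetical convenience, not part of the library. It relies only on the four attribute names listed above, and it uses a SimpleNamespace as a stand-in for a real pipeline so the example runs without any models.

```python
from types import SimpleNamespace
from typing import Any, Dict

SNAPSHOT_ATTRS = {
    "detector": "last_detection_page",
    "layout": "last_layout_page",
    "recognizer": "last_recognition_page",
    "corrector": "last_correction_page",
}


def collect_snapshots(pipeline: Any) -> Dict[str, Any]:
    """Map each stage that ran to its snapshot, omitting skipped stages."""
    snapshots: Dict[str, Any] = {}
    for stage, attr in SNAPSHOT_ATTRS.items():
        page = getattr(pipeline, attr, None)
        if page is not None:
            snapshots[stage] = page
    return snapshots


# Stand-in for a pipeline that ran with corrector=None.
fake_pipeline = SimpleNamespace(
    last_detection_page="detected page",
    last_layout_page="sorted page",
    last_recognition_page="recognized page",
    last_correction_page=None,
)
print(list(collect_snapshots(fake_pipeline)))
# ['detector', 'layout', 'recognizer']
```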