Pipeline

The main high-level interface for OCR processing.

class manuscript._pipeline.Pipeline(detector=None, recognizer=None, corrector=None, min_text_size=5, rotate_threshold=1.5)

Bases: object

High-level OCR pipeline combining text detection, recognition, and correction.

The Pipeline class orchestrates an EAST detector, a TRBA recognizer, and an optional text corrector to perform the complete OCR workflow: detection → crop extraction → recognition → correction → result merging.

detector

Text detector instance

Type: EAST

recognizer

Text recognizer instance

Type: TRBA

corrector

Text corrector instance (None to skip correction)

Type: BaseCorrector, optional

min_text_size

Minimum text box size in pixels (width and height)

Type: int

rotate_threshold

Aspect ratio threshold for automatic rotation of vertical text crops. If height > width * rotate_threshold, the crop is rotated 90° clockwise.

Type: float
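The rotation rule described for rotate_threshold can be sketched as a small predicate. This is an illustrative sketch, not the library's implementation: the helper name should_rotate is hypothetical, and treating None or 0 as "disabled" follows the constructor documentation.

```python
def should_rotate(width, height, rotate_threshold=1.5):
    """Return True when a crop counts as vertical text under the documented rule.

    Hypothetical helper for illustration; not part of the manuscript API.
    """
    # None or 0 disables automatic rotation.
    if not rotate_threshold:
        return False
    # Documented rule: rotate when height > width * rotate_threshold.
    return height > width * rotate_threshold


print(should_rotate(width=10, height=40))                      # tall crop
print(should_rotate(width=40, height=10))                      # wide crop
print(should_rotate(width=10, height=40, rotate_threshold=0))  # rotation disabled
```

For a crop stored as a NumPy array, a 90° clockwise rotation can be written as np.rot90(crop, k=-1); whether the pipeline uses that call internally is not stated here.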

Examples

Create pipeline with default models:

>>> from manuscript import Pipeline
>>> pipeline = Pipeline()
>>> result = pipeline.predict("document.jpg")
>>> text = pipeline.get_text(result["page"])
>>> print(text)

Create pipeline with custom models:

>>> from manuscript import Pipeline
>>> from manuscript.detectors import EAST
>>> from manuscript.recognizers import TRBA
>>> detector = EAST(weights="east_50_g1", score_thresh=0.8)
>>> recognizer = TRBA(weights="trba_lite_g1", device="cuda")
>>> pipeline = Pipeline(detector=detector, recognizer=recognizer)

Create pipeline with text correction:

>>> from manuscript import Pipeline
>>> from manuscript.correctors import CharLM
>>> corrector = CharLM()
>>> pipeline = Pipeline(corrector=corrector)

Disable automatic rotation of vertical text:

>>> pipeline = Pipeline(rotate_threshold=0)

Attributes:

last_correction_page
last_detection_page
last_recognition_page

Parameters:

detector (EAST | None)
recognizer (TRBA | None)
corrector (BaseCorrector | None)
min_text_size (int)
rotate_threshold (float)

Methods

get_text(page) – Extract plain text from a Page object.
predict(image[, recognize_text, vis, profile]) – Run OCR pipeline on a single image.

__init__(detector=None, recognizer=None, corrector=None, min_text_size=5, rotate_threshold=1.5)

Initialize OCR pipeline.

Parameters:

detector (EAST, optional) – Text detector instance. If None, creates default EAST detector.
recognizer (TRBA, optional) – Text recognizer instance. If None, creates default TRBA recognizer.
corrector (BaseCorrector, optional) – Text corrector instance. If None, no text correction is applied. The corrector receives a Page object after recognition and returns a corrected Page object.
min_text_size (int, optional) – Minimum text size in pixels. Boxes smaller than this will be filtered out before recognition. Default is 5.
rotate_threshold (float, optional) – Aspect ratio threshold for automatic rotation of vertical text. If height > width * rotate_threshold, the crop is rotated 90 degrees clockwise to convert vertical text to horizontal. Set to None or 0 to disable automatic rotation. Default is 1.5.
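To illustrate the min_text_size filter, the sketch below drops detected boxes whose bounding width or height falls below the threshold. It rests on assumptions: the helper names are hypothetical, boxes are represented as lists of (x, y) points, and a box is treated as "smaller" when either dimension is below min_text_size. The library may measure size differently.

```python
def polygon_size(polygon):
    """Return (width, height) of the axis-aligned bounds of a polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return max(xs) - min(xs), max(ys) - min(ys)


def filter_small_boxes(polygons, min_text_size=5):
    """Keep polygons whose width and height are both >= min_text_size.

    Hypothetical helper; the "either dimension too small" rule is an assumption.
    """
    kept = []
    for polygon in polygons:
        width, height = polygon_size(polygon)
        if width >= min_text_size and height >= min_text_size:
            kept.append(polygon)
    return kept


big = [(0, 0), (50, 0), (50, 20), (0, 20)]
tiny = [(0, 0), (3, 0), (3, 3), (0, 3)]
thin = [(0, 0), (50, 0), (50, 2), (0, 2)]
print(len(filter_small_boxes([big, tiny, thin])))
```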

predict(image, recognize_text=True, vis=False, profile=False)

Run OCR pipeline on a single image.

Parameters:

image (str, Path, numpy.ndarray, or PIL.Image) – Input image. Can be:
  - Path to image file (str or Path)
  - RGB numpy array with shape (H, W, 3) in uint8
  - PIL Image object
recognize_text (bool, optional) – If True, performs both detection and recognition. If False, performs only detection. Default is True.
vis (bool, optional) – If True, returns visualization image along with results. Default is False.
profile (bool, optional) – If True, prints timing information for each pipeline stage. Default is False.

Returns:

If vis=False: dict with keys:
  - "page": Page object with detection/recognition results
If vis=True: tuple of (result_dict, vis_image)

Return type:

dict or tuple
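Because the return shape depends on vis, calling code that toggles the flag may want a single unpacking step. The following is a minimal sketch under that assumption; unpack_prediction is a hypothetical helper, and the stand-in values below replace a real predict() call.

```python
def unpack_prediction(output):
    """Normalize predict() output to a (page, vis_image) pair.

    Hypothetical helper: vis_image is None when predict() was called with vis=False.
    """
    if isinstance(output, tuple):
        # vis=True: (result_dict, vis_image)
        result, vis_image = output
        return result["page"], vis_image
    # vis=False: result_dict only
    return output["page"], None


# Stand-in values in place of real pipeline output.
page, vis_image = unpack_prediction({"page": "page-object"})
print(page, vis_image)
page, vis_image = unpack_prediction(({"page": "page-object"}, "vis-image"))
print(page, vis_image)
```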

Examples

Basic usage:

>>> pipeline = Pipeline()
>>> result = pipeline.predict("document.jpg")
>>> page = result["page"]
>>> print(page.blocks[0].lines[0].words[0].text)

Detection only:

>>> result = pipeline.predict("document.jpg", recognize_text=False)
>>> # Words will have polygon and detection_confidence but no text

With visualization:

>>> result, vis_img = pipeline.predict("document.jpg", vis=True)
>>> vis_img.show()

With profiling:

>>> result = pipeline.predict("document.jpg", profile=True)
>>> # Prints timing for each stage
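The exact output of profile=True is not documented here. As a generic illustration of per-stage timing, the sketch below records wall-clock durations with a context manager; the timed helper, the stage names, and the printed format are all assumptions, not the library's behavior.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(stage, timings):
    """Record the wall-clock duration of a stage, in seconds, into `timings`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start


timings = {}
with timed("detection", timings):
    sum(range(1000))  # placeholder for the detection stage
with timed("recognition", timings):
    sum(range(1000))  # placeholder for the recognition stage

for stage, seconds in timings.items():
    print(f"{stage}: {seconds * 1000:.2f} ms")
```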

get_text(page)

Extract plain text from a Page object.

Parameters: page (Page) – Page object with recognition results.
Returns: Extracted text with lines separated by newlines.
Return type: str
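The traversal behind get_text can be pictured with the blocks → lines → words hierarchy used in the predict examples. The sketch below uses stand-in dataclasses rather than the library's Page types, and it assumes that words are joined by single spaces and that words without text (for example, after detection-only runs) are skipped.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Stand-in types for illustration; the real Page classes may differ.
@dataclass
class Word:
    text: Optional[str] = None


@dataclass
class Line:
    words: List[Word] = field(default_factory=list)


@dataclass
class Block:
    lines: List[Line] = field(default_factory=list)


@dataclass
class Page:
    blocks: List[Block] = field(default_factory=list)


def page_to_text(page):
    """Join words with spaces and lines with newlines (assumed layout rules)."""
    lines = []
    for block in page.blocks:
        for line in block.lines:
            words = [word.text for word in line.words if word.text]
            if words:
                lines.append(" ".join(words))
    return "\n".join(lines)


page = Page(blocks=[Block(lines=[
    Line(words=[Word("hello"), Word("world")]),
    Line(words=[Word("second"), Word(None), Word("line")]),
])])
print(page_to_text(page))
```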

Examples

>>> pipeline = Pipeline()
>>> result = pipeline.predict("document.jpg")
>>> text = pipeline.get_text(result["page"])
>>> print(text)

property last_detection_page: Page | None

property last_recognition_page: Page | None

property last_correction_page: Page | None