Pipeline

The main high-level interface for OCR processing.

class manuscript._pipeline.Pipeline(detector=None, recognizer=None, corrector=None, min_text_size=5, rotate_threshold=1.5)

Bases: object

High-level OCR pipeline combining text detection, recognition, and correction.

The Pipeline class orchestrates an EAST detector, a TRBA recognizer, and an optional text corrector to perform the complete OCR workflow: detection → crop extraction → recognition → correction → result merging.
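
The stage order described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `run_stages`, the `(x0, y0, x1, y1)` box format, and the stand-in callables are hypothetical and are not part of the manuscript API. The real corrector works on a Page object, whereas this sketch simplifies it to a list of strings.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical axis-aligned box format: (x0, y0, x1, y1)
Box = Tuple[int, int, int, int]
Image = List[List[int]]  # stand-in for a real image: rows of pixel values


def run_stages(
    image: Image,
    detect: Callable[[Image], List[Box]],
    recognize: Callable[[Image], str],
    correct: Optional[Callable[[List[str]], List[str]]] = None,
) -> List[dict]:
    """Run stand-in stages in the documented order."""
    # 1. Detection: find text boxes in the image.
    boxes = detect(image)
    # 2. Crop extraction: cut each box out of the image.
    crops = [[row[x0:x1] for row in image[y0:y1]] for (x0, y0, x1, y1) in boxes]
    # 3. Recognition: read the text in each crop.
    texts = [recognize(crop) for crop in crops]
    # 4. Correction: optional, skipped when no corrector is given.
    if correct is not None:
        texts = correct(texts)
    # 5. Result merging: pair each box with its final text.
    return [{"box": box, "text": text} for box, text in zip(boxes, texts)]
```

Passing `correct=None` mirrors the documented behaviour of a pipeline built without a corrector: the recognized text is merged unchanged.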

detector

Text detector instance

Type:

EAST

recognizer

Text recognizer instance

Type:

TRBA

corrector

Text corrector instance (None to skip correction)

Type:

BaseCorrector, optional

min_text_size

Minimum text box size in pixels (width and height)

Type:

int

rotate_threshold

Aspect ratio threshold for automatic rotation of vertical text crops. If height > width * rotate_threshold, the crop is rotated 90° clockwise.

Type:

float
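
The effect of `min_text_size` listed among the attributes above can be sketched as a simple size filter. This is an illustration only: the helper name `filter_small_boxes` and the `(x0, y0, x1, y1)` box format are hypothetical, and the rule that both width and height must reach the minimum is an assumption based on the "(width and height)" wording, not confirmed library behaviour.

```python
from typing import List, Tuple

# Hypothetical axis-aligned box format: (x0, y0, x1, y1) in pixels
Box = Tuple[float, float, float, float]


def filter_small_boxes(boxes: List[Box], min_text_size: int = 5) -> List[Box]:
    """Keep only boxes whose width and height are at least min_text_size."""
    kept = []
    for x0, y0, x1, y1 in boxes:
        width, height = x1 - x0, y1 - y0
        # Assumption: a box is dropped if either dimension is too small.
        if width >= min_text_size and height >= min_text_size:
            kept.append((x0, y0, x1, y1))
    return kept
```

With the default of 5, a 3×20 pixel box would be dropped under this assumption even though it is tall, because its width is below the minimum.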

Examples

Create pipeline with default models:

>>> from manuscript import Pipeline
>>> pipeline = Pipeline()
>>> result = pipeline.predict("document.jpg")
>>> text = pipeline.get_text(result["page"])
>>> print(text)

Create pipeline with custom models:

>>> from manuscript import Pipeline
>>> from manuscript.detectors import EAST
>>> from manuscript.recognizers import TRBA
>>> detector = EAST(weights="east_50_g1", score_thresh=0.8)
>>> recognizer = TRBA(weights="trba_lite_g1", device="cuda")
>>> pipeline = Pipeline(detector=detector, recognizer=recognizer)

Create pipeline with text correction:

>>> from manuscript import Pipeline
>>> from manuscript.correctors import CharLM
>>> corrector = CharLM()
>>> pipeline = Pipeline(corrector=corrector)

Disable automatic rotation of vertical text:

>>> pipeline = Pipeline(rotate_threshold=0)
Attributes:
last_correction_page
last_detection_page
last_recognition_page
Parameters:
  • detector (EAST | None)

  • recognizer (TRBA | None)

  • corrector (BaseCorrector | None)

  • min_text_size (int)

  • rotate_threshold (float)

Methods

get_text(page)

Extract plain text from Page object.

predict(image[, recognize_text, vis, profile])

Run OCR pipeline on a single image.

__init__(detector=None, recognizer=None, corrector=None, min_text_size=5, rotate_threshold=1.5)

Initialize OCR pipeline.

Parameters:
  • detector (EAST, optional) – Text detector instance. If None, creates default EAST detector.

  • recognizer (TRBA, optional) – Text recognizer instance. If None, creates default TRBA recognizer.

  • corrector (BaseCorrector, optional) – Text corrector instance. If None, no text correction is applied. The corrector receives a Page object after recognition and returns a corrected Page object.

  • min_text_size (int, optional) – Minimum text size in pixels. Boxes smaller than this will be filtered out before recognition. Default is 5.

  • rotate_threshold (float, optional) – Aspect ratio threshold for automatic rotation of vertical text. If height > width * rotate_threshold, the crop is rotated 90 degrees clockwise to convert vertical text to horizontal. Set to None or 0 to disable automatic rotation. Default is 1.5.
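
The rotation rule for `rotate_threshold` can be sketched in pure Python. The helper names below (`should_rotate`, `rotate_cw`, `normalize_crop`) are hypothetical and the crop is modelled as a nested list of pixel rows; the library itself presumably works on image arrays, so treat this as an illustration of the rule and not of the actual implementation.

```python
from typing import List, Optional

Crop = List[List[int]]  # stand-in for an image crop: rows of pixel values


def should_rotate(width: int, height: int,
                  rotate_threshold: Optional[float] = 1.5) -> bool:
    """Apply the documented rule: rotate if height > width * threshold."""
    # None or 0 disables automatic rotation, as stated in the parameter docs.
    if not rotate_threshold:
        return False
    return height > width * rotate_threshold


def rotate_cw(crop: Crop) -> Crop:
    """Rotate a crop 90 degrees clockwise (reverse rows, then transpose)."""
    return [list(row) for row in zip(*crop[::-1])]


def normalize_crop(crop: Crop, rotate_threshold: Optional[float] = 1.5) -> Crop:
    """Return the crop rotated to horizontal if it looks like vertical text."""
    height = len(crop)
    width = len(crop[0]) if crop else 0
    if should_rotate(width, height, rotate_threshold):
        return rotate_cw(crop)
    return crop
```

With the default threshold of 1.5, a crop 20 pixels wide is rotated only when it is more than 30 pixels tall; a crop that is exactly 30 pixels tall is left as-is because the comparison is strict.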

predict(image, recognize_text=True, vis=False, profile=False)

Run OCR pipeline on a single image.

Parameters:
  • image (str, Path, numpy.ndarray, or PIL.Image) – Input image. Can be:
      - Path to image file (str or Path)
      - RGB numpy array with shape (H, W, 3) and dtype uint8
      - PIL Image object

  • recognize_text (bool, optional) – If True, performs both detection and recognition. If False, performs only detection. Default is True.

  • vis (bool, optional) – If True, returns visualization image along with results. Default is False.

  • profile (bool, optional) – If True, prints timing information for each pipeline stage. Default is False.

Returns:

If vis=False:

dict with key "page": Page object with detection/recognition results

If vis=True:

tuple of (result_dict, vis_image)

Return type:

dict or tuple
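
Because the return shape depends on `vis`, calling code that toggles the flag may find it convenient to normalize both shapes. The helper below is a hypothetical convenience, not part of the library; it relies only on the documented facts that the result is a dict with a "page" key and that `vis=True` yields a `(result_dict, vis_image)` tuple.

```python
def unpack_prediction(output):
    """Return (page, vis_image) for either documented return shape.

    vis_image is None when predict() was called with vis=False.
    """
    if isinstance(output, tuple):
        # vis=True: (result_dict, vis_image)
        result, vis_image = output
    else:
        # vis=False: result_dict only
        result, vis_image = output, None
    return result["page"], vis_image
```

A call such as `page, vis_img = unpack_prediction(pipeline.predict("document.jpg", vis=show))` would then work whether `show` is True or False.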

Examples

Basic usage:

>>> pipeline = Pipeline()
>>> result = pipeline.predict("document.jpg")
>>> page = result["page"]
>>> print(page.blocks[0].lines[0].words[0].text)

Detection only:

>>> result = pipeline.predict("document.jpg", recognize_text=False)
>>> # Words will have polygon and detection_confidence but no text

With visualization:

>>> result, vis_img = pipeline.predict("document.jpg", vis=True)
>>> vis_img.show()

With profiling:

>>> result = pipeline.predict("document.jpg", profile=True)
>>> # Prints timing for each stage
get_text(page)

Extract plain text from Page object.

Parameters:

page (Page) – Page object with recognition results.

Returns:

Extracted text with lines separated by newlines.

Return type:

str
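
To make the output format concrete, here is a minimal sketch of how text could be assembled from the `page.blocks[...].lines[...].words[...].text` hierarchy used in the predict examples. The dataclasses are simplified stand-ins for the library's Page types, `page_to_text` is a hypothetical name, and joining words with a single space is an assumption; only the newline between lines comes from the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Simplified stand-ins for the library's Page hierarchy (assumed fields only).
@dataclass
class Word:
    text: Optional[str] = None  # None for detection-only results


@dataclass
class Line:
    words: List[Word] = field(default_factory=list)


@dataclass
class Block:
    lines: List[Line] = field(default_factory=list)


@dataclass
class Page:
    blocks: List[Block] = field(default_factory=list)


def page_to_text(page: Page) -> str:
    """Join words with spaces (assumption) and lines with newlines."""
    out = []
    for block in page.blocks:
        for line in block.lines:
            # Skip words without recognized text.
            words = [w.text for w in line.words if w.text]
            if words:
                out.append(" ".join(words))
    return "\n".join(out)
```

Words without text, as produced by a detection-only run, are skipped in this sketch, so such a page yields an empty string.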

Examples

>>> pipeline = Pipeline()
>>> result = pipeline.predict("document.jpg")
>>> text = pipeline.get_text(result["page"])
>>> print(text)
property last_detection_page: Page | None
property last_recognition_page: Page | None
property last_correction_page: Page | None