Pipeline
The main high-level interface for OCR processing.
- class manuscript._pipeline.Pipeline(detector=None, recognizer=None, corrector=None, min_text_size=5, rotate_threshold=1.5)
Bases: object
High-level OCR pipeline combining text detection, recognition, and correction.
The Pipeline class orchestrates an EAST detector, a TRBA recognizer, and an optional text corrector to perform the complete OCR workflow: detection → crop extraction → recognition → correction → result merging.
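The stage order described above can be sketched with plain callables. This illustrates the control flow only and is not the library's implementation: `run_stages`, `detect`, `crop`, `recognize`, and `correct` are hypothetical stand-ins, the corrector here works on a list of strings rather than a Page object, and "result merging" is reduced to pairing each box with its text.

```python
from typing import Any, Callable, List, Optional, Tuple


def run_stages(
    image: Any,
    detect: Callable[[Any], List[Any]],
    crop: Callable[[Any, Any], Any],
    recognize: Callable[[Any], str],
    correct: Optional[Callable[[List[str]], List[str]]] = None,
) -> List[Tuple[Any, str]]:
    """Hypothetical sketch: detection -> crops -> recognition -> correction -> merge."""
    boxes = detect(image)                        # 1. detection
    crops = [crop(image, box) for box in boxes]  # 2. crop extraction
    texts = [recognize(c) for c in crops]        # 3. recognition
    if correct is not None:                      # 4. optional correction
        texts = correct(texts)
    return list(zip(boxes, texts))               # 5. merge boxes with their text


# Toy stand-ins: the "image" is a string and each box is a (start, end) slice.
toy_result = run_stages(
    image="helo wrld",
    detect=lambda img: [(0, 4), (5, 9)],
    crop=lambda img, box: img[box[0]:box[1]],
    recognize=lambda c: c,
    correct=lambda texts: [t.replace("helo", "hello") for t in texts],
)
print(toy_result)
```

Passing `correct=None` skips the correction stage, mirroring the documented behaviour of `corrector=None`.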
- corrector
Text corrector instance (None to skip correction)
- Type:
BaseCorrector, optional
- rotate_threshold
Aspect ratio threshold for automatic rotation of vertical text crops. If height > width * rotate_threshold, the crop is rotated 90° clockwise.
- Type:
float, optional
Examples
Create pipeline with default models:
>>> from manuscript import Pipeline
>>> pipeline = Pipeline()
>>> result = pipeline.predict("document.jpg")
>>> text = pipeline.get_text(result["page"])
>>> print(text)
Create pipeline with custom models:
>>> from manuscript import Pipeline
>>> from manuscript.detectors import EAST
>>> from manuscript.recognizers import TRBA
>>> detector = EAST(weights="east_50_g1", score_thresh=0.8)
>>> recognizer = TRBA(weights="trba_lite_g1", device="cuda")
>>> pipeline = Pipeline(detector=detector, recognizer=recognizer)
Create pipeline with text correction:
>>> from manuscript import Pipeline
>>> from manuscript.correctors import CharLM
>>> corrector = CharLM()
>>> pipeline = Pipeline(corrector=corrector)
Disable automatic rotation of vertical text:
>>> pipeline = Pipeline(rotate_threshold=0)
- Attributes:
- last_correction_page
- last_detection_page
- last_recognition_page
Methods
get_text(page) – Extract plain text from Page object.
predict(image[, recognize_text, vis, profile]) – Run OCR pipeline on a single image.
- __init__(detector=None, recognizer=None, corrector=None, min_text_size=5, rotate_threshold=1.5)
Initialize OCR pipeline.
- Parameters:
detector (EAST, optional) – Text detector instance. If None, creates default EAST detector.
recognizer (TRBA, optional) – Text recognizer instance. If None, creates default TRBA recognizer.
corrector (BaseCorrector, optional) – Text corrector instance. If None, no text correction is applied. The corrector receives a Page object after recognition and returns a corrected Page object.
min_text_size (int, optional) – Minimum text size in pixels. Boxes smaller than this will be filtered out before recognition. Default is 5.
rotate_threshold (float, optional) – Aspect ratio threshold for automatic rotation of vertical text. If height > width * rotate_threshold, the crop is rotated 90 degrees clockwise to convert vertical text to horizontal. Set to None or 0 to disable automatic rotation. Default is 1.5.
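The size filter and the rotation rule described by min_text_size and rotate_threshold can be sketched as follows. This is a minimal illustration, not the library's code: the helper names are hypothetical, the crop is a plain nested list rather than an image array, and the assumption that a box must reach min_text_size on both sides is ours.

```python
from typing import List, Optional, Sequence


def is_large_enough(width: int, height: int, min_text_size: int = 5) -> bool:
    # Assumption: a box is kept only when both sides reach min_text_size.
    return width >= min_text_size and height >= min_text_size


def should_rotate(width: int, height: int,
                  rotate_threshold: Optional[float] = 1.5) -> bool:
    # None or 0 disables automatic rotation, as documented above.
    if not rotate_threshold:
        return False
    return height > width * rotate_threshold


def rotate_clockwise(crop: Sequence[Sequence[int]]) -> List[List[int]]:
    # Rotate a row-major 2D grid by 90 degrees clockwise.
    return [list(row) for row in zip(*crop[::-1])]


tall = [[1, 2], [3, 4], [5, 6], [7, 8]]  # 4 rows (height) x 2 columns (width)
height, width = len(tall), len(tall[0])
if should_rotate(width, height):
    tall = rotate_clockwise(tall)
print(tall)  # [[7, 5, 3, 1], [8, 6, 4, 2]]
```

With the default threshold of 1.5, a 2×4 crop qualifies because 4 > 2 * 1.5, while a crop whose height equals exactly width * rotate_threshold is left unrotated.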
- predict(image, recognize_text=True, vis=False, profile=False)
Run OCR pipeline on a single image.
- Parameters:
image (str, Path, numpy.ndarray, or PIL.Image) – Input image. Can be:
- Path to image file (str or Path)
- RGB numpy array with shape (H, W, 3) and dtype uint8
- PIL Image object
recognize_text (bool, optional) – If True, performs both detection and recognition. If False, performs only detection. Default is True.
vis (bool, optional) – If True, returns visualization image along with results. Default is False.
profile (bool, optional) – If True, prints timing information for each pipeline stage. Default is False.
- Returns:
- If vis=False:
dict with keys:
- "page": Page object with detection/recognition results
- If vis=True:
tuple of (result_dict, vis_image)
- Return type:
dict or tuple
Examples
Basic usage:
>>> pipeline = Pipeline()
>>> result = pipeline.predict("document.jpg")
>>> page = result["page"]
>>> print(page.blocks[0].lines[0].words[0].text)
Detection only:
>>> result = pipeline.predict("document.jpg", recognize_text=False)
>>> # Words will have polygon and detection_confidence but no text
With visualization:
>>> result, vis_img = pipeline.predict("document.jpg", vis=True)
>>> vis_img.show()
With profiling:
>>> result = pipeline.predict("document.jpg", profile=True) # Prints timing for each stage
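Because predict returns a dict when vis=False and a (result_dict, vis_image) tuple when vis=True, calling code that supports both modes may want to normalize the output. The helper below is a hypothetical convenience, not part of the library, and the demo uses stand-in values instead of real Page and image objects.

```python
from typing import Any, Optional, Tuple


def unpack_prediction(output: Any) -> Tuple[Any, Optional[Any]]:
    """Return (page, vis_image); vis_image is None when vis=False was used."""
    if isinstance(output, tuple):
        result, vis_image = output        # vis=True: (result_dict, vis_image)
    else:
        result, vis_image = output, None  # vis=False: result_dict only
    return result["page"], vis_image


# Stand-in values shaped like the two documented return forms.
page_a, vis_a = unpack_prediction({"page": "PAGE"})
page_b, vis_b = unpack_prediction(({"page": "PAGE"}, "IMAGE"))
print(page_a, vis_a)
print(page_b, vis_b)
```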
- get_text(page)
Extract plain text from Page object.
- Parameters:
page (Page) – Page object with recognition results.
- Returns:
Extracted text with lines separated by newlines.
- Return type:
str
Examples
>>> pipeline = Pipeline()
>>> result = pipeline.predict("document.jpg")
>>> text = pipeline.get_text(result["page"])
>>> print(text)
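To make the "lines separated by newlines" behaviour concrete, here is a rough sketch of how such text extraction could work. The dataclasses are simplified stand-ins that only mirror the page.blocks[...].lines[...].words[...].text access shown earlier; they are not the library's classes, `page_to_text` is a hypothetical name, and joining words with a single space is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Word:
    text: str


@dataclass
class Line:
    words: List[Word] = field(default_factory=list)


@dataclass
class Block:
    lines: List[Line] = field(default_factory=list)


@dataclass
class Page:
    blocks: List[Block] = field(default_factory=list)


def page_to_text(page: Page) -> str:
    # One output line per Line object; words joined by a space (assumption).
    lines = []
    for block in page.blocks:
        for line in block.lines:
            lines.append(" ".join(word.text for word in line.words))
    return "\n".join(lines)


demo_page = Page(blocks=[
    Block(lines=[Line(words=[Word("Dear"), Word("Sir")])]),
    Block(lines=[Line(words=[Word("Yours"), Word("truly")])]),
])
print(page_to_text(demo_page))
```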