Pipeline
The main high-level interface for OCR processing.
- class manuscript._pipeline.Pipeline(detector=<object object>, layout=<object object>, recognizer=<object object>, corrector=None, layout_after='detector')
Bases: object
High-level OCR pipeline with configurable stage ordering.
Default pipeline: detector -> layout -> recognizer. corrector is optional and disabled by default.
- Attributes:
- last_correction_page
- last_detection_page
- last_layout_page
- last_recognition_page
- Parameters:
detector (DetectorProtocol)
layout (LayoutProtocol | None)
recognizer (RecognizerProtocol | None)
corrector (CorrectorProtocol | None)
layout_after (str)
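The stage ordering described above can be illustrated with a small standalone sketch. It does not use the library: `resolve_stage_order` and `SLOTS` are hypothetical names introduced here, and the rule that layout still runs at its slot when that slot's stage is disabled is an assumption, not documented behaviour.

```python
from typing import List

# Slots named in the documentation; the layout stage is placed after one of them.
SLOTS = ("detector", "recognizer", "corrector")


def resolve_stage_order(
    layout: bool = True,
    recognizer: bool = True,
    corrector: bool = False,
    layout_after: str = "detector",
) -> List[str]:
    """Return enabled stage names in execution order (hypothetical helper)."""
    if layout_after not in SLOTS:
        raise ValueError(f"layout_after must be one of {SLOTS}, got {layout_after!r}")

    # The detector cannot be disabled, so it is always enabled here.
    enabled = {"detector": True, "recognizer": recognizer, "corrector": corrector}
    order: List[str] = []
    for slot in SLOTS:
        if enabled[slot]:
            order.append(slot)
        # Assumption: layout runs at its slot even if that slot's stage is disabled.
        if layout and slot == layout_after:
            order.append("layout")
    return order


print(resolve_stage_order())  # ['detector', 'layout', 'recognizer']
print(resolve_stage_order(corrector=True, layout_after="recognizer"))
# ['detector', 'recognizer', 'layout', 'corrector']
```

With the defaults, the sketch reproduces the documented default order detector -> layout -> recognizer.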
Methods
- get_text(page): Extract plain text from Page object.
- predict(image[, vis, profile]): Run pipeline on a single image.
- __init__(detector=<object object>, layout=<object object>, recognizer=<object object>, corrector=None, layout_after='detector')
Initialize OCR pipeline.
- Parameters:
detector (object, optional) – Detector instance with predict(image) -> Page. If omitted, default YOLO(weights="yolo26x_obb_text_g1") is used. Detector cannot be disabled.
layout (object or None, optional) – Layout model instance with predict(page, image=None) -> Page. If omitted, default SimpleSorting() is used. Pass None to disable layout stage.
recognizer (object or None, optional) – Recognizer instance with predict(page, image=None, ...) -> Page. If omitted, default TRBA(weights="trba_lite_g2") is used. Pass None to disable recognition stage.
corrector (object or None, optional) – Corrector instance with predict(page, image=None) -> Page. Default is None (disabled).
layout_after ({"detector", "recognizer", "corrector"}, optional) – Slot where layout stage is executed. Default is "detector".
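The `<object object>` defaults in the signature, together with the "if omitted" versus "pass None" wording, are consistent with the common Python sentinel idiom, which lets a constructor tell an omitted argument apart from an explicit None. The following is a minimal sketch of that idiom only. `_DEFAULT`, `DefaultLayout`, and `MiniPipeline` are hypothetical names, and nothing here is taken from the library's implementation.

```python
from typing import Any, Optional

# Unique marker meaning "argument not supplied"; distinct from None.
_DEFAULT = object()


class DefaultLayout:
    """Stand-in for a default layout model (hypothetical)."""


class MiniPipeline:
    """Toy class contrasting omitted and None arguments; not the library's code."""

    def __init__(self, layout: Any = _DEFAULT) -> None:
        if layout is _DEFAULT:
            # Argument omitted: fall back to a default stage.
            self.layout: Optional[Any] = DefaultLayout()
        else:
            # Explicit value, including None, which disables the stage.
            self.layout = layout

    @property
    def layout_enabled(self) -> bool:
        return self.layout is not None


print(MiniPipeline().layout_enabled)             # True
print(MiniPipeline(layout=None).layout_enabled)  # False
```

A plain `layout=None` default could not express both behaviours, because None would have to mean "use the default" and "disable the stage" at once.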