Pipeline Usage Guide
=====================

The ``Pipeline`` class in ``manuscript-ocr`` is designed to work with **any** detectors, recognizers, and correctors that implement a simple interface.

Detector Requirements
---------------------

A detector class must implement a ``predict`` method that takes an image and returns a dictionary with a ``"page"`` key:

.. code-block:: python

    from typing import Any, Dict

    def predict(self, image) -> Dict[str, Any]:
        """
        Parameters:
        - image: file path (str) or numpy array (H, W, 3) in uint8

        Returns dictionary:
        {
            "page": Page  # Page object with detection results
        }
        """
        pass

Result Structure
~~~~~~~~~~~~~~~~

The result must contain a ``Page`` object with the following hierarchy:

**Page** → **Block** → **Line** → **Word**

See ``src/manuscript/data/structures.py`` for detailed structure documentation.

**Minimal example of creating a Page:**

.. code-block:: python

    from manuscript.data import Word, Line, Block, Page

    # Create a word with coordinates and detection confidence
    word = Word(
        polygon=[(10, 20), (100, 20), (100, 40), (10, 40)],
        detection_confidence=0.95
    )

    # Group words into a line
    line = Line(words=[word])

    # Group lines into a block
    block = Block(lines=[line])

    # Create a page
    page = Page(blocks=[block])

Recognizer Requirements
-----------------------

A recognizer class must implement a ``predict`` method that takes a list of images and returns a list of results:

.. code-block:: python

    from typing import Any, Dict, List

    import numpy as np

    def predict(self, images: List[np.ndarray]) -> List[Dict[str, Any]]:
        """
        Parameters:
        - images: list of numpy arrays (RGB word images)

        Returns list of dictionaries:
        [
            {"text": "word1", "confidence": 0.95},
            {"text": "word2", "confidence": 0.92},
            ...
        ]
        """
        pass

**Example:**

.. code-block:: python

    class MyRecognizer:
        def predict(self, images):
            results = []
            for img in images:
                # Your recognition logic
                text = "recognized_text"
                confidence = 0.92
                results.append({"text": text, "confidence": confidence})
            return results

Corrector Requirements
----------------------

A corrector class must implement a ``predict`` method that takes a ``Page`` object and returns a corrected ``Page``:

.. code-block:: python

    from manuscript.data import Page

    def predict(self, page: Page) -> Page:
        """
        Parameters:
        - page: Page object with recognized text

        Returns:
        - Page: Page object with corrected text
        """
        pass

**Example:**

.. code-block:: python

    from manuscript.data import Page

    class MyCorrector:
        def predict(self, page: Page) -> Page:
            # Work on a deep copy so the input page is left unchanged
            result = page.model_copy(deep=True)
            for block in result.blocks:
                for line in block.lines:
                    for word in line.words:
                        if word.text:
                            # Your correction logic
                            word.text = self._correct(word.text)
            return result

        def _correct(self, text: str) -> str:
            # Text correction logic
            return text

Built-in CharLM Corrector
~~~~~~~~~~~~~~~~~~~~~~~~~~

CharLM is a Transformer-based character-level language model for correcting OCR errors:

.. code-block:: python

    from manuscript.correctors import CharLM

    # With default settings
    corrector = CharLM()

    # With custom parameters
    corrector = CharLM(
        weights="prereform_charlm_g1",  # or "modern_charlm_g1"
        mask_threshold=0.05,            # confidence threshold for correction
        apply_threshold=0.95,           # minimum model confidence
        max_edits=2,                    # max edits per word
        min_word_len=4,                 # min word length for correction
        lexicon="prereform_words"       # lexicon of known words
    )

Compatible Implementation Examples
----------------------------------

Complete Detector Example
~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from manuscript.data import Word, Line, Block, Page

    class MyDetector:
        def predict(self, image):
            # Your image detection logic
            # ...

            # Create result
            words = [
                Word(
                    polygon=[(10, 20), (100, 20), (100, 40), (10, 40)],
                    detection_confidence=0.95
                ),
                Word(
                    polygon=[(110, 20), (200, 20), (200, 40), (110, 40)],
                    detection_confidence=0.92
                ),
            ]

            line = Line(words=words)
            block = Block(lines=[line])
            page = Page(blocks=[block])

            return {"page": page}

Using Custom Components
~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from manuscript import Pipeline
    from my_package import MyDetector, MyRecognizer, MyCorrector

    # Use a custom detector, recognizer, and corrector
    detector = MyDetector()
    recognizer = MyRecognizer()
    corrector = MyCorrector()

    pipeline = Pipeline(
        detector=detector,
        recognizer=recognizer,
        corrector=corrector
    )

    result = pipeline.predict("document.jpg")

Pipeline Usage Examples
-----------------------

Basic Usage
~~~~~~~~~~~

.. code-block:: python

    from manuscript import Pipeline

    # Initialize with default models
    pipeline = Pipeline()

    # Process image
    result = pipeline.predict("document.jpg")
    page = result["page"]

    # Extract text
    text = pipeline.get_text(page)
    print(text)

Detection Only (Without Recognition)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    result = pipeline.predict("document.jpg", recognize_text=False)
    page = result["page"]

    # Words have polygon and detection_confidence, but no text
    for block in page.blocks:
        for line in block.lines:
            for word in line.words:
                print(f"Polygon: {word.polygon}, Confidence: {word.detection_confidence}")

With Visualization
~~~~~~~~~~~~~~~~~~

.. code-block:: python

    result, vis_img = pipeline.predict("document.jpg", vis=True)
    vis_img.save("output_visualization.jpg")

Intermediate Results
~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from manuscript import Pipeline
    from manuscript.correctors import CharLM

    pipeline = Pipeline(corrector=CharLM())
    result = pipeline.predict("document.jpg")

    # Result after detection (before recognition)
    detection_page = pipeline.last_detection_page

    # Result after recognition (before correction)
    recognition_page = pipeline.last_recognition_page

    # Result after correction (None if corrector not used)
    correction_page = pipeline.last_correction_page

Export/Import Page to JSON
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    page = result["page"]

    # Save to file
    page.to_json("result.json")

    # Get as string
    json_str = page.to_json()

    # Load from file
    from manuscript.data import Page
    page = Page.from_json("result.json")

    # Load from string
    page = Page.from_json('{"blocks": [...]}')

With Profiling
~~~~~~~~~~~~~~

.. code-block:: python

    # Prints execution time for each stage
    result = pipeline.predict("document.jpg", profile=True)

    # Output:
    # Detection: 0.123s
    # Load image for crops: 0.005s
    # Extract 45 crops: 0.012s
    # Recognition: 0.234s
    # Pipeline total: 0.374s

Batch Processing
~~~~~~~~~~~~~~~~

.. code-block:: python

    images = ["page1.jpg", "page2.jpg", "page3.jpg"]
    results = pipeline.process_batch(images)

    for result in results:
        text = pipeline.get_text(result["page"])
        print(text)

Component Configuration
-----------------------

Replacing Detector or Recognizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from manuscript import Pipeline

    # Only custom detector, default recognizer
    from my_package import MyCustomDetector
    pipeline = Pipeline(detector=MyCustomDetector())

    # Only custom recognizer, default detector
    from my_package import MyCustomRecognizer
    pipeline = Pipeline(recognizer=MyCustomRecognizer())

    # Both components custom
    pipeline = Pipeline(detector=MyCustomDetector(), recognizer=MyCustomRecognizer())

Built-in Model Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from manuscript import Pipeline
    from manuscript.detectors import EAST
    from manuscript.recognizers import TRBA

    # EAST with settings
    detector = EAST(
        weights="east_50_g1",  # weight selection
        score_thresh=0.8,      # confidence threshold
        nms_thresh=0.2,        # NMS threshold
        device="cpu"           # device (cpu/cuda)
    )

    # TRBA with settings
    recognizer = TRBA(
        weights="trba_lite_g1",  # weight selection
        device="cuda"            # GPU for acceleration
    )

    pipeline = Pipeline(detector, recognizer)

Size Filtering
~~~~~~~~~~~~~~

.. code-block:: python

    # Ignore text blocks smaller than 10 pixels
    pipeline = Pipeline(min_text_size=10)

Automatic Rotation Control
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    # Enable automatic rotation of vertical text (default)
    pipeline = Pipeline(rotate_threshold=1.5)

    # Disable automatic rotation
    pipeline = Pipeline(rotate_threshold=0)
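
This guide does not spell out how ``rotate_threshold`` is applied internally. One plausible reading, shown here purely as an assumption, is that a word crop counts as vertical when its bounding-box height exceeds its width multiplied by the threshold, and that ``0`` switches the check off. The helper ``should_rotate`` below is hypothetical and not part of ``manuscript-ocr``; it only illustrates that rule on plain polygons.

.. code-block:: python

    from typing import List, Tuple

    Point = Tuple[float, float]


    def should_rotate(polygon: List[Point], rotate_threshold: float) -> bool:
        """Hypothetical rule: rotate when height > width * rotate_threshold."""
        if rotate_threshold <= 0:
            # Assumption: a threshold of 0 disables rotation entirely
            return False
        xs = [x for x, _ in polygon]
        ys = [y for _, y in polygon]
        width = max(xs) - min(xs)
        height = max(ys) - min(ys)
        return height > width * rotate_threshold


    horizontal = [(10, 20), (100, 20), (100, 40), (10, 40)]  # 90 wide, 20 tall
    vertical = [(10, 20), (30, 20), (30, 110), (10, 110)]    # 20 wide, 90 tall

    print(should_rotate(horizontal, 1.5))  # False
    print(should_rotate(vertical, 1.5))    # True
    print(should_rotate(vertical, 0))      # False

Under this assumed rule, raising the threshold would make rotation less likely for moderately tall crops, which may help when short words produce nearly square boxes.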