Utilities

Utility functions for image processing, visualization, and more.

Common utilities for manuscript-ocr.

manuscript.utils.read_image(img_or_path)

Universal image reading with support for multiple input types.

Parameters:

img_or_path (str, Path, bytes, np.ndarray, or PIL.Image) – Image source in one of the following formats: a file path (str or Path; Unicode paths such as Cyrillic are supported), a bytes buffer (e.g., from an HTTP response), a NumPy array (an already loaded image), or a PIL Image object.

Returns:

RGB image as numpy array with shape (H, W, 3) and dtype uint8.

Return type:

np.ndarray

Raises:
  • FileNotFoundError – If the image file cannot be read with either OpenCV or PIL.

  • TypeError – If the input type is not supported.

  • ValueError – If bytes cannot be decoded into an image.

Examples

>>> from PIL import Image
>>> from manuscript.utils import read_image
>>> # Read from a file path (Unicode is supported; Cyrillic for "path/to/image.jpg")
>>> img = read_image("путь/к/изображению.jpg")
>>> img.shape
(480, 640, 3)
>>> # Read from bytes
>>> with open("image.jpg", "rb") as f:
...     img = read_image(f.read())
>>> # Read from a PIL Image
>>> pil_img = Image.open("image.jpg")
>>> img = read_image(pil_img)
>>> # Pass through an already loaded numpy array
>>> img = read_image(img)
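
To illustrate the input dispatch described above, here is a minimal sketch that normalizes each supported input to an (H, W, 3) uint8 RGB array. It is not manuscript's implementation: `read_image_sketch` and `_to_rgb_uint8` are hypothetical names, it decodes with Pillow only (the real function also tries OpenCV), and it assumes array inputs already hold values in the 0 to 255 range.

```python
import io
from pathlib import Path

import numpy as np


def _to_rgb_uint8(arr: np.ndarray) -> np.ndarray:
    """Coerce an array to shape (H, W, 3), dtype uint8 (values assumed 0..255)."""
    if arr.ndim == 2:  # grayscale: replicate the single channel three times
        arr = np.stack([arr] * 3, axis=-1)
    elif arr.ndim == 3 and arr.shape[2] == 4:  # RGBA: drop the alpha channel
        arr = arr[:, :, :3]
    if arr.ndim != 3 or arr.shape[2] != 3:
        raise ValueError(f"unsupported array shape: {arr.shape}")
    return np.ascontiguousarray(arr, dtype=np.uint8)


def read_image_sketch(src) -> np.ndarray:
    """Hypothetical stand-in for read_image that decodes with Pillow only."""
    if isinstance(src, np.ndarray):  # already loaded: just normalize it
        return _to_rgb_uint8(src)

    # Pillow is imported lazily so the array path works without it.
    from PIL import Image, UnidentifiedImageError

    if isinstance(src, Image.Image):
        return np.array(src.convert("RGB"))
    if isinstance(src, (bytes, bytearray)):
        try:
            with Image.open(io.BytesIO(src)) as im:
                return np.array(im.convert("RGB"))
        except UnidentifiedImageError as exc:
            raise ValueError("bytes could not be decoded as an image") from exc
    if isinstance(src, (str, Path)):
        path = Path(src)
        if not path.is_file():
            raise FileNotFoundError(f"cannot read image: {path}")
        with Image.open(path) as im:  # Pillow accepts Unicode paths natively
            return np.array(im.convert("RGB"))
    raise TypeError(f"unsupported input type: {type(src).__name__}")
```

A fuller version might fall back to OpenCV by reading the file with `np.fromfile` and decoding it with `cv2.imdecode`, a common workaround for `cv2.imread` failing on non-ASCII paths on Windows.
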

manuscript.utils.create_page_from_text(lines, confidence=1.0)

Create a Page object from a list of text lines.

This utility function creates a simple Page structure from raw text, useful for testing correctors or other text processing components without requiring actual OCR detection/recognition.

Each line becomes a Line object with words split by whitespace. Words are assigned dummy polygon coordinates for compatibility with the data structures.

Parameters:
  • lines (List[str]) – List of text lines. Each line will be split into words.

  • confidence (float, optional) – Confidence score to assign to all words (default 1.0).

Returns:

Page object with one Block containing the provided lines.

Return type:

Page

Examples

>>> from manuscript.utils import create_page_from_text
>>> page = create_page_from_text(["Hello world", "This is a test"])
>>> page.blocks[0].lines[0].words[0].text
'Hello'
>>> len(page.blocks[0].lines)
2

Use with corrector:

>>> from manuscript.correctors import CharLM
>>> from manuscript.utils import create_page_from_text
>>>
>>> # Create page from text with potential OCR errors
>>> page = create_page_from_text(["Привѣтъ міръ"])
>>>
>>> # Apply correction
>>> corrector = CharLM()
>>> corrected = corrector.predict(page)
>>>
>>> # Get corrected text
>>> for line in corrected.blocks[0].lines:
...     print(" ".join(w.text for w in line.words))
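
The construction described above (one Block, one Line per input string, words split on whitespace, dummy polygons) can be sketched with plain dataclasses. The `Word`, `Line`, `Block`, and `Page` classes below are simplified hypothetical stand-ins for the ones in `manuscript.data`, and the grid layout of the dummy polygons (fixed-size cells placed by word and line index) is an assumption, not the library's actual coordinates.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Word:  # hypothetical stand-in for manuscript.data's Word
    text: str
    polygon: List[Point]
    confidence: float


@dataclass
class Line:
    words: List[Word] = field(default_factory=list)


@dataclass
class Block:
    lines: List[Line] = field(default_factory=list)


@dataclass
class Page:
    blocks: List[Block] = field(default_factory=list)


def page_from_text_sketch(lines: List[str], confidence: float = 1.0,
                          cell_w: float = 100.0, cell_h: float = 30.0) -> Page:
    """Build a one-Block Page; each word gets a dummy rectangle on a grid."""
    block = Block()
    for row, text in enumerate(lines):
        line = Line()
        for col, token in enumerate(text.split()):  # split on whitespace
            x0, y0 = col * cell_w, row * cell_h
            x1, y1 = x0 + cell_w, y0 + cell_h
            poly = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]  # clockwise rectangle
            line.words.append(Word(token, poly, confidence))
        block.lines.append(line)
    return Page(blocks=[block])
```

Giving each word its own non-overlapping box keeps any downstream code that sorts or measures polygons from seeing identical coordinates.
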

manuscript.utils.visualize_page(image, page, color=(0, 255, 0), thickness=2, show_order=True, show_lines=False, show_numbers=False, line_color=(255, 165, 0), number_bg=(255, 255, 255), number_color=(0, 0, 0), max_size=4096)

Visualize a Page object with detected words/blocks.

This function draws all words from the Page structure on the image, optionally showing reading order with numbered markers and connecting lines. When show_order=True, it also visualizes blocks with semi-transparent bounding boxes, each block having a distinct color.
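
Semi-transparent block boundaries like these are typically produced by alpha blending a colored fill into the image, with one color per block. The sketch below shows both pieces using NumPy and the standard `colorsys` module; `blend_box` and `distinct_colors` are hypothetical helpers, and the opacity and palette are assumptions rather than what `visualize_page` actually uses.

```python
import colorsys

import numpy as np


def distinct_colors(n: int) -> list:
    """Return n RGB tuples with evenly spaced hues (an assumed palette)."""
    return [
        tuple(int(round(c * 255)) for c in colorsys.hsv_to_rgb(i / n, 0.8, 0.95))
        for i in range(n)
    ]


def blend_box(image: np.ndarray, box, color, alpha: float = 0.3) -> np.ndarray:
    """Return a copy of `image` with `box` tinted by `color` at opacity `alpha`.

    box is (x0, y0, x1, y1) in pixel coordinates, end-exclusive.
    """
    x0, y0, x1, y1 = box
    out = image.astype(np.float32)  # astype copies, so the input is untouched
    region = out[y0:y1, x0:x1]  # a view into `out`
    # Alpha compositing: result = (1 - alpha) * background + alpha * fill
    region[:] = (1.0 - alpha) * region + alpha * np.asarray(color, dtype=np.float32)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```
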

Parameters:
  • image (str, Path, np.ndarray, or PIL.Image) – Input image: a path to an image file (str or Path; Unicode paths are supported), an RGB numpy array with shape (H, W, 3), or a PIL Image object.

  • page (Page) – Page object from manuscript.data containing detected blocks/words.

  • color (tuple of int, default=(0, 255, 0)) – RGB color for word boundaries.

  • thickness (int, default=2) – Line thickness for word boundaries.

  • show_order (bool, default=True) – If True, draws each text line in a different color and overlays semi-transparent block boundaries, with a different color per block.

  • show_lines (bool, default=False) – If True and show_order=True, draw connecting lines between consecutive words showing the reading sequence.

  • show_numbers (bool, default=False) – If True and show_order=True, display numbered markers on each word showing the reading order.

  • line_color (tuple of int, default=(255, 165, 0)) – RGB color for connecting lines between words.

  • number_bg (tuple of int, default=(255, 255, 255)) – Background color for order number boxes.

  • number_color (tuple of int, default=(0, 0, 0)) – Text color for order numbers.

  • max_size (int or None, default=4096) – Maximum length, in pixels, of the longer side of the output image. Larger images are downscaled proportionally. Set to None to keep the original size.
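
The proportional resize implied by `max_size` comes down to scaling both sides by `max_size / max(width, height)` when the longer side is too large. A minimal sketch of that arithmetic follows; `fit_within` is a hypothetical helper and the rounding choice is an assumption.

```python
from typing import Optional, Tuple


def fit_within(width: int, height: int, max_size: Optional[int]) -> Tuple[int, int]:
    """Scale (width, height) so the longer side is at most max_size, keeping aspect ratio."""
    longer = max(width, height)
    if max_size is None or longer <= max_size:
        return width, height  # resizing disabled, or already small enough
    scale = max_size / longer
    # Round to whole pixels and never collapse a side to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(8000, 6000, 4096))  # (4096, 3072)
```

The resulting size could then be passed to Pillow's `Image.resize`.
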

Returns:

Visualized image with detection boxes and optional reading order annotations. When show_order=True, also includes semi-transparent block boundaries.

Return type:

PIL.Image.Image

Examples

Basic visualization without reading order:

>>> from manuscript import EAST
>>> from manuscript.utils import visualize_page
>>> detector = EAST()
>>> result = detector.predict("document.jpg")
>>> # A path can be passed directly; show_order defaults to True, so disable it
>>> vis = visualize_page("document.jpg", result["page"], show_order=False)
>>> vis.save("output.jpg")

Visualization with reading order and block boundaries:

>>> # Can also use numpy array or PIL Image
>>> from manuscript.utils import read_image
>>> img = read_image("document.jpg")
>>> vis = visualize_page(
...     img,
...     result["page"],
...     show_order=True,
...     color=(255, 0, 0),
...     thickness=3
... )

Show connecting lines between words and numbered reading-order markers:

>>> vis = visualize_page(
...     "document.jpg",
...     result["page"],
...     show_order=True,
...     show_lines=True,
...     show_numbers=True
... )

manuscript.utils.organize_page(page, max_splits=10, use_columns=True)

Organize words in a Page into structured Blocks, Lines, and reading order.

Takes a Page with unorganized words and returns a new Page where:
  • Words are grouped into columns (Blocks)
  • Each Block contains Lines of Words
  • Words within Lines are ordered left-to-right
  • Lines within Blocks are ordered top-to-bottom
  • Blocks are ordered left-to-right (for columns)

Parameters:
  • page (Page) – Input Page object. It may hold words in unstructured blocks/lines, or a flat list of words with no proper organization.

  • max_splits (int, optional) – Maximum number of column splits to attempt when segmenting. Higher values allow more columns to be detected. Default is 10.

  • use_columns (bool, optional) – If True, segments the page into columns (separate Blocks). If False, treats the entire page as a single column. Default is True.
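
The docstring does not spell out how columns are detected. One common approach, sketched here purely as an illustration, projects word boxes onto the x-axis, finds empty vertical gaps between them, and cuts at the widest gaps, using at most `max_splits` cuts. `split_columns`, its `(x0, y0, x1, y1)` box format, and the `min_gap` threshold are all hypothetical; `organize_page` may use a different method.

```python
import bisect
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def split_columns(boxes: List[Box], max_splits: int = 10,
                  min_gap: float = 20.0) -> List[List[Box]]:
    """Group boxes into columns, left to right, by cutting at empty x-gaps."""
    if not boxes:
        return []
    # 1. Merge the boxes' x-intervals; space between merged runs contains no words.
    intervals = sorted((b[0], b[2]) for b in boxes)
    merged = [list(intervals[0])]
    for x0, x1 in intervals[1:]:
        if x0 <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], x1)
        else:
            merged.append([x0, x1])
    # 2. Describe each gap by (width, midpoint).
    gaps = [(merged[i + 1][0] - merged[i][1], (merged[i][1] + merged[i + 1][0]) / 2)
            for i in range(len(merged) - 1)]
    # 3. Keep at most max_splits of the widest gaps that are wide enough to be breaks.
    widest = sorted(gaps, reverse=True)[:max_splits]
    cuts = sorted(mid for width, mid in widest if width >= min_gap)
    # 4. Assign each box to a column by counting the cuts left of its x-center.
    columns: List[List[Box]] = [[] for _ in range(len(cuts) + 1)]
    for b in boxes:
        columns[bisect.bisect_left(cuts, (b[0] + b[2]) / 2)].append(b)
    return [col for col in columns if col]
```

A single wide word spanning both columns (a centered heading, say) closes the gap in this projection, which is one reason practical implementations tend to be more elaborate.
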

Returns:

New Page object with organized Blocks, Lines, and reading order set.

Return type:

Page

Examples

>>> from manuscript.detectors import EAST
>>> from manuscript.utils import organize_page
>>>
>>> detector = EAST()
>>> result = detector.predict("image.jpg", sort_reading_order=False)
>>> page = result["page"]
>>>
>>> # Organize into structured reading order
>>> organized_page = organize_page(page, max_splits=5)
>>>
>>> # Access first word in first line of first block
>>> first_word = organized_page.blocks[0].lines[0].words[0]
>>> print(f"Word order: {first_word.order}")

Notes

This function extracts all words from the input Page (regardless of their current organization), converts them to bounding boxes, performs column segmentation and line sorting, then rebuilds a clean Page structure.

The function preserves all Word attributes (polygon, confidence, text, etc.) while updating the order field for reading sequence.
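
The line sorting and order assignment mentioned above can be illustrated with a simple greedy pass: sort boxes by vertical center, start a new line whenever a box's center is too far from the current line's average center, sort each line left to right, then number words in column, line, word order. This is a hypothetical sketch; `group_lines`, `reading_order`, and the `tol` factor are illustrative names rather than manuscript's API, and `(x0, y0, x1, y1)` tuples stand in for Word polygons.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def _cy(b: Box) -> float:
    """Vertical center of a box."""
    return (b[1] + b[3]) / 2


def group_lines(boxes: List[Box], tol: float = 0.5) -> List[List[Box]]:
    """Group boxes into lines (top to bottom), each sorted left to right."""
    lines: List[List[Box]] = []
    for box in sorted(boxes, key=_cy):
        if lines:
            cur = lines[-1]
            mean_cy = sum(_cy(b) for b in cur) / len(cur)
            mean_h = sum(b[3] - b[1] for b in cur) / len(cur)
            # Same line if the centers are within tol * the line's mean word height.
            if abs(_cy(box) - mean_cy) <= tol * mean_h:
                cur.append(box)
                continue
        lines.append([box])
    return [sorted(line, key=lambda b: b[0]) for line in lines]


def reading_order(columns: List[List[Box]]) -> List[Tuple[int, Box]]:
    """Number boxes column by column, line by line, word by word."""
    ordered: List[Tuple[int, Box]] = []
    for column in columns:  # columns are assumed to be sorted left to right already
        for line in group_lines(column):
            for box in line:
                ordered.append((len(ordered), box))
    return ordered
```

Comparing against the whole line's average, rather than only the last word added, makes the grouping less sensitive to a single word that sits slightly high or low.
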

manuscript.utils.set_seed(seed=42)

Set the random seed for reproducibility across Python's random module, NumPy, and PyTorch.

Return type:

None

Parameters:

seed (int) – Seed value to set. Default is 42.
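
A helper like this usually seeds each library's global generator in turn. The sketch below is an assumption about what such a function looks like, not manuscript's source; `set_seed_sketch` is a hypothetical name, and NumPy and PyTorch are treated as optional so it runs without them.

```python
import random


def set_seed_sketch(seed: int = 42) -> None:
    """Seed Python's random module, plus NumPy and PyTorch when installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)  # legacy global NumPy generator
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)  # CPU (and, in recent versions, all devices)
        torch.cuda.manual_seed_all(seed)  # silently ignored when CUDA is unavailable
    except ImportError:
        pass
```

Seeding alone does not make every GPU operation deterministic; PyTorch documents additional switches such as `torch.use_deterministic_algorithms` for that.
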