Utilities

Utility functions for image processing, visualization, page organization, text-span collapsing, and more.

The utilities module also includes helpers for creating lightweight Page objects without running the full pipeline:

create_page_from_text for testing correctors and other text-processing components from plain text.
create_page_from_image for wrapping a single image or crop into a Page with one TextSpan covering the full image. It can also accept a sequence of crops and build a synthetic page for direct recognizer inference with the 0.1.11+ stage API.

Common utilities for manuscript-ocr.

manuscript.utils.read_image(img_or_path)[source]

Universal image reading with support for multiple input types.

Parameters:

img_or_path (str, Path, bytes, np.ndarray, or PIL.Image) – Image source in one of the following formats:
- File path (str or Path); supports Unicode paths (e.g., Cyrillic)
- Bytes buffer (e.g., from an HTTP response)
- NumPy array (an already loaded image)
- PIL Image object

Returns:

RGB image as numpy array with shape (H, W, 3) and dtype uint8.

Return type:

np.ndarray

Raises:

FileNotFoundError – If the image file cannot be read with either OpenCV or PIL.
TypeError – If the input type is not supported.
ValueError – If bytes cannot be decoded into an image.

Examples

>>> from manuscript.utils import read_image
>>> img = read_image("path/to/image.jpg")
>>> img.shape
(480, 640, 3)

>>> with open("image.jpg", "rb") as f:
...     img = read_image(f.read())

>>> from PIL import Image
>>> pil_img = Image.open("image.jpg")
>>> img = read_image(pil_img)

>>> img = read_image(existing_array)

manuscript.utils.create_page_from_text(lines, confidence=1.0)[source]

Create a Page object from a list of text lines.

This utility function creates a simple Page structure from raw text, useful for testing correctors or other text processing components without requiring actual OCR detection/recognition.

Each line becomes a Line object with text spans split by whitespace. Text spans are assigned dummy polygon coordinates for compatibility with the data structures.

Parameters:

lines (List[str]) – List of text lines. Each line will be split into text spans.
confidence (float, optional) – Confidence score to assign to all text spans (default 1.0).

Returns:

Page object with one Block containing the provided lines.

Return type:

Page

Examples

>>> from manuscript.utils import create_page_from_text
>>> page = create_page_from_text(["Hello world", "This is a test"])
>>> page.blocks[0].lines[0].text_spans[0].text
'Hello'
>>> len(page.blocks[0].lines)
2

Use with corrector:

>>> from manuscript.correctors import CharLM
>>> from manuscript.utils import create_page_from_text
>>>
>>> page = create_page_from_text(["Привѣтъ міръ"])
>>> corrector = CharLM()
>>> corrected = corrector.predict(page)
>>> for line in corrected.blocks[0].lines:
...     print(" ".join(span.text for span in line.text_spans))
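The whitespace splitting and dummy polygons described above can be pictured with a small standalone sketch. It does not use the library's own Page, Line, or TextSpan classes; `SketchSpan`, `split_line_into_spans`, and the `char_w` and `height` layout values are hypothetical names and numbers chosen only for illustration, and the real function may place its dummy polygons differently.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class SketchSpan:
    """Hypothetical stand-in for a text span (not the library's TextSpan)."""
    text: str
    polygon: List[Point]
    confidence: float


def split_line_into_spans(
    line: str,
    confidence: float = 1.0,
    row: int = 0,
    char_w: float = 10.0,   # assumed dummy width per character
    height: float = 20.0,   # assumed dummy line height
) -> List[SketchSpan]:
    """Split one text line on whitespace and give each word a dummy box."""
    spans: List[SketchSpan] = []
    x = 0.0
    y0 = row * height
    for word in line.split():
        w = len(word) * char_w
        polygon = [(x, y0), (x + w, y0), (x + w, y0 + height), (x, y0 + height)]
        spans.append(SketchSpan(word, polygon, confidence))
        x += w + char_w  # leave one character of space between words
    return spans


spans = split_line_into_spans("Hello world")
print([s.text for s in spans])
```

The point of the sketch is only that each word ends up as its own span with a placeholder rectangle, so downstream components that expect geometry do not fail on text-only input.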

manuscript.utils.create_page_from_image(image, confidence=1.0, gap=8, return_image=False)[source]

Create a Page object that wraps one or more images or text crops.

This utility is useful when a recognizer expects the 0.1.11+ stage API (predict(page, image=...) -> Page), but you want to run inference on one or more pre-cropped images without a detector. For a single image, the function creates one block, one line, and one TextSpan covering the full image extent. For multiple images, the crops are stacked vertically into a synthetic page, and each crop becomes a separate line with one TextSpan.

Parameters:

image (str, Path, bytes, numpy.ndarray, PIL.Image, or sequence thereof) – Image source accepted by read_image(), or a sequence of such sources (one per crop).
confidence (float, optional) – Detection confidence assigned to each created text span. Default is 1.0.
gap (int, optional) – Vertical gap in pixels between crops when a sequence of images is passed. Default is 8.
return_image (bool, optional) – If True, also return the normalised RGB image that corresponds to the created Page. Especially useful when image is a sequence and a synthetic page image is built. Default is False.

Returns:

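Page object. For a single image it contains one block, one line, and one text span covering the whole image. For a sequence of images, each crop becomes a separate line with one text span. If return_image=True, also returns the RGB image used to build the page.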

Return type:

Page or tuple of (Page, numpy.ndarray)

Examples

>>> from manuscript.utils import create_page_from_image
>>> page = create_page_from_image("crop1.png")
>>> span = page.blocks[0].lines[0].text_spans[0]
>>> span.polygon
[(0.0, 0.0), (120.0, 0.0), (120.0, 32.0), (0.0, 32.0)]

Use with a recognizer:

>>> from manuscript.recognizers import TRBA
>>> page = create_page_from_image("crop1.png")
>>> recognizer = TRBA()
>>> result_page = recognizer.predict(page, image="crop1.png")
>>> result_page.blocks[0].lines[0].text_spans[0].text
'example'

Use with multiple crops:

>>> page, composed_image = create_page_from_image(
...     ["crop1.png", "crop2.png"],
...     return_image=True,
... )
>>> recognizer = TRBA()
>>> result_page = recognizer.predict(page, image=composed_image)
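How the stacked layout relates crop sizes to span polygons can be sketched without the library. The helper below is hypothetical (`stack_crop_polygons` is not part of manuscript.utils) and assumes crops are left-aligned at x = 0, that the gap is inserted only between crops, and that the canvas is as wide as the widest crop. The actual function may pad or align differently.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def stack_crop_polygons(
    sizes: Sequence[Tuple[int, int]],  # (width, height) of each crop
    gap: int = 8,
) -> Tuple[List[List[Point]], Tuple[int, int]]:
    """Return one rectangle per crop plus the (width, height) of the canvas."""
    polygons: List[List[Point]] = []
    y = 0
    for index, (w, h) in enumerate(sizes):
        if index > 0:
            y += gap  # assumed: gap only between consecutive crops
        polygons.append([
            (0.0, float(y)),
            (float(w), float(y)),
            (float(w), float(y + h)),
            (0.0, float(y + h)),
        ])
        y += h
    canvas_w = max((w for w, _ in sizes), default=0)
    return polygons, (canvas_w, y)


polygons, canvas = stack_crop_polygons([(120, 32), (100, 40)], gap=8)
print(polygons[1], canvas)
```

Under these assumptions a 120x32 crop followed by a 100x40 crop yields a second line starting at y = 40 on a 120x80 canvas, which is why the composed image, not the original crop files, should be passed to the recognizer.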

manuscript.utils.visualize_page(image, page, color=(0, 255, 0), thickness=4, show_order=True, show_lines=False, show_numbers=False, line_color=(255, 165, 0), number_bg=(255, 255, 255), number_color=(0, 0, 0), max_size=4096)[source]

Visualize a Page object with detected text spans/blocks.

This function draws all text spans from the Page structure on the image, optionally showing reading order with numbered markers and connecting lines. When show_order=True, it also visualizes blocks with semi-transparent bounding boxes, each block having a distinct color.

Parameters:

image (str, Path, np.ndarray, or PIL.Image) – Input image. Can be:
- Path to an image file (str or Path); supports Unicode paths
- RGB numpy array with shape (H, W, 3)
- PIL Image object
page (Page) – Page object from manuscript.data containing detected blocks/text spans.
color (tuple of int, default=(0, 255, 0)) – RGB color for text span boundaries.
thickness (int, default=4) – Line thickness for text span boundaries.
show_order (bool, default=True) – If True, draws each text line in a distinct color and overlays semi-transparent block boundaries, one color per block.
show_lines (bool, default=False) – If True and show_order=True, draw connecting lines between consecutive text spans showing the reading sequence.
show_numbers (bool, default=False) – If True and show_order=True, display numbered markers on each text span showing the reading order.
line_color (tuple of int, default=(255, 165, 0)) – RGB color for connecting lines between text spans.
number_bg (tuple of int, default=(255, 255, 255)) – Background color for order number boxes.
number_color (tuple of int, default=(0, 0, 0)) – Text color for order numbers.
max_size (int or None, default=4096) – Maximum size for the longer dimension of the output image. Image will be resized proportionally if larger. Set to None to keep original size.

Returns:

Visualized image with detection boxes and optional reading order annotations. When show_order=True, also includes semi-transparent block boundaries.

Return type:

PIL.Image.Image

Examples

Basic visualization with default settings:

>>> from manuscript import EAST
>>> from manuscript.utils import visualize_page
>>> detector = EAST()
>>> page = detector.predict("document.jpg")
>>> # Can pass path directly
>>> vis = visualize_page("document.jpg", page)
>>> vis.save("output.jpg")

Visualization with reading order and block boundaries:

>>> # Can also use numpy array or PIL Image
>>> from manuscript.utils import read_image
>>> img = read_image("document.jpg")
>>> vis = visualize_page(
...     img,
...     page,
...     show_order=True,
...     color=(255, 0, 0),
...     thickness=3
... )

Show connecting lines and numbers between text spans:

>>> vis = visualize_page(
...     "document.jpg",
...     page,
...     show_order=True,
...     show_lines=True,
...     show_numbers=True
... )
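The proportional resizing controlled by max_size can be illustrated with a small standalone calculation. `fit_within` is a hypothetical helper, not a library function, and the rounding rule is an assumption. It only shows what "resized proportionally if larger" means for the output dimensions.

```python
from typing import Optional, Tuple


def fit_within(width: int, height: int, max_size: Optional[int]) -> Tuple[int, int]:
    """Scale (width, height) so the longer side does not exceed max_size."""
    if max_size is None:
        return width, height  # None keeps the original size
    longer = max(width, height)
    if longer <= max_size:
        return width, height  # already small enough, no upscaling
    scale = max_size / longer
    # Assumed rounding; each side is kept at least 1 pixel.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(8192, 4096, 4096))
print(fit_within(800, 600, 4096))
```

For large scans this matters when comparing coordinates: polygons in the Page refer to the original image, while the returned visualization may be smaller.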

manuscript.utils.organize_page(page, max_splits=10, use_columns=True)[source]

Compatibility wrapper around the SimpleSorting layout model.

Parameters:

page (Page) – Input page with detected text spans.
max_splits (int, optional) – Maximum number of column split attempts. Default is 10.
use_columns (bool, optional) – If True, segment into columns before line grouping. Default is True.

Returns:

Organized page.

Return type:

Page

manuscript.utils.crop_axis_aligned(image, polygon, pad=0.0)[source]

Crops the axis-aligned rectangle that encloses the polygon.

Return type:

Optional[ndarray]

Parameters:

image (numpy.ndarray)
polygon (numpy.ndarray | Tuple[Tuple[float, float], ...])
pad (float)

manuscript.utils.crop_polygon_mask(image, polygon, pad=0.0, background=255)[source]

Crops the polygon's bounding rectangle and masks the pixels outside the polygon.

Works with arbitrary polygons of shape (N, 2).

Return type:

Optional[ndarray]

Parameters:

image (numpy.ndarray)
polygon (numpy.ndarray | Tuple[Tuple[float, float], ...])
pad (float)
background (int)

manuscript.utils.merge_polygons(polygons, method='bbox')[source]

Merges several polygons into one.

Return type:

Optional[List[Tuple[float, float]]]

Parameters:

polygons (Sequence[numpy.ndarray | Tuple[Tuple[float, float], ...]])
method (str)
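The two merge strategies can be sketched in plain Python, assuming merge_polygons accepts the same "bbox" and "convex_hull" values that are documented for merge_text_spans. The helpers `merge_bbox` and `merge_convex_hull` are hypothetical. The hull uses Andrew's monotone chain algorithm, which is one common choice and not necessarily what the library does, and the vertex order of the result is a property of this sketch only.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def merge_bbox(polygons: Sequence[Sequence[Point]]) -> Optional[List[Point]]:
    """Axis-aligned rectangle covering every vertex of every polygon."""
    pts = [(float(x), float(y)) for poly in polygons for x, y in poly]
    if not pts:
        return None
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]


def merge_convex_hull(polygons: Sequence[Sequence[Point]]) -> Optional[List[Point]]:
    """Convex hull of all vertices (Andrew's monotone chain)."""
    pts = sorted({(float(x), float(y)) for poly in polygons for x, y in poly})
    if not pts:
        return None
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(points: List[Point]) -> List[Point]:
        hull: List[Point] = []
        for p in points:
            # Drop points that would make a non-left turn (collinear included).
            while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
                hull.pop()
            hull.append(p)
        return hull

    lower = half_hull(pts)
    upper = half_hull(pts[::-1])
    return lower[:-1] + upper[:-1]


a = [(0, 0), (10, 0), (10, 5), (0, 5)]
b = [(12, 2), (20, 2), (20, 8), (12, 8)]
print(merge_bbox([a, b]))
print(merge_convex_hull([a, b]))
```

The bounding box is always a four-point rectangle and may include empty corners. The convex hull follows the outline more tightly at the cost of a variable vertex count.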

manuscript.utils.order_quad_points(points)[source]

Orders exactly 4 polygon points as: top-left, top-right, bottom-right, bottom-left.

Return type:

ndarray

Parameters:

points (numpy.ndarray | Tuple[Tuple[float, float], ...])
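One widely used way to obtain this ordering is the coordinate sum and difference heuristic, sketched below in plain Python. This is an illustration of the idea only. The name `order_quad` is hypothetical, and the library may use a different method (for example, sorting by angle around the centroid).

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def order_quad(points: Sequence[Point]) -> List[Point]:
    """Order 4 points as top-left, top-right, bottom-right, bottom-left.

    Uses image coordinates (y grows downward):
    - top-left has the smallest x + y, bottom-right the largest;
    - top-right has the largest x - y, bottom-left the smallest.
    """
    if len(points) != 4:
        raise ValueError("exactly 4 points are required")
    pts = [(float(x), float(y)) for x, y in points]
    top_left = min(pts, key=lambda p: p[0] + p[1])
    bottom_right = max(pts, key=lambda p: p[0] + p[1])
    top_right = max(pts, key=lambda p: p[0] - p[1])
    bottom_left = min(pts, key=lambda p: p[0] - p[1])
    return [top_left, top_right, bottom_right, bottom_left]


print(order_quad([(100, 40), (0, 0), (0, 40), (100, 0)]))
```

The heuristic works well for roughly upright text boxes. For quadrilaterals rotated close to 45 degrees the sums or differences can tie, so a more robust implementation would need a tie-breaking rule.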

manuscript.utils.polygon_to_bbox(polygon, image_shape=None, pad=0.0)[source]

Converts a polygon with an arbitrary number of vertices into a clipped, axis-aligned bounding box.

Return type:

Optional[Tuple[int, int, int, int]]

Parameters:

polygon (numpy.ndarray | Tuple[Tuple[float, float], ...])
image_shape (Tuple[int, ...] | None)
pad (float)
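The conversion can be sketched as follows. Several details are assumptions that the signature alone does not settle: that pad is an absolute margin in pixels, that the result is ordered (x_min, y_min, x_max, y_max), that image_shape starts with (height, width), and that None is returned for empty or fully clipped boxes. `polygon_to_bbox_sketch` is a hypothetical name.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def polygon_to_bbox_sketch(
    polygon: Sequence[Point],
    image_shape: Optional[Tuple[int, ...]] = None,
    pad: float = 0.0,  # assumed: absolute padding in pixels
) -> Optional[Tuple[int, int, int, int]]:
    """Return (x_min, y_min, x_max, y_max) as ints, clipped to the image."""
    if len(polygon) == 0:
        return None
    xs = [float(x) for x, _ in polygon]
    ys = [float(y) for _, y in polygon]
    # Round outward so the box never cuts into the polygon.
    x0 = math.floor(min(xs) - pad)
    y0 = math.floor(min(ys) - pad)
    x1 = math.ceil(max(xs) + pad)
    y1 = math.ceil(max(ys) + pad)
    if image_shape is not None:
        height, width = image_shape[:2]  # assumed (H, W, ...) layout
        x0, x1 = max(0, min(x0, width)), max(0, min(x1, width))
        y0, y1 = max(0, min(y0, height)), max(0, min(y1, height))
    if x1 <= x0 or y1 <= y0:
        return None  # assumed: degenerate boxes are reported as None
    return x0, y0, x1, y1


triangle = [(10.2, 5.7), (50.9, 4.1), (48.0, 30.3)]
print(polygon_to_bbox_sketch(triangle))
print(polygon_to_bbox_sketch(triangle, image_shape=(20, 40, 3)))
```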

manuscript.utils.warp_quad(image, polygon, output_size=None, background=255)[source]

Applies a perspective transform to a quadrilateral polygon and returns the rectified crop.

The function is intentionally limited to quadrilaterals. For polygons with any other number of vertices it returns None so that the caller can choose a fallback strategy.

Return type:

Optional[ndarray]

Parameters:

image (numpy.ndarray)
polygon (numpy.ndarray | Tuple[Tuple[float, float], ...])
output_size (Tuple[int, int] | None)
background (int)
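When output_size is None, some target size has to be derived from the quadrilateral itself. The sketch below shows one common convention, taking the longer of each pair of opposite edges. Whether warp_quad does exactly this is an assumption. `estimate_warp_size` is a hypothetical helper, and it expects the points already ordered top-left, top-right, bottom-right, bottom-left.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def estimate_warp_size(quad: Sequence[Point]) -> Optional[Tuple[int, int]]:
    """Estimate (width, height) of the rectified crop for an ordered quad."""
    if len(quad) != 4:
        return None  # mirrors the "quadrilaterals only" contract
    tl, tr, br, bl = quad
    width = max(math.dist(tl, tr), math.dist(bl, br))   # top vs bottom edge
    height = max(math.dist(tl, bl), math.dist(tr, br))  # left vs right edge
    w, h = round(width), round(height)
    if w < 1 or h < 1:
        return None  # degenerate quad, nothing to warp
    return w, h


print(estimate_warp_size([(0, 0), (100, 0), (100, 40), (0, 40)]))
print(estimate_warp_size([(0, 0), (3, 4), (3, 14), (0, 10)]))
```

A caller that receives None for a non-quadrilateral polygon could, for instance, fall back to crop_polygon_mask or crop_axis_aligned, which accept arbitrary polygons.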

manuscript.utils.merge_text_spans(text_spans, method='bbox')[source]

Merge multiple TextSpan objects into a single wider TextSpan.

Parameters:

text_spans (sequence of TextSpan) – Input text spans to merge.
method ({"bbox", "convex_hull"}, optional) – Polygon merge strategy. "bbox" creates an axis-aligned rectangle covering all span polygons. "convex_hull" creates a convex hull around all polygon vertices. Default is "bbox".

Returns:

Merged text span, or None when text_spans is empty.

Return type:

TextSpan or None

manuscript.utils.collapse_line_text_spans(line, method='bbox')[source]

Collapse all text spans inside a line into a single text span.

Parameters:

line (Line) – Input line.
method ({"bbox", "convex_hull"}, optional) – Polygon merge strategy. Default is "bbox".

Returns:

New line containing one merged text span or an empty span list.

Return type:

Line

manuscript.utils.collapse_block_text_spans(block, method='bbox')[source]

Collapse all text spans inside a block into a single line with one text span.

Parameters:

block (Block) – Input block.
method ({"bbox", "convex_hull"}, optional) – Polygon merge strategy. Default is "bbox".

Returns:

New block containing a single collapsed line.

Return type:

Block

manuscript.utils.collapse_page_text_spans(page, level='line', method='bbox')[source]

Collapse narrow OCR structure into wider line-level or block-level spans.

Parameters:

page (Page) – Input page.
level ({"line", "block"}, optional) – Collapse target. "line" keeps the same block/line structure and replaces each line with one merged text span. "block" replaces each block with one line containing one merged text span. Default is "line".
method ({"bbox", "convex_hull"}, optional) – Polygon merge strategy. Default is "bbox".

Returns:

Collapsed page.

Return type:

Page

manuscript.utils.set_seed(seed=42)[source]

Set random seed for reproducibility across random, numpy, and PyTorch.

Return type:

None

Parameters:

seed (int)
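Seeding the three libraries typically looks like the sketch below. `set_seed_sketch` is a hypothetical stand-in written so that it also runs when NumPy or PyTorch is not installed. The real set_seed may set additional flags (for example, cuDNN determinism options), which this sketch does not attempt.

```python
import random


def set_seed_sketch(seed: int = 42) -> None:
    """Seed Python's random module and, if available, NumPy and PyTorch."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)  # legacy global NumPy generator
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # PyTorch not installed; nothing to seed


set_seed_sketch(7)
first = random.random()
set_seed_sketch(7)
second = random.random()
print(first == second)
```

Calling the function once at the start of a script is usually enough. Reseeding in the middle of a run resets the random streams and can make supposedly independent samples identical.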