Utilities
Utility functions for image processing, visualization, page organization, text-span collapsing, and more.
Common utilities for manuscript-ocr.
- manuscript.utils.read_image(img_or_path)
Universal image reading with support for multiple input types.
- Parameters:
img_or_path (str, Path, bytes, np.ndarray, or PIL.Image) – Image source in one of the following formats:
  - File path (str or Path); Unicode paths (e.g., Cyrillic) are supported
  - Bytes buffer (e.g., from an HTTP response)
  - NumPy array (an already loaded image)
  - PIL Image object
- Returns:
RGB image as numpy array with shape (H, W, 3) and dtype uint8.
- Return type:
np.ndarray
- Raises:
FileNotFoundError – If the image file cannot be read with either OpenCV or PIL.
TypeError – If the input type is not supported.
ValueError – If bytes cannot be decoded into an image.
Examples
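>>> from manuscript.utils import read_image
>>> # Read from a file path (Unicode paths are supported)
>>> img = read_image("путь/к/изображению.jpg")  # Cyrillic for "path/to/image.jpg"
>>> img.shape
(480, 640, 3)
>>> # Read from bytes
>>> with open("image.jpg", "rb") as f:
...     img = read_image(f.read())
>>> # Read from a PIL Image
>>> from PIL import Image
>>> pil_img = Image.open("image.jpg")
>>> img = read_image(pil_img)
>>> # Pass through a NumPy array
>>> import numpy as np
>>> existing_array = np.zeros((480, 640, 3), dtype=np.uint8)
>>> img = read_image(existing_array)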
- manuscript.utils.create_page_from_text(lines, confidence=1.0)
Create a Page object from a list of text lines.
This utility function creates a simple Page structure from raw text, useful for testing correctors or other text processing components without requiring actual OCR detection/recognition.
Each input string becomes a Line object whose text spans are its whitespace-separated tokens. The text spans are assigned dummy polygon coordinates so that the result stays compatible with the data structures.
- Parameters:
- Returns:
Page object with one Block containing the provided lines.
- Return type:
Page
Examples
>>> from manuscript.utils import create_page_from_text
>>> page = create_page_from_text(["Hello world", "This is a test"])
>>> page.blocks[0].lines[0].text_spans[0].text
'Hello'
>>> len(page.blocks[0].lines)
2
Use with corrector:
>>> from manuscript.correctors import CharLM
>>> from manuscript.utils import create_page_from_text
>>>
>>> # Create page from text with potential OCR errors
>>> # ("Привѣтъ міръ" is "Hello world" in pre-reform Russian spelling)
>>> page = create_page_from_text(["Привѣтъ міръ"])
>>>
>>> # Apply correction
>>> corrector = CharLM()
>>> corrected = corrector.predict(page)
>>>
>>> # Get corrected text
>>> for line in corrected.blocks[0].lines:
...     print(" ".join(span.text for span in line.text_spans))
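The structure described above (one Block, one Line per input string, one span per whitespace-separated token, placeholder polygons) can be sketched in plain Python. The `Page`, `Block`, `Line`, and `TextSpan` dataclasses here are hypothetical stand-ins rather than the `manuscript.data` classes, and both the dummy-box layout and attaching `confidence` to every span are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]


# Hypothetical stand-ins for the manuscript.data classes (illustration only).
@dataclass
class TextSpan:
    text: str
    polygon: List[Point]
    confidence: float = 1.0


@dataclass
class Line:
    text_spans: List[TextSpan] = field(default_factory=list)


@dataclass
class Block:
    lines: List[Line] = field(default_factory=list)


@dataclass
class Page:
    blocks: List[Block] = field(default_factory=list)


def page_from_text_sketch(lines: List[str], confidence: float = 1.0) -> Page:
    """Build one Block with one Line per string and one span per word."""
    block = Block()
    for row, text in enumerate(lines):
        line = Line()
        for col, word in enumerate(text.split()):
            # Assumed dummy layout: unit box at column `col`, row `row`.
            x0, y0 = float(col), float(row)
            polygon = [(x0, y0), (x0 + 1, y0), (x0 + 1, y0 + 1), (x0, y0 + 1)]
            line.text_spans.append(TextSpan(word, polygon, confidence))
        block.lines.append(line)
    return Page(blocks=[block])


page = page_from_text_sketch(["Hello world", "This is a test"])
print(page.blocks[0].lines[0].text_spans[0].text)  # Hello
print(len(page.blocks[0].lines))                   # 2
```

Because every span gets a valid (if meaningless) polygon, downstream code that only reads `text` can run without any geometry from a detector.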
- manuscript.utils.visualize_page(image, page, color=(0, 255, 0), thickness=2, show_order=True, show_lines=False, show_numbers=False, line_color=(255, 165, 0), number_bg=(255, 255, 255), number_color=(0, 0, 0), max_size=4096)
Visualize a Page object with detected text spans/blocks.
This function draws all text spans from the Page structure on the image, optionally showing reading order with numbered markers and connecting lines. When show_order=True, it also visualizes blocks with semi-transparent bounding boxes, each block having a distinct color.
- Parameters:
image (str, Path, np.ndarray, or PIL.Image) – Input image. Can be:
  - Path to an image file (str or Path); Unicode paths are supported
  - RGB NumPy array with shape (H, W, 3)
  - PIL Image object
page (Page) – Page object from manuscript.data containing detected blocks/text spans.
color (tuple of int, default=(0, 255, 0)) – RGB color for text span boundaries.
thickness (int, default=2) – Line thickness for text span boundaries.
show_order (bool, default=True) – If True, draws each text line in a different color and overlays semi-transparent block boundaries, with a distinct color per block.
show_lines (bool, default=False) – If True and show_order=True, draw connecting lines between consecutive text spans showing the reading sequence.
show_numbers (bool, default=False) – If True and show_order=True, display numbered markers on each text span showing the reading order.
line_color (tuple of int, default=(255, 165, 0)) – RGB color for connecting lines between text spans.
number_bg (tuple of int, default=(255, 255, 255)) – RGB background color for the order-number boxes.
number_color (tuple of int, default=(0, 0, 0)) – RGB text color for the order numbers.
max_size (int or None, default=4096) – Maximum size for the longer dimension of the output image. Image will be resized proportionally if larger. Set to None to keep original size.
- Returns:
Visualized image with detection boxes and optional reading order annotations. When show_order=True, also includes semi-transparent block boundaries.
- Return type:
PIL.Image.Image
Examples
Basic visualization without reading order:
>>> from manuscript import EAST
>>> from manuscript.utils import visualize_page
>>> detector = EAST()
>>> page = detector.predict("document.jpg")
>>> # Can pass path directly; show_order defaults to True, so disable it here
>>> vis = visualize_page("document.jpg", page, show_order=False)
>>> vis.save("output.jpg")
Visualization with reading order and block boundaries:
>>> # Can also use numpy array or PIL Image
>>> from manuscript.utils import read_image
>>> img = read_image("document.jpg")
>>> vis = visualize_page(
...     img,
...     page,
...     show_order=True,
...     color=(255, 0, 0),
...     thickness=3
... )
Show connecting lines and numbers between text spans:
>>> vis = visualize_page(
...     "document.jpg",
...     page,
...     show_order=True,
...     show_lines=True,
...     show_numbers=True
... )
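The `max_size` rule (scale proportionally only when the longer side exceeds the limit; `None` disables resizing) comes down to a little arithmetic. The helper `fit_within` below is hypothetical and the rounding choice is an assumption; the library may round differently.

```python
from typing import Optional, Tuple


def fit_within(width: int, height: int, max_size: Optional[int]) -> Tuple[int, int]:
    """Return (new_width, new_height) so that the longer side is at most max_size."""
    if max_size is None:
        return width, height  # None keeps the original size
    longer = max(width, height)
    if longer <= max_size:
        return width, height  # only images larger than max_size are resized
    scale = max_size / longer
    # Same scale on both axes preserves the aspect ratio; keep at least 1 px.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(6000, 4000, 4096))  # (4096, 2731)
print(fit_within(1200, 800, 4096))   # (1200, 800)
```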
- manuscript.utils.organize_page(page, max_splits=10, use_columns=True)
Compatibility wrapper around the SimpleSorting layout model.
- manuscript.utils.crop_axis_aligned(image, polygon, pad=0.0)
Crop an axis-aligned rectangle covering the polygon.
- Return type:
- Parameters:
image (numpy.ndarray)
polygon (numpy.ndarray | Tuple[Tuple[float, float], ...])
pad (float)
- manuscript.utils.crop_polygon_mask(image, polygon, pad=0.0, background=255)
Crop the polygon bounding box and mask pixels outside the polygon.
Works with arbitrary polygons of shape (N, 2).
- Return type:
- Parameters:
image (numpy.ndarray)
polygon (numpy.ndarray | Tuple[Tuple[float, float], ...])
pad (float)
background (int)
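Masking the pixels outside an arbitrary polygon can be illustrated with an even-odd (ray-casting) point-in-polygon test applied to pixel centers. This sketch operates on a grayscale image stored as nested lists and ignores `pad`; it illustrates the technique and is not the library's implementation, which takes a `numpy.ndarray`.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Even-odd rule: count edges crossed by a ray cast to the right of (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def crop_polygon_mask_sketch(image: List[List[int]], polygon: Sequence[Point],
                             background: int = 255) -> List[List[int]]:
    """Crop the polygon's bounding box; pixels outside the polygon get `background`."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    h, w = len(image), len(image[0])
    x0, x1 = max(0, int(min(xs))), min(w, int(max(xs)) + 1)
    y0, y1 = max(0, int(min(ys))), min(h, int(max(ys)) + 1)
    crop = []
    for y in range(y0, y1):
        row = []
        for x in range(x0, x1):
            # Test the pixel center so every pixel gets a single in/out decision.
            keep = point_in_polygon(x + 0.5, y + 0.5, polygon)
            row.append(image[y][x] if keep else background)
        crop.append(row)
    return crop


img = [[7] * 4 for _ in range(4)]
out = crop_polygon_mask_sketch(img, [(0, 0), (4, 0), (0, 4)])
for row in out:
    print(row)
# [7, 7, 7, 255]
# [7, 7, 255, 255]
# [7, 255, 255, 255]
# [255, 255, 255, 255]
```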
- manuscript.utils.merge_polygons(polygons, method='bbox')
Merge multiple polygons into a single polygon.
- Parameters:
polygons (sequence of array-like polygons) – Input polygons, each with shape (N, 2).
method ({"bbox", "convex_hull"}, optional) – Merge strategy. "bbox" returns an axis-aligned rectangle covering all points. "convex_hull" returns the convex hull of all points. Default is "bbox".
- Returns:
Merged polygon, or None when polygons is empty.
- Return type:
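Both merge strategies can be sketched in pure Python: "bbox" takes the min/max over all vertices, and "convex_hull" is shown here with Andrew's monotone chain algorithm. The function name and the list-of-tuples polygon representation are assumptions for illustration; the library accepts array-like polygons and may compute the hull differently or start and orient its vertices differently.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def _cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); > 0 means a left turn (y-up frame)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def merge_polygons_sketch(polygons: Sequence[Sequence[Point]],
                          method: str = "bbox") -> Optional[List[Point]]:
    points = [tuple(p) for poly in polygons for p in poly]
    if not points:
        return None  # mirrors "None when polygons is empty"
    if method == "bbox":
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        x1, y1, x2, y2 = min(xs), min(ys), max(xs), max(ys)
        return [(x1, y1), (x2, y1), (x2, y2), (x1, y2)]
    if method == "convex_hull":
        # Monotone chain: build lower and upper hulls over x-sorted points.
        pts = sorted(set(points))
        if len(pts) <= 2:
            return pts
        lower: List[Point] = []
        for p in pts:
            while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
                lower.pop()
            lower.append(p)
        upper: List[Point] = []
        for p in reversed(pts):
            while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
                upper.pop()
            upper.append(p)
        # Drop each chain's last point (it starts the other chain).
        return lower[:-1] + upper[:-1]
    raise ValueError(f"unknown method: {method!r}")


a = [(0, 0), (2, 0), (2, 1), (0, 1)]
b = [(3, 2), (5, 2), (5, 3), (3, 3)]
print(merge_polygons_sketch([a, b]))                 # [(0, 0), (5, 0), (5, 3), (0, 3)]
print(merge_polygons_sketch([a, b], "convex_hull"))  # [(0, 0), (2, 0), (5, 2), (5, 3), (3, 3), (0, 1)]
```

The hull hugs the two boxes more tightly than the bounding rectangle, which matters when merged regions are diagonal or staggered.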
- manuscript.utils.order_quad_points(points)
Order exactly 4 polygon points as top-left, top-right, bottom-right, bottom-left.
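A common way to order four corners is the sum/difference heuristic in image coordinates (y grows downward): top-left has the smallest x + y, bottom-right the largest, top-right the smallest y - x, and bottom-left the largest. Whether `order_quad_points` uses this exact rule is an assumption; the sketch is only an illustration, and the heuristic can become ambiguous for quads rotated close to 45 degrees.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def order_quad_points_sketch(points: Sequence[Point]) -> List[Point]:
    """Return [top-left, top-right, bottom-right, bottom-left]."""
    if len(points) != 4:
        raise ValueError("expected exactly 4 points")
    pts = [tuple(p) for p in points]
    # Image coordinates: x grows to the right, y grows downward.
    tl = min(pts, key=lambda p: p[0] + p[1])  # smallest x + y
    br = max(pts, key=lambda p: p[0] + p[1])  # largest x + y
    tr = min(pts, key=lambda p: p[1] - p[0])  # smallest y - x
    bl = max(pts, key=lambda p: p[1] - p[0])  # largest y - x
    return [tl, tr, br, bl]


print(order_quad_points_sketch([(90, 40), (10, 10), (12, 45), (95, 8)]))
# [(10, 10), (95, 8), (90, 40), (12, 45)]
```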
- manuscript.utils.polygon_to_bbox(polygon, image_shape=None, pad=0.0)
Convert a polygon with any number of vertices to a clipped axis-aligned bounding box.
- Parameters:
- Returns:
Bounding box as (x1, y1, x2, y2), or None if invalid.
- Return type:
tuple or None
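One plausible reading of "clipped axis-aligned bounding box" is sketched below. Several details are assumptions, not documented behavior: `pad` is treated as an absolute margin in pixels, `image_shape` is NumPy-style `(H, W[, C])`, coordinates are rounded outward to integers, and "invalid" means an empty polygon or an empty box after clipping.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def polygon_to_bbox_sketch(polygon: Sequence[Point],
                           image_shape: Optional[Tuple[int, ...]] = None,
                           pad: float = 0.0) -> Optional[Tuple[int, int, int, int]]:
    if len(polygon) == 0:
        return None
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    # Assumption: pad is an absolute margin added on every side, rounded outward.
    x1 = math.floor(min(xs) - pad)
    y1 = math.floor(min(ys) - pad)
    x2 = math.ceil(max(xs) + pad)
    y2 = math.ceil(max(ys) + pad)
    if image_shape is not None:
        h, w = image_shape[0], image_shape[1]  # NumPy-style (H, W[, C])
        x1, x2 = max(0, x1), min(w, x2)
        y1, y2 = max(0, y1), min(h, y2)
    if x2 <= x1 or y2 <= y1:
        return None  # nothing left after clipping
    return x1, y1, x2, y2


quad = [(10.2, 5.5), (50.7, 6.0), (49.9, 20.1), (9.8, 19.4)]
print(polygon_to_bbox_sketch(quad, image_shape=(18, 40, 3), pad=2))  # (7, 3, 40, 18)
```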
- manuscript.utils.warp_quad(image, polygon, output_size=None, background=255)
Perspective-warp a quadrilateral polygon into a rectified crop.
This helper is intentionally quad-specific. For non-quad polygons it returns None so that callers can choose a fallback strategy.
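Rectifying a quad amounts to solving for the 3x3 homography that sends its four corners to an upright rectangle. The pure-Python sketch below shows that step together with the quad-only guard. It assumes the corners are already in TL, TR, BR, BL order (as `order_quad_points` produces), that `output_size` is `(width, height)`, and that the default size is taken from the longer of each pair of opposite edges; these are illustrative assumptions. The pixel resampling itself, which OpenCV's `cv2.warpPerspective` would perform, is omitted.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve a small dense system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]  # ZeroDivisionError for degenerate quads
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography(src: Sequence[Point], dst: Sequence[Point]) -> Matrix:
    """3x3 matrix H (with H[2][2] = 1) mapping each src point onto its dst point."""
    a: Matrix = []
    b: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def project(hm: Matrix, p: Point) -> Point:
    """Apply a homography to one point, including the homogeneous divide."""
    x, y = p
    d = hm[2][0] * x + hm[2][1] * y + hm[2][2]
    return ((hm[0][0] * x + hm[0][1] * y + hm[0][2]) / d,
            (hm[1][0] * x + hm[1][1] * y + hm[1][2]) / d)


def quad_rectification(polygon: Sequence[Point],
                       output_size: Optional[Tuple[int, int]] = None
                       ) -> Optional[Tuple[Matrix, Tuple[int, int]]]:
    """Return (H, (w, h)) sending the quad onto an upright w x h rectangle, or None."""
    if len(polygon) != 4:
        return None  # non-quad: the caller decides on a fallback (e.g. a bbox crop)
    tl, tr, br, bl = polygon  # assumed order: TL, TR, BR, BL
    if output_size is None:
        # Assumed default: the longer of each pair of opposite edges.
        w = round(max(math.dist(tl, tr), math.dist(bl, br)))
        h = round(max(math.dist(tl, bl), math.dist(tr, br)))
    else:
        w, h = output_size  # assumed (width, height)
    rect = [(0, 0), (w - 1, 0), (w - 1, h - 1), (0, h - 1)]
    return homography(polygon, rect), (w, h)


H, size = quad_rectification([(10, 10), (110, 20), (105, 70), (5, 60)])
print(size)                   # (100, 50)
print(project(H, (110, 20)))  # approximately (99.0, 0.0)
```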
- manuscript.utils.merge_text_spans(text_spans, method='bbox')
Merge multiple TextSpan objects into a single wider TextSpan.
- Parameters:
text_spans (sequence of TextSpan) – Input text spans to merge.
method ({"bbox", "convex_hull"}, optional) – Polygon merge strategy. "bbox" creates an axis-aligned rectangle covering all span polygons. "convex_hull" creates a convex hull around all polygon vertices. Default is "bbox".
- Returns:
Merged text span, or None when text_spans is empty.
- Return type:
TextSpan or None
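A merged span needs both a merged polygon and merged text. The sketch below shows only the "bbox" strategy, using a hypothetical `Span` dataclass in place of the real `TextSpan`; joining texts with single spaces in input order is an assumption, and any confidence handling is left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


@dataclass
class Span:
    """Hypothetical stand-in for manuscript's TextSpan (text + polygon only)."""
    text: str
    polygon: List[Point]


def merge_spans_sketch(spans: Sequence[Span]) -> Optional[Span]:
    """Bbox-style merge: one span covering all inputs, texts joined by spaces."""
    if not spans:
        return None  # mirrors "None when text_spans is empty"
    points = [p for s in spans for p in s.polygon]
    x1 = min(p[0] for p in points)
    y1 = min(p[1] for p in points)
    x2 = max(p[0] for p in points)
    y2 = max(p[1] for p in points)
    text = " ".join(s.text for s in spans)  # assumption: keep input order
    return Span(text, [(x1, y1), (x2, y1), (x2, y2), (x1, y2)])


merged = merge_spans_sketch([
    Span("Hello", [(0, 0), (40, 0), (40, 12), (0, 12)]),
    Span("world", [(46, 1), (90, 1), (90, 13), (46, 13)]),
])
print(merged.text)     # Hello world
print(merged.polygon)  # [(0, 0), (90, 0), (90, 13), (0, 13)]
```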
- manuscript.utils.collapse_line_text_spans(line, method='bbox')
Collapse all text spans inside a line into a single text span.
- manuscript.utils.collapse_block_text_spans(block, method='bbox')
Collapse all text spans inside a block into a single line with one text span.
- manuscript.utils.collapse_page_text_spans(page, level='line', method='bbox')
Collapse fine-grained OCR text spans into wider line-level or block-level spans.
- Parameters:
page (Page) – Input page.
level ({"line", "block"}, optional) – Collapse target. "line" keeps the same block/line structure and reduces each line to a single merged text span. "block" replaces each block with one line containing one merged text span. Default is "line".
method ({"bbox", "convex_hull"}, optional) – Polygon merge strategy. Default is "bbox".
- Returns:
Collapsed page.
- Return type:
Page
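The difference between `level="line"` and `level="block"` can be sketched with hypothetical dataclasses standing in for the `manuscript.data` classes. The bbox merge, space-joined text, dropping of empty lines, and returning a new page instead of mutating the input are all assumptions of this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]


# Hypothetical stand-ins for the manuscript.data classes (illustration only).
@dataclass
class Span:
    text: str
    polygon: List[Point]


@dataclass
class Line:
    text_spans: List[Span] = field(default_factory=list)


@dataclass
class Block:
    lines: List[Line] = field(default_factory=list)


@dataclass
class Page:
    blocks: List[Block] = field(default_factory=list)


def box(x1: float, y1: float, x2: float, y2: float) -> List[Point]:
    return [(x1, y1), (x2, y1), (x2, y2), (x1, y2)]


def _merge(spans: List[Span]) -> Span:
    pts = [p for s in spans for p in s.polygon]
    return Span(" ".join(s.text for s in spans),
                box(min(p[0] for p in pts), min(p[1] for p in pts),
                    max(p[0] for p in pts), max(p[1] for p in pts)))


def collapse_sketch(page: Page, level: str = "line") -> Page:
    new_blocks = []
    for block in page.blocks:
        lines = [ln for ln in block.lines if ln.text_spans]  # skip empty lines
        if level == "line":
            new_lines = [Line([_merge(ln.text_spans)]) for ln in lines]
        elif level == "block":
            spans = [s for ln in lines for s in ln.text_spans]
            new_lines = [Line([_merge(spans)])] if spans else []
        else:
            raise ValueError(f"unknown level: {level!r}")
        new_blocks.append(Block(new_lines))
    return Page(new_blocks)


page = Page([Block([
    Line([Span("Dear", box(0, 0, 30, 10)), Span("Sir", box(35, 0, 55, 10))]),
    Line([Span("Yours", box(0, 15, 40, 25))]),
])])
by_line = collapse_sketch(page, "line")
by_block = collapse_sketch(page, "block")
print([ln.text_spans[0].text for ln in by_line.blocks[0].lines])  # ['Dear Sir', 'Yours']
print(by_block.blocks[0].lines[0].text_spans[0].text)              # Dear Sir Yours
```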