Data Structures

Core data structures for representing OCR results.

Data Model

The following diagram shows the relationships between data structures:

        graph LR

    %% Entities
    Page[Page]
    Block[Block]
    Line[Line]
    TextSpan[TextSpan]

    %% Relations
    Page -->|"blocks: List[Block]"| Block
    Block -->|"lines: List[Line]"| Line
    Line -->|"text_spans: List[TextSpan]"| TextSpan

    %% TextSpan fields
    TextSpan --> Tpoly["polygon: List[(x, y)]<br>≥ 4 points, clockwise"]
    TextSpan --> Tdet["detection_confidence: float (0–1)"]
    TextSpan --> Ttext["text: Optional[str]"]
    TextSpan --> Trec["recognition_confidence: Optional[float] (0–1)"]
    TextSpan --> Torder["order: Optional[int]<br>assigned after sorting"]

    %% Line fields
    Line --> LineOrder["order: Optional[int]<br>assigned after sorting"]

    %% Block fields
    Block --> BlockOrder["order: Optional[int]<br>assigned after sorting"]
    Block --> FlatInput["text_spans: List[TextSpan]<br>optional flat input"]
    

Compatibility

The canonical names in v0_1_11 are TextSpan and text_spans. For code and services that still target v0_1_10, Word and words remain available as compatibility aliases on import, validation, and Python attribute access.

When exporting OCR results, choose the schema explicitly:

page.to_dict(schema="v0_1_11")
page.to_json("result.json", schema="v0_1_10")

Use "v0_1_10" only for legacy JSON consumers. New integrations should prefer "v0_1_11".
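
To illustrate what the two schema names mean for a JSON consumer, the sketch below renames the text_spans key to words throughout a nested dictionary. It assumes the legacy schema differs only in that key name, which this page does not confirm. to_legacy_keys is a hypothetical helper and not part of manuscript-ocr; page.to_dict(schema="v0_1_10") remains the supported route.

```python
from typing import Any


def to_legacy_keys(node: Any) -> Any:
    """Recursively rename 'text_spans' to 'words' (assumed legacy key name)."""
    if isinstance(node, dict):
        return {
            ("words" if key == "text_spans" else key): to_legacy_keys(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [to_legacy_keys(item) for item in node]
    return node  # scalars are returned unchanged


# Hand-written dictionary shaped like the data model above (assumption).
new_style = {"blocks": [{"lines": [{"text_spans": [{"text": "Hello"}]}]}]}
legacy_style = to_legacy_keys(new_style)
```

The helper builds a new structure and leaves its input untouched, so both shapes can be kept side by side while a consumer migrates.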

Module Reference

Data structures for manuscript OCR.

This package contains the core data structures used to represent OCR results throughout the manuscript-ocr library.

class manuscript.data.TextSpan(*args, **kwargs)

Bases: BaseModel

A single detected or recognized text span.

A text span is the smallest OCR region in the pipeline. It may correspond to a word, a whole text line, or any other contiguous text segment returned by a detector.

polygon

Polygon vertices (x, y), ordered clockwise. The public data model supports arbitrary polygons with 4 or more points. For quadrilateral text regions, the canonical order is TL -> TR -> BR -> BL (Top-Left, Top-Right, Bottom-Right, Bottom-Left).

Type:

List[Tuple[float, float]]
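
The polygon constraints above (at least 4 points, clockwise order) can be checked with the shoelace formula. The sketch below is not the library's validator. It assumes image coordinates in which y grows downward, as the examples on this page suggest, so a positive signed area means the points run clockwise on screen. signed_area and is_valid_polygon are hypothetical names.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def signed_area(polygon: List[Point]) -> float:
    """Shoelace formula; positive means clockwise when y grows downward."""
    total = 0.0
    for i, (x1, y1) in enumerate(polygon):
        x2, y2 = polygon[(i + 1) % len(polygon)]  # wrap around to the first point
        total += x1 * y2 - x2 * y1
    return total / 2.0


def is_valid_polygon(polygon: List[Point]) -> bool:
    """At least 4 vertices and clockwise orientation (assumed y-down axes)."""
    return len(polygon) >= 4 and signed_area(polygon) > 0


quad = [(10, 20), (100, 20), (100, 40), (10, 40)]  # TL, TR, BR, BL
```

Reversing the vertex list flips the sign of the area, which is how a counter-clockwise polygon would be detected under these assumptions.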

detection_confidence

Text detection confidence score from the detector (0.0 to 1.0).

Type:

float

text

Recognized text content, populated by the OCR pipeline. None if only detection was performed.

Type:

str, optional

recognition_confidence

Text recognition confidence score from the recognizer (0.0 to 1.0). None if only detection was performed.

Type:

float, optional

order

Text span position inside the line after sorting. None before sorting.

Type:

int, optional
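
This page does not describe the sorting step that fills in order. As a minimal sketch, assuming a left-to-right script and 0-based numbering (both assumptions), spans could be sorted by their leftmost x coordinate and numbered in that sequence. assign_order is a hypothetical helper that works on plain dictionaries rather than TextSpan objects.

```python
from typing import Dict, List


def assign_order(spans: List[Dict]) -> List[Dict]:
    """Sort spans by leftmost x coordinate and number them from 0."""
    ordered = sorted(spans, key=lambda s: min(x for x, _ in s["polygon"]))
    for index, span in enumerate(ordered):
        span["order"] = index  # None before sorting, an int afterwards
    return ordered


spans = [
    {"text": "World", "polygon": [(60, 20), (110, 20), (110, 40), (60, 40)], "order": None},
    {"text": "Hello", "polygon": [(10, 20), (50, 20), (50, 40), (10, 40)], "order": None},
]
ordered = assign_order(spans)
```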

Examples

>>> text_span = TextSpan(
...     polygon=[(10, 20), (100, 20), (100, 40), (10, 40)],
...     detection_confidence=0.95,
...     text="Hello",
...     recognition_confidence=0.98
... )
>>> print(text_span.text)
Hello

Methods

__call__(*args, **kwargs)

Call self as a function.

detection_confidence

model_config

order

polygon

recognition_confidence

text

polygon: List[Tuple[float, float]] = Ellipsis
detection_confidence: float = Ellipsis
text: str | None = None
recognition_confidence: float | None = None
order: int | None = None

class manuscript.data.Line(*args, **kwargs)

Bases: BaseModel

A single text line containing one or more text spans.

text_spans

List of text spans in the line.

Type:

List[TextSpan]

order

Line position inside a block or page after sorting. None before sorting.

Type:

int, optional

Examples

>>> line = Line(text_spans=[
...     TextSpan(
...         polygon=[(10, 20), (50, 20), (50, 40), (10, 40)],
...         detection_confidence=0.95,
...         text="Hello",
...     ),
...     TextSpan(
...         polygon=[(60, 20), (110, 20), (110, 40), (60, 40)],
...         detection_confidence=0.97,
...         text="World",
...     ),
... ])
>>> print(len(line.text_spans))
2

Attributes:
words

Backward-compatible alias for text_spans.

Methods

__call__(*args, **kwargs)

Call self as a function.

model_config

order

text_spans

order: int | None = None

property words: List[TextSpan]

Backward-compatible alias for text_spans.
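
A read-only alias of this kind is commonly written as a property that returns the canonical attribute. The sketch below shows that pattern with a standard-library dataclass. AliasedLine is a hypothetical stand-in, not the library's Line, and it covers attribute access only, not the import or validation aliases mentioned in the Compatibility section.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AliasedLine:
    """Hypothetical stand-in for Line; not the library's implementation."""

    text_spans: List[str] = field(default_factory=list)

    @property
    def words(self) -> List[str]:
        # Read-only alias: returns the same list object as text_spans.
        return self.text_spans


line = AliasedLine(text_spans=["Hello", "World"])
```

Because the property returns the same list object, legacy code that reads line.words sees every change made through line.text_spans.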

class manuscript.data.Block(*args, **kwargs)

Bases: BaseModel

A logical text block (e.g., paragraph, column).

lines

List of text lines in the block.

Type:

List[Line]

text_spans

Optional flat list of text spans used as a shorthand input. If lines is empty and text_spans are provided, they are wrapped into a single line.

Type:

List[TextSpan], optional
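
The wrapping rule for flat text_spans can be sketched with standard-library dataclasses. LineSketch and BlockSketch are hypothetical stand-ins, not the library's classes. This page does not say what happens when both lines and text_spans are supplied, so the sketch simply leaves lines untouched in that case.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LineSketch:
    text_spans: List[str]


@dataclass
class BlockSketch:
    lines: List[LineSketch] = field(default_factory=list)
    text_spans: Optional[List[str]] = None

    def __post_init__(self) -> None:
        # Wrap flat spans into a single line only when no lines were given.
        if not self.lines and self.text_spans:
            self.lines = [LineSketch(text_spans=list(self.text_spans))]


flat = BlockSketch(text_spans=["Hello", "World"])
nested = BlockSketch(lines=[LineSketch(["Line 1"])], text_spans=["extra"])
```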

order

Block reading-order position after sorting. None before sorting.

Type:

int, optional

Examples

>>> block = Block(lines=[
...     Line(text_spans=[
...         TextSpan(
...             polygon=[(10, 20), (50, 20), (50, 40), (10, 40)],
...             detection_confidence=0.95,
...             text="Line 1",
...         )
...     ]),
...     Line(text_spans=[
...         TextSpan(
...             polygon=[(10, 50), (50, 50), (50, 70), (10, 70)],
...             detection_confidence=0.97,
...             text="Line 2",
...         )
...     ]),
... ])
>>> print(len(block.lines))
2

Attributes:
words

Backward-compatible alias for flat text_spans input.

Methods

__call__(*args, **kwargs)

Call self as a function.

lines

model_config

order

text_spans

order: int | None = None

__init__(**data)

Initialize Block, normalizing flat text_spans into one line.

property words: List[TextSpan]

Backward-compatible alias for flat text_spans input.

class manuscript.data.Page(*args, **kwargs)

Bases: BaseModel

A document page containing blocks of text.

For a full visual diagram of the data model, see DATA_MODEL.md in the same module directory.

blocks

List of text blocks on the page.

Type:

List[Block]

Examples

>>> page = Page(blocks=[
...     Block(lines=[
...         Line(text_spans=[
...             TextSpan(
...                 polygon=[(10, 20), (50, 20), (50, 40), (10, 40)],
...                 detection_confidence=0.95,
...                 text="Hello",
...             )
...         ])
...     ])
... ])
>>> print(len(page.blocks))
1
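
Walking the Page -> Block -> Line -> TextSpan hierarchy is a common first step for consumers. The sketch below collects plain text from a dictionary whose keys mirror the field names documented here. That shape is an assumption about the exported form, and page_text is a hypothetical helper. It also assumes the spans are already in reading order.

```python
from typing import Any, Dict, List


def page_text(page: Dict[str, Any]) -> str:
    """Join span texts with spaces inside a line and newlines between lines."""
    rows: List[str] = []
    for block in page.get("blocks", []):
        for line in block.get("lines", []):
            # Skip spans without recognized text (detection-only results).
            texts = [s["text"] for s in line.get("text_spans", []) if s.get("text")]
            rows.append(" ".join(texts))
    return "\n".join(rows)


page_dict = {
    "blocks": [
        {"lines": [
            {"text_spans": [{"text": "Hello"}, {"text": "World"}]},
            {"text_spans": [{"text": None}, {"text": "again"}]},
        ]}
    ]
}
```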

Methods

__call__(*args, **kwargs)

Call self as a function.

from_json(source)

Load Page from JSON file or string.

to_dict([schema])

Export Page to a plain Python dictionary.

to_json([path, indent, schema])

Export Page to JSON.

model_config

blocks: List[Block]

to_dict(schema='v0_1_11')

Export Page to a plain Python dictionary.

Parameters:

schema ({"v0_1_11", "v0_1_10"}, optional) – Output schema version. Default is "v0_1_11".

Return type:

Dict[str, Any]

to_json(path=None, indent=2, schema='v0_1_11')

Export Page to JSON.

Parameters:
  • path (str or Path, optional) – If provided, saves JSON to file.

  • indent (int, optional) – JSON indentation. Default is 2.

  • schema ({"v0_1_11", "v0_1_10"}, optional) – Output schema version. Default is "v0_1_11".

Returns:

JSON string representation.

Return type:

str

Examples

>>> page.to_json("result.json")  # save to file
>>> json_str = page.to_json()    # get as string
>>> legacy_json = page.to_json(schema="v0_1_10")

classmethod from_json(source)

Load Page from JSON file or string.

Parameters:

source (str or Path) – Path to JSON file or JSON string.

Returns:

Loaded Page object.

Return type:

Page

Examples

>>> page = Page.from_json("result.json")
>>> page = Page.from_json('{"blocks": [...]}')
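
This page does not say how from_json tells a file path from a JSON string. One plausible rule, shown here purely as an assumption, is to treat a string whose first non-blank character is "{" as JSON text and anything else as a path. load_page_dict is a hypothetical helper that uses only the standard library and returns a plain dictionary rather than a Page.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Union


def load_page_dict(source: Union[str, Path]) -> Dict[str, Any]:
    """Parse JSON text directly, otherwise read the JSON from a file path."""
    if isinstance(source, str) and source.lstrip().startswith("{"):
        return json.loads(source)  # looks like a JSON object literal
    return json.loads(Path(source).read_text(encoding="utf-8"))


# Exercise both input forms with a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "result.json"
    path.write_text('{"blocks": []}', encoding="utf-8")
    from_file = load_page_dict(path)

from_string = load_page_dict('{"blocks": []}')
```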