Data Structures

Core data structures for representing OCR results.

Data Model

The following diagram shows the relationships between data structures:

    graph LR

    %% Entities
    Page[Page]
    Block[Block]
    Line[Line]
    Word[Word]

    %% Relations
    Page -->|"blocks: List[Block]"| Block
    Block -->|"lines: List[Line]"| Line
    Line -->|"words: List[Word]"| Word

    %% Word fields
    Word --> Wpoly["polygon: List[(x, y)]<br>≥ 4 points, clockwise"]
    Word --> Wdet["detection_confidence: float (0–1)"]
    Word --> Wtext["text: Optional[str]"]
    Word --> Wrec["recognition_confidence: Optional[float] (0–1)"]
    Word --> WordOrder["order: Optional[int]<br>assigned after sorting"]

    %% Line fields
    Line --> LineOrder["order: Optional[int]<br>assigned after sorting"]

    %% Block fields
    Block --> BlockOrder["order: Optional[int]<br>assigned after sorting"]
    
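The hierarchy in the diagram can be walked with nested loops. The sketch below is illustrative only: it uses small dataclass stand-ins that mirror the field names shown above (they are not the library's pydantic models) so that it runs without manuscript-ocr installed, and `iter_words` and `page_text` are hypothetical helper names.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple

# Stand-ins mirroring the documented fields; the real classes live in manuscript.data.
@dataclass
class Word:
    polygon: List[Tuple[float, float]]
    detection_confidence: float
    text: Optional[str] = None
    recognition_confidence: Optional[float] = None
    order: Optional[int] = None

@dataclass
class Line:
    words: List[Word] = field(default_factory=list)
    order: Optional[int] = None

@dataclass
class Block:
    lines: List[Line] = field(default_factory=list)
    order: Optional[int] = None

@dataclass
class Page:
    blocks: List[Block] = field(default_factory=list)

def iter_words(page: Page) -> Iterator[Word]:
    """Yield every word in stored order: blocks, then lines, then words."""
    for block in page.blocks:
        for line in block.lines:
            yield from line.words

def page_text(page: Page) -> str:
    """Join words with spaces and lines with newlines, skipping words without text."""
    rows = []
    for block in page.blocks:
        for line in block.lines:
            rows.append(" ".join(w.text for w in line.words if w.text is not None))
    return "\n".join(rows)

box = [(10, 20), (50, 20), (50, 40), (10, 40)]
page = Page(blocks=[Block(lines=[
    Line(words=[Word(box, 0.95, "Hello"), Word(box, 0.97, "World")]),
    Line(words=[Word(box, 0.90)]),  # detection only, no text yet
])])
```

With the real models the same loops should apply unchanged, since they rely only on the `blocks`, `lines`, `words`, and `text` attributes.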

API Reference

Data structures for manuscript OCR.

This package contains the core data structures used to represent OCR results throughout the manuscript-ocr library.

class manuscript.data.Word(*args, **kwargs)

Bases: BaseModel

A single detected or recognized word.

polygon

Polygon vertices (x, y), ordered clockwise. For quadrilateral text regions: TL → TR → BR → BL (Top-Left, Top-Right, Bottom-Right, Bottom-Left).

Type:

List[Tuple[float, float]]
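A polygon in this format can be checked with plain Python. The helpers below are a minimal sketch, not part of the library: the names `signed_area`, `is_clockwise`, and `bounding_box` are hypothetical, and the winding test assumes image coordinates in which y grows downward (as the TL → TR → BR → BL ordering suggests).

```python
from typing import List, Tuple

Point = Tuple[float, float]

def signed_area(polygon: List[Point]) -> float:
    """Shoelace formula; the sign encodes the winding direction."""
    total = 0.0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return total / 2.0

def is_clockwise(polygon: List[Point]) -> bool:
    """True for clockwise winding, assuming y grows downward (image coordinates)."""
    return signed_area(polygon) > 0

def bounding_box(polygon: List[Point]) -> Tuple[float, float, float, float]:
    """Axis-aligned box as (x_min, y_min, x_max, y_max)."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)

quad = [(10, 20), (100, 20), (100, 40), (10, 40)]  # TL, TR, BR, BL
```

For `quad`, `is_clockwise` returns True and `bounding_box` returns `(10, 20, 100, 40)`; reversing the vertex list flips the winding result.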

detection_confidence

Detection confidence score reported by the text detector (0.0 to 1.0).

Type:

float

text

Recognized text content, populated by the OCR pipeline. None if only detection was performed.

Type:

str, optional

recognition_confidence

Recognition confidence score reported by the recognizer (0.0 to 1.0). None if only detection was performed.

Type:

float, optional

order

Word position inside the line after sorting. None before sorting.

Type:

int, optional

Examples

>>> word = Word(
...     polygon=[(10, 20), (100, 20), (100, 40), (10, 40)],
...     detection_confidence=0.95,
...     text="Hello",
...     recognition_confidence=0.98
... )
>>> print(word.text)
Hello
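Because `text` and `recognition_confidence` are optional, downstream code usually has to handle detection-only words. The following sketch shows one way to filter by confidence; the `Word` class here is a dataclass stand-in with the documented field names, and `is_recognized`, `keep_confident`, and the 0.5 thresholds are illustrative assumptions rather than library defaults.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Word:  # stand-in with the documented field names, not the library class
    polygon: List[Tuple[float, float]]
    detection_confidence: float
    text: Optional[str] = None
    recognition_confidence: Optional[float] = None
    order: Optional[int] = None

def is_recognized(word: Word) -> bool:
    """A word counts as recognized once both optional recognition fields are set."""
    return word.text is not None and word.recognition_confidence is not None

def keep_confident(words: List[Word], min_det: float = 0.5,
                   min_rec: float = 0.5) -> List[Word]:
    """Keep words above both thresholds; detection-only words are judged on detection alone."""
    kept = []
    for w in words:
        if w.detection_confidence < min_det:
            continue
        if w.recognition_confidence is not None and w.recognition_confidence < min_rec:
            continue
        kept.append(w)
    return kept

box = [(10, 20), (100, 20), (100, 40), (10, 40)]
words = [
    Word(box, 0.95, "Hello", 0.98),
    Word(box, 0.90),                 # detection only
    Word(box, 0.92, "w0rld", 0.30),  # low recognition confidence
    Word(box, 0.20, "noise", 0.90),  # low detection confidence
]
confident = keep_confident(words)
```

Here `confident` keeps the first two words: the fully recognized one and the detection-only one.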

Methods

__call__(*args, **kwargs)

Call self as a function.

detection_confidence

order

polygon

recognition_confidence

text

polygon: List[Tuple[float, float]] = Ellipsis
detection_confidence: float = Ellipsis
text: str | None = None
recognition_confidence: float | None = None
order: int | None = None

class manuscript.data.Line(*args, **kwargs)

Bases: BaseModel

A single text line containing one or more words.

words

List of words in the line.

Type:

List[Word]

order

Line position inside a block or page after sorting. None before sorting.

Type:

int, optional

Examples

>>> line = Line(words=[
...     Word(polygon=[(10, 20), (50, 20), (50, 40), (10, 40)],
...          detection_confidence=0.95, text="Hello"),
...     Word(polygon=[(60, 20), (110, 20), (110, 40), (60, 40)],
...          detection_confidence=0.97, text="World"),
... ])
>>> print(len(line.words))
2
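The `order` field stays None until some sorting step fills it in. As a rough illustration of how such a step might work, the sketch below numbers words left to right by the leftmost x of their polygon. This rule, the zero-based numbering, and the helper names `left_edge`, `assign_word_order`, and `line_text` are assumptions for the example; the library's own sorting may differ. The classes are dataclass stand-ins, not the real models.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Word:  # stand-in with a subset of the documented fields
    polygon: List[Tuple[float, float]]
    detection_confidence: float
    text: Optional[str] = None
    order: Optional[int] = None

@dataclass
class Line:  # stand-in
    words: List[Word] = field(default_factory=list)
    order: Optional[int] = None

def left_edge(word: Word) -> float:
    """Smallest x coordinate of the word polygon."""
    return min(x for x, _ in word.polygon)

def assign_word_order(line: Line) -> None:
    """Number words 0..n-1 by left edge (a left-to-right script is assumed)."""
    for index, word in enumerate(sorted(line.words, key=left_edge)):
        word.order = index

def line_text(line: Line) -> str:
    """Join words by `order` when every word has one, otherwise in stored order."""
    words = line.words
    if all(w.order is not None for w in words):
        words = sorted(words, key=lambda w: w.order)
    return " ".join(w.text or "" for w in words)

line = Line(words=[
    Word([(60, 20), (110, 20), (110, 40), (60, 40)], 0.97, "World"),
    Word([(10, 20), (50, 20), (50, 40), (10, 40)], 0.95, "Hello"),
])
before = line_text(line)   # stored order, since no word has an order yet
assign_word_order(line)
after = line_text(line)    # reading order after sorting
```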

Methods

__call__(*args, **kwargs)

Call self as a function.

order

words: List[Word]
order: int | None = None

class manuscript.data.Block(*args, **kwargs)

Bases: BaseModel

A logical text block (e.g., paragraph, column).

lines

List of text lines in the block.

Type:

List[Line]

words

Legacy: direct list of words without line structure, kept for backward compatibility. If lines is empty and words is provided, a single line is created from words.

Type:

List[Word], optional

order

Block reading-order position after sorting. None before sorting.

Type:

int, optional

Examples

>>> block = Block(lines=[
...     Line(words=[Word(polygon=[(10, 20), (50, 20), (50, 40), (10, 40)],
...                      detection_confidence=0.95, text="Line 1")]),
...     Line(words=[Word(polygon=[(10, 50), (50, 50), (50, 70), (10, 70)],
...                      detection_confidence=0.97, text="Line 2")]),
... ])
>>> print(len(block.lines))
2

Methods

__call__(*args, **kwargs)

Call self as a function.

lines

order

words

order: int | None = None

__init__(**data)

Initialize Block with backward compatibility for words-only input.
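One plausible reading of this backward-compatibility behavior is sketched below. It is an assumption, not the library's code: the stand-in `Block` is a dataclass rather than a pydantic model, words are plain strings to keep the example short, and the wrapping rule (words-only input becomes a single line) is inferred from the attribute description.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Line:  # stand-in; words are plain strings here to keep the sketch short
    words: List[str] = field(default_factory=list)
    order: Optional[int] = None

@dataclass
class Block:  # stand-in illustrating the assumed legacy handling
    lines: List[Line] = field(default_factory=list)
    words: List[str] = field(default_factory=list)  # legacy input
    order: Optional[int] = None

    def __post_init__(self) -> None:
        # Assumed behavior: words-only input is wrapped into a single line.
        if not self.lines and self.words:
            self.lines = [Line(words=list(self.words))]

legacy = Block(words=["Hello", "World"])                              # old style
modern = Block(lines=[Line(words=["Hello"]), Line(words=["World"])])  # new style
```

Under this assumption, code that iterates over `block.lines` works for both styles of input, which is the point of keeping the legacy field.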

class manuscript.data.Page(*args, **kwargs)

Bases: BaseModel

A document page containing blocks of text.

For a full visual diagram of the data model, see DATA_MODEL.md in the same module directory.

blocks

List of text blocks on the page.

Type:

List[Block]

Examples

>>> page = Page(blocks=[
...     Block(lines=[
...         Line(words=[Word(polygon=[(10, 20), (50, 20), (50, 40), (10, 40)],
...                          detection_confidence=0.95, text="Hello")])
...     ])
... ])
>>> print(len(page.blocks))
1

Methods

__call__(*args, **kwargs)

Call self as a function.

from_json(source)

Load Page from JSON file or string.

to_json([path, indent])

Export Page to JSON.

blocks: List[Block]

to_json(path=None, indent=2)

Export Page to JSON.

Parameters:
  • path (str or Path, optional) – If provided, saves JSON to file.

  • indent (int, optional) – JSON indentation. Default is 2.

Returns:

JSON string representation.

Return type:

str

Examples

>>> page.to_json("result.json")  # save to file
>>> json_str = page.to_json()    # get as string

classmethod from_json(source)

Load Page from JSON file or string.

Parameters:

source (str or Path) – Path to JSON file or JSON string.

Returns:

Loaded Page object.

Return type:

Page

Examples

>>> page = Page.from_json("result.json")
>>> page = Page.from_json('{"blocks": [...]}')
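Both methods accept either a file path or an in-memory string. The sketch below shows, using only the standard library, how such a dual-purpose pair could behave. It is not the library's implementation: `write_json` and `read_json_source` are hypothetical helpers, the "starts with `{`" test for telling a JSON string from a path is a simple heuristic chosen for the example, and the nested key layout beyond the top-level `"blocks"` key is assumed to mirror the documented field names.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional, Union

def write_json(data: Dict[str, Any], path: Optional[Union[str, Path]] = None,
               indent: int = 2) -> str:
    """Serialize to a JSON string and, if `path` is given, also save it to that file."""
    text = json.dumps(data, ensure_ascii=False, indent=indent)
    if path is not None:
        Path(path).write_text(text, encoding="utf-8")
    return text

def read_json_source(source: Union[str, Path]) -> Dict[str, Any]:
    """Treat `source` as JSON text if it starts with '{', otherwise as a file path."""
    if isinstance(source, str) and source.lstrip().startswith("{"):
        return json.loads(source)
    return json.loads(Path(source).read_text(encoding="utf-8"))

# Assumed layout: nested keys mirror the documented field names.
page_dict = {"blocks": [{"order": None, "lines": [{"order": None, "words": [{
    "polygon": [[10, 20], [50, 20], [50, 40], [10, 40]],
    "detection_confidence": 0.95,
    "text": "Hello",
    "recognition_confidence": None,
    "order": None,
}]}]}]}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "result.json"
    json_str = write_json(page_dict, path)  # saves the file and returns the string
    from_file = read_json_source(path)
from_string = read_json_source(json_str)
```

Both `from_file` and `from_string` compare equal to `page_dict`, so the round trip is lossless for this structure. Note that JSON has no tuple type, so polygon points come back as two-element lists.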