Recognizers

Text recognition models.

class manuscript.recognizers.TRBA(weights=None, config=None, charset=None, device=None, rotate_threshold=1.5, region_preparer='bbox', region_preparer_options=None, min_text_size=5, **kwargs)

Bases: BaseRecognizer

Initialize TRBA text recognition model with ONNX Runtime.

Parameters:

weights (str or Path, optional) –
Path or identifier for ONNX model weights. Supports:
- Local file path: "path/to/model.onnx"
- HTTP/HTTPS URL: "https://example.com/model.onnx"
- GitHub release: "github://owner/repo/tag/file.onnx"
- Google Drive: "gdrive:FILE_ID"
- Preset name: "trba_lite_g1", "trba_lite_g2", or "trba_base_g1" (keys of pretrained_registry)
- None: auto-downloads default preset (trba_lite_g1)
config (str or Path, optional) – Path or identifier for model configuration JSON. Same URL schemes as weights. If None, attempts to infer from weights location or uses default config for preset models.
charset (str or Path, optional) – Path or identifier for character set file. If None, attempts to find charset near weights or falls back to package default.
device ({"cuda", "coreml", "cpu"}, optional) –
Compute device. If None, automatically selects CPU. For GPU/CoreML acceleration:
- CUDA (NVIDIA): pip install onnxruntime-gpu
- CoreML (Apple Silicon M1/M2/M3): pip install onnxruntime-silicon
Default is None (CPU).
rotate_threshold (float or None, optional) – Aspect-ratio threshold for rotating vertical text-span crops before recognition. If height > width * rotate_threshold, the crop is rotated 90 degrees clockwise. Set to 0 or None to disable. Default is 1.5.
region_preparer ({"bbox", "polygon_mask", "quad_warp"} or callable, optional) – Strategy used to convert Page polygons into recognition crops. "bbox" extracts axis-aligned bounding boxes for arbitrary polygons. "polygon_mask" masks pixels outside the polygon inside a tight crop and also supports arbitrary polygons. "quad_warp" rectifies only 4-point polygons with a perspective transform before recognition. A custom callable may also be provided and should return a list of prepared text regions. Default is "bbox".
region_preparer_options (dict or None, optional) – Optional configuration for built-in region preparers. Defaults to None. Typical options are pad for "bbox" and "polygon_mask", or output_size=(width, height) for "quad_warp". Non-quad polygons passed to "quad_warp" fall back to bbox crops by default.
min_text_size (int, optional) – Minimum crop width/height in pixels to run recognition for a text span. Text spans below this threshold are skipped. Default is 5.
**kwargs – Additional configuration options (reserved for future use).
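
The "bbox" preparer and the rotate_threshold rule described above can be sketched in plain Python. This is only an illustration: polygon_bbox and should_rotate are hypothetical helper names, treating pad as pixels added on every side is an assumption, and clipping to the image bounds is left out.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def polygon_bbox(polygon: Sequence[Point], pad: int = 0) -> Tuple[int, int, int, int]:
    """Axis-aligned box (x0, y0, x1, y1) around a polygon, grown by pad pixels."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    return (int(min(xs)) - pad, int(min(ys)) - pad,
            int(max(xs)) + pad, int(max(ys)) + pad)


def should_rotate(width: int, height: int, rotate_threshold: Optional[float]) -> bool:
    """Documented rule: rotate when height > width * rotate_threshold."""
    if not rotate_threshold:  # 0 or None disables rotation
        return False
    return height > width * rotate_threshold


# A tall, narrow quadrilateral such as a vertical text span.
polygon = [(10, 5), (30, 5), (30, 95), (10, 95)]
x0, y0, x1, y1 = polygon_bbox(polygon, pad=2)
width, height = x1 - x0, y1 - y0
print((x0, y0, x1, y1), should_rotate(width, height, 1.5))
```

With the default threshold of 1.5, only crops that are clearly taller than they are wide get rotated; roughly square crops are left as they are.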

Raises:

FileNotFoundError – If specified files do not exist.
ValueError – If weights format is invalid.

Notes

The class provides three main public methods:

predict - run recognition over text spans in a Page object.
train - high-level training entrypoint to train a TRBA model on custom datasets.
export - static method to export PyTorch model to ONNX format.

Model uses ONNX Runtime for fast inference on CPU and GPU. For GPU acceleration, install: pip install onnxruntime-gpu
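
As a rough sketch of how a device value might translate into ONNX Runtime execution providers: the provider strings below are the identifiers ONNX Runtime itself uses, but providers_for is a hypothetical helper and the CPU fallback ordering is an assumption. The actual logic of runtime_providers is not shown on this page.

```python
from typing import List, Optional

# Identifiers ONNX Runtime uses for these backends.
_PROVIDERS = {
    "cuda": "CUDAExecutionProvider",
    "coreml": "CoreMLExecutionProvider",
    "cpu": "CPUExecutionProvider",
}


def providers_for(device: Optional[str]) -> List[str]:
    """Hypothetical helper: ordered provider list, CPU kept as a fallback."""
    name = (device or "cpu").lower()  # None falls back to CPU, as documented
    if name not in _PROVIDERS:
        raise ValueError(f"Unsupported device: {device!r}")
    chosen = _PROVIDERS[name]
    # Keep CPU last so inference can still run without the accelerator.
    return [chosen] if name == "cpu" else [chosen, "CPUExecutionProvider"]


print(providers_for(None))
print(providers_for("cuda"))
```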

Examples

Create recognizer with default preset (auto-downloads):

>>> from manuscript.recognizers import TRBA
>>> recognizer = TRBA()

Load from local ONNX file:

>>> recognizer = TRBA(weights="path/to/model.onnx")

Load from GitHub release:

>>> recognizer = TRBA(
...     weights="github://owner/repo/v1.0/model.onnx",
...     config="github://owner/repo/v1.0/config.json"
... )

Force CPU execution:

>>> recognizer = TRBA(weights="model.onnx", device="cpu")
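
The forms accepted by weights can be told apart by their prefix. The sketch below only illustrates that distinction; classify_weights is a hypothetical helper, not part of the package, and the order of checks is an assumption.

```python
def classify_weights(weights, presets=("trba_lite_g1", "trba_base_g1")):
    """Hypothetical helper: name the documented form a weights value takes."""
    if weights is None:
        return "default_preset"      # the default preset is auto-downloaded
    text = str(weights)
    if text.startswith(("http://", "https://")):
        return "url"
    if text.startswith("github://"):
        return "github_release"
    if text.startswith("gdrive:"):
        return "google_drive"
    if text in presets:
        return "preset"
    return "local_path"              # anything else is treated as a file path


for value in (None, "trba_base_g1", "gdrive:FILE_ID",
              "github://owner/repo/v1.0/model.onnx", "path/to/model.onnx"):
    print(repr(value), "->", classify_weights(value))
```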

Methods

`__call__`(*args, **kwargs)	Call self as a function.
`export`(weights_path, config_path, ...[, ...])	Export TRBA PyTorch model to ONNX format.
`predict`(page[, image, batch_size, ...])	Recognize text for text spans in a `Page` and return updated `Page`.
`runtime_providers`()	Get ONNX Runtime execution providers based on device.
`train`(train_csvs, train_roots[, val_csvs, ...])	Train TRBA text recognition model on custom datasets.

__init__(weights=None, config=None, charset=None, device=None, rotate_threshold=1.5, region_preparer='bbox', region_preparer_options=None, min_text_size=5, **kwargs)

Parameters:

weights (str | None)
config (str | None)
charset (str | None)
device (str | None)
rotate_threshold (float | None)
region_preparer (str | Callable[[...], Sequence[Any]])
region_preparer_options (Dict[str, Any] | None)
min_text_size (int)

charset_registry = {'trba_base_g1': 'github://konstantinkozhin/manuscript-ocr/v0.1.0/trba_base_g1.txt', 'trba_lite_g1': 'github://konstantinkozhin/manuscript-ocr/v0.1.0/trba_lite_g1.txt', 'trba_lite_g2': 'github://konstantinkozhin/manuscript-ocr/v0.1.0/trba_lite_g2.txt'}

config_registry = {'trba_base_g1': 'github://konstantinkozhin/manuscript-ocr/v0.1.0/trba_base_g1.json', 'trba_lite_g1': 'github://konstantinkozhin/manuscript-ocr/v0.1.0/trba_lite_g1.json', 'trba_lite_g2': 'github://konstantinkozhin/manuscript-ocr/v0.1.0/trba_lite_g2.json'}

default_weights_name: str | None = 'trba_lite_g1'

static export(weights_path, config_path, charset_path, output_path, opset_version=14, simplify=True)

Export TRBA PyTorch model to ONNX format.

This method converts a trained TRBA model from PyTorch to ONNX format, which can be used for faster inference with ONNX Runtime. The exported model can be loaded using TRBA(weights="model.onnx").

Parameters:

weights_path (str or Path) – Path to the PyTorch model weights file (.pth).
config_path (str or Path) – Path to the model configuration JSON file. Used to determine model architecture (img_h, img_w, max_len, hidden_size, etc.).
charset_path (str or Path) – Path to the charset file (charset.txt). Used to determine num_classes for the model.
output_path (str or Path) – Path where the ONNX model will be saved (.onnx).
opset_version (int, optional) – ONNX opset version to use for export. Default is 14.
simplify (bool, optional) – If True, applies ONNX graph simplification using onnx-simplifier to optimize the model. Requires onnx-simplifier package. Default is True.

Returns:

The ONNX model is saved to output_path.

Return type:

None

Raises:

ImportError – If required packages (torch, onnx) are not installed.
FileNotFoundError – If weights_path or config_path does not exist.

Notes

The exported ONNX model has one output:

logits: Character predictions with shape (batch, max_length+1, num_classes)

The model uses greedy decoding (argmax) and supports dynamic batch size. The sequence length is fixed to max_length + 1 from the config (same as PyTorch inference mode for compatibility).

Architecture exported:
- CNN backbone (SE-ResNet-31 or SE-ResNet-31-Lite)
- Bidirectional LSTM encoder
- Attention decoder (greedy decoding)

Note: Of the two decoding heads, only the attention decoder is exported. The CTC head is used only during training and is not included in the ONNX model.
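
To make the logits layout concrete, here is a minimal sketch of greedy (argmax) decoding over a (batch, steps, num_classes) array, written with nested lists so it runs without extra packages. The end-of-sequence index and the toy charset are assumptions made for illustration; the real token layout comes from the model's charset and config.

```python
from typing import List, Sequence


def greedy_decode(logits: Sequence[Sequence[Sequence[float]]],
                  charset: Sequence[str],
                  eos_index: int = 0) -> List[str]:
    """Pick the best class at each step; stop at the assumed end token."""
    texts = []
    for sequence in logits:          # one item of the batch
        chars = []
        for step in sequence:        # one decoding step: num_classes scores
            best = max(range(len(step)), key=step.__getitem__)
            if best == eos_index:
                break
            chars.append(charset[best])
        texts.append("".join(chars))
    return texts


# Toy batch: 1 sequence, 3 steps, 3 classes (index 0 is the assumed end token).
charset = ["<eos>", "a", "b"]
logits = [[[0.1, 2.0, 0.3], [0.2, 0.1, 1.5], [3.0, 0.1, 0.2]]]
print(greedy_decode(logits, charset))
```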

Examples

Export TRBA model to ONNX:

>>> from manuscript.recognizers import TRBA
>>> TRBA.export(
...     weights_path="experiments/best_model/best_acc_weights.pth",
...     config_path="experiments/best_model/config.json",
...     charset_path="configs/charset.txt",
...     output_path="trba_model.onnx"
... )
Loading TRBA model...
=== TRBA ONNX Export ===
Max decoding length: 40
Input size: 64x256
[OK] ONNX model saved to: trba_model.onnx

Export with custom opset:

>>> TRBA.export(
...     weights_path="model.pth",
...     config_path="config.json",
...     charset_path="charset.txt",
...     output_path="model.onnx",
...     opset_version=16,
...     simplify=False
... )

Use the exported model for inference:

>>> from manuscript.detectors import EAST
>>> recognizer = TRBA(weights="trba_model.onnx")
>>> detector = EAST()
>>> det = detector.predict("page.jpg")
>>> result = recognizer.predict(det["page"], image="page.jpg")

See also

TRBA.__init__: Initialize TRBA recognizer with ONNX model.

predict(page, image=None, batch_size=32, debug_save_dir=None)

Recognize text for text spans in a Page and return updated Page.

Parameters:

page (Page) – Page object with detected text-span polygons.
image (str, Path, numpy.ndarray, or PIL.Image, optional) – Source page image used to extract text regions. If None, recognition is skipped and a deep copy of page is returned.
batch_size (int, optional) – Number of prepared text regions to process simultaneously. Default is 32.
debug_save_dir (str or Path, optional) – If provided, saves the prepared recognition crops to this directory as *.png files together with index.json. Crops are saved after region_preparer and auto-rotation, i.e. in the same orientation that goes into recognizer inference.

Returns:

New Page object with recognized text and recognition_confidence filled for processed text spans.

Return type:

Page
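
The effect of batch_size can be pictured as splitting the prepared regions into consecutive chunks. The following is a generic sketch of that idea, not the library's internal code; iter_batches is a hypothetical helper and the string items stand in for prepared crops.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int = 32) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


crops = [f"crop_{i}" for i in range(70)]  # stand-ins for prepared regions
print([len(batch) for batch in iter_batches(crops, batch_size=32)])
```

A larger batch_size means fewer inference calls at the cost of more memory per call.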

pretrained_registry: Dict[str, str] = {'trba_base_g1': 'github://konstantinkozhin/manuscript-ocr/v0.1.0/trba_base_g1.onnx', 'trba_lite_g1': 'github://konstantinkozhin/manuscript-ocr/v0.1.0/trba_lite_g1.onnx', 'trba_lite_g2': 'github://konstantinkozhin/manuscript-ocr/v0.1.0/trba_lite_g2.onnx'}

static train(train_csvs, train_roots, val_csvs=None, val_roots=None, *, exp_dir=None, charset_path=None, encoding='utf-8', img_h=64, img_w=256, max_len=25, hidden_size=256, num_encoder_layers=3, cnn_in_channels=3, cnn_out_channels=512, cnn_backbone='seresnet31', ctc_weight=0.3, ctc_weight_decay_epochs=50, ctc_weight_min=0.0, max_grad_norm=5.0, batch_size=32, epochs=20, lr=0.001, optimizer='AdamW', scheduler='OneCycleLR', weight_decay=0.0, momentum=0.9, val_interval=1, val_size=3000, train_proportions=None, num_workers=0, seed=42, resume_from=None, save_interval=None, device='cuda', freeze_cnn='none', freeze_enc_rnn='none', freeze_attention='none', pretrain_weights='default', **extra_config)

Train TRBA text recognition model on custom datasets.

Parameters:

train_csvs (str, Path or sequence of paths) – Path(s) to training CSV files. Each CSV should have columns: image_path (relative to train_roots) and text (ground truth transcription).
train_roots (str, Path or sequence of paths) – Root directory/directories containing training images. Must have same length as train_csvs.
val_csvs (str, Path, sequence of paths, or None, optional) – Path(s) to validation CSV files with same format as train_csvs. If None, no validation is performed. Default is None.
val_roots (str, Path, sequence of paths, or None, optional) – Root directory/directories for validation images. Must match length of val_csvs if provided. Default is None.
exp_dir (str or Path, optional) – Experiment directory where checkpoints and logs will be saved. If None, auto-generated based on timestamp. Default is None.
charset_path (str or Path, optional) – Path to character set file. If None, uses default charset from package. Default is None.
encoding (str, optional) – Text encoding for reading CSV files. Default is "utf-8".
img_h (int, optional) – Target height for input images (pixels). Default is 64.
img_w (int, optional) – Target width for input images (pixels). Default is 256.
max_len (int, optional) – Maximum sequence length for text recognition. Default is 25.
hidden_size (int, optional) – Hidden dimension size for RNN encoder/decoder. Default is 256.
num_encoder_layers (int, optional) – Number of Bidirectional LSTM layers in the encoder. Default is 3.
cnn_in_channels (int, optional) – Number of input channels for CNN backbone (3 for RGB, 1 for grayscale). Default is 3.
cnn_out_channels (int, optional) – Number of output channels from CNN backbone. Default is 512.
cnn_backbone ({"seresnet31", "seresnet31-lite"}, optional) – CNN backbone variant. "seresnet31" keeps the standard SE-ResNet-31, while "seresnet31-lite" enables a depthwise-lite version. Default is "seresnet31".
ctc_weight (float, optional) – Initial weight of the CTC loss during training. CTC is always used for stability, and the total loss is loss = attn_loss * (1 - ctc_weight) + ctc_loss * ctc_weight. The CTC weight decays over epochs. Default is 0.3.
ctc_weight_decay_epochs (int, optional) – Number of epochs for CTC weight to decay to minimum. Default is 50.
ctc_weight_min (float, optional) – Minimum value for CTC weight after decay. Default is 0.0.
max_grad_norm (float, optional) – Maximum gradient norm for clipping (prevents gradient explosion/NaN). Default is 5.0.
batch_size (int, optional) – Training batch size. Default is 32.
epochs (int, optional) – Number of training epochs. Default is 20.
lr (float, optional) – Learning rate. Default is 1e-3.
optimizer ({"Adam", "SGD", "AdamW"}, optional) – Optimizer type. Default is "AdamW".
scheduler ({"ReduceLROnPlateau", "CosineAnnealingLR", "OneCycleLR", "None"}, optional) –
Learning rate scheduler type:
- "OneCycleLR" - one-cycle policy with cosine annealing (default, recommended)
- "ReduceLROnPlateau" - reduce LR on validation loss plateau
- "CosineAnnealingLR" - cosine annealing over epochs
- "None" or None - constant learning rate
Default is "OneCycleLR".
weight_decay (float, optional) – L2 weight decay coefficient. Default is 0.0.
momentum (float, optional) – Momentum for SGD optimizer. Default is 0.9.
val_interval (int, optional) – Perform validation every N epochs. Default is 1.
val_size (int, optional) – Maximum number of validation samples to use. Default is 3000.
train_proportions (sequence of float, optional) – Sampling proportions for multiple training datasets. Must sum to 1.0 and match length of train_csvs. If None, datasets are concatenated equally. Default is None.
num_workers (int, optional) – Number of data loading workers. Default is 0.
seed (int, optional) – Random seed for reproducibility. Default is 42.
resume_from (str or Path, optional) – Path to checkpoint file to resume training from. Default is None.
save_interval (int, optional) – Save checkpoint every N epochs. If None, only saves best model. Default is None.
device ({"cuda", "cpu"}, optional) – Training device. Default is "cuda".
freeze_cnn ({"none", "all", "first", "last"}, optional) – CNN freezing policy. Default is "none".
freeze_enc_rnn ({"none", "all", "first", "last"}, optional) – Encoder RNN freezing policy. Default is "none".
freeze_attention ({"none", "all"}, optional) – Attention module freezing policy. Default is "none".
pretrain_weights (str, Path, bool, or None, optional) –
Pretrained weights to initialize from:
- "default" or True - use release weights
- None or False - train from scratch
- str/Path - path or URL to custom weights file
Default is "default".
**extra_config (dict, optional) – Additional configuration parameters passed to training config.
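
For the CSV layout expected by train_csvs and val_csvs, a small sketch using the standard csv module may help. The column names image_path and text come from the parameter description; the presence of a header row, the comma delimiter, and the sample file names are assumptions.

```python
import csv
import io

# Hypothetical rows: paths are relative to the matching train_roots entry.
rows = [
    {"image_path": "lines/0001.png", "text": "hello"},
    {"image_path": "lines/0002.png", "text": "world, again"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["image_path", "text"])
writer.writeheader()
writer.writerows(rows)

# Read the data back to confirm that a comma inside text survives quoting.
parsed = list(csv.DictReader(io.StringIO(buffer.getvalue())))
print(parsed[1]["image_path"], "->", parsed[1]["text"])
```

Writing through csv.DictWriter rather than joining strings by hand keeps transcriptions that contain commas or quotes intact.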

Returns:

Path to the best model checkpoint saved during training.

Return type:

str

Examples

Train on single dataset with validation:

>>> from manuscript.recognizers import TRBA
>>>
>>> best_model = TRBA.train(
...     train_csvs="data/train.csv",
...     train_roots="data/train_images",
...     val_csvs="data/val.csv",
...     val_roots="data/val_images",
...     exp_dir="./experiments/trba_exp1",
...     epochs=50,
...     batch_size=64,
...     img_h=64,
...     img_w=256,
... )
>>> print(f"Best model saved at: {best_model}")

Train on multiple datasets with custom proportions:

>>> train_csvs = ["data/dataset1/train.csv", "data/dataset2/train.csv"]
>>> train_roots = ["data/dataset1/images", "data/dataset2/images"]
>>> train_proportions = [0.7, 0.3]  # 70% from dataset1, 30% from dataset2
>>>
>>> best_model = TRBA.train(
...     train_csvs=train_csvs,
...     train_roots=train_roots,
...     train_proportions=train_proportions,
...     val_csvs="data/val.csv",
...     val_roots="data/val_images",
...     epochs=100,
...     lr=5e-4,
...     optimizer="AdamW",
...     weight_decay=1e-4,
... )
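
The constraints on train_proportions (same length as train_csvs, sum of 1.0) can be checked before starting a long run. check_proportions below is a hypothetical helper, and the tolerance value is an assumption; it is there because float sums such as 0.7 + 0.3 are not always exactly 1.0.

```python
import math
from typing import Optional, Sequence


def check_proportions(train_csvs: Sequence[str],
                      train_proportions: Optional[Sequence[float]]) -> None:
    """Raise ValueError if the documented constraints are violated."""
    if train_proportions is None:
        return  # nothing to validate when no proportions are given
    if len(train_proportions) != len(train_csvs):
        raise ValueError("train_proportions must match the length of train_csvs")
    if not math.isclose(sum(train_proportions), 1.0, abs_tol=1e-6):
        raise ValueError("train_proportions must sum to 1.0")


# Passes silently for the two-dataset example.
check_proportions(
    ["data/dataset1/train.csv", "data/dataset2/train.csv"],
    [0.7, 0.3],
)
```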

Resume training from checkpoint:

>>> best_model = TRBA.train(
...     train_csvs="data/train.csv",
...     train_roots="data/train_images",
...     resume_from="experiments/trba_exp1/checkpoints/last.pth",
...     epochs=100,
... )

Fine-tune from pretrained weights with frozen CNN:

>>> best_model = TRBA.train(
...     train_csvs="data/finetune.csv",
...     train_roots="data/finetune_images",
...     pretrain_weights="default",
...     freeze_cnn="all",
...     epochs=20,
...     lr=1e-4,
... )

Train with CTC for stability (always enabled):

>>> best_model = TRBA.train(
...     train_csvs="data/train.csv",
...     train_roots="data/train_images",
...     optimizer="AdamW",
...     scheduler="OneCycleLR",
...     lr=1e-3,
...     ctc_weight=0.3,
...     ctc_weight_decay_epochs=50,
...     max_grad_norm=5.0,
...     epochs=100,
... )
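
To see how ctc_weight, ctc_weight_decay_epochs and ctc_weight_min interact, the sketch below combines the documented loss formula with a decay schedule. The shape of the decay is not stated in this reference, so the linear schedule here is an assumption, and both function names are hypothetical.

```python
def ctc_weight_at(epoch: int, ctc_weight: float = 0.3,
                  ctc_weight_decay_epochs: int = 50,
                  ctc_weight_min: float = 0.0) -> float:
    """Assumed linear decay from ctc_weight down to ctc_weight_min."""
    if ctc_weight_decay_epochs <= 0:
        return ctc_weight_min
    progress = min(max(epoch, 0) / ctc_weight_decay_epochs, 1.0)
    return ctc_weight + (ctc_weight_min - ctc_weight) * progress


def combined_loss(attn_loss: float, ctc_loss: float, weight: float) -> float:
    """Documented mix: attn_loss * (1 - weight) + ctc_loss * weight."""
    return attn_loss * (1.0 - weight) + ctc_loss * weight


# Illustrative loss values only; real losses come from the training loop.
for epoch in (0, 25, 50, 80):
    w = ctc_weight_at(epoch)
    print(epoch, round(w, 3), round(combined_loss(2.0, 4.0, w), 3))
```

Under this assumed schedule the CTC term contributes most in early epochs and drops out entirely once ctc_weight_decay_epochs is reached with the default ctc_weight_min of 0.0.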