Recognizers

Text recognition models.

class manuscript.recognizers.TRBA(weights=None, config=None, charset=None, device=None, **kwargs)[source]

Bases: BaseModel

Initialize TRBA text recognition model with ONNX Runtime.

Parameters:
  • weights (str or Path, optional) –

    Path or identifier for ONNX model weights. Supports:

    • Local file path: "path/to/model.onnx"

    • HTTP/HTTPS URL: "https://example.com/model.onnx"

    • GitHub release: "github://owner/repo/tag/file.onnx"

    • Google Drive: "gdrive:FILE_ID"

    • Preset name: "trba_lite_g1" or "trba_base_g1" (from pretrained_registry)

    • None: auto-downloads default preset (trba_lite_g1)

  • config (str or Path, optional) – Path or identifier for model configuration JSON. Same URL schemes as weights. If None, attempts to infer from weights location or uses default config for preset models.

  • charset (str or Path, optional) – Path or identifier for character set file. If None, attempts to find charset near weights or falls back to package default.

  • device ({"cuda", "coreml", "cpu"}, optional) –

    Compute device. If None, automatically selects CPU. For GPU/CoreML acceleration:

    • CUDA (NVIDIA): pip install onnxruntime-gpu

    • CoreML (Apple Silicon M1/M2/M3): pip install onnxruntime-silicon

    Default is None (CPU).

  • **kwargs – Additional configuration options (reserved for future use).

Notes

The class provides three main public methods:

  • predict — run text recognition inference on cropped word images.

  • train — high-level training entrypoint to train a TRBA model on custom datasets.

  • export — static method to export PyTorch model to ONNX format.

The model uses ONNX Runtime for fast inference on CPU and GPU. For GPU acceleration, install: pip install onnxruntime-gpu
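Before picking a device, it can help to check which ONNX Runtime execution providers are actually available in the current environment. A minimal sketch (hedged: it only assumes the standard onnxruntime API, and falls back gracefully if the package is missing):

```python
# Check available ONNX Runtime execution providers before choosing a device.
try:
    import onnxruntime as ort
    providers = ort.get_available_providers()
except ImportError:
    providers = []  # onnxruntime not installed

# "CUDAExecutionProvider" is present only with onnxruntime-gpu installed;
# "CoreMLExecutionProvider" with onnxruntime-silicon on Apple Silicon.
device = "cuda" if "CUDAExecutionProvider" in providers else "cpu"
```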

Examples

Create recognizer with default preset (auto-downloads):

>>> from manuscript.recognizers import TRBA
>>> recognizer = TRBA()

Load from local ONNX file:

>>> recognizer = TRBA(weights="path/to/model.onnx")

Load from GitHub release:

>>> recognizer = TRBA(
...     weights="github://owner/repo/v1.0/model.onnx",
...     config="github://owner/repo/v1.0/config.json"
... )

Force CPU execution:

>>> recognizer = TRBA(weights="model.onnx", device="cpu")

Methods

__call__(*args, **kwargs)

Call self as a function.

export(weights_path, config_path, ...[, ...])

Export TRBA PyTorch model to ONNX format.

predict(images[, batch_size])

Run text recognition on one or more word images.

runtime_providers()

Get ONNX Runtime execution providers based on device.

train(train_csvs, train_roots[, val_csvs, ...])

Train TRBA text recognition model on custom datasets.

default_weights_name: str | None = 'trba_lite_g1'
pretrained_registry: Dict[str, str] = {'trba_base_g1': 'github://konstantinkozhin/manuscript-ocr/v0.1.0/trba_base_g1.onnx', 'trba_lite_g1': 'github://konstantinkozhin/manuscript-ocr/v0.1.0/trba_lite_g1.onnx'}
config_registry = {'trba_base_g1': 'github://konstantinkozhin/manuscript-ocr/v0.1.0/trba_base_g1.json', 'trba_lite_g1': 'github://konstantinkozhin/manuscript-ocr/v0.1.0/trba_lite_g1.json'}
charset_registry = {'trba_base_g1': 'github://konstantinkozhin/manuscript-ocr/v0.1.0/trba_base_g1.txt', 'trba_lite_g1': 'github://konstantinkozhin/manuscript-ocr/v0.1.0/trba_lite_g1.txt'}
__init__(weights=None, config=None, charset=None, device=None, **kwargs)[source]
Parameters:
  • weights (str | None)

  • config (str | None)

  • charset (str | None)

  • device (str | None)

predict(images, batch_size=32)[source]

Run text recognition on one or more word images.

Parameters:
  • images (str, Path, numpy.ndarray, PIL.Image, or list thereof) –

    Single image or list of images to recognize. Each image can be:

    • Path to image file (str or Path)

    • RGB numpy array with shape (H, W, 3) in uint8

    • PIL Image object

  • batch_size (int, optional) – Number of images to process simultaneously. Larger batches are faster but require more memory. Default is 32.

Returns:

Recognition results as list of dictionaries, each containing:

  • "text" : str — recognized text

  • "confidence" : float — recognition confidence in [0, 1]

If input is a single image, returns a list with one element.

Return type:

list of dict

Examples

Recognize single image:

>>> from manuscript.recognizers import TRBA
>>> recognizer = TRBA()
>>> results = recognizer.predict("word_image.jpg")
>>> print(f"Text: '{results[0]['text']}' (confidence: {results[0]['confidence']:.3f})")

Process numpy arrays:

>>> import cv2
>>> img = cv2.imread("word.jpg")
>>> img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
>>> results = recognizer.predict(img_rgb)
>>> print(results[0]["text"])
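Since each result carries a confidence score in [0, 1], a common follow-up is to keep only high-confidence transcriptions. A minimal sketch over the documented result schema (the sample results and the 0.5 threshold are illustrative choices, not library defaults):

```python
# Sketch: filter predict() results by confidence.
# The dict schema ("text", "confidence") matches the Returns section above;
# the sample values and threshold below are hypothetical.
results = [
    {"text": "hello", "confidence": 0.97},
    {"text": "w0rd", "confidence": 0.41},
]
CONF_THRESHOLD = 0.5
accepted = [r["text"] for r in results if r["confidence"] >= CONF_THRESHOLD]
```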
static train(train_csvs, train_roots, val_csvs=None, val_roots=None, *, exp_dir=None, charset_path=None, encoding='utf-8', img_h=64, img_w=256, max_len=25, hidden_size=256, num_encoder_layers=3, cnn_in_channels=3, cnn_out_channels=512, cnn_backbone='seresnet31', ctc_weight=0.3, ctc_weight_decay_epochs=50, ctc_weight_min=0.0, max_grad_norm=5.0, batch_size=32, epochs=20, lr=0.001, optimizer='AdamW', scheduler='OneCycleLR', weight_decay=0.0, momentum=0.9, val_interval=1, val_size=3000, train_proportions=None, num_workers=0, seed=42, resume_from=None, save_interval=None, device='cuda', freeze_cnn='none', freeze_enc_rnn='none', freeze_attention='none', pretrain_weights='default', **extra_config)[source]

Train TRBA text recognition model on custom datasets.

Parameters:
  • train_csvs (str, Path or sequence of paths) – Path(s) to training CSV files. Each CSV should have columns: image_path (relative to train_roots) and text (ground truth transcription).

  • train_roots (str, Path or sequence of paths) – Root directory/directories containing training images. Must have same length as train_csvs.

  • val_csvs (str, Path, sequence of paths, or None, optional) – Path(s) to validation CSV files with same format as train_csvs. If None, no validation is performed. Default is None.

  • val_roots (str, Path, sequence of paths, or None, optional) – Root directory/directories for validation images. Must match length of val_csvs if provided. Default is None.

  • exp_dir (str or Path, optional) – Experiment directory where checkpoints and logs will be saved. If None, auto-generated based on timestamp. Default is None.

  • charset_path (str or Path, optional) – Path to character set file. If None, uses default charset from package. Default is None.

  • encoding (str, optional) – Text encoding for reading CSV files. Default is "utf-8".

  • img_h (int, optional) – Target height for input images (pixels). Default is 64.

  • img_w (int, optional) – Target width for input images (pixels). Default is 256.

  • max_len (int, optional) – Maximum sequence length for text recognition. Default is 25.

  • hidden_size (int, optional) – Hidden dimension size for RNN encoder/decoder. Default is 256.

  • num_encoder_layers (int, optional) – Number of Bidirectional LSTM layers in the encoder. Default is 3.

  • cnn_in_channels (int, optional) – Number of input channels for CNN backbone (3 for RGB, 1 for grayscale). Default is 3.

  • cnn_out_channels (int, optional) – Number of output channels from CNN backbone. Default is 512.

  • cnn_backbone ({"seresnet31", "seresnet31-lite"}, optional) – CNN backbone variant. "seresnet31" keeps the standard SE-ResNet-31, while "seresnet31-lite" enables a depthwise-lite version. Default is "seresnet31".

  • ctc_weight (float, optional) – Initial weight for the auxiliary CTC loss (the CTC head is always used during training for stability): loss = attn_loss * (1 - ctc_weight) + ctc_loss * ctc_weight. The weight decays over epochs (see ctc_weight_decay_epochs). Default is 0.3.

  • ctc_weight_decay_epochs (int, optional) – Number of epochs for CTC weight to decay to minimum. Default is 50.

  • ctc_weight_min (float, optional) – Minimum value for CTC weight after decay. Default is 0.0.

  • max_grad_norm (float, optional) – Maximum gradient norm for clipping (prevents gradient explosion/NaN). Default is 5.0.

  • batch_size (int, optional) – Training batch size. Default is 32.

  • epochs (int, optional) – Number of training epochs. Default is 20.

  • lr (float, optional) – Learning rate. Default is 1e-3.

  • optimizer ({"Adam", "SGD", "AdamW"}, optional) – Optimizer type. Default is "AdamW".

  • scheduler ({"ReduceLROnPlateau", "CosineAnnealingLR", "OneCycleLR", "None"}, optional) –

    Learning rate scheduler type:

    • "OneCycleLR" — one-cycle policy with cosine annealing (default, recommended)

    • "ReduceLROnPlateau" — reduce LR on validation loss plateau

    • "CosineAnnealingLR" — cosine annealing over epochs

    • "None" or None — constant learning rate

    Default is "OneCycleLR".

  • weight_decay (float, optional) – L2 weight decay coefficient. Default is 0.0.

  • momentum (float, optional) – Momentum for SGD optimizer. Default is 0.9.

  • val_interval (int, optional) – Perform validation every N epochs. Default is 1.

  • val_size (int, optional) – Maximum number of validation samples to use. Default is 3000.

  • train_proportions (sequence of float, optional) – Sampling proportions for multiple training datasets. Must sum to 1.0 and match length of train_csvs. If None, datasets are concatenated equally. Default is None.

  • num_workers (int, optional) – Number of data loading workers. Default is 0.

  • seed (int, optional) – Random seed for reproducibility. Default is 42.

  • resume_from (str or Path, optional) – Path to checkpoint file to resume training from. Default is None.

  • save_interval (int, optional) – Save checkpoint every N epochs. If None, only saves best model. Default is None.

  • device ({"cuda", "cpu"}, optional) – Training device. Default is "cuda".

  • freeze_cnn ({"none", "all", "first", "last"}, optional) – CNN freezing policy. Default is "none".

  • freeze_enc_rnn ({"none", "all", "first", "last"}, optional) – Encoder RNN freezing policy. Default is "none".

  • freeze_attention ({"none", "all"}, optional) – Attention module freezing policy. Default is "none".

  • pretrain_weights (str, Path, bool, or None, optional) –

    Pretrained weights to initialize from:

    • "default" or True — use release weights

    • None or False — train from scratch

    • str/Path — path or URL to custom weights file

    Default is "default".

  • **extra_config (dict, optional) – Additional configuration parameters passed to training config.

Returns:

Path to the best model checkpoint saved during training.

Return type:

str
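The CSV layout expected by train_csvs (columns image_path and text, with paths relative to the matching entry in train_roots) can be produced with the standard library. A minimal sketch; the file and image names are illustrative:

```python
import csv
import os
import tempfile

# Sketch: write a training CSV in the format train() expects.
# image_path values are relative to the corresponding train_roots entry.
rows = [
    {"image_path": "words/0001.png", "text": "hello"},
    {"image_path": "words/0002.png", "text": "world"},
]
csv_path = os.path.join(tempfile.mkdtemp(), "train.csv")
with open(csv_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["image_path", "text"])
    writer.writeheader()
    writer.writerows(rows)
```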

Examples

Train on single dataset with validation:

>>> from manuscript.recognizers import TRBA
>>>
>>> best_model = TRBA.train(
...     train_csvs="data/train.csv",
...     train_roots="data/train_images",
...     val_csvs="data/val.csv",
...     val_roots="data/val_images",
...     exp_dir="./experiments/trba_exp1",
...     epochs=50,
...     batch_size=64,
...     img_h=64,
...     img_w=256,
... )
>>> print(f"Best model saved at: {best_model}")

Train on multiple datasets with custom proportions:

>>> train_csvs = ["data/dataset1/train.csv", "data/dataset2/train.csv"]
>>> train_roots = ["data/dataset1/images", "data/dataset2/images"]
>>> train_proportions = [0.7, 0.3]  # 70% from dataset1, 30% from dataset2
>>>
>>> best_model = TRBA.train(
...     train_csvs=train_csvs,
...     train_roots=train_roots,
...     train_proportions=train_proportions,
...     val_csvs="data/val.csv",
...     val_roots="data/val_images",
...     epochs=100,
...     lr=5e-4,
...     optimizer="AdamW",
...     weight_decay=1e-4,
... )

Resume training from checkpoint:

>>> best_model = TRBA.train(
...     train_csvs="data/train.csv",
...     train_roots="data/train_images",
...     resume_from="experiments/trba_exp1/checkpoints/last.pth",
...     epochs=100,
... )

Fine-tune from pretrained weights with frozen CNN:

>>> best_model = TRBA.train(
...     train_csvs="data/finetune.csv",
...     train_roots="data/finetune_images",
...     pretrain_weights="default",
...     freeze_cnn="all",
...     epochs=20,
...     lr=1e-4,
... )

Tune the auxiliary CTC loss (always enabled) for training stability:

>>> best_model = TRBA.train(
...     train_csvs="data/train.csv",
...     train_roots="data/train_images",
...     optimizer="AdamW",
...     scheduler="OneCycleLR",
...     lr=1e-3,
...     ctc_weight=0.3,
...     ctc_weight_decay_epochs=50,
...     max_grad_norm=5.0,
...     epochs=100,
... )
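The CTC weight schedule above can be sketched as a decay from ctc_weight down to ctc_weight_min over ctc_weight_decay_epochs. The linear shape is an assumption for illustration; the documentation only states that the weight decays to its minimum over that many epochs:

```python
def ctc_weight_at(epoch, ctc_weight=0.3, ctc_weight_min=0.0,
                  ctc_weight_decay_epochs=50):
    """Hypothetical linear decay of the auxiliary CTC loss weight."""
    if epoch >= ctc_weight_decay_epochs:
        return ctc_weight_min
    frac = epoch / ctc_weight_decay_epochs
    return ctc_weight + (ctc_weight_min - ctc_weight) * frac

# Combined loss per the formula in the ctc_weight description:
# loss = attn_loss * (1 - w) + ctc_loss * w
w_start = ctc_weight_at(0)    # full initial weight
w_end = ctc_weight_at(50)     # fully decayed
```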
static export(weights_path, config_path, charset_path, output_path, opset_version=14, simplify=True)[source]

Export TRBA PyTorch model to ONNX format.

This method converts a trained TRBA model from PyTorch to ONNX format, which can be used for faster inference with ONNX Runtime. The exported model can be loaded using TRBA(weights="model.onnx").

Parameters:
  • weights_path (str or Path) – Path to the PyTorch model weights file (.pth).

  • config_path (str or Path) – Path to the model configuration JSON file. Used to determine model architecture (img_h, img_w, max_len, hidden_size, etc.).

  • charset_path (str or Path) – Path to the charset file (charset.txt). Used to determine num_classes for the model.

  • output_path (str or Path) – Path where the ONNX model will be saved (.onnx).

  • opset_version (int, optional) – ONNX opset version to use for export. Default is 14.

  • simplify (bool, optional) – If True, applies ONNX graph simplification using onnx-simplifier to optimize the model. Requires onnx-simplifier package. Default is True.

Returns:

The ONNX model is saved to output_path.

Return type:

None

Notes

The exported ONNX model has one output:

  • logits: Character predictions with shape (batch, max_length+1, num_classes)

The model uses greedy decoding (argmax) and supports dynamic batch size. The sequence length is fixed to max_length + 1 from the config (same as PyTorch inference mode for compatibility).

Architecture exported:

  • CNN backbone (SE-ResNet-31 or SE-ResNet-31-Lite)

  • Bidirectional LSTM encoder

  • Attention decoder (greedy decoding)

Note: Only the attention decoder is exported. CTC head is used only during training and is not included in the ONNX model.
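The greedy (argmax) decoding over the exported logits of shape (batch, max_length+1, num_classes) can be sketched as follows. The index-to-character mapping and the end-of-sequence index are assumptions for illustration; the real mapping comes from the charset file:

```python
import numpy as np

# Hypothetical charset: index 0 is assumed to be an end-of-sequence token.
charset = ["[EOS]", "h", "i"]

# Fake logits for one image, shape (batch, max_length+1, num_classes),
# constructed so the argmax path spells "hi" followed by EOS.
logits = np.zeros((1, 3, len(charset)))
logits[0, 0, 1] = 5.0  # step 0 -> 'h'
logits[0, 1, 2] = 5.0  # step 1 -> 'i'
logits[0, 2, 0] = 5.0  # step 2 -> EOS

ids = logits.argmax(axis=-1)[0]  # greedy decoding per time step
text = ""
for i in ids:
    if i == 0:          # stop at the assumed EOS index
        break
    text += charset[i]
```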

Examples

Export TRBA model to ONNX:

>>> from manuscript.recognizers import TRBA
>>> TRBA.export(
...     weights_path="experiments/best_model/best_acc_weights.pth",
...     config_path="experiments/best_model/config.json",
...     charset_path="configs/charset.txt",
...     output_path="trba_model.onnx"
... )
Loading TRBA model...
=== TRBA ONNX Export ===
Max decoding length: 40
Input size: 64x256
[OK] ONNX model saved to: trba_model.onnx

Export with custom opset:

>>> TRBA.export(
...     weights_path="model.pth",
...     config_path="config.json",
...     charset_path="charset.txt",
...     output_path="model.onnx",
...     opset_version=16,
...     simplify=False
... )

Use the exported model for inference:

>>> recognizer = TRBA(weights="trba_model.onnx")
>>> result = recognizer.predict("word_image.jpg")

See also

TRBA.__init__

Initialize TRBA recognizer with ONNX model.