Recognizers
Text recognition models.
- class manuscript.recognizers.TRBA(weights=None, config=None, charset=None, device=None, **kwargs)[source]
Bases: BaseModel
Initialize TRBA text recognition model with ONNX Runtime.
- Parameters:
weights (str or Path, optional) –
Path or identifier for ONNX model weights. Supports:
Local file path: "path/to/model.onnx"
HTTP/HTTPS URL: "https://example.com/model.onnx"
GitHub release: "github://owner/repo/tag/file.onnx"
Google Drive: "gdrive:FILE_ID"
Preset name: "trba_lite_g1" or "trba_base_g1" (from pretrained_registry)
None: auto-downloads the default preset (trba_lite_g1)
config (str or Path, optional) – Path or identifier for model configuration JSON. Same URL schemes as weights. If None, attempts to infer from the weights location or uses the default config for preset models.
charset (str or Path, optional) – Path or identifier for character set file. If None, attempts to find a charset near the weights or falls back to the package default.
device ({"cuda", "coreml", "cpu"}, optional) –
Compute device. If None, automatically selects CPU. For GPU/CoreML acceleration:
CUDA (NVIDIA): pip install onnxruntime-gpu
CoreML (Apple Silicon M1/M2/M3): pip install onnxruntime-silicon
Default is None (CPU).
**kwargs – Additional configuration options (reserved for future use).
- Raises:
FileNotFoundError – If specified files do not exist.
ValueError – If weights format is invalid.
Notes
The class provides three main public methods:
predict – run text recognition inference on cropped word images.
train – high-level training entrypoint to train a TRBA model on custom datasets.
export – static method to export a PyTorch model to ONNX format.
The model uses ONNX Runtime for fast inference on CPU and GPU. For GPU acceleration, install:
pip install onnxruntime-gpu
Examples
Create recognizer with default preset (auto-downloads):
>>> from manuscript.recognizers import TRBA
>>> recognizer = TRBA()
Load from local ONNX file:
>>> recognizer = TRBA(weights="path/to/model.onnx")
Load from GitHub release:
>>> recognizer = TRBA(
...     weights="github://owner/repo/v1.0/model.onnx",
...     config="github://owner/repo/v1.0/config.json"
... )
Force CPU execution:
>>> recognizer = TRBA(weights="model.onnx", device="cpu")
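The github:// and gdrive: identifiers accepted by weights and config map onto ordinary download URLs. As a rough illustration of that mapping (a hypothetical resolver, not the library's actual implementation; the GitHub release-asset and Google Drive URL layouts are the standard public conventions):

```python
def resolve_weights_url(spec: str) -> str:
    """Map a weights identifier to a plain HTTPS download URL (illustrative)."""
    if spec.startswith("github://"):
        # github://owner/repo/tag/file.onnx -> GitHub release asset URL
        owner, repo, tag, filename = spec[len("github://"):].split("/", 3)
        return f"https://github.com/{owner}/{repo}/releases/download/{tag}/{filename}"
    if spec.startswith("gdrive:"):
        # gdrive:FILE_ID -> Google Drive direct-download URL
        file_id = spec[len("gdrive:"):]
        return f"https://drive.google.com/uc?export=download&id={file_id}"
    # Local paths and plain HTTP(S) URLs pass through unchanged
    return spec

print(resolve_weights_url("github://owner/repo/v1.0/model.onnx"))
```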
Methods
__call__(*args, **kwargs) – Call self as a function.
export(weights_path, config_path, ...[, ...]) – Export TRBA PyTorch model to ONNX format.
predict(images[, batch_size]) – Run text recognition on one or more word images.
runtime_providers() – Get ONNX Runtime execution providers based on device.
train(train_csvs, train_roots[, val_csvs, ...]) – Train TRBA text recognition model on custom datasets.
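runtime_providers presumably translates the device argument into an ONNX Runtime execution-provider list. A plausible sketch (the provider names are real ONNX Runtime identifiers; the function body itself is an assumption, not the library's code):

```python
def runtime_providers(device=None):
    """Return an ONNX Runtime provider list for a device string (illustrative)."""
    if device == "cuda":
        # CUDA first, CPU as fallback (requires onnxruntime-gpu)
        return ["CUDAExecutionProvider", "CPUExecutionProvider"]
    if device == "coreml":
        # CoreML first on Apple Silicon (requires onnxruntime-silicon)
        return ["CoreMLExecutionProvider", "CPUExecutionProvider"]
    # None or "cpu": plain CPU execution
    return ["CPUExecutionProvider"]

print(runtime_providers("cuda"))
```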
- pretrained_registry: Dict[str, str] = {'trba_base_g1': 'github://konstantinkozhin/manuscript-ocr/v0.1.0/trba_base_g1.onnx', 'trba_lite_g1': 'github://konstantinkozhin/manuscript-ocr/v0.1.0/trba_lite_g1.onnx'}
- config_registry = {'trba_base_g1': 'github://konstantinkozhin/manuscript-ocr/v0.1.0/trba_base_g1.json', 'trba_lite_g1': 'github://konstantinkozhin/manuscript-ocr/v0.1.0/trba_lite_g1.json'}
- charset_registry = {'trba_base_g1': 'github://konstantinkozhin/manuscript-ocr/v0.1.0/trba_base_g1.txt', 'trba_lite_g1': 'github://konstantinkozhin/manuscript-ocr/v0.1.0/trba_lite_g1.txt'}
- predict(images, batch_size=32)[source]
Run text recognition on one or more word images.
- Parameters:
images (str, Path, numpy.ndarray, PIL.Image, or list thereof) –
Single image or list of images to recognize. Each image can be:
Path to an image file (str or Path)
RGB numpy array with shape (H, W, 3) and dtype uint8
PIL Image object
batch_size (int, optional) – Number of images to process simultaneously. Larger batches are faster but require more memory. Default is 32.
- Returns:
Recognition results as a list of dictionaries, each containing:
"text": str – recognized text
"confidence": float – recognition confidence in [0, 1]
If input is a single image, returns a list with one element.
- Return type:
list of dict
Examples
Recognize single image:
>>> from manuscript.recognizers import TRBA
>>> recognizer = TRBA()
>>> results = recognizer.predict("word_image.jpg")
>>> print(f"Text: '{results[0]['text']}' (confidence: {results[0]['confidence']:.3f})")
Process numpy arrays:
>>> import cv2
>>> img = cv2.imread("word.jpg")
>>> img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
>>> results = recognizer.predict(img_rgb)
>>> print(results[0]["text"])
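The batch_size parameter controls how many crops are fed through the network per ONNX Runtime call; internally this amounts to chunking the input list, roughly as follows (a standalone sketch of the idea, not the library's code):

```python
def chunked(items, batch_size=32):
    """Split a list into consecutive batches of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

# 70 crops with batch_size=32 -> batches of 32, 32, and 6
batches = chunked(list(range(70)), batch_size=32)
print([len(b) for b in batches])  # [32, 32, 6]
```

Larger batches amortize per-call overhead but raise peak memory, which is why the default of 32 is a middle ground.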
- static train(train_csvs, train_roots, val_csvs=None, val_roots=None, *, exp_dir=None, charset_path=None, encoding='utf-8', img_h=64, img_w=256, max_len=25, hidden_size=256, num_encoder_layers=3, cnn_in_channels=3, cnn_out_channels=512, cnn_backbone='seresnet31', ctc_weight=0.3, ctc_weight_decay_epochs=50, ctc_weight_min=0.0, max_grad_norm=5.0, batch_size=32, epochs=20, lr=0.001, optimizer='AdamW', scheduler='OneCycleLR', weight_decay=0.0, momentum=0.9, val_interval=1, val_size=3000, train_proportions=None, num_workers=0, seed=42, resume_from=None, save_interval=None, device='cuda', freeze_cnn='none', freeze_enc_rnn='none', freeze_attention='none', pretrain_weights='default', **extra_config)[source]
Train TRBA text recognition model on custom datasets.
- Parameters:
train_csvs (str, Path or sequence of paths) – Path(s) to training CSV files. Each CSV should have columns: image_path (relative to train_roots) and text (ground truth transcription).
train_roots (str, Path or sequence of paths) – Root directory/directories containing training images. Must have the same length as train_csvs.
val_csvs (str, Path, sequence of paths, or None, optional) – Path(s) to validation CSV files with the same format as train_csvs. If None, no validation is performed. Default is None.
val_roots (str, Path, sequence of paths, or None, optional) – Root directory/directories for validation images. Must match the length of val_csvs if provided. Default is None.
exp_dir (str or Path, optional) – Experiment directory where checkpoints and logs will be saved. If None, auto-generated based on timestamp. Default is None.
charset_path (str or Path, optional) – Path to character set file. If None, uses the default charset from the package. Default is None.
encoding (str, optional) – Text encoding for reading CSV files. Default is "utf-8".
img_h (int, optional) – Target height for input images (pixels). Default is 64.
img_w (int, optional) – Target width for input images (pixels). Default is 256.
max_len (int, optional) – Maximum sequence length for text recognition. Default is 25.
hidden_size (int, optional) – Hidden dimension size for RNN encoder/decoder. Default is 256.
num_encoder_layers (int, optional) – Number of bidirectional LSTM layers in the encoder. Default is 3.
cnn_in_channels (int, optional) – Number of input channels for the CNN backbone (3 for RGB, 1 for grayscale). Default is 3.
cnn_out_channels (int, optional) – Number of output channels from the CNN backbone. Default is 512.
cnn_backbone ({"seresnet31", "seresnet31-lite"}, optional) – CNN backbone variant. "seresnet31" keeps the standard SE-ResNet-31, while "seresnet31-lite" enables a depthwise-lite version. Default is "seresnet31".
ctc_weight (float, optional) – Initial weight for the CTC loss during training (CTC is always used for stability): loss = attn_loss * (1 - ctc_weight) + ctc_loss * ctc_weight. The CTC weight decays over epochs. Default is 0.3.
ctc_weight_decay_epochs (int, optional) – Number of epochs over which the CTC weight decays to its minimum. Default is 50.
ctc_weight_min (float, optional) – Minimum value for the CTC weight after decay. Default is 0.0.
max_grad_norm (float, optional) – Maximum gradient norm for clipping (prevents gradient explosion/NaN). Default is 5.0.
batch_size (int, optional) – Training batch size. Default is 32.
epochs (int, optional) – Number of training epochs. Default is 20.
lr (float, optional) – Learning rate. Default is 1e-3.
optimizer ({"Adam", "SGD", "AdamW"}, optional) – Optimizer type. Default is "AdamW".
scheduler ({"ReduceLROnPlateau", "CosineAnnealingLR", "OneCycleLR", "None"}, optional) –
Learning rate scheduler type:
"OneCycleLR" – one-cycle policy with cosine annealing (default, recommended)
"ReduceLROnPlateau" – reduce LR on validation loss plateau
"CosineAnnealingLR" – cosine annealing over epochs
"None" or None – constant learning rate
Default is "OneCycleLR".
weight_decay (float, optional) – L2 weight decay coefficient. Default is 0.0.
momentum (float, optional) – Momentum for the SGD optimizer. Default is 0.9.
val_interval (int, optional) – Perform validation every N epochs. Default is 1.
val_size (int, optional) – Maximum number of validation samples to use. Default is 3000.
train_proportions (sequence of float, optional) – Sampling proportions for multiple training datasets. Must sum to 1.0 and match the length of train_csvs. If None, datasets are concatenated equally. Default is None.
num_workers (int, optional) – Number of data loading workers. Default is 0.
seed (int, optional) – Random seed for reproducibility. Default is 42.
resume_from (str or Path, optional) – Path to a checkpoint file to resume training from. Default is None.
save_interval (int, optional) – Save a checkpoint every N epochs. If None, only the best model is saved. Default is None.
device ({"cuda", "cpu"}, optional) – Training device. Default is "cuda".
freeze_cnn ({"none", "all", "first", "last"}, optional) – CNN freezing policy. Default is "none".
freeze_enc_rnn ({"none", "all", "first", "last"}, optional) – Encoder RNN freezing policy. Default is "none".
freeze_attention ({"none", "all"}, optional) – Attention module freezing policy. Default is "none".
pretrain_weights (str, Path, bool, or None, optional) –
Pretrained weights to initialize from:
"default" or True – use release weights
None or False – train from scratch
str/Path – path or URL to a custom weights file
Default is "default".
**extra_config (dict, optional) – Additional configuration parameters passed to the training config.
- Returns:
Path to the best model checkpoint saved during training.
- Return type:
Examples
Train on single dataset with validation:
>>> from manuscript.recognizers import TRBA
>>>
>>> best_model = TRBA.train(
...     train_csvs="data/train.csv",
...     train_roots="data/train_images",
...     val_csvs="data/val.csv",
...     val_roots="data/val_images",
...     exp_dir="./experiments/trba_exp1",
...     epochs=50,
...     batch_size=64,
...     img_h=64,
...     img_w=256,
... )
>>> print(f"Best model saved at: {best_model}")
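The CSV layout expected by train (an image_path column relative to the dataset root plus a text column) can be produced with the standard csv module. A sketch with purely illustrative file names:

```python
import csv
import io

# Two illustrative rows; image_path is relative to the matching train_roots entry
rows = [
    {"image_path": "words/0001.png", "text": "hello"},
    {"image_path": "words/0002.png", "text": "world"},
]

buf = io.StringIO()  # stands in for open("train.csv", "w", newline="", encoding="utf-8")
writer = csv.DictWriter(buf, fieldnames=["image_path", "text"])
writer.writeheader()      # header row: image_path,text
writer.writerows(rows)

csv_text = buf.getvalue()
print(csv_text.splitlines()[0])  # image_path,text

# Read it back to confirm the layout
parsed = list(csv.DictReader(io.StringIO(csv_text)))
```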
Train on multiple datasets with custom proportions:
>>> train_csvs = ["data/dataset1/train.csv", "data/dataset2/train.csv"]
>>> train_roots = ["data/dataset1/images", "data/dataset2/images"]
>>> train_proportions = [0.7, 0.3]  # 70% from dataset1, 30% from dataset2
>>>
>>> best_model = TRBA.train(
...     train_csvs=train_csvs,
...     train_roots=train_roots,
...     train_proportions=train_proportions,
...     val_csvs="data/val.csv",
...     val_roots="data/val_images",
...     epochs=100,
...     lr=5e-4,
...     optimizer="AdamW",
...     weight_decay=1e-4,
... )
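The proportional sampling described for train_proportions can be illustrated in isolation with random.choices (a sketch of the idea only; the library's actual sampler may differ):

```python
import random

random.seed(42)  # reproducibility, mirroring the seed parameter
datasets = ["dataset1", "dataset2"]
proportions = [0.7, 0.3]  # one weight per dataset, summing to 1.0

# Draw which dataset each of 10,000 training samples comes from
draws = random.choices(datasets, weights=proportions, k=10_000)
share = draws.count("dataset1") / len(draws)
print(f"dataset1 share: {share:.2f}")  # close to 0.70
```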
Resume training from checkpoint:
>>> best_model = TRBA.train(
...     train_csvs="data/train.csv",
...     train_roots="data/train_images",
...     resume_from="experiments/trba_exp1/checkpoints/last.pth",
...     epochs=100,
... )
Fine-tune from pretrained weights with frozen CNN:
>>> best_model = TRBA.train(
...     train_csvs="data/finetune.csv",
...     train_roots="data/finetune_images",
...     pretrain_weights="default",
...     freeze_cnn="all",
...     epochs=20,
...     lr=1e-4,
... )
Train with CTC for stability (always enabled):
>>> best_model = TRBA.train(
...     train_csvs="data/train.csv",
...     train_roots="data/train_images",
...     optimizer="AdamW",
...     scheduler="OneCycleLR",
...     lr=1e-3,
...     ctc_weight=0.3,
...     ctc_weight_decay_epochs=50,
...     max_grad_norm=5.0,
...     epochs=100,
... )
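The combined loss and the decaying CTC weight can be written out explicitly. The weighting formula is taken from the ctc_weight parameter docs above; the linear decay shape from ctc_weight down to ctc_weight_min over ctc_weight_decay_epochs is an assumption for illustration:

```python
def ctc_weight_at(epoch, ctc_weight=0.3, ctc_weight_min=0.0, decay_epochs=50):
    """Linearly decay the CTC weight from its initial value to the minimum."""
    frac = min(epoch / decay_epochs, 1.0)
    return ctc_weight + (ctc_weight_min - ctc_weight) * frac

def combined_loss(attn_loss, ctc_loss, w):
    # loss = attn_loss * (1 - ctc_weight) + ctc_loss * ctc_weight
    return attn_loss * (1.0 - w) + ctc_loss * w

w_start = ctc_weight_at(0)    # 0.3 at the start of training
w_end = ctc_weight_at(50)     # 0.0 after the full decay window
print(w_start, w_end, combined_loss(1.0, 2.0, w_start))
```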
- static export(weights_path, config_path, charset_path, output_path, opset_version=14, simplify=True)[source]
Export TRBA PyTorch model to ONNX format.
This method converts a trained TRBA model from PyTorch to ONNX format, which can be used for faster inference with ONNX Runtime. The exported model can be loaded with TRBA(weights="model.onnx").
- Parameters:
weights_path (str or Path) – Path to the PyTorch model weights file (.pth).
config_path (str or Path) – Path to the model configuration JSON file. Used to determine model architecture (img_h, img_w, max_len, hidden_size, etc.).
charset_path (str or Path) – Path to the charset file (charset.txt). Used to determine num_classes for the model.
output_path (str or Path) – Path where the ONNX model will be saved (.onnx).
opset_version (int, optional) – ONNX opset version to use for export. Default is 14.
simplify (bool, optional) – If True, applies ONNX graph simplification using onnx-simplifier to optimize the model. Requires the onnx-simplifier package. Default is True.
- Returns:
The ONNX model is saved to output_path.
- Return type:
None
- Raises:
ImportError – If required packages (torch, onnx) are not installed.
FileNotFoundError – If weights_path or config_path do not exist.
Notes
The exported ONNX model has one output:
logits: character predictions with shape (batch, max_length+1, num_classes)
The model uses greedy decoding (argmax) and supports dynamic batch size. The sequence length is fixed to max_length + 1 from the config (same as PyTorch inference mode, for compatibility).
Architecture exported:
- CNN backbone (SE-ResNet-31 or SE-ResNet-31-Lite)
- Bidirectional LSTM encoder
- Attention decoder (greedy decoding)
Note: only the attention decoder is exported. The CTC head is used only during training and is not included in the ONNX model.
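Greedy decoding over the exported logits output is a per-step argmax into the charset, stopping at the end-of-sequence symbol. A minimal sketch (the toy charset and the EOS-at-index-0 layout are invented for illustration, not the library's actual token layout):

```python
def greedy_decode(logits, charset, eos_index=0):
    """Decode a (max_length+1, num_classes) logits matrix by per-step argmax."""
    chars = []
    for step in logits:  # one row of class scores per decoding step
        idx = max(range(len(step)), key=step.__getitem__)  # argmax over classes
        if idx == eos_index:  # stop at the end-of-sequence token
            break
        chars.append(charset[idx])
    return "".join(chars)

# Toy example: a 4-symbol charset where index 0 is EOS
charset = ["<eos>", "c", "a", "t"]
logits = [
    [0.1, 0.9, 0.0, 0.0],  # argmax 1 -> "c"
    [0.0, 0.1, 0.8, 0.1],  # argmax 2 -> "a"
    [0.0, 0.0, 0.1, 0.9],  # argmax 3 -> "t"
    [0.9, 0.0, 0.1, 0.0],  # argmax 0 -> EOS, stop
]
print(greedy_decode(logits, charset))  # cat
```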
Examples
Export TRBA model to ONNX:
>>> from manuscript.recognizers import TRBA
>>> TRBA.export(
...     weights_path="experiments/best_model/best_acc_weights.pth",
...     config_path="experiments/best_model/config.json",
...     charset_path="configs/charset.txt",
...     output_path="trba_model.onnx"
... )
Loading TRBA model...
=== TRBA ONNX Export ===
Max decoding length: 40
Input size: 64x256
[OK] ONNX model saved to: trba_model.onnx
Export with custom opset:
>>> TRBA.export(
...     weights_path="model.pth",
...     config_path="config.json",
...     charset_path="charset.txt",
...     output_path="model.onnx",
...     opset_version=16,
...     simplify=False
... )
Use the exported model for inference:
>>> recognizer = TRBA(weights="trba_model.onnx")
>>> result = recognizer.predict("word_image.jpg")
See also
TRBA.__init__ – Initialize TRBA recognizer with an ONNX model.