Recognizers
Text recognition models.
- class manuscript.recognizers.TRBA(weights=None, config=None, charset=None, device=None, rotate_threshold=1.5, region_preparer='bbox', region_preparer_options=None, min_text_size=5, **kwargs)
Bases: BaseRecognizer
Initialize TRBA text recognition model with ONNX Runtime.
- Parameters:
weights (str or Path, optional) – Path or identifier for ONNX model weights. Supports:
  - Local file path: "path/to/model.onnx"
  - HTTP/HTTPS URL: "https://example.com/model.onnx"
  - GitHub release: "github://owner/repo/tag/file.onnx"
  - Google Drive: "gdrive:FILE_ID"
  - Preset name: "trba_lite_g1" or "trba_base_g1" (from pretrained_registry)
  - None: auto-downloads default preset (trba_lite_g1)
config (str or Path, optional) – Path or identifier for model configuration JSON. Same URL schemes as weights. If None, attempts to infer from weights location or uses default config for preset models.
charset (str or Path, optional) – Path or identifier for character set file. If None, attempts to find charset near weights or falls back to package default.
device ({"cuda", "coreml", "cpu"}, optional) – Compute device. If None, automatically selects CPU. For GPU/CoreML acceleration:
  - CUDA (NVIDIA): pip install onnxruntime-gpu
  - CoreML (Apple Silicon M1/M2/M3): pip install onnxruntime-silicon
  Default is None (CPU).
rotate_threshold (float or None, optional) – Aspect-ratio threshold for rotating vertical text-span crops before recognition. If height > width * rotate_threshold, crop is rotated 90 degrees clockwise. Set to 0 or None to disable. Default is 1.5.
region_preparer ({"bbox", "polygon_mask", "quad_warp"} or callable, optional) – Strategy used to convert Page polygons into recognition crops. "bbox" extracts axis-aligned bounding boxes for arbitrary polygons. "polygon_mask" masks pixels outside the polygon inside a tight crop and also supports arbitrary polygons. "quad_warp" rectifies only 4-point polygons with a perspective transform before recognition. A custom callable may also be provided and should return a list of prepared text regions. Default is "bbox".
region_preparer_options (dict or None, optional) – Optional configuration for built-in region preparers. Defaults to None. Typical options are pad for "bbox" and "polygon_mask", or output_size=(width, height) for "quad_warp". Non-quad polygons passed to "quad_warp" fall back to bbox crops by default.
min_text_size (int, optional) – Minimum crop width/height in pixels to run recognition for a text span. Text spans below this threshold are skipped. Default is 5.
**kwargs – Additional configuration options (reserved for future use).
- Raises:
FileNotFoundError – If specified files do not exist.
ValueError – If weights format is invalid.
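The region-handling parameters above (region_preparer="bbox", min_text_size, rotate_threshold) can be illustrated with a small standalone sketch. It is not the library's implementation: the helper names bbox_from_polygon, crop, should_skip, and rotate_if_vertical are hypothetical, the image is a plain nested list rather than a real array, and reading min_text_size as "skip when either side is below the threshold" is an assumption. Only the rule height > width * rotate_threshold and the clockwise direction come from the parameter descriptions.

```python
from typing import List, Optional, Sequence, Tuple

Image = List[List[int]]  # rows of pixels, a stand-in for a real image array
Box = Tuple[int, int, int, int]


def bbox_from_polygon(polygon: Sequence[Tuple[float, float]], pad: int = 0) -> Box:
    """Axis-aligned box (x0, y0, x1, y1) around an arbitrary polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (int(min(xs)) - pad, int(min(ys)) - pad,
            int(max(xs)) + pad, int(max(ys)) + pad)


def crop(image: Image, box: Box) -> Image:
    """Cut the box out of the image (end-exclusive), clipped to the image."""
    x0, y0, x1, y1 = box
    x0, y0 = max(x0, 0), max(y0, 0)
    return [row[x0:x1] for row in image[y0:y1]]


def should_skip(region: Image, min_text_size: int = 5) -> bool:
    """Assumed reading: skip when width or height is below min_text_size."""
    height = len(region)
    width = len(region[0]) if region else 0
    return height < min_text_size or width < min_text_size


def rotate_if_vertical(region: Image, rotate_threshold: Optional[float] = 1.5) -> Image:
    """Rotate 90 degrees clockwise when height > width * rotate_threshold."""
    if not rotate_threshold or not region:  # 0 or None disables rotation
        return region
    height, width = len(region), len(region[0])
    if height > width * rotate_threshold:
        # Reversing the rows and transposing is a clockwise quarter turn.
        return [list(col) for col in zip(*region[::-1])]
    return region


# A 20x10 toy page with one tall, narrow text span.
page_image = [[10 * r + c for c in range(10)] for r in range(20)]
region = crop(page_image, bbox_from_polygon([(2, 1), (8, 1), (8, 19), (2, 19)]))
prepared = region if should_skip(region) else rotate_if_vertical(region)
```

In this toy case the crop is 18 pixels high and 6 wide, so it exceeds the default threshold of 1.5 and is turned into a 6 by 18 region before it would reach the recognizer.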
Notes
The class provides three main public methods:
- predict - run recognition over text spans in a Page object.
- train - high-level training entrypoint to train a TRBA model on custom datasets.
- export - static method to export PyTorch model to ONNX format.
Model uses ONNX Runtime for fast inference on CPU and GPU. For GPU acceleration, install:
pip install onnxruntime-gpu
Examples
Create recognizer with default preset (auto-downloads):
>>> from manuscript.recognizers import TRBA
>>> recognizer = TRBA()
Load from local ONNX file:
>>> recognizer = TRBA(weights="path/to/model.onnx")
Load from GitHub release:
>>> recognizer = TRBA(
...     weights="github://owner/repo/v1.0/model.onnx",
...     config="github://owner/repo/v1.0/config.json"
... )
Force CPU execution:
>>> recognizer = TRBA(weights="model.onnx", device="cpu")
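The identifier forms accepted by weights, config, and charset can be told apart by their prefix. The sketch below shows one way to do that with the standard library; classify_weights and split_github are hypothetical helpers written for illustration, and how TRBA actually resolves these identifiers is not described in this reference.

```python
from urllib.parse import urlparse

# Preset names copied from pretrained_registry in this reference.
KNOWN_PRESETS = {"trba_lite_g1", "trba_lite_g2", "trba_base_g1"}


def classify_weights(weights):
    """Return a label for the kind of identifier (illustrative only)."""
    if weights is None:
        return "default-preset"
    text = str(weights)
    if text in KNOWN_PRESETS:
        return "preset"
    if text.startswith("gdrive:"):
        return "gdrive"
    scheme = urlparse(text).scheme
    if scheme in ("http", "https"):
        return "url"
    if scheme == "github":
        return "github"
    return "local"  # anything else is treated as a file path


def split_github(identifier):
    """Split github://owner/repo/tag/file into its four parts."""
    prefix = "github://"
    if not identifier.startswith(prefix):
        raise ValueError(f"not a github:// identifier: {identifier!r}")
    parts = identifier[len(prefix):].split("/", 3)
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected github://owner/repo/tag/file, got {identifier!r}")
    return tuple(parts)


owner, repo, tag, filename = split_github("github://owner/repo/v1.0/model.onnx")
```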
Methods
__call__(*args, **kwargs) – Call self as a function.
export(weights_path, config_path, ...[, ...]) – Export TRBA PyTorch model to ONNX format.
predict(page[, image, batch_size, ...]) – Recognize text for text spans in a Page and return updated Page.
runtime_providers() – Get ONNX Runtime execution providers based on device.
train(train_csvs, train_roots[, val_csvs, ...]) – Train TRBA text recognition model on custom datasets.
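As a rough picture of what a device-to-provider mapping such as runtime_providers() might look like, the sketch below maps the three documented device values onto ONNX Runtime execution provider names. The provider names are real ONNX Runtime identifiers, but the function providers_for, the CPU fallback ordering, and the error handling are assumptions, not the method's actual behavior.

```python
from typing import List, Optional

# Real ONNX Runtime provider names; the mapping itself is an assumption.
_PROVIDERS = {
    "cuda": "CUDAExecutionProvider",
    "coreml": "CoreMLExecutionProvider",
    "cpu": "CPUExecutionProvider",
}


def providers_for(device: Optional[str]) -> List[str]:
    """Hypothetical stand-in: accelerator first, CPU kept as a fallback."""
    name = (device or "cpu").lower()  # None selects CPU, as documented
    if name not in _PROVIDERS:
        raise ValueError(f"unknown device: {device!r}")
    chosen = [_PROVIDERS[name]]
    if name != "cpu":
        chosen.append(_PROVIDERS["cpu"])
    return chosen
```

A list of this shape is what the providers argument of onnxruntime.InferenceSession accepts.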
- __init__(weights=None, config=None, charset=None, device=None, rotate_threshold=1.5, region_preparer='bbox', region_preparer_options=None, min_text_size=5, **kwargs)
- charset_registry = {'trba_base_g1': 'github://konstantinkozhin/manuscript-ocr/v0.1.0/trba_base_g1.txt', 'trba_lite_g1': 'github://konstantinkozhin/manuscript-ocr/v0.1.0/trba_lite_g1.txt', 'trba_lite_g2': 'github://konstantinkozhin/manuscript-ocr/v0.1.0/trba_lite_g2.txt'}
- config_registry = {'trba_base_g1': 'github://konstantinkozhin/manuscript-ocr/v0.1.0/trba_base_g1.json', 'trba_lite_g1': 'github://konstantinkozhin/manuscript-ocr/v0.1.0/trba_lite_g1.json', 'trba_lite_g2': 'github://konstantinkozhin/manuscript-ocr/v0.1.0/trba_lite_g2.json'}
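The registries share the same preset keys, so a preset name is enough to find its weights, config, and charset. The sketch below rebuilds the three dictionaries from the values listed in this reference and gathers the identifiers for one preset; preset_artifacts is a hypothetical helper, and whether TRBA performs the lookup this way is an assumption.

```python
BASE = "github://konstantinkozhin/manuscript-ocr/v0.1.0"
PRESETS = ("trba_base_g1", "trba_lite_g1", "trba_lite_g2")

# Rebuilt from the registry values listed in this reference.
pretrained_registry = {name: f"{BASE}/{name}.onnx" for name in PRESETS}
config_registry = {name: f"{BASE}/{name}.json" for name in PRESETS}
charset_registry = {name: f"{BASE}/{name}.txt" for name in PRESETS}


def preset_artifacts(name):
    """Collect the weights, config, and charset identifiers for one preset."""
    if name not in pretrained_registry:
        raise ValueError(f"unknown preset: {name!r}")
    return {
        "weights": pretrained_registry[name],
        "config": config_registry[name],
        "charset": charset_registry[name],
    }


artifacts = preset_artifacts("trba_lite_g1")
```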
- static export(weights_path, config_path, charset_path, output_path, opset_version=14, simplify=True)
Export TRBA PyTorch model to ONNX format.
This method converts a trained TRBA model from PyTorch to ONNX format, which can be used for faster inference with ONNX Runtime. The exported model can be loaded using TRBA(weights="model.onnx").
- Parameters:
weights_path (str or Path) – Path to the PyTorch model weights file (.pth).
config_path (str or Path) – Path to the model configuration JSON file. Used to determine model architecture (img_h, img_w, max_len, hidden_size, etc.).
charset_path (str or Path) – Path to the charset file (charset.txt). Used to determine num_classes for the model.
output_path (str or Path) – Path where the ONNX model will be saved (.onnx).
opset_version (int, optional) – ONNX opset version to use for export. Default is 14.
simplify (bool, optional) – If True, applies ONNX graph simplification using onnx-simplifier to optimize the model. Requires onnx-simplifier package. Default is True.
- Returns:
The ONNX model is saved to output_path.
- Return type:
None
- Raises:
ImportError – If required packages (torch, onnx) are not installed.
FileNotFoundError – If weights_path or config_path do not exist.
Notes
The exported ONNX model has one output:
- logits: Character predictions with shape (batch, max_length+1, num_classes)
The model uses greedy decoding (argmax) and supports dynamic batch size. The sequence length is fixed to max_length + 1 from the config (same as PyTorch inference mode for compatibility).
Architecture exported:
- CNN backbone (SE-ResNet-31 or SE-ResNet-31-Lite)
- Bidirectional LSTM encoder
- Attention decoder (greedy decoding)
Note: Only the attention decoder is exported. CTC head is used only during training and is not included in the ONNX model.
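Greedy decoding over a logits tensor of shape (batch, max_length+1, num_classes) amounts to taking the argmax at every step. The sketch below shows that idea on nested lists. The charset layout, the use of class index 0 as an end-of-sequence marker, and the function names are all assumptions made for the example; the reference does not say how special tokens are encoded.

```python
from typing import List, Sequence


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    best = 0
    for index, value in enumerate(values):
        if value > values[best]:
            best = index
    return best


def greedy_decode(
    logits: Sequence[Sequence[Sequence[float]]],
    charset: Sequence[str],
    eos_index: int = 0,  # assumed end-of-sequence class, not from the docs
) -> List[str]:
    """Decode batch x steps x num_classes logits into strings."""
    texts = []
    for sequence in logits:
        chars = []
        for step in sequence:
            index = argmax(step)
            if index == eos_index:
                break  # stop at the assumed end-of-sequence class
            chars.append(charset[index])
        texts.append("".join(chars))
    return texts


# Toy example: 1 sequence, 4 decoding steps, 4 classes.
toy_charset = ["<eos>", "a", "b", "c"]
toy_logits = [[
    [0.1, 0.2, 2.0, 0.0],  # argmax 2 -> "b"
    [0.0, 3.0, 0.1, 0.2],  # argmax 1 -> "a"
    [5.0, 0.1, 0.1, 0.1],  # argmax 0 -> stop
    [0.0, 0.0, 0.0, 1.0],  # ignored after the stop
]]
decoded = greedy_decode(toy_logits, toy_charset)
```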
Examples
Export TRBA model to ONNX:
>>> from manuscript.recognizers import TRBA
>>> TRBA.export(
...     weights_path="experiments/best_model/best_acc_weights.pth",
...     config_path="experiments/best_model/config.json",
...     charset_path="configs/charset.txt",
...     output_path="trba_model.onnx"
... )
Loading TRBA model...
=== TRBA ONNX Export ===
Max decoding length: 40
Input size: 64x256
[OK] ONNX model saved to: trba_model.onnx
Export with custom opset:
>>> TRBA.export(
...     weights_path="model.pth",
...     config_path="config.json",
...     charset_path="charset.txt",
...     output_path="model.onnx",
...     opset_version=16,
...     simplify=False
... )
Use the exported model for inference:
>>> from manuscript.detectors import EAST
>>> recognizer = TRBA(weights="trba_model.onnx")
>>> detector = EAST()
>>> det = detector.predict("page.jpg")
>>> result = recognizer.predict(det["page"], image="page.jpg")
See also
TRBA.__init__ – Initialize TRBA recognizer with ONNX model.
- predict(page, image=None, batch_size=32, debug_save_dir=None)
Recognize text for text spans in a Page and return updated Page.
- Parameters:
page (Page) – Page object with detected text-span polygons.
image (str, Path, numpy.ndarray, or PIL.Image, optional) – Source page image used to extract text regions. If None, recognition is skipped and a deep copy of page is returned.
batch_size (int, optional) – Number of prepared text regions to process simultaneously.
debug_save_dir (str or Path, optional) – If provided, saves the prepared recognition crops to this directory as *.png files together with index.json. Crops are saved after region_preparer and auto-rotation, i.e. in the same orientation that goes into recognizer inference.
- Returns:
New Page object with recognized text and recognition_confidence filled for processed text spans.
- Return type:
Page
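The batch_size parameter controls how many prepared regions are processed at once. A minimal sketch of that kind of chunking is shown below; iter_batches is a hypothetical helper, and the way predict groups regions internally is not documented here.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int = 32) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# 70 placeholder regions split with the default batch size.
batch_sizes = [len(batch) for batch in iter_batches(list(range(70)))]
```

With 70 regions and the default of 32, the last batch is smaller than the others, which is why a dynamic batch dimension in the exported model is convenient.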
- pretrained_registry: Dict[str, str] = {'trba_base_g1': 'github://konstantinkozhin/manuscript-ocr/v0.1.0/trba_base_g1.onnx', 'trba_lite_g1': 'github://konstantinkozhin/manuscript-ocr/v0.1.0/trba_lite_g1.onnx', 'trba_lite_g2': 'github://konstantinkozhin/manuscript-ocr/v0.1.0/trba_lite_g2.onnx'}
- static train(train_csvs, train_roots, val_csvs=None, val_roots=None, *, exp_dir=None, charset_path=None, encoding='utf-8', img_h=64, img_w=256, max_len=25, hidden_size=256, num_encoder_layers=3, cnn_in_channels=3, cnn_out_channels=512, cnn_backbone='seresnet31', ctc_weight=0.3, ctc_weight_decay_epochs=50, ctc_weight_min=0.0, max_grad_norm=5.0, batch_size=32, epochs=20, lr=0.001, optimizer='AdamW', scheduler='OneCycleLR', weight_decay=0.0, momentum=0.9, val_interval=1, val_size=3000, train_proportions=None, num_workers=0, seed=42, resume_from=None, save_interval=None, device='cuda', freeze_cnn='none', freeze_enc_rnn='none', freeze_attention='none', pretrain_weights='default', **extra_config)
Train TRBA text recognition model on custom datasets.
- Parameters:
train_csvs (str, Path or sequence of paths) – Path(s) to training CSV files. Each CSV should have columns: image_path (relative to train_roots) and text (ground truth transcription).
train_roots (str, Path or sequence of paths) – Root directory/directories containing training images. Must have same length as train_csvs.
val_csvs (str, Path, sequence of paths, or None, optional) – Path(s) to validation CSV files with same format as train_csvs. If None, no validation is performed. Default is None.
val_roots (str, Path, sequence of paths, or None, optional) – Root directory/directories for validation images. Must match length of val_csvs if provided. Default is None.
exp_dir (str or Path, optional) – Experiment directory where checkpoints and logs will be saved. If None, auto-generated based on timestamp. Default is None.
charset_path (str or Path, optional) – Path to character set file. If None, uses default charset from package. Default is None.
encoding (str, optional) – Text encoding for reading CSV files. Default is "utf-8".
img_h (int, optional) – Target height for input images (pixels). Default is 64.
img_w (int, optional) – Target width for input images (pixels). Default is 256.
max_len (int, optional) – Maximum sequence length for text recognition. Default is 25.
hidden_size (int, optional) – Hidden dimension size for RNN encoder/decoder. Default is 256.
num_encoder_layers (int, optional) – Number of Bidirectional LSTM layers in the encoder. Default is 3.
cnn_in_channels (int, optional) – Number of input channels for CNN backbone (3 for RGB, 1 for grayscale). Default is 3.
cnn_out_channels (int, optional) – Number of output channels from CNN backbone. Default is 512.
cnn_backbone ({"seresnet31", "seresnet31-lite"}, optional) – CNN backbone variant. "seresnet31" keeps the standard SE-ResNet-31, while "seresnet31-lite" enables a depthwise-lite version. Default is "seresnet31".
ctc_weight (float, optional) – Initial weight for CTC loss during training (CTC always used for stability): loss = attn_loss * (1 - ctc_weight) + ctc_loss * ctc_weight. CTC weight decays over epochs. Default is 0.3.
ctc_weight_decay_epochs (int, optional) – Number of epochs for CTC weight to decay to minimum. Default is 50.
ctc_weight_min (float, optional) – Minimum value for CTC weight after decay. Default is 0.0.
max_grad_norm (float, optional) – Maximum gradient norm for clipping (prevents gradient explosion/NaN). Default is 5.0.
batch_size (int, optional) – Training batch size. Default is 32.
epochs (int, optional) – Number of training epochs. Default is 20.
lr (float, optional) – Learning rate. Default is 1e-3.
optimizer ({"Adam", "SGD", "AdamW"}, optional) – Optimizer type. Default is "AdamW".
scheduler ({"ReduceLROnPlateau", "CosineAnnealingLR", "OneCycleLR", "None"}, optional) – Learning rate scheduler type:
  - "OneCycleLR" - one-cycle policy with cosine annealing (default, recommended)
  - "ReduceLROnPlateau" - reduce LR on validation loss plateau
  - "CosineAnnealingLR" - cosine annealing over epochs
  - "None" or None - constant learning rate
  Default is "OneCycleLR".
weight_decay (float, optional) – L2 weight decay coefficient. Default is 0.0.
momentum (float, optional) – Momentum for SGD optimizer. Default is 0.9.
val_interval (int, optional) – Perform validation every N epochs. Default is 1.
val_size (int, optional) – Maximum number of validation samples to use. Default is 3000.
train_proportions (sequence of float, optional) – Sampling proportions for multiple training datasets. Must sum to 1.0 and match length of train_csvs. If None, datasets are concatenated equally. Default is None.
num_workers (int, optional) – Number of data loading workers. Default is 0.
seed (int, optional) – Random seed for reproducibility. Default is 42.
resume_from (str or Path, optional) – Path to checkpoint file to resume training from. Default is None.
save_interval (int, optional) – Save checkpoint every N epochs. If None, only saves best model. Default is None.
device ({"cuda", "cpu"}, optional) – Training device. Default is "cuda".
freeze_cnn ({"none", "all", "first", "last"}, optional) – CNN freezing policy. Default is "none".
freeze_enc_rnn ({"none", "all", "first", "last"}, optional) – Encoder RNN freezing policy. Default is "none".
freeze_attention ({"none", "all"}, optional) – Attention module freezing policy. Default is "none".
pretrain_weights (str, Path, bool, or None, optional) – Pretrained weights to initialize from:
  - "default" or True - use release weights
  - None or False - train from scratch
  - str/Path - path or URL to custom weights file
  Default is "default".
**extra_config (dict, optional) – Additional configuration parameters passed to training config.
- Returns:
Path to the best model checkpoint saved during training.
- Return type:
Examples
Train on single dataset with validation:
>>> from manuscript.recognizers import TRBA
>>>
>>> best_model = TRBA.train(
...     train_csvs="data/train.csv",
...     train_roots="data/train_images",
...     val_csvs="data/val.csv",
...     val_roots="data/val_images",
...     exp_dir="./experiments/trba_exp1",
...     epochs=50,
...     batch_size=64,
...     img_h=64,
...     img_w=256,
... )
>>> print(f"Best model saved at: {best_model}")
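The CSV files passed as train_csvs need an image_path column and a text column. The sketch below writes and reads such a file with the standard csv module. The comma delimiter, the header row, the helper names, and the sample file names are assumptions for illustration; the reference only names the two columns.

```python
import csv
import io

FIELDS = ["image_path", "text"]  # column names from the train_csvs description


def write_training_csv(rows, handle):
    """Write (image_path, text) pairs with a header row."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for image_path, text in rows:
        writer.writerow({"image_path": image_path, "text": text})


def read_training_csv(handle):
    """Read the pairs back; quoting is handled by the csv module."""
    return [(row["image_path"], row["text"]) for row in csv.DictReader(handle)]


# Hypothetical samples; paths are relative to the matching train_roots entry.
samples = [("lines/0001.png", "hello, world"), ("lines/0002.png", "второй пример")]
buffer = io.StringIO(newline="")
write_training_csv(samples, buffer)
buffer.seek(0)
restored = read_training_csv(buffer)
```

When writing to a real file, opening it with newline="" and the same encoding that is later passed to train (default "utf-8") keeps transcriptions containing commas or non-ASCII characters intact.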
Train on multiple datasets with custom proportions:
>>> train_csvs = ["data/dataset1/train.csv", "data/dataset2/train.csv"]
>>> train_roots = ["data/dataset1/images", "data/dataset2/images"]
>>> train_proportions = [0.7, 0.3]  # 70% from dataset1, 30% from dataset2
>>>
>>> best_model = TRBA.train(
...     train_csvs=train_csvs,
...     train_roots=train_roots,
...     train_proportions=train_proportions,
...     val_csvs="data/val.csv",
...     val_roots="data/val_images",
...     epochs=100,
...     lr=5e-4,
...     optimizer="AdamW",
...     weight_decay=1e-4,
... )
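One way to picture train_proportions is as weights for choosing which dataset each training sample is drawn from. The sketch below does this with random.choices. It is an assumption about the sampling idea, not the library's data loader, and sample_dataset_indices is a hypothetical name.

```python
import random
from collections import Counter


def sample_dataset_indices(proportions, num_samples, seed=42):
    """Pick a dataset index per sample, weighted by the given proportions."""
    if abs(sum(proportions) - 1.0) > 1e-6:
        raise ValueError("train_proportions must sum to 1.0")
    rng = random.Random(seed)  # local generator keeps the draw reproducible
    return rng.choices(range(len(proportions)), weights=proportions, k=num_samples)


picks = sample_dataset_indices([0.7, 0.3], num_samples=10_000)
counts = Counter(picks)
share_of_first = counts[0] / len(picks)
```

Over many draws the share taken from the first dataset approaches 0.7, regardless of how many rows each CSV contains.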
Resume training from checkpoint:
>>> best_model = TRBA.train(
...     train_csvs="data/train.csv",
...     train_roots="data/train_images",
...     resume_from="experiments/trba_exp1/checkpoints/last.pth",
...     epochs=100,
... )
Fine-tune from pretrained weights with frozen CNN:
>>> best_model = TRBA.train(
...     train_csvs="data/finetune.csv",
...     train_roots="data/finetune_images",
...     pretrain_weights="default",
...     freeze_cnn="all",
...     epochs=20,
...     lr=1e-4,
... )
Train with CTC for stability (always enabled):
>>> best_model = TRBA.train(
...     train_csvs="data/train.csv",
...     train_roots="data/train_images",
...     optimizer="AdamW",
...     scheduler="OneCycleLR",
...     lr=1e-3,
...     ctc_weight=0.3,
...     ctc_weight_decay_epochs=50,
...     max_grad_norm=5.0,
...     epochs=100,
... )
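The loss mix described for ctc_weight, loss = attn_loss * (1 - ctc_weight) + ctc_loss * ctc_weight, can be worked through numerically. In the sketch below the formula is taken from the parameter description, while the linear shape of the decay from ctc_weight to ctc_weight_min over ctc_weight_decay_epochs is an assumption, since the reference does not state the schedule. Both function names are hypothetical.

```python
def ctc_weight_at(epoch, ctc_weight=0.3, ctc_weight_min=0.0, ctc_weight_decay_epochs=50):
    """Assumed linear decay from ctc_weight to ctc_weight_min."""
    if ctc_weight_decay_epochs <= 0:
        return ctc_weight_min
    progress = min(max(epoch, 0) / ctc_weight_decay_epochs, 1.0)
    return ctc_weight + (ctc_weight_min - ctc_weight) * progress


def combined_loss(attn_loss, ctc_loss, ctc_weight):
    """Weighted sum from the ctc_weight parameter description."""
    return attn_loss * (1 - ctc_weight) + ctc_loss * ctc_weight


# Weight at a few epochs with the documented defaults.
schedule = {epoch: ctc_weight_at(epoch) for epoch in (0, 25, 50, 80)}
loss_example = combined_loss(attn_loss=2.0, ctc_loss=4.0, ctc_weight=0.25)
```

Under this assumed schedule the CTC term contributes 30% of the loss at the start and nothing from epoch 50 onward, so with epochs=100 the second half of training optimizes the attention loss alone.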