Pipeline Usage Guide
The Pipeline class in manuscript-ocr works with any detector, recognizer, or corrector that implements a simple predict interface.
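The three interfaces described in this guide can be summarized as structural types. The sketch below is an illustration only: the names Detector, Recognizer, Corrector, and DummyRecognizer are hypothetical and are not part of manuscript-ocr, and Any stands in for the library's Page and image types.

```python
from typing import Any, Dict, List, Protocol, runtime_checkable


@runtime_checkable
class Detector(Protocol):
    def predict(self, image: Any) -> Dict[str, Any]:
        """Take a file path or image array, return {"page": Page}."""
        ...


@runtime_checkable
class Recognizer(Protocol):
    def predict(self, images: List[Any]) -> List[Dict[str, Any]]:
        """Take word images, return one {"text", "confidence"} dict per image."""
        ...


@runtime_checkable
class Corrector(Protocol):
    def predict(self, page: Any) -> Any:
        """Take a Page, return a corrected Page."""
        ...


class DummyRecognizer:
    """Hypothetical recognizer that returns empty text for every image."""

    def predict(self, images):
        return [{"text": "", "confidence": 0.0} for _ in images]


# isinstance() on a runtime-checkable Protocol only checks that a
# `predict` attribute exists; it does not validate the signature.
print(isinstance(DummyRecognizer(), Recognizer))  # True
print(isinstance(object(), Recognizer))           # False
```

Because all three protocols expose only predict, a runtime check cannot tell them apart; a static type checker is needed to compare the signatures.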
Detector Requirements
A detector class must implement a predict method that takes an image and returns a dictionary with a "page" key:
def predict(self, image) -> Dict[str, Any]:
    """
    Parameters:
    - image: file path (str) or numpy array (H, W, 3) in uint8
    Returns dictionary:
    {
        "page": Page  # Page object with detection results
    }
    """
    pass
Result Structure
The result must contain a Page object with the following hierarchy:
Page → Block → Line → Word
See src/manuscript/data/structures.py for detailed structure documentation.
Minimal example of creating a Page:
from manuscript.data import Word, Line, Block, Page
# Create a word with coordinates and detection confidence
word = Word(
    polygon=[(10, 20), (100, 20), (100, 40), (10, 40)],
    detection_confidence=0.95
)
# Group words into a line
line = Line(words=[word])
# Group lines into a block
block = Block(lines=[line])
# Create a page
page = Page(blocks=[block])
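Reading a Page back follows the same hierarchy in reverse. The sketch below uses stand-in dataclasses so that it runs without the library; they mirror only the field names used in this guide (blocks, lines, words, polygon, detection_confidence, text) and are not the real manuscript.data classes. The helper iter_words is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


# Stand-ins for illustration only; the real classes live in manuscript.data.
@dataclass
class Word:
    polygon: List[Tuple[float, float]]
    detection_confidence: float
    text: Optional[str] = None


@dataclass
class Line:
    words: List[Word] = field(default_factory=list)


@dataclass
class Block:
    lines: List[Line] = field(default_factory=list)


@dataclass
class Page:
    blocks: List[Block] = field(default_factory=list)


def iter_words(page: Page) -> Iterator[Word]:
    """Yield every word in block order, then line order, then word order."""
    for block in page.blocks:
        for line in block.lines:
            yield from line.words


page = Page(blocks=[Block(lines=[Line(words=[
    Word(polygon=[(10, 20), (100, 20), (100, 40), (10, 40)],
         detection_confidence=0.95),
    Word(polygon=[(110, 20), (200, 20), (200, 40), (110, 40)],
         detection_confidence=0.92),
])])])

confidences = [w.detection_confidence for w in iter_words(page)]
print(confidences)  # [0.95, 0.92]
```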
Recognizer Requirements
A recognizer class must implement a predict method that takes a list of images and returns a list of results:
def predict(self, images: List[np.ndarray]) -> List[Dict[str, Any]]:
    """
    Parameters:
    - images: list of numpy arrays (RGB word images)
    Returns list of dictionaries:
    [
        {"text": "word1", "confidence": 0.95},
        {"text": "word2", "confidence": 0.92},
        ...
    ]
    """
    pass
Example:
class MyRecognizer:
    def predict(self, images):
        results = []
        for img in images:
            # Your recognition logic
            text = "recognized_text"
            confidence = 0.92
            results.append({"text": text, "confidence": confidence})
        return results
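When developing a custom recognizer, it can help to check its output against the contract above before plugging it into the pipeline. The helper below is a hypothetical sketch, not part of manuscript-ocr; it assumes the recognizer returns plain dictionaries and checks only what this guide states: one result per image, a string "text", and a numeric "confidence".

```python
import numbers
from typing import Any, Dict, Sequence


def check_recognizer_output(results: Sequence[Dict[str, Any]], n_images: int) -> None:
    """Raise ValueError if results do not match the recognizer contract."""
    if len(results) != n_images:
        raise ValueError(f"expected {n_images} results, got {len(results)}")
    for i, item in enumerate(results):
        if not isinstance(item.get("text"), str):
            raise ValueError(f"result {i}: 'text' must be a str")
        # numbers.Real also accepts NumPy floating-point scalars
        if not isinstance(item.get("confidence"), numbers.Real):
            raise ValueError(f"result {i}: 'confidence' must be a number")


results = [
    {"text": "word1", "confidence": 0.95},
    {"text": "word2", "confidence": 0.92},
]
check_recognizer_output(results, n_images=2)  # passes silently
```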
Corrector Requirements
A corrector class must implement a predict method that takes a Page object and returns a corrected Page:
def predict(self, page: Page) -> Page:
    """
    Parameters:
    - page: Page object with recognized text
    Returns:
    - Page: Page object with corrected text
    """
    pass
Example:
from manuscript.data import Page
class MyCorrector:
    def predict(self, page: Page) -> Page:
        result = page.model_copy(deep=True)
        for block in result.blocks:
            for line in block.lines:
                for word in line.words:
                    if word.text:
                        # Your correction logic
                        word.text = self._correct(word.text)
        return result

    def _correct(self, text: str) -> str:
        # Text correction logic
        return text
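As a concrete instance of the pattern above, the sketch below replaces known misreadings from a fixed lookup table. LexiconCorrector and its fixes argument are hypothetical. To run without the library, the page is built from SimpleNamespace stand-ins, and copy.deepcopy takes the place of page.model_copy(deep=True).

```python
import copy
from types import SimpleNamespace


class LexiconCorrector:
    """Replace known misreadings using a fixed lookup table (hypothetical)."""

    def __init__(self, fixes):
        self.fixes = dict(fixes)

    def predict(self, page):
        # Work on a deep copy so the input page is left untouched.
        result = copy.deepcopy(page)
        for block in result.blocks:
            for line in block.lines:
                for word in line.words:
                    if word.text:  # skip words without recognized text
                        word.text = self.fixes.get(word.text, word.text)
        return result


# Stand-in page: one block, one line, three words (one without text).
words = [SimpleNamespace(text=t) for t in ("tbe", "cat", None)]
original = SimpleNamespace(
    blocks=[SimpleNamespace(lines=[SimpleNamespace(words=words)])]
)

corrected = LexiconCorrector({"tbe": "the"}).predict(original)
texts = [w.text for w in corrected.blocks[0].lines[0].words]
print(texts)                                      # ['the', 'cat', None]
print(original.blocks[0].lines[0].words[0].text)  # tbe
```

Returning a copy rather than mutating the input matters if the caller wants to compare the page before and after correction.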
Built-in CharLM Corrector
CharLM is a Transformer-based character-level language model for correcting OCR errors:
from manuscript.correctors import CharLM
# With default settings
corrector = CharLM()
# With custom parameters
corrector = CharLM(
    weights="prereform_charlm_g1",  # or "modern_charlm_g1"
    mask_threshold=0.05,            # confidence threshold for correction
    apply_threshold=0.95,           # minimum model confidence
    max_edits=2,                    # max edits per word
    min_word_len=4,                 # min word length for correction
    lexicon="prereform_words"       # lexicon of known words
)
Compatible Implementation Examples
Complete Detector Example
from manuscript.data import Word, Line, Block, Page
class MyDetector:
    def predict(self, image):
        # Your image detection logic
        # ...
        # Create result
        words = [
            Word(
                polygon=[(10, 20), (100, 20), (100, 40), (10, 40)],
                detection_confidence=0.95
            ),
            Word(
                polygon=[(110, 20), (200, 20), (200, 40), (110, 40)],
                detection_confidence=0.92
            ),
        ]
        line = Line(words=words)
        block = Block(lines=[line])
        page = Page(blocks=[block])
        return {"page": page}
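Many detection models emit axis-aligned boxes rather than polygons. A small helper can convert them to the four-point form used in the examples above. box_to_polygon is a hypothetical name; the corner order (top-left, top-right, bottom-right, bottom-left) is inferred from the example polygons in this guide, not from a documented requirement.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def box_to_polygon(x_min: float, y_min: float,
                   x_max: float, y_max: float) -> List[Point]:
    """Convert an axis-aligned box to a four-point polygon."""
    if x_max <= x_min or y_max <= y_min:
        raise ValueError("box must have positive width and height")
    # Corner order follows the examples in this guide:
    # top-left, top-right, bottom-right, bottom-left.
    return [(x_min, y_min), (x_max, y_min), (x_max, y_max), (x_min, y_max)]


print(box_to_polygon(10, 20, 100, 40))
# [(10, 20), (100, 20), (100, 40), (10, 40)]
```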
Using Custom Components
from manuscript import Pipeline
from my_package import MyDetector, MyRecognizer, MyCorrector
# Use custom detector, recognizer, and corrector
detector = MyDetector()
recognizer = MyRecognizer()
corrector = MyCorrector()
pipeline = Pipeline(
    detector=detector,
    recognizer=recognizer,
    corrector=corrector
)
result = pipeline.predict("document.jpg")
Pipeline Usage Examples
Basic Usage
from manuscript import Pipeline
# Initialize with default models
pipeline = Pipeline()
# Process image
result = pipeline.predict("document.jpg")
page = result["page"]
# Extract text
text = pipeline.get_text(page)
print(text)
Detection Only (Without Recognition)
result = pipeline.predict("document.jpg", recognize_text=False)
page = result["page"]
# Words have polygon and detection_confidence, but no text
for block in page.blocks:
    for line in block.lines:
        for word in line.words:
            print(f"Polygon: {word.polygon}, Confidence: {word.detection_confidence}")
With Visualization
result, vis_img = pipeline.predict("document.jpg", vis=True)
vis_img.save("output_visualization.jpg")
Intermediate Results
from manuscript import Pipeline
from manuscript.correctors import CharLM
pipeline = Pipeline(corrector=CharLM())
result = pipeline.predict("document.jpg")
# Result after detection (before recognition)
detection_page = pipeline.last_detection_page
# Result after recognition (before correction)
recognition_page = pipeline.last_recognition_page
# Result after correction (None if no corrector is used)
correction_page = pipeline.last_correction_page
Export/Import Page to JSON
page = result["page"]
# Save to file
page.to_json("result.json")
# Get as string
json_str = page.to_json()
# Load from file
from manuscript.data import Page
page = Page.from_json("result.json")
# Load from string
page = Page.from_json('{"blocks": [...]}')
With Profiling
# Prints execution time for each stage
result = pipeline.predict("document.jpg", profile=True)
# Output:
# Detection: 0.123s
# Load image for crops: 0.005s
# Extract 45 crops: 0.012s
# Recognition: 0.234s
# Pipeline total: 0.374s
Batch Processing
images = ["page1.jpg", "page2.jpg", "page3.jpg"]
results = pipeline.process_batch(images)
for result in results:
    text = pipeline.get_text(result["page"])
    print(text)
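This guide does not say how process_batch behaves when one image fails. If per-image error handling is needed, one option is to loop over predict directly. The sketch below assumes only the predict and get_text methods shown above; ocr_each and FakePipeline are hypothetical, and FakePipeline exists only so the example runs without the library.

```python
from typing import Any, Dict, Iterable, Tuple


def ocr_each(pipeline: Any,
             paths: Iterable[str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Run the pipeline per image, collecting text and error messages."""
    texts: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    for path in paths:
        try:
            result = pipeline.predict(path)
            texts[path] = pipeline.get_text(result["page"])
        except Exception as exc:  # keep going if a single image fails
            errors[path] = str(exc)
    return texts, errors


class FakePipeline:
    """Stand-in with the same two methods, for demonstration only."""

    def predict(self, path):
        if not path.endswith(".jpg"):
            raise ValueError("not an image")
        return {"page": path.upper()}

    def get_text(self, page):
        return f"text of {page}"


texts, errors = ocr_each(FakePipeline(), ["page1.jpg", "notes.txt"])
print(texts)   # {'page1.jpg': 'text of PAGE1.JPG'}
print(errors)  # {'notes.txt': 'not an image'}
```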
Component Configuration
Replacing Detector or Recognizer
from manuscript import Pipeline
# Only custom detector, default recognizer
from my_package import MyCustomDetector
pipeline = Pipeline(detector=MyCustomDetector())
# Only custom recognizer, default detector
from my_package import MyCustomRecognizer
pipeline = Pipeline(recognizer=MyCustomRecognizer())
# Both components custom
pipeline = Pipeline(detector=MyCustomDetector(), recognizer=MyCustomRecognizer())
Built-in Model Configuration
from manuscript import Pipeline
from manuscript.detectors import EAST
from manuscript.recognizers import TRBA
# EAST with settings
detector = EAST(
    weights="east_50_g1",  # weight selection
    score_thresh=0.8,      # confidence threshold
    nms_thresh=0.2,        # NMS threshold
    device="cpu"           # device (cpu/cuda)
)
# TRBA with settings
recognizer = TRBA(
    weights="trba_lite_g1",  # weight selection
    device="cuda"            # GPU for acceleration
)
pipeline = Pipeline(detector=detector, recognizer=recognizer)
Size Filtering
# Ignore text blocks smaller than 10 pixels
pipeline = Pipeline(min_text_size=10)
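The exact rule behind min_text_size is not spelled out here. As a rough illustration of what such a filter might do, the sketch below measures each polygon by the shorter side of its bounding box and drops anything under the threshold. Both that definition of size and the helper names polygon_size and keep_large are assumptions, not the library's implementation.

```python
from typing import List, Tuple

Polygon = List[Tuple[float, float]]


def polygon_size(polygon: Polygon) -> float:
    """Shorter side of the polygon's axis-aligned bounding box (assumed metric)."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(max(xs) - min(xs), max(ys) - min(ys))


def keep_large(polygons: List[Polygon], min_text_size: float) -> List[Polygon]:
    """Keep polygons whose size is at least min_text_size pixels."""
    return [p for p in polygons if polygon_size(p) >= min_text_size]


polygons = [
    [(10, 20), (100, 20), (100, 40), (10, 40)],  # 90 x 20, size 20
    [(0, 0), (50, 0), (50, 5), (0, 5)],          # 50 x 5, size 5
]
kept = keep_large(polygons, min_text_size=10)
print(len(kept))  # 1
```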
Automatic Rotation Control
# Enable automatic rotation of vertical text (default)
pipeline = Pipeline(rotate_threshold=1.5)
# Disable automatic rotation
pipeline = Pipeline(rotate_threshold=0)