Agent Skills: OCR Super Surya

GPU-optimized OCR using Surya. Use when: (1) Extracting text from images/screenshots, (2) Processing PDFs with embedded images, (3) Multi-language document OCR, (4) Layout analysis and table detection. Supports 90+ languages with 2x accuracy over Tesseract.

UncategorizedID: aktsmm/agent-skills/ocr-super-surya

Install this agent skill to your local

pnpm dlx add-skill https://github.com/aktsmm/Agent-Skills/tree/HEAD/ocr-super-surya

Skill Files

Browse the full folder contents for ocr-super-surya.

Download Skill

Loading file tree…

ocr-super-surya/SKILL.md

Skill Metadata

Name
ocr-super-surya
Description
"GPU-optimized OCR using Surya. Use when: (1) Extracting text from images/screenshots, (2) Processing PDFs with embedded images, (3) Multi-language document OCR, (4) Layout analysis and table detection. Supports 90+ languages with 2x accuracy over Tesseract."

OCR Super Surya

GPU-optimized OCR using Surya.

When to Use

  • OCR, extract text from image, text recognition, 画像から文字
  • Extracting text from screenshots, photos, or scanned images
  • Processing PDFs with embedded images
  • Multi-language document OCR (90+ languages including Japanese)

Features

| Feature | Description | | ------------- | --------------------------------------- | | Accuracy | 2x better than Tesseract (0.97 vs 0.88) | | GPU | PyTorch-based, CUDA optimized | | Languages | 90+ including CJK | | Layout | Document layout, table recognition |

Quick Start

Installation

# 1. Check GPU
python -c "import torch; print(f'CUDA: {torch.cuda.is_available()}')"

# 2. Install (with CUDA if GPU available)
pip install surya-ocr

# If CUDA=False but you have GPU, reinstall PyTorch:
pip uninstall torch torchvision torchaudio -y
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

Usage

# CLI
python scripts/ocr_helper.py image.png
python scripts/ocr_helper.py document.pdf -l ja en -o result.txt

# Or use surya directly
surya_ocr image.png --output_dir ./results

Python API

from PIL import Image
from surya.recognition import RecognitionPredictor
from surya.detection import DetectionPredictor
from surya.foundation import FoundationPredictor

image = Image.open("document.png")
found_pred = FoundationPredictor()
rec_pred = RecognitionPredictor(found_pred)
det_pred = DetectionPredictor()

for page in rec_pred([image], det_predictor=det_pred):
    for line in page.text_lines:
        print(line.text)

GPU Configuration

| Variable | Default | Description | | ------------------------ | ------- | --------------------- | | RECOGNITION_BATCH_SIZE | 512 | Reduce for lower VRAM | | DETECTOR_BATCH_SIZE | 36 | Reduce if OOM |

export RECOGNITION_BATCH_SIZE=256
surya_ocr image.png

Scripts

| Script | Description | | ----------------------- | ----------------------------------------- | | scripts/ocr_helper.py | Helper with OOM auto-retry, batch support |

License Note

  • Surya: GPL-3.0 (code), commercial license required for >$2M revenue