A comprehensive iOS XCFramework for Tesseract 5.5.1 with JPEG-12bit integration and a complete imaging pipeline for high-quality OCR in iOS applications.
- Advanced Tesseract 5.5.1: Latest OCR engine with 52,977 optimized functions and 100+ language support
- JPEG-12bit: 88 specialized j12* functions with libjpeg-turbo 3.0.4 runtime precision selection
- Complete Imaging Pipeline: 402 Leptonica functions, 936 PNG, 1,196 TIFF symbols for comprehensive image processing
- Enhanced OCR Quality: Handles complex layouts, compressed images, and challenging document types
- Massive Static Integration: 425MB framework with all dependencies statically linked (no external requirements)
- Universal iOS Binary: Optimized arm64 device (142MB) + arm64/x86_64 simulator (282MB) support
- Broad Compatibility: iOS 13.0+ with optimized cross-compilation
- Automated Build: CI/CD pipeline with automated testing and artifact generation
This framework provides a complete imaging pipeline with integrated libraries:
JPEG-12bit Excellence (libjpeg-turbo 3.0.4):
- 88 specialized j12* functions for 12-bit precision processing
- Runtime precision selection for optimal quality vs. performance
- Advanced SIMD acceleration and arithmetic coding support
Complete Image Format Support:
- Leptonica (402 functions): Image processing and analysis
- PNG (936 symbols): Full PNG specification with transparency and metadata
- TIFF (1,196 symbols): Complete TIFF support including compression variants
- 57 JPEG API functions: Comprehensive JPEG handling and optimization
Enhanced OCR Quality for:
- High-resolution scanned documents and technical drawings
- Compressed JPEG images with artifacts
- Multi-format document processing (PDF, TIFF, PNG)
- Complex layouts with mixed text and graphics
- Financial documents requiring precise character recognition
- Low-contrast or challenging image conditions
Perfect for a wide range of iOS applications:
- Document Processing: Invoices, contracts, receipts, PDFs
- Business Cards: Automatic contact information extraction
- Translation Apps: OCR text from images for real-time translation
- Accessibility: Convert images to text for screen readers
- Receipt Scanning: Automated expense tracking
- Digital Libraries: Convert books and notes to digital text
- Real-time OCR: Live camera text recognition
- Form Processing: Automated data extraction
- Financial Documents: Bank statements, insurance papers
- Note Taking: Convert printed text and typed documents to text
For production apps, use the complete pre-built framework:
- Download: Get the latest framework from GitHub Releases
- Add to Project: Drag TesseractOCR.xcframework into your Xcode project
- Embed Framework: Add to target's "Frameworks, Libraries, and Embedded Content"
- Import and Use:
import TesseractOCR
let tesseract = G8Tesseract(language: "eng")
tesseract?.image = yourUIImage
tesseract?.recognize()
let recognizedText = tesseract?.recognizedText
Pre-built Framework Benefits:
- Complete integration: 52,977 optimized functions ready to use
- Complete imaging pipeline: All image formats and processing capabilities included
- Massive static libraries (~425MB) with zero external dependencies
- Production-tested: Thoroughly validated for iOS deployment
- No build complexity: Download and integrate immediately
For development, testing, and local production builds:
# Clone the repository
git clone https://github.com/thebenfarmer/TesseractOCR-iOS-Framework.git
cd TesseractOCR-iOS-Framework
# Build production framework (local optimized build)
./build_script_local.sh
# OR use universal build script (same as CI/CD)
./build_script.sh
Build Scripts:
- build_script_local.sh: Optimized local production build with 88 j12* symbols
- build_script.sh: Universal script (CI/local) - attempts production, fallback to demo
- Both create: Complete XCFramework (~425MB) when successful
- CI/CD: Uses build_script.sh for realistic testing
- Device Framework: arm64 iOS (optimized for iPhones/iPads)
- Simulator Framework: Universal Binary (arm64 + x86_64)
- Framework Size: ~425MB (complete integration with all libraries)
- JPEG Integration: 88 j12* + 34 JPEG API symbols
- Dependencies: All included (Leptonica, libjpeg-turbo, libpng, libtiff, zlib)
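The architecture split listed above can be spot-checked on a Mac. The following is a minimal sketch, not part of this project's scripts: a shell helper that takes the space-separated output of `lipo -archs` and confirms that every expected architecture is present. The slice paths in the usage comment are assumptions based on the usual XCFramework layout and may differ in the actual release.

```shell
# has_archs ACTUAL EXPECTED...
# ACTUAL is the space-separated list printed by `lipo -archs <binary>`.
# Returns 0 only if every EXPECTED architecture appears in ACTUAL.
has_archs() {
  actual=" $1 "
  shift
  for arch in "$@"; do
    case "$actual" in
      *" $arch "*) ;;
      *) echo "missing architecture: $arch" >&2; return 1 ;;
    esac
  done
  return 0
}

# Usage on macOS (hypothetical slice paths):
#   has_archs "$(lipo -archs TesseractOCR.xcframework/ios-arm64/TesseractOCR.framework/TesseractOCR)" arm64
#   has_archs "$(lipo -archs TesseractOCR.xcframework/ios-arm64_x86_64-simulator/TesseractOCR.framework/TesseractOCR)" arm64 x86_64
```

Keeping the parsing separate from the `lipo` call means the check itself can be exercised on any machine, even one without Xcode installed.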
dependencies: [
.package(url: "https://github.com/thebenfarmer/TesseractOCR-iOS-Framework.git", from: "1.0.0")
]
pod 'TesseractOCR-iOS', :git => 'https://github.com/thebenfarmer/TesseractOCR-iOS-Framework.git', :tag => '1.0.0'
- Download the framework from Releases
- Drag TesseractOCR.xcframework into your project
- Embed the framework in your target
- iOS: 13.0+ (Broad compatibility - iPhone 6s and newer)
- Xcode: 12.0+ (for building from source)
- macOS: 10.15+ (for building from source)
- Disk Space: 2GB free space (for building)
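The disk space requirement can be checked before starting a build from source. This is a small sketch under the assumption that the build runs in the current directory; the function names are illustrative and do not come from the project's build scripts.

```shell
# free_kb DIR: available space on DIR's filesystem, in 1024-byte blocks.
# `df -Pk` selects the POSIX output format; column 4 is the available space.
free_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# enough_space AVAILABLE_KB REQUIRED_GB:
# succeeds if AVAILABLE_KB is at least REQUIRED_GB gigabytes.
enough_space() {
  [ "$1" -ge $(( $2 * 1024 * 1024 )) ]
}

# Usage before running a build script:
#   enough_space "$(free_kb .)" 2 || echo "Less than 2GB free; the build may fail" >&2
```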
Download language packs from Tesseract tessdata_fast and add to your app bundle:
// Single language
let germanTesseract = G8Tesseract(language: "deu")
// Multiple languages
let multiTesseract = G8Tesseract(language: "eng+deu+fra")
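A language string such as "eng+deu+fra" needs one `.traineddata` file per language in the app bundle. The sketch below splits a language string and prints the file names to fetch. The download URL pattern in the usage comment is an assumption based on the layout of the tessdata_fast repository and should be confirmed there.

```shell
# traineddata_files LANGS: print one <code>.traineddata name per language
# in a '+'-separated Tesseract language string.
traineddata_files() {
  printf '%s\n' "$1" | tr '+' '\n' | while IFS= read -r code; do
    if [ -n "$code" ]; then
      printf '%s.traineddata\n' "$code"
    fi
  done
}

# Usage (URL pattern is an assumption; check the tessdata_fast repository):
#   mkdir -p tessdata
#   for f in $(traineddata_files "eng+deu+fra"); do
#     curl -L -o "tessdata/$f" "https://github.com/tesseract-ocr/tessdata_fast/raw/main/$f"
#   done
```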
Integration Benefits:
- 52,977 optimized functions: Complete OCR and imaging pipeline
- 12-bit JPEG precision: 88 specialized functions with runtime selection
- Comprehensive image support: 402 Leptonica functions + 936 PNG + 1,196 TIFF symbols
- Zero external dependencies: All libraries statically integrated
- Memory-optimized: Cross-compiled with iOS-specific optimizations
- Multi-architecture native: Separate optimized binaries for device and simulator
Pre-built frameworks are available in GitHub Releases:
- Complete Integration: 52,977 functions with complete imaging pipeline
- Advanced JPEG-12bit: 88 specialized j12* functions with libjpeg-turbo 3.0.4
- Comprehensive Libraries: Tesseract 5.5.1, Leptonica 402 functions, PNG/TIFF complete support
- Massive Static Integration: ~425MB (142MB device + 282MB simulator) with zero dependencies
- Production-Ready: Optimized cross-compilation for iOS deployment
Single build script for all environments:
- Local Development: Same script attempts production build locally
- CI/CD Testing: Identical build process validates real cross-compilation
- Automatic Fallback: Demo framework if production build fails
- Environment Detection: Adapts behavior for GitHub Actions vs local
- Realistic Testing: CI tests actual build pipeline, not just structure
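The environment detection and fallback behaviour described above could look roughly like the following. This is a sketch only: the actual logic inside `build_script.sh` is not shown here, and both function names are hypothetical. The one firm fact it relies on is that GitHub Actions sets `GITHUB_ACTIONS=true` for workflow steps.

```shell
# detect_build_env: print "ci" under GitHub Actions, "local" otherwise.
detect_build_env() {
  if [ "${GITHUB_ACTIONS:-}" = "true" ]; then
    echo "ci"
  else
    echo "local"
  fi
}

# run_with_fallback PRIMARY FALLBACK:
# run the PRIMARY command; if it fails, report it and run FALLBACK instead.
# The exit status is that of whichever command ran last.
run_with_fallback() {
  "$1" || {
    echo "primary build failed; running fallback" >&2
    "$2"
  }
}

# Usage (build_production and build_demo are hypothetical functions):
#   echo "Building in $(detect_build_env) mode"
#   run_with_fallback build_production build_demo
```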
Every commit is automatically tested with:
- macOS Latest with Xcode Latest
- Complete Framework Build from scratch with all 52,977 functions
- Integration Verification (88 j12* JPEG-12bit symbols)
- Multi-Architecture Testing (arm64 device + arm64/x86_64 simulator)
- Comprehensive Artifact Upload for immediate download and integration
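The j12* symbol check can be approximated locally by counting defined symbols in `nm` output. The helper below is a sketch of that idea, not the CI's actual verification step. It reads `nm` output on stdin, skips undefined (`U`) entries, and accepts names with or without the leading underscore that Mach-O adds to C symbols. The binary path in the usage comment is an assumption.

```shell
# count_j12_symbols: read `nm` output on stdin and print the number of
# defined symbols whose name starts with j12 (optionally prefixed by "_").
count_j12_symbols() {
  awk '$NF ~ /^_?j12/ && $(NF-1) != "U" { n++ } END { print n + 0 }'
}

# Usage on macOS (hypothetical binary path):
#   nm -g TesseractOCR.xcframework/ios-arm64/TesseractOCR.framework/TesseractOCR \
#     | count_j12_symbols
```

A result that matches the 88 symbols quoted above would suggest the 12-bit code paths were linked in; a result of 0 would suggest they were not.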
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Test with ./build_script.sh
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project includes components with different licenses:
- Tesseract OCR: Apache 2.0
- Leptonica: BSD-style
- libjpeg-turbo: BSD-style (with enhanced 12-bit support)
- libpng: PNG License
- libtiff: MIT-style
- zlib: zlib License
- Install Xcode Command Line Tools: xcode-select --install
- Verify internet connection for dependency downloads
- Ensure sufficient disk space (2GB+)
- Check iOS deployment target (13.0+ required)
- Verify framework is properly embedded
- Include language data files in app bundle
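Missing language data is easy to overlook, so it can help to inspect the built app bundle directly. The following sketch lists which languages from a language string have no matching `.traineddata` file under a given directory. The function name and the bundle path in the usage comment are hypothetical.

```shell
# missing_traineddata DIR LANGS: print each language in the '+'-separated
# LANGS string that has no <code>.traineddata file anywhere under DIR.
missing_traineddata() {
  dir=$1
  printf '%s\n' "$2" | tr '+' '\n' | while IFS= read -r code; do
    if [ -n "$code" ] &&
       [ -z "$(find "$dir" -name "$code.traineddata" -print 2>/dev/null)" ]; then
      printf '%s\n' "$code"
    fi
  done
}

# Usage (hypothetical bundle path):
#   missing_traineddata path/to/YourApp.app "eng+deu+fra"
```

Empty output means every requested language has a data file somewhere in the bundle.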
- Framework includes 52,977 optimized functions for comprehensive image processing
- Leverage 88 j12* JPEG-12bit functions for high-quality image processing
- Consider image preprocessing with integrated Leptonica functions for optimal results
- Monitor memory usage with large images (425MB static framework)
Built with Tesseract 5.5.1, JPEG-12bit integration, and a complete imaging pipeline (52,977 functions) for high-quality iOS OCR.