
2.0.0

@TimKoornstra TimKoornstra released this 04 Apr 12:06
· 47 commits to master since this release

Release Notes for Loghi-HTR Version 2.0.0

Date: 2024-04-04

Overview

Version 2.0.0 is a major release of Loghi-HTR, our handwritten text recognition software. It touches nearly every part of the project, from data processing and model handling to the API and the command-line experience. Key updates include new visualization tools for analyzing model behavior, a modular code structure that is easier to navigate, and a second version of the API designed for higher performance and better resource management. GPU handling, data loading, and augmentation have been reworked for better performance. Configuration handling and logging have been revamped, custom learning rate schedules have been added, and a number of arguments have been deprecated to simplify the command line ahead of future releases.

Major Updates

  • Modular Code Structure: Functions are now grouped into subfolders within the src directory, which makes the code easier to navigate and maintain.
  • API v2:
    • Improved gunicorn support, which changes how the API is started. For reference, see the example scripts in the src/api directory.
    • API refactored for efficiency. Key enhancements include:
      • Simplified queue system for faster processing.
      • New /health and /ready endpoints to monitor overall API and process status.
      • Optional user login through SimpleSecurity integration.
      • Separate decoding process for better GPU utilization.
    • See the updated README for detailed API changes and instructions.
  • Robust Logging: Streamlined logging with a more structured system, comprehensive validation logs including metric tables, and execution timers.
  • Improved Configuration Handling:
    • Run Loghi using a configuration file (--config_file) for greater flexibility.
    • Command-line arguments override config file settings for easy adjustments.
    • Revamped config.json structure for improved readability.
  • Enhanced Visualizations:
    • Time-step prediction visualizer: Highlights the top-3 most probable characters considered by the model at each time-step.
    • Filter activations visualizer: Shows how convolutional layers respond to input images and random noise, enabling analysis of different model architectures.
    • PDF combiner: Creates a single-sheet export of all generated visualizations.
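
The precedence rule described under Improved Configuration Handling (command-line arguments override values from the file passed via --config_file) can be sketched as follows. This is a minimal illustration and not Loghi-HTR's actual implementation: the flat JSON layout, the default values, and the load_settings helper are assumptions made for the example, and --batch_size and --learning_rate are used only as illustrative arguments.

```python
import argparse
import json


def load_settings(argv, config_text=None):
    """Merge defaults, a JSON config, and CLI arguments (CLI wins).

    Hypothetical helper for illustration; not part of Loghi-HTR.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--config_file")
    # default=None lets us distinguish "not given on the command line"
    # from an explicitly supplied value.
    parser.add_argument("--batch_size", type=int, default=None)
    parser.add_argument("--learning_rate", type=float, default=None)
    args = parser.parse_args(argv)

    # Assumed built-in defaults (lowest priority).
    settings = {"batch_size": 4, "learning_rate": 3e-4}

    # Config file values override the defaults. A flat JSON object is
    # assumed here; the real config.json layout may differ.
    if config_text is None and args.config_file:
        with open(args.config_file, encoding="utf-8") as handle:
            config_text = handle.read()
    if config_text:
        settings.update(json.loads(config_text))

    # Explicit command-line values override everything else.
    cli_values = {
        key: value
        for key, value in vars(args).items()
        if key != "config_file" and value is not None
    }
    settings.update(cli_values)
    return settings
```

For example, calling load_settings(["--batch_size", "32"], '{"batch_size": 16, "learning_rate": 0.001}') keeps the learning rate from the config text but takes the batch size from the command line.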

Additional Improvements

  • Custom Learning Rate Schedule: Supports warmup, exponential decay, and linear decay.
  • GPU Handling Refinements
  • Revamped Data Loaders and Augmentations:
    • Data management classes refactored (DataLoader is now DataManager).
    • Data augmentations are now performed on the GPU for a significant performance boost.
  • Code Quality Enhancements: Code simplifications, bug fixes, and improvements.
  • User Experience Improvements: The vis_arg_parser aligns with loghi-htr for a familiar command-line experience.
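
To illustrate what a schedule combining warmup with exponential or linear decay might compute, here is a small framework-free sketch. It is not the schedule class shipped with Loghi-HTR: the function name, its parameters, and the exact formulas (linear warmup, decay measured from the end of warmup) are assumptions chosen for clarity.

```python
def learning_rate_at(step, base_lr, warmup_steps, total_steps,
                     decay="exponential", decay_rate=0.99, decay_steps=1000):
    """Return the learning rate for a given training step.

    Hypothetical illustration of warmup followed by decay.
    """
    # Linear warmup: ramp from base_lr / warmup_steps up to base_lr.
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps

    # Steps elapsed since warmup finished.
    progress = step - warmup_steps

    if decay == "exponential":
        # Multiply by decay_rate once every decay_steps steps (continuously).
        return base_lr * decay_rate ** (progress / decay_steps)

    if decay == "linear":
        # Fall in a straight line from base_lr to 0 at total_steps.
        remaining = max(total_steps - warmup_steps, 1)
        return base_lr * max(0.0, 1.0 - progress / remaining)

    raise ValueError(f"unknown decay type: {decay}")
```

With 10 warmup steps and 100 total steps, the linear variant reaches half of base_lr at step 55 and zero at step 100, while the exponential variant never reaches zero.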

Deprecations (Effective May 2024)

Several arguments in Loghi-HTR are being deprecated to streamline functionality and improve the user experience. Here is a summary of the changes and the reasoning behind them:

  • --do_train: Future training processes will be initiated through a more flexible method by providing a train_list. This change allows for a more intuitive setup for training sessions.

  • --do_inference: Inference will be activated by supplying an inference_list, simplifying the command line interface and making it more intuitive to perform inferences.

  • --use_mask: Masking will be enabled by default, removing the need for explicit command-line toggling and reflecting the common use case directly in the application's behavior.

  • --no_auto: This argument will be removed to streamline the command line options, as auto-correction or similar functionalities will be incorporated more seamlessly into the application's logic.

  • --height: The height parameter will be inferred automatically from the VGSL specification, simplifying model configuration and ensuring consistency across model inputs.

  • --channels: Like height, the number of channels will be automatically inferred from the VGSL specification, reducing the need for manual specification and potential errors.

  • --output_charlist: The character list will be saved to output/charlist.txt by default, standardizing output file locations and reducing command line clutter.

  • --config_file_output: Configuration details will be saved to output/config.json by default, aligning with the standardized approach for output management.

  • --thaw: With models being saved with all layers thawed by default, this argument becomes unnecessary, simplifying model saving and loading processes.

  • --existing_model: The use of --existing_model will be replaced by the --model argument, streamlining the process of loading or creating models.
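
When migrating existing scripts, it can help to scan a command line for the arguments listed above. The sketch below is a hypothetical helper, not a tool shipped with Loghi-HTR; the flag names and the replacement notes are taken from this list, while the function name and return format are assumptions.

```python
# Deprecated flags and what replaces them, as summarized in the list above.
DEPRECATED = {
    "--do_train": "provide a train_list instead",
    "--do_inference": "provide an inference_list instead",
    "--use_mask": "masking will be enabled by default",
    "--no_auto": "this argument will be removed",
    "--height": "inferred from the VGSL specification",
    "--channels": "inferred from the VGSL specification",
    "--output_charlist": "saved to output/charlist.txt by default",
    "--config_file_output": "saved to output/config.json by default",
    "--thaw": "models are saved with all layers thawed by default",
    "--existing_model": "use --model instead",
}


def find_deprecated(argv):
    """Return (flag, note) pairs for each deprecated flag found in argv."""
    found = []
    for token in argv:
        # Handle both "--flag value" and "--flag=value" spellings.
        flag = token.split("=", 1)[0]
        if flag in DEPRECATED:
            found.append((flag, DEPRECATED[flag]))
    return found


if __name__ == "__main__":
    import sys

    for flag, note in find_deprecated(sys.argv[1:]):
        print(f"{flag} is deprecated: {note}")
```

The helper only reports matches; it does not rewrite the command line, since whether a given flag takes a value is not stated here.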

Additionally, we are phasing out support for the classic .pb-style TensorFlow SavedModel format. Starting May 2024, Loghi-HTR will automatically convert any model loaded in the old .pb format to the new .keras format and save the converted model to the specified output/model-name directory. The conversion is designed to be seamless, and it lets us standardize on the newer format for better performance and ease of use.

Docker Image

The Docker image for version 2.0.0 can be obtained using the following command:

docker pull loghi/docker.htr:2.0.0

Important Notes

  • Due to the significant changes, please test your workflows thoroughly and report issues.
  • We have strived for a smooth update, but some disruptions may occur. If you encounter problems, please open an issue on the project's GitHub repository.

Contributors

  • @TimKoornstra: A major force behind this release, Tim contributed to several key areas, including the main refactor and reorganization of files, the improved learning rate schedule, enhancements to argument handling and configuration, the development of API v2, and numerous quality-of-life and code quality improvements. His contributions have been instrumental in shaping the direction and capabilities of Loghi-HTR 2.0.0.

  • @Thelukepet: Contributed to revamping visualization files and played a pivotal role in the V1 DataGenerator and Data Augmentation Revamp on GPU. These contributions have significantly improved data handling and model visualization capabilities.

  • @MMaas3: Made a notable first contribution by enhancing security features. This addition is crucial for the secure and reliable operation of Loghi-HTR.


Full Changelog: 1.3.12...2.0.0