Releases: Blaizzy/mlx-audio

v0.0.2

07 Mar 22:45
f24355d

Full Changelog: v0.0.1...v0.0.2

v0.0.1

28 Feb 16:41

Release Notes - February 28, 2025

Overview

This release introduces a fully functional MLX-Audio package with text-to-speech capabilities, complete with testing infrastructure and CI/CD integration via GitHub Actions.

New Features

  • Text-to-Speech Generation: Added complete generation pipeline with audio output functionality
  • Audio Joining: New functionality to join multiple audio segments
  • Model Quantization: Added support for model quantization to reduce memory footprint and improve performance
  • GitHub Actions: Implemented CI/CD workflows for automated testing and deployment
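
For readers unfamiliar with the model quantization item above, the following is a minimal pure-Python sketch of group-wise affine quantization, the general technique of storing weights as low-bit integer codes plus a per-group scale and bias. It is not taken from the mlx-audio code base: the function names, the `group_size` and `bits` defaults, and the sample weights are all assumptions chosen for illustration.

```python
def quantize_group(values, bits=4):
    """Map one group of floats to integer codes plus a (scale, bias) pair."""
    levels = (1 << bits) - 1                      # e.g. 15 for 4-bit codes
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo                       # lo acts as the group bias


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate floats from codes: value = code * scale + bias."""
    return [c * scale + bias for c in codes]


def quantize(weights, group_size=4, bits=4):
    """Quantize a flat list of weights one group at a time (hypothetical helper)."""
    groups = []
    for start in range(0, len(weights), group_size):
        groups.append(quantize_group(weights[start:start + group_size], bits))
    return groups


def dequantize(groups):
    """Concatenate the reconstructed values of every group."""
    restored = []
    for codes, scale, bias in groups:
        restored.extend(dequantize_group(codes, scale, bias))
    return restored


# Illustrative weights only; real models hold millions of values.
weights = [0.12, -0.40, 0.33, 0.05, 1.50, 1.10, 0.90, 1.30]
groups = quantize(weights, group_size=4, bits=4)
restored = dequantize(groups)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each group keeps its own scale and bias, so the rounding error for any weight is bounded by half of that group's scale. Smaller groups track the weights more closely at the cost of storing more scale and bias values.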

Improvements

  • Kokoro MLX porting: Completed the port of the entire model to the MLX framework:
    • Text encoder with BERT implementation
    • Decoder with improved audio quality
    • Duration, indices, and alignment target prediction
    • Custom Bidirectional LSTM, weight normalization for CNNs, AdaLayerNorm, and Generator layers
  • SafeTensors Support: Added working implementation for SafeTensors format
  • Pipeline Structure: Restructured the generation pipeline for better maintainability
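
As background for the AdaLayerNorm layer listed above, here is a rough pure-Python sketch of adaptive layer normalization as it is commonly formulated: a style vector is projected to a per-channel scale and shift, which are applied to the layer-normalized features. This is an assumption about the general technique, not a copy of the mlx-audio layer; the function names, the `(1 + gamma)` scaling convention, and the toy dimensions are illustrative.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a feature vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(vec, weight, bias):
    """Dense projection; weight is a list of rows, one row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weight, bias)]


def ada_layer_norm(x, style, weight, bias, eps=1e-5):
    """Layer norm whose scale and shift are predicted from a style vector."""
    channels = len(x)
    h = linear(style, weight, bias)               # 2 * channels outputs
    gamma, beta = h[:channels], h[channels:]      # split into scale and shift
    normed = layer_norm(x, eps)
    return [(1.0 + g) * n + b for g, n, b in zip(gamma, normed, beta)]


# Toy example: 4 feature channels, 2-dimensional style vector.
x = [1.0, 2.0, 3.0, 4.0]
style = [0.5, -0.5]
# An all-zero projection gives gamma = 0 and beta = 0,
# so the result reduces to plain layer normalization.
weight = [[0.0, 0.0] for _ in range(8)]
bias = [0.0] * 8
out = ada_layer_norm(x, style, weight, bias)
```

Because the scale and shift depend on the style input rather than on fixed learned parameters, the same layer can modulate its output differently for each style vector.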

Bug Fixes

  • Fixed the model loading mechanism
  • Resolved issues with the text encoder's LayerNorm operation
  • Fixed generator functionality
  • Addressed issues in LSTM and AdaLayerNorm implementations
  • Refactored and fixed ConvWeight component

Full Changelog: https://github.com/Blaizzy/mlx-audio/commits/v0.0.1