A complete pipeline that converts spoken mathematical expressions into LaTeX format using OpenAI Whisper for speech recognition and a trained NLP model for text-to-math conversion.
Audio Input → Whisper (Speech-to-Text) → NLP Model (Text-to-Math) → LaTeX Output
```bash
# Clone the repository
git clone <repository-url>
cd textalk

# Install dependencies
pip install -r requirements.txt
```

```bash
# Run complete training pipeline
python train_and_evaluate.py
```

This will:
- Generate a training dataset (5000 examples)
- Train a T5-based NLP model
- Evaluate the model performance
- Test the complete pipeline
```bash
# Test with text input
python complete_pipeline.py

# Test with audio file
python -c "
from complete_pipeline import CompleteSpeechToMathPipeline
pipeline = CompleteSpeechToMathPipeline()
text, latex = pipeline.process_audio('your_audio.wav')
print(f'Text: {text}')
print(f'LaTeX: {latex}')
"
```

```
textalk/
├── speech_to_math_pipeline.py   # Basic pipeline with rule-based conversion
├── complete_pipeline.py         # Complete pipeline with trained NLP model
├── dataset_generator.py         # Generate training dataset
├── nlp_model_trainer.py         # Train the NLP model
├── train_and_evaluate.py        # Complete training and evaluation script
├── requirements.txt             # Dependencies
└── README.md                    # This file
```
- Model: OpenAI Whisper (base model)
- Input: Audio files (WAV, MP3, etc.)
- Output: Transcribed text
- Base Model: T5-large (Text-to-Text Transfer Transformer)
- Training Data: 5000+ generated mathematical expressions
- Input: Natural language mathematical expressions
- Output: LaTeX mathematical expressions
The dataset includes:
- "integral from 0 to infinity of e to the negative x dx" → `\int_{0}^{\infty} e^{-x} \, dx`
- "integrate x squared from 0 to 1" → `\int_{0}^{1} x^2 \, dx`
- "derivative of x squared with respect to x" → `\frac{d}{dx}(x^2)`
- "d sine x over d x" → `\frac{d}{dx}(\sin(x))`
- "limit of sine x over x as x approaches zero" → `\lim_{x \to 0} \frac{\sin(x)}{x}`
- "lim 1 over x as x goes to infinity" → `\lim_{x \to \infty} \frac{1}{x}`
- "x squared plus y squared" → `x^2 + y^2`
- "square root of x plus y" → `\sqrt{x + y}`
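The basic pipeline (`speech_to_math_pipeline.py`) handles patterns like these with rules rather than the trained model. A minimal sketch of that rule-based approach, covering just two of the patterns above (the repository's actual rules are more extensive):

```python
import re

def spoken_to_latex(text: str) -> str:
    """Tiny rule-based converter covering two example patterns.

    Illustrative sketch only; the trained T5 model handles the general case.
    """
    rules = [
        # Multi-word phrases first, so later rules do not break them up.
        (r"square root of (\w+) plus (\w+)", r"\\sqrt{\1 + \2}"),
        (r"(\w+) squared", r"\1^2"),
        (r"\bplus\b", "+"),
    ]
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text

print(spoken_to_latex("x squared plus y squared"))  # x^2 + y^2
print(spoken_to_latex("square root of x plus y"))   # \sqrt{x + y}
```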
```python
from complete_pipeline import CompleteSpeechToMathPipeline

# Initialize pipeline
pipeline = CompleteSpeechToMathPipeline()

# Convert text to math
text = "integral from 0 to infinity of e to the negative x dx"
latex = pipeline.process_text(text)
print(latex)  # \int_{0}^{\infty} e^{-x} \, dx

# Process audio file
text, latex = pipeline.process_audio("math_expression.wav")
print(f"Transcribed: {text}")
print(f"LaTeX: {latex}")
```

```python
texts = [
    "derivative of x squared with respect to x",
    "limit of sine x over x as x approaches zero",
    "x squared plus y squared"
]

for text in texts:
    latex = pipeline.process_text(text)
    print(f"{text} → {latex}")
```

```python
from dataset_generator import MathDatasetGenerator

generator = MathDatasetGenerator()
dataset = generator.generate_dataset(num_samples=10000)
generator.save_dataset(dataset, "custom_dataset.json")
```

```python
from nlp_model_trainer import MathNLPTrainer

trainer = MathNLPTrainer(model_name="t5-base")  # Use a different base model
trainer.prepare_data("custom_dataset.json")
trainer.train(
    output_dir="./custom_model",
    num_epochs=10,
    batch_size=16
)
```

```python
pipeline = CompleteSpeechToMathPipeline(
    nlp_model_path="./custom_model"
)
```

The model is evaluated on:
- Accuracy: Exact match with expected LaTeX
- BLEU Score: Semantic similarity
- Perplexity: Model confidence
- Inference Time: Processing speed
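The exact-match metric is straightforward to reproduce. A small sketch (whitespace normalization is an assumption about how `train_and_evaluate.py` compares strings):

```python
def exact_match_accuracy(predictions, references):
    """Fraction of predicted LaTeX strings that exactly match the reference
    after collapsing whitespace (the normalization choice is an assumption)."""
    normalize = lambda s: " ".join(s.split())
    hits = sum(normalize(p) == normalize(r)
               for p, r in zip(predictions, references))
    return hits / len(references)

preds = [r"\int_{0}^{1} x^2 \, dx", "x^2+y^2"]
refs  = [r"\int_{0}^{1} x^2 \, dx", "x^2 + y^2"]
print(exact_match_accuracy(preds, refs))  # 0.5
```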
After training, check:
- `evaluation_report.json` - Detailed metrics
- `training_progress.png` - Loss curves
- Test examples in the report
- Extend Dataset Generator:

  ```python
  # Add to math_patterns in dataset_generator.py
  "new_pattern": [
      ("your text template", "\\latex_template"),
  ]
  ```

- Retrain Model:

  ```bash
  python train_and_evaluate.py
  ```

- T5-small: Fast, good for prototyping
- T5-base: Better accuracy, slower
- T5-large: Best accuracy, requires more resources (currently configured)
- CUDA Out of Memory:
  - Reduce batch size (already optimized for T5-large)
  - Use a smaller model (t5-small or t5-base)
  - Use CPU: `device = "cpu"`
  - Enable gradient checkpointing
- Poor Transcription:
  - Use higher-quality audio
  - Try a larger Whisper model
  - Preprocess audio (noise reduction)
- Incorrect Math Conversion:
  - Add more training examples
  - Increase training epochs
  - Use a larger NLP model
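For the out-of-memory case, the relevant Hugging Face `TrainingArguments` knobs look roughly like this (the values are illustrative, not the repository's actual configuration in `nlp_model_trainer.py`):

```python
from transformers import TrainingArguments

# Illustrative memory-saving settings; the exact values used by the
# project's trainer may differ.
args = TrainingArguments(
    output_dir="./custom_model",
    per_device_train_batch_size=4,   # smaller batches reduce peak VRAM
    gradient_accumulation_steps=4,   # keep the effective batch size at 16
    gradient_checkpointing=True,     # recompute activations to save memory
    fp16=True,                       # half precision on CUDA GPUs
)
```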
- GPU: Use CUDA for faster training
- Batch Size: Increase for better GPU utilization
- Model Size: Balance between accuracy and speed
- PyTorch: Deep learning framework
- Transformers: Hugging Face model library
- Whisper: OpenAI speech recognition
- Datasets: Data processing
- Scikit-learn: Evaluation metrics
- Support for more mathematical domains
- Real-time audio processing
- Web interface
- Mobile app
- Integration with note-taking apps
- Multi-language support
MIT License - see LICENSE file for details.
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Submit a pull request
For questions or issues:
- Create an issue on GitHub
- Check the troubleshooting section
- Review the evaluation report
Built with ❤️ for the mathematical community