# Al

Al is an advanced conversational AI assistant that uses speech recognition, natural language processing, and text-to-speech capabilities to engage in dynamic, context-aware conversations. This project is based on the mini-omni project and extends its functionality with improved context management and integration with the Ollama API.
## Features

- Speech recognition for user input
- Natural language processing using Ollama API
- Text-to-speech output using espeak
- Context-aware conversations
- Voice Activity Detection (VAD) for improved speech recognition
- Ability to change AI models
- Pause and resume functionality
- Immediate interruption of ongoing responses
- Clipboard content reading
- Screen text extraction
## Prerequisites

- Python 3.8 or higher
- espeak (for text-to-speech)
- Ollama (for language model inference)
- Microphone and speakers or headphones (headphones are recommended to prevent feedback loops)
## Installation

1. Clone the repository:

       git clone https://github.com/ruapotato/Al.git
       cd Al
2. Create a virtual environment:

       python -m venv pyenv
3. Activate the virtual environment:
   - Linux:

         source pyenv/bin/activate
4. Install the required packages:

       pip install torch numpy pyaudio SpeechRecognition requests onnxruntime ollama pyperclip pytesseract pillow
5. Install espeak and other system dependencies:
   - For Ubuntu/Debian:

         sudo apt-get install espeak tesseract-ocr xclip xdotool
6. Install Ollama following the instructions on its official website.
7. Ensure Ollama is running and the desired model is available (check `ollama_integration.py` for the default model in use).
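   One quick way to verify both conditions is to query Ollama's local REST endpoint with the `requests` package from the dependency list (this assumes the default server address and port 11434; adjust if you changed them):

   ```python
   import requests

   # Ollama's REST API lists the models it has pulled at /api/tags.
   resp = requests.get("http://localhost:11434/api/tags", timeout=5)
   resp.raise_for_status()  # fails loudly if the server is not running
   print("Available models:", [m["name"] for m in resp.json()["models"]])
   ```

   If the model named in `ollama_integration.py` is missing, fetch it with `ollama pull <model-name>` from a terminal.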
## Usage

1. Run the main script:

       python main.py
2. Speak to Al when prompted (see the command-routing sketch below). You can:
   - Ask questions or give commands
   - Say "pause" to pause Al's listening and response generation
   - Say "Al" or "resume" to resume from a paused state
   - Use "Al [command]" to resume and immediately process a command
   - Say "read my clipboard" to have Al read the contents of your clipboard
   - Say "read my screen" to have Al extract and read text from your active window
3. Press Ctrl+C to stop the program.
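The voice commands in step 2 suggest simple keyword matching on each transcript. Below is a hypothetical sketch of that kind of dispatch; the function and the returned tags are invented for illustration and are not taken from the project's source:

```python
def route_command(transcript: str) -> str:
    """Hypothetical keyword dispatch mirroring the voice commands listed above."""
    text = transcript.lower().strip()
    if text == "pause":
        return "pause"                    # stop listening and responding
    if text in ("al", "resume"):
        return "resume"                   # wake from the paused state
    if text.startswith("al "):
        return "command:" + text[3:]      # resume and process the rest immediately
    if text == "read my clipboard":
        return "clipboard"
    if text == "read my screen":
        return "screen"
    return "chat:" + text                 # everything else goes to the language model
```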
## Project Structure

- `main.py`: The main script that initializes and runs the Al assistant
- `Al_brain.py`: Handles the conversation logic and integration with Ollama
- `Al_ears.py`: Manages speech recognition and transcription
- `Al_voice.py`: Handles text-to-speech output
- `Al_eyes.py`: Manages clipboard reading and screen text extraction
- `ollama_integration.py`: Integrates with the Ollama API for language model inference
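The modules' internals are not shown here, but the dependency list hints at their building blocks. The following is an assumed sketch of the kinds of calls each module likely wraps, not code taken from the repository:

```python
import subprocess

import pyperclip                   # Al_eyes.py: clipboard access (uses xclip on Linux)
import pytesseract                 # Al_eyes.py: OCR for screen text
import speech_recognition as sr    # Al_ears.py: microphone capture and transcription
from PIL import ImageGrab          # Al_eyes.py: screenshots (assumes an X11 session)

# Ears: record one utterance from the default microphone and transcribe it.
recognizer = sr.Recognizer()
with sr.Microphone() as source:
    audio = recognizer.listen(source)
transcript = recognizer.recognize_google(audio)

# Voice: hand a reply to espeak for playback.
subprocess.run(["espeak", "You said: " + transcript])

# Eyes: read the clipboard, or OCR a full-screen screenshot.
clipboard_text = pyperclip.paste()
screen_text = pytesseract.image_to_string(ImageGrab.grab())
```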
## Customization

- To change the default AI model, modify the `model_name` parameter in the `OllamaIntegration` class initialization in `ollama_integration.py` (as shown below).
- Adjust the `max_history` parameter in `Al_brain.py` to control the conversation context length.
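For example, switching models might look like this (a sketch only; `llama3.1` is a placeholder, and the actual constructor signature in `ollama_integration.py` may differ):

```python
from ollama_integration import OllamaIntegration

# Sketch: pass a different model name at construction time.
# Check the class definition for the real parameter list.
integration = OllamaIntegration(model_name="llama3.1")
```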
## Troubleshooting

- If you encounter audio-related errors, ensure your microphone and speakers are properly configured and recognized by your system.
- For Ollama-related issues, check that the Ollama service is running and the desired model is available.
- If clipboard or screen reading fails, ensure you have the necessary permissions and that xclip and xdotool are properly installed.
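For audio issues, a quick way to confirm that the system sees your microphone is to list PyAudio's input-capable devices (PyAudio is already installed as a dependency):

```python
import pyaudio

pa = pyaudio.PyAudio()
for i in range(pa.get_device_count()):
    info = pa.get_device_info_by_index(i)
    if info["maxInputChannels"] > 0:   # keep only devices that can record
        print(i, info["name"])
pa.terminate()
```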
## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
## Acknowledgments

- This project is based on the mini-omni project.
- Special thanks to the Ollama team for their API.
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Disclaimer

This AI assistant is a simulation and does not have real-world knowledge beyond its training data. It should not be used for critical decision-making or as a substitute for professional advice.