A Node.js Express server that provides a REST API for chatting with open-source AI models running locally via Ollama. This project allows you to interact with AI models like Llama 3.2 without relying on external cloud services.
- Local AI Models: Run AI models locally using Ollama
- REST API: Simple HTTP endpoints for chat interactions
- Memory Support: Chat with conversation history
- Multiple Endpoints: Different chat patterns for various use cases
- CORS Enabled: Ready for frontend integration
- Environment Configuration: Flexible configuration via environment variables
Before running this project, you need to have the following installed:
- Node.js (v16 or higher)
- Ollama - Download and install from ollama.ai
- Git (for cloning the repository)
- Clone the repository

  ```bash
  git clone https://github.com/anutechofficial/ai-models-at-localhost.git
  cd ai-models-at-localhost
  ```

- Install dependencies

  ```bash
  npm install
  ```

- Set up environment variables

  Create a `.env` file in the root directory:

  ```env
  PORT=5000
  OLLAMA_BASE_URL=http://localhost:11434
  OLLAMA_MODEL=llama3.2:1b
  ```

- Install and run Ollama models

  ```bash
  # Install the Llama 3.2 model (or any other model you prefer)
  ollama pull llama3.2:1b

  # Start Ollama service
  ollama serve
  ```
With Ollama running, start the Express server in development mode:

```bash
npm run dev
```

or in production mode:

```bash
npm start
```

The server will start on `http://localhost:5000` (or the port specified in your `.env` file).
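For reference, the bootstrap in `server.js` looks roughly like the following. This is a minimal sketch based on the features described above (Express, CORS, JSON parsing, `.env` configuration); the exact code in the repository may differ.

```javascript
// Minimal sketch of the server bootstrap (illustrative, not the exact repository code)
import 'dotenv/config';   // loads PORT, OLLAMA_BASE_URL, OLLAMA_MODEL from .env
import express from 'express';
import cors from 'cors';

const app = express();
app.use(cors());          // allow cross-origin requests from a frontend
app.use(express.json());  // parse JSON request bodies

// Simple health check endpoint
app.get('/', (req, res) => {
  res.json('Welcome to Chat API');
});

const PORT = process.env.PORT || 5000;
app.listen(PORT, () => {
  console.log(`Server running on http://localhost:${PORT}`);
});
```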
All API endpoints below are served from the base URL `http://localhost:5000`.
GET `/`

- Description: Simple health check endpoint
- Response: Welcome message

Example:

```bash
curl http://localhost:5000/
```

Response:

```json
"Welcome to Chat API"
```
POST `/chat`

- Description: Chat endpoint with conversation memory
- Content-Type: `application/json`

Request Body:

```json
{
  "message": "Your message here"
}
```

Example:

```bash
curl -X POST http://localhost:5000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello, how are you?"}'
```

Response:

```json
{
  "reply": "Hello! I'm doing well, thank you for asking. How can I help you today?"
}
```
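Server-side, a memory-enabled handler along these lines can back this endpoint. This is a simplified sketch assuming the `@langchain/ollama` package, a single shared in-memory history array, and the `app` instance from the bootstrap sketch above; the repository's actual implementation may differ.

```javascript
import { ChatOllama } from '@langchain/ollama';
import { HumanMessage, AIMessage } from '@langchain/core/messages';

const chatModel = new ChatOllama({
  baseUrl: process.env.OLLAMA_BASE_URL || 'http://localhost:11434',
  model: process.env.OLLAMA_MODEL || 'llama3.2:1b',
});

// Naive in-memory history, shared by all clients in this sketch
const history = [];

app.post('/chat', async (req, res) => {
  try {
    const { message } = req.body;
    if (!message) {
      return res.status(400).json({ error: 'message is required' });
    }

    history.push(new HumanMessage(message));
    const response = await chatModel.invoke(history); // send the full history to the model
    history.push(new AIMessage(response.content));

    res.json({ reply: response.content });
  } catch (err) {
    res.status(500).json({ error: err.message });
  }
});
```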
POST `/chat-completions`

- Description: Simple question-answering without memory
- Content-Type: `application/json`

Request Body:

```json
{
  "message": "Your question here"
}
```

Example:

```bash
curl -X POST http://localhost:5000/chat-completions \
  -H "Content-Type: application/json" \
  -d '{"message": "What is the capital of France?"}'
```

Response:

```json
{
  "reply": "The capital of France is Paris."
}
```
| Variable | Default | Description |
|---|---|---|
| `PORT` | `5000` | Server port number |
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama server URL |
| `OLLAMA_MODEL` | `llama3.2:1b` | Default AI model to use |
You can use any model available in Ollama. Some popular options:
- `llama3.2:1b` - Lightweight Llama model
- `llama3.2:3b` - Medium-sized Llama model
- `llama3.2:8b` - Larger Llama model
- `mistral:7b` - Mistral 7B model
- `codellama:7b` - Code-focused model
To use a different model, either:

- Change the `OLLAMA_MODEL` in your `.env` file
- Modify the model name in `server.js` (see the snippet below)
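The model is typically chosen where the `ChatOllama` client is constructed in `server.js`. The snippet below is illustrative; the variable names in the actual file may differ.

```javascript
import { ChatOllama } from '@langchain/ollama';

// Pick the model from .env, falling back to the default
const chatModel = new ChatOllama({
  baseUrl: process.env.OLLAMA_BASE_URL || 'http://localhost:11434',
  model: process.env.OLLAMA_MODEL || 'llama3.2:1b', // e.g. swap in 'mistral:7b'
});
```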
```
ai-models-at-localhost/
├── server.js           # Main server file
├── package.json        # Dependencies and scripts
├── package-lock.json   # Locked dependency versions
├── .env                # Environment variables (create this)
└── README.md           # This documentation
```
- Express Server: HTTP server with CORS and JSON middleware
- LangChain Integration: Uses LangChain for AI model interactions
- ChatOllama: Connects to local Ollama instance
- Prompt Templates: Structured prompts for consistent responses
- Runnable Sequences: Chain processing for complex workflows
- Memory Management: The `/chat` endpoint maintains conversation history
- Template-based Prompts: Uses LangChain prompt templates for consistent formatting
- Error Handling: Comprehensive error handling with appropriate HTTP status codes
- Flexible Model Support: Easy to switch between different AI models
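Putting the pieces above together, the stateless `/chat-completions` flow can be expressed as a small prompt/model/output-parser pipeline. The sketch below assumes the `@langchain/ollama` and `@langchain/core` packages and the `app` instance from the bootstrap sketch; the prompt wording is illustrative, not taken from the repository.

```javascript
import { ChatOllama } from '@langchain/ollama';
import { ChatPromptTemplate } from '@langchain/core/prompts';
import { StringOutputParser } from '@langchain/core/output_parsers';
import { RunnableSequence } from '@langchain/core/runnables';

const chatModel = new ChatOllama({
  baseUrl: process.env.OLLAMA_BASE_URL || 'http://localhost:11434',
  model: process.env.OLLAMA_MODEL || 'llama3.2:1b',
});

// Structured prompt -> model -> plain-string output
const prompt = ChatPromptTemplate.fromMessages([
  ['system', 'You are a helpful assistant. Answer concisely.'],
  ['human', '{message}'],
]);
const chain = RunnableSequence.from([prompt, chatModel, new StringOutputParser()]);

app.post('/chat-completions', async (req, res) => {
  try {
    const reply = await chain.invoke({ message: req.body.message });
    res.json({ reply });
  } catch (err) {
    res.status(500).json({ error: err.message });
  }
});
```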
Test the health endpoint:

```bash
curl http://localhost:5000/
```

Test chat with memory:

```bash
curl -X POST http://localhost:5000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello, what is your name?"}'
```

Test simple question answering:

```bash
curl -X POST http://localhost:5000/chat-completions \
  -H "Content-Type: application/json" \
  -d '{"message": "Explain quantum computing in simple terms"}'
```
Frontend integration example (JavaScript `fetch`):

```javascript
// Chat with memory
const response = await fetch('http://localhost:5000/chat', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    message: 'Hello, how are you?'
  })
});

const data = await response.json();
console.log(data.reply);
```
- Ollama not running
  - Error: Connection refused to `localhost:11434`
  - Solution: Start Ollama with `ollama serve`
- Model not found
  - Error: Model not found
  - Solution: Pull the model with `ollama pull <model-name>`
- Port already in use
  - Error: `EADDRINUSE`
  - Solution: Change the `PORT` in your `.env` file
- CORS issues
  - Error: CORS policy blocked
  - Solution: The server already has CORS enabled; check your frontend configuration
To enable debug logging, add this to your `.env` file:

```env
DEBUG=*
```
- This server is designed for local development
- No authentication is implemented
- CORS is enabled for all origins
- Consider adding rate limiting for production use
- Validate and sanitize user inputs (see the sketch below)
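For example, basic rate limiting and input validation could be added with a couple of middlewares. The sketch below uses the `express-rate-limit` package, which is not one of this project's dependencies and would need to be installed separately; `app` is the Express instance from `server.js`.

```javascript
import rateLimit from 'express-rate-limit';

// Limit each IP to 60 requests per 15 minutes
app.use(rateLimit({ windowMs: 15 * 60 * 1000, max: 60 }));

// Reject missing or oversized messages before they reach the model
app.use(['/chat', '/chat-completions'], (req, res, next) => {
  const { message } = req.body || {};
  if (typeof message !== 'string' || message.trim() === '') {
    return res.status(400).json({ error: 'message must be a non-empty string' });
  }
  if (message.length > 4000) {
    return res.status(413).json({ error: 'message is too long' });
  }
  next();
});
```

Note that middlewares like these need to be registered before the route handlers so that Express runs them first.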
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Ollama for providing the local AI model infrastructure
- LangChain for the AI framework
- Express.js for the web framework
If you encounter any issues or have questions:
- Check the troubleshooting section above
- Ensure Ollama is running and the model is installed
- Verify your environment variables are set correctly
- Check the server logs for detailed error messages
Happy coding with Anurag Yadav!