@@ -49,7 +49,7 @@ With the OTW-Viewer, all generated documents (converted Markdown files, lexicon
 - 📱 **No-Code Gradio Interface**: Drag-and-drop upload with live terminal and complete pipeline control
 - 🌐 **Multi-Format Export**: LoRA, Merged (both for transformers, vLLM, etc.), GGUF in Q_8 with quantizations for local deployment (OpenWebUI/LM Studio)
 - 🔍 **VLM Integration**: Vision-Language Models for automatic image descriptions in documents
-- ⚡ **Universal API Support**: Works with OpenAI, OpenRouter, Ollama, LM Studio, and any OpenAI-compatible API
+- ⚡ **Runpod Integration**: Scalable cloud GPU support for cost-effective training
 
 ***
 
@@ -60,20 +60,13 @@ With the OTW-Viewer, all generated documents (converted Markdown files, lexicon
 **Hardware:**
 - **Linux system recommended** (Ubuntu 22.04 LTS or similar)
 - **At least 100 GB of free storage space**
-- **For Training: NVIDIA GPU with at least 20 GB VRAM** (depending on the model being trained)
+- **NVIDIA GPU with at least 20 GB VRAM** (depending on the model being trained)
   - RTX 4090/A6000/A100 recommended
   - For smaller models: RTX 3090/4080 (16 GB) possible
-- **For Dataset Generation Only: No GPU required** (can use cloud APIs)
-- **CUDA 12.8+ and cuDNN** (only if using a local GPU)
+- **CUDA 12.8+ and cuDNN installed**
 
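As a quick sanity check of the hardware requirements above, the visible GPU and its total VRAM can be queried with `nvidia-smi`. A minimal sketch (assumes the standard NVIDIA driver CLI; prints a hint instead of failing when no GPU is present):

```shell
# Report the visible NVIDIA GPU and its total VRAM, if any
if command -v nvidia-smi >/dev/null 2>&1; then
  GPU_INFO=$(nvidia-smi --query-gpu=name,memory.total --format=csv,noheader)
else
  GPU_INFO="no NVIDIA driver/GPU visible"
fi
echo "GPU: $GPU_INFO"
```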
 **Accounts:**
 - **HuggingFace Account** with Access Token (Read + optional Write)
-- **API Access** (choose one):
-  - OpenAI API Key
-  - OpenRouter API Key
-  - Ollama (local installation)
-  - LM Studio (local installation)
-  - Any OpenAI-compatible API endpoint
 
 ### HuggingFace Token Setup
 
@@ -82,222 +75,78 @@ With the OTW-Viewer, all generated documents (converted Markdown files, lexicon
 3. Create a new token with **Read** permission (and **Write** for model upload)
 4. Note down the token for installation
 
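The token from the steps above is usually supplied through the `HF_TOKEN` environment variable, which the HuggingFace tooling picks up automatically. A minimal sketch (the token value shown is a placeholder):

```shell
# Placeholder value -- substitute the token created above
export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx

# Alternatively, log in once so the token is cached on disk:
# huggingface-cli login --token "$HF_TOKEN"
echo "HF_TOKEN set (${#HF_TOKEN} characters)"
```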
-### Universal Installation (NEW - Works with any API)
-
-OpenTuneWeaver now supports **any OpenAI-compatible API** for dataset generation. Choose your preferred installation method:
-
-#### Quick Installation with Direct Script
-
-```bash
-# Download and run the universal setup script
-wget https://raw.githubusercontent.com/ProfEngel/OpenTuneWeaver/main/setup_universal.sh
-chmod +x setup_universal.sh
-
-# Configure your API (choose one):
-
-# Option 1: For OpenAI
-export OPENAI_API_TYPE=openai
-export OPENAI_API_BASE=https://api.openai.com/v1
-export OPENAI_API_KEY=sk-your-key-here
-export OPENAI_MODEL_NAME=gpt-4
-
-# Option 2: For OpenRouter
-export OPENAI_API_TYPE=openrouter
-export OPENAI_API_BASE=https://openrouter.ai/api/v1
-export OPENAI_API_KEY=your-openrouter-key
-export OPENAI_MODEL_NAME=meta-llama/llama-3.2-3b-instruct
-
-# Option 3: For local Ollama (default)
-export OPENAI_API_TYPE=ollama
-export OPENAI_API_BASE=http://localhost:11434/v1
-export OPENAI_MODEL_NAME=gemma3:12b-it-qat  # VLM model for image descriptions
-
-# Option 4: For LM Studio
-export OPENAI_API_TYPE=lmstudio
-export OPENAI_API_BASE=http://localhost:1234/v1
-export OPENAI_MODEL_NAME=your-loaded-model
-
-# Run the installation
-./setup_universal.sh
-```
-
-#### Installation with Virtual Environment (Recommended)
-
-```bash
-# Create and activate virtual environment
-python3 -m venv opentuneweaver-env
-source opentuneweaver-env/bin/activate
-
-# Clone repository
-git clone https://github.com/ProfEngel/OpenTuneWeaver.git
-cd OpenTuneWeaver
-
-# Install dependencies
-pip install --upgrade pip
-pip install -r requirements.txt
-
-# Configure API (see options above)
-export OPENAI_API_TYPE=openai  # or your preferred API
-export OPENAI_API_BASE=https://api.openai.com/v1
-export OPENAI_API_KEY=your-api-key
-export OPENAI_MODEL_NAME=gpt-4
-
-# Run setup
-./setup_universal.sh
-```
-
-#### Installation with Conda
-
-```bash
-# Create conda environment
-conda create -n opentuneweaver python=3.11
-conda activate opentuneweaver
-
-# Clone repository
-git clone https://github.com/ProfEngel/OpenTuneWeaver.git
-cd OpenTuneWeaver
-
-# Install dependencies
-pip install -r requirements.txt
-
-# Install unsloth (for training)
-pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth-zoo.git
-
-# Configure API (see options above)
-export OPENAI_API_TYPE=your-api-type
-export OPENAI_API_BASE=your-api-base-url
-export OPENAI_API_KEY=your-api-key
-export OPENAI_MODEL_NAME=your-model
-
-# Run setup
-./setup_universal.sh
-```
-
-#### Docker Installation (Recommended for Production, not yet tested)
-
-```bash
-# Clone repository
-git clone https://github.com/ProfEngel/OpenTuneWeaver.git
-cd OpenTuneWeaver
-
-# Copy and configure environment
-cp .env.example .env
-# Edit .env with your API settings
-
-# Build and run with Docker Compose
-docker-compose up -d
-
-# Access at http://localhost:8080
-```
-
-### Runpod Installation (For Simple Online-GPU Training)
+### Quick Start with Runpod (Recommended)
 
 **Runpod Template:**
 ```
+
 runpod/pytorch:2.8.0-py3.11-cuda12.8.1-cudnn-devel-ubuntu22.04
 Disk Volume: 100 GB
 Pod Volume: 100 GB
 Open Ports: 8080,11434
+
 ```
 
 **Installation:**
-```bash
+```
+
 cd /workspace
 git clone https://github.com/ProfEngel/OpenTuneWeaver.git
-cd OpenTuneWeaver
-
-# For Runpod with Ollama (local inference)
+cp OpenTuneWeaver/setup_runpod_direct.sh .
+chmod +x setup_runpod_direct.sh
 ./setup_runpod_direct.sh
 
-# OR for Runpod with external API
-export OPENAI_API_TYPE=openai
-export OPENAI_API_BASE=https://api.openai.com/v1
-export OPENAI_API_KEY=your-key
-export OPENAI_MODEL_NAME=gpt-4
-./setup_universal.sh
 ```
 
-### API Configuration Examples
+**After installation:**
 
-#### Using OpenAI GPT-4
-```bash
-export OPENAI_API_TYPE=openai
-export OPENAI_API_BASE=https://api.openai.com/v1
-export OPENAI_API_KEY=sk-...your-key...
-export OPENAI_MODEL_NAME=gpt-5-mini  # or gpt-4
-```
+Wait until the installation has finished, then press `y` to start the UI. The UI is served at http://yourIP:8080.
 
-#### Using OpenRouter
-```bash
-export OPENAI_API_TYPE=openrouter
-export OPENAI_API_BASE=https://openrouter.ai/api/v1
-export OPENAI_API_KEY=your-openrouter-key
-export OPENAI_MODEL_NAME=meta-llama/llama-3.2-3b-instruct
-# Other models: claude-3-opus, mistral-large, etc.
-```
+On Runpod, access the UI through the Runpod web interface on port 8080.
 
-#### Using Local Ollama
-```bash
-# First install Ollama
-curl -fsSL https://ollama.com/install.sh | sh
-ollama pull gemma3:12b-it-qat
+### Alternative Installation Methods
 
-# Configure OpenTuneWeaver
-export OPENAI_API_TYPE=ollama
-export OPENAI_API_BASE=http://localhost:11434/v1
-export OPENAI_MODEL_NAME=gemma3:12b-it-qat
+**Docker Installation:** *(Coming Soon)*
 ```
 
-#### Using LM Studio
-```bash
-# Start LM Studio and load a model
-# Then configure:
-export OPENAI_API_TYPE=lmstudio
-export OPENAI_API_BASE=http://localhost:1234/v1
-export OPENAI_MODEL_NAME=your-loaded-model
-```
+docker run -d -p 7860:7860 --gpus all -v opentuneweaver:/app/data --name opentuneweaver opentuneweaver/opentuneweaver:latest
 
-#### Using Custom API Endpoint
-```bash
-export OPENAI_API_TYPE=custom
-export OPENAI_API_BASE=https://your-api-endpoint.com/v1
-export OPENAI_API_KEY=your-api-key
-export OPENAI_MODEL_NAME=your-model-name
 ```
 
-### Starting OpenTuneWeaver
+**Conda Installation:**
+```
 
-After installation, start the application:
+conda create -n opentuneweaver python=3.11
+conda activate opentuneweaver
+apt-get update && apt-get upgrade -y
+git clone https://github.com/ProfEngel/OpenTuneWeaver.git
+cp OpenTuneWeaver/setup_runpod_direct.sh .
+chmod +x setup_runpod_direct.sh
 
-```bash
-# Direct start
-./start_otw.sh
+# Install unsloth_zoo directly from GitHub
+pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth-zoo.git
 
-# Or with a custom port
-export SERVER_PORT=7860
-./start_otw.sh
+# Then run the setup script
+./setup_runpod_direct.sh
 
-# Access the UI
-# Local: http://localhost:8080
-# Remote: http://your-server-ip:8080
 ```
 
-### Troubleshooting
+**Virtual Environment:**
+```
 
-If you encounter issues:
+python3.11 -m venv opentuneweaver-env
+source opentuneweaver-env/bin/activate
+apt-get update && apt-get upgrade -y
+git clone https://github.com/ProfEngel/OpenTuneWeaver.git
+cp OpenTuneWeaver/setup_runpod_direct.sh .
+chmod +x setup_runpod_direct.sh
 
-```bash
-# Check the installation
-./debug_otw.sh
+# Install unsloth_zoo directly from GitHub
+pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth-zoo.git
 
-# View logs
-tail -f logs/pipeline.log
+# Then run the setup script
+./setup_runpod_direct.sh
 
-# Test the API connection
-curl -X POST $OPENAI_API_BASE/chat/completions \
-  -H "Authorization: Bearer $OPENAI_API_KEY" \
-  -H "Content-Type: application/json" \
-  -d '{"model": "'$OPENAI_MODEL_NAME'", "messages": [{"role": "user", "content": "Test"}]}'
 ```
 
 ***
@@ -367,15 +216,17 @@ OpenTuneWeaver would not be possible without these excellent open-source framewo
 
 If you use OpenTuneWeaver in your research, please cite our paper:
 
-```bibtex
+```
+
 @article{opentuneweaver2024,
-  title={OpenTuneWeaver: Semantically-structured, Curatable LLM Fine-tuning Pipeline for Research and Education},
-  author={Engel, Prof. Dr. Mathias},
-  journal={arXiv preprint},
-  year={2024},
-  institution={Hochschule für Wirtschaft und Umwelt Nürtingen-Geislingen},
-  note={Funded by MWK Baden-Württemberg and Stifterverband Deutschland}
+  title={OpenTuneWeaver: Semantically-structured, Curatable LLM Fine-tuning Pipeline for Research and Education},
+  author={Engel, Prof. Dr. Mathias},
+  journal={arXiv preprint},
+  year={2024},
+  institution={Hochschule für Wirtschaft und Umwelt Nürtingen-Geislingen},
+  note={Funded by MWK Baden-Württemberg and Stifterverband Deutschland}
 }
+
 ```
 
 **Paper available:**
@@ -427,4 +278,4 @@ Semantically-structured, curatable all-in-one LLM fine-tuning pipeline
 
 ### Topics
 
-`llm` `finetuning` `ai` `machine-learning` `nlp` `semantic-chunking` `lora` `qlora` `pdf-processing` `qa-generation` `benchmarking` `gradio` `huggingface` `educational-ai` `research-tools`
+`llm` `finetuning` `ai` `machine-learning` `nlp` `semantic-chunking` `lora` `qlora` `pdf-processing` `qa-generation` `benchmarking` `gradio` `huggingface` `educational-ai` `research-tools`