DGT (pronounced "digit") is a framework that enables different algorithms and models to be used to generate synthetic data.
This is the main repository for DiGiT, our Data Generation and Transformation framework.
First, clone the repository:
git clone git@github.com:IBM/fms_dgt.git
cd fms_dgt
Now set up your virtual environment. We recommend using a Python virtual environment with Python >=3.10.15 and <3.13.x. Here is how to set one up using Python venv:
python3 -m venv .venv
source .venv/bin/activate
To install packages, we recommend the following:
pip install -e ".[all]"
Important
Please install the pre-commit hooks to adhere to code hygiene standards:
pip install pre-commit
pre-commit install
For each API service you plan to use, add its configuration to a .env file. Copy .env.example to .env and add your keys as follows:
# watsonx [Optional]
WATSONX_API_KEY=<WatsonX key goes here>
WATSONX_PROJECT_ID=<Project env variable>
# OpenAI [Optional]
OPENAI_API_KEY=<OPENAI key goes here>
# Azure OpenAI [Optional]
AZURE_OPENAI_API_KEY=<AZURE OPENAI key goes here>
# Anthropic [Optional]
ANTHROPIC_API_KEY=<ANTHROPIC key goes here>
To test whether the installation succeeded, run the following command, which references a databuilder.
- Using ollama
Tip
The default settings assume you have mistral-small3.2 running. Use the following command to run it for an hour:
ollama run mistral-small3.2 --keepalive "1h" &
python -m fms_dgt.core --task-paths ./tasks/core/simple/logical_reasoning/causal --restart-generation
- Using IBM watsonx
Caution
You must set WATSONX_API_KEY and WATSONX_PROJECT_ID before using the watsonx API service.
python -m fms_dgt.core --task-paths ./tasks/core/simple/logical_reasoning/causal --restart-generation --config-path configs/core/watsonx_simple.yaml
If successful, you should see the outputs of the command in the ./output directory.
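If the run fails, a quick pre-flight check can help rule out setup problems. The sketch below is a minimal, standard-library-only illustration and is not part of DiGiT: it checks the interpreter against the recommended version range and reports which of the variables from the .env example are missing or still set to a placeholder. The helper names (python_version_ok, parse_env, missing_keys) and the service grouping are hypothetical, and the simple KEY=VALUE parsing is an assumption about the .env format rather than how the framework itself loads it.

```python
import sys
from pathlib import Path

# Version bounds taken from the setup instructions in this README.
MIN_PYTHON = (3, 10, 15)
MAX_PYTHON_EXCLUSIVE = (3, 13)

# Variable names as listed in the .env example; the grouping is illustrative.
SERVICE_KEYS = {
    "watsonx": ["WATSONX_API_KEY", "WATSONX_PROJECT_ID"],
    "openai": ["OPENAI_API_KEY"],
    "azure-openai": ["AZURE_OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
}


def python_version_ok(version=None):
    """Return True if the version is >= 3.10.15 and < 3.13."""
    v = tuple(version if version is not None else sys.version_info[:3])
    return MIN_PYTHON <= v < MAX_PYTHON_EXCLUSIVE


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env, service):
    """List variables for a service that are absent, empty, or a <placeholder>."""
    missing = []
    for key in SERVICE_KEYS[service]:
        value = env.get(key, "")
        if not value or (value.startswith("<") and value.endswith(">")):
            missing.append(key)
    return missing


if __name__ == "__main__":
    print("Python version OK:", python_version_ok())
    env_path = Path(".env")
    env = parse_env(env_path.read_text()) if env_path.exists() else {}
    for service in SERVICE_KEYS:
        print(service, "missing:", missing_keys(env, service))
```

Run it from the repository root so that it picks up the .env file. Only the services you actually plan to use need their keys filled in, so a non-empty "missing" list for the other services is expected.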
FMS-DGT is currently maintained by Max Crouse, Kshitij Fadnis, Siva Sankalp Patel, and Pavan Kapanipathi.
FMS-DGT has an Apache 2.0 license, as found in the LICENSE file.