BYU-DML machine learning algorithms or primitives created for DARPA's D3M project. These primitives are wrapped to fit within the D3M ecosystem.
Extracts metafeatures from tabular data.
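This README does not say which metafeatures the extractor computes. As an illustration only, here is a minimal plain-Python sketch of a few common dataset-level metafeatures: instance and feature counts, dimensionality, missing-value ratio, class count, and class entropy. The table representation (a list of rows with `None` for missing cells) and every function name are hypothetical. This is neither the primitive's D3M interface nor its actual feature set.

```python
import math
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical table representation: a list of rows, None marks a missing cell.
Table = List[List[Optional[object]]]


def entropy(values: List[object]) -> float:
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def simple_metafeatures(table: Table, target_index: int) -> Dict[str, float]:
    """Compute a few common dataset-level metafeatures (illustrative only).

    Assumes every row has the same length and that the target column is
    part of the table, so every other column counts as a feature.
    """
    n_rows = len(table)
    n_cols = len(table[0]) if table else 0
    n_features = max(n_cols - 1, 0)
    n_cells = n_rows * n_cols
    n_missing = sum(1 for row in table for value in row if value is None)
    targets = [row[target_index] for row in table if row[target_index] is not None]
    return {
        "number_of_instances": float(n_rows),
        "number_of_features": float(n_features),
        "dimensionality": n_features / n_rows if n_rows else 0.0,
        "ratio_of_missing_values": n_missing / n_cells if n_cells else 0.0,
        "number_of_classes": float(len(set(targets))),
        "class_entropy": entropy(targets) if targets else 0.0,
    }


if __name__ == "__main__":
    data = [
        [5.1, 3.5, "a"],
        [4.9, None, "a"],
        [6.2, 2.9, "b"],
        [5.9, 3.0, "b"],
    ]
    features = simple_metafeatures(data, target_index=2)
    print(features["class_entropy"])  # 1.0 (two equally frequent classes)
```

Metafeatures like these summarize a whole dataset as a fixed-length vector. That vector can then be compared across datasets, for example when choosing which pipelines to try.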
Imputes missing values in tabular data by randomly sampling other known values from the same column.
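Here is a minimal sketch of the sampling idea. It assumes a table stored as a list of rows with `None` for missing values. The function names are hypothetical, and this is not the primitive's D3M interface. Each missing entry is replaced by a value drawn uniformly, with replacement, from the observed values in its own column.

```python
import random
from typing import List, Optional, Sequence

# Hypothetical cell type: a numeric value, or None when the cell is missing.
Value = Optional[float]


def impute_column(column: Sequence[Value], rng: random.Random) -> List[Value]:
    """Replace each missing (None) entry with a value drawn uniformly,
    with replacement, from the known entries of the same column."""
    known = [v for v in column if v is not None]
    if not known:
        # An all-missing column has nothing to sample from; leave it as-is.
        return list(column)
    return [rng.choice(known) if v is None else v for v in column]


def impute_table(rows: List[List[Value]], seed: Optional[int] = None) -> List[List[Value]]:
    """Impute every column of a row-major table independently.

    Returns a new table; the input rows are not modified.
    """
    rng = random.Random(seed)  # one seeded generator shared by all columns
    columns = [list(col) for col in zip(*rows)]
    filled = [impute_column(col, rng) for col in columns]
    return [list(row) for row in zip(*filled)]


if __name__ == "__main__":
    table = [[1.0, None], [None, 4.0], [3.0, 5.0]]
    # Each None becomes one of the observed values from its own column.
    print(impute_table(table, seed=0))
```

Unlike mean imputation, which fills every gap in a column with the same value, sampling roughly preserves each column's empirical distribution, at the cost of adding noise. All columns share a single seeded generator. This keeps runs reproducible and avoids having every column draw the same index sequence.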
- Clone this repository.
- `cd` into it.
- Clone the primitives git submodule: `git submodule update --init --recursive`.
- `cd` into the `submission/primitives` directory, then verify that it is synced to the current state of the source D3M primitives repo, not just to BYU's fork. See the guides on configuring a remote for a fork and syncing a fork.
- Create a `.env` file in the repo root and populate the `DATASETS` and `WORKER_ID` environment variables. `DATASETS` should be the path to the root folder of the D3M datasets repo. `WORKER_ID` should be an ID that uniquely identifies the machine the pipelines are run on.
- Update the primitives submodule (skip if the submodule was just cloned):
  - `git submodule update --recursive`
- Sync the primitives submodule with the upstream D3M repo:
  - `cd submission/primitives`
  - Check out the `byu-dml` branch (`git submodule update` leaves the submodule on a detached HEAD), then pull the master branch of the parent repository into it:
    `git pull https://gitlab.com/datadrivendiscovery/primitives`
- Update the `Dockerfile` with the latest tag from D3M, then pull the image and start the container: `docker-compose up -d --build`. Note that as primitive authors submit their primitives, the image published under a tag changes while the tag itself stays the same. When this happens, one solution is to delete the image with `docker rmi <image id>` and pull it again.
- Update the primitives, if necessary. At a minimum, you will likely need to update the dependencies in this repo to honor the dependencies and version ranges specified by the D3M core package. Be sure to update the version numbers in `byudml/__init__.py`.
- Next, run the tests, generate the primitive JSON files, generate the pipelines, and run the pipelines:
  - Execute `docker exec -it test-d3m-primitives bash` to enter the Docker container.
  - Run `./run_tests.sh`. This command verifies that nothing is broken, generates new pipeline and primitive JSONs with updated digests and versions, runs the pipelines, and places the results in the correct folder in the `primitives` submodule. NOTE: Verify that the glob pattern in `submission.utils.get_new_d3m_path` correctly captures the D3M version in the `primitives` submodule.
  - `exit`
- Commit the updated primitive JSONs and pipelines in the submodule (i.e., our fork of the D3M primitives repo). Note: do not commit directly to the master branch. Commit to a branch whose name reflects the new D3M package version and our organization.
- Update this repo by committing the changes to the submodule: stage them with `git add submission/primitives/` (and any other changes with `git add`), then `git commit` and `git push`.
- Release this package.
- Push the primitives submodule with `git push origin byu-dml` (this pushes to https://gitlab.com/byu-dml/primitives) and verify that the CI passes. If it fails, start over at step 4. NOTE: this package must be released before it can be tested with CI.
- Create a merge request from the `byu-dml` branch of https://gitlab.com/byu-dml/primitives to the master branch of https://gitlab.com/datadrivendiscovery/primitives.
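The `.env` file from the setup steps above uses plain `KEY=VALUE` lines, the same format Docker Compose reads from a `.env` file in the project directory. The sketch below shows hypothetical example contents. It also defines a small hypothetical helper that parses the file and checks that `DATASETS` and `WORKER_ID` are set before any pipelines run. The path and worker ID are placeholders, and none of these helpers are part of this repo's code.

```python
from typing import Dict

REQUIRED_KEYS = ("DATASETS", "WORKER_ID")

# Placeholder contents; use your own datasets path and machine ID.
EXAMPLE_ENV = """\
# Root folder of a local clone of the D3M datasets repo
DATASETS=/data/datasets
# ID that uniquely identifies this machine
WORKER_ID=lab-workstation-01
"""


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments.

    Quoting, escapes, and variable expansion are deliberately not handled.
    """
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a KEY=VALUE line: {line!r}")
        env[key.strip()] = value.strip()
    return env


def check_env(env: Dict[str, str]) -> None:
    """Raise ValueError if a required variable is absent or empty."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise ValueError("missing required .env variables: " + ", ".join(missing))


if __name__ == "__main__":
    env = parse_env(EXAMPLE_ENV)
    check_env(env)
    print(env["WORKER_ID"])  # lab-workstation-01
```

To check a real file, read it with `open(".env").read()` from the repo root and pass the text to `parse_env`.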
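The NOTE in the `./run_tests.sh` step above asks you to verify that the glob pattern in `submission.utils.get_new_d3m_path` captures the D3M version in the `primitives` submodule. That function's actual pattern is not shown here, so the following is a hypothetical checker. It assumes version directories are named like `v2020.1.9`, tests a candidate shell-style pattern (`v*.*.*`) against a directory listing with Python's `fnmatch`, and picks the newest version numerically.

```python
import fnmatch
import re
from typing import Iterable, List, Optional, Tuple

# Assumed naming scheme for D3M version directories, e.g. "v2020.1.9".
VERSION_DIR_GLOB = "v*.*.*"
VERSION_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def version_dirs(names: Iterable[str], pattern: str = VERSION_DIR_GLOB) -> List[str]:
    """Return the names that a shell-style glob pattern would capture (case-sensitive)."""
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]


def latest_version_dir(names: Iterable[str], pattern: str = VERSION_DIR_GLOB) -> Optional[str]:
    """Return the newest matching directory, comparing version numbers as integers."""
    parsed: List[Tuple[Tuple[int, ...], str]] = []
    for name in version_dirs(names, pattern):
        match = VERSION_RE.match(name)
        if match:  # the glob is looser than the regex; skip non-numeric matches
            parsed.append((tuple(int(part) for part in match.groups()), name))
    return max(parsed)[1] if parsed else None


if __name__ == "__main__":
    listing = ["v2019.11.10", "v2020.1.9", "v2020.1.10", "README.md", "vendor"]
    print(version_dirs(listing))        # ['v2019.11.10', 'v2020.1.9', 'v2020.1.10']
    print(latest_version_dir(listing))  # v2020.1.10
```

Comparing parsed integers matters because a plain string comparison ranks `v2020.1.9` above `v2020.1.10`. To test against the real submodule, you could build the listing with `[p.name for p in pathlib.Path("submission/primitives").iterdir() if p.is_dir()]`.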