/plots/runtime_plots.ipynb
generates all plots for the RumbleML runtime experiments
/plots/ablation_plots.ipynb
generates all plots for the RumbleML ablation study
/preprocessing_pipeline
includes the end-to-end scripts for our pipelines in RumbleML and spark.ml, for comparison. fix_yfcc.rumble and fix_yfcc_spark.py both preprocess the raw YFCC data and then train an ML model, while fix_yfcc_store_libsvm_spark.py additionally stores the data as a libsvm file.
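For reference, a libsvm file stores one example per line: a numeric label followed by sparse index:value pairs whose indices are 1-based and strictly increasing. The snippet below is a minimal sketch of that format only. The values are made up for illustration (they are not YFCC output), and the file name sample.libsvm is hypothetical. It writes two lines and then checks that every line has the expected shape.

```shell
#!/usr/bin/env bash
# Minimal sketch of the libsvm text format:
#   <label> <index1>:<value1> <index2>:<value2> ...
# Values are made up; sample.libsvm is a hypothetical file name.
set -euo pipefail

printf '%s\n' \
  '1 3:0.5 7:1.0 12:0.25' \
  '0 1:2.0 7:0.75' > sample.libsvm

# Validate every line; exits with status 1 if any line is malformed.
awk '
  BEGIN { num = "^[-+]?[0-9]+([.][0-9]+)?$"; bad = 0 }
  {
    if ($1 !~ num) bad = 1                # the label must be numeric
    prev = 0
    for (i = 2; i <= NF; i++) {
      n = split($i, kv, ":")
      # each feature is index:value with a positive, strictly increasing index
      if (n != 2 || kv[1] !~ /^[0-9]+$/ || kv[1] + 0 <= prev || kv[2] !~ num) bad = 1
      prev = kv[1] + 0
    }
  }
  END {
    if (bad) { print "sample.libsvm: invalid"; exit 1 }
    print NR " valid lines"               # prints: 2 valid lines
  }
' sample.libsvm
```

Sparse formats like this list only the non-zero features of each example, which keeps files compact for high-dimensional data.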
rumbleML_scripts_generator
generates the shell scripts and Rumble scripts for the experiments
run_all_experiments.sh
is the shell script for all runtime experiments.
run_all_experiments_ablation.sh
is the shell script for all ablation experiments.
To run the experiments on EMR, you may need to move run_spark.py to the root directory.
We log the experiments by redirecting stderr (2>) and stdout (1>) to separate files:
bash run_all_experiments.sh 2> time_logs.txt 1> accuracy.txt
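The 2> and 1> redirections send a command's stderr (file descriptor 2) and stdout (file descriptor 1) to two different files. Judging only by the file names, timing output presumably goes to stderr and accuracy results to stdout. The sketch below demonstrates just the redirection mechanism, using a hypothetical stand-in function fake_experiment instead of the real experiment script.

```shell
#!/usr/bin/env bash
# Minimal sketch of the 2> / 1> split used for logging.
# fake_experiment is a hypothetical stand-in for the real experiment script.
set -euo pipefail

fake_experiment() {
  echo "accuracy line (stdout)"        # written to file descriptor 1
  echo "timing line (stderr)" >&2      # written to file descriptor 2
}

# Same redirection pattern as the logging command: stderr -> time_logs.txt, stdout -> accuracy.txt
fake_experiment 2> time_logs.txt 1> accuracy.txt

cat accuracy.txt     # prints: accuracy line (stdout)
cat time_logs.txt    # prints: timing line (stderr)
```

If the ablation runs are logged the same way, the analogous call would be bash run_all_experiments_ablation.sh 2> ablation_time_logs.txt 1> ablation_accuracy.txt, where both output file names are hypothetical.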