AutoPruner: Transformer-based Call Graph Pruning (ESEC/FSE 2022, Research Track)

soarsmu/AutoPruner

⚙️AutoPruner✂️

by Thanh Le-Cong, Hong Jin Kang, Truong Giang Nguyen, Stefanus Agus Haryono, David Lo, Xuan-Bach D. Le, Quyet Thang Huynh


Welcome to the source code repository of AutoPruner, an LLM-based call graph pruning tool introduced in our paper "AutoPruner: Transformer-based Call Graph Pruning"!

📃 Overview

If you are interested in our work, please refer to our overview for more details.

🏁 Repository Organization

The structure of our source code's repository is as follows:

  • config: contains our experimental configurations;
  • script: contains scripts for running our experiments;
  • src: contains our source code.
    • finetune: contains source code for fine-tuning phase
    • training: contains source code for training phase
    • utils: contains source code for utility functions, e.g., logger, visualization, ...
    • gnn: contains source code for the GNN benchmark
    • Note that, in each sub-folder of this folder, main.py, dataset.py, and model.py contain the source code for training/testing, dataset processing, and the deep learning models, respectively;
  • environment.yml: contains the configuration for AutoPruner's environment.

The structure of our data repository is as follows:

  • dl_dataset: contains our processed dataset for AutoPruner;
  • gnn_dataset: contains our processed dataset for GNN benchmark;
  • gnn_model: contains our trained models for GNN benchmarks;
  • info_data: contains the lists of training and testing programs;
  • model: contains our trained models for AutoPruner;
  • npe_result: contains the results of manual evaluation for Null-pointer analysis;
  • processed_data: contains extracted source code for methods in programs in cgPruner's dataset
  • raw_data: contains the static call graphs generated by static analysis tools from cgPruner

🔧 Installations

Requirements

Hardware

  • More than 200GB disk space
  • 2 NVIDIA GPUs that support CUDA 11.3 and have at least 8GB of memory.

Software

  • Ubuntu 18.04 or newer
  • Docker/Conda
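
The hardware requirements above can be checked before installing anything. The following is a minimal sketch, not part of AutoPruner: the function names are hypothetical, and it assumes `nvidia-smi` is on the PATH when GPUs are present. It checks free disk space and GPU memory only; it does not verify CUDA 11.3 support.

```python
import shutil
import subprocess

# Thresholds taken from the hardware requirements listed above.
REQUIRED_DISK_GB = 200
REQUIRED_GPUS = 2
# Some cards report slightly less than their nominal capacity,
# so this threshold may need to be loosened in practice.
REQUIRED_GPU_MEM_MIB = 8 * 1024


def free_disk_gb(path="."):
    """Return the free disk space (in GB) on the filesystem holding `path`."""
    return shutil.disk_usage(path).free / 1e9


def parse_gpu_memory(csv_text):
    """Parse one total-memory value (MiB) per line, skipping non-numeric lines."""
    values = []
    for line in csv_text.splitlines():
        line = line.strip()
        if line.isdigit():
            values.append(int(line))
    return values


def gpus_ok(memories_mib, min_gpus=REQUIRED_GPUS, min_mib=REQUIRED_GPU_MEM_MIB):
    """True if at least `min_gpus` GPUs have `min_mib` MiB of memory or more."""
    return sum(1 for m in memories_mib if m >= min_mib) >= min_gpus


def query_gpu_memory():
    """Ask nvidia-smi for per-GPU total memory; return [] if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_memory(result.stdout)


if __name__ == "__main__":
    print("disk ok:", free_disk_gb(".") > REQUIRED_DISK_GB)
    print("gpus ok:", gpus_ok(query_gpu_memory()))
```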

Environment Configuration

Conda

conda env create -n autopruner --file environment.yml

Docker

For ease of use, we also provide an installation package as a Docker image. Users can set up AutoPruner's Docker container step by step as follows:

  • Pull AutoPruner's docker image:
docker pull thanhlecong/autopruner:v2
  • Run a docker container:
docker run --name autopruner -it --shm-size 16G --gpus all thanhlecong/autopruner:v2
  • Activate conda:
source /opt/conda/bin/activate
  • Activate AutoPruner's conda environment:
conda activate autopruner

Note that the source code of AutoPruner is stored at /workspace/ in the Docker image, so please move to this folder before running experiments.

🚀 Usage

To use our tool, please run the following command:

python3 -m src.training.main --config_path [config path] \
                             --mode [mode: test or train] \
                             --feature [type of features: 0: structure, 1: semantic, 2: combine] \
                             --model_path [path to saved model (for saving in train mode and loading in test mode)]
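
As an illustration of how the four options fit together, here is a small sketch that assembles the command from Python. It is not part of AutoPruner: `build_command` is a hypothetical helper, and the config and model paths in the example are placeholders, not files known to exist in the repository. Only the flag names and the feature codes (0, 1, 2) come from the usage text above.

```python
import shlex

# Feature codes as listed in the usage text above.
FEATURES = {"structure": 0, "semantic": 1, "combine": 2}
MODES = ("train", "test")


def build_command(config_path, mode, feature, model_path):
    """Return the AutoPruner command line as an argument list."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    if feature not in FEATURES:
        raise ValueError(f"feature must be one of {sorted(FEATURES)}, got {feature!r}")
    return [
        "python3", "-m", "src.training.main",
        "--config_path", config_path,
        "--mode", mode,
        "--feature", str(FEATURES[feature]),
        "--model_path", model_path,
    ]


if __name__ == "__main__":
    # Placeholder paths (assumptions); substitute a real config and model file.
    cmd = build_command("config/example.config", "test", "combine", "model/example.pth")
    print(shlex.join(cmd))
    # To actually run it from the repository root:
    #   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.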

📁 Artifact

To replicate the results of AutoPruner, please download the data from our replication package, put it in the same folder as this repository, and then follow the instructions below. Note that results may differ slightly when running on different devices. However, these differences do not affect the findings in our paper.

RQ1

To replicate the result of AutoPruner in call graph pruning on Wala (RQ1), please use

bash script/rq1_wala.sh

To replicate the result of AutoPruner in call graph pruning on Doop (RQ1), please use

bash script/rq1_doop.sh

To replicate the result of AutoPruner in call graph pruning on Petablox (RQ1), please use

bash script/rq1_peta.sh

RQ2

Null-pointer Analysis

In this analysis, we follow the experimental settings of cgPruner, including their code for Null-pointer Analysis (NPA). Please refer to cgPruner's replication package for further instructions. You can also find our manual evaluation in the npe_result folder of our data repository.

Monomorphic Call-site Detection

To replicate the result of AutoPruner in monomorphic call-site detection on Wala's call graph (RQ2), please use

bash script/rq2_wala.sh

To replicate the result of AutoPruner in monomorphic call-site detection on Doop's call graph (RQ2), please use

bash script/rq2_doop.sh

To replicate the result of AutoPruner in monomorphic call-site detection on Petablox's call graph (RQ2), please use

bash script/rq2_peta.sh

RQ3

To replicate the ablation study of AutoPruner with structural features, please use

bash script/rq3_structure.sh

To replicate the ablation study of AutoPruner with semantic features, please use

bash script/rq3_semantic.sh

To replicate the ablation study of AutoPruner with caller function, please use

bash script/rq3_caller.sh

To replicate the ablation study of AutoPruner with callee function, please use

bash script/rq3_callee.sh
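
To run a whole research question in one go, the scripts listed above can be driven from a small Python wrapper. This is a hedged sketch, not something shipped with AutoPruner: `commands_for` and `run_all` are hypothetical names, and the wrapper assumes it is started from the repository root so that the `script/` folder resolves. The script names are the ones given in this README.

```python
import subprocess

# Script names as listed in the RQ1, RQ2, and RQ3 sections above.
SCRIPTS = {
    "rq1": ["rq1_wala.sh", "rq1_doop.sh", "rq1_peta.sh"],
    "rq2": ["rq2_wala.sh", "rq2_doop.sh", "rq2_peta.sh"],
    "rq3": ["rq3_structure.sh", "rq3_semantic.sh", "rq3_caller.sh", "rq3_callee.sh"],
}


def commands_for(rq, script_dir="script"):
    """Return the bash commands (as argument lists) for one research question."""
    return [["bash", f"{script_dir}/{name}"] for name in SCRIPTS[rq]]


def run_all(rq, dry_run=True):
    """Run every script for `rq` in order; with dry_run=True only print them."""
    for cmd in commands_for(rq):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing script.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all("rq1", dry_run=True)
```

Calling `run_all("rq3", dry_run=False)` would execute the four ablation scripts sequentially.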

📜 Citation

If you use our tool, please cite our paper as follows:

@inproceedings{le2022autopruner,
  title={AutoPruner: transformer-based call graph pruning},
  author={Le-Cong, Thanh and Kang, Hong Jin and Nguyen, Truong Giang and Haryono, Stefanus Agus and Lo, David and Le, Xuan-Bach D and Huynh, Quyet Thang},
  booktitle={Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering},
  pages={520--532},
  year={2022}
}
