Duplicate Bug Report Detection System

Background

As software systems grow larger and more complex, improving the quality of software maintenance becomes increasingly important. Recommending related bug reports can significantly improve bug report triage. Manually inspecting every new incoming report to find its duplicates and route it to the developers who fixed the original bug is difficult and time-consuming. Automatic identification of duplicate bug reports is therefore a critical research problem in the mining software repositories (MSR) field.

Aim

The project aims to provide effective unsupervised and supervised models for detecting duplicate bug reports in the Bugzilla repository. The search engine finds the top-N reports most similar to a given report, so that duplicate issues can be identified faster. It also provides developers with an analytical dashboard that summarizes bug report statistics and highlights the major sources of bugs.
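The README does not specify which unsupervised or supervised models the engine uses. To illustrate what "top-N most similar reports" means, the sketch below ranks tokenized reports with TF-IDF weighting and cosine similarity, a standard information-retrieval baseline. It is an assumption for illustration, not necessarily the engine's model, and the function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per tokenized report."""
    n_docs = len(docs)
    # Document frequency: number of reports that contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            # Term frequency normalized by report length, times a smoothed IDF.
            term: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n_similar(query_index, docs, n=5):
    """Return [(index, score), ...] for the n reports most similar to docs[query_index]."""
    vectors = tfidf_vectors(docs)
    query = vectors[query_index]
    scores = [(i, cosine(query, vec)) for i, vec in enumerate(vectors) if i != query_index]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:n]


reports = [
    ["firefox", "crash", "on", "startup"],
    ["browser", "crash", "at", "startup"],
    ["toolbar", "icon", "missing"],
]
# Report 1 shares "crash" and "startup" with report 0, so it ranks first.
print(top_n_similar(0, reports, n=2))
```

Recomputing every vector per query is fine for a toy example; for a real corpus the vectors would be computed once and stored.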

ETL Process

The search engine extracts data through the Bugzilla REST API (https://wiki.mozilla.org/Bugzilla:REST_API) and stores it in a MongoDB data lake. It then verifies data quality and performs data wrangling and cleaning.
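A minimal sketch of the extraction step, assuming the public Mozilla instance at bugzilla.mozilla.org and a hypothetical field list. The `/rest/bug` endpoint with the `include_fields`, `limit`, and `offset` parameters is part of the Bugzilla REST API; the helper names, the quality rules in `clean_batch`, and the field choice are illustrative assumptions. The HTTP request itself is left out so the code runs offline.

```python
import json
from urllib.parse import urlencode

# Assumption: the public Mozilla instance; other Bugzilla servers expose the same /rest/bug path.
BUGZILLA_BUG_ENDPOINT = "https://bugzilla.mozilla.org/rest/bug"
# Hypothetical field selection; dupe_of holds the ID of the bug a duplicate points to.
FIELDS = ("id", "summary", "product", "component", "resolution", "dupe_of", "creation_time")


def build_bug_query(offset, limit=100, fields=FIELDS, **criteria):
    """Build a paginated /rest/bug search URL; criteria are extra search parameters such as product."""
    params = {"include_fields": ",".join(fields), "limit": limit, "offset": offset, **criteria}
    return f"{BUGZILLA_BUG_ENDPOINT}?{urlencode(params)}"


def parse_bugs(payload):
    """Return the list of bug objects from a /rest/bug JSON response body."""
    return json.loads(payload).get("bugs", [])


def clean_batch(bugs, seen_ids):
    """Simple quality checks: skip records with no id or summary, and ids already stored."""
    kept = []
    for bug in bugs:
        bug_id = bug.get("id")
        if bug_id is None or not bug.get("summary") or bug_id in seen_ids:
            continue
        seen_ids.add(bug_id)
        kept.append(bug)
    return kept


def store_batch(collection, bugs):
    """Insert a cleaned batch into a MongoDB collection, e.g. a pymongo Collection."""
    if bugs:  # insert_many rejects an empty list
        collection.insert_many(bugs)
    return len(bugs)
```

In practice each URL would be fetched (for example with `urllib.request.urlopen(url).read()`), parsed, cleaned, and passed to `store_batch` with a collection obtained from a `pymongo.MongoClient` connected to `MONGO_ADDRESS`, advancing `offset` by `limit` until an empty batch comes back.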

Data Preparation

Because of the volume of data, the engine loads it into Hadoop HDFS and performs text preprocessing with PySpark. Preprocessing includes:

Converting text to lowercase
Splitting text into words in three steps:
1. identifying ASCII characters to keep English text
2. splitting on whitespace
3. splitting concatenated words with Wordninja
Applying normalization
Expanding contractions
Removing punctuation, tags, special characters, and digits
Stemming
Lemmatization

Then it stores the processed data in PostgreSQL.
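The preprocessing steps listed above can be sketched in plain Python as follows. This is an illustration, not the project's PySpark code: the contraction table is a small hypothetical sample, "ASCII identification" is approximated by dropping non-ASCII characters, and contractions are expanded before punctuation removal so the apostrophes are still present. Stemming and lemmatization (for example NLTK's `PorterStemmer` or spaCy's `Token.lemma_`) are omitted to keep it dependency-free.

```python
import re

# Hypothetical, deliberately small contraction table; order matters ("can't" before "n't").
CONTRACTIONS = [
    (re.compile(r"\bcan't\b"), "cannot"),
    (re.compile(r"\bwon't\b"), "will not"),
    (re.compile(r"n't\b"), " not"),
    (re.compile(r"(?<=\w)'re\b"), " are"),
    (re.compile(r"(?<=\w)'ll\b"), " will"),
    (re.compile(r"(?<=\w)'ve\b"), " have"),
    (re.compile(r"(?<=\w)'m\b"), " am"),
]
TAG_RE = re.compile(r"<[^>]+>")          # HTML/XML tags
NON_LETTER_RE = re.compile(r"[^a-z\s]")  # punctuation, special characters, digits


def preprocess(text, splitter=None):
    """Turn raw bug report text into a list of cleaned lowercase tokens."""
    text = text.lower()
    text = TAG_RE.sub(" ", text)
    for pattern, expansion in CONTRACTIONS:
        text = pattern.sub(expansion, text)
    # Keep only ASCII characters (assumed here to approximate "English text").
    text = text.encode("ascii", "ignore").decode("ascii")
    text = NON_LETTER_RE.sub(" ", text)
    tokens = text.split()
    if splitter is not None:
        # e.g. splitter=wordninja.split to break glued words such as "pagenotfound".
        tokens = [part for token in tokens for part in splitter(token)]
    return tokens


print(preprocess("Firefox <b>can't</b> open 2 tabs!"))
```

To run it on Spark, a function like this could be wrapped with `pyspark.sql.functions.udf(preprocess, ArrayType(StringType()))` and applied to the text column.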

Evaluation

The empirical evaluation is performed on open datasets from the Bugzilla repository, using Mean Average Precision (MAP), Mean Reciprocal Rank (MRR), and recall rate as metrics.
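The three metrics can be computed from a ranked candidate list per query report and the set of its known duplicates. The sketch below uses common definitions from the duplicate-detection literature: recall rate@N is the share of queries with at least one true duplicate in the top N, MRR averages 1/rank of the first true duplicate, and MAP averages precision over the positions of all true duplicates. The cutoffs and exact variants the project reports are not stated, so treat these as assumptions.

```python
def recall_rate_at_n(ranked_lists, relevant_sets, n):
    """Fraction of queries with at least one relevant report among the top n candidates."""
    if not ranked_lists:
        return 0.0
    hits = sum(1 for ranked, relevant in zip(ranked_lists, relevant_sets)
               if any(doc in relevant for doc in ranked[:n]))
    return hits / len(ranked_lists)


def mean_reciprocal_rank(ranked_lists, relevant_sets):
    """Average of 1/rank of the first relevant candidate (0 when none is retrieved)."""
    if not ranked_lists:
        return 0.0
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for rank, doc in enumerate(ranked, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)


def average_precision(ranked, relevant):
    """Mean of precision@k over the ranks k where a relevant report appears."""
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(ranked_lists, relevant_sets):
    """Average of average_precision over all queries."""
    if not ranked_lists:
        return 0.0
    return sum(average_precision(r, s) for r, s in zip(ranked_lists, relevant_sets)) / len(ranked_lists)


ranked = [[3, 7, 1], [5, 2, 9]]  # candidate IDs returned for two query reports
relevant = [{7}, {5, 9}]         # their known duplicates
print(recall_rate_at_n(ranked, relevant, 1))   # 0.5
print(mean_reciprocal_rank(ranked, relevant))  # 0.75
```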

Visualization and Presentation

The top-N reports most similar to a given report are presented on a web page built with Flask. The application also presents developers with statistical information about the bug reports in a D3.js dashboard.
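The exact dashboard statistics are not specified. Assuming they include report counts per Bugzilla component (one possible proxy for "major sources of bug generation") and the share of reports resolved as DUPLICATE, a Flask view could serialize them as JSON for a D3.js chart to fetch with `d3.json`. The function names and field choices are hypothetical.

```python
import json
from collections import Counter


def component_counts(bugs, key="component", top=10):
    """Count reports per value of `key` and return the most frequent ones, largest first."""
    counts = Counter(bug.get(key) or "Unknown" for bug in bugs)
    return [{"name": name, "count": count} for name, count in counts.most_common(top)]


def duplicate_share(bugs):
    """Fraction of reports whose resolution is DUPLICATE."""
    if not bugs:
        return 0.0
    dupes = sum(1 for bug in bugs if bug.get("resolution") == "DUPLICATE")
    return dupes / len(bugs)


def dashboard_payload(bugs):
    """Serialize the statistics as a JSON string that a D3.js chart can load."""
    return json.dumps({
        "by_component": component_counts(bugs),
        "duplicate_share": duplicate_share(bugs),
    })
```

A Flask route could return this string wrapped in `flask.Response(payload, mimetype="application/json")`, and the front end would bind the `by_component` array to, for example, a bar chart.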

Implementation Method

The search engine is deployed on AWS using Docker Compose and Amazon ECS with AWS Fargate.

Installation Guide

System Requirement

You need at least 10 GB of free RAM to run this application.

To run the application there are two options:

Use the Docker image
Use the source code

Run the Application Using Docker

Please follow the steps below:

Install Docker CE from https://docs.docker.com/install/linux/docker-ce/ubuntu/
Install Docker Compose from https://docs.docker.com/compose/install/
In the directory where you want to run the application, create a file named docker-compose.yml with the following content:

version: '3'
services:
  web:
    build: .
    image: applia65/duplicatebugreportsearchengine:web
    # restart: always
    environment:
      DATABASE_HOST: postgres_docker
      DATABASE_USER: postgres
      DATABASE_PASSWD: password123
      DATABASE_DATABSE_NAME: bug_database
      MONGO_ADDRESS: mongodb://mongodb_docker:27017/
    ports:
      - "0.0.0.0:5000:5000"
    depends_on:
      - postgres_docker
      - mongodb_docker

  postgres_docker:
    image: postgres:10
    # restart: always
    # # Allow access from Development machine
    # ports:
    #  - "0.0.0.0:5432:5432"
    volumes:
      - ./pg_data/:/var/lib/postgresql/data:Z
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: password123
      POSTGRES_DB: bug_database

  mongodb_docker:
    image: mongo:latest
    #restart: always
    #environment:
    #  MONGO_INITDB_ROOT_USERNAME: root
    #  MONGO_INITDB_ROOT_PASSWORD: example

In the directory containing docker-compose.yml, run the following command and wait until all images have been downloaded.

sudo docker-compose pull

Run the application.

sudo docker-compose up

Open http://localhost:5000 in your browser.
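The `web` service receives its database settings through the `environment` block of docker-compose.yml. How main.py actually reads them is not shown here; a minimal sketch, assuming plain `os.environ` lookups with hypothetical localhost defaults for a non-Docker install, could look like this. The variable names are copied verbatim from the compose file, including its spelling of `DATABASE_DATABSE_NAME`.

```python
import os

# Names match the environment block of docker-compose.yml (spelling kept as-is).
# The defaults are hypothetical values for a local, non-Docker install.
DEFAULTS = {
    "DATABASE_HOST": "localhost",
    "DATABASE_USER": "postgres",
    "DATABASE_PASSWD": "password123",
    "DATABASE_DATABSE_NAME": "bug_database",
    "MONGO_ADDRESS": "mongodb://localhost:27017/",
}


def load_settings(environ=None):
    """Return connection settings, preferring environment variables over the defaults."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name, default) for name, default in DEFAULTS.items()}


def postgres_dsn(settings):
    """Build a libpq-style connection string, the format accepted by psycopg2.connect."""
    return (
        f"host={settings['DATABASE_HOST']} user={settings['DATABASE_USER']} "
        f"password={settings['DATABASE_PASSWD']} dbname={settings['DATABASE_DATABSE_NAME']}"
    )
```

Inside Docker Compose, `DATABASE_HOST` resolves to the `postgres_docker` service name; outside it, the localhost defaults apply.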

Run the Application from Source Code (Ubuntu)

Please follow the steps below:

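Clone the repository with git

git clone https://github.com/ghasemieh/Duplicated-Bug-Report-Detection-System.git
cd Duplicated-Bug-Report-Detection-System

Install all requirements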

pip install --no-cache-dir -r requirements.txt

Download spaCy's large English model, en_core_web_lg (about 900 MB)

python -m spacy download en_core_web_lg

Install MongoDB from https://docs.mongodb.com/manual/tutorial/install-mongodb-on-ubuntu/
Run the MongoDB service

sudo systemctl start mongod

Verify that MongoDB has started successfully.

sudo systemctl status mongod

Install PostgreSQL 10 from https://www.postgresql.org/download/linux/debian/
Start the PostgreSQL service

sudo systemctl start postgresql

Verify that PostgreSQL has started successfully.

sudo systemctl status postgresql
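`systemctl status` shows whether the services are running. To also confirm that they accept connections on their default ports (27017 for MongoDB, 5432 for PostgreSQL), a small standard-library check like this can help. It is an optional helper, not part of the project, and it only tests TCP reachability, not credentials.

```python
import socket

# Default ports: MongoDB listens on 27017, PostgreSQL on 5432.
SERVICES = {"MongoDB": 27017, "PostgreSQL": 5432}


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host="localhost", services=SERVICES):
    """Map each service name to whether its port is reachable on `host`."""
    return {name: is_listening(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name}: {'reachable' if ok else 'NOT reachable'}")
```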

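Set the password of the PostgreSQL user postgres to password123, the credentials the application expects

sudo -u postgres psql -c "ALTER USER postgres PASSWORD 'password123';"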
Run the application

python ./main.py

Open http://localhost:5000 in your browser.

Please contact us if you have any questions.

- Alireza Ghasemieh `[email protected]`

- Sukhjit Singh Sehra `[email protected]`

Special thanks to Sukhjit for helping me with this project.

