Generating Authentic Grounded Synthetic Maintenance Work Orders (MWOs)

This repository contains the data, data analysis, code, and documentation for the Honours project titled "Generating Authentic Grounded Synthetic Maintenance Work Orders (MWOs)" by Allison Lau (2024). The project aims to generate synthetic MWOs that are grounded in real-world data and authentic to the domain of maintenance engineering. The project is supervised by Prof. Melinda Hodkiewicz, Dr. Caitlin Woods, and Dr. Michael Stewart.

Overview

Maintenance Work Orders (MWOs) are short technical texts that document equipment conditions and failures. They often contain confidential data, so real-world datasets for machine learning are scarce. To address this, this research generates synthetic MWO sentences by querying a graph database for equipment-failure relationships and passing the results to a Large Language Model (GPT-4o mini). The generated data mimics real MWOs by incorporating industry-specific jargon and grammar.

Datasets

The datasets used and analysed in this research are described in the DATASETS section of the repository.

Installation

To install the required packages for the project, follow these steps:

Create a virtual environment using python -m venv venv
Activate the virtual environment using venv\Scripts\activate (Windows) or source venv/bin/activate (macOS/Linux)
Install the required packages using pip install -r requirements.txt

Synthetic Data Generation Pipeline

The synthetic data generation pipeline consists of three main steps:

Equipment-Failure Path Extraction
MWO Sentence Generation via LLM
MWO Sentence Humanisation via Rule-based Approach

The code for each step can be found in the respective directories in the repository.

Equipment-Failure Path Extraction

The code for extracting equipment-failure paths from the MaintIE Knowledge Graph can be found in the PathExtraction directory. The subsections below describe the types of paths and the steps involved.

Types of Paths

We analysed the MaintIE gold standard dataset using a Neo4j graph database to develop paths that illustrate relationships between equipment (PhysicalObject) and failure modes (UndesirableEvent). These paths are classified as direct, showing immediate connections, or complex, involving intermediary entities. Additional paths are created by using hierarchical relations between equipment entities. This framework improves our understanding of component relationships within MWOs and informs the synthetic data generation process.
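
As a rough illustration of the direct/complex distinction, the sketch below walks a tiny in-memory graph instead of Neo4j. Only the PhysicalObject and UndesirableEvent type names come from the text above; the node names, the Activity type, the edges, and the helpers find_paths() and classify() are hypothetical and are not taken from the MaintIE schema or from this repository's queries.

```python
# Hypothetical toy graph (illustration only, not MaintIE data).
NODE_TYPES = {
    "pump": "PhysicalObject",
    "seal": "PhysicalObject",
    "leaking": "UndesirableEvent",
    "blown": "UndesirableEvent",
    "replace": "Activity",  # assumed intermediary entity type
}

# Assumed outgoing edges; relation labels are omitted for brevity.
ADJACENCY = {
    "seal": ["leaking"],          # equipment linked straight to a failure mode
    "pump": ["seal", "replace"],  # a part-of style link and an intermediary
    "replace": ["blown"],
}


def find_paths(start, max_hops=3):
    """Return every simple path from start to an UndesirableEvent node."""
    paths = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        for nxt in ADJACENCY.get(path[-1], []):
            if nxt in path:  # keep paths simple (no revisits)
                continue
            new_path = path + [nxt]
            if NODE_TYPES[nxt] == "UndesirableEvent":
                paths.append(new_path)
            elif len(new_path) - 1 < max_hops:
                stack.append(new_path)
    return paths


def classify(path):
    """One hop is 'direct'; anything with intermediaries is 'complex'."""
    return "direct" if len(path) == 2 else "complex"


for equipment in ("seal", "pump"):
    for path in find_paths(equipment):
        print(classify(path), " -> ".join(path))
```

In this toy graph, seal reaches leaking in one hop (direct), while pump reaches a failure mode only through seal or replace (complex).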

Load MaintIE Dataset into Neo4j Knowledge Graph

Make sure you have Neo4j Desktop installed on your machine.
Open Neo4j Desktop, create a project and an instance of a graph database.
Create a .env file in the root directory and add the following environment variables:

NEO4J_URI="bolt://localhost:7687"
NEO4J_USER="neo4j"
NEO4J_PASSWORD="password"

Replace the values of NEO4J_URI, NEO4J_USER, and NEO4J_PASSWORD with your Neo4j instance details.
Start the database and open the browser.
Run python maintie_to_kg.py to load the MaintIE dataset into Neo4j. Note: Loading the dataset will take a while.
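
Before running the loader, a quick check of the .env file can catch a missing variable early. The sketch below uses only the standard library; how maintie_to_kg.py itself reads the file is not stated here, and the helper names read_env_file() and neo4j_settings() are hypothetical.

```python
from pathlib import Path

# Variable names taken from the .env example above.
REQUIRED_KEYS = ("NEO4J_URI", "NEO4J_USER", "NEO4J_PASSWORD")


def read_env_file(path):
    """Parse simple KEY="value" lines, skipping blanks and # comments."""
    settings = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop surrounding double or single quotes from the value.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def neo4j_settings(path=".env"):
    """Return the three Neo4j settings, or raise if any is missing or empty."""
    settings = read_env_file(path)
    missing = [key for key in REQUIRED_KEYS if not settings.get(key)]
    if missing:
        raise KeyError(f"Missing in {path}: {', '.join(missing)}")
    return {key: settings[key] for key in REQUIRED_KEYS}
```

Calling neo4j_settings() from the repository root returns a dictionary with the three values, or raises a KeyError naming whichever variables are absent.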

Extracting paths (equipment and undesirable event combination) from Neo4j

Queries for extracting paths are stored in path_queries.py.
Open and run the path_matching.ipynb notebook (for example, in Jupyter) to extract paths from Neo4j.
The different paths are stored in their respective JSON files in the path_patterns directory.
An analysis of the paths is also provided, covering the total number of paths and the frequency of equipment, undesirable events, and inherent functions of PhysicalObjects.
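
The kind of frequency analysis listed above can be sketched with collections.Counter. The record layout (the "equipment" and "event" keys), the sample records, and the helper summarise_paths() are assumptions for illustration; the actual structure of the JSON files in path_patterns is not documented here.

```python
from collections import Counter


def summarise_paths(paths):
    """Count total paths and how often each equipment and event appears."""
    return {
        "total": len(paths),
        "equipment": Counter(p["equipment"] for p in paths),
        "events": Counter(p["event"] for p in paths),
    }


# Made-up records standing in for extracted paths.
paths = [
    {"equipment": "pump", "event": "leaking"},
    {"equipment": "pump", "event": "blown"},
    {"equipment": "seal", "event": "leaking"},
]

summary = summarise_paths(paths)
print(summary["total"])                     # 3
print(summary["equipment"].most_common(1))  # [('pump', 2)]
print(summary["events"].most_common(1))     # [('leaking', 2)]
```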

MWO Sentence Generation via LLM

The code for generating synthetic MWO sentences using LLM can be found in the Generate directory.

Generate Synthetic MWO Sentences

Create a .env file in the root directory and add the following environment variables:

API_KEY="your_openai_api_key"

Run python llm_generate.py to generate synthetic MWO sentences using GPT-4o mini.

Functions used to generate synthetic MWO sentences: generate_mwo() and generate_diverse_mwo()
Generated synthetic MWO sentences are stored in the mwo_sentences directory. There is a log file (log.txt) detailing the given equipment + failure mode and the generated sentences. There is also a CSV file (order_synthetic.csv) containing just the generated synthetic MWO sentences.
You can alter the number of path samples by changing the num_samples parameter of the get_samples() function. You can also exclude certain path types by adding their path names (JSON file names) to the exclude list in the get_samples() function.
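
To show how a sample size and an exclusion list might interact, here is a minimal sketch. It is not the repository's get_samples(): the function name sample_paths(), the per-type reading of num_samples, the seed parameter, and the input layout (a dictionary from path-type name to a list of paths) are all assumptions.

```python
import random


def sample_paths(paths_by_type, num_samples, exclude=(), seed=None):
    """Draw up to num_samples paths from each path type not in exclude."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    samples = {}
    for path_type, paths in paths_by_type.items():
        if path_type in exclude:
            continue
        k = min(num_samples, len(paths))  # never ask for more than exist
        samples[path_type] = rng.sample(paths, k)
    return samples


# Made-up path types and paths for illustration.
paths_by_type = {
    "direct": [("seal", "leaking"), ("hose", "split"), ("belt", "worn")],
    "complex": [("pump", "leaking")],
}

picked = sample_paths(paths_by_type, num_samples=2, exclude=["complex"], seed=0)
print(picked)
```

Sampling without replacement keeps each selected path unique within its type, and excluded types are skipped before any sampling happens.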

Note: Further details on the function implementations can be found in the DOCUMENTATION section of the repository.

MWO Sentence Humanisation via Rule-based Approach

The code for humanising synthetic MWO sentences can be found in the Humanise directory.

Humanise Synthetic MWO Sentences

Run python humanise.py to test humanising synthetic MWO sentences using a rule-based approach.

Function used to humanise synthetic MWO sentences: humanise_sentence()
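
A rule-based pass of this kind can be sketched as a few deterministic string transformations. The rules below (lower-casing, dropping the trailing full stop, and a three-entry abbreviation table) and the function name humanise_sketch() are invented for illustration; they are not the rules implemented in humanise_sentence().

```python
# Hypothetical abbreviation table; the entries are examples only.
ABBREVIATIONS = {"hydraulic": "hyd", "replace": "repl", "and": "&"}


def humanise_sketch(sentence):
    """Apply simple, deterministic rules to make a sentence look hand-typed."""
    text = sentence.lower().rstrip(".")  # rules 1 and 2: lower-case, no full stop
    words = [ABBREVIATIONS.get(word, word) for word in text.split()]  # rule 3
    return " ".join(words)


print(humanise_sketch("Replace hydraulic hose and clamp."))
# repl hyd hose & clamp
```

Keeping each rule as a separate, ordered step makes it easy to switch individual rules on or off when comparing outputs.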

Note: Further details on the function implementations can be found in the DOCUMENTATION section of the repository.

Evaluation

The code for evaluating the synthetic MWO sentences can be found in the Evaluation directory. More details on the evaluation can be found in the EVALUATION section of the repository. The following evaluations are performed:

Paths and Synthetic MWO Analysis
Turing Test
Ranking Test (replicated from Bikaun et al. 2022)

Synthetic MWOs Files

The synthetic MWOs generated over the course of the project can be found in different files:

Generate/mwo_sentences/order_synthetic.txt: synthetic MWO sentences generated, including the inherent function of the equipment, the equipment, and the failure mode
Generate/mwo_sentences/synthetic.txt: shuffled version of the above
Generate/mwo_sentences/log.txt: logs of MWO sentences generated, including the path (equipment + failure mode) and the number of sentences generated
Evaluation/Turing2/synthetic_generate_v2.txt: synthetic MWO sentences generated for the Turing Test evaluation
Evaluation/Turing2/synthetic_humanise_v2.txt: humanised synthetic MWO sentences for the Turing Test evaluation

Repository: nlp-tlp/LLM-KG-Synthetic-MWO