This was a proprietary project so input and output data are unavailable at this time. Code has been generalized to showcase work.
Company team was looking to analyze issues/problems/exceptions manually documented from anyone who was having issues with equipment/plant/etc. Each report was 500+ words and was compiled into one excel file for complete analysis. The compiled excel file ultimately contained 3000+ rows of reports with each row containing the individual report of 500+ words (each row contained each issue/problem/exception report along with associated claim id).
The company team was spending 1-2 weeks to manually analyze these reports by manually searching keywords in the compiled excel, manually tracking the number of occurences of each keyword, manually identifying any commonalities or anomalies between the issues/problems/exceptions, etc. This process took 1-2 weeks.
I created an algorithm in Python that fully automated this process in 1-2 minutes. My algorithm automatically extracts keywords, tags the keywords, and summarizes the keywords in each report while also linking the keywords/tags with the corresponding claim id and lengthy 500+ issue/problem/exception report.
This algorithm is extendable to other similar use cases.