Thematic Analysis of Banking Regulatory Changes Following the Collapse of Silicon Valley Bank

MIT License


NLP-Driven Analysis of Banking Policies

Topic Modeling of Regulatory Documents using Natural Language Processing

About

The 2023 collapse of Silicon Valley Bank (SVB) had a strong influence on financial regulation. Our project applies natural language processing (NLP) topic modeling techniques to identify the primary themes in our corpora of financial regulatory documents scraped from regulations.gov. The objective is to identify and visualize topics, reveal shifts in regulatory focus, and show how topics shift with market trends.

In this project, we compare the texts of proposed and implemented regulations in a 36-month window surrounding the SVB collapse: 18 months before and 18 months after. Methods include naive keyword counts as well as basic and advanced topic modeling techniques (TF-IDF, BERTopic).
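The pre/post split described above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes the window is anchored on 10 March 2023 (the day regulators closed SVB), and the names `SVB_COLLAPSE`, `shift_months`, and `label_period` are hypothetical.

```python
from datetime import date
from typing import Optional

# Assumption: the window is anchored on the day regulators closed SVB.
# The project may use a different anchor date.
SVB_COLLAPSE = date(2023, 3, 10)
WINDOW_MONTHS = 18


def shift_months(d: date, months: int) -> date:
    """Shift a date by whole months, keeping the day of month.

    Only safe for days that exist in every month (day <= 28).
    """
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, d.day)


def label_period(posted: date) -> Optional[str]:
    """Return 'pre' or 'post' for dates inside the 36-month window, else None."""
    start = shift_months(SVB_COLLAPSE, -WINDOW_MONTHS)
    end = shift_months(SVB_COLLAPSE, WINDOW_MONTHS)
    if start <= posted < SVB_COLLAPSE:
        return "pre"
    if SVB_COLLAPSE <= posted <= end:
        return "post"
    return None
```

Documents labeled `None` fall outside the window and would be dropped before any comparison of the two periods.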

We expect to see increased scrutiny of mid-sized banks relative to the historical focus on Global Systemically Important Banks (G-SIBs). We also expect key themes to include increased capital requirements, liquidity risk, and discussions of the appropriate level of FDIC deposit insurance. Uncovering the root causes of the SVB collapse and its impact on regulatory trends will allow for better impact mitigation should similar crises arise. Experimental results show strong statistical evidence supporting this hypothesis. Detailed findings from this work are available in the final report.

Built With

The main packages used in our project:

  • pandas
  • numpy
  • os
  • re
  • json
  • sklearn
  • nltk
  • gensim
  • matplotlib.pyplot
  • openai
  • keybert
  • rake_nltk
  • yake
  • spacy
  • bertopic
  • umap

Methods

  • Keyword Identification & Extraction
  • Topic Modeling (TF-IDF, BERTopic)
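As a rough illustration of the naive keyword-count approach, the sketch below counts whole-phrase, case-insensitive matches using only the standard library. The keyword list is hypothetical, drawn from the themes named in the About section. The function name `count_keywords` is likewise an assumption and not taken from the repository.

```python
import re
from collections import Counter

# Hypothetical keyword list, chosen for illustration only.
KEYWORDS = ["capital requirements", "liquidity risk", "deposit insurance"]


def count_keywords(text, keywords=KEYWORDS):
    """Count case-insensitive, whole-phrase occurrences of each keyword."""
    counts = Counter()
    for kw in keywords:
        # \b anchors keep "liquidity risk" from matching inside longer words.
        pattern = r"\b" + re.escape(kw) + r"\b"
        counts[kw] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts
```

Running this separately over the pre-collapse and post-collapse documents gives per-period counts that can be compared directly.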

Technologies

  • Python

Getting Started

The corpora of documents used in our project can be accessed in our documents folder. Implementation and method-specific preprocessing for our three primary methods are done within their respective folders: naive_model for the naive keyword model, keywords for TF-IDF, and BERTopic for BERTopic.

Prerequisites

  • pandas (A data manipulation and analysis library providing data structures like DataFrames for Python.)
  • numpy (A library for numerical computing in Python, providing support for large, multi-dimensional arrays and matrices.)
  • scikit-learn (A machine learning library for Python, offering tools for classification, regression, clustering, and dimensionality reduction.)
  • nltk (The Natural Language Toolkit, a platform for building Python programs to work with human language data.)
  • gensim (A library for topic modeling and document similarity analysis in Python.)
  • bertopic (A topic modeling library that leverages BERT embeddings for creating interpretable topics.)
  • PyTorch (An open-source machine learning framework for deep learning.)
  • scipy (A library for scientific computing in Python.)

Usage

Each Python script serves a separate purpose and can be used either for keyword extraction or for topic modeling of our corpus of regulatory documents. Sample visualizations can be found in the figures folder.
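To give a sense of what the TF-IDF keyword-extraction step computes, here is a hand-rolled sketch using only the standard library. It is not the project's script: in practice a library such as scikit-learn's `TfidfVectorizer` would likely be used, and that class applies a smoothed IDF and normalization that differ from the plain `tf * log(N / df)` weighting assumed here. The names `tokenize` and `top_tfidf_terms` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def top_tfidf_terms(docs, k=3):
    """Return the k highest-scoring terms per document.

    Score = term frequency * log(N / document frequency), so terms that
    appear in every document score zero.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:k])
    return results
```

Terms shared by every document, such as boilerplate regulatory phrasing, receive a score of zero under this weighting, which is why TF-IDF surfaces document-specific vocabulary.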

Team Members

Name Handle
Priscilla Clark @priscillaoclark
Nicholas Wong @nicwjh
Harsh Kumar @harshk02
Elaine Zhang @ElainehxZhang

License

Distributed under the MIT License. See LICENSE for details.

Repository Link: https://github.com/priscillaoclark/15.S08-applied-nlp-final

Acknowledgements

We would like to thank Mike Chen, Andrew Zachary, and Chengfeng Mao for their help and guidance throughout this project. The exceptional learning environment and resources provided by the Massachusetts Institute of Technology (MIT) have also been instrumental in shaping this work.
