This repository contains three sub-projects focused on Natural Language Processing (NLP): ProbingGPT, Postagging, and Autocorrection. Each sub-project addresses a different aspect of NLP, using a range of techniques and algorithms.
The ProbingGPT project, inspired by the methodology outlined in the paper, uses the Baukit library to probe the GPT-2 small model downloaded from Hugging Face. Focusing on layers h.0.mlp, h.3.mlp, h.9.mlp, and h.9.attn, the project feeds the SNLI corpus through the model, captures the hidden states of those layers with Baukit, and trains linear classifiers on them. The evaluation assesses how well each layer encodes the entailment, neutral, and contradiction labels, i.e., how well the model learns at each layer. Everything runs in Google Colab for seamless collaboration and execution.
Import the Jupyter notebook into Google Colab and follow the instructions it contains. All results and evaluations are in the same notebook.
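As a hedged sketch of the probing step (the real activations come from Baukit traces of GPT-2; the random features and labels below are stand-ins for illustration), a linear probe over captured hidden states might look like:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for hidden states captured from one probed layer:
# one 768-dim vector per SNLI pair (GPT-2 small's hidden size), labelled
# 0=entailment, 1=neutral, 2=contradiction.
rng = np.random.default_rng(0)
n_examples, hidden_size = 600, 768
X = rng.normal(size=(n_examples, hidden_size))
y = rng.integers(0, 3, size=n_examples)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# A linear probe: if this classifier beats chance on held-out pairs, the
# layer's activations linearly encode the entailment label.
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"probe accuracy at this layer: {probe.score(X_te, y_te):.3f}")
```

On random features the probe scores near chance (~0.33); repeating this per probed layer and comparing accuracies is what reveals which layers learn the task.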
The Postagging sub-project explores part-of-speech tagging using Hidden Markov Models (HMM) with bigram and trigram implementations. The study covers three languages: English, Japanese, and Bulgarian. Additionally, the performance of Vanilla RNN, LSTM, and Bidirectional LSTM models is compared for part-of-speech tagging.
- For HMM with the Viterbi algorithm:

```shell
python3 train_hmm.py data/ptb.2-21.tgs data/ptb.2-21.txt > my.hmm  # training
python3 viterbi.py my.hmm < data/ptb.22.txt > my.out               # Viterbi decoding
python3 tag_acc.py data/ptb.22.tgs my.out                          # evaluation
```
- For VRNN, LSTM, and BiLSTM:

```shell
python3 vrnn_lstm_bidlstm.py data/ptb.2-21.tgs data/ptb.2-21.txt data/ptb.22.tgs data/ptb.22.txt 22_1.out  # training and evaluation
```
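The HMM commands above estimate transition and emission probabilities, then decode with Viterbi. A minimal sketch of bigram Viterbi decoding (with a toy tag set and hand-picked probabilities, not the repo's trained model) might be:

```python
import math

def viterbi(words, tags, trans, emit):
    """Most likely tag sequence under a bigram HMM.
    trans[(prev, cur)] and emit[(tag, word)] are probabilities;
    missing entries count as zero (log-prob -inf)."""
    def lp(p):  # safe log
        return math.log(p) if p > 0 else float("-inf")

    # best[t] = best log-prob of any path ending in tag t
    best = {t: lp(trans.get(("<s>", t), 0)) + lp(emit.get((t, words[0]), 0))
            for t in tags}
    backs = []  # back-pointers, one dict per word after the first
    for w in words[1:]:
        new, back = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[p] + lp(trans.get((p, t), 0)))
            new[t] = best[prev] + lp(trans.get((prev, t), 0)) + lp(emit.get((t, w), 0))
            back[t] = prev
        best, backs = new, backs + [back]
    # follow back-pointers from the best final tag
    seq = [max(tags, key=lambda t: best[t])]
    for back in reversed(backs):
        seq.append(back[seq[-1]])
    return list(reversed(seq))

# Toy model: determiner (D), noun (N), verb (V)
tags = ["D", "N", "V"]
trans = {("<s>", "D"): 0.8, ("<s>", "N"): 0.2, ("D", "N"): 1.0,
         ("N", "V"): 0.7, ("N", "N"): 0.3, ("V", "D"): 0.9, ("V", "N"): 0.1}
emit = {("D", "the"): 0.9, ("N", "dog"): 0.5, ("N", "barks"): 0.1,
        ("V", "barks"): 0.6}
print(viterbi(["the", "dog", "barks"], tags, trans, emit))  # ['D', 'N', 'V']
```

A trigram HMM extends this by conditioning transitions on the previous two tags, enlarging the state space from tags to tag pairs.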
Detailed evaluation results are provided in the POSTagging/README.md file.
Autocorrection evaluates different spell-correction methods: unigram, bigram, and trigram models, smoothed unigram and bigram models, and bigram and trigram models with backoff. The project uses an edit model to propose corrections and compares the methods on evaluation metrics.
```shell
python3 EditModel.py     # sanity-check the edit model
python3 SpellCorrect.py  # evaluate all the language models
```
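As a hedged sketch of how an edit model feeds a language model (Norvig-style edit-distance-1 candidates and a unigram scorer standing in for the smoothed n-gram scores the project actually compares):

```python
import string

def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correct(word, vocab, unigram_counts):
    """Prefer the word itself if known, else the most frequent
    in-vocabulary candidate one edit away."""
    candidates = ({word} & vocab) or (edits1(word) & vocab) or {word}
    return max(candidates, key=lambda w: unigram_counts.get(w, 0))

counts = {"spelling": 10, "spewing": 2}
print(correct("speling", set(counts), counts))  # 'spelling'
```

Swapping the unigram scorer for smoothed bigram/trigram scores over the surrounding context gives the other methods the project compares.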
If you would like to contribute or report issues, please email Mrudhul Guda.