This repository automates Named Entity Recognition (NER) for electrical engineering texts using transformer-based encoder models. It uses the Electrical Engineering NER Dataset (ElectricalNER) to fine-tune BERT-family models and then pushes the fine-tuned models to the Hugging Face Hub. The project enables efficient entity extraction from technical texts, streamlining tasks such as document analysis, data organization, and semantic search.
- Fine-Tuning Pipeline: Implements a complete pipeline for fine-tuning models like BERT and ModernBERT.
- Model Evaluation: Includes detailed metrics like precision, recall, F1, and accuracy.
- NER Utilities: Provides tools for post-processing NER results.
- Hugging Face Integration: Pushes fine-tuned models to the Hugging Face Hub with detailed model cards.
├── data/ # Contains tokenized datasets
├── models/ # Fine-tuned models
├── logs/ # Training and evaluation logs
├── notebooks/ # Jupyter notebooks for various stages of the pipeline
│ ├── 01_data_tokenization.ipynb # Tokenizing and preparing the dataset
│ ├── 02_model_training.ipynb # Fine-tuning transformer models
│ ├── 03_evaluation.ipynb # Evaluating model performance
│ ├── 04_inference_local.ipynb # Performing inference on unseen data - local models
│ ├── 05_push_model_to_hub.ipynb # Pushing models and cards to Hugging Face Hub
│ ├── 06_inference_hf.ipynb # Performing inference on unseen data - models that have been pushed to the Hub
├── utilities/ # Helper scripts and constants
│ ├── __init__.py # Initialize utilities as a package
│ ├── constants.py # Configuration and constants
│ ├── helper.py # Utility functions for NER and Hugging Face integration
├── README.md # Project documentation (this file)
- Python 3.10+
- Create a conda environment:
      conda create -n electrical_ner python=3.11
      conda activate electrical_ner
- Install required libraries:
      pip install -r requirements.txt
- Hugging Face Token:
  - Create an account on Hugging Face.
  - Generate a personal access token from your account settings.
  - Add the Hugging Face token to a .env file:
        HF_TOKEN=your_hugging_face_token
- Ensure the repository is structured as described above.
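The notebooks need the token from the .env file at runtime. The following is a minimal sketch of reading it with the standard library only. The function names `read_env_file` and `get_hf_token` are hypothetical and not part of this repository, which may instead rely on a package such as python-dotenv.

```python
import os
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from an env file into a dict.

    Hypothetical helper for illustration: blank lines and lines starting
    with '#' are skipped, and surrounding quotes are stripped from values.
    """
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_hf_token(path=".env"):
    """Return HF_TOKEN from the process environment, else from the env file."""
    return os.environ.get("HF_TOKEN") or read_env_file(path).get("HF_TOKEN")
```

Checking the process environment first lets a token exported in the shell take precedence over the file, which is convenient on shared machines.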
- Dataset Preparation: Run 01_data_tokenization.ipynb to tokenize and prepare the dataset.
- Model Training: Execute 02_finetuning.ipynb to fine-tune models on the electrical NER dataset.
- Model Evaluation: Use 03_evaluation.ipynb to evaluate model performance.
- Model Inference - Local Models: Use 04_inference_local.ipynb to test locally saved fine-tuned models on unseen data.
- Model Upload to Hugging Face Hub: Use 05_push_to_hub.ipynb to push models and model cards to the Hugging Face Hub.
- Model Inference - Hugging Face Models: Use 06_inference_hf.ipynb to test fine-tuned models that have been pushed to the Hub on unseen data.
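Preparing a dataset for token classification usually means mapping word-level tags onto subword tokens. The sketch below shows one common way to do that, assuming each token carries the index of the word it came from (as Hugging Face fast tokenizers report through `word_ids()`) and that -100 is the label value the loss ignores. The function name `align_labels_with_tokens` is hypothetical, and the tokenization notebook may do this differently.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def align_labels_with_tokens(word_ids, word_labels, label_all_subwords=False):
    """Expand word-level label ids to one label id per token.

    word_ids: per-token word index, with None for special tokens.
    word_labels: one label id per original word.
    """
    aligned = []
    previous_word_id = None
    for word_id in word_ids:
        if word_id is None:
            # Special tokens such as [CLS] and [SEP] carry no label.
            aligned.append(IGNORE_INDEX)
        elif word_id != previous_word_id:
            # First subword of a word takes the word's label.
            aligned.append(word_labels[word_id])
        else:
            # Later subwords are either labeled too or ignored by the loss.
            aligned.append(word_labels[word_id] if label_all_subwords else IGNORE_INDEX)
        previous_word_id = word_id
    return aligned


# Three words, the second split into two subwords, wrapped in special tokens.
word_ids = [None, 0, 1, 1, 2, None]
word_labels = [0, 3, 0]
print(align_labels_with_tokens(word_ids, word_labels))
# → [-100, 0, 3, -100, 0, -100]
```

Labeling only the first subword keeps the evaluation at word level; setting `label_all_subwords=True` trades that for more training signal per word.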
Evaluation Metric Plots
Final Metrics Comparison
Refer to the Medium article for an in-depth analysis of these results.
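For readers unfamiliar with how precision, recall, and F1 are computed for NER, here is a minimal entity-level sketch. It assumes entities are represented as (start, end, label) tuples and counts a prediction as correct only on an exact match. The function name `entity_prf` and the labels in the example are illustrative, and the evaluation notebook may use a dedicated library instead.

```python
def entity_prf(true_entities, predicted_entities):
    """Entity-level precision, recall, and F1 from (start, end, label) tuples."""
    true_set = set(true_entities)
    pred_set = set(predicted_entities)
    # A prediction counts only if span boundaries and label both match exactly.
    true_positives = len(true_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(true_set) if true_set else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Illustrative labels only; one of the two predictions matches the gold spans.
gold = [(0, 2, "X"), (5, 6, "Y")]
predicted = [(0, 2, "X"), (7, 8, "Y")]
print(entity_prf(gold, predicted))
# → {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```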
After deploying a model to the Hugging Face Hub, use the following code snippet for inference:
    from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
    from utilities import clean_and_group_entities

    # Load the fine-tuned model and its tokenizer from the Hugging Face Hub
    model_name = "disham993/electrical-ner-ModernBERT-large"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForTokenClassification.from_pretrained(model_name)

    # Build an NER pipeline that merges subword tokens into entity groups
    nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

    text = "The Xilinx Vivado development suite was used to program the Artix-7 FPGA."
    ner_results = nlp(text)

    # Post-process the raw pipeline output with the repository's helper
    cleaned_results = clean_and_group_entities(ner_results)
    print(cleaned_results)
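The snippet above hands the pipeline output to a post-processing helper. To illustrate the kind of work such a step can do, the sketch below merges neighbouring entity groups that share a label. It is not the implementation of `clean_and_group_entities`. The function name `merge_adjacent_entities`, the `max_gap` parameter, and the labels in the example are hypothetical. The input format follows what the `ner` pipeline returns with `aggregation_strategy="simple"`: dicts with `entity_group`, `score`, `word`, `start`, and `end`.

```python
def merge_adjacent_entities(entities, text, max_gap=1):
    """Merge consecutive same-label spans separated by at most max_gap characters."""
    merged = []
    for entity in sorted(entities, key=lambda e: e["start"]):
        previous = merged[-1] if merged else None
        if (
            previous is not None
            and previous["entity_group"] == entity["entity_group"]
            and entity["start"] - previous["end"] <= max_gap
        ):
            # Extend the previous span and re-read its surface form from the text.
            previous["end"] = max(previous["end"], entity["end"])
            previous["word"] = text[previous["start"]:previous["end"]]
            # Keep the lower score as a conservative confidence for the merged span.
            previous["score"] = min(previous["score"], entity["score"])
        else:
            merged.append(dict(entity))  # copy so the input list is not mutated
    return merged


# Illustrative input shaped like pipeline output; labels are made up.
text = "The Artix-7 FPGA was programmed."
raw = [
    {"entity_group": "PRODUCT", "score": 0.98, "word": "Artix", "start": 4, "end": 9},
    {"entity_group": "PRODUCT", "score": 0.91, "word": "-7", "start": 9, "end": 11},
    {"entity_group": "COMPONENT", "score": 0.95, "word": "FPGA", "start": 12, "end": 16},
]
for entity in merge_adjacent_entities(raw, text):
    print(entity["entity_group"], entity["word"])
```

Reading the merged surface form back from the original text, rather than concatenating the `word` fields, avoids guessing where whitespace or hyphens belong.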
The following models are fine-tuned and available on the Hugging Face Hub:
| Model | Repository | Description |
|---|---|---|
| BERT Base | Link | Lightweight model for NER. |
| BERT Large | Link | High-accuracy model for NER. |
| DistilBERT Base | Link | Efficient model for quick tasks. |
| ModernBERT Base | Link | Advanced base model. |
| ModernBERT Large | Link | High-performance NER model. |
Contributions are welcome! To contribute:
- Fork the repository.
- Create a new branch (git checkout -b feature-name).
- Commit your changes (git commit -m 'Add feature').
- Push to the branch (git push origin feature-name).
- Open a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.
For questions or suggestions, feel free to reach out:
- Name: Isham Rashik
- Email: [email protected]
- Hugging Face Profile: disham993
Let’s revolutionize electrical engineering with state-of-the-art NLP! ⚡