FinTwitBERT is a language model specifically trained to understand and analyze financial conversations on Twitter. It's designed to pick up on the unique ways people talk about finance online, making it a valuable tool for anyone interested in financial trends and sentiments expressed through tweets.
Understanding financial markets can be challenging, especially when analyzing the vast amount of opinions and discussions on social media. FinTwitBERT is here to make sense of financial conversations on Twitter. It's a specialized tool that interprets the unique language and abbreviations used in financial tweets, helping users gain insights into market trends and sentiments.
This model was developed to fill a gap in traditional language processing tools, which often struggle with the shorthand and jargon found in financial tweets. Whether you're a financial professional, a market enthusiast, or someone curious about financial trends on social media, FinTwitBERT offers an easy-to-use solution to navigate and understand these discussions.
FinTwitBERT utilizes a diverse set of financial tweets for pre-training, including Taborda et al.'s Stock Market Tweets Data with over 940K tweets, and our dataset, Financial Tweets, with detailed statistics provided below.
For fine-tuning, we use several datasets, each offering varied sentiments in financial contexts. A collection of real-world, labeled datasets can be found on Hugging Face. On top of that, we also created a synthetic dataset containing 1.43M tweets and corresponding sentiment labels. You can find that dataset here.
FinTwitBERT is based on FinBERT with added masks for user mentions (@USER) and URLs ([URL]). The model is pre-trained for 10 epochs with a focus on minimizing loss and applying early stopping to prevent overfitting.
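To illustrate what the @USER and [URL] masks imply for input text, here is a minimal preprocessing sketch. The regular expressions and the `mask_tweet` helper are assumptions made for illustration; the project's actual preprocessing may differ.

```python
import re

# Assumed patterns for illustration; not taken from the FinTwitBERT code base.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
MENTION_RE = re.compile(r"(?<!\w)@\w+")


def mask_tweet(text: str) -> str:
    """Replace URLs with [URL] and user mentions with @USER."""
    # Mask URLs first so an "@" inside a URL is not treated as a mention.
    text = URL_RE.sub("[URL]", text)
    text = MENTION_RE.sub("@USER", text)
    return text


print(mask_tweet("@elonmusk says $TSLA to the moon https://t.co/abc123"))
# prints: @USER says $TSLA to the moon [URL]
```

Cashtags such as $TSLA are left untouched, since they carry the ticker information the model is meant to learn from.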
Access the pre-trained model and tokenizer at FinTwitBERT on Hugging Face. For the fine-tuned version, visit FinTwitBERT-sentiment on Hugging Face.
# Clone this repository
git clone https://github.com/TimKoornstra/FinTwitBERT
# Install required packages
pip install -r requirements.txt
We offer two models: FinTwitBERT and FinTwitBERT-sentiment. The first is a pre-trained model and tokenizer for masked language modeling (MLM), which can be fine-tuned for other tasks such as sentiment analysis. The second is fine-tuned for sentiment analysis and labels tweets as one of three categories: bearish, neutral, or bullish.
from transformers import pipeline
pipe = pipeline(
    "fill-mask",
    model="StephanAkkerman/FinTwitBERT",
)
print(pipe("Bitcoin is a [MASK] coin."))
from transformers import pipeline
pipe = pipeline(
    "sentiment-analysis",
    model="StephanAkkerman/FinTwitBERT-sentiment",
)
print(pipe("Nice 9% pre market move for $para, pump my calls Uncle Buffett 🤑"))
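When scoring many tweets, it can help to summarize the predictions. The sketch below assumes the pipeline returns a list of dictionaries with `label` and `score` keys, which is the usual shape for Transformers text-classification pipelines. The `summarize_sentiment` helper is hypothetical, the exact label strings are an assumption (they are lowercased to be safe), and the sample predictions are hand-written stand-ins rather than real model output.

```python
from collections import Counter


def summarize_sentiment(results):
    """Return the share of each predicted label in a list of pipeline results."""
    counts = Counter(r["label"].lower() for r in results)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


# Hand-written stand-in for pipeline output; labels and scores are made up.
example = [
    {"label": "BULLISH", "score": 0.91},
    {"label": "BEARISH", "score": 0.77},
    {"label": "BULLISH", "score": 0.64},
    {"label": "NEUTRAL", "score": 0.58},
]
print(summarize_sentiment(example))
# prints: {'bullish': 0.5, 'bearish': 0.25, 'neutral': 0.25}
```

With the real model, `results` would come from passing a list of tweets to the pipeline, for example `pipe(["tweet one", "tweet two"])`.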
If you would like to train this model yourself and report the metrics to Weights & Biases (wandb.ai), add a wandb.env file containing the following line, with your own key as the value: WANDB_API_KEY=your_wandb_api_key
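How the training code reads wandb.env is not described here. As a rough sketch, a file of KEY=VALUE lines can be loaded into the process environment with the standard library alone; the wandb client reads WANDB_API_KEY from the environment. The `load_env_file` helper below is hypothetical and is not part of this repository.

```python
import os
from pathlib import Path


def load_env_file(path="wandb.env"):
    """Read KEY=VALUE lines from a file into os.environ and return the parsed pairs."""
    loaded = {}
    env_path = Path(path)
    if not env_path.exists():
        return loaded  # nothing to load; leave the environment unchanged
    for line in env_path.read_text().splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        loaded[key.strip()] = value.strip()
    os.environ.update(loaded)
    return loaded


if __name__ == "__main__":
    keys = load_env_file()
    print("Loaded keys:", sorted(keys))
```

Only the key names are printed, so the API key itself does not end up in logs.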
If you use FinTwitBERT or FinTwitBERT-sentiment in your research, please cite us as follows, noting that both authors contributed equally to this work:
@misc{FinTwitBERT,
author = {Stephan Akkerman and Tim Koornstra},
title = {FinTwitBERT: A Specialized Language Model for Financial Tweets},
year = {2024},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/TimKoornstra/FinTwitBERT}}
}
@misc{FinTwitBERT-sentiment,
author = {Stephan Akkerman and Tim Koornstra},
title = {FinTwitBERT-sentiment: A Sentiment Classifier for Financial Tweets},
year = {2024},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/StephanAkkerman/FinTwitBERT-sentiment}}
}
Contributions are welcome! If you have a feature request, bug report, or proposal for code refactoring, please feel free to open an issue on GitHub. We appreciate your help in improving this project.
This project is licensed under the GPL-3.0 License. See the LICENSE file for details.