Describe the bug
I'm running this code on my own custom ASTE dataset, formatted as required. However, certain reviews trigger this message (the same thing happens during training): "Ignore error while processing: This hotel was absolute trash, the room was dirty and the staff was rude. But surprisingly the breakfast was good. Error info:list index out of range"
For more context, my max_seq_length was 250 when I trained the model.
Code To Reproduce
from pyabsa import AspectSentimentTripletExtraction as ASTE

hotel_triplet_extractor_deberta = ASTE.AspectSentimentTripletExtractor(
    r"C:\Users\Dini\Desktop\tripadvisor.com - Copy\checkpoints\aste\microsoft\deberta-v3-base\emcgcn_custom_dataset_f1_67.98"
)

examples = [
    "I loved this hotel so much, the beds were comfortable and the staff was very friendly.",
    "This hotel was absolute trash, the room was dirty and the staff was rude. But surprisingly the breakfast was good.",
    "Location can't be beat ! You’re in the heart of ",
]

for example in examples:
    hotel_triplet_extractor_deberta.predict(example)
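To narrow down which fragment of a review triggers the error, here is a minimal isolation sketch (not part of my original repro; the regex sentence split is just an illustration):

import re

# Isolation sketch: run predict() on each sentence of the failing review
# separately and watch the log to see which fragment raises
# "list index out of range". The regex split is a naive illustration.
failing_review = ("This hotel was absolute trash, the room was dirty and the "
                  "staff was rude. But surprisingly the breakfast was good.")
for sentence in re.split(r"(?<=[.!?])\s+", failing_review):
    hotel_triplet_extractor_deberta.predict(sentence)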
Expected behavior
I expected these reviews to be processed normally, since both are well below the max_seq_length of 250.
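Note that max_seq_length is measured in subword tokens, not whitespace words. As a sanity check, the failing inputs can be tokenized with the public microsoft/deberta-v3-base tokenizer (a sketch that assumes it behaves like the checkpoint's bundled tokenizer):

from transformers import AutoTokenizer

# Sanity-check sketch: compare word count vs. subword token count for the
# failing inputs. Assumes the public microsoft/deberta-v3-base tokenizer
# matches the checkpoint's bundled one closely enough.
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")

failing = [
    "This hotel was absolute trash, the room was dirty and the staff was rude. But surprisingly the breakfast was good.",
    "Location can't be beat ! You’re in the heart of ",
]
for text in failing:
    print(len(text.split()), "words,", len(tokenizer.tokenize(text)), "subword tokens")
# Both counts are far below max_seq_length=250, so truncation alone
# should not explain the error.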
Console Output
[2024-05-15 23:46:59] (2.4.1.post1) Load sentiment classifier from C:\Users\Dini\Desktop\tripadvisor.com - Copy\checkpoints\aste\microsoft\deberta-v3-base\emcgcn_custom_dataset_f1_67.98
[2024-05-15 23:46:59] (2.4.1.post1) config: C:\Users\Dini\Desktop\tripadvisor.com - Copy\checkpoints\aste\microsoft\deberta-v3-base\emcgcn_custom_dataset_f1_67.98\emcgcn.config
[2024-05-15 23:46:59] (2.4.1.post1) state_dict: C:\Users\Dini\Desktop\tripadvisor.com - Copy\checkpoints\aste\microsoft\deberta-v3-base\emcgcn_custom_dataset_f1_67.98\emcgcn.state_dict
[2024-05-15 23:46:59] (2.4.1.post1) model: None
[2024-05-15 23:46:59] (2.4.1.post1) tokenizer: C:\Users\Dini\Desktop\tripadvisor.com - Copy\checkpoints\aste\microsoft\deberta-v3-base\emcgcn_custom_dataset_f1_67.98\emcgcn.tokenizer
c:\Users\Dini\anaconda3\envs\tf-gpu\lib\multiprocessing\pool.py:268: ResourceWarning: unclosed running multiprocessing pool <multiprocessing.pool.Pool state=RUN pool_size=1>
_warn(f"unclosed running multiprocessing pool {self!r}",
ResourceWarning: Enable tracemalloc to get the object allocation traceback
[2024-05-15 23:47:00] (2.4.1.post1) Set Model Device: cuda:0
[2024-05-15 23:47:00] (2.4.1.post1) Device Name: NVIDIA GeForce RTX 3080 Ti
Some weights of the model checkpoint at microsoft/deberta-v3-base were not used when initializing DebertaV2Model: ['mask_predictions.LayerNorm.weight', 'lm_predictions.lm_head.dense.weight', 'lm_predictions.lm_head.dense.bias', 'lm_predictions.lm_head.LayerNorm.weight', 'mask_predictions.dense.weight', 'mask_predictions.classifier.bias', 'lm_predictions.lm_head.bias', 'mask_predictions.LayerNorm.bias', 'mask_predictions.dense.bias', 'lm_predictions.lm_head.LayerNorm.bias', 'mask_predictions.classifier.weight']
This IS expected if you are initializing DebertaV2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing DebertaV2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
c:\Users\Dini\anaconda3\envs\tf-gpu\lib\site-packages\transformers\convert_slow_tokenizer.py:454: UserWarning: The sentencepiece tokenizer that you are converting to a fast tokenizer uses the byte fallback option which is not implemented in the fast tokenizers. In practice this means that the fast version of the tokenizer can produce unknown tokens whereas the sentencepiece version would have converted these unknown tokens into a sequence of byte tokens matching the original piece of text.
warnings.warn(
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[2024-05-15 23:47:03] (2.4.1.post1) Batch: 0 {'sentence_id': 0, 'sentence': 'I loved this hotel so much, the beds were comfortable and the staff was very friendly.', 'Triplets': [{'Aspect': 'beds', 'Opinion': 'comfortable', 'Polarity': 'Positive'}, {'Aspect': 'beds', 'Opinion': 'friendly.', 'Polarity': 'Positive'}, {'Aspect': 'staff', 'Opinion': 'comfortable', 'Polarity': 'Positive'}, {'Aspect': 'staff', 'Opinion': 'friendly.', 'Polarity': 'Positive'}], 'True Triplets': []}
[2024-05-15 23:47:03] (2.4.1.post1) Batch: 0 {'sentence_id': 0, 'sentence': 'I loved this hotel so much, the beds were comfortable and the staff was very friendly.', 'Triplets': [{'Aspect': 'beds', 'Opinion': 'comfortable', 'Polarity': 'Positive'}, {'Aspect': 'beds', 'Opinion': 'friendly.', 'Polarity': 'Positive'}, {'Aspect': 'staff', 'Opinion': 'comfortable', 'Polarity': 'Positive'}, {'Aspect': 'staff', 'Opinion': 'friendly.', 'Polarity': 'Positive'}], 'True Triplets': []}
[2024-05-15 23:47:03] (2.4.1.post1) Ignore error while processing: This hotel was absolute trash, the room was dirty and the staff was rude. But surprisingly the breakfast was good. Error info:list index out of range
[2024-05-15 23:47:03] (2.4.1.post1) Ignore error while processing: Location can't be beat ! You’re in the heart of Error info:list index out of range
@yangheng95
Version
2024-05-10 10:02:22,708 INFO: PyABSA version: 2.4.1.post1
2024-05-10 10:02:22,708 INFO: Transformers version: 4.30.0
2024-05-10 10:02:22,708 INFO: Torch version: 2.3.0+cu121+cuda12.1