
Title: Integrate Custom Dataset from Databricks for Aspect Sentiment Triplet Extraction #406

@Surajhulketa

Description

I am trying to integrate a custom dataset stored in Databricks for Aspect Sentiment Triplet Extraction (ASTE) using the pyabsa library. However, I am encountering an error related to dataset loading. Below are the details of my implementation and the issues I am facing.

Code Implementation
```python
from pyabsa import (
    ModelSaveOption,
    DeviceTypeOption,
    DatasetItem,
)

from pyabsa import AspectSentimentTripletExtraction as ASTE
import pandas as pd

if __name__ == "__main__":
    config = ASTE.ASTEConfigManager.get_aste_config_english()
    config.max_seq_len = 120
    config.log_step = -1
    config.pretrained_bert = "bert-base-chinese"
    config.num_epoch = 100
    config.learning_rate = 2e-5
    config.use_amp = True
    config.cache_dataset = True
    config.spacy_model = "zh_core_web_sm"

    # Load dataset from Databricks
    dataset_path = "datasets/atepc_datasets/300.vokols/vokols.test.txt.atepc"
    dataset = "300.vokols"

    trainer = ASTE.ASTETrainer(
        config=config,
        dataset=dataset,
        checkpoint_save_mode=ModelSaveOption.SAVE_MODEL_STATE_DICT,
        auto_device=True,
    )
    triplet_extractor = trainer.load_trained_model()

    examples = [
        "I love this laptop, it is very good.",
        "I hate this laptop, it is very bad.",
        "I like this laptop, it is very good.",
        "I dislike this laptop, it is very bad.",
    ]
    for example in examples:
        prediction = triplet_extractor.predict(example)
        print(prediction)
```

Error Encountered
```
ValueError: Cannot find dataset: 300.vokols, you may need to remove existing integrated_datasets and try again. Please note that if you are using keywords to let findfile search the dataset, you need to save your dataset(s) in integrated_datasets/task_name/dataset_name
```
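Based on the hint in the error about integrated_datasets/task_name/dataset_name, this is roughly how I would try to copy the files exported from Databricks into a layout that findfile can discover. The export path is a placeholder for my environment, and I am only guessing that atepc_datasets is the right task folder for ASTE, so please correct me if that is wrong:

```python
import os
import shutil

# Placeholder for wherever the Databricks export lands locally (e.g. a /dbfs mount)
export_dir = "/dbfs/FileStore/vokols"
# Guessed layout from the error message: integrated_datasets/task_name/dataset_name
target_dir = "integrated_datasets/atepc_datasets/300.vokols"

os.makedirs(target_dir, exist_ok=True)
for fname in os.listdir(export_dir):
    if fname.endswith(".atepc"):
        shutil.copy(os.path.join(export_dir, fname), target_dir)
```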
Issues Faced

1. Dataset Loading: Clarification is needed on how to properly format and load a custom dataset from Databricks into the pyabsa library (a sketch of what I have tried to piece together follows this list).
2. Integration: Guidance on ensuring that the custom dataset is correctly integrated and utilized during the training process.
3. Directory Structure: Instructions on the required directory structure for custom datasets to be recognized by pyabsa.
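For the Dataset Loading point above, this is the alternative I would try next: passing a DatasetItem with an explicit path instead of the "300.vokols" keyword string. I am assuming that DatasetItem accepts a dataset name plus a list of paths, so treat this as a guess at the intended usage rather than something I have confirmed:

```python
from pyabsa import AspectSentimentTripletExtraction as ASTE
from pyabsa import DatasetItem, ModelSaveOption

config = ASTE.ASTEConfigManager.get_aste_config_english()

# Assumption: DatasetItem(name, [paths]) lets pyabsa load directly from the
# given folder instead of searching integrated_datasets by keyword.
custom_dataset = DatasetItem(
    "300.vokols",
    ["datasets/atepc_datasets/300.vokols"],  # folder holding vokols.test.txt.atepc etc.
)

trainer = ASTE.ASTETrainer(
    config=config,
    dataset=custom_dataset,
    checkpoint_save_mode=ModelSaveOption.SAVE_MODEL_STATE_DICT,
    auto_device=True,
)
```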
Steps to Reproduce

1. Place a custom dataset in Databricks (ensure it is in .atepc format).
2. Use the provided code to load the dataset and attempt to train the model.
3. Observe the error related to dataset loading.

Expected Behavior
The custom dataset should be loaded correctly, and the model should train and predict without errors.
