[Question] Predict labels given a translation #196
Also, do you think it is better to create a model that only labels one term, e.g. B-TERM? Ultimately, I want to be able to identify multiple terms (B-TERM1, B-TERM2, B-TERM3, etc.) in the translated text, but it's not clear to me whether this would become an issue for the model, since B-TERM1, B-TERM2, B-TERM3 are just identifiers and aren't distinct labels. |
I'm thinking out loud here... since ultimately this is a transformer model, I presume the sequences need to be aligned. Would I structure the data this way?
|
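As a rough sketch of what "aligned" means here (the tokens and tags below are hypothetical, not from the actual dataset): every per-position sequence in a sample, including the second numeric-feature input, must have the same length so features and labels line up position by position.

```python
# Hypothetical example of one aligned multi-input sample:
# source-side tokens, source-side tags (the second input),
# and target labels must all line up position by position.
tokens      = ["[CLS]", "machine", "translation", "system", "[SEP]"]
source_tags = ["[CLS]", "B-TERM1", "I-TERM1",     "O",      "[SEP]"]
labels      = ["[CLS]", "B-TERM1", "I-TERM1",     "O",      "[SEP]"]

# If any sequence differs in length, per-position features
# and labels will be silently misaligned.
assert len(tokens) == len(source_tags) == len(labels)
```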
I think this is the right way to do it. Just stack a text-embedding layer and a numeric-feature embedding layer, then convert the second input to a numeric feature. Only one thing you need to keep in mind: when using a pre-trained embedding for the text, use |
Thanks, I will try this. How do you suggest I convert the second input into a numeric feature? In my example above, I want the model to be able to specify the locations of B-TERM1 and B-TERM2 (and further terms) in the translation. |
Just make a label dict, then convert each label sequence to a numeric sequence, e.g.:

```python
dic = {
    "O": 0,
    "B-TERM1": 1,
    "B-TERM2": 2,
    "B-TERM3": 3,
    "B-TERM4": 4,
}
```

This will work. |
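A minimal sketch of the conversion itself, applying the dict above to a hypothetical label sequence:

```python
# Label-to-ID mapping from the comment above.
label2id = {
    "O": 0,
    "B-TERM1": 1,
    "B-TERM2": 2,
    "B-TERM3": 3,
    "B-TERM4": 4,
}

# Convert a (hypothetical) label sequence to its numeric form.
labels = ["O", "B-TERM1", "O", "B-TERM2", "O"]
numeric = [label2id[label] for label in labels]
print(numeric)  # [0, 1, 0, 2, 0]
```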
What about the [SEP], [CLS], [BOS] tokens in a numeric sequence? Do I keep them as they are, or also convert them to digits? |
You need to convert those tokens to digits as well. |
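One way to do that (the specific IDs below are arbitrary choices, not prescribed by the library) is to extend the same dict with entries for the special tokens:

```python
# Extend the label dict with the special tokens; the exact IDs are
# arbitrary as long as each token maps to a distinct integer.
label2id = {
    "O": 0,
    "B-TERM1": 1,
    "B-TERM2": 2,
    "[CLS]": 3,
    "[SEP]": 4,
    "[BOS]": 5,
}

seq = ["[CLS]", "B-TERM1", "O", "[SEP]"]
print([label2id[t] for t in seq])  # [3, 1, 0, 4]
```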
So it looks like my labeled data includes examples where B-TERM can go as high as B-TERM26 in one sentence (26 different terms tagged in the source and target text). Two questions: 1) Does it make sense to create additional training data where B-TERM1, B-TERM2, B-TERM3, ..., B-TERM26 are more obviously interchangeable? For example,
b) The example above, augmented to make sure the model knows B-TERM1 through B-TERM26 are no different.
2) Since the model only learns word alignment, would it make sense to have data where fewer tags are used? For example,
b) The example above, augmented to create two new sets:
ii)
|
Yes. Since you then don't have 26 tags in one sentence, and all you want the model to do is separate the two entities' translations, the ii) approach is much better and simpler for the model to learn. |
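A sketch of that ii)-style augmentation (the helper below is hypothetical, not part of Kashgari): take a sentence tagged with many distinct term IDs and emit one training sample per term, keeping only that term, renamed to TERM1, with every other position relabeled O.

```python
def split_per_term(labels):
    """For each distinct term ID in `labels`, emit one copy of the
    sequence that keeps only that term (renamed to TERM1); all other
    positions become O. Hypothetical helper, not a Kashgari API."""
    term_ids = sorted(
        {label.split("-TERM")[1] for label in labels if "-TERM" in label},
        key=int,
    )
    samples = []
    for tid in term_ids:
        kept = [
            label.replace("TERM" + tid, "TERM1")
            if label.endswith("TERM" + tid) else "O"
            for label in labels
        ]
        samples.append(kept)
    return samples

labels = ["O", "B-TERM3", "I-TERM3", "O", "B-TERM7"]
for sample in split_per_term(labels):
    print(sample)
# ['O', 'B-TERM1', 'I-TERM1', 'O', 'O']
# ['O', 'O', 'O', 'O', 'B-TERM1']
```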
Yes, I agree only one term is much simpler for the model to learn. However, for a sentence that has 26 different terms (most sentences average 5-6 terms), a model trained on only one term will require 5-6 separate inference passes, correct? |
Posting my model and evaluation results below. It seems like something went wrong: val_accuracy dropped significantly on epochs 7 and 10 (from 0.4062 -> 0.0117 -> 0.0078). My model basically didn't make any predictions.
Possibly related to #217 ? |
Please try it without the CRF layer; let's check whether this is an issue related to the CRF layer. |
I tried the BiLSTM_Model (val_accuracy stayed at 0.99 throughout training), but the results don't seem good. Predictions for [CLS], [SEP], and [BOS] are 98-99% accurate, but the B-TERM, I-TERM, and O predictions are mostly wrong. I would appreciate any suggestions or ideas!
|
@echan00 What is the evaluation result when you evaluate on the training set and the validation set? Maybe this is because the dataset isn't large enough? |
Providing an example of my training data:
|
I mean the result of
|
Good idea. The evaluation results on the training and validation sets are similar to the test set.
|
Seems like the model isn't learning anything. @BrikerMan, do you think this is a problem with the library, or something I am not doing correctly? |
Maybe our strategy is wrong. Just try using 26 different tags, and check which labels are not easy to learn? |
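To check which labels are hard, a per-label accuracy over aligned (true, predicted) sequences can be computed in a few lines; the gold and predicted sequences below are made up for illustration.

```python
from collections import defaultdict

def per_label_accuracy(y_true, y_pred):
    """Fraction of positions predicted correctly, broken down
    by the true label at that position."""
    hit = defaultdict(int)
    total = defaultdict(int)
    for true, pred in zip(y_true, y_pred):
        total[true] += 1
        hit[true] += (true == pred)
    return {label: hit[label] / total[label] for label in total}

# Hypothetical gold and predicted label sequences.
y_true = ["O", "B-TERM1", "I-TERM1", "O", "B-TERM2"]
y_pred = ["O", "B-TERM1", "O",       "O", "O"]
print(per_label_accuracy(y_true, y_pred))
# {'O': 1.0, 'B-TERM1': 1.0, 'I-TERM1': 0.0, 'B-TERM2': 0.0}
```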
As I imagined, the model treats each of the 26 different tags as a separate entity, which makes it harder for the model to learn. Accuracy for [CLS], [SEP], [BOS] is still 95%+, but the rest of the predictions are mostly incorrect.
|
Looking at the results, my guess is that the simpler model with one keyword is better. But the bigger issue is that the model isn't learning anything at all. Theoretically, the BERT embedding should be able to help with this task... I wonder if there is something else we can try to debug the issue and figure out how to get this to work. |
While debugging, I've found that even when the training input is identical to the desired output, the model is still inaccurate. For example:
The majority of predictions will look like this:
Perhaps there is a bug with the stacked or numeric embedding? |
I think this issue may be related to the numeric embedding feature. Maybe we should just add the numeric feature to the WordEmbedding directly, rather than embedding the numeric feature and then adding the WordEmbedding result and the numeric embedding result together. Here is the new numeric embedding; just add this to your code and replace the old numeric embedding:

```python
class NumericFeaturesEmbeddingV2(NumericFeaturesEmbedding):
    """Embedding layer without pre-training, train embedding layer while training model"""

    def _build_model(self, **kwargs):
        self.embed_model = keras.Sequential([
            L.Reshape((self.sequence_length, 1),
                      input_shape=(self.sequence_length,))
        ])
```

Old StackedEmbedding = WordEmbedding + numeric-feature embedding |
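The Reshape layer above just turns each scalar tag ID into a length-1 feature vector so it can be concatenated with the word embedding along the feature axis. In plain Python, independent of Keras, the shape change looks like this:

```python
# A (batch, sequence_length) batch of numeric tag IDs...
batch = [
    [0, 1, 0, 2],
    [0, 0, 3, 0],
]

# ...reshaped to (batch, sequence_length, 1): each scalar becomes a
# one-element feature vector, ready to concatenate with a
# (batch, sequence_length, embed_dim) word-embedding output.
reshaped = [[[tag] for tag in seq] for seq in batch]
print(reshaped[0])  # [[0], [1], [0], [2]]
```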
I just tried this; the results are identical. It looks like the numeric features (with or without embedding) are not registering, since the F1 score is still close to zero. |
@BrikerMan I want to say thank you for developing this library and providing incredible support along with it. Kudos to you!
I want to create a model that will predict labels given a translation and its labels. For example:
My question is: do you think this is possible using your library if I customize my own multi-input model? And if so, do you have any tips/suggestions for me?
It looks like I will mainly be using the stacked embedding feature, and I'm also referencing this article: https://kashgari.bmio.net/advance-use/multi-output-model/