Zero-Shot-Relation-Extraction, DeBERTa for Sequence Classification, 150+ new models, 60+ Languages in John Snow Labs NLU 3.4.3

C-K-Loan released this 22 Apr 08:36
· 23 commits to 3.4.3rc1 since this release
186be00

We are very excited to announce NLU 3.4.3 has been released!

This release features new models for Zero-Shot-Relation-Extraction, DeBERTa for Sequence Classification,
Deidentification in French and Italian, and
Lemmatizers, Parts of Speech Taggers, and Word2Vec Embeddings for over 66 languages, with 20 languages being covered
for the first time by NLU, including ancient and exotic languages like Ancient Greek, Old Russian,
Old French and many more. Once again, we would like to thank our community for making this release possible.

NLU for Healthcare

On the healthcare NLP side, a new ZeroShotRelationExtractionModel is available, which can extract relations between
clinical entities in an unsupervised fashion, with no training required!
Additionally, new French and Italian Deidentification models are available for clinical and healthcare domains.
Powered by the fantastic Spark NLP for Healthcare 3.5.0 release.

Zero-Shot Relation Extraction

Zero-Shot Relation Extraction extracts relations between clinical entities without a training dataset. The relations to look for are configured as template sentences:

import nlu

pipe = nlu.load('med_ner.clinical relation.zeroshot_biobert')
# Configure relations to extract
pipe['zero_shot_relation_extraction'].setRelationalCategories({
    "CURE": ["{{TREATMENT}} cures {{PROBLEM}}."],
    "IMPROVE": ["{{TREATMENT}} improves {{PROBLEM}}.", "{{TREATMENT}} cures {{PROBLEM}}."],
    "REVEAL": ["{{TEST}} reveals {{PROBLEM}}."]}).setMultiLabel(False)
df = pipe.predict("Paracetamol can alleviate headache or sickness. An MRI test can be used to find cancer.")
# Select the relation columns from the resulting DataFrame
df[['relation', 'relation_confidence', 'relation_entity1', 'relation_entity1_class',
    'relation_entity2', 'relation_entity2_class']]
# Results in the following table:
relation  relation_confidence  relation_entity1  relation_entity1_class  relation_entity2  relation_entity2_class
REVEAL    0.976004             An MRI test       TEST                    cancer            PROBLEM
IMPROVE   0.988195             Paracetamol       TREATMENT               sickness          PROBLEM
IMPROVE   0.992962             Paracetamol       TREATMENT               headache          PROBLEM
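
The relation categories above are template sentences whose `{{LABEL}}` placeholders stand for entity classes. As a rough illustration of what such templates express, the sketch below expands them into candidate statements for every ordered pair of entities with matching labels. This is not the internal implementation of ZeroShotRelationExtractionModel; `Entity` and `fill_templates` are hypothetical names introduced only for this example, and the scoring of each statement by the model is left out.

```python
import re
from itertools import permutations
from typing import NamedTuple


class Entity(NamedTuple):
    """Hypothetical container for an entity chunk and its NER label."""
    text: str
    label: str


# Matches placeholders such as {{TREATMENT}} and captures the label name.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def fill_templates(categories, entities):
    """Return (relation, statement) pairs for every ordered entity pair
    whose labels match the two placeholders of a template."""
    results = []
    for relation, templates in categories.items():
        for template in templates:
            labels = PLACEHOLDER.findall(template)
            if len(labels) != 2:
                continue  # this sketch only handles two-entity templates
            for first, second in permutations(entities, 2):
                if [first.label, second.label] == labels:
                    # Substitute the placeholders left to right.
                    texts = iter((first.text, second.text))
                    statement = PLACEHOLDER.sub(lambda m: next(texts), template)
                    results.append((relation, statement))
    return results


categories = {
    "CURE": ["{{TREATMENT}} cures {{PROBLEM}}."],
    "REVEAL": ["{{TEST}} reveals {{PROBLEM}}."],
}
entities = [
    Entity("Paracetamol", "TREATMENT"),
    Entity("headache", "PROBLEM"),
    Entity("An MRI test", "TEST"),
    Entity("cancer", "PROBLEM"),
]
statements = fill_templates(categories, entities)
for relation, statement in statements:
    print(relation, "->", statement)
```

Each generated statement is only a candidate; in the real pipeline the model decides which of them, if any, is supported by the input text and reports a confidence for it.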

New Healthcare Models overview

Language  NLU Reference                 Spark NLP Reference  Task                      Annotator Class
en        en.relation.zeroshot_biobert  re_zeroshot_biobert  Relation Extraction       ZeroShotRelationExtractionModel
fr        fr.med_ner.deid_generic       ner_deid_generic     De-identification         MedicalNerModel
fr        fr.med_ner.deid_subentity     ner_deid_subentity   De-identification         MedicalNerModel
it        it.med_ner.deid_generic       ner_deid_generic     Named Entity Recognition  MedicalNerModel
it        it.med_ner.deid_subentity     ner_deid_subentity   Named Entity Recognition  MedicalNerModel

NLU general

On the general NLP side, we have new transformer-based DeBERTa v3 sequence classifier models fine-tuned in Urdu, French and English for
Sentiment and News classification. Additionally, there are 100+ Part of Speech Taggers and Lemmatizers for 66 languages and
new Word2Vec embeddings for 7 languages (hi, azb, bo, diq, cy, es, it),
powered by the amazing Spark NLP 3.4.3 release.
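
As a usage sketch for the new classifiers, the snippet below wraps `nlu.load(...).predict(...)` around the English IMDB sentiment references listed in this release. The dictionary `DEBERTA_IMDB_REFS` and the helper `classify_sentiment` are hypothetical names for this example; the call itself needs `nlu` and `pyspark` installed and downloads the model on first use, so the import is kept inside the function.

```python
# NLU references for the English DeBERTa v3 IMDB sentiment classifiers,
# copied from the models overview of this release.
DEBERTA_IMDB_REFS = {
    "xsmall": "en.classify.sentiment.imdb.deberta",
    "small": "en.classify.sentiment.imdb.deberta.small",
    "base": "en.classify.sentiment.imdb.deberta.base",
    "large": "en.classify.sentiment.imdb.deberta.large",
}


def classify_sentiment(texts, size="xsmall"):
    """Load the chosen classifier and return NLU's prediction DataFrame.

    `texts` may be a single string or a list of strings.
    """
    import nlu  # imported lazily: requires nlu and pyspark to be installed

    pipe = nlu.load(DEBERTA_IMDB_REFS[size])
    return pipe.predict(texts)


# Example call (left commented out because it downloads a model):
# df = classify_sentiment(["I loved this movie.", "A dull, forgettable film."])
```

The smaller variants are a reasonable starting point for quick experiments; switching to a larger one only requires changing the `size` argument.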

New Languages covered:

Languages covered by NLU for the first time are:
South Azerbaijani, Tibetan, Dimli, Central Kurdish, Southern Altai,
Scottish Gaelic, Faroese, Literary Chinese, Ancient Greek,
Gothic, Old Russian, Church Slavic,
Old French, Uighur, Coptic, Croatian, Belarusian, Serbian

and their respective ISO 639-3 and ISO 639-2 codes are:
azb, bo, diq, ckb, lt, gd, fo, lzh, grc, got, orv, cu, fro, qtd, ug, cop, hr, be, qhe, sr
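
Since every NLU reference starts with its language code, the code list above can be used to pick out the models that belong to a newly covered language. The following is a small sketch under that assumption; `FIRST_TIME_CODES`, `language_of` and `first_time_refs` are hypothetical helper names, and the codes are copied verbatim from the list above.

```python
# Language codes listed in this release as covered for the first time.
FIRST_TIME_CODES = {
    "azb", "bo", "diq", "ckb", "lt", "gd", "fo", "lzh", "grc", "got",
    "orv", "cu", "fro", "qtd", "ug", "cop", "hr", "be", "qhe", "sr",
}


def language_of(nlu_ref):
    """Return the language prefix of an NLU reference such as 'grc.lemma.perseus'."""
    return nlu_ref.split(".", 1)[0]


def first_time_refs(nlu_refs):
    """Keep only the references whose language prefix is in FIRST_TIME_CODES."""
    return [ref for ref in nlu_refs if language_of(ref) in FIRST_TIME_CODES]


refs = ["grc.lemma.perseus", "en.lemma.gum", "fro.pos.srcmf", "cop.lemma.scriptorium"]
print(first_time_refs(refs))
```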

New NLP Models Overview

Language NLU Reference Spark NLP Reference Task Annotator Class
en en.classify.sentiment.imdb.deberta deberta_v3_xsmall_sequence_classifier_imdb Text Classification DeBertaForSequenceClassification
en en.classify.sentiment.imdb.deberta.small deberta_v3_small_sequence_classifier_imdb Text Classification DeBertaForSequenceClassification
en en.classify.sentiment.imdb.deberta.base deberta_v3_base_sequence_classifier_imdb Text Classification DeBertaForSequenceClassification
en en.classify.sentiment.imdb.deberta.large deberta_v3_large_sequence_classifier_imdb Text Classification DeBertaForSequenceClassification
en en.classify.news.deberta deberta_v3_xsmall_sequence_classifier_ag_news Text Classification DeBertaForSequenceClassification
en en.classify.news.deberta.small deberta_v3_small_sequence_classifier_ag_news Text Classification DeBertaForSequenceClassification
ur ur.classify.sentiment.imdb mdeberta_v3_base_sequence_classifier_imdb Text Classification DeBertaForSequenceClassification
fr fr.classify.allocine mdeberta_v3_base_sequence_classifier_allocine Text Classification DeBertaForSequenceClassification
ur ur.embed.bert_cased bert_embeddings_bert_base_ur_cased Embeddings BertEmbeddings
fr fr.embed.bert_5lang_cased bert_embeddings_bert_base_5lang_cased Embeddings BertEmbeddings
de de.embed.medbert bert_embeddings_German_MedBERT Embeddings BertEmbeddings
ar ar.embed.arbert bert_embeddings_ARBERT Embeddings BertEmbeddings
bn bn.embed.bangala_bert bert_embeddings_bangla_bert_base Embeddings BertEmbeddings
zh zh.embed.bert_5lang_cased bert_embeddings_bert_base_5lang_cased Embeddings BertEmbeddings
hi hi.embed.bert_hi_cased bert_embeddings_bert_base_hi_cased Embeddings BertEmbeddings
it it.embed.bert_it_cased bert_embeddings_bert_base_it_cased Embeddings BertEmbeddings
ko ko.embed.bert bert_embeddings_bert_base Embeddings BertEmbeddings
tr tr.embed.bert_cased bert_embeddings_bert_base_tr_cased Embeddings BertEmbeddings
vi vi.embed.bert_cased bert_embeddings_bert_base_vi_cased Embeddings BertEmbeddings
hif hif.embed.w2v_cc_300d w2v_cc_300d Embeddings WordEmbeddingsModel
azb azb.embed.w2v_cc_300d w2v_cc_300d Embeddings WordEmbeddingsModel
bo bo.embed.w2v_cc_300d w2v_cc_300d Embeddings WordEmbeddingsModel
diq diq.embed.w2v_cc_300d w2v_cc_300d Embeddings WordEmbeddingsModel
cy cy.embed.w2v_cc_300d w2v_cc_300d Embeddings WordEmbeddingsModel
es es.embed.w2v_cc_300d w2v_cc_300d Embeddings WordEmbeddingsModel
it it.embed.word2vec w2v_cc_300d Embeddings WordEmbeddingsModel
af af.lemma lemma Lemmatization LemmatizerModel
lt lt.lemma lemma_alksnis Lemmatization LemmatizerModel
nl nl.lemma lemma Lemmatization LemmatizerModel
gd gd.lemma lemma_arcosg Lemmatization LemmatizerModel
es es.lemma lemma Lemmatization LemmatizerModel
ca ca.lemma lemma Lemmatization LemmatizerModel
el el.lemma.gdt lemma_gdt Lemmatization LemmatizerModel
en en.lemma.atis lemma_atis Lemmatization LemmatizerModel
tr tr.lemma.boun lemma_boun Lemmatization LemmatizerModel
da da.lemma.ddt lemma_ddt Lemmatization LemmatizerModel
cs cs.lemma.cac lemma_cac Lemmatization LemmatizerModel
en en.lemma.esl lemma_esl Lemmatization LemmatizerModel
bg bg.lemma.btb lemma_btb Lemmatization LemmatizerModel
id id.lemma.csui lemma_csui Lemmatization LemmatizerModel
gl gl.lemma.ctg lemma_ctg Lemmatization LemmatizerModel
cy cy.lemma.ccg lemma_ccg Lemmatization LemmatizerModel
fo fo.lemma.farpahc lemma_farpahc Lemmatization LemmatizerModel
tr tr.lemma.atis lemma_atis Lemmatization LemmatizerModel
ga ga.lemma.idt lemma_idt Lemmatization LemmatizerModel
ja ja.lemma.gsdluw lemma_gsdluw Lemmatization LemmatizerModel
es es.lemma.gsd lemma_gsd Lemmatization LemmatizerModel
en en.lemma.gum lemma_gum Lemmatization LemmatizerModel
zh zh.lemma.gsd lemma_gsd Lemmatization LemmatizerModel
lv lv.lemma.lvtb lemma_lvtb Lemmatization LemmatizerModel
hi hi.lemma.hdtb lemma_hdtb Lemmatization LemmatizerModel
pt pt.lemma.gsd lemma_gsd Lemmatization LemmatizerModel
de de.lemma.gsd lemma_gsd Lemmatization LemmatizerModel
nl nl.lemma.lassysmall lemma_lassysmall Lemmatization LemmatizerModel
lzh lzh.lemma.kyoto lemma_kyoto Lemmatization LemmatizerModel
zh zh.lemma.gsdsimp lemma_gsdsimp Lemmatization LemmatizerModel
he he.lemma.htb lemma_htb Lemmatization LemmatizerModel
fr fr.lemma.gsd lemma_gsd Lemmatization LemmatizerModel
ro ro.lemma.nonstandard lemma_nonstandard Lemmatization LemmatizerModel
ja ja.lemma.gsd lemma_gsd Lemmatization LemmatizerModel
it it.lemma.isdt lemma_isdt Lemmatization LemmatizerModel
de de.lemma.hdt lemma_hdt Lemmatization LemmatizerModel
is is.lemma.modern lemma_modern Lemmatization LemmatizerModel
la la.lemma.ittb lemma_ittb Lemmatization LemmatizerModel
fr fr.lemma.partut lemma_partut Lemmatization LemmatizerModel
pcm pcm.lemma.nsc lemma_nsc Lemmatization LemmatizerModel
pl pl.lemma.pdb lemma_pdb Lemmatization LemmatizerModel
grc grc.lemma.perseus lemma_perseus Lemmatization LemmatizerModel
cs cs.lemma.pdt lemma_pdt Lemmatization LemmatizerModel
fa fa.lemma.perdt lemma_perdt Lemmatization LemmatizerModel
got got.lemma.proiel lemma_proiel Lemmatization LemmatizerModel
fr fr.lemma.rhapsodie lemma_rhapsodie Lemmatization LemmatizerModel
it it.lemma.partut lemma_partut Lemmatization LemmatizerModel
en en.lemma.partut lemma_partut Lemmatization LemmatizerModel
no no.lemma.nynorsklia lemma_nynorsklia Lemmatization LemmatizerModel
orv orv.lemma.rnc lemma_rnc Lemmatization LemmatizerModel
cu cu.lemma.proiel lemma_proiel Lemmatization LemmatizerModel
la la.lemma.perseus lemma_perseus Lemmatization LemmatizerModel
fr fr.lemma.parisstories lemma_parisstories Lemmatization LemmatizerModel
fro fro.lemma.srcmf lemma_srcmf Lemmatization LemmatizerModel
vi vi.lemma.vtb lemma_vtb Lemmatization LemmatizerModel
qtd qtd.lemma.sagt lemma_sagt Lemmatization LemmatizerModel
ro ro.lemma.rrt lemma_rrt Lemmatization LemmatizerModel
hu hu.lemma.szeged lemma_szeged Lemmatization LemmatizerModel
ug ug.lemma.udt lemma_udt Lemmatization LemmatizerModel
wo wo.lemma.wtb lemma_wtb Lemmatization LemmatizerModel
cop cop.lemma.scriptorium lemma_scriptorium Lemmatization LemmatizerModel
ru ru.lemma.syntagrus lemma_syntagrus Lemmatization LemmatizerModel
ru ru.lemma.taiga lemma_taiga Lemmatization LemmatizerModel
fr fr.lemma.sequoia lemma_sequoia Lemmatization LemmatizerModel
la la.lemma.udante lemma_udante Lemmatization LemmatizerModel
ro ro.lemma.simonero lemma_simonero Lemmatization LemmatizerModel
it it.lemma.vit lemma_vit Lemmatization LemmatizerModel
hr hr.lemma.set lemma_set Lemmatization LemmatizerModel
fa fa.lemma.seraji lemma_seraji Lemmatization LemmatizerModel
tr tr.lemma.tourism lemma_tourism Lemmatization LemmatizerModel
ta ta.lemma.ttb lemma_ttb Lemmatization LemmatizerModel
sl sl.lemma.ssj lemma_ssj Lemmatization LemmatizerModel
sv sv.lemma.talbanken lemma_talbanken Lemmatization LemmatizerModel
uk uk.lemma.iu lemma_iu Lemmatization LemmatizerModel
te te.pos pos_mtg Part of Speech Tagging PerceptronModel
ta ta.pos pos_ttb Part of Speech Tagging PerceptronModel
cs cs.pos pos_ud_pdt Part of Speech Tagging PerceptronModel
bg bg.pos pos_btb Part of Speech Tagging PerceptronModel
af af.pos pos_afribooms Part of Speech Tagging PerceptronModel
es es.pos.gsd pos_gsd Part of Speech Tagging PerceptronModel
en en.pos.ewt pos_ewt Part of Speech Tagging PerceptronModel
gd gd.pos.arcosg pos_arcosg Part of Speech Tagging PerceptronModel
el el.pos.gdt pos_gdt Part of Speech Tagging PerceptronModel
hy hy.pos.armtdp pos_armtdp Part of Speech Tagging PerceptronModel
pt pt.pos.bosque pos_bosque Part of Speech Tagging PerceptronModel
tr tr.pos.framenet pos_framenet Part of Speech Tagging PerceptronModel
cs cs.pos.cltt pos_cltt Part of Speech Tagging PerceptronModel
eu eu.pos.bdt pos_bdt Part of Speech Tagging PerceptronModel
et et.pos.ewt pos_ewt Part of Speech Tagging PerceptronModel
da da.pos.ddt pos_ddt Part of Speech Tagging PerceptronModel
cy cy.pos.ccg pos_ccg Part of Speech Tagging PerceptronModel
lt lt.pos.alksnis pos_alksnis Part of Speech Tagging PerceptronModel
nl nl.pos.alpino pos_alpino Part of Speech Tagging PerceptronModel
fi fi.pos.ftb pos_ftb Part of Speech Tagging PerceptronModel
tr tr.pos.atis pos_atis Part of Speech Tagging PerceptronModel
ca ca.pos.ancora pos_ancora Part of Speech Tagging PerceptronModel
gl gl.pos.ctg pos_ctg Part of Speech Tagging PerceptronModel
de de.pos.gsd pos_gsd Part of Speech Tagging PerceptronModel
fr fr.pos.gsd pos_gsd Part of Speech Tagging PerceptronModel
ja ja.pos.gsdluw pos_gsdluw Part of Speech Tagging PerceptronModel
it it.pos.isdt pos_isdt Part of Speech Tagging PerceptronModel
be be.pos.hse pos_hse Part of Speech Tagging PerceptronModel
nl nl.pos.lassysmall pos_lassysmall Part of Speech Tagging PerceptronModel
sv sv.pos.lines pos_lines Part of Speech Tagging PerceptronModel
uk uk.pos.iu pos_iu Part of Speech Tagging PerceptronModel
fr fr.pos.parisstories pos_parisstories Part of Speech Tagging PerceptronModel
en en.pos.partut pos_partut Part of Speech Tagging PerceptronModel
la la.pos.ittb pos_ittb Part of Speech Tagging PerceptronModel
lzh lzh.pos.kyoto pos_kyoto Part of Speech Tagging PerceptronModel
id id.pos.gsd pos_gsd Part of Speech Tagging PerceptronModel
he he.pos.htb pos_htb Part of Speech Tagging PerceptronModel
tr tr.pos.kenet pos_kenet Part of Speech Tagging PerceptronModel
de de.pos.hdt pos_hdt Part of Speech Tagging PerceptronModel
qhe qhe.pos.hiencs pos_hiencs Part of Speech Tagging PerceptronModel
la la.pos.llct pos_llct Part of Speech Tagging PerceptronModel
en en.pos.lines pos_lines Part of Speech Tagging PerceptronModel
pcm pcm.pos.nsc pos_nsc Part of Speech Tagging PerceptronModel
ko ko.pos.kaist pos_kaist Part of Speech Tagging PerceptronModel
pt pt.pos.gsd pos_gsd Part of Speech Tagging PerceptronModel
hi hi.pos.hdtb pos_hdtb Part of Speech Tagging PerceptronModel
is is.pos.modern pos_modern Part of Speech Tagging PerceptronModel
en en.pos.gum pos_gum Part of Speech Tagging PerceptronModel
fro fro.pos.srcmf pos_srcmf Part of Speech Tagging PerceptronModel
sl sl.pos.ssj pos_ssj Part of Speech Tagging PerceptronModel
ru ru.pos.taiga pos_taiga Part of Speech Tagging PerceptronModel
grc grc.pos.perseus pos_perseus Part of Speech Tagging PerceptronModel
sr sr.pos.set pos_set Part of Speech Tagging PerceptronModel
orv orv.pos.rnc pos_rnc Part of Speech Tagging PerceptronModel
ug ug.pos.udt pos_udt Part of Speech Tagging PerceptronModel
got got.pos.proiel pos_proiel Part of Speech Tagging PerceptronModel
sv sv.pos.talbanken pos_talbanken Part of Speech Tagging PerceptronModel
pl pl.pos.pdb pos_pdb Part of Speech Tagging PerceptronModel
fa fa.pos.seraji pos_seraji Part of Speech Tagging PerceptronModel
tr tr.pos.penn pos_penn Part of Speech Tagging PerceptronModel
hu hu.pos.szeged pos_szeged Part of Speech Tagging PerceptronModel
sk sk.pos.snk pos_snk Part of Speech Tagging PerceptronModel
ro ro.pos.simonero pos_simonero Part of Speech Tagging PerceptronModel
it it.pos.postwita pos_postwita Part of Speech Tagging PerceptronModel
gl gl.pos.treegal pos_treegal Part of Speech Tagging PerceptronModel
cs cs.pos.pdt pos_pdt Part of Speech Tagging PerceptronModel
ro ro.pos.rrt pos_rrt Part of Speech Tagging PerceptronModel
orv orv.pos.torot pos_torot Part of Speech Tagging PerceptronModel
hr hr.pos.set pos_set Part of Speech Tagging PerceptronModel
la la.pos.proiel pos_proiel Part of Speech Tagging PerceptronModel
fr fr.pos.partut pos_partut Part of Speech Tagging PerceptronModel
it it.pos.vit pos_vit Part of Speech Tagging PerceptronModel

Bugfixes

  • Improved error messages and added detection and stopping of endless loops that could occur during construction
    of NLU pipelines

Additional NLU resources

1 line Install NLU on Google Colab

!wget https://setup.johnsnowlabs.com/nlu/colab.sh -O - | bash

1 line Install NLU on Kaggle

!wget https://setup.johnsnowlabs.com/nlu/kaggle.sh -O - | bash

Install via PIP

! pip install nlu pyspark streamlit==0.80.0
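
After installing, a quick check like the one below can confirm which versions ended up in the environment. It is a minimal sketch that uses only the Python standard library (`importlib.metadata`, available from Python 3.8); `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ("nlu", "pyspark"):
    print(name, installed_version(name) or "not installed")
```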