Release Zero-Shot-Relation-Extraction, DeBERTa for Sequence Classification, 150+ new models, 60+ Languages in John Snow Labs NLU 3.4.3 · JohnSnowLabs/nlu

We are very excited to announce that NLU 3.4.3 has been released!

This release features new models for Zero-Shot Relation Extraction, DeBERTa for Sequence Classification,
Deidentification in French and Italian, and
Lemmatizers, Part of Speech Taggers, and Word2Vec Embeddings for over 66 languages, with 20 languages being covered
for the first time by NLU, including ancient and exotic languages like Ancient Greek, Old Russian,
Old French and many more. Once again, we would like to thank our community for making this release possible.

NLU for Healthcare

On the healthcare NLP side, a new ZeroShotRelationExtractionModel is available, which can extract relations between
clinical entities in an unsupervised fashion, with no training required.
Additionally, new French and Italian Deidentification models are available for clinical and healthcare domains.
Powered by the fantastic Spark NLP for Healthcare 3.5.0 release.

Zero-Shot Relation Extraction

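Zero-shot relation extraction finds relations between clinical entities without a training dataset. Each relation is described by one or more template sentences whose {{PLACEHOLDERS}} name the entity classes involved.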

import nlu

pipe = nlu.load('med_ner.clinical relation.zeroshot_biobert')
# Configure relations to extract
pipe['zero_shot_relation_extraction'].setRelationalCategories({
    "CURE": ["{{TREATMENT}} cures {{PROBLEM}}."],
    "IMPROVE": ["{{TREATMENT}} improves {{PROBLEM}}.", "{{TREATMENT}} cures {{PROBLEM}}."],
    "REVEAL": ["{{TEST}} reveals {{PROBLEM}}."]}).setMultiLabel(False)
df = pipe.predict("Paracetamol can alleviate headache or sickness. An MRI test can be used to find cancer.")
# Select the relation columns (pass a list of column names, hence the double brackets)
df[['relation', 'relation_confidence', 'relation_entity1', 'relation_entity1_class',
    'relation_entity2', 'relation_entity2_class']]
# Results in the following table:

relation	relation_confidence	relation_entity1	relation_entity1_class	relation_entity2	relation_entity2_class
REVEAL	0.976004	An MRI test	TEST	cancer	PROBLEM
IMPROVE	0.988195	Paracetamol	TREATMENT	sickness	PROBLEM
IMPROVE	0.992962	Paracetamol	TREATMENT	headache	PROBLEM
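To make the role of the relation templates more concrete, here is a minimal pure-Python sketch of how templates with {{CLASS}} placeholders could be expanded into candidate sentences for the entities found in the example text. This is an illustration only: the function `expand_templates` and the `entities` dictionary are hypothetical names, and the sketch assumes the model scores such candidate sentences against the input text, which is not spelled out in these notes. It does not call NLU or Spark NLP.

```python
import re
from itertools import product

# Same relation templates as in the example above.
relations = {
    "CURE": ["{{TREATMENT}} cures {{PROBLEM}}."],
    "IMPROVE": ["{{TREATMENT}} improves {{PROBLEM}}.", "{{TREATMENT}} cures {{PROBLEM}}."],
    "REVEAL": ["{{TEST}} reveals {{PROBLEM}}."],
}

# Hypothetical NER output for the example sentence: entity class -> chunks.
entities = {
    "TREATMENT": ["Paracetamol"],
    "PROBLEM": ["headache", "sickness", "cancer"],
    "TEST": ["An MRI test"],
}


def expand_templates(relations, entities):
    """Fill every template with every combination of matching entity chunks.

    Returns a list of (relation, candidate_sentence) pairs. Templates that
    mention an entity class with no detected chunks are skipped.
    """
    candidates = []
    for relation, templates in relations.items():
        for template in templates:
            # Entity classes named in the template, in order of appearance.
            slots = re.findall(r"\{\{(\w+)\}\}", template)
            if any(slot not in entities for slot in slots):
                continue
            for combo in product(*(entities[slot] for slot in slots)):
                sentence = template
                for slot, chunk in zip(slots, combo):
                    # Replace one placeholder occurrence per slot, left to right.
                    sentence = sentence.replace("{{" + slot + "}}", chunk, 1)
                candidates.append((relation, sentence))
    return candidates


candidates = expand_templates(relations, entities)
for relation, sentence in candidates:
    print(relation, "->", sentence)
```

Under this assumption, only the candidates that the text actually supports (for example "An MRI test reveals cancer.") would receive a high confidence, which is consistent with the three rows in the result table.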

New Healthcare Models overview

Language	NLU Reference	Spark NLP Reference	Task	Annotator Class
en	en.relation.zeroshot_biobert	re_zeroshot_biobert	Relation Extraction	ZeroShotRelationExtractionModel
fr	fr.med_ner.deid_generic	ner_deid_generic	De-identification	MedicalNerModel
fr	fr.med_ner.deid_subentity	ner_deid_subentity	De-identification	MedicalNerModel
it	it.med_ner.deid_generic	ner_deid_generic	Named Entity Recognition	MedicalNerModel
it	it.med_ner.deid_subentity	ner_deid_subentity	Named Entity Recognition	MedicalNerModel

NLU general

On the general NLP side, we have new transformer-based DeBERTa v3 sequence classifier models fine-tuned in Urdu, French and English for
Sentiment and News classification. Additionally, there are 100+ Part of Speech Taggers and Lemmatizers for 66 languages and
new word2vec embeddings for 7 languages, including hi, azb, bo, diq, cy, es, it,
powered by the amazing Spark NLP 3.4.3 release.

New Languages covered:

Languages covered by NLU for the first time are:
South Azerbaijani, Tibetan, Dimli, Central Kurdish, Southern Altai,
Scottish Gaelic, Faroese, Literary Chinese, Ancient Greek,
Gothic, Old Russian, Church Slavic,
Old French, Uighur, Coptic, Croatian, Belarusian, Serbian

and their respective ISO 639-3 and ISO 639-2 codes are:
azb, bo, diq, ckb, lt, gd, fo, lzh, grc, got, orv, cu, fro, qtd, ug, cop, hr, be, qhe, sr

New NLP Models Overview

Language	NLU Reference	Spark NLP Reference	Task	Annotator Class
en	en.classify.sentiment.imdb.deberta	deberta_v3_xsmall_sequence_classifier_imdb	Text Classification	DeBertaForSequenceClassification
en	en.classify.sentiment.imdb.deberta.small	deberta_v3_small_sequence_classifier_imdb	Text Classification	DeBertaForSequenceClassification
en	en.classify.sentiment.imdb.deberta.base	deberta_v3_base_sequence_classifier_imdb	Text Classification	DeBertaForSequenceClassification
en	en.classify.sentiment.imdb.deberta.large	deberta_v3_large_sequence_classifier_imdb	Text Classification	DeBertaForSequenceClassification
en	en.classify.news.deberta	deberta_v3_xsmall_sequence_classifier_ag_news	Text Classification	DeBertaForSequenceClassification
en	en.classify.news.deberta.small	deberta_v3_small_sequence_classifier_ag_news	Text Classification	DeBertaForSequenceClassification
ur	ur.classify.sentiment.imdb	mdeberta_v3_base_sequence_classifier_imdb	Text Classification	DeBertaForSequenceClassification
fr	fr.classify.allocine	mdeberta_v3_base_sequence_classifier_allocine	Text Classification	DeBertaForSequenceClassification
ur	ur.embed.bert_cased	bert_embeddings_bert_base_ur_cased	Embeddings	BertEmbeddings
fr	fr.embed.bert_5lang_cased	bert_embeddings_bert_base_5lang_cased	Embeddings	BertEmbeddings
de	de.embed.medbert	bert_embeddings_German_MedBERT	Embeddings	BertEmbeddings
ar	ar.embed.arbert	bert_embeddings_ARBERT	Embeddings	BertEmbeddings
bn	bn.embed.bangala_bert	bert_embeddings_bangla_bert_base	Embeddings	BertEmbeddings
zh	zh.embed.bert_5lang_cased	bert_embeddings_bert_base_5lang_cased	Embeddings	BertEmbeddings
hi	hi.embed.bert_hi_cased	bert_embeddings_bert_base_hi_cased	Embeddings	BertEmbeddings
it	it.embed.bert_it_cased	bert_embeddings_bert_base_it_cased	Embeddings	BertEmbeddings
ko	ko.embed.bert	bert_embeddings_bert_base	Embeddings	BertEmbeddings
tr	tr.embed.bert_cased	bert_embeddings_bert_base_tr_cased	Embeddings	BertEmbeddings
vi	vi.embed.bert_cased	bert_embeddings_bert_base_vi_cased	Embeddings	BertEmbeddings
hif	hif.embed.w2v_cc_300d	w2v_cc_300d	Embeddings	WordEmbeddingsModel
azb	azb.embed.w2v_cc_300d	w2v_cc_300d	Embeddings	WordEmbeddingsModel
bo	bo.embed.w2v_cc_300d	w2v_cc_300d	Embeddings	WordEmbeddingsModel
diq	diq.embed.w2v_cc_300d	w2v_cc_300d	Embeddings	WordEmbeddingsModel
cy	cy.embed.w2v_cc_300d	w2v_cc_300d	Embeddings	WordEmbeddingsModel
es	es.embed.w2v_cc_300d	w2v_cc_300d	Embeddings	WordEmbeddingsModel
it	it.embed.word2vec	w2v_cc_300d	Embeddings	WordEmbeddingsModel
af	af.lemma	lemma	Lemmatization	LemmatizerModel
lt	lt.lemma	lemma_alksnis	Lemmatization	LemmatizerModel
nl	nl.lemma	lemma	Lemmatization	LemmatizerModel
gd	gd.lemma	lemma_arcosg	Lemmatization	LemmatizerModel
es	es.lemma	lemma	Lemmatization	LemmatizerModel
ca	ca.lemma	lemma	Lemmatization	LemmatizerModel
el	el.lemma.gdt	lemma_gdt	Lemmatization	LemmatizerModel
en	en.lemma.atis	lemma_atis	Lemmatization	LemmatizerModel
tr	tr.lemma.boun	lemma_boun	Lemmatization	LemmatizerModel
da	da.lemma.ddt	lemma_ddt	Lemmatization	LemmatizerModel
cs	cs.lemma.cac	lemma_cac	Lemmatization	LemmatizerModel
en	en.lemma.esl	lemma_esl	Lemmatization	LemmatizerModel
bg	bg.lemma.btb	lemma_btb	Lemmatization	LemmatizerModel
id	id.lemma.csui	lemma_csui	Lemmatization	LemmatizerModel
gl	gl.lemma.ctg	lemma_ctg	Lemmatization	LemmatizerModel
cy	cy.lemma.ccg	lemma_ccg	Lemmatization	LemmatizerModel
fo	fo.lemma.farpahc	lemma_farpahc	Lemmatization	LemmatizerModel
tr	tr.lemma.atis	lemma_atis	Lemmatization	LemmatizerModel
ga	ga.lemma.idt	lemma_idt	Lemmatization	LemmatizerModel
ja	ja.lemma.gsdluw	lemma_gsdluw	Lemmatization	LemmatizerModel
es	es.lemma.gsd	lemma_gsd	Lemmatization	LemmatizerModel
en	en.lemma.gum	lemma_gum	Lemmatization	LemmatizerModel
zh	zh.lemma.gsd	lemma_gsd	Lemmatization	LemmatizerModel
lv	lv.lemma.lvtb	lemma_lvtb	Lemmatization	LemmatizerModel
hi	hi.lemma.hdtb	lemma_hdtb	Lemmatization	LemmatizerModel
pt	pt.lemma.gsd	lemma_gsd	Lemmatization	LemmatizerModel
de	de.lemma.gsd	lemma_gsd	Lemmatization	LemmatizerModel
nl	nl.lemma.lassysmall	lemma_lassysmall	Lemmatization	LemmatizerModel
lzh	lzh.lemma.kyoto	lemma_kyoto	Lemmatization	LemmatizerModel
zh	zh.lemma.gsdsimp	lemma_gsdsimp	Lemmatization	LemmatizerModel
he	he.lemma.htb	lemma_htb	Lemmatization	LemmatizerModel
fr	fr.lemma.gsd	lemma_gsd	Lemmatization	LemmatizerModel
ro	ro.lemma.nonstandard	lemma_nonstandard	Lemmatization	LemmatizerModel
ja	ja.lemma.gsd	lemma_gsd	Lemmatization	LemmatizerModel
it	it.lemma.isdt	lemma_isdt	Lemmatization	LemmatizerModel
de	de.lemma.hdt	lemma_hdt	Lemmatization	LemmatizerModel
is	is.lemma.modern	lemma_modern	Lemmatization	LemmatizerModel
la	la.lemma.ittb	lemma_ittb	Lemmatization	LemmatizerModel
fr	fr.lemma.partut	lemma_partut	Lemmatization	LemmatizerModel
pcm	pcm.lemma.nsc	lemma_nsc	Lemmatization	LemmatizerModel
pl	pl.lemma.pdb	lemma_pdb	Lemmatization	LemmatizerModel
grc	grc.lemma.perseus	lemma_perseus	Lemmatization	LemmatizerModel
cs	cs.lemma.pdt	lemma_pdt	Lemmatization	LemmatizerModel
fa	fa.lemma.perdt	lemma_perdt	Lemmatization	LemmatizerModel
got	got.lemma.proiel	lemma_proiel	Lemmatization	LemmatizerModel
fr	fr.lemma.rhapsodie	lemma_rhapsodie	Lemmatization	LemmatizerModel
it	it.lemma.partut	lemma_partut	Lemmatization	LemmatizerModel
en	en.lemma.partut	lemma_partut	Lemmatization	LemmatizerModel
no	no.lemma.nynorsklia	lemma_nynorsklia	Lemmatization	LemmatizerModel
orv	orv.lemma.rnc	lemma_rnc	Lemmatization	LemmatizerModel
cu	cu.lemma.proiel	lemma_proiel	Lemmatization	LemmatizerModel
la	la.lemma.perseus	lemma_perseus	Lemmatization	LemmatizerModel
fr	fr.lemma.parisstories	lemma_parisstories	Lemmatization	LemmatizerModel
fro	fro.lemma.srcmf	lemma_srcmf	Lemmatization	LemmatizerModel
vi	vi.lemma.vtb	lemma_vtb	Lemmatization	LemmatizerModel
qtd	qtd.lemma.sagt	lemma_sagt	Lemmatization	LemmatizerModel
ro	ro.lemma.rrt	lemma_rrt	Lemmatization	LemmatizerModel
hu	hu.lemma.szeged	lemma_szeged	Lemmatization	LemmatizerModel
ug	ug.lemma.udt	lemma_udt	Lemmatization	LemmatizerModel
wo	wo.lemma.wtb	lemma_wtb	Lemmatization	LemmatizerModel
cop	cop.lemma.scriptorium	lemma_scriptorium	Lemmatization	LemmatizerModel
ru	ru.lemma.syntagrus	lemma_syntagrus	Lemmatization	LemmatizerModel
ru	ru.lemma.taiga	lemma_taiga	Lemmatization	LemmatizerModel
fr	fr.lemma.sequoia	lemma_sequoia	Lemmatization	LemmatizerModel
la	la.lemma.udante	lemma_udante	Lemmatization	LemmatizerModel
ro	ro.lemma.simonero	lemma_simonero	Lemmatization	LemmatizerModel
it	it.lemma.vit	lemma_vit	Lemmatization	LemmatizerModel
hr	hr.lemma.set	lemma_set	Lemmatization	LemmatizerModel
fa	fa.lemma.seraji	lemma_seraji	Lemmatization	LemmatizerModel
tr	tr.lemma.tourism	lemma_tourism	Lemmatization	LemmatizerModel
ta	ta.lemma.ttb	lemma_ttb	Lemmatization	LemmatizerModel
sl	sl.lemma.ssj	lemma_ssj	Lemmatization	LemmatizerModel
sv	sv.lemma.talbanken	lemma_talbanken	Lemmatization	LemmatizerModel
uk	uk.lemma.iu	lemma_iu	Lemmatization	LemmatizerModel
te	te.pos	pos_mtg	Part of Speech Tagging	PerceptronModel
ta	ta.pos	pos_ttb	Part of Speech Tagging	PerceptronModel
cs	cs.pos	pos_ud_pdt	Part of Speech Tagging	PerceptronModel
bg	bg.pos	pos_btb	Part of Speech Tagging	PerceptronModel
af	af.pos	pos_afribooms	Part of Speech Tagging	PerceptronModel
es	es.pos.gsd	pos_gsd	Part of Speech Tagging	PerceptronModel
en	en.pos.ewt	pos_ewt	Part of Speech Tagging	PerceptronModel
gd	gd.pos.arcosg	pos_arcosg	Part of Speech Tagging	PerceptronModel
el	el.pos.gdt	pos_gdt	Part of Speech Tagging	PerceptronModel
hy	hy.pos.armtdp	pos_armtdp	Part of Speech Tagging	PerceptronModel
pt	pt.pos.bosque	pos_bosque	Part of Speech Tagging	PerceptronModel
tr	tr.pos.framenet	pos_framenet	Part of Speech Tagging	PerceptronModel
cs	cs.pos.cltt	pos_cltt	Part of Speech Tagging	PerceptronModel
eu	eu.pos.bdt	pos_bdt	Part of Speech Tagging	PerceptronModel
et	et.pos.ewt	pos_ewt	Part of Speech Tagging	PerceptronModel
da	da.pos.ddt	pos_ddt	Part of Speech Tagging	PerceptronModel
cy	cy.pos.ccg	pos_ccg	Part of Speech Tagging	PerceptronModel
lt	lt.pos.alksnis	pos_alksnis	Part of Speech Tagging	PerceptronModel
nl	nl.pos.alpino	pos_alpino	Part of Speech Tagging	PerceptronModel
fi	fi.pos.ftb	pos_ftb	Part of Speech Tagging	PerceptronModel
tr	tr.pos.atis	pos_atis	Part of Speech Tagging	PerceptronModel
ca	ca.pos.ancora	pos_ancora	Part of Speech Tagging	PerceptronModel
gl	gl.pos.ctg	pos_ctg	Part of Speech Tagging	PerceptronModel
de	de.pos.gsd	pos_gsd	Part of Speech Tagging	PerceptronModel
fr	fr.pos.gsd	pos_gsd	Part of Speech Tagging	PerceptronModel
ja	ja.pos.gsdluw	pos_gsdluw	Part of Speech Tagging	PerceptronModel
it	it.pos.isdt	pos_isdt	Part of Speech Tagging	PerceptronModel
be	be.pos.hse	pos_hse	Part of Speech Tagging	PerceptronModel
nl	nl.pos.lassysmall	pos_lassysmall	Part of Speech Tagging	PerceptronModel
sv	sv.pos.lines	pos_lines	Part of Speech Tagging	PerceptronModel
uk	uk.pos.iu	pos_iu	Part of Speech Tagging	PerceptronModel
fr	fr.pos.parisstories	pos_parisstories	Part of Speech Tagging	PerceptronModel
en	en.pos.partut	pos_partut	Part of Speech Tagging	PerceptronModel
la	la.pos.ittb	pos_ittb	Part of Speech Tagging	PerceptronModel
lzh	lzh.pos.kyoto	pos_kyoto	Part of Speech Tagging	PerceptronModel
id	id.pos.gsd	pos_gsd	Part of Speech Tagging	PerceptronModel
he	he.pos.htb	pos_htb	Part of Speech Tagging	PerceptronModel
tr	tr.pos.kenet	pos_kenet	Part of Speech Tagging	PerceptronModel
de	de.pos.hdt	pos_hdt	Part of Speech Tagging	PerceptronModel
qhe	qhe.pos.hiencs	pos_hiencs	Part of Speech Tagging	PerceptronModel
la	la.pos.llct	pos_llct	Part of Speech Tagging	PerceptronModel
en	en.pos.lines	pos_lines	Part of Speech Tagging	PerceptronModel
pcm	pcm.pos.nsc	pos_nsc	Part of Speech Tagging	PerceptronModel
ko	ko.pos.kaist	pos_kaist	Part of Speech Tagging	PerceptronModel
pt	pt.pos.gsd	pos_gsd	Part of Speech Tagging	PerceptronModel
hi	hi.pos.hdtb	pos_hdtb	Part of Speech Tagging	PerceptronModel
is	is.pos.modern	pos_modern	Part of Speech Tagging	PerceptronModel
en	en.pos.gum	pos_gum	Part of Speech Tagging	PerceptronModel
fro	fro.pos.srcmf	pos_srcmf	Part of Speech Tagging	PerceptronModel
sl	sl.pos.ssj	pos_ssj	Part of Speech Tagging	PerceptronModel
ru	ru.pos.taiga	pos_taiga	Part of Speech Tagging	PerceptronModel
grc	grc.pos.perseus	pos_perseus	Part of Speech Tagging	PerceptronModel
sr	sr.pos.set	pos_set	Part of Speech Tagging	PerceptronModel
orv	orv.pos.rnc	pos_rnc	Part of Speech Tagging	PerceptronModel
ug	ug.pos.udt	pos_udt	Part of Speech Tagging	PerceptronModel
got	got.pos.proiel	pos_proiel	Part of Speech Tagging	PerceptronModel
sv	sv.pos.talbanken	pos_talbanken	Part of Speech Tagging	PerceptronModel
pl	pl.pos.pdb	pos_pdb	Part of Speech Tagging	PerceptronModel
fa	fa.pos.seraji	pos_seraji	Part of Speech Tagging	PerceptronModel
tr	tr.pos.penn	pos_penn	Part of Speech Tagging	PerceptronModel
hu	hu.pos.szeged	pos_szeged	Part of Speech Tagging	PerceptronModel
sk	sk.pos.snk	pos_snk	Part of Speech Tagging	PerceptronModel
ro	ro.pos.simonero	pos_simonero	Part of Speech Tagging	PerceptronModel
it	it.pos.postwita	pos_postwita	Part of Speech Tagging	PerceptronModel
gl	gl.pos.treegal	pos_treegal	Part of Speech Tagging	PerceptronModel
cs	cs.pos.pdt	pos_pdt	Part of Speech Tagging	PerceptronModel
ro	ro.pos.rrt	pos_rrt	Part of Speech Tagging	PerceptronModel
orv	orv.pos.torot	pos_torot	Part of Speech Tagging	PerceptronModel
hr	hr.pos.set	pos_set	Part of Speech Tagging	PerceptronModel
la	la.pos.proiel	pos_proiel	Part of Speech Tagging	PerceptronModel
fr	fr.pos.partut	pos_partut	Part of Speech Tagging	PerceptronModel
it	it.pos.vit	pos_vit	Part of Speech Tagging	PerceptronModel
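Any entry in the NLU Reference column can be passed to `nlu.load()`, and several references can be combined in one space-separated string, as the healthcare example does with 'med_ner.clinical relation.zeroshot_biobert'. The sketch below wraps that pattern in two small helpers. The names `build_spec` and `predict`, and the `loader` parameter, are hypothetical and exist only for this illustration; `loader` lets the helper be tried without a Spark installation and defaults to `nlu.load`.

```python
def build_spec(references):
    """Join several NLU references into one space-separated spec string."""
    return " ".join(references)


def predict(references, text, loader=None):
    """Load the referenced models as one pipeline and run it on `text`.

    `loader` is a hypothetical hook for this sketch. When it is omitted,
    `nlu.load` is used, which needs `pip install nlu pyspark` and a Java runtime.
    """
    if loader is None:
        import nlu  # imported lazily so the helpers can be defined without NLU
        loader = nlu.load
    return loader(build_spec(references)).predict(text)


# Example usage (downloads models on first run, so it is left commented out):
# df = predict(["fr.pos.gsd", "fr.lemma.gsd"], "NLU couvre maintenant plus de langues.")
# df = predict(["en.classify.news.deberta"], "Stocks rallied after the earnings report.")
```

The references used in the commented calls are taken from the table above; the input sentences are made up for illustration.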

Bugfixes

Improved error messages, and added detection and stopping of endless loops that could occur during construction
of NLU pipelines.

Additional NLU resources

140+ NLU Tutorials
NLU in Action
Streamlit visualizations docs
The complete list of all 4000+ models & pipelines in 200+ languages is available on Models Hub.
Spark NLP publications
NLU documentation
Discussions: engage with other community members, share ideas, and show off how you use Spark NLP and NLU!

1 line Install NLU on Google Colab

!wget https://setup.johnsnowlabs.com/nlu/colab.sh -O - | bash

1 line Install NLU on Kaggle

!wget https://setup.johnsnowlabs.com/nlu/kaggle.sh -O - | bash

Install via PIP

! pip install nlu pyspark streamlit==0.80.0
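After installing, it can be useful to confirm which versions actually ended up in the environment. The following is a small sketch using only the Python standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_version` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the packages named in the pip command above.
for name in ("nlu", "pyspark", "streamlit"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```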
