Hi! I am using the spaCy lemmatizer for some tasks. When I use nlp.pipe to process the data faster, I get different lemmas depending on whether tok2vec is disabled or enabled. Preserving case is critical for me. Is the behavior below expected?
How to reproduce the behaviour
Case1
import spacy

nlp = spacy.load("en_core_web_sm")
for doc in nlp.pipe(["Hello! My name is Marcin.", "I have a SFTP server running in my HomeLab"], batch_size=100, n_process=1, disable=["ner", "tok2vec"]):
    for token in doc:
        print(str(token), token.lemma_)
    print("")
Output:
Hello hello
! !
My my
name name
is is
Marcin marcin
. .
I i
have have
a a
SFTP sftp
server server
running running
in in
my my
HomeLab homelab
Case2
import spacy

nlp = spacy.load("en_core_web_sm")
for doc in nlp.pipe(["Hello! My name is Marcin.", "I have a SFTP server running in my HomeLab"], batch_size=100, n_process=1, disable=["ner"]):
    for token in doc:
        print(str(token), token.lemma_)
    print("")
Output:
Hello hello
! !
My my
name name
is be
Marcin Marcin
. .
I I
have have
a a
SFTP sftp
server server
running run
in in
my my
HomeLab HomeLab
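To make the discrepancy easier to see, here is a minimal sketch that compares the two outputs token by token. The token/lemma pairs are copied from the Case1 and Case2 outputs above. The helper name `lemma_differences` is only illustrative and is not part of spaCy.

```python
# Token/lemma pairs copied verbatim from the two outputs in this report.
# case1: disable=["ner", "tok2vec"]; case2: disable=["ner"].
case1 = [
    ("Hello", "hello"), ("!", "!"), ("My", "my"), ("name", "name"),
    ("is", "is"), ("Marcin", "marcin"), (".", "."),
    ("I", "i"), ("have", "have"), ("a", "a"), ("SFTP", "sftp"),
    ("server", "server"), ("running", "running"), ("in", "in"),
    ("my", "my"), ("HomeLab", "homelab"),
]
case2 = [
    ("Hello", "hello"), ("!", "!"), ("My", "my"), ("name", "name"),
    ("is", "be"), ("Marcin", "Marcin"), (".", "."),
    ("I", "I"), ("have", "have"), ("a", "a"), ("SFTP", "sftp"),
    ("server", "server"), ("running", "run"), ("in", "in"),
    ("my", "my"), ("HomeLab", "HomeLab"),
]


def lemma_differences(a, b):
    """Return (token, lemma_in_a, lemma_in_b) for every position where the lemmas differ."""
    return [
        (tok_a, lem_a, lem_b)
        for (tok_a, lem_a), (_tok_b, lem_b) in zip(a, b)
        if lem_a != lem_b
    ]


diffs = lemma_differences(case1, case2)
for token, without_tok2vec, with_tok2vec in diffs:
    print(f"{token}: {without_tok2vec!r} (tok2vec disabled) vs {with_tok2vec!r} (tok2vec enabled)")
```

Only five tokens differ: is, Marcin, I, running, and HomeLab. With tok2vec disabled, each of these lemmas is just the lowercased token; with it enabled, the verbs are reduced to their base forms and the case of Marcin, I, and HomeLab is preserved.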