IndexError: index 4 is out of bounds for axis 0 with size 4 #643
This error is becoming too common. A permanent solution is needed 👍
Replies: 4 comments 1 reply
Yes, I have experienced it too. I will be on the lookout for a permanent solution.
Please share a snippet of the code that was executed when this appeared, so we all have something reproducible to discuss.
There are two reasons why this issue showed up:

Adding more docs to a vector store with new vocab
With Doc2VecVectorStore, initializing the vector store and adding documents works without errors. However, adding more documents afterwards that contain new vocabulary causes the IndexError.

Potential solution
gensim's build_vocab method has an update parameter that defaults to False. Passing update=True once the vocabulary has already been built fixes the problem. Below is an example:

    # Check if the model already has a vocabulary built
    if len(self._model.wv) == 0:
        # Build vocabulary if not already built
        self._model.build_vocab(tagged_data)
    else:
        # Update the vocabulary if it exists
        self._model.build_vocab(tagged_data, update=True)

Using OOV words during retrieval
Using out-of-vocabulary (OOV) words during retrieval also causes this error.

Potential solution
Check whether each word is in the known vocabulary before retrieval.

    def infer_vector(self, data: str) -> Vector:
        words = data.split()
        # Check if words are known to the model's vocabulary
        known_words = [word for word in words if word in self._model.wv]
        if not known_words:
            # Return a zero-vector if all words are OOV
            vector = [0.0] * self._model.vector_size
        else:
            # Infer vector from known words
            vector = self._model.infer_vector(known_words)
        return Vector(value=vector)
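
For anyone who wants to try the OOV filtering idea in isolation, here is a minimal sketch that runs without gensim. Everything in it is hypothetical and only for illustration: `safe_infer` is a made-up helper name, `known_vocab` stands in for the model's vocabulary, and `fake_infer` stands in for the model's inference call. It is not the library's API, just the same filter-then-fallback logic.

```python
from typing import Callable, Iterable, List


def safe_infer(
    text: str,
    known_vocab: Iterable[str],
    vector_size: int,
    infer: Callable[[List[str]], List[float]],
) -> List[float]:
    """Infer a vector using only in-vocabulary words (hypothetical helper)."""
    vocab = set(known_vocab)
    # Drop out-of-vocabulary (OOV) words before they reach the model
    known_words = [word for word in text.split() if word in vocab]
    if not known_words:
        # Every word is OOV: fall back to a zero vector of the right size
        return [0.0] * vector_size
    return infer(known_words)


# Stand-in for a trained model (an assumption, NOT gensim's API):
# the first entry of the returned vector is the number of words received.
def fake_infer(words: List[str]) -> List[float]:
    return [float(len(words)), 0.0, 0.0]


vocab = {"vector", "store", "index"}
mixed = safe_infer("vector store qwerty", vocab, 3, fake_infer)
all_oov = safe_infer("qwerty asdf", vocab, 3, fake_infer)
```

Here `mixed` is computed from the two known words only, and `all_oov` is the zero vector. Note that a zero vector has no direction, so it may be worth treating that case as "no match" rather than ranking results against it.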