This repository has been archived by the owner on Feb 9, 2023. It is now read-only.

Tried running the model with a wav file, got ValueError: all the input arrays must have same number of dimensions #36

Open
ethanjyx opened this issue May 3, 2021 · 0 comments

Comments


ethanjyx commented May 3, 2021

Hello, I tried running the model on a wav file I had and got the error below. What does it mean?
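For context, this is roughly the cell I ran. The asr.load, pipeline.model.summary and pipeline.predict calls are the ones visible in the traceback further down; the wav-loading step is only a reconstruction (soundfile and the filename are placeholders for whatever loader returns a numpy array):

import soundfile as sf
import automatic_speech_recognition as asr

# Read the wav into a numpy array. soundfile returns shape (n_samples,)
# for mono files and (n_samples, n_channels) for multi-channel files.
sample, sample_rate = sf.read('my_recording.wav')   # placeholder filename

pipeline = asr.load('deepspeech2', lang='en')
pipeline.model.summary()     # TensorFlow model
sentences = pipeline.predict([sample])

The full console output: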

WARNING:tensorflow:Mixed precision compatibility check (mixed_float16): WARNING
The dtype policy mixed_float16 may run slowly because this machine does not have a GPU. Only Nvidia GPUs with compute capability of at least 7.0 run quickly with mixed_float16.
If you will use compatible GPU(s) not attached to this host, e.g. by running a multi-worker model, you can ignore this warning. This message will only be logged once
WARNING:tensorflow:From /opt/conda/envs/lab42/lib/python3.7/site-packages/tensorflow/python/keras/mixed_precision/loss_scale.py:56: DynamicLossScale.__init__ (from tensorflow.python.training.experimental.loss_scale) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras.mixed_precision.LossScaleOptimizer instead. LossScaleOptimizer now has all the functionality of DynamicLossScale
Model: "DeepSpeech2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
X (InputLayer)               [(None, None, 160)]       0         
_________________________________________________________________
lambda (Lambda)              (None, None, 160, 1)      0         
_________________________________________________________________
conv_1 (Conv2D)              (None, None, 80, 32)      14432     
_________________________________________________________________
conv_1_bn (BatchNormalizatio (None, None, 80, 32)      128       
_________________________________________________________________
conv_1_relu (ReLU)           (None, None, 80, 32)      0         
_________________________________________________________________
conv_2 (Conv2D)              (None, None, 40, 32)      236544    
_________________________________________________________________
conv_2_bn (BatchNormalizatio (None, None, 40, 32)      128       
_________________________________________________________________
conv_2_relu (ReLU)           (None, None, 40, 32)      0         
_________________________________________________________________
reshape (Reshape)            (None, None, 1280)        0         
_________________________________________________________________
bidirectional_1 (Bidirection (None, None, 1600)        9993600   
_________________________________________________________________
dropout (Dropout)            (None, None, 1600)        0         
_________________________________________________________________
bidirectional_2 (Bidirection (None, None, 1600)        11529600  
_________________________________________________________________
dropout_1 (Dropout)          (None, None, 1600)        0         
_________________________________________________________________
bidirectional_3 (Bidirection (None, None, 1600)        11529600  
_________________________________________________________________
dropout_2 (Dropout)          (None, None, 1600)        0         
_________________________________________________________________
bidirectional_4 (Bidirection (None, None, 1600)        11529600  
_________________________________________________________________
dropout_3 (Dropout)          (None, None, 1600)        0         
_________________________________________________________________
bidirectional_5 (Bidirection (None, None, 1600)        11529600  
_________________________________________________________________
dense_1 (TimeDistributed)    (None, None, 1600)        2561600   
_________________________________________________________________
dense_1_relu (ReLU)          (None, None, 1600)        0         
_________________________________________________________________
dropout_4 (Dropout)          (None, None, 1600)        0         
_________________________________________________________________
dense_2 (TimeDistributed)    (None, None, 29)          46429     
=================================================================
Total params: 58,971,261
Trainable params: 58,971,133
Non-trainable params: 128
_________________________________________________________________
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-3a5155b23ea3> in <module>
      4 pipeline = asr.load('deepspeech2', lang='en')
      5 pipeline.model.summary()     # TensorFlow model
----> 6 sentences = pipeline.predict([sample])

/opt/conda/envs/lab42/lib/python3.7/site-packages/automatic_speech_recognition/pipeline/ctc_pipeline.py in predict(self, batch_audio, **kwargs)
     92     def predict(self, batch_audio: List[np.ndarray], **kwargs) -> List[str]:
     93         """ Get ready features, and make a prediction. """
---> 94         features = self._features_extractor(batch_audio)
     95         batch_logits = self._model.predict(features, **kwargs)
     96         decoded_labels = self._decoder(batch_logits)

/opt/conda/envs/lab42/lib/python3.7/site-packages/automatic_speech_recognition/features/feature_extractor.py in __call__(self, batch_audio)
      8     def __call__(self, batch_audio: List[np.ndarray]) -> np.ndarray:
      9         """ Extract features from the file list. """
---> 10         features = [self.make_features(audio) for audio in batch_audio]
     11         X = self.align(features)
     12         return X.astype(np.float16)

/opt/conda/envs/lab42/lib/python3.7/site-packages/automatic_speech_recognition/features/feature_extractor.py in <listcomp>(.0)
      8     def __call__(self, batch_audio: List[np.ndarray]) -> np.ndarray:
      9         """ Extract features from the file list. """
---> 10         features = [self.make_features(audio) for audio in batch_audio]
     11         X = self.align(features)
     12         return X.astype(np.float16)

/opt/conda/envs/lab42/lib/python3.7/site-packages/automatic_speech_recognition/features/spectrogram.py in make_features(self, audio)
     28         audio = self.pad(audio) if self.pad_to else audio
     29         frames = python_speech_features.sigproc.framesig(
---> 30             audio, self.frame_len, self.frame_step, self.winfunc
     31         )
     32         features = python_speech_features.sigproc.logpowspec(

/opt/conda/envs/lab42/lib/python3.7/site-packages/python_speech_features/sigproc.py in framesig(sig, frame_len, frame_step, winfunc)
     31 
     32     zeros = numpy.zeros((padlen - slen,))
---> 33     padsignal = numpy.concatenate((sig,zeros))
     34 
     35     indices = numpy.tile(numpy.arange(0,frame_len),(numframes,1)) + numpy.tile(numpy.arange(0,numframes*frame_step,frame_step),(frame_len,1)).T

<__array_function__ internals> in concatenate(*args, **kwargs)

ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)
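Looking at the last frame, framesig concatenates my signal with a 1D array of zeros, and the error says the signal itself has 2 dimensions. My guess is that the wav is stereo, so it loads as an array of shape (n_samples, 2) while the feature extractor expects mono 1D audio. A minimal sketch of the workaround I am considering, assuming averaging the channels to mono is acceptable for this model:

import soundfile as sf
import automatic_speech_recognition as asr

audio, sample_rate = sf.read('my_recording.wav')   # placeholder filename; stereo -> (n_samples, 2)

# Collapse the channel axis so the signal is 1D, which is what
# python_speech_features.sigproc.framesig appears to expect.
if audio.ndim == 2:
    audio = audio.mean(axis=1)

pipeline = asr.load('deepspeech2', lang='en')
sentences = pipeline.predict([audio])

(If the model also expects a specific sample rate, that would be a separate issue; this sketch only addresses the dimensionality error.)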
