Bug: Tutorials importing test audio files do not work on Google Colab #147

900miles · 2024-08-15T00:10:55Z

Description

Any tutorial that imports test audio files (e.g. Audio.from_filepath("../src/tests/data_for_testing/audio_48khz_mono_16bits.wav")) does not work on Google Colab, because the audio file does not exist in that environment. This affects most, if not all, of our current tutorials.

Steps to Reproduce

Open a tutorial notebook, for example speech_to_text.ipynb, in Google Colab. Add !pip install senselab to the top of the notebook, then run it.

Expected Results

The tutorial runs as expected.

Actual Results

When running the following code block:

audio1 = Audio.from_filepath("../src/tests/data_for_testing/audio_48khz_mono_16bits.wav")
audio2 = Audio.from_filepath("../src/tests/data_for_testing/audio_48khz_stereo_16bits.wav")

I get the following error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-3-b610ecb7cd9e> in <cell line: 1>()
----> 1 audio1 = Audio.from_filepath("../src/tests/data_for_testing/audio_48khz_mono_16bits.wav")
      2 audio2 = Audio.from_filepath("../src/tests/data_for_testing/audio_48khz_stereo_16bits.wav")

4 frames
/usr/local/lib/python3.10/dist-packages/torio/io/_streaming_media_decoder.py in __init__(self, src, format, option, buffer_size)
    524             self._be = ffmpeg_ext.StreamingMediaDecoderFileObj(src, format, option, buffer_size)
    525         else:
--> 526             self._be = ffmpeg_ext.StreamingMediaDecoder(os.path.normpath(src), format, option)
    527 
    528         i = self._be.find_best_audio_stream()

RuntimeError: Failed to open the input "../src/tests/data_for_testing/audio_48khz_mono_16bits.wav" (No such file or directory).
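
For anyone reproducing this, one quick way to confirm the cause is to check how the relative path resolves from the notebook's working directory. This is only a diagnostic sketch using the standard library; describe_path is a hypothetical helper name and is not part of senselab.

```python
from pathlib import Path


def describe_path(relative_path: str) -> dict:
    """Report how a relative path resolves from the current working directory."""
    resolved = (Path.cwd() / relative_path).resolve()
    return {
        "cwd": str(Path.cwd()),
        "resolved": str(resolved),
        "exists": resolved.exists(),
    }


# On Colab the working directory is typically /content, which holds no clone of
# the repository, so "exists" is expected to be False there.
info = describe_path("../src/tests/data_for_testing/audio_48khz_mono_16bits.wav")
print(info)
```

If "exists" is False, the loader never had a file to open, which matches the "No such file or directory" message.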

Additional Notes

No response

wilke0818 · 2024-08-15T17:06:10Z

So the way that the tutorials should work I think might be different than the way that you're using them. I would recommend testing this again once we have that tutorial merged into mainline as I think that would allow us to see if the "Open in Colab" button works more as expected. Another fix for what you're showing is to not reference these by relative file paths and rather use wget or something to locally download them onto colab from the github.

fabiocat93 · 2024-08-15T20:20:29Z

@wilke0818 nice catch! are you suggesting including a piece of code for downloading some audio as part of the tutorials? Something like this (https://github.com/sensein/fab/blob/main/tutorials/voice_anonymization/voice_anonymization.ipynb):

# This variable holds the web address from which we'll download the EmoDB dataset.
# It's like a treasure map guiding us to the wonderful voice recordings!
dataset_url = "http://emodb.bilderbar.info/download/download.zip"

# The data_folder variable points to the location where we'll store all the data and audio recordings.
# Think of it as our backstage area, well-organized and ready to showcase the talents of our voices!
data_folder = "./data/"

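# The dataset_name variable will be the name we give to the EmoDB dataset once we download it.
# Just a friendly label to recognize it easily when we work with it later on.
dataset_name = "emodb_dataset"

# The bash cell below reads $dataset_path, $dataset_name and $dataset_url, so export them
# as environment variables (assumption: the linked notebook may pass them differently).
import os
dataset_path = os.path.join(data_folder, dataset_name)
os.environ.update(dataset_url=dataset_url, dataset_name=dataset_name, dataset_path=dataset_path)

%%bash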

# This bash script checks if the EmoDB dataset has already been downloaded.
# If the dataset folder exists, it means the dataset is already downloaded.
# Otherwise, it proceeds with the download process.

if [ -d "$dataset_path" ]; then
  # The dataset folder exists, so the dataset is already downloaded.
  echo "$dataset_name already downloaded in $dataset_path."
else
  # The dataset folder does not exist, indicating the dataset needs to be downloaded.
  echo "Downloading..."

  # Create the dataset folder and its parent directories, if they don't exist.
  mkdir -p "$dataset_path"

  # Use the 'wget' command to fetch the EmoDB dataset from the provided URL ($dataset_url).
  # Save the downloaded file as "$dataset_name.zip" in the "$dataset_path" folder.
  wget -O "$dataset_path"/"$dataset_name".zip "$dataset_url"

  # Unzip the downloaded dataset file ($dataset_name.zip) into the "$dataset_path" folder.
  # The '-d' option specifies the destination directory for the extracted files.
  unzip "$dataset_path"/"$dataset_name".zip -d "$dataset_path"

  # Remove the downloaded zip file, as we don't need it anymore.
  rm "$dataset_path"/"$dataset_name".zip
fi

wilke0818 · 2024-08-15T20:25:32Z

Yeah I mean technically it could be anything. I was thinking (not certain this would work) in colab

!wget https://github.com/sensein/senselab/raw/main/src/tests/data_for_testing/audio_48khz_mono_16bits.wav
test_audio_path = './audio_48khz_mono_16bits.wav'
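
A rough pure-Python equivalent of the wget idea, for notebooks that should not depend on shell commands. This is a sketch under assumptions: download_if_missing is a hypothetical helper (not part of senselab), and the URL is the one from the wget line above.

```python
import urllib.request
from pathlib import Path

# URL taken from the wget command above.
AUDIO_URL = (
    "https://github.com/sensein/senselab/raw/main/"
    "src/tests/data_for_testing/audio_48khz_mono_16bits.wav"
)


def download_if_missing(url: str, dest: str) -> Path:
    """Download url to dest unless dest already exists; return the local path."""
    path = Path(dest)
    if not path.exists():
        # Create the parent folder first so the download has somewhere to land.
        path.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(path))
    return path


# Usage in a notebook cell (needs network access):
# test_audio_path = download_if_missing(AUDIO_URL, "./audio_48khz_mono_16bits.wav")
```

Skipping the download when the file already exists keeps the cell safe to re-run.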

wilke0818 · 2024-08-15T20:26:44Z

yours works if you want an entire dataset, though at that point it might be better to do something like use HuggingFace and convert it to a SenselabDataset which is the approach I was using in the ser tutorial.

wilke0818 · 2024-08-15T20:28:37Z

and also my note to Miles above was that it is possible that the code you have will work, once it is pulled into mainline. I have found that when working with the notebooks, there is a weird sort of Github/Colab interaction where Colab tries to use notebooks from the main branch of Github.

fabiocat93 · 2024-08-15T20:35:14Z

> and also my note to Miles above was that it is possible that the code you have will work, once it is pulled into mainline. I have found that when working with the notebooks, there is a weird sort of Github/Colab interaction where Colab tries to use notebooks from the main branch of Github.

oh wow!

fabiocat93 · 2024-08-15T20:35:48Z

> Yeah I mean technically it could be anything. I was thinking (not certain this would work) in colab
>
> !wget https://github.com/sensein/senselab/raw/main/src/tests/data_for_testing/audio_48khz_mono_16bits.wav
> test_audio_path = './audio_48khz_mono_16bits.wav'

I agree that for most cases having one or two files is more than enough

fabiocat93 · 2024-08-15T20:38:52Z

> So the way that the tutorials should work I think might be different than the way that you're using them. I would recommend testing this again once we have that tutorial merged into mainline as I think that would allow us to see if the "Open in Colab" button works more as expected. Another fix for what you're showing is to not reference these by relative file paths and rather use wget or something to locally download them onto colab from the github.

what branch/PR are you referring to?

900miles · 2024-08-15T20:40:42Z

> and also my note to Miles above was that it is possible that the code you have will work, once it is pulled into mainline. I have found that when working with the notebooks, there is a weird sort of Github/Colab interaction where Colab tries to use notebooks from the main branch of Github.

I'm not exactly sure what you mean by this, as speech_to_text.ipynb is in the mainline branch, no? Note that this doesn't just affect that tutorial, but any tutorial with relative imports. Including getting_started.ipynb which is also in mainline.

wilke0818 · 2024-08-15T20:47:55Z

I just re-tested it and I see that I was mistaken (I thought speech_to_text.ipynb was still in a PR). And yeah, you need something like !pip install senselab (or a variation that specifies the GitHub repo and a branch). We then also need to do one of the things mentioned earlier: download the files from source, suggest that users upload their own file, or use HuggingFace. Also @900miles, the !pip install senselab is only commented out in this file, though it is missing in others.
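
To make the two install variations concrete, here is a small sketch that builds either command. pip_install_command is a hypothetical helper, and the repository URL is assumed from the GitHub link used earlier in this thread; the git+URL@branch form is pip's standard syntax for installing from a Git branch.

```python
def pip_install_command(branch=None, repo="https://github.com/sensein/senselab"):
    """Build the pip command for a release install or a branch install."""
    if branch is None:
        # Released package from PyPI.
        return "pip install senselab"
    # Install straight from a Git branch of the (assumed) repository URL.
    return f"pip install git+{repo}@{branch}"


print(pip_install_command())
print(pip_install_command("main"))
```

In a notebook cell, either printed command would be run with a leading "!".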

fabiocat93 · 2024-08-15T20:52:28Z

how about differentiating the flow on colab from the local flow?

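def is_colab():
    """Return True when the notebook is running on Google Colab."""
    try:
        import google.colab  # noqa: F401 (only importable on Colab)
        return True
    except ImportError:
        return False

# Example usage
if is_colab():
    # download the files of interest from the github link
    # change to test_audio_path accordingly
    test_audio_path = "./audio_48khz_mono_16bits.wav"
else:
    # set up the test_audio_path
    test_audio_path = "../src/tests/data_for_testing/audio_48khz_mono_16bits.wav"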

900miles · 2024-08-15T20:52:45Z

Gotcha. I wasn't sure if the missing !pip install senselab on the tutorials was intentional or not (I can see an argument that if you're going through the module-level tutorials, you've already installed senselab). I think the wget is a good idea, or HuggingFace. We could also do what scipy does and have some sort of test_audios module.
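
As far as I know, scipy's sample data (scipy.datasets) is fetched and cached on first use, so a test_audios module could follow the same shape. This is only a sketch: every name here is hypothetical, and the base URL is an assumption modeled on the mono file's URL from the wget suggestion.

```python
import urllib.request
from pathlib import Path

# Assumed URL pattern, modeled on the mono file's URL from the wget suggestion.
BASE_URL = "https://github.com/sensein/senselab/raw/main/src/tests/data_for_testing/"

# Hypothetical registry: short name -> file name in the repository.
REGISTRY = {
    "mono": "audio_48khz_mono_16bits.wav",
    "stereo": "audio_48khz_stereo_16bits.wav",
}


def url_for(name: str) -> str:
    """Return the download URL for a registered test audio."""
    if name not in REGISTRY:
        raise KeyError(f"unknown test audio: {name!r}")
    return BASE_URL + REGISTRY[name]


def fetch(name: str, cache_dir: str = "./test_audios_cache") -> Path:
    """Return a local path for name, downloading the file on first use."""
    target = Path(cache_dir) / REGISTRY[name]
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url_for(name), str(target))
    return target
```

A tutorial could then call fetch("mono") and pass the returned path to Audio.from_filepath.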

wilke0818 · 2024-08-15T21:03:18Z

@fabiocat93 I'm not sure how much sense it makes to differentiate the two cases. The "local flow" really only affects people running this after setting everything up for development, which probably won't be most people. I feel like the tutorial shouldn't assume anything about the environment it runs in. Also, in both cases it seems like we need pip install senselab or something equivalent, because even if you clone the repo, the imports aren't set up for running from a notebook.
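
One way to avoid branching on the environment at all is to prefer a local copy when it exists and download otherwise. A minimal sketch, assuming a hypothetical resolve_audio helper (not part of senselab); the downloader argument is only there so the function can be exercised without network access.

```python
import urllib.request
from pathlib import Path
from typing import Callable


def resolve_audio(
    local_path: str,
    url: str,
    download_dir: str = ".",
    downloader: Callable[[str, str], object] = urllib.request.urlretrieve,
) -> Path:
    """Prefer an existing local file; otherwise download it into download_dir."""
    local = Path(local_path)
    if local.exists():
        # Development checkout: the repository file is already on disk.
        return local
    target = Path(download_dir) / local.name
    if not target.exists():
        # Colab or a plain pip install: fetch the file once and reuse it.
        Path(download_dir).mkdir(parents=True, exist_ok=True)
        downloader(url, str(target))
    return target
```

The same cell then works in a development checkout and on Colab, without an is_colab check.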

fabiocat93 · 2024-11-18T16:51:41Z

@900miles, can you handle this issue in all the tutorials? You may create a utility for downloading an existing dataset to be processed. I have tentatively assigned this to you

900miles added the bug label Aug 15, 2024

fabiocat93 added this to senselab Nov 14, 2024

fabiocat93 moved this to Todo in senselab Nov 18, 2024

fabiocat93 added the help wanted label Nov 18, 2024

fabiocat93 assigned 900miles Nov 18, 2024

900miles linked a pull request Nov 24, 2024 that will close this issue

Updating tutorial files #210

Merged

6 tasks

900miles moved this from Todo to In review in senselab Dec 3, 2024

fabiocat93 closed this as completed in #210 Dec 23, 2024

github-project-automation bot moved this from In review to Done in senselab Dec 23, 2024
