Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

create file format tag --fileformat for converting audio downloads #7

Closed
wants to merge 5 commits into from

Conversation

caitriggs
Copy link

@caitriggs caitriggs commented May 17, 2018

Addresses feature request #6

@caitriggs caitriggs changed the title create file format tag -f for converting audio downloads to user defi… create file format tag --fileformat for converting audio downloads May 17, 2018
Copy link
Owner

@carlthome carlthome left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job!

I want to keep this package as lightweight as possible so depending on ffmpeg is a little scary but I also see the benefit of getting mp3:s, Ogg:s or whatever out instead of a mix of formats.

Could you add a brief test of the audio conversion module (e.g https://github.com/caitriggs/audio-scraper/blob/fileformat/tests/test_audioconvert.py) so we can make sure it works?

"""Scrape various websites for audio."""
youtube.scrape(query, include, exclude, quiet, overwrite)
soundcloud.scrape(query, include, exclude, quiet, overwrite)
# create subdirectory for converted audio files if --fileformat tag set
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nitpick but capitalize and have normal sentences with periods for code comments, please. 🐼

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is your show, so no problem there! Will do 👍

'-i', "./{}".format(file), # input file to convert
'-f', '{}'.format(fileformat), # output fileformat type
'-ac', '1', # mono channel
'-ar', '16000', # sampling rate 16000Hz
Copy link
Owner

@carlthome carlthome May 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should 16 kHz mono be the default or is 44100 Hz / 16-bit stereo more common? What are the intended use cases of the audio conversion?

Copy link
Author

@caitriggs caitriggs May 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

44100 Hz stereo is better quality, but the specific application I'm using your scraper for is for ASR model training which are generally built for 16000 Hz mono channel wav files (e.g. Bai Du's DeepSpeech). I should probably re-tool it to allow user defined settings, but admit I just added a quick and dirty hard coded solution for my needs

Appears that soundcloud streams are mp3 only, so we'll
convert original mp3 to user defined file format with ffmpeg.
'''
# Convert to fileformat using ffmpeg
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Condense the two comments into one.

itself until Stream.download(), we must convert
the audio after download with ffmpeg.
'''
# Convert to fileformat using ffmpeg
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Condense the two comments into one.

@caitriggs
Copy link
Author

I want to keep this package as lightweight as possible so depending on ffmpeg is a little scary but I also see the benefit of getting mp3:s, Ogg:s or whatever out instead of a mix of formats.

Yea, I totally understand keeping the dependency footprint small, but for me it's really convenient to build file conversion into the scraping process. Many of us in the data science community scrape data for a specific purpose (e.g. all input files have to be 16kHz mono wave files).

ffmpeg is a pretty well established standard for audio/video encoding/decoding so thought it was safe library choice. Especially since if you're working with audio files in the first place, you're already likely to have ffmpeg installed.

Could you add a brief test of the audio conversion module (e.g https://github.com/caitriggs/audio-scraper/blob/fileformat/tests/test_audioconvert.py) so we can make sure it works?

Yup, shouldn't be an issue 👍

@carlthome
Copy link
Owner

Closing because stale.

@carlthome carlthome closed this Jun 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants