-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
create file format tag --fileformat for converting audio downloads #7
Conversation
…ned format with ffmpeg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job!
I want to keep this package as lightweight as possible so depending on ffmpeg is a little scary but I also see the benefit of getting mp3:s, Ogg:s or whatever out instead of a mix of formats.
Could you add a brief test of the audio conversion module (e.g https://github.com/caitriggs/audio-scraper/blob/fileformat/tests/test_audioconvert.py) so we can make sure it works?
audioscrape/__main__.py
Outdated
"""Scrape various websites for audio.""" | ||
youtube.scrape(query, include, exclude, quiet, overwrite) | ||
soundcloud.scrape(query, include, exclude, quiet, overwrite) | ||
# create subdirectory for converted audio files if --fileformat tag set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nitpick but capitalize and have normal sentences with periods for code comments, please. 🐼
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is your show, so no problem there! Will do 👍
audioscrape/audioconvert.py
Outdated
'-i', "./{}".format(file), # input file to convert | ||
'-f', '{}'.format(fileformat), # output fileformat type | ||
'-ac', '1', # mono channel | ||
'-ar', '16000', # sampling rate 16000Hz |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should 16 kHz mono be the default or is 44100 Hz / 16-bit stereo more common? What are the intended use cases of the audio conversion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
44100 Hz stereo is better quality, but the specific application I'm using your scraper for is for ASR model training which are generally built for 16000 Hz mono channel wav files (e.g. Bai Du's DeepSpeech). I should probably re-tool it to allow user defined settings, but admit I just added a quick and dirty hard coded solution for my needs
audioscrape/soundcloud.py
Outdated
Appears that soundcloud streams are mp3 only, so we'll | ||
convert original mp3 to user defined file format with ffmpeg. | ||
''' | ||
# Convert to fileformat using ffmpeg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Condense the two comments into one.
audioscrape/youtube.py
Outdated
itself until Stream.download(), we must convert | ||
the audio after download with ffmpeg. | ||
''' | ||
# Convert to fileformat using ffmpeg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Condense the two comments into one.
Yea, I totally understand keeping the dependency footprint small, but for me it's really convenient to build file conversion into the scraping process. Many of us in the data science community scrape data for a specific purpose (e.g. all input files have to be 16kHz mono wave files). ffmpeg is a pretty well established standard for audio/video encoding/decoding so thought it was safe library choice. Especially since if you're working with audio files in the first place, you're already likely to have ffmpeg installed.
Yup, shouldn't be an issue 👍 |
Closing because stale. |
Addresses feature request #6