The package includes the following:
Input file format detection
Audio extraction from video
Music separation from audio
Recognition of audio content (music, singing voice, speech (vocal), speech (vocal) with background music)
Speech-to-text (STT)
Spelling correction
Query finding
Caption modification (removing IDs, tags, emojis, etc.)
Keyword extraction
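
As an illustration of the caption modification step, here is a minimal sketch of what such a cleanup could look like. It is not the package's own implementation: the function name clean_caption and the regular expressions are hypothetical, and it assumes that "id" means an @username mention and "tags" means #hashtags. The emoji pattern covers only the most common Unicode blocks.

```python
import re

# Assumption: an "id" is an @username mention and a "tag" is a #hashtag.
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")

# Rough emoji coverage only; this is not an exhaustive emoji matcher.
EMOJI_RE = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, extended symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F"                 # variation selector used by emoji presentation
    "\u200D"                 # zero-width joiner used in emoji sequences
    "]+"
)


def clean_caption(text: str) -> str:
    """Remove mentions, hashtags, and common emojis, then collapse whitespace."""
    for pattern in (MENTION_RE, HASHTAG_RE, EMOJI_RE):
        text = pattern.sub(" ", text)
    # Collapse the gaps left behind by the removed tokens.
    return " ".join(text.split())


if __name__ == "__main__":
    print(clean_caption("New song out now 🎵 @some_user #music #live"))
```

Because the mention pattern matches any @ followed by word characters, it would also alter e-mail addresses; a real pipeline would likely need stricter rules for the captions it handles.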