Merge pull request #255 from jeffxtang/pocket_fft

Version 2 of the Streaming ASR app
pytorch · Jul 7, 2022 · 8e2700a
2 parents 86aff27 + d9ad95c
Showing 11 changed files with 4,228 additions and 645 deletions.
diff --git a/StreamingASR/CMakeLists.txt b/StreamingASR/CMakeLists.txt
diff --git a/StreamingASR/README.md b/StreamingASR/README.md
@@ -6,9 +6,9 @@ In the Speech Recognition Android [demo app](https://github.com/pytorch/android-
 
 ## Prerequisites
 
-* PyTorch 1.11 and torchaudio 0.11 or above (Optional)
+* PyTorch 1.12 and torchaudio 0.12 or above (Optional)
 * Python 3.8 (Optional)
-* Android Pytorch library org.pytorch:pytorch_android_lite:1.11.0
+* Android Pytorch library org.pytorch:pytorch_android_lite:1.12.2
 * Android Studio 4.0.1 or later
 
 ## Quick Start
@@ -22,39 +22,32 @@ git clone https://github.com/pytorch/android-demo-app
 cd android-demo-app/StreamingASR
 ```
 
-If you don't have PyTorch 1.11 and torchaudio 0.11 installed or want to have a quick try of the demo app, you can download the optimized scripted model file [streaming_asr.ptl](https://drive.google.com/file/d/1awT_1S6H5IXSOOqpFLmpeg0B-kQVWG2y/view?usp=sharing), then drag and drop it to the `StreamingASR/app/src/main/assets` folder inside `android-demo-app/StreamingASR`, and continue to Step 3.
-
-Also you need to download [Eigen](https://eigen.tuxfamily.org/), a C++ template library for linear algebra, for Android NDK build required to run the app (see last section of this README for more info):
-```
-mkdir external; cd external
-git clone https://github.com/jeffxtang/eigen
-```
+If you don't have PyTorch 1.12 and torchaudio 0.12 installed or want to have a quick try of the demo app, you can download the optimized scripted model file [streaming_asrv2.ptl](https://drive.google.com/file/d/1XRCAFpMqOSz5e7VP0mhiACMGCCcYfpk-/view?usp=sharing), then drag and drop it to the `StreamingASR/app/src/main/assets` folder inside `android-demo-app/StreamingASR`, and continue to Step 3.
 
 ### 2. Test and Prepare the Model
 
-To install PyTorch 1.11, torchaudio 0.11, and other required Python packages (numpy and pyaudio), do something like this:
+To install PyTorch 1.12, torchaudio 0.12, and other required packages (numpy, pyaudio, and fairseq), do something like this:
 
 ```
-conda create -n pt1.11 python=3.8.5
-conda activate pt1.11
-pip install torch torchaudio numpy pyaudio
+conda create -n pt1.12 python=3.8.5
+conda activate pt1.12
+pip install torch torchaudio numpy pyaudio fairseq
 ```
 
-Now download the streaming ASR model file
-[scripted_wrapper_tuple_no_transform.pt](https://drive.google.com/file/d/1_49DwHS_a3p3THGdHZj3TXmjNJj60AhP/view?usp=sharing) to the `android-demo-app/StreamingASR` directory.
+First, create the model file `scripted_wrapper_tuple.pt` by running `python generate_ts.py`.
 
-To test the model, run `python run_sasr.py`. After you see:
+Then, to test the model, run `python run_sasr.py`. After you see:
 ```
 Initializing model...
 Initialization complete.
 ```
-say something like "good afternoon happy new year", and you'll likely see the streaming recognition results `▁good ▁afternoon ▁happy ▁new ▁year` while you speak. Hit Ctrl-C to end.
+say something like "good afternoon happy new year", and you'll likely see the streaming recognition results `good afternoon happy new year` while you speak. Hit Ctrl-C to end.
 
-To optimize and convert the model to the format that can run on Android, run the following commands:
+Finally, to optimize and convert the model to the format that can run on Android, run the following commands:
 ```
 mkdir -p StreamingASR/app/src/main/assets
 python save_model_for_mobile.py
-mv streaming_asr.ptl StreamingASR/app/src/main/assets
+mv streaming_asrv2.ptl StreamingASR/app/src/main/assets
 ```
 
 ### 3. Build and run with Android Studio
@@ -67,10 +60,6 @@ Start Android Studio, open the project located in `android-demo-app/StreamingASR
 
 ## Librosa C++, Eigen, and JNI
 
-Note that this demo uses a [C++ port](https://github.com/ewan-xu/LibrosaCpp/) of [Librosa](https://librosa.org), a popular audio processing library in Python, to perform the MelSpectrogram transform. In the Python script `run_sasr.py` above, the torchaudio's [MelSpectrogram](https://pytorch.org/audio/stable/transforms.html#melspectrogram) is used, but you can achieve the same transform result by replacing `spectrogram = transform(tensor).transpose(1, 0)`, line 46 of run_sasr.py with:
-```
-mel = librosa.feature.melspectrogram(np_array, sr=16000, n_fft=400, n_mels=80, hop_length=160)
-spectrogram = torch.tensor(mel).transpose(1, 0)
-```
+The first version of this demo uses a [C++ port](https://github.com/ewan-xu/LibrosaCpp/) of [Librosa](https://librosa.org), a popular audio processing library in Python, to perform the MelSpectrogram transform, because torchaudio before version 0.11 doesn't support fft on Android (see [here](https://github.com/pytorch/audio/issues/408)). Using the Librosa C++ port and [JNI](https://developer.android.com/training/articles/perf-jni) (Java Native Interface) on Android makes the MelSpectrogram possible on Android. Furthermore, the Librosa C++ port requires [Eigen](https://eigen.tuxfamily.org/), a C++ template library for linear algebra, so both the port and the Eigen library are included in the first version of the demo app and built as JNI.
 
-Because torchaudio currently doesn't support fft on Android (see [here](https://github.com/pytorch/audio/issues/408)), using the Librosa C++ port and [JNI](https://developer.android.com/training/articles/perf-jni) (Java Native Interface) on Android makes the MelSpectrogram possible on Android. Furthermore, the Librosa C++ port requires [Eigen](https://eigen.tuxfamily.org/), a C++ template library for linear algebra, so both the port and the Eigen library are included in the demo app and built as JNI, using the `CMakeLists.txt` and `MainActivityJNI.cpp` in `StreamingASR/app/src/main/cpp`.
+See [here](https://github.com/jeffxtang/android-demo-app/tree/librosa_jni/StreamingASR) for the first version of the demo if interested in an example of using native C++ to expand operations not yet supported in PyTorch or one of its domain libraries.
diff --git a/StreamingASR/StreamingASR/app/build.gradle b/StreamingASR/StreamingASR/app/build.gradle
@@ -13,13 +13,6 @@ android {
         versionName "1.0"
 
         testInstrumentationRunner "androidx.test.runner.AndroidJUnitRunner"
-
-        externalNativeBuild {
-            cmake {
-                cppFlags ""
-                arguments "-DLOGGER_BUILD_HEADER_LIB=ON", "-DBUILD_TESTING=OFF"
-            }
-        }
     }
 
     buildTypes {
@@ -32,13 +25,6 @@ android {
         sourceCompatibility JavaVersion.VERSION_1_8
         targetCompatibility JavaVersion.VERSION_1_8
     }
-
-    externalNativeBuild {
-        cmake {
-            path "../../CMakeLists.txt"
-            version "3.10.2"
-        }
-    }
 }
 
 dependencies {
@@ -50,5 +36,5 @@ dependencies {
     androidTestImplementation 'androidx.test.ext:junit:1.1.3'
     androidTestImplementation 'androidx.test.espresso:espresso-core:3.4.0'
 
-    implementation 'org.pytorch:pytorch_android_lite:1.11'
+    implementation 'org.pytorch:pytorch_android_lite:1.12.2'
 }
diff --git a/StreamingASR/StreamingASR/app/src/main/cpp/CMakeLists.txt b/StreamingASR/StreamingASR/app/src/main/cpp/CMakeLists.txt
diff --git a/StreamingASR/StreamingASR/app/src/main/cpp/MainActivityJNI.cpp b/StreamingASR/StreamingASR/app/src/main/cpp/MainActivityJNI.cpp
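
Note on the removed `librosa.feature.melspectrogram` call in the README diff above: its parameters (`sr=16000`, `n_fft=400`, `hop_length=160`) correspond to a 25 ms analysis window advanced every 10 ms. The sketch below works out that framing arithmetic in plain Python. It assumes the usual centered STFT framing, where the signal is padded by `n_fft // 2` samples on each side, and the function name `mel_frame_geometry` is hypothetical, not something in this repository.

```python
def mel_frame_geometry(num_samples, sample_rate=16000, n_fft=400,
                       hop_length=160, center=True):
    """Return (window_ms, hop_ms, num_frames) for an STFT-based transform.

    Assumption: with center=True the signal is padded by n_fft // 2
    samples on each side before framing.
    """
    window_ms = 1000.0 * n_fft / sample_rate
    hop_ms = 1000.0 * hop_length / sample_rate

    # Effective signal length after optional centering padding.
    padded = num_samples + 2 * (n_fft // 2) if center else num_samples

    # A frame needs n_fft samples; each further frame advances by hop_length.
    if padded < n_fft:
        num_frames = 0
    else:
        num_frames = 1 + (padded - n_fft) // hop_length
    return window_ms, hop_ms, num_frames


if __name__ == "__main__":
    # One second of 16 kHz audio with the parameters from the removed call.
    print(mel_frame_geometry(16000))
```

Under these assumptions, one second of 16 kHz audio yields 101 frames when centered and 98 when not, and each frame carries `n_mels=80` mel bins.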