This repository has been archived by the owner on Aug 28, 2024. It is now read-only.

Streaming ASR Android Demo App (#230)
* initial commit

* Revert "initial commit"

This reverts commit 5a65775.

* main readme and helloworld/demo app readme updates

* streaming asr code complete

* scripts to run and prepare model for mobile

* Android code cleanup; README and Python scripts update

* README update

* README, script and screenshots update

* removed eigen from repo

* README and code update

* app name update

* PR feedback
jeffxtang committed Jan 13, 2022
1 parent 76ba0e0 commit 486fc7a
Showing 42 changed files with 1,827 additions and 0 deletions.
6 changes: 6 additions & 0 deletions StreamingASR/CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
cmake_minimum_required(VERSION 3.4.1)

project(StreamingASR)

add_subdirectory(external/eigen)
add_subdirectory(StreamingASR/app/src/main/cpp)
76 changes: 76 additions & 0 deletions StreamingASR/README.md
@@ -0,0 +1,76 @@
# Streaming Speech Recognition on Android with Emformer-RNNT-based Model

## Introduction

In the Speech Recognition Android [demo app](https://github.com/pytorch/android-demo-app/tree/master/SpeechRecognition), we showed how to use the [wav2vec 2.0](https://github.com/pytorch/fairseq/tree/master/examples/wav2vec) model on an Android demo app to perform non-continuous speech recognition. Here we go one step further, using a torchaudio [Emformer-RNNT-based ASR](https://pytorch.org/audio/main/prototype.pipelines.html#torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_LIBRISPEECH) model on Android to perform streaming speech recognition.

## Prerequisites

* PyTorch 1.10.1 and torchaudio 0.10.1 or above (Optional)
* Python 3.8 (Optional)
* Android PyTorch library org.pytorch:pytorch_android_lite:1.10.0
* Android Studio 4.0.1 or later

## Quick Start

### 1. Get the Repo

Simply run the commands below:

```
git clone https://github.com/pytorch/android-demo-app
cd android-demo-app/StreamingASR
```

If you don't have PyTorch 1.10.1 and torchaudio 0.10.1 installed, or just want a quick try of the demo app, you can download the optimized scripted model file [streaming_asr.ptl](https://drive.google.com/file/d/1awT_1S6H5IXSOOqpFLmpeg0B-kQVWG2y/view?usp=sharing), drag and drop it into the `StreamingASR/app/src/main/assets` folder inside `android-demo-app/StreamingASR`, and skip to Step 3.

You also need to download [Eigen](https://eigen.tuxfamily.org/), a C++ template library for linear algebra, which the Android NDK build requires to build and run the app (see the last section of this README for more info):
```
mkdir external; cd external
git clone https://github.com/jeffxtang/eigen
```

### 2. Test and Prepare the Model

To install PyTorch 1.10.1, torchaudio 0.10.1, and other required Python packages (numpy and pyaudio), do something like this:

```
conda create -n pt1.10.1 python=3.8.5
conda activate pt1.10.1
pip install torch torchaudio numpy pyaudio
```

Now download the streaming ASR model file
[scripted_wrapper_tuple_no_transform.pt](https://drive.google.com/file/d/1_49DwHS_a3p3THGdHZj3TXmjNJj60AhP/view?usp=sharing) (the script used to create the model will be published soon) to the `android-demo-app/StreamingASR` directory.

To test the model, run `python run_sasr.py`. After you see:
```
Initializing model...
Initialization complete.
```
say something like "good afternoon happy new year", and you'll likely see the streaming recognition results `▁good ▁afternoon ▁happy ▁new ▁year` while you speak. Hit Ctrl-C to end.
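
The `▁` prefix is the sentencepiece-style word-boundary marker, so the raw token stream maps back to plain text by joining the tokens and turning each `▁` into a space. A minimal sketch (the `detokenize` helper is hypothetical, not part of this repo):

```python
# Hypothetical helper (not part of this repo): turn the streamed
# sentencepiece-style tokens into readable text. "\u2581" is the
# "▁" word-boundary marker emitted by the model.
def detokenize(tokens):
    return "".join(tokens).replace("\u2581", " ").strip()

print(detokenize(["\u2581good", "\u2581afternoon", "\u2581happy", "\u2581new", "\u2581year"]))
# good afternoon happy new year
```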

To optimize and convert the model to a format that can run on Android, run the following commands:
```
mkdir -p StreamingASR/app/src/main/assets
python save_model_for_mobile.py
mv streaming_asr.ptl StreamingASR/app/src/main/assets
```
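
The conversion script itself is in the repo, but as a rough guide, a typical TorchScript-to-lite-interpreter conversion for `pytorch_android_lite` looks like the sketch below. This is an assumption about what `save_model_for_mobile.py` likely does, not its actual contents; the file names are taken from the commands above.

```python
# Sketch of a typical mobile conversion (the real save_model_for_mobile.py
# may differ): load the scripted model, optimize it for mobile, and save it
# in the lite-interpreter (.ptl) format that pytorch_android_lite loads.
import os
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

if os.path.exists("scripted_wrapper_tuple_no_transform.pt"):
    model = torch.jit.load("scripted_wrapper_tuple_no_transform.pt")
    model.eval()
    optimized = optimize_for_mobile(model)
    optimized._save_for_lite_interpreter("streaming_asr.ptl")
```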

### 3. Build and run with Android Studio

Start Android Studio, open the project located in `android-demo-app/StreamingASR/StreamingASR`, build and run the app on an Android device. After the app runs, tap the Start button and start saying something. Some example recognition results are:

![](screenshot1.png)
![](screenshot2.png)
![](screenshot3.png)

## Librosa C++, Eigen, and JNI

Note that this demo uses a [C++ port](https://github.com/ewan-xu/LibrosaCpp/) of [Librosa](https://librosa.org), a popular Python audio processing library, to perform the MelSpectrogram transform. The Python script `run_sasr.py` above uses torchaudio's [MelSpectrogram](https://pytorch.org/audio/stable/transforms.html#melspectrogram), but you can achieve the same result by replacing `spectrogram = transform(tensor).transpose(1, 0)`, line 46 of `run_sasr.py`, with:
```
mel = librosa.feature.melspectrogram(np_array, sr=16000, n_fft=400, n_mels=80, hop_length=160)
spectrogram = torch.tensor(mel).transpose(1, 0)
```
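
Either way, the transform parameters fix the feature geometry the model consumes. The arithmetic below simply restates the parameters above to show the frame rate and window sizes they imply:

```python
# Feature geometry implied by the MelSpectrogram parameters used above:
# a 25 ms analysis window, a 10 ms hop (100 mel frames per second),
# each frame an 80-dimensional mel vector.
sr, n_fft, hop_length, n_mels = 16000, 400, 160, 80

window_ms = 1000 * n_fft / sr         # 400 samples / 16 kHz = 25.0 ms
hop_ms = 1000 * hop_length / sr       # 160 samples / 16 kHz = 10.0 ms
frames_per_second = sr // hop_length  # 100 mel frames per second

print(window_ms, hop_ms, frames_per_second)  # 25.0 10.0 100
```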

Because torchaudio currently doesn't support FFT on Android (see [here](https://github.com/pytorch/audio/issues/408)), the demo instead uses the Librosa C++ port via [JNI](https://developer.android.com/training/articles/perf-jni) (Java Native Interface) to compute the MelSpectrogram on device. The Librosa C++ port in turn requires [Eigen](https://eigen.tuxfamily.org/), a C++ template library for linear algebra, so both the port and Eigen are included in the demo app and built as a JNI library, using the `CMakeLists.txt` and `MainActivityJNI.cpp` in `StreamingASR/app/src/main/cpp`.
15 changes: 15 additions & 0 deletions StreamingASR/StreamingASR/.gitignore
@@ -0,0 +1,15 @@
*.iml
.gradle
/local.properties
/.idea/caches
/.idea/libraries
/.idea/modules.xml
/.idea/workspace.xml
/.idea/navEditor.xml
/.idea/assetWizardSettings.xml
.DS_Store
/build
/captures
.externalNativeBuild
.cxx
local.properties
1 change: 1 addition & 0 deletions StreamingASR/StreamingASR/app/.gitignore
@@ -0,0 +1 @@
/build
54 changes: 54 additions & 0 deletions StreamingASR/StreamingASR/app/build.gradle
@@ -0,0 +1,54 @@
plugins {
    id 'com.android.application'
}

android {
    compileSdkVersion 31

    defaultConfig {
        applicationId "org.pytorch.demo.streamingasr"
        minSdkVersion 28
        targetSdkVersion 31
        versionCode 1
        versionName "1.0"

        testInstrumentationRunner "androidx.test.runner.AndroidJUnitRunner"

        externalNativeBuild {
            cmake {
                cppFlags ""
                arguments "-DLOGGER_BUILD_HEADER_LIB=ON", "-DBUILD_TESTING=OFF"
            }
        }
    }

    buildTypes {
        release {
            minifyEnabled false
            proguardFiles getDefaultProguardFile('proguard-android-optimize.txt')
        }
    }

    compileOptions {
        sourceCompatibility JavaVersion.VERSION_1_8
        targetCompatibility JavaVersion.VERSION_1_8
    }

    externalNativeBuild {
        cmake {
            path "../../CMakeLists.txt"
            version "3.10.2"
        }
    }
}

dependencies {
    implementation 'androidx.appcompat:appcompat:1.4.0'
    implementation 'com.google.android.material:material:1.4.0'
    implementation 'androidx.constraintlayout:constraintlayout:2.1.2'
    testImplementation 'junit:junit:4.+'
    androidTestImplementation 'androidx.test.ext:junit:1.1.3'
    androidTestImplementation 'androidx.test.espresso:espresso-core:3.4.0'

    implementation 'org.pytorch:pytorch_android_lite:1.10.0'
}
26 changes: 26 additions & 0 deletions StreamingASR/StreamingASR/app/src/main/AndroidManifest.xml
@@ -0,0 +1,26 @@
<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="org.pytorch.demo.streamingasr">

    <uses-permission android:name="android.permission.RECORD_AUDIO" />
    <uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
    <uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />

    <application
        android:allowBackup="true"
        android:icon="@mipmap/ic_launcher"
        android:label="@string/app_name"
        android:roundIcon="@mipmap/ic_launcher_round"
        android:supportsRtl="true"
        android:theme="@style/Theme.StreamingASR">
        <activity
            android:name=".MainActivity"
            android:exported="true">
            <intent-filter>
                <action android:name="android.intent.action.MAIN" />

                <category android:name="android.intent.category.LAUNCHER" />
            </intent-filter>
        </activity>
    </application>

</manifest>
5 changes: 5 additions & 0 deletions StreamingASR/StreamingASR/app/src/main/cpp/CMakeLists.txt
@@ -0,0 +1,5 @@
cmake_minimum_required(VERSION 3.4.1)
project(CTest LANGUAGES C CXX)

add_library( MainActivityJNI SHARED MainActivityJNI.cpp )
target_link_libraries( MainActivityJNI Eigen3::Eigen)
92 changes: 92 additions & 0 deletions StreamingASR/StreamingASR/app/src/main/cpp/MainActivityJNI.cpp
@@ -0,0 +1,92 @@
#include <jni.h>
#include <string>
#include <vector>
#include <complex>

#include <Eigen/Dense>
#include "librosa/librosa.h"

using std::vector;

extern "C" JNIEXPORT jobject JNICALL
Java_org_pytorch_demo_streamingasr_MainActivity_melSpectrogram(JNIEnv* env, jobject obj,
                                                               jdoubleArray data) {
    // Copy the Java double[] into a float vector for the Librosa C++ port.
    int len = env->GetArrayLength(data);
    std::vector<float> x;
    x.reserve(len);
    jdouble* element = env->GetDoubleArrayElements(data, nullptr);
    for (int i = 0; i < len; ++i) {
        x.push_back((float) element[i]);
    }
    // Release the pinned array; JNI_ABORT skips copying back the unmodified data.
    env->ReleaseDoubleArrayElements(data, element, JNI_ABORT);

    // MelSpectrogram parameters matching the features the model expects.
    int n_fft = 400;
    int n_hop = 160;
    int n_mel = 80;
    int fmin = 0;
    int fmax = 8000;
    int sr = 16000;

    std::vector<std::vector<float>> mels = librosa::Feature::melspectrogram(
        x, sr, n_fft, n_hop, "hann", true, "reflect", 2.f, n_mel, fmin, fmax);

    // Build a java.util.Vector<java.util.Vector<java.lang.Float>> to return
    // the mel frames to the Java side.
    jclass vectorClass = env->FindClass("java/util/Vector");
    if (vectorClass == NULL) {
        return NULL;
    }

    jclass floatClass = env->FindClass("java/lang/Float");
    if (floatClass == NULL) {
        return NULL;
    }

    jmethodID vectorConstructorID = env->GetMethodID(vectorClass, "<init>", "()V");
    if (vectorConstructorID == NULL) {
        return NULL;
    }

    jmethodID addMethodID = env->GetMethodID(vectorClass, "add", "(Ljava/lang/Object;)Z");
    if (addMethodID == NULL) {
        return NULL;
    }

    jmethodID floatConstructorID = env->GetMethodID(floatClass, "<init>", "(F)V");
    if (floatConstructorID == NULL) {
        return NULL;
    }

    jobject outerVector = env->NewObject(vectorClass, vectorConstructorID);
    if (outerVector == NULL) {
        return NULL;
    }

    for (const vector<float>& frame : mels) {
        jobject innerVector = env->NewObject(vectorClass, vectorConstructorID);

        for (float f : frame) {
            // Box each float as java.lang.Float before adding it to the Vector.
            jobject floatValue = env->NewObject(floatClass, floatConstructorID, f);
            if (floatValue == NULL) {
                return NULL;
            }
            env->CallBooleanMethod(innerVector, addMethodID, floatValue);
        }

        env->CallBooleanMethod(outerVector, addMethodID, innerVector);
    }

    env->DeleteLocalRef(vectorClass);
    env->DeleteLocalRef(floatClass);

    return outerVector;
}