This repository has been archived by the owner on Aug 28, 2024. It is now read-only.

Streaming ASR Android Demo App (#230)
* initial commit

* Revert "initial commit"

This reverts commit 5a65775.

* main readme and helloworld/demo app readme updates

* streaming asr code complete

* scripts to run and prepare model for mobile

* Android code cleanup; README and Python scripts update

* README update

* README, script and screenshots update

* removed eigen from repo

* README and code update

* app name update

* PR feedback
jeffxtang committed Jan 13, 2022
1 parent 76ba0e0 commit 486fc7a
Showing 42 changed files with 1,827 additions and 0 deletions.
6 changes: 6 additions & 0 deletions StreamingASR/CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
cmake_minimum_required(VERSION 3.4.1)

project(StreamingASR)

add_subdirectory(external/eigen)
add_subdirectory(StreamingASR/app/src/main/cpp)
76 changes: 76 additions & 0 deletions StreamingASR/README.md
@@ -0,0 +1,76 @@
# Streaming Speech Recognition on Android with Emformer-RNNT-based Model

## Introduction

In the Speech Recognition Android [demo app](https://github.com/pytorch/android-demo-app/tree/master/SpeechRecognition), we showed how to use the [wav2vec 2.0](https://github.com/pytorch/fairseq/tree/master/examples/wav2vec) model on an Android demo app to perform non-continuous speech recognition. Here we go one step further, using a torchaudio [Emformer-RNNT-based ASR](https://pytorch.org/audio/main/prototype.pipelines.html#torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_LIBRISPEECH) model on Android to perform streaming speech recognition.

## Prerequisites

* PyTorch 1.10.1 and torchaudio 0.10.1 or above (Optional)
* Python 3.8 (Optional)
* Android PyTorch library org.pytorch:pytorch_android_lite:1.10.0
* Android Studio 4.0.1 or later

## Quick Start

### 1. Get the Repo

Simply run the commands below:

```
git clone https://github.com/pytorch/android-demo-app
cd android-demo-app/StreamingASR
```

If you don't have PyTorch 1.10.1 and torchaudio 0.10.1 installed, or just want a quick try of the demo app, you can download the optimized scripted model file [streaming_asr.ptl](https://drive.google.com/file/d/1awT_1S6H5IXSOOqpFLmpeg0B-kQVWG2y/view?usp=sharing), drag and drop it into the `StreamingASR/app/src/main/assets` folder inside `android-demo-app/StreamingASR`, and skip to Step 3.

You also need to download [Eigen](https://eigen.tuxfamily.org/), a C++ template library for linear algebra, which the Android NDK build requires to build and run the app (see the last section of this README for more info):
```
mkdir external; cd external
git clone https://github.com/jeffxtang/eigen
```

### 2. Test and Prepare the Model

To install PyTorch 1.10.1, torchaudio 0.10.1, and other required Python packages (numpy and pyaudio), do something like this:

```
conda create -n pt1.10.1 python=3.8.5
conda activate pt1.10.1
pip install torch torchaudio numpy pyaudio
```

Now download the streaming ASR model file
[scripted_wrapper_tuple_no_transform.pt](https://drive.google.com/file/d/1_49DwHS_a3p3THGdHZj3TXmjNJj60AhP/view?usp=sharing) (the script used to create the model will be published soon) to the `android-demo-app/StreamingASR` directory.

To test the model, run `python run_sasr.py`. After you see:
```
Initializing model...
Initialization complete.
```
say something like "good afternoon happy new year", and you'll likely see the streaming recognition results `▁good ▁afternoon ▁happy ▁new ▁year` while you speak. Hit Ctrl-C to end.
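
The `▁` prefix is the sentencepiece-style word-boundary marker, so the raw token stream maps back to plain text by joining the tokens and turning each `▁` into a space. A minimal sketch (the `detokenize` helper is hypothetical, not part of this repo):

```python
# Hypothetical helper (not part of this repo): turn the streamed
# sentencepiece-style tokens into readable text. "\u2581" is the
# "▁" word-boundary marker emitted by the model.
def detokenize(tokens):
    return "".join(tokens).replace("\u2581", " ").strip()

print(detokenize(["\u2581good", "\u2581afternoon", "\u2581happy", "\u2581new", "\u2581year"]))
# good afternoon happy new year
```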

To optimize and convert the model to a format that can run on Android, run the following commands:
```
mkdir -p StreamingASR/app/src/main/assets
python save_model_for_mobile.py
mv streaming_asr.ptl StreamingASR/app/src/main/assets
```
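
The conversion script itself is in the repo, but as a rough guide, a typical TorchScript-to-lite-interpreter conversion for `pytorch_android_lite` looks like the sketch below. This is an assumption about what `save_model_for_mobile.py` likely does, not its actual contents; the file names are taken from the commands above.

```python
# Sketch of a typical mobile conversion (the real save_model_for_mobile.py
# may differ): load the scripted model, optimize it for mobile, and save it
# in the lite-interpreter (.ptl) format that pytorch_android_lite loads.
import os
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

if os.path.exists("scripted_wrapper_tuple_no_transform.pt"):
    model = torch.jit.load("scripted_wrapper_tuple_no_transform.pt")
    model.eval()
    optimized = optimize_for_mobile(model)
    optimized._save_for_lite_interpreter("streaming_asr.ptl")
```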

### 3. Build and run with Android Studio

Start Android Studio, open the project located in `android-demo-app/StreamingASR/StreamingASR`, build and run the app on an Android device. After the app runs, tap the Start button and start saying something. Some example recognition results are:

![](screenshot1.png)
![](screenshot2.png)
![](screenshot3.png)

## Librosa C++, Eigen, and JNI

Note that this demo uses a [C++ port](https://github.com/ewan-xu/LibrosaCpp/) of [Librosa](https://librosa.org), a popular Python audio processing library, to perform the MelSpectrogram transform. The Python script `run_sasr.py` above uses torchaudio's [MelSpectrogram](https://pytorch.org/audio/stable/transforms.html#melspectrogram), but you can achieve the same result by replacing `spectrogram = transform(tensor).transpose(1, 0)`, line 46 of `run_sasr.py`, with:
```
mel = librosa.feature.melspectrogram(np_array, sr=16000, n_fft=400, n_mels=80, hop_length=160)
spectrogram = torch.tensor(mel).transpose(1, 0)
```
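
Either way, the transform parameters fix the feature geometry the model consumes. The arithmetic below simply restates the parameters above to show the frame rate and window sizes they imply:

```python
# Feature geometry implied by the MelSpectrogram parameters used above:
# a 25 ms analysis window, a 10 ms hop (100 mel frames per second),
# each frame an 80-dimensional mel vector.
sr, n_fft, hop_length, n_mels = 16000, 400, 160, 80

window_ms = 1000 * n_fft / sr         # 400 samples / 16 kHz = 25.0 ms
hop_ms = 1000 * hop_length / sr       # 160 samples / 16 kHz = 10.0 ms
frames_per_second = sr // hop_length  # 100 mel frames per second

print(window_ms, hop_ms, frames_per_second)  # 25.0 10.0 100
```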

Because torchaudio currently doesn't support FFT on Android (see [here](https://github.com/pytorch/audio/issues/408)), the demo instead uses the Librosa C++ port via [JNI](https://developer.android.com/training/articles/perf-jni) (Java Native Interface) to compute the MelSpectrogram on device. The Librosa C++ port in turn requires [Eigen](https://eigen.tuxfamily.org/), a C++ template library for linear algebra, so both the port and Eigen are included in the demo app and built as a JNI library, using the `CMakeLists.txt` and `MainActivityJNI.cpp` in `StreamingASR/app/src/main/cpp`.
15 changes: 15 additions & 0 deletions StreamingASR/StreamingASR/.gitignore
@@ -0,0 +1,15 @@
*.iml
.gradle
/local.properties
/.idea/caches
/.idea/libraries
/.idea/modules.xml
/.idea/workspace.xml
/.idea/navEditor.xml
/.idea/assetWizardSettings.xml
.DS_Store
/build
/captures
.externalNativeBuild
.cxx
local.properties
1 change: 1 addition & 0 deletions StreamingASR/StreamingASR/app/.gitignore
@@ -0,0 +1 @@
/build
54 changes: 54 additions & 0 deletions StreamingASR/StreamingASR/app/build.gradle
@@ -0,0 +1,54 @@
plugins {
    id 'com.android.application'
}

android {
    compileSdkVersion 31

    defaultConfig {
        applicationId "org.pytorch.demo.streamingasr"
        minSdkVersion 28
        targetSdkVersion 31
        versionCode 1
        versionName "1.0"

        testInstrumentationRunner "androidx.test.runner.AndroidJUnitRunner"

        externalNativeBuild {
            cmake {
                cppFlags ""
                arguments "-DLOGGER_BUILD_HEADER_LIB=ON", "-DBUILD_TESTING=OFF"
            }
        }
    }

    buildTypes {
        release {
            minifyEnabled false
            proguardFiles getDefaultProguardFile('proguard-android-optimize.txt')
        }
    }

    compileOptions {
        sourceCompatibility JavaVersion.VERSION_1_8
        targetCompatibility JavaVersion.VERSION_1_8
    }

    externalNativeBuild {
        cmake {
            path "../../CMakeLists.txt"
            version "3.10.2"
        }
    }
}

dependencies {
    implementation 'androidx.appcompat:appcompat:1.4.0'
    implementation 'com.google.android.material:material:1.4.0'
    implementation 'androidx.constraintlayout:constraintlayout:2.1.2'
    testImplementation 'junit:junit:4.+'
    androidTestImplementation 'androidx.test.ext:junit:1.1.3'
    androidTestImplementation 'androidx.test.espresso:espresso-core:3.4.0'

    implementation 'org.pytorch:pytorch_android_lite:1.10.0'
}
26 changes: 26 additions & 0 deletions StreamingASR/StreamingASR/app/src/main/AndroidManifest.xml
@@ -0,0 +1,26 @@
<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="org.pytorch.demo.streamingasr">

    <uses-permission android:name="android.permission.RECORD_AUDIO" />
    <uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
    <uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />

    <application
        android:allowBackup="true"
        android:icon="@mipmap/ic_launcher"
        android:label="@string/app_name"
        android:roundIcon="@mipmap/ic_launcher_round"
        android:supportsRtl="true"
        android:theme="@style/Theme.StreamingASR">
        <activity
            android:name=".MainActivity"
            android:exported="true">
            <intent-filter>
                <action android:name="android.intent.action.MAIN" />

                <category android:name="android.intent.category.LAUNCHER" />
            </intent-filter>
        </activity>
    </application>

</manifest>
5 changes: 5 additions & 0 deletions StreamingASR/StreamingASR/app/src/main/cpp/CMakeLists.txt
@@ -0,0 +1,5 @@
cmake_minimum_required(VERSION 3.4.1)
project(CTest LANGUAGES C CXX)

add_library( MainActivityJNI SHARED MainActivityJNI.cpp )
target_link_libraries( MainActivityJNI Eigen3::Eigen)
92 changes: 92 additions & 0 deletions StreamingASR/StreamingASR/app/src/main/cpp/MainActivityJNI.cpp
@@ -0,0 +1,92 @@
#include <jni.h>
#include <string>
#include <vector>
#include <complex>

#include <Eigen/Dense>
#include "librosa/librosa.h"

using std::vector;

extern "C" JNIEXPORT jobject JNICALL
Java_org_pytorch_demo_streamingasr_MainActivity_melSpectrogram(JNIEnv* env, jobject obj,
                                                               jdoubleArray data) {
    // Copy the Java double[] into a float vector for the Librosa C++ port.
    int len = env->GetArrayLength(data);
    std::vector<float> x;
    x.reserve(len);
    jdouble* element = env->GetDoubleArrayElements(data, nullptr);
    for (int i = 0; i < len; ++i) {
        x.push_back((float) element[i]);
    }
    // Release the pinned array; JNI_ABORT skips copying back the unmodified data.
    env->ReleaseDoubleArrayElements(data, element, JNI_ABORT);

    // MelSpectrogram parameters matching the features the model expects.
    int n_fft = 400;
    int n_hop = 160;
    int n_mel = 80;
    int fmin = 0;
    int fmax = 8000;
    int sr = 16000;

    std::vector<std::vector<float>> mels = librosa::Feature::melspectrogram(
        x, sr, n_fft, n_hop, "hann", true, "reflect", 2.f, n_mel, fmin, fmax);

    // Build a java.util.Vector<java.util.Vector<java.lang.Float>> to return
    // the mel frames to the Java side.
    jclass vectorClass = env->FindClass("java/util/Vector");
    if (vectorClass == NULL) {
        return NULL;
    }

    jclass floatClass = env->FindClass("java/lang/Float");
    if (floatClass == NULL) {
        return NULL;
    }

    jmethodID vectorConstructorID = env->GetMethodID(vectorClass, "<init>", "()V");
    if (vectorConstructorID == NULL) {
        return NULL;
    }

    jmethodID addMethodID = env->GetMethodID(vectorClass, "add", "(Ljava/lang/Object;)Z");
    if (addMethodID == NULL) {
        return NULL;
    }

    jmethodID floatConstructorID = env->GetMethodID(floatClass, "<init>", "(F)V");
    if (floatConstructorID == NULL) {
        return NULL;
    }

    jobject outerVector = env->NewObject(vectorClass, vectorConstructorID);
    if (outerVector == NULL) {
        return NULL;
    }

    for (const vector<float>& frame : mels) {
        jobject innerVector = env->NewObject(vectorClass, vectorConstructorID);

        for (float f : frame) {
            // Box each float as java.lang.Float before adding it to the Vector.
            jobject floatValue = env->NewObject(floatClass, floatConstructorID, f);
            if (floatValue == NULL) {
                return NULL;
            }
            env->CallBooleanMethod(innerVector, addMethodID, floatValue);
        }

        env->CallBooleanMethod(outerVector, addMethodID, innerVector);
    }

    env->DeleteLocalRef(vectorClass);
    env->DeleteLocalRef(floatClass);

    return outerVector;
}