Is there an Arabic model in sherpa-ncnn? #321

Open
HassanTen opened this issue Mar 7, 2024 · 10 comments
Comments

@HassanTen

Arabic models
The only Arabic model I found is icefall-asr-mgb2-conformer_ctc-2022-27-06, which is for icefall.
I would like an Arabic model for sherpa-ncnn.

@csukuangfj
Collaborator

No, we don't have one.

@HassanTen
Author

Can I create my own model, and how?

@csukuangfj
Collaborator

Please refer to icefall.

@HassanTen
Author

Can I use the following steps to create my own speech model that recognizes the numbers 0 to 13, for use with sherpa-ncnn? They are based on
https://github.com/k2-fsa/colab/tree/master/icefall

  1. Setting Up the Environment
    Install Required Libraries
    Begin by installing essential tools like PyTorch and torchaudio. Ensure compatibility between versions.

!pip install torchaudio==2.0.2
!pip install k2==1.24.3.dev20230718+cuda11.8.torch2.0.1 -f https://k2-fsa.github.io/k2/cuda.html

  2. Preparing the Dataset
    Create Custom Audio Data for Numbers
    Record or collect audio files containing spoken numbers from 0 to 13.
    Save the recordings in WAV format, naming them like 0.wav, 1.wav, ..., 13.wav.
    Uploading Data to Google Colab
    Upload the files to a folder in your Google Drive.
    Mount Google Drive in Colab:

from google.colab import drive
drive.mount('/content/drive')

Copy the audio files to a folder in Colab.
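
Before moving on, it may help to confirm that every expected recording actually reached the Colab folder. The snippet below is a minimal sketch using only the standard library; the helper name `missing_number_files` and the folder `/content/numbers` are hypothetical and not part of icefall or Colab.

```python
from pathlib import Path


def missing_number_files(folder, count=14):
    """Return the expected file names (0.wav .. 13.wav) not found in `folder`."""
    folder = Path(folder)
    expected = [f"{i}.wav" for i in range(count)]
    return [name for name in expected if not (folder / name).is_file()]


# Hypothetical folder; replace it with wherever the recordings were copied.
missing = missing_number_files("/content/numbers")
print("missing files:", missing)
```

An empty list means all 14 files are in place.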

  3. Setting Up Icefall
    Clone Icefall

!git clone https://github.com/k2-fsa/icefall
!cd icefall && pip install -r requirements.txt

Set the Environment Variable

import os
os.environ['PYTHONPATH'] = '/content/icefall:' + os.environ.get('PYTHONPATH', '')
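
One caveat with this step: the `PYTHONPATH` environment variable is read when an interpreter starts, so changing it inside a running notebook affects child processes (such as `!` commands) but not imports in the notebook itself. A minimal sketch of a helper that covers both cases follows; the name `add_to_python_path` is hypothetical.

```python
import os
import sys


def add_to_python_path(path):
    """Make `path` importable here and in child processes (e.g. `!` commands)."""
    # Child processes read PYTHONPATH from the environment at startup.
    current = os.environ.get("PYTHONPATH", "")
    parts = [p for p in current.split(os.pathsep) if p]
    if path not in parts:
        os.environ["PYTHONPATH"] = os.pathsep.join([path] + parts)
    # The running interpreter only consults sys.path, so update it as well.
    if path not in sys.path:
        sys.path.insert(0, path)


add_to_python_path("/content/icefall")
```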

  4. Preparing Data with Lhotse
    Lhotse is used to prepare audio data and create metadata for training.

Install Lhotse

!pip install git+https://github.com/lhotse-speech/lhotse

Create Metadata
Use the following script to prepare your data:

from lhotse import CutSet, Recording, RecordingSet, SupervisionSegment, SupervisionSet

# Create recordings
recordings = []
for i in range(14):
    recordings.append(Recording.from_file(f"/path/to/{i}.wav", recording_id=str(i)))

# Create supervision segments (one per recording, covering the whole file)
supervisions = []
for i, recording in enumerate(recordings):
    supervisions.append(
        SupervisionSegment(
            id=str(i),
            recording_id=recording.id,
            start=0,
            duration=recording.duration,
            text=str(i),
        )
    )

# Combine them into a CutSet (from_manifests expects manifest sets, not plain lists)
cuts = CutSet.from_manifests(
    recordings=RecordingSet.from_recordings(recordings),
    supervisions=SupervisionSet.from_segments(supervisions),
)

# Save the data
cuts.to_file("data/cuts.jsonl.gz")
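
Manifest creation fails or gives misleading durations if a WAV file is unreadable or recorded at an unexpected sample rate, so a quick check beforehand can save time. The sketch below uses only the standard-library `wave` module and does not touch Lhotse. The helper names and the 16000 Hz default are assumptions for illustration; use whatever rate the chosen recipe expects.

```python
import wave


def wav_info(path):
    """Return (sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as f:
        rate = f.getframerate()
        return rate, f.getnframes() / rate


def check_recordings(paths, expected_rate=16000):
    """Yield (path, problem) for files that are unreadable, empty, or at another rate."""
    for path in paths:
        try:
            rate, duration = wav_info(path)
        except (OSError, EOFError, wave.Error) as exc:
            yield path, f"unreadable: {exc}"
            continue
        if rate != expected_rate:
            yield path, f"sample rate {rate} Hz, expected {expected_rate} Hz"
        elif duration == 0:
            yield path, "empty recording"


# Same placeholder paths as in the script above.
for path, problem in check_recordings([f"/path/to/{i}.wav" for i in range(14)]):
    print(path, "->", problem)
```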

  5. Training the Model Using Icefall
    Modify the Training Recipe
    Navigate to a training recipe such as yesno and modify it to use your custom dataset.

!cd /content/icefall/egs/yesno/ASR && ./prepare.sh

Update Dataset Configuration
Replace the dataset paths with the cuts.jsonl.gz file you created.

!export PYTHONPATH=/content/icefall:$PYTHONPATH && cd /content/icefall/egs/yesno/ASR && ./tdnn/train.py

  6. Validating the Model
    Test the Model
    Use the same dataset or new data to evaluate the model:

!export PYTHONPATH=/content/icefall:$PYTHONPATH && cd /content/icefall/egs/yesno/ASR && ./tdnn/decode.py

  7. Converting the Model for SherpaNCNN
    Export the model to TorchScript format:

torch.jit.save(torch.jit.script(model), "model.pt")

Are the previous steps correct?
How can I convert the model (model.pt) to NCNN format?

@csukuangfj
Collaborator

No.

Only the 3 model types listed in the icefall doc can be converted to sherpa-ncnn.

@HassanTen
Author

You said to train a custom model on Icefall and then export it to ncnn using:

Export streaming Zipformer transducer models to ncnn

Export ConvEmformer transducer models to ncnn

Export LSTM transducer models to ncnn

Is my understanding correct? If so, why did you answer "No" to my steps for creating my own model? The code I copied comes from the icefall Colab folder, which contains only two notebooks:
https://github.com/k2-fsa/colab/tree/master/icefall

If there is another link I should follow instead, please share it.

@csukuangfj
Collaborator

Because you use tdnn.

@csukuangfj
Collaborator

Please reread the doc.

tdnn is not one of the 3 supported models.

@HassanTen
Author

Please reread the doc.

tdnn is not one of the 3 supported models.

Which of the two files in Icefall do you recommend I use to create my own model containing the numbers from 0 to 13 on Google Colab, so I can use it in my Android project?
https://github.com/k2-fsa/sherpa-ncnn/tree/master/android/SherpaNcnn

yes_no_dataset_recipe_with_CPU.ipynb

ctc_forced_alignment_fst_based_kaldi.ipynb

Also, which option should I choose?

Export streaming Zipformer transducer models to ncnn

Export ConvEmformer transducer models to ncnn

Export LSTM transducer models to ncnn

@csukuangfj
Collaborator

csukuangfj commented Dec 9, 2024

Neither of these two files.

As said before, only the 3 models listed in the doc can be exported to ncnn, which means you MUST choose one of them to train your own model.
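
The rule above can be written down as a simple lookup. This is only an illustrative sketch: the set contains the three model families named earlier in this thread, and the names `NCNN_EXPORTABLE` and `can_export_to_ncnn` are hypothetical, not part of icefall or sherpa-ncnn.

```python
# Model families this thread names as exportable to ncnn (see the icefall doc).
NCNN_EXPORTABLE = {
    "streaming zipformer transducer",
    "convemformer transducer",
    "lstm transducer",
}


def can_export_to_ncnn(model_family):
    """True if the (case-insensitive) model family is one of the three listed."""
    return model_family.strip().lower() in NCNN_EXPORTABLE


print(can_export_to_ncnn("LSTM transducer"))  # True
print(can_export_to_ncnn("tdnn"))             # False
```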
