Same continuous phonemes are aggregated when computing gop features via compute-gop #4919

a2d8a4v · 2024-06-23T07:21:42Z

[Problem Statement]
In computer-assisted pronunciation training, we use time-aligned information to compute the pronunciation features such as goodness of pronunciation (GOP). We want each phoneme to be separately processed to obtain their features or scores. However, in the original implementation of compute-gop:L163, it used phoneme transition to decide if it is the next phoneme or not to recompute the phoneme duration, which encounters the problem that if some word is composed of some continuous duplicated phonemes, for example:

SUDDENNESS S AH1 D AH0 N N AH0 S

it finally makes an outcome for a single N.

[Solution]
Add the phoneme boundary information to solve such a case.

`queue.pl -config ...` should be revised as `queue.pl --config ...`, line 248

Same continuous phonemes are aggregated in comput-gop [Problem Statement] In computer-assisted pronunciation training, we use time-aligned information to compute the pronunciation features such as goodness of pronunciation (GOP). We want each phoneme to be separately processed to obtain their features or scores. However, in the original implementation of compute-gop:L163, it used phoneme transition to decide if it is the next phoneme or not to recompute the phoneme duration, which encounters the problem that if some word is composed of some continuous duplicated phonemes, for example: SOUNDNESS S AH1 D AH0 N N AH0 S it finally makes an outcome for a single N. [Solution] Add the phoneme boundary information to avoid such a case.

danpovey · 2024-06-23T07:30:42Z

@jimbozhang perhaps you'd like to comment?
If you don't object, I'm inclined to just merge this when it's marked done (i.e. draft label removed)

jimbozhang · 2024-06-23T19:04:54Z

@a2d8a4v Thanks for fixing the issue. I think it is reasonable. Could you please ensure the modified recipe has been thoroughly tested, as I won't have time to do so myself?

csukuangfj · 2024-06-24T01:23:15Z

src/bin/compute-gop.cc

@@ -130,16 +133,17 @@ int main(int argc, char *argv[]) {

    po.Read(argc, argv);

-    if (po.NumArgs() != 4 && po.NumArgs() != 5) {
+    if (po.NumArgs() != 6) {


This line is problematic.

The help message at line 111 says it takes 5 required positional arguments and 1 optional one. So 5 is also a valid value. But this line requires that the number of arguments has to be exactly 6.

I am wondering how you have tested the changes.

[<phone-feature-wspecifier>] the [] means it is optional.

Thank you for reviewing my code. I got it, thank you for your explanation, I fix this condition latter.

I just realized it is a problem with the original code.
I think the [] in [<phone-feature-wspecifier>] can be removed. The code does actually need it as a required argument.

csukuangfj · 2024-06-24T01:28:13Z

src/bin/compute-gop.cc

+      std::vector<int32> phone_boundary;
+      for (int32 i = 0; i < split.size(); i++) {
+        for (int32 j = 0; j < split[i].size(); j++) {
+          phone_boundary.push_back(static_cast<int32>(i));


i is already of type int32 and there is no need to cast it to int32.

OK, I got it. I will remove 'static_cast'

@csukuangfj

With the advisement from code reviewer @csukuangfj, I updated the code, addressing all the identified issues and implementing best practices.

a2d8a4v · 2024-06-24T03:31:49Z

Hi, @jimbozhang

I have already tested the updated code with two corpora: speechocean762 and L2-ARCTIC.

The speechocean762 corpus has the problem of duplicated continuous phonemes. The test for this corpus is to check if this influences the original results from here in terms of pure phone sequences and their lengths.

This phenomenon exists in the L2-ARCTIC corpus. For example, in the utterance with the identity number 'arctic_a0086'.

jimbozhang · 2024-06-28T08:33:44Z

Hi @a2d8a4v,

Could you do me a favor? I'd like to remove the Google Docs link from the top of egs/gop_speechocean762/README.md, but I don't want to create a separate pull request just for this. If it's not too much trouble, could you include this change in your current PR?

--- a/egs/gop_speechocean762/README.md
+++ b/egs/gop_speechocean762/README.md
@@ -1,8 +1,3 @@
-There is a copy of this document on Google Docs, which renders the equations better:
-[link](https://docs.google.com/document/d/1pie-PU6u2NZZC_FzocBGGm6mpfBJMiCft9UoG0uA1kA/edit?usp=sharing)
-
-* * *
-
 # GOP on Kaldi

 The Goodness of Pronunciation (GOP) is a variation of the posterior probability, for phone level pronunciation scoring.

Thanks alot.

@jimbozhang

Remove the deprecated document link from the request of @jimbozhang.

a2d8a4v · 2024-06-29T05:01:38Z

Hi, @jimbozhang, I've dealt with it.

a2d8a4v added 2 commits December 6, 2020 22:17

Missing hyphen in --config option

ae89665

`queue.pl -config ...` should be revised as `queue.pl --config ...`, line 248

Merge branch 'kaldi-asr:master' into master

32e4b9a

a2d8a4v marked this pull request as draft June 23, 2024 07:25

a2d8a4v added 2 commits June 23, 2024 15:28

Update run.sh due to the changes in comput-gop.cc

7c1c5e6

a2d8a4v changed the title ~~Same continuous phonemes are aggregated in comput-gop~~ Same continuous phonemes are aggregated in compute-gop Jun 23, 2024

a2d8a4v changed the title ~~Same continuous phonemes are aggregated in compute-gop~~ Same continuous phonemes are aggregated when computing gop features via compute-gop Jun 23, 2024

a2d8a4v marked this pull request as ready for review June 23, 2024 07:39

csukuangfj reviewed Jun 24, 2024

View reviewed changes

Correct the last commit of compute-gop.cc

768a23e

With the advisement from code reviewer @csukuangfj, I updated the code, addressing all the identified issues and implementing best practices.

Update README.md

0e61da9

Remove the deprecated document link from the request of @jimbozhang.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Same continuous phonemes are aggregated when computing gop features via compute-gop #4919

Same continuous phonemes are aggregated when computing gop features via compute-gop #4919

a2d8a4v commented Jun 23, 2024 •

edited

Loading

danpovey commented Jun 23, 2024

jimbozhang commented Jun 23, 2024

csukuangfj Jun 24, 2024 •

edited

Loading

a2d8a4v Jun 24, 2024

csukuangfj Jun 24, 2024

csukuangfj Jun 24, 2024

a2d8a4v Jun 24, 2024

a2d8a4v commented Jun 24, 2024

jimbozhang commented Jun 28, 2024 •

edited

Loading

a2d8a4v commented Jun 29, 2024

Same continuous phonemes are aggregated when computing gop features via compute-gop #4919

Are you sure you want to change the base?

Same continuous phonemes are aggregated when computing gop features via compute-gop #4919

Conversation

a2d8a4v commented Jun 23, 2024 • edited Loading

danpovey commented Jun 23, 2024

jimbozhang commented Jun 23, 2024

csukuangfj Jun 24, 2024 • edited Loading

Choose a reason for hiding this comment

a2d8a4v Jun 24, 2024

Choose a reason for hiding this comment

csukuangfj Jun 24, 2024

Choose a reason for hiding this comment

csukuangfj Jun 24, 2024

Choose a reason for hiding this comment

a2d8a4v Jun 24, 2024

Choose a reason for hiding this comment

a2d8a4v commented Jun 24, 2024

jimbozhang commented Jun 28, 2024 • edited Loading

a2d8a4v commented Jun 29, 2024

a2d8a4v commented Jun 23, 2024 •

edited

Loading

csukuangfj Jun 24, 2024 •

edited

Loading

jimbozhang commented Jun 28, 2024 •

edited

Loading