Could not find example for Speaker diarization? #1145

Open
muneeb-ahmed-mcs opened this issue Jul 31, 2020 · 1 comment
Labels: samples (Issues that are directly related to samples.) · type: feature request (‘Nice-to-have’ improvement, new feature or different behavior or design.)

Comments


muneeb-ahmed-mcs commented Jul 31, 2020

Hi folks,
I'm having a hard time getting the data for multiple speakers, and there is no example for it. Google's official docs have no example for it either; you can see here: https://cloud.google.com/speech-to-text/docs/multiple-voices. Here is my code (the comments mark the changes I made for multiple speakers):

    <?php
    use Google\Cloud\Speech\V1\SpeechClient;
    use Google\Cloud\Speech\V1\RecognitionAudio;
    use Google\Cloud\Speech\V1\RecognitionConfig;
    use Google\Cloud\Speech\V1\RecognitionConfig\AudioEncoding;
    use Google\Cloud\Speech\V1\SpeakerDiarizationConfig; // added for speaker diarization

    // Composer autoloader; adjust the path for your project
    require __DIR__ . '/vendor/autoload.php';

    /** Uncomment and populate these variables in your code */
    // $audioFile = 'path to an audio file';

    // Change these variables if necessary
    $encoding = AudioEncoding::LINEAR16;
    $sampleRateHertz = 32000;
    $languageCode = 'en-US';

    if (!extension_loaded('grpc')) {
        throw new \Exception('Install the grpc extension (pecl install grpc)');
    }

    // When true, time offsets for every word will be included in the response.
    $enableWordTimeOffsets = true;

    // Get the contents of the file into a string
    $content = file_get_contents($audioFile);

    // Set the string as the audio content
    $audio = (new RecognitionAudio())
        ->setContent($content);

    // Added for speaker diarization: enable it and give a speaker-count range
    $speakerDiarizationConfig = (new SpeakerDiarizationConfig())
        ->setEnableSpeakerDiarization(true)
        ->setMinSpeakerCount(2)
        ->setMaxSpeakerCount(6);

    // Set the config
    $config = (new RecognitionConfig())
        ->setEncoding($encoding)
        ->setSampleRateHertz($sampleRateHertz)
        ->setLanguageCode($languageCode)
        ->setEnableWordTimeOffsets($enableWordTimeOffsets)
        ->setDiarizationConfig($speakerDiarizationConfig); // added for speaker diarization

    // Create the speech client
    $client = new SpeechClient();

    // Create the asynchronous recognize operation
    $operation = $client->longRunningRecognize($config, $audio);
    $operation->pollUntilComplete();

    if ($operation->operationSucceeded()) {
        $response = $operation->getResult();

        // Each result is for a consecutive portion of the audio. Iterate
        // through them to get the transcripts for the entire audio file.
        foreach ($response->getResults() as $result) {
            $alternatives = $result->getAlternatives();
            $mostLikely = $alternatives[0];

            foreach ($mostLikely->getWords() as $wordInfo) {
                $startTime = $wordInfo->getStartTime();
                $endTime = $wordInfo->getEndTime();
                printf('  Speaker %u Word: %s (start: %s, end: %s)' . PHP_EOL,
                    $wordInfo->getSpeakerTag(), // added for speaker diarization (comma was missing)
                    $wordInfo->getWord(),
                    $startTime->serializeToJsonString(),
                    $endTime->serializeToJsonString());
            }
        }
    } else {
        print_r($operation->getError());
    }

    $client->close();

Output:
Speaker %u Word: %s (start: %s, end: %s)
Speaker 0 this (start: "0s", end: "0.5s")
Speaker 0 is (start: "0.5s", end: "1.5s")
Speaker 0 an (start: "1.5s", end: "2.5s")
Speaker 0 entire (start: "2s", end: "3.5s")
Speaker 0 audio (start: "3.5s", end: "4.5s")
Speaker 0 sentence (start: "4.5s", end: "5.5s")
Speaker 0 that (start: "5.5s", end: "6.5s")
Speaker 0 google (start: "6.5s", end: "7.5s")
Speaker 0 give (start: "7.5s", end: "8.5s")
Speaker 0 me (start: "8.5s", end: "9.5s")
Speaker 0 in (start: "9.5s", end: "10.5s")
Speaker 0 its (start: "10.5s", end: "11.5s")
Speaker 0 response (start: "11.5s", end: "12.5s")

Speaker 1 this (start: "0s", end: "0.5s")
Speaker 1 is (start: "0.5s", end: "1.5s")
Speaker 1 an (start: "1.5s", end: "2.5s")
Speaker 1 entire (start: "2s", end: "3.5s")
Speaker 1 audio (start: "3.5s", end: "4.5s")
Speaker 1 sentence (start: "4.5s", end: "5.5s")
Speaker 1 that (start: "5.5s", end: "6.5s")
Speaker 1 google (start: "6.5s", end: "7.5s")
Speaker 1 give (start: "7.5s", end: "8.5s")
Speaker 1 me (start: "8.5s", end: "9.5s")
Speaker 1 in (start: "9.5s", end: "10.5s")
Speaker 1 its (start: "10.5s", end: "11.5s")
Speaker 1 response (start: "11.5s", end: "12.5s")

Speaker 3 this (start: "0s", end: "0.5s")
Speaker 3 is (start: "0.5s", end: "1.5s")
Speaker 3 an (start: "1.5s", end: "2.5s")
Speaker 3 entire (start: "2s", end: "3.5s")
Speaker 3 audio (start: "3.5s", end: "4.5s")
Speaker 3 sentence (start: "4.5s", end: "5.5s")
Speaker 3 that (start: "5.5s", end: "6.5s")
Speaker 3 google (start: "6.5s", end: "7.5s")
Speaker 3 give (start: "7.5s", end: "8.5s")
Speaker 3 me (start: "8.5s", end: "9.5s")
Speaker 3 in (start: "9.5s", end: "10.5s")
Speaker 3 its (start: "10.5s", end: "11.5s")
Speaker 3 response (start: "11.5s", end: "12.5s")

For the sake of simplicity I have cut out part of the response. The first problem, as you can see, is that the speakerTag values are wrong. The audio I send in the request has 5 speakers, but the response gives me tags 0 and 1 and then jumps to 3. I don't know why Google is not returning speakerTag values 0, 1, 2, 3 and 4. The second problem is that Google returns the entire audio text attributed to one speaker and then again attributed to another speaker, as you can see in my output. I can't figure out whether the problem is in my code or somewhere else. I hope my problem is clear.
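Google's multiple-voices documentation (linked above) describes a response shape that may explain both symptoms. With diarization enabled, the word list of each result's top alternative contains all words from all results so far. The loop above visits every result, so it prints the same words several times. The docs suggest taking the word list from the last result only. The `speakerTag` field is documented as ranging from 1 to the number of detected speakers and as being set only on the top alternative. A tag of 0 therefore means no speaker was assigned, and the number of distinct tags reflects what diarization detected rather than the true speaker count. Below is a minimal sketch that assumes that response shape. `lastResultWords()` is a hypothetical helper, not part of the client library:

```php
<?php
// Sketch, assuming the response shape described in Google's
// "multiple voices" docs: with diarization enabled, the top alternative of
// the LAST result lists every word of the audio with its speaker tag, while
// earlier results repeat partial transcripts.

/**
 * Hypothetical helper: return the words of the last result's top alternative,
 * or an empty array when the response has no results.
 * Works with the RepeatedField returned by getResults() (it is Countable and
 * supports array access) as well as with plain PHP arrays.
 */
function lastResultWords($response): iterable
{
    $results = $response->getResults();
    $count = count($results);
    if ($count === 0) {
        return [];
    }
    // Speaker tags are only set on the top alternative, hence index 0.
    return $results[$count - 1]->getAlternatives()[0]->getWords();
}

// Possible use inside the `if ($operation->operationSucceeded())` branch,
// replacing the loop over every result:
//
// foreach (lastResultWords($operation->getResult()) as $wordInfo) {
//     printf('Speaker %u Word: %s' . PHP_EOL,
//         $wordInfo->getSpeakerTag(), $wordInfo->getWord());
// }
```

This helper changes which result is read. It does not change the request configuration, so the diarization settings in the code above stay as they are.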

bshaffer added the type: feature request label Aug 1, 2020
Contributor

bshaffer commented Aug 1, 2020

Hi there! Yes, we'd love to see your code in PHP for separating different voices! Feel free to post your code snippets here, or to submit a pull request!
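One way to separate voices once the words are available is to merge consecutive words that share a speaker tag into turns. The sketch assumes `$words` is the word list of the last result's top alternative, which Google's diarization docs describe as holding every word with its speaker tag. `groupWordsBySpeaker()` is a hypothetical helper, and it only relies on the `getSpeakerTag()` and `getWord()` methods used in the code above:

```php
<?php
/**
 * Hypothetical helper: merge consecutive words that share a speaker tag into
 * "turns", e.g. [['speaker' => 1, 'text' => 'hello there'], ...].
 * $words is any iterable of objects exposing getSpeakerTag() and getWord(),
 * such as the WordInfo list of the last result's top alternative.
 */
function groupWordsBySpeaker(iterable $words): array
{
    $turns = [];
    foreach ($words as $wordInfo) {
        $tag = $wordInfo->getSpeakerTag();
        $last = count($turns) - 1;
        if ($last >= 0 && $turns[$last]['speaker'] === $tag) {
            // Same speaker as the previous word: extend the current turn.
            $turns[$last]['text'] .= ' ' . $wordInfo->getWord();
        } else {
            // First word, or the speaker changed: start a new turn.
            $turns[] = ['speaker' => $tag, 'text' => $wordInfo->getWord()];
        }
    }
    return $turns;
}

// Example: print one line per speaker turn.
// foreach (groupWordsBySpeaker($words) as $turn) {
//     printf('Speaker %d: %s' . PHP_EOL, $turn['speaker'], $turn['text']);
// }
```

Grouping by consecutive tags keeps the conversation order. A speaker who talks twice therefore gets two turns instead of one merged block.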

product-auto-label bot added the samples label Aug 28, 2020