Could not find example for Speaker diarization? #1145

muneeb-ahmed-mcs · 2020-07-31T18:28:32Z

Hi folks,
I'm having a hard time getting data for multiple speakers, and there is no example for it. The official Google docs have no example either, as you can see here: https://cloud.google.com/speech-to-text/docs/multiple-voices.

    use Google\Cloud\Speech\V1\SpeechClient;
    use Google\Cloud\Speech\V1\RecognitionAudio;
    use Google\Cloud\Speech\V1\RecognitionConfig;
    use Google\Cloud\Speech\V1\RecognitionConfig\AudioEncoding;
    use Google\Cloud\Speech\V1\SpeakerDiarizationConfig; // needed for the diarization config below

    /** Uncomment and populate these variables in your code */
    // $audioFile = 'path to an audio file';

    // change these variables if necessary
    $encoding = AudioEncoding::LINEAR16;
    $sampleRateHertz = 32000;
    $languageCode = 'en-US';

    if (!extension_loaded('grpc')) {
        throw new \Exception('Install the grpc extension (pecl install grpc)');
    }

    // When true, time offsets for every word will be included in the response.
    $enableWordTimeOffsets = true;

    // get contents of a file into a string
    $content = file_get_contents($audioFile);

    // set string as audio content
    $audio = (new RecognitionAudio())
        ->setContent($content);

    // changes that I made for different speakers
    $speakerDiarizationConfig = (new SpeakerDiarizationConfig())
        ->setEnableSpeakerDiarization(true)
        ->setMinSpeakerCount(2)
        ->setMaxSpeakerCount(6);

    // set config
    $config = (new RecognitionConfig())
        ->setEncoding($encoding)
        ->setSampleRateHertz($sampleRateHertz)
        ->setLanguageCode($languageCode)
        ->setEnableWordTimeOffsets($enableWordTimeOffsets)
        ->setDiarizationConfig($speakerDiarizationConfig); // changes that I made for different speakers

    // create the speech client
    $client = new SpeechClient();

    // create the asynchronous recognize operation
    $operation = $client->longRunningRecognize($config, $audio);
    $operation->pollUntilComplete();

    if ($operation->operationSucceeded()) {
        $response = $operation->getResult();

        // each result is for a consecutive portion of the audio. iterate
        // through them to get the transcripts for the entire audio file.
        foreach ($response->getResults() as $result) {
            $alternatives = $result->getAlternatives();
            $mostLikely = $alternatives[0];

            foreach ($mostLikely->getWords() as $wordInfo) {
                $startTime = $wordInfo->getStartTime();
                $endTime = $wordInfo->getEndTime();
                printf('  Speaker %u Word: %s (start: %s, end: %s)' . PHP_EOL,
                    $wordInfo->getSpeakerTag(), // changes that I made for different speakers
                    $wordInfo->getWord(),
                    $startTime->serializeToJsonString(),
                    $endTime->serializeToJsonString());
            }
        }
    } else {
        print_r($operation->getError());
    }

    $client->close();

Output:
Speaker %u Word: %s (start: %s, end: %s)
Speaker 0 this (start: "0s", end: "0.5s")
Speaker 0 is (start: "0.5s", end: "1.5s")
Speaker 0 an (start: "1.5s", end: "2.5s")
Speaker 0 entire (start: "2s", end: "3.5s")
Speaker 0 audio (start: "3.5s", end: "4.5s")
Speaker 0 sentence (start: "4.5s", end: "5.5s")
Speaker 0 that (start: "5.5s", end: "6.5s")
Speaker 0 google (start: "6.5s", end: "7.5s")
Speaker 0 give (start: "7.5s", end: "8.5s")
Speaker 0 me (start: "8.5s", end: "9.5s")
Speaker 0 in (start: "9.5s", end: "10.5s")
Speaker 0 its (start: "10.5s", end: "11.5s")
Speaker 0 response (start: "11.5s", end: "12.5s")

Speaker 1 this (start: "0s", end: "0.5s")
Speaker 1 is (start: "0.5s", end: "1.5s")
Speaker 1 an (start: "1.5s", end: "2.5s")
Speaker 1 entire (start: "2s", end: "3.5s")
Speaker 1 audio (start: "3.5s", end: "4.5s")
Speaker 1 sentence (start: "4.5s", end: "5.5s")
Speaker 1 that (start: "5.5s", end: "6.5s")
Speaker 1 google (start: "6.5s", end: "7.5s")
Speaker 1 give (start: "7.5s", end: "8.5s")
Speaker 1 me (start: "8.5s", end: "9.5s")
Speaker 1 in (start: "9.5s", end: "10.5s")
Speaker 1 its (start: "10.5s", end: "11.5s")
Speaker 1 response (start: "11.5s", end: "12.5s")

Speaker 3 this (start: "0s", end: "0.5s")
Speaker 3 is (start: "0.5s", end: "1.5s")
Speaker 3 an (start: "1.5s", end: "2.5s")
Speaker 3 entire (start: "2s", end: "3.5s")
Speaker 3 audio (start: "3.5s", end: "4.5s")
Speaker 3 sentence (start: "4.5s", end: "5.5s")
Speaker 3 that (start: "5.5s", end: "6.5s")
Speaker 3 google (start: "6.5s", end: "7.5s")
Speaker 3 give (start: "7.5s", end: "8.5s")
Speaker 3 me (start: "8.5s", end: "9.5s")
Speaker 3 in (start: "9.5s", end: "10.5s")
Speaker 3 its (start: "10.5s", end: "11.5s")
Speaker 3 response (start: "11.5s", end: "12.5s")

For the sake of simplicity I cut out some of the response. First problem: as you can see, the speakerTag values look wrong. The audio that I am sending in the request has 5 speakers, but the response gives me 0 and 1 and then jumps to 3. I don't know why Google is not responding with speaker tags 0, 1, 2, 3, and 4. Second problem: Google responds with the entire audio text for a single speaker and then again for the next speaker, as you can see in my output. I can't figure out whether this is a problem with my code or something else. I hope that explains my problem.
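
To make the expected shape of the result concrete, here is a minimal sketch of how speaker-tagged words could be folded into per-speaker turns once they are available. It does not call the Speech API: the helper name `buildSpeakerTurns` and the sample data are hypothetical, and plain arrays stand in for the `WordInfo` objects. Google's multiple-voices documentation, as I read it, says the word list of the last result carries the speaker tags for the whole audio, so that list would be the natural input here; treat that as an assumption to check against the actual response.

```php
<?php
/**
 * Fold a time-ordered list of tagged words into consecutive speaker turns.
 *
 * Hypothetical input shape: each word is an array with 'speakerTag' (int)
 * and 'word' (string), standing in for WordInfo::getSpeakerTag() and
 * WordInfo::getWord().
 */
function buildSpeakerTurns(array $words): array
{
    $turns = [];
    foreach ($words as $w) {
        $last = count($turns) - 1;
        if ($last >= 0 && $turns[$last]['speakerTag'] === $w['speakerTag']) {
            // Same speaker as the previous word: extend the current turn.
            $turns[$last]['text'] .= ' ' . $w['word'];
        } else {
            // Speaker changed (or first word): start a new turn.
            $turns[] = ['speakerTag' => $w['speakerTag'], 'text' => $w['word']];
        }
    }
    return $turns;
}

// Made-up sample data, not taken from a real API response.
$words = [
    ['speakerTag' => 1, 'word' => 'hello'],
    ['speakerTag' => 1, 'word' => 'there'],
    ['speakerTag' => 2, 'word' => 'hi'],
    ['speakerTag' => 1, 'word' => 'how'],
    ['speakerTag' => 1, 'word' => 'are'],
    ['speakerTag' => 1, 'word' => 'you'],
];

foreach (buildSpeakerTurns($words) as $turn) {
    printf('Speaker %d: %s' . PHP_EOL, $turn['speakerTag'], $turn['text']);
}
```

With the sample data above this prints three turns: speaker 1, then speaker 2, then speaker 1 again. In real code the input array would be built from the `getWords()` list of whichever result actually carries the tags.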


bshaffer · 2020-08-01T01:08:36Z

Hi there! Yes, we'd love to see your code in PHP for separating different voices! Feel free to post your code snippets here, or to submit a pull request!

bshaffer added the "type: feature request" label on Aug 1, 2020

product-auto-label bot added the "samples" label on Aug 28, 2020
