
User Audio Input transcription issue #624

Open · 12 tasks
kamjony opened this issue Dec 20, 2024 · 5 comments
Labels
p:openai_realtime_dart (openai_realtime_dart package) · t:bug (Something isn't working)

Comments

kamjony commented Dec 20, 2024

System Info

Dart

Related Components

  • doc-loaders
  • doc-transformers
  • prompts
  • llms
  • chat-models
  • output-parsers
  • chains
  • memory
  • stores
  • embeddings
  • retrievers
  • agents

Reproduction

await client.connect();
await client.updateSession(
    instructions: gptInstructions,
    modalities: [Modality.audio],
    voice: Voice.shimmer,
    inputAudioFormat: AudioFormat.pcm16,
    outputAudioFormat: AudioFormat.pcm16,
    inputAudioTranscription: InputAudioTranscriptionConfig(
      // enabled: true,
      model: 'whisper-1',
    )
);

Expected behavior

The event "type:"conversation.item.input_audio_transcription.completed", is never fired. But if I Use chatgpt playground, I can see this event has been fired. I need audio transcription from user audio. How to achieve it?

@kamjony added the t:bug label Dec 20, 2024
@github-project-automation bot moved this to 📋 Backlog in LangChain.dart Dec 20, 2024
@davidmigloz added the p:openai_realtime_dart label Dec 21, 2024
davidmigloz (Owner)

There seems to be an issue with the input_audio_transcription param:

{type: error, event_id: event_AgoCpMJ7LkKOPOCfcNLn9, error: {type: invalid_request_error, code: unknown_parameter, message: Unknown parameter: 'session.input_audio_transcription.enabled'., param: session.input_audio_transcription.enabled, event_id: evt_nAh8N2BDAnkzVHMeH}}

I'll look further into it.
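For reference, the corrected session.update wire payload with the enabled flag removed would look like this. The shape is inferred from the error's param path and the public Realtime API reference; it is a sketch, not output captured from the package:

{
  "type": "session.update",
  "session": {
    "input_audio_transcription": {
      "model": "whisper-1"
    }
  }
}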

davidmigloz (Owner) commented Dec 21, 2024

I'll remove the enabled parameter, which is no longer required. Apart from that, the request looks to be according to the spec.

It seems more people are facing this issue:
https://community.openai.com/t/realtime-api-session-update-doesnt-change-input-audio-format/967077

Most issues with missing transcriptions have to do with the input audio that is being passed:

  • Write it to a file and listen to it; maybe you'll spot some errors (see the sketch after this list).
  • Make sure that the sample rate is 24000 Hz, as the API requires this.
  • Make sure that the audio doesn't sound distorted, cut out, sped up/down, or pitched up/down.
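A minimal sketch of that first check, assuming the captured audio is raw 16-bit mono PCM in a Uint8List (the helper is illustrative, not part of openai_realtime_dart):

import 'dart:io';
import 'dart:typed_data';

// Wraps raw 16-bit mono PCM in a minimal WAV header so the clip can be
// played back and inspected. sampleRate must match the recording; the
// Realtime API expects 24000 Hz.
Future<void> dumpPcm16ToWav(Uint8List pcm, String path,
    {int sampleRate = 24000}) async {
  const channels = 1;
  const bitsPerSample = 16;
  final byteRate = sampleRate * channels * (bitsPerSample ~/ 8);
  final blockAlign = channels * (bitsPerSample ~/ 8);

  final header = ByteData(44);
  void putAscii(int offset, String s) {
    for (var i = 0; i < s.length; i++) {
      header.setUint8(offset + i, s.codeUnitAt(i));
    }
  }

  putAscii(0, 'RIFF');
  header.setUint32(4, 36 + pcm.length, Endian.little); // total size - 8
  putAscii(8, 'WAVE');
  putAscii(12, 'fmt ');
  header.setUint32(16, 16, Endian.little); // fmt chunk size
  header.setUint16(20, 1, Endian.little); // audio format: PCM
  header.setUint16(22, channels, Endian.little);
  header.setUint32(24, sampleRate, Endian.little);
  header.setUint32(28, byteRate, Endian.little);
  header.setUint16(32, blockAlign, Endian.little);
  header.setUint16(34, bitsPerSample, Endian.little);
  putAscii(36, 'data');
  header.setUint32(40, pcm.length, Endian.little);

  final sink = File(path).openWrite();
  sink.add(header.buffer.asUint8List());
  sink.add(pcm);
  await sink.close();
}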

Let me know if you manage to solve it.

kamjony (Author) commented Dec 23, 2024

@davidmigloz I did everything you suggested above but was still never getting "conversation.item.input_audio_transcription.completed".
So I wrote my own client and consumed the Realtime API directly, just to check whether the problem was actually in the package. After several hours of debugging, I found that if you send the full audio in conversation.item.create, OpenAI does not generate a transcription. But if you append the audio PCM using "type": "input_audio_buffer.append" and then commit it using "type": "input_audio_buffer.commit", the transcription is generated. I will try this with the package client soon and report back.
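A minimal sketch of that append/commit flow over a raw WebSocket (dart:io), for anyone who wants to reproduce it outside the package. The endpoint, headers, model name and the transcript field are taken from the public Realtime API reference, not from this thread; session configuration (e.g. turn detection) is omitted, and the audio is placeholder silence:

import 'dart:convert';
import 'dart:io';
import 'dart:typed_data';

Future<void> main() async {
  final ws = await WebSocket.connect(
    'wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview',
    headers: {
      'Authorization': 'Bearer ${Platform.environment['OPENAI_API_KEY']}',
      'OpenAI-Beta': 'realtime=v1',
    },
  );

  // Print the transcription when (and if) it arrives.
  ws.listen((message) {
    final event = jsonDecode(message as String) as Map<String, dynamic>;
    if (event['type'] ==
        'conversation.item.input_audio_transcription.completed') {
      print('Transcript: ${event['transcript']}');
    }
  });

  // Placeholder: 1 second of 24 kHz mono PCM16 silence. Replace with real
  // audio; the buffer must hold at least 100 ms before committing.
  final pcm16 = Uint8List(48000);

  ws.add(jsonEncode({
    'type': 'input_audio_buffer.append',
    'audio': base64Encode(pcm16),
  }));
  ws.add(jsonEncode({'type': 'input_audio_buffer.commit'}));
}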

davidmigloz (Owner)

Interesting, thanks for sharing! Let me know the results when you try it with the openai_realtime_dart client.

DS-RigER commented Jan 3, 2025

I tried to send my audio and then commit it, and got this:

flutter: [openai_realtime_dart.api/2025-01-02T18:10:27.212597]: sent: inputAudioBufferAppend {"event_id":"evt_LzpvaHQWV5mZcrTgv","type":"input_audio_buffer.append","audio":"base64-encoded-audio"} 
flutter: [openai_realtime_dart.api/2025-01-02T18:10:27.241814]: sent: inputAudioBufferCommit {"event_id":"evt_9B3MKMcEWv5Sctokq","type":"input_audio_buffer.commit"} 
{event_id: evt_LzpvaHQWV5mZcrTgv, type: input_audio_buffer.append, audio: UklGRpJBAwBXQVZFSlVOSxwAAAAAAAAAAAA......sAEUARQBhAGEA} // 5 seconds of audio
{event_id: evt_9B3MKMcEWv5Sctokq, type: input_audio_buffer.commit}
flutter: [openai_realtime_dart.api/2025-01-02T18:10:27.707570]: received: error {"event_id":"event_AlQUdzVvxFvhIQ9QkJ6nd","type":"error","error":{"type":"invalid_request_error","message":"Error committing input audio buffer: buffer too small. Expected at least 100ms of audio, but buffer only has 0.00ms of audio."}} 
{event_id: event_AlQUdzVvxFvhIQ9QkJ6nd, type: error, error: {type: invalid_request_error, message: Error committing input audio buffer: buffer too small. Expected at least 100ms of audio, but buffer only has 0.00ms of audio.}}

So OpenAI didn't see my audio. And when I try to talk to it over the stream, it tries to answer but hears nothing.
Even if I send audio with sendUserMessageContent, it doesn't hear me.
My recording settings:

        final config = RecordConfig(
          encoder: AudioEncoder.pcm16bits,
          bitRate: 24000,
          sampleRate: 44100,
          numChannels: 2,
          device: device,
        );

The audio recording itself works fine; I checked it.
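Two things stand out when compared with the requirements mentioned above: the config records at 44100 Hz with 2 channels, while the API expects raw PCM16 at 24000 Hz mono (per the Realtime API docs), and the appended base64 starts with UklGR, which decodes to a RIFF header, so a WAV container rather than raw PCM appears to be sent. A sketch of a config matching those constraints, reusing the record package fields from above (bitRate dropped, as it doesn't apply to uncompressed PCM):

final config = RecordConfig(
  encoder: AudioEncoder.pcm16bits,
  sampleRate: 24000, // the API requires 24000 Hz
  numChannels: 1, // mono
  device: device,
);

Streaming the recording, rather than recording to a .wav file (which adds the RIFF header), should then yield raw PCM chunks suitable for input_audio_buffer.append.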
