
User Audio Input transcription issue #624

Open · 12 tasks
kamjony opened this issue Dec 20, 2024 · 5 comments
Labels
p:openai_realtime_dart (openai_realtime_dart package) · t:bug (Something isn't working)

Comments

kamjony commented Dec 20, 2024

System Info

Dart

Related Components

  • doc-loaders
  • doc-transformers
  • prompts
  • llms
  • chat-models
  • output-parsers
  • chains
  • memory
  • stores
  • embeddings
  • retrievers
  • agents

Reproduction

await client.connect();
await client.updateSession(
    instructions: gptInstructions,
    modalities: [Modality.audio],
    voice: Voice.shimmer,
    inputAudioFormat: AudioFormat.pcm16,
    outputAudioFormat: AudioFormat.pcm16,
    inputAudioTranscription: InputAudioTranscriptionConfig(
      // enabled: true,
      model: 'whisper-1',
    )
);

Expected behavior

The event "type:"conversation.item.input_audio_transcription.completed", is never fired. But if I Use chatgpt playground, I can see this event has been fired. I need audio transcription from user audio. How to achieve it?

@kamjony added the t:bug label Dec 20, 2024
@github-project-automation bot moved this to 📋 Backlog in LangChain.dart Dec 20, 2024
@davidmigloz added the p:openai_realtime_dart label Dec 21, 2024
davidmigloz (Owner)

There seems to be an issue with the input_audio_transcription param:

{type: error, event_id: event_AgoCpMJ7LkKOPOCfcNLn9, error: {type: invalid_request_error, code: unknown_parameter, message: Unknown parameter: 'session.input_audio_transcription.enabled'., param: session.input_audio_transcription.enabled, event_id: evt_nAh8N2BDAnkzVHMeH}}

I'll look further into it.
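For reference, the corrected session.update wire payload with the enabled flag removed would look like this. The shape is inferred from the error's param path and the public Realtime API reference; it is a sketch, not output captured from the package:

{
  "type": "session.update",
  "session": {
    "input_audio_transcription": {
      "model": "whisper-1"
    }
  }
}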

davidmigloz (Owner) commented Dec 21, 2024

I'll remove the enabled parameter, which is no longer required. Apart from that, the request looks to be according to the spec.

It seems more people are facing this issue:
https://community.openai.com/t/realtime-api-session-update-doesnt-change-input-audio-format/967077

Most issues with missing transcriptions have to do with the input audio that is being passed:

  • Write it to a file and listen to it; maybe you'll spot some errors (see the sketch after this list).
  • Make sure that the sample rate is 24000 Hz, as the API requires this.
  • Make sure that the audio doesn't sound distorted, cut out, sped up/down, or pitched up/down.
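A minimal sketch of that first check, assuming the captured audio is raw 16-bit mono PCM in a Uint8List (the helper is illustrative, not part of openai_realtime_dart):

import 'dart:io';
import 'dart:typed_data';

// Wraps raw 16-bit mono PCM in a minimal WAV header so the clip can be
// played back and inspected. sampleRate must match the recording; the
// Realtime API expects 24000 Hz.
Future<void> dumpPcm16ToWav(Uint8List pcm, String path,
    {int sampleRate = 24000}) async {
  const channels = 1;
  const bitsPerSample = 16;
  final byteRate = sampleRate * channels * (bitsPerSample ~/ 8);
  final blockAlign = channels * (bitsPerSample ~/ 8);

  final header = ByteData(44);
  void putAscii(int offset, String s) {
    for (var i = 0; i < s.length; i++) {
      header.setUint8(offset + i, s.codeUnitAt(i));
    }
  }

  putAscii(0, 'RIFF');
  header.setUint32(4, 36 + pcm.length, Endian.little); // total size - 8
  putAscii(8, 'WAVE');
  putAscii(12, 'fmt ');
  header.setUint32(16, 16, Endian.little); // fmt chunk size
  header.setUint16(20, 1, Endian.little); // audio format: PCM
  header.setUint16(22, channels, Endian.little);
  header.setUint32(24, sampleRate, Endian.little);
  header.setUint32(28, byteRate, Endian.little);
  header.setUint16(32, blockAlign, Endian.little);
  header.setUint16(34, bitsPerSample, Endian.little);
  putAscii(36, 'data');
  header.setUint32(40, pcm.length, Endian.little);

  final sink = File(path).openWrite();
  sink.add(header.buffer.asUint8List());
  sink.add(pcm);
  await sink.close();
}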

Let me know if you manage to solve it.

kamjony (Author) commented Dec 23, 2024

@davidmigloz I did everything you suggested above but was still never getting "conversation.item.input_audio_transcription.completed".
So I wrote my own client and consumed the Realtime API directly, just to check whether the problem was actually in the package. After several hours of debugging, I found that if you send the full audio in conversation.item.create, OpenAI does not generate a transcription. But if you append the audio PCM using "type": "input_audio_buffer.append" and then commit it using "type": "input_audio_buffer.commit", the transcription is generated. I will try this with the package client soon and report back.
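A minimal sketch of that append/commit flow over a raw WebSocket (dart:io), for anyone who wants to reproduce it outside the package. The endpoint, headers, model name and the transcript field are taken from the public Realtime API reference, not from this thread; session configuration (e.g. turn detection) is omitted, and the audio is placeholder silence:

import 'dart:convert';
import 'dart:io';
import 'dart:typed_data';

Future<void> main() async {
  final ws = await WebSocket.connect(
    'wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview',
    headers: {
      'Authorization': 'Bearer ${Platform.environment['OPENAI_API_KEY']}',
      'OpenAI-Beta': 'realtime=v1',
    },
  );

  // Print the transcription when (and if) it arrives.
  ws.listen((message) {
    final event = jsonDecode(message as String) as Map<String, dynamic>;
    if (event['type'] ==
        'conversation.item.input_audio_transcription.completed') {
      print('Transcript: ${event['transcript']}');
    }
  });

  // Placeholder: 1 second of 24 kHz mono PCM16 silence. Replace with real
  // audio; the buffer must hold at least 100 ms before committing.
  final pcm16 = Uint8List(48000);

  ws.add(jsonEncode({
    'type': 'input_audio_buffer.append',
    'audio': base64Encode(pcm16),
  }));
  ws.add(jsonEncode({'type': 'input_audio_buffer.commit'}));
}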

davidmigloz (Owner)

Interesting, thanks for sharing! Let me know the results when you try it with the openai_realtime_dart client.

DS-RigER commented Jan 3, 2025

I tried to send my audio and then commit it, and got this:

flutter: [openai_realtime_dart.api/2025-01-02T18:10:27.212597]: sent: inputAudioBufferAppend {"event_id":"evt_LzpvaHQWV5mZcrTgv","type":"input_audio_buffer.append","audio":"base64-encoded-audio"} 
flutter: [openai_realtime_dart.api/2025-01-02T18:10:27.241814]: sent: inputAudioBufferCommit {"event_id":"evt_9B3MKMcEWv5Sctokq","type":"input_audio_buffer.commit"} 
{event_id: evt_LzpvaHQWV5mZcrTgv, type: input_audio_buffer.append, audio: UklGRpJBAwBXQVZFSlVOSxwAAAAAAAAAAAA......sAEUARQBhAGEA} // 5 seconds of audio
{event_id: evt_9B3MKMcEWv5Sctokq, type: input_audio_buffer.commit}
flutter: [openai_realtime_dart.api/2025-01-02T18:10:27.707570]: received: error {"event_id":"event_AlQUdzVvxFvhIQ9QkJ6nd","type":"error","error":{"type":"invalid_request_error","message":"Error committing input audio buffer: buffer too small. Expected at least 100ms of audio, but buffer only has 0.00ms of audio."}} 
{event_id: event_AlQUdzVvxFvhIQ9QkJ6nd, type: error, error: {type: invalid_request_error, message: Error committing input audio buffer: buffer too small. Expected at least 100ms of audio, but buffer only has 0.00ms of audio.}}

So OpenAI didn't see my audio. And when I try to talk to it over the stream, it tries to answer but hears nothing.
Even if I send audio with sendUserMessageContent, it doesn't hear me.
My recording settings:

        final config = RecordConfig(
          encoder: AudioEncoder.pcm16bits,
          bitRate: 24000,
          sampleRate: 44100,
          numChannels: 2,
          device: device,
        );

The audio recording itself works fine; I checked it.
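Two things stand out when compared with the requirements mentioned above: the config records at 44100 Hz with 2 channels, while the API expects raw PCM16 at 24000 Hz mono (per the Realtime API docs), and the appended base64 starts with UklGR, which decodes to a RIFF header, so a WAV container rather than raw PCM appears to be sent. A sketch of a config matching those constraints, reusing the record package fields from above (bitRate dropped, as it doesn't apply to uncompressed PCM):

final config = RecordConfig(
  encoder: AudioEncoder.pcm16bits,
  sampleRate: 24000, // the API requires 24000 Hz
  numChannels: 1, // mono
  device: device,
);

Streaming the recording, rather than recording to a .wav file (which adds the RIFF header), should then yield raw PCM chunks suitable for input_audio_buffer.append.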
