Conversation

@samwillis

🎯 Changes

Very Early WIP: Add audio output streaming support to OpenAI adapter

This PR adds support for streaming audio output from OpenAI's audio-capable models (e.g., gpt-4o-audio-preview) via the Chat Completions API.

Opening as a discussion starter.

Background

The current OpenAI adapter uses the Responses API (client.responses.create()), which does not support audio output modalities. Audio output streaming requires the Chat Completions API with modalities: ['text', 'audio'] and audio: { voice, format } configuration.
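
As a point of reference, here is a minimal sketch of the underlying Chat Completions call shape using the official openai SDK (model, voice, and format values are examples; the streamed delta.audio field is not covered by the SDK's types yet, so it is read loosely here):

import OpenAI from "openai"

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })

const completion = await client.chat.completions.create({
  model: "gpt-4o-audio-preview",
  modalities: ["text", "audio"],
  audio: { voice: "alloy", format: "pcm16" },
  messages: [{ role: "user", content: "Tell me a story" }],
  stream: true,
})

for await (const part of completion) {
  // The audio delta is not in the SDK's ChatCompletionChunk delta type, so cast loosely.
  const delta = part.choices[0]?.delta as Record<string, any> | undefined
  if (delta?.audio?.data) {
    // base64-encoded audio bytes for this delta
  }
  if (delta?.audio?.transcript) {
    // incremental transcript text
  }
}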

Changes

1. New AudioStreamChunk type (packages/typescript/ai/src/types.ts)

  • Added 'audio' to StreamChunkType union
  • Added AudioStreamChunk interface with data (base64), transcript, and format fields
  • Added to StreamChunk union type
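
A rough sketch of the new type, based on the fields listed above (whether transcript and format end up optional is still open):

export interface AudioStreamChunk {
  type: "audio"
  // Base64-encoded audio bytes for this delta
  data: string
  // Incremental transcript text, when the model provides one
  transcript?: string
  // Audio encoding, e.g. "pcm16"
  format?: string
}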

2. Audio output options (packages/typescript/ai-openai/src/text/text-provider-options.ts)

  • Added OpenAIAudioOutputOptions interface with modalities and audio config
  • Included in ExternalTextProviderOptions
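
A sketch of these options (string is used as a placeholder here; the concrete voice and format unions should mirror what OpenAI actually accepts):

export interface OpenAIAudioOutputOptions {
  // Request audio alongside text output
  modalities?: Array<"text" | "audio">
  audio?: {
    // e.g. "alloy"
    voice: string
    // e.g. "pcm16"; streaming delivery generally requires a PCM-style format
    format: string
  }
}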

3. OpenAI adapter audio routing (packages/typescript/ai-openai/src/openai-adapter.ts)

  • chatStream() now detects modalities.includes('audio') in provider options
  • When audio is requested, routes to new chatStreamWithAudio() method
  • chatStreamWithAudio() uses Chat Completions API instead of Responses API
  • Yields AudioStreamChunk for audio data and ContentStreamChunk for transcripts
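
Sketched roughly below (surrounding method and option names are illustrative, not the adapter's actual signatures):

async *chatStream(options: ChatStreamOptions): AsyncIterable<StreamChunk> {
  const providerOptions = options.providerOptions as OpenAIAudioOutputOptions | undefined
  if (providerOptions?.modalities?.includes("audio")) {
    // Audio output is only available via Chat Completions, not the Responses API
    yield* this.chatStreamWithAudio(options)
    return
  }
  // Existing path: Responses API via client.responses.create()
  yield* this.chatStreamWithResponses(options)
}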

4. Model metadata (packages/typescript/ai-openai/src/model-meta.ts)

  • Added gpt-4o-audio-preview model definition
  • Added OpenAIAudioOutputOptions to audio model provider option types

Usage

import { chat } from "@tanstack/ai"
import { createOpenAI } from "@tanstack/ai-openai"

const stream = chat({
  adapter: createOpenAI(apiKey),
  model: "gpt-4o-audio-preview",
  messages: [{ role: "user", content: "Tell me a story" }],
  providerOptions: {
    modalities: ["text", "audio"],
    audio: { voice: "alloy", format: "pcm16" },
  },
})

for await (const chunk of stream) {
  if (chunk.type === "audio") {
    // chunk.data is base64-encoded PCM16 audio
  }
  if (chunk.type === "content") {
    // chunk.delta is transcript text
  }
}
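
On the consumer side, the audio branch of that loop can decode and accumulate the chunks into raw PCM, for example (Node-flavoured sketch; pcm16 is headerless 16-bit PCM, so playback needs a WAV header or a PCM-aware player):

const audioParts: Buffer[] = []

// inside the audio branch of the loop above:
audioParts.push(Buffer.from(chunk.data, "base64"))

// after the stream ends:
const pcm = Buffer.concat(audioParts)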

Real-world usage

This is being used by the Durable Streams story-app example: a child-friendly AI story generator that streams both narrated audio and synchronized text transcripts to a durable stream for resilient playback.

Open questions

  • Should audio routing be explicit via a separate method, or is auto-detection from modalities the right approach?
  • Should we add a ChatCompletionsAdapter as a separate class for broader Chat Completions API support?
  • Are there other audio-related events from the Chat Completions API we should handle?

✅ Checklist

  • I have followed the steps in the Contributing guide.
  • I have tested this code locally with pnpm run test:pr.

🚀 Release Impact

  • This change affects published code, and I have generated a changeset.
  • This change is docs/CI/dev-only (no release).


coderabbitai bot commented Dec 19, 2025

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting reviews.review_status to false in the CodeRabbit configuration file.

