Revolutionize Your Business with AI — Unveiling the Future of Tech

OpenAI Unveils 'gpt-realtime' with Image Input, Faster Responses

Now understand images and speak faster: OpenAI's 'gpt-realtime' is here. Explore the new model's enhanced capabilities today.

, and Administrator

2025 October 8 . 6:16 AM

1 min read

In this picture we can see a screen. On the screen there is an image of a microphone and there are... — In this picture we can see a screen. On the screen there is an image of a microphone and there are some words on it.

OpenAI Unveils 'gpt-realtime' with Image Input, Faster Responses

OpenAI has unveiled significant updates to its real-time API, now generally available. The new model, 'gpt-realtime', boasts improved voices, enhanced accuracy, and expanded capabilities, including image input processing.

The 'gpt-realtime' model introduces two new voices, Cedar and Marin, and refines existing ones. It achieves higher accuracy scores than its predecessor in benchmarks. Notably, it can now handle image input, reading text from images or answering questions based on visual representations.

OpenAI has also made its real-time API available for productive use, with Bandwidth Inc. announcing support for the latest API, integrating voice communication via SIP. The API is now available for general use, though the specific release date is not mentioned.

The new model offers faster response times, more natural speech, and improved compliance with complex instructions. It supports asynchronous tool calls, making conversations smoother. Unlike previous models, 'gpt-realtime' processes and generates speech directly without text models.

It recognizes non-verbal signals like laughter, switches languages mid-sentence, and speaks with fine-tuned intonation. Tool calls are more reliable, with the model selecting suitable tools, timings, and parameters more deliberately.

OpenAI has also introduced new cost management functions for long sessions, reducing prices by 20%.

The 'gpt-realtime' model's updates and expanded capabilities promise a more seamless and efficient user experience. With its general availability, users can now explore and benefit from these improvements in real-time interactions.

Latest

In this image there is a painting on the wall on which we can see there is a watch with some...

Smart-home-devices

Louis Vuitton Revives Classic Monterey Watch After 33 Years

The iconic Monterey returns after 33 years. This timepiece blends Louis Vuitton's heritage with modern watchmaking.

, and Administrator

2025 October 9

In this image on both sides there are buildings, electric poles. There are few vehicles parked in...

Climate change

Apple Invests €100m in Schroders' China Renewable Energy Strategy

Apple's significant investment in China's renewable energy sector signals growing global interest. This move could accelerate China's transition to cleaner energy, reducing global emissions and fossil fuel demand.

, and Administrator

2025 October 9

In this image, we can see an advertisement contains robots and some text.

Revolutionize Your Business with AI

Confluent Explores Sale Amidst Private Equity and Tech Interest

Confluent's robust streaming software draws interest from private equity and tech companies. A sale could benefit shareholders, but no deals are final yet.

, and Administrator

2025 October 9

In the image there is an insect on a web and the background is blurry.

Strengthen Your Digital Fortunes

UK's NCA Launches 'Power Off' Operation to Combat Cybercrime

The NCA's innovative 'Power Off' operation is using fake DDoS-for-hire sites to catch cybercriminals. It's already led to arrests in the UK and the US.

, and Administrator

2025 October 9

OpenAI Unveils 'gpt-realtime' with Image Input, Faster Responses

OpenAI Unveils 'gpt-realtime' with Image Input, Faster Responses

Read also:

Related

Latest