OpenAI Unveils 'gpt-realtime' with Image Input, Faster Responses
OpenAI has unveiled significant updates to its real-time API, now generally available. The new model, 'gpt-realtime', boasts improved voices, enhanced accuracy, and expanded capabilities, including image input processing.
The 'gpt-realtime' model introduces two new voices, Cedar and Marin, and refines existing ones. It achieves higher accuracy scores than its predecessor in benchmarks. Notably, it can now handle image input, reading text from images or answering questions based on visual representations.
OpenAI has also made its real-time API available for productive use, with Bandwidth Inc. announcing support for the latest API, integrating voice communication via SIP. The API is now available for general use, though the specific release date is not mentioned.
The new model offers faster response times, more natural speech, and improved compliance with complex instructions. It supports asynchronous tool calls, making conversations smoother. Unlike previous models, 'gpt-realtime' processes and generates speech directly without text models.
It recognizes non-verbal signals like laughter, switches languages mid-sentence, and speaks with fine-tuned intonation. Tool calls are more reliable, with the model selecting suitable tools, timings, and parameters more deliberately.
OpenAI has also introduced new cost management functions for long sessions, reducing prices by 20%.
The 'gpt-realtime' model's updates and expanded capabilities promise a more seamless and efficient user experience. With its general availability, users can now explore and benefit from these improvements in real-time interactions.
Read also:
- Elon Musk accused by Sam Altman of exploiting X for personal gain
- China's Automotive Landscape: Toyota's Innovative Strategy in Self-Driving Vehicles
- L3Harris' RASOR Revolutionizes Military Communications with Secure Satellite Broadband
- EU Bolsters Defense Capabilities: Orbotix Secures €6.5M for AI-Driven Drones