OpenAI Launches GPT-4o, a new AI model with the ability to reason across audio, vision, and text in real time

In a bold leap forward for artificial intelligence (AI), OpenAI on Monday unveiled its latest innovation: GPT-4o, a new AI model, accompanied by a desktop version of ChatGPT and an overhauled user interface.

This release represents OpenAI’s ambitious endeavor to democratize access to its renowned chatbot technology while elevating user experience to unprecedented heights.

During a live-streamed event, Mira Murati, OpenAI’s chief technology officer, announced the integration of GPT-4o into ChatGPT for all users, including those on the free tier.


In her address, Murati highlighted the significance of this development, emphasizing OpenAI’s commitment to accessibility and user empowerment.

“This is the first time that we are really making a huge step forward when it comes to the ease of use,” Murati said.

Backed by Microsoft and valued by investors at more than $80 billion, OpenAI faces the dual challenge of maintaining its leadership in the fiercely competitive generative AI market while navigating the complexities of monetization amidst substantial investments in hardware and infrastructure.

The introduction of GPT-4o, the latest iteration in OpenAI’s esteemed GPT series, represents a quantum leap in AI capabilities. With superior speed and enhanced proficiency in text, video, and audio processing, GPT-4o heralds a new era of AI sophistication. 

The company said GPT-4o achieves GPT-4 Turbo-level performance on text, reasoning, and coding intelligence while setting new high watermarks on multilingual, audio, and vision capabilities.

“With GPT-4o, we trained a single new model end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network. Because GPT-4o is our first model combining all of these modalities, we are still just scratching the surface of exploring what the model can do and its limitations,” the company said in a blog post.

Notably, OpenAI envisions extending ChatGPT’s functionality to encompass video chat capabilities in the near future, further expanding the scope of interactive experiences facilitated by the platform.

“The ‘o’ in GPT-4o signifies ‘omni,’ reflecting the model’s versatility and adaptability,” Murati explained, highlighting GPT-4o’s ability to support 50 different languages with unparalleled speed and precision. Moreover, GPT-4o will be seamlessly integrated into OpenAI’s API, empowering developers to harness its transformative potential for a myriad of applications.

“Developers can also now access GPT-4o in the API as a text and vision model. GPT-4o is 2x faster, half the price, and has 5x higher rate limits compared to GPT-4 Turbo. We plan to launch support for GPT-4o’s new audio and video capabilities to a small group of trusted partners in the API in the coming weeks,” the company said on Monday.
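For developers, a minimal sketch of what that access looks like through OpenAI’s Python SDK is shown below. The model identifier “gpt-4o” comes from OpenAI’s announcement; the prompt and the image URL are hypothetical placeholders for illustration.

```python
# Minimal sketch: querying GPT-4o as a text-and-vision model via the
# OpenAI Python SDK (v1.x). Assumes OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # the new model announced on Monday
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is in this image."},
                {
                    "type": "image_url",
                    # hypothetical placeholder URL for illustration
                    "image_url": {"url": "https://example.com/sample-photo.jpg"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

Because GPT-4o’s audio and video endpoints were not yet public at launch, the sketch covers only the text-and-vision access the company described.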

In a captivating demonstration of GPT-4o’s capabilities, OpenAI showcased its advanced audio processing prowess. Mark Chen, a distinguished OpenAI researcher, demonstrated the model’s capacity to discern and respond to users’ emotions, setting a new standard for human-machine interaction. Furthermore, GPT-4o exhibited remarkable adaptability, seamlessly accommodating user interruptions and dynamically adjusting its responses to suit the conversational context.

As part of its commitment to enhancing user experience, OpenAI announced plans to introduce Voice Mode—an innovative feature that enables ChatGPT to respond to audio prompts with lightning speed. Leveraging cutting-edge technology, OpenAI aims to replicate the fluidity and spontaneity of human conversation, ushering in a new era of interactive dialogue between users and AI.

Beyond its proficiency in audio processing, GPT-4o showcased its versatility across diverse domains, from storytelling to code generation. With its ability to undertake complex tasks with unparalleled efficiency, GPT-4o emerges as a formidable competitor to industry giants like Microsoft’s GitHub Copilot, positioning itself at the vanguard of AI innovation.

The company said “GPT-4o’s text and image capabilities are starting to roll out today [Monday] in ChatGPT. We are making GPT-4o available in the free tier, and to Plus users with up to 5x higher message limits. We’ll roll out a new version of Voice Mode with GPT-4o in alpha within ChatGPT Plus in the coming weeks.”

As the rollout of GPT-4o commences, users can anticipate a transformative ChatGPT experience characterized by seamless interactions, enhanced capabilities, and unparalleled versatility. With GPT-4o leading the charge, OpenAI reaffirms its commitment to shaping the future of AI and revolutionizing the way humans engage with technology.
