OpenAI Debuts GPT-4o and Adds More Features to ChatGPT

By Micah Sulit

OpenAI recently unveiled GPT-4o, its latest flagship model, which can process text, vision, and audio faster than ever. The reveal headlined the livestreamed OpenAI Spring Update on May 13, 2024, which also featured live demonstrations of the model’s capabilities and details of its gradual rollout.

The new model retains the intelligence level of GPT-4 but offers greater speed and more natural interactions. The “o” in GPT-4o stands for “omni,” a nod to its cutting-edge multimodal functionalities. According to OpenAI, GPT-4o is currently its most capable model at understanding audio and vision.

What Are the Highlights of GPT-4o?

GPT-4o has groundbreaking abilities that are coming soon to ChatGPT. Here are some examples of those key features and how we might see them in action.

Multimodal Capabilities:

In a significant leap forward, GPT-4o can process and interpret not only text inputs but also images, video, and audio. Users can prompt with any of these, and the model can respond with text, image, or audio output. Voice Mode was a highlight of OpenAI’s Spring Update live demos, with GPT-4o picking up on a user’s facial expressions and providing real-time translation between two people speaking different languages. The Voice Mode feature itself wasn’t new, but its processing speed was. Previously, this functionality chained three separate models: one transcribed audio to text, one provided the intelligence (GPT-3.5 or GPT-4), and one converted the text response back to audio. The multimodal GPT-4o handles all of those steps on its own, which means no information is lost between steps and it can produce a wider range of outputs.
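For context, here’s a minimal sketch of what that old three-model pipeline looked like, written with OpenAI’s Python SDK. The model names, voice, and file paths are illustrative choices for this sketch, not a reproduction of ChatGPT’s actual internals.

```python
# A rough sketch of the old three-model Voice Mode pipeline
# (openai Python SDK v1.x). File names and models are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Step 1: transcribe the user's speech to text (separate audio model).
with open("question.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1", file=audio_file
    )

# Step 2: generate a text answer (the "intelligence" model).
chat = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)
answer = chat.choices[0].message.content

# Step 3: synthesize the answer back into speech (separate TTS model).
speech = client.audio.speech.create(
    model="tts-1", voice="alloy", input=answer
)
with open("answer.mp3", "wb") as f:
    f.write(speech.content)
```

Each hop reduces the conversation to plain text, which is why details like tone, background sounds, and multiple speakers couldn’t survive the round trip. GPT-4o replaces all three steps with a single model trained end to end across text, vision, and audio.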

Superior Conversational Abilities:

GPT-4o’s improved ability to understand and generate natural language is a standout feature: it can now speak to users much like a human does, and just as fast as a human can. (OpenAI says the model’s average response time to audio prompts is 0.32 seconds.) It’s not just the speed that’s impressive, but also the ability to add human-like inflections and emotions to the AI-generated voice. The GPT-4o live demos showed ChatGPT engaging a user in light banter and narrating a story with increasing drama as prompted. It can even laugh and sing.

These landmark features open up new possibilities for GPT-4o applications in education, content creation, and beyond. For example, the Be My Eyes app has published a video showing how GPT-4o can function as a virtual guide for the visually impaired, verbally describing what’s happening around them and even helping them hail a taxi on the street. OpenAI’s other Voice Mode demos show the model serving as an academic tutor or a roleplay partner for scenarios like preparing for a job interview. The new types of text and image outputs are just as compelling and include 3D renderings, product mockups, and summaries of uploaded presentations or audio files.

Other ChatGPT features announced during OpenAI’s Spring Update include support for over 50 languages, an improved web interface, and a new macOS desktop app with the current Voice Mode version. Windows users will have to wait until later in the year to use ChatGPT on desktop. (That’s enough time to shop for an AI PC that’s optimized for AI workloads, like the Acer Swift Go 14 Laptop.)

Who Has Access to GPT-4o? 

Users of both the free and paid versions of ChatGPT can now take GPT-4o for a spin. Previously, ChatGPT Free users were limited to GPT-3.5, with GPT-4 available only to Plus subscribers. Now, ChatGPT Free automatically uses GPT-4o, but there’s a cap on the number of messages you can send, and ChatGPT reverts to GPT-3.5 once you’ve used up your allotment for the day. OpenAI has not provided specifics on the cap for free users, noting only that the limit “will vary based on current usage and demand.” It currently appears to be 10 messages per 24-hour window.

ChatGPT Plus subscribers have message limits that refresh every three hours: 80 messages using GPT-4o and 40 messages using GPT-4. These caps may be reduced during peak periods “to keep GPT-4 and GPT-4o accessible to the widest number of people,” says OpenAI. Those with ChatGPT Team subscriptions get a bigger message limit than Plus users, although OpenAI does not give specific numbers.  

Paying users still get exclusive access to more advanced features, including the new Voice Mode version that will become available to ChatGPT Plus in the weeks to come. 

As for developer access, GPT-4o is now available as a text and vision model in the API, with support for the new video and audio functionalities launching first to a small number of partners. For developers, GPT-4o is twice as fast as GPT-4 Turbo, half the price, and comes with five times higher rate limits.
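Here’s a minimal sketch of calling GPT-4o’s text-plus-vision capabilities through the Chat Completions endpoint of OpenAI’s Python SDK. The prompt and image URL are placeholders for this example.

```python
# Calling GPT-4o with a text prompt plus an image: the two modalities
# available to developers at launch (openai Python SDK v1.x).
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What product is shown in this photo?"},
                {
                    "type": "image_url",
                    # Placeholder URL; any publicly reachable image works.
                    "image_url": {"url": "https://example.com/product.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

Because GPT-4o slots into the same chat completions interface as earlier models, switching an existing integration over is largely a matter of changing the model string.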

What Are People Saying About GPT-4o? 

Developers and ChatGPT users were eager to put GPT-4o to the test, especially after many were impressed and intrigued by the OpenAI demos. That said, since the advanced voice and video capabilities are not yet publicly accessible, whether ChatGPT lives up to the hype remains to be seen. What users do agree on is that GPT-4o delivers the advertised boost in speed, generating responses more quickly than its predecessors. Feedback on the accuracy of responses, though, has been mixed: some people reported that GPT-4o gave more thorough answers than GPT-4 and GPT-4 Turbo, while others said the latest model was faster but not necessarily better at reasoning.

There was also much wariness about GPT-4o’s ability to interact with users in a way that’s more human-like than previously possible. Several opinion pieces (like these op-eds from CNN and MSNBC) have called the developments “creepy.” Concerns include the female Voice Mode assistant’s seemingly flirtatious personality and whether it reinforces gender stereotypes, as well as the risk of users becoming too dependent on or attached to anthropomorphic AI models. OpenAI’s release notes acknowledge that GPT-4o’s snazzy audio features come with a new array of risks. The company says it will address safety, usability, and technical issues as it rolls out the full capabilities of GPT-4o over the coming months.

Whether you find GPT-4o thrilling or terrifying (or perhaps both), it has set a new standard for AI models and the possibilities they offer. Explore innovative use cases as soon as the pioneering audio and vision capabilities become available, or boot up ChatGPT right now and see what you can already do with GPT-4o. 

Want to stay up to date on topics like AI, Gaming, PC Tech, Business, and Education? Subscribe here to get a weekly Acer Corner Email Digest that's tailored to your interests. 

About Micah Sulit: Micah is a writer and editor with a focus on lifestyle topics like tech, wellness, and travel. She loves writing while sipping an iced mocha in a cafe, preferably one in a foreign city. She's based in Manila, Philippines. 
