OpenAI Releases GPT-4, A Multimodal AI


After months of anticipation, OpenAI has released a powerful new image- and text-understanding AI model, GPT-4, that the company calls “the latest milestone in its effort to scale up deep learning.”

GPT-4 is available via OpenAI’s API with a waitlist and in ChatGPT Plus, OpenAI’s premium plan for ChatGPT, its viral AI-powered chatbot.

It’s been hiding in plain sight, as it turns out. Microsoft confirmed that Bing Chat, its chatbot tech co-developed with OpenAI, is running on GPT-4. Other early adopters include Stripe, which is using GPT-4 to scan business websites and deliver a summary to customer support staff, and Duolingo, which built GPT-4 into a new language learning subscription tier.

According to OpenAI, GPT-4 can accept image and text inputs — an improvement over GPT-3.5, its predecessor, which only accepted text — and performs at the “human level” on various professional and academic benchmarks. For example, GPT-3 passes a simulated bar exam with a score around the top 10% of test takers.

OpenAI spent six months iteratively aligning GPT-4 using lessons from an adversarial testing program as well as ChatGPT, resulting in “best-ever results” on factuality, steerability and refusing to go outside of guardrails, according to the company

“In a casual conversation, the distinction between GPT-3.5 and GPT-4 can be subtle,” OpenAI wrote in a blog post announcing GPT-4. “The difference comes out when the complexity of the task reaches a sufficient threshold — GPT-4 is more reliable, creative and able to handle much more nuanced instructions than GPT-3.5.”

Without a doubt, one of GPT-4’s more interesting aspects is its ability to understand images as well as text. GPT-4 can caption — and even interpret — relatively complex images, for example, identifying a Lightning Cable adapter from a picture of a plugged-in iPhone.

The image understanding capability isn’t available to all OpenAI customers just yet — OpenAI’s testing it with a single partner, Be My Eyes, to start with. Powered by GPT-4, Be My Eyes’ new Virtual Volunteer feature can answer questions about images sent to it.

“For example, if a user sends a picture of the inside of their refrigerator, the Virtual Volunteer will not only be able to correctly identify what’s in it, but also extrapolate and analyse what can be prepared with those ingredients. The tool can also then offer a number of recipes for those ingredients and send a step-by-step guide on how to make them.”

A more meaningful improvement, potentially, is the aforementioned steerability. With GPT-4, OpenAI is introducing “system” messages that allow developers to prescribe their AI’s style and task by describing specific directions. System messages, which will also come to ChatGPT in the future, are in the form of instructions that set the tone — and establish boundaries — for the AI’s next interactions.

For example, a system message might read:

“You are a tutor that always responds in the Socratic style. You never give the student the answer, but always try to ask just the right question to help them learn to think for themselves. You should always tune your question to the interest and knowledge of the student, breaking down the problem into simpler parts until it’s at just the right level for them.”