OpenAI just released GPT-4, which can now understand images. Here's what you need to know

The new iteration can perform at "human level" on various professional and academic benchmarks.

Tibi Puiu
March 14, 2023 @ 10:06 pm


ChatGPT has taken the world by storm, setting the record for the fastest-growing consumer app when it reached 100 million active users in January, just two months after launch. For those of you who’ve been living under a rock, ChatGPT is a chatbot launched by OpenAI, a research laboratory founded by some of the biggest names in tech, such as Elon Musk, Reid Hoffman, Peter Thiel, and Sam Altman. ChatGPT can write emails, essays, and poetry, answer questions, or generate complex code, all based on a text prompt.

In short, ChatGPT is a pretty big freaking deal, which brings us to the news of the week: OpenAI just announced the launch of GPT-4, its new and improved large multimodal model. Starting today, March 14, 2023, the model is available to ChatGPT Plus subscribers and to select third-party partners via OpenAI’s API.

Every time OpenAI has released a new Generative Pre-trained Transformer, or GPT, the latest version has marked a substantial leap over the previous iteration. I’ve yet to test the tool myself, but judging from the AI company’s official research blog post, this new update is no different, bringing a number of important improvements and new features.

GPT-4 can now use images as prompts

Up to and including GPT-3.5, OpenAI’s models could only understand and output text. GPT-4, however, can also accept images as prompts.

“It generates text outputs given inputs consisting of interspersed text and images,” the OpenAI announcement reads. “Over a range of domains — including documents with text and photographs, diagrams, or screenshots — GPT-4 exhibits similar capabilities as it does on text-only inputs.”

What this means in practical terms is that you can now upload an image and ask GPT-4 to do a number of things with it based on its analysis. For instance, say you upload an image depicting a bunch of balloons floating in the sky tethered by strings. If you ask GPT-4 what would happen if you cut the strings, the model can reason that the balloons will fly away into the sky.

In the example below, shared by the OpenAI team, a purposefully strange image shows a man ironing clothes while attached to the back of a taxi, and GPT-4 can actually reason about what’s unusual in the image if you ask it.

There were rumors that GPT-4 would also handle video, but we now know that if there were any such plans, they were scrapped for this version. For now, there are no video or animation features, though they may not be far off.

It’s (a lot) smarter too

Per OpenAI, GPT-4 scores much higher on various aptitude tests. GPT-3.5 scored around the bottom 10th percentile on the Uniform Bar Exam and the 31st percentile on the Biology Olympiad. GPT-4 scores around the 90th and 99th percentiles on the bar exam and the olympiad, respectively, placing it on par with some of the brightest human students.

Here’s an example of GPT-4’s improved reasoning abilities.

INPUT: Andrew is free from 11 am to 3 pm, Joanne is free from noon to 2 pm and then 3:30 pm to 5 pm. Hannah is available at noon for half an hour, and then 4 pm to 6 pm. What are some options for start times for a 30 minute meeting for Andrew, Hannah, and Joanne?

GPT-3.5’s answer:

Here is a list of their availability:
– Andrew: 11 am to 3 pm
– Joanne: noon to 2 pm, and 3:30 pm to 5 pm
– Hannah: noon to 12:30 pm, and 4 pm to 6 pm

Based on their availability, there is a 30-minute window where all three of them are available, which is from 4 pm to 4:30 pm. So, the meeting can be scheduled at 4 pm.

GPT-4’s answer:

Andrew: 11 am – 3 pm
Joanne: 12 pm – 2 pm, 3:30 pm – 5 pm
Hannah: 12 pm – 12:30 pm, 4 pm – 6 pm

Common availability for a 30-minute meeting: 12 pm – 12:30 pm

Finding an open window for three people with different schedules can be tricky even for a human to work out quickly. In this scenario, GPT-3.5 fails by suggesting 4 pm, a time that conflicts with Andrew’s schedule, since he is only free until 3 pm. GPT-4, on the other hand, reasons correctly and offers the one slot that works for all three: noon to 12:30 pm.
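
For readers who want to verify the answer themselves, here is a minimal Python sketch of the interval-intersection logic involved; the minute-based encoding and helper names are my own, not anything OpenAI published.

# A minimal sketch of the interval-intersection reasoning GPT-4 gets right.
# Times are minutes since midnight; values mirror the prompt above.

def intersect(a, b):
    """Overlap of two (start, end) intervals, or None if they don't overlap."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None

availability = {
    "Andrew": [(11 * 60, 15 * 60)],                            # 11 am - 3 pm
    "Joanne": [(12 * 60, 14 * 60), (15 * 60 + 30, 17 * 60)],   # noon-2 pm, 3:30-5 pm
    "Hannah": [(12 * 60, 12 * 60 + 30), (16 * 60, 18 * 60)],   # noon-12:30 pm, 4-6 pm
}

# Fold everyone's free intervals into their common overlap.
common = list(availability["Andrew"])
for person in ("Joanne", "Hannah"):
    common = [w for a in common for b in availability[person]
              if (w := intersect(a, b))]

# Keep only windows long enough for a 30-minute meeting.
slots = [(s, e) for s, e in common if e - s >= 30]
print(slots)  # [(720, 750)] -> noon to 12:30 pm, matching GPT-4's answer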

GPT-4 will be integrated into Microsoft services, including Bing

In February, Microsoft integrated a modified version of GPT-3.5 into Bing, its search engine that for years has been laughably behind Google. Not anymore, though. Microsoft has invested over $10 billion in OpenAI, which highlights how serious it is about the coming generative AI revolution and about going after Google. In response, Google made a clumsy release announcement for its own AI chatbot, Bard, which at the moment looks underwhelming, to say the least.

GPT-4’s training data only runs through September 2021, so on its own the model cannot answer questions about events and people beyond that date. But Bing’s version of GPT-4 has access to the open web, enabling it to answer questions about events almost in real time, as soon as they are reported across the internet.

In addition, GPT-4 is now available through OpenAI’s API, which lets select third parties build the AI engine into their products. Duolingo, the language-learning app, is using GPT-4 to deepen conversations with users looking to learn a new language. Similarly, Khan Academy integrated the new GPT to offer personalized, one-on-one tutoring to students in math, computer science, and a range of other disciplines available on its platform.
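
For developers, a call to GPT-4 through the openai Python library (as it worked at launch, before the library’s 1.0 rewrite) looks roughly like the sketch below; the tutoring prompt is an invented example, not Duolingo’s or Khan Academy’s actual integration.

# A minimal sketch using the pre-1.0 openai Python library; the prompt
# content is an invented example, not from any real partner integration.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder; keep real keys out of source code

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a patient language tutor."},
        {"role": "user", "content": "Correct this Spanish sentence: 'Yo soy ir al cine.'"},
    ],
)

print(response["choices"][0]["message"]["content"])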

The image prompt feature is currently available to only one outside partner. Be My Eyes, a free app that connects blind and low-vision people with sighted volunteers, integrated GPT-4 with its Virtual Volunteer.

“For example, if a user sends a picture of the inside of their refrigerator, the Virtual Volunteer will not only be able to correctly identify what’s in it, but also extrapolate and analyze what can be prepared with those ingredients. The tool can also then offer a number of recipes for those ingredients and send a step-by-step guide on how to make them,” says Be My Eyes in a blog post explaining this feature.

However, these features come at a price. OpenAI charges $0.03 per 1,000 “prompt” tokens, which is about 750 words of input. Pricing for image processing has not been made public yet.
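
Using those figures, here is a quick back-of-the-envelope estimate of prompt costs; this is my own arithmetic, and it ignores charges for generated output, which OpenAI bills separately.

# Rough prompt-cost estimate from the article's figures:
# $0.03 per 1,000 prompt tokens, where 1,000 tokens is about 750 words.
PRICE_PER_1K_PROMPT_TOKENS = 0.03  # USD
WORDS_PER_1K_TOKENS = 750

def prompt_cost_usd(words: int) -> float:
    """Approximate cost of sending `words` worth of prompt text to GPT-4."""
    tokens = words * 1000 / WORDS_PER_1K_TOKENS
    return tokens / 1000 * PRICE_PER_1K_PROMPT_TOKENS

print(f"${prompt_cost_usd(7500):.2f}")  # a 7,500-word prompt: about $0.30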

GPT-4 is still not perfect, though

ChatGPT is as famous for its convincing, sometimes hilarious lies and hallucinations as it is for its phenomenal ability to synthesize information and drive human-like conversations. The good news is that GPT-4 is much more accurate and factual.

“We spent 6 months making GPT-4 safer and more aligned. GPT-4 is 82% less likely to respond to requests for disallowed content and 40% more likely to produce factual responses than GPT-3.5 on our internal evaluations,” OpenAI said.

“In a casual conversation, the distinction between GPT-3.5 and GPT-4 can be subtle,” OpenAI wrote in a blog post announcing GPT-4. “The difference comes out when the complexity of the task reaches a sufficient threshold — GPT-4 is more reliable, creative and able to handle much more nuanced instructions than GPT-3.5.”

However, while it’s 40% more likely to deliver factual information, that doesn’t mean it won’t continue to make mistakes, something OpenAI acknowledges. ChatGPT should therefore still be used with caution, especially in high-stakes situations, such as generating content for a work presentation.

Nevertheless, GPT-4 marks yet another huge milestone in the ongoing AI revolution that is set to transform our lives in more than one way.

