Google shows off ChatGPT-like bot that turns hums and text into music

AI is yet again redrawing the boundaries of what we call 'art'.

Tibi Puiu
January 30, 2023 @ 5:06 pm

Credit: Pixabay.

ChatGPT finally brought AI to the masses, garnering over a million users in its first week of release in December 2022. Since then, we’ve seen a ton of creative uses for it, from organizing people’s meals to hosting Dungeons and Dragons nights. However, ChatGPT is, strictly speaking, a chatbot. Text flows in, text flows out.

As you’re probably aware from the flood of AI-generated media on social networks, there are also very robust algorithms that can turn text prompts into images or even videos, sometimes with striking results. Now, Google has unveiled a new system that can generate music in any genre starting from a simple text description. There’s even an option to generate music based on your humming or whistling if you can’t really capture your idea for a song in words.

Music-making AI bots

This isn’t the first text-to-music AI that we’ve seen. However, the new system, called MusicLM, is head and shoulders above any previous attempt.

Trained using a massive database of over 280,000 hours of music, Google’s AI can combine various genres and instruments to generate surprisingly eclectic works, be they short songs or entire playlists. It’s also remarkably capable of integrating more abstract requests. For instance, here’s one of the text prompts that the authors used and shared in their research paper:

“The main soundtrack of an arcade game. It is fast-paced and upbeat, with a catchy electric guitar riff. The music is repetitive and easy to remember, but with unexpected sounds, like cymbal crashes or drum rolls.”

And here’s what the output sounds like:

Here’s another interesting one:

“Slow tempo, bass-and-drums-led reggae song. Sustained electric guitar. High-pitched bongos with ringing tones. Vocals are relaxed with a laid-back feel, very expressive.”

There’s also a story mode that you can use to generate tracks based on several descriptions stitched together, which you could theoretically use to make an entire DJ set. This is useful if you want to generate a soundtrack in which different sections of the song need to evoke different feelings or play in a different style, like in this example:

time to meditate (0:00-0:15)
time to wake up (0:15-0:30)
time to run (0:30-0:45)
time to give 100% (0:45-0:60)

One of the Google researchers really had fun with the next one, stretching the limits of MusicLM by asking it to generate a track that starts off with some jazzy vibes only to roll into pop, rap, and even death metal while staying cohesive.

jazz song (0:00-0:15)
pop song (0:15-0:30)
rock song (0:30-0:45)
death metal song (0:45-1:00)
rap song (1:00-1:15)
string quartet with violins (1:15-1:30)
epic movie soundtrack with drums (1:30-1:45)
scottish folk song with traditional instruments (1:45-2:00)

Here’s a Google developer humming the main theme of the Italian protest folk song Bella Ciao:

And now here’s MusicLM reproducing the melody using a variety of instruments:

jazz with saxophone
opera singer
tribal drums and flute

But perhaps the most interesting feature is the AI’s ability to generate soundtracks using written descriptions of paintings as prompts.

“His melting-clock imagery mocks the rigidity of chronometric time. The watches themselves look like soft cheese—indeed, by Dali’s own account they were inspired by hallucinations after eating Camembert cheese. In the center of the picture, under one of the watches, is a distorted human face in profile. The ants on the plate represent decay.” By Gromley, Jessica. “The Persistence of Memory”. Encyclopedia Britannica, 14 Apr. 2022.
Dali soundtrack
“Inspired by a hallucinatory experience in which Munch felt and heard a scream throughout nature, it depicts a panic-stricken creature, simultaneously corpse like and reminiscent of a sperm or fetus, whose contours are echoed in the swirling lines of the blood-red sky.” By Zaczek, Iain. “The Scream”. Encyclopedia Britannica, 14 Apr. 2022.
Munch soundtrack

There are dozens of other sample tracks made using MusicLM posted on GitHub.

These are surely impressive results, although don’t expect any of these songs to win a Grammy any time soon. The compositions, while entertaining and even creative at times, are littered with all sorts of artifacts that sound oddly out of place, like the seven-finger hands you sometimes see in AI-generated visual art. Sound quality-wise, although Google claims the AI generates files at 24 kHz, the output can sound like it was mixed and mastered by some junior sound engineer in his basement.

Despite its shortcomings, MusicLM is still pretty mind-blowing. Furthermore, it shows that neither Google nor its rival Meta, for that matter, is sitting idle while everyone is going crazy about ChatGPT. Google might even have a better chatbot than OpenAI and may simply be keeping its cards close to its chest, waiting for the perfect moment to unveil its own work. If there’s anything that Google has shown us through its DeepMind division, it’s that the company is capable of delivering extraordinary AI systems, like AlphaGo, which can steamroll the world’s best Go players (Go being a game several orders of magnitude more complex than chess), or AlphaFold, which predicted the structure of over 200 million proteins.

For now, MusicLM is not publicly available. The authors say that the system is not ready for public release yet, as researchers still need to iron out some glitches and resolve licensing dilemmas that may prove particularly thorny. Stability AI and Midjourney, two of the biggest names in the exploding field of AI-generated imagery, have become the target of a class action lawsuit filed in California by artists who are seeking financial compensation for copyright infringement. The artists are “concerned about AI systems being trained on vast amounts of copyrighted work with no consent, no credit, and no compensation,” and Google might have a similar concern that it could get sued if it releases a public AI trained on music without the authors’ permission.