

IBM is building the largest data array in the world - 120 petabytes of storage


Tibi Puiu
August 29, 2011 @ 11:10 am



IBM recently made public its intention to develop what will be, upon completion, the world’s largest data array: 200,000 conventional hard disk drives working together to provide 120 petabytes of available storage space. The massive array, roughly 10 times bigger than any other storage system in existence today, has been ordered by an “unnamed client” whose intentions have yet to be disclosed. IBM says the huge storage space will be used for complex computations, like those used to model weather and climate.

To put things into perspective, 120 petabytes, or 120 million gigabytes, would hold 24 billion typical five-megabyte MP3 files, or 60 copies of the entire internet, which currently spans some 150 billion web pages. And while 120 petabytes might sound outrageous by any sane standard today, at the rate technology is advancing it might not be long before data centers of this size become fairly common.
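For those who like to check the math, here's a quick back-of-the-envelope sketch in Python; the decimal units and the five-megabyte MP3 figure are assumptions, matching the round numbers above:

```python
# Back-of-the-envelope check of the figures above (decimal units assumed)
PETABYTE = 10**15  # bytes
GIGABYTE = 10**9   # bytes
MEGABYTE = 10**6   # bytes

capacity = 120 * PETABYTE

print(capacity / GIGABYTE)        # 120,000,000 -> "120 million gigabytes"
print(capacity / (5 * MEGABYTE))  # 24,000,000,000 -> "24 billion MP3 files"
```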

“This 120 petabyte system is on the lunatic fringe now, but in a few years it may be that all cloud computing systems are like it,” says Bruce Hillsberg, IBM’s director of storage research. Just keeping track of the names, types, and other attributes of the files stored in the system will consume around two petabytes of its capacity.

I know some of you tech enthusiasts out there are already grinding your teeth at these fairly dubious numbers. I know I have: 120 petabytes divided by 200,000 drives equals 600 GB. Does this mean IBM is using only 600 GB hard drives? I’m willing to bet they’re not going that cheap; it would be extremely counter-productive in the first place. It’s worth pointing out that we’re not talking about your usual commercial hard drives. Most likely, the drives used will be of the sort of 15K RPM Fibre Channel disks, at the very least, which beat the heck out of the SATA drive currently powering your computer’s storage. These kinds of drives don’t come in capacities as large as SATA ones, so that might be one explanation. There’s also the issue of redundancy in data centers, which reduces the amount of real usable storage and grows as a data center gets larger. So the drives used could actually be somewhere between 1.5 and 3 TB each, all running at cutting-edge data transfer speeds.
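Here is a rough sketch of that reasoning in Python; the 25% redundancy overhead is an assumed, illustrative figure, not anything IBM has disclosed:

```python
# Illustrative per-drive math; the 25% overhead figure is an assumption.
usable_petabytes = 120
drive_count = 200_000

# Naive split: usable capacity divided evenly across all drives
naive_gb_per_drive = usable_petabytes * 10**6 / drive_count
print(naive_gb_per_drive)  # 600.0 GB per drive if there were no overhead

# With redundancy and metadata eating into raw capacity, each drive
# must be larger than the naive figure to deliver the same usable space.
overhead = 0.25  # assumed fraction lost to redundancy/metadata
raw_gb_per_drive = naive_gb_per_drive / (1 - overhead)
print(raw_gb_per_drive)  # 800.0 GB raw per drive under this assumption
```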

Steve Conway, a vice president of research with the analyst firm IDC who specializes in high-performance computing (HPC), says IBM’s repository is significantly bigger than previous storage systems. “A 120-petabyte storage array would easily be the largest I’ve encountered,” he says.

To house this massive number of hard drives, IBM placed them horizontally in drawers, as in any other data center, but made those drawers wider in order to accommodate more disks within a smaller space. Engineers also implemented a new data backup mechanism, whereby information from dying disks is slowly reproduced on a replacement drive, allowing the system to continue running without any slowdown. A system called GPFS, meanwhile, spreads stored files over multiple disks, allowing the machine to read or write different parts of a given file at once while indexing its entire collection at breakneck speed.
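To make the striping idea concrete, here is a toy Python sketch of how a parallel file system in the spirit of GPFS spreads one file’s blocks across several disks; it’s a minimal illustration of the concept, not IBM’s implementation:

```python
# Toy illustration of striping, the idea behind parallel file systems like GPFS.
# A file is split into fixed-size blocks spread round-robin across disks, so
# different parts of the same file can be read or written in parallel.

def stripe(data: bytes, disks: int, block_size: int):
    """Return a per-disk list of (block_index, chunk) pairs."""
    layout = [[] for _ in range(disks)]
    for i in range(0, len(data), block_size):
        index = i // block_size
        layout[index % disks].append((index, data[i:i + block_size]))
    return layout

def reassemble(layout):
    """Gather blocks back from all disks and restore the original order."""
    blocks = sorted(block for disk in layout for block in disk)
    return b"".join(chunk for _, chunk in blocks)

data = b"example payload spread over several disks"
layout = stripe(data, disks=4, block_size=8)
assert reassemble(layout) == data
```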

Last month a team from IBM used GPFS to index 10 billion files in 43 minutes, effortlessly breaking the previous record of one billion files scanned in three hours. Now, that’s something!

Fast access to huge storage is crucial for supercomputers, which need humongous amounts of data to run the complicated models they’re assigned, be it weather simulations or the decoding of the human genome. Of course, such systems can also be used, and most likely already are, to store identities and human biometric data. I’ll take this opportunity to remind you of a frightful fact we published a while ago: every six hours the NSA collects data the size of the Library of Congress.

As quantum computing gains ground and the first quantum computers are eventually developed, data centers of this kind will become far more common.

UPDATE: The facility did indeed open in 2012.

MIT Technology Review
