Unless you’ve been living in a cave, you probably know about ChatGPT and its generative AI kin. These AIs have taken the world by storm, sending big tech companies into a frenzy to build their own. Of course, Meta, the parent company of Facebook and Instagram, also joined the race. But in a recent turn of events, its language model was leaked online. Of all places, it managed to make its way to 4chan, the oozing cesspool of the internet.
AIs and society
Just a year or two ago, it seemed that AI-generated yet convincingly human content was decades away. Now, it’s already here. OpenAI recently launched GPT-4 (short for Generative Pre-trained Transformer 4), the latest iteration in the series. A successor to GPT-3 and ChatGPT, it generates human-sounding text in many instances.
Meta’s large language model, by contrast, is more tightly controlled: LLaMA (Large Language Model Meta AI) is only available to approved researchers. Or at least, it was until a few days ago. The model was leaked and subsequently shared on 4chan.
Several computer scientists confirmed the leak, per The Verge. It’s the first time a major company’s proprietary generative model has been leaked to the public.
According to Vice, Meta didn’t deny that the leak happened, and instead, said:
“It’s Meta’s goal to share state-of-the-art AI models with members of the research community to help us evaluate and improve those models. LLaMA was shared for research purposes, consistent with how we have shared previous large language models. While the model is not accessible to all, and some have tried to circumvent the approval process, we believe the current release strategy allows us to balance responsibility and openness,” a Meta spokesperson wrote in an email.
It’s a bit ironic that this leak happened even as Meta touted its limited-release approach as a way of “democratizing access” to large language models. All the more so because that approach was meant to avoid the kinds of toxic outputs we’ve seen from AIs in the past.
The fact that it reached 4chan, a site with a long litany of controversies ranging from racism and links to the alt-right to hacktivism, is all the more telling. Big tech companies can try to keep their models locked down, but it seems likely that sooner or later, the models will reach the public, or at least some parts of it.
Can we open-source AIs?
Some have criticized Meta’s approach, warning that we should all expect consequences like a wave of spam and phishing attacks as a result.
Other experts see it differently. Researchers Sayash Kapoor and Arvind Narayanan, for instance, wrote in a blog post that despite warnings of a new wave of AI-powered spam and malicious attacks, we haven’t actually seen much of this.
For now, the AI world is brimming with possibility and risk, but all this is just starting to take shape. If this incident teaches us anything, it’s that it’s hard to keep algorithms away from the public.
For the average user, being able to download a model like LLaMA doesn’t really do all that much. This isn’t a plug-and-play chatbot, but a complex AI system that takes real expertise to set up. In fact, LLaMA isn’t even a single model, but four models of different sizes, reportedly ranging from 7 billion to 65 billion parameters.
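To get a sense of why this is not a casual download, a back-of-the-envelope memory estimate helps. This is an illustrative sketch, not an official spec: it assumes the commonly reported LLaMA parameter counts and 16-bit weights at 2 bytes per parameter.

```python
# Rough memory footprint of the leaked LLaMA checkpoints.
# Assumptions (not from the article): commonly reported model sizes
# and fp16 weights, i.e. 2 bytes per parameter.

SIZES_BILLIONS = [7, 13, 33, 65]   # publicly reported LLaMA variants
BYTES_PER_PARAM_FP16 = 2           # assumed 16-bit precision

def weights_size_gb(params_billions: int) -> float:
    """Approximate size of the raw weights in gigabytes (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * BYTES_PER_PARAM_FP16 / 1e9

for size in SIZES_BILLIONS:
    print(f"LLaMA-{size}B: ~{weights_size_gb(size):.0f} GB just for the weights")
```

Under these assumptions, even the smallest variant needs on the order of 14 GB of memory just to hold its weights, before any inference happens, while the largest needs roughly ten times that. That, more than anything, is what keeps a leaked checkpoint out of reach for casual users.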
But it does mean that someone with experience can get it running and bypass at least some of the safeguards companies impose. The already-famous ChatGPT, for instance, has several safeguards that prevent it from producing bigoted or otherwise contentious or dangerous content; if you run the model yourself, you can sidestep some of those restraints.
This leak will likely put those safeguards to the test, and we’ll soon see what people can do when they run the AI themselves.
“I think it’s very likely that this model release will be a huge milestone,” Shawn Presser, an independent AI researcher who’s helped distribute the leaked model, tells The Verge.
At any rate, AI is entering the real world. Not at some point in the future, but right now.