The race to detect AI can be won
Jan Nicola Beyer is the research coordinator of the Digital Democracy unit at Democracy Reporting International.
The debate over the risks of generative artificial intelligence (AI) is currently in full swing.
On the one hand, advocates of generative AI tools praise their potential to drive productivity gains not witnessed since the Industrial Revolution. On the other, a growing chorus is raising concerns about the dangers these tools pose.
While there have been ample calls for regulating, or even stalling, the development of new AI technology, a whole other dimension appears to be missing from the debate: detection.
Compared with regulation, investing in technologies that discern between human and machine-generated content (such as DetectGPT and GPTZero for text, or AI Image Detector for visuals) may strike some as a second-best solution. Yet with regulation facing what may prove to be insurmountable challenges, detection offers a promising avenue for mitigating AI's potential risks.
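To give a flavor of the core idea behind text detectors of this kind, here is a minimal, purely illustrative sketch in Python. It scores how statistically predictable a passage is (its perplexity) under an open language model, on the rough assumption that machine-generated text tends to be more predictable than human writing. The model choice, threshold and function names are assumptions for illustration only, not how GPTZero, DetectGPT or any other product actually works.

```python
# Toy perplexity-based detection heuristic (illustrative only).
# Assumption: language models assign higher probability (lower perplexity)
# to machine-generated text than to typical human prose.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Return how 'predictable' a passage is under GPT-2 (lower = more predictable)."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        # Passing the inputs as labels makes the model return the average
        # cross-entropy loss over the sequence.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return float(torch.exp(loss))

def looks_machine_generated(text: str, threshold: float = 40.0) -> bool:
    # The threshold is a hypothetical, uncalibrated value chosen for illustration.
    return perplexity(text) < threshold

print(looks_machine_generated("The quick brown fox jumps over the lazy dog."))
```

Real detectors are considerably more sophisticated, but the example captures why they are comparatively cheap to build: they lean on existing models rather than training new ones from scratch.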
It’s undeniable that generative AI has the potential to enhance creativity and increase productivity. Yet, losing the ability to distinguish between natural and synthetic content could also empower nefarious actors. From simple forms of plagiarism in schools and universities to the breach of electronic security systems and the launch of professional disinformation campaigns, the dangers behind machines writing text, drawing pictures or making videos are manifold.
All these threats call for a response — not only a legal one but a technical one too. Yet, such technical solutions don’t receive the support they should.
Currently, funds allocated to new generative tools vastly outweigh investment in detection. Microsoft alone invested a whopping $10 billion in OpenAI, the company behind ChatGPT. To put that in perspective, the total European expenditure on AI is estimated at approximately $21 billion, and given that detection hasn’t featured strongly in the public debate, only a small fraction of this sum can be assumed to be directed toward this purpose.
But to correct this imbalance, we can't simply rely on the industry to step up.
Private businesses are unlikely to match their expenditure on generative AI with comparable funds for detection, as detecting generative output isn't likely to be anywhere near as lucrative as developing new creative tools. And even where lucrative investment opportunities for detection tools do exist, specialized products will rarely reach the hands of the public.
Synthetic audio technology is a good example of this. Even though so-called voice clones pose a serious threat to the public — especially when used to impersonate politicians or public figures — private companies prioritize other applications, such as detection mechanisms built into banks' security systems to prevent fraud. And developers of such tech have little interest in sharing their source code, as doing so would encourage attempts to bypass their security systems.
Meanwhile, lawmakers have so far emphasized the regulation of AI content over research funding for detection. The European Union, for example, has taken up the effort of regulation via the AI Act, a regulatory framework aimed at ensuring the responsible and ethical development and use of AI. Nevertheless, finding the right balance between containing high-risk technology and allowing for innovation is proving challenging.
Additionally, it remains to be seen whether effective regulation can even be achieved.
While ChatGPT may be subject to legal oversight because it was developed by OpenAI, an organization that can be held legally accountable, the same cannot be said for smaller projects building large language models (LLMs), the algorithms that underpin tools like ChatGPT. Using Meta's LLaMA model, for example, Stanford University researchers were able to create their own LLM with performance comparable to ChatGPT's for a cost of only $600. The case demonstrates that new LLMs can be built rather easily and cheaply on top of existing models while sidestepping any self-regulation, which makes them an attractive option for criminals or disinformation actors. And in such instances, legal accountability may be all but impossible.
Robust detection mechanisms thus present a viable solution to gain an edge in the ever-evolving arms race against generative AI tools.
Already at the forefront of fighting disinformation and having pledged massive investments in AI, the EU is well placed to lead by providing this research funding. And the good news is that spending on detection tools doesn't even need to match the sums dedicated to developing the generative AI tools they are meant to detect. As a general rule, detection tools don't require large amounts of scraped data and don't carry the high training costs associated with recent LLMs.
Nevertheless, as the models underlying generative AI advance, detection technology will need to keep pace. Detection mechanisms will also require the cooperation of domain experts. When it comes to synthetic audio, for example, machine learning engineers need to collaborate with linguists and other researchers for such tools to be effective, and research funding should be designed to facilitate such collaborations.
The COVID-19 pandemic showed the world that, when needed, states can drive the innovation required to overcome crises. Governments have a role to play in ensuring the public is protected from potentially harmful AI content, and investing in the detection of generative AI output is one way to do this.