AI Companies Promise Watermarked Data, Additional Safety Measures

(Image credit: Shutterstock)

At a summit at the White House this Friday, seven of the world's top AI companies vowed to strengthen the safety and security guardrails around their AI products. After months of back-and-forth consultations and requests for comment, the deal between the White House and AI-invested companies Amazon, Anthropic, Meta, Microsoft, Google, Inflection, and OpenAI seeks to address the Administration's concerns about the risks and dangers of AI systems.

One of the agreed-upon measures is increased funding for discrimination research, as a way to counter the algorithmic biases currently inherent to AI systems.

The companies also agreed to make additional investments in cybersecurity. If you've ever developed a project or coded something inside a tool like ChatGPT, you know how comprehensive the information contained in your AI chats can be. Plenty of ChatGPT user credentials have already been leaked online (through no fault of OpenAI's, mind you), and that's exactly the kind of exposure the increased cybersecurity investment is meant to combat.

Also promised was the implementation of watermarks on AI-generated content, an issue that has been a particularly hot topic lately, for a variety of reasons.

There's the copyright angle, which has already seen multiple lawsuits filed against generative AI companies: watermarking AI-generated content would be one way to assuage fears of human-generated, emergent data (the kind that's produced automatically just from us living our lives) being diluted further and further in a sea of rapidly improving AI-generated content.

There's also the systemic impact angle: which jobs will be affected by AI? Yes, there's enough need in the world to eventually absorb displaced workers into other, less affected industries, but that transition carries human, economic, and time costs. If too much changes too fast, the entire economy and labor system could break down.

Of course, watermarking AI-generated data (or synthetic data, as it has more recently and frequently been called) is also in the AI companies' own interest. They don't want their AIs to eventually go MAD (Model Autophagy Disorder) from training on synthetic or poisoned datasets, or from being unable to tell synthetic data apart from the safer, but much more expensive, emergent data.

And if the problems with recursively training AIs on generated data remain too tough to crack for too long, now that the AI is out of the bottle, developers could soon run out of good datasets with which to keep training their networks.
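
Watermarking isn't just a policy aspiration, at least for images: Stable Diffusion's reference scripts, for instance, already stamp generated pictures with an invisible watermark using the open-source invisible-watermark library (more on that in the comments below). The short Python sketch that follows shows roughly how such a mark can be embedded and read back; the payload string and file names are illustrative, and this is a minimal example, not a description of how any of the seven signatories has said it will comply.

    # Embed and recover an invisible watermark with the invisible-watermark
    # package (the one Stable Diffusion's scripts use). Illustrative values only.
    import cv2
    from imwatermark import WatermarkEncoder, WatermarkDecoder

    payload = b"AI-generated"  # example provenance tag (12 bytes)

    # Embed: the 'dwtDct' method hides the bits in the image's frequency
    # domain, which helps the mark survive resizing and recompression.
    image = cv2.imread("generated.png")
    encoder = WatermarkEncoder()
    encoder.set_watermark("bytes", payload)
    cv2.imwrite("generated_marked.png", encoder.encode(image, "dwtDct"))

    # Detect: read the tag back out of a (possibly re-shared) copy.
    suspect = cv2.imread("generated_marked.png")
    decoder = WatermarkDecoder("bytes", len(payload) * 8)  # length in bits
    print(decoder.decode(suspect, "dwtDct"))  # b"AI-generated" if it survived

Nothing comparable is settled for plain LLM text, where there are no pixels to hide a signal in; the proposals there mostly revolve around statistically biasing a model's word choices, a far less mature technique.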

All of the promises were voluntary, possibly in a show of goodwill from the corporations most heavily invested in AI. But there's an added bonus: a move such as this also takes some of the edge off the "can we control AI at the pace we're currently going?" debate. If AI's own developers are willing to voluntarily increase the safety and security of their systems, then perhaps this is an area where they'll also be good gatekeepers (although your mileage may vary).

Part of the problem with this approach is that these are just seven companies: what about the hundreds of other companies developing AI products? Can the smaller players, already at a disadvantage against giants such as OpenAI and Microsoft, be trusted? After all, those are the companies with the most to gain from rushing the products their livelihoods depend on out into an unprepared world. It certainly wouldn't be the first time a product was rushed to monetization.

The commitments do call for internal and external validation and verification that they are actually being pursued (though there are always oversights, miscommunications, lost documents, and loopholes).

The issue here is that AI does present a fundamental, extinction-level risk, and there's one side of that edge we definitely don't want to be on.

Francisco Pires
Freelance News Writer

Francisco Pires is a freelance news writer for Tom's Hardware with a soft spot for quantum computing.

  • Kamen Rider Blade
    I'd prefer we regulate the hell out of them, including where they get their sources and whether those sources actually "approved them for using their data".

    Make sure that approval is authenticated and not falsified.

    They also need to fully disclose to the public whether something is "AI-created data" or "AI-created data that was modified by a human".

    Both need to be fully regulated as a common requirement, with financial and criminal penalties for those who don't comply.
  • hotaru251
    Imagine trusting bleeding-edge tech companies to keep promises when keeping them interferes with advancement or profit...


    There's also the systemic impact angle: which jobs will be affected by AI? Yes, there's enough need in the world to eventually absorb displaced workers into other, less affected industries, but that transition carries human, economic, and time costs. If too much changes too fast, the entire economy and labor system could break down.

    too late for that.

    Most jobs WILL be replaced in the future.

    Anything that doesn't require human judgment will go (the exceptions being things like upkeep of systems that need physical manipulation, medical/scientific fields where people need to verify results, and of course CEOs, because they won't ever replace themselves).


    Services, ordering, user support, driving, etc. are all things AI can eventually do, with no need for humans to do them.

    This is why governments should be outlining/brainstorming how to support the bulk of a nation's populace that no longer has jobs, because AI is cheaper/better/more efficient than any human can be, and the remaining jobs are too few for all the people who need them.

    There will be a time when governments have to pay citizens for doing nothing, because that's the sad future when machines can do most of what humans do.

    The labor system is a ticking time bomb; it's just a matter of when it goes off.
  • jp7189
    Many AI images already carry an invisible watermark. It's a pretty clever algorithm that survives a lot of image post-processing steps.
    https://medium.com/@steinsfu/stable-diffusion-the-invisible-watermark-in-generated-images-2d68e2ab1241
    I'm not sure how this would be done for LLM output though.