Nvidia accused of scraping ‘A Human Lifetime’ of videos per day to train AI


Nvidia is being accused of scraping millions of online videos to train its own AI products. Sources say the videos weren’t intended just for research but were meant to be used in the company’s products, including its Omniverse 3D world generator, self-driving car systems, and Digital Humans avatar generator. The reports are said to come from an anonymous former Nvidia employee who shared the data with 404 Media.

According to the outlet, several employees were instructed to download videos to train Nvidia’s AI. Many raised concerns about the legality and ethics of the move, but project managers consistently reassured them that the work had been approved. Ming-Yu Liu, vice president of Research at Nvidia, allegedly responded to one question with, “This is an executive decision. We have an umbrella approval for all of the data.”

This isn’t the first time an AI company has been accused of scraping online content without permission. Several lawsuits have been filed against companies such as OpenAI, Stability AI, Midjourney, DeviantArt, and Runway. Nvidia hasn’t been affected so far, as it’s primarily known for supplying AI chips to data centers, a business that helped make it one of the most valuable companies in the world.

Some sources report that Nvidia used publicly available videos, data licensed exclusively for non-commercial research, YouTube videos, and even movies and shows from Netflix. It was even suggested that the company would have someone watch the movies while using screen capture technology to record them from Netflix, although we cannot ascertain whether this was a joke. “We should get a lot of high-quality face videos from this,” Liu added.

The Nvidia team working on AI training also reportedly considered capturing gameplay video and tapping the GeForce Now team to help obtain it. However, Jim Fan, a senior research scientist at Nvidia, said, “We don’t have yet have statistics or video files yet, because the infras [sic] is not yet set up to capture lots of live game videos & actions. They’re both engineering & regulatory hurdles to hop through. But we will add cleaned & processed GFN (GeForce Now) data to team-vfm as soon (as) they arrive.”

It’s unclear how extensive the Cosmos project is within Nvidia, but 404 Media has quoted Nvidia CEO Jensen Huang responding to an email about it with, “Great update. Many companies have to build video FM [foundational models]. We can offer a fully accelerated pipeline.”

Nvidia is likely rushing to build its model while copyright and other AI training issues remain unsettled, leaving a massive legal gray area. At the moment, there is no law that deals specifically with AI training, but legislators have taken notice. Several bills in Congress tackle the issue, such as the AI Foundation Model Transparency Act and the Generative AI Copyright Disclosure Act.

Jowi Morales is a tech enthusiast with years of experience working in the industry. He has been writing for several tech publications since 2021, with a focus on tech hardware and consumer electronics.

Comments from the forums

ThomasKinsley

More reasons to hate AI. CoPilot finally appeared on my W10 machine. Thankfully I was able to uninstall the abomination.
vanadiel007

This article shows how short our lives are. Each hour ticking away until the number reaches 0...
hotaru251

Many have raised concerns about the legality and ethics of the move

This is why we need a court system to go over it (ideally one that has an understanding of the issue of its importance) sooner rather than later and if it is indeed breaking rules and stealing content either pay people or (ideally) scrap it all & force em to start over.