U.S. Commerce Sec. Lutnick says American AI dominates DeepSeek, thanks Trump for AI Action Plan — OpenAI and Anthropic beat Chinese models across 19 different benchmarks

DeepSeek logo on an iPhone
(Image credit: Getty / Herstockart)

The National Institute of Standards and Technology (NIST) has just completed a comprehensive test of Chinese and American AI models, with the results showing that models from OpenAI and Anthropic outperformed DeepSeek across 19 different benchmarks. U.S. Commerce Secretary Howard Lutnick shared the results on X, thanking President Donald Trump for his AI Action Plan to accelerate American AI innovation and infrastructure while encouraging U.S. allies and friendly nations to adopt it.

“The report is clear: DeepSeek lags far behind, especially in cyber and software engineering. These weaknesses aren’t just technical. They demonstrate why relying on foreign AI is dangerous and shortsighted,” Sec. Lutnick said in his post. “Allowing our adversaries to control AI poses serious risks to our security. By setting the standards, driving innovation, and keeping America secure, the Department of Commerce is helping ensure continued U.S. leadership in AI.”

NIST is a federal agency under the Commerce Department that develops standards and supports industry to help keep the U.S. industrially competitive globally. It conducted this study through its newly established Center for AI Standards and Innovation (CAISI).

The tests pitted the R1, R1-0528, and V3.1 DeepSeek models (crucially not DeepSeek's new V3.2 released this week) against OpenAI’s GPT-5, GPT-5-mini, and GPT-oss, and Anthropic’s Opus 4, using 19 different benchmarks. These publicly available tests include SWE-bench Verified and Breakpoint for software engineering, MMLU-Pro and GPQA for general knowledge capabilities, the SMT 2025, PUMaC 2024, and OTIS-AIME 2025 math contests for mathematical reasoning, and the AgentDojo framework for resilience to hijacking attacks. Beyond these, the institution also developed its own assessments to test for things like CCP censorship, as there’s no standard test for that.

All the results were outlined in a 69-page document [PDF], with CAISI saying that OpenAI and Anthropic outperform DeepSeek in all tests, most notably in software engineering and cyber tasks. The U.S. AI models generally outperform DeepSeek by 20 to 80%, and cost around 35% less to operate. DeepSeek's models are also easier to hijack and jailbreak, making them more likely to behave in unintended ways. The report also said that Chinese models are biased and that they toe the line when it comes to messaging from Beijing, although it's worth bearing in mind that other AI benchmarking tools exist that might yield different results.

Despite all this, DeepSeek R1 is continuously being adopted, with CAISI saying that the “use of these models may pose a risk to application developers, to consumers, and to U.S. national security.” Beyond that, the Chinese AI company is continuously releasing new models, with DeepSeek-V3.2-Exp being released earlier this week, possibly rendering some of these tests moot.

Jowi Morales
Contributing Writer

Jowi Morales is a tech enthusiast with years of experience working in the industry. He has written for several tech publications since 2021, with a focus on tech hardware and consumer electronics.

  • Roland Of Gilead
    Well, of course. The US AI models are better than Deepseek! Those big, beautiful AI models! The best AI models. Never been seen before! Why would anyone doubt that?

    Hmmm. I'll take with a pinch of salt any claims such as this, particularly when the current administration applies coercive control of Gov Departments and the results and data they provide.

    When NIST do a full test with the latest iteration of DeepSeek, and we can see those numbers, then we might be able to consider spurious claims with more accurate results and their validity.

    As the article notes
    Despite all this, DeepSeek R1 is continuously being adopted,
    There's a reason for this.

    This seems more political spin to me.