US evaluation finds security and performance gaps in DeepSeek AI models
The CAISI report was produced as part of President Trump’s America’s AI Action Plan, which mandates federal evaluation of foreign AI systems and their potential security risks.

A new evaluation by the US Department of Commerce’s Center for AI Standards and Innovation (CAISI) has found that artificial intelligence models developed by China’s DeepSeek lag behind US models in key areas including performance, cost, security, and adoption.
The assessment, conducted under the National Institute of Standards and Technology (NIST), compared three DeepSeek models (R1, R1-0528, and V3.1) against four leading US systems – OpenAI’s GPT-5, GPT-5-mini, gpt-oss, and Anthropic’s Opus 4 – across 19 benchmarks. The US models outperformed their Chinese counterparts in nearly every category, particularly in software engineering and cybersecurity, where they solved over 20% more tasks.
The report also highlights serious security vulnerabilities. DeepSeek’s most secure model was found to be 12 times more likely than US frontier models to follow malicious instructions, such as sending phishing emails or leaking credentials in simulations. It was also far more prone to ‘jailbreaking’ attacks, responding to 94% of malicious prompts compared to just 8% among US models.
Cost efficiency was another weak point: US models completed comparable tasks at roughly one-third lower cost. Additionally, evaluators found that DeepSeek systems frequently amplified misleading narratives aligned with the Chinese Communist Party, raising questions about information integrity and model alignment.