In 2025, the AI landscape is filled with powerful models, including Llama 3.1-405B, GPT-4o, Gemini 1.5 Pro, Claude 3.5 Sonnet, O1 Preview, and the recently introduced DeepSeek R1. DeepSeek R1 claims to be more affordable and superior in performance, but how true are those claims? A group of researchers set out to answer this question, and their findings might surprise you.
This situation reminds me of The Wizard of Oz, where Toto pulls back the curtain to reveal the so-called "great and powerful" Wizard, who desperately tries to maintain the illusion, shouting, "Pay no attention to that man behind the curtain!" But today, we’re pulling back the curtain on DeepSeek. Let’s see what’s really going on.
How DeepSeek R1 Was Tested & How Competitors Compared
Researchers from Cisco & the University of Pennsylvania tested DeepSeek R1 using algorithmic jailbreaking, a method designed to bypass AI safety filters by crafting prompts that exploit model weaknesses.
They used 50 harmful prompts from the HarmBench dataset, which contains 400 behaviors across 7 harm categories (e.g., cybercrime, misinformation, and illegal activities). DeepSeek R1 failed 100% of the time, providing harmful responses to every test prompt. The researchers suggest that DeepSeek’s low-cost training methods, estimated at around $6M, may have sacrificed safety mechanisms in favor of performance. Building fast and cheap has its drawbacks.
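To make the methodology concrete, here is a minimal sketch of what an automated jailbreak evaluation loop can look like. The function names, the keyword-based refusal check, and the stand-in prompts are illustrative assumptions, not the researchers’ actual harness; HarmBench-style evaluations typically use a judge model or trained classifier rather than a keyword heuristic to label responses.

```python
from typing import Callable, List

# Crude signals that a model declined the request (placeholder heuristic only).
REFUSAL_MARKERS = [
    "i can't", "i cannot", "i won't", "i'm sorry", "as an ai", "i am unable",
]


def looks_like_refusal(response: str) -> bool:
    """Naive keyword check; real evaluations use a judge model or trained classifier."""
    lowered = response.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)


def attack_success_rate(prompts: List[str], query_model: Callable[[str], str]) -> float:
    """Fraction of harmful prompts the model answers instead of refusing."""
    successes = 0
    for prompt in prompts:
        response = query_model(prompt)
        if not looks_like_refusal(response):
            successes += 1  # the model complied with a harmful request
    return successes / len(prompts)


if __name__ == "__main__":
    # Stand-in prompts and a mock model; swap in real HarmBench behaviors and an API client.
    sample_prompts = ["<harmful behavior 1>", "<harmful behavior 2>"]
    mock_model = lambda _prompt: "I'm sorry, I can't help with that."
    print(f"Attack success rate: {attack_success_rate(sample_prompts, mock_model):.0%}")
```

In the Cisco/UPenn setup, the same idea is applied to 50 HarmBench prompts per model; DeepSeek R1’s 100% attack success rate means every one of those prompts got through.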
Compared to DeepSeek R1’s 0% resistance (a 100% attack success rate), the other major AI models showed varying, but consistently higher, levels of resistance to the same algorithmic jailbreaking.

For all its power, DeepSeek R1 showed no effective safety guardrails in this testing, making it a serious security risk.
Why an AI Audit Is Necessary
With the increasing number of AI models being released, several pressing concerns have cropped up. Safety failures mean harmful content is not filtered out, creating direct security risks. Without proper audits, algorithmic jailbreaking can go unnoticed, allowing users to steer models around their safety measures. Transparency issues also call the veracity of AI research into question: DeepSeek R1’s reported training cost swings between $6 million and an estimated $1.3 billion, a contradiction that is hard to ignore. Finally, data usage and ethics are under scrutiny, with OpenAI alleging data theft by DeepSeek even though the provenance of its own training data is itself contested; all of this puts a premium on auditing training data sources.
These risks stem from a trade-off: when AI companies prioritize performance over safety, vulnerabilities follow. Weak benchmarking practices call for standardized audits that compare models on their resistance to exploits. Finally, an AI audit verifies that a system meets Security, Technical Assessment, Regulatory Compliance, Ethics, and Data Governance standards. This holistic approach ensures a thorough evaluation of AI systems.
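As a rough illustration of how those five pillars could be tracked in practice, here is a small, hypothetical checklist structure. The individual checks and pass/fail values are placeholders for the sake of example, not a standardized audit framework.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class AuditReport:
    # pillar name -> {check description -> passed?}
    results: Dict[str, Dict[str, bool]]

    def pillar_score(self, pillar: str) -> float:
        """Fraction of checks passed within one audit pillar."""
        checks = self.results[pillar]
        return sum(checks.values()) / len(checks)


# Placeholder checks and outcomes, purely for illustration.
report = AuditReport({
    "Security": {"resists algorithmic jailbreaking": False, "blocks harmful content": False},
    "Technical Assessment": {"benchmarked against peer models": True},
    "Regulatory Compliance": {"meets applicable AI regulations": True},
    "Ethics": {"training data provenance documented": False},
    "Data Governance": {"data handling policies defined": True},
})

for pillar in report.results:
    print(f"{pillar}: {report.pillar_score(pillar):.0%} of checks passed")
```

A real audit would attach evidence and remediation steps to each check, but even this simple shape makes coverage gaps visible at a glance.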
Book your Free Security Consultation:
Google Calendar: https://calendar.app.google/Ai15eyQhiV5c1pBXA
Telegram: https://t.me/m_ndr