Explore key topics and content on security risks from powerful AI models while unleashing growth.
Key topics
- As frontier AI models continue to develop dangerous capabilities, protecting them from theft and misuse is becoming a critical and neglected mission. Important developments include:
RAND authored a playbook, Securing AI Model Weights, which defines the necessary security levels (including SL5, defending against highly resourced nation states) and maps the current state of frontier AI company security, which they estimate at SL2, secured against individual professional hackers. RAND also released a paper on Five Hard National Security Problems related to advanced AI.
The Situational Awareness essay series argues for increased securitization of leading AI companies.
Anthropic and Google have published detailed security frameworks outlining their model protection strategies and implementation plans.
- Organizations are actively assessing AI models to understand potential security implications:
Anthropic released their Frontier Red Team update, which lays out the dangerous capabilities and potential dual-use applications of their models, highlighting cyber risks and biological weapons.
Pattern Labs built the SOLVE benchmark to measure how capable frontier AI models are at vulnerability discovery and exploit development challenges, and used it to assess Claude 3.7 Sonnet pre-release.
Google’s Project Zero identified a non-trivial zero-day vulnerability with their LLM-assisted vulnerability researcher.
OpenAI's evaluation of their latest o3-mini model shows significant progress on three major risk factors, and while it scored “low” on cyber capabilities, it’s possible the evaluation was not indicative of its actual capabilities.
- Ensuring AI is secure and beneficial to society requires robust technical policy solutions:
GovAI published a comprehensive paper mapping key Open Problems in Technical AI Governance, subdivided by skill set.
CNAS published a report on the importance of secure, governable chip mechanisms in safeguarding the development and deployment of AI.
RAND presented technical solutions for hardware-enabled governance mechanisms (HEMs), examined the use of HEMs in policy, and analyzed attack vectors that might compromise them.
The Institute for Progress is publishing a series on how to create secure data centers for AI training and inference.
So what can you do to enhance the security of AI systems?
Blog

The frontier of AI Security: what did we learn in the last year?
This is the first edition of the AI Security newsletter, where we’ll be sharing articles, insights and developments related to securing frontier AI and reducing serious AI risks.

Breaking Language Models: A Deep Dive into AI Security Flaws with Nicholas Carlini and Itay Yona
Nicholas Carlini and Itay Yona from Google DeepMind discuss the shift from theoretical to practical attacks on production AI systems and the intersection of traditional security and machine learning.