OpenAI Details Its New Risk Management Frameworks
In response to recent challenges and the global concern surrounding AI capabilities, OpenAI has released a beta version of its new risk management framework.
- The “Preparedness Framework” outlines strategies to mitigate potential advanced AI risks.
- The company identified key risk categories, including cybersecurity, CBRN threats, persuasion, and model autonomy.
- The board of directors has veto power.
OpenAI has published a comprehensive “Preparedness Framework” designed to address and mitigate potential “catastrophic risks” posed by advanced AI.
The last couple of months have been a rollercoaster for the AI company, between the Sam Altman firing debacle and global fears over AI’s capabilities. With this beta documentation, the company is committing to preventing worst-case scenarios.
The company pinpointed five key elements of its framework:
- Tracking catastrophic risk level via evaluations
- Seeking out unknown-unknowns
- Establishing safety baselines
- Tasking the Preparedness team with on-the-ground work
- Creating a cross-functional advisory body
In the first section, OpenAI outlines the “Tracked Risk Categories” for evaluating potential risks associated with advanced AI models. Each category is graded from Low to Critical, indicating the severity of potential risks.
- Cybersecurity
- Chemical, Biological, Radiological, and Nuclear (CBRN) threats
- Persuasion
- Model autonomy
The risk categories are not considered exhaustive, and the company commits to updating and expanding the list based on evolving understanding and research developments.
OpenAI also introduced a Scorecard, aiming to track pre-mitigation and post-mitigation model risks across various categories. The Scorecard will be dynamic, frequently updated by the Preparedness team, incorporating research findings, observed misuse, and input from other teams.
- Pre-mitigation risk is assessed for worst-case scenarios, considering base and fine-tuned models.
- Post-mitigation risk is targeted to be kept at a “medium” level or below.
The Scorecard serves as a comprehensive tool for ongoing risk assessment and safety management.
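To make the pre-/post-mitigation distinction concrete, here is a minimal sketch of how such a Scorecard could be represented. The framework is a policy document, not a published implementation, so every name and value below is a hypothetical illustration rather than OpenAI’s code.

```python
from enum import IntEnum

# Hypothetical representation of the framework's Low-to-Critical grades.
class RiskLevel(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3

# One Scorecard entry per tracked risk category, recording the worst-case
# (pre-mitigation) grade and the grade after safeguards (post-mitigation).
# The specific grades here are made up for illustration.
scorecard = {
    "cybersecurity":  {"pre": RiskLevel.HIGH,   "post": RiskLevel.MEDIUM},
    "cbrn":           {"pre": RiskLevel.MEDIUM, "post": RiskLevel.LOW},
    "persuasion":     {"pre": RiskLevel.MEDIUM, "post": RiskLevel.MEDIUM},
    "model_autonomy": {"pre": RiskLevel.LOW,    "post": RiskLevel.LOW},
}
```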
The company also detailed its governance and safety baselines, which focus on asset protection, restricting deployment, and restricting development.
If “high” pre-mitigation risk is forecasted, security measures such as compartmentalization and deployment restrictions will be implemented. Deployment is limited to models with a post-mitigation score of “medium” or below, and further development is limited to models with a post-mitigation score of “high” or below.
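As an illustration of how these thresholds gate a model, a check might look like the following sketch. The function names and score structure are assumptions made for this example, not part of OpenAI’s documentation.

```python
from enum import IntEnum

class RiskLevel(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3

def can_deploy(post_mitigation: dict[str, RiskLevel]) -> bool:
    # Deployment baseline: every tracked category must score "medium"
    # or below after mitigations are applied.
    return all(level <= RiskLevel.MEDIUM for level in post_mitigation.values())

def can_develop_further(post_mitigation: dict[str, RiskLevel]) -> bool:
    # Development baseline: "high" or below; a "critical" score in any
    # category would halt further development.
    return all(level <= RiskLevel.HIGH for level in post_mitigation.values())

# Hypothetical scores for a model under evaluation.
scores = {"cybersecurity": RiskLevel.HIGH, "persuasion": RiskLevel.MEDIUM}
print(can_deploy(scores))           # False: cybersecurity exceeds "medium"
print(can_develop_further(scores))  # True: no category exceeds "high"
```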
To implement the safety framework, OpenAI has formed a dedicated “Preparedness Team,” led by Massachusetts Institute of Technology professor Aleksander Madry. The team’s responsibilities include evaluating and monitoring potential risks associated with advanced AI models.
From the looks of this beta Preparedness Framework, the decision-making process involves the Safety Advisory Group (SAG) assessing cases and forwarding recommendations to OpenAI leadership, with the board of directors able to review and potentially reverse decisions.