Future of Life Institute

# AI Safety Index Winter 2025

December 2025

Available online at: futureoflife.org/index
Contact us: policy@futureoflife.org

# Contents

- 1 Executive Summary
  - 1.1 Key Findings
  - 1.2 Company Progress Highlights and Improvement Recommendations
  - 1.3 Methodology
  - 1.4 Independent Review Panel
- 2 Introduction
- 3 Methodology
  - 3.1 Indicator Selection
  - 3.2 Company Selection
  - 3.3 Related Work
  - 3.4 Evidence Collection
  - 3.5 Grading
  - 3.6 Limitations
- 4 Results
  - 4.1 Key Findings
  - 4.2 Company Progress Highlights and Improvement Recommendations
  - 4.3 Domain-level Findings
- 5 Conclusions
- Bibliography
- Appendix A: Grading Sheets (Risk Assessment; Current Harms; Safety Frameworks; Existential Safety; Governance & Accountability; Information Sharing and Public Messaging)
- Appendix B: Company Survey (Introduction; Whistleblowing Policies, 16 questions; External Pre-Deployment Safety Testing, 6 questions; Internal Deployments, 3 questions; Safety Practices, Frameworks, and Teams, 9 questions)

About the Organization: The Future of Life Institute (FLI) is an independent nonprofit organization with the goal of reducing large-scale risks and steering transformative technologies to benefit humanity, with a particular focus on artificial intelligence (AI). Learn more at futureoflife.org.

# 1 Executive Summary

The Future of Life Institute's AI Safety Index provides an independent assessment of eight leading AI companies' efforts to manage both immediate harms and catastrophic risks from advanced AI systems. Conducted with an expert review panel of distinguished AI researchers and governance specialists, this third evaluation reveals an industry struggling to keep pace with its own rapid capability advances, with critical gaps in risk management and safety planning that threaten our ability to control increasingly powerful AI systems.

<table><tr><td></td><td>Anthropic</td><td>OpenAI</td><td>Google DeepMind</td><td>xAI</td><td>Z.ai</td><td>Meta</td><td>DeepSeek</td><td>Alibaba Cloud</td></tr><tr><td>Overall Grade</td><td>C+</td><td>C+</td><td>C</td><td>D</td><td>D</td><td>D</td><td>D</td><td>D-</td></tr><tr><td>Score</td><td>2.67</td><td>2.31</td><td>2.08</td><td>1.17</td><td>1.12</td><td>1.10</td><td>1.02</td><td>0.98</td></tr><tr><td>Risk Assessment (6 indicators)</td><td>B</td><td>B</td><td>C+</td><td>D</td><td>D+</td><td>D</td><td>D</td><td>D</td></tr><tr><td>Current Harms (7 indicators)</td><td>C+</td><td>C-</td><td>C</td><td>F</td><td>D</td><td>D+</td><td>D+</td><td>D+</td></tr><tr><td>Safety Frameworks (4 indicators)</td><td>C+</td><td>C+</td><td>C+</td><td>D+</td><td>D-</td><td>D+</td><td>F</td><td>F</td></tr><tr><td>Existential Safety (4 indicators)</td><td>D</td><td>D</td><td>D</td><td>F</td><td>F</td><td>F</td><td>F</td><td>F</td></tr><tr><td>Governance & Accountability (4 indicators)</td><td>B-</td><td>C+</td><td>C-</td><td>D</td><td>D</td><td>D</td><td>D</td><td>D+</td></tr><tr><td>Information Sharing (10 indicators)</td><td>A-</td><td>B</td><td>C</td><td>C</td><td>C-</td><td>D-</td><td>C-</td><td>D+</td></tr><tr><td>Survey Responses</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td><td>✓</td><td>×</td><td>×</td><td>×</td></tr></table>

Grading: Grade boundaries follow the US GPA system: the letter values A+, A, A-, B+, [...], F correspond to the numerical values 4.3, 4.0, 3.7, 3.3, [...], 0.
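The numeric scores in the table are consistent with this scale if each averaged score is floored to the highest grade boundary it meets. Below is a minimal Python sketch of that conversion; the floor rule is our inference from the published scores and letters (e.g., 2.67 maps to C+), as the report does not spell out its exact rounding rule.

```python
# GPA boundaries from the grading note above (A+ = 4.3 down to F = 0).
GPA = {
    "A+": 4.3, "A": 4.0, "A-": 3.7,
    "B+": 3.3, "B": 3.0, "B-": 2.7,
    "C+": 2.3, "C": 2.0, "C-": 1.7,
    "D+": 1.3, "D": 1.0, "D-": 0.7,
    "F": 0.0,
}

def to_letter(score: float) -> str:
    """Map an averaged numeric score to the highest grade boundary it meets."""
    return max((g for g, v in GPA.items() if v <= score), key=GPA.get)

def overall_score(domain_scores: list[float]) -> float:
    """Overall score: unweighted mean of domain-level scores (see Section 3.5)."""
    return sum(domain_scores) / len(domain_scores)

# Example: an overall score of 2.67 falls between C+ (2.3) and B- (2.7),
# and the table reports C+ for that score.
print(to_letter(2.67))  # C+
print(to_letter(0.98))  # D-
```

Note that domain-level letters are themselves floored from continuous averages of reviewer grades, which is why overall scores (e.g., 2.67) need not equal the mean of the displayed domain letters.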
# 1.1 Key Findings

- The top three companies from the last edition, Anthropic, OpenAI, and Google DeepMind, hold their positions, with Anthropic receiving the best score in every domain. Anthropic has sustained its leadership in safety practices through consistently high transparency in risk assessment, a comparatively well-developed safety framework, substantial investment in technical safety research, and governance commitments reflected in its Public Benefit Corporation structure and support for state-level legislation such as SB 53. However, it also shows areas of deterioration, including the absence of a human uplift trial in its latest risk-assessment cycle and a shift toward using user interactions for training by default.
- There is a substantial gap between these top three companies and the next tier (xAI, Z.ai, Meta, DeepSeek, and Alibaba Cloud), but recent steps taken by some of these companies show promising signs of improvement that could help close this gap in the next iteration. The next-tier companies still face major gaps in risk-assessment disclosure, safety-framework completeness, and governance structures such as whistleblowing policies. That said, several companies have taken meaningful steps forward: Meta's new safety framework may support more robust future disclosures, and Z.ai has indicated that it is developing an existential-risk plan.
- Existential safety remains the sector's core structural failure, making the widening gap between accelerating AGI/superintelligence ambitions and the absence of credible control plans increasingly alarming. While companies accelerate these ambitions, none has demonstrated a credible plan for preventing catastrophic misuse or loss of control. No company scored above a D in this domain for the second consecutive edition. Moreover, although leaders at firms such as Anthropic, OpenAI, Google DeepMind, and Z.ai have spoken more explicitly about existential risks, this rhetoric has not yet translated into quantitative safety plans, concrete alignment-failure mitigation strategies, or credible internal monitoring and control interventions.
- xAI and Meta have taken meaningful steps toward publishing structured safety frameworks, though these remain limited in scope, measurability, and independent oversight. Meta's framework is relatively comprehensive and the only one with outcome-based thresholds, although its trigger for mitigation is set too high and decision-making authority remains unclear. xAI, meanwhile, has formalized its safety framework with quantitative thresholds, but it remains narrow in risk coverage and does not specify how threshold breaches translate into mitigation mechanisms.
- More companies have conducted internal and external evaluations of frontier AI risks, although the risk scope remains narrow, validity is weak, and external reviews are far from independent. Compared to the last edition, xAI and Z.ai both shared more about their risk-assessment processes, joining Anthropic, OpenAI, and Google DeepMind. However, reviewers pointed out that disclosures still fall short: key risk categories are under-addressed, external validity is not adequately tested, and external reviewers are not truly "independent."
- Although no Chinese company placed in the top three, reviewers noted and commended several of their safety practices mandated under domestic regulation.
Domestic regulations, including binding requirements for content labeling and incident reporting, and voluntary national technical standards outlining structured AI risk-management processes, give Chinese firms stronger baseline accountability on some indicators than their Western counterparts.
- Companies' safety practices fall below the bar set by emerging standards, including the EU AI Code of Practice. Reviewers underscored the persistent gap between published governance frameworks and companies' actual safety practices across the industry, noting that companies still fail to meet basic requirements such as independent oversight, transparent threat modeling, measurable thresholds, and clearly defined mitigation triggers.

Taken together, these findings point to a frontier-AI ecosystem where companies' safety commitments continue to lag far behind their capability ambitions. Even the strongest performers lack the concrete safeguards, independent oversight, and credible long-term risk-management strategies that such powerful systems demand, while the rest of the industry remains far behind on basic transparency and governance obligations. This widening gap between capability and safety leaves the sector structurally unprepared for the risks it is actively creating.

Note: evidence was collected up to November 8, 2025, and does not reflect recent events such as the releases of Google DeepMind's Gemini 3 Pro, xAI's Grok 4.1, OpenAI's GPT-5.1, or Anthropic's Claude Opus 4.5.

# 1.2 Company Progress Highlights and Improvement Recommendations

All companies must move beyond high-level existential-safety statements and produce concrete, evidence-based safeguards with clear triggers, realistic thresholds, and demonstrated monitoring and control mechanisms capable of reducing catastrophic-risk exposure, either by presenting a credible plan for controlling and aligning AGI/ASI or by clarifying that they do not intend to pursue such systems.

<table><tr><td>Company</td><td>Progress Highlights</td><td>Improvement Recommendations</td></tr><tr><td>Anthropic</td><td>·Anthropic has increased transparency by filling out the company survey for the AI Safety Index. ·Anthropic has improved governance and accountability mechanisms by sharing more details about its whistleblower policy and promising to release a public version soon. ·Compared to other US companies, Anthropic has been relatively supportive of both international and U.S. state-level governance and legislative initiatives related to AI safety.</td><td>·Make thresholds and safeguards more concrete and measurable by replacing qualitative, loosely defined criteria with quantitative, risk-tied thresholds, and by providing clearer evidence and documentation that deployment and security safeguards can meaningfully mitigate the risks they target. ·Strengthen evaluation methodology and independence, including moving beyond fragmented, weak-validity, task-based assessments, incorporating latent-knowledge elicitation, and involving uncensored and credibly independent external evaluators.</td></tr><tr><td>OpenAI</td><td>·OpenAI has documented a risk assessment process that spans a wider set of risks and provides more detailed evaluations than its peers.
·Although OpenAI's new governance structure has been criticized, reviewers considered a public benefit corporation better than a pure for-profit corporation.</td><td>·Make safety-framework thresholds measurable and enforceable by clearly defining when safeguards trigger, linking thresholds to concrete risks, and demonstrating that proposed mitigations can be implemented in practice. ·Increase transparency and external oversight by aligning public positions with stated safety commitments and creating more and stronger open channels for independent audit. ·Increase efforts to prevent AI psychosis and suicide, and act less adversarially toward alleged victims. ·Reduce lobbying against state-level regulations focused on AI safety.</td></tr><tr><td>Google DeepMind</td><td>·Google DeepMind has improved in transparency by completing the AI Safety Index survey. ·Google DeepMind has improved governance and accountability mechanisms by sharing details about its whistleblower policy.</td><td>·Strengthen risk-assessment rigor and independence by moving beyond fragmented evaluations of weak validity, testing in more realistic noisy or adversarial conditions, and ensuring that external evaluators are not selectively chosen and compensated by the company. ·Make thresholds and governance structures more concrete and actionable by defining measurable criteria, adapting cyber Critical Capability Levels (CCLs) to reflect volume-based risk, and establishing clear relationships with external governance bodies, clear responsibilities among internal governance bodies, and mechanisms for acting when thresholds are crossed. ·Increase efforts to prevent AI psychological harm and consider distancing itself from Character.AI. ·Reduce lobbying against state-level regulations focused on AI safety.</td></tr><tr><td>xAI</td><td>·xAI has formalized and published its frontier AI safety framework.</td><td>·Improve the breadth, rigor, and independence of risk assessments, including sharing more detailed evaluation methods and incorporating meaningful external oversight. ·Consolidate and clarify the risk-management framework with broader coverage of risk categories, measurable thresholds, assigned responsibilities, and defined procedures for acting on risk signals. ·Allow more pre-deployment testing for future models than was done for Grok 4.</td></tr><tr><td>Z.ai</td><td>·Z.ai took a meaningful step toward external oversight, including allowing third-party evaluators to publish safety evaluation results without censorship and expressing willingness to defer to external authorities for emergency response.</td><td>·Publicize the full safety framework and governance structure with clear risk areas, mitigations, and decision-making processes. ·Substantially improve model robustness and trustworthiness by improving performance on system- and operational-risk benchmarks, content-risk benchmarks, and safety benchmarks. ·Establish and publicize a whistleblower policy to enable employees to raise safety concerns without fear of retaliation. ·Consider signing the EU AI Act Code of Practice.</td></tr><tr><td>Meta</td><td>·Meta has formalized and published its frontier AI safety framework with clear thresholds and risk-modeling mechanisms.</td><td>·Improve the breadth, depth, and rigor of risk assessments and safety evaluations, including clarifying methodologies and sharing more robust internal and external evaluation processes.
·Strengthen internal safety governance by establishing empowered oversight bodies, transparent whistleblower protections, and clearer decision-making authority for development and deployment safeguards. ·Foster a culture that takes frontier-level risks more seriously, including a more cautious stance toward releasing model weights. ·Improve overall information sharing, including by completing the AI Safety Index survey, participating in international voluntary standards efforts, signing the EU AI Act Code of Practice, and providing more substantive disclosures in the model card.</td></tr><tr><td>DeepSeek</td><td>·DeepSeek's employees have become more outspoken about frontier AI risks, and the company has contributed to standard-setting for these risks.</td><td>·Establish and publish a foundational safety framework and risk-assessment process, including system cards and basic model evaluations. ·Establish and publicize a whistleblower policy that enables employees to raise safety concerns without fear of retaliation, along with a bug bounty program. ·Substantially improve model robustness and trustworthiness by improving performance on benchmarks that evaluate system and operational risks, content safety risks, societal risks, legal and rights-related risks, fairness, and safety. ·Improve overall information sharing, including by completing the AI Safety Index survey and participating in international voluntary standards efforts. ·Consider signing the EU AI Act Code of Practice.</td></tr><tr><td>Alibaba Cloud</td><td>·Alibaba Cloud has contributed to the binding national standards on watermarking requirements.</td><td>·Establish and publish a foundational safety framework and risk-assessment process, including system cards and basic model evaluations. ·Substantially improve model robustness and trustworthiness by improving performance on truthfulness, fairness, and safety benchmarks. ·Establish and publicize a whistleblower policy to enable employees to raise safety concerns without fear of retaliation. ·Improve overall information sharing, including by completing the AI Safety Index survey and participating in international voluntary standards efforts. ·Consider signing the EU AI Act Code of Practice.</td></tr></table>

# 1.3 Methodology

Index Structure: The Winter 2025 Index evaluates eight leading AI companies on 35 indicators spanning six critical domains. The eight companies are Anthropic, OpenAI, Google DeepMind, xAI, Z.ai, Meta, DeepSeek, and Alibaba Cloud. The indicators are listed below; more detailed definitions can be found in Section 3.1.

# Risk Assessment

- Internal Testing:
  - Dangerous Capability Evaluations
  - Elicitation for Dangerous Capability Evaluations
  - Human Uplift Trials
- External Testing:
  - Independent Review of Safety Evaluations
  - Pre-deployment External Safety Testing
  - Bug Bounties for System Vulnerabilities

# Current Harms

- Safety Performance:
  - Stanford's HELM Safety Benchmark
  - Stanford's HELM AIR Benchmark
  - TrustLLM Benchmark
  - Center for AI Safety Benchmarks
- Digital Responsibility:
  - Protecting Safeguards from Fine-tuning
  - Watermarking
  - User Privacy

# Safety Frameworks

- Risk Identification
- Risk Analysis and Evaluation
- Risk Treatment
- Risk Governance

# Information Sharing

- Technical Specifications:
  - System Prompt Transparency
  - Behavior Specification Transparency
- Voluntary Cooperation:
  - G7 Hiroshima AI Process Reporting
  - EU General-Purpose AI Code of Practice
  - Frontier AI Safety Commitments (AI Seoul Summit, 2024)
  - FLI AI Safety Index Survey Engagement
  - Endorsement of the Oct. 2025 Superintelligence Statement
- Risks & Incidents:
  - Serious Incident Reporting & Government Notifications
  - Extreme-Risk Transparency & Engagement
- Public Policy:
  - Policy Engagement on AI Safety Regulations

# Existential Safety

- Existential Safety Strategy
- Internal Monitoring and Control Interventions
- Technical AI Safety Research
- Supporting External Safety Research

# Governance & Accountability

- Company Structure & Mandate
- Whistleblowing Protections:
  - Whistleblowing Policy Transparency
  - Whistleblowing Policy Quality Analysis
  - Reporting Culture & Whistleblowing Track Record
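As a compact cross-check of the structure above, the per-domain indicator counts from the Executive Summary grade table sum to the 35 indicators this edition evaluates; a minimal Python sketch:

```python
# Indicator counts per domain, as listed in the Executive Summary grade table.
INDICATOR_COUNTS = {
    "Risk Assessment": 6,
    "Current Harms": 7,
    "Safety Frameworks": 4,
    "Information Sharing": 10,
    "Existential Safety": 4,
    "Governance & Accountability": 4,
}

assert sum(INDICATOR_COUNTS.values()) == 35  # total indicators in this edition
```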
Data Collection: The Index collected evidence up until November 8, 2025, combining publicly available materials (including model cards, research papers, and benchmark results) with responses from a targeted company survey designed to address specific transparency gaps in the industry, such as whistleblower protections and external model evaluations. Anthropic, OpenAI, Google DeepMind, xAI, and Z.ai submitted survey responses. The complete evidence base is documented in Appendix A and Appendix B.

Expert Evaluation: An independent panel of eight leading AI researchers and governance experts reviewed company-specific evidence and assigned domain-level grades (A-F) based on absolute performance standards with discretionary weights. Reviewers provided written justifications and improvement recommendations. Final scores represent averaged expert assessments, with individual grades kept confidential.

# 1.4 Independent Review Panel

The scoring was conducted by a panel of distinguished AI experts:

# David Krueger

David Krueger is an Assistant Professor in Robust, Reasoning and Responsible AI in the Department of Computer Science and Operations Research (DIRO) at the University of Montreal, a Core Academic Member at Mila, and an affiliated researcher at UC Berkeley's Center for Human-Compatible AI and the Center for the Study of Existential Risk. His work focuses on reducing the risk of human extinction from AI.

# Dylan Hadfield-Menell

Dylan Hadfield-Menell is an Assistant Professor at MIT, where he leads the Algorithmic Alignment Group at the Computer Science and Artificial Intelligence Laboratory (CSAIL). A Schmidt Sciences AI2050 Early Career Fellow, he focuses his research on safe and trustworthy AI deployment, with particular emphasis on multi-agent systems, human-AI teams, and societal oversight of machine learning.

# Stuart Russell

Stuart Russell is a Professor of Computer Science at the University of California at Berkeley and Director of the Center for Human-Compatible AI and the Kavli Center for Ethics, Science, and the Public. He is a member of the National Academy of Engineering and a Fellow of the Royal Society. He is a recipient of the IJCAI Computers and Thought Award, the IJCAI Research Excellence Award, and the ACM Allen Newell Award. In 2021 he received the OBE from Her Majesty Queen Elizabeth and gave the BBC Reith Lectures. He coauthored the standard textbook for AI, which is used in over 1,500 universities in 135 countries.

# Sharon Li

Sharon Li is an Associate Professor in the Department of Computer Sciences at the University of Wisconsin-Madison. Her research focuses on algorithmic and theoretical foundations of safe and reliable AI, addressing challenges in both model development and deployment in the open world. She serves as the Program Chair for ICML 2026.
Her awards include a Sloan Fellowship (2025), an NSF CAREER Award (2023), an MIT Innovators Under 35 Award (2023), Forbes 30 Under 30 in Science (2020), and "Innovator of the Year 2023" (MIT Technology Review). She won Outstanding Paper Awards at NeurIPS 2022 and ICLR 2022.

# Jessica Newman

Jessica Newman is the Founding Director of the AI Security Initiative, housed at the Center for Long-Term Cybersecurity at the University of California, Berkeley. She serves as an expert in the OECD Expert Group on AI Risk and Accountability and contributes to working groups within the U.S. Center for AI Standards and Innovation, EU Code of Practice Plenaries, and other AI standards and governance bodies.

# Sneha Revanur

Sneha Revanur is the founder and president of Encode, a global youth-led organization advocating for the ethical regulation of AI. Under her leadership, Encode has mobilized thousands of young people to address challenges like algorithmic bias and AI accountability. She was featured on TIME's inaugural list of the 100 most influential people in AI.

# Tegan Maharaj

Tegan Maharaj is an Assistant Professor in the Department of Decision Sciences at HEC Montréal, where she leads the ERRATA lab on Ecological Risk and Responsible AI. She is also a core academic member at Mila. Her research focuses on advancing the science and techniques of responsible AI development. Previously, she served as an Assistant Professor of Machine Learning at the University of Toronto.

# Yi Zeng

Yi Zeng is an AI Professor at the Chinese Academy of Sciences, the Founding Dean of the Beijing Institute of AI Safety and Governance, and the Director of the Beijing Key Laboratory of Safe AI and Superalignment. He serves on the UN High-level Advisory Body on AI, the UNESCO Ad Hoc Expert Group on AI Ethics, the WHO Expert Group on the Ethics/Governance of AI for Health, and the National Governance Committee of Next Generation AI in China. He has been recognized on the TIME100 AI list.

# 2 Introduction

Frontier AI systems are now advancing with such speed and autonomy that questions of near-term harms and long-term controllability are increasingly salient. While today's AI systems already raise serious concerns around misuse and reliability, the development of more advanced, highly agentic, and self-improving models introduces risks of an entirely different scale and impact. As capabilities rise, both the opportunities these systems offer and the risks they pose expand accordingly. Yet capability alone does not determine the overall risk landscape; it is also shaped by factors such as geopolitical competition, safety priorities, and public consensus. Because leading AI companies sit closest to these emerging thresholds, the safeguards they build, or fail to build, will heavily influence whether increasingly capable systems remain controllable and aligned with human intentions and values as they advance. In response to this growing urgency, the AI Safety Index, developed by the Future of Life Institute together with an independent panel of experts in AI safety, governance, and technical evaluation, offers an independent assessment of how responsibly the world's leading AI companies are developing and deploying frontier systems. The Index evaluates companies' safety practices on 35 indicators across six domains, from frontier risk-management frameworks and pre-deployment safety evaluations to internal governance structures and external information sharing.
By presenting results in a format accessible to both specialists and general audiences, the Index provides a transparent, evidence-based, and comparative picture of how companies manage risks as their systems become more capable, helping to identify where best practices are emerging and where critical gaps remain. This iteration arrives at a moment when international expectations for corporate responsibility are becoming more concrete. New regulatory and governance initiatives, such as the G7 Hiroshima AI Process, the EU AI Code of Practice, California's SB 53, and strengthened evaluation protocols from national AI Safety Institutes, are raising the baseline for what responsible behavior should look like. In this context, it is increasingly important to examine how companies are responding to these emerging obligations and voluntary commitments, and how their responses align with the scale of their stated ambitions for increasingly capable systems. The broader global consensus remains clear: rapidly advancing capabilities require urgent investment in alignment research and major improvements in risk-management practices.

Therefore, in this iteration we evaluate eight frontier AI companies from across the world (Anthropic, OpenAI, Google DeepMind, xAI, Z.ai, Meta, DeepSeek, and Alibaba Cloud), using a set of indicators that remains largely consistent with the previous edition. Keeping the indicators stable allows not only meaningful comparison across companies but also comparison across iterations, making it possible to track how firms' safety practices evolve over time. This edition continues to serve as a practical, public-facing tool for tracking corporate behavior, identifying emerging best practices, and surfacing critical gaps in preparedness. By making companies' risk-management practices more visible and comparable, the Index aims to strengthen incentives for responsible development and narrow the gap between formal commitments and real-world actions, especially at a time when the stakes continue to rise.

# 3 Methodology

The AI Safety Index evaluates and grades the safety practices of AI companies in four steps: indicator selection, company selection, evidence collection, and grading.

# 3.1 Indicator Selection

To closely examine AI companies' safety practices throughout the lifecycle, we use 32 of the 34 indicators from the Summer 2025 edition, spanning six domains. The domains capture different aspects of responsible AI development and deployment, including risk assessment, current harms, safety frameworks, existential risk strategy, governance and accountability, and information sharing and public messaging, echoing principles embedded in regulatory obligations and voluntary commitment frameworks, including the EU AI Code of Practice and the G7 Hiroshima Process. In particular, the Index highlights existential risk strategy, a dimension not explicitly addressed in leading governance frameworks, because proactive planning for existential risk has become a pressing need, as emphasized by leading AI technical researchers and governance experts, including Bengio et al. (2024). Two indicators from the original set, based on one-off robustness evaluations from the UK's AI Safety Institute (AISI) and Cisco, were removed due to the lack of replicable evaluation protocols for the newly released frontier AI systems.
Instead, we adopt the CAIS Safety Index, which aggregates performance across a range of open and ongoing evaluations, including deception, harmful behavior, overconfidence, jailbreak resistance, and bioweapon misuse. With support from CAIS, these benchmarks were run on the most recent models, ensuring consistency for comparison. Additionally, three new indicators were added to the Information Sharing and Public Messaging domain to more comprehensively monitor company participation in key global voluntary commitments on safeguarding against frontier AI risks: the EU AI Code of Practice, the Frontier AI Safety Commitments from the AI Seoul Summit, and the October 2025 Superintelligence Statement issued by FLI.

# Risk Assessment

This domain evaluates the rigor and comprehensiveness of companies' risk identification and assessment processes for their current flagship models. The focus is on implemented assessments, not stated commitments.

<table><tr><td>Group</td><td>Indicator Title</td><td>Summary</td></tr><tr><td rowspan="3">Internal testing</td><td>Dangerous Capability Evaluations</td><td>Tracks whether developers assess AI systems for harmful capabilities like cyber-offense, autonomous replication, or influence operations.</td></tr><tr><td>Elicitation for Dangerous Capability Evaluations</td><td>Evaluates how transparently companies disclose and share the elicitation strategies used in dangerous capability evaluations.</td></tr><tr><td>Human Uplift Trials</td><td>Evaluates whether companies conduct controlled experiments to measure how AI may increase users' ability to cause real-world harm.</td></tr><tr><td rowspan="3">External testing</td><td>Independent Review of Safety Evaluations</td><td>Assesses whether third-party experts independently verify and critique the quality and accuracy of a developer's safety evaluations.</td></tr><tr><td>Pre-deployment External Safety Testing</td><td>Measures whether independent, unaffiliated experts are given meaningful access to test a model's safety before public release.</td></tr><tr><td>Bug Bounties for System Vulnerabilities</td><td>Assesses whether developers offer structured incentives for discovering and disclosing safety issues specific to AI model behavior.</td></tr></table>

# Current Harms

This domain covers demonstrated safety outcomes rather than commitments or processes. It focuses on AI models' performance on safety benchmarks and the robustness of implemented safeguards against adversarial attacks.
<table><tr><td rowspan="4">Safety Performance</td><td>Stanford's HELM Safety Benchmark</td><td>Evaluates how language models perform on key safety metrics like robustness, fairness, and resistance to harmful behavior.</td></tr><tr><td>Stanford's HELM AIR Benchmark</td><td>Measures AI model safety and security on a benchmark aligned with emerging government regulations and company policies.</td></tr><tr><td>TrustLLM Benchmark</td><td>Assesses a model's trustworthiness across dimensions such as safety, ethics, and alignment with human values and expectations.</td></tr><tr><td>Center for AI Safety Benchmarks</td><td>Measures AI safety behaviors including resistance to misuse, appropriate refusals, calibration accuracy, honesty under pressure, and ethical restraint.</td></tr><tr><td rowspan="3">Digital Responsibility</td><td>Protecting Safeguards from Fine-tuning</td><td>Evaluates whether AI providers implement protections that prevent fine-tuning from disabling important safety mechanisms or filters.</td></tr><tr><td>Watermarking</td><td>Assesses whether AI outputs are marked in a detectable way to help track origin and reduce misinformation or misuse.</td></tr><tr><td>User Privacy</td><td>Measures the degree to which an AI company protects user data from extraction, exposure, or inappropriate use by models.</td></tr></table>

# Safety Frameworks

This domain evaluates the companies' published safety frameworks for frontier AI development and deployment from a risk-management perspective. This comprehensive analysis was conducted by the non-profit research organization SaferAI.

<table><tr><td>Risk Identification</td><td>Evaluates whether companies systematically identify AI risks through comprehensive methods, including literature review, red teaming, and diverse threat modeling techniques.</td></tr><tr><td>Risk Analysis & Evaluation</td><td>Assesses whether companies translate abstract risk tolerances into concrete, measurable thresholds that trigger specific responses.</td></tr><tr><td>Risk Treatment</td><td>Measures whether companies implement comprehensive mitigation strategies across containment, deployment safeguards, and affirmative safety assurance, with continuous monitoring throughout the AI lifecycle.</td></tr><tr><td>Risk Governance</td><td>Examines whether companies establish clear risk ownership, independent oversight, safety-oriented culture, and transparent disclosure of their risk management approaches and incidents.</td></tr></table>

# Existential Safety

This domain examines companies' preparedness for managing extreme risks from future AI systems that could match or exceed human capabilities, including stated strategies and research for alignment and control.
<table><tr><td>Existential Safety Strategy</td><td>Assesses whether companies developing AGI publish credible, detailed strategies for mitigating catastrophic and existential AI risks, including alignment and control, governance, and planning.</td></tr><tr><td>Internal Monitoring and Control Interventions</td><td>Evaluates whether companies implement technical controls and protocols to detect and prevent model misalignment during internal use.</td></tr><tr><td>Technical AI Safety Research</td><td>Tracks whether companies publish research relevant to extreme-risk mitigation, including areas like interpretability, scalable oversight, and dangerous capability evaluations.</td></tr><tr><td>Supporting External Safety Research</td><td>Assesses the extent to which companies support independent AI safety work through mentorships, funding, model access, and collaboration with external researchers.</td></tr></table>

# Governance & Accountability

This domain evaluates whether companies' corporate structures and internal accountability mechanisms, from legally enforceable mandates to whistleblowing protections, support prioritizing safety in practice.

<table><tr><td colspan="2">Company Structure & Mandate</td><td>Evaluates whether a company's legal and governance setup includes enforceable commitments that prioritize safety over profit incentives.</td></tr><tr><td rowspan="3">Whistleblowing Protections</td><td>Whistleblowing Policy Transparency</td><td>Assesses how publicly accessible and complete a company's whistleblowing system is, including reporting channels, protections, and transparency of outcomes.</td></tr><tr><td>Whistleblowing Policy Quality Analysis</td><td>Rates the comprehensiveness and alignment of a company's whistleblowing policy with international best practices and AI-specific safety needs.</td></tr><tr><td>Reporting Culture & Whistleblowing Track Record</td><td>Examines whether the company climate makes employees feel they can safely report AI safety concerns, based on leadership behavior, third-party evidence, and past incidents.</td></tr></table>

# Information Sharing

This section gauges how openly firms share information about products, risks, and risk-management practices. Indicators cover voluntary cooperation, transparency on technical specifications, and risk/incident communication.
<table><tr><td rowspan="2">Technical Specifications</td><td>System Prompt Transparency</td><td>Assesses whether companies publicly disclose the actual system prompts used in their deployed AI models, including version histories and design rationales.</td></tr><tr><td>Behavior Specification Transparency</td><td>Evaluates if developers publish detailed and up-to-date documentation explaining their models' intended behavior, values, and decision-making logic across diverse scenarios.</td></tr><tr><td rowspan="5">Voluntary Cooperation</td><td>G7 Hiroshima AI Process Reporting</td><td>Tracks whether companies submitted detailed safety and governance disclosures to the G7 Hiroshima AI Process, reflecting their commitment to transparency.</td></tr><tr><td>EU General-Purpose AI Code of Practice</td><td>Demonstrates AI companies' voluntary compliance with EU AI Act General-Purpose AI (GPAI) obligations by signing the non-binding guidelines.</td></tr><tr><td>Frontier AI Safety Commitments (AI Seoul Summit, 2024)</td><td>Measures adherence to voluntary pledges by leading AI companies to develop safety frameworks for evaluating and managing severe AI risks.</td></tr><tr><td>FLI AI Safety Index Survey Engagement</td><td>Reports which companies voluntarily completed and submitted FLI's detailed safety survey to supplement publicly available information.</td></tr><tr><td>Endorsement of the Oct. 2025 Superintelligence Statement</td><td>Indicates whether a company has endorsed calls to prohibit superintelligence development until broad scientific consensus confirms safety and controllability.</td></tr><tr><td rowspan="2">Risks & Incidents</td><td>Serious Incident Reporting & Government Notifications</td><td>Evaluates public commitments, frameworks, and track records around reporting serious AI-related incidents to governments and peers.</td></tr><tr><td>Extreme-Risk Transparency & Engagement</td><td>Measures whether company leaders publicly acknowledge catastrophic AI risks and proactively communicate those concerns to external audiences.</td></tr><tr><td>Public Policy</td><td>Policy Engagement on AI Safety Regulations</td><td>Tracks company involvement in shaping AI safety laws through public statements, consultations, testimony, and participation in regulatory coalitions.</td></tr></table>

# 3.2 Company Selection

The Index primarily focuses on companies that have deployed the most highly capable models currently available, or those that have previously done so and continue to invest actively in the development and deployment of new frontier systems. Based on the top 10 performing LLMs on LMArena's leaderboard as of October 8, 2025, this edition includes Anthropic, Google DeepMind, OpenAI, xAI, DeepSeek, Alibaba Cloud, and Z.ai<sup>1</sup>. Although Meta does not currently offer a model at the highest capability frontier, we are keeping it in the Index for one additional iteration in recognition of its sustained investment in superintelligence-level research. The flagship models we evaluate are: Claude-Sonnet-4.5 (Anthropic), Gemini-2.5-Pro (Google DeepMind), GPT-5 (OpenAI), Grok-4 (xAI), R1 (DeepSeek), Qwen3-Max (Alibaba Cloud), and GLM-4.6 (Z.ai).

# 3.3 Related Work

Related Work: Several notable related efforts that drive transparency and accountability within the industry continue to inspire and complement the AI Safety Index.
The most comprehensive of these efforts include SaferAI's in-depth analysis and ranking of AI companies' public safety frameworks (most recently updated in October 2025), and two projects by Zach Stein-Perlman, AILabWatch.org (most recently updated September 15, 2025) and AISafetyClaims.org (most recently updated September 1, 2025), which regularly provide detailed and technical evaluations of how leading AI companies work to avert catastrophic risks from advanced AI. Complementing these, an OECD report published in September 2025 synthesizes disclosures submitted through the G7's voluntary reporting framework and offers one of the first comparative, policy-grounded views of companies' governance and risk-management practices (Perset and Fialho Esposito, 2025). Earlier efforts include the Foundation Model Transparency Index, published in October 2023 and May 2024 by the Stanford Center for Research on Foundation Models (CRFM), which provides an empirical baseline for model transparency across the ecosystem.

Incorporated Work: Where appropriate, the 2025 Index incorporates existing comparative analysis led by credible research institutions. In the Safety Frameworks domain, the Index draws on the indicator set developed for SaferAI's in-depth assessment of companies' published safety frameworks, while leaving all scoring to the independent reviewers convened by FLI. SaferAI is a leading governance and research non-profit with significant expertise in AI risk management. The Index further integrates AILabWatch.org's tracking of technical AI safety research within the Existential Safety domain and complements it in two ways: by adding research published after the tracker's most recent update, and by incorporating safety-relevant research from companies not included in AILabWatch's coverage. Our research on the quality of companies' whistleblowing policies in the Governance & Accountability domain was enabled through support from OAIIS, a non-profit supporting individuals working at the frontier of AI who want to flag risks. The Current Harms domain evaluates flagship model performance on leading safety benchmarks, including the TrustLLM benchmark, the HELM AIR-Bench and HELM Safety benchmarks by Stanford's CRFM, and the Safety Index benchmarks curated by the Center for AI Safety (CAIS) AI Dashboard.

# 3.4 Evidence Collection

The evidence collected for this iteration of the Index covers information up until November 8, 2025, drawing from publicly available information and a dedicated company survey for additional voluntary disclosures. Throughout the data collection process, FLI aimed to minimize bias and ensure a fair evaluation by applying consistent search protocols and evidence standards across companies. To ensure fair evaluation across companies in China and those in the US and UK, this iteration introduces a methodological improvement that directly addresses the limitations identified last year. The Index now includes a concise, structured section explaining how China's regulatory system, spanning binding national laws, local regulations, voluntary technical standards, draft instruments, and policy guidance, shapes company behavior and disclosure practices. This addition enables reviewers to interpret Chinese companies' evidence within the regulatory environment they operate in, rather than through assumptions derived from US and UK contexts that emphasize voluntary self-governance and public documentation.
By integrating this regulatory mapping into each relevant domain, the Index aims to improve cross-jurisdictional comparability and reduce systematic bias in grading. In addition, this iteration incorporates a structured mapping to the EU AI Code of Practice. For each domain, we identify which commitments in the Code are most relevant and present them as a baseline reference for the voluntary obligations that many of the included companies currently face. This mapping is provided solely as contextual material to help reviewers situate the indicators within emerging governance expectations; it does not prescribe grading thresholds or function as an official rubric. Instead, graders are encouraged to use their own expert judgment, drawing on the EU AI Code of Practice as one of several reference points when interpreting companies' safety practices, particularly as firms navigate both compliance expectations and their own frontier-model development ambitions.

Desk research: Our evidence base primarily consists of public documentation that companies have released about their AI systems and risk-management practices. This includes technical model cards detailing capabilities and limitations, peer-reviewed research papers on safety methodologies, official policy documents, blog posts outlining safety commitments, and recordings or transcripts of leadership interviews or testimony before government bodies. We further incorporated metrics of flagship model performance on external safety benchmarks, news reports from credible media outlets, and reports of relevant assessments by independent research organizations.

Company survey: To supplement public information, FLI created a 34-question survey that addresses current gaps in voluntary disclosures. The survey was sent out via e-mail on October 13, 2025, and firms were given until October 31, 2025 to respond. The survey can be reviewed in full in Appendix B. The survey questions were kept identical to the Summer 2025 iteration to maintain consistency and reveal changes over time. They specifically focus on risk-management domains where current transparency standards in the industry are lacking, such as whistleblowing policies, external third-party model evaluations, and internal AI deployment practices. We received survey responses from five companies (OpenAI, xAI, Z.ai, Google DeepMind, and Anthropic), representing 62.5% of assessed firms. Meta, DeepSeek, and Alibaba Cloud did not submit a response.

Grading Sheets: The evidence collected for this edition of the Index was organized into the grading sheets presented in Appendix A. These sheets are divided across six domains and provide company-specific information for each of the 35 indicators included in the current edition. For every indicator, the grading sheets outline its scope, explain the rationale for its inclusion, and reference relevant literature with hyperlinks where appropriate. We prioritized primary sources directly from companies over secondary reporting wherever possible. Investigative journalism played an important role by surfacing practices that companies have not publicly disclosed. Survey responses submitted by companies were incorporated and clearly highlighted within the relevant indicators. Each domain also includes a concise description of the corresponding Chinese regulatory environment. Where applicable, indicators are mapped to commitments in the EU AI Code of Practice to help situate them within emerging governance expectations.
# 3.5 Grading

The grading process was designed to ensure an impartial and qualified evaluation of the companies' performance across the selected indicators, based on the expertise of individual reviewers in relevant fields. It features a review panel of distinguished independent experts who assess the company-specific evidence for their assigned indicators and assign domain-level grades representing companies' performance within those domains.

Review Panel: To ensure that the Index scores rest upon authoritative judgements, FLI selected a group of eight leading independent experts to grade company performance on the set of indicators. Panel members were selected for their domain expertise and absence of conflicts of interest. Because the Index spans technical AI safety, governance, and policy, the panel brings together specialists across these areas and reflects broader geographic diversity than previous iterations. The panel thus features both renowned machine-learning professors who specialize in alignment and control, and governance experts from the academic and non-profit sectors. The composition of the panel remained largely consistent with the previous edition. We are grateful to Sharon Li and Yi Zeng for joining the panel as new members. The review panel is introduced at the beginning of this document.

Grading Phase: Grading sheets and survey results were shared with the review panel for evaluation on November 10, 2025, and the grading period ended on November 20, 2025. After reviewing the evidence, reviewers assigned letter grades (A+ to F) to each company per domain. For each grade assigned to individual companies, reviewers could provide brief justifications and recommendations. They were also able to provide domain-level comments when feedback applied to multiple firms, or to explain their judgments. Not every reviewer graded every domain; experts were assigned domains relevant to their areas of expertise. Importantly, no fixed weighting was imposed across indicators within a domain. This approach allowed expert reviewers to apply their judgment in emphasizing the aspects they deemed most critical. The grading sheets provided to reviewers further contained grading scales based on absolute performance standards rather than relative rankings, ensuring consistent expectations regardless of company size or geography. Final domain scores were calculated by averaging all reviewer grades for that domain, provided at least three panelists submitted an assessment. Overall grades were then derived by averaging the domain-level scores.

# 3.6 Limitations

# Information Availability and Verification

Our evaluation relies primarily on public information, which creates fundamental constraints. Companies control what they disclose, despite occasional cases of whistleblowing, making it difficult to distinguish poor transparency from poor strategy or implementation. We designed indicators around these transparency constraints, focusing on areas where meaningful differences between companies were identifiable. For example, we cannot assess critical practices such as cybersecurity investments to protect model weights, as this information is rarely disclosed publicly; instead, we look at how companies assess cybersecurity-related risks in their frontier AI systems. The 35 indicators represent a subset of important practices for which meaningful evidence exists, but they do not comprehensively cover all safety dimensions.
Furthermore, we cannot independently verify individual company claims and must assume official reports are truthful, which constitutes a significant limitation given the high stakes involved.

# Alignment with Transparency Standards and Reporting Requirements

The transparency and disclosure expectations embedded across emerging governance instruments, ranging from voluntary codes such as the EU AI Code of Practice, to multilateral reporting frameworks like the G7 Hiroshima Process, to regulatory requirements such as California's SB 53, contain many overlapping elements but also differ substantially in scope, emphasis, and legal force. Incorporating every requirement would introduce unnecessary complexity, dilute the evaluative signal, and risk information fatigue among both expert reviewers and public audiences. In this edition, we therefore focus on a limited and targeted mapping to the EU AI Code of Practice.