Many safety evaluations for AI models have significant limitations

Demand for AI safety and accountability is growing, and so are concerns about the limitations of current tests and benchmarks for generative AI models. New safety benchmarks are being proposed, but evaluating how models behave in the real world, and the risks they pose there, remains a challenge.