Thread
The ML community is fond of adopting methods from safety & security engineering. But the rampant co-opting and misuse of established terms and techniques (e.g., “safety”, red teaming) is counterproductive to actual safety. So we wrote a paper on it: docs.google.com/viewer?url=https://raw.githubusercontent.com/trailofbits/publications/master/papers/t...
For one, value alignment DOES NOT subsume safety. Those building AI have abdicated safety by falsely equating it with a system meeting its intent. But in system safety engineering, safety centers on the absence of harm to others, including harm that arises FROM the aligned system intent itself.
ML has also overlooked why safety and security take differing, even opposing, approaches in their risk/threat modeling. Safety prevents a system from impacting people in a harmful way. Security instead prevents adversarial agents in the environment from impacting the system.
So instead of looking to safety frameworks for exploring harms from the ML model itself, “security” terms like red teaming and bug bounties are used wrongly and ineffectively. This leads to misleading claims about the properties a model satisfies and provides only a veneer of safety.
Adopting techniques from hardware safety (e.g., FMEA) is not suitable either. Exploring AI safety properties with techniques developed under the assumption of random component failure is not conducive to uncovering systematic failures. System safety techniques are more appropriate.
So we introduce an Operational Design Domain (ODD) for AI against which multi-modal risks can be assessed. The lack of defined operational envelopes for ML models has made evaluating their risks and safety intractable, given the sheer number of applications and risks posed.
The ODD specifies the operating conditions under which an AI system is designed to behave properly. By defining a more concrete operational envelope, developers and auditors can better assess the potential risks and required safety mitigations for AI-based systems.
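To make that concrete, here is a minimal sketch (in Python, with hypothetical field and function names; the paper defines the actual ODD dimensions) of what declaring an operational envelope and checking requests against it might look like:

```python
from dataclasses import dataclass, field

@dataclass
class OperationalDesignDomain:
    """Hypothetical sketch of an ODD record for an ML model: the
    conditions under which the model is designed to operate."""
    task: str                          # intended task for the model
    input_languages: list[str] = field(default_factory=list)
    max_input_tokens: int = 0          # longer inputs are out of scope
    excluded_uses: list[str] = field(default_factory=list)  # uses never assessed

    def in_domain(self, language: str, n_tokens: int, use_case: str) -> bool:
        """True only if a request falls inside the declared envelope,
        so out-of-ODD requests can be refused or escalated for review."""
        return (
            language in self.input_languages
            and n_tokens <= self.max_input_tokens
            and use_case not in self.excluded_uses
        )

# Example: auditors and developers check requests against the envelope.
odd = OperationalDesignDomain(
    task="triage of English support tickets",
    input_languages=["en"],
    max_input_tokens=4096,
    excluded_uses=["medical advice", "legal advice"],
)
assert odd.in_domain("en", 512, "ticket triage")
assert not odd.in_domain("fr", 512, "ticket triage")  # outside declared languages
```

The point of the sketch: once the envelope is written down explicitly, "is this use in scope?" becomes a checkable question rather than an open-ended one.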
I think the link doesn't work for some folks, so here is the repository link: github.com/trailofbits/publications/blob/master/papers/toward_comprehensive_risk_assessments.pdf