Engineering has experienced a technological revolution, but the basic engineering techniques applied in safety and reliability engineering, created in a simpler, analog world, have changed very little over the years. In this groundbreaking book, Nancy Leveson proposes a new approach to safety -- more suited to today's complex, sociotechnical, software-intensive world -- based on modern systems thinking and systems theory. Revisiting and updating ideas pioneered by 1950s aerospace engineers in their System Safety concept, and testing her new model extensively on real-world examples, Leveson has created a new approach to safety that is more effective, less expensive, and easier to use than current techniques. Arguing that traditional models of causality are inadequate, Leveson presents a new, extended model of causation (Systems-Theoretic Accident Model and Processes, or STAMP), then shows how the new model can be used to create techniques for system safety engineering, including accident analysis, hazard analysis, system design, safety in operations, and management of safety-critical systems. She applies the new techniques to real-world events including the friendly-fire loss of a U.S. Blackhawk helicopter in the first Gulf War; the Vioxx recall; the U.S. Navy SUBSAFE program; and the bacterial contamination of a public water supply in a Canadian town. Leveson's approach is relevant even beyond safety engineering, offering techniques for "reengineering" any large sociotechnical system to improve safety and manage risk.
(I can't believe it took me more than a month to finish this one.) This is the best book for software engineers that is not about software. It's full of elegant models, clean engineering methodologies and just good old systems thinking. Every other page has some nugget of wisdom that an engineer should be able to understand and appreciate.
If you are a software engineer and you're running systems in production - you should read this book. Forget all that clean code, Google SWE, "best practice for microservices" nonsense. This is the book to read.
But it won't be easy. This book is hard. It uses real-life examples to demonstrate the methodologies, and that means a pretty deep dive into the complex domains of aerospace, military aviation, chemical plants, etc. The book spends almost no time introducing these domains to a casual reader. It requires quite a bit of attention to tag along.
At a bare minimum, I would say one should read and reread the first 4 chapters - the ones that introduce STAMP and provide the justification for it. They are pretty abstract and straightforward. They are easy to follow, and you will be amazed at how clearly it's all laid out. And they will knock off those cargo cult practices that software people love and cherish, things like "5 whys" and "root cause analysis".
Then there are a couple of chapters that will be pretty heavy - filled with domain-specific details that might be hard to follow. Chapter 5 is a "doozy"... It's a perfect example of how to apply the methodology, but my god, it is full of acronyms and cryptic jargon from military aviation.
As I understand it, a lot of people don't read this book past chapter 5. And it's a real shame! Another chapter that is mandatory for every IT person is chapter 9. Most running software still relies heavily on human operators and controllers, and this chapter is focused on human psychology, how humans interact with automated systems, and human errors. It's a gold mine of engineering wisdom.
If you're a software architect - read chapter 10 "Integrating Safety into Systems Engineering". If you're an SRE - also read chapter 12 "Controlling Safety During Operations". If you're a manager - read chapter 13 "Managing Safety and Safety Culture".
Overall this book is amazing and I'm planning to re-read it multiple times.
I love this piece: “Design alternatives are generated through a process of system architecture development and analysis. The system engineers first develop requirements and design constraints for the system as a whole and then break the system into subsystems and design the subsystem interfaces and the subsystem interface topology. System functions and constraints are refined and allocated to the individual subsystems. The emerging design is analyzed with respect to desired system performance characteristics and constraints, and the process is iterated until an acceptable system design results.”
A must read for software engineers, managers, and designers of systems in any kind of domain.
Some key ideas:
- Systems should be designed as hierarchies of control where each level enforces constraints upon the level below it. Basically, each level controls the parameters within which its children function and receives feedback from them regarding the efficiency of its control.
- A system is continuously bombarded by risky events, but if the system is in a state of low risk, nothing big happens.
- A system usually drifts toward a state of higher risk (managers cut corners and costs, workers "optimize") until the risk is very high and the system essentially becomes "an accident waiting to happen".
- Safety is a system characteristic and can't be designed at the component level. Child systems cannot be safe unless they are controlled and supervised by a parent-level system.
- A system must be seen as continuously evolving, always changing. The designer has a mental model of the designed system, and that model suffers a delta when the system is produced, and then multiple other deltas as time passes and the system is in operation (production). The designer must be given constant feedback from operations about how the system is being modified, so that they can update their mental model for the next projects.
- When an accident happens and you hear the root cause is "operator error", that means there is not enough information, and the easiest way out is to blame the component whose behavior shows the biggest delta from the normal procedures, which is the human operator. But, as I said, a system is always changing, and operators adapt the system to their needs.
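To make the first and fifth ideas above a bit more concrete, here is a minimal Python sketch of a two-level control hierarchy. It is not taken from the book: the `Process` and `Controller` classes, the tank-level scenario, and all the numbers are hypothetical, chosen only to illustrate a parent level that enforces a constraint using its own model of the child, and what happens when the feedback that keeps that model current is missing.

```python
from dataclasses import dataclass


@dataclass
class Process:
    """Hypothetical controlled process (the 'child' level), e.g. a tank."""
    level: float = 0.0

    def apply(self, flow: float) -> None:
        # Commanded change: positive flow fills, negative flow drains.
        self.level += flow

    def drift(self, amount: float) -> None:
        # Change nobody commanded (e.g. an operator workaround).
        self.level += amount


@dataclass
class Controller:
    """Hypothetical 'parent' level enforcing the constraint level <= max_level."""
    max_level: float
    believed_level: float = 0.0  # the controller's model of the process

    def observe(self, process: Process) -> None:
        # Feedback channel: refresh the model from the real process state.
        self.believed_level = process.level

    def command(self, process: Process, requested: float) -> float:
        # Control action based on the model, not on the real state.
        # A negative result drains the process back toward the limit.
        flow = min(requested, self.max_level - self.believed_level)
        process.apply(flow)
        self.believed_level += flow
        return flow


def peak_level(with_feedback: bool, steps: int = 5) -> float:
    """Return the highest real level seen right after a control action."""
    process = Process()
    controller = Controller(max_level=10.0)
    peak = 0.0
    for _ in range(steps):
        if with_feedback:
            controller.observe(process)
        controller.command(process, requested=3.0)
        peak = max(peak, process.level)
        process.drift(1.0)  # the system keeps changing between actions
    return peak


if __name__ == "__main__":
    print("peak with feedback:   ", peak_level(with_feedback=True))
    print("peak without feedback:", peak_level(with_feedback=False))
```

With feedback, the controller keeps the level at or below the limit of 10.0 after every action. Without it, the controller's model stays internally consistent while the real level climbs to 14.0, even though every individual command looked safe from the controller's point of view. That gap between the model and the real system is the kind of delta the list above describes.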
A very, very interesting read, but not an easy or fast one!
This is a great book. Leveson covers so much that it verges on the impossible to get through completely.
The book weaves together chapters giving a great, terse overview of decades' worth of theory in the systems, safety, resilience, and cognitive engineering disciplines, and chapters applying STAMP and its derived methods to various real-world examples.
The author’s lens is centred on hazard analysis, processes for feedback, and control mechanisms to manage them.
In the end I’m unsure how I would go about applying her methods as-is in the comparatively small scale, low budget projects I am part of; but I also come out with a very useful unifying view that covers tons of papers I’ve read and casts them all with that lens. Some of the ideas in there I already started using in some form, even before I was done.
So despite the long text that becomes somewhat of a challenge, this book is still very much worth it.
Good thesis, that engineering mistakes are more often systems mistakes. If you defund all of the safety mechanisms, of course things are going to be disastrous when they do go wrong. I've seen this time and time again during my engineering career, where the business people and project managers care only about safety in reactionary circumstances.
I abandoned this book about a quarter of the way through though, because my god is it long.
A comprehensive thesis on understanding complex systems with an emphasis on achieving safe outcomes. There is quite a lot of information in the book and I found myself having to read some parts several times. Not because they were poorly written, but because they were so thought-provoking. Mind-blowing read.
I'm sure there's good information in this book. But it's *so* hard to find it. I didn't even really finish it. I just stopped because I couldn't keep going.
This is a great deep dive into Safety with a System Theory approach. Lots of case studies, some well known, others less so. It really provided me with a better approach to improvements across the board, not just with regard to safety. One of the biggest takeaways for me was the idea that a single root cause is not the best we can do. It becomes a bit repetitive, and I dipped in and out of it due to the writing style, hence it took so long to read. Plenty to apply from this book in the IT sector and elsewhere.
I got halfway through this book. It starts nice. I loved it. But it degrades more and more into an ad for the safety analysis that they are researching.