A Tale of Two Stories
National Patient Safety Foundation

Report from a Workshop on Assembling the Scientific Basis for Progress on Patient Safety

Day Two: Incident Reporting and Analysis

The issues surrounding incident reporting systems were the focus of discussions that began the second day of the workshop. The history of systematic incident reporting and analysis in medicine is a rich one, extending back at least to the targeted efforts of Cooper et al. (1978) to generate and analyze patterns in a corpus of cases in anesthesiology. Today, there are a number of such systems in place in health care and a variety proposed or in development.

The interest in incident systems is spurred by several different beliefs:
1) the belief that there exist a variety of patterns in the character and occurrence of incidents that go unnoticed because there are no larger, continuously replenished, systematically generated collections of data;
2) the belief that the analysis of these patterns can be used to direct attention to the areas most rewarding for study and amenable to improvement;
3) the belief that the present pace and character of technological, organizational, and economic change in health care is shifting the pattern of incidents; and
4) the belief that the absence of data defining these patterns will prove to be the critical, limiting factor in improving safety.

Closely linked to these beliefs are experiences with existing incident reporting systems. While there is no real method for measuring the performance of existing systems, the view is widespread that fewer than 5%, and perhaps fewer than 1%, of incidents that might fit the criteria for reporting are actually reported. The existing systems are mainly mandatory, and many are linked either directly or indirectly to enforcement and sanction mechanisms.

Many leaders in health care feel that new approaches to incident reporting are required. However, most incident reporting discussions revolve around how to achieve greater compliance with reporting requirements. Proposals for anonymous systems, confidential systems, immunized systems, or mandatory systems are framed primarily by concerns for gaining more (greater numbers, more detailed) reports.

The discussion during the workshop explored incident reporting in health care from different perspectives. It was stimulated by several short presentations on lessons learned about incident reporting and incident analysis in other industries. The presentations generated a discussion that focused less on gathering more reports than on analyzing them, a task that is complicated, difficult, and sometimes controversial. The discussion was wide-ranging. Topics included:

  • building consensus among stakeholder groups,
  • analysis of incidents with respect to factors influencing human performance,
  • complexities and limits in the attribution of "cause,"
  • linkages between incidents and accidents, particularly in health care,
  • difficulties in using incident data to improve safety.

The session opened with a short talk by Charles Billings, MD, Chief Scientist (retired), NASA Ames, on the lessons learned from incident reporting in aviation. Dr. Billings designed, started and managed the Aviation Safety Reporting System (ASRS) 22 years ago when he was at NASA's Ames Research Center.

The ASRS is a confidential reporting system for incidents, not for accidents. It is often proposed as a model for incident reporting in health care. Dr. Billings described the history of that system and the conditions that now appear to have been critical for its success. (Because interest in the aviation experience is widespread in medicine, Appendix B contains an edited transcript of Billings's presentation.)

Lessons from the Aviation Safety Reporting System (ASRS)

The ASRS is operated by NASA and largely funded by the Federal Aviation Administration (FAA). It is a successful system that was developed in part because of the failure of a predecessor system run from within the FAA. Because the FAA is a regulatory and enforcement body, reports to that system were limited. The ASRS was developed as an independent system, run entirely outside the FAA, and was, from the outset, designed to be entirely confidential. Reports made to ASRS include an identification strip that provides analysts the means to contact the reporter. This strip, and anything that would uniquely identify any individual, is removed during the analysis. The narrative description of the incident is retained as are a host of indexing keys. Incidents are collected and reported to the aviation community as individual episodes and as exemplars of larger problems. The larger database of incidents is available for research. Each year there are on the order of 30,000 incidents reported. The system costs several million dollars per year to run.

A consensus among stakeholders that such a system is needed was essential to the continued success of the ASRS. Producing this consensus was a substantial effort in itself. Some portion of the success of the ASRS was derived simply from creating the consensus. The effort needed to acquire agreement among the stakeholders created an environment that nurtured the system and protected it from political tampering when its output was controversial. But creating the consensus also generated a widespread (but not universal) view of safety that insisted that practitioners (pilots, air traffic controllers, mechanics, flight attendants, etc.) were the observers most likely to recognize hazards and incidents and were also vital in preventing bad outcomes. The goal of collecting the details surrounding "accidents that might have happened" is to identify previously unknown hazards and to see new emerging threats as systems and organizations change. The system generates this type of information by analyzing sets of narratives as questions about threats to safety emerge. It does not generate large statistical measures of system performance, a fact that was stressed repeatedly.

The analysis of incidents reported to the ASRS depends on a cadre of analysts with multiple skills. These individuals are domain experts (e.g., pilots) rather than technicians or clerks. The point was made several times that the analysis of the reports requires at least as much expertise as is involved in their generation. Researchers can also make use of the database by working with the staff of analysts to put together subsets of narratives that address a particular theme or question. The analysis also depends on the ability to contact reporters to clarify details of the incident. These activities depend on an effective indexing scheme so that analysts can put together related or contrasting sets of cases for analysis. Note that, although the system uses substantial indexing, the primary purpose of analysis is not to reduce the incident to a category but rather to make sure that the narrative is descriptive, complete, and precise. Because the ASRS is not fundamentally a statistical system, the substance of the narratives is the critical information that the system provides.

A critical part of the work of the ASRS staff is providing feedback to the operational community, the people who voluntarily provide the information. The staff uses several mechanisms, such as the Callback newsletter, to give the community highly visible monthly feedback on the results of its analyses and studies of the data received. Making the information provided by individual reporters visible to the operational communities has proven to be an essential part of the system's success, building support for the system and making safety a tangible value.

The ASRS does not provide guidance about how to solve problems or about which problems are economically or socially worth attention. It has no regulatory function. It does not deal with accidents, which are reported and analyzed separately by the independent National Transportation Safety Board (NTSB). Studies using the ASRS database have been motivated by accidents and have helped the NTSB understand the contributors to accidents it is investigating.

Reports to the ASRS about specific incidents provide limited immunity against FAA enforcement action, but only under specific circumstances. This immunization of the reporter has itself been an incentive to report and has led to a substantial continuing flow of reports. Technical developments in the aviation system have allowed for automated detection of "altitude busts," in which an aircraft strays outside its assigned altitude. This has created an incentive for pilots to report such incidents to the ASRS in order to be able to claim immunity against later disciplinary action. Viewed from one perspective, these reports are monotonous and repetitious. They are, however, more informative than the automated detection system, which simply records the event. The narrative descriptions can provide information about how and why such "altitude busts" occur. Such information has provided the basis for procedural modifications designed to ameliorate the problem at several air carriers. Nevertheless, it is clear that the incentive of immunity affects the number and kind of reports received.

There have been no breaches in the confidentiality of the ASRS system. Narratives entering the database are "de-identified" in a process that removes all the features of the report that might be used to identify the event and people it describes. This process takes priority in handling ASRS data. It provides effective immunity by transforming the data into a form useless for civil sanctions. It is clear that the reputation of the ASRS among practitioners is derived in large part from the record of success in providing such functional anonymity.

The impact of the ASRS on safety is partly indirect. Simply by its presence it has served as a potent indication to all the stakeholders that safety is a critical concern, that new hazards will continue to appear, and that there is a system-wide concern for safety that arches over all organizational and institutional boundaries.

The above lessons are abstracted from the aviation experience. Both in the presentation and the ensuing discussion, the workshop explored important differences between health care and aviation. While a successful system for aviation is not likely to transfer directly and literally to health care, the lessons Dr. Billings has derived are generic, e.g., a non-punitive approach, the importance of communication back to practitioners, and the critical role of an independent organization. As such, these lessons can serve as a guide to develop successful systems in health care.

Incident Classification and Analysis

Collections of incidents and accidents cry out for classification. The apparent similarities and differences between the events, their outcomes, and the circumstances that precede them encourage us to organize them in categories and rank them in severity. But classification also has its own hazards, especially in complex domains where there are multiple possible paths to any outcome and multiple possible outcomes from any path. Classification involves identifying relevant similarities and differences; its effective use depends on knowing a priori what "relevant" means. Erik Hollnagel, an expert in the evaluation of human performance, described some of his experience with classification systems used in industrial incident and accident work (see Hollnagel, 1993). His examination of these systems revealed that an extensive effort at a priori classification may yield very little insight into the underlying features that incidents have in common.

In the discussion about incident reporting, it was pointed out that the ASRS uses an extensive indexing system, but that this system is used to collect subsets of related narrative cases from the database that pertain to a theme or question. The indexing system does not work automatically; it is a tool the staff uses to carry out analyses and to assist outside parties in using the database for their own analyses. The indexing is used as a tool in analysis; the classification system it represents is not the analysis.

Classification does involve a type of analysis, but a type that greatly constrains the insights that can be obtained from the data. Typically, when classification systems are used as the analysis, a report of an incident is assigned, through a procedure or set of criteria, to one or another fixed category. The category set is thought to capture or exhaust all of the relevant aspects of failures. Once the report is classified, the narrative is lost or downplayed. Instead, tabulations are built up and put into statistical comparisons. Put simply, once assigned to a single category, an event is treated as precisely and indistinguishably like all the others in that category.

Yet research on human performance in incidents and accidents emphasizes the diversity of issues and interconnections (e.g., Woods et al., 1994). As Billings emphasized in the discussion of the ASRS, capturing a rich narrative of the sequence and factors involved in the case has proven essential. Often, new knowledge or changing conditions lead investigators to ask new questions of the database of narratives. The analyst often goes back to the narrative level to look for new patterns or connections.

As an example, Hollnagel described an industrial incident reporting system that in one sense seemed a success but in another sense failed. It was successful in that people reported to the system, but it was a failure in that these reports did not lead to significant learning about vulnerabilities or to constructive changes. The central reason for this failure was the removal of the interesting, informative aspects of the events that were present in the narratives but lost in the process of classification.

Hollnagel traced the failure, in part, to the classification system's inability to distinguish between the phenomenal appearance of a failure event and the underlying pattern of contributing factors that generated the event. To use a medical metaphor that Hollnagel has employed, most classification systems confuse phenotype with genotype. The phenotype of an incident is what happens: what people actually do or what they do wrong, what you can observe. Phenotypes are specific to the local situation and context; they are the surface appearance of the incident. The genotype of an incident, on the other hand, is the characteristic collection of factors that produce the surface, phenotypical appearance of the event. Genotypes refer to patterns of contributing factors. The significance of a genotype is that it identifies deeper characteristics that many superficially different phenotypes have in common.

Genotypical patterns are not observable directly. All statements about them are inferences that represent models about the factors that drive human performance rather than observations. It is simple to state the difference between these but quite difficult to separate them in practice. What reporting systems provide are phenotypes. What drives performance, however, are genotypes. The processes of inference about the contributors to events depend on a thorough understanding of the background or context of the event. The uncelebrated, researched cases illustrate the process of finding possible genotypical patterns. They also illustrate how finding these patterns can help identify meaningful positive interventions to enhance safety.

Incident collections do spur interest, in part because of the contrasts and similarities between cases. But classification systems that rely on phenotypical categories do not capture these characteristics very well. Indeed, many at the workshop noted that "human error" is nearly always an important category in classification systems for accidents, but assigning a case to this category generally stops or limits the analysis of what factors influenced human performance.

Classification systems that obscure, simplify, or discard the story of the cases they classify have generally not been successful. The systems themselves become outdated relatively quickly. More significantly, the collections they represent generally lead to little real progress on understanding the nature of success and failure in complex domains. Even when motivation is high in management and there are high consequences of failure, the process of classifying by phenotypes eliminates the ability to see the second story of contributors to the system failures. Classification systems limit the depth of the analysis that can be conducted, and they limit what it is that one can learn from the collection of data. This is especially a problem in complex environments where failures do not occur because of single causes (Reason, 1990; 1997). The net result is that classification systems tend to strip away the rich contextual information from which inferences about genotypes may be made and thereby make such collections sterile and uninformative.

Although their methodologies differ, virtually all the researchers present at the workshop commented that their work depends on capturing the process and the context that led up to the outcome. This "story" is the fundamental data, and all analyses build up patterns, trends and contrasts across these stories. From a research perspective the sparse, simplistic stories of the celebrated cases were not so much wrong as they were uninformative; the researchers did not see a way to make progress based on those kinds of data. Rather, it was the richer stories that captured attention and served as examples in the conversations during the workshop.

Incident reporting is one way to obtain such rich stories. But this method of gathering data is largely passive. There is no way to obtain data other than by encouraging practitioners to send back reports when things go awry. Other, more active approaches are also possible. Gary Klein has conducted many critical incident studies to better understand the nature of expertise in complex settings (see Klein, 1998), and he commented on other approaches that can be used to generate collections of incidents.

In Klein's technique, researchers first proactively go out to practitioners and help them recall and walk through past incidents. The focus of these discussions is to help practitioners generate cases that illustrate the nature of expertise, show how they succeed, demonstrate what makes problems hard, and reveal how failure occurs. As in the uncelebrated cases, contrasting success and failure provides critical insights. The analysis is an involved process that extracts the critical factors in the story and shows the interplay between these factors. It depends on concepts and models about the factors that affect human performance (genotypical patterns). It looks for patterns and contrasts across a set of cases that speak to an issue or question.

The studies in the uncelebrated cases illustrate this kind of active research process. They illustrate how the results provide insight about how the system works much of the time but how it is also vulnerable to failure. They illustrate how this insight can guide investments that will enhance safety. The uncelebrated cases are not simply specific places where this learning has gone on, places where we are ready for the work to develop and test enhancements. They are also markers and beacons for the kind of process that is needed to better understand the vulnerabilities in other areas of health care and to see new ways forward to enhance safety.

Learning from Incidents and Accidents

The discussion at the workshop considered the many issues associated with analyzing incidents or accidents, that is, with how we learn from such events.

As failure rates fall, the ability to learn individually or collectively from failure falls as well. The meaning of a particular failure will be ambiguous and contentious. The multiple contributors, each necessary but only jointly sufficient for the accident, complicate the ways in which accidents are investigated and understood. This makes the attribution of cause complex. In turn, these characteristics of the post-accident aftermath influence the learning process in several ways. At least two are worth mentioning here.

First, many accident investigations end prematurely. Because of hindsight bias, people looking back after the fact see only the ways that practitioners at the sharp end could have acted differently. The variety of organizational and institutional factors that influence the decisions and actions at the sharp end go unexamined or are discounted. The risk of ending the investigation early is great. Taken as a whole, the research studies show that organizational factors play a critical role in fostering events and create vulnerabilities and latent failures that contribute to events.

Second, failure is often seen as a unique event, an anomaly without wider meaning for the domain in question. Post-accident commentary typically emphasizes how the circumstances of the accident were unusual and do not have parallels for other people, other groups, other organizations, other technological systems.

The narrow focus on human error as the cause of the accident serves to reinforce this view. If a given accident is caused by isolated human error then the accident is without deeper meaning. After all, the reasoning goes, the human performance in the accident was so egregious that it cannot possibly have meaning for us here. We are more careful. We are more conscientious.

Emphasizing differences blocks the learning process. High reliability organizations appear to recognize that incidents mark vulnerabilities and threats that could indeed happen to them (see Weick and Roberts, 1993). They search for levels of analysis that demonstrate, not the differences, but the similarities between the accident situation and others in order to find new ways to improve the larger system.

Day Two Footnotes

The reason for this is that knowing the rate of reporting requires knowing the denominators for numbers of events; that is, knowing precisely what it is that the incident reporting system is supposed to be discovering.

In aviation, there is a reasonably clear demarcation between categories labelled as "incidents" and "accidents." "Accidents" is used to refer to cases where passengers are injured or where there is overt damage to the aircraft. The term "incidents" refers to cases that violated some aspect of good practice or rules but did not lead to injuries. Despite these working definitions in aviation and other fields, the links between good practice and outcome are complex. In medicine the links between good practice and outcome are even more difficult to untangle.

An early example in medicine of the use of these active techniques is a series of studies by Cooper and colleagues in anesthesia in the late 1970s (e.g., Cooper et al., 1978). Building in part on classic work in human factors (Flanagan JC. The critical incident technique. Psychol Bull. 1954; 51: 327-358.), Cooper used proactive techniques to better understand the landscape of safety in anesthesia.

Copyright 1998 National Patient Safety Foundation at the AMA

Prepared for Web publication by
Annenberg Center for Health Sciences