A Tale of Two Stories
National Patient Safety Foundation
Report
from a Workshop on
Assembling the Scientific Basis
for Progress on Patient Safety
The issues surrounding incident reporting
systems were the focus of discussions that began the second day
of the workshop. The history of systematic incident reporting
and analysis in medicine is a rich one, extending back at least
to the targeted efforts of Cooper et al. (1978) to generate and
analyze patterns in a corpus of cases in anesthesiology. Today,
a number of such systems are in place in health care, and a variety
of others have been proposed or are in development.
Interest in incident reporting systems is spurred by several
beliefs:
1) the belief that there exist a variety of patterns in the character
and occurrence of incidents that go unnoticed because there are
no larger, continuously replenished, systematically generated
collections of data;
2) the belief that the analysis of these patterns can be used
to direct attention to the areas most rewarding for study and
amenable to improvement;
3) the belief that the present pace and character of technological,
organizational, and economic change in health care are shifting
the pattern of incidents; and
4) the belief that the absence of data defining these patterns
will prove to be the critical, limiting factor in improving safety.
Closely linked to these beliefs are experiences with existing
incident reporting systems. While there is no reliable method for
measuring the performance of existing systems, it is widely believed
that fewer than 5%, and perhaps fewer than 1%, of incidents that might
fit the criteria for reporting are actually reported. The existing
systems are mainly mandatory, and many are linked either directly
or indirectly to enforcement and sanction mechanisms.
Many leaders in health care feel that new approaches to incident
reporting are required. However, most discussions of incident reporting
revolve around how to achieve greater compliance with reporting
requirements. Proposals for anonymous, confidential, immunized, or
mandatory systems are framed primarily by the goal of obtaining more
reports, both greater in number and more detailed.
The discussion during the workshop explored incident reporting
in health care from different perspectives. It was prompted by
several short presentations on lessons learned about incident
reporting and incident analysis in other industries. The
presentations generated a discussion that focused less on
collecting reports than on why analyzing them is complicated,
difficult, and sometimes controversial. The discussion was
wide-ranging. Topics included:
- building consensus among stakeholder groups,
- analysis of incidents with respect to factors influencing
human performance,
- complexities and limits in the attribution of "cause,"
- linkages between incidents and accidents, particularly in
health care,
- difficulties in using incident data to improve safety.
The session opened with a short talk by Charles Billings,
MD, Chief Scientist (retired), NASA Ames, on the lessons learned
from incident reporting in aviation. Dr. Billings designed, started,
and managed the Aviation Safety Reporting System (ASRS), which he
established 22 years ago at NASA's Ames Research Center.
The ASRS is a confidential reporting system for incidents,
not accidents, and it is often proposed as a model for incident
reporting in health care. Dr. Billings described the history of
that system and the conditions that now appear to have been
critical for its success. Because interest in the aviation
experience is widespread in medicine, Appendix B contains an
edited transcript of Dr. Billings's presentation.
Lessons from the Aviation Safety Reporting System (ASRS)
The ASRS is operated by NASA and largely funded by the Federal
Aviation Administration (FAA). It is a successful system that
was developed in part because of the failure of a predecessor
system run from within the FAA. Because the FAA is a regulatory
and enforcement body, few reports were made to that system. The
ASRS was developed as an independent system, run entirely
outside the FAA, and was designed from the outset to be entirely
confidential. Reports made to the ASRS include an identification
strip that gives analysts the means to contact the reporter.
This strip, and anything that would uniquely identify any individual,
is removed during the analysis. The narrative description
of the incident is retained, as are a host of indexing keys. Incidents
are collected and reported to the aviation community as individual
episodes and as exemplars of larger problems. The larger database
of incidents is available for research. Roughly 30,000 incidents
are reported each year, and the system costs several million
dollars per year to run.
A consensus among stakeholders that such a system is
needed was essential to the continued success of the ASRS. Producing
this consensus was a substantial effort in itself, and some portion
of the success of the ASRS derived simply from creating it.
The effort needed to reach agreement among the stakeholders
created an environment that nurtured the system and protected
it from political tampering when its output was controversial.
Creating the consensus also generated a widespread (but not
universal) view of safety that held that practitioners (pilots,
air traffic controllers, mechanics, flight attendants, etc.)
were the observers most likely to recognize hazards and incidents
and were also vital in preventing bad outcomes. The goal of collecting
the details surrounding "accidents that might have happened"
is to identify previously unknown hazards and to detect emerging
threats as systems and organizations change. The system generates
this type of information by analyzing sets of narratives as
questions about threats to safety emerge. It does not generate
aggregate statistical measures of system performance, a point
that was stressed repeatedly.
The analysis of incidents reported to the ASRS depends on
a cadre of analysts with multiple skills. These individuals
are domain experts (e.g., pilots) rather than technicians or
clerks. The point was made several times that analyzing the
reports requires at least as much expertise as generating them.
The analysis also depends on the ability to contact reporters to
clarify details of the incident. Researchers can make use of
the database by working with the staff of analysts to put together
subsets of narratives that address a particular theme or question.
These activities depend on an effective indexing scheme so that
analysts can put together related or contrasting sets of cases
for analysis. Note that, although the system uses substantial
indexing, the primary purpose of analysis is not to reduce the
incident to a category but rather to make sure that the narrative
is descriptive, complete, and precise. Because the ASRS is not
fundamentally a statistical system, the substance of the narratives
is the critical information that the system provides.
A critical part of the activities of the staff at the ASRS
is providing feedback to the operational community, the people
who voluntarily provide the information. The staff uses several
mechanisms, such as the monthly Callback newsletter, to give
the community highly visible feedback on the results of its
analyses and studies of the data received. Making the information
provided by individual reporters visible to the operational
community has proven to be an essential part of the system's
success, building support for the system and making safety
a tangible value.
The ASRS does not provide guidance about how to solve problems
or about which problems are economically or socially worth attention.
It has no regulatory function. It does not deal with accidents,
which are reported and analyzed separately through the independent
National Transportation Safety Board (NTSB). Studies using the
ASRS database have been motivated by accidents and have helped
the NTSB understand the contributors to accidents it is
investigating.
Reports to the ASRS about specific incidents provide limited
immunity against FAA enforcement action, but only under specific
circumstances. This immunity has itself been an incentive to
report and has sustained a substantial, continuing flow of
reports. Technical developments in the aviation system have
allowed automated detection of "altitude busts," in which an
aircraft strays outside its assigned altitude. This has created
an incentive for pilots to report such incidents to the ASRS so
that they can claim immunity against later disciplinary action.
Viewed from one perspective, these reports are monotonous and
repetitious. They are, however, more informative than the
automated detection system, which simply records the event. The
narrative descriptions can provide information about how and why
such "altitude busts" occur. Such information has provided the
basis for procedural modifications designed to ameliorate the
problem at several air carriers. Nevertheless, it is clear that
the incentive of immunity affects the number and kind of reports
received.
There have been no breaches in the confidentiality of the
ASRS. Narratives entering the database are "de-identified"
in a process that removes all the features of the report that
might be used to identify the event and people it describes.
This process takes priority over all other handling of ASRS data.
It provides effective immunity by transforming the data into a
form that is useless for civil sanctions. It is clear that the
reputation of the ASRS among practitioners derives in large part
from its record of success in providing such functional anonymity.
The impact of the ASRS on safety is partly indirect. Simply
by its presence it has served as a potent signal to all the
stakeholders that safety is a critical concern, that new hazards
will continue to appear, and that there is a system-wide concern
for safety that extends across all organizational and institutional
boundaries.
The above lessons are abstracted from the aviation experience.
Both in the presentation and the ensuing discussion, the workshop
explored important differences between health care and aviation.
While a system that succeeds in aviation is not likely to transfer
directly and literally to health care, the lessons Dr. Billings
has derived are generic: for example, a non-punitive approach, the
importance of communication back to practitioners, and the critical
role of an independent organization. As such, these lessons can serve
as a guide to developing successful systems in health care.
Incident Classification and Analysis
Collections of incidents and accidents cry out for classification.
The apparent similarities and differences between the events,
their outcomes, and the circumstances that precede them encourage
us to organize them in categories and rank them in severity.
But classification also has its own hazards, especially in complex
domains where there are multiple possible paths to any outcome
and multiple possible outcomes from any path. Classification
involves identifying relevant similarities and differences; its
effective use depends on knowing a priori what "relevant"
means. Erik Hollnagel, an expert in the evaluation of human
performance, described his experience with classification
systems used in industrial incident and accident work (see Hollnagel,
1993). His examination of these sorts of systems revealed that
an extensive effort at a priori classification may yield
very little insight into the underlying features that incidents
have in common.
In the discussion about incident reporting, it was pointed
out that the ASRS uses an extensive indexing system, but that the
index is used to collect related subsets of narrative cases from
the database that pertain to a theme or question. The indexing
system does not work automatically; it is a tool the staff use to
carry out analyses and to help outside parties use the database
in their own analyses. The indexing is a tool in analysis; the
classification system it represents is not the analysis.
Classification does involve a kind of analysis, but one
that greatly constrains the insights that can be obtained from
the data. Typically, when classification systems are used as
the analysis, a report of an incident is assigned, through a
procedure or set of criteria, to one of a fixed set of categories.
The category set is thought to capture or exhaust all of the
relevant aspects of failures. Once the report is classified, the
narrative is lost or downplayed. Instead, tabulations are built
up and put into statistical comparisons. Put simply, once an event
is assigned to a single category, it is precisely and
indistinguishably like all the others in that category.
Yet research on human performance in incidents and accidents
emphasizes the diversity of issues and interconnections (e.g.,
Woods et al., 1994). As Billings emphasized in the discussion
of the ASRS, capturing a rich narrative of the sequence and factors
involved in the case has proven essential. Often, new knowledge
or changing conditions lead investigators to ask new questions
of the database of narratives. The analyst often goes back to
the narrative level to look for new patterns or connections.
As an example, Hollnagel described an industrial incident
reporting system that in one sense seemed a success but in another
sense failed. It was successful in that people reported to the
system, but it was a failure in that these reports did not lead
to significant learning about vulnerabilities or to constructive
changes. The central reason for this failure was the removal
of the interesting, informative aspects of the events that were
present in the narratives but lost in the process of classification.
Hollnagel traced the problem, in part, to the classification
system's failure to distinguish between the phenomenal appearance
of a failure event and the underlying pattern of contributing
factors that generated the event. To use a medical metaphor that
Hollnagel has employed, most classification systems confuse phenotype
with genotype. The phenotype of an incident is what happens:
what people actually do or what they do wrong, what can be observed.
Phenotypes are specific to the local situation and context; they are
the surface appearance of the incident. The genotype of an incident,
by contrast, is the characteristic collection of factors that
produce the surface, phenotypical appearance of the event. Genotypes
refer to patterns of contributing factors. The significance of
a genotype is that it identifies deeper characteristics that
many superficially different phenotypes have in common.
Genotypical patterns are not observable directly. All statements
about them are inferences rather than observations; they represent
models of the factors that drive human performance. It is
simple to state the difference between phenotypes and genotypes
but quite difficult to separate them in practice. Reporting systems
provide phenotypes, but it is genotypes that drive performance.
The processes of inference about the contributors to events depend
on a thorough understanding of the background or context of the
event. The uncelebrated, researched cases illustrate the process
of finding possible genotypical patterns. They also illustrate
how finding these patterns can help identify meaningful positive
interventions to enhance safety.
Incident collections do spur interest, in part because of
the contrasts and similarities between cases. But classification
systems that rely on phenotypical categories do not capture these
characteristics very well. Indeed, many at the workshop noted
that "human error" is nearly always an important category
in classification systems for accidents, but assigning a case
to this category generally stops or limits the analysis of what
factors influenced human performance.
Classification systems that obscure, simplify, or discard
the story of the cases they classify have generally not been
successful. The systems themselves become outdated relatively
quickly. More significantly, the collections they represent generally
lead to little real progress in understanding the nature of success
and failure in complex domains. Even when management is highly
motivated and the consequences of failure are high, the
process of classifying by phenotypes eliminates the ability to
see the second story of contributors to the system failures.
Classification systems limit the depth of the analysis that can
be conducted, and they limit what can be learned from
the collection of data. This is especially a problem in complex
environments where failures do not have single causes
(Reason, 1990; 1997). The net result is that classification systems
tend to strip away the rich contextual information from which
inferences about genotypes may be made and thereby make such
collections sterile and uninformative.
Although their methodologies differed, virtually all the researchers
present at the workshop commented that their work depends on
capturing the process and the context that led up to the outcome.
This "story" is the fundamental data, and all analysis builds
up patterns, trends, and contrasts across these stories.
From a research perspective, the sparse, simplistic stories of
the celebrated cases were not so much wrong as they were uninformative;
the researchers did not see a way to make progress based on those
kinds of data. Rather, it was the richer stories that captured
attention and served as examples in the conversations during
the workshop.
Incident reporting is one way to obtain such rich stories.
But this method of gathering data is largely passive: the only
way to obtain data is to encourage practitioners to send in
reports when things go awry. Other, more active approaches are
also possible.
Gary Klein has conducted many critical incident studies to better
understand the nature of expertise in complex settings (see Klein,
1998), and he commented on other approaches that can be used
to generate collections of incidents.
In Klein's technique, researchers first proactively go out
to practitioners and help them recall and walk through past incidents.
The focus of these discussions is to help practitioners generate
cases that illustrate the nature of expertise, show how they
succeed, demonstrate what makes problems hard, and reveal how
failure occurs. As in the uncelebrated cases, contrasting success
and failure provides critical insights. The analysis is an involved
process that extracts the critical factors in the story and shows
the interplay between these factors. It depends on concepts and
models about the factors that affect human performance (genotypical
patterns). It looks for patterns and contrasts across a set of
cases that speak to an issue or question.
The studies in the uncelebrated cases illustrate this kind
of active research process. They show how the results provide
insight into how the system works much of the time and how it is
also vulnerable to failure. They also show how this insight can
guide investments that will enhance safety. The uncelebrated cases
are not simply specific places where this learning has taken place,
places where we are ready to develop and test enhancements.
They are also markers and beacons for the kind of process that
is needed to better understand the vulnerabilities in other areas
of health care and to see new ways forward to enhance safety.
Learning from Incidents and Accidents
The discussion at the workshop considered the many issues
associated with analyzing incidents or accidents, that is, how
we learn from such events.
As failure rates fall, the ability to learn individually or
collectively from failure falls as well. The meaning of a particular
failure tends to be ambiguous and contentious. The multiple
contributors, each necessary but only jointly sufficient for the
accident, complicate the ways in which accidents are investigated
and understood, and they make the attribution of cause complex.
In turn, these characteristics of the post-accident aftermath
influence the learning process in several ways. At least two are
worth mentioning here.
First, many accident investigations end prematurely. Because
of hindsight bias, people looking back after the fact see only
the ways that practitioners at the sharp end could have acted
differently. The many organizational and institutional factors
that influence the decisions and actions at the sharp end go
unexamined or are discounted. The risk of ending the investigation
early is great: taken as a whole, the research studies show that
organizational factors play a critical role in fostering events,
creating vulnerabilities and latent failures that contribute
to them.
Second, failure is often seen as a unique event, an
anomaly without wider meaning for the domain in question. Post-accident
commentary typically emphasizes how the circumstances of the
accident were unusual and do not have parallels for other people,
other groups, other organizations, other technological systems.
The narrow focus on human error as the cause of the accident
serves to reinforce this view. If a given accident is caused
by isolated human error, then the accident is without deeper meaning.
After all, the reasoning goes, the human performance in the accident
was so egregious that it cannot possibly have meaning for us
here. We are more careful. We are more conscientious.
Emphasizing differences blocks the learning process. High
reliability organizations appear to recognize that incidents
mark vulnerabilities and threats that could indeed happen to
them (see Weick and Roberts, 1993). They search for levels of
analysis that demonstrate, not the differences, but the similarities
between the accident situation and others in order to find new
ways to improve the larger system.
Day Two Footnotes
There is no reliable way to measure the performance of existing
reporting systems because knowing the rate of reporting requires
knowing the denominator, the total number of reportable events;
that is, knowing precisely what the incident reporting system is
supposed to be discovering.
In aviation, there is a reasonably
clear demarcation between categories labelled as "incidents"
and "accidents." "Accidents" is used to refer
to cases where passengers are injured or where there is overt
damage to the aircraft. The term "incidents" refers
to cases that violated some aspect of good practice or rules
but did not lead to injuries. Despite these working definitions
in aviation and other fields, the links between good practice
and outcome are complex. In medicine the links between good practice
and outcome are even more difficult to untangle.
An early example in medicine of the use of these active techniques
is a series of studies by Cooper and colleagues in anesthesia in
the late 1970s (e.g., Cooper et al., 1978). Building in part on
classic work in human factors (Flanagan JC. The critical incident
technique. Psychol Bull. 1954;51:327-358), Cooper used proactive
techniques to better understand the landscape of safety in
anesthesia.
Copyright 1998 National Patient Safety Foundation at the AMA
Prepared for Web publication by
Annenberg Center for Health Sciences