Advanced Fault Injection Methods for Safety Critical Systems
July 26, 2013 by John Day
By Victor Reyes, technical marketing manager, Synopsys, Inc.
ISO 26262 is a functional safety standard that replaces the older and more generic IEC 61508 standard for passenger vehicles. ISO 26262 addresses hazards caused by malfunctioning behavior of electric and electronic safety related systems.
The standard provides a well-defined, automotive-specific safety life cycle. It also provides an automotive-specific, risk-based approach focused on automotive safety integrity levels (ASILs). Finally, ISO 26262 provides requirements and recommended methods for validation of the safety levels.
ISO 26262 defines a safety life cycle composed of three phases: the concept phase, development phase, and an “after the start of production” phase.
During the concept phase, the item under investigation – an electronic stability control system, for example – is defined in terms of functionality, interfaces, environmental conditions, hazards, and other key characteristics.
After this analysis, a risk assessment is performed to define the item or system’s automotive safety integrity level. Based on the safety goals, a functional safety concept is specified using preliminary architectural assumptions.
With the functional safety concept in mind, the item is developed from the system level perspective, following a typical V-model, with specification, design, and testing on the left-hand side and integration, verification and validation on the right-hand side.
System development will also apply V-model specific flows for system hardware and software, guided by parts four, five, and six of the standard.
Analyzing an item to determine its Automotive Safety Integrity Level (ASIL) takes into account the potential severity of item failures (from light injuries to fatal injuries), exposure (the probability of being in an operational situation in which a failure can be hazardous), and controllability (how easy or difficult it may be for a driver to control the effects of an item failure). Levels range from the lowest (ASIL A) to the highest (ASIL D).
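The way severity, exposure and controllability combine into an ASIL can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and it relies on the widely quoted shortcut that adding the class indices (S1 to S3, E1 to E4, C1 to C3) reproduces the ASIL determination table in part 3 of the standard. The table in the standard remains the authoritative source, and the S0, E0 and C0 classes are deliberately left out here.

```python
# Minimal sketch of ASIL determination (assumption: the "sum of class indices"
# shortcut, which mirrors the determination table in ISO 26262-3).

def determine_asil(severity: int, exposure: int, controllability: int) -> str:
    """Return 'QM', 'A', 'B', 'C' or 'D' for classes S1-S3, E1-E4, C1-C3."""
    if not 1 <= severity <= 3:
        raise ValueError("severity must be 1..3 (S1..S3)")
    if not 1 <= exposure <= 4:
        raise ValueError("exposure must be 1..4 (E1..E4)")
    if not 1 <= controllability <= 3:
        raise ValueError("controllability must be 1..3 (C1..C3)")
    total = severity + exposure + controllability
    # Sums of 7, 8, 9 and 10 map to ASIL A, B, C and D; anything lower is
    # handled by normal quality management (QM).
    return {7: "A", 8: "B", 9: "C", 10: "D"}.get(total, "QM")


if __name__ == "__main__":
    print(determine_asil(3, 4, 3))  # worst case in every dimension
    print(determine_asil(1, 2, 1))  # low risk in every dimension
```

Only the combination of the highest severity, exposure and controllability classes yields ASIL D, which is why relatively few items end up at that level.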
The standard addresses system-level, hardware, and software sub-parts, each of which also follows the V-model. The standard describes verification methods for each sub-part, with specific weights for the different ASIL levels: no recommendation, recommended, and highly recommended.
For example, section 4.7, system design, highly recommends simulation as a method to verify the system for ASIL C and D compliance. On the other side of the V curve, section 4.8, item integration and test, highly recommends fault injection testing for ASIL C and D compliance.
Figure 1: ISO 26262 Technique Recommendations: “o – no recommendation”, “+ – recommended”, “++ – highly recommended”
On the left side of the V, ISO 26262 highly recommends simulation and prototyping methods, used as a fault-injection technique, for system, hardware and software verification.
Fault-injection is mentioned specifically on the right side of the V-model for system, hardware and software integration.
Fault injection can improve the test coverage of safety mechanisms at the system level, covering corner cases difficult to trigger during normal operation.
Fault injection is recommended whenever a hardware safety mechanism is defined, to analyze its response to faults, or where arbitrary faults corrupting software or hardware components must be injected to test safety mechanisms.
Fault injection helps to determine if the response of a system matches its specification despite the presence of faults. It helps developers understand the effects of faults on the target system behavior. Fault injection also helps assess the effectiveness of fault tolerance mechanisms and it helps reduce the presence of faults during design and implementation.
Hardware faults can be categorized by their duration as permanent faults, triggered by component damage; transient faults, also known as soft errors and typically triggered by environmental conditions; and intermittent faults, due to unstable hardware.
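The three duration categories can be made concrete with a small Python model of a sampled signal. The names below are hypothetical, and the intermittent fault is modeled as strictly periodic purely to keep the example short; real intermittent faults recur irregularly.

```python
from enum import Enum


class FaultDuration(Enum):
    PERMANENT = "permanent"        # stays active once it appears
    TRANSIENT = "transient"        # disturbs a single sample (soft error)
    INTERMITTENT = "intermittent"  # comes and goes (periodic in this sketch)


def fault_active(kind: FaultDuration, step: int, start: int, period: int = 4) -> bool:
    """Return True if a fault that first appears at `start` is active at `step`."""
    if step < start:
        return False
    if kind is FaultDuration.PERMANENT:
        return True
    if kind is FaultDuration.TRANSIENT:
        return step == start
    return (step - start) % period == 0  # simplified intermittent behavior


def apply_fault(samples, kind, start, stuck_value=0):
    """Replace every sample hit by the fault with `stuck_value`."""
    return [
        stuck_value if fault_active(kind, step, start) else sample
        for step, sample in enumerate(samples)
    ]


if __name__ == "__main__":
    healthy = [1] * 8
    for kind in FaultDuration:
        print(kind.value, apply_fault(healthy, kind, start=2))
```

Running the script shows the difference at a glance: the permanent fault corrupts every sample from its start onward, the transient fault corrupts exactly one, and the intermittent fault corrupts samples at intervals.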
Fault injection techniques include hardware-based fault injection with and without contact, software-based fault injection, and simulation-based fault injection.
Criteria for comparing fault injection techniques include fault injection points (what types of faults can be triggered with a specific technique), whether or not the technique can model permanent faults, and intrusiveness, or how the injection of the fault changes the original execution flow of the system.
Figure 2: Comparison of traditional fault-injection techniques
Other aspects include observability, which defines how well the reactions or events triggered by the fault can be seen and recorded; controllability, which defines when and where the fault can be injected; repeatability, the ability to repeat the experiment in a deterministic fashion; and the experiment speed, which will define the complexity and duration of the test scenarios.
Typically, hardware-based techniques with contact are used to inject errors on the input/output boundary of the Electronic Control Unit, whereas techniques without contact are used to trigger soft errors, such as memory corruption due to radiation or electromagnetic interference.
Techniques with contact can model permanent faults and are not intrusive, although there is always a risk of damage if misused. Techniques without contact, which are also not intrusive and incur less risk of damage, are more focused on transient errors.
Software-based fault injection techniques can only inject errors at those locations accessible by the software; that is, memory and memory-mapped peripheral registers. Therefore, they are only able to model transient faults.
The biggest problem with software-based fault injection techniques is their intrusiveness. They modify the software binary with the code to inject the errors, which could lead to differences in behavior compared to the production software running in the field.
All three of these techniques run in real time, fast enough to handle complex software stacks. Because the experiments are performed on real hardware, the ability to observe all the internal effects triggered by the fault is limited. Experiments are controllable and repeatable, but never completely deterministic.
Simulation-based fault injection
Simulation-based fault injection performed at the gate or RTL levels has the advantage of having full access to all hardware elements on the system. Without being intrusive it has full observability, controllability, and determinism. The downside of simulations at this level is that they are extremely slow, which makes them unusable on more complex fault scenarios where the software must be taken into account.
A virtual prototype is a software model that emulates the hardware. This software model can be simulated on a desktop PC. One big advantage of a virtual prototype is its ability to run exactly the same binary software without modification on the virtual model that will run later on the real hardware.
As a soft model a virtual prototype can be made available to software teams months before the actual hardware is implemented, but a virtual prototype is more than just a software model simulating the hardware.
A virtual prototype can freeze the full system execution at any point in time – even with multi-core hardware – and read and modify internal values. Advanced analysis capabilities that correlate software at the application level with hardware events, code coverage, fault injection, and the ability to script and automate the simulations are just some of the capabilities of this technology.
Virtual prototypes integrate seamlessly into existing software tool chains and connect to external third-party tools for hardware-in-the-loop (HIL) and complete rest-of-bus simulations. Programming is easy and scalable, and virtual prototypes are easy to share, archive, and deploy across a worldwide organization.
A virtual prototype allows a user to modify the complete state of the system, and virtual prototypes are completely non-intrusive. Mechanisms to inject errors or faults reside in the simulation framework and are enabled through the models. Unlike software fault injection techniques, the embedded software remains unmodified.
Virtual prototypes allow developers to visualize and trace all hardware and software events on the system that have been modeled. Analysis visualization tools present both software and hardware execution and events on the same windows and over the same timeline, which makes it easy to correlate them and see the cause-and-effect of a fault.
Faults can be triggered by software, hardware or time events, or by any combination. Triggers can be concatenated, which increases the precision of when a fault can be injected.
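A chain of triggers of this kind can be sketched in Python as follows. The class, the event format and the names brake_task and ADC_CTRL are all hypothetical and do not represent any vendor's API; the point is only that a fault fires after a software event, a hardware event and a time condition have been observed in that order.

```python
class ChainedTrigger:
    """Runs `action` once all conditions have been satisfied in sequence."""

    def __init__(self, conditions, action):
        self.conditions = list(conditions)
        self.action = action
        self.stage = 0      # index of the condition currently being waited on
        self.fired = False

    def observe(self, event):
        """Feed one simulation event; at most one stage advances per event."""
        if self.fired:
            return
        if self.conditions[self.stage](event):
            self.stage += 1
            if self.stage == len(self.conditions):
                self.fired = True
                self.action(event)


injected = []  # records the simulation time at which the fault was injected
trigger = ChainedTrigger(
    conditions=[
        lambda e: e["kind"] == "sw" and e["name"] == "enter:brake_task",
        lambda e: e["kind"] == "hw" and e["name"] == "write:ADC_CTRL",
        lambda e: e["time"] >= 500,
    ],
    action=lambda e: injected.append(e["time"]),
)

events = [
    {"kind": "hw", "name": "write:ADC_CTRL", "time": 100},  # too early: ignored
    {"kind": "sw", "name": "enter:brake_task", "time": 200},
    {"kind": "hw", "name": "write:ADC_CTRL", "time": 300},
    {"kind": "time", "name": "tick", "time": 400},
    {"kind": "time", "name": "tick", "time": 500},
]
for event in events:
    trigger.observe(event)
```

The first register write is ignored because the software condition has not yet been met, which is exactly the added precision that chaining provides.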
Because it is simulation-based, the execution of a fault injection scenario is repeatable and will lead to the exact same results each time. This allows possible faults to be part of regression testing, simplifying the validation of the safety measures whenever software changes.
Finally, a virtual prototype can simulate fast enough to enable complex fault scenarios where software stacks such as AUTOSAR that include diagnostics are needed to validate the safety mechanism.
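The value of determinism for regression testing can be illustrated with a toy Python simulation. Everything here is an assumption made for illustration: the "system" is a simple counter, the injected fault is a single bit flip, and the safety check compares the counter with its expected value.

```python
from typing import List, Optional, Tuple

Trace = List[Tuple[int, int, bool]]  # (step, counter value, fault detected)


def run_scenario(fault_step: Optional[int], steps: int = 8) -> Trace:
    """Toy deterministic simulation: a counter plus a plausibility check."""
    trace = []
    counter = 0
    for step in range(steps):
        counter += 1
        if step == fault_step:
            counter ^= 0x8  # inject a single bit flip into the counter
        detected = counter != step + 1  # check against the expected count
        trace.append((step, counter, detected))
    return trace


def first_detection(trace: Trace) -> Optional[int]:
    """Return the first step at which the check flagged a fault, if any."""
    return next((step for step, _, detected in trace if detected), None)


golden = run_scenario(fault_step=None)  # fault-free reference run
first_run = run_scenario(fault_step=3)
second_run = run_scenario(fault_step=3)

# Regression-style checks: identical traces, and detection where expected.
assert first_run == second_run
assert first_detection(first_run) == 3
assert first_detection(golden) is None
```

Because both faulty runs produce identical traces, a stored trace can serve as a reference, and any difference after a software change points to a change in behavior rather than to experimental noise.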
To be really useful, a fault-injection framework is required beyond just the bare hardware models. If we look at existing literature, a basic fault injection environment is composed of the following parts: a target system, a fault injector (that injects faults from a library), a workload generator (to create stimuli according to the test scenarios), a monitor that feeds information back from the target system, and a data collector and analyzer, everything orchestrated by a controller. In our case the target system is the virtual hardware model.
The fault injector and library are based on a simple fault injection API to model fault injection scenarios that sits on top of a more generic control and inspection interface. This API has two basic commands: trigger and inject.
The trigger command invokes an injector routine when a trigger event happens. As mentioned before, hardware, software and time events can be used here. Triggers can be concatenated and can enable other triggers dynamically based on system status. The inject command sets the element specified to a certain value. Supported elements are I/O pins, registers, internal signals and memory locations.
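How these parts fit together can be shown with a deliberately small Python sketch. All names are hypothetical: the target is a toy accumulator with a duplicated register acting as its safety mechanism, the fault library is a plain list, and the controller simply runs one experiment per fault and collects the step at which each fault was detected.

```python
from dataclasses import dataclass


@dataclass
class Fault:
    name: str
    step: int  # simulation step at which the fault is injected
    mask: int  # bits to flip in the target's register


class Target:
    """Toy target system: an accumulator with a shadow copy for checking."""

    def __init__(self):
        self.reg = 0
        self.shadow = 0

    def execute(self, stimulus):
        self.reg += stimulus
        self.shadow += stimulus

    def safety_check(self):
        return self.reg == self.shadow


def workload(steps):
    """Workload generator: a fixed, repeatable stimulus pattern."""
    return [step % 3 + 1 for step in range(steps)]


def run_experiment(fault, steps=6):
    """Run one experiment and return the monitor's log."""
    target = Target()
    log = []
    for step, stimulus in enumerate(workload(steps)):
        target.execute(stimulus)
        if fault is not None and step == fault.step:
            target.reg ^= fault.mask  # the fault injector
        log.append((step, target.reg, target.safety_check()))
    return log


def controller(fault_library):
    """Run every fault in the library and analyze when it was detected."""
    results = {}
    for fault in fault_library:
        log = run_experiment(fault)
        results[fault.name] = next((s for s, _, ok in log if not ok), None)
    return results


library = [Fault("flip_bit0", step=2, mask=0x01), Fault("flip_bit4", step=4, mask=0x10)]
results = controller(library)
print(results)
```

In a real environment each role would be played by a much richer component, but the division of responsibilities stays the same.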
The value can be set just once (transient) or can be forced permanently. Besides these two commands, model dependent commands can be added for specific purposes (for instance, to a memory to flag an ECC error after a read/write access). For the workload generation the VP framework can rely on the control and inspection interface to introduce stimuli on the platform, or it can integrate external tools for plant model or rest bus simulation.
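The difference between a value that is set once and one that is forced permanently can be sketched in Python as follows. The class and element names are hypothetical stand-ins for a control and inspection interface, not the actual API of any tool.

```python
class InspectableModel:
    """Toy stand-in for a model exposing pins, registers and memory by name."""

    def __init__(self):
        self.values = {}  # element name -> current value
        self.forced = {}  # element name -> value pinned by a permanent fault

    def write(self, element, value):
        """Normal write by simulated hardware or software."""
        # A forced element ignores normal writes and keeps its pinned value.
        self.values[element] = self.forced.get(element, value)

    def read(self, element):
        return self.values.get(element, 0)

    def inject(self, element, value, permanent=False):
        """Set an element once (transient) or pin it (permanent)."""
        self.values[element] = value
        if permanent:
            self.forced[element] = value


model = InspectableModel()

# Transient fault: the corrupted value lasts only until the next normal write.
model.write("SENSOR_REG", 0x55)
model.inject("SENSOR_REG", 0x00)
after_transient = model.read("SENSOR_REG")
model.write("SENSOR_REG", 0x55)
after_rewrite = model.read("SENSOR_REG")

# Permanent fault: the pin stays stuck even though the model tries to clear it.
model.inject("PIN_ENABLE", 1, permanent=True)
model.write("PIN_ENABLE", 0)
stuck_value = model.read("PIN_ENABLE")
```

The embedded software in this picture never changes; only the model's state is manipulated from outside, which is what keeps the approach non-intrusive.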
The controller is, in effect, the user interactively driving the scenario through the tool GUI or via a script that can be played automatically during the simulation. Finally, the fault injection framework provides built-in monitoring, tracing and analysis views to all hardware and software elements on the virtual model.
In summary, fault injection as recommended by ISO 26262 is the best method available to test fault tolerance hardware blocks and to test diagnostic software, especially in cases that are not executed during normal operation. Virtual prototypes provide a complete framework to create advanced fault injection scenarios with several benefits:
First, the framework provides more visibility and fault injection points than hardware-based fault injection. Second, unlike software-based fault injection, this framework is completely non-intrusive. Third, it is orders of magnitude faster than RTL or gate-level simulators with the same control and reliability level. And finally, developers can model both permanent and soft errors.
With a virtual prototype-based fault injection framework, errors can be put under version control and fault injection testing can be automated during regressions every time the software changes, saving very valuable time during testing. Finally, we expect that virtual prototype simulations may be used over time as evidence for certification and compliance with standards like ISO 26262.
About the Author:
Victor Reyes is currently a Technical Marketing Manager in the System Level Solutions group at Synopsys. His responsibilities are in the area of Virtual Prototype technology and tools with special focus on Automotive. Victor Reyes received his MSc and PhD in Electronics and Telecommunication from University of Las Palmas, Spain, in 2002 and 2008 respectively. Before joining Synopsys, he held positions at CoWare, NXP Semiconductors and Philips Research.