Introduction to Safety Critical System Design

Going well beyond mere reliability, safety-critical systems must do no harm: to themselves, to their operators or to bystanders. Standards, certification procedures and well-defined design methodologies have been developed to enable such systems.


A safety critical system is generally defined as any system whose failure to operate correctly would cause damage, injury or death. This differs from a mission critical system, whose failure would result in an aborted mission but would not cause damage, injury or death. For the purposes of this discussion we will use examples from manned and unmanned vehicles. When we discuss safety certification, we will further restrict this to flying vehicles.

Safety critical system design starts with knowing the function of the system and its level of criticality should it malfunction or fail. Multiple standards exist for determining criticality. ARP-4761 is typically used for civil and commercial aircraft. MIL-STD-882 is used for many U.S. Department of Defense projects, but is not well suited to flying platforms. ARP-4761 defines four levels: Catastrophic, Hazardous, Major and Minor. Other methods typically define four levels as well, but give them different names, seemingly to confuse us. One of the more common of these, the Safety Integrity Level (SIL), is used by a number of standards, but not in the application area of interest for this discussion.

Once the criticality level is determined, a plan to develop the system hardware and software can be drawn up. If the system is used in civil or commercial aircraft, it will have to be certifiable to the appropriate level for hardware, software or both. The standards most often used for this are DO-178B or DO-178C for software and DO-254 for hardware. These standards add one further level, No Effect, and refer to the resulting five levels as Design Assurance Levels (DALs), lettered A to E, with A corresponding to Catastrophic and E to No Effect. The planning stage is critical because omissions in the plan can lead to significant delays and cost overruns as later project phases try to recover from them. The plan itself must take into account whether the application is civil or military, whether or not it must be certifiable and, if it flies, which entities will need to certify its airworthiness. Regardless of the transportation mode or whether certifiability is required, the design of any system deemed safety critical must follow a rigorous design and verification plan.
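The mapping from failure condition category to DAL described above is simple enough to write down as a lookup table. The sketch below is a minimal illustration; the names `FailureCondition`, `DAL_FOR_CONDITION` and `required_dal` are hypothetical, and it assumes, as a simplification, that a function is assigned the DAL of its most severe failure condition (a real program may also take credit for architectural mitigation such as the redundancy discussed later).

```python
from enum import Enum


class FailureCondition(Enum):
    """Failure condition categories as listed in this article."""
    CATASTROPHIC = "Catastrophic"
    HAZARDOUS = "Hazardous"
    MAJOR = "Major"
    MINOR = "Minor"
    NO_EFFECT = "No Effect"


# Failure condition category -> Design Assurance Level (A is most stringent).
DAL_FOR_CONDITION = {
    FailureCondition.CATASTROPHIC: "A",
    FailureCondition.HAZARDOUS: "B",
    FailureCondition.MAJOR: "C",
    FailureCondition.MINOR: "D",
    FailureCondition.NO_EFFECT: "E",
}


def required_dal(conditions):
    """Return the DAL driven by the most severe failure condition.

    Simplifying assumption: the worst condition sets the level. Since 'A'
    sorts before 'E', min() picks the most stringent letter.
    """
    if not conditions:
        raise ValueError("at least one failure condition is required")
    return min(DAL_FOR_CONDITION[c] for c in conditions)


print(required_dal([FailureCondition.MINOR, FailureCondition.HAZARDOUS]))  # B
```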

As the project plan is being developed, specific requirements for both hardware and software must be captured. Safety critical systems depend not only on producing the correct result, but also on producing it within the correct time period. The hardware must therefore provide more than sufficient processing power for every software process to complete its algorithms within its allotted time. Similarly, the plan must detail the test methodologies used to validate both the hardware and software designs, as well as the documentation required to show compliance with the plan and design requirements. A number of commercially available tools can document and capture the different phases of a safety critical design project. At the end of each design phase, a project review should audit the output to ensure that the work has been done in accordance with the plan and meets the requirements and objectives. Hardware and software development typically run as parallel efforts, but must still be coordinated. We will discuss each effort separately.
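One common way to check timing requirements against the processing power of the hardware is fixed-priority response-time analysis, sketched below under stated assumptions: tasks are independent and periodic, scheduled preemptively by fixed priority, with deadlines no longer than their periods, and worst-case execution times (WCETs) already measured or bounded on the target. The `Task` type, task names, numbers and the optional `margin` parameter (a hypothetical project policy for reserving spare capacity) are illustrative, not taken from any standard.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    wcet: int      # worst-case execution time C on the target hardware
    period: int    # release period T
    deadline: int  # relative deadline D (assumed <= period)


def response_times(tasks, margin=1.0):
    """Fixed-priority response-time analysis.

    `tasks` must be ordered from highest to lowest priority. For each task the
    recurrence R = C_i + sum(ceil(R / T_j) * C_j) over higher-priority tasks j
    is iterated until it converges. A task whose response time exceeds
    margin * deadline is reported as None (deadline not guaranteed).
    """
    results = {}
    for i, task in enumerate(tasks):
        budget = task.deadline * margin
        r = task.wcet
        while True:
            interference = sum(math.ceil(r / hp.period) * hp.wcet for hp in tasks[:i])
            nxt = task.wcet + interference
            if nxt > budget:
                results[task.name] = None
                break
            if nxt == r:  # fixed point reached: worst-case response time
                results[task.name] = r
                break
            r = nxt
    return results


tasks = [
    Task("control_loop", wcet=1, period=5, deadline=5),
    Task("navigation", wcet=2, period=10, deadline=10),
    Task("logging", wcet=4, period=20, deadline=20),
]
print(response_times(tasks))  # {'control_loop': 1, 'navigation': 3, 'logging': 8}
```

Requiring, say, `margin=0.5` is one way to express "more than sufficient" processing power as a checkable number; what margin is appropriate is a project decision.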


Hardware design for a safety critical system requires a different set of processes than a typical COTS design. A typical COTS design starts with the microprocessor supplier's reference design, adds a few features and is then put into the chosen form factor, such as VME, VPX or VNX. The design usually uses industrial-temperature components where available and/or up-screens the assembly at the board level. Sufficient mechanical consideration is given to shock and vibration constraints as well. However, most reference designs start out with the very latest components available, to show off the high-performance, shiny new processor. That shiny new processor also comes with a brand-new operating system port, and may not yet be supported by a safety critical compliant operating system. The military frequently pushes to get the newest, most powerful technology into its systems, but in the civil and commercial world, newness and complexity are not conducive to certification at the higher DALs.

With the requirements in hand, an appropriate system architecture can be synthesized as part of the conceptual design phase. To achieve the highest DALs, some degree of redundancy and/or a control/monitor architecture may be selected to detect and correct errors from a malfunctioning unit. Generally speaking, the development process must demonstrate not only nominal functioning, but also coverage of failure cases.
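The logic of a control/monitor pair can be illustrated with a small sketch: the control channel computes a command, and an independent monitor channel recomputes it and checks it against limits; on disagreement the output is inhibited and a safe state is flagged. This shows detection and passivation only; correction is what the voting arrangement discussed in the next paragraph adds. The control law, gain, limit and tolerance values are hypothetical, and in Python both channels share one processor, so this illustrates only the logic, not the hardware independence a real design requires.

```python
from dataclasses import dataclass

GAIN = 0.8            # hypothetical proportional gain from the specification
COMMAND_LIMIT = 10.0  # hypothetical actuator authority limit
TOLERANCE = 0.05      # hypothetical allowed control/monitor disagreement


@dataclass(frozen=True)
class Output:
    command: float
    safe_state: bool  # True when the monitor has passivated the channel


def control_channel(setpoint: float, measured: float) -> float:
    """Control channel: computes the actuator command."""
    return GAIN * (setpoint - measured)


def monitor_channel(setpoint: float, measured: float, command: float) -> bool:
    """Monitor channel: independently recomputes the command and checks limits.

    In a real design this would run on separate, often dissimilar, hardware.
    """
    expected = GAIN * (setpoint - measured)
    return abs(command - expected) <= TOLERANCE and abs(command) <= COMMAND_LIMIT


def control_monitor_step(setpoint: float, measured: float,
                         injected_fault: float = 0.0) -> Output:
    # injected_fault simulates a malfunction in the control channel.
    command = control_channel(setpoint, measured) + injected_fault
    if monitor_channel(setpoint, measured, command):
        return Output(command, safe_state=False)
    # Disagreement or limit violation: inhibit the command, go to a safe state.
    return Output(0.0, safe_state=True)


print(control_monitor_step(10.0, 7.5))                      # healthy channel
print(control_monitor_step(10.0, 7.5, injected_fault=1.0))  # fault detected
```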

Even with identical redundant voting systems, there is the possibility that a common, unforeseen flaw in the microprocessor, the operating system or the compiler used to create the application code could cause all of the redundant channels to fail simultaneously. This is known as common cause failure. To guard against this failure mode, the voting channels must be based on different microprocessor architectures, use different operating systems and have their code compiled with different compilers. This is known as dissimilar architecture and/or development. Furthermore, to enforce dissimilarity, it is highly recommended that subsystems be developed by different teams, avoiding partial reuse that could, again, introduce common cause failures.
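The combination of dissimilar development and voting can be sketched in a few lines: three independently written implementations of the same function (integer square root, chosen purely for illustration) feed a two-out-of-three majority voter, so a fault in any single version is outvoted. This is only an analogy: all three versions below run on the same interpreter and processor, which is exactly the kind of common cause the text warns about; in a real system the dissimilarity extends to processors, operating systems and compilers. The function names are hypothetical.

```python
import math


def isqrt_library(n: int) -> int:
    """Version A: standard library integer square root."""
    return math.isqrt(n)


def isqrt_bisect(n: int) -> int:
    """Version B: binary search, maintaining lo*lo <= n < hi*hi."""
    lo, hi = 0, n + 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid * mid <= n:
            lo = mid
        else:
            hi = mid
    return lo


def isqrt_newton(n: int) -> int:
    """Version C: integer Newton iteration."""
    if n < 2:
        return n
    x, y = n, (n + 1) // 2
    while y < x:
        x, y = y, (y + n // y) // 2
    return x


def vote_2oo3(a: int, b: int, c: int):
    """Return the value at least two channels agree on, or None if none agree."""
    if a == b or a == c:
        return a
    if b == c:
        return b
    return None


def voted_isqrt(n: int, versions=(isqrt_library, isqrt_bisect, isqrt_newton)):
    return vote_2oo3(*(version(n) for version in versions))


def faulty(n: int) -> int:
    """A deliberately wrong version, to show it being outvoted."""
    return isqrt_newton(n) + 1


print(voted_isqrt(10))                                         # 3
print(voted_isqrt(10, (isqrt_library, faulty, isqrt_newton)))  # 3
```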

Any functionality not essential to the platform shall be excluded: unnecessary circuitry only complicates proving compliance and certifiability. Furthermore, certification authorities and auditors pay particular attention to “derived requirements,” i.e., capabilities that do not trace to higher-level requirements but are introduced by the developer.
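Derived requirements are easiest to spot when every lower-level requirement records which upper-level requirement it traces to. A minimal sketch of such a check follows; the requirement identifiers are hypothetical. The check only flags derived requirements for review (they are not automatically forbidden, but typically have to be justified and assessed for safety impact) and also reports upper-level requirements that nothing implements.

```python
def trace_report(upper, lower):
    """Check traceability between two requirement levels.

    upper: set of upper-level requirement IDs.
    lower: dict mapping each lower-level requirement ID to the set of
           upper-level IDs it claims to trace to.
    Returns (derived, uncovered):
      derived   - lower-level IDs with no valid upper-level parent
      uncovered - upper-level IDs that no lower-level requirement traces to
    """
    derived = sorted(req for req, parents in lower.items() if not parents & upper)
    traced = set().union(*lower.values()) & upper
    uncovered = sorted(upper - traced)
    return derived, uncovered


upper = {"SYS-1", "SYS-2", "SYS-3"}
lower = {
    "HW-10": {"SYS-1"},
    "HW-11": {"SYS-2"},
    "HW-12": set(),      # capability added by the designer: derived
    "HW-13": {"SYS-9"},  # traces to a requirement that does not exist
}
print(trace_report(upper, lower))  # (['HW-12', 'HW-13'], ['SYS-3'])
```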

The conceptual design phase is followed by the implementation phase, in which the planned design becomes a physical system. Component selection for the chosen architecture is key to proving that the hardware design meets the safety criteria. Devices chosen for the design must be mature enough to have demonstrated their reliability, yet still have a long life cycle ahead of them. All of the design elements mentioned earlier apply here, but the documentation required to show compliance is a key difference at this stage.

The design verification stage for a safety critical design is far more rigorous than for a typical COTS design. In addition to the normal functional, boundary scan, mechanical and thermal testing, programmable devices such as ASICs and FPGAs must undergo extensive testing to ensure that all combinations of inputs result in the correct output. At higher DALs, this verification process also requires a demonstration of independence: the reviewer is not the author, the tester is not the developer, and so on.
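For a small combinational block, "all combinations of inputs" can literally be enumerated and compared against a reference model. The sketch below does this for a hypothetical gate-level model of a 4-bit ripple-carry adder (a stand-in for logic that might live in an FPGA), checking all 2^9 = 512 input combinations against ordinary integer addition. It is only an illustration: exhaustive enumeration stops being practical as the number of inputs and internal state grows, and a real program would run such checks in its own HDL simulation and hardware test environment.

```python
from itertools import product


def full_adder(a: int, b: int, cin: int):
    """Gate-level full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_adder_4bit(a_bits, b_bits, cin):
    """Chain four full adders; bit lists are least-significant bit first."""
    carry = cin
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(value: int, width: int = 4):
    return [(value >> i) & 1 for i in range(width)]


def exhaustive_check():
    """Compare the gate-level model with a reference for every input combination."""
    checked, failures = 0, []
    for a, b, cin in product(range(16), range(16), range(2)):
        sum_bits, cout = ripple_adder_4bit(to_bits(a), to_bits(b), cin)
        got = sum(bit << i for i, bit in enumerate(sum_bits)) | (cout << 4)
        if got != a + b + cin:  # reference model: integer addition
            failures.append((a, b, cin, got))
        checked += 1
    return checked, failures


print(exhaustive_check())  # (512, [])
```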

Figure 1 shows an example of a DO-254 DAL-C certified control channel. Figure 2 shows an example of a DO-254 DAL-A certified monitor channel. Note the difference in complexity. Figure 3 shows an example of a DO-178 / DO-254 DAL-C certifiable mission computer based on 3U VPX COTS modules.

Figure 1
A DO-254 DAL-C certified control channel

Figure 2
A DO-254 DAL-A certified monitor channel

Figure 3
A DO-178 / DO-254 DAL-C certifiable mission computer based on 3U VPX COTS modules


Now that we have followed the flow for the compute platform hardware design, we will take a similar look at the software side. The overall software plan and functional requirements must be captured just as with the hardware. The processor choices made on the hardware side will influence the operating system selection, as not all suppliers support all processors equally.

The software conceptual design will need to allocate and partition tasks according to execution priority and CPU utilization. Care must be taken to ensure that there is no contention for common resources and no improper inter-task communication. While security is not the subject here, hacking and tampering are modern concerns: unauthorized modification could render the system unsafe. The software must therefore be capable of verifying that it has been properly provisioned and that it knows which run mode to execute. As with all of the other design phases, evidence of compliance with the plan and requirements must be documented.
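One simple form of the provisioning check described above is to compare a cryptographic digest of the loaded software image against the value recorded in the approved configuration, and to refuse to start in any run mode that is not explicitly defined. The sketch below assumes a hypothetical manifest that supplies the expected SHA-256 digest and a hypothetical set of run modes. A bare digest detects corruption or an unapproved load, but not an attacker who can also alter the manifest; protecting against that would require something like a digital signature, which this sketch does not attempt.

```python
import hashlib
import hmac

# Hypothetical run modes defined by the system requirements.
ALLOWED_RUN_MODES = frozenset({"OPERATIONAL", "MAINTENANCE"})


def verify_provisioning(image: bytes, expected_sha256: str, run_mode: str):
    """Return (ok, reason). The caller must not start the application unless ok."""
    digest = hashlib.sha256(image).hexdigest()
    # Constant-time comparison of the hex digests.
    if not hmac.compare_digest(digest, expected_sha256.lower()):
        return False, "image digest does not match the approved configuration"
    if run_mode not in ALLOWED_RUN_MODES:
        return False, f"undefined run mode: {run_mode!r}"
    return True, f"provisioning verified, starting in {run_mode}"


image = b"example application image"
expected = hashlib.sha256(image).hexdigest()  # would come from the manifest
print(verify_provisioning(image, expected, "OPERATIONAL"))
print(verify_provisioning(image + b"\x00", expected, "OPERATIONAL"))
print(verify_provisioning(image, expected, "TEST"))
```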

Traceability requirements for code make the use of proper tools for coding and verification mandatory. At DAL-C and above, every line of code is subjected to inspection to ensure that it executes properly and performs according to the requirements. Unused library functions shall be removed, because code that is never executed can never be covered, and 100% structural coverage must be demonstrated: statement coverage at DAL C, decision coverage at DAL B and MC/DC (modified condition/decision coverage) at DAL A.
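MC/DC requires showing that each condition in a decision independently affects the decision's outcome, typically with a pair of test vectors that differ only in that condition and produce different results. The sketch below checks a proposed test set for such "unique-cause" independence pairs. It is a simplified illustration: other accepted forms of MC/DC, such as masking MC/DC, are not handled, and real projects typically rely on dedicated coverage tools. The decision `a and (b or c)` and the test sets are hypothetical examples.

```python
from itertools import product


def mcdc_pairs(decision, n_conditions, tests=None):
    """Find a unique-cause MC/DC independence pair for each condition.

    decision: function of n_conditions booleans returning a boolean.
    tests: list of boolean tuples; defaults to the full truth table.
    Returns {condition_index: (vector, vector_with_condition_flipped) or None}.
    """
    if tests is None:
        tests = list(product((False, True), repeat=n_conditions))
    outcomes = {t: decision(*t) for t in tests}
    pairs = {}
    for i in range(n_conditions):
        pairs[i] = None
        for t in tests:
            flipped = t[:i] + (not t[i],) + t[i + 1:]
            if flipped in outcomes and outcomes[flipped] != outcomes[t]:
                pairs[i] = (t, flipped)
                break
    return pairs


def decision(a, b, c):
    return a and (b or c)


T, F = True, False
minimal_set = [(T, T, F), (F, T, F), (T, F, F), (T, F, T)]  # n + 1 = 4 vectors
print(mcdc_pairs(decision, 3, minimal_set))              # every condition paired
print(mcdc_pairs(decision, 3, [(T, T, F), (F, T, F)]))   # b and c not shown
```

With three conditions, four well-chosen vectors suffice here, whereas the full truth table has eight; finding such small sets is what makes MC/DC affordable on larger decisions.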

The software verification phase subjects the code to multiple levels of testing, reviews and source-level analysis to ensure compliance and traceability. Once the software and hardware have been tested largely separately, it is time to move to the integration phase.

During the integration phase, the software running on the target hardware is subjected to rigorous testing to verify that all functional requirements have been met. In the final project review, all documentation and traceability evidence is checked to ensure that nothing in the plan has been omitted. From here, the system and compliance package can be submitted to the next level of integration. For airworthiness, safety certification is obtained by the airframe manufacturer, and each certification is stand-alone. That is why such systems are referred to as safety certifiable rather than certified: each new application will require a new certification.

As the system enters production, configuration and life cycle management become very important. Any change to an existing system can trigger the need to retest. Counterfeit parts in the supply chain create additional concerns over the life of the system. The manufacturing processes must be tightly controlled to ensure that no accidental substitution of material, or other process change, affects board or system stability.
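Because any change can trigger retesting, production builds are typically compared against the configuration baseline that was verified. A minimal sketch of that comparison follows; the reference designators, part numbers and revisions are hypothetical. Any difference is only flagged for impact analysis, since deciding how much retest a change requires is an engineering judgment the code cannot make.

```python
def changes_since_baseline(baseline, build):
    """Compare two configurations.

    Both arguments map a reference designator to a
    (manufacturer part number, revision) tuple.
    """
    changed = sorted(ref for ref in baseline.keys() & build.keys()
                     if baseline[ref] != build[ref])
    added = sorted(build.keys() - baseline.keys())
    removed = sorted(baseline.keys() - build.keys())
    return {"changed": changed, "added": added, "removed": removed}


def needs_impact_review(baseline, build):
    """True if the build differs from the verified baseline in any way."""
    return any(changes_since_baseline(baseline, build).values())


baseline = {"U1": ("CPU-1234", "B"), "U2": ("FLASH-77", "A")}
build = {"U1": ("CPU-1234", "C"), "U2": ("FLASH-77", "A"), "U3": ("EEPROM-5", "A")}
print(changes_since_baseline(baseline, build))
print(needs_impact_review(baseline, build))  # True
```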

In conclusion, the engineering and manufacturing processes required to design and manufacture a safety critical or safety certifiable board- or system-level product differ substantially from those used for a standard COTS design. Because the detailed planning required at the system level propagates to both the hardware and the software, it is extremely difficult to back into a certifiable system design using standard COTS boards. COTS system components designed from the start for safety critical applications are, however, available from multiple vendors. Coupled with the appropriate tools and design methodologies, they make it possible to design certifiable systems without resorting to completely custom designs.

Creative Electronic Systems,
Geneva, Switzerland
+41 22 884 51 00