Fault tolerance quality attribute

January 4, 2016

Nowadays we depend critically on a vast number of software products. They do their job every minute of our lives, and some of those minutes are really challenging for them. While we are becoming more and more aware of how complex software development is, the average end user is not filled with compassion and understanding when an application he or she has bought suddenly loses data or simply fails. Needless to say, some applications are built to save lives or prevent disasters, so their failures can be literally fatal.

According to ISO 9126, fault tolerance is part of the Reliability quality attribute group and represents the ability of a system to withstand component failure.

So a fault may happen, and no one is immune to this. That is why software products are designed around four main approaches. Each accepts that faults are possible, but each plans and implements its countermeasures differently.

  1.       Fault prevention: the software is designed to be as fault-free as possible. This approach requires a very thoughtful and slightly paranoid perspective from developers, because every possible issue should be taken into account. It is very time-consuming and is mainly restricted by the time and cost limits of a project.
  2.       Fault removal: here the testing team comes in. Once the development stage is complete, it is the testers’ turn to check everything in as much detail as possible.
  3.       Fault tolerance: this approach doesn’t attempt to prevent a fault or discover it. It is based on the assumption that there is no way to detect every possible fault or to create a failure-free design. That’s why the system should be designed so that it keeps operating properly even if faults occur.
  4.       Fault forecasting: investigating the possible presence of faults and estimating the likelihood that faults will occur in the future.

Software faults are not all alike. They are all design faults, however, and they can be classified by the phase in which they occur, system boundaries, cause, intent, and persistence (Xie, Sun and Saluja, 2001).

Certainly, a system needs to be fault-tolerant at the hardware level as well as the software level, and software fault tolerance cannot be assured without a trustworthy hardware foundation.

Current methods for software fault tolerance include recovery blocks, N-version programming, and self-checking software. Which of them applies depends on the environment and its characteristics.

According to Laura L. Pullum, the applicable techniques depend on the environment:

In a Single Version Software Environment (SVSE), monitoring techniques, atomicity of actions, decision verification, and exception handling may be used to partially tolerate software design faults.
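As a rough illustration of two of these single-version techniques (atomicity of actions combined with exception handling), the sketch below wraps an operation so that its state changes are rolled back if it raises. The names `AtomicAction` and `transfer` and the account data are hypothetical and exist only for this example.

```python
import copy


class AtomicAction:
    """Context manager that restores a dict to its checkpoint if the body raises."""

    def __init__(self, state):
        self.state = state
        self._checkpoint = None

    def __enter__(self):
        # Take a checkpoint before the action starts.
        self._checkpoint = copy.deepcopy(self.state)
        return self.state

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            # Roll back: the action either completes fully or leaves no trace.
            self.state.clear()
            self.state.update(self._checkpoint)
        return False  # let the caller's exception handler see the error


def transfer(accounts, src, dst, amount):
    """Move money between accounts; partial updates are undone on failure."""
    with AtomicAction(accounts) as acc:
        acc[dst] += amount  # credited first, so a failure below leaves a partial update
        if acc[src] < amount:
            raise ValueError("insufficient funds")
        acc[src] -= amount


accounts = {"a": 50, "b": 10}
try:
    transfer(accounts, "a", "b", 80)
except ValueError as err:
    print("transfer rejected:", err)
print(accounts)
```

The fault is only partially tolerated: the state stays consistent, but the requested operation still does not succeed, which is why single-version techniques are usually described as a partial measure.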

A Multiple Version Software Environment (MVSE) requires design-diverse techniques, which rely on independently developed, functionally equivalent versions of the software to tolerate software design faults. Such techniques include recovery blocks (RcB), N-version programming (NVP), and N self-checking programming (NSCP).
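To make the difference between two of these techniques concrete, here is a minimal sketch, assuming a toy integer square root as the function being protected. A recovery block tries variants one after another and checks each result with an acceptance test, while N-version programming runs all versions and takes a majority vote. All function names are hypothetical, and `isqrt_faulty` is deliberately wrong to simulate a design fault.

```python
import math
from collections import Counter


def isqrt_faulty(n):
    """Simulated design fault: always one too large."""
    return math.isqrt(n) + 1


def isqrt_newton(n):
    """Independent variant: integer Newton iteration."""
    x = n
    while x * x > n:
        x = (x + n // x) // 2
    return x


def acceptance_test(n, r):
    """A result r is acceptable if it is the floor of the square root of n."""
    return r * r <= n < (r + 1) * (r + 1)


def recovery_block(variants, accept, x):
    """Try variants in order; return the first result that passes the test."""
    for variant in variants:
        try:
            result = variant(x)
        except Exception:
            continue  # a crash counts as a failed variant
        if accept(x, result):
            return result
    raise RuntimeError("all variants failed the acceptance test")


def n_version(versions, x):
    """Run every version and return the result a strict majority agrees on."""
    results = []
    for version in versions:
        try:
            results.append(version(x))
        except Exception:
            pass  # a crashed version casts no vote
    if not results:
        raise RuntimeError("no version produced a result")
    value, votes = Counter(results).most_common(1)[0]
    if votes * 2 <= len(versions):
        raise RuntimeError("no majority among versions")
    return value


print(recovery_block([isqrt_faulty, isqrt_newton], acceptance_test, 10))
print(n_version([isqrt_faulty, isqrt_newton, math.isqrt], 10))
```

The recovery block needs a trustworthy acceptance test but runs extra variants only on failure. The voter needs no acceptance test but always pays for every version.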

A Multiple Data Representation Environment (MDRE) involves data-diverse techniques, which use different representations of the input data to provide tolerance to software design faults. Examples of such techniques include retry blocks (RtB), N-copy programming and N self-checking programming.
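A retry block can be sketched as follows: the same algorithm is run again on a re-expressed but logically equivalent input whenever it crashes or fails an acceptance test. The example is only an illustration under stated assumptions: `scaled_mean` contains a simulated design fault (it divides by the first element), and `rotate` is a hypothetical re-expression that relies on the mean being independent of element order.

```python
def scaled_mean(values):
    """Mean with a simulated design fault: fails when values[0] == 0."""
    scale = values[0]
    return scale * sum(v / scale for v in values) / len(values)


def plausible(values, result):
    """Acceptance test: a mean must lie between the minimum and the maximum."""
    return min(values) <= result <= max(values)


def rotate(values):
    """Re-expression: a rotated list has the same mean as the original."""
    return values[1:] + values[:1]


def retry_block(algorithm, accept, re_express, data, max_retries=3):
    """Run one algorithm on successive re-expressions of the same input."""
    candidate = data
    for _ in range(max_retries + 1):
        try:
            result = algorithm(candidate)
            if accept(data, result):
                return result
        except Exception:
            pass  # treat a crash like a failed acceptance test
        candidate = re_express(candidate)
    raise RuntimeError("no re-expression produced an acceptable result")


print(retry_block(scaled_mean, plausible, rotate, [0.0, 2.0, 4.0]))
```

Unlike design diversity, only one implementation is needed here, but the approach helps only when the fault is triggered by particular input representations and a valid re-expression exists.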

But the main challenge that persists in developing fault-tolerant products is the conflict between redundancy requirements and the available budget. Redundancy comes at a high price: operating cost (performance), development cost, and additional complexity. In fact, providing hardware redundancy is much cheaper, because hardware faults are usually expected to be independent and caused by wear, and producing identical hardware units costs less than developing diverse designs to make software tolerant to faults.

And, in conclusion, let us all remember that fault tolerance is not an alternative to performing regular backups.


