Classification of software failures and defects


In one of our previous articles, we discussed how various support ticket classifications can be implemented in our help desk software for customer service and support. We’ve received a lot of feedback and comments from different individuals closely connected to this issue, as well as from people in other domains. For example, we’ve received comments and inquiries from software designers and developers, quality assurance personnel, team leaders, and from end users of other similar systems. The comments were very interesting and sometimes led to long-term discussions.

Service request management

What we have learned from this feedback is that people understand the general classifications of service requests very differently and also handle them in various ways. Additionally, we learned that this question is much more complex than it appears at first glance, and the reasons are numerous. The most important one is the environment and domain in which service management is performed and from which the question is viewed. That’s why we decided to further clarify the rationale behind our solution in the customer support system.

To fully address these issues, we want to start with the basic requirements, such as the classification of different problems and requests in general. Afterwards, we will explain how they can be classified in the help desk environment, along with its implementation.

The main requirement for recording and resolving customer requests and the related support tickets is the proper description of the request. The description can contain several different attributes, as well as free-form text. It is later used in the help desk workflow to solve the problem effectively and to satisfy all relevant stakeholders: customers, developers, and end users.

Most individuals in the support or maintenance area are part of the software development team. They are software engineers by profession who understand their work very well and recognize how important the classification of failures and defects is. They’ve learned to work with several classic schemes. One of the commonly presented schemes relates to software problems, defects, and failures.

Problems, software failures, and defects

The most widely used classification scheme is the IEEE 1044-2009 Standard Classification for Software Anomalies. This document precisely presents the scheme together with a set of attributes. It can be used in almost any development environment and with any choice of life cycle model.

The scheme mainly considers software defects and failures, although the following terms are also recognized and considered: problem, error, and fault. For example, software defects are described with 18 attributes, while failures are described with 20 attributes. Their common attributes include ID, status, description, etc.

The description of all attributes and their values requires a broad analysis, which is outside the scope of this article. It can be concluded that each attribute has its purpose within the life cycle model or help desk practices. However, in the help desk environment where customer support is performed, we distinguish the following main activities within the workflow:

  1. Initial investigation and diagnosis: The primary investigation allows for an initial evaluation of the request, leading to its acceptance or rejection.
  2. Assignment: Routing to the appropriate person or organization within a maintenance or development team for further action.
  3. Analytic reporting: Providing customer support metrics needed for software process improvement (team efficiency, help desk load, Service Level Agreement (SLA) performance, quality assurance), customer billing for maintenance services, etc.
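To make the three activities concrete, here is a minimal Python sketch of how they could be chained. Everything in it is an assumption made for illustration: the `Ticket` fields, the `ROUTING` table, the team names, and the rule that a request is accepted only if it has a non-empty description. It is not the workflow of any particular help desk product.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ticket:
    ticket_id: int
    description: str
    asset: str                        # e.g. "Product A"
    severity: str                     # e.g. "Minor", "Major"
    accepted: Optional[bool] = None   # set by the initial investigation
    assignee: Optional[str] = None    # set by the assignment step


# Hypothetical routing table: which team maintains which asset.
ROUTING = {"Product A": "team-alpha", "Product B": "team-beta"}


def investigate(ticket: Ticket) -> Ticket:
    """Activity 1: accept the request only if it has a usable description."""
    ticket.accepted = bool(ticket.description.strip())
    return ticket


def assign(ticket: Ticket) -> Ticket:
    """Activity 2: route an accepted ticket to the team that owns the asset."""
    if ticket.accepted:
        ticket.assignee = ROUTING.get(ticket.asset, "unassigned")
    return ticket


def report(tickets: list[Ticket]) -> Counter:
    """Activity 3: one simple metric, accepted tickets counted per severity."""
    return Counter(t.severity for t in tickets if t.accepted)


tickets = [
    Ticket(1, "Export fails with an error dialog", "Product A", "Major"),
    Ticket(2, "   ", "Product B", "Minor"),  # empty description, rejected
    Ticket(3, "Typo on the login page", "Product B", "Minor"),
]
processed = [assign(investigate(t)) for t in tickets]
summary = report(processed)
```

Note that each step only needs a few attributes: the description for the investigation, the asset for the assignment, and the severity for the report.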

Because this set of activities is restricted compared to the entire life cycle model, the number of attributes can be restricted as well. The following lists present the main attributes required for the primary activities.

The main attributes of software defects are:

  1. Asset: Product (Product A, Product B, …), Component (C1, C2, …), Module (M1, M2, …), etc.
  2. Priority: Low, Medium, High. Some implementations also add “Urgent”.
  3. Severity: Trivial (inconsequential), Minor, Major, Critical
  4. Effect: Functionality, Usability, Security, Performance, Serviceability, Other
  5. Type: Data, Interface, Logic, Description, Syntax, Standard, Other
  6. Mode: Wrong, Missing, Extra
  7. Insertion activity: Requirements, Design, Coding, Configuration, Documentation
  8. Detection activity: Requirements, Design, Coding, Supplier testing, Customer testing, Production, Audit, Other
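As an illustration, the attributes above could be captured in a small record type. The following Python sketch is only one possible shape: the class and field names are our own assumptions, the closed value sets for priority, severity, and mode are modeled as enums, and the remaining attributes are kept as plain strings for brevity.

```python
from dataclasses import dataclass
from enum import Enum


class Priority(Enum):
    LOW = "Low"
    MEDIUM = "Medium"
    HIGH = "High"


class Severity(Enum):
    TRIVIAL = "Trivial"
    MINOR = "Minor"
    MAJOR = "Major"
    CRITICAL = "Critical"


class Mode(Enum):
    WRONG = "Wrong"
    MISSING = "Missing"
    EXTRA = "Extra"


@dataclass(frozen=True)
class DefectRecord:
    """Hypothetical record holding the eight main defect attributes."""
    asset: str                 # e.g. "Product A / C1 / M2"
    priority: Priority
    severity: Severity
    effect: str                # e.g. "Functionality", "Usability"
    defect_type: str           # e.g. "Data", "Interface", "Logic"
    mode: Mode
    insertion_activity: str    # e.g. "Design", "Coding"
    detection_activity: str    # e.g. "Customer testing", "Production"


defect = DefectRecord(
    asset="Product A / C1 / M2",
    priority=Priority.HIGH,
    severity=Severity.MAJOR,
    effect="Functionality",
    defect_type="Logic",
    mode=Mode.WRONG,
    insertion_activity="Coding",
    detection_activity="Customer testing",
)
```

Using enums for the closed value sets means an unknown value such as `Severity("Huge")` raises a `ValueError` instead of silently entering the data.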

Similar to defects, the main attributes of software failures are:

  1. Environment: webserver1, webserver2, dbserver1, …
  2. Configuration: cfg-A.1, cfg-B.9, …
  3. Disposition: Cause unknown, Duplicate, Resolved
  4. Severity: same as for software defects.

Note that the sample values are informative, which is also clearly noted in the standard, and that each organization can adjust or record its own values.
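Because the values are only informative, an implementation would typically keep them in configuration rather than in code. The sketch below assumes a hypothetical `ALLOWED_VALUES` table that holds only the sample values listed above, together with a `FailureRecord` type of our own invention that reports which of its fields fall outside the configured sets.

```python
from dataclasses import dataclass

# Sample value sets taken from the lists above. An organization would
# replace them with its own environments, configurations, and so on.
ALLOWED_VALUES = {
    "environment": {"webserver1", "webserver2", "dbserver1"},
    "configuration": {"cfg-A.1", "cfg-B.9"},
    "disposition": {"Cause unknown", "Duplicate", "Resolved"},
    "severity": {"Trivial", "Minor", "Major", "Critical"},
}


@dataclass
class FailureRecord:
    """Hypothetical record holding the four main failure attributes."""
    environment: str
    configuration: str
    disposition: str
    severity: str

    def invalid_fields(self) -> list[str]:
        """Return the names of fields whose value is not in the configured set."""
        return sorted(
            name
            for name, allowed in ALLOWED_VALUES.items()
            if getattr(self, name) not in allowed
        )


ok = FailureRecord("webserver1", "cfg-A.1", "Resolved", "Major")
bad = FailureRecord("mailserver7", "cfg-A.1", "Fixed", "Major")

ok_errors = ok.invalid_fields()    # no invalid fields
bad_errors = bad.invalid_fields()  # environment and disposition are unknown
```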

We can agree that this classification supports the complete workflow in a typical software lifecycle when it comes to software defects and failures. Although the IEEE classification is very clear and precise, its implementation and proper use within a development organization remain very complex. This is particularly relevant to small and agile teams, which usually do not have dedicated staff in charge of the classification process. The activities in the process require time and other resources, which slows everything down and burdens the entire help desk workflow.

Data acquisition and recording

One of the main problems in implementing this or a similar classification scheme is the question of who is responsible for recording and maintaining this data in an organization. When considering the effort required to solve the reported problem itself, whether it is a defect or a failure, there is more or less agreement that developers or maintenance personnel should be responsible for the data acquisition and recording.

Solving a problem presupposes that its cause has been reliably identified. That identification follows the classification step, which is what allows the ticket to be assigned to the right person. Classification, in turn, requires broad knowledge of the software requirements, which again points to an experienced analyst, a designer, or at least a developer.

As a consequence of problems with the initial classification of support requests and issues, two errors are most common when collecting and recording problems in typical help desk software. These errors are visible even in high-quality and expensive solutions:

1. Provision of the submission request form with attributes related to software defects

At first glance, all listed attributes make sense in describing incidents or problems reported by customers. However, software defects are related to any work product from the lifecycle, which includes documentation, source code, modules, libraries, and other artifacts. In this way, the support system or a help desk software is unnecessarily burdened with data that is difficult to enter and maintain, which additionally increases the workload of the support and maintenance team.

2. The customer is expected to perform the initial classification

Once a ticket submission form with a large number of attributes has been designed, the next mistake usually follows: the customer is asked to perform a partial or even complete initial classification of their request. A typical example is a hierarchical scheme with a main category and a subcategory, as follows:

Main Category   Subcategory
Software        Application
                Operating system
                DBMS
Hardware        Memory
                Disk
                CPU
                Keyboard
Network         IP address
                DHCP
                DNS
                Wi-Fi

At first glance, this kind of hierarchical scheme is clear and rational. It can also be assumed that the organization that creates this scheme intends to use it for assigning the created support requests and tickets, and later for generating various analytical reports.

However, presenting this scheme to a customer may raise additional questions. Imagine a customer who has just experienced a failure in the system and is now filling out the submission form. For example, the system became non-responsive at some point during use: a “locked screen”. The customer is expected to report the problem and classify it correctly, and may well ask: “Is the problem in the software, the hardware, or maybe the network?” Let’s say they choose network; then they will be asked again whether the cause of the problem is their IP address, DHCP, etc.

Even if a typical customer had sufficient technical knowledge, we can expect that the problem is the result of several causes, e.g. memory + DHCP + application, so a single category choice cannot describe it. On the other hand, when a customer does not have sufficient technical knowledge, the probability of a correct classification approaches zero. In both cases, this kind of effort is very annoying for customers and increases user dissatisfaction.
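The multi-cause case can be shown with a short Python sketch. It assumes the hierarchical scheme from the example is stored as a simple dictionary; the `matching_paths` helper is hypothetical and only serves to show that one problem can map to several branches at once.

```python
# The example scheme: main category -> subcategories.
CATEGORIES = {
    "Software": ["Application", "Operating system", "DBMS"],
    "Hardware": ["Memory", "Disk", "CPU", "Keyboard"],
    "Network": ["IP address", "DHCP", "DNS", "Wi-Fi"],
}


def matching_paths(causes: set[str]) -> list[tuple[str, str]]:
    """Return every (main category, subcategory) pair that matches a cause."""
    return [
        (main, sub)
        for main, subs in CATEGORIES.items()
        for sub in subs
        if sub in causes
    ]


# The "locked screen" example: the actual causes span three branches,
# while the submission form allows exactly one main category and one
# subcategory.
paths = matching_paths({"Memory", "DHCP", "Application"})
fits_single_choice = len(paths) == 1
```

Here `paths` contains one entry from each of the three main categories, so `fits_single_choice` is `False`: whatever the customer selects, two of the three causes are lost.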

In the next article, we’ll explain the classification of general requests, as well as the difference compared to the classification of software problems.