
Examples provided here are purely for illustrating software features and functionality.

Example 5 – Analyzing Software Reliability Growth

Software Used: RGA

Download Example File for Version 10 (*.rsgz10) or Version 9 (*.rsr9)

When considering reliability growth, it is usually some piece of hardware that is being analyzed. But the same theory and analysis procedures can also be applied to software under development. The faults (bugs) found during each day's testing of the software can be recorded and then analyzed, just as would be done for hardware. This example explores how software reliability growth can be analyzed using RGA.

Background

Software for a particular application is under development. The reliability requirement is that no more than one fault may occur during every 8 hours of continuous operation.

Testing begins when the software reaches the “beta” phase. Three employees test continuously during business hours, resulting in 24 hours of cumulative software testing per day. The software faults are reported and captured in a Failure Reporting, Analysis and Corrective Action System (FRACAS). Because a new compile of the software becomes available for testing every week, design engineers implement fixes within a week, except during the last two weeks of testing, when fixes are implemented at a faster rate.

The failure rate goal for this software is to have no more than one failure per 8 hours of operation, or 1/8 = 0.125 failures per hour. In one day of testing (3 x 8 = 24 hours), the failure intensity goal is 0.125 x 24 = 3 faults per day.
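The goal arithmetic above can be restated in a couple of lines of Python (purely a restatement of the numbers in the text):

```python
goal_per_hour = 1 / 8     # at most one fault per 8 hours of operation
hours_per_day = 3 * 8     # three testers, 8 business hours each
goal_per_day = goal_per_hour * hours_per_day
print(goal_per_day)       # 3.0 faults per day
```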

Assume that the following data set was extracted from the FRACAS system. The data set is grouped by the number of days until a new compile of the software is available.

Failures in Interval   Days of Testing
45                     5
34                     10
25                     15
17                     20
21                     23
14                     26
10                     28

To analyze the data set, calculate the parameters using the Crow-AMSAA (NHPP) model and use the Quick Calculation Pad (QCP) to estimate the demonstrated failure intensity. Then determine when the failure rate goal will be achieved and how many days of developmental testing are required.

Analysis and Results

An appropriate standard folio data sheet is created by selecting Times-to-Failure Data > Grouped Failure Times on the first page of the RGA Folio Data Sheet Setup window and choosing Days as the units of measurement on the second page, as shown next.

Figure 1: Selecting the data type and units of measurement for the new folio.

The data set is then entered and the Crow-AMSAA (NHPP) model is selected for analysis, as shown next.

Figure 2: Data set entered in the standard folio.

After analyzing the data, the results summary shows that the demonstrated failure intensity (DFI) is 4.4947. In other words, at the end of the test, the failure rate is about 4.5 faults per day.

Figure 3: Summary of results showing the failure intensity demonstrated at the end of the test (DFI).

This can also be seen by using the QCP. Since the test ended at 28 days, the DFI is equal to the instantaneous FI at 28 days.

Figure 4: Quick calculation showing the DFI.

Analysis and Discussion

The above results show that the demonstrated failure intensity is 4.4947 faults per day. The question now is: “If we continue testing with the same growth rate, when will we achieve the goal of no more than three faults per day?”

To calculate this, the QCP is used to solve for time given an instantaneous failure intensity of 3, as shown next. About 149 days of testing are estimated to be required to reach the failure intensity goal.

Figure 5: Quick calculation showing that a total of about 149 days are required to reach the failure intensity goal.

Since 28 days of testing have already been completed, this indicates that 121 additional days of testing and development (test-analyze-and-fix) are required to achieve the goal. This is much more time than the analysts anticipated, so they decide to take a closer look.
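These results can be cross-checked outside the software with a short script. The sketch below fits the grouped-data maximum likelihood equations of the Crow-AMSAA (NHPP) model by bisection; it is an illustration of the underlying math under the model's standard intensity form, not RGA's implementation, and the function name and bracket are arbitrary choices.

```python
import math

def fit_grouped_crow_amsaa(end_times, counts):
    """MLE for the Crow-AMSAA (NHPP) model from grouped failure counts.

    counts[i] failures are observed in (end_times[i-1], end_times[i]], with
    an implied start at time 0. The model's failure intensity is
    rho(t) = lam * beta * t**(beta - 1).
    """
    n_total, t_end = sum(counts), end_times[-1]

    def score(beta):
        # Derivative of the log-likelihood in beta, with lam profiled out.
        total, prev = 0.0, 0.0
        for t, n in zip(end_times, counts):
            num = t**beta * math.log(t) - (prev**beta * math.log(prev) if prev else 0.0)
            total += n * num / (t**beta - prev**beta)
            prev = t
        return total - n_total * math.log(t_end)

    lo, hi = 0.05, 5.0          # bracket for beta; the score changes sign here
    for _ in range(100):        # bisect the score equation to its root
        mid = (lo + hi) / 2.0
        if score(lo) * score(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    beta = (lo + hi) / 2.0
    return beta, n_total / t_end**beta   # beta, lam

days   = [5, 10, 15, 20, 23, 26, 28]
faults = [45, 34, 25, 17, 21, 14, 10]
beta, lam = fit_grouped_crow_amsaa(days, faults)

dfi = lam * beta * days[-1]**(beta - 1)            # instantaneous FI at 28 days
t_goal = (3.0 / (lam * beta))**(1.0 / (beta - 1))  # time when FI drops to 3/day
print(f"DFI = {dfi:.4f} faults/day, goal reached at ~{t_goal:.0f} days")
```

With this data set, the fitted model reproduces the demonstrated failure intensity of about 4.49 faults per day and the roughly 149 total test days reported by the QCP.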

A plot of the failure intensity vs. time is used to display the results. On the plot sheet control panel, the Use Logarithmic Axes option is cleared to specify that the plot will use a linear scale.

Figure 6: Failure intensity vs. time plot for full data set.

From this plot, it can be seen that there is a jump in the failure intensity between 20 and 23 days (i.e., the data point at 23 days is higher than the point at 20 days). This is the reason why it is estimated that more development time than expected is required. Therefore, the next step is to analyze the data set for the period up to 20 days of testing.

A new data sheet (Data 2) is added to the folio and the faults from the first 20 days of testing are entered. Then the parameters are calculated using the Crow-AMSAA (NHPP) model, as shown next.

Figure 7: First 20 days of data entered and analyzed in folio.

The failure intensity vs. time plot for Data 2 is displayed on a linear scale, as shown next. This plot shows the decrease in failure intensity over the first 20 days of testing.

Figure 8: Failure intensity vs. time plot for first 20 days of test data.

The QCP is used to solve for the days of testing and development that are required to achieve the failure intensity goal, based on the first 20 days of test data, as shown next.

Figure 9: Total number of days needed to reach the goal, based on data from the first 20 days of testing.

The calculation indicates that a total of 55 days of testing, or 27 additional days beyond the 28 already completed, are required. Note that this is very different from the result obtained from the analysis of the full data set.
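The 20-day result can be cross-checked with the same grouped Crow-AMSAA likelihood math, restricted to the first four intervals. As before, this standalone Python sketch only illustrates the model equations, not RGA's code:

```python
import math

days, faults = [5, 10, 15, 20], [45, 34, 25, 17]   # first 20 days of testing only
n, t_end = sum(faults), days[-1]

def score(beta):
    # d/d(beta) of the grouped-data Crow-AMSAA log-likelihood (lam profiled out)
    total, prev = 0.0, 0.0
    for t, k in zip(days, faults):
        num = t**beta * math.log(t) - (prev**beta * math.log(prev) if prev else 0.0)
        total += k * num / (t**beta - prev**beta)
        prev = t
    return total - n * math.log(t_end)

lo, hi = 0.05, 5.0
for _ in range(100):                    # bisection for the MLE of beta
    mid = (lo + hi) / 2.0
    lo, hi = (lo, mid) if score(lo) * score(mid) <= 0 else (mid, hi)
beta = (lo + hi) / 2.0
lam = n / t_end**beta

t_goal = (3.0 / (lam * beta))**(1.0 / (beta - 1))   # FI falls to 3 faults/day
print(f"goal reached at ~{t_goal:.0f} total days")  # about 55, as in Figure 9
```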

So the new question is: “What happened when the failure intensity jumped on the 23rd day of testing and development?” It turns out that new functionality was implemented at the request of a customer, which caused a major redesign of some general modules of the software. This type of jump is typical in both software and hardware development when new features are introduced.

Due to these significant changes, it is decided that the clock should be reset and the analysts should track the reliability growth from the 20th day forward. In other words, the origin of the test is set at 20 days, and the data thereafter are considered as follows:

Failures in Interval   Days of Testing
21                     3
14                     6
10                     8

Another data sheet is added to the folio (Data 3) for the faults over the last 8 days of testing, and the parameters are calculated using the Crow-AMSAA (NHPP) model, as shown next.

Figure 10: Faults over the last 8 days of testing entered and analyzed in folio.

The failure intensity vs. time plot for Data 3 is displayed on a linear scale, as shown next.

Figure 11: Failure intensity vs. time plot for faults over last 8 days of testing.

The QCP is once again used to solve for the additional days of testing and development that are required to achieve the failure intensity goal, this time based on the analysis from days 20 through 28 of the testing, as shown next.

Figure 12: Total number of days needed to reach the goal failure rate, based on data from the last 8 days of testing.

Therefore, 51 – 8 = 43 more days of developmental testing are estimated to be required.
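Applying the same standalone sketch of the grouped Crow-AMSAA equations to the reset data confirms this figure (again an illustration of the model math only):

```python
import math

days, faults = [3, 6, 8], [21, 14, 10]   # data after resetting the origin to day 20
n, t_end = sum(faults), days[-1]

def score(beta):
    # d/d(beta) of the grouped-data Crow-AMSAA log-likelihood (lam profiled out)
    total, prev = 0.0, 0.0
    for t, k in zip(days, faults):
        num = t**beta * math.log(t) - (prev**beta * math.log(prev) if prev else 0.0)
        total += k * num / (t**beta - prev**beta)
        prev = t
    return total - n * math.log(t_end)

lo, hi = 0.05, 5.0
for _ in range(100):                    # bisection for the MLE of beta
    mid = (lo + hi) / 2.0
    lo, hi = (lo, mid) if score(lo) * score(mid) <= 0 else (mid, hi)
beta = (lo + hi) / 2.0
lam = n / t_end**beta

t_goal = (3.0 / (lam * beta))**(1.0 / (beta - 1))
print(f"~{t_goal:.0f} total days, ~{t_goal - days[-1]:.0f} beyond the 8 already run")
```

This reproduces the roughly 51 total days, i.e., about 43 more days of developmental testing.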

While it is too early to make any predictions based on just 8 days of testing, this result can be used to get a general idea of the remaining development time required and to come up with a new testing plan. In this case, it is decided that three more employees need to be added to testing and, if possible, that a new compile needs to be created every two days. This yields a much more aggressive testing and development program with the objective of completing the project within one month.