By: Dr. Joe Hamaker, Mr. Eric Sick, Mr. Sam Sanchez, Dr. Christian Smart, Galorath Federal
Abstract: The development of cost models relies heavily on mathematical techniques, particularly the correct application of statistics. However, developing a cost model is not a cut-and-dried mechanical process; it requires a significant amount of judgment. There are many effective ways to develop a cost model, but no single best way. The best models are a judicious admixture of art and science. We discuss the estimating process and a variety of considerations that need to be handled along the way. The process starts with data collection. We need more than just cost data; we also need schedule, technical, and other programmatic inputs to develop a model. Once data are collected, they need to be normalized and checked for errors. If key parameters are missing for a potential data point, a decision must be made whether to exclude that data point or to impute the missing value or values. Once the data are collected and normalized, variables and model forms must be investigated. Trying out different equation forms and variable combinations carries a strong potential for overfitting, the most important scientific problem that you have probably never heard of. The problem with overfitting is that it makes the model look great on your data set, but overfit models tend not to provide accurate predictions when used in practice. To avoid overfitting, the choice of variables and model forms can be established by prior experience. The power equation form Y = Ax^b has proven effective in modeling spacecraft costs for several decades. Experience with cost modeling can also be used for variable selection. However, we still need to look at the data and focus on the variables that are statistically significant. There is a tendency to want to include as many statistically significant variables as possible, because doing so makes our goodness-of-fit statistics look better, but this leads to overfitting.
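As a rough illustration of the power equation form Y = Ax^b mentioned above, the following Python sketch fits the two coefficients by ordinary least squares on log-transformed data. The function name `fit_power_law` and the data are hypothetical, and log-space least squares is only one possible fitting approach; it is not presented here as the authors' method.

```python
import numpy as np

def fit_power_law(x, y):
    """Fit y = A * x**b by ordinary least squares on log-transformed data."""
    # In log space the model is linear: log(y) = log(A) + b * log(x)
    b, log_a = np.polyfit(np.log(x), np.log(y), 1)  # returns slope, intercept
    return np.exp(log_a), b

# Hypothetical, noise-free data generated from y = 2 * x**0.7
# (x might stand for a technical driver such as mass; values are made up)
x = np.array([10.0, 50.0, 100.0, 500.0, 1000.0])
y = 2.0 * x ** 0.7

A, b = fit_power_law(x, y)
print(f"A = {A:.4f}, b = {b:.4f}")
```

Because the illustrative data contain no noise, the fit recovers A and b almost exactly. With real data and lognormally distributed errors, a log-space fit transformed back to cost units targets the median rather than the mean, so it tends to sit below the expected cost.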
One way to avoid overfitting is to employ cross-validation. We discuss cross-validation in detail, provide practical examples, and offer recommendations for its use in modeling. The use of modern regression techniques is important in determining the cost model coefficients. We need to avoid developing biased models that tend to underestimate cost. We discuss several different regression techniques and their use in practice, including the minimum unbiased percent error and minimum percent error methods. We also discuss a recent paper by one of the authors on the use of maximum likelihood techniques for regression analysis, including a method for developing unbiased estimates in the presence of lognormally distributed residuals. Technology readiness is a key driver of program cost. A program that begins development with a low technology readiness level will almost certainly experience cost growth. We discuss how to model the cost of developing technologies prior to the start of program development. Once a model is complete, validation is important. Cross-validation and out-of-sample testing can be used for this purpose. If enough data are available, it is advisable to hold out some of the data for validation to ensure the model is not overfit. Models are used by human beings, so the modeling process needs to be tailored with usefulness in mind. We discuss several criteria for model usefulness, including relevant inputs, ease of use, and others. The issue of overfitting can lead a model developer to produce models with only a few inputs. This is the principle of parsimony, and it is an important part of the scientific side of model development. On the other hand, the inclusion of too few variables leads to a model that is not relevant or useful for the end user. A healthy balance of these two is important; the use of variable combination can be helpful.
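To make the idea of cross-validation concrete, here is a minimal Python sketch of leave-one-out cross-validation for a single-variable power-law model. It assumes a log-space least-squares fit, which is a simplification and not one of the regression methods named above; the function name `loo_cv_mape`, the data, and the noise level are all hypothetical.

```python
import numpy as np

def loo_cv_mape(x, y):
    """Leave-one-out cross-validation for y = A * x**b.

    Each point is held out in turn, the model is refit on the remaining
    points by least squares in log space, and the held-out point is
    predicted. Returns the mean absolute percent error (as a fraction).
    """
    n = len(x)
    errors = []
    for i in range(n):
        keep = np.arange(n) != i  # boolean mask excluding point i
        b, log_a = np.polyfit(np.log(x[keep]), np.log(y[keep]), 1)
        pred = np.exp(log_a) * x[i] ** b
        errors.append(abs(pred - y[i]) / y[i])
    return float(np.mean(errors))

# Hypothetical data: a power law with multiplicative lognormal noise
rng = np.random.default_rng(0)
x = np.array([10.0, 25.0, 50.0, 100.0, 250.0, 500.0, 1000.0])
y = 2.0 * x ** 0.7 * np.exp(rng.normal(0.0, 0.2, size=x.size))

cv_mape = loo_cv_mape(x, y)
print(f"Leave-one-out mean absolute percent error: {cv_mape:.1%}")
```

The same loop can be run for each candidate set of variables or equation forms. Comparing the held-out error across candidates, rather than the in-sample goodness of fit, gives a check on whether an added variable improves prediction or merely fits the data at hand.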
As Einstein once said, “models should be as simple as possible, but no simpler.” The authors discuss the application of the art and science of cost estimating to a recent project that they worked on together.
View the full presentation here.