
The Law of Small Numbers: Developing Accurate Estimates with Limited Data

Parametric methods have traditionally relied on classical frequentist statistics, which uses a sample of data as its only input. If you have taken Statistics 101 in college, most if not all of the class was oriented toward this approach; traditional linear and nonlinear regression analysis, for example, is frequentist. The challenge with frequentist statistics is that it requires a large amount of data. Statisticians who have conducted simulation studies using random data have concluded that 50 data points are needed for a regression analysis, plus 10 additional data points for every independent variable.
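To see why this rule of thumb matters, the short R sketch below (using made-up data, not anything from the paper) fits the same true relationship with 8 and then 80 data points and compares the resulting confidence interval on the slope; the small-sample interval is far wider and less stable.

    ## Hypothetical illustration: slope uncertainty shrinks as the sample grows.
    set.seed(1)
    fit_slope_ci <- function(n) {
      x <- runif(n, 1, 10)                # hypothetical cost driver
      y <- 2 + 3 * x + rnorm(n, sd = 4)   # true slope is 3, plus noise
      confint(lm(y ~ x))["x", ]           # 95% confidence interval for the slope
    }
    fit_slope_ci(8)    # small sample: wide, unstable interval
    fit_slope_ci(80)   # larger sample: interval tightens around 3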

The highly specialized nature of the systems developed by the Department of Defense and NASA means that there is typically nowhere near that much data available. For example, the Missile Defense Agency has developed only a handful of different kill vehicles, and NASA has developed only a few crewed launch vehicles. When looking at truly applicable data, the sample size shrinks even further: for launch vehicles, the primary systems NASA has completed were those for the Apollo and Shuttle programs. The Apollo program began in the 1960s and the Shuttle program in the 1970s, so there are no directly applicable historical data points within the last 40 years. Considering the changes that have taken place in technology since then, there are no applicable historical data at all for these systems.

The Law of Small Numbers is the belief that large-sample methods and rules (like the Law of Large Numbers) apply equally well to small data sets. This common belief is problematic when traditional statistics are applied to developing cost estimating relationships from small data sets: it leads to inaccurate estimates that are based more on noise than signal.
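The R simulation below (hypothetical, not from the paper) makes this concrete: regressing 12 observations on five predictors that are pure random noise still produces respectable-looking R-squared values much of the time, which is exactly the noise-for-signal trap.

    ## Hypothetical illustration: small-sample regression happily fits pure noise.
    set.seed(2)
    r2_from_noise <- replicate(1000, {
      d <- as.data.frame(matrix(rnorm(12 * 6), nrow = 12))   # 12 rows, all noise
      names(d) <- c("y", paste0("x", 1:5))
      summary(lm(y ~ ., data = d))$r.squared
    })
    mean(r2_from_noise)         # average R-squared despite no real relationship
    mean(r2_from_noise > 0.5)   # share of fits that look "good"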

For small data sets like these, Bayesian methods can help improve accuracy. Bayesian methods leverage all your experience, making them less subject to being overwhelmed by noise. This prior experience can be subjective or objective.
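As a flavor of how a prior blends with a small sample, here is a minimal conjugate-normal update in R. All numbers are hypothetical, and the observation variance is assumed known, which is the simplest textbook case and one of the assumptions the paper later relaxes.

    ## Hypothetical illustration: combining a prior estimate with three new data points.
    prior_mean <- 100                 # expected unit cost from prior experience
    prior_sd   <- 20
    obs        <- c(138, 145, 126)    # three new observations
    obs_sd     <- 30                  # assumed known observation noise
    n          <- length(obs)
    post_var   <- 1 / (1 / prior_sd^2 + n / obs_sd^2)
    post_mean  <- post_var * (prior_mean / prior_sd^2 + sum(obs) / obs_sd^2)
    c(posterior_mean = post_mean, posterior_sd = sqrt(post_var))

The posterior mean lands between the prior (100) and the sample mean (about 136), weighted by how informative each source is, so a few noisy points cannot drag the estimate arbitrarily far.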

Bayesian methods have proven successful in a multitude of applications. Bayesian techniques were used in World War II to help crack the German Enigma code, helping to shorten the war. John Nash's equilibrium for games with incomplete or imperfect information is a form of Bayesian analysis (Nash's life was portrayed in the film A Beautiful Mind). Actuaries have used Bayesian methods for over 100 years to set insurance premiums. And researchers trained in Bayesian voice recognition went on to lead the portfolio and technical trading team for the Medallion Fund, a $5 billion hedge fund that has averaged annual returns of 35% after fees since 1989.

This paper introduces the Bayesian method and shows in detail how it can be applied to regression. The basic Gaussian framework is provided as a starting point and is explained in a straightforward, intuitive way. With limited sample data, two key assumptions of the standard Gaussian linear model are dubious. One is that the variance of the estimating equation is known and equal to the variance computed from the sample data. The other is that the residuals of the estimating equation derived from the sample data follow a Gaussian distribution. Neither assumption holds for small samples. The assumption of known variance is relaxed first, and an analytical method for conducting the Bayesian analysis is derived. Next, the assumption of Gaussian residuals is relaxed, and the residuals are modeled with a Student's t distribution instead, as is typically done for small samples. With both assumptions changed, no analytical solution is possible, so Markov Chain Monte Carlo simulation is presented as a technique to overcome this.
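To give a feel for that final step, here is a bare-bones random-walk Metropolis sampler in R for a simple linear relationship with Student's t residuals. It is only a sketch under simplifying assumptions (made-up data; flat priors on the intercept, slope, and log scale), not the code provided with the paper.

    ## Hypothetical illustration: MCMC for y = a + b*x with t-distributed residuals.
    set.seed(3)
    x <- c(1.2, 2.5, 3.1, 4.8, 6.0, 7.3)       # hypothetical cost driver
    y <- c(4.9, 8.2, 9.7, 14.1, 17.2, 20.4)    # hypothetical costs
    log_post <- function(p) {                  # p = (intercept, slope, log scale)
      s <- exp(p[3])
      sum(dt((y - p[1] - p[2] * x) / s, df = 3, log = TRUE)) - length(y) * log(s)
    }
    draws <- matrix(NA_real_, nrow = 20000, ncol = 3)
    cur   <- c(0, 1, 0)                        # starting values
    for (i in seq_len(nrow(draws))) {
      prop <- cur + rnorm(3, sd = 0.2)         # symmetric random-walk proposal
      if (log(runif(1)) < log_post(prop) - log_post(cur)) cur <- prop
      draws[i, ] <- cur
    }
    colMeans(draws[-(1:5000), 1:2])            # posterior means after burn-in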

A single practical example is carried throughout the paper, along with R code that implements the Markov Chain Monte Carlo simulation for that example.

View the presentation.

