Step Three: Collect Data

Any estimate, by definition, encompasses a range of uncertainty, so express estimate inputs as least, likely, and most values rather than as single data points. Using ranges for inputs lets you develop a viable initial estimate even before you have fully defined the scope of the system you are estimating.
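
As an illustration of a ranged input, the sketch below stores one input as a least/likely/most triple and draws sample values from it. The class name `RangeInput`, the example size figures, and the choice of a triangular distribution are assumptions made for this example, not part of the process described here.

```python
from dataclasses import dataclass
import random


@dataclass(frozen=True)
class RangeInput:
    """One estimate input expressed as a least/likely/most range."""
    name: str
    least: float
    likely: float
    most: float

    def __post_init__(self):
        # Reject ranges that are out of order before they reach an estimate.
        if not (self.least <= self.likely <= self.most):
            raise ValueError(f"{self.name}: expected least <= likely <= most")

    def sample(self, rng: random.Random) -> float:
        # Assumption: a triangular distribution over the range.
        # random.triangular takes (low, high, mode) in that order.
        return rng.triangular(self.least, self.most, self.likely)


# Hypothetical size input, in source lines of code.
size = RangeInput("size_sloc", least=20_000, likely=35_000, most=60_000)

rng = random.Random(0)  # fixed seed so repeated runs give the same draws
draws = [size.sample(rng) for _ in range(1000)]
```

Every draw falls between the least and most values, so downstream calculations can carry the uncertainty forward instead of collapsing it to a single point.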

Certain core information must be obtained in order to ensure a consistent project estimate. Not all data will come from one source and it will not all be available at the same time, so a comprehensive data collection form will aid your efforts. As new information is collected, you will already have an organized and thorough system for documenting it.

Customize the general form for each job by deleting parameters that are not relevant to the current estimate, as well as parameters that can be gleaned from the documentation already provided. Generally speaking, the fewer questions you need to ask your sources, the happier they will be to participate.

Collected data is grouped into relevant categories, and each category is assigned a unique identifier that describes the attribute (S=sizing, P=productivity, etc.). Each item is further identified by its description, by whether it is required at the start of the estimate or will evolve as the estimate proceeds, and by the person from whom the information must be gathered.
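
One possible way to represent such a form in code is sketched below. Only the category codes S and P come from the text; the class names, the field names, and the two sample items are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    SIZING = "S"          # code taken from the text
    PRODUCTIVITY = "P"    # code taken from the text


class Timing(Enum):
    AT_START = "required at start of estimate"
    EVOLVES = "evolves as the estimate proceeds"


@dataclass
class FormItem:
    category: Category
    number: int
    description: str
    timing: Timing
    source: str  # who the information must be gathered from

    @property
    def identifier(self) -> str:
        # Category code plus a running number, e.g. "S1".
        return f"{self.category.value}{self.number}"


# Hypothetical form entries for illustration only.
form = [
    FormItem(Category.SIZING, 1, "New lines of code",
             Timing.EVOLVES, "development team"),
    FormItem(Category.PRODUCTIVITY, 1, "Team experience with the language",
             Timing.AT_START, "program office"),
]


def items_for(source: str, items: list[FormItem]) -> list[str]:
    """List the identifiers of the items one source is asked to supply."""
    return [item.identifier for item in items if item.source == source]
```

Filtering by source, as `items_for` does, makes it easy to hand each provider only the questions that apply to them.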


Data collection can be a frustrating and problematic process. Over the years, Galorath’s analysts have evolved certain practices that may assist you.

First you must persuade potential data providers to participate. Convince them of the value their information will bring to the project, and assure them that their data will be sanitized and will only be used for the purposes discussed. If possible, provide an incentive for sources to participate, such as a sanitized copy of the eventual database or a benchmark of their data relative to the rest of the database.

You may encounter developers who claim not to have the data you seek, or who complain about the costs involved in the data collection process. If the developer claims a CMM or CMMI rating of 3 or higher, it should be collecting data for its own use. Ask for the data in their collection format and offer to transfer it to yours.

Be sure you are asking the right people the right questions. Certain types of data are likely to be most easily obtained from the software development team, while other categories of information are more easily and accurately provided by the estimation personnel or the program office. Contractors will not contribute subcontractors’ data, so get commitments from subcontractors also.

Once you have obtained buy-in from the data providers, execute any necessary nondisclosure agreements so that this crucial paperwork will not delay your collection process. Sources may feel more comfortable using their own companies’ nondisclosure agreements; in this case, carefully review the text to ensure that the terms are acceptable. Avoid agreements containing clauses requiring exclusivity or destruction of data.

Equip your sources with data collection forms and instructions as early as possible, in both hard copy and electronic formats. This enables participants to familiarize themselves with the format and scope to expect when you visit them for the formal interview.

Clearly define the data you are soliciting from each respondent, and recognize that even if you do provide clear definitions, he or she may ignore them. Assume that people will not always read the instructions, and acknowledge that some providers may misrepresent the data intentionally.

Follow up to encourage data providers to review the instructions and complete drafts of the collection forms in preparation for your visit.

Help the provider help himself. For size data, ask your source to use a code counter you have already determined to be accurate for your purposes. On the data collection form, identify which inputs are required, highly desirable or desirable.

Express a preference for completed project actuals over data from underway projects, and if you are given both, be sure you understand which is which. However, do not rely on past program productivity, because this indicator will vary widely from project to project.

During the face-to-face interview, ask pertinent questions to confirm insofar as possible that the data is realistic and valid. Determine whether the code in question was hand-generated or autogenerated, because the two correlate to effort differently. Capture the amount of reuse as well as total size, and ensure that components reported as COTS (commercial off-the-shelf) really are COTS.

It may be that some of the data you collect will not make sense, despite your efforts to clarify and understand it. Rather than eliminating it, assign it a grade to indicate your confidence in it.

If a personal interview is not possible, you must at least have an appropriate person review the data before it is entered into the database.

When you have determined that the supplied data is valid and complete, publish the corrected raw data. Be sure to identify which forms contain draft material and which have been thoroughly vetted.

Next, normalize the data via a well-documented process to a standard set of activities, phases, etc. Convert sizing data to your language of interest if necessary. Compare the data points to established metrics to determine whether they are reasonable, and rate the quality of the data so your analysts will consider it accordingly. Identify the normalized data as such.
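
A minimal sketch of this normalization step might look like the following. The conversion factors, the plausible productivity range, the two-level grade scale, and all names are placeholders invented for the example; real values would come from your own documented process and reference metrics.

```python
from dataclasses import dataclass

# Hypothetical conversion factors to the language of interest (placeholders).
SIZE_FACTOR = {"lang_a": 1.0, "lang_b": 2.5}

# Hypothetical plausible range for size units per person-month (placeholder).
PLAUSIBLE_PRODUCTIVITY = (50.0, 1000.0)


@dataclass
class RawPoint:
    project: str
    language: str
    size: float
    effort_pm: float  # effort in person-months


@dataclass
class NormalizedPoint:
    project: str
    size: float
    effort_pm: float
    grade: str               # quality rating for analysts
    normalized: bool = True  # marks the record as normalized data


def normalize(raw: RawPoint) -> NormalizedPoint:
    if raw.effort_pm <= 0:
        raise ValueError(f"{raw.project}: effort must be positive")
    # Convert size to the language of interest.
    size = raw.size * SIZE_FACTOR[raw.language]
    # Compare against the reference range and grade the point
    # instead of discarding it.
    low, high = PLAUSIBLE_PRODUCTIVITY
    productivity = size / raw.effort_pm
    grade = "A" if low <= productivity <= high else "C"
    return NormalizedPoint(raw.project, size, raw.effort_pm, grade)


ok = normalize(RawPoint("p1", "lang_b", 4000, 20))
suspect = normalize(RawPoint("p2", "lang_a", 100, 10))
```

A point that falls outside the reference range is kept with a lower grade rather than dropped, so analysts can decide how much weight to give it.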

Finally, store the raw and normalized data in a database (rather than a spreadsheet, which will prove to be insufficient as the repository grows). This will help to ensure the data is consistent and facilitate configuration management.
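
As a rough sketch of such a repository, the example below uses Python's built-in `sqlite3` module with one table for raw data and one for normalized data. The table and column names, the `vetted` flag, and the sample row are assumptions for illustration; a production repository would likely need a richer schema.

```python
import sqlite3

# In-memory database for the example; pass a file path to persist data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_data (
    id      INTEGER PRIMARY KEY,
    project TEXT    NOT NULL,
    item_id TEXT    NOT NULL,          -- form item identifier (hypothetical)
    value   REAL    NOT NULL,
    vetted  INTEGER NOT NULL DEFAULT 0 -- 0 = draft, 1 = thoroughly vetted
);
CREATE TABLE normalized_data (
    id     INTEGER PRIMARY KEY,
    raw_id INTEGER NOT NULL REFERENCES raw_data(id),
    value  REAL    NOT NULL,
    grade  TEXT    NOT NULL            -- quality rating
);
""")

# Store a raw value and keep its normalized counterpart linked to it.
cur = conn.execute(
    "INSERT INTO raw_data (project, item_id, value, vetted) VALUES (?, ?, ?, ?)",
    ("project_x", "S1", 4000.0, 1),
)
conn.execute(
    "INSERT INTO normalized_data (raw_id, value, grade) VALUES (?, ?, ?)",
    (cur.lastrowid, 10000.0, "A"),
)
conn.commit()

# Raw and normalized values stay traceable to each other.
rows = conn.execute("""
SELECT r.project, r.item_id, r.value, n.value, n.grade
FROM raw_data AS r
JOIN normalized_data AS n ON n.raw_id = r.id
""").fetchall()
```

Keeping raw and normalized values in separate, linked tables means a normalized figure can always be traced back to the data as it was originally supplied.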

During the data collection process, project management should:

  • Identify the activities necessary to accomplish the project’s purpose.
  • Determine dependencies among activities.
  • Define a schedule for conducting the required activities.
  • Define and locate the resources needed to accomplish the activities and determine how much they will cost (by resource or category).
  • Monitor and control the resources in order to achieve the required result on schedule.
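
The first three items in the list above (activities, dependencies, schedule) can be sketched with a simple forward pass over a dependency graph. The activity names and durations below are invented for illustration, and the earliest-finish calculation is one common scheduling approach rather than a method prescribed here. `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical activities: name -> (duration in days, predecessors).
activities = {
    "define_scope": (3, []),
    "collect_data": (10, ["define_scope"]),
    "review_nda": (2, ["define_scope"]),
    "size_software": (5, ["collect_data"]),
    "estimate": (4, ["size_software", "review_nda"]),
}

# Order the activities so every predecessor comes before its dependents.
graph = {name: preds for name, (_, preds) in activities.items()}
order = list(TopologicalSorter(graph).static_order())

# Forward pass: an activity starts once all of its predecessors finish.
finish = {}
for name in order:
    duration, preds = activities[name]
    start = max((finish[p] for p in preds), default=0)
    finish[name] = start + duration

total_days = max(finish.values())
```

With these sample figures the longest chain runs through scope, data collection, sizing, and the estimate, for a total of 22 days.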

If you lack the time to complete all the activities described in the ten-step process, prioritize the estimation effort: spend the bulk of the available time on sizing (sizing databases and tools like SEER-AccuScope can save time in this process). An automated software cost and schedule tool like SEER-SEM can also save the analyst time; for example, SEER-SEM knowledge bases save time in the data collection process.

Coming Next:
Step Four: Size software
