Large language models struggle with arithmetic

  • October 17, 2024

About 3 calendar months ago (about 16 dog years), I needed to do a Monte Carlo analysis. I thought it would be interesting to use AI, so I asked ChatGPT, “Can you do a Monte Carlo analysis?” The answer was yes. So I gave ChatGPT the parameters and dragged in the file. ChatGPT generated code to perform the analysis and then executed it. Very cool. Then out came a clearly wrong answer.

Upon analyzing the output, I saw that it had eliminated every item that was 100% probable, insisting that was correct (it’s not). I gave it more direction, but ChatGPT still refused to include the items with a 100% probability. So I said, “Do the Monte Carlo, then add in those items with 100% probability.” Next, I asked for a cumulative probability distribution. It gave me a straight line. I told it an S-curve was more appropriate, and it produced one. I ran it a few times and saw the expected variance in the results.
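For readers who want to see the mechanics, here is a minimal sketch of that kind of analysis. It is not the code ChatGPT generated. The cost items, their values, the triangular distribution, and the function names are all assumptions made up for illustration. It shows the two points above: an item with a 100% probability is included in every trial, and the sorted totals give the cumulative distribution, which typically plots as an S-curve rather than a straight line.

```python
import random

# Hypothetical cost items: (probability of occurring, low, most likely, high).
# The first item is 100% probable, so it must appear in every trial.
EXAMPLE_ITEMS = [
    (1.0, 90.0, 100.0, 130.0),
    (0.3, 20.0, 40.0, 80.0),
    (0.5, 5.0, 10.0, 30.0),
]


def simulate_total_cost(items, n_trials=10_000, seed=0):
    """Return the simulated total costs, sorted ascending."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_trials):
        total = 0.0
        for prob, low, mode, high in items:
            # rng.random() is in [0.0, 1.0), so a 1.0 item is never skipped.
            if rng.random() < prob:
                # Assumed cost model: triangular(low, high, mode).
                total += rng.triangular(low, high, mode)
        totals.append(total)
    totals.sort()
    return totals


def percentile(sorted_totals, p):
    """Approximate cost that a fraction p of the trials did not exceed."""
    index = min(len(sorted_totals) - 1, int(p * len(sorted_totals)))
    return sorted_totals[index]


if __name__ == "__main__":
    totals = simulate_total_cost(EXAMPLE_ITEMS)
    # Reading off several percentiles traces the cumulative distribution.
    for p in (0.1, 0.25, 0.5, 0.75, 0.9):
        print(f"P{int(p * 100)}: {percentile(totals, p):.1f}")
```

Changing the `seed` argument gives slightly different percentiles from run to run, which is the expected variance mentioned above.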

My point: large language models struggle with arithmetic. I hear some people say they can replace hundreds of effort-years of data collection and analysis with “AI.” Most don’t consider where the “data” is coming from, nor that modeling is far more than simple regression.

Apple just released an excellent paper, GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models. The bottom line is that GSM8K is a benchmark, not a model: it is widely used to assess large language models’ mathematical abilities on grade-school math word problems. They point out that while LLMs seem to be improving on this test, the models themselves do not appear to have improved mathematical reasoning. They concluded the test was unreliable (hmmm, tuning for the cases that make the test scores look better sounds like processor benchmark tricks) and came up with a better test of mathematical skills.

Their conclusion:

“Furthermore, we investigate the fragility of mathematical reasoning in these models and demonstrate that their performance significantly deteriorates as the number of clauses in a question increases. We hypothesize that this decline is due to the fact that current LLMs are not capable of genuine logical reasoning; instead, they attempt to replicate the reasoning steps observed in their training data. When we add a single clause that appears relevant to the question, we observe significant performance drops (up to 65%) across all state-of-the-art models, even though the added clause does not contribute to the reasoning chain needed to reach the final answer.”

Of course, there are libraries and tools, such as Google TensorFlow or Wolfram Alpha, that can perform mathematics. But that capability isn’t coming from today’s LLMs.

Still, Galorath has found that LLMs can do an excellent job of turning language into the parameters that drive cost, schedule, and risk models such as SEER.

This is very exciting because it provides viable inputs to proven estimation equations. Stay tuned for additional conclusions on the topic.
