Large Language Models Struggle with Arithmetic

About three calendar months ago (about 16 dog years), I needed to do a Monte Carlo analysis. I thought it would be interesting to use AI, so I asked ChatGPT, “Can you do a Monte Carlo analysis?” The answer was yes. So I gave ChatGPT the parameters and dragged in the file. ChatGPT generated code to perform the analysis, then executed it. Very cool. Then out came a clearly wrong answer.

On inspection, I saw that it had eliminated every item with a 100% probability of occurring, insisting that was correct (it’s not). I gave it more direction, but ChatGPT still refused to include the items with a 100% probability. So I said, “Do the Monte Carlo, then add in those items with 100% probability.” Next, I asked for a cumulative probability distribution. It gave me a straight line. I told it an S-curve was more appropriate, and it produced one. I ran it a few times and saw the expected variance in the results.
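For illustration, here is a minimal Python sketch of the kind of analysis I was after. It is not the code ChatGPT generated, and it is not a SEER model. The risk items, probabilities, cost ranges, and function names are all made up for the example, and each item's cost is assumed to be uniformly distributed between a low and a high value.

```python
import random

# Hypothetical risk register: (name, probability of occurring, low cost, high cost).
# Every number here is invented for illustration only.
RISKS = [
    ("Known rework", 1.00, 40.0, 60.0),  # certain item: must appear in every trial
    ("Vendor delay", 0.30, 10.0, 50.0),
    ("Scope change", 0.50, 20.0, 80.0),
]


def simulate_totals(risks, trials=10_000, seed=None):
    """Run the Monte Carlo simulation and return one total cost per trial."""
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        total = 0.0
        for _name, prob, low, high in risks:
            # rng.random() is in [0.0, 1.0), so an item with prob == 1.0
            # always passes this check and is never dropped.
            if rng.random() < prob:
                total += rng.uniform(low, high)
        totals.append(total)
    return totals


def empirical_cdf(totals):
    """Pair each sorted total with its cumulative probability."""
    ordered = sorted(totals)
    n = len(ordered)
    return [(value, (i + 1) / n) for i, value in enumerate(ordered)]


def percentile(cdf, p):
    """Smallest simulated total whose cumulative probability is at least p."""
    for value, cumulative in cdf:
        if cumulative >= p:
            return value
    return cdf[-1][0]


if __name__ == "__main__":
    cdf = empirical_cdf(simulate_totals(RISKS))
    print("P50:", round(percentile(cdf, 0.50), 1))
    print("P80:", round(percentile(cdf, 0.80), 1))
```

Two things to notice. The 100% item is handled by the same probability check as everything else, so there is no reason to exclude it and add it back afterward. And the cumulative distribution is built from the sorted simulation results, so when plotted it typically bends into an S-shape instead of a straight line. Without a fixed seed, the percentiles shift a little from run to run, which is the expected variance.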

My point: large language models struggle with arithmetic. I hear some people say they can replace hundreds of effort-years of data collection and analysis with “AI.” Most don’t consider where the “data” is coming from, or that modeling is far more than simple regression.

Apple just released an excellent paper, “GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models.” GSM8K is a benchmark of grade-school math word problems that is widely used to assess large language models’ mathematical abilities. The authors point out that while LLM scores on this test have been improving, the models themselves do not appear to have improved at mathematical reasoning. They concluded the test was unreliable (hmmm, tuning for the cases that make the test score look better sounds like processor benchmark tricks) and came up with a better test of mathematical skills.

“Furthermore, we investigate the fragility of mathematical reasoning in these models and demonstrate that their performance significantly deteriorates as the number of clauses in a question increases. We hypothesize that this decline is due to the fact that current LLMs are not capable of genuine logical reasoning; instead, they attempt to replicate the reasoning steps observed in their training data. When we add a single clause that appears relevant to the question, we observe significant performance drops (up to 65%) across all state-of-the-art models, even though the added clause does not contribute to the reasoning chain needed to reach the final answer.”

Of course, there are tools, such as Google’s TensorFlow library or the Wolfram Alpha computational engine, that can perform mathematics reliably. But that capability isn’t coming from today’s LLMs themselves.

Still, Galorath has found that LLMs can do an excellent job of translating natural language into the input parameters for cost, schedule, and risk models such as SEER.

This is very exciting: the LLM supplies viable inputs, and proven estimation equations do the math. Stay tuned for additional conclusions on the topic.
