The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks

Alejandro Cuadron; Dacheng Li; Wenjie Ma; Xingyao Wang; Yichuan Wang; Siyuan Zhuang; Shu Liu; Luis Gaspar Schroeder; Xia Tian; Hui Mao; Nicholas Thumiger; Aditya Desai; Ion Stoica; Ana Klimovic; Graham Neubig; Joseph E. Gonzalez

doi:10.48550/arxiv.2502.08235

Abstract

Large Reasoning Models (LRMs) represent a breakthrough in AI problem-solving capabilities, but their effectiveness in interactive environments can be limited. This paper introduces and analyzes overthinking in LRMs, a phenomenon where models favor extended internal reasoning chains over environmental interaction. Through experiments on software engineering tasks using SWE Bench Verified, we observe three recurring patterns: Analysis Paralysis, Rogue Actions, and Premature Disengagement. We propose a framework to study these behaviors, show that it correlates with human expert assessments, and use it to analyze 4018 trajectories. We observe that higher overthinking scores correlate with decreased performance, and that reasoning models exhibit stronger tendencies toward overthinking than non-reasoning models. Our analysis reveals that simple efforts to mitigate overthinking in agentic environments, such as selecting the solution with the lower overthinking score, can improve model performance by almost 30% while reducing computational costs by 43%. These results suggest that mitigating overthinking has strong practical implications. We suggest that overthinking tendencies could be mitigated by leveraging native function-calling capabilities and selective reinforcement learning. We also open-source our evaluation framework and dataset to facilitate research in this direction at https://github.com/AlexCuadron/Overthinking.
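The mitigation mentioned in the abstract, selecting the solution with the lower overthinking score, can be illustrated with a minimal sketch. It assumes that several candidate trajectories for the same task have already been scored, and that a lower score means less overthinking. The `Trajectory` class, its field names, and the example scores are hypothetical and are not taken from the authors' released framework.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """One candidate agent run on a task (hypothetical structure)."""
    patch: str                  # the proposed solution, e.g. a diff file name
    overthinking_score: float   # assumed: lower means less overthinking


def select_lowest_overthinking(candidates: list[Trajectory]) -> Trajectory:
    """Return the candidate with the lowest overthinking score."""
    if not candidates:
        raise ValueError("need at least one candidate trajectory")
    return min(candidates, key=lambda t: t.overthinking_score)


# Illustrative, made-up scores for three runs on the same task.
candidates = [
    Trajectory(patch="fix_a.diff", overthinking_score=7.0),
    Trajectory(patch="fix_b.diff", overthinking_score=2.0),
    Trajectory(patch="fix_c.diff", overthinking_score=5.0),
]

best = select_lowest_overthinking(candidates)
print(best.patch)
```

How the score itself is computed is outside this sketch; the selection step only needs one comparable number per trajectory.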

Related publications

Preprint, 2025

AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Junyu Zhang, Runpei Dong, Han Wang, Ning Xia, Haoran Geng, Peihao Li, Xijun He, Yutong Bai, Jitendra Malik, Saurabh Gupta, Huan Zhang

Preprint, 2025

TARDIS STRIDE: A Spatio-Temporal Road Image Dataset and World Model for Autonomy

Héctor Carrión, Yutong Bai, Víctor M. Castro, Kishan Panaganti, Ayush Zenith, Matthew Trang, Tony Zhang, Pietro Perona, Jitendra Malik

Preprint, 2025

SkyRL-Agent: Efficient RL Training for Multi-turn LLM Agent

Shiyi Cao, Dacheng Li, Fangzhou Zhao, Yuan Su-fang, Sumanth Hegde, Connor Chen, Charlie Ruan, Tyler Griggs, Shu Liu, Eric Tang, Richard Liaw, Philipp Moritz, Matei Zaharia, Joseph E. Gonzalez, Ion Stoica

Preprint, 2024

GameArena: Evaluating LLM Reasoning through Live Computer Games

Lanxiang Hu, Qiyu Li, A-Di Xie, Nan Jiang, Ion Stoica, Hongzhong Jin, Han Zhang

Article, 2025

A framework for evaluating the chemical knowledge and reasoning abilities of large language models against the expertise of chemists

A.H. Mirza, Nawaf Alampara, Sreekanth Kunchapu, Martiño Ríos-García, Benedict Emoekabu, Aswanth Krishnan, Mara Schilling-Wilhelmi, Macjonathan Okereke, Anagha Aneesh, Mehrdad Asgari, J. Eberhardt, Amir Mohammad Elahi, Hani M. Elbeheiry, M.V. Gil, Christina Glaubitz, Maximilian Greiner, Caroline T. Holick, Tim Hoffmann, Lea C. Klepsch, Yannik Köster, Fabian Alexander Kreth, Jakob Meyer, Santiago Miret,
