Optimizing Model Selection for Compound AI Systems

Compound AI systems that combine multiple LLM calls, such as self-refine and multi-agent debate, achieve strong performance on many AI tasks. We address a core question in optimizing compound systems: for each LLM call or module in the system, how should one decide which LLM to use? We show that these LLM choices have a large effect on quality, but the search space is exponential. We propose LLMSelector, an efficient framework for model selection in compound systems, which leverages two key empirical insights: (i) end-to-end performance is often monotonic in how well each module performs, with all other modules held fixed, and (ii) per-module performance can be estimated accurately by an LLM. Building upon these insights, LLMSelector iteratively selects one module and allocates to it the model with the highest module-wise performance, as estimated by an LLM, until no further gain is possible. LLMSelector is applicable to any compound system with a bounded number of modules, and its number of API calls scales linearly with the number of modules; it achieves high-quality model allocation both empirically and theoretically. Experiments with popular compound systems such as multi-agent debate and self-refine, using LLMs such as GPT-4o, Claude 3.5 Sonnet and Gemini 1.5, show that LLMSelector confers 5%-70% accuracy gains compared to using the same LLM for all modules.
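The iterative procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `greedy_model_selection`, the `max_rounds` cap, and both scoring callbacks are assumptions introduced here, and the toy `QUALITY` table is a hypothetical stand-in for the LLM-based estimate of per-module performance. It only shows the general shape of a one-module-at-a-time search under the monotonicity assumption.

```python
from typing import Callable, Dict, List

Allocation = Dict[str, str]  # module name -> model name


def greedy_model_selection(
    modules: List[str],
    models: List[str],
    module_score: Callable[[str, Allocation], float],
    end_to_end_score: Callable[[Allocation], float],
    max_rounds: int = 10,  # hypothetical safety cap, not from the paper
) -> Allocation:
    """Update one module at a time until no further gain is observed."""
    # Start from the baseline of using the same model for every module.
    allocation = {m: models[0] for m in modules}
    best = end_to_end_score(allocation)
    for _ in range(max_rounds):
        improved = False
        for module in modules:
            # Choose the model with the highest estimated module-wise score,
            # holding the models of all other modules fixed.
            chosen = max(
                models,
                key=lambda mdl: module_score(module, {**allocation, module: mdl}),
            )
            candidate = {**allocation, module: chosen}
            score = end_to_end_score(candidate)
            if score > best:
                allocation, best = candidate, score
                improved = True
        if not improved:
            break
    return allocation


# Hypothetical per-module quality table standing in for the LLM estimator.
QUALITY = {
    ("generator", "model_a"): 0.6,
    ("generator", "model_b"): 0.8,
    ("critic", "model_a"): 0.7,
    ("critic", "model_b"): 0.5,
}


def toy_module_score(module: str, allocation: Allocation) -> float:
    return QUALITY[(module, allocation[module])]


def toy_end_to_end(allocation: Allocation) -> float:
    # Monotone in each module's score, matching insight (i) in the abstract.
    return sum(QUALITY[(m, mdl)] for m, mdl in allocation.items()) / len(allocation)


result = greedy_model_selection(
    ["generator", "critic"],
    ["model_a", "model_b"],
    toy_module_score,
    toy_end_to_end,
)
print(result)
```

In this toy setting each pass over the modules makes one module-wise estimate per (module, model) pair, so the work per pass grows linearly with the number of modules, and the search ends with different models assigned to the two modules rather than a single model everywhere.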

Publication Info

DOI: 10.48550/arxiv.2502.14815
Year: 2025
Published: —
Language: en

Preprint Details

Link Of The Paper: http://arxiv.org/abs/2502.14815

Timeline

Created: June 19, 2026
