When to Reason: Semantic Router for vLLM
Chen Wang, Xunzhuo Liu, Yuhan Liu, Yue Zhu, Xiangxi Mo, Junchen Jiang, Huamin Chen
NeurIPS - MLForSys 2025We propose vLLM semantic router integrated with vLLM that selectively applies reasoning only when beneficial, achieving over 10 percentage point accuracy gains while nearly halving latency and token usage