
🎓 Publications

Discover the latest community-driven research contributions on LLMs and intelligent routing systems. Our work pushes the boundaries of efficient LLM inference.

When to Reason: Semantic Router for vLLM

Chen Wang, Xunzhuo Liu, Yuhan Liu, Yue Zhu, Xiangxi Mo, Junchen Jiang, Huamin Chen

NeurIPS - MLForSys 2025

We propose vLLM Semantic Router, a semantic router integrated with vLLM that applies reasoning selectively, only when it is beneficial. It achieves accuracy gains of over 10 percentage points while nearly halving latency and token usage.