A scheduling approach for black-box LLM inference that uses predicted output lengths to reduce queueing friction at scale.