Bing Search gets faster, more accurate and more efficient with SLMs and TensorRT-LLM

The Bing Search team shared how it made Bing Search and Bing’s Deep Search faster, more accurate and more cost-effective by transitioning to small language models (SLMs) and integrating NVIDIA’s TensorRT-LLM.

Bing wrote, “to improve efficiency, we trained SLM models (~100x throughput improvement over LLM), which process and understand search queries more precisely.”
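Neither Bing’s SLMs nor its serving stack are public, but TensorRT-LLM’s high-level Python LLM API illustrates the general pattern: compile a small model into an optimized inference engine, then serve batched requests from it. Below is a minimal sketch under those assumptions; the model checkpoint and the sample queries are hypothetical stand-ins, not Bing’s actual models or workload.

```python
# A minimal sketch, assuming a recent tensorrt_llm release that ships
# the high-level LLM API. The TinyLlama checkpoint and the queries are
# illustrative stand-ins, not Bing's actual SLM or traffic.
from tensorrt_llm import LLM, SamplingParams

# Hypothetical search-style prompts; a real deployment would batch
# these continuously from live query traffic.
prompts = [
    "Rewrite as a concise web search query: cheap flights to Tokyo in spring",
    "Summarize the intent of this query: best hiking trails near Seattle",
]

sampling_params = SamplingParams(temperature=0.2, top_p=0.95)

# On first load, TensorRT-LLM compiles the model into an optimized
# inference engine; subsequent runs reuse the compiled engine.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

for output in llm.generate(prompts, sampling_params):
    print(output.outputs[0].text)
```

In practice, much of TensorRT-LLM’s throughput gain comes from engine-level optimizations such as in-flight batching and optimized attention kernels, which the API applies under the hood.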

The benefits. Bing said the transition has made search better overall by bringing three core benefits to its searchers:

Faster Search Results: With optimized inference, Bing users get quicker response times, making the search experience more seamless and efficient.

Improved Accuracy: The enhanced capabilities of SLMs allow Microsoft to deliver more accurate and contextualized search results, helping Bing searchers find the information they need more effectively.

Cost Efficiency: By reducing the cost of hosting and running large models, Microsoft said it can continue to invest in further innovations and improvements, ensuring that Bing remains at the forefront of search technology.

Why we care. A faster, more accurate search experience can make Bing more trusted and useful to searchers. That, in turn, may lead more searchers to adopt Bing, taking search market share away from larger players like Google.