Behind every seamless customer experience is a complex, ever-changing codebase. When issues arise in that codebase, developers face a deceptively hard task: locating the exact part of the code that needs fixing. This process, called issue localization, can be time-consuming, manual, and error-prone, even with AI assistance. We know that faster, smarter development ultimately means faster innovation. That’s why our AI Research team built SweRank, a powerful, efficient code ranking framework that helps automatically pinpoint the source of software issues with state-of-the-art accuracy.
Understanding SweRank
Software issue localization involves locating the exact parts of code—be it files, classes, or functions—that need modification to resolve a reported issue. Current approaches primarily rely on agent-based methods driven by large language models (LLMs) that can be resource-intensive and time-consuming.
SweRank offers a more efficient and cost-effective solution by employing a two-step “retrieve-and-rerank” framework that comprises the following:
- SweRankEmbed: An embedding-based retriever that quickly narrows down potential code segments related to the issue.
- SweRankLLM: A lightweight LLM-based reranker that refines these results to identify the most relevant code snippets.
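To make the two-step flow concrete, here is a minimal sketch of retrieve-and-rerank on a toy codebase. It is not SweRank's implementation: the bag-of-words `embed` function stands in for a learned embedding model, the token-overlap `rerank` stands in for an LLM reranker, and every name here (`retrieve`, `rerank`, `localize`, the sample functions) is hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase alphanumeric tokens; underscores split identifiers apart.
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    # Toy stand-in for a learned embedding model: bag-of-words counts.
    return Counter(tokenize(text))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(issue, functions, k):
    # Step 1: cheap similarity search over every function, keep the top k.
    query = embed(issue)
    ranked = sorted(
        functions.items(),
        key=lambda item: cosine(query, embed(item[1])),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


def rerank(issue, candidates, functions):
    # Step 2: re-score only the shortlist. A real system would ask an LLM
    # to order the candidates; this toy version counts shared tokens.
    issue_tokens = set(tokenize(issue))

    def score(name):
        return len(issue_tokens & set(tokenize(name + " " + functions[name])))

    return sorted(candidates, key=score, reverse=True)


def localize(issue, functions, k=2):
    # One pass of retrieval followed by one pass of reranking.
    return rerank(issue, retrieve(issue, functions, k), functions)


functions = {
    "parse_date": "def parse_date(s): return datetime.strptime(s, '%Y-%m-%d')",
    "send_email": "def send_email(to, body): smtp.send(to, body)",
    "format_name": "def format_name(first, last): return first + ' ' + last",
}
issue = "parse_date crashes when the date string has a timezone"
print(localize(issue, functions))
```

The point of the split is cost: the inexpensive retriever looks at every function, while the more expensive reranker only sees the shortlist of `k` candidates.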
To train this system, the team developed SweLoc, a large-scale dataset curated from public Python repositories on GitHub. This dataset pairs real-world issue descriptions with corresponding code modifications, providing a rich resource for training.
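As a rough illustration of what pairing an issue with its code modifications could look like as data, the sketch below defines a hypothetical record type and expands it into labeled examples. The field names, the `IssueExample` class, the sample repository name, and the idea of treating unmodified functions as negatives are all assumptions made for illustration, not SweLoc's actual schema or training recipe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IssueExample:
    # Hypothetical record: one issue and the functions its fix touched.
    repo: str
    issue_text: str
    modified_functions: frozenset


def to_labeled_pairs(example, all_functions):
    # Assumption: functions changed by the fix are positives (True) and
    # every other function in the repository is a negative (False).
    return [
        (example.issue_text, fn, fn in example.modified_functions)
        for fn in all_functions
    ]


example = IssueExample(
    repo="example-org/example-repo",  # placeholder, not a real repository
    issue_text="parse_date crashes on timezone strings",
    modified_functions=frozenset({"parse_date"}),
)
pairs = to_labeled_pairs(example, ["parse_date", "send_email"])
print(pairs)
```

Labeled pairs of this shape are the kind of signal a retriever or reranker can learn from: given the issue text, score the modified function above the untouched ones.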
SweRank achieves state-of-the-art performance on two major issue localization benchmarks: SWE-Bench-Lite and LocBench. It outperforms earlier retrieval and reranking systems as well as newer agent-based methods that rely on closed-source LLMs like Claude-3.5. It’s not only more accurate but also considerably more cost-efficient. Unlike agent-based approaches that require multiple iterations, SweRank performs only one pass of retrieval and reranking, making it quick and affordable to run.
Implications for CRM and AI Agents
While SweRank is tailored for software development and not productized, its underlying principles can have significant implications for CRM systems:
- Enhanced Automation: By efficiently identifying relevant code changes, SweRank can accelerate the development of CRM features, leading to faster deployment of customer-facing tools.
- Cost Efficiency: SweRank’s approach reduces reliance on large, expensive LLMs, making it a cost-effective solution that can be integrated into CRM systems without significant overhead.
Looking Ahead
The success of SweRank underscores the potential of combining ranking-based methods with LLM code agents to improve automated bug fixing. While SweRank is currently built primarily for Python, we will soon release a more general version that works across a variety of programming languages. Research like SweRank helps us explore what’s possible, and it shapes our thinking on how future AI tools could improve developer workflows and, ultimately, the customer experience.
Explore More
- For more details on SweRank and its underlying research, visit the SweRank project page.
- Salesforce AI Website: salesforceairesearch.com
- Follow us on X: @SFResearch, @Salesforce