SpareCodeSearch: Searching for Code Context When You Have No Spare GPU
Abstract
Keyword-search can effectively retrieve relevant code context in large codebases for CLMs, achieving competitive performance on code completion benchmarks without requiring extensive computational resources.
Retrieval-Augmented Generation (RAG) frameworks aim to enhance Code Language Models (CLMs) by including another module for retrieving relevant context to construct the input prompt. However, these retrieval modules commonly use semantic search, requiring substantial computational resources for training and hosting these embedded models, making them infeasible to integrate into lightweight applications such as in-IDE AI-based code completion. In this solution paper, we prove that using keyword-search is sufficient to retrieve relevant and useful code context inside large codebases, without the need for extensive GPU resources. The usefulness of code contexts found by our solution is demonstrated through their completion results on the Code Context Competition's benchmark, reaching 0.748 and 0.725 chRF scores on Kotlin and Python tracks, respectively.
Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper
Collections including this paper 0
No Collection including this paper