AutoCodeRover: AST-Aware Program Improvement
Zhang, Y., et al. (2024). AutoCodeRover: Autonomous Software Engineer with Spectrum-based Fault Localization. arXiv preprint arXiv:2404.05427.

AutoCodeRover represents a shift from "LLM-centric" coding agents toward "Software Engineering-centric" agents. While contemporary agents like SWE-agent treat a repository as a collection of text files, AutoCodeRover operates on a program representation - the Abstract Syntax Tree (AST). By combining the reasoning capabilities of LLMs with classical Spectrum-based Fault Localization (SBFL), the system achieved a 22.67% success rate on SWE-bench Lite with an average resolution time of under 10 minutes, significantly outperforming unconstrained iterative agents.
Spectrum-based Fault Localization (SBFL)
The primary technical contribution of AutoCodeRover is its use of SBFL to sharpen the agent's context. When a test suite is available, the system executes both passing and failing tests to analyze the program's control flow. It then applies statistical metrics - specifically the Ochiai formula - to assign "suspiciousness scores" to various methods. The Ochiai score for a method is calculated as: These scores act as a "spatial hint" for the agent. Instead of the LLM searching blindly through thousands of lines of code, the system provides a ranked list of suspicious methods. This reduces the search space by orders of magnitude, minimizing the "context window noise" that typically leads to hallucinated fixes in large repositories. For researchers, this proves that integrating classical debugging statistics into the LLM prompt is more effective than scaling the LLM's raw reasoning width.
Stratified Context Search and AST APIs
To navigate the codebase without overflowing the context window, AutoCodeRover utilizes a suite of Stratified Context Retrieval APIs. The agent does not "read" files in the traditional sense; instead, it executes targeted queries against the AST:
get_class_definition(class_name): Retrieves the skeleton of a class, including its method signatures and docstrings, without the implementation details.get_method_signature(method_name, class_name): Pinpoints a specific method's entry point and parameters.search_code_in_ast(code_snippet): Performs a semantic search rather than a regex search, identifying usages of specific logic patterns across the program.
This iterative process allows the agent to build a structural "mental model" of the program. If SBFL is unavailable (e.g., no existing tests), the agent defaults to this stratified search, starting from the issue description keywords and recursively expanding its context by "jumping" through method calls and class definitions.
Patch Validation and Reproducer Generation
The repair phase is governed by a strict Patch Validation loop designed to eliminate "silent failures." Once a potential fix is generated, the agent attempts to automatically generate a reproducer test - a minimal script that fails on the original code but passes on the patched code. The patch is subjected to three levels of verification:
- Linter Check: Ensures the patch is syntactically valid and adheres to PEP8 or similar standards.
- Reproducer Validation: Confirms the fix actually addresses the reported symptom.
- Regression Testing: Ensures the fix doesn't break existing functionality.
If a patch fails any of these gates, AutoCodeRover enters a self-correction loop (limited to 3 retries), using the error logs from the test runner to refine the fix. This closed-loop system treats the LLM as a "proposal engine" while the AST and test-runner act as the "truth engine." By achieving a resolution cost of $0.43 per task, AutoCodeRover demonstrates that the future of AI software engineering lies in the deep integration of symbolic program analysis and neural generation.
Join the EulerFold community
Track progress and collaborate on roadmaps with students worldwide.
Dive Deeper
AutoCodeRover Paper on arXiv
arXiv • article
Explore ResourceAutoCodeRover GitHub Repository
GitHub • code
Explore Resource
Discussion
0Join the discussion
Sign in to share your thoughts and technical insights.
Loading insights...
Recommended Readings
The author of this article utilized generative AI (Google Gemini 3.1 Pro) to assist in part of the drafting and editing process.
