Background
During routine development, addressing accumulated Sonar issues across legacy codebases presents a significant challenge. After several years of continuous development, thousands of issues have accumulated, ranging from code style violations to deprecated comments. Resolving these issues manually is time-consuming and tedious.
The possibility of leveraging AI for automated code remediation emerged as a potential solution. However, initial attempts with AI-generated unit tests produced mixed results, leading to hesitation about pursuing this approach further.
Discovering the SonarQube Web API
SonarQube provides a Web API that exposes detailed scan results programmatically. The scanning process records critical information including file paths, line numbers, and error descriptions. This API enables programmatic retrieval of all reported issues and supports advanced operations like custom queries and project configuration.
Each issue in the API response carries enough information (file, line, and message) to locate the error precisely in the source code.
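For illustration, here is an abbreviated issue object written as a Python dictionary, with a small helper that pulls out the location. The field names are the ones commonly seen in `api/issues/search` responses; every value (project key, file path, rule, message) is invented, and `locate` is a hypothetical helper, not part of any Sonar client.

```python
# Abbreviated, illustrative issue from api/issues/search.
# All values are invented; real responses carry more fields.
sample_issue = {
    "key": "AX-example-key",
    "rule": "java:S1128",
    "severity": "MINOR",
    "component": "my-project:src/main/java/com/example/OrderService.java",
    "line": 12,
    "textRange": {"startLine": 12, "endLine": 12, "startOffset": 0, "endOffset": 34},
    "message": "Remove this unused import 'java.util.List'.",
    "status": "OPEN",
}


def locate(issue):
    """Return (file_path, start_line, end_line, message) for one issue."""
    # Component keys look like "<project key>:<path>"; the path is the last part.
    file_path = issue["component"].split(":")[-1]
    text_range = issue.get("textRange", {})
    # File-level issues may have no line at all, so fall back step by step.
    start = text_range.get("startLine", issue.get("line", 1))
    end = text_range.get("endLine", start)
    return file_path, start, end, issue["message"]


print(locate(sample_issue))
```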
Approach One: Direct Code Submission
The first strategy fetches all issues through the Sonar Web API, groups them by file, and aggregates the related information. Each file's source code is then submitted to the AI model together with its issue list, including the specific line numbers and error descriptions.
A typical prompt structure:
You are a senior Java developer with extensive experience in Sonar rule compliance.
Fix the code issues based on the provided source code and SonarQube error reports.
Requirements:
- Return only the corrected code
- Do not fabricate code or alter business logic
- Ensure code correctness and syntactical accuracy
- Modified code must be executable
- If variable names are duplicated, update all relevant context
Source Code:
<source_code>
Sonar Errors:
- Lines 1-10: xxxxxxxxxxx
- Lines 3-6: xxxxxxxxxxx
- Lines 8-20: xxxxxxxxxxx
- Lines 30-40: xxxxxxxxxxx
- Lines 50-70: xxxxxxxxxxx
This straightforward method has a critical limitation: AI models impose strict token limits on both input and output. Large files risk having their output truncated, making this approach impractical for extensive codebases.
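A minimal sketch of how such a whole-file prompt might be assembled, with a crude size check in front of it. The four-characters-per-token ratio and the 8000-token budget are assumptions chosen only for illustration; real limits depend on the model and its tokenizer, and `build_file_prompt` and `fits_budget` are hypothetical names.

```python
PROMPT_TEMPLATE = """You are a senior Java developer with extensive experience in Sonar rule compliance.
Fix the code issues based on the provided source code and SonarQube error reports.
Requirements:
- Return only the corrected code
- Do not fabricate code or alter business logic
- Ensure code correctness and syntactical accuracy
- Modified code must be executable
- If variable names are duplicated, update all relevant context
Source Code:
{source}
Sonar Errors:
{errors}"""

# Assumptions for illustration only: a rough characters-per-token estimate
# and an arbitrary budget. Neither value comes from a specific model.
CHARS_PER_TOKEN = 4
MAX_PROMPT_TOKENS = 8000


def build_file_prompt(source, issues):
    """Build one whole-file prompt; issues are (start_line, end_line, message)."""
    errors = "\n".join(
        f"- Lines {start}-{end}: {message}" for start, end, message in issues
    )
    return PROMPT_TEMPLATE.format(source=source, errors=errors)


def fits_budget(prompt, max_tokens=MAX_PROMPT_TOKENS):
    """Crude size check: estimate the token count from the character count."""
    return len(prompt) / CHARS_PER_TOKEN <= max_tokens


small = build_file_prompt("class A {}", [(1, 1, "Add a comment")])
large = build_file_prompt("x" * 100_000, [(1, 1, "Add a comment")])
print(fits_budget(small), fits_budget(large))
```

Because the model is asked to return the entire corrected file, the reply is roughly as large as the source, so a file that barely fits the input budget can still be truncated on output.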
Approach Two: Interval Merging Strategy
The second approach addresses the token limitation by sending only the affected line ranges instead of whole files, while keeping the number of requests to the AI model low and preserving enough context for accurate fixes. Sonar reports often contain overlapping issue ranges, which led to an interval merging strategy.
Rather than submitting every issue individually, overlapping issue ranges are merged using a greedy algorithm. This consolidates multiple issues into a single request, reducing request frequency while giving the AI model the surrounding context for all of them.
Revised prompt:
You are a senior Java developer with extensive experience in Sonar rule compliance.
Fix the code issues based on the provided source code and SonarQube error reports.
Requirements:
- Return only the corrected code
- Do not fabricate code or alter business logic
- Ensure code correctness and syntactical accuracy
- Modified code must be executable
- If variable names are duplicated, update all relevant context
- Return only the modified code for lines 1-20, ready for direct replacement
Source Code:
<source_code>
Sonar Errors:
- Lines 1-10: xxxxxxxxxxx
- Lines 3-6: xxxxxxxxxxx
- Lines 8-20: xxxxxxxxxxx
Implementing the interval merging algorithm:
def consolidate_issue_ranges(issue_list):
    """
    Merge overlapping issue ranges using a greedy algorithm.

    Example: [(1, 10), (3, 6), (8, 20)] -> [(1, 20)]

    :param issue_list: List of tuples containing (start_line, end_line)
    :return: List of merged ranges
    """
    if not issue_list:
        return []
    # Sort by start line so each range is compared only with the last merged one
    sorted_ranges = sorted(issue_list, key=lambda x: x[0])
    merged = [sorted_ranges[0]]
    for current_start, current_end in sorted_ranges[1:]:
        last_end = merged[-1][1]
        if current_start <= last_end:
            # Overlap: extend the last merged range if the current one reaches further
            merged[-1] = (merged[-1][0], max(last_end, current_end))
        else:
            merged.append((current_start, current_end))
    return merged

import requests


def fetch_sonar_issues(project_key, api_token):
    """
    Retrieve all open issues from SonarQube for a specific project.
    :param project_key: Sonar project identifier
    :param api_token: Authentication token
    :return: List of issue dictionaries
    """
    base_url = "https://sonar.example.com/api/issues/search"
    headers = {"Authorization": f"Bearer {api_token}"}
    page_size = 500
    params = {
        "componentKeys": project_key,
        "statuses": "OPEN,CONFIRMED,REOPENED",
        "ps": page_size,
    }
    all_issues = []
    page = 1
    while True:
        params["p"] = page
        response = requests.get(base_url, headers=headers, params=params, timeout=30)
        response.raise_for_status()  # fail fast on authentication or server errors
        data = response.json()
        all_issues.extend(data.get("issues", []))
        # Stop once the pages fetched so far cover the reported total
        if page * page_size >= data.get("total", 0):
            break
        page += 1
    return all_issues

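def group_issues_by_file(issues):
    """
    Organize issues by file path and merge overlapping ranges.
    :param issues: List of Sonar issue dictionaries
    :return: Dictionary mapping file paths to their ranges, details, and merged ranges
    """
    file_issues = {}
    for issue in issues:
        # Component keys look like "<project key>:<path>"; keep the path part
        file_path = issue.get("component", "").split(":")[-1]
        line = issue.get("line", 1)
        # Use the issue's full text range when present, so that ranges can
        # actually overlap; fall back to the single reported line otherwise
        text_range = issue.get("textRange", {})
        start_line = text_range.get("startLine", line)
        end_line = text_range.get("endLine", start_line)
        message = issue.get("message", "")
        if file_path not in file_issues:
            file_issues[file_path] = {
                "ranges": [],
                "details": []
            }
        file_issues[file_path]["ranges"].append((start_line, end_line))
        file_issues[file_path]["details"].append({
            "line": line,
            "message": message
        })
    for fp in file_issues:
        file_issues[fp]["merged_ranges"] = consolidate_issue_ranges(
            file_issues[fp]["ranges"]
        )
    return file_issues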
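With merged ranges in hand, the variable part of the revised prompt can be built per range. The following is a sketch under stated assumptions: `build_range_prompt` is a hypothetical helper, it expects the file as a list of lines and issue details shaped like the `details` entries above, and it omits the fixed header of the prompt for brevity.

```python
def build_range_prompt(source_lines, merged_range, details):
    """
    Build the range-specific part of the prompt.

    source_lines: list of file lines (index 0 holds line 1)
    merged_range: (start_line, end_line), 1-based and inclusive
    details: list of {"line": int, "message": str}
    """
    start, end = merged_range
    # Slice out only the lines covered by the merged range
    snippet = "\n".join(source_lines[start - 1:end])
    # Keep only the issues that fall inside this range
    errors = "\n".join(
        f"- Line {d['line']}: {d['message']}"
        for d in details
        if start <= d["line"] <= end
    )
    return (
        f"- Return only the modified code for lines {start}-{end}, "
        "ready for direct replacement\n"
        f"Source Code:\n{snippet}\n"
        f"Sonar Errors:\n{errors}"
    )


lines = ["import java.util.List;", "public class A {", "  int x;", "}"]
issue_details = [
    {"line": 1, "message": "Remove this unused import."},
    {"line": 3, "message": "Make x private."},
]
print(build_range_prompt(lines, (1, 2), issue_details))
```

Only the issues inside the requested range are listed, so the model is not asked to fix lines it cannot see.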
Integration Process
The workflow proceeds as follows:
- Retrieve all open issues from SonarQube via the Web API
- Group issues by source file
- Merge overlapping ranges within each file using the greedy algorithm
- For each merged range, read the corresponding source code
- Submit code and issue details to the AI model with targeted prompts
- Parse AI responses and apply corrections to source files
This approach reduces the number of requests sent to the AI model while maintaining enough context for it to understand the code structure and make appropriate corrections.
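The last step, applying corrections, has one practical wrinkle: a fix may return more or fewer lines than the range it replaces, which shifts every line number below it. One way to handle this, sketched here with a hypothetical `apply_replacements` helper and assuming the ranges do not overlap (which holds after merging), is to apply the replacements from the bottom of the file upward.

```python
def apply_replacements(source_lines, replacements):
    """
    Apply model-corrected snippets to a file's lines.

    source_lines: list of file lines (index 0 holds line 1)
    replacements: list of ((start_line, end_line), new_lines) pairs,
                  1-based inclusive ranges that do not overlap
    """
    result = list(source_lines)
    # Work from the bottom of the file upward so that a replacement which
    # changes the line count does not shift the ranges still to be applied.
    ordered = sorted(replacements, key=lambda item: item[0][0], reverse=True)
    for (start, end), new_lines in ordered:
        result[start - 1:end] = new_lines
    return result


original = ["a", "b", "c", "d", "e"]
fixed = apply_replacements(
    original,
    [((1, 2), ["A"]), ((4, 5), ["D", "E", "F"])],
)
print(fixed)
```

The input list is copied rather than modified, so the original lines remain available for a diff or a human review before anything is written back to disk.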
Considerations
The effectiveness of this automated approach depends on several factors: the quality of Sonar rule configurations, the complexity of the codebase, and the AI model's familiarity with the specific programming patterns used. Some issues, particularly those involving architectural decisions or extensive refactoring, may require human review and intervention.