Using Syntax Trees for Code Analysis
Writing a code analysis rule, such as one that detects invalid regular expressions, requires an understanding of syntax trees. Roslyn (the .NET Compiler Platform) exposes the compiler's syntax trees and provides tools for examining the structure of source code.
To visualize a syntax tree, use the Roslyn Syntax Visualizer, available in Visual Studio through View > Other Windows > Syntax Visualizer once the .NET Compiler Platform SDK is installed. The tool shows how each piece of source text maps to an element of the tree.
Consider the following example code:
using System.Text.RegularExpressions;

namespace SampleApp
{
    class Program
    {
        static void Main(string[] args)
        {
            // "\pXXX" is not a valid pattern; the analyzer should flag it.
            Regex.Match("sample text", @"\pXXX");
        }
    }
}

When examining this code with the Syntax Visualizer, you'll notice three distinct visual elements:
- Blue nodes represent syntactic constructs such as declarations, statements, and expressions
- Green tokens indicate lexical elements identified by the compiler, including keywords, identifiers, and symbols
- Red trivia represents content that does not affect the meaning of the program, such as whitespace and comments
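The same three categories can be listed in code. The following is a minimal sketch, assuming the project references the Microsoft.CodeAnalysis.CSharp NuGet package; the class name SyntaxTreeWalk and the method Describe are hypothetical names chosen for illustration.

```csharp
using System.Collections.Generic;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Hypothetical helper, not part of Roslyn.
public static class SyntaxTreeWalk
{
    // Yields one line per node, token, and trivia found in the source text.
    public static IEnumerable<string> Describe(string source)
    {
        SyntaxNode root = CSharpSyntaxTree.ParseText(source).GetRoot();

        // Nodes: declarations, statements, expressions (blue in the visualizer).
        foreach (SyntaxNode node in root.DescendantNodes())
            yield return "node: " + node.Kind();

        // Tokens: keywords, identifiers, literals, punctuation (green).
        foreach (SyntaxToken token in root.DescendantTokens())
            yield return "token: " + token.Kind() + " " + token.Text;

        // Trivia: whitespace, line breaks, comments (red).
        foreach (SyntaxTrivia trivia in root.DescendantTrivia())
            yield return "trivia: " + trivia.Kind();
    }
}
```

Passing the sample program to SyntaxTreeWalk.Describe and printing each line should list, among others, an InvocationExpression node for the Regex.Match call.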
Creating a DiagnosticAnalyzer
To implement a custom rule analyzer, create a class that inherits from DiagnosticAnalyzer. The following example demonstrates a basic structure for analyzing method invocations:
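using System.Collections.Immutable;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.Diagnostics;

[DiagnosticAnalyzer(LanguageNames.CSharp)]
public class RegexValidationAnalyzer : DiagnosticAnalyzer
{
    // Rule is this analyzer's DiagnosticDescriptor field; its declaration is not shown here.
    public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics
    {
        get { return ImmutableArray.Create(Rule); }
    }
    public override void Initialize(AnalysisContext context)
    {
        // Allow callbacks to run in parallel and skip generated code.
        context.EnableConcurrentExecution();
        context.ConfigureGeneratedCodeAnalysis(GeneratedCodeAnalysisFlags.None);
        context.RegisterSyntaxNodeAction(
            AnalyzeMethodInvocations, SyntaxKind.InvocationExpression);
    }
}

The AnalysisContext provides several registration methods for different analysis scenarios: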
- RegisterSyntaxNodeAction - Triggers when analyzing specific syntax node types
- RegisterSymbolAction - Triggers when analyzing specific symbol types
- RegisterSyntaxTreeAction - Triggers when analyzing entire syntax trees
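- RegisterSemanticModelAction - Triggers once a semantic model is available for a source file
- RegisterCodeBlockStartAction/EndAction - Triggers before/after analyzing a method body or other code block; the end action is registered on the context passed to the start action
- RegisterCompilationStartAction/EndAction - Triggers before/after analyzing an entire compilation (project); the end action is registered on the context passed to the start action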
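As a rough illustration of the compilation-level registration in the list above, the sketch below resolves the Regex type once per compilation and registers a syntax node action only when that type is available. It assumes the Microsoft.CodeAnalysis.CSharp package; the class name RegexCallLocatorAnalyzer, the ID DEMO0001, and the descriptor text are hypothetical placeholders, and the rule merely flags Regex.Match calls rather than validating patterns.

```csharp
using System.Collections.Immutable;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Diagnostics;

[DiagnosticAnalyzer(LanguageNames.CSharp)]
public sealed class RegexCallLocatorAnalyzer : DiagnosticAnalyzer
{
    // Hypothetical descriptor: ID, title, message, and category are placeholders.
    private static readonly DiagnosticDescriptor Rule = new DiagnosticDescriptor(
        id: "DEMO0001",
        title: "Regex.Match call",
        messageFormat: "Call to Regex.{0}",
        category: "Usage",
        defaultSeverity: DiagnosticSeverity.Info,
        isEnabledByDefault: true);

    public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics
        => ImmutableArray.Create(Rule);

    public override void Initialize(AnalysisContext context)
    {
        context.EnableConcurrentExecution();
        context.ConfigureGeneratedCodeAnalysis(GeneratedCodeAnalysisFlags.None);

        context.RegisterCompilationStartAction(start =>
        {
            // Resolve the Regex type once; skip compilations that do not reference it.
            INamedTypeSymbol regexType = start.Compilation.GetTypeByMetadataName(
                "System.Text.RegularExpressions.Regex");
            if (regexType == null) return;

            // Node actions registered here run only within this compilation.
            start.RegisterSyntaxNodeAction(
                nodeContext => AnalyzeInvocation(nodeContext, regexType),
                SyntaxKind.InvocationExpression);
        });
    }

    private static void AnalyzeInvocation(
        SyntaxNodeAnalysisContext context, INamedTypeSymbol regexType)
    {
        var invocation = (InvocationExpressionSyntax)context.Node;
        var method = context.SemanticModel
            .GetSymbolInfo(invocation, context.CancellationToken).Symbol as IMethodSymbol;
        if (method == null || method.Name != "Match") return;

        // Compare symbols rather than display strings.
        if (!SymbolEqualityComparer.Default.Equals(method.ContainingType, regexType)) return;

        context.ReportDiagnostic(
            Diagnostic.Create(Rule, invocation.GetLocation(), method.Name));
    }
}
```

Comparing symbols is one possible alternative to matching on the method's display string; whether the extra registration step is worthwhile depends on how many checks share the resolved type.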
Implementing the Analysis Logic
The following implementation analyzes method invocations to detect invalid regular expressions:
// Requires: using System; using Microsoft.CodeAnalysis.CSharp.Syntax;
/// <summary>
/// Analyzes method invocations to detect invalid regular expressions.
/// </summary>
/// <param name="context">The syntax node analysis context.</param>
private void AnalyzeMethodInvocations(SyntaxNodeAnalysisContext context)
{
    // The action is registered for InvocationExpression, so this cast is safe.
    var invocationExpression = (InvocationExpressionSyntax)context.Node;

    // Cheap syntactic filter: only member-access calls named Match are of interest.
    var memberAccess = invocationExpression.Expression as MemberAccessExpressionSyntax;
    if (memberAccess?.Name.Identifier.ValueText != "Match") return;

    // Use the semantic model to confirm the call binds to Regex.Match.
    var methodSymbol = context.SemanticModel.GetSymbolInfo(memberAccess).Symbol as IMethodSymbol;
    if (methodSymbol == null ||
        !methodSymbol.ToString().StartsWith("System.Text.RegularExpressions.Regex.Match")) return;

    // The pattern is the second argument, so at least two arguments are required.
    var arguments = invocationExpression.ArgumentList;
    if ((arguments?.Arguments.Count ?? 0) < 2) return;

    // Only literal patterns can be checked at compile time.
    var patternExpression = arguments.Arguments[1].Expression as LiteralExpressionSyntax;
    if (patternExpression == null) return;

    // Get the constant value of the pattern and make sure it is a string.
    var patternValue = context.SemanticModel.GetConstantValue(patternExpression);
    if (!patternValue.HasValue) return;
    var regexPattern = patternValue.Value as string;
    if (regexPattern == null) return;

    // Parse the pattern by running it against an empty input.
    try
    {
        System.Text.RegularExpressions.Regex.Match("", regexPattern);
    }
    catch (ArgumentException exception)
    {
        // Report the parser's message at the location of the pattern literal.
        var diagnostic = Diagnostic.Create(Rule, patternExpression.GetLocation(), exception.Message);
        context.ReportDiagnostic(diagnostic);
    }
}

This implementation narrows each invocation step by step: it checks the method name, confirms the bound symbol, requires at least two arguments, and accepts only a constant string literal as the pattern. It then parses the pattern and, if parsing fails, reports a diagnostic that carries the parser's error message.
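The final validation step does not depend on Roslyn, so it can be tried on its own. The following is a minimal sketch that uses only the .NET base class library; the names RegexPatternCheck and GetPatternError are hypothetical, and it constructs a Regex instead of calling Regex.Match, which is assumed here to surface the same parse errors.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for experimenting with pattern validation.
public static class RegexPatternCheck
{
    // Returns null when the pattern parses, otherwise the parser's error message.
    public static string GetPatternError(string pattern)
    {
        try
        {
            // Constructing a Regex parses the pattern without running a match.
            new Regex(pattern);
            return null;
        }
        catch (ArgumentException exception)
        {
            // Invalid patterns raise ArgumentException or a type derived from it.
            return exception.Message;
        }
    }
}
```

For example, RegexPatternCheck.GetPatternError(@"\d+") returns null, while RegexPatternCheck.GetPatternError(@"\pXXX") returns a non-null message describing the malformed escape.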
Reference: https://docs.microsoft.com/en-us/dotnet/roslyn/