Implementing Diagnostic Analyzers with Roslyn for Regular Expression Validation

Using Syntax Trees for Code Analysis

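When developing code analysis rules, such as one that detects invalid regular expressions, you need to understand syntax trees. The Roslyn APIs (the .NET Compiler Platform) expose source code both as syntax trees, which capture its structure, and as semantic models, which answer questions such as which method a call binds to.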

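To visualize syntax trees, use the Syntax Visualizer in Visual Studio (View > Other Windows > Syntax Visualizer), which is installed with the .NET Compiler Platform SDK. It displays the tree for the file in the active editor and highlights the element that corresponds to the current selection, which makes it easy to see which syntax kinds an analyzer has to handle.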

Consider the following example code, which passes an invalid pattern to Regex.Match:

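using System.Text.RegularExpressions;

namespace SampleApp
{
    class Program
    {
        static void Main(string[] args)
        {
            // Invalid pattern: \p must be followed by a Unicode category in braces, e.g. \p{L}
            Regex.Match("sample text", @"\pXXX");
        }
    }
}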
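For comparison, the .NET regex parser rejects this pattern at run time. The sketch below uses only the base class library; `PatternCheck` and `GetPatternError` are hypothetical names. It constructs a Regex, which parses the pattern without matching any input, and returns the parser's error message, which is the same check an analyzer can perform at compile time.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternCheck
{
    // Returns null if the pattern parses, otherwise the parser's error message.
    public static string? GetPatternError(string pattern)
    {
        try
        {
            // Constructing a Regex parses the pattern; no matching takes place
            _ = new Regex(pattern);
            return null;
        }
        catch (ArgumentException ex)
        {
            // Invalid patterns throw ArgumentException (on .NET 7 and later,
            // the more specific RegexParseException, which derives from it)
            return ex.Message;
        }
    }

    public static void Main()
    {
        Console.WriteLine(GetPatternError(@"\pXXX") ?? "valid");
        Console.WriteLine(GetPatternError(@"\p{L}+") ?? "valid");
    }
}
```

The first call reports an error because `\p` is not followed by `{Name}`; the second pattern, which matches one or more letters, parses successfully.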

When you examine this code with the Syntax Visualizer, you'll see that the tree is color-coded into three kinds of elements:

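  1. Blue items are syntax nodes: non-terminal constructs such as declarations, statements, and expressions (for example, the InvocationExpression for the Regex.Match call)
  2. Green items are syntax tokens: the terminal lexical elements the compiler recognizes, such as keywords, identifiers, literals, and punctuation
  3. Red items are syntax trivia: text that is not significant to compilation, such as whitespace, comments, and preprocessor directives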
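The same three categories can be inspected programmatically. The following sketch assumes a console project that references the Microsoft.CodeAnalysis.CSharp NuGet package; `TreeInspector` and `Count` are illustrative names, not part of Roslyn. It parses source text with `CSharpSyntaxTree.ParseText` and counts the nodes, tokens, and trivia in the resulting tree.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

public static class TreeInspector
{
    // Parses C# source and counts the three element categories that the
    // Syntax Visualizer colors blue (nodes), green (tokens), and red (trivia).
    public static (int Nodes, int Tokens, int Trivia) Count(string source)
    {
        SyntaxNode root = CSharpSyntaxTree.ParseText(source).GetRoot();
        int nodes = root.DescendantNodesAndSelf().Count();
        int tokens = root.DescendantTokens().Count();
        int trivia = root.DescendantTrivia().Count();
        return (nodes, tokens, trivia);
    }

    public static void Main()
    {
        // One-line version of the sample; parsing is purely syntactic,
        // so Regex does not need to resolve to a real type here
        const string sample =
            @"class Program { void Main() { Regex.Match(""sample text"", @""\pXXX""); } }";

        var (nodes, tokens, trivia) = Count(sample);
        Console.WriteLine($"nodes={nodes}, tokens={tokens}, trivia={trivia}");

        // Print each token with its SyntaxKind, e.g. IdentifierToken or StringLiteralToken
        SyntaxNode root = CSharpSyntaxTree.ParseText(sample).GetRoot();
        foreach (SyntaxToken token in root.DescendantTokens())
        {
            Console.WriteLine($"{token.Kind()}: '{token.Text}'");
        }
    }
}
```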

Creating a DiagnosticAnalyzer

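To implement a custom rule, create a class that inherits from DiagnosticAnalyzer and mark it with the [DiagnosticAnalyzer] attribute so that the compiler can discover it. The analyzer declares the diagnostics it can report through SupportedDiagnostics and registers its callbacks in Initialize. The following example registers a callback for method invocations: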

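using System;
using System.Collections.Immutable;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Diagnostics;

[DiagnosticAnalyzer(LanguageNames.CSharp)]
public class RegexValidationAnalyzer : DiagnosticAnalyzer
{
    // Placeholder ID, title, and category; {0} receives the parser's error message
    private static readonly DiagnosticDescriptor Rule = new DiagnosticDescriptor(
        "REGEX001", "Invalid regular expression", "Invalid regular expression: {0}",
        "Usage", DiagnosticSeverity.Error, isEnabledByDefault: true);
    public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics
        => ImmutableArray.Create(Rule);

    public override void Initialize(AnalysisContext context)
    {
        context.ConfigureGeneratedCodeAnalysis(GeneratedCodeAnalysisFlags.None);
        context.EnableConcurrentExecution();
        // AnalyzeMethodInvocations is a member of this class, shown in the next section
        context.RegisterSyntaxNodeAction(
          AnalyzeMethodInvocations, SyntaxKind.InvocationExpression);
    }
}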

The AnalysisContext provides several registration methods for different analysis scenarios:

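  • RegisterSyntaxNodeAction - runs for each syntax node of the specified SyntaxKind values
  • RegisterSymbolAction - runs for each declared symbol of the specified SymbolKind values, such as named types, methods, or fields
  • RegisterSyntaxTreeAction - runs once for each syntax tree (source file) after it has been parsed
  • RegisterSemanticModelAction - runs once for each file after semantic analysis of that file is complete
  • RegisterCodeBlockStartAction - runs at the start of analysis of an executable code block, such as a method body; the start context can register nested actions, including a CodeBlockEndAction that runs when the block has been analyzed
  • RegisterCompilationStartAction - runs once at the start of analysis of a compilation (roughly, a project); the start context can register nested actions, including a CompilationEndAction that runs after the whole compilation has been analyzed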
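As an illustration of a start action, the sketch below uses RegisterCompilationStartAction to look up the Regex type once per compilation and then registers a nested syntax node action that only runs within that compilation. It assumes the Microsoft.CodeAnalysis.CSharp package; the `RegexCallAnalyzer` name and the `REGEXDEMO01` diagnostic are hypothetical, and reporting every Regex call is only a demonstration, not a useful rule.

```csharp
using System.Collections.Immutable;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Diagnostics;

// Hypothetical analyzer that reports every call to a Regex method, written only
// to show a compilation start action that registers nested per-node actions.
[DiagnosticAnalyzer(LanguageNames.CSharp)]
public sealed class RegexCallAnalyzer : DiagnosticAnalyzer
{
    // Placeholder informational rule; the ID "REGEXDEMO01" is made up for this sketch
    private static readonly DiagnosticDescriptor Rule = new DiagnosticDescriptor(
        "REGEXDEMO01", "Regex call", "Call to Regex.{0}", "Usage",
        DiagnosticSeverity.Info, isEnabledByDefault: true);

    public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics
        => ImmutableArray.Create(Rule);

    public override void Initialize(AnalysisContext context)
    {
        context.ConfigureGeneratedCodeAnalysis(GeneratedCodeAnalysisFlags.None);
        context.EnableConcurrentExecution();

        // Runs once per compilation, before any of the nested actions below
        context.RegisterCompilationStartAction(startContext =>
        {
            // Resolve the Regex type once for the whole compilation
            var regexType = startContext.Compilation
                .GetTypeByMetadataName("System.Text.RegularExpressions.Regex");
            if (regexType is null)
            {
                return; // The project doesn't reference Regex, so register nothing
            }

            // Nested action: runs for every invocation in this compilation
            startContext.RegisterSyntaxNodeAction(nodeContext =>
            {
                var invocation = (InvocationExpressionSyntax)nodeContext.Node;
                var symbol = nodeContext.SemanticModel
                    .GetSymbolInfo(invocation, nodeContext.CancellationToken).Symbol;
                if (symbol is IMethodSymbol method &&
                    SymbolEqualityComparer.Default.Equals(method.ContainingType, regexType))
                {
                    nodeContext.ReportDiagnostic(
                        Diagnostic.Create(Rule, invocation.GetLocation(), method.Name));
                }
            }, SyntaxKind.InvocationExpression);
        });
    }
}
```

Comparing symbols with SymbolEqualityComparer.Default avoids formatting a display string for every invocation, and projects that do not reference Regex skip the per-node work entirely.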

Implementing the Analysis Logic

The following method, a member of RegexValidationAnalyzer, inspects each invocation and reports a diagnostic when a call to Regex.Match receives a string literal pattern that fails to parse:

/// <summary>
/// Analyzes method invocations to detect invalid regular expressions
/// passed to the static Regex.Match(input, pattern, ...) overloads
/// </summary>
/// <param name="context">The syntax node analysis context</param>
private static void AnalyzeMethodInvocations(SyntaxNodeAnalysisContext context)
{
    // The action is registered only for SyntaxKind.InvocationExpression, so this cast is safe
    var invocationExpression = (InvocationExpressionSyntax)context.Node;
    
    // Cheap syntactic filter first: the call must have the form X.Match(...)
    var memberAccess = invocationExpression.Expression as MemberAccessExpressionSyntax;
    if (memberAccess?.Name.Identifier.ValueText != "Match") return;
    
    // Semantic check: the call must bind to the static Regex.Match method
    var methodSymbol = context.SemanticModel
        .GetSymbolInfo(invocationExpression, context.CancellationToken).Symbol as IMethodSymbol;
    if (methodSymbol == null || !methodSymbol.IsStatic ||
        methodSymbol.ContainingType.ToDisplayString() != "System.Text.RegularExpressions.Regex")
        return;
    
    // The static overloads take (input, pattern, ...), so at least two arguments are required
    var arguments = invocationExpression.ArgumentList.Arguments;
    if (arguments.Count < 2) return;
    
    // Only string literals are checked; patterns built at run time are skipped
    var patternExpression = arguments[1].Expression as LiteralExpressionSyntax;
    if (patternExpression == null) return;
    
    // Get the constant value of the pattern
    var patternValue = context.SemanticModel.GetConstantValue(patternExpression, context.CancellationToken);
    if (!patternValue.HasValue) return;
    
    var regexPattern = patternValue.Value as string;
    if (regexPattern == null) return;
    
    // Constructing a Regex parses the pattern without matching any input
    try
    {
        _ = new System.Text.RegularExpressions.Regex(regexPattern);
    }
    catch (ArgumentException exception)
    {
        // Invalid patterns throw ArgumentException; report it at the pattern's location
        var diagnostic = Diagnostic.Create(Rule, patternExpression.GetLocation(), exception.Message);
        context.ReportDiagnostic(diagnostic);
    }
}

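This implementation combines a cheap syntactic filter (the method name) with a semantic check (the method symbol the call binds to), then attempts to parse the pattern with the Regex API and reports a diagnostic on the pattern literal when parsing fails. It only examines string literals passed to Regex.Match; other APIs such as Regex.IsMatch or the Regex constructor, and patterns held in variables, are not checked.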
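To try an analyzer without packaging it, you can run it against an in-memory compilation. The following sketch assumes a console or test project that references Microsoft.CodeAnalysis.CSharp; `AnalyzerHost` and its methods are hypothetical helpers. It references every assembly listed in the running runtime's TRUSTED_PLATFORM_ASSEMBLIES so that `Regex` resolves, then calls `WithAnalyzers` to obtain a CompilationWithAnalyzers.

```csharp
using System;
using System.Collections.Immutable;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.Diagnostics;

public static class AnalyzerHost
{
    // Builds an in-memory library compilation that references every assembly of the
    // running .NET runtime, so framework types such as Regex can be bound.
    public static CSharpCompilation CreateCompilation(string source)
    {
        var references = ((string)AppContext.GetData("TRUSTED_PLATFORM_ASSEMBLIES")!)
            .Split(Path.PathSeparator)
            .Select(path => MetadataReference.CreateFromFile(path));

        return CSharpCompilation.Create(
            "AnalyzerHostDemo",
            new[] { CSharpSyntaxTree.ParseText(source) },
            references,
            new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));
    }

    // Runs a single analyzer over the source and returns only that analyzer's diagnostics.
    public static Task<ImmutableArray<Diagnostic>> GetAnalyzerDiagnosticsAsync(
        string source, DiagnosticAnalyzer analyzer)
    {
        return CreateCompilation(source)
            .WithAnalyzers(ImmutableArray.Create(analyzer))
            .GetAnalyzerDiagnosticsAsync();
    }
}
```

For example, awaiting `AnalyzerHost.GetAnalyzerDiagnosticsAsync(source, new RegexValidationAnalyzer())`, where `source` is the SampleApp code (including `using System.Text.RegularExpressions;` so that `Regex` binds), would be expected to return one diagnostic located on the `@"\pXXX"` literal. For full test suites, the Microsoft.CodeAnalysis.Testing libraries (for example Microsoft.CodeAnalysis.CSharp.Analyzer.Testing) provide a more complete harness.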

Reference: https://docs.microsoft.com/en-us/dotnet/roslyn/

Tags: Roslyn DiagnosticAnalyzer Code Analysis regular expressions Syntax Trees

Posted on Fri, 15 May 2026 02:09:05 +0000 by kaushikgotecha