Understanding nom::sequence::preceded in Rust

nom::sequence::preceded(first, second) runs the first parser, discards its output if it succeeds, and then runs the second parser on the remaining input, returning only the second parser's result. If either parser fails, the whole preceded parser fails.
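
The mechanics can be modelled without nom. The following is a toy sketch, not nom's actual implementation: PResult, preceded_sketch, char_p and digits are hypothetical names, and errors are plain String values rather than nom's error types.

```rust
// Toy stand-in for nom's IResult: on success, the remaining input plus the
// parsed value; on failure, an error message.
type PResult<'a, O> = Result<(&'a str, O), String>;

// Hypothetical model of `preceded`: run `first`, drop its output,
// then run `second` on whatever input is left.
fn preceded_sketch<'a, O1, O2>(
    first: impl Fn(&'a str) -> PResult<'a, O1>,
    second: impl Fn(&'a str) -> PResult<'a, O2>,
) -> impl Fn(&'a str) -> PResult<'a, O2> {
    move |input: &'a str| -> PResult<'a, O2> {
        let (rest, _discarded) = first(input)?; // a failure here aborts the whole parse
        second(rest) // only the second parser's value is returned
    }
}

// Toy parser: match exactly one given character.
fn char_p<'a>(c: char) -> impl Fn(&'a str) -> PResult<'a, char> {
    move |input: &'a str| match input.chars().next() {
        Some(found) if found == c => Ok((&input[c.len_utf8()..], c)),
        _ => Err(format!("expected '{}'", c)),
    }
}

// Toy parser: match one or more ASCII digits.
fn digits(input: &str) -> PResult<'_, &str> {
    let end = input
        .find(|ch: char| !ch.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        Err("expected a digit".to_string())
    } else {
        Ok((&input[end..], &input[..end]))
    }
}
```

With these, preceded_sketch(char_p('#'), digits) applied to "#42 rest" yields Ok((" rest", "42")): the '#' is consumed but does not appear in the result, and input without the '#' prefix is rejected.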

Basic Usage Example

use nom::{
    branch::alt,
    character::complete::{digit1, char},
    combinator::opt,
    sequence::preceded,
    IResult,
};

fn parse_number(input: &str) -> IResult<&str, &str> {
    preceded(
        opt(alt((char('+'), char('-')))), // Optional + or - sign
        digit1, // At least one digit
    )(input)
}

In this example, preceded takes two parsers:

  1. The first parser: opt(alt((char('+'), char('-')))) matches an optional sign (+ or -).
  2. The second parser: digit1 matches at least one digit.

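The preceded combinator first tries to match a sign and then requires at least one digit. Because the sign parser is wrapped in opt, it succeeds (returning None) even when no sign is present, so unsigned input is accepted too. The result of the first parser (the sign) is discarded and only the result of the second parser (the digits) is returned, which means "-42" and "42" both produce "42".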

Enhanced Implementation with Sign Information

To capture sign information (whether the number is negative) while still using preceded, we need to modify our approach to handle the sign separately:

use nom::{
    branch::alt,
    character::complete::{digit1, char},
    combinator::map_res,
    sequence::preceded,
    IResult,
};

#[derive(Debug, PartialEq)]
pub enum Token {
    Integer64(i64),
    // Other token types
}

fn integer64(input: &str) -> IResult<&str, (Token, bool)> {
    // Check if the first character is '-' to determine negativity
    let is_negative = input.starts_with('-');

    // Convert a run of digits into a token; map_res turns an out-of-range
    // value into a parse error instead of panicking like unwrap() would
    let to_token = |s: &str| s.parse::<i64>().map(Token::Integer64);

    // Parse numbers with an optional sign; the token holds the unsigned magnitude.
    // alt returns a closure that must be bound as `mut` to be called.
    let mut parse_number = alt((
        map_res(preceded(char('-'), digit1), to_token),
        map_res(preceded(char('+'), digit1), to_token),
        map_res(digit1, to_token),
    ));

    parse_number(input).map(|(next_input, token)| (next_input, (token, is_negative)))
}

Explanation of the Enhanced Implementation

This modified approach handles sign information by:

  1. Checking the first character to determine if it's a negative sign.
  2. Using alt to handle three cases:
    • Negative numbers: preceded(char('-'), digit1)
    • Positive numbers: preceded(char('+'), digit1)
    • Numbers without sign: digit1
  3. Returning both the token, which holds the unsigned value of the digits (so "-123" becomes Integer64(123)), and a boolean indicating whether a '-' sign was present.

Testing the Implementation

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_integer64() {
        let test_cases = vec![
            ("123", (Token::Integer64(123), false)),
            ("-123", (Token::Integer64(123), true)),
            ("+123", (Token::Integer64(123), false)),
            ("0", (Token::Integer64(0), false)),
        ];

        for (input, expected) in test_cases {
            let result = integer64(input);
            assert!(result.is_ok());
            let (_, (token, is_negative)) = result.unwrap();
            assert_eq!(token, expected.0);
            assert_eq!(is_negative, expected.1);
        }
    }

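    #[test]
    fn test_invalid_integer64() {
        // all_consuming turns leftover input into an error; on its own,
        // integer64("123abc") succeeds and leaves "abc" unparsed
        use nom::combinator::all_consuming;

        let invalid_inputs = vec![
            "123abc",  // Trailing non-digit characters
            "-abc",    // Sign without digits
            "+",       // No digits
        ];

        for input in invalid_inputs {
            let result = all_consuming(integer64)(input);
            assert!(result.is_err());
        }
    }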
}

Key Takeaways

  • preceded is useful when you need to match a prefix but don't need its value.
  • When you need to preserve both prefix and value information, consider using other combinators like pair or handling the prefix separately.
  • alt combined with preceded lets a single parser handle several prefix patterns, trying each branch in order.

Tags: rust nom parser-combinators sequence-combinators

Posted on Fri, 22 May 2026 16:20:31 +0000 by purtip3154