Comprehensive Guide to Using Regular Expressions in Python

Overview

Python's re module provides powerful tools for pattern matching within strings.

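The basic call is re.match, which tries to match the pattern at the beginning of the string and returns a match object, or None if there is no match: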

import re
result = re.match(pattern, string)

When writing patterns, prefix the string literal with r to make it a raw string. This stops Python from interpreting backslashes as escape sequences before the regex engine sees them.
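As a small illustration of why the r prefix matters, the sketch below compares a normal and a raw string literal containing \b. The variable names are only for illustration.

```python
import re

# In a normal string literal, "\b" is a single backspace character.
normal = "\bcat\b"
# In a raw string literal, r"\b" stays as backslash + "b",
# which the regex engine reads as a word boundary.
raw = r"\bcat\b"

print(len(normal))  # 5
print(len(raw))     # 7

text = "cat naps"
print(re.match(raw, text) is not None)     # True
print(re.match(normal, text) is not None)  # False
```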

Matching Single Characters

Character  Description
.          Matches any single character except newline (\n)
[...]      Matches any one of the characters listed inside the brackets
\d         Matches any digit (0-9)
\D         Matches any non-digit character
\s         Matches any whitespace character (space, tab, newline, etc.)
\S         Matches any non-whitespace character
\w         Matches any word character (letters, digits, underscore)
\W         Matches any non-word character

Examples

import re

# Match single character
match_result = re.match(r"This is a \d", "This is a 5")
print(match_result.group())  # Output: This is a 5

match_result = re.match(r"This is a [1-367]", "This is a 3")
print(match_result.group())  # Output: This is a 3

match_result = re.match(r"This is a \w", "This is a _")
print(match_result.group())  # Output: This is a _
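The examples above all succeed, so calling .group() is safe. When the pattern does not match at the start of the string, re.match returns None, and .group() would raise an AttributeError. A minimal sketch of a guard, using a hypothetical helper named first_match:

```python
import re

def first_match(pattern, text):
    """Return the matched text, or None when re.match finds nothing."""
    m = re.match(pattern, text)
    return m.group() if m else None

print(first_match(r"\d", "5 apples"))  # 5
print(first_match(r"\d", "apples 5"))  # None (the digit is not at the start)
print(first_match(r"\D", "apples"))    # a
print(first_match(r"\W", "!x"))        # !
print(repr(first_match(r"\s", "\tindented")))  # '\t'
```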

Matching Multiple Characters

Character  Description
*          Zero or more occurrences of the preceding element
+          One or more occurrences of the preceding element
?          Zero or one occurrence of the preceding element
{m}        Exactly m occurrences of the preceding element
{m,n}      Between m and n occurrences (inclusive) of the preceding element

Examples

import re

# Using quantifiers
match_result = re.match(r"This is a \d{1,3}", "This is a 123")
print(match_result.group())  # Output: This is a 123

match_result = re.match(r"\d{11}", "12345678901")
print(match_result.group())  # Output: 12345678901

# Optional character: ? makes the hyphen optional
match_result = re.match(r"021-?\d{8}", "02112345678")
print(match_result.group())  # Output: 02112345678

# Matching across lines: re.S (re.DOTALL) lets . match newlines too
content = '''line1
line2
line3'''
match_result = re.match(r".*", content, re.S)
print(repr(match_result.group()))  # Output: 'line1\nline2\nline3'
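Quantifiers are greedy by default: they consume as much text as they can. Adding ? after a quantifier makes it non-greedy, so it stops at the first point where the rest of the pattern can match. A short sketch with made-up sample strings:

```python
import re

html = "<b>one</b><b>two</b>"

# Greedy: .* runs to the last </b> in the string.
greedy = re.match(r"<b>.*</b>", html)
# Non-greedy: .*? stops at the first </b>.
lazy = re.match(r"<b>.*?</b>", html)

print(greedy.group())  # <b>one</b><b>two</b>
print(lazy.group())    # <b>one</b>

# The same applies to counted quantifiers.
print(re.match(r"\d{2,4}", "12345").group())   # 1234
print(re.match(r"\d{2,4}?", "12345").group())  # 12
```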

Matching Start and End

Character  Description
^          Matches the start of the string
$          Matches the end of the string (or just before a trailing newline)
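Because re.match already starts at the beginning of the string, the $ anchor is what usually changes the result. The sketch below shows this, and also uses re.fullmatch, which requires the whole string to match. The sample strings are made up for illustration.

```python
import re

# Without $, trailing characters are simply ignored.
print(re.match(r"\d{3}", "123abc") is not None)   # True
# With $, the match must reach the end of the string.
print(re.match(r"\d{3}$", "123abc") is not None)  # False
print(re.match(r"\d{3}$", "123") is not None)     # True

# $ also matches just before a trailing newline; re.fullmatch does not allow that.
print(re.match(r"\d{3}$", "123\n") is not None)     # True
print(re.fullmatch(r"\d{3}", "123\n") is not None)  # False
```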

Example: Validating Variable Names

import re

def validate_names():
    names = ["name1", "_name", "02nmae", "__name__", "name!", "name@#"]
    for name in names:
        match_result = re.match(r"^[a-zA-Z_][a-zA-Z0-9_]*$", name)
        if match_result:
            print(f"Valid variable: {name}")
        else:
            print(f"Invalid variable: {name}")

validate_names()
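If the result is needed as a value rather than printed output, one possible variation is a function that returns a boolean. This is only a sketch: NAME_PATTERN and is_valid_name are hypothetical names, and re.fullmatch is used in place of the explicit ^ and $ anchors.

```python
import re

# Compile once so the pattern can be reused across calls.
NAME_PATTERN = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")

def is_valid_name(name):
    """Return True if the whole string looks like an ASCII variable name."""
    return NAME_PATTERN.fullmatch(name) is not None

names = ["name1", "_name", "02nmae", "__name__", "name!", "name@#"]
valid = [n for n in names if is_valid_name(n)]
print(valid)  # ['name1', '_name', '__name__']
```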

Grouping Patterns

Character      Description
|              Matches either the expression on its left or the one on its right
(ab)           Groups the enclosed pattern
\num           Refers to the text matched by group number num
(?P<name>...)  Creates a group named name
(?P=name)      Matches the same text as the group named name

Examples

import re

# Basic grouping
match_result = re.match(r"([a-zA-Z0-9_]{4,20})@(sohu|qq)\.com", "hello@qq.com")
print(match_result.group(1))  # Output: hello
print(match_result.group(2))  # Output: qq

# Backreference
html = "<h1>hello world</h1>"
match_result = re.match(r"<(\w*)>.*</\1>", html)
print(match_result.group())  # Output: <h1>hello world</h1>

# Named groups
html = "<body><h1>hello world</h1></body>"
match_result = re.match(r"<(?P<tag1>\w*)><(?P<tag2>\w*)>.*</(?P=tag2)></(?P=tag1)>", html)
print(match_result.group())  # Output: <body><h1>hello world</h1></body>
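Named groups can also be read back by name. The sketch below reuses the email pattern from the basic grouping example, with the group names user and domain chosen here for illustration.

```python
import re

m = re.match(
    r"(?P<user>[a-zA-Z0-9_]{4,20})@(?P<domain>sohu|qq)\.com",
    "hello@qq.com",
)

print(m.group("user"))    # hello
print(m.group("domain"))  # qq
print(m.groups())         # ('hello', 'qq')
print(m.groupdict())      # {'user': 'hello', 'domain': 'qq'}
```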

Escaping Special Characters

To match special regex characters such as ., ?, or + literally, put a backslash (\) before them.
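A brief sketch of the difference escaping makes. It also uses re.escape, which adds the backslashes automatically when the literal text comes from a variable. The sample string is made up.

```python
import re

literal = "1+1=2?"

# Escaped by hand: + and ? are treated as ordinary characters.
print(re.match(r"1\+1=2\?", literal) is not None)  # True
# Unescaped: + and ? act as quantifiers, so the literal text does not match.
print(re.match(r"1+1=2?", literal) is not None)    # False

# re.escape builds an escaped pattern from arbitrary text.
escaped = re.escape(literal)
print(re.match(escaped, literal) is not None)      # True
```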

Example: QQ Email Validation

import re

def validate_emails():
    addresses = ["01552ahsgfhuag@qq.com", "5502@qq.com", "dss15@qq.com", "15sajhaj", "15613@qq.cn"]
    for addr in addresses:
        match_result = re.match(r"^[0-9a-zA-Z]{4,20}@qq\.com$", addr)
        if match_result:
            print(f"Valid email: {addr}")
        else:
            print(f"Invalid email: {addr}")

validate_emails()


Posted on Thu, 07 May 2026 05:15:27 +0000 by Sealr0x