Overview
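Python's re module provides powerful tools for pattern matching within strings.
re.match tries to match a pattern at the beginning of a string and returns a match object, or None if there is no match. The basic syntax is: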
import re
result = re.match(pattern, string)
When writing regular expressions, prefix the pattern with r (for example r"\d+") to make it a raw string literal. Otherwise Python interprets backslashes as string escape sequences before the regex engine ever sees them.
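To make the effect of the r prefix concrete, here is a minimal sketch comparing a normal and a raw string literal. The variable names (normal, raw, normal_hit, raw_hit) are illustrative only and do not come from the original text.

```python
import re

# In a normal string literal, "\b" is the backspace character (one character).
normal = "\bcat\b"
# In a raw string literal, the backslash is preserved, so the regex engine
# receives \b, which it treats as a word boundary.
raw = r"\bcat\b"

print(len(normal))  # 5
print(len(raw))     # 7

# The normal string looks for literal backspace characters, so nothing matches.
normal_hit = re.match(normal, "cat sat")
# The raw string matches the word "cat" at the start of the input.
raw_hit = re.match(raw, "cat sat")

print(normal_hit)       # None
print(raw_hit.group())  # cat
```

Note that re.match returned None in the first case; calling .group() on None raises an AttributeError, so it is worth checking the result before using it.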
Matching Single Characters
| Character | Description |
|---|---|
| `.` | Matches any single character except newline (`\n`) |
| `[...]` | Matches any one of the characters listed inside the brackets |
| `\d` | Matches any digit (0-9) |
| `\D` | Matches any non-digit character |
| `\s` | Matches any whitespace character (space, tab, newline, etc.) |
| `\S` | Matches any non-whitespace character |
| `\w` | Matches any word character (letters, digits, underscore) |
| `\W` | Matches any non-word character |
Examples
import re
# Match single character
match_result = re.match(r"This is a \d", "This is a 5")
print(match_result.group()) # Output: This is a 5
match_result = re.match(r"This is a [1-367]", "This is a 3")
print(match_result.group()) # Output: This is a 3
match_result = re.match(r"This is a \w", "This is a _")
print(match_result.group()) # Output: This is a _
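The negated classes (\D, \S, \W) are not exercised in the examples above. The following sketch checks a few sample characters against each shorthand class; matching_classes is a hypothetical helper written for this illustration, not part of the re module.

```python
import re

# The six shorthand classes from the table above.
classes = [r"\d", r"\D", r"\s", r"\S", r"\w", r"\W"]

def matching_classes(char):
    """Return the shorthand classes that match the given single character."""
    return [cls for cls in classes if re.match(cls, char)]

for char in ["7", "x", " ", "_", "!"]:
    print(repr(char), matching_classes(char))

# The dot matches any character except a newline, so this yields None.
dot_on_newline = re.match(r".", "\n")
print(dot_on_newline)  # None
```

Each character matches exactly one class from every lowercase/uppercase pair, because the uppercase form is the complement of the lowercase one.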
Matching Multiple Characters
| Character | Description |
|---|---|
| `*` | Zero or more occurrences of the preceding element |
| `+` | One or more occurrences of the preceding element |
| `?` | Zero or one occurrence of the preceding element |
| `{m}` | Exactly m occurrences of the preceding element |
| `{m,n}` | Between m and n occurrences (inclusive) of the preceding element |
Examples
import re
# Using quantifiers
match_result = re.match(r"This is a \d{1,3}", "This is a 123")
print(match_result.group()) # Output: This is a 123
match_result = re.match(r"\d{11}", "12345678901")
print(match_result.group()) # Output: 12345678901
# Optional character: -? allows the hyphen to appear zero or one time
match_result = re.match(r"021-?\d{8}", "02112345678")
print(match_result.group()) # Output: 02112345678
# Matching across lines: re.S (re.DOTALL) makes . match newlines as well
content = '''line1
line2
line3'''
match_result = re.match(r".*", content, re.S)
print(match_result.group()) # Prints line1, line2 and line3 on separate lines
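Quantifiers are greedy by default: they consume as much text as possible. Appending ? to a quantifier makes it non-greedy (lazy), so it consumes as little as possible. A small sketch of the difference, using a made-up sample string:

```python
import re

# Illustrative input with two adjacent tags.
html = "<b>one</b><b>two</b>"

# Greedy: .* runs to the last </b> in the string.
greedy = re.match(r"<b>.*</b>", html)
# Non-greedy: .*? stops at the first </b> it can.
lazy = re.match(r"<b>.*?</b>", html)

print(greedy.group())  # <b>one</b><b>two</b>
print(lazy.group())    # <b>one</b>

# The same applies to counted quantifiers: {1,3}? takes the minimum it can.
greedy_digits = re.match(r"\d{1,3}", "123")
lazy_digits = re.match(r"\d{1,3}?", "123")

print(greedy_digits.group())  # 123
print(lazy_digits.group())    # 1
```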
Matching Start and End
| Character | Description |
|---|---|
| `^` | Matches the start of the string |
| `$` | Matches the end of the string (or just before a trailing newline at the end) |
Example: Validating Variable Names
import re
def validate_names():
    names = ["name1", "_name", "02nmae", "__name__", "name!", "name@#"]
    for name in names:
        # First character: letter or underscore; the rest: letters, digits, underscores
        match_result = re.match(r"^[a-zA-Z_][a-zA-Z0-9_]*$", name)
        if match_result:
            print(f"Valid variable: {name}")
        else:
            print(f"Invalid variable: {name}")

validate_names()
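One subtlety when validating with $: it also matches just before a newline at the very end of the string, so an input such as "name1\n" passes the check. If that matters, re.fullmatch is one possible alternative, since it requires the pattern to cover the entire string. The names NAME_PATTERN and is_valid_name in this sketch are hypothetical.

```python
import re

# Same character rules as the pattern above, without the ^ and $ anchors.
NAME_PATTERN = r"[a-zA-Z_][a-zA-Z0-9_]*"

def is_valid_name(name):
    """Return True only if the whole string matches NAME_PATTERN."""
    return re.fullmatch(NAME_PATTERN, name) is not None

# With ^...$ a trailing newline slips through.
anchored = re.match(r"^[a-zA-Z_][a-zA-Z0-9_]*$", "name1\n")
print(anchored is not None)      # True

# re.fullmatch rejects it because \n is not part of the pattern.
print(is_valid_name("name1\n"))  # False
print(is_valid_name("name1"))    # True
print(is_valid_name("02nmae"))   # False
```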
Grouping Patterns
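| Character | Description |
|---|---|
| `\|` | Matches either the expression on its left or the one on its right |
| `(ab)` | Groups the enclosed pattern |
| `\num` | Refers to the content matched by group number `num` |
| `(?P<name>...)` | Creates a group named `name` |
| `(?P=name)` | Matches the same text that the group named `name` matched |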
Examples
import re
# Basic grouping
match_result = re.match(r"([a-zA-Z0-9_]{4,20})@(sohu|qq)\.com", "hello@qq.com")
print(match_result.group(1)) # Output: hello
print(match_result.group(2)) # Output: qq
# Backreference
html = "<h1>hello world</h1>"
match_result = re.match(r"<(\w*)>.*</\1>", html)
print(match_result.group()) # Output: <h1>hello world</h1>
# Named groups
html = "<body><h1>hello world</h1></body>"
match_result = re.match(r"<(?P<tag1>\w*)><(?P<tag2>\w*)>.*</(?P=tag2)></(?P=tag1)>", html)
print(match_result.group()) # Output: <body><h1>hello world</h1></body>
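Named groups can also be read back by name, which tends to be easier to follow than numeric indexes. The sketch below reuses the HTML example and adds a third named group, text, purely for illustration (it is not in the original pattern).

```python
import re

html = "<body><h1>hello world</h1></body>"
# "text" is an extra named group added here to capture the inner content.
pattern = r"<(?P<tag1>\w*)><(?P<tag2>\w*)>(?P<text>.*)</(?P=tag2)></(?P=tag1)>"

m = re.match(pattern, html)
print(m.group("tag1"))  # body
print(m.group("tag2"))  # h1
print(m.group("text"))  # hello world
print(m.groupdict())    # {'tag1': 'body', 'tag2': 'h1', 'text': 'hello world'}

# If a closing tag does not repeat its opening tag, the backreference fails.
mismatched = re.match(pattern, "<body><h1>hello world</h2></body>")
print(mismatched)  # None
```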
Escaping Special Characters
To match special regex characters such as ., ?, or + literally, put a backslash (\) before them, for example \. or \?.
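A short sketch of why escaping matters, using made-up sample strings. It also shows re.escape, a standard-library helper that adds the backslashes for you when the literal text is not known in advance.

```python
import re

# Unescaped: . matches any character, so "1x5" is accepted.
unescaped_hit = re.match(r"1.5$", "1x5")
print(unescaped_hit is not None)  # True

# Escaped: \. matches only a literal dot.
escaped_miss = re.match(r"1\.5$", "1x5")
escaped_hit = re.match(r"1\.5$", "1.5")
print(escaped_miss is not None)   # False
print(escaped_hit is not None)    # True

# re.escape backslash-escapes the special characters in a literal string.
literal = "a+b?"
escaped = re.escape(literal)
print(escaped)  # a\+b\?

literal_hit = re.match(escaped + "$", "a+b?")
print(literal_hit is not None)    # True
```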
Example: QQ Email Validation
import re
def validate_emails():
    addresses = ["01552ahsgfhuag@qq.com", "5502@qq.com", "dss15@qq.com", "15sajhaj", "15613@qq.cn"]
    for addr in addresses:
        # 4 to 20 letters or digits, followed by a literal "@qq.com"
        match_result = re.match(r"^[0-9a-zA-Z]{4,20}@qq\.com$", addr)
        if match_result:
            print(f"Valid email: {addr}")
        else:
            print(f"Invalid email: {addr}")

validate_emails()