Regular expressions (regex) are patterns used to match character combinations in text, consisting of literal characters and metacharacters with special meanings. They enable efficient searching, extraction, and manipulation of text based on defined rules.
In Linux environments, regex is integral to command-line utilities like grep, sed, and awk, as well as scripting languages such as Python and Perl. Below are common regex constructs with examples.
Character Classes: Enclose characters in brackets [] to match any one of them. For instance, [aeiou] matches a single vowel.
grep '[aeiou]' document.txt # Lines containing at least one lowercase vowel
Wildcards: The dot . matches any single character, while * matches zero or more occurrences of the preceding element.
grep 'a.ple' document.txt # Matches 'apple', 'ample', etc.
grep 'app*le' document.txt # Matches 'aple', 'apple', 'appple', etc.
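To see the difference between . and * without creating a file, the patterns can be run against a few words piped in from printf. This is a minimal sketch assuming a POSIX shell and grep; the word list and the variable names are made up for illustration.

```shell
# Made-up sample words, fed through a pipe instead of document.txt.
words=$(printf '%s\n' aple apple appple ample)

# '.' requires exactly one character between 'a' and 'ple'.
dot_hits=$(printf '%s\n' "$words" | grep 'a.ple')

# 'p*' allows zero or more extra 'p' characters after 'ap'.
star_hits=$(printf '%s\n' "$words" | grep 'app*le')

echo "dot:" $dot_hits    # dot: apple ample
echo "star:" $star_hits  # star: aple apple appple
```

Note that 'aple' fails the dot pattern because . must consume a character, while 'ample' fails the star pattern because the 'a' is not followed by 'p'.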
Ranges: Use a hyphen - within brackets to define a character range, like [0-9] for digits.
grep '[0-9]' data.log # Finds lines containing digits
grep -E '2023-09-07 (09:[4-5][0-9]|10:[0-5][0-9])' logs.txt # Matches timestamps from 09:40 to 10:59 on that date
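The boundaries of the timestamp pattern can be checked with a handful of hypothetical log lines. The sketch below assumes a POSIX shell and grep; the log entries are invented and only the pattern comes from the example above.

```shell
# Invented log lines sitting just inside and just outside the 09:40-10:59 window.
log_hits=$(printf '%s\n' \
  '2023-09-07 09:39 start' \
  '2023-09-07 09:40 login' \
  '2023-09-07 10:59 logout' \
  '2023-09-07 11:00 stop' |
  grep -E '2023-09-07 (09:[4-5][0-9]|10:[0-5][0-9])')

# Only the 09:40 and 10:59 lines are kept.
printf '%s\n' "$log_hits"
```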
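Quantifiers: Curly braces {} specify how many times the preceding element repeats: {n} means exactly n times, {n,} at least n times, and {n,m} between n and m times. With grep, use -E (extended syntax) or write the braces as \{ \} in basic syntax.
grep -E 'a{2,4}' text.txt # Matches a run of 'aa', 'aaa', or 'aaaa'; a lone 'a' does not match, but a line with 'aaaaa' does, since it contains such a run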
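Because grep reports any line that contains a match, a bounded quantifier only limits the whole line when the match is forced to span it. The following sketch, assuming a POSIX shell and grep, compares a plain search with the -x option (match the entire line); brace quantifiers are written with -E here. The sample lines are made up.

```shell
# Made-up sample lines holding runs of one to five 'a' characters.
samples=$(printf '%s\n' a aa aaa aaaa aaaaa)

# Substring search: any line with at least two consecutive 'a's matches,
# so 'aaaaa' is included.
contains_run=$(printf '%s\n' "$samples" | grep -E 'a{2,4}')

# -x requires the pattern to match the entire line, so 'aaaaa' is excluded.
whole_line=$(printf '%s\n' "$samples" | grep -E -x 'a{2,4}')

echo "contains:" $contains_run  # contains: aa aaa aaaa aaaaa
echo "whole:" $whole_line       # whole: aa aaa aaaa
```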
Escaping Special Characters: Prefix a metacharacter with a backslash \ to match it literally.
grep '\.' file.txt # Matches a period character '.'
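The effect of the backslash is easiest to see by counting matches with and without it. This is a small sketch assuming a POSIX shell and grep; the two version-like strings are hypothetical, and -c prints the number of matching lines.

```shell
# Unescaped: '.' matches any character, so both 'v1.2' and 'v1x2' match.
unescaped=$(printf '%s\n' 'v1.2' 'v1x2' | grep -c '1.2')

# Escaped: '\.' matches only a literal period, so only 'v1.2' matches.
escaped=$(printf '%s\n' 'v1.2' 'v1x2' | grep -c '1\.2')

echo "unescaped: $unescaped"  # unescaped: 2
echo "escaped: $escaped"      # escaped: 1
```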
Anchors: The caret ^ matches the start of a line, and the dollar sign $ matches the end.
grep '^apple' fruits.txt # Lines beginning with 'apple'
grep 'apple$' fruits.txt # Lines ending with 'apple'
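The two anchors can also be combined to match a line that consists of the pattern and nothing else. A minimal sketch, assuming a POSIX shell and grep, with a made-up list standing in for fruits.txt:

```shell
# Made-up contents standing in for fruits.txt.
fruits=$(printf '%s\n' 'apple' 'apple pie' 'pineapple' 'green apple')

# Lines that begin with 'apple'.
starts=$(printf '%s\n' "$fruits" | grep '^apple')

# Lines that end with 'apple'.
ends=$(printf '%s\n' "$fruits" | grep 'apple$')

# Lines that are exactly 'apple'.
exact=$(printf '%s\n' "$fruits" | grep '^apple$')

printf '%s\n' "$exact"  # apple
```

Here starts holds 'apple' and 'apple pie', while ends holds 'apple', 'pineapple', and 'green apple'.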
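Alternation: The pipe | acts as a logical OR to match one of several patterns. In grep this requires extended syntax (-E); in basic syntax an unescaped | is treated as a literal character.
grep -E 'apple|banana' list.txt # Matches lines containing 'apple' or 'banana'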
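Whether | means OR depends on the regex flavor grep is using. The sketch below, assuming a POSIX shell and grep, runs the same pattern in basic mode and in extended mode (-E) against a made-up list; the last sample line contains a literal pipe to make the difference visible.

```shell
# Made-up list; the final entry contains a literal '|' character.
list=$(printf '%s\n' 'apple' 'banana' 'cherry' 'apple|banana')

# Basic syntax: '|' is an ordinary character, so only the literal text matches.
basic=$(printf '%s\n' "$list" | grep 'apple|banana')

# Extended syntax: '|' separates alternatives.
extended=$(printf '%s\n' "$list" | grep -E 'apple|banana')

echo "basic:" $basic        # basic: apple|banana
echo "extended:" $extended  # extended: apple banana apple|banana
```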