Basics of Regular Expressions in JavaScript

Strings are one of the most commonly used data structures in programming, and the need to manipulate them is ubiquitous. For example, you could validate an email address by extracting the substrings before and after the @ symbol and checking that they are a valid name and a valid domain, respectively. However, this approach is cumbersome and not easily reusable.

Regular expressions (regex) are a powerful tool for matching strings. They use a descriptive language to define a pattern, and any string that conforms to this pattern is considered a match; otherwise, it's deemed invalid.

To validate an email address, we can:

  1. Create a regex that matches an email format.
  2. Use this regex to check if the user's input is valid.
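
The two steps above can be sketched as follows. The pattern and the looksLikeEmail helper are illustrative assumptions, not a complete email validator: the pattern is deliberately simplified and rejects some addresses that real mail systems accept. Its syntax is explained in the sections that follow.

```javascript
// Step 1: a simplified, hypothetical email pattern:
// one or more name characters, an @, then a domain with at least one dot.
const emailPattern = /^[\w.+-]+@[\w-]+(\.[\w-]+)+$/;

// Step 2: check user input against the pattern.
function looksLikeEmail(input) {
  return emailPattern.test(input);
}

console.log(looksLikeEmail('someone@example.com')); // true
console.log(looksLikeEmail('someone@example'));     // false (no dot in the domain)
console.log(looksLikeEmail('no-at-sign.com'));      // false (no @)
```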

A regular expression is itself written as a sequence of characters, so let's first understand how to describe characters using other characters.

In regex, a literal character matches exactly that character. \d matches a digit, and \w matches a letter, digit, or underscore. Therefore:

  • '00\d' matches '007' but not '00A'.
  • '\d\d\d' matches '010'.
  • '\w\w' matches 'js'.

The . character matches any single character (by default, any character except a line terminator such as '\n'), so:

  • 'js.' matches 'jsp', 'jss', 'js!', etc.
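
The examples above can be checked directly in JavaScript. This is a minimal sketch that uses the regex literal syntax and the test() method, both of which are covered later in this article; the results object is only a container made up for the demonstration.

```javascript
// Each entry records whether the pattern is found in the given string.
const results = {
  digitOk: /00\d/.test('007'),        // true: \d matches '7'
  digitFail: /00\d/.test('00A'),      // false: 'A' is not a digit
  threeDigits: /\d\d\d/.test('010'),  // true
  twoWordChars: /\w\w/.test('js'),    // true
  dotLetter: /js./.test('jsp'),       // true
  dotSymbol: /js./.test('js!'),       // true
  dotMissing: /js./.test('js'),       // false: . must consume one character
};

console.log(results);
```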

To match variable-length strings, regex uses:

  • * to match zero or more occurrences.
  • + to match one or more occurrences.
  • ? to match zero or one occurrence.
  • {n} to match exactly n occurrences.
  • {n,m} to match between n and m occurrences.
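
To see how each quantifier behaves, the sketch below applies it to the letter o and prints the part of the string that was actually matched. The firstMatch helper is a hypothetical convenience wrapper around String.prototype.match, introduced only for this illustration.

```javascript
// Hypothetical helper: returns the matched text, or null if nothing matched.
function firstMatch(pattern, text) {
  const match = text.match(pattern);
  return match ? match[0] : null;
}

console.log(firstMatch(/xo*/, 'xy'));         // 'x'    (* accepts zero o's)
console.log(firstMatch(/xo+/, 'xy'));         // null   (+ needs at least one o)
console.log(firstMatch(/xo?/, 'xooo'));       // 'xo'   (? takes at most one o)
console.log(firstMatch(/xo{2}/, 'xooo'));     // 'xoo'  (exactly two o's)
console.log(firstMatch(/xo{2,3}/, 'xooooo')); // 'xooo' (between two and three o's)
```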

Consider a more complex example: \d{3}\s+\d{3,8}.

Breaking it down:

  1. \d{3} matches exactly 3 digits, e.g., '010'.
  2. \s matches a whitespace character (including spaces and tabs), so \s+ matches one or more whitespaces, e.g., ' ', '\t\t'.
  3. \d{3,8} matches 3 to 8 digits, e.g., '1234567'.

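Together, this regex matches three digits, one or more whitespace characters, and then three to eight digits, for example a phone number whose area code is separated from the rest by spaces, such as '010 12345'.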

To match a phone number like '010-12345', write the hyphen literally: outside of [] the hyphen has no special meaning and does not need to be escaped, so the pattern is \d{3}-\d{3,8}. However, this still won't match '010 - 12345' because of the spaces around the hyphen. A more complex pattern is needed.
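
The phone number patterns discussed above can be compared side by side. The third pattern, named flexible here, is only one possible way to tolerate spaces around the hyphen (it assumes optional whitespace on either side) and is not taken from the original text.

```javascript
const spaced = /\d{3}\s+\d{3,8}/;       // area code, whitespace, number
const hyphenated = /\d{3}-\d{3,8}/;     // area code, hyphen, number
// Assumption: allow zero or more whitespace characters around the hyphen.
const flexible = /\d{3}\s*-\s*\d{3,8}/;

console.log(spaced.test('010 12345'));        // true
console.log(spaced.test('010\t\t12345'));     // true (tabs are whitespace too)
console.log(hyphenated.test('010-12345'));    // true
console.log(hyphenated.test('010 - 12345'));  // false
console.log(flexible.test('010 - 12345'));    // true
console.log(flexible.test('010-12345'));      // true
```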

Advanced Patterns

For more precise matching, use [] to specify a set or range of characters. For example:

  • [0-9a-zA-Z_] matches a single digit, letter, or underscore.
  • [0-9a-zA-Z_]+ matches one or more digits, letters, or underscores, e.g., 'a100', '0_Z', 'js2015'.
  • [a-zA-Z_$][0-9a-zA-Z_$]* matches a string starting with a letter, underscore, or $, followed by any number of digits, letters, underscores, or $, which is the form of a typical (ASCII-only) JavaScript variable name.
  • [a-zA-Z_$][0-9a-zA-Z_$]{0,19} limits the length of the variable to 1-20 characters.
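
The variable-name patterns above can be tried out as in this sketch. It wraps each pattern in the ^ and $ anchors (described just below) so that the whole string has to match, and it only covers ASCII names; the constant names are made up for the example.

```javascript
// Anchored so that the entire string must be a name, not just part of it.
const identifier = /^[a-zA-Z_$][0-9a-zA-Z_$]*$/;
const shortIdentifier = /^[a-zA-Z_$][0-9a-zA-Z_$]{0,19}$/;

console.log(identifier.test('js2015'));  // true
console.log(identifier.test('$el'));     // true
console.log(identifier.test('2fast'));   // false (cannot start with a digit)

console.log(shortIdentifier.test('a'.repeat(20))); // true  (20 characters)
console.log(shortIdentifier.test('a'.repeat(21))); // false (21 characters)
```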

A|B matches either A or B, so (J|j)ava(S|s)cript matches 'JavaScript', 'Javascript', 'javaScript', or 'javascript'.

^ matches the start of the input, so ^\d ensures the string starts with a digit.

$ matches the end of the input, so \d$ ensures the string ends with a digit.

Combining the two, ^js$ matches only the exact string 'js'. (With the m flag, ^ and $ match at the start and end of each line instead.)
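
A short sketch of alternation and anchors in action. The constant languageName is an illustrative name, and the sample strings are made up.

```javascript
const languageName = /^(J|j)ava(S|s)cript$/;
console.log(languageName.test('javaScript')); // true
console.log(languageName.test('JAVASCRIPT')); // false (only J/j and S/s may vary)

console.log(/^\d/.test('3 apples'));  // true  (starts with a digit)
console.log(/^\d/.test('apples 3'));  // false
console.log(/\d$/.test('apples 3'));  // true  (ends with a digit)

console.log(/js/.test('jsp'));    // true  (unanchored: 'js' appears somewhere)
console.log(/^js$/.test('jsp'));  // false (anchored: whole string must be 'js')
console.log(/^js$/.test('js'));   // true
```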

Using RegExp in JavaScript

With the basics covered, we can now use regular expressions in JavaScript. There are two ways to create a regex:

  1. Directly with /pattern/.
  2. Using new RegExp('pattern').

Both methods are equivalent:

var re1 = /ABC\-001/;
var re2 = new RegExp('ABC\\-001');

console.log(re1); // /ABC\-001/
console.log(re2); // /ABC\-001/

Note that in the second method the pattern is an ordinary string literal, so every backslash in the pattern must itself be escaped as \\.
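
The sketch below shows what happens when the backslash is not doubled in the string form. The variable names are made up for the illustration; the source property used here returns the text of the pattern.

```javascript
const literal = /ABC-\d{3}/;
const constructed = new RegExp('ABC-\\d{3}');
console.log(literal.source === constructed.source); // true

// Forgetting to double the backslash silently changes the pattern:
// inside a string literal, '\d' is just the letter 'd'.
const mistaken = new RegExp('ABC-\d{3}');
console.log(mistaken.source);           // ABC-d{3}
console.log(mistaken.test('ABC-001'));  // false
console.log(mistaken.test('ABC-ddd'));  // true
```

The constructor form is mainly useful when the pattern is only known at runtime; for fixed patterns the literal form avoids the double escaping.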

To check if a regex matches a string, use the test() method:

var re = /^\d{3}-\d{3,8}$/;
console.log(re.test('010-12345')); // true
console.log(re.test('010-1234x')); // false
console.log(re.test('010 12345')); // false

Splitting Strings

Using regex for splitting strings is more flexible than using fixed delimiters. For example:

console.log('a b   c'.split(' ')); // ['a', 'b', '', '', 'c']
console.log('a b   c'.split(/\s+/)); // ['a', 'b', 'c']
console.log('a,b, c  d'.split(/[\s,]+/)); // ['a', 'b', 'c', 'd']
console.log('a,b;; c  d'.split(/[\s,;]+/)); // ['a', 'b', 'c', 'd']

Grouping

Regex can also extract substrings using groups. Groups are defined with (). For example:

var re = /^(\d{3})-(\d{3,8})$/;
console.log(re.exec('010-12345')); // ['010-12345', '010', '12345']
console.log(re.exec('010 12345')); // null

If the regex defines groups, a successful exec() call returns an array in which the first element is the full match and the subsequent elements are the matched groups, in order. If there is no match, exec() returns null.
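
Because the groups arrive as array elements, they can be unpacked with destructuring. The following is a minimal sketch using a hypothetical parseTime helper and an assumed 'HH:MM:SS' input format; it checks only the shape of the input, not whether the values are in range.

```javascript
// Hypothetical helper: splits 'HH:MM:SS' into numbers, or returns null.
function parseTime(text) {
  const match = /^(\d{1,2}):(\d{2}):(\d{2})$/.exec(text);
  if (match === null) {
    return null; // input does not have the expected shape
  }
  // Skip element 0 (the full match) and take the three groups.
  const [, hours, minutes, seconds] = match;
  return {
    hours: Number(hours),
    minutes: Number(minutes),
    seconds: Number(seconds),
  };
}

console.log(parseTime('19:05:30')); // { hours: 19, minutes: 5, seconds: 30 }
console.log(parseTime('19-05-30')); // null
```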

Greedy vs. Non-Greedy Matching

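By default, regex performs greedy matching, which means each quantifier matches as much as possible. For example, the following attempt to capture the trailing zeros of a number separately fails, because the greedy \d+ consumes every digit and leaves 0* with nothing but the empty string: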

var re = /^(\d+)(0*)$/;
console.log(re.exec('102300')); // ['102300', '102300', '']

To make \d+ match as little as possible (non-greedy matching), add a ? after the quantifier, so that 0* can capture the trailing zeros:

var re = /^(\d+?)(0*)$/;
console.log(re.exec('102300')); // ['102300', '1023', '00']
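
The same difference shows up whenever a pattern has a closing delimiter that occurs more than once. A small sketch with a made-up sample string containing two quoted words:

```javascript
const text = '"first" and "second"';

// Greedy: .+ runs to the end, then backs up to the LAST closing quote.
const greedy = /".+"/.exec(text)[0];
console.log(greedy); // "first" and "second"

// Non-greedy: .+? stops at the FIRST closing quote it can reach.
const lazy = /".+?"/.exec(text)[0];
console.log(lazy);   // "first"
```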

Global Search

The g flag in regex enables global matching:

var r1 = /test/g;
var r2 = new RegExp('test', 'g');

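Global matching allows repeated exec() calls on the same regex object to find all matches. After each successful match, the lastIndex property is set to the position just past the end of that match, which is where the next search starts. When no further match is found, exec() returns null and lastIndex is reset to 0: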

var s = 'JavaScript, VBScript, JScript and ECMAScript';
var re = /[a-zA-Z]+Script/g;

console.log(re.exec(s)); // ['JavaScript']
console.log(re.lastIndex); // 10

console.log(re.exec(s)); // ['VBScript']
console.log(re.lastIndex); // 20

console.log(re.exec(s)); // ['JScript']
console.log(re.lastIndex); // 29

console.log(re.exec(s)); // ['ECMAScript']
console.log(re.lastIndex); // 44

console.log(re.exec(s)); // null

Global matching works like a repeated search through the string, so a pattern anchored with ^...$ is not suitable for it: such a pattern can match at most once.
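
In practice, the repeated exec() calls are usually written as a loop. The sketch below collects every match together with its position; the found array and the 'text@index' format are choices made for this illustration only.

```javascript
const s = 'JavaScript, VBScript, JScript and ECMAScript';
const re = /[a-zA-Z]+Script/g;

const found = [];
let match;
// exec() returns null once there are no more matches, which ends the loop.
while ((match = re.exec(s)) !== null) {
  found.push(match[0] + '@' + match.index);
}
console.log(found);
// ['JavaScript@0', 'VBScript@12', 'JScript@22', 'ECMAScript@34']
console.log(re.lastIndex); // 0 (reset after the final, failed exec)

// If only the matched text is needed, match() with a g regex returns it all at once.
const names = s.match(/[a-zA-Z]+Script/g);
console.log(names); // ['JavaScript', 'VBScript', 'JScript', 'ECMAScript']

// An anchored pattern can succeed only once, even with the g flag.
const anchored = /^[a-zA-Z]+Script$/g;
const firstTry = anchored.exec('JavaScript');
const secondTry = anchored.exec('JavaScript');
console.log(firstTry[0]); // 'JavaScript'
console.log(secondTry);   // null
```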

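Additional flags include i for case-insensitive matching and m for multi-line matching, in which ^ and $ match at the start and end of every line rather than only at the start and end of the whole string.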
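
A brief sketch of both flags, using a made-up three-line string:

```javascript
// i: ignore letter case
console.log(/javascript/.test('JavaScript'));  // false
console.log(/javascript/i.test('JavaScript')); // true

// m: ^ and $ apply to each line
const lines = 'first\nsecond\nthird';
console.log(/^second$/.test(lines));  // false (the string as a whole is not 'second')
console.log(/^second$/m.test(lines)); // true  (the second line is)

// Flags can be combined, e.g. g and m to collect every whole-line word.
const words = lines.match(/^\w+$/gm);
console.log(words); // ['first', 'second', 'third']
```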

Summary

Regular expressions are a powerful tool, and mastering them requires practice and reference materials. If you frequently work with regex, consider consulting a comprehensive guide.

Tags: javascript RegularExpressions StringManipulation PatternMatching RegExp

Posted on Sun, 05 Jul 2026 16:59:30 +0000 by Steve Mellor