This is the regex cheat sheet I keep open in a tab every single day. Not the academic one that explains the theory of finite automata. Not the one that lists every obscure flag nobody uses. This is the one with the patterns I actually reach for when I'm writing code, validating input, or parsing logs at 2 AM.

I've organized everything from basic building blocks to production-ready patterns you can copy and use immediately. Every example is tested. Every pattern is explained. And if you want to try any of them live, you can paste them straight into the regex tester on akousa.net to see matches highlighted in real time.

Basic Syntax — The Building Blocks#

Before the cheat sheet makes sense, you need six concepts. That's it. Six.

Literal Characters#

Most characters match themselves. The pattern cat matches the string "cat" in "concatenate". Nothing fancy.

Pattern: cat
Matches: "The cat sat" → "cat"
         "concatenate" → "cat"

Metacharacters#

These characters have special meaning and need to be escaped with a backslash if you want to match them literally:

.  ^  $  *  +  ?  {  }  [  ]  (  )  |  \

To match a literal dot, use \. instead of . which matches any character.

Character Classes#

Square brackets define a set of characters to match:

[abc]      → matches a, b, or c
[a-z]      → matches any lowercase letter
[A-Z]      → matches any uppercase letter
[0-9]      → matches any digit
[a-zA-Z]   → matches any letter
[^abc]     → matches anything EXCEPT a, b, or c

Shorthand Character Classes#

These save you from writing out full character classes:

\d    → any digit           (same as [0-9])
\D    → any non-digit       (same as [^0-9])
\w    → any word character   (same as [a-zA-Z0-9_])
\W    → any non-word char    (same as [^a-zA-Z0-9_])
\s    → any whitespace       (space, tab, newline)
\S    → any non-whitespace
.     → any character except newline

Quantifiers#

Quantifiers control how many times a token repeats:

*       → 0 or more          a* matches "", "a", "aaa"
+       → 1 or more          a+ matches "a", "aaa" but NOT ""
?       → 0 or 1             a? matches "" or "a"
{3}     → exactly 3          a{3} matches "aaa"
{2,5}   → between 2 and 5    a{2,5} matches "aa" through "aaaaa"
{3,}    → 3 or more          a{3,} matches "aaa", "aaaa", etc.

Anchors#

Anchors match positions, not characters:

^       → start of string    ^Hello matches "Hello world"
$       → end of string      world$ matches "Hello world"
\b      → word boundary      \bcat\b matches "cat" but not "concatenate"
\B      → non-word boundary  \Bcat\B matches "concatenate" but not "cat"
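
Here is a small sketch of what the word-boundary anchor buys you in practice. The sample sentence and variable names are made up for illustration.

```javascript
// Illustrative sentence: "cat" appears as a whole word twice and inside two longer words.
const sentence = "The cat sat near a concatenated catalog; cat!";

// Without anchors, every occurrence of the three letters counts.
const loose = sentence.match(/cat/g).length;

// With \b on both sides, only whole-word occurrences count.
const wholeWord = sentence.match(/\bcat\b/g).length;

console.log(loose);     // 4
console.log(wholeWord); // 2
```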

Grouping and Alternation#

Parentheses create groups. The pipe character creates alternatives.

Capture Groups#

(abc)       → captures "abc" as group 1
(a)(b)(c)   → three groups: group 1 = "a", group 2 = "b", group 3 = "c"
(ab|cd)     → matches "ab" or "cd"

Non-Capturing Groups#

When you need grouping but don't need the captured value:

(?:abc)     → groups "abc" without capturing

This matters for performance when you have many groups but only care about some of them.

Backreferences#

Refer back to a previously captured group:

(ab)\1      → matches "abab" (group 1 captured "ab", \1 repeats it)
(\w+)\s+\1  → matches repeated words like "the the" or "is is"

Named Groups#

Give your groups meaningful names instead of numbers:

(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})

In JavaScript, access via match.groups.year, match.groups.month, match.groups.day.

Lookaheads and Lookbehinds#

These assert what comes before or after the current position without consuming characters. They are incredibly powerful for complex validation.

Positive Lookahead#

Matches if the pattern ahead exists:

\d+(?= dollars)    → matches "100" in "100 dollars" but not "100 euros"

Negative Lookahead#

Matches if the pattern ahead does NOT exist:

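\d+(?! dollars)    → matches "100" in "100 euros"; in "100 dollars" it still matches "10", because \d+ backtracks to a spot not followed by " dollars"
\d+\b(?! dollars)  → matches "100" in "100 euros" and nothing in "100 dollars"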

Positive Lookbehind#

Matches if the pattern behind exists:

(?<=\$)\d+         → matches "50" in "$50" but not in "50"

Negative Lookbehind#

Matches if the pattern behind does NOT exist:

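(?<!\$)\d+         → matches "50" in "50 items"; in "$50" it still matches "0", because only the "5" is directly preceded by "$"
(?<![$\d])\d+      → matches "50" in "50 items" and nothing in "$50"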

Combining Lookaheads for Password Validation#

This is where lookaheads shine. The classic password rule (at least 8 characters, one uppercase, one lowercase, one digit, one special character):

^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[!@#$%^&*])[A-Za-z\d!@#$%^&*]{8,}$

Each (?=.*X) asserts that character class X exists somewhere in the string. The final part [...]{8,} enforces the minimum length and allowed characters.
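
As a rough sketch of how this pattern might be wired into code: the helper name `isStrongPassword` and the sample passwords are my own illustrations, not part of any library.

```javascript
// The lookahead-based password pattern from above.
const STRONG_PASSWORD = /^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[!@#$%^&*])[A-Za-z\d!@#$%^&*]{8,}$/;

// Hypothetical helper: true when the candidate satisfies every rule.
function isStrongPassword(candidate) {
  return STRONG_PASSWORD.test(candidate);
}

console.log(isStrongPassword("Passw0rd!"));   // true
console.log(isStrongPassword("password1!"));  // false: no uppercase letter
console.log(isStrongPassword("Sh0rt!"));      // false: fewer than 8 characters
console.log(isStrongPassword("Passw0rd! x")); // false: a space is outside the allowed set
```

Note that the final character class doubles as an allow-list, so any character outside it fails the match even when all four lookaheads pass.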

Flags and Modifiers#

Flags change how the engine interprets your pattern:

g    → global: find all matches, not just the first
i    → case-insensitive: /hello/i matches "Hello", "HELLO", "hello"
m    → multiline: ^ and $ match start/end of each line, not just the string
s    → dotAll: . matches newline characters too
u    → unicode: enables full Unicode matching
y    → sticky: matches only at lastIndex position

In JavaScript:

javascript

const literal = /pattern/gi;
// or, equivalently, built from strings at runtime
const constructed = new RegExp("pattern", "gi");
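
To make the flags less abstract, here is a minimal sketch of three of them in action. The sample text and variable names are invented for the demo.

```javascript
const text = "error: disk full\nwarning: low memory\nerror: timeout";

// m: ^ matches at the start of every line, not just the start of the string.
const withoutM = text.match(/^error/g).length; // 1
const withM = text.match(/^error/gm).length;   // 2

// s: the dot also matches newline characters.
const dotDefault = /full.warning/.test(text);  // false
const dotAll = /full.warning/s.test(text);     // true

// g makes a regex object stateful: test() resumes from lastIndex.
const stateful = /error/g;
const first = stateful.test("error");  // true, lastIndex is now 5
const second = stateful.test("error"); // false, the search started at index 5

console.log(withoutM, withM, dotDefault, dotAll, first, second);
```

The last two lines are the reason I avoid the g flag on a regex that is only ever used with test().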

The Copy-Paste Section — Production-Ready Patterns#

These are patterns you can use right now. I've tested each one against edge cases. Paste them into the akousa.net regex tester if you want to verify them against your own data.

Email Validation#

The pragmatic version that covers the vast majority of real-world email addresses:

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

What it matches: user@example.com, first.last+tag@sub.domain.org, user123@company.co.uk

What it rejects: @missing-local.com, no-at-sign.com, spaces in@email.com

Note: A fully RFC 5322 compliant regex runs to thousands of characters, and almost nobody uses one. The pattern above is the kind of pragmatic compromise production applications tend to settle on.

URL Validation#

Matches HTTP and HTTPS URLs including paths, query strings, and fragments:

https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)

What it matches: https://example.com, http://www.test.co.uk/path?q=1&r=2, https://sub.domain.com/page#section

Phone Numbers (International)#

A flexible pattern for international phone numbers:

^\+?(\d{1,3})?[-.\s]?\(?\d{1,4}\)?[-.\s]?\d{1,4}[-.\s]?\d{1,9}$

What it matches: +1-234-567-8900, (234) 567-8900, +44 20 7946 0958, 1234567890

For strict US/Canada formatting:

^\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})$
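
The three capture groups in the strict pattern make it easy to normalize whatever format the user typed. This is a sketch only: `normalizeUsPhone` is a hypothetical helper, and the dashed output format is just one possible choice.

```javascript
// Strict US/Canada pattern from above; groups are area code, prefix, line number.
const US_PHONE = /^\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})$/;

// Hypothetical helper: returns "234-567-8900" style output, or null when the input does not match.
function normalizeUsPhone(input) {
  const match = US_PHONE.exec(input.trim());
  if (!match) return null;
  const [, area, prefix, line] = match;
  return `${area}-${prefix}-${line}`;
}

console.log(normalizeUsPhone("(234) 567-8900")); // "234-567-8900"
console.log(normalizeUsPhone("234.567.8900"));   // "234-567-8900"
console.log(normalizeUsPhone("2345678900"));     // "234-567-8900"
console.log(normalizeUsPhone("567-8900"));       // null
```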

IP Addresses#

IPv4 with proper octet validation (0-255):

^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$

What it matches: 192.168.1.1, 10.0.0.255, 0.0.0.0

What it rejects: 256.1.1.1, 192.168.1.999, 1.2.3.4.5
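
A quick way to convince yourself the octet logic works is to run the pattern over the examples listed above. The variable names here are illustrative.

```javascript
// IPv4 pattern from above.
const IPV4 = /^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$/;

const samples = ["192.168.1.1", "10.0.0.255", "0.0.0.0", "256.1.1.1", "192.168.1.999", "1.2.3.4.5"];

// Pair each sample with the result of the match.
const results = samples.map((ip) => [ip, IPV4.test(ip)]);

for (const [ip, ok] of results) {
  console.log(`${ip} -> ${ok ? "valid" : "invalid"}`);
}
// The first three are valid, the last three are invalid.
```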

IPv6 (simplified: full eight-group form only, no :: shorthand):

^([0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}$

Date Formats#

ISO 8601 date (YYYY-MM-DD):

^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$

US date (MM/DD/YYYY):

^(0[1-9]|1[0-2])\/(0[1-9]|[12]\d|3[01])\/\d{4}$

European date (DD/MM/YYYY):

^(0[1-9]|[12]\d|3[01])\/(0[1-9]|1[0-2])\/\d{4}$
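
One caveat worth keeping in mind: these patterns check the shape of a date, not the calendar, so 2026-02-31 passes. A minimal sketch of one way to close that gap is below. It assumes the ISO pattern with an extra capture group around the year, and `isRealIsoDate` is a hypothetical helper name.

```javascript
// ISO pattern from above, with the year wrapped in a capture group (my addition).
const ISO_DATE = /^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/;

// Hypothetical helper: shape check via regex, then a calendar check via Date.
function isRealIsoDate(input) {
  const match = ISO_DATE.exec(input);
  if (!match) return false;
  const [year, month, day] = match.slice(1).map(Number);
  // Impossible dates roll forward (Feb 31 becomes Mar 3), so a round-trip comparison exposes them.
  const date = new Date(0);
  date.setUTCFullYear(year, month - 1, day);
  return (
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day
  );
}

console.log(isRealIsoDate("2026-03-27")); // true
console.log(isRealIsoDate("2024-02-29")); // true: 2024 is a leap year
console.log(isRealIsoDate("2026-02-31")); // false: passes the regex, fails the calendar
console.log(isRealIsoDate("2026-13-01")); // false: rejected by the regex
```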

Time Formats#

24-hour time (HH:MM or HH:MM:SS):

^([01]\d|2[0-3]):([0-5]\d)(:[0-5]\d)?$

12-hour time with AM/PM:

^(0?[1-9]|1[0-2]):[0-5]\d\s?(AM|PM|am|pm)$

Credit Card Numbers#

Visa:

^4\d{12}(\d{3})?$

Mastercard (16 digits, prefixes 51-55 and 2221-2720):

^(?:5[1-5]\d{2}|222[1-9]|22[3-9]\d|2[3-6]\d{2}|27[01]\d|2720)\d{12}$

Any major card (Visa, MC, Amex, Discover):

^(?:4\d{12}(?:\d{3})?|5[1-5]\d{14}|3[47]\d{13}|6(?:011|5\d{2})\d{12})$

Hex Color Codes#

Matches 3-digit and 6-digit hex colors with optional hash:

^#?([0-9a-fA-F]{3}|[0-9a-fA-F]{6})$

What it matches: #FFF, #ff5733, abc123, #000
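
Since the pattern accepts four different spellings of a color, it often helps to normalize the result. Here is a sketch, assuming you want lowercase "#rrggbb" output; `normalizeHex` is a hypothetical helper.

```javascript
// Hex color pattern from above; group 1 holds the digits without the hash.
const HEX_COLOR = /^#?([0-9a-fA-F]{3}|[0-9a-fA-F]{6})$/;

// Hypothetical helper: returns a lowercase "#rrggbb" string, or null for invalid input.
function normalizeHex(input) {
  const match = HEX_COLOR.exec(input);
  if (!match) return null;
  let digits = match[1].toLowerCase();
  if (digits.length === 3) {
    // Expand shorthand by doubling each digit: "f0a" becomes "ff00aa".
    digits = [...digits].map((ch) => ch + ch).join("");
  }
  return `#${digits}`;
}

console.log(normalizeHex("#FFF"));   // "#ffffff"
console.log(normalizeHex("abc123")); // "#abc123"
console.log(normalizeHex("#ff57"));  // null: four digits is neither form
```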

Slug / URL-Friendly String#

^[a-z0-9]+(-[a-z0-9]+)*$

What it matches: my-blog-post, hello-world-123, about

What it rejects: My Blog Post, --double-dash, trailing-

Username Validation#

Alphanumeric, 3-20 characters, underscores and hyphens allowed but not at start or end:

^[a-zA-Z0-9][a-zA-Z0-9_-]{1,18}[a-zA-Z0-9]$

Strong Password#

At least 8 characters, requires uppercase, lowercase, digit, and special character:

^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$

Text Processing Patterns#

These patterns are for parsing and transforming text, not just validating it.

Extract All Numbers From a String#

-?\d+\.?\d*

Handles integers, decimals, and negative numbers. Applied to "Price: $19.99, discount: -5.50, qty: 3", this captures 19.99, -5.50, and 3.
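
In JavaScript, the extraction plus a conversion to real numbers might look like the sketch below. Variable names are illustrative.

```javascript
// Number pattern from above, with the g flag so match() returns every hit.
const NUMBER = /-?\d+\.?\d*/g;
const input = "Price: $19.99, discount: -5.50, qty: 3";

// match() returns null when nothing matches, so fall back to an empty array.
const asText = input.match(NUMBER) ?? [];
const asNumbers = asText.map(Number);

console.log(asText);    // ['19.99', '-5.50', '3']
console.log(asNumbers); // [19.99, -5.5, 3]
```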

Remove HTML Tags#

<[^>]+>

Replace matches with an empty string to strip HTML. Applied to "<p>Hello <b>world</b></p>", removing matches gives "Hello world".

Warning: Do not use regex to parse complex HTML documents. Use a proper HTML parser. This pattern is fine for stripping simple markup from text you trust, but it is not a defense against malicious input.

Extract Content Between Quotes#

Double quotes:

"([^"]*)"

Single or double quotes:

(['"])([^'"]*)\1

Match Markdown Headers#

^#{1,6}\s+(.+)$

With the m (multiline) flag, this matches # Title, ## Subtitle, through ###### Deepest heading.
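
A sketch of turning this into a document outline follows. I have added a capture group around the hashes so the heading level is available; that group and the variable names are my assumptions, not part of the pattern above.

```javascript
// Header pattern from above, with the hashes captured (my addition) and the g and m flags set.
const HEADER = /^(#{1,6})\s+(.+)$/gm;

const doc = "# Title\nSome text\n## Subtitle\n####### Too deep\n";

// Each match becomes { level, title }; seven hashes is not a valid header, so it is skipped.
const outline = [...doc.matchAll(HEADER)].map((m) => ({
  level: m[1].length,
  title: m[2],
}));

console.log(outline);
// [ { level: 1, title: 'Title' }, { level: 2, title: 'Subtitle' } ]
```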

Find Duplicate Words#

\b(\w+)\s+\1\b

Catches "the the", "is is", "and and" -- common typos in writing.
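
Paired with replace(), the same pattern can fix the typos instead of just finding them. A minimal sketch with an invented sentence:

```javascript
// Duplicate-word pattern from above, with g so every repeat is handled.
const DUPLICATE = /\b(\w+)\s+\1\b/g;

const draft = "This is is a test of the the pattern.";

// $1 keeps a single copy of the repeated word.
const fixed = draft.replace(DUPLICATE, "$1");

console.log(fixed); // "This is a test of the pattern."
```

The \b anchors are what stop the "is" inside "This" from pairing up with the "is" that follows it.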

Match CSV Values#

(?:^|,)("(?:[^"]|"")*"|[^,]*)

Handles quoted fields containing commas and escaped quotes.
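
Here is a sketch of how the pattern could drive a one-line CSV splitter. `parseCsvLine` is a hypothetical helper, and I have only traced it on lines whose first field is non-empty; for anything beyond simple cases a real CSV parser is the safer choice.

```javascript
// CSV field pattern from above, with g so matchAll() walks the whole line.
const CSV_FIELD = /(?:^|,)("(?:[^"]|"")*"|[^,]*)/g;

// Hypothetical helper: splits one CSV line into unquoted field values.
function parseCsvLine(line) {
  return [...line.matchAll(CSV_FIELD)].map(([, raw]) =>
    raw.startsWith('"')
      ? raw.slice(1, -1).replace(/""/g, '"') // strip outer quotes, unescape doubled quotes
      : raw
  );
}

const fields = parseCsvLine('name,"Smith, John","said ""hi""",42');
console.log(fields); // ['name', 'Smith, John', 'said "hi"', '42']
```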

Extract Domain From URL#

https?:\/\/(?:www\.)?([^\/\s]+)

Group 1 captures the domain. Applied to "https://www.example.com/path", group 1 is example.com.

Match Whitespace-Only Lines#

^\s*$

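With the m flag, matches lines that are empty or contain only whitespace. Because \s also matches newlines, one match can swallow several consecutive blank lines; use ^[ \t]*$ if each match should stay within a single line. Useful for cleaning up text files.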

JavaScript Regex Methods#

Here's how to actually use these patterns in JavaScript code.

test() -- Boolean Check#

javascript

const emailRegex = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;
 
emailRegex.test("user@example.com"); // true
emailRegex.test("not-an-email"); // false

match() -- Find Matches#

javascript

const text = "Call 555-1234 or 555-5678";
const phones = text.match(/\d{3}-\d{4}/g);
// ['555-1234', '555-5678']

matchAll() -- Iterate With Capture Groups#

javascript

const text = "Date: 2026-03-27, Updated: 2026-04-01";
const dates = [...text.matchAll(/(\d{4})-(\d{2})-(\d{2})/g)];
 
for (const match of dates) {
  console.log(`Full: ${match[0]}, Year: ${match[1]}, Month: ${match[2]}, Day: ${match[3]}`);
}

replace() -- Search and Replace#

javascript

// Censor credit card numbers, keep last 4 digits
const text = "Card: 4111-1111-1111-1234";
const censored = text.replace(/\d{4}-\d{4}-\d{4}-(\d{4})/, "****-****-****-$1");
// 'Card: ****-****-****-1234'

split() -- Split With Pattern#

javascript

const csv = "one, two , three,  four";
const values = csv.split(/\s*,\s*/);
// ['one', 'two', 'three', 'four']

Named Groups in JavaScript#

javascript

const dateRegex = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = "2026-03-27".match(dateRegex);
 
console.log(match.groups.year); // '2026'
console.log(match.groups.month); // '03'
console.log(match.groups.day); // '27'

Python Regex Quick Reference#

Python's re module uses the same core syntax with slightly different API calls.

python

import re
 
# Search (first match)
match = re.search(r'\d+', 'abc 123 def')
if match:
    print(match.group())  # '123'
 
# Find all matches
numbers = re.findall(r'\d+', 'abc 12 def 345')
# ['12', '345']
 
# Substitution
clean = re.sub(r'<[^>]+>', '', '<p>Hello</p>')
# 'Hello'
 
# Compile for reuse
pattern = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')
pattern.match('user@example.com')  # Match object

Key difference: Python raw strings (r'...') prevent backslash escaping issues. Always use raw strings for regex patterns in Python.

Common Mistakes and How to Avoid Them#

Greedy vs. Lazy Matching#

By default, quantifiers are greedy -- they match as much as possible.

Pattern: <.+>
Input:   <b>bold</b>
Greedy:  <b>bold</b>      (matches everything)
Lazy:    <b> and </b>      (pattern <.+?> matches the smallest possible)

Add ? after a quantifier to make it lazy:

<.+?>    → lazy version of <.+>
\d+?     → lazy version of \d+
.*?      → lazy version of .*
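
The difference is easy to see side by side. In this sketch I have also included a negated character class, which gives the same result as the lazy version for this input; the variable names are illustrative.

```javascript
const html = "<b>bold</b>";

// Greedy: .+ runs to the end of the string, then backs off to the last ">".
const greedy = html.match(/<.+>/g);

// Lazy: .+? stops at the first ">" it can.
const lazy = html.match(/<.+?>/g);

// Negated class: never crosses a ">" in the first place.
const negated = html.match(/<[^>]+>/g);

console.log(greedy);  // ['<b>bold</b>']
console.log(lazy);    // ['<b>', '</b>']
console.log(negated); // ['<b>', '</b>']
```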

Forgetting to Escape Special Characters#

These will not work as expected:

192.168.1.1     → the dots match ANY character
$100            → $ means end of string
file.txt        → the dot matches any character

Correct versions:

192\.168\.1\.1
\$100
file\.txt
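
When the text to match comes from a variable (user input, a filename), escaping by hand is not an option. A common approach is a small helper like the one sketched below; `escapeRegExp` is not a built-in here, just a function name I am assuming for the example.

```javascript
// Hypothetical helper: backslash-escapes every regex metacharacter in a string.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const needle = "file.txt";
const unescaped = new RegExp(needle);
const escaped = new RegExp(escapeRegExp(needle));

console.log(unescaped.test("file-txt")); // true: the bare dot matched "-"
console.log(escaped.test("file-txt"));   // false
console.log(escaped.test("file.txt"));   // true
console.log(escapeRegExp("$100"));       // \$100
```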

Catastrophic Backtracking#

Certain patterns can cause the regex engine to take exponentially long on specific inputs. The classic example:

(a+)+b

On an input like aaaaaaaaaaaaaaaaac, the engine tries every possible way to divide the a's between the inner and outer + before concluding there's no match. This can freeze your application.

Rules to avoid backtracking problems:

Never nest quantifiers like (a+)+ or (a*)*
Use atomic groups or possessive quantifiers when available
Be specific -- [a-z]+ is safer than .+
Test edge cases with long non-matching strings
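
The first rule is usually the easiest to apply, because a nested quantifier can often be flattened without changing what the pattern accepts. A sketch, with made-up sample inputs: the nested pattern is only run on short strings here, since its work roughly doubles with each extra a on a non-matching input.

```javascript
// Nested quantifier from the example above, and a flattened equivalent.
const risky = /(a+)+b/;
const safe = /a+b/;

// On short inputs the two patterns agree on every sample.
const samples = ["ab", "aaab", "b", "aaac", "xaab"];
const agree = samples.every((s) => risky.test(s) === safe.test(s));

// A hostile input: a run of a's with no b. Only the flattened pattern is run on it.
const hostile = "a".repeat(30) + "c";
const safeResult = safe.test(hostile);

console.log(agree);      // true
console.log(safeResult); // false
```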

Anchoring Your Patterns#

If you're validating input, always anchor both ends:

\d+        → matches "123" inside "abc123def" (probably not what you want)
^\d+$      → matches only if the ENTIRE string is digits

Forgetting anchors in validation patterns is one of the most common security bugs in web applications.
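
A sketch of how an unanchored check lets bad input through. The input string is invented for the demo.

```javascript
const unanchored = /\d+/;
const anchored = /^\d+$/;

// Supposed to be a numeric ID, but carries extra text.
const input = "123; DROP TABLE users";

console.log(unanchored.test(input)); // true: "123" is found somewhere inside
console.log(anchored.test(input));   // false: the whole string is not digits
console.log(anchored.test("123"));   // true
```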

Regex in Different Languages -- Syntax Differences#

Most regex syntax is shared across languages, but there are notable exceptions:

Lookbehind Support#

JavaScript: Variable-length lookbehind supported since ES2018
Python: Fixed-length lookbehind only ((?<=ab) works, (?<=a+) does not)
Java: Limited variable-length lookbehind
Go: No lookbehind at all (RE2 engine)

Named Groups#

JavaScript:  (?<name>pattern)   → match.groups.name
Python:      (?P<name>pattern)  → match.group('name')
Java:        (?<name>pattern)   → matcher.group("name")
PHP:         (?P<name>pattern)  → $matches['name']

Flags#

JavaScript:  /pattern/gi
Python:      re.compile(r'pattern', re.IGNORECASE | re.MULTILINE)
Java:        Pattern.compile("pattern", Pattern.CASE_INSENSITIVE)
PHP:         '/pattern/i'  (no g flag; use preg_match_all() for all matches)
Go:          (?i)pattern  (inline flag only)

Advanced Patterns for Real-World Use#

Log Line Parser#

Extract timestamp, level, and message from common log formats:

^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?)\s+\[?(INFO|WARN|ERROR|DEBUG)\]?\s+(.+)$

Applied to 2026-03-27T14:30:00.123Z [ERROR] Connection timeout after 30s, this captures the timestamp, level, and message as separate groups.
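
Wrapped in a function, the pattern turns a raw line into a structured object. This is a sketch; `parseLogLine` and the shape of the returned object are my assumptions.

```javascript
// Log line pattern from above.
const LOG_LINE = /^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?)\s+\[?(INFO|WARN|ERROR|DEBUG)\]?\s+(.+)$/;

// Hypothetical helper: returns { timestamp, level, message } or null for lines that do not match.
function parseLogLine(line) {
  const match = LOG_LINE.exec(line);
  if (!match) return null;
  const [, timestamp, level, message] = match;
  return { timestamp, level, message };
}

const entry = parseLogLine("2026-03-27T14:30:00.123Z [ERROR] Connection timeout after 30s");
console.log(entry);
// { timestamp: '2026-03-27T14:30:00.123Z', level: 'ERROR', message: 'Connection timeout after 30s' }

console.log(parseLogLine("not a log line")); // null
```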

Semantic Version Matching#

^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$

Matches 1.0.0, 2.1.3-beta.1, 1.0.0-alpha+build.123 according to the SemVer spec.
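
The five capture groups map directly onto the parts of a version string. A sketch of pulling them out, where `parseSemver` and the returned field names are my own choices:

```javascript
// SemVer pattern from above.
const SEMVER = /^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$/;

// Hypothetical helper: returns the parsed parts, or null for an invalid version string.
function parseSemver(version) {
  const match = SEMVER.exec(version);
  if (!match) return null;
  // Unmatched optional groups are undefined, so the defaults turn them into null.
  const [, major, minor, patch, prerelease = null, build = null] = match;
  return { major: Number(major), minor: Number(minor), patch: Number(patch), prerelease, build };
}

console.log(parseSemver("1.0.0-alpha+build.123"));
// { major: 1, minor: 0, patch: 0, prerelease: 'alpha', build: 'build.123' }
console.log(parseSemver("2.1.3-beta.1"));
// { major: 2, minor: 1, patch: 3, prerelease: 'beta.1', build: null }
console.log(parseSemver("01.0.0")); // null: leading zeros are not allowed
```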

JSON String Extraction#

"(?:[^"\\]|\\.)*"

Properly handles escaped quotes inside JSON strings. Matches "hello", "say \"hi\"", "path\\to\\file".

Markdown Link Extraction#

\[([^\]]+)\]\(([^)]+)\)

Group 1 is the link text, group 2 is the URL. Applied to [Click here](https://example.com), captures Click here and https://example.com.
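
With matchAll(), the two groups can be collected for every link in a document. A sketch with an invented Markdown string:

```javascript
// Markdown link pattern from above, with g so matchAll() finds every link.
const MD_LINK = /\[([^\]]+)\]\(([^)]+)\)/g;

const markdown = "See [Click here](https://example.com) and [the docs](https://example.com/docs).";

// Group 1 is the link text, group 2 is the URL.
const links = [...markdown.matchAll(MD_LINK)].map(([, text, url]) => ({ text, url }));

console.log(links);
// [
//   { text: 'Click here', url: 'https://example.com' },
//   { text: 'the docs', url: 'https://example.com/docs' }
// ]
```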

CSS Hex and RGB Color Matching#

#(?:[0-9a-fA-F]{3}){1,2}\b|rgb\(\s*\d{1,3}\s*,\s*\d{1,3}\s*,\s*\d{1,3}\s*\)

Matches both #ff5733 and rgb(255, 87, 51).

File Path Matching#

Unix paths:

^(\/[a-zA-Z0-9._-]+)+\/?$

Windows paths:

^[a-zA-Z]:\\(?:[^\\/:*?"<>|\r\n]+\\)*[^\\/:*?"<>|\r\n]*$

Performance Tips#

Regex engines are fast, but poorly written patterns can be orders of magnitude slower than well-written ones.

Be Specific#

Slow:   .*foo.*
Fast:   [^f]*foo.*

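The more specific your character classes, the fewer paths the engine needs to explore. Note that the two patterns above are only interchangeable in an unanchored search: anchored with ^, the second one rejects any string that has an f before the first foo.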

Avoid Unnecessary Capture Groups#

Slow:   (https?):\/\/(www\.)?([^\/]+)
Fast:   https?:\/\/(?:www\.)?[^\/]+

If you don't need the captured values, use non-capturing groups (?:...) or remove the parentheses entirely.

Compile Once, Use Many Times#

In any language, if you're using the same regex in a loop, compile it once outside the loop:

javascript

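// `lines` is assumed to be an array of strings.
// Avoid: the literal is evaluated again, creating a new RegExp object, on every iteration
for (const line of lines) {
  if (line.match(/^\d{4}-\d{2}-\d{2}/)) { /* handle the dated line */ }
}
 
// Good: compile once
const datePattern = /^\d{4}-\d{2}-\d{2}/;
for (const line of lines) {
  if (datePattern.test(line)) { /* handle the dated line */ }
}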

Use Anchors When Possible#

Anchored patterns (^...$) let the engine fail fast on non-matching strings instead of trying every position in the string.

Test Your Patterns#

Writing a regex pattern without testing it is like writing code without running it. You need to see what matches, what doesn't, and what matches by accident.

The regex tester tool on akousa.net lets you type a pattern, paste in test strings, and see matches highlighted instantly. It supports all JavaScript regex flags, shows capture group contents, and provides a match breakdown so you can debug complex patterns step by step. No signup required, nothing to install, and your data stays in your browser.

Bookmark it. You'll use it more than you think.

Quick Reference Table#

Here's the complete cheat sheet in the most compact form possible. Print it, pin it, or keep it in a browser tab.

Characters: . any char, \d digit, \w word char, \s whitespace, \b word boundary

Quantifiers: * 0+, + 1+, ? 0-1, {n} exactly n, {n,m} n to m, {n,} n or more

Groups: (x) capture, (?:x) non-capture, (?<name>x) named, \1 backreference

Assertions: ^ start, $ end, (?=x) lookahead, (?!x) neg lookahead, (?<=x) lookbehind, (?<!x) neg lookbehind

Flags: g global, i case-insensitive, m multiline, s dotAll, u unicode, y sticky

Escapes: \. literal dot, \\ literal backslash, \* literal asterisk -- escape any metacharacter with \

That covers every regex concept you'll encounter in day-to-day development. The patterns in this cheat sheet handle the vast majority of validation, parsing, and text processing tasks. For anything exotic, the fundamentals above give you enough understanding to construct or decode whatever pattern you come across.

Keep building. Keep testing. And when a regex makes no sense, break it apart one token at a time -- it always clicks eventually.
