Regular expressions look like someone fell asleep on their keyboard. I get it. The first time I saw ^(?:[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})$ in a codebase, I assumed it was an encryption key that got committed by accident.

But regex is one of those tools that, once it clicks, saves you hours every single week. Find-and-replace across thousands of files. Validating user input before it hits your database. Parsing logs when your monitoring dashboard is down and you're staring at raw text at 2 AM.

This cheat sheet is the one I keep bookmarked. It starts with the fundamentals, builds up to real patterns you'll actually use, and includes a quick-reference table at the end so you can skim it in 30 seconds when you're in the middle of something. If you want to test any pattern as you read, the Regex Tester on this site runs patterns in real time with match highlighting.

The Basics: Literal Characters and Metacharacters#

At its simplest, a regex is just a search string. The pattern cat matches the literal text "cat" inside "concatenate", "category", or just "cat".

It becomes a real regex when you use metacharacters — characters that have special meaning:

Metacharacter	Meaning
`.`	Any character except newline
`^`	Start of string (or line in multiline mode)
`$`	End of string (or line in multiline mode)
`\`	Escape the next character
`|`	OR operator
`()`	Grouping
`[]`	Character class

If you need to match a literal dot, dollar sign, or any other metacharacter, escape it with a backslash: \. matches an actual period.
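When the text you want to match comes from user input, escaping by hand isn't practical. Here's a minimal sketch of a helper that escapes every metacharacter before building a pattern. The name escapeRegExp is my own; treat it as a hypothetical helper, not a built-in.

```javascript
// Hypothetical helper: backslash-escape every regex metacharacter in a string.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a pattern that matches the literal text "file.txt".
const literalDot = new RegExp(escapeRegExp("file.txt"));

console.log(literalDot.test("file.txt")); // true
console.log(literalDot.test("file5txt")); // false
console.log(/file.txt/.test("file5txt")); // true, the unescaped dot matches any character
```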

Character Classes#

Character classes let you match one character from a set of possibilities. They go inside square brackets.

regex

[abc]       # Matches 'a', 'b', or 'c'
[a-z]       # Matches any lowercase letter
[A-Z]       # Matches any uppercase letter
[0-9]       # Matches any digit
[a-zA-Z]    # Matches any letter
[a-zA-Z0-9] # Matches any alphanumeric character
[^abc]      # Matches any character EXCEPT a, b, or c

The caret ^ inside a character class means negation. Outside a character class, it means start of string. Context matters.

Shorthand Character Classes#

Regex provides shortcuts for the most common character classes:

Shorthand	Equivalent	Meaning
`\d`	`[0-9]`	Any digit
`\D`	`[^0-9]`	Any non-digit
`\w`	`[a-zA-Z0-9_]`	Any word character
`\W`	`[^a-zA-Z0-9_]`	Any non-word character
`\s`	`[ \t\n\r\f\v]`	Any whitespace
`\S`	`[^ \t\n\r\f\v]`	Any non-whitespace

These are the ones you'll use constantly. \d{3} matches exactly three digits. \s+ matches one or more whitespace characters. Once these are muscle memory, you can read most regex patterns without slowing down.
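A quick sketch of the two shorthands I reach for most, run against a made-up line of text:

```javascript
const line = "order  42\tshipped 2024";

// \s+ treats any run of whitespace (spaces, tabs) as a single separator.
const fields = line.split(/\s+/);
console.log(fields); // ['order', '42', 'shipped', '2024']

// \d+ with the g flag collects every run of digits.
const numbers = line.match(/\d+/g);
console.log(numbers); // ['42', '2024']
```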

Quantifiers: How Many?#

Quantifiers specify how many times the preceding element should occur.

Quantifier	Meaning
`*`	0 or more
`+`	1 or more
`?`	0 or 1 (optional)
`{n}`	Exactly n
`{n,}`	n or more
`{n,m}`	Between n and m (inclusive)

regex

colou?r       # Matches 'color' and 'colour'
\d{3}-\d{4}   # Matches '555-1234'
a{2,4}        # Matches 'aa', 'aaa', or 'aaaa'
\w+           # Matches one or more word characters

Greedy vs Lazy#

By default, quantifiers are greedy — they match as much text as possible. Add ? after a quantifier to make it lazy (match as little as possible).

regex

<.*>    # Greedy: matches '<div>hello</div>' as one match
<.*?>   # Lazy: matches '<div>' and '</div>' separately

This is one of the most common regex gotchas. When you're extracting content between delimiters, you almost always want the lazy version.
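To see the difference for yourself, here's the same input run through both versions in JavaScript:

```javascript
const html = "<div>hello</div>";

// Greedy: .* runs to the end, then backs up to the last '>'.
const greedy = html.match(/<.*>/g);
// Lazy: .*? stops at the first '>' it can.
const lazy = html.match(/<.*?>/g);

console.log(greedy); // ['<div>hello</div>']
console.log(lazy);   // ['<div>', '</div>']
```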

Anchors: Position Matters#

Anchors don't match characters — they match positions.

Anchor	Meaning
`^`	Start of string
`$`	End of string
`\b`	Word boundary
`\B`	Non-word boundary

regex

^Hello      # Matches 'Hello' only at the start
world$      # Matches 'world' only at the end
\bcat\b     # Matches 'cat' but NOT 'concatenate'

Word boundaries are incredibly useful. Without \b, searching for cat will match inside "concatenate", "scattered", and "education". With \bcat\b, you match the standalone word only.
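A small sketch that counts matches with and without the boundaries (the sample sentence is made up):

```javascript
const text = "The cat sat near the concatenate function.";

const loose = text.match(/cat/g);      // also finds the 'cat' inside 'concatenate'
const strict = text.match(/\bcat\b/g); // standalone word only

console.log(loose.length);  // 2
console.log(strict.length); // 1
```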

Groups and Capturing#

Parentheses serve two purposes: grouping parts of a pattern and capturing matched text for later use.

regex

(abc)+          # Matches 'abc', 'abcabc', etc.
(red|blue|green) # Matches 'red', 'blue', or 'green'

Backreferences#

Captured groups are numbered starting at 1. You can reference them with \1, \2, etc.

regex

\b(\w+)\s+\1\b     # Matches repeated words: 'the the', 'is is'

This pattern is a classic. It captures a word, looks for whitespace, then checks whether the exact same word appears again. The \b on each end keeps it from matching partial words, such as the "is is" hidden inside "this is". It's the idea behind most "find duplicate words" features in text editors.
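Here's a rough sketch of a duplicate-word check in JavaScript. I've wrapped the pattern in \b word boundaries so it doesn't fire on the "is is" inside "this is"; the sample sentence is made up.

```javascript
// \b on both sides prevents matches that start or end in the middle of a word.
const repeatedWord = /\b(\w+)\s+\1\b/gi;

const draft = "This is the the final draft, and it is is done.";

// List every duplicated pair.
const duplicates = [...draft.matchAll(repeatedWord)].map((m) => m[0]);
console.log(duplicates); // ['the the', 'is is']

// $1 in the replacement string refers to the first captured group.
const cleaned = draft.replace(repeatedWord, "$1");
console.log(cleaned); // 'This is the final draft, and it is done.'
```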

Named Groups#

For readability, you can name your groups:

regex

(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})

In JavaScript, named groups show up in match.groups.year, match.groups.month, etc. Much easier to maintain than remembering that group 1 is the year and group 3 is the day.
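A short sketch of reading named groups and reusing them in a replacement (the sample strings are made up):

```javascript
const isoDate = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;

const match = "Released on 2024-03-15.".match(isoDate);
const { year, month, day } = match.groups;
console.log(year, month, day); // 2024 03 15

// Named groups can also be referenced in a replacement string with $<name>.
const european = "2024-03-15".replace(isoDate, "$<day>.$<month>.$<year>");
console.log(european); // '15.03.2024'
```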

Non-Capturing Groups#

When you need grouping but don't need to capture:

regex

(?:https?|ftp)://   # Groups the protocol options without capturing

The ?: at the start tells the engine to skip storing this match. It's a small performance improvement, but more importantly it keeps your capture group numbering clean.
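To make the numbering point concrete, here's the same URL matched with and without the ?: prefix. The second group, (\S+), is something I added for illustration so there's a capture to number.

```javascript
const url = "https://example.com";

const capturing = url.match(/(https?|ftp):\/\/(\S+)/);
const nonCapturing = url.match(/(?:https?|ftp):\/\/(\S+)/);

// With a capturing protocol group, the host ends up in group 2.
console.log(capturing[1], capturing[2]); // 'https' 'example.com'
// With ?: the protocol isn't stored, so the host is group 1.
console.log(nonCapturing[1]);            // 'example.com'
```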

Lookaheads and Lookbehinds#

These are the patterns that make people say regex is unreadable. But they solve problems that nothing else can.

Lookaheads and lookbehinds check if something exists before or after the current position without including it in the match.

Pattern	Type	Meaning
`(?=...)`	Positive lookahead	What follows must match
`(?!...)`	Negative lookahead	What follows must NOT match
`(?<=...)`	Positive lookbehind	What precedes must match
`(?<!...)`	Negative lookbehind	What precedes must NOT match

regex

\d+(?=px)       # Matches digits followed by 'px': '12' in '12px'
\d+(?!\d|px)    # Matches digits NOT followed by 'px' (the \d stops it from backing off to '1' in '12px')
(?<=\$)\d+      # Matches digits preceded by '$': '50' in '$50'
(?<![\$\d])\d+  # Matches digits NOT preceded by '$' (the \d stops it from matching '0' in '$50')

A practical example: password validation that requires at least one uppercase letter, one digit, and is at least 8 characters long.

regex

^(?=.*[A-Z])(?=.*\d).{8,}$

Each lookahead runs independently from the start of the string. (?=.*[A-Z]) checks that an uppercase letter exists somewhere. (?=.*\d) checks that a digit exists somewhere. Then .{8,} actually consumes the string and ensures it's at least 8 characters.
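Here's that pattern checked against a few made-up passwords, one per failure mode:

```javascript
const strongEnough = /^(?=.*[A-Z])(?=.*\d).{8,}$/;

console.log(strongEnough.test("Passw0rdLong")); // true
console.log(strongEnough.test("password1"));    // false, no uppercase letter
console.log(strongEnough.test("Password"));     // false, no digit
console.log(strongEnough.test("Pw1"));          // false, too short
```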

Real-World Patterns#

Here's where it all comes together. These are patterns I actually use in production.

Email Validation#

regex

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

This covers the vast majority of valid email addresses. It's not RFC 5322 compliant (almost nothing is), but it catches obviously invalid input while allowing real addresses through.

[a-zA-Z0-9._%+-]+ — one or more valid local-part characters
@ — literal at symbol
[a-zA-Z0-9.-]+ — domain name
\.[a-zA-Z]{2,} — dot followed by TLD (2+ letters)

URL Matching#

regex

https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)

This matches HTTP and HTTPS URLs with optional www, domain, TLD, and path/query parameters.

Phone Numbers (US Format)#

regex

^(\+1[-.\s]?)?(\(?\d{3}\)?[-.\s]?)?\d{3}[-.\s]?\d{4}$

Matches these formats:

555-123-4567
(555) 123-4567
+1 555 123 4567
5551234567

Phone number validation is one of those problems where regex gets you 80% of the way. For international numbers with all their formatting variations, a dedicated library is the better choice.

IPv4 Address#

regex

^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$

This validates that each octet is between 0 and 255. Breaking it down:

25[0-5] — matches 250-255
2[0-4]\d — matches 200-249
[01]?\d\d? — matches 0-199

It's repeated three times with dots, then once more without the trailing dot.
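A quick sketch that runs the pattern over a handful of made-up candidates. Note the last one: leading zeros pass, because [01]?\d\d? accepts "01".

```javascript
const ipv4 = /^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$/;

const candidates = ["192.168.1.1", "255.255.255.255", "256.1.1.1", "1.2.3", "10.0.0.01"];
const results = candidates.map((ip) => ipv4.test(ip));

candidates.forEach((ip, i) => console.log(ip, results[i]));
// 192.168.1.1 true
// 255.255.255.255 true
// 256.1.1.1 false   (octet above 255)
// 1.2.3 false       (only three octets)
// 10.0.0.01 true    (leading zero is accepted)
```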

Date (YYYY-MM-DD)#

regex

^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$

Matches ISO 8601 date format with basic validation:

Month: 01-12
Day: 01-31

It won't catch February 30th — that's a semantic validation problem, not a pattern matching problem. Regex handles syntax; your application logic handles semantics.
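One way to split the work, as a rough sketch: let the regex check the shape, then let the Date object check the calendar. The function name isRealDate is hypothetical, and I've added a capture group around the year so all three parts can be read back.

```javascript
const isoDatePattern = /^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/;

// Hypothetical helper: syntax check with the regex, semantic check with Date.
function isRealDate(text) {
  const match = isoDatePattern.exec(text);
  if (!match) return false; // wrong shape

  const [year, month, day] = match.slice(1).map(Number);
  // Invalid days roll over (Feb 30 becomes Mar 1), so a real date
  // must come back with the same month and day we put in.
  const date = new Date(0);
  date.setUTCFullYear(year, month - 1, day);
  return date.getUTCMonth() === month - 1 && date.getUTCDate() === day;
}

console.log(isRealDate("2024-02-29")); // true, 2024 is a leap year
console.log(isRealDate("2024-02-30")); // false
console.log(isRealDate("2024-13-01")); // false, rejected by the regex
```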

HTML Tags#

regex

<\/?[\w\s]*>|<.+[\W]>

For simple tag matching. But honestly, don't parse HTML with regex. Use a proper parser. The regex above breaks on nested tags, self-closing tags with attributes, and about a hundred other edge cases. I include it because people always ask for it, and it's genuinely useful for quick log scanning — just not for production HTML processing.

Extracting Numbers from Text#

regex

-?\d+\.?\d*

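Matches integers and decimals, including negative numbers: 42, -7, 3.14, -0.5. It also accepts a trailing dot with no digits after it (42.); if that matters, use -?\d+(?:\.\d+)? instead.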
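A small sketch of pulling numbers out of a sentence and converting them (the sample text is made up):

```javascript
const numberPattern = /-?\d+\.?\d*/g;

const reading = "Temp dropped from 3.14 to -0.5 over 42 minutes.";

// match() with the g flag returns every match as a string; Number converts each one.
const values = reading.match(numberPattern).map(Number);
console.log(values); // [3.14, -0.5, 42]
```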

Hex Color Codes#

regex

^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$

Matches both 6-digit (#FF5733) and 3-digit (#F00) hex color codes.

Slug Validation (URL-friendly strings)#

regex

^[a-z0-9]+(?:-[a-z0-9]+)*$

Matches lowercase alphanumeric strings separated by single hyphens. No leading hyphens, no trailing hyphens, no consecutive hyphens.
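Validation pairs naturally with generation. Here's a minimal sketch of a slugify helper (a hypothetical name, and ASCII-only by assumption) whose output satisfies the pattern above:

```javascript
const slugPattern = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

// Hypothetical helper: turn a title into a string that satisfies slugPattern.
function slugify(title) {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse each non-alphanumeric run into one hyphen
    .replace(/^-|-$/g, "");      // drop a leading or trailing hyphen
}

const slug = slugify("  Regex Cheat Sheet: 2024 Edition! ");
console.log(slug);                          // 'regex-cheat-sheet-2024-edition'
console.log(slugPattern.test(slug));        // true
console.log(slugPattern.test("bad--slug")); // false, consecutive hyphens
```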

Flags / Modifiers#

Flags change how the entire pattern behaves:

Flag	Name	Effect
`g`	Global	Find all matches, not just the first
`i`	Case-insensitive	`A` matches `a`
`m`	Multiline	`^` and `$` match line boundaries
`s`	Dotall	`.` matches newline characters too
`u`	Unicode	Enables full Unicode support

In JavaScript:

javascript

const pattern = /hello/gi; // Global, case-insensitive
const result = "Hello World Hello".match(pattern);
// Result: ['Hello', 'Hello']
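The m and s flags are easier to understand side by side. A short sketch using a made-up three-line log string:

```javascript
const log = "ERROR disk full\nINFO retrying\nERROR timeout";

// Without m, ^ only matches at the very start of the string.
const withoutM = log.match(/^ERROR/g).length;
// With m, ^ matches at the start of every line.
const withM = log.match(/^ERROR/gm).length;
console.log(withoutM, withM); // 1 2

// Without s, . stops at a newline; with s it crosses lines.
const withoutS = /disk.*timeout/.test(log);
const withS = /disk.*timeout/s.test(log);
console.log(withoutS, withS); // false true
```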

Common Mistakes to Avoid#

Forgetting to escape metacharacters. If you want to match a literal period in a filename like file.txt, use file\.txt not file.txt (which matches fileTtxt, file5txt, etc.).

Not anchoring your pattern. Without ^ and $, the pattern \d{3} happily matches inside 12345: it finds 123 and ignores the 45 left over. If you mean "exactly three digits and nothing else", write ^\d{3}$.
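Here's the difference in JavaScript. Matches don't overlap, so a global search on 12345 reports only one hit.

```javascript
const unanchored = /\d{3}/;
const anchored = /^\d{3}$/;

const found = "12345".match(/\d{3}/g);
console.log(found);                     // ['123']
console.log(unanchored.test("12345"));  // true, three digits exist somewhere
console.log(anchored.test("12345"));    // false, the whole string must be three digits
console.log(anchored.test("123"));      // true
```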

Catastrophic backtracking. Nested quantifiers like (a+)+ can cause exponential processing time on certain inputs. If your regex is hanging, look for nested repetition. A tool like the Regex Tester will highlight performance issues.
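The usual fix is to flatten the nesting. As a sketch: ^(a+)+$ and ^a+$ accept exactly the same strings, but the flat version gives a backtracking engine only one way to match each of them. I only run the nested pattern on short input here, since a long near-miss is exactly what triggers the blow-up.

```javascript
const nested = /^(a+)+$/; // nested repetition: many ways to split the same run of a's
const flat = /^a+$/;      // same set of matching strings, one way to match each

// On short input the two patterns agree.
const samples = ["a", "aaaa", "aab", ""];
const agree = samples.every((s) => nested.test(s) === flat.test(s));
console.log(agree); // true

// A long run of a's followed by one bad character is the classic trigger.
// Only the flat pattern is run against it.
const nearMiss = "a".repeat(50000) + "!";
console.log(flat.test(nearMiss)); // false
```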

Over-engineering validation. A regex that tries to validate every possible edge case of an email address is 6,000 characters long. In practice, the simple version above catches 99.9% of bad input. Let the verification email handle the rest.

Quick-Reference Table#

Here's everything on one screen. Bookmark this section.

Category	Pattern	Matches
Any character	`.`	Everything except newline
Digit	`\d`	0-9
Non-digit	`\D`	Anything except 0-9
Word char	`\w`	a-z, A-Z, 0-9, _
Whitespace	`\s`	Space, tab, newline
Zero or more	`*`	Preceding element, 0+ times
One or more	`+`	Preceding element, 1+ times
Optional	`?`	Preceding element, 0 or 1 time
Exact count	`{n}`	Preceding element, exactly n times
Range	`{n,m}`	Preceding element, n to m times
Start of string	`^`	Position at start
End of string	`$`	Position at end
Word boundary	`\b`	Between word and non-word char
Capture group	`(...)`	Groups and captures
Non-capture	`(?:...)`	Groups without capturing
Lookahead	`(?=...)`	Asserts what follows
Neg. lookahead	`(?!...)`	Asserts what does NOT follow
Lookbehind	`(?<=...)`	Asserts what precedes
OR	`a|b`	Matches a or b

Testing Your Patterns#

Reading about regex and writing regex are two different skills. The fastest way to learn is to write a pattern, test it against sample text, and iterate. The Regex Tester tool breaks down your pattern visually and shows matches in real time, which makes it much easier to see what's happening than trying to trace the pattern in your head.

For more complex scripting with regex, the Code Playground lets you run JavaScript, Python, or any other language with full regex support — useful for testing backreferences and replacements in the context of actual code.

Regex isn't something you memorize once and know forever. It's something you look up, test, adjust, and look up again. That's fine. That's how everyone uses it, including people who've been writing regex for 20 years. Keep this cheat sheet in your bookmarks, and the next time you're staring at a wall of text that needs parsing, you'll know exactly where to start.

