Master regular expressions with this comprehensive cheat sheet. Learn regex patterns, syntax, and real-world examples with a free online regex tester.
Regular expressions look like someone fell asleep on their keyboard. I get it. The first time I saw ^(?:[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})$ in a codebase, I assumed it was an encryption key that got committed by accident.
But regex is one of those tools that, once it clicks, saves you hours every single week. Find-and-replace across thousands of files. Validating user input before it hits your database. Parsing logs when your monitoring dashboard is down and you're staring at raw text at 2 AM.
This cheat sheet is the one I keep bookmarked. It starts with the fundamentals, builds up to real patterns you'll actually use, and includes a quick-reference table at the end so you can skim it in 30 seconds when you're in the middle of something. If you want to test any pattern as you read, the Regex Tester on this site runs patterns in real time with match highlighting.
At its simplest, a regex is just a search string. The pattern cat matches the literal text "cat" inside "concatenate", "category", or just "cat".
It becomes a real regex when you use metacharacters — characters that have special meaning:
| Metacharacter | Meaning |
|---|---|
| . | Any character except newline |
| ^ | Start of string (or line in multiline mode) |
| $ | End of string (or line in multiline mode) |
| \ | Escape the next character |
| \| | Alternation (OR) |
| () | Grouping |
| [] | Character class |
If you need to match a literal dot, dollar sign, or any other metacharacter, escape it with a backslash: \. matches an actual period.
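To make the escaping rule concrete, here is a minimal JavaScript sketch. The helper name `escapeRegExp` is an assumption made for illustration, not something defined in this article; it backslash-escapes every metacharacter so arbitrary text can be searched for literally.

```javascript
// Hypothetical helper: escape every regex metacharacter in a string
// so the string can be used as a literal search term.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const unescaped = /file.txt/; // "." matches any character
const escaped = /file\.txt/;  // "\." matches only a literal period

console.log(unescaped.test("fileTtxt")); // true
console.log(escaped.test("fileTtxt"));   // false
console.log(escaped.test("file.txt"));   // true

// Build a pattern from text that contains "$" and "."
const literal = new RegExp(escapeRegExp("$9.99"));
console.log(literal.test("Total: $9.99")); // true
console.log(literal.test("Total: 9199"));  // false
```

A helper like this is mostly useful when the search text comes from user input, where an unescaped metacharacter would silently change what the pattern means.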
Character classes let you match one character from a set of possibilities. They go inside square brackets.
[abc] # Matches 'a', 'b', or 'c'
[a-z] # Matches any lowercase letter
[A-Z] # Matches any uppercase letter
[0-9] # Matches any digit
[a-zA-Z] # Matches any letter
[a-zA-Z0-9] # Matches any alphanumeric character
[^abc] # Matches any character EXCEPT a, b, or c
The caret ^ inside a character class means negation. Outside a character class, it means start of string. Context matters.
Regex provides shortcuts for the most common character classes:
| Shorthand | Equivalent | Meaning |
|---|---|---|
| \d | [0-9] | Any digit |
| \D | [^0-9] | Any non-digit |
| \w | [a-zA-Z0-9_] | Any word character |
| \W | [^a-zA-Z0-9_] | Any non-word character |
| \s | [ \t\n\r\f\v] | Any whitespace |
| \S | [^ \t\n\r\f\v] | Any non-whitespace |
These are the ones you'll use constantly. \d{3} matches exactly three digits. \s+ matches one or more whitespace characters. Once these are muscle memory, you can read most regex patterns without slowing down.
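A short JavaScript sketch of character classes and shorthands in action. The sample strings and variable names are made up for illustration.

```javascript
// \d{3}: exactly three digits in a row
const code = "Order 042 shipped".match(/\d{3}/)[0];
console.log(code); // "042"

// \s+: collapse each run of whitespace into a single space
const tidy = "too    many \t spaces".replace(/\s+/g, " ");
console.log(tidy); // "too many spaces"

// [^0-9]: a negated class, here used to strip everything except digits
const digitsOnly = "a1b2c3".replace(/[^0-9]/g, "");
console.log(digitsOnly); // "123"

// \W+: replace runs of non-word characters, then trim the ends
const wordsOnly = "hello, world!".replace(/\W+/g, " ").trim();
console.log(wordsOnly); // "hello world"
```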
Quantifiers specify how many times the preceding element should occur.
| Quantifier | Meaning |
|---|---|
| * | 0 or more |
| + | 1 or more |
| ? | 0 or 1 (optional) |
| {n} | Exactly n |
| {n,} | n or more |
| {n,m} | Between n and m (inclusive) |
colou?r # Matches 'color' and 'colour'
\d{3}-\d{4} # Matches '555-1234'
a{2,4} # Matches 'aa', 'aaa', or 'aaaa'
\w+ # Matches one or more word characters
By default, quantifiers are greedy: they match as much text as possible. Add ? after a quantifier to make it lazy (match as little as possible).
<.*> # Greedy: matches '<div>hello</div>' as one match
<.*?> # Lazy: matches '<div>' and '</div>' separately
This is one of the most common regex gotchas. When you're extracting content between delimiters, you almost always want the lazy version.
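The greedy and lazy behaviour described above can be checked directly in JavaScript. This is a small sketch; the sample strings are invented.

```javascript
const html = "<div>hello</div>";

// Greedy: .* runs to the last ">" it can find
const greedy = html.match(/<.*>/g);
console.log(greedy); // [ '<div>hello</div>' ]

// Lazy: .*? stops at the first ">" it can find
const lazy = html.match(/<.*?>/g);
console.log(lazy); // [ '<div>', '</div>' ]

// Extracting content between delimiters with a lazy quantifier
const quoted = 'say "one" and "two"';
const parts = [...quoted.matchAll(/"(.*?)"/g)].map((m) => m[1]);
console.log(parts); // [ 'one', 'two' ]
```

With a greedy `"(.*)"` the last example would return a single capture, `one" and "two`, which is rarely what you want.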
Anchors don't match characters — they match positions.
| Anchor | Meaning |
|---|---|
| ^ | Start of string |
| $ | End of string |
| \b | Word boundary |
| \B | Non-word boundary |
^Hello # Matches 'Hello' only at the start
world$ # Matches 'world' only at the end
\bcat\b # Matches 'cat' but NOT 'concatenate'
Word boundaries are incredibly useful. Without \b, searching for cat will match inside "concatenate", "scattered", and "education". With \bcat\b, you match the standalone word only.
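A quick JavaScript check of the anchors above. The sample sentence is made up so that it contains "cat" both as a standalone word and buried inside longer words.

```javascript
const text = "The cat sat; concatenate and education are scattered.";

// Without boundaries, "cat" is found inside other words too
const loose = text.match(/cat/g);
console.log(loose.length); // 4

// With \b on both sides, only the standalone word matches
const strict = text.match(/\bcat\b/g);
console.log(strict.length); // 1

// ^ and $ match positions, not characters
console.log(/^Hello/.test("Hello world")); // true
console.log(/^Hello/.test("Say Hello"));   // false
console.log(/world$/.test("Hello world")); // true
```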
Parentheses serve two purposes: grouping parts of a pattern and capturing matched text for later use.
(abc)+ # Matches 'abc', 'abcabc', etc.
(red|blue|green) # Matches 'red', 'blue', or 'green'
Captured groups are numbered starting at 1. You can reference them with \1, \2, etc.
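\b(\w+)\s+\1\b # Matches repeated words: 'the the', 'is is'
This pattern is a classic. It captures a word, looks for whitespace, then checks whether the exact same word appears again. The \b word boundaries keep it from firing inside longer words, such as the "is is" hidden in "this is". It's the idea behind many "find duplicate words" features in text editors.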
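Here is a sketch of duplicate-word detection in JavaScript. This version wraps the backreference pattern in \b word boundaries so it does not fire inside longer words; the sample sentences and variable names are illustrative.

```javascript
const duplicateWord = /\b(\w+)\s+\1\b/gi;
const sentence = "This is is a test of the the pattern.";

// List every doubled word
const found = [...sentence.matchAll(duplicateWord)].map((m) => m[0]);
console.log(found); // [ 'is is', 'the the' ]

// Replace each doubled word with a single copy ($1 is the captured word)
const cleaned = sentence.replace(duplicateWord, "$1");
console.log(cleaned); // "This is a test of the pattern."

// Why the boundaries matter: without them, "this is" looks like "is is"
console.log(/(\w+)\s+\1/.test("this is fine"));     // true
console.log(/\b(\w+)\s+\1\b/.test("this is fine")); // false
```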
For readability, you can name your groups:
(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})
In JavaScript, named groups show up in match.groups.year, match.groups.month, etc. Much easier to maintain than remembering that group 1 is the year and group 3 is the day.
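A minimal JavaScript sketch of named groups, both for reading a match and for reordering it in a replacement. The sample text is invented.

```javascript
const isoDate = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;

// Read the parts by name instead of by position
const match = "Released on 2024-03-15.".match(isoDate);
const { year, month, day } = match.groups;
console.log(year, month, day); // 2024 03 15

// Named groups can also be referenced in a replacement string
const usStyle = "2024-03-15".replace(isoDate, "$<month>/$<day>/$<year>");
console.log(usStyle); // "03/15/2024"
```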
When you need grouping but don't need to capture:
(?:https?|ftp):// # Groups the protocol options without capturing
The ?: at the start tells the engine to skip storing this match. It's a small performance improvement, but more importantly it keeps your capture group numbering clean.
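The effect on group numbering is easy to see side by side. In this JavaScript sketch the host-matching part `([\w.]+)` and the sample URL are assumptions added for the demonstration.

```javascript
const url = "ftp://files.example.com";

// Capturing group: the protocol takes slot 1, the host moves to slot 2
const capturing = url.match(/(https?|ftp):\/\/([\w.]+)/);
console.log(capturing[1]); // "ftp"
console.log(capturing[2]); // "files.example.com"

// Non-capturing group: the host is the first and only capture
const nonCapturing = url.match(/(?:https?|ftp):\/\/([\w.]+)/);
console.log(nonCapturing[1]);     // "files.example.com"
console.log(nonCapturing.length); // 2 (full match plus one capture)
```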
These are the patterns that make people say regex is unreadable. But they solve problems that nothing else can.
Lookaheads and lookbehinds check if something exists before or after the current position without including it in the match.
| Pattern | Type | Meaning |
|---|---|---|
| (?=...) | Positive lookahead | What follows must match |
| (?!...) | Negative lookahead | What follows must NOT match |
| (?<=...) | Positive lookbehind | What precedes must match |
| (?<!...) | Negative lookbehind | What precedes must NOT match |
\d+(?=px) # Matches digits followed by 'px': '12' in '12px'
\d+(?!px) # Matches digits NOT followed by 'px'
(?<=\$)\d+ # Matches digits preceded by '$': '50' in '$50'
(?<!\$)\d+ # Matches digits NOT preceded by '$'
One caveat for the negative forms: the engine backtracks, so \d+(?!px) still matches '1' in '12px', and (?<!\$)\d+ still matches '0' in '$50'. Adding \d to the assertion, as in \d+(?!\d|px) and (?<![$\d])\d+, closes that gap.
A practical example: password validation that requires at least one uppercase letter, at least one digit, and a minimum length of 8 characters.
^(?=.*[A-Z])(?=.*\d).{8,}$
Each lookahead runs independently from the start of the string. (?=.*[A-Z]) checks that an uppercase letter exists somewhere. (?=.*\d) checks that a digit exists somewhere. Then .{8,} actually consumes the string and ensures it's at least 8 characters.
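The password rule and the basic lookarounds can be exercised like this in JavaScript. The candidate passwords and sample strings are made up for the sketch.

```javascript
const passwordRule = /^(?=.*[A-Z])(?=.*\d).{8,}$/;

console.log(passwordRule.test("Passw0rdX")); // true
console.log(passwordRule.test("password1")); // false (no uppercase letter)
console.log(passwordRule.test("Password"));  // false (no digit)
console.log(passwordRule.test("Pw1"));       // false (too short)

// Lookarounds assert context without including it in the match
const pxValues = "margin: 12px 3em 40px".match(/\d+(?=px)/g);
console.log(pxValues); // [ '12', '40' ]

const dollars = "cost $50, qty 3".match(/(?<=\$)\d+/g);
console.log(dollars); // [ '50' ]

// Negative lookahead plus backtracking: this matches "1", not nothing
const surprise = "12px".match(/\d+(?!px)/)[0];
console.log(surprise); // "1"
```

The last line is worth remembering: a negative lookahead does not stop the engine from trying a shorter match that happens to satisfy it.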
Here's where it all comes together. These are patterns I actually use in production.
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
This covers the vast majority of valid email addresses. It's not RFC 5322 compliant (almost nothing is), but it catches obviously invalid input while allowing real addresses through.
- [a-zA-Z0-9._%+-]+ matches one or more valid local-part characters
- @ matches the literal at symbol
- [a-zA-Z0-9.-]+ matches the domain name
- \.[a-zA-Z]{2,} matches a dot followed by the TLD (2+ letters)
https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)
This matches HTTP and HTTPS URLs with optional www, domain, TLD, and path/query parameters.
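Both patterns above can be tried out in JavaScript as follows. This is a sketch: the sample addresses and URLs are invented, and the URL pattern is written with a single escaped slash in its last character class, which is equivalent to the doubled slash in the original.

```javascript
const emailPattern = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

console.log(emailPattern.test("first.last+tag@sub.example.co.uk")); // true
console.log(emailPattern.test("dev@example"));      // false (no TLD)
console.log(emailPattern.test("dev@@example.com")); // false (double @)

// The URL pattern is unanchored, so with the g flag it can pull URLs out of text
const urlPattern =
  /https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&\/=]*)/g;

const text = "Docs: https://www.example.com/path?x=1 and http://example.org today";
const urls = text.match(urlPattern);
console.log(urls);
// [ 'https://www.example.com/path?x=1', 'http://example.org' ]
```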
^(\+1[-.\s]?)?(\(?\d{3}\)?[-.\s]?)?\d{3}[-.\s]?\d{4}$
Matches these formats:
- 555-123-4567
- (555) 123-4567
- +1 555 123 4567
- 5551234567
Phone number validation is one of those problems where regex gets you 80% of the way. For international numbers with all their formatting variations, a dedicated library is the better choice.
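A small JavaScript sketch that runs the phone pattern over the formats listed above and then normalizes a number to bare digits. The variable names and the rejected sample are illustrative.

```javascript
const phonePattern = /^(\+1[-.\s]?)?(\(?\d{3}\)?[-.\s]?)?\d{3}[-.\s]?\d{4}$/;

const samples = ["555-123-4567", "(555) 123-4567", "+1 555 123 4567", "5551234567"];
const allValid = samples.every((s) => phonePattern.test(s));
console.log(allValid); // true

// Too few digits in the first block, so nothing lines up
console.log(phonePattern.test("55-123-4567")); // false

// After validating, strip everything that is not a digit for storage
const digits = "(555) 123-4567".replace(/\D/g, "");
console.log(digits); // "5551234567"
```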
^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$
This validates an IPv4 address by checking that each octet is between 0 and 255. Breaking it down:
- 25[0-5] matches 250-255
- 2[0-4]\d matches 200-249
- [01]?\d\d? matches 0-199
The octet group is repeated three times with a trailing dot, then once more without it.
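The octet logic can be checked in JavaScript like this. The test addresses are arbitrary examples.

```javascript
const ipv4 = /^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$/;

console.log(ipv4.test("192.168.1.1"));     // true
console.log(ipv4.test("0.0.0.0"));         // true
console.log(ipv4.test("255.255.255.255")); // true
console.log(ipv4.test("256.1.1.1"));       // false (octet above 255)
console.log(ipv4.test("1.2.3"));           // false (only three octets)
```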
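^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$
Matches the ISO 8601 calendar date format (YYYY-MM-DD) with basic validation:
- \d{4} matches a four-digit year
- (0[1-9]|1[0-2]) matches months 01-12
- (0[1-9]|[12]\d|3[01]) matches days 01-31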
It won't catch February 30th — that's a semantic validation problem, not a pattern matching problem. Regex handles syntax; your application logic handles semantics.
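One way to split the work between syntax and semantics is sketched below in JavaScript. The function name `isRealDate` is hypothetical, and the pattern adds a capture group around the year so all three parts can be read back; the calendar check relies on `Date` rolling impossible days over into the next month.

```javascript
const isoDate = /^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/;

// Hypothetical helper: regex checks the shape, Date checks the calendar.
function isRealDate(text) {
  const m = isoDate.exec(text);
  if (!m) return false;
  const [year, month, day] = m.slice(1).map(Number);
  const d = new Date(0);
  d.setUTCFullYear(year, month - 1, day);
  // An impossible day such as Feb 30 rolls over, so the parts no longer agree
  return (
    d.getUTCFullYear() === year &&
    d.getUTCMonth() === month - 1 &&
    d.getUTCDate() === day
  );
}

console.log(isoDate.test("2024-02-30")); // true  (the syntax is fine)
console.log(isRealDate("2024-02-30"));   // false (no such day)
console.log(isRealDate("2024-02-29"));   // true  (leap year)
console.log(isRealDate("2023-02-29"));   // false
console.log(isRealDate("2024-13-01"));   // false (rejected by the regex)
```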
<\/?[\w\s]*>|<.+[\W]>
For simple tag matching. But honestly, don't parse HTML with regex. Use a proper parser. The regex above breaks on nested tags, self-closing tags with attributes, and about a hundred other edge cases. I include it because people always ask for it, and it's genuinely useful for quick log scanning, just not for production HTML processing.
-?\d+\.?\d*
Matches integers and decimals, including negative numbers: 42, -7, 3.14, -0.5.
^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$
Matches both 6-digit (#FF5733) and 3-digit (#F00) hex color codes.
^[a-z0-9]+(?:-[a-z0-9]+)*$
Matches lowercase alphanumeric strings separated by single hyphens, the usual shape of a URL slug. No leading hyphens, no trailing hyphens, no consecutive hyphens.
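The number, hex color, and hyphenated-string patterns above can be checked together in JavaScript. This is a sketch with invented sample values and variable names.

```javascript
// Numbers: unanchored, so with the g flag it extracts every number in the text
const numbers = "42, -7, 3.14, -0.5".match(/-?\d+\.?\d*/g);
console.log(numbers); // [ '42', '-7', '3.14', '-0.5' ]

// Hex colors: exactly 3 or exactly 6 hex digits after "#"
const hexColor = /^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$/;
console.log(hexColor.test("#FF5733")); // true
console.log(hexColor.test("#F00"));    // true
console.log(hexColor.test("#FF57"));   // false (4 digits)

// Hyphen-separated lowercase strings
const slug = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;
console.log(slug.test("my-first-post"));  // true
console.log(slug.test("-leading"));       // false
console.log(slug.test("trailing-"));      // false
console.log(slug.test("double--hyphen")); // false
```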
Flags change how the entire pattern behaves:
| Flag | Name | Effect |
|---|---|---|
| g | Global | Find all matches, not just the first |
| i | Case-insensitive | A matches a |
| m | Multiline | ^ and $ match line boundaries |
| s | Dotall | . matches newline characters too |
| u | Unicode | Enables full Unicode support |
In JavaScript:
const pattern = /hello/gi; // Global, case-insensitive
const result = "Hello World Hello".match(pattern);
// Result: ['Hello', 'Hello']
Forgetting to escape metacharacters. If you want to match a literal period in a filename like file.txt, use file\.txt, not file.txt (which also matches fileTtxt, file5txt, etc.).
Not anchoring your pattern. Without ^ and $, the pattern \d{3} matches inside 12345: it finds 123 and reports success even though the input has five digits. If you mean "exactly three digits and nothing else", write ^\d{3}$.
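The difference between an anchored and an unanchored pattern shows up immediately in JavaScript. A short sketch:

```javascript
// Unanchored: succeeds on any input that contains three digits in a row
console.log(/\d{3}/.test("12345")); // true

// Matches never overlap, so a global search of "12345" yields one result
const found = "12345".match(/\d{3}/g);
console.log(found); // [ '123' ]

// Anchored: the whole input must be exactly three digits
console.log(/^\d{3}$/.test("12345")); // false
console.log(/^\d{3}$/.test("123"));   // true
```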
Catastrophic backtracking. Nested quantifiers like (a+)+ can cause exponential processing time on certain inputs. If your regex is hanging, look for nested repetition. A tool like the Regex Tester will highlight performance issues.
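A hedged JavaScript sketch of the problem: a nested quantifier and its flat rewrite give the same answers, but on a near-miss input the nested form has to explore a number of paths that roughly doubles with each extra character. The input length of 20 is an arbitrary choice that keeps the demo fast; timings vary by engine, so none are shown.

```javascript
const nested = /^(a+)+$/; // nested repetition: many ways to split a run of a's
const flat = /^a+$/;      // same language, only one way to match

// A run of a's followed by a character that forces the match to fail
const nearMiss = "a".repeat(20) + "b";

// Both patterns agree on the result...
console.log(nested.test(nearMiss)); // false
console.log(flat.test(nearMiss));   // false
console.log(nested.test("aaaa"));   // true
console.log(flat.test("aaaa"));     // true

// ...but a backtracking engine tries every split of the a's before the
// nested pattern gives up. Do not raise the repeat count much further
// unless you are prepared to wait.
```

When a pattern hangs, collapsing the nested repetition into a single quantifier like this is usually the first thing to try.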
Over-engineering validation. A regex that tries to validate every possible edge case of an email address is 6,000 characters long. In practice, the simple version above catches 99.9% of bad input. Let the verification email handle the rest.
Here's everything on one screen. Bookmark this section.
| Category | Pattern | Matches |
|---|---|---|
| Any character | . | Everything except newline |
| Digit | \d | 0-9 |
| Non-digit | \D | Anything except 0-9 |
| Word char | \w | a-z, A-Z, 0-9, _ |
| Whitespace | \s | Space, tab, newline |
| Zero or more | * | Preceding element, 0+ times |
| One or more | + | Preceding element, 1+ times |
| Optional | ? | Preceding element, 0 or 1 time |
| Exact count | {n} | Preceding element, exactly n times |
| Range | {n,m} | Preceding element, n to m times |
| Start of string | ^ | Position at start |
| End of string | $ | Position at end |
| Word boundary | \b | Between word and non-word char |
| Capture group | (...) | Groups and captures |
| Non-capture | (?:...) | Groups without capturing |
| Lookahead | (?=...) | Asserts what follows |
| Neg. lookahead | (?!...) | Asserts what does NOT follow |
| Lookbehind | (?<=...) | Asserts what precedes |
| OR | a\|b | Either a or b |
Reading about regex and writing regex are two different skills. The fastest way to learn is to write a pattern, test it against sample text, and iterate. The Regex Tester tool breaks down your pattern visually and shows matches in real time, which makes it much easier to see what's happening than trying to trace the pattern in your head.
For more complex scripting with regex, the Code Playground lets you run JavaScript, Python, or any other language with full regex support — useful for testing backreferences and replacements in the context of actual code.
Regex isn't something you memorize once and know forever. It's something you look up, test, adjust, and look up again. That's fine. That's how everyone uses it, including people who've been writing regex for 20 years. Keep this cheat sheet in your bookmarks, and the next time you're staring at a wall of text that needs parsing, you'll know exactly where to start.