Test and debug regular expressions in real time with our free online regex tester. Syntax highlighting, match explanations, common patterns library, and interactive tutorials.
There's a moment every developer recognizes. You're staring at a string of text — maybe a log file, maybe user input from a form, maybe a CSV dump from a database — and you need to extract something specific from it. A date buried in a paragraph. An email address hidden in a wall of HTML. A phone number in one of twelve different formats.
You could write a dozen if statements with indexOf and substring. Or you could write a single regular expression that handles all of it in one line.
Regular expressions are one of the most powerful tools in a developer's arsenal, yet they remain one of the most misunderstood. The syntax looks alien at first glance. But here's what nobody tells you: regex is a small language. The entire thing fits in your short-term memory once you've practiced it for a weekend.
This tutorial walks you through regex from the ground up — from matching literal text to writing patterns with lookaheads and capture groups. Every example is something you can paste directly into our free regex tester and experiment with in real time.
A regular expression (regex) is a sequence of characters that defines a search pattern. Think of it as a mini programming language specifically designed for finding, matching, and manipulating text.
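Nearly every modern programming language supports regex: JavaScript, Python, Java, Go, Ruby, PHP, C#, Rust. The core syntax is nearly identical across all of them, although advanced features vary. Go's regexp package and Rust's regex crate, for example, omit lookarounds and backreferences. Learn the core once and you can use it everywhere.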
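Regex is used for validating form input, parsing log files, search-and-replace in code editors, and extracting data from text, all of which are covered later in this guide.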
The simplest regex is just plain text. The pattern hello matches the string "hello" wherever it appears. Nothing magical about that.
Things get interesting when you add metacharacters — characters with special meaning in regex:
. Matches any single character (except newline)
^ Matches the start of a string
$ Matches the end of a string
\ Escapes a metacharacter (makes it literal)
Try this in the regex tester:
Pattern: c.t
Test string: "cat cot cut cart"
The . matches any single character, so c.t matches "cat", "cot", and "cut" — but not "cart" because there are two characters between "c" and "t".
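The same check can be run outside the tester. This is a minimal JavaScript sketch; the variable names are illustrative, and the g flag is added here so that every match is returned rather than only the first.

```javascript
// The dot stands for exactly one character, so "cart" (two characters
// between "c" and "t") is not matched.
const sample = "cat cot cut cart";
const dotMatches = sample.match(/c.t/g);

console.log(dotMatches); // ["cat", "cot", "cut"]
```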
Character classes let you match any one character from a specific set:
[abc] Matches a, b, or c
[a-z] Matches any lowercase letter
[A-Z] Matches any uppercase letter
[0-9] Matches any digit
[a-zA-Z] Matches any letter
[^abc] Matches anything EXCEPT a, b, or c
Regex also provides shorthand character classes that you'll use constantly:
\d Any digit (same as [0-9])
\w Any word character (same as [a-zA-Z0-9_])
\s Any whitespace (space, tab, newline)
\D Any non-digit
\W Any non-word character
\S Any non-whitespace
Example: To match a US zip code (5 digits), you'd write:
\d{5}
That's it. Five digits. Try it against "My zip is 90210 and yours is 10001" in the regex tester — you'll see both matches highlighted.
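A short JavaScript sketch of the same test follows. The second half is an addition not in the text above: it shows that a bare \d{5} also matches the first five digits of a longer number, and that wrapping it in \b word boundaries is one way to avoid that.

```javascript
const zipText = "My zip is 90210 and yours is 10001";
const zips = zipText.match(/\d{5}/g);
console.log(zips); // ["90210", "10001"]

// Without boundaries, a six-digit number still yields a five-digit match.
const partial = "123456".match(/\d{5}/g);
console.log(partial); // ["12345"]

// With \b on both sides, the six-digit number no longer matches.
const bounded = "123456".match(/\b\d{5}\b/g);
console.log(bounded); // null
```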
Quantifiers specify how many times the preceding element should appear:
* Zero or more
+ One or more
? Zero or one (optional)
{3} Exactly 3
{2,5} Between 2 and 5
{3,} 3 or more
Combining character classes with quantifiers is where regex starts to feel powerful:
\d+ One or more digits (matches "42", "7", "10050")
[a-z]+ One or more lowercase letters
\w{3,8} A run of 3 to 8 word characters (add \b on both sides to match whole words only)
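To see how these three combinations behave, here is a small JavaScript sketch. The sample strings are made up for illustration, and the g flag is used so that all matches are returned.

```javascript
const order = "Order 42 shipped 7 items for 10050 units";

// \d+ grabs each run of digits.
const numbers = order.match(/\d+/g);
console.log(numbers); // ["42", "7", "10050"]

// [a-z]+ skips the uppercase "O", so "Order" contributes only "rder".
const lowercaseRuns = order.match(/[a-z]+/g);
console.log(lowercaseRuns); // ["rder", "shipped", "items", "for", "units"]

// \w{3,8} takes at most 8 characters at a time, so a 13-letter word
// is split into two matches.
const chunks = "extraordinary".match(/\w{3,8}/g);
console.log(chunks); // ["extraord", "inary"]
```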
Let's build up to real-world patterns. These are the ones developers reach for most often.
A practical email regex that covers the vast majority of valid addresses:
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
Breaking it down:
^ Start of string
[a-zA-Z0-9._%+-]+ One or more valid characters in the local part
@ Literal @ symbol
[a-zA-Z0-9.-]+ One or more valid characters in the domain
\. Literal dot (escaped because . is a metacharacter)
[a-zA-Z]{2,} Top-level domain with at least 2 letters
$ End of string
A word of caution: No regex can fully validate an email address according to the RFC spec. The pattern above handles the vast majority of real-world addresses. For production systems, send a confirmation email, since that is the only true validation.
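The email pattern can be wrapped in a small helper. This is a sketch, not a complete validator: the function name looksLikeEmail is hypothetical, and trimming the input first is an assumption about how form values arrive.

```javascript
const EMAIL_PATTERN = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

// Returns true when the trimmed input has the general shape of an email.
// It does not prove the address exists; a confirmation email does that.
function looksLikeEmail(input) {
  return EMAIL_PATTERN.test(input.trim());
}

console.log(looksLikeEmail("dev@example.com"));       // true
console.log(looksLikeEmail("dev@sub.example.co.uk")); // true
console.log(looksLikeEmail("dev@example"));           // false (no top-level domain)
console.log(looksLikeEmail("dev example@site.io"));   // false (space in local part)
```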
https?:\/\/[^\s/$.?#].[^\s]*
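This matches URLs starting with http:// or https:// (the s? makes the "s" optional). The class [^\s/$.?#] requires the first character after the slashes to be something other than whitespace or / $ . ? #. It is followed by at least one more character and then any run of non-whitespace characters. The pattern is intentionally loose, because URLs come in too many shapes for a strict pattern to be practical.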
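A brief JavaScript sketch shows what "intentionally loose" means in practice. The sample sentences are invented; the second one shows that trailing punctuation is swallowed, which callers may want to strip afterwards.

```javascript
const URL_PATTERN = /https?:\/\/[^\s/$.?#].[^\s]*/g;

const note =
  "Docs live at https://example.com/guide?page=2 and the old site is http://legacy.example.org today";
const urls = note.match(URL_PATTERN);
console.log(urls);
// ["https://example.com/guide?page=2", "http://legacy.example.org"]

// Looseness has a cost: the sentence-ending period becomes part of the match.
const withPeriod = "See https://example.com.".match(URL_PATTERN);
console.log(withPeriod); // ["https://example.com."]
```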
\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}
This handles multiple formats:
(555) 123-4567
555-123-4567
555.123.4567
5551234567
The \(? and \)? make the parentheses optional. The [-.\s]? allows a dash, dot, space, or nothing between groups.
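The four formats can be checked in one pass. In this sketch the pattern is wrapped in ^ and $ so the whole string must be a phone number; those anchors are an addition for validation and are not part of the pattern shown above.

```javascript
// Anchored variant of the phone pattern (anchors added for whole-string checks).
const PHONE_PATTERN = /^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/;

const candidates = [
  "(555) 123-4567",
  "555-123-4567",
  "555.123.4567",
  "5551234567",
  "555-1234", // too short: only seven digits
];

const phoneResults = candidates.map((c) => PHONE_PATTERN.test(c));
console.log(phoneResults); // [true, true, true, true, false]
```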
\b(0[1-9]|1[0-2])[\/\-](0[1-9]|[12]\d|3[01])[\/\-](19|20)\d{2}\b
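This validates dates in MM/DD/YYYY or MM-DD-YYYY form with some basic range checking: months 01-12, days 01-31, years 1900-2099. It does not know how long each month is, so 02/31/2024 still matches. The \b word boundaries prevent matching partial numbers.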
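The sketch below exercises the date pattern in JavaScript and then adds a calendar check on top. The helper isCalendarDate is hypothetical and goes beyond the pattern itself: it relies on the built-in Date object rolling impossible days over into the next month.

```javascript
const DATE_PATTERN =
  /\b(0[1-9]|1[0-2])[\/\-](0[1-9]|[12]\d|3[01])[\/\-](19|20)\d{2}\b/;

console.log(DATE_PATTERN.test("Due 12/25/2024")); // true
console.log(DATE_PATTERN.test("13/01/2024"));     // false (month 13)
console.log(DATE_PATTERN.test("02/31/2024"));     // true  (range check only)

// Hypothetical follow-up check: reject dates the calendar does not contain.
function isCalendarDate(text) {
  const m = DATE_PATTERN.exec(text);
  if (!m) return false;
  const month = Number(m[1]);
  const day = Number(m[2]);
  const year = Number(m[0].slice(-4)); // the last four characters are the year
  const d = new Date(year, month - 1, day);
  // If the day overflowed, Date moved into another month.
  return d.getMonth() === month - 1 && d.getDate() === day;
}

console.log(isCalendarDate("02/31/2024")); // false
console.log(isCalendarDate("02/29/2024")); // true (2024 is a leap year)
```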
\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\b
This validates each octet of an IPv4 address, accepting only values from 0 to 255. A simpler but less accurate version would be \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}, but that would match invalid addresses like 999.999.999.999.
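The difference between the strict and the simple version is easy to see side by side. A minimal JavaScript sketch, with constant names chosen only for illustration:

```javascript
const IPV4_STRICT =
  /\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\b/;
const IPV4_LOOSE = /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/;

console.log(IPV4_STRICT.test("192.168.1.1"));     // true
console.log(IPV4_STRICT.test("255.255.255.255")); // true
console.log(IPV4_STRICT.test("999.999.999.999")); // false
console.log(IPV4_LOOSE.test("999.999.999.999"));  // true (the false positive)
```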
Here's a condensed reference you can come back to. Bookmark this section.
Anchors:

| Pattern | Meaning |
|---|---|
| ^ | Start of string (or line with m flag) |
| $ | End of string (or line with m flag) |
| \b | Word boundary |
| \B | Non-word boundary |

Character classes:

| Pattern | Meaning |
|---|---|
| [abc] | Any of a, b, c |
| [^abc] | Not a, b, or c |
| [a-z] | Range: a through z |
| . | Any character except newline |
| \d / \D | Digit / non-digit |
| \w / \W | Word char / non-word char |
| \s / \S | Whitespace / non-whitespace |

Quantifiers:

| Pattern | Meaning |
|---|---|
| * | 0 or more (greedy) |
| + | 1 or more (greedy) |
| ? | 0 or 1 (optional) |
| {n} | Exactly n |
| {n,m} | Between n and m |
| {n,} | n or more |
| *? / +? | Lazy versions (match as few as possible) |

Groups:

| Pattern | Meaning |
|---|---|
| (abc) | Capture group |
| (?:abc) | Non-capturing group |
| \1 | Backreference to group 1 |
| (a\|b) | Alternation (a or b) |

Lookarounds:

| Pattern | Meaning |
|---|---|
| (?=abc) | Positive lookahead |
| (?!abc) | Negative lookahead |
| (?<=abc) | Positive lookbehind |
| (?<!abc) | Negative lookbehind |
Lookaheads and lookbehinds check for a pattern without including it in the match. For example, \d+(?= dollars) matches the number in "100 dollars" but not the word "dollars" itself.
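Here is the lookahead example as a JavaScript sketch, with a lookbehind added for comparison. The sample strings are invented, and the lookbehind line assumes a JavaScript engine that supports lookbehind (current Node.js and modern browsers do).

```javascript
const price = "The fee is 100 dollars, not 100 euros";

// Only the number followed by " dollars" matches, and " dollars" itself
// is not part of the result.
const dollarAmounts = price.match(/\d+(?= dollars)/g);
console.log(dollarAmounts); // ["100"]

// Lookbehind: digits that come right after a dollar sign.
const afterSign = "Cost: $45".match(/(?<=\$)\d+/);
console.log(afterSign[0]); // "45"
```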
Parentheses do two things in regex: they group elements together, and they capture what they match so you can use it later.
Pattern: (\d{4})-(\d{2})-(\d{2})
Test: "Date: 2026-03-28"
This creates three capture groups:
2026 (year)
03 (month)
28 (day)
In JavaScript, you'd access them like this:
const match = "2026-03-28".match(/(\d{4})-(\d{2})-(\d{2})/);
// match[1] = "2026", match[2] = "03", match[3] = "28"
Named capture groups make your code even more readable:
const match = "2026-03-28".match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/);
// match.groups.year = "2026"
If you just need grouping without capturing (for performance or clarity), use (?:...):
(?:https?|ftp):\/\/
This groups https? and ftp for the alternation, but doesn't create a capture group.
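The effect on the match result can be seen directly in JavaScript. In this sketch, the length of the array returned by match (for a non-global pattern) is one plus the number of capture groups; the URL is a made-up example.

```javascript
const address = "ftp://files.example.com";

// Capturing group: the scheme is stored as match[1].
const capturing = address.match(/(https?|ftp):\/\//);
console.log(capturing.length, capturing[1]); // 2 "ftp"

// Non-capturing group: same match, nothing stored beyond match[0].
const nonCapturing = address.match(/(?:https?|ftp):\/\//);
console.log(nonCapturing.length, nonCapturing[0]); // 1 "ftp://"
```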
Every web form needs input validation. Regex handles the pattern-matching part:
const patterns = {
email: /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/,
phone: /^\+?[\d\s\-()]{7,15}$/,
zipCode: /^\d{5}(-\d{4})?$/,
username: /^[a-zA-Z0-9_]{3,20}$/,
};
Server logs follow predictable patterns. Regex can extract structured data from them:
Pattern: (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(\w+)\] (.+)
Applied to a log line like 2026-03-28 14:30:05 [ERROR] Database connection timeout, this captures the timestamp, log level, and message as separate groups.
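A sketch of how that pattern might be used in JavaScript follows. The function name parseLogLine and the shape of the returned object are illustrative choices, not part of any logging library.

```javascript
const LOG_PATTERN = /(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(\w+)\] (.+)/;

// Returns the three captured fields, or null when the line has another shape.
function parseLogLine(line) {
  const match = LOG_PATTERN.exec(line);
  if (!match) return null;
  const [, timestamp, level, message] = match; // skip match[0], the full match
  return { timestamp, level, message };
}

const entry = parseLogLine(
  "2026-03-28 14:30:05 [ERROR] Database connection timeout"
);
console.log(entry);
// { timestamp: "2026-03-28 14:30:05", level: "ERROR",
//   message: "Database connection timeout" }
```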
Regex-powered find-and-replace is one of the most time-saving features in any code editor. For example, converting function calls from one format to another:
Find: console\.log\((['"`])(.+?)\1\)
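Replace: logger.info($1$2$1)
This converts console.log("message") to logger.info("message") while handling different quote styles. The \1 in the find pattern requires the closing quote to match the opening one, and $1$2$1 in the replacement puts the same quotes back around the message. Using $2 alone would drop the quotes and turn the string into a bare identifier.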
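The same find-and-replace can be sketched with JavaScript's String.prototype.replace. The replacement used here is $1$2$1, which re-emits the captured quote on both sides of the message so the result is still a string literal; that is one possible choice. The input lines are invented, and logger is assumed to exist in the target codebase.

```javascript
// Hypothetical source text to rewrite.
const source = [
  'console.log("message")',
  "console.log('done')",
].join("\n");

// (['"`]) captures the opening quote, (.+?) the message, and \1 requires
// the same quote character to close the string.
const FIND = /console\.log\((['"`])(.+?)\1\)/g;

// $1$2$1 writes back: quote, message, quote.
const rewritten = source.replace(FIND, "logger.info($1$2$1)");
console.log(rewritten);
// logger.info("message")
// logger.info('done')
```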
When parsing HTML or semi-structured text, regex helps extract specific pieces:
Pattern: <title>(.*?)<\/title>
This extracts the content between <title> tags. The .*? uses a lazy quantifier — matching as few characters as possible — which prevents it from matching across multiple tags.
For complex HTML parsing, a proper DOM parser is usually better. But for quick extraction tasks, regex is fast and effective.
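For the quick-extraction case, a JavaScript sketch might look like this. The helper name extractTitle is hypothetical, and the pattern assumes the title sits on a single line, since . does not match newlines by default.

```javascript
// Returns the text between the first <title> pair, or null if there is none.
function extractTitle(html) {
  const match = html.match(/<title>(.*?)<\/title>/);
  return match ? match[1] : null;
}

console.log(extractTitle("<head><title>Regex Guide</title></head>")); // "Regex Guide"
console.log(extractTitle("<p>no title here</p>"));                    // null
```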
Don't try to write the perfect pattern on the first attempt. Start with a basic version that matches your target, then add constraints to eliminate false positives.
For matching a date, start with \d{2}/\d{2}/\d{4} before adding range validation for months and days.
Replace (http|https) with (?:http|https) or even simpler, https?. Unnecessary capture groups waste memory and make your code harder to read.
When extracting content between delimiters, use .*? (lazy) instead of .* (greedy):
Greedy: <tag>(.*)</tag> (may match across multiple tags)
Lazy: <tag>(.*?)</tag> (matches the shortest possible content)
Most regex engines support flags that change matching behavior:
g (global): find all matches, not just the first
i (case-insensitive): a matches A
m (multiline): ^ and $ match line boundaries, not just string boundaries
s (dotAll): . matches newline characters too
In languages that support the verbose flag (Python's re.VERBOSE, for example), break your regex across multiple lines with comments:
import re

pattern = re.compile(r"""
    ^                    # Start of string
    [a-zA-Z0-9._%+-]+    # Local part of email
    @                    # @ symbol
    [a-zA-Z0-9.-]+       # Domain name
    \.[a-zA-Z]{2,}       # Top-level domain
    $                    # End of string
""", re.VERBOSE)
The most common regex bug is greedy matching. Given the text <b>bold</b> and <b>more bold</b>, the pattern <b>.*</b> matches the entire string from the first <b> to the last </b>. Use <b>.*?</b> to match each bold section individually.
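The greedy and lazy versions can be compared on that exact text. A minimal JavaScript sketch, with variable names chosen for illustration:

```javascript
const markup = "<b>bold</b> and <b>more bold</b>";

// Greedy: .* runs to the end and backs up only to the LAST </b>.
const greedy = markup.match(/<b>.*<\/b>/g);
console.log(greedy); // ["<b>bold</b> and <b>more bold</b>"]

// Lazy: .*? stops at the FIRST </b> it can reach, once per section.
const lazy = markup.match(/<b>.*?<\/b>/g);
console.log(lazy); // ["<b>bold</b>", "<b>more bold</b>"]
```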
Characters like ., *, +, ?, (, ), [, ], {, }, \, ^, $, and | have special meaning. To match them literally, escape with a backslash: \. matches an actual period, \$ matches a dollar sign.
A common mistake: trying to match file.txt with file.txt — the unescaped . matches any character. Use file\.txt instead.
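The file.txt mistake is easy to reproduce. The sketch below also includes a small helper, escapeRegExp, for building a pattern from arbitrary text; the helper is a hypothetical utility written for this example, not a built-in function.

```javascript
// Unescaped dot: any character is accepted in its place.
console.log(/file.txt/.test("fileXtxt"));  // true (probably not intended)
// Escaped dot: only a literal period is accepted.
console.log(/file\.txt/.test("fileXtxt")); // false
console.log(/file\.txt/.test("file.txt")); // true

// Hypothetical helper: put a backslash in front of every metacharacter.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const literal = new RegExp(escapeRegExp("file.txt"));
console.log(literal.source);           // file\.txt
console.log(literal.test("fileXtxt")); // false
```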
Some regex patterns can take exponentially long to evaluate against certain inputs. This happens when the engine tries too many permutations before failing.
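The classic example is (a+)+$ tested against a string like aaaaaaaaaaaaaaaaX. The engine tries every way of splitting the run of a's between the inner and outer quantifier before concluding there's no match. The number of splits roughly doubles with each additional a, so a few dozen characters are enough to reach billions of steps.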
How to avoid it: Don't nest quantifiers (like (a+)+), use atomic groups or possessive quantifiers when available, and always test your patterns against adversarial inputs.
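One way to apply the first piece of advice is to flatten the nested quantifier. The sketch below assumes that a+$ is an acceptable replacement for (a+)+$ when the captured group is not needed. It checks on short inputs that both patterns agree, and runs only the flattened pattern on the adversarial string.

```javascript
const nested = /(a+)+$/; // nested quantifiers: prone to heavy backtracking
const flat = /a+$/;      // same accept/reject behavior, no nesting

// On short inputs both patterns give identical answers.
const shortInputs = ["aaa", "baaa", "aaaX", ""];
const agree = shortInputs.every((s) => nested.test(s) === flat.test(s));
console.log(agree); // true

// The flattened pattern rejects the adversarial input without the
// explosion in backtracking paths.
console.log(flat.test("aaaaaaaaaaaaaaaaX")); // false
```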
You don't need a regex that validates every possible edge case of the email RFC. You need a regex that catches obvious typos and a confirmation email that handles the rest. Perfect is the enemy of good, especially with regex.
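Reading about regex is useful. Practicing it is essential. Our free regex tester lets you test patterns in real time, with syntax highlighting, match explanations, and a library of common patterns.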
For testing regex inside actual code — seeing how patterns behave in JavaScript, Python, Go, or any of 50+ other languages — try our code playground. Write a complete script, run it in the browser, and see the output instantly.
And if you want a structured path from beginner to confident, our Learning Hub offers interactive lessons that build your regex skills step by step, with exercises that reinforce each concept before moving to the next.
Regular expressions have a reputation for being cryptic, but that reputation comes from people encountering complex patterns before learning the basics. You wouldn't look at advanced calculus and conclude that all math is impossible — you'd start with arithmetic.
The path to regex fluency is straightforward:
The patterns in this guide cover the vast majority of what you'll encounter in day-to-day development. Paste any of them into the regex tester, modify them, break them, fix them. That's how fluency happens.
Regular expressions are a skill that pays dividends for your entire career. Every language you'll ever work with supports them. Every codebase you'll ever maintain contains them. The afternoon you spend learning regex today will save you hundreds of hours over the years ahead.