# The 20 Regex Patterns I Actually Use (After Mass-Deleting the Other 200)
💡 Key Takeaways
- My Journey from Regex Maximalist to Minimalist
- That Time Regex Almost Took Down Our API
- Breaking Down What Actually Matters
- The 20 Patterns That Survived the Purge
I once wrote an 847-character regex for email validation. It took three hours of my life I'll never get back, complete with nested lookaheads, character class exceptions, and enough backslashes to make my eyes water. I was so proud of it. I posted it in our team Slack with a smug "This handles ALL edge cases" message.
Then someone linked me to RFC 5322.
For those blissfully unaware, RFC 5322 is the official email address specification. The actual, complete regex pattern that validates every technically-legal email address is over 6,000 characters long. It includes things like comments in parentheses, quoted strings with escaped characters, and domain literals in square brackets. Technically, `"very.(),:;<>[]\".VERY.\"very@\\ \"very\".unusual"@strange.example.com` is a valid email address according to the spec.
I stared at my 847-character pattern. Then at the RFC. Then back at my pattern. Then I did what any reasonable developer would do: I replaced it with `/.+@.+\..+/` and moved on with my life. Because nobody actually uses those edge cases. And if they do, they deserve whatever breaks.
That was five years ago. Since then, I've written hundreds of regex patterns. I've debugged regex that made senior developers weep. I've optimized patterns that were causing production slowdowns. And through all of it, I've learned something crucial: most regex patterns are garbage you'll never need.
## My Journey from Regex Maximalist to Minimalist
I used to collect regex patterns like some people collect stamps. I had a massive `regex-library.js` file with patterns for everything imaginable. IPv6 addresses with zone IDs. Credit card numbers with Luhn algorithm validation. URLs that handled every obscure protocol. Social security numbers with area number validation from the 1930s.
The file was 3,200 lines long. I was convinced I was building something valuable—a comprehensive library that would save me time on every project. I even started writing documentation for it, complete with examples and performance benchmarks.
Then I switched jobs.
At my new company, I tried to import my beloved regex library into our codebase. The senior architect took one look at it during code review and asked a simple question: "Which of these have you actually used in the last six months?"
I went through the file with a highlighter. Out of 200+ patterns, I'd used maybe 15. The rest were "just in case" patterns—solutions looking for problems. Patterns I'd written because they were intellectually interesting, not because they solved real issues.
That's when I started my great regex purge. I went through every pattern and asked: "Have I needed this in production? Not 'might I need it someday,' but have I actually needed it?" If the answer was no, it got deleted. No mercy. No "but what if" exceptions.
The file went from 3,200 lines to 400. Then to 200. Then to about 100 lines containing 20 patterns that I actually use regularly. And you know what? I've never once missed the other 180 patterns. Not even a little bit.
## That Time Regex Almost Took Down Our API
Let me tell you about the worst production incident I've ever caused with regex. We had an API endpoint that accepted user-generated content—basically a notes field where users could write whatever they wanted. Simple enough, right?
Except we wanted to detect and auto-link URLs in the text. So I wrote what I thought was a clever regex pattern that would match URLs while avoiding false positives. It had lookaheads to check for valid protocols, character classes for domain names, optional port numbers, path segments, query parameters, and fragment identifiers. It was beautiful. It was comprehensive. It was a catastrophic mistake.
The pattern worked fine in testing. I threw various URLs at it, and it handled them perfectly. I was feeling pretty good about myself when we deployed to production on a Friday afternoon. (Yes, I know. Never deploy on Friday. I learned that lesson the hard way.)
Within an hour, our API response times went from 50ms to 30 seconds. Then timeouts started happening. Our monitoring lit up like a Christmas tree. Users were complaining. My phone was ringing. It was bad.
The culprit? A user had pasted a long string of text that happened to contain patterns that triggered catastrophic backtracking in my regex. The regex engine was trying every possible combination of matches, and with a 5,000-character input string, that meant billions of attempts. Each request was pegging a CPU core at 100% for 30+ seconds before timing out.
We rolled back immediately, and I spent the weekend rewriting that pattern. The new version was simpler, less "clever," and had explicit limits on repetition. It didn't catch every possible URL format—it caught the 99.9% of URLs that people actually use. And it ran in microseconds instead of seconds.
That incident taught me something crucial: regex complexity is a liability, not an asset. The fancier your pattern, the more likely it is to bite you in production. Simple patterns that handle common cases are almost always better than complex patterns that handle every edge case.
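The weekend rewrite can be sketched roughly like this (the pattern and names here are illustrative, not the original code). The point is that every repetition has an explicit upper bound and there are no nested quantifiers, so the engine cannot backtrack catastrophically on hostile input:

```javascript
// Bounded, "boring" URL matcher: explicit repetition limits, no nesting.
// 253 is the DNS hostname length limit; 2000 is a generous path cap.
const URL_PATTERN = /https?:\/\/[a-z0-9.-]{1,253}(?:\/[^\s]{0,2000})?/gi;

function autoLink(text) {
  // Replace each matched URL with an anchor tag; anything that doesn't
  // look like a plain http(s) URL is left untouched.
  return text.replace(URL_PATTERN, (url) => `<a href="${url}">${url}</a>`);
}
```

This deliberately misses exotic URL formats, and in real use the matched text should still be HTML-escaped before being embedded in markup. It is a starting point, not a spec-complete linker.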
## Breaking Down What Actually Matters
After years of writing regex and learning from my mistakes, I've developed a simple framework for deciding which patterns are worth keeping. It comes down to three criteria:
**Frequency:** Do I use this pattern at least once a month? If not, I can Google it when I need it. There's no point memorizing or maintaining patterns for rare use cases.

**Reliability:** Does this pattern work consistently across different regex engines? JavaScript, Python, and Go all have slightly different regex implementations. Patterns that rely on fancy features might not be portable.

**Performance:** Does this pattern run in linear time, or can it trigger catastrophic backtracking? I've learned to be paranoid about nested quantifiers and overlapping alternatives.

Using these criteria, most patterns don't make the cut. That fancy regex for parsing ISO 8601 dates with timezone offsets and week numbers? Fails the frequency test—I need it maybe twice a year, and when I do, I can look it up. The pattern for validating IBAN bank account numbers? Fails the reliability test—it's so complex that I don't trust myself to maintain it. The recursive pattern for matching nested parentheses? Fails the performance test—it's a backtracking nightmare waiting to happen.
What's left are patterns that are simple, fast, and solve problems I encounter regularly. They're not the most interesting patterns. They're not the ones that make you feel clever. But they're the ones that actually matter.
The best regex pattern is the one you can understand six months later at 2 AM when production is down and you're trying to figure out why user input is breaking your validation.
## The 20 Patterns That Survived the Purge
Here's the complete list of regex patterns I actually use, organized by category. These are the survivors—the patterns that proved their worth through repeated use in real projects.
| Pattern | Use Case | Frequency | Notes |
|---|---|---|---|
| `/^\s+\|\s+$/g` | Trim whitespace | Daily | Yes, I know `.trim()` exists, but this works in more contexts |
| `/\s+/g` | Normalize whitespace | Daily | Replace multiple spaces with single space |
| `/[^a-z0-9]/gi` | Strip special chars | Weekly | For slugs, usernames, etc. |
| `/^[a-z0-9_-]{3,16}$/i` | Username validation | Weekly | Alphanumeric, underscore, hyphen, 3-16 chars |
| `/^.{8,}$/` | Password length | Weekly | At least 8 characters, that's it |
| `/.+@.+\..+/` | Email validation | Weekly | Good enough for 99.9% of cases |
| `/^https?:\/\//i` | Check for URL protocol | Weekly | Just http or https, nothing fancy |
| `/\d+/g` | Extract numbers | Daily | Simple and fast |
| `/^\d+$/` | Validate numeric input | Weekly | Only digits, nothing else |
| `/^[0-9]{4}-[0-9]{2}-[0-9]{2}$/` | Date format YYYY-MM-DD | Monthly | Format check only, not validation |
| `/^#?([a-f0-9]{6}\|[a-f0-9]{3})$/i` | Hex color codes | Monthly | With or without hash |
| `/\$\{([^}]+)\}/g` | Template variables | Monthly | Match ${variable} patterns |
| `/<!--[\s\S]*?-->/g` | HTML comments | Monthly | For stripping comments |
| `/\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/` | IPv4 addresses | Monthly | Format check, not range validation |
| `/^[a-z0-9-]+$/i` | Slug validation | Weekly | Lowercase, numbers, hyphens only |
| `/\r?\n/g` | Line breaks | Weekly | Handle both Unix and Windows |
| `/[<>]/g` | Basic XSS prevention | Weekly | Strip angle brackets, not comprehensive |
| `/^\/.*\/[gimuy]*$/` | Detect regex literals | Rarely | For parsing user input |
| `/\b[A-Z]{2,}\b/g` | Find acronyms | Rarely | Two or more consecutive capitals |
| `/(\w+)=(['"])(.*?)\2/g` | Parse attributes | Rarely | Match key="value" or key='value' |
Notice what's missing from this list? No credit card validation. No phone number parsing. No complex URL validation. No social security numbers. No postal codes. All of those patterns exist, and they're all more trouble than they're worth.
For credit cards, I use Stripe's validation library. For phone numbers, I use libphonenumber. For URLs, I use the browser's built-in URL parser. For everything else, I either use a specialized library or I keep the validation simple and handle edge cases in application logic.
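As a sketch of that last point, here is how the built-in `URL` parser (available in browsers and Node) can carry the structural work that a complex URL regex would otherwise do; `isHttpUrl` is an illustrative helper name, not an API:

```javascript
// A cheap regex gate followed by the platform's real URL parser.
function isHttpUrl(input) {
  if (!/^https?:\/\//i.test(input)) return false; // quick protocol check
  try {
    const url = new URL(input); // throws on structurally invalid URLs
    return url.protocol === "http:" || url.protocol === "https:";
  } catch {
    return false;
  }
}
```

The regex stays trivial because it only answers "is this even worth parsing?"; the parser handles hostnames, ports, query strings, and all the cases a hand-rolled pattern would get wrong.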
## Why "Good Enough" Beats "Technically Correct"
There's a common trap that developers fall into with regex: the pursuit of technical perfection. We want patterns that handle every edge case, every obscure format, every possible variation. It feels like good engineering.
It's not.
Let me give you a concrete example. Email validation. The technically correct regex for email addresses according to RFC 5322 is a monster. It allows quoted strings, comments, IP addresses in brackets, and all sorts of weird stuff. But here's the reality: if someone tries to use an email address like `"john..doe"@example.com`, your application is probably going to have problems anyway.
Most email services don't accept these edge case formats. Most users don't use them. Most importantly, most of the time when someone enters a weird email address, it's a typo, not an intentional use of an obscure RFC feature.
So my email validation pattern is `/.+@.+\..+/`. That's it. Something, then an at sign, then something, then a dot, then something. It catches 99.9% of valid emails and rejects obvious garbage. If someone has an email address that doesn't match this pattern, they can contact support, and we'll handle it manually.
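Wrapped in a helper (the name is illustrative), the whole check is three lines:

```javascript
// "Good enough" email check: something, @, something, dot, something.
// Deliberately ignores RFC 5322 edge cases like quoted local parts.
const EMAIL_PATTERN = /.+@.+\..+/;

function looksLikeEmail(input) {
  return EMAIL_PATTERN.test(input);
}
```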
This approach has served me well across dozens of projects. I've had exactly zero complaints about email validation being too strict. Meanwhile, I've seen other developers spend hours debugging complex email regex patterns that were causing false negatives or, worse, performance issues.
The goal of validation isn't to be technically perfect according to some RFC. The goal is to catch obvious errors while letting legitimate users through. Simple patterns do this better than complex ones.
The same principle applies to almost every validation scenario. For URLs, I don't try to validate the entire URL structure. I just check if it starts with `http://` or `https://`. If someone wants to use an FTP URL or a data URI, they can, but I'm not going to complicate my regex to handle those cases.
For usernames, I don't try to support every possible Unicode character. I stick to alphanumeric characters, underscores, and hyphens. If someone wants emoji in their username, that's a product decision, not a regex decision.
For passwords, I don't enforce complex character requirements with regex. I just check the length. Study after study has shown that length matters more than complexity, and trying to enforce "must have uppercase, lowercase, number, and special character" with regex is a nightmare.
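A minimal sketch of these three rules as validators, using the patterns from the table above (function names are mine, not from any library):

```javascript
// Username: alphanumeric, underscore, hyphen, 3-16 characters.
const USERNAME_PATTERN = /^[a-z0-9_-]{3,16}$/i;

function isValidUsername(name) {
  return USERNAME_PATTERN.test(name);
}

// Password: length is the only rule. At least 8 characters, any characters.
function isValidPassword(password) {
  return /^.{8,}$/.test(password);
}
```

One quirk worth knowing: `.` doesn't match newlines, so a password containing a line break would fail the regex version; checking `password.length >= 8` avoids that edge entirely.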
## Common Regex Mistakes I See in Code Reviews
After reviewing hundreds of pull requests over the years, I've noticed the same regex mistakes appearing over and over. These aren't syntax errors—they're conceptual mistakes that lead to bugs, performance issues, or maintenance nightmares.
**Mistake #1: Using regex when you don't need it.** I see this constantly. Someone needs to check if a string starts with "http", so they write `/^http/.test(str)`. Just use `str.startsWith('http')`. It's faster, more readable, and less error-prone. Regex is a tool, not a religion.

**Mistake #2: Not escaping special characters.** Someone needs to match a literal dot, so they write `/./` and wonder why it matches every character. Or they need to match a dollar sign and write `/$100/` and get confused when it doesn't work. Special characters need escaping: `\.`, `\$`, `\*`, etc.

**Mistake #3: Forgetting about anchors.** A pattern like `/\d{4}/` will match "1234" anywhere in the string, including in the middle of "abc12345def". If you want to match exactly four digits and nothing else, you need anchors: `/^\d{4}$/`. This is probably the most common source of validation bugs I see.

**Mistake #4: Catastrophic backtracking.** Patterns like `/(a+)+b/` or `/(.*)*$/` can cause exponential time complexity. The regex engine tries every possible way to match the pattern, and with nested quantifiers, that's a lot of possibilities. Always be suspicious of nested quantifiers.

**Mistake #5: Not considering Unicode.** A pattern like `/\w+/` matches word characters, but "word character" means different things in different contexts. In JavaScript, `\w` matches `[A-Za-z0-9_]`, which doesn't include accented characters or non-Latin scripts. If you need to match international text, you need to think carefully about character classes.

**Mistake #6: Trying to parse HTML with regex.** This deserves its own section, but I'll keep it brief: don't parse HTML with regex. HTML is not a regular language. Nested tags, attributes, comments, CDATA sections—regex can't handle it reliably. Use a proper HTML parser. I've seen so many bugs caused by someone trying to extract data from HTML with regex.

**Mistake #7: Not testing with real data.** Someone writes a pattern, tests it with a few examples, and ships it. Then it breaks in production because real user input is messier than test data. Always test regex patterns with actual production data, including edge cases and malformed input.

The best way to avoid regex mistakes is to use less regex. Every pattern you don't write is a pattern that can't have bugs.
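The anchors mistake is the easiest to demonstrate in a few lines:

```javascript
// Without anchors, /\d{4}/ matches four digits buried anywhere in the
// string. With ^...$ the pattern must account for the whole input.
const anywhere = /\d{4}/;
const exact = /^\d{4}$/;

const anywhereMatches = anywhere.test("abc12345def"); // "1234" is in there
const exactMatches = exact.test("abc12345def");       // extra characters fail
const exactValid = exact.test("2024");                // exactly four digits
```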
## Practical Guidelines for Regex in Production
Based on my experience and my mistakes, here are the rules I follow when writing regex for production code:
- Start simple and only add complexity when necessary. Begin with the simplest pattern that could work. If it's too permissive, tighten it. Don't start with a complex pattern and try to simplify it—you'll never know if you're breaking edge cases.
- Always use anchors for validation. If you're validating that an entire string matches a pattern, use `^` and `$`. Without anchors, your pattern might match a substring and give false positives.
- Limit repetition explicitly. Instead of unbounded quantifiers like `.*` or `+`, use explicit ranges like `.{0,100}`. Unbounded repetition on untrusted input is where catastrophic backtracking usually hides.
- Test with malicious input. Don't just test happy paths. Try to break your pattern. Long strings, repeated characters, special characters, Unicode, empty strings—throw everything at it and see what happens.
- Document what you're matching and why. Future you (or your teammates) will thank you. A comment like "// Match YYYY-MM-DD format, doesn't validate actual dates" is incredibly helpful.
- Use named capture groups when extracting data. Instead of `/(\d{4})-(\d{2})-(\d{2})/`, use `/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/`. It makes the code self-documenting and prevents bugs from reordering groups.
- Consider alternatives to regex. String methods, parsers, validation libraries—there are often better tools for the job. Regex should be your tool of choice for pattern matching, not your only tool.
- Keep patterns short. If your regex is more than one line, it's probably too complex. Break it into multiple simpler patterns or use a different approach entirely.
- Avoid regex for parsing structured data. JSON, XML, CSV, HTML—these all have proper parsers. Don't try to extract data from them with regex. It seems convenient until it breaks.
- Profile performance with realistic data. A pattern that runs in microseconds with 10-character strings might take seconds with 10,000-character strings. Test with production-scale data before deploying.
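The named-capture-group guideline above can be sketched like this (named groups are ES2018+; `parseDate` is an illustrative name):

```javascript
// Named captures make extracted data self-documenting: reordering groups
// in the pattern can't silently break the destructuring below.
const DATE_PATTERN = /^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/;

function parseDate(input) {
  const match = DATE_PATTERN.exec(input);
  if (!match) return null;
  const { year, month, day } = match.groups; // named captures live here
  return { year, month, day };
}
```

Note that this only checks format, per the guideline about documenting what a pattern doesn't do: "2024-99-99" would still parse.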
These guidelines have saved me from countless bugs and performance issues. They're not theoretical best practices—they're lessons learned from real production incidents.
## The Cheat Sheet That Lives on My Second Monitor
Here's the actual cheat sheet I keep visible while coding. It's not comprehensive—it's practical. These are the patterns and techniques I reference constantly.
Quick Reference Patterns:

```
Email (good enough): /.+@.+\..+/
Username (safe): /^[a-z0-9_-]{3,16}$/i
Slug (URL-safe): /^[a-z0-9-]+$/i
Hex color: /^#?([a-f0-9]{6}|[a-f0-9]{3})$/i
IPv4 (format only): /\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/
Date YYYY-MM-DD: /^[0-9]{4}-[0-9]{2}-[0-9]{2}$/
```
Common Character Classes:

```
\d Digit [0-9]
\w Word character [A-Za-z0-9_]
\s Whitespace (space, tab, newline)
. Any character except newline
\b Word boundary
```
Quantifiers (use sparingly):

```
*     0 or more (avoid in production)
+     1 or more (avoid in production)
? 0 or 1
{n} Exactly n times
{n,} At least n times
{n,m} Between n and m times (prefer this)
```
Anchors (always use for validation):

```
^ Start of string
$ End of string
\b Word boundary
```
Flags I actually use:

```
g Global (find all matches)
i Case-insensitive
m Multiline (^ and $ match line breaks)
```
Quick Escaping Reference:

```
Need to match literally:   . * + ? ^ $ { } [ ] ( ) | \
Escape with backslash:     \. \* \+ \? \^ \$ \{ \} \[ \] \( \) \| \\
```
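A common companion to this escaping table is a small helper that escapes user input before embedding it in a pattern. This is a sketch of a widely used idiom, not a built-in:

```javascript
// Escape every character with special regex meaning so arbitrary text can
// be embedded in a pattern as a literal.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Example use: search for a literal substring that contains metacharacters.
function containsLiteral(haystack, needle) {
  return new RegExp(escapeRegExp(needle)).test(haystack);
}
```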
Performance Checklist:

- [ ] No nested quantifiers like `(.*)*` or `(a+)+`
- [ ] Explicit upper bounds on repetition: `{1,100}`, not `+` or `*`
- [ ] Anchors used for validation `^...$`
- [ ] Tested with long strings (10,000+ characters)
- [ ] No catastrophic backtracking patterns
When NOT to use regex:

- Parsing JSON, XML, HTML, or other structured formats
- Validating complex formats (use libraries instead)
- When a simple string method would work (startsWith, includes, etc.)
- When you need to maintain state across matches
- When the pattern is longer than 2 lines
Before shipping a pattern:

- Test the pattern at regex101.com with real data
- Check for catastrophic backtracking with long inputs
- Verify anchors are correct for the use case
- Test with edge cases: empty string, very long string, special characters
- Add explicit length limits if using `*` or `+`
- Document what the pattern matches and what it doesn't
This cheat sheet isn't fancy. It doesn't cover every regex feature. It doesn't explain lookaheads or backreferences or atomic groups. That's intentional. These are the patterns and techniques that solve 95% of my regex needs. The other 5%? I Google them when I need them, use them once, and forget them again.
The key insight from my journey from 200 patterns to 20 is this: regex is a tool for pattern matching, not a programming language. The simpler your patterns, the more reliable your code. The fewer patterns you memorize, the more mental energy you have for actual problem-solving.
I still have that old 847-character email validation pattern saved somewhere. I look at it occasionally as a reminder of how far I've come. It's a monument to over-engineering, a testament to the trap of pursuing technical perfection over practical utility.
These days, when I need to validate an email, I use `/.+@.+\..+/` and move on with my life. It's not perfect. It's not comprehensive. It's not technically correct according to RFC 5322. But it works, it's fast, and I can understand it at 2 AM when production is down.
And really, that's all that matters.