What Are Regular Expressions
Regular expressions (regex) are a pattern language for searching and processing text. They allow you to find substrings based on complex rules, validate data formats, extract information, and perform replacements. Regular expressions are supported by virtually all programming languages: JavaScript, Python, PHP, Java, Go, C#, and others.
A regular expression is a string consisting of literal characters and metacharacters (characters with special meaning). For example, the expression \d{3}-\d{2}-\d{2} describes a phone number pattern in the format 123-45-67.
Basic Syntax
Here are the core elements of regular expression syntax:
.— any single character (except newline).\d— any digit (equivalent to[0-9]).\w— a letter, digit, or underscore (equivalent to[a-zA-Z0-9_]).\s— a whitespace character (space, tab, newline).^— start of string.$— end of string.*— zero or more repetitions of the preceding element.+— one or more repetitions.?— zero or one repetition (the element is optional).{n}— exactly n repetitions.{n,m}— from n to m repetitions.[abc]— any of the characters in the brackets.[^abc]— any character except those listed.(group)— a capture group.|— logical OR.
Common Patterns
Email Validation
A basic pattern for validating an email address:
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
This pattern checks for characters before the @, a domain name, and a top-level domain (at least 2 characters). For production code, it is recommended to use stricter validation libraries, since the full regex for email per RFC 5322 is extremely complex.
Phone Number Validation
A Russian phone number in the format +7 (XXX) XXX-XX-XX:
^\+7\s?\(?\d{3}\)?\s?\d{3}[-\s]?\d{2}[-\s]?\d{2}$
URL Validation
Basic HTTP/HTTPS URL check:
^https?:\/\/[\w\-]+(\.[\w\-]+)+[\/\w\-._~:?#\[\]@!$&'()*+,;=%]*$
IPv4 Address
Validating a correct IPv4 address (from 0.0.0.0 to 255.255.255.255):
^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$
Regular Expression Flags
Flags modify the behavior of the entire regular expression:
g(global) — find all matches, not just the first one.i(case-insensitive) — ignore character case.m(multiline) —^and$work per line, not for the entire text.s(dotAll) — the dot.matches any character, including newlines.u(unicode) — correct handling of Unicode characters (important for Cyrillic).
Capture Groups and Backreferences
Parentheses () create capture groups — you can extract part of a match:
(\d{2})\.(\d{2})\.(\d{4}) — for a date in DD.MM.YYYY format, you get the day, month, and year as separate groups.
Named groups make code more readable:
(?<day>\d{2})\.(?<month>\d{2})\.(?<year>\d{4})
A non-capturing group (?:...) is used to group elements without saving them to the result.
Look-ahead and Look-behind
These constructs let you check context without including it in the match result:
(?=...)— positive look-ahead. Example:\d+(?= USD)— a number followed by " USD".(?!...)— negative look-ahead. Example:\d+(?! USD)— a number NOT followed by " USD".(?<=...)— positive look-behind. Example:(?<=\$)\d+— a number preceded by a dollar sign.(?<!...)— negative look-behind. Example:(?<!\$)\d+— a number NOT preceded by a dollar sign.
Practical Tips
- Start simple. Build your regex step by step, testing each part. Do not try to write the perfect expression in one go.
- Use a tester. Our regular expression tester lets you check patterns in real time with match highlighting.
- Be careful with greedy quantifiers.
.*captures the longest possible substring by default. Use the lazy variant.*?when you need the shortest match. - Do not reinvent the wheel. For validating email, URLs, and other standard formats, use proven libraries rather than custom regex.
- Comment complex expressions. Use the
xflag (verbose mode) to add spaces and comments inside your regex.
Frequently Asked Questions
Where are regular expressions most commonly used?
In user input validation (website forms), search and replace in text editors (VS Code, Sublime Text), log and data parsing, URL routing in web frameworks, and string processing in scripts. Regex is also useful when working with JSON data to search for specific patterns.
Can regular expressions be slow?
Yes, poorly written regex can cause catastrophic backtracking, where execution time grows exponentially. Avoid nested quantifiers like (a+)+ and test performance on long strings. This is especially important on the server side, where regex processes user input.
How do you work with Cyrillic in regex?
Use the Unicode range [\u0430-\u044f\u0410-\u042f\u0451\u0401] or the Unicode category \p{Cyrillic} (if the engine supports Unicode properties). In JavaScript, do not forget the u flag. The \w class does not include Cyrillic by default.
How can I test a regular expression online?
Open our online regex tester, enter your expression and test text. The tool highlights all matches in real time, shows capture groups, and explains errors. It is free and requires no registration.