Regular Expressions: A Guide with Examples

Александр Михеев

02 August 2024 · 5 min read

What Are Regular Expressions

Regular expressions (regex) are a pattern language for searching and processing text. They allow you to find substrings based on complex rules, validate data formats, extract information, and perform replacements. Regular expressions are supported by virtually all programming languages: JavaScript, Python, PHP, Java, Go, C#, and others.

A regular expression is a string consisting of literal characters and metacharacters (characters with special meaning). For example, the expression \d{3}-\d{2}-\d{2} describes a phone number pattern in the format 123-45-67.
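
As a quick illustration, the pattern above can be tried with Python's standard re module. This is a minimal sketch; the sample text is made up for the example.

```python
import re

# Pattern from the text: three digits, a hyphen, two digits, a hyphen, two digits.
PHONE = re.compile(r"\d{3}-\d{2}-\d{2}")

# Sample text (an assumption for this example).
text = "Call 123-45-67 or 987-65-43 after lunch."

first = PHONE.search(text)          # first match object, or None
all_numbers = PHONE.findall(text)   # every non-overlapping match as a string

print(first.group())   # 123-45-67
print(all_numbers)     # ['123-45-67', '987-65-43']
```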

Basic Syntax

Here are the core elements of regular expression syntax:

  • . — any single character (except newline).
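  • \d — any digit (equivalent to [0-9] in ASCII mode; Unicode-aware engines such as Python 3 also match other decimal digits).
  • \w — a letter, digit, or underscore (equivalent to [a-zA-Z0-9_] in ASCII mode).
  • \s — a whitespace character (space, tab, newline, carriage return, and similar).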
  • ^ — start of string.
  • $ — end of string.
  • * — zero or more repetitions of the preceding element.
  • + — one or more repetitions.
  • ? — zero or one repetition (the element is optional).
  • {n} — exactly n repetitions.
  • {n,m} — from n to m repetitions.
  • [abc] — any of the characters in the brackets.
  • [^abc] — any character except those listed.
  • (group) — a capture group.
  • | — logical OR.
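
The elements listed above can be checked one at a time. The sketch below pairs a few small patterns with a string each should match in full; the patterns and sample strings are illustrative choices, not taken from any particular project.

```python
import re

# Each pattern is paired with a string it should match completely.
samples = {
    r"c.t": "cat",                  # . matches any single character
    r"\d+": "2024",                 # + means one or more digits
    r"colou?r": "color",            # ? makes the "u" optional
    r"[abc]{2}": "ab",              # character class, exactly two repetitions
    r"[^abc]+": "xyz",              # negated class: anything except a, b, c
    r"(cat|dog)s?": "dogs",         # group with alternation, optional "s"
    r"\w+\s\w+": "hello world",     # word, whitespace, word
}


def matches_whole(pattern: str, text: str) -> bool:
    """Return True if the pattern matches the entire string."""
    return re.fullmatch(pattern, text) is not None


results = {pattern: matches_whole(pattern, text) for pattern, text in samples.items()}
for pattern, ok in results.items():
    print(f"{pattern!r:20} {ok}")

# {n,m} sets an upper bound too: four digits do not fit {2,3}.
too_long = matches_whole(r"\d{2,3}", "1234")
```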

Common Patterns

Email Validation

A basic pattern for validating an email address:

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

This pattern checks for allowed characters before the @, a domain name, and a top-level domain of at least 2 letters. It is deliberately loose: it accepts some invalid addresses (for example, consecutive dots in the domain) and rejects some valid ones. For production code, prefer a dedicated validation library, since a full RFC 5322 regex is extremely complex.
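
A minimal sketch of using this pattern in Python follows. The function name looks_like_email is hypothetical. The sketch uses fullmatch, because in Python $ also matches just before a trailing newline.

```python
import re

# The basic email pattern from the text.
EMAIL = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")


def looks_like_email(value: str) -> bool:
    """Loose format check only; it does not prove the address exists."""
    # fullmatch requires the whole string to be consumed, so a trailing
    # newline is rejected even though "$" alone would tolerate it.
    return EMAIL.fullmatch(value) is not None


print(looks_like_email("user.name+tag@example.com"))  # True
print(looks_like_email("user@localhost"))             # False: no top-level domain
print(looks_like_email("user@example.com\n"))         # False: trailing newline
# Known limitation of the basic pattern: consecutive dots are accepted.
print(looks_like_email("user@example..com"))          # True
```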

Phone Number Validation

A Russian phone number in the format +7 (XXX) XXX-XX-XX, with the spaces, parentheses, and hyphens all optional:

^\+7\s?\(?\d{3}\)?\s?\d{3}[-\s]?\d{2}[-\s]?\d{2}$
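
One possible use of this pattern is to validate input and then store the number in a single canonical form. The sketch below assumes that goal; normalize_ru_phone is a hypothetical helper name.

```python
import re
from typing import Optional

# The phone pattern from the text.
RU_PHONE = re.compile(r"^\+7\s?\(?\d{3}\)?\s?\d{3}[-\s]?\d{2}[-\s]?\d{2}$")


def normalize_ru_phone(raw: str) -> Optional[str]:
    """Return the number as +7XXXXXXXXXX, or None if it does not match."""
    if RU_PHONE.fullmatch(raw) is None:
        return None
    # Drop everything that is not a digit, then restore the leading plus.
    return "+" + re.sub(r"\D", "", raw)


print(normalize_ru_phone("+7 (912) 345-67-89"))  # +79123456789
print(normalize_ru_phone("+79123456789"))        # +79123456789
print(normalize_ru_phone("8 (912) 345-67-89"))   # None
# Each parenthesis is optional on its own, so unbalanced input passes too.
print(normalize_ru_phone("+7 (912 345-67-89"))   # +79123456789
```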

URL Validation

Basic HTTP/HTTPS URL check:

^https?:\/\/[\w\-]+(\.[\w\-]+)+[\/\w\-._~:?#\[\]@!$&'()*+,;=%]*$
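
The sketch below runs this pattern in Python and, for comparison, shows a check built on the standard urllib.parse module. Both function names are hypothetical, and the stdlib variant is only one possible alternative, not a full URL validator.

```python
import re
from urllib.parse import urlsplit

# The URL pattern from the text.
URL = re.compile(r"^https?:\/\/[\w\-]+(\.[\w\-]+)+[\/\w\-._~:?#\[\]@!$&'()*+,;=%]*$")


def is_http_url_regex(value: str) -> bool:
    return URL.fullmatch(value) is not None


def is_http_url_stdlib(value: str) -> bool:
    """Alternative check: parse the URL, then inspect scheme and host."""
    parts = urlsplit(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(is_http_url_regex("https://example.com/path?q=1#top"))  # True
print(is_http_url_regex("ftp://example.com"))                 # False
# The pattern requires at least one dot in the host name,
# so a host such as localhost is rejected by the regex only.
print(is_http_url_regex("http://localhost:8080"))             # False
print(is_http_url_stdlib("http://localhost:8080"))            # True
```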

IPv4 Address

Validating an IPv4 address (each octet from 0 to 255; octets with leading zeros such as 01 are also accepted):

^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$
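
A short sketch of this pattern in Python, next to a check based on the standard ipaddress module. The function names are hypothetical. How the stdlib treats leading zeros depends on the Python version, so the example does not rely on that case.

```python
import ipaddress
import re

# The IPv4 pattern from the text.
IPV4 = re.compile(r"^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$")


def is_ipv4_regex(value: str) -> bool:
    return IPV4.fullmatch(value) is not None


def is_ipv4_stdlib(value: str) -> bool:
    """Alternative check that delegates parsing to the standard library."""
    try:
        ipaddress.IPv4Address(value)
    except ValueError:
        return False
    return True


print(is_ipv4_regex("192.168.0.1"))   # True
print(is_ipv4_regex("256.1.1.1"))     # False: 256 is out of range
print(is_ipv4_regex("1.2.3"))         # False: only three octets
print(is_ipv4_regex("01.2.3.4"))      # True: leading zeros pass this pattern
print(is_ipv4_stdlib("192.168.0.1"))  # True
```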

Regular Expression Flags

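Flags modify the behavior of the entire regular expression. The letters below follow JavaScript; other engines offer the same modes, sometimes under different names: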

  • g (global) — find all matches, not just the first one.
  • i (case-insensitive) — ignore character case.
  • m (multiline) — ^ and $ work per line, not for the entire text.
  • s (dotAll) — the dot . matches any character, including newlines.
  • u (unicode) — correct handling of Unicode characters (important for Cyrillic).
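
In Python the same modes are spelled differently. The sketch below shows rough equivalents: findall stands in for g, and re.IGNORECASE, re.MULTILINE, and re.DOTALL stand in for i, m, and s. There is no direct u counterpart, because Python 3 string patterns are Unicode-aware by default. The sample text is made up.

```python
import re

text = "Error: disk full\nerror: retry\nOK"

# g: findall (or finditer) returns every match; search returns only the first.
case_sensitive = re.findall(r"error", text)
# i: ignore case.
case_insensitive = re.findall(r"error", text, flags=re.IGNORECASE)

# m: without MULTILINE, ^ anchors only at the very start of the text.
first_word_only = re.findall(r"^\w+", text)
first_word_per_line = re.findall(r"^\w+", text, flags=re.MULTILINE)

# s: without DOTALL, the dot does not cross the newline between the lines.
dot_default = re.search(r"full.error", text)
dot_all = re.search(r"full.error", text, flags=re.DOTALL)

print(case_sensitive)        # ['error']
print(case_insensitive)      # ['Error', 'error']
print(first_word_only)       # ['Error']
print(first_word_per_line)   # ['Error', 'error', 'OK']
print(dot_default)           # None
print(dot_all is not None)   # True
```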

Capture Groups and Backreferences

Parentheses () create capture groups — you can extract part of a match:

(\d{2})\.(\d{2})\.(\d{4}) — for a date in DD.MM.YYYY format, you get the day, month, and year as separate groups.

Named groups make code more readable:

(?<day>\d{2})\.(?<month>\d{2})\.(?<year>\d{4})

A non-capturing group (?:...) is used to group elements without saving them to the result.
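
The sketch below applies these group types to the date pattern in Python. One difference is worth flagging: Python's re module spells named groups (?P<name>...) rather than (?<name>...). The sample strings are made up, and the last lines show a backreference (\1) that finds a repeated word.

```python
import re

DATE = re.compile(r"(\d{2})\.(\d{2})\.(\d{4})")
# Same pattern with named groups, in Python's (?P<name>...) spelling.
NAMED = re.compile(r"(?P<day>\d{2})\.(?P<month>\d{2})\.(?P<year>\d{4})")

sample = "Released on 02.08.2024."

numbered = DATE.search(sample).groups()
named = NAMED.search(sample).groupdict()

# Groups can be referenced in a replacement: DD.MM.YYYY becomes YYYY-MM-DD.
iso = NAMED.sub(r"\g<year>-\g<month>-\g<day>", "02.08.2024")

# (?:...) groups without capturing, so only the year is returned.
years = re.findall(r"(?:\d{2}\.){2}(\d{4})", sample)

# A backreference inside the pattern: \1 repeats whatever group 1 matched.
repeated = re.search(r"\b(\w+) \1\b", "this is is a test").group(1)

print(numbered)   # ('02', '08', '2024')
print(named)      # {'day': '02', 'month': '08', 'year': '2024'}
print(iso)        # 2024-08-02
print(years)      # ['2024']
print(repeated)   # is
```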

Look-ahead and Look-behind

These constructs let you check context without including it in the match result:

  • (?=...) — positive look-ahead. Example: \d+(?= USD) — a number followed by " USD".
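  • (?!...) — negative look-ahead. Example: \d+\b(?! USD) — a number NOT followed by " USD". The \b matters: without it, the engine backtracks and \d+(?! USD) still matches "10" in "100 USD".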
  • (?<=...) — positive look-behind. Example: (?<=\$)\d+ — a number preceded by a dollar sign.
  • (?<!...) — negative look-behind. Example: (?<![\$\d])\d+ — a number NOT preceded by a dollar sign. The \d in the class matters: (?<!\$)\d+ alone still matches "00" in "$100".
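
A sketch of the four constructs in Python, using a made-up price string. It also shows a common pitfall with the negative forms: without a boundary guard, backtracking lets a shorter run of digits slip through.

```python
import re

text = "Prices: 100 USD, 250 EUR, $75, 40"

followed_by_usd = re.findall(r"\d+(?= USD)", text)
preceded_by_dollar = re.findall(r"(?<=\$)\d+", text)

# Naive negative look-ahead: "100" fails, so the engine backtracks to "10".
naive_not_usd = re.findall(r"\d+(?! USD)", text)
# \b forces the match to end at the end of the digit run.
guarded_not_usd = re.findall(r"\d+\b(?! USD)", text)

# Naive negative look-behind: "75" fails at "7", but "5" is preceded by "7".
naive_not_dollar = re.findall(r"(?<!\$)\d+", text)
# Also forbid a preceding digit so the match cannot start mid-number.
guarded_not_dollar = re.findall(r"(?<![\$\d])\d+", text)

print(followed_by_usd)      # ['100']
print(preceded_by_dollar)   # ['75']
print(naive_not_usd)        # ['10', '250', '75', '40']
print(guarded_not_usd)      # ['250', '75', '40']
print(naive_not_dollar)     # ['100', '250', '5', '40']
print(guarded_not_dollar)   # ['100', '250', '40']
```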

Practical Tips

  • Start simple. Build your regex step by step, testing each part. Do not try to write the perfect expression in one go.
  • Use a tester. Our regular expression tester lets you check patterns in real time with match highlighting.
  • Be careful with greedy quantifiers. .* captures the longest possible substring by default. Use the lazy variant .*? when you need the shortest match.
  • Do not reinvent the wheel. For validating email, URLs, and other standard formats, use proven libraries rather than custom regex.
  • Comment complex expressions. Where the engine supports it, use the x flag (verbose mode) to add whitespace and comments inside your regex. Python, PHP, Java, and C# offer this mode; JavaScript does not.
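
Two of these tips are easy to see in code. The sketch below, with made-up sample strings, contrasts greedy and lazy quantifiers and then writes the phone pattern from the introduction in Python's verbose mode (re.VERBOSE, the counterpart of the x flag).

```python
import re

html = "<b>one</b> and <b>two</b>"

# Greedy: .* runs to the last </b> in the string.
greedy = re.findall(r"<b>.*</b>", html)
# Lazy: .*? stops at the first </b> it can reach.
lazy = re.findall(r"<b>.*?</b>", html)

print(greedy)   # ['<b>one</b> and <b>two</b>']
print(lazy)     # ['<b>one</b>', '<b>two</b>']

# Verbose mode ignores unescaped whitespace and treats # as a comment start.
PHONE = re.compile(
    r"""
    \d{3}   # first three digits
    -       # literal hyphen
    \d{2}   # next two digits
    -       # literal hyphen
    \d{2}   # last two digits
    """,
    re.VERBOSE,
)

print(PHONE.fullmatch("123-45-67") is not None)   # True
```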

Frequently Asked Questions

Where are regular expressions most commonly used?

In user input validation (website forms), search and replace in text editors (VS Code, Sublime Text), log and data parsing, URL routing in web frameworks, and string processing in scripts. Regex is also useful when working with JSON data to search for specific patterns.

Can regular expressions be slow?

Yes, poorly written regex can cause catastrophic backtracking, where execution time grows exponentially. Avoid nested quantifiers like (a+)+ and test performance on long strings. This is especially important on the server side, where regex processes user input.
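
The effect can be observed with a small timing sketch. It compares the nested pattern with a flat one that accepts the same strings. The input lengths are kept short on purpose, and the actual timings depend on the machine and the engine, so no expected numbers are shown.

```python
import re
import time

NESTED = re.compile(r"^(a+)+$")   # nested quantifier: prone to backtracking
FLAT = re.compile(r"^a+$")        # accepts the same strings without nesting


def time_match(pattern: "re.Pattern[str]", text: str) -> float:
    """Return the seconds spent on a single match attempt."""
    start = time.perf_counter()
    pattern.match(text)
    return time.perf_counter() - start


for n in (10, 14, 18):
    # A run of "a" followed by "b" almost matches, which forces the
    # engine to try many ways of splitting the run before giving up.
    text = "a" * n + "b"
    print(n, f"nested={time_match(NESTED, text):.5f}s", f"flat={time_match(FLAT, text):.5f}s")
```

Increasing n by a few characters at a time shows how quickly the nested variant slows down, while the flat one stays fast.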

How do you work with Cyrillic in regex?

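Use the Unicode range [\u0430-\u044f\u0410-\u042f\u0451\u0401] or, if the engine supports Unicode properties, a script property. The spelling varies by engine: \p{Cyrillic} in PCRE and Go, \p{Script=Cyrillic} in JavaScript (which requires the u flag). Whether \w includes Cyrillic also depends on the engine: in JavaScript it never does, while in Python 3 it does by default.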
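
A minimal sketch of the range-based approach in Python, with a made-up sample sentence. Python's built-in re module has no \p{...} support, so only the explicit range is shown. The last lines illustrate that engines differ on \w: in Python 3 it is Unicode-aware for string patterns unless re.ASCII is set.

```python
import re

# Explicit range from the text, including both forms of the letter "yo".
CYRILLIC_WORD = re.compile(r"[\u0430-\u044f\u0410-\u042f\u0451\u0401]+")

words = CYRILLIC_WORD.findall("Привет, world! Ёжик и ёлка.")
print(words)   # ['Привет', 'Ёжик', 'и', 'ёлка']

# \w in Python 3 matches Cyrillic letters by default...
unicode_words = re.findall(r"\w+", "Привет world")
# ...and falls back to [a-zA-Z0-9_] only with the ASCII flag.
ascii_words = re.findall(r"\w+", "Привет world", flags=re.ASCII)

print(unicode_words)   # ['Привет', 'world']
print(ascii_words)     # ['world']
```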

How can I test a regular expression online?

Open our online regex tester, enter your expression and test text. The tool highlights all matches in real time, shows capture groups, and explains errors. It is free and requires no registration.
