URL Encoding: Why It Is Needed and How It Works

Александр Михеев

07 August 2024 · 4 min read

What Is URL Encoding

URL encoding (percent-encoding) is a mechanism for converting characters into a format that is safe for use in URLs. The URI standard (RFC 3986) allows only a limited set of ASCII characters to appear literally in addresses: Latin letters, digits, and a few special characters (-, _, ., ~), plus reserved characters that act as delimiters. All other characters, including spaces, Cyrillic, and other non-ASCII symbols, must be encoded.

Each encoded byte is written as a percent sign % followed by two hexadecimal digits. A character that occupies several bytes in UTF-8 produces several such triplets. For example, a space is encoded as %20, and the Cyrillic letter "а" becomes %D0%B0 (two UTF-8 bytes).
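
As a quick check of the two examples above, here is a minimal sketch using Python's standard urllib.parse module. The variable names are illustrative only.

```python
from urllib.parse import quote

# A space is a single byte (0x20), so it becomes one %XX triplet.
encoded_space = quote(" ")

# Cyrillic "а" (U+0430) takes two bytes in UTF-8 (0xD0 0xB0),
# so it becomes two triplets.
encoded_cyrillic = quote("\u0430")

print(encoded_space)     # %20
print(encoded_cyrillic)  # %D0%B0
```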

Why URL Encoding Is Needed

A URL has a strict structure: protocol, host, path, query parameters, fragment. Special characters serve as delimiters: / separates path segments, ? starts the query string, & separates parameters, and # marks the fragment. If these characters appear in the data (for example, in a parameter value), they must be encoded so the browser and server can parse the URL correctly.
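
To make the structure concrete, the sketch below splits a URL into the components listed above with urllib.parse.urlsplit. The URL itself is a made-up example.

```python
from urllib.parse import urlsplit

# Hypothetical URL that contains every component mentioned above.
parts = urlsplit("https://shop.example.com/catalog/shirts?color=blue&size=m#reviews")

print(parts.scheme)    # https
print(parts.netloc)    # shop.example.com
print(parts.path)      # /catalog/shirts
print(parts.query)     # color=blue&size=m
print(parts.fragment)  # reviews
```

The parser relies on /, ?, & and # to find the boundaries, which is why the same characters inside data have to be encoded.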

Example of the problem: suppose you want to pass the search query "blue shirt & white" in a URL:

https://shop.example.com/search?q=blue shirt & white

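Without encoding, & is treated as a parameter separator, so the value of q is cut off at "blue shirt " and " white" is read as a separate, nameless parameter. Raw spaces are not valid in a URL either, and some clients treat the first space as the end of the address. The correct encoded URL: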

https://shop.example.com/search?q=blue%20shirt%20%26%20white
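
The same example can be reproduced in Python. This is a sketch that assumes the standard urllib.parse module. Passing quote_via=quote is a choice made here so that spaces come out as %20, matching the URL above, rather than as +.

```python
from urllib.parse import parse_qs, quote, urlencode

raw_value = "blue shirt & white"

# Unencoded: the parser splits on "&", so the value is cut short.
broken = parse_qs("q=" + raw_value)

# Encoded: quote_via=quote makes spaces %20 instead of "+".
query = urlencode({"q": raw_value}, quote_via=quote)
url = "https://shop.example.com/search?" + query

print(broken)           # {'q': ['blue shirt ']}
print(url)              # https://shop.example.com/search?q=blue%20shirt%20%26%20white
print(parse_qs(query))  # {'q': ['blue shirt & white']}
```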

Which Characters Are Encoded

Characters fall into three groups:

  • Unreserved — do not require encoding: A-Z, a-z, 0-9, -, _, ., ~.
  • Reserved — have special meaning in a URL: :, /, ?, #, [, ], @, !, $, &, ', (, ), *, +, ,, ;, =. Encoded when used outside their intended purpose (e.g., inside a parameter value).
  • Everything else (spaces, Cyrillic, CJK characters, other symbols) is always encoded.
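
The three groups can be observed with Python's urllib.parse.quote. This is an illustrative sketch. The argument safe="" is used so that quote() does not leave / unencoded, which it does by default.

```python
from urllib.parse import quote

# Unreserved characters pass through unchanged.
unreserved = quote("AZaz09-_.~", safe="")

# Reserved characters are encoded when they are treated as data.
reserved_as_data = quote(":/?#&=+", safe="")

# Everything else is always encoded: here a space and a CJK character (U+4E2D).
other = quote(" \u4e2d", safe="")

print(unreserved)        # AZaz09-_.~
print(reserved_as_data)  # %3A%2F%3F%23%26%3D%2B
print(other)             # %20%E4%B8%AD
```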

URL Encoding in Different Languages

JavaScript

JavaScript has two pairs of functions:

  • encodeURIComponent() / decodeURIComponent() — encodes parameter values. Encodes all special characters except - _ . ! ~ * ' ( ).
  • encodeURI() / decodeURI() — encodes an entire URL. Does not encode delimiter characters (:, /, ?, #, &, =).

Important: use encodeURIComponent() for encoding parameter values and encodeURI() for entire URLs. Confusing the two is a common source of bugs.
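
To see the difference without a JavaScript runtime, the sketch below approximates both functions in Python. The helper names encode_uri_component and encode_uri are hypothetical, and the safe-character sets are taken from the documented behavior of the JavaScript functions. Edge cases such as lone surrogates are not reproduced.

```python
from urllib.parse import quote

# Characters encodeURIComponent() leaves unencoded besides letters and digits.
COMPONENT_SAFE = "-_.!~*'()"
# encodeURI() additionally leaves URL delimiters unencoded.
URI_SAFE = COMPONENT_SAFE + ";,/?:@&=+$#"


def encode_uri_component(value: str) -> str:
    """Rough Python analogue of JavaScript's encodeURIComponent()."""
    return quote(value, safe=COMPONENT_SAFE)


def encode_uri(url: str) -> str:
    """Rough Python analogue of JavaScript's encodeURI()."""
    return quote(url, safe=URI_SAFE)


value = "rock&roll = fun"
print(encode_uri_component(value))
# rock%26roll%20%3D%20fun
print(encode_uri("https://example.com/a b?x=" + value))
# https://example.com/a%20b?x=rock&roll%20=%20fun
```

In the second result the & and = inside the value are left as they are, so the server would see an extra parameter. This is the typical bug caused by using the whole-URL function on a parameter value.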

PHP

urlencode() encodes a string for use in a query string (spaces are replaced with +). rawurlencode() uses %20 for spaces, which conforms to the RFC 3986 standard.

Python

The urllib.parse module: quote() for encoding path components, quote_plus() for query parameters (space as +).
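
A short sketch of the difference between the two Python functions. The sample string is arbitrary.

```python
from urllib.parse import quote, quote_plus

text = "summer sale/2024"

# quote() keeps "/" by default (safe="/"), which suits whole paths.
path_part = quote(text)

# For a single path segment, pass safe="" so "/" is encoded too.
single_segment = quote(text, safe="")

# quote_plus() turns spaces into "+" and encodes "/".
query_value = quote_plus(text)

print(path_part)       # summer%20sale/2024
print(single_segment)  # summer%20sale%2F2024
print(query_value)     # summer+sale%2F2024
```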

Common Issues and Solutions

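  • Double encoding. If data is encoded twice, %20 turns into %2520, because the % itself is encoded as %25. Encode each value exactly once, at the point where the URL is assembled. Guessing whether a string is already encoded is unreliable, since a literal "%20" can also be legitimate data.
  • Space: %20 or +? %20 is universal and works in any part of a URL. The + as a space comes from the application/x-www-form-urlencoded format used by HTML forms, so it is only meaningful in a query string or form body. In a path, + is a literal plus sign.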
  • Cyrillic in URLs. Modern browsers display Cyrillic URLs nicely, but internally encode them using percent-encoding. When copying a URL from the address bar, you may get either the Cyrillic or the encoded version.
  • Incorrect encoding. Modern URL encoding assumes UTF-8. If the bytes being percent-encoded come from a different encoding (e.g., Windows-1251), the receiver will decode them incorrectly.
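
Three of the issues above (double encoding, + versus %20, and a non-UTF-8 source encoding) can be reproduced in a few lines. This is a sketch using Python's urllib.parse. The sample strings are arbitrary.

```python
from urllib.parse import quote, unquote, unquote_plus

# Double encoding: the "%" of %20 is itself encoded as %25.
once = quote("a b")
twice = quote(once)
print(once, twice)  # a%20b a%2520b

# "+" is a space only under form-style decoding.
print(unquote("a+b"))       # a+b
print(unquote_plus("a+b"))  # a b

# Cyrillic "а" (U+0430) encoded from UTF-8 versus Windows-1251.
utf8_form = quote("\u0430")
cp1251_form = quote("\u0430", encoding="cp1251")
print(utf8_form, cp1251_form)  # %D0%B0 %E0

# A UTF-8 decoder cannot recover the letter from the cp1251 form.
garbled = unquote(cp1251_form)
recovered = unquote(cp1251_form, encoding="cp1251")
print(garbled == "\u0430", recovered == "\u0430")  # False True
```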

URL Encoding and UTM Tags

When creating UTM tags with our UTM tag generator, parameter values are encoded automatically. But if you build a URL manually and the value contains special characters (for example, a campaign name like "50% off + free gift"), be sure to encode it. Otherwise, % will be read as the start of an escape sequence and + will be decoded as a space.
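
For manual URL building, a sketch along these lines shows what happens to that campaign name. The utm_source value is a made-up placeholder. urlencode uses form-style encoding, so spaces become +.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical UTM values; only the campaign name comes from the text above.
params = {
    "utm_source": "newsletter",
    "utm_campaign": "50% off + free gift",
}

query = urlencode(params)
print(query)
# utm_source=newsletter&utm_campaign=50%25+off+%2B+free+gift

# Decoding the encoded query restores the original campaign name.
decoded = parse_qs(query)
print(decoded["utm_campaign"][0])  # 50% off + free gift

# Pasting the raw value instead loses the literal plus sign.
naive = parse_qs("utm_campaign=50% off + free gift")
print("+" in naive["utm_campaign"][0])  # False
```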

URL Encoding vs. Base64

Both techniques convert data to text, but for different purposes. URL encoding makes a string safe for use in a URL while keeping ASCII characters readable. Base64 encodes binary data into text for transmission over text-based protocols. To transmit binary data in a URL, the two methods are often combined: first Base64, then URL encoding (or the Base64URL variant is used).
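
The two ways of putting binary data into a URL can be compared with Python's standard base64 module. This is a sketch. The two input bytes are chosen only because they produce + and / in standard Base64.

```python
import base64
from urllib.parse import quote

data = b"\xfb\xff"  # arbitrary binary bytes

# Option 1: standard Base64, then URL encoding of "+", "/" and "=".
standard = base64.b64encode(data).decode("ascii")
percent_encoded = quote(standard, safe="")

# Option 2: the Base64URL alphabet, which uses "-" and "_" instead.
url_safe = base64.urlsafe_b64encode(data).decode("ascii")

print(standard)         # +/8=
print(percent_encoded)  # %2B%2F8%3D
print(url_safe)         # -_8=
```

The = padding is still a reserved character, so Base64URL output is usually either percent-encoded or sent with the padding stripped.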

Frequently Asked Questions

Why encode a URL if the browser displays Cyrillic just fine?

The browser displays a decoded URL for user convenience, but when sending a request to the server, the URL is always encoded. Problems arise when you build URLs programmatically (in APIs, scripts, email campaigns) — there, encoding is mandatory.

How do I decode a URL?

Use our URL encoding tool, which supports both encoding and decoding. Most programming languages also provide the corresponding functions: decodeURIComponent() in JavaScript, urldecode() in PHP, urllib.parse.unquote() in Python.
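
In Python, decoding might look like this minimal sketch, which reuses strings from earlier in the article:

```python
from urllib.parse import unquote, unquote_plus

decoded_query = unquote("blue%20shirt%20%26%20white")
decoded_letter = unquote("%D0%B0")          # Cyrillic "а" (U+0430)
decoded_form = unquote_plus("blue+shirt")   # form-style: "+" means space

print(decoded_query)               # blue shirt & white
print(decoded_letter == "\u0430")  # True
print(decoded_form)                # blue shirt
```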

Can URL encoding be used for security?

No. URL encoding is not a security measure. Attackers can use encoding to bypass filters (for example, encoding SQL injection characters). Always validate and sanitize input data regardless of encoding.
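
A sketch of how such a bypass can work. Both filter functions are hypothetical and deliberately simplistic. Checking for a quote character is not a real defense against SQL injection. The example only shows why a check on the still-encoded value misses the payload.

```python
from urllib.parse import unquote


def naive_filter(raw: str) -> bool:
    """Hypothetical filter that inspects only the raw, still-encoded value."""
    return "'" not in raw


def filter_after_decoding(raw: str) -> bool:
    """The same check, applied to the decoded value."""
    return "'" not in unquote(raw)


payload = "1%27%20OR%20%271%27%3D%271"

print(unquote(payload))                # 1' OR '1'='1
print(naive_filter(payload))           # True  (the payload slips through)
print(filter_after_decoding(payload))  # False (the quote is detected)
```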

What is the maximum URL length?

The standard does not set a limit, but in practice Internet Explorer supported up to 2,083 characters, while modern browsers handle much longer URLs (on the order of 65,000 characters or more). Servers typically limit URLs to about 8 KB by default. URL encoding increases length: one Cyrillic character (2 bytes in UTF-8) takes 6 characters, and a 3-byte character such as a CJK ideograph takes 9. This is important to consider when building long URLs.
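
The growth in length is easy to measure. A small sketch, with sample characters written as escapes so that the code points are unambiguous:

```python
from urllib.parse import quote

cyrillic_letter = "\u0430"  # Cyrillic "а", 2 bytes in UTF-8
cjk_character = "\u4e2d"    # CJK ideograph, 3 bytes in UTF-8
word = "\u043f\u0440\u0438\u0432\u0435\u0442"  # "привет", 6 letters

print(len(quote(cyrillic_letter)))  # 6
print(len(quote(cjk_character)))    # 9
print(len(word), len(quote(word)))  # 6 36
```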
