HTML Entities: Encoding Special Characters

Александр Михеев

09 September 2024 · 4 min read

Anyone who has worked with HTML code has eventually encountered constructs like &amp;, &lt;, or &copy;. These are HTML entities: special character sequences that let browsers correctly display characters that have a special meaning in markup. In this article, we'll explore how they work, why they're needed, and when you should encode text.

What Are HTML Entities

An HTML entity is a way to represent a character using a special text notation. When the browser parses the page, it replaces this notation with the corresponding character. Entities start with an ampersand (&) and end with a semicolon (;).

Why is this needed? In HTML, certain characters are reserved. For example, angle brackets < and > denote the start and end of tags. If you want to show the user the text <div> as-is rather than creating a div element, you need to encode the brackets: &lt;div&gt;.
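As a minimal sketch of this step, here is the same encoding done in Python with the standard library's `html.escape()`. Python is used only for illustration; the article itself refers to PHP and JavaScript functions.

```python
import html

# html.escape() replaces &, < and > (and, by default, " and ') with entities
raw = "<div>"
encoded = html.escape(raw)
print(encoded)  # &lt;div&gt;
```

The browser would display `encoded` as the literal text `<div>` instead of creating an element.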

Named and Numeric Entities

There are two formats for writing HTML entities:

  • Named entities — use a human-readable text name: &amp; for ampersand, &lt; for the less-than sign, &copy; for the copyright symbol. They're easy to remember and read in code.
  • Numeric entities — use the character's code from the Unicode table. They come in decimal (&#38;) and hexadecimal (&#x26;) forms. The numeric format is universal and works for any character, including emoji.
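To illustrate that the three spellings are interchangeable, the Python sketch below decodes each of them with `html.unescape()` and builds both numeric forms for an arbitrary character. The helper `numeric_entities` is a hypothetical name introduced for this example.

```python
import html

# Named, decimal and hexadecimal spellings of the ampersand
forms = ["&amp;", "&#38;", "&#x26;"]
print([html.unescape(f) for f in forms])  # ['&', '&', '&']

def numeric_entities(ch: str) -> tuple[str, str]:
    """Hypothetical helper: decimal and hex numeric entity for one character."""
    code = ord(ch)  # Unicode code point
    return f"&#{code};", f"&#x{code:x};"

# U+1F600 GRINNING FACE, an emoji outside the Basic Multilingual Plane
print(numeric_entities("\U0001F600"))  # ('&#128512;', '&#x1f600;')
```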

Named entities do not exist for every character: HTML 4 defined 252 of them, and HTML5 extends the list to just over two thousand, which is still a tiny fraction of Unicode. Numeric entities, however, cover the entire Unicode range, so you can encode practically any character with them.
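A common strategy is to use a named entity when one exists and fall back to a numeric one otherwise. The sketch below assumes Python's `html.entities.codepoint2name`, which only covers the HTML 4.01 names (the full HTML5 table is `html.entities.html5`); `encode_char` is a hypothetical helper.

```python
import html
from html.entities import codepoint2name

def encode_char(ch: str) -> str:
    """Hypothetical helper: named entity if HTML 4.01 has one, else hex."""
    name = codepoint2name.get(ord(ch))
    if name is not None:
        return f"&{name};"
    # No name available: the numeric form works for any code point
    return f"&#x{ord(ch):X};"

for ch in ["\u00a9", "\u2014", "\U0001F600"]:
    print(encode_char(ch))
# &copy;
# &mdash;
# &#x1F600;
```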

Why Encode Characters

Encoding special characters serves several purposes:

  • Correct display. Without encoding, the browser may misinterpret reserved characters. An unencoded < in text will "break" the markup: the browser will think a new tag is starting.
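  • Protection against XSS attacks. If user input is displayed on a page without encoding, an attacker can inject malicious JavaScript. Encoding the characters <, >, &, ", and ' neutralizes such attacks in HTML text and in quoted attribute values: the script becomes harmless text. Other contexts, such as inline JavaScript or URLs in href attributes, need their own escaping rules.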
  • Characters not on the keyboard. Entities make it easy to insert special characters: em dash (&mdash;), guillemets (&laquo; &raquo;), euro sign (&euro;), arrows, and much more.
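To make the XSS point concrete, here is a minimal Python sketch with a hypothetical attacker-supplied comment. It covers only the HTML text context described above.

```python
import html

# Hypothetical comment submitted by an attacker
comment = '<script>alert("hacked")</script>'

unsafe = f"<p>{comment}</p>"             # the browser would run the script
safe = f"<p>{html.escape(comment)}</p>"  # the browser shows it as plain text
print(safe)
# <p>&lt;script&gt;alert(&quot;hacked&quot;)&lt;/script&gt;</p>
```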

Most Commonly Used Entities

Here is a list of HTML entities that developers encounter most frequently:

  • &amp; — ampersand (&)
  • &lt; — less-than sign (<)
  • &gt; — greater-than sign (>)
  • &quot; — double quote (")
  • &apos; — single quote (')
  • &nbsp; — non-breaking space
  • &copy; — copyright sign (©)
  • &mdash; — em dash (—)
  • &laquo; and &raquo; — guillemets (« »)
  • &hellip; — ellipsis (…)

The characters behind the first five entries (&, <, >, ", and ') are the ones you must always encode when outputting user data.
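For illustration only, here is a hand-rolled Python escaper for exactly these five characters; `escape_html` and `_ESCAPES` are hypothetical names, and in real code the library function (`html.escape()` in Python) is the safer choice. It uses `&#39;` for the single quote because `&apos;` is not defined in HTML 4, only in HTML5 and XML.

```python
# Map each of the five reserved characters to its entity.
# str.translate substitutes every character in a single pass,
# so an "&" produced by a replacement is never escaped a second time.
_ESCAPES = str.maketrans({
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
})

def escape_html(text: str) -> str:
    """Hypothetical helper: escape the five reserved HTML characters."""
    return text.translate(_ESCAPES)

print(escape_html("Tom & \"Jerry\" <3 'cheese'"))
# Tom &amp; &quot;Jerry&quot; &lt;3 &#39;cheese&#39;
```

A chain of `str.replace()` calls would also work, but only if `&` is replaced first; the single-pass translation table avoids that ordering trap.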

When to Encode and When to Decode

Encoding is necessary when outputting text into HTML markup. Any content coming from users (forms, comments, search queries) must pass through an encoding function before being inserted into HTML. In PHP, htmlspecialchars() is used for this; in JavaScript, you create a text node via document.createTextNode().
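The same "encode at output" rule, sketched in Python, where `html.escape()` plays roughly the role of `htmlspecialchars()`. The function `render_comment` and its markup are hypothetical; the point is that every piece of user data is escaped right where it is inserted into HTML.

```python
import html

def render_comment(author: str, body: str) -> str:
    """Hypothetical template: escape user data at the moment of output."""
    # quote=True (the default) also escapes " and ', which matters
    # here because author is placed inside an attribute value
    return (
        f'<div class="comment" data-author="{html.escape(author)}">'
        f"<p>{html.escape(body)}</p></div>"
    )

print(render_comment('Eve" onclick="alert(1)', "5 < 6 & 7 > 3"))
```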

Decoding is needed in the reverse situation: when you receive HTML-encoded text and want to extract a "clean" string from it. For example, text scraped from web pages or taken from RSS feeds often arrives entity-encoded, and sometimes even double-encoded. In PHP, decoding is performed with the html_entity_decode() function.
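In Python the counterpart is `html.unescape()`. The minimal sketch below, with made-up input strings, shows a single decode and a double-encoded value that needs two passes. Decode only as many times as the text was actually encoded, since an extra pass can turn literal text such as `&lt;b&gt;` into markup.

```python
import html

# A fragment as it might arrive from a feed: entity-encoded once
encoded = "Fish &amp; Chips &lt;b&gt;today&lt;/b&gt;"
print(html.unescape(encoded))  # Fish & Chips <b>today</b>

# Double-encoded input needs a second pass to recover the text
double = "Tom &amp;amp; Jerry"
once = html.unescape(double)   # 'Tom &amp; Jerry'
twice = html.unescape(once)    # 'Tom & Jerry'
```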

Common Mistakes

The most frequent issue is double encoding, where &amp; turns into &amp;amp;. This happens when text is encoded more than once, for example when it is encoded before being saved to the database and then encoded again on output. Another mistake is leaving out the flags argument of htmlspecialchars(): before PHP 8.1, the default flags (ENT_COMPAT) left single quotes unencoded, so a value placed inside a single-quoted attribute could break out of it and open the door to XSS. Passing ENT_QUOTES explicitly avoids this.
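Both mistakes can be reproduced in Python. This is an analogue, not PHP: `html.escape(..., quote=False)` leaves both quote characters unescaped, a slightly harsher version of the missing ENT_QUOTES problem. The payload string is a made-up example.

```python
import html

# Double encoding: escaping text that was already escaped before storage
stored = html.escape("Tom & Jerry")   # 'Tom &amp; Jerry' (escaped too early)
print(html.escape(stored))            # Tom &amp;amp; Jerry

# Missing quote escaping inside a single-quoted attribute
payload = "x' onmouseover='alert(1)"
weak = f"<input value='{html.escape(payload, quote=False)}'>"
strong = f"<input value='{html.escape(payload)}'>"
print(weak)    # <input value='x' onmouseover='alert(1)'>  (breaks out)
print(strong)  # <input value='x&#x27; onmouseover=&#x27;alert(1)'>
```

The fix for the first case is to store raw text and escape exactly once, at output time.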

Conclusion

HTML entities are a fundamental web development mechanism that ensures security and correct content display. Knowing the basic entities and encoding rules helps avoid markup errors and protect your site from attacks. Don't neglect this simple but important tool.

You can quickly encode or decode HTML entities using our HTML entity encoder. If you also need to work with URL encoding, check out our URL encoder.
