XML (eXtensible Markup Language) is one of the oldest and most widely used data exchange formats. Despite the growing popularity of JSON, XML remains the standard in many industries. However, XML parsers are strict: a single syntax error makes a conforming parser reject the whole document, which can break the application that depends on it. In this article, we'll explore what XML validation is, what types of checks exist, and how to avoid common mistakes.
What Is XML
XML is an extensible markup language designed for storing and transmitting structured data. Unlike HTML, which describes how content is displayed, XML describes the data itself. You define the tag names and document structure yourself — hence the word "extensible" in the name.
A typical XML document looks like this:
<catalog><book id="1"><title>XML for Beginners</title><price>590</price></book></catalog>
Each element has an opening and closing tag, attribute values are enclosed in quotes, and the entire document forms a tree structure with a single root element.
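To make the tree structure concrete, here is a minimal sketch that parses the sample document with Python's standard xml.etree.ElementTree module and prints one line per element. The helper name describe is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The sample catalog document, as a Python string.
CATALOG = (
    '<catalog><book id="1"><title>XML for Beginners</title>'
    '<price>590</price></book></catalog>'
)

root = ET.fromstring(CATALOG)  # the single root element, <catalog>


def describe(element, depth=0):
    """Return one indented line per element (hypothetical helper)."""
    line = "  " * depth + element.tag
    if element.attrib:
        line += " " + str(element.attrib)
    text = (element.text or "").strip()
    if text:
        line += ": " + text
    lines = [line]
    for child in element:  # iterating an element yields its direct children
        lines.extend(describe(child, depth + 1))
    return lines


tree_lines = describe(root)
print("\n".join(tree_lines))
```

Each level of indentation in the output corresponds to one level of nesting, with catalog at the top.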
Well-Formed and Valid XML
When checking an XML document, two levels of correctness are distinguished:
- Well-formed — the document follows the basic syntactic rules of XML: it has a root element, all tags are closed, attributes are quoted, and nesting is correct. This is the minimum requirement for any XML file.
- Valid — the document is not only well-formed but also conforms to a specific schema (DTD or XSD) that defines the allowed elements, attributes, their data types, and the order in which they appear. Schema validation ensures that the data structure meets the application's expectations.
A document can be well-formed but not valid. The reverse is impossible: a document that is not well-formed fails the basic syntax check, so it never reaches schema validation.
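The first level, well-formedness, can be checked with nothing more than a parser. The sketch below uses Python's standard xml.etree.ElementTree. The function name check_well_formed is an assumption for this example, and the check says nothing about whether the document is valid against a schema.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return False, str(exc)
    return True, None


# Well-formed, although a catalog schema would probably reject it
# (a price without a book, and a non-numeric value).
print(check_well_formed("<catalog><price>abc</price></catalog>"))

# Not well-formed: <b> is never closed.
print(check_well_formed("<a><b></a>"))
```

The first call succeeds even though the data makes little sense, which is exactly the gap that schema validation is meant to close.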
Common XML Errors
Here are the most frequent issues that lead to parsing errors:
- Unclosed tags. Every opening tag <item> must have a corresponding closing tag </item>, or the tag must be self-closing: <item />.
- Incorrect nesting. Tags must be properly nested: <a><b></b></a> is correct, <a><b></a></b> is an error.
- Unquoted attributes. In XML, unlike HTML, attribute values must be enclosed in quotes: id="1", not id=1.
- Unencoded special characters. An ampersand (&) or a left angle bracket (<) inside element content or an attribute value must be written as an entity (&amp;, &lt;). A right angle bracket is conventionally written as &gt;, and a quote inside an attribute value delimited by the same quote character must be written as &quot; or &apos;.
- Missing root element. An XML document must have exactly one root element that contains all other elements.
- Encoding issues. A mismatch between the encoding declared in the XML declaration and the actual file encoding is a common source of errors when working with non-ASCII characters.
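The sketch below feeds one small broken document per error class from the list above to Python's standard xml.etree.ElementTree parser and prints the message it reports. The sample documents and the helper name parse_error are invented for this illustration, and the exact message wording depends on the underlying parser. The last lines show one way to encode special characters with xml.sax.saxutils before embedding text in XML.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# One deliberately broken document per error class (illustrative samples).
BROKEN = {
    "unclosed tag": "<items><item></items>",
    "incorrect nesting": "<a><b></a></b>",
    "unquoted attribute": "<book id=1></book>",
    "unencoded ampersand": "<title>Tom & Jerry</title>",
    "two root elements": "<a></a><b></b>",
    # Declared as UTF-8, but the byte 0xE9 is Latin-1 for an accented e.
    "encoding mismatch": b'<?xml version="1.0" encoding="UTF-8"?><name>\xe9</name>',
}


def parse_error(xml_data):
    """Return the parser's error message, or None if the document parses."""
    try:
        ET.fromstring(xml_data)
    except ET.ParseError as exc:
        return str(exc)
    return None


for name, doc in BROKEN.items():
    print(f"{name}: {parse_error(doc)}")

# Encoding special characters before building XML by hand.
raw_text = "Tom & Jerry <uncut>"
safe_title = "<title>" + escape(raw_text) + "</title>"
safe_book = "<book name=" + quoteattr("Tom & Jerry") + "/>"  # quoteattr adds the quotes
print(safe_title)
print(safe_book)
```

Building documents through an XML library rather than string concatenation avoids the escaping problem altogether; the manual escape call is shown only to make the entity rules visible.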
XML vs JSON
Today JSON dominates web development, but XML still holds its ground in several areas. Let's compare both formats:
- Readability. JSON is more compact and easier to read for simple structures. XML is more verbose but better suited for complex hierarchies.
- Validation schemas. XML has powerful validation mechanisms (DTD, XSD, RELAX NG). JSON Schema appeared later and is less widespread.
- Attributes and metadata. XML supports element attributes, namespaces, and comments. JSON has no equivalent of attributes or namespaces and does not allow comments; its data model is limited to objects, arrays, strings, numbers, booleans, and null.
- Ecosystem support. JSON is native to JavaScript and most modern APIs. XML is the basis of SOAP, RSS, SVG, XSLT, and many enterprise systems.
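As a rough illustration of the comparison, the sketch below takes one book record as XML and maps it to JSON using only the Python standard library. The mapping is one possible choice, not a standard: here the id attribute and the child elements all become plain keys, so the attribute versus element distinction is lost on the JSON side.

```python
import json
import xml.etree.ElementTree as ET

book_xml = '<book id="1"><title>XML for Beginners</title><price>590</price></book>'
book = ET.fromstring(book_xml)

# Assumed mapping: attribute and child elements become keys of one object.
book_dict = {
    "id": book.get("id"),                  # XML attribute
    "title": book.findtext("title"),       # child element text
    "price": int(book.findtext("price")),  # XML text is untyped; JSON has numbers
}

book_json = json.dumps(book_dict)
print(book_xml)
print(book_json)
```

Note that the conversion to a number had to be done by hand: without a schema, every XML value is just text.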
Where XML Is Still Essential
Despite JSON's popularity, XML remains indispensable in the following areas:
- SOAP services — a web services protocol widely used in banking and government systems.
- RSS and Atom — news feed formats are still based on XML.
- SVG — the vector graphics format is a dialect of XML.
- Configuration files — Maven (pom.xml), Android (AndroidManifest.xml), .NET (web.config), and many other tools use XML.
- Document formats — DOCX, XLSX, and ODT files contain XML internally.
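Taking RSS as an example from the list above, reading a feed comes down to ordinary XML parsing. The feed below is a made-up minimal RSS 2.0 document with example.com links, used only for illustration; real feeds carry more fields and often use namespaces.

```python
import xml.etree.ElementTree as ET

# A made-up minimal RSS 2.0 feed.
FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First post</title><link>https://example.com/1</link></item>
    <item><title>Second post</title><link>https://example.com/2</link></item>
  </channel>
</rss>"""

channel = ET.fromstring(FEED).find("channel")

# Collect (title, link) pairs for every <item> in the channel.
items = [
    (item.findtext("title"), item.findtext("link"))
    for item in channel.iter("item")
]

for title, link in items:
    print(f"{title}: {link}")
```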
DTD and XSD Schemas
DTD (Document Type Definition) is an older format for describing XML structure. It is simple but limited: it doesn't support data types or namespaces. XSD (XML Schema Definition) is the newer W3C standard. It lets you define data types, value constraints, and new types derived from existing ones by extension or restriction. XSD is recommended for new projects.
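Python's standard library does not validate against XSD, so the sketch below assumes the third-party lxml package is installed. The schema is a hypothetical one written for the catalog example from earlier in the article (the element names come from that example, the types are assumptions), and the function name validate_catalog is made up.

```python
try:
    from lxml import etree  # third-party package, assumed to be installed
except ImportError:
    etree = None

# Hypothetical schema for the catalog example: each book needs a title,
# a decimal price, and a positive integer id attribute.
CATALOG_XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="catalog">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="book" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="title" type="xs:string"/>
              <xs:element name="price" type="xs:decimal"/>
            </xs:sequence>
            <xs:attribute name="id" type="xs:positiveInteger" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def validate_catalog(xml_bytes):
    """Return (is_valid, list of error messages) for a catalog document."""
    if etree is None:
        raise RuntimeError("lxml is not installed")
    schema = etree.XMLSchema(etree.fromstring(CATALOG_XSD))
    document = etree.fromstring(xml_bytes)  # raises if not well-formed
    is_valid = schema.validate(document)
    return is_valid, [error.message for error in schema.error_log]


if etree is not None:
    good = b'<catalog><book id="1"><title>XML for Beginners</title><price>590</price></book></catalog>'
    bad = b'<catalog><book id="1"><title>XML for Beginners</title><price>cheap</price></book></catalog>'
    print(validate_catalog(good))
    print(validate_catalog(bad))
```

Both sample documents are well-formed; only the schema can tell that "cheap" is not an acceptable price, which is the kind of data type check a DTD cannot express.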
Conclusion
XML validation is a mandatory step when working with this format. Syntax checking catches errors that would break the parser, while schema validation ensures that the data conforms to the expected structure. Don't skip validation: even a single unclosed tag can cause the entire process to fail.
You can check your XML document for correctness using our XML validator. If you work with JSON, you'll also find our JSON formatter useful.