XML Validation
"Life was like a box of chocolates. You never know what you're gonna get."
(A famous line from the Oscar-winning movie "Forrest Gump.")
Yes, life is full of surprises and uncertainties, and that is part of its charm. But programmers and software clients don't like surprises… especially the bad kind! The same holds for the information that programs process.
Software programs can work only when they can find structure and patterns in information, and XML was invented to help with that. This is why checking XML documents is so important. The first check is syntactic: a document that follows the syntax rules of XML is called "well-formed." As we saw in an earlier session, correct matching of start and end tags for each element, as well as correct nesting of elements (without overlap), are the main issues here. A stricter check, validation, also compares the document with the structure that the program reading it expects, usually described in a DTD or an XML Schema. A document that passes this check is called "valid."
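To make the tag-matching and nesting rules concrete, here is a minimal sketch in Python. It assumes the standard library's xml.etree.ElementTree parser is an acceptable stand-in for the browser's parser; the function name is_well_formed and the three sample strings are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples: one correct, one with a missing end tag,
# and one where two elements overlap instead of nesting.
properly_nested = "<note><to>Ann</to><from>Bob</from></note>"
missing_end_tag = "<note><to>Ann<from>Bob</from></note>"
overlapping = "<note><to>Ann<from></to>Bob</from></note>"

for sample in (properly_nested, missing_end_tag, overlapping):
    print(is_well_formed(sample), sample)
```

Only the first sample should be reported as well formed. Note that this checks syntax only; it says nothing about whether the document contains the elements a program expects.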
Consider, for example, a brothel owner who has to open a box of condoms and check that the contents are as expected (none damaged or missing) before selling them in his brothel. If we were to describe this scenario in terms of a computer program that is given an XML document, we would say:
- The brothel owner: the browser's XML parser.
- The box of condoms: the XML document. Each type of rubber corresponds to an element in the XML document.
- The brothel: the browser, which fetches the XML doc stored on the web server and displays the information it contains in the browser interface.
First, let us define a well-formed XML document that represents the box of condoms:
<?xml version="1.0" encoding="UTF-8" ?>
<box>
  <boxOfRubbers>
    <rubber id="0">
      <rubberName>Ribbed</rubberName>
      <quantity>4</quantity>
    </rubber>
    <rubber id="1">
      <rubberName>Glow in the Dark</rubberName>
      <quantity>3</quantity>
    </rubber>
    <rubber id="2">
      <rubberName>Chocolate Flavored</rubberName>
      <quantity>4</quantity>
    </rubber>
  </boxOfRubbers>
</box>
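As a rough illustration of what "opening the box" looks like in code, the sketch below parses a document like the one above and collects each rubber's name and quantity. It uses Python's standard xml.etree.ElementTree rather than a browser's parser, and it assumes the root <box> element is closed with a matching </box>. The names BOX_XML and read_contents are invented for this example.

```python
import xml.etree.ElementTree as ET

# The box of rubbers, assuming the <box> root is closed with </box>.
BOX_XML = """<?xml version="1.0" encoding="UTF-8" ?>
<box>
  <boxOfRubbers>
    <rubber id="0">
      <rubberName>Ribbed</rubberName>
      <quantity>4</quantity>
    </rubber>
    <rubber id="1">
      <rubberName>Glow in the Dark</rubberName>
      <quantity>3</quantity>
    </rubber>
    <rubber id="2">
      <rubberName>Chocolate Flavored</rubberName>
      <quantity>4</quantity>
    </rubber>
  </boxOfRubbers>
</box>
"""


def read_contents(xml_text: str) -> dict:
    """Parse the document and map each rubberName to its quantity."""
    # Encode to bytes so the parser honors the encoding declaration itself.
    root = ET.fromstring(xml_text.encode("utf-8"))
    contents = {}
    for rubber in root.iter("rubber"):
        name = rubber.findtext("rubberName")
        contents[name] = int(rubber.findtext("quantity"))
    return contents


print(read_contents(BOX_XML))
# {'Ribbed': 4, 'Glow in the Dark': 3, 'Chocolate Flavored': 4}
```

If the document were not well formed, ET.fromstring would raise ET.ParseError instead of returning a tree.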
Now, when the brothel owner (the browser's XML parser) opens (parses, that is, scans and separates) the box of rubbers (the XML document), he might find one or more of the following inside:
- A few rubbers have melted (similar to improperly nested start and end tags). In other words, our XML is not well formed.
- A few items (for example, all the ribbed rubbers; say THAT 3 times very fast!) are missing. Could have been stolen by a picky hooker. Strictly speaking, the XML can still be well formed here, because every tag that is present is properly matched and nested. What it fails is validation: the contents no longer match what was expected.
- All types and quantities of condoms advertised on the box cover are intact. In this case, the XML doc is well formed and valid.
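A box with a missing item still parses without any syntax error, so catching it takes a check of the contents against what was advertised. Real validation is normally done against a DTD or an XML Schema, which Python's standard xml.etree.ElementTree does not support, so the sketch below uses a hand-written comparison as a stand-in. EXPECTED, find_problems and the sample document are hypothetical names and data for this illustration.

```python
import xml.etree.ElementTree as ET

# What the box cover advertises (assumed for this example).
EXPECTED = {"Ribbed": 4, "Glow in the Dark": 3, "Chocolate Flavored": 4}


def find_problems(xml_text: str, expected: dict) -> list:
    """Compare the parsed contents with the expected ones."""
    root = ET.fromstring(xml_text)
    found = {
        rubber.findtext("rubberName"): int(rubber.findtext("quantity"))
        for rubber in root.iter("rubber")
    }
    problems = []
    for name, quantity in expected.items():
        if name not in found:
            problems.append(f"missing: {name}")
        elif found[name] != quantity:
            problems.append(
                f"wrong quantity for {name}: "
                f"expected {quantity}, found {found[name]}"
            )
    return problems


# Well formed, but all the ribbed rubbers are gone.
incomplete = """<box><boxOfRubbers>
<rubber id="1"><rubberName>Glow in the Dark</rubberName><quantity>3</quantity></rubber>
<rubber id="2"><rubberName>Chocolate Flavored</rubberName><quantity>4</quantity></rubber>
</boxOfRubbers></box>"""

print(find_problems(incomplete, EXPECTED))  # ['missing: Ribbed']
```

An empty list means the box matches the cover. In a real project, a schema plus a validating parser would usually replace this hand-written comparison.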
If the brothel owner discovers that the box of rubbers is damaged or incomplete, he does not sell them to clients. Instead, he returns the box to the store where he bought it, along with some harsh words, and demands a refund. Similarly, if the browser's XML parser discovers that the document is not well formed, it displays an error in the browser window. The creator of the document must then correct the syntactic structure.
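The "harsh words" a parser sends back usually include where the problem was found. As a hedged sketch, again using Python's xml.etree.ElementTree instead of a browser, the function below turns a parse failure into a message with the line and column. The function name describe_syntax_error and the broken sample are assumptions for this example, and the exact wording of the underlying error depends on the parser.

```python
import xml.etree.ElementTree as ET


def describe_syntax_error(xml_text: str):
    """Return an error message for malformed XML, or None if it parses."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple.
        line, column = err.position
        return f"XML syntax error at line {line}, column {column}: {err}"
    return None


# <rubber> is never closed, so </box> on line 3 does not match.
broken = "<box>\n<rubber>\n</box>"

print(describe_syntax_error(broken))
print(describe_syntax_error("<box><rubber/></box>"))  # None
```

With the position in hand, the document's author knows which line to fix before sending the box back out.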
We hope this has conveyed the importance of well-formed, valid XML documents for well-behaved, consistent software applications, so that they can earn you good money!