02Geek HTML5 and JavaScript, TypeScript, React, Flash, ActionScript online School

More Than Just Leftovers

In this session, we discuss some miscellaneous topics mainly concerned with what is and isn't legal content in XML documents, and also ways to circumvent these rules using special element constructs. The topics we cover are:

  1. Comments and how they behave
  2. White-space handling
  3. Special characters in content
  4. Creating parser-ignorable content
  5. Embedding version and character-set information

We will illustrate these concepts with an example, "The King's Speech", based on the multi-Oscar winning 2011 British Film of the year. This movie addressed King George VI's speech problem, and its correction by a speech therapist. Let us now see the King in modern times, with the BBC making a recording of his World War III speech to motivate the British Army troops stationed across the world. Here is how the BBC's state-of-the-art speech recording and filtering software would go about its task, while choosing XML to be the output format:

Stammering speech is stored as comments in the XML document. (This content is useful for creating statistics and studying them.) Long periods of silence (no noise) are treated as 1 white-space per second of silence. Special characters display mathematical statistics. Recording noise is also identified and isolated, but maintained in the document as special non-processed data, for the tool's own logging and improvement. Here is the document example:

<?xml version="1.0" encoding="UTF-8" ?>

<king-speech>

<!--line> duh-duh-duh (This is Stammering) duh-duh </line-->

<line>The time has come for the British to stand up for the rest of Humanity</line>

<line> </line>

<![CDATA[ALL RECORDING NOISE*****]]>

<!--line>The rest of the speech</line-->

<statistics>Stammer Percentage &lt; 30 per cent</statistics>

<statistics>Noise Correction &lt; 45 per cent</statistics>

</king-speech>

Here is how we map the syntax used in this document to the actual requirements, stated just preceding it:

  1. Comment syntax: <!-- opens a comment and --> closes it (written without spaces, and with no "--" inside the comment). The parser does not treat anything in between as document content, so this is ideal for storing the king's stammering to be analyzed later on.
  2. White-space: treated as valid content; here it is used to model silence in the speech.
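  3. The recording noise may contain characters that the XML parser would otherwise try to interpret as markup. We could have dropped it, but decided to keep it for statistical purposes. A CDATA section opens with <![CDATA[ and closes with ]]>, and the parser passes everything in between through as literal character data. Note that a CDATA section still holds characters, not raw bytes, so a true binary bit-pattern would first have to be encoded as text (for example, as Base64).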
  4. Special mathematical symbols such as '<' can be mistaken for the beginning of a tag, so entity references such as '&lt;' (familiar from HTML) are used to represent them.
  5. The first line of the XML document describes the type of information (XML), the version of the XML standard (1.0) and the character encoding (UTF-8).
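As a rough check of the mapping above, here is a minimal Python sketch. Python and its standard xml modules are our choice for illustration and are not part of the original session. It assumes the comment delimiters in the example are written as <!-- and -->, and it uses a shortened copy of the document.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Shortened copy of the example document. Assumption: the comment
# delimiters are written as <!-- and -->, as the XML standard requires.
DOC = b"""<?xml version="1.0" encoding="UTF-8" ?>
<king-speech>
<!--line> duh-duh-duh (This is Stammering) duh-duh </line-->
<line>The time has come for the British to stand up for the rest of Humanity</line>
<line> </line>
<![CDATA[ALL RECORDING NOISE*****]]>
<statistics>Stammer Percentage &lt; 30 per cent</statistics>
</king-speech>"""

root = ET.fromstring(DOC)

# 1. ElementTree's default parser drops comments: only real elements remain.
child_tags = [child.tag for child in root]

# 2. White-space inside an element is kept as content.
silence = root.findall("line")[1].text

# 3. CDATA content arrives as ordinary character data (here, as the
#    text trailing the second <line> element).
noise = root.findall("line")[1].tail.strip()

# 4. The &lt; entity reference is decoded back to '<'.
stat = root.find("statistics").text

# The stammering is not lost: a DOM parser such as minidom keeps comments.
dom = minidom.parseString(DOC)
stammer = [node.data for node in dom.documentElement.childNodes
           if node.nodeType == node.COMMENT_NODE]

print(child_tags)
print(repr(silence))
print(noise)
print(stat)
print(stammer)
```

Whether comments are kept or discarded depends on the parser, which is why the sketch reads the document twice: once with ElementTree for the regular content and once with minidom to recover the commented-out stammering.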

The main point is that XML understands and stores only character data, and reserves a few characters (such as '<' and '&') for its own markup. But it also provides constructs to work around these limitations and syntactic rules, as we saw in the document example above.
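A program that writes such a document has to apply these work-arounds itself. The following is a minimal sketch in Python (our choice of language). The sample strings, the variable names and the <noise> element are made up for illustration. escape() from the standard library replaces reserved characters with entity references, and Base64 is one common way to turn raw bytes into character data, though the session itself does not prescribe it.

```python
import base64
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

# Reserved characters are replaced by entity references before writing.
raw_stat = "Stammer Percentage < 30 per cent & falling"  # made-up sample text
escaped_stat = escape(raw_stat)
print(escaped_stat)  # prints: Stammer Percentage &lt; 30 per cent &amp; falling

# Raw bytes are not character data, so they are encoded as text first.
noise_bytes = bytes([0x00, 0xFF, 0x3C, 0x26])  # hypothetical recording noise
noise_text = base64.b64encode(noise_bytes).decode("ascii")

# <noise> is a hypothetical element name used only in this sketch.
doc = ("<king-speech>"
       f"<statistics>{escaped_stat}</statistics>"
       f"<noise>{noise_text}</noise>"
       "</king-speech>")

# Reading the document back recovers both values unchanged.
root = ET.fromstring(doc)
recovered_stat = root.find("statistics").text
recovered_noise = base64.b64decode(root.find("noise").text)

# unescape() reverses escape() for plain strings outside a parser.
round_trip = unescape(escaped_stat)
```

The parser decodes the entity references on its own, so the reading side only needs the explicit Base64 decoding step.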

Overview and Context of XML

We will learn about the historical perspective and overview of the XML standard. The main focus is on information organization and cross-platform usage.

12:47

XML Elements

The first step in learning about how XML documents are structured is to get familiar with elements, the basic components of an XML document.

08:22

XML Nested Elements

Nested elements allow for complex data to be represented effortlessly in an XML document.

09:19

Attributes

Attribute-value pairs are an alternative to nesting elements in an XML document. We'll see the pros and cons of this feature.

08:12

More Than Just Leftovers

Comments, white-space, special characters, version information and parser-bypassing content: we discuss what these constructs are and how to use them effectively.

13:54

XML Validation

XML documents should be well formed, in both structure and content, for applications to be configured and behave properly. We discuss an analogy with a real-life scenario to highlight the point.

08:17

E4X – ActionScript 3.0

We describe the importance and benefits of the E4X library, and why browsers should adopt it as soon as possible.

24:56

XML DOM

An XML document is often pictured as an inverted tree of nodes (elements), in which connected nodes are related to each other. The DOM allows a program to traverse and retrieve these related nodes.

11:12

JavaScript

We describe the basic JavaScript syntax for loading XML documents in client-side applications.

13:51

Looping XML

We describe the looping constructs of the PHP scripting language and how they can be used to parse and render XML information in client software.

14:19

XML as a Remote

We describe the utility of XML for the modern web, with its plethora of programming languages and platforms, and how XML builds bridges between them.

09:19

Loading XML in PHP

We describe how to load XML documents into server-side applications in the PHP scripting language.

04:57

Elements and Attributes in PHP

We describe how to access elements and attribute values while processing XML content via PHP scripts in server-side software.

04:44

foreach

We describe the foreach looping construct in PHP and how it can be used to process XML documents in server-side software.

07:01

PHP XML compare

We describe how to compare the components and content of different XML documents that share the same syntax.

05:08

Modifying XML

We describe how to modify the content of XML documents in PHP: insertion, deletion and update operations in server-side applications.

08:34