===== XML Basics =====
XML is a subset of the Standard Generalized Markup Language (SGML)
and, as such, is a meta-language that is used to create languages that
describe hierarchical data. XHTML (a variant of HTML) is one such
language; it is used for marking up papers, articles, and books.
==== Elements ====
XML has two kinds of elements: those that can contain component elements and those that cannot. Both kinds can have attributes, which are name-value pairs.
XML elements that can't contain other elements have the following syntax (sometimes called the abbreviated syntax for empty elements):
''**<**//tag// [ //attribute//**="**//value//**"** ] ... **/>**''
For example, the following might be such an element in a railroad
system:
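The sketch below parses a hypothetical empty element with Python's standard ''xml.etree.ElementTree'' module. The tag ''station'' and its attributes are illustrative assumptions, not the original example. It shows that an empty element carries its attributes as name-value pairs and has no component elements.

<code python>
import xml.etree.ElementTree as ET

# Hypothetical empty element; the tag and attribute names are assumptions.
SNIPPET = '<station name="South Station" city="Boston, MA"/>'

station = ET.fromstring(SNIPPET)

print(station.tag)     # the tag name
print(station.attrib)  # attributes as a dict of name-value pairs
print(len(station))    # number of component elements: 0 for an empty element
</code>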
Elements that can contain other elements have the following syntax:
''**<**//tag// [ //attribute//**="**//value//**"** ] ... **>** [ //component// ... ] **</**//tag//**>**''
For example, the following might be such an element in a railroad system:
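Here is a hypothetical element with component elements, again parsed with ''xml.etree.ElementTree''. The names ''train'', ''origin'' and ''destination'' and the ''id'' value are assumptions made for illustration. Iterating over an element visits its component elements in document order.

<code python>
import xml.etree.ElementTree as ET

# Hypothetical element with two component elements; all names are assumptions.
SNIPPET = """<train id="t1">
  <origin>Boston, MA-South Station</origin>
  <destination>Newport News, VA</destination>
</train>"""

train = ET.fromstring(SNIPPET)

print(train.tag, train.attrib)
# Iterating over an element yields its component elements in document order.
for child in train:
    print(child.tag, "->", child.text)
</code>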
Note that both tags and attribute names are case-sensitive. Note also that text strings can be components.
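Both notes can be checked quickly with ''xml.etree.ElementTree'' and made-up element names. A start tag is not closed by an end tag that differs only in case. ''id'' and ''ID'' are two distinct attributes. Text can sit alongside child elements. ElementTree stores text that follows a child element in that child's ''tail'' attribute, which is a quirk of this particular API rather than of XML itself.

<code python>
import xml.etree.ElementTree as ET

# Tag names are case-sensitive: <Train> is NOT closed by </train>.
try:
    ET.fromstring("<Train></train>")
    mismatched_ok = True
except ET.ParseError:
    mismatched_ok = False

# Attribute names are case-sensitive too: "id" and "ID" are different attributes.
elem = ET.fromstring('<train id="a" ID="b"/>')

# Text strings can be components, mixed with child elements (hypothetical names).
note = ET.fromstring("<note>Departs from <station>South Station</station> daily.</note>")

print(mismatched_ok)
print(elem.get("id"), elem.get("ID"))
print(repr(note.text), repr(note[0].text), repr(note[0].tail))
print("".join(note.itertext()))
</code>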
==== Well-Formed XML Documents ====
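Among the requirements for an XML document to be well-formed (the XML specification lists them all) are that:
* Every element either has a start tag and a matching close tag or uses the abbreviated syntax for empty elements, and elements are properly nested (each element closes before its parent does);
* It has exactly one root element; and
* If it has an XML declaration (e.g., ''<?xml version="1.0"?>''), the declaration comes first, before anything else in the document, including whitespace. In XML 1.0 the declaration itself is optional.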
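One practical way to test these rules is to give a document to a non-validating XML parser and see whether it is accepted. The sketch below assumes Python's ''xml.etree.ElementTree'', which uses the expat parser. ''is_well_formed'' is a hypothetical helper. The parser enforces the other well-formedness rules of the XML specification as well, not only the ones listed here.

<code python>
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Hypothetical helper: True if the (non-validating) expat-based parser accepts text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

DECL = '<?xml version="1.0"?>'

samples = {
    "good": DECL + "<timetable><train/></timetable>",
    "unclosed tag": DECL + "<timetable><train></timetable>",  # <train> is never closed
    "two roots": DECL + "<train/><train/>",                    # more than one root element
    "late declaration": "\n" + DECL + "<train/>",              # declaration is not first
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
</code>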
The following is an example of a well-formed XML document in a railroad timetable system.
//[Example document; recoverable text content: Boston, MA-South Station; Newport News, VA; Boston, MA-South Station; Newport News, VA]//
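The following hypothetical timetable sketches what such a document might look like. Only the station names come from the example's surviving text. The element and attribute names (''timetable'', ''train'', ''origin'', ''destination'', ''id'') are assumptions. The sketch parses the document with ''xml.etree.ElementTree'' and extracts each train's route. The declaration sits at the very start of the string, before any whitespace.

<code python>
import xml.etree.ElementTree as ET

# Hypothetical timetable: element/attribute names are assumptions;
# only the station names come from the example's surviving text.
TIMETABLE = """<?xml version="1.0"?>
<timetable>
  <train id="t1">
    <origin>Boston, MA-South Station</origin>
    <destination>Newport News, VA</destination>
  </train>
  <train id="t2">
    <origin>Boston, MA-South Station</origin>
    <destination>Newport News, VA</destination>
  </train>
</timetable>"""

root = ET.fromstring(TIMETABLE)  # single root element: <timetable>

# Collect (id, origin, destination) for every <train> under the root.
routes = [
    (t.get("id"), t.findtext("origin"), t.findtext("destination"))
    for t in root.findall("train")
]
for route in routes:
    print(route)
</code>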
==== Imposing Structure ====
To be well-formed, an XML document must satisfy a few simple requirements. However, these requirements say nothing about the structure of the document (e.g., which attributes each element may have, or which elements may be nested inside which).
There are two ways to impose structure: one is to use a Document Type Definition (DTD) and the other is to use a Schema. Though they have very different syntaxes, they are conceptually very similar.
For example, the following DTD:
and the following Schema:
can both be used to impose structure on a railroad timetable document.
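Python's standard-library XML parsers do not validate against a DTD or a Schema. For real validation, a library such as lxml offers DTD and XML Schema support. The sketch below is only a toy that mimics the idea. A hypothetical ''RULES'' table lists, for each element, which component elements it may contain and which attributes it must have. Both sample documents are well-formed and parse without error, but only one satisfies the rules. Real DTDs and Schemas can express much more, such as element order and how many times a child may occur.

<code python>
import xml.etree.ElementTree as ET

# Toy structural rules (hypothetical): allowed child elements and required
# attributes per element. This is NOT a DTD or Schema processor.
RULES = {
    "timetable": {"children": {"train"}, "required_attrs": set()},
    "train": {"children": {"origin", "destination"}, "required_attrs": {"id"}},
    "origin": {"children": set(), "required_attrs": set()},
    "destination": {"children": set(), "required_attrs": set()},
}

def structure_errors(elem, rules=RULES):
    """Return messages describing where elem's subtree violates the toy rules."""
    rule = rules.get(elem.tag)
    if rule is None:
        return [f"unexpected element <{elem.tag}>"]
    errors = []
    # Required attributes that this element does not carry.
    for name in sorted(rule["required_attrs"] - set(elem.attrib)):
        errors.append(f"<{elem.tag}> is missing attribute {name!r}")
    # Component elements must be allowed here; allowed ones are checked recursively.
    for child in elem:
        if child.tag not in rule["children"]:
            errors.append(f"<{child.tag}> is not allowed inside <{elem.tag}>")
        else:
            errors.extend(structure_errors(child, rules))
    return errors

# Both documents are well-formed, so both parse without error.
good = ET.fromstring('<timetable><train id="t1"><origin>Boston, MA-South Station</origin>'
                     '<destination>Newport News, VA</destination></train></timetable>')
bad = ET.fromstring('<timetable><train><platform>4</platform></train></timetable>')

print(structure_errors(good))
print(structure_errors(bad))
</code>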
The primary advantage of Schemas is that they are also written in XML, which means that the same parser can be used for the data document and the document that describes its structure.
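This can be checked directly. A hypothetical Schema fragment is an ordinary XML document, so ''xml.etree.ElementTree'' parses it like any data document. A DTD declaration such as ''<!ELEMENT timetable (train*)>'' is not XML element syntax, and the same parser rejects it when it is given on its own. The element names in both fragments are assumptions.

<code python>
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"  # namespace used by XML Schema

# Hypothetical Schema fragment: because it is XML, ElementTree parses it.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="timetable"/>
  <xs:element name="train"/>
</xs:schema>"""

schema_root = ET.fromstring(SCHEMA)
declared = [e.get("name") for e in schema_root.findall(XS + "element")]
print(declared)

# A standalone DTD declaration is not an XML document, so parsing it fails.
DTD = "<!ELEMENT timetable (train*)>"
try:
    ET.fromstring(DTD)
    dtd_parsed = True
except ET.ParseError:
    dtd_parsed = False
print(dtd_parsed)
</code>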