Legend Scrolls

From HTML to XHTML

XML Structure

One kind of markup that XML brings to authors coming from HTML is the Processing Instruction:

<?appTarget attributesOrOtherInstructions ?>

The 'appTarget' names the application that the instruction is aimed at. After a space comes the instruction itself, usually written as a set of attributes, followed by an optional space.
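To see how a parser splits a Processing Instruction into its target and its instruction, here is a minimal Python sketch using the standard library's xml.dom.minidom module. The 'appTarget' name is taken from the syntax above; the 'mode' and 'level' attributes and the <Doc/> element are made up purely for illustration.

```python
from xml.dom import Node, minidom

# Hypothetical document: 'mode', 'level' and <Doc/> are invented for this example.
source = '<?appTarget mode="fast" level="2" ?><Doc/>'

document = minidom.parseString(source)

# Collect every Processing Instruction found at the top level of the document.
instructions = [
    node for node in document.childNodes
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
]

pi = instructions[0]
target = pi.target      # the application the instruction is aimed at
data = pi.data.strip()  # everything after the target, without surrounding spaces

print(target)
print(data)
```

Note that the parser hands the instruction over as one plain string; it is up to the target application to interpret it as attributes.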

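The best-known markup of this form is the XML declaration, often called the XML Prolog, which sits at the very top of an XML Document. Strictly speaking, the XML specification does not count it as a Processing Instruction, but it uses the same syntax: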

<?xml version="1.0" encoding="UTF-8"?>

This prolog is optional, but most authors use it because it clearly states what type of document it is and provides a few details about the document.
The version attribute states the version of XML. To date there are two versions: XML 1.0 and XML 1.1. The majority of XML Documents and XML parsers conform to XML 1.0.
The encoding attribute states which character encoding an XML-supported application should use to read the characters of the document. UTF-8 is the most common and most widely supported encoding on the Internet and in the computing world (many of today's filesystems use UTF-8 as well).
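The effect of the encoding attribute can be sketched in Python with the standard library's xml.etree.ElementTree module. This is only an illustration: the <Name> element and its text are made up, and ISO-8859-1 is chosen simply as a second encoding to compare against UTF-8.

```python
import xml.etree.ElementTree as ET

# Made-up content: the word "café", written with an escape so the source stays ASCII.
text = "caf\u00e9"

# The same document saved in two different encodings.
utf8_bytes = (
    '<?xml version="1.0" encoding="UTF-8"?><Name>' + text + "</Name>"
).encode("utf-8")
latin1_bytes = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><Name>' + text + "</Name>"
).encode("iso-8859-1")

# The accented letter takes two bytes in UTF-8 but only one in ISO-8859-1.
utf8_length = len("\u00e9".encode("utf-8"))
latin1_length = len("\u00e9".encode("iso-8859-1"))

# The parser reads the encoding attribute and recovers the same characters.
utf8_text = ET.fromstring(utf8_bytes).text
latin1_text = ET.fromstring(latin1_bytes).text
```

Both documents produce the same text once parsed, because each one tells the parser how its bytes should be decoded.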
PHP users have already been using an XML Processing Instruction:

<?php echo 'This is an instruction for PHP to display this string of text.' ?>

PHP Note: when short open tags are enabled, the PHP parser treats the '<?' at the start of an XML Prolog as the start of PHP code, so you will need to output the prolog from within PHP like so:

<?php echo '<?xml version="1.0" encoding="UTF-8"?>'."\n"; ?>

Sometimes XML document structures are bound to a Document Type Definition (DTD):

<!DOCTYPE documentElement PUBLIC "-//OrganisationName//DTD LanguageName 1.0//EN" "http://www.example.com/myvalidstructure.dtd">
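The parts of this declaration can be inspected with a short Python sketch using the standard library's xml.dom.minidom module. It reuses the DOCTYPE line above and adds an empty <documentElement/> so that the text parses as a complete document. This assumes minidom only records the identifiers; it does not validate the document against the DTD.

```python
from xml.dom import minidom

# The DOCTYPE line from the text, followed by a matching (empty) Document Element.
source = (
    '<!DOCTYPE documentElement PUBLIC '
    '"-//OrganisationName//DTD LanguageName 1.0//EN" '
    '"http://www.example.com/myvalidstructure.dtd">'
    '<documentElement/>'
)

document = minidom.parseString(source)
doctype = document.doctype

root_name = doctype.name      # the name of the Document Element the DTD applies to
public_id = doctype.publicId  # the public identifier of the DTD
system_id = doctype.systemId  # the address where the DTD file can be found

print(root_name, public_id, system_id, sep="\n")
```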

The first normal element is called the Document Element. For instance, in a language describing the details of images:

<ImageLibrary>

</ImageLibrary>

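XML elements and attributes are much like HTML elements and attributes, except that they obey a set of baseline XML rules that keep the document structure well-formed, such as:

There is exactly one Document Element, and it contains all other elements
Every element is closed
Elements are nested properly and never overlap
Element and attribute names are case-sensitive
Attribute values are always quoted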

All XML parsers (processors) include this baseline check that XML documents are well-formed. If your document isn't well-formed, the parser will report an error; in a web browser this usually appears as an XML error page.
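The well-formedness check can be tried out with a small Python sketch using the standard library's xml.etree.ElementTree module. The helper name is_well_formed and the sample strings are made up for illustration; they cover two typical mistakes, badly nested elements and an unclosed element.

```python
import xml.etree.ElementTree as ET


def is_well_formed(source: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        # The parser refuses to continue as soon as a rule is broken.
        return False


good = "<Image><Filename>FlowDiagram</Filename></Image>"
bad_nesting = "<Image><Filename>FlowDiagram</Image></Filename>"
bad_unclosed = "<Image><Filename>FlowDiagram</Image>"

for sample in (good, bad_nesting, bad_unclosed):
    print(is_well_formed(sample), sample)
```

Unlike an HTML parser, which tries to recover from mistakes, an XML parser stops at the first error it finds.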

An example of an XML Document:

<?xml version="1.0" encoding="UTF-8"?>
<ImageLibrary xml:lang="en-GB">
  <Image xml:id="i268">
    <Filename>FlowDiagram</Filename>
    <FileExtension>png</FileExtension>
    <MIMEType>image/png</MIMEType>
    <Dimensions units="px">
      <Width>102</Width>
      <Height>214</Height>
    </Dimensions>
    <ColorDepth SingleBitTransparencySupport="8-bitOnly" AlphaTransparencySupport="true">
      24-bit
    </ColorDepth>
    <AnimationSupport>None</AnimationSupport>
    <Exif>Copyright 2006 Legend Scrolls</Exif>
    <AlternativeContent>A diagram: box A points to box B, in turn box B points to box C and D. Box D points back to box A.</AlternativeContent>
  </Image>
  <Image xml:id="i269">
    <Filename>SimpleTree</Filename>
    <FileExtension>gif</FileExtension>
    <MIMEType>image/gif</MIMEType>
    <Dimensions units="px">
      <Width>32</Width>
      <Height>78</Height>
    </Dimensions>
    <ColorDepth SingleBitTransparencySupport="true" AlphaTransparencySupport="false">
      8-bit
    </ColorDepth>
    <AnimationSupport>Framed, repeat</AnimationSupport>
    <Exif>Not Supported</Exif>
    <AlternativeContent></AlternativeContent>
  </Image>
</ImageLibrary>

For elements that have no content, XML introduces the concept of an Empty Element:

<AlternativeContent/>

Instead of appearing in a separate end tag, the slash can be placed at the end of the start tag, turning the element into an Empty Element. An optional space can be put just before the slash. An Empty Element can also be written as a normal element with a start tag and an end tag, as long as there is absolutely no content between them, not even a new line.
Using Empty Elements, and moving the width and height into attributes, the Image Document could be re-written as:
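The equivalence of the two spellings can be checked with a small Python sketch using the standard library's xml.etree.ElementTree module. The helper name is_empty is made up for this example.

```python
import xml.etree.ElementTree as ET

empty_tag = ET.fromstring("<AlternativeContent/>")
start_end = ET.fromstring("<AlternativeContent></AlternativeContent>")
with_newline = ET.fromstring("<AlternativeContent>\n</AlternativeContent>")


def is_empty(element) -> bool:
    """True when the element has no text and no child elements."""
    return element.text is None and len(element) == 0


print(is_empty(empty_tag))     # the short form
print(is_empty(start_end))     # start tag immediately followed by end tag
print(is_empty(with_newline))  # a new line counts as content
```

The first two elements are indistinguishable after parsing, while the third one carries a new line as its text.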

<?xml version="1.0" encoding="UTF-8"?>
<ImageLibrary xml:lang="en-GB">
  <Image xml:id="i268">
    <Filename>FlowDiagram</Filename>
    <FileExtension>png</FileExtension>
    <MIMEType>image/png</MIMEType>
    <Dimensions units="px" width="102" height="214"/>
    <ColorDepth SingleBitTransparencySupport="8-bitOnly" AlphaTransparencySupport="true">
      24-bit
    </ColorDepth>
    <AnimationSupport/>
    <Exif>Copyright 2006 Legend Scrolls</Exif>
    <AlternativeContent>A diagram: box A points to box B, in turn box B points to box C and D. Box D points back to box A.</AlternativeContent>
  </Image>
  <Image xml:id="i269">
    <Filename>SimpleTree</Filename>
    <FileExtension>gif</FileExtension>
    <MIMEType>image/gif</MIMEType>
    <Dimensions units="px" width="32" height="78"/>
    <ColorDepth SingleBitTransparencySupport="true" AlphaTransparencySupport="false">
      8-bit
    </ColorDepth>
    <AnimationSupport>Framed, repeat</AnimationSupport>
    <Exif/>
    <AlternativeContent/>
  </Image>
</ImageLibrary>
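As a rough idea of how an application might read such a document, here is a Python sketch using the standard library's xml.etree.ElementTree module. It works on a trimmed copy of the re-written Image Document (first image only). The variable names are made up for this example, and the XML_NS constant is the namespace that ElementTree uses for the reserved 'xml:' prefix.

```python
import xml.etree.ElementTree as ET

# Namespace that the reserved 'xml:' prefix always refers to.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

# A trimmed copy of the re-written Image Document.
source = """<?xml version="1.0" encoding="UTF-8"?>
<ImageLibrary xml:lang="en-GB">
  <Image xml:id="i268">
    <Filename>FlowDiagram</Filename>
    <FileExtension>png</FileExtension>
    <Dimensions units="px" width="102" height="214"/>
    <ColorDepth SingleBitTransparencySupport="8-bitOnly" AlphaTransparencySupport="true">
      24-bit
    </ColorDepth>
    <AnimationSupport/>
  </Image>
</ImageLibrary>"""

library = ET.fromstring(source.encode("utf-8"))
language = library.get(XML_NS + "lang")

image = library.find("Image")
image_id = image.get(XML_NS + "id")

# Attribute values always arrive as strings, so convert them where needed.
dimensions = image.find("Dimensions")
width = int(dimensions.get("width"))
height = int(dimensions.get("height"))

# Element content keeps its surrounding new lines and spaces, so trim it.
color_depth = image.find("ColorDepth").text.strip()

# An Empty Element has no text at all.
has_animation = image.find("AnimationSupport").text is not None

filename = image.findtext("Filename") + "." + image.findtext("FileExtension")
print(image_id, filename, width, height, color_depth, has_animation)
```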

XML Comments are like HTML Comments:

<!-- This is an XML Comment -->

Just as HTML has the MIME Type 'text/html', XML has two MIME Types:

text/xml
application/xml

The second one is preferred these days.
Typically XML documents are saved in .xml files, but various XML-based languages have their own file extensions and their own MIME Types; SVG, for example, uses .svg files and 'image/svg+xml'.


Copyright ©2006-2008 Legend Scrolls and Peter Davison.
All rights reserved.