From HTML to XHTML
XHTML Structure
XHTML 1.0 is the reformulation of HTML 4.01 as an XML application.
XHTML is essentially a native XML language, so a browser needs an XML parser that understands XHTML to render its features. Just as HTML uses the 'text/html' MIME Type and XML prefers 'application/xml', XHTML has its own MIME Type:
application/xhtml+xml
Although native XHTML can also be served with the XML MIME Types, a parser that does not recognise XHTML will simply treat it as generic XML.
Static XHTML documents can be saved in .xml files but are usually saved in .xhtml files.
Mozilla Firefox, SeaMonkey, Netscape 6+, Konqueror 3.2+, Opera 5+, Apple Safari and Safari RSS are just a few of the web browsers that support native XHTML. Microsoft Internet Explorer does not - big surprise there.
Appendix C of the XHTML 1.0 specification draws up a set of guidelines, termed 'HTML-Compatible XHTML', that allow most XHTML 1.0 documents to be processed by web browsers that support neither XHTML nor even XML.
These guidelines state:
- you must have a space before the slash in any Empty Element;
- you can use the 'text/html' MIME Type instead of the native XHTML MIME Type;
- you can save static documents as .html files.
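Choosing between the native and HTML-Compatible MIME Types is usually done on the server by reading the browser's HTTP Accept header. Below is a minimal Python sketch under that assumption; parse_accept and choose_mime_type are hypothetical helpers, not part of any framework. It sends application/xhtml+xml only to clients that list it explicitly and falls back to text/html otherwise, deliberately ignoring wildcards such as */*, since a browser that sends */* may still be unable to render native XHTML.

```python
from typing import Dict

XHTML_TYPE = "application/xhtml+xml"
HTML_TYPE = "text/html"


def parse_accept(header: str) -> Dict[str, float]:
    """Map each media range in an HTTP Accept header to its q-value (default 1.0)."""
    ranges: Dict[str, float] = {}
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        ranges[media_type] = q
    return ranges


def choose_mime_type(accept_header: str) -> str:
    """Hypothetical helper: native XHTML only for clients that explicitly
    accept it; HTML-Compatible XHTML as text/html for everyone else."""
    if parse_accept(accept_header).get(XHTML_TYPE, 0.0) > 0.0:
        return XHTML_TYPE
    return HTML_TYPE


# A header that explicitly lists application/xhtml+xml gets native XHTML...
print(choose_mime_type("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
# ...while one offering only image types and */* gets text/html.
print(choose_mime_type("image/gif, image/jpeg, */*"))
```

When negotiating like this, the server should also send a Vary: Accept header so that caches do not hand one browser's version to another.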
Some simple notes to upgrade from HTML to XHTML:
- As XML is case-sensitive about element and attribute names, XHTML element and attribute names are written in lowercase;
- All elements that can have a start and end tag, such as <p> and <li>, must have both start and end tags present;
- All elements that only have a start tag, like <img> and <br>, must be Empty Elements: either put a slash before the greater-than character, preceded by a space if using HTML-Compatible mode, such as <br /> (preferred), or write them as normal elements with absolutely no element content, not even new lines, such as <img></img>;
- All attribute values must be quoted, commonly with double quotes;
- If you use double quotes in attribute values that are surrounded by double quotes, use the &quot; entity within the attribute value instead of a literal double quote (otherwise you would end the attribute before you intended, and when you go to native XHTML the XML parser will throw errors at you);
- If you use apostrophes or single quotes in attribute values that are surrounded by single quotes, use the &#39; or &apos; entity within the attribute value instead of a literal apostrophe or single quote (&apos; is not defined in HTML 4, so &#39; is the safer choice in HTML-Compatible mode). It is best to surround attribute values with double quotes these days;
- As ampersands (&) start entities, use the &amp; entity in attribute values and element content wherever you need a literal ampersand;
- As less-than characters (<) start elements, use the &lt; entity in attribute values and element content wherever you need a literal less-than character; greater-than characters (>) do not pose a problem;
- Any 'minimized' attributes such as checked, multiple or noresize must be expanded to have the attribute name as the value, such as checked="checked", multiple="multiple" and noresize="noresize";
- Start using the id attribute, at least in addition to the name attribute, as XHTML 1.0 deprecates name on the a, applet, form, frame, iframe, img and map elements in favour of ID type attributes like id and xml:id;
- It is best to move from the language attribute, as in <script language="JavaScript"> </script>, to the standard and required type attribute, as in <script type="text/javascript"> </script>;
- In native XHTML, document.write() and document.writeln() still exist but do not work, because writing arbitrary text into the document could break the XML well-formedness rules. Instead you will have to use the HTML Document Object Model to manipulate the existing markup;
- Use the XHTML DOCTYPEs instead of the HTML 4.01 DOCTYPEs;
- It is best to move from character sets like ISO-8859-1 to UTF-8, the widely supported Unicode encoding;
- Element nesting must be proper. For instance, <strong>This is an <em>invalid</strong> nesting</em> is wrong: as the <em> element was started within the <strong> element, it must also end within it, as in <strong>This is <em>valid</em> nesting</strong>;
- The <script></script>, <style></style> and <pre></pre> elements carry the xml:space="preserve" attribute (the XHTML 1.0 DTDs declare it as a fixed value on them), which signals that leading and trailing spaces and any runs of multiple spaces inside them are significant and must be kept;
- As XHTML is an XML language, wherever the lang attribute is used, the xml:lang attribute must accompany it;
- XHTML also needs to declare its namespace, which identifies its elements and attributes as belonging to the XHTML language so that they cannot be confused with elements or attributes of the same name from any other XML language.
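Several of the notes above (lowercase names, quoted values, expanded minimised attributes, and escaping &, < and ") are mechanical enough to automate when markup is generated by a program. This is a minimal Python sketch; xhtml_attrs and the MINIMISED set are hypothetical names chosen for illustration, while escape() is the real helper from the standard xml.sax.saxutils module.

```python
from xml.sax.saxutils import escape

# Boolean attributes that HTML 4.01 allows in minimised form.
MINIMISED = {
    "checked", "compact", "declare", "defer", "disabled", "ismap",
    "multiple", "nohref", "noresize", "noshade", "nowrap", "readonly",
    "selected",
}


def xhtml_attrs(attrs: dict) -> str:
    """Hypothetical helper: serialise attributes the XHTML way."""
    parts = []
    for name, value in attrs.items():
        name = name.lower()  # XHTML attribute names are lowercase
        if value is True:
            if name not in MINIMISED:
                raise ValueError(f"{name!r} is not a minimisable attribute")
            value = name  # checked -> checked="checked"
        elif value is False or value is None:
            continue  # an "off" boolean attribute is simply left out
        # escape() handles &, < and >; the extra mapping adds &quot; so a
        # literal double quote cannot end the double-quoted value early.
        text = escape(str(value), {'"': "&quot;"})
        parts.append(f' {name}="{text}"')
    return "".join(parts)


print("<input" + xhtml_attrs({"type": "checkbox", "CHECKED": True}) + " />")
# <input type="checkbox" checked="checked" />
alt = xhtml_attrs({"src": "tj.png", "alt": 'Tom & "Jerry"'})
print("<img" + alt + " />")
# <img src="tj.png" alt="Tom &amp; &quot;Jerry&quot;" />
```

Always wrapping values in double quotes means single quotes never need escaping, which matches the advice to prefer double quotes.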
The XHTML namespace value is 'http://www.w3.org/1999/xhtml' and is typically declared as:
<html xmlns="http://www.w3.org/1999/xhtml" ...>...</html>
The elements which were just start tags in the old HTML are now the following XHTML Empty Elements:
<meta />, <link />, <br />, <hr />, <param />, <col />, <img />, <area />, <input />, <frame /> and <base /> (plus <basefont /> and <isindex /> in the Transitional and Frameset DTDs).
Or they could be written like normal elements:
<meta></meta>, <link></link>, <br></br>, <hr></hr>, <param></param>, <col></col>, <img></img>, <area></area>, <input></input>, <frame></frame> and <base></base>.
All other elements are normal elements with start and end tags present and text and/or child elements as element content.
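One way to see what these rules mean in practice is to feed markup to a plain XML parser, which is roughly what a browser does with native XHTML. This sketch uses Python's standard xml.etree.ElementTree; wrap and parse_xhtml are hypothetical helpers for illustration. It shows that <br /> and <br></br> produce the same namespaced element, while an HTML-style <br>, improper nesting, an unquoted attribute or a bare ampersand is rejected outright.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def wrap(body: str) -> str:
    """Put a fragment inside an <html> root that declares the XHTML namespace."""
    return f'<html xmlns="{XHTML_NS}"><body>{body}</body></html>'


def parse_xhtml(markup: str):
    """Return the root element, or None if the markup is not well-formed XML."""
    try:
        return ET.fromstring(markup)
    except ET.ParseError:
        return None


empty_tag = parse_xhtml(wrap("<p>one<br />two</p>"))
start_end = parse_xhtml(wrap("<p>one<br></br>two</p>"))

# Both spellings give an identical tree; the element name carries the namespace.
print(ET.tostring(empty_tag) == ET.tostring(start_end))  # True
print(empty_tag.find(f".//{{{XHTML_NS}}}br").tag)  # {http://www.w3.org/1999/xhtml}br

# HTML habits are fatal errors to an XML parser.
print(parse_xhtml(wrap("<p>one<br>two</p>")))  # None: <br> is never closed
print(parse_xhtml(wrap("<strong>an <em>invalid</strong> nesting</em>")))  # None
```

For HTML-Compatible XHTML, stick to <br />: Appendix C warns that the <br></br> form gives uncertain results in many HTML browsers, even though an XML parser treats both forms the same.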
And these minimised attributes should be expanded to the following:
selected="selected", multiple="multiple", checked="checked", noresize="noresize", ismap="ismap", disabled="disabled", readonly="readonly" and defer="defer" (likewise compact, declare, nohref, noshade and nowrap).
Also note the <meta http-equiv="content-type" content="text/html; charset=UTF-8" /> element: the server sends the MIME Type of the webpage in the HTTP header as the page is transmitted to the web browser.
So in HTML environments only the character encoding given by the charset value of this element is used.
In XML environments such as native XHTML, the MIME Type again comes from the server and the character encoding is taken from the encoding declaration in the XML Prolog (for example <?xml version="1.0" encoding="UTF-8"?>), so the meta http-equiv content-type element is useless in this environment.
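To see where an XML parser gets its character encoding from, consider the same word stored as bytes in two encodings. In this sketch, Python's standard xml.etree.ElementTree parses raw bytes with no HTTP header involved, so the encoding declaration in the XML Prolog is the only hint; the sample documents are made up for illustration. A document with no declaration (and no byte order mark) is assumed to be UTF-8.

```python
import xml.etree.ElementTree as ET

# "café" in ISO-8859-1: é is the single byte 0xE9, so the prolog must say so.
latin1_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><p>caf\xe9</p>'

# The same text in UTF-8: é is the two bytes 0xC3 0xA9; with no encoding
# declaration an XML parser assumes UTF-8.
utf8_doc = b"<p>caf\xc3\xa9</p>"

print(ET.fromstring(latin1_doc).text == "caf\u00e9")  # True
print(ET.fromstring(utf8_doc).text == "caf\u00e9")    # True

# Dropping the declaration from the ISO-8859-1 bytes makes them invalid UTF-8.
try:
    ET.fromstring(b"<p>caf\xe9</p>")
except ET.ParseError as err:
    print("not well-formed:", err)
```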
In addition, the XML Prolog and any other XML Processing Instruction (other than PHP's, which the server processes before the page is sent) are displayed as text by web browsers such as Netscape 3 and under and Microsoft Internet Explorer 3 and under.
According to surveys, almost no-one uses these web browsers any more, and Microsoft Internet Explorer, Netscape and Opera 4 and higher, along with other browsers, correctly ignore the XML Prolog in HTML mode.
Now for ids. Ids are coming into their own in XML documents, and especially in HTML 4, HTML 4.01 and XHTML, as a simple yet powerful basic feature of webpages. The id attribute replaces the name attribute for identifying an element for scripting purposes. The Document Object Model even has a method, getElementById(), that takes the value of an id attribute as its parameter and returns the element that has that id attribute value.
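The same lookup is easy to reproduce outside the browser. This is a rough Python stand-in for document.getElementById(), not the DOM method itself: the hypothetical get_element_by_id helper walks an xml.etree.ElementTree tree in document order and returns the first element whose id attribute matches, or None.

```python
import xml.etree.ElementTree as ET

PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div id="navigation"><p>Links</p></div>
    <p id="fourthParagraph">Hello</p>
  </body>
</html>"""


def get_element_by_id(root, element_id):
    """Return the first element whose id equals element_id, or None."""
    for element in root.iter():  # document order, like a DOM tree walk
        if element.get("id") == element_id:
            return element
    return None


root = ET.fromstring(PAGE)
print(get_element_by_id(root, "fourthParagraph").text)  # Hello
print(get_element_by_id(root, "missing"))                # None
```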
Ids can also be used to attach styles, using the hash or sharp character (#):
div#navigation {
width: 98%;
background-color: aqua;
}
This attaches the width and background colour styles to the div element with id="navigation" on it.
The id attribute also replaces <a name=""></a> elements as the target of fragment identifiers in URIs such as mypage.xhtml#fourthParagraph. Because the target is identified by an attribute, any element with an id can be the target of such a URI.
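Resolving a fragment identifier works the same way: split the fragment off the URI and look for the element with that id. This simplified Python sketch uses the standard urllib.parse.urldefrag; resolve_fragment is a hypothetical name, and the fallback to a legacy <a name=""> anchor is an assumption modelled on HTML 4.01, where such anchors were also valid targets.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

XHTML_NS = "http://www.w3.org/1999/xhtml"
PAGE = f"""<html xmlns="{XHTML_NS}"><body>
  <p><a name="oldAnchor"></a>Legacy target</p>
  <p id="fourthParagraph">Any element with an id is a target</p>
</body></html>"""


def resolve_fragment(uri, root):
    """Return the element a URI's fragment points at, or None."""
    fragment = urldefrag(uri).fragment
    if not fragment:
        return None
    # Preferred: any element whose id matches.
    for element in root.iter():
        if element.get("id") == fragment:
            return element
    # Fallback for older markup: an <a name="..."> anchor.
    for anchor in root.iter(f"{{{XHTML_NS}}}a"):
        if anchor.get("name") == fragment:
            return anchor
    return None


root = ET.fromstring(PAGE)
print(resolve_fragment("mypage.xhtml#fourthParagraph", root).get("id"))  # fourthParagraph
print(resolve_fragment("mypage.xhtml#oldAnchor", root).get("name"))      # oldAnchor
```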
An id is a unique identifier, so its value must be unique throughout the document. Attributes of type ID, such as id and xml:id, have the same character restrictions as XML Names (the names of elements, attributes and processing instruction targets); for example, an ID cannot start with a digit.
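A non-validating XML parser, like the one in Python's standard library, enforces neither the uniqueness rule nor the naming rule for ids, so a separate check is useful. The hypothetical check_ids helper below is a sketch: its NAME_RE pattern is a simplified, ASCII-only approximation of the XML Name production (the real rule also admits many non-ASCII letters), and it permits colons, which xml:id values, being NCNames, may not contain.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# ASCII-only approximation of an XML Name: a letter, underscore or colon,
# then letters, digits, '.', '-', '_' or ':'.
NAME_RE = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")


def check_ids(root):
    """Return a list of problems with the id attributes under root."""
    counts = Counter(e.get("id") for e in root.iter() if e.get("id") is not None)
    problems = []
    for value, count in counts.items():
        if count > 1:
            problems.append(f"duplicate id {value!r} used {count} times")
        if not NAME_RE.fullmatch(value):
            problems.append(f"id {value!r} is not a valid XML Name")
    return problems


page = ET.fromstring(
    '<div><p id="intro">a</p><p id="intro">b</p><p id="4th">c</p></div>'
)
for problem in check_ids(page):
    print(problem)
# duplicate id 'intro' used 2 times
# id '4th' is not a valid XML Name
```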
The three flavours of XHTML 1.0 have these DOCTYPEs:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
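Putting the pieces together, a minimal document needs the XML Prolog, one of these DOCTYPEs, the XHTML namespace and both lang and xml:lang. The hypothetical xhtml_skeleton function below is only a sketch: it assumes UTF-8 output and covers just the Strict and Transitional flavours, since a Frameset document replaces <body> with <frameset>. Parsing the result with Python's xml.etree.ElementTree confirms it is well-formed.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Public and system identifiers of two of the XHTML 1.0 DTDs.
DOCTYPES = {
    "strict": (
        "-//W3C//DTD XHTML 1.0 Strict//EN",
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd",
    ),
    "transitional": (
        "-//W3C//DTD XHTML 1.0 Transitional//EN",
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd",
    ),
}


def xhtml_skeleton(title: str, flavour: str = "strict", lang: str = "en") -> str:
    """Hypothetical helper: an empty XHTML 1.0 page in the given flavour."""
    public_id, system_id = DOCTYPES[flavour]
    lang_attr = quoteattr(lang)  # quoted and escaped attribute value
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<!DOCTYPE html PUBLIC "{public_id}" "{system_id}">\n'
        f'<html xmlns="http://www.w3.org/1999/xhtml" xml:lang={lang_attr} lang={lang_attr}>\n'
        f"<head><title>{escape(title)}</title></head>\n"
        "<body></body>\n"
        "</html>\n"
    )


page = xhtml_skeleton("Tom & Jerry")
root = ET.fromstring(page.encode("utf-8"))  # well-formed, so this parses
print(root.tag)  # {http://www.w3.org/1999/xhtml}html
```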
Copyright ©2006-2008 Legend Scrolls and Peter Davison.
All rights reserved.