HTML Webpage Structure

Release: 2010-02-02

Hypertext

This article gives a brief history of how and why HTML 5 came about and describes the HTML 5 code that web authors use.

In the dawn of styled information, when typing meant using typewriters, content also carried written markings (in pen or pencil) that described parts of a document such as emphasis, importance, comments and layout, including tabular layout.

Today written markings or 'markup' are used practically everywhere. You may not even be aware that you are using it! For example, today's word processing applications, such as OpenOffice.org Writer, Microsoft Office Word, Corel WordPerfect and Apple Pages, amongst others, all use markup languages to describe your stylizations and layouts with keywords such as bold, italic, tablerow and listitem.

An old International Standard for describing markup languages is SGML - Standard Generalized Markup Language. This is a robust technical language that is used to create technical project documentation and other information documents.

SGML has been used to define various markup languages, including one of the most popular formats for information documents on the Internet: HTML - HyperText Markup Language. HTML is the primary way of structuring information on the Web as Hypertext.

The life of Hypertext on the web has gone through several stages to try to provide the best semantic (meaningful) structure for webpages. Sir Tim Berners-Lee created the initial version of HyperText Markup Language (HTML), the Internet Engineering Task Force (IETF) then updated it to HTML 2, and from HTML 3.2 onwards the language was continued by the World Wide Web Consortium (W3C) (directed by Sir Tim Berners-Lee).

Beginning the path to Web Standards, the W3C HTML 4 and 4.01 specifications provided a strict flavour that cut out support for most presentational and other deprecated markup and enforced proper structure rules. The strict flavour was also meant to be used with Cascading Style Sheets (CSS), which provide a more realistic way of adding presentation and layout to HTML documents. Two other flavours, transitional and frameset, were provided for backwards compatibility with the deprecated code. All three added internationalization and accessibility.

Web browser vendors wanted an easier way of adding more markup that did not break Web Standards so they could provide non-typical information features. The Information Technology domain also wanted a standard, extensible, structured data format. Together with the W3C, they developed a specification that is a strict subset of SGML: XML - eXtensible Markup Language. Even though it is a smaller specification than SGML, its strictness means it retains the power to create custom markup languages.

They field tested HTML in the realm of XML as eXtensible HyperText Markup Language - XHTML. This adapted HTML to use XML's strict structure rules and gain extensibility. After XHTML 1.0, they broke up the XHTML specification into a collection of reusable modules in XHTML 1.1 to further harness extensibility and other XML technologies. They also attempted a standard webpage structure for the mobile phone Internet and eventually attempted a rewrite of the language as XHTML 2.

In the wild wild web native XHTML is too strict: one error will throw an XML error page at you, and if the error is in a common component of the website or web application, such as a header or footer, then the entire website or web application will be XML error pages. Various custom features, such as authors' own Named Entity Character References, cannot be guaranteed to work in all environments, and not all environments, such as web browsers, support native XHTML.

HTML-Compatible XHTML allowed webpages coded in a slightly modified form of XHTML to display in environments that did not support native XHTML. Such webpages do not have the strict structure rules or the extensibility of XML because they are handled as a strange form of HTML.

In addition, the text handling of some webpage features differs between XML (including native XHTML) and HTML (including HTML-Compatible XHTML). Different MIME Media Types are used to identify HTML, XML and XHTML for web browsers. For all these reasons, authors were confused about which type of markup to use with which MIME Media Type and why some features did not work as expected.

 It is now understood that XML is best for structuring pure data such as configuration preferences or rich file formats where the handling of structures is in a less variable environment or has not had a previous history in web browsers such as math expressions and news syndication feeds.

Webpages, websites and now web applications need an improved form of webpage markup that the old HTML and XHTML cannot completely provide. This improved structure should support new standardized features while keeping standardized backwards compatibility for web browsers and other environments. It also requires authors to drop deprecated and obsolete code and embrace a clean, modern, hopefully future-proof structure that includes new features, with some existing features simplified, to help mark up emerging web features. All this should come with a small and incremental learning curve to ease the upgrade from HTML, native XHTML and HTML-Compatible XHTML.

URIs And IRIs

All the resources on the World Wide Web are identified by a URI: Uniform Resource Identifier. A URI can be one of several types, including a URL, a URN and a Tag URI.

A URL is a Uniform Resource Locator and is the most familiar of URIs. URL provides a location as well as a spatial identifier for a resource. Such URL type URIs include http:// addresses.

http://www.w3.org
http://example.com/folder/filename
Figure 1: Examples of URL type URIs

URLs can have a Fragment part at the end, written as a hash or sharp character (#) followed by a unique identifier. The identifier usually points to a part of a webpage or web application that is marked by an ID type attribute with the same value in markup languages like HTML:

http://example.com/folder/page.html#thissection
Figure 2: A URI with a Fragment Part

URLs can also have an optional query string, introduced by a question mark (?) and made up of variable=value pairs separated by ampersands (&):

http://example.com/folder/page.php?name=me&range=2400&packed=true
http://example.com/folder/page.php?name=me&range=2400&packed=true#thatsection
Figure 3: URI Query Strings

Within HTML, however, each ampersand would be written as &amp;, as:

http://example.com/folder/page.php?name=me&amp;range=2400&amp;packed=true
http://example.com/folder/page.php?name=me&amp;range=2400&amp;packed=true#thatsection
Figure 4: Ampersands in URIs in HTML
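
As a minimal sketch of how such a URI might sit in markup, the hyperlink below carries a query string and a fragment in its href attribute, with each ampersand written as &amp;. The page name, variables and link text are hypothetical, reusing the example.com addresses from the figures above.

```html
<!-- Each ampersand in the query string is written as &amp; inside the attribute value -->
<a href="http://example.com/folder/page.php?name=me&amp;range=2400&amp;packed=true#thatsection">
  Results for me
</a>
```

The web browser converts each &amp; back into a plain ampersand before it requests the page.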

The above examples are absolute URIs, but you can use relative URIs, which the environment (such as a web browser) automatically converts to absolute URIs:

folder/filename
../otherfolder/me.html
Figure 5: relative URIs

The web browser would take the relative URI and prefix the URI of the current document, without the document's filename, to create an absolute URI. For referring to things in another folder near the current one you use the ../ for 'up-one-level'. So the last example is referring to 'me.html' in the 'otherfolder' folder that is beside the folder that you are in.
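
The resolution described above can be sketched in a small page. This is only an illustration: it assumes the document itself lives at the hypothetical address http://example.com/folder/page.html, and the comments show the absolute URIs a web browser would be expected to build from that location.

```html
<!doctype html>
<html lang="en">
  <head>
    <meta charset="UTF-8">
    <title>Relative URIs</title>
  </head>
  <body>
    <!-- Assumption: this document is at http://example.com/folder/page.html -->

    <!-- Resolves to http://example.com/folder/folder/filename -->
    <p><a href="folder/filename">A file in a subfolder</a></p>

    <!-- Resolves to http://example.com/otherfolder/me.html -->
    <p><a href="../otherfolder/me.html">A file in a neighbouring folder</a></p>
  </body>
</html>
```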

A URN is a Uniform Resource Name, which provides a spatial identifier independent of its physical or electronic location. URNs are restricted to certain companies or organizations; they include ISBNs for publications, and several organizations use URNs to refer to built-in schemas, etc.

urn:isbn:0-00-000000-0
urn:schemas-microsoft-com:office:office
urn:oasis:names:tc:opendocument:xmlns:office:1.0
Figure 6: Examples of URN type URIs

Tag URIs include tag:example.com,2009:articles.entry01 or tag:me@example.com,2009-02-10:myposts.entry-01.
The 'tag' URI (or 'tags') provides a unique identifier not just in space but in time as well. It begins with the word 'tag' followed by a colon, followed by either a domain name that you own or your email address. Then come a comma and a UTC (Coordinated Universal Time) date on which you owned the domain name or email address. Usually people choose the current year in which they are creating the tag URI.

The date can be a four digit year or a four digit year dash two digit month or a full UTC Date such as 2009, 2009-01 or 2009-01-01. After the second colon (:) you can usually use slashes (/), colons and / or dots to separate any meaningful keywords.

IRI stands for Internationalized Resource Identifier and is basically a URI that supports characters beyond the basic Latin (ASCII) set.

MIME Media Types

Most resources on the Internet are also categorized by a MIME Media Type such as text/plain for text files (and the default for uncategorised files), audio/mpeg for mp3 audio, audio/m4a for .m4a MPEG4 audio, image/png for Portable Network Graphics (PNG), application/octet-stream for binary files, text/html for HTML webpages, text/css for Cascading Style Sheets, application/xml for XML Documents, application/xhtml+xml for XHTML webpages, video/mp4 for MPEG4 video, application/ogg for OGG audio and video and image/svg+xml for Scalable Vector Graphics (SVG).
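
MIME Media Types also turn up inside markup, usually in a type attribute. The fragment below is a rough sketch with hypothetical file names; in general these attributes act as hints, and the type reported by the web server is what a web browser normally relies on.

```html
<!-- A stylesheet identified as text/css -->
<link rel="stylesheet" type="text/css" href="styles/main.css">

<!-- An SVG image identified as image/svg+xml, with fallback text -->
<object data="images/logo.svg" type="image/svg+xml" width="120" height="60">
  <span class="fallback">Company logo</span>
</object>

<!-- A hyperlink that hints at the type of the linked resource -->
<a href="audio/interview.mp3" type="audio/mpeg">Listen to the interview</a>
```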

HTML 5

HTML 5 was proposed by the Web Hypertext Application Technology Working Group (WHATWG). This working group was founded by the Mozilla Foundation, Opera Software and Apple in 2004 after a W3C workshop. The WHATWG did not like where the W3C was taking XHTML, with the almost completely rewritten and new structures of XHTML2 and XForms. Some of those concepts were an improvement, but they did not address recent web features such as Web Applications and non-document-based webpages.

Since then, the W3C has recognized HTML 5 and the WHATWG is part of the new W3C HTML Working Group; together they are developing HTML 5. The specification is currently a Draft Standard, but basic parts of it (the Subset) are modelled on the existing implementation of HTML, XHTML and browser extensions in web browsers since roughly the year 2000. So parts of HTML 5 are already supported in web browsers and other environments.

HTML 4.01 and earlier were expressed in terms of an SGML Document Type Definition (DTD), and XHTML in terms of an XML DTD. HTML 5, by contrast, is an abstract language expressed in terms of the Document Object Model (DOM). So HTML 5 documents can be written in DOM 5 HTML scripting, in HTML 5's custom HTML syntax (text/html mode) or in an XML syntax (XHTML 5).

When using the text/html MIME Type (default), HTML 5 can be in a document with the .html file extension, or can at least be served with that MIME Type in environments like PHP. In this mode, HTML 5 is written in the custom HTML 5 syntax, which supports most of the old HTML syntax as well as some XML syntax. This makes this form of HTML 5 interoperable with common HTML and XHTML concepts and provides an upgrade path from HTML 4.01, XHTML 1 and 1.1.

HTML 5 documents with the application/xhtml+xml MIME Type or any of the other XML MIME Types are XHTML 5 documents and are saved in .xhtml files or, again, served with the application/xhtml+xml MIME Type in environments like PHP. XHTML 5 documents must be written in accordance with XML 1.0 or 1.1.

Apart from the syntaxes and MIME Types, HTML 5 has two sets of requirements. User agents such as web browsers must support all previous elements and attributes for backwards compatibility with old webpages. For webpage authors, anything that would have been deprecated has either been dropped or had its semantics changed.

More and more web authors are at least coding in the subset of the custom HTML form of HTML 5 as this is the form that can be successfully used in the wild wild web.

In HTML 5 you have the general support for webpage features such as lists, sections, menus, headings, hyperlinks, tables, forms, images and general object inclusion coded in elements, attributes and Entity Character References making use of URIs and MIME Media Types.

Markup

An HTML document is made up of elements, comments, attributes and Entity Character References. An element generally is made up of a start tag (such as <p>), element content (such as text, Entity Character References or even other elements) and an end tag (such as </p>).

<p>This is an element with element content</p>
Figure 7: An element

A Void Element has no element content or end tag and may be either a plain start tag or a start tag with a forward slash near the end or a space and a forward slash near the end such as:

<br> or <br/> or <br />
Figure 8: The three forms of a void element

Comments generally use a special start-only tag:

<!-- This is a comment -->
Figure 9: A comment

You should not have two dashes within the comment text as this will end the comment text before you intended. Comments can be used within any element content.

Attributes provide extra information for an element or void element. An attribute is a name=value pair where the value should be quoted with double quotes (preferred) or single quotes. They are used in element start tags and void elements:

<object data="images/diagram.png" type="image/png" width="180" height="200">
  <span class="fallback">Diagram 1: Core box points to ...</span>
</object>
Figure 10: Attributes example 1

or

<img src="images/diagram.png" width="180" height="200" alt="Diagram 1: Core box points to ...">
Figure 11: Attributes example 2

Some attributes provide 'present or not present' or 'on and off' information and are called boolean attributes. They either just have the name or name="name":

<input type="checkbox" checked>
Figure 12: Boolean attributes (minimized)

or

<input type="checkbox" checked="checked">
Figure 13: Boolean attributes (expanded)

Both would indicate that the checkbox is already checked or ticked.

Characters are grouped into various Character Sets or Character Encodings. For example, ISO-8859-1 (Latin 1) contains only English and some European characters; JIS0201, JIS0208 and JIS0212 contain basic Latin and some Japanese characters; and ISO-8859-7 has basic Latin and Greek characters. Characters are indexed in these Sets or Encodings, and some characters may have different indexes in different Encodings. These days it is best to use UTF-8 (8-bit UCS Transformation Format), as this Character Encoding supports almost all characters in existence. That allows you to use a single Encoding and have English, Japanese, Greek, Cyrillic, Chinese, arrows, circled letters and numbers, amongst other characters, in the same webpage document.

An Entity Character Reference (or Character Reference or Entity) is a way to reference a character that is not easily available on a keyboard, or that may have a different index when the webpage declares one Encoding but the physical file was saved in a different one. Using the Character Reference provides the correct character in the Encoding specified by the webpage.

Character References can be written in three forms: Named (such as &eacute;), Numbered (such as &#233;) or Hexadecimal (such as &#x00E9;). All three of the above examples reference the small letter e with an acute symbol above it: é, as in café.
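
As a quick illustration, the paragraph below uses all three forms side by side. It is only a sketch: the sentence itself is made up, and the literal é at the end assumes the file is saved and served as UTF-8.

```html
<!-- Named, Numbered and Hexadecimal references to the same character -->
<p>caf&eacute;, caf&#233; and caf&#x00E9; should all display as café.</p>
```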

For best practice in the text/html mode of HTML 5, here are a few notes:

  1. Element and attribute names may be written in uppercase, but that is a pure waste of time and, contrary to popular belief, does not make your source code any clearer: modern HTML is in lowercase;
  2. All elements that can have a start and end tag (normal elements) such as p and li should have both start and end tags present;
  3. All elements that only have a start tag (void elements) like img and br may be as <img>, <br> or <img/>, <br/> or <img />, <br />;
  4. All attribute values should be quoted - commonly with double quotes;
  5. If you use double quotes in attribute values that are surrounded by double quotes then use the &quot; Entity Character Reference within the attribute value instead of a literal double quote (otherwise you would be ending the attribute before you intended and the HTML parser will just mess up your page without any warning);
  6. If you use apostrophes or single quotes in attribute values that are surrounded by single quotes then use the &#x0027;, &#39; or &apos; Entity Character References within the attribute value instead of a literal apostrophe or single quote. It is best to surround attribute values with double quotes these days;
  7. As ampersands (&) are used to start Entity Character References, then if you need to use it as an ampersand use the &amp; Entity Character Reference in attribute values and element content;
  8. As less-than characters (<) are used to start elements and you need to use it as a less-than character then use the &lt; Entity Character Reference in attribute values and element content - greater-than characters (>) do not pose a problem;
  9. Any 'minimized' or boolean attributes such as checked or multiple may be expanded to have the attribute name as the value such as checked="checked" and multiple="multiple" or they can be just the attribute name;
  10. The id attribute is used instead of the old name attribute for elements such as img, a and table as the name attribute has been replaced in favour of the global standard of ID type attributes like id;
  11. The old language attribute in script elements has been replaced by the global standard type attribute as <script type="text/javascript"> </script>;
  12. The noscript element is still available in HTML5, now in the head element too. In any case you should be writing Unobtrusive Scripts (see related article) so you do not need noscript anymore (web browsers will still support it for older webpages);
  13. The document.write() and document.writeln() do exist still but they don't work in all circumstances – plus are old hat. Instead you should use the HTML Document Object Model to manipulate the existing markup including DOM 5 HTML's innerHTML property. This will work in all current web browsers (see Modelling the Document Objects article for more on the Document Object Model (DOM) and innerHTML);
  14. Use the HTML 5 Doctype (<!doctype html>) instead of the obsolete HTML and XHTML Doctypes;
  15. It is best to move from using character sets like ISO-8859-1 to the most supported global standard and Unicode supporting character set UTF-8;
  16. Element nesting should be proper - for instance: <strong>This is an <em>invalid</strong> nesting</em> as the em element was started within the strong element then the em element should end within the strong element as <strong>This is <em>valid</em> nesting</strong>;
  17. Elements script, style and pre in XHTML must have the xml:space="preserve" attribute to preserve leading and trailing spaces, tabs, newlines and other 'whitespace' characters and any multiple whitespaces within words at the XML parser level. But in HTML 5 you don't use it as all elements have whitespace preservation at the HTML 5 parser level;
  18. In HTML 5 you use the lang attribute (no xml:lang attribute is needed; however, a non-namespaced, non-prefixed attribute called xml:lang may be used in addition if it has the same value, but xml:lang is useless in HTML Parsers).
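
Several of these notes can be seen together in one small page. The following is a sketch rather than a definitive template: the shop text, the id value intro and the script are invented for illustration, and the comments point back to the note each line is meant to show.

```html
<!doctype html>
<html lang="en">
  <head>
    <meta charset="UTF-8">
    <!-- Notes 7 and 8: &amp; and &lt; stand in for literal & and < characters -->
    <title>Fish &amp; Chips: prices &lt; 5.00</title>
  </head>
  <body>
    <!-- Note 10: id instead of name. Note 5: &quot; inside a double-quoted value -->
    <p id="intro" title="Known locally as the &quot;chippy&quot;">
      <!-- Note 16: the em element starts and ends inside the strong element -->
      <strong>Open <em>every</em> day</strong>
    </p>
    <!-- Notes 3 and 9: a void element with an expanded boolean attribute -->
    <p><input type="checkbox" checked="checked"> Remember my order</p>
    <!-- Note 11: the type attribute replaces the old language attribute -->
    <script type="text/javascript">
      // Note 13: change the page through the DOM instead of document.write()
      document.getElementById("intro").innerHTML += " from <em>noon</em>.";
    </script>
  </body>
</html>
```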

Categories and Content Models

Metadata Content are elements that provide information about the webpage or other resource such as its title, description, an icon, presentation (such as StyleSheets), extra functionality (such as JavaScript), references to other webpages in a collection or even alternative forms of the webpage, resource or application information (such as OpenDocument Format or PDFs). Mostly these elements are contained in a head element but some may be in the body of the webpage or application.

Flow Content should have at least text or an Embedded Content within it and some can contain other Flow Content as well as most other content. But some have a more specific content model (what can be contained within it). By default some Flow Content have a presentational box model of display: block (from CSS) which pushes itself onto its own line and other elements after it onto the next line. Such Flow Content include headings, sections, paragraphs, lists, tables and form boxes.

Sectioning Content are elements that contain headings, headers and footers and so provide heading scope and contribute to the document outline. Sectioning Roots are elements that can have their own outlines. Sectioning Content is a sub-category of Flow Content. Such Sectioning Content include the body of the document or application, sections, articles, asides and such Sectioning Roots include the body of the document or application and table data cells.

Heading Content are elements that provide headings for Sectioning Content and are used in document outlines. Heading Content are a sub-category of Flow Content but can only have Phrasing, Embedded and Interactive Content within them.

Phrasing Content should have at least text or an Embedded Content within it and generally can only contain Phrasing, Embedded, Interactive Content. By default Phrasing Content have a presentational box model of display: inline (from CSS) which do not push other elements and text below or above itself and so can quite comfortably sit within a line. Such Phrasing Content include text, hyperlinks, images, objects, line breaks, emphasis and form controls.

Embedded Content are elements that replace the element with another resource (externally or not). Embedded Content are a sub-category of Phrasing Content. Such Embedded Content include images, frames, audio and / or visual multimedia and content from non-HTML languages that provide content rather than metadata (such as from MathML or SVG).

Interactive Content are elements that a user can activate via a pointing device, keyboard, voice command, touch or motion, or even programmatically via scripts, if those features are supported by the UserAgent, Operating System and hardware (or other forms of information devices). Interactive Content are a sub-category of Phrasing Content but cannot have Interactive Content within them. Such Interactive Content includes hyperlinks and form controls.

Transparent: If an element's content model or part of its content model contains the keyword Transparent, then the scope of that part of the element's content model depends on the content model of the parent element (the element that the current element is within).
 If, for instance, the parent element's content model was Phrasing then the current element's Transparent model can only contain Phrasing level elements.
 Or if the parent element's content model was Flow then the current element's Transparent model may contain Flow Content.
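
A sketch of the two cases above, using the a element, whose content model is Transparent in the HTML 5 draft. The file name and text are hypothetical.

```html
<!-- The parent p has a Phrasing content model, so this a element may only hold Phrasing Content -->
<p>Read <a href="news.html">the <em>latest</em> news</a> today.</p>

<!-- The parent div has a Flow content model, so this a element may hold Flow Content -->
<div>
  <a href="news.html">
    <h2>Latest news</h2>
    <p>The hat flew in the breeze.</p>
  </a>
</div>
```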

DOM Tree

Modern web browsers, and other UserAgents that deal with HTML and / or XML, internally represent the HTML or XML document as a tree structure of document objects. HTML 5 is natively a Document Object Model (DOM) based language which can be written in the custom HTML syntax (text/html mode) or XML syntax (XHTML 5) or even purely as a DOM Tree in scripts and programming languages.

Html Html Element
  Html Head Element
    Html Title Element
  Html Body Element
    Html Heading Element
    Html Paragraph Element
Figure 14: A small conceptual tree of document objects

<html>
  <head>
    <title></title>
  </head>
  <body>
    <h1></h1>
    <p></p>
  </body>
</html>
Figure 15: The basic markup version

Document Element, Root Element: Both the Document Element and the Root Element refer to the first main element of the markup language vocabulary. For instance in HTML and XHTML the Document Element and Root Element is html.

WebCanvas Element: The WebCanvas element refers to the top most element that is rendered in the WebCanvas such as the area of a web browser that the webpage is displayed in. For instance in HTML (text/html) the WebCanvas Element is body. But in XHTML the WebCanvas Element is the same as the Document Element or Root Element which is html.

Elements may be referred to in relation to other elements from a family tree perspective: ancestor, parent, sibling, child, children and descendant.

For instance, from the previous example, the html element is the parent of the head and body elements and is an ancestor element to the head, body, title, h1 and p elements. The body element is a sibling element to the head element. Plus h1 and p elements are child elements or children to the body element and along with the head and body elements are descendant elements of the html element.
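
These relationships can be checked from a script through standard DOM properties. The page below is a minimal sketch: it assumes a web browser that provides a console object for output, and the heading and paragraph text are placeholders.

```html
<!doctype html>
<html lang="en">
  <head>
    <meta charset="UTF-8">
    <title>Family tree</title>
  </head>
  <body>
    <h1>Heading</h1>
    <p>Paragraph</p>
    <script type="text/javascript">
      var root = document.documentElement; // the html element
      var heading = document.getElementsByTagName("h1")[0];
      var paragraph = document.getElementsByTagName("p")[0];

      // Parent and child: body is the parent of h1, html is the parent of body
      console.log(heading.parentNode === document.body);
      console.log(document.body.parentNode === root);

      // Siblings: h1 and p share the same parent
      console.log(heading.parentNode === paragraph.parentNode);

      // Ancestor and descendant: p sits somewhere inside html
      console.log(root.contains(paragraph));
    </script>
  </body>
</html>
```

Each of the four comparisons is expected to log true.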

Global Attributes

Global attributes can be used on pretty much any HTML 5 element. These include the style attribute, which allows inline styling directly on the element with StyleSheet properties in the attribute value, and the class attribute, which holds a space-separated list of StyleSheet classes that can be used to provide multiple layered styling. There is also the id attribute, which provides a unique identifier that can be a target for the fragment part of a URI (or IRI) and also for singular styling. id values must be unique throughout the document; they can only start with a letter, an underscore (_) or a colon (:), followed by any number of letters, underscores, colons, numbers and dashes (-). To help identify that a piece of text is in a particular language you can use the lang attribute:

<span lang="en-GB">This is English text</span>
<span lang="fr-FR">C'est texte Français</span>
<span lang="el-GR">Αυτό είναι ελληνικό κείμενο</span>
Figure 16: Stating the written language of the text

The example presents 'This is English text' in British English, 'This is French text' in French and 'This is Greek text' in Greek.

To provide advisory information about the text or other feature you can add a title attribute.
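
The id, class, style and title attributes described so far can sit together on one element. This fragment is a sketch with made-up values: the id contact, the class names and the text are assumptions for illustration only.

```html
<!-- The fragment part of the URI targets the element whose id is "contact" -->
<p><a href="#contact">Jump to the contact details</a></p>

<!-- Two classes, one inline style property and advisory title text -->
<p id="contact" class="note important" style="color: navy" title="Office hours only">
  Telephone the front desk between nine and five.
</p>
```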

Plus the hidden boolean attribute may be used on elements that are not yet relevant or are no longer relevant. Scripting can be used to make them relevant following certain conditions such as a response from a form. At the time of writing, this attribute is not supported by any web browser.
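
A rough sketch of the idea: the paragraph starts out hidden and a script makes it relevant when a button is activated. The id values and wording are hypothetical, and a web browser without support for the hidden attribute would simply show the paragraph from the start.

```html
<p id="thanks" hidden>Thank you, your message has been sent.</p>
<button type="button" id="send">Send</button>

<script type="text/javascript">
  // Removing the attribute makes the paragraph relevant again
  document.getElementById("send").onclick = function () {
    document.getElementById("thanks").removeAttribute("hidden");
  };
</script>
```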

The tabindex attribute is now available on all elements to assist with element ordering in AJAX and other Web Applications when you tab to each of them. A number value of 1 or higher sets a priority, 1 being the highest. Generally the value would be zero, which places the element in the order of the document structure (after elements with tabindex="1" and higher). A value of -1 or lower disables the ability to tab to the element.

 By default links and form controls have the equivalent of tabindex="0". All others are tabindex="-1".
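
The ordering rules can be sketched with a handful of elements. The file names and labels below are invented; the comments describe the tab order that the rules above would be expected to produce.

```html
<input type="text" id="search" tabindex="1">   <!-- reached first -->
<a href="help.html" tabindex="2">Help</a>      <!-- reached second -->
<a href="news.html">News</a>                   <!-- default order, after the numbered ones -->
<div tabindex="0">Custom widget</div>          <!-- made tabbable, in document order -->
<a href="legal.html" tabindex="-1">Legal</a>   <!-- skipped when tabbing -->
```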

You can use the role attribute and appropriate aria- prefixed attributes on elements in order to add main or extra accessible semantic (meaningful) information for web browsers, screen readers and other Assistive Technology Tools that support Accessible Rich Internet Applications (ARIA) from the Web Accessibility Initiative (WAI). ARIA is not just to make JavaScript powered Web Applications accessible but also make parts of a webpage more meaningful for those listening to a screen reader. (See the Web Accessibility article for more on accessibility and screen readers and the ARIA article for more on WAI-ARIA.)
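
As a small sketch of role and aria- prefixed attributes in use, the fragment below labels a generic div as navigation and marks a form control as required. The label text and id value are assumptions, and how this is announced depends on the screen reader or other Assistive Technology Tool.

```html
<!-- A generic container given a navigation role and an accessible label -->
<div role="navigation" aria-label="Main menu">
  <ul>
    <li><a href="../news.html">News Section</a></li>
    <li><a href="../help.html">Help Section</a></li>
  </ul>
</div>

<!-- A form control announced as required -->
<label for="email">Email</label>
<input type="text" id="email" aria-required="true">
```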

Other global attributes include the contenteditable attribute, spellcheck attribute, itemscope boolean attribute, itemref attribute, itemtype attribute, itemid attribute and itemprop attribute. These will be discussed later in appropriate subsections.

Most event attributes are also global. These will be discussed later but they include: onabort, onblur, onchange, onclick, ondblclick, onerror, onfocus, onkeydown, onkeypress, onkeyup, onload, onmousedown, onmousemove, onmouseout, onmouseover, onmouseup, onmousewheel, onscroll and onsubmit.

Framework

The HTML 5 Doctype for the text/html version is:

<!doctype html>
Figure 17: The HTML 5 Doctype

The words doctype and html in the Doctype are case insensitive (can be lowercase, uppercase or mixed case). This is purely to switch web browsers into Standards Compliance mode for text/html webpages. If you need to validate HTML 5 documents you need an actual HTML 5 validator such as http://html5.validator.nu/. Most web browsers do not validate webpages so there is no need for Document Type Definitions (DTD) that the older HTML and XHTML required.

Element Name: html.

Categories: Document Framework.

Default CSS Display: block.

Where it can be used: Document Element, Root Element, WebCanvas Element (XHTML 5 only), where a sub-document is allowed in a compound document.

Content Model: one head element followed by one body element.

Attributes: manifest, Global Attributes.

Element Part of Subset: Yes.

html is still the Document Element (main element of the webpage). The XHTML namespace (xmlns="http://www.w3.org/1999/xhtml") may be kept if you are upgrading from XHTML 1.x to ease the transition, but it means nothing in HTML. The lang attribute is for stating the natural language of the webpage's content such as lang="en". Other typical attributes may be used too, such as dir="ltr" or dir="rtl" (dictates if text goes left-to-right or right-to-left).

Element Name: head.

Categories: Document Framework.

Default CSS Display: not applicable.

Where it can be used: As the first child of a html element.

Content Model: One or more Metadata Content where at least one of them is a title element.

Attributes: Global Attributes.

Element Part of Subset: Yes.

 

Element Name: title.

Categories: Metadata.

Default CSS Display: not applicable.

Where it can be used: In the head element with no other title elements.

Content Model: Plain text.

Attributes: Global Attributes.

Element Part of Subset: Yes.

 

Element Name: body.

Categories: Document Framework, Sectioning Root, WebCanvas Element (text/html only).

Default CSS Display: block.

Where it can be used: As the second child of a html element.

Content Model: Flow Content.

Attributes: onafterprint, onbeforeprint, onbeforeunload, onhashchange, onmessage, onoffline, ononline, onpagehide, onpageshow, onpopstate, onredo, onresize, onstorage, onundo, onunload, Global Attributes.

Element Part of Subset: Yes.

The head and body elements are the main child containers. In the head element you have a title element to provide the descriptive webpage title and a meta element with a charset attribute to set the character set; these days it would be set to UTF-8 (8-bit UCS Transformation Format). This character set declaration must be the first child element within the head element. There is also an older, longer version: <meta http-equiv="content-type" content="text/html; charset=UTF-8">. But this is old hat now.

<!doctype html>
<html lang="en" dir="ltr">
  <head>
    <meta charset="UTF-8">
    <title>Untitled</title>
  </head>
  <body>
    
  </body>
</html>
Figure 18: An HTML 5 framework

Content Grouping and Headings

It is best to use appropriate elements to put your text in, rather than text directly within the body element. This will add more meaning or semantics to the text.

Element Name: p.

Categories: Flow.

Default CSS Display: block.

Where it can be used: Where Flow Content is expected.

Content Model: Phrasing Content.

Attributes: Global Attributes.

Element Part of Subset: Yes.

 

Element Name: div.

Categories: Flow.

Default CSS Display: block.

Where it can be used: Where Flow Content is expected.

Content Model: Flow Content.

Attributes: Global Attributes.

Element Part of Subset: Yes.

 

Element Name: section.

Categories: Flow, Sectioning.

Default CSS Display: block.

Where it can be used: Where Flow Content is expected.

Content Model: Flow Content.

Attributes: Global Attributes.

Element Part of Subset: No.

 

Element Name: article.

Categories: Flow, Sectioning.

Default CSS Display: block.

Where it can be used: Where Flow Content is expected.

Content Model: Flow Content.

Attributes: Global Attributes.

Element Part of Subset: No.

 

Element Name: aside.

Categories: Flow, Sectioning.

Default CSS Display: block.

Where it can be used: Where Flow Content is expected.

Content Model: Flow Content.

Attributes: Global Attributes.

Element Part of Subset: No.

 

Element Name: nav.

Categories: Flow, Sectioning.

Default CSS Display: block.

Where it can be used: Where Flow Content is expected.

Content Model: Flow Content.

Attributes: Global Attributes.

Element Part of Subset: No.

The most used Flow elements are paragraphs (the p element), the older generic division (the div element), the section element, the article element, the nav element and the aside element.

The p element provides paragraphs and lines and has some presentational spacing around it. p elements can only have Phrasing Content within them.

<!doctype html>
<html lang="en" dir="ltr">
  <head>
    <meta charset="UTF-8">
    <title>Untitled</title>
  </head>
  <body>
    <p>The ball bounced over the hill.</p>
  </body>
</html>
Figure 19: A paragraph or line

<body>
  <div>
    <p>The hat flew in the breeze.</p>
    <p>Following behind, was a smart gentleman.</p>
  </div>
</body>
Figure 20: The old generic Flow container element

<body>
  <section>
    <p>The hat flew in the breeze.</p>
    <p>Following behind, was a smart gentleman.</p>
  </section>
</body>
Figure 21: A section of a website or web application

<body>
  <article>
    <p>The hat flew in the breeze.</p>
    <p>Following behind, was a smart gentleman.</p>
  </article>
</body>
Figure 22: A news article or an entry in a forum or guestbook, etc.

<aside>
  <p>The current temperature for...</p>
</aside>
Figure 23: A secondary information pane or sidebar

<nav>
  <ul>
    <li><a href="../news.html">News Section</a></li>
    <li><a href="../help.html">Help Section</a></li>
  </ul>
</nav>
Figure 24: Grouping navigation links like a website or application main menu

You can have other Flow and Phrasing Content within div, section, article, aside and nav elements.

Element Names: h1, h2, h3, h4, h5, h6.

Categories: Flow, Heading.

Default CSS Display: block.

Where they can be used: Where Flow Content is expected.

Content Model: Phrasing Content.

Attributes: Global Attributes.

Elements Part of Subset: Yes.

Sections, articles and sidebars need headings. There are 6 heading elements based on the older HTML and XHTML languages:

<h1>A Page Heading</h1>
<h2>A website heading, navigation heading, Page Sub-Title or section heading</h2>
<h3>A sub-title or sub-section heading</h3>
<h4>A sub-sub section heading</h4>
<h5>A sub-sub-sub section heading</h5>
<h6>A sub-sub-sub-sub section or lower heading</h6>
Figure 25: Heading elements

Generally you may use any of these heading elements to provide headings for the page, a section, a sub-section, an article, a navigation section, etc. You could use h1 elements for every heading in your document, or only h2, h3, h4 or h5 elements, or even only h6 elements. But it is recommended to use an appropriately numbered heading for the particular section, article, sub-section, etc. The numbers provide a ranking, both across the document as a whole and between several headings used in the same section, article or sub-section. The highest ranking heading element in a parent element is the top level heading of that parent element. A lower ranking heading implies a new sub-section, while a heading of equal or higher rank implies the end of the current section or sub-section and the start of a new one.

As headings are used as part of the document outline and are the key pieces of information used by people in general, and by blind, partially sighted or dyslexic users listening to a screen reader, to locate what they are looking for in the webpage, you should provide a decent hierarchy of headings for the document and each section of content.

For instance, have only a single h1 element, primarily as the webpage heading. Then use an h2 element for the website heading (which states the company or organization name or the main name of the whole website and appears on every webpage of the website). Section, main article and sidebar headings would usually be h2 elements. Main and footer navigation headings would usually be h2 elements. Lower parts of the content hierarchy, such as sub-sections, would be h3 or lower ranking heading elements.

You can only have Phrasing Content within the heading elements.
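
As a small sketch of the ranking rules above (the fruit-themed headings and text are made up for illustration), the following body has one h1 element as the page heading, with lower ranking headings implying sub-sections:

<body>
  <h1>Fruit</h1>
  <p>Fruit is grown around the world.</p>
  <!-- h2 ranks lower than h1, so it implies a sub-section of "Fruit" -->
  <h2>Apples</h2>
  <p>Apples grow on trees.</p>
  <!-- h3 ranks lower than h2, so it implies a sub-section of "Apples" -->
  <h3>Cooking apples</h3>
  <p>Cooking apples are usually too sharp to eat raw.</p>
  <!-- h2 ranks equal to "Apples", so it ends that section and implies a new one -->
  <h2>Pears</h2>
  <p>Pears ripen after they are picked.</p>
</body>
Example: Heading ranks implying sections and sub-sections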

Element Name: hgroup.

Categories: Flow, Heading.

Default CSS Display: block.

Where it can be used: Where Flow Content is expected.

Content Model: One or more h1, h2, h3, h4, h5, h6 elements.

Attributes: Global Attributes.

Element Part of Subset: No.

You can wrap multiple headings for the current element in an hgroup element (a heading group). Within this element, lower ranking headings represent straplines or similar.

Element Name: header.

Categories: Flow.

Default CSS Display: block.

Where it can be used: Where Flow Content is expected.

Content Model: Flow Content, but with no header or footer descendant elements.

Attributes: Global Attributes.

Element Part of Subset: No.

A header element may be used in the main body of the document as the document header, or in any sectioning element such as sections, articles, sidebars and navigation groups as a section header, article header, etc. It can contain items such as headings, heading groups, introductory straplines, navigation links, logos and banners.

Element Name: footer.

Categories: Flow.

Default CSS Display: block.

Where it can be used: Where Flow Content is expected.

Content Model: Flow Content, but with no Heading Content, Sectioning Content, or header or footer descendant elements.

Attributes: Global Attributes.

Element Part of Subset: No.

At the bottom of a section, article, etc. you can have a footer element to wrap items such as page numbers, article numbers, forum entry ids and address elements for contact information (more on address later on).

A progressive set of examples of a news article with a weather sidebar:

<!doctype html>
<html lang="en" dir="ltr">
  <head>
    <meta charset="UTF-8">
    <title>Record Orb Sightings - News - NewsCorp</title>
  </head>
  <body>
    <div class="header">
      <h2>NewsCorp&reg;</h2>
    </div>
    
    <div class="navigation">
      <p>Home, News, Competitions, Staff</p>
    </div>
    
    <div class="article">
      <h1>Record Orb Sightings</h1>
      <div>
        <p>Over the past few months there has been a dramatic increase in the number of orb sightings.</p>
        <p>Orbs are the first stage of a person or spirit materializing into our range of sight.</p>
      </div>
      
      <div>
        <p>The major concentrations of orb activity include York, United Kingdom, San Diego, North America, ...</p>
        <p>Most sightings have taken place during the Summer Solstice 1:00am and 4:00am as well as Halloween 11:00pm to 4:00am.</p>
      </div>
      <!-- … -->
      <div>
        <p>News Article by: San Droo</p>
        <p>Email: san.droo@newscorp.example.com</p>
      </div>
    </div>
    
    <div class="sidebar">
      <h2>Current Weather</h2>
      <p>New New New London: 23 degrees, Sunny.</p>
    </div>
    
    <div class="footer">
      <p>&copy;2103, NewsCorp&reg;. All rights reserved.</p>
      <p>NewsCorp is a registered, limited...</p>
    </div>
  </body>
</html>
Figure 26: Simple division based news article webpage

<!doctype html>
<html lang="en" dir="ltr">
  <head>
    <meta charset="UTF-8">
    <title>Record Orb Sightings - News - NewsCorp</title>
  </head>
  <body>
    <div class="header">
      <div class="headings">
        <h2>NewsCorp&reg;</h2>
        <p>Streaming all your news 24/7 since 2097</p>
      </div>
      <p>Win a year's supply of cheesecake.</p>
    </div>
    
    <div class="navigation">
      <p>Home, News, Competitions, Staff</p>
    </div>
    
    <div class="newsarticle">
      <div class="articleheader">
        <div class="headings">
          <h1>Record Orb Sightings</h1>
          <p>Spirits or Whitelighters</p>
        </div>
        <p>2 December, 2103</p>
      </div>
      
      <div class="section">
        <h2>What are these floating clumps of light?</h2>
        <p>Over the past few months there has been a dramatic increase in the number of orb sightings.</p>
        <p>Orbs are the first stage of a person or spirit materializing into our range of sight.</p>
      </div>
      
      <div class="section">
        <h2>Where and when?</h2>
        <p>The major concentrations of orb activity include York, United Kingdom, San Diego, North America, ...</p>
        <p>Most sightings have taken place during the Summer Solstice 1:00am and 4:00am as well as Halloween 11:00pm to 4:00am.</p>
      </div>
      <!-- … -->
      
      <div class="articlefooter">
        <p>News Article by: San Droo</p>
        <p>Email: san.droo@newscorp.example.com</p>
      </div>
    </div>
    
    <div class="sidebar">
      <h2>Current Weather</h2>
      <p>New New New London: 23 degrees, Sunny.</p>
    </div>
    
    <div class="footer">
      <p>&copy;2103, NewsCorp&reg;. All rights reserved.</p>
      <p>NewsCorp is a registered, limited...</p>
    </div>
  </body>
</html>
Figure 27: Expanded division based news article webpage

<!doctype html>
<html lang="en" dir="ltr">
  <head>
    <meta charset="UTF-8">
    <title>Record Orb Sightings - News - NewsCorp</title>
  </head>
  <body>
    <header>
      <hgroup>
        <h2>NewsCorp&reg;</h2>
        <h3>Streaming all your news 24/7 since 2097</h3>
      </hgroup>
      <p>Win a year's supply of cheesecake.</p>
    </header>
    
    <nav>
      <p>Home, News, Competitions, Staff</p>
    </nav>
    
    <article>
      <header>
        <hgroup>
          <h1>Record Orb Sightings</h1>
          <h2>Spirits or Whitelighters</h2>
        </hgroup>
        <p>2 December, 2103</p>
      </header>
      
      <section>
        <h2>What are these floating clumps of light?</h2>
        <p>Over the past few months there has been a dramatic increase in the number of orb sightings.</p>
        <p>Orbs are the first stage of a person or spirit materializing into our range of sight.</p>
      </section>
      
      <section>
        <h2>Where and when?</h2>
        <p>The major concentrations of orb activity include York, United Kingdom, San Diego, North America, ...</p>
        <p>Most sightings have taken place during the Summer Solstice 1:00am and 4:00am as well as Halloween 11:00pm to 4:00am.</p>
      </section>
      <!-- … -->
      
      <footer>
        <p>News Article by: San Droo</p>
        <p>Email: san.droo@newscorp.example.com</p>
      </footer>
    </article>
    
    <aside>
      <h2>Current Weather</h2>
      <p>New New New London: 23 degrees, Sunny.</p>
    </aside>
    
    <footer>
      <p>&copy;2103, NewsCorp&reg;. All rights reserved.</p>
      <p>NewsCorp is a registered, limited...</p>
    </footer>
  </body>
</html>
Figure 28: A more semantic news article webpage

At the time of writing (early 2010), no browsers support the section, article, aside, nav, hgroup, header or footer elements.

Images

Element Name: img.

Categories: Interactive (if a usemap attribute is present), Embedded, Flow, Phrasing.

Default CSS Display: inline.

Where it can be used: Where Embedded Content is expected.

Content Model: Empty.

Attributes: alt, height, ismap (boolean), src, usemap, width, Global Attributes.

Element Part of Subset: Yes.

Some Phrasing Content is also Embedded Content, which brings outside resources into the page, such as pixel-based images like Portable Network Graphics (PNGs). You can do this by using the img void element. The src attribute (short for source) takes a URI referencing the image, and the alt attribute provides simple alternative text for use when the image contains critical text or contributes to the content (as opposed to decorative images).

The text in the alt attribute will be used by environments, such as web browsers, that can't handle images or can't find the image, and by screen readers to read out to partially sighted or blind users. The width and height attributes can be used to specify the dimensions of the image, but this can also be done with StyleSheets. The id attribute can be used for fragment targeting, styling and scripting purposes.

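Unfortunately the img element gives no way to declare the image format or to offer an alternative format: the environment has to work out what the image is for itself (typically from the Content-Type sent by the server or from the file contents, rather than from the file extension), and if it doesn't support the image format or can't find the image then the only other option is the simple text from the alt attribute.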

<img src="images/me_scanned_pic.png" alt="Scanned picture on 2005-06-07" class="dropright">
Figure 29: Displaying an image

If the image is purely for decoration and has no text that the viewer needs to know or the text that is associated with the image is next to the image (such as a caption or link text) then you don't need to have a value in the alt attribute - but you still need an alt attribute present as alt and src attributes are required for img void elements.

<img src="images/landscape.png" alt="" class="landscapeDimensions">
Figure 30: A decorative image and an icon for a link

<!doctype html>
<html lang="en" dir="ltr">
  <head>
    <meta charset="UTF-8">
    <title>Record Orb Sightings - News - NewsCorp</title>
  </head>
  <body>
    <header>
      <hgroup>
        <h2><img src="images/mainlogo.png" width="100" height="50" alt="">NewsCorp&reg;</h2>
        <h3>Streaming all your news 24/7 since 2097</h3>
      </hgroup>
      <p><img src="adverts/wincheesecake2103.png" class="advertDimensions" alt="Win a year's supply of cheesecake."></p>
    </header>
    
    <nav>
      <p>
        <img src="images/home.png" width="22" height="22" alt="">Home,
        <img src="images/news.png" width="22" height="22" alt="">News,
        <img src="images/comps.png" width="22" height="22" alt="">Competitions,
        <img src="images/staff.png" width="22" height="22" alt="">Staff
      </p>
    </nav>
    
    <article>
      <header>
        <hgroup>
          <h1>Record Orb Sightings</h1>
          <h2>Spirits or Whitelighters</h2>
        </hgroup>
        <p>2 December, 2103</p>
      </header>
      
      <section>
        <h2>What are these floating clumps of light?</h2>
        <p>
          <img src="images/orb18382-28293.png" width="78" height="100" alt="" class="dropright">
          Over the past few months there has been a dramatic increase in the number of orb sightings.
        </p>
        <p>Orbs are the first stage of a person or spirit materializing into our range of sight.</p>
      </section>
      
      <section>
        <h2>Where and when?</h2>
        <p>The major concentrations of orb activity include York, United Kingdom, San Diego, North America, ...</p>
        <p>Most sightings have taken place during the Summer Solstice 1:00am and 4:00am as well as Halloween 11:00pm to 4:00am.</p>
      </section>
      <!-- … -->
      
      <footer>
        <p>News Article by: San Droo</p>
        <p>Email: san.droo@newscorp.example.com</p>
      </footer>
    </article>
    
    <aside>
      <h2>Current Weather</h2>
      <p>New New New London: 23 degrees, Sunny.</p>
    </aside>
    
    <footer>
      <p>&copy;2103, NewsCorp&reg;. All rights reserved.</p>
      <p>NewsCorp is a registered, limited...</p>
    </footer>
  </body>
</html>
Figure 31: The news article with a logo and images

Lists

Element Name: ol.

Categories: Flow.

Default CSS Display: block.

Where it can be used: Where Flow Content is expected.

Content Model: Zero or more li elements.

Attributes: reversed (boolean), start, Global Attributes.

Element Part of Subset: Yes.

 

Element Name: ul.

Categories: Flow.

Default CSS Display: block.

Where it can be used: Where Flow Content is expected.

Content Model: Zero or more li elements.

Attributes: Global Attributes.

Element Part of Subset: Yes.

 

Element Name: li.

Categories: None.

Default CSS Display: list-item.

Where it can be used: As a child of ol, ul, menu elements.

Content Model: Flow Content.

Attributes: value (if as a child of the ol element), Global Attributes.

Element Part of Subset: Yes.

HTML 5 can list items in four ways: ordered, unordered, as a definition list or as a menu list.

To order items you provide an ordered list group with the ol Flow element, which contains only li (list item) elements as the items. By default each ordered item will have a number and a dot before it. The ol start tag may have a start attribute with an integer (whole number) as the start number of the ordered list. There is also a reversed boolean attribute to make a descending ordered list, but it is not supported in browsers yet.

<ol>
  <li>Item 1</li>
  <li>Item 2</li>
  <li>Item 3</li>
</ol>
Figure 32: Code for an ordered list

produces:

  1. Item 1
  2. Item 2
  3. Item 3
Figure 33: An ordered list
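
The start attribute described above, and the value attribute listed for li elements, could be used along these lines (the item text is made up for illustration). The first item takes its number from start, and an item with a value attribute takes that number, with later items counting on from it:

<ol start="5">
  <li>Item 5</li>
  <li>Item 6</li>
  <!-- value overrides the running count for this item -->
  <li value="10">Item 10</li>
  <li>Item 11</li>
</ol>
Example: An ordered list using the start and value attributes

which should produce:

  5. Item 5
  6. Item 6
  10. Item 10
  11. Item 11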

To have something similar to bulleted lists use a ul Flow element around the li elements:

<ul>
  <li>Item 1</li>
  <li>Item 2</li>
  <li>Item 3</li>
</ul>
Figure 34: Code for an unordered list

produces:

  • Item 1
  • Item 2
  • Item 3
Figure 35: An unordered list

li elements may contain Flow Content, including further list groups, as well as Phrasing Content.
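
As a minimal sketch of a list group inside a list item (the item names are made up for illustration), note that the inner ul element sits inside the li element it belongs to, not between li elements:

<ul>
  <li>Fruit
    <!-- the nested list is part of the "Fruit" item -->
    <ul>
      <li>Apples</li>
      <li>Pears</li>
    </ul>
  </li>
  <li>Vegetables</li>
</ul>
Example: A list nested within a list item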

Element Name: dl.

Categories: Flow.

Default CSS Display: block.

Where it can be used: Where Flow Content is expected.

Content Model: Zero or more groups of at least one dt element followed by at least one dd element in each group.

Attributes: Global Attributes.

Element Part of Subset: Yes.

 

Element Name: dt.

Categories: None.

Default CSS Display: block.

Where it can be used: Within a dl element before its corresponding dd element.

Content Model: Phrasing Content.

Attributes: Global Attributes.

Element Part of Subset: Yes.

 

Element Name: dd.

Categories: None.

Default CSS Display: block.

Where it can be used: Within a dl element after its corresponding dt element.

Content Model: Flow Content.

Attributes: Global Attributes.

Element Part of Subset: Yes.

Definition lists consist of a definition list group (a dl element) containing one or more definition terms (dt elements), each followed by one or more definition descriptions (dd elements):

<dl>
  <dt>Term</dt>
  <dd>Description</dd>
</dl>
Figure 36: Code for a definition list

producing:

Term
Description
Figure 37: A definition list
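
The content model given for dl allows each group to have more than one dt element and more than one dd element. A sketch of both cases, using made-up terms and descriptions:

<dl>
  <!-- first group: two terms sharing one description -->
  <dt>Colour</dt>
  <dt>Color</dt>
  <dd>The property of an object that depends on the light it reflects.</dd>
  <!-- second group: one term with two descriptions -->
  <dt>Orb</dt>
  <dd>A sphere or globe.</dd>
  <dd>A floating ball of light.</dd>
</dl>
Example: A definition list with grouped terms and descriptions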

Element Name: menu.

Categories: Interactive (if the type attribute value is 'toolbar'), Flow.

Default CSS Display: block.

Where it can be used: Where Flow Content is expected.

Content Model: Zero or more li elements or Flow Content.

Attributes: label, type, Global Attributes.

Element Part of Subset: Yes.

Menu lists can be marked up using one or more menu elements, list items and Phrasing Content. For instance a basic menu list would be as follows:

<menu id="appMenu">
  <li><a href="preferences.html" onclick="openPreferences();">Preferences</a></li>
  <li><a href="help.html" onclick="openHelp();">Help</a></li>
</menu>
Figure 38: An application menu

Add a type="context" attribute to the menu start tag and the menu list turns into a popup context menu. Alternatively, a type="toolbar" attribute on the menu start tag will turn it into a toolbar with toolbar buttons. You can use button elements instead of a elements, regardless of the menu type. (More on a and button elements later.) You can also use menu elements with a label attribute within the list items as submenus; the label attribute provides the submenu label. Currently no web browser supports the type and label attributes, so the context menu and toolbar features are not available yet. But normal menu lists have been supported for years and years.

<!doctype html>
<html lang="en" dir="ltr">
  <head>
    <meta charset="UTF-8">
    <title>Record Orb Sightings - News - NewsCorp</title>
  </head>
  <body>
    <header>
      <hgroup>
        <h2><img src="images/mainlogo.png" width="100" height="50" alt="">NewsCorp&reg;</h2>
        <h3>Streaming all your news 24/7 since 2097</h3>
      </hgroup>
      <p><img src="adverts/wincheesecake2103.png" class="advertDimensions" alt="Win a year's supply of cheesecake."></p>
    </header>
    
    <nav>
      <menu>
        <li><img src="images/home.png" width="22" height="22" alt="">Home, </li>
        <li><img src="images/news.png" width="22" height="22" alt="">News, </li>
        <li><img src="images/comps.png" width="22" height="22" alt="">Competitions, </li>
        <li><img src="images/staff.png" width="22" height="22" alt="">Staff </li>
      </menu>
    </nav>
    
    <article>
      <header>
        <hgroup>
          <h1>Record Orb Sightings</h1>
          <h2>Spirits or Whitelighters</h2>
        </hgroup>
        <p>2 December, 2103</p>
      </header>
      
      <section>
        <h2>What are these floating clumps of light?</h2>
        <p>
          <img src="images/orb18382-28293.png" width="78" height="100" alt="" class="dropright">
          Over the past few months there has been a dramatic increase in the number of orb sightings.
        </p>
        <p>Orbs are the first stage of a person or spirit materializing into our range of sight.</p>
      </section>
      
      <section>
        <h2>Where and when?</h2>
        <p>The major concentrations of orb activity include:</p>
        <ul>
          <li>York, United Kingdom,</li>
          <li>San Diego, North America,</li>
          <li>...</li>
        </ul>

        <p>Most sightings have taken place during the Summer Solstice 1:00am and 4:00am as well as Halloween 11:00pm to 4:00am.</p>
      </section>
      <!-- … -->
      
      <footer>
        <p>News Article by: San Droo</p>
        <p>Email: san.droo@newscorp.example.com</p>
      </footer>
    </article>
    
    <aside>
      <h2>Current Weather</h2>
      <p>New New New London: 23 degrees, Sunny.</p>
    </aside>
    
    <footer>
      <p>&copy;2103, NewsCorp&reg;. All rights reserved.</p>
      <p>NewsCorp is a registered, limited...</p>
    </footer>
  </body>
</html>
Figure 39: The news article with a list and a navigation menu list

Text-level Elements

To break lines of text (usually only where line breaks are part of the content, such as in poems and source code examples) you can use the br void element. Some people have tended to use a div element and separate paragraphs with two br void elements rather than enclose lines and paragraphs in p elements, but this is not semantic (meaningful) structure, as the resulting text is not marked up as proper paragraphs or lines.

Element Name: br.

Categories: Flow, Phrasing.

Default CSS Display: inline.

Where it can be used: Where Phrasing Content is expected.

Content Model: Empty.

Attributes: Global Attributes.

Element Part of Subset: Yes.

A br void element is Phrasing Content, but unlike most Phrasing Content it pushes the elements and text after it onto the next line.

<p>There was a person from Eling,<br>
  Who could never get down from the ceiling,<br>
  &nbsp;…
</p>
Figure 40: A rhyming poem with forced line breaks

Element Name: i.

Categories: Flow, Phrasing.

Default CSS Display: inline.

Where it can be used: Where Phrasing Content is expected.

Content Model: Phrasing Content.

Attributes: Global Attributes.

Element Part of Subset: Yes.

The Phrasing Content elements that can give a span of text more meaning include the i element. This element semantically represents text in an offset voice or mood, such as a word or phrase in another spoken language (complemented with the lang attribute) or a thought. It could also represent technical terms, a ship name (such as Mary Rose) or a house name (such as Rosemary's instead of 48 Sandreine Lane), etc. By default some browsers will present these in italics or an alternate speech voice.

<p>The team continued to search the corridors of the <i>Mary Rose</i>.</p>
<p><i>"A dark, cloudy night drew in"</i> Sophie began to dream...</p>
Figure 41: Defining a ship's name and a daydream or dream sequence

Element Name: b.

Categories: Flow, Phrasing.

Default CSS Display: inline.

Where it can be used: Where Phrasing Content is expected.

Content Model: Phrasing Content.

Attributes: Global Attributes.

Element Part of Subset: Yes.

The b element could be used to semantically represent notable text such as key words or product names. By default some browsers will present these in bold or an alternate colour.

<p>Get your all in one <b>Kitchen 2100 Toolset</b> with these features:...</p>
Figure 42: A product name

But if you just want to make text italic or bold, without any of these meanings, you should use StyleSheets instead.
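
A minimal sketch of purely presentational italic and bold text done with a StyleSheet. The style and span elements are not covered in this section, and the class names fineprint and loud are hypothetical, chosen only for this example:

<!doctype html>
<html lang="en" dir="ltr">
  <head>
    <meta charset="UTF-8">
    <title>Untitled</title>
    <style>
      /* hypothetical class names, chosen for this example */
      .fineprint { font-style: italic; }
      .loud { font-weight: bold; }
    </style>
  </head>
  <body>
    <p>Offer ends soon. <span class="fineprint">Terms apply.</span> <span class="loud">Order today.</span></p>
  </body>
</html>
Example: Italic and bold text applied with a StyleSheet rather than i and b elements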

Element Name: em.

Categories: Flow, Phrasing.

Default CSS Display: inline.

Where it can be used: Where Phrasing Content is expected.

Content Model: Phrasing Content.

Attributes: Global Attributes.

Element Part of Subset: Yes.

The em element is for emphasis. Nesting em elements within one another provides stronger emphasis.
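
As a short sketch (the sentences are made up for illustration), the first paragraph emphasizes a single word, and the second nests one em element inside another to give the word "not" stronger emphasis than the rest of the sentence:

<p>Cats are <em>cute</em> animals.</p>
<!-- the inner em is emphasized twice over: once by each em element around it -->
<p><em>Please do <em>not</em> touch the exhibits.</em></p>
Example: Emphasis and nested, stronger emphasis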

Text-level Elements continues on Page 2


Copyright ©2008-2010 Legend Scrolls and Peter Davison.
Icons from the Oxygen Icon Theme, LGPL, and PNG version of icons in the Oxygen Icon Theme from kde-look.org, GPL.
All rights reserved.