HTML (HyperText Markup Language) is the predominant language of web pages. Whenever you read or interact with a page in your browser, chances are it's an HTML document. Originally developed as a way to describe and share scientific papers, HTML is now used to mark up all sorts of documents and create visual interfaces for browser-based software.
Every HTML document is made from elements, and elements are represented by tags. Tags are sequences of characters that mark where an element starts and/or stops.
All tags begin with a left-facing angle bracket (<) and end with a right-facing angle bracket (>). Every element has a start tag or opening tag, which starts with < and is followed by the element name (or an abbreviation of it). The element name may be followed by an attribute (or series of attributes) that describes how that instance of an element is supposed to behave. You can set an explicit value for an attribute with an = sign. Some attributes, however, are empty. If an empty attribute is present, the value is true. Let's look at an example using the input element.
[html]<input type="text" name="first_name" disabled>[/html]
Here, type, name, and disabled are all attributes. The first two have explicit values, but disabled is empty. Some elements allow empty attributes, and these are usually attributes that might otherwise accept true/false values. Here's the tricky part: the value of an empty attribute is true or false based on the presence or absence of the attribute, regardless of its set value. In other words, both disabled="true" and disabled="false" would still disable the input control.
Most elements also have a closing tag. Closing tags also start with <, but rather than being immediately followed by the element name, they are followed by a forward slash (/). Then comes the element name and a right-facing angle bracket (>). However, some elements are known as void elements. These elements cannot contain content, and so do not have a closing tag. The input element shown above is an example of a void element.
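As a quick illustration of the difference between elements with closing tags and void elements, consider the fragment below. It is only a sketch: the text content and the last_name field name are made up for this example.
[html]<!-- p has both a start tag and a closing tag -->
<p>This paragraph has a start tag and a closing tag.</p>
<!-- br and input are void elements: no content, no closing tag -->
<br>
<input type="text" name="last_name">[/html]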
Now that we've covered the basics of tags, let's take a closer look at an HTML5 document.
Your First HTML5 Document
Open up your favorite text editor and type the following. Save it as hi.html.
[html]<!DOCTYPE html>
<html>
<head>
<title>Hi</title>
</head>
<body>
<p>Hi</p>
</body>
</html>[/html]
Congratulations: you've written your first HTML5 document! It's not fancy, perhaps, but it does illustrate the basics of HTML5. Our first line, <!DOCTYPE html>, is required. This is how the browser knows that we're sending HTML5. Without it, there's a risk of browsers parsing our document incorrectly. Why? Because of DOCTYPE switching.
DOCTYPE switching means that browsers parse and render a document differently based on the value of the <!DOCTYPE> declaration, if it's served with a Content-type: text/html response header. Most browsers implemented some version of DOCTYPE switching in order to correctly render documents that relied on non-standard browser behavior or outdated specifications. HTML 4.01 and XHTML 1.0, for example, had multiple modes (strict, transitional, and frameset) that could be triggered with a DOCTYPE declaration. HTML 4.01, for instance, used the following DOCTYPE for its strict mode.
[html]<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">[/html]
Transitional, or loose, DOCTYPE declarations can trigger quirks mode. In quirks mode, each browser parses the document a little bit differently based on its own bugs and deviations from web standards. Strict DOCTYPE declarations trigger standards mode or almost standards mode. In these modes, each browser parses the document according to rules agreed upon in the HTML and CSS specifications. A missing DOCTYPE, however, also triggers quirks mode.
So HTML5 defined the shortest DOCTYPE possible. The HTML5 specification explains:
“DOCTYPEs are required for legacy reasons. When omitted, browsers tend to use a different rendering mode that is incompatible with some specifications. Including the DOCTYPE in a document ensures that the browser makes a best-effort attempt at following the relevant specifications.”
And so, using the HTML5 DOCTYPE (<!DOCTYPE html>) triggers standards mode, even for older browsers that lack HTML5 parsers.
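One way to watch DOCTYPE switching happen is to ask the browser which mode it chose. The page below is a minimal sketch that reads the standard document.compatMode property; the mode id and the page title are our own choices for illustration. Deleting the first line and reloading should make the page report quirks mode instead.
[html]<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Rendering mode check</title>
</head>
<body>
<p id="mode"></p>
<script>
// document.compatMode is "CSS1Compat" in standards (and almost standards) mode
// and "BackCompat" in quirks mode.
var mode = document.compatMode === 'CSS1Compat' ? 'standards' : 'quirks';
document.getElementById('mode').textContent = 'This page is rendered in ' + mode + ' mode.';
</script>
</body>
</html>[/html]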
The Two Modes of HTML5 Syntax
HTML5 has two parsing modes or syntaxes: HTML and XML. Which one applies depends on whether the document is served with a Content-type: text/html header or a Content-type: application/xhtml+xml header.
If it's served as text/html, the following rules apply:
- Start tags are not required for every element.
- End tags are not required for every element.
- Only void elements such as br, img, and link may be “self-closed” with />.
- Tags and attributes are case-insensitive.
- Attributes do not need to be quoted.
- Some attributes may be empty (such as checked and disabled).
- Special characters, or entities, do not have to be escaped.
- The document must include an HTML5 DOCTYPE.
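The short document below is a sketch of several of these rules at once, and should be accepted by an HTML5 parser when served as text/html. It mixes upper- and lowercase tag names, omits the html, head, and body tags along with the li and p end tags, leaves an attribute value unquoted, uses an empty attribute, and includes an unescaped ampersand. The content itself is invented for illustration.
[html]<!DOCTYPE html>
<TITLE>Loose syntax</TITLE>
<UL>
<LI>Uppercase tag names are fine
<li>and so are missing end tags
</UL>
<input type=checkbox CHECKED>
<p>Fish & chips[/html]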
HTML Syntax
Let's look at another HTML5 document.
[html]<!DOCTYPE html>
<html>
<head>
<meta charset=utf-8>
<title>Hi</title>
<!--
This is an example of a comment.
The lines below show how to include CSS
-->
<link rel=stylesheet href=style.css type=text/css>
<style>
body{
background: aliceblue;
}
</style>
</head>
<body>
<p>
<img src=flower.jpg alt=Flower>
Isn’t this a lovely flower?
<p>
Yes, that is a lovely flower. What kind is it?
<script src=foo.js></script>
</body>
</html> [/html]
Again, our first line is a DOCTYPE declaration. As with all HTML5 tags, it's case-insensitive. If you don't like reaching for Shift, you could type <!doctype html> instead. If you really enjoy using Caps Lock, you could also type <!DOCTYPE HTML> instead.
Next is the head element. The head element typically contains information about the document, such as its title or character set. In this example, our head element contains a meta element that defines the character set for this document. Including a character set is optional, but you should always set one, and it's recommended that you use UTF-8.
Make Sure You're Using UTF-8: Ideally, verify that your text editor saves your documents with UTF-8 encoding "without BOM" and uses Unix/Linux line endings.
Our head element also contains our document title (<title>Hi</title>). In most browsers, the text between the title tags is displayed at the top of the browser window or tab.
Comments in HTML are bits of text that aren't rendered in the browser. They're only viewable in the source code, and are typically used to leave notes to yourself or a coworker about the document. Some software programs that generate HTML code may also include comments. Comments may appear just about anywhere in an HTML document. Each one must start with <!-- and end with -->.
A document head may also contain link elements that point to external resources, as shown here. Resources may include style sheets, favicon images, or RSS feeds. We use the rel attribute to describe the relationship between our document and the one we're linking to. In this case, we're linking to a cascading style sheet, or CSS file. CSS is the style sheet language that we use to describe the way a document looks rather than its structure. We can also use a style element (delineated here by <style> and </style>) to include CSS in our file. Using a link element, however, lets us share the same style sheet file across multiple pages.
By the way, both meta and link are examples of void HTML elements; we could also self-close them using />. For example, <meta charset=utf-8> would become <meta charset=utf-8 />, but it isn't necessary to do this.
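To illustrate the kinds of resources a link element can point to, here is a sketch of three link elements that might sit in a document head. The file names (favicon.ico, feed.xml) and the feed title are hypothetical; only style.css comes from our example document.
[html]<!-- A style sheet, as in our example -->
<link rel="stylesheet" href="style.css">
<!-- A favicon image -->
<link rel="icon" href="favicon.ico">
<!-- An RSS feed offered as an alternate version of the site's content -->
<link rel="alternate" type="application/rss+xml" title="Site feed" href="feed.xml">[/html]
In each case the rel value names the relationship, and href names the resource.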
To Quote or Not Quote: Attributes in HTML5
In the previous example, our attribute values are unquoted. In our earlier input example, we used quotes. Either is valid in HTML5, and you may use double (") or single (') quotes. Be careful with unquoted attribute values. It's fine to leave a single-word value unquoted. A space-separated list of values, however, must be enclosed in quotes. If not, the parser will interpret the first value as the value of the attribute, and subsequent values as empty attributes. Consider the following snippet:
[php]<code class=php highlightsyntax><?php echo 'Hello!'; ?></code>[/php]
Because the values for the class attribute are not enclosed in quotes, the browser interprets it like so:
[php]<code class="php" highlightsyntax><?php echo 'Hello!'; ?></code>[/php]
Only php is recognized as a class name, and we've unintentionally added an empty highlightsyntax attribute to our element. Changing class=php highlightsyntax to class="php highlightsyntax" (or the single-quoted class='php highlightsyntax') ensures that both class attribute values are treated as such.
A Pared-down HTML5 Document
According to the rules of HTML (this is also true of HTML 4), some elements don't require start tags or end tags. Those elements are implicit. Even if you leave them out of your markup, the browser acts as if they've been included. The body element is one such element. We could, in theory, rewrite our hi.html example to look like this.
[html]<!DOCTYPE html>
<head>
<meta charset=utf-8>
<title>Hi</title>
<p>Hi [/html]
When our browser creates the document node tree, it will add a body element for us.
Just because you can skip end tags doesn't mean you should. The browser will need to generate a DOM in either case. Closing elements reduces the chance that browsers will parse your intended DOM incorrectly. Balancing start and end tags makes errors easier to spot and fix, particularly if you use a text editor with syntax highlighting. If you're working within a large team or within a CMS (Content Management System), using start and end tags also increases the chance that your chunk of HTML will work with those of your colleagues. For the remainder of this book, we'll use start and end tags, even when optional.
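To make the implied elements concrete, here is roughly the markup the browser acts as if it had received for the pared-down hi.html. This is a sketch of the resulting tree written out as tags, not actual browser output.
[html]<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Hi</title>
</head>
<body>
<p>Hi</p>
</body>
</html>[/html]
The html and body elements, and the end tags for head and p, are all supplied by the parser.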
Start and End Tags: To discover which elements require start and end tags, consult the World Wide Web Consortium's guide HTML: The Markup Language (an HTML language reference). The W3C also manages the Web Platform Docs, which includes this information.
“XHTML5”: HTML5's XML Syntax
HTML5 can also be written using a stricter, XML-like syntax. You may remember from Chapter 1 that XHTML 1.0 was “a reformulation of HTML 4 as an XML 1.0 application.” That isn't quite true of what is sometimes called “XHTML5”. XHTML5 is best understood as HTML5 that's written and parsed using the syntax rules of XML and served with a Content-type: application/xhtml+xml response header. The following rules apply to “XHTML5”:
- All elements must have a start tag.
- Non-void elements with a start tag must have an end tag (p and li, for example).
- Any element may be “self-closed” using />.
- Tags and attributes are case sensitive, typically lowercase.
- Attribute values must be enclosed in quotes.
- Empty attributes are forbidden (checked must instead be checked="checked" or checked="").
- Special characters must be escaped using character entities.
Our html start tag also needs an xmlns (XML namespace) attribute. If we rewrote our document from above to use XML syntax, it would look like the example below.
[html]<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title>Hi</title>
</head>
<body>
<p>
<img src="flower.jpg" alt="Flower" />
Isn’t this a lovely flower?
</p>
<script src="foo.js" />
</body>
</html> [/html]
Here we've added the XML namespace with the xmlns attribute, to let the browser know that we're using the stricter syntax. We've also self-closed the tags for our empty or void elements, meta and img. According to the rules of XML and XHTML, all elements must be closed either with an end tag or by self-closing with a space, slash, and a right-pointing angle bracket (/>).
In this example, we have also self-closed our script tag. We could also have used a normal </script> tag, as we've done with our other elements. The script element is a little bit of an oddball. You can embed scripting within your documents by placing it between script start and end tags. When you do this, you must include an end tag.
However, you can also link to an external script file using a script tag and the src attribute. If you do so, and serve your pages as text/html, you must use a closing </script> tag. If you serve your pages as application/xhtml+xml, you may also use the self-closing syntax.
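The two ways of using the script element can be sketched side by side. The embedded script body here is a trivial one-liner of our own, and foo.js is the external file name from our example document.
[html]<!-- Embedded script: the end tag is required -->
<script>
document.title = 'Hi';
</script>

<!-- External script served as text/html: the end tag is still required -->
<script src="foo.js"></script>[/html]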
Don't forget: in order for the browser to parse this document according to XML/XHTML rules, our document must be sent from the server with a Content-type: application/xhtml+xml response header. In fact, including this header will trigger XHTML5 parsing in conforming browsers even if the DOCTYPE is missing.
Configuring Your Server
In order for your web server or application to send the Content-type: application/xhtml+xml response header, it must be configured to do so. If you're using a web host, there's a good chance your web host has done this already for files with an .xhtml extension. Here you would just need to rename hi.html to hi.xhtml. If that doesn't work, consult your web server documentation.
As you may have realized, XML parsing rules are more persnickety. It's much easier to use the text/html MIME type and its looser HTML syntax.




