Ian Marshall

Getting Started with HTML and CSS

Lesson 3: The HTML Document

Try right-clicking on this page—or indeed any page at all on the Internet—and selecting "view page source" (the contextual menu phrasing will be slightly different on some browsers, and even not present on others—I'm looking at you, Edge!).

What you're seeing is HTML, the source document for the entire web page. Some of it may look familiar. But no doubt quite a lot of it still seems downright foreign. This is because we've specifically been learning the HTML meant for the page's content: text and images. But there's also HTML meant to mark up the entire document itself.

An HTML document describes the web page in its entirety, including lots of machine-readable data regarding the language used, text direction, author, description, resources used, etc. There's a long list of possibilities, some more essential and useful than others. Consider how you would describe another person. You could quickly come up with a half dozen crucial descriptors before your first pause, but most that come afterwards probably apply only under certain circumstances. It's pretty much the same with HTML.

Here is a simple, but complete template that I use both casually and professionally:
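
A minimal sketch of such a template, pieced together from the elements described in the notes that follow. The indentation and the order of the two attributes on the <html> tag are assumptions:

```html
<!DOCTYPE html>
<html lang="en-US" dir="ltr">
  <head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width">
    <title></title>
  </head>
  <body>
  </body>
</html>
```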



Notes:

  1. <!DOCTYPE html> is an important first line. It informs the browser what it's about to read: an HTML document type. We humans can at a glance recognize HTML by its tags and attributes. A browser needs to be told.
  2. Actually, that's not strictly true. In the last two lessons we wrote HTML code without any doctype and the browser didn't blow up. That's because by default the browser expects to be reading HTML. So is it important or not? Yes, it is. Without a doctype, browsers fall back to "quirks mode," which imitates the layout behavior of very old browsers, so your page may not render the way you expect. Including it also helps with backward compatibility with older browsers and with future-proofing your HTML in case standards change (which they do frequently).
  3. You may notice that after the doctype, the <html> tag opens an element that doesn't close until the very bottom of the document. Absolutely everything except the doctype is nested inside the <html> element; all content belongs between its opening and closing tags.
  4. lang="en-US" is an attribute that provides data regarding the language used in the HTML document. In this case, the document is written in US English.
  5. dir="ltr" represents how the content text is intended to be read, in this case left-to-right. Some languages, like Hebrew, are read right-to-left. Hebrew HTML documents would set the attribute as dir="rtl".
  6. There are only ever two elements nested inside the <html>. These are <head> and <body>, in that order. I stress this because I once spent an entire semester helping a student whose college professor insisted on putting other elements in the <html>, which, of course, drove me crazy. The HTML worked fine, so what's the problem? The problem is compatibility, both backward and forward, historical and future—not to mention professionalism. The document either contains valid HTML or it doesn't. More about HTML validity below.
  7. The first of the two elements nested in the <html> is the <head> element. By "first" I mean it always appears above the <body> in the HTML document. The <head> is intended to hold elements that describe the document itself. Wait, why aren't lang and dir in the <head>? They describe the document, don't they? Yes, but they're also instructions for the browser so it can properly read the text within the HTML element. Remember, computers are stupid. You gotta tell 'em everything.
  8. <meta charset="utf-8"> tells the browser that the document's character encoding is UTF-8, an encoding of Unicode that can represent nearly every character in nearly every human language. UTF-8 is by far the most widely used encoding on the World Wide Web.
  9. Have you ever visited a website on a phone or other mobile device that appeared tiny and needed to be scrolled around in both directions and zoomed in just to read anything? Such are websites that have omitted <meta name="viewport" content="width=device-width">. Don't let yours be one of them. This line sets up your site to display properly on a smaller viewport (screen).
  10. <title> is not only a required element, but one you'll definitely want to add text to. The text between the <title> tags will appear in the top tab of the browser window. This page's tab says "". Look again at the page source HTML; you'll see exactly that text in the <title> element.
  11. The second of the two elements nested in the <html> is the <body> element. By "second" I mean it always appears below the <head> in the HTML document. The <body> is intended to hold elements that contain displayable content. In other words…

All the HTML we've written up to this point belongs in the <body> of a properly formatted HTML document.

Nothing you want to be displayed in the browser viewport will be in the <head>, which is intended for machine-readable content.
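
To picture the split, here's a rough sketch of a filled-in document. The title, heading, and paragraph text are made up for illustration:

```html
<!DOCTYPE html>
<html lang="en-US" dir="ltr">
  <head>
    <!-- Machine-readable: none of this appears in the viewport -->
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width">
    <title>My First Page</title>
  </head>
  <body>
    <!-- Displayable content: everything the visitor sees on the page -->
    <h1>Hello!</h1>
    <p>This paragraph shows up in the browser viewport.</p>
  </body>
</html>
```

The title text shows up in the browser tab rather than in the page itself, which is why it lives in the <head>.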

This means more typing, doesn't it?

Um…yes. Sorry.

Except there is the option to use the above, or something similar, as a template. A template is any structured code that is reusable because it doesn't change, or at least not much. In fact, let's create the above template now.

Exercise 2.3.01: Template

Instructions:

Using your code editor, create a new document called _template.html. (Note the leading underscore. This will ensure this document appears at the top of your folder.) Type the code below into that document, matching my syntax and line numbers, and check your work in a browser.

chapter_02/_template.html


    

Notes:

  1. When you open this document in the browser you'll see…nothing! That's because there's nothing yet in the <body> element, which is where all page content lives.

Tinker:

  1. Add text to the <title> element, save (Cmd-S or Ctrl-S) the document, and refresh the browser page (usually F5). Check out the tab at the top of the browser window.
  2. Try copying (Cmd-C or Ctrl-C) and pasting (Cmd-V or Ctrl-V) any of the HTML from lesson 1 or lesson 2 into the <body> of this template. Save, then refresh the browser and you should see your HTML just as it was without the extra document template. This is a good thing, because we can add data to the document without changing its appearance.
  3. When you're done tinkering, be sure to resave your template without a title or any body content. This keeps our template empty and reusable for everything we build from now on.

How strict is HTML?

It's not strict at all! In fact, browsers have gotten very good at adjusting to inconsistent HTML coding practices. That's good news, because we human beings tend to make a lot of mistakes. And that's why we are grateful to our robot overlords.

The big question, then: Why should I be strict with my own coding if the browser is so lenient? Good question; I'm glad you asked! There's an organization called the World Wide Web Consortium, sometimes abbreviated as W3C or just W3. They've taken it upon themselves to publish standards for HTML and other World Wide Web technologies, propose new ones, and provide an open forum for debate. It's composed of numerous government and non-government entities, such as businesses and universities, along with individuals, all committed to an open and collectively empowering model that will help radically improve the way people around the world develop new technologies and innovate for humanity (https://www.w3.org/Consortium/mission). That's kinda noble, huh?

Just in case you're not yet convinced to follow the W3C recommendations, it might help to tell you it was founded, and for decades directed, by Sir Tim Berners-Lee. Remember him? He, uh, invented the World Wide Web. Maybe we should listen.

I heartily recommend validating your HTML code. W3C offers a free, online resource:

https://validator.w3.org/

Why validate? https://validator.w3.org/docs/why.html

Validation is not required, but it is definitely more professional to make yourself conform to best practices—or at least provide legitimate reasons for breaking them. Ignorance and apathy are never professional.
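
To make the point concrete, here's a sketch of a page that most browsers will happily display even though it breaks several rules. The content is made up for illustration:

```html
<!-- Problem 1: no doctype on the first line -->
<html>
  <head>
    <meta charset="utf-8">
    <!-- Problem 2: the required <title> element is missing -->
  </head>
  <body>
    <!-- Problem 3: <b> is closed before the <i> nested inside it -->
    <p>Some <b>bold and <i>italic</b> text</i></p>
  </body>
</html>
```

Paste something like this into the validator and it should complain about each of the problems noted in the comments, even though the page looks fine on screen.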

What's in your <head>?

Well, now it's that song Zombie by The Cranberries. Maybe it's in your head now, too. (Sorry?)

Remember: nothing in the <head> will be displayed in the browser's viewport.

There are a lot of machine-readable options that may be included in an HTML document's <head> element. Many of them revolve around search engine optimization (SEO), so we'll talk about most of those in a more advanced course. But let's introduce three that you may choose to add to your template.

Author

You want credit for your work, don't you? Of course you do.
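
A minimal sketch of an author tag, assuming the usual <meta name="author"> form. The name is borrowed from this page's byline, so swap in your own:

```html
<meta name="author" content="Ian Marshall">
```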



Be proud of your web development efforts! I hate to tell you, though, that this tag is currently ignored by most search engines. Still, you may include it in case that ever changes, or if you want to provide this information should any humans read your HTML code. It may also be important if you're working in a group and each member is responsible for a different page. Identifying authorship isn't a bad idea.

Description

Now this tag is very important for search engines. When you do an online search, you see each site listed with its title (the <title> element) and its description.
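
As a sketch, a description tag takes this general form. The content text here is only a placeholder:

```html
<meta name="description" content="My web page.">
```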



Choose a better description than I did, though. One more…descriptive. And specific. It'll help your search results, if that's important to you. Now, if you do add a description to your template, then—like <title>—you'll have to provide content specific to the page so the text it contains stays relevant.

Favicon

A favicon is a small, square image displayed in the browser tab next to the page title. It also may appear in search results or as a mobile device button image. You can see the favicon I'm currently using for this site above in the browser tab, but here it is, as well: [image: my current favicon]
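
A minimal sketch of a favicon link, assuming a png file named myfavicon.png sitting in the same folder as the HTML document (the filename and location are placeholders):

```html
<link rel="icon" type="image/png" href="myfavicon.png">
```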



Notes:

  1. The <link> is a new element! It connects the current document to an external resource (in this case, an image) from within the <head>.
  2. The rel="icon" attribute names the relationship between that resource and this page. Here, we're telling the browser that the image we're linking to is the page's icon.
  3. The type="image/png" attribute is important to include; it's also important it matches the image type. Above, I'm referencing a png image. Here are a few of the most common image types:
       File              Attribute
       myfavicon.jpg     type="image/jpeg"
       myfavicon.png     type="image/png"
       myfavicon.bmp     type="image/bmp"
       myfavicon.gif     type="image/gif"
       myfavicon.tif     type="image/tiff"
       myfavicon.webp    type="image/webp"
       myfavicon.svg     type="image/svg+xml"
  4. You can use any image, as long as it's square. You may even provide different sizes to prevent different browser environments from stretching or shrinking your favicon image.


  5. We'll discuss images in detail in a more advanced course.
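
The note above about providing different sizes might look something like this sketch, which uses the sizes attribute of <link>. The filenames and the two sizes chosen are hypothetical:

```html
<!-- Each file is a square png saved at the size named in its sizes attribute -->
<link rel="icon" type="image/png" sizes="16x16" href="myfavicon-16.png">
<link rel="icon" type="image/png" sizes="32x32" href="myfavicon-32.png">
```

The browser can then pick whichever size suits it best instead of scaling a single image.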

Review

We now have a professional and valid HTML document shell that we can use as a template for all future work. It includes data needed by the browser to display our HTML effectively and efficiently, as well as vacancies for our content and page title.

Additionally, we have the option of including three more elements in the <head>, one each for author data, a page description, and an icon image (favicon).