HTML Markup Tags

HTML
Identifies the file as containing HTML-coded information and is used in association with the file extension .html (.htm where the extension length is restricted).

HEAD
Identifies the first part of the document containing the title.
The title is shown as part of your browser's window (see below).

TITLE
The title element contains your document title and identifies its content in a global context. The title is typically displayed in the title bar at the top of the browser window, but not inside the window itself. The title is also what is displayed on someone's hotlist or bookmark list, so choose something descriptive, unique, and relatively short. A title is also used to identify your page for search engines.
It is good practice to restrict titles to 64 characters.

BODY
The body is the main section of the HTML document, following the head, and contains the publishable content of the document.
Background Color
By default text is displayed in black on a gray background, but these settings can be changed. The background and the color of text, links, visited links, and active links can be changed using the following attributes:

BACKGROUND	Identify the image to use as a background to the document on screen. Any image can be used as the background image. The browser will tile the image, repeating it across and down to fill the browser window automatically. Usage: `BACKGROUND="image_filename"`
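BGCOLOR	Set the background colour for the document (see colours).
TEXT	Set the foreground colour used to display text.
LINK	Set the colour used to highlight unvisited hyperlinks.
VLINK	Set the colour used to highlight visited hyperlinks.
ALINK	Set the colour used to highlight a hyperlink while it is being activated.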
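
As a minimal sketch of how the HTML, HEAD, TITLE and BODY elements described above fit together, the following page sets a white background with black text and blue links. The title, colour values and body text are illustrative assumptions, not taken from the original guide. Note that the HTML specification spells the background colour attribute BGCOLOR.

```html
<HTML>
<HEAD>
<TITLE>Widget Catalogue: Price List</TITLE>
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF">
<P>Publishable content goes here.</P>
</BODY>
</HTML>
```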

Headings
HTML has six levels of headings, starting with '1' for the most significant and working down to '6'. The associated text is highlighted by the browser to stand out against the normal body text. The syntax of the heading element is:

<Hn>Heading Text</Hn>

where n specifies the heading level (1-6). Heading levels should not be skipped as the results may be unpredictable.
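
A short sketch of headings used without skipping levels might look like this. The heading text is purely illustrative.

```html
<H1>Annual Report</H1>
<H2>Finance</H2>
<H3>Quarterly Results</H3>
<H2>Operations</H2>
```

Here each heading is at most one level below the heading that precedes it, so no level is skipped on the way down.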

Paragraphs
When HTML documents are rendered for presentation, carriage returns are ignored and the text that they separate continues as if a single space is present. In addition, contiguous whitespace (spaces, linefeeds, and carriage returns) is compressed into a single space when the document is displayed. It is always the tags that dictate the layout of the document and not the text itself. The browser will determine where best to wrap text to make best use of the space available on the screen (or printed page). Paragraph tags are used to identify hard breaks in the text - where the next paragraph should start on a new line. They are coded as:

<P>Paragraph text</P>

The browser ignores any line breaks within the text of the paragraph and starts a new paragraph only when it encounters another <P> tag. The browser will also ignore any indentations or blank lines in the source text. The only exception is "preformatted" text, described below. The closing </P> tag may be omitted, as it is assumed when the next <P> tag is encountered. The following attributes are available:

ALIGN

Identifies the justification for the text in the paragraph as one of LEFT (default), CENTER or RIGHT.
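
The whitespace rules and the ALIGN attribute can be sketched as follows. The paragraph text is an illustrative assumption. The first paragraph should be displayed as a single flowing paragraph despite the line breaks and extra spaces in the source.

```html
<P>This source text is split
   across several lines,    with extra spaces,
   but is displayed as one flowing paragraph.</P>
<P ALIGN=CENTER>This paragraph is centred.</P>
```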

Preformatted Text
The <PRE> tag forces text to be presented in a fixed-width font. The tag also makes whitespace significant (spaces, new lines, and tabs). This can, for example, be useful for program listings. The syntax is:

<PRE>
Text 1
Text 2
Text 3
</PRE>

displays as:

Text 1
Text 2
Text 3

The following attributes can be used:

WIDTH

The maximum number of characters for a line, allowing the browser to choose an appropriate font and indentation for the text.

Hyperlinks can be used within <PRE> sections, but other HTML tags should be avoided.

Lists
HTML supports unnumbered, numbered, and definition lists, and includes the ability to nest lists.

Unnumbered, bulleted List
To make an unnumbered, bulleted list, the syntax is as follows:

<UL>
<LI> item 1
<LI> item 2
</UL>

generating the output:

• item 1
• item 2

The <LI> items can contain multiple paragraphs, each indicated with a <P> tag.

Numbered (Ordered) List
A numbered list (also called an ordered list, from which the tag name derives) is identical to an unnumbered list, except it uses <OL> instead of <UL>. The items are tagged using the same <LI> tag. The following HTML code:

<OL>
<LI> item 1
<LI> item 2
</OL>

generates the output:

1. item 1
2. item 2

Definition Lists
A definition list usually consists of alternating a definition term (<DT>, the 'heading') and a definition description (<DD>, its explanation).

<DL>
<DT> SGML
<DD> Standard Generalized Markup Language
<DT> HTML
<DD> HyperText Markup Language
</DL>

Generating the following output:

SGML: Standard Generalized Markup Language
HTML: HyperText Markup Language

The <DT> and <DD> entries can contain multiple paragraphs or other formatting instructions. The <DL> tag supports the following attributes:

COMPACT

Used where the definition terms are short enough to be included on the same line in the document.

Nested Lists
Lists can be nested within each other, mixing different types as required. Also, a number of paragraphs, each containing a nested list, can be included in a single list item.
Each subsequent level of nesting is distinguished by the browser with additional indentation and a different style of bullet or numbering.
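
A nested list mixing the two list types might be sketched as follows. The item text is illustrative only.

```html
<UL>
<LI> Fruit
  <OL>
  <LI> Apples
  <LI> Pears
  </OL>
<LI> Vegetables
  <UL>
  <LI> Carrots
  </UL>
</UL>
```

The inner lists are placed inside the list item they belong to, and each inner list is closed before the next outer item begins.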

Extended Quotations
The <BLOCKQUOTE> tag is used to delimit quotations in a separate block on the screen, generally by changing the indentation of the text to distinguish it from the surrounding text. The following example:

<BLOCKQUOTE>In sitting just sit, in standing just stand.
Above all, don't wobble<BR>Anon</BLOCKQUOTE>

Generates this output:

In sitting just sit, in standing just stand. Above all, don't wobble
Anon

Forced Line Breaks/Postal Addresses
The <BR> tag forces a line break, avoiding additional space between the lines, e.g. as required by postal addresses. An example of its use is:

Line 1<BR>Line 2<BR>Line 3

Horizontal Lines (Rules)
The <HR> tag produces a horizontal line the width of the visible screen. The following attributes are available:

SIZE	Specify the thickness of the line (in pixels)
WIDTH	Specify the width of the line, either in pixels or as a percentage of the window width.

For example:

<HR SIZE=4 WIDTH="50%">

is presented as a horizontal line 4 pixels thick that spans half the width of the window.

Character Formatting
HTML has two types of styles for individual words or sentences: logical and physical. Logical styles tag text according to its meaning, while physical styles indicate the specific appearance of a section. For example, in the preceding sentence, the words "logical styles" were tagged as "emphasis." The same effect (formatting those words in italics) could have been achieved via a different tag that tells your browser to "put these words in italics."
Logical Versus Physical Styles
Physical and logical styles produce very similar results when the document is presented. Ideally, content is divorced from presentation and formatting is described in terms of the context (logical) rather than precisely how it should be displayed (physical). The advantage of this approach is that the definition of a category of information can be changed and have a global effect, rather than editing all instances of that type of data with the new layout. Indeed, many browsers today let you define how you want the various HTML tags rendered on-screen using what are called cascading style sheets, or CSS.
Another advantage of logical tags is that they help enforce consistency in documents, as it's easier to tag the text as a heading than to ensure that the correct set of characteristics is applied consistently to each instance of the same category of information. For example, the <STRONG> tag is generally rendered as bold text. However, it is possible to instruct that these sections are displayed differently, e.g. in red, using a local cascading style sheet. Logical styles offer this flexibility.
Physical styles, on the other hand, offer consistency in that the tagged text will always be displayed in the same way. Whichever approach is used, it should be applied consistently throughout the document.

Logical Styles

<DFN> for a word being defined, typically italics. (no longer supported)

<EM> for emphasis, typically italics.

<CITE> for titles of books, films, etc., typically italics.

<CODE> for computer code, displayed with fixed-width font.

<KBD> for user keyboard entry, typically plain fixed-width font.

<SAMP> for a sequence of literal characters, displayed with fixed-width font.

<STRONG> for strong emphasis, typically bold.

<VAR> for a variable, to be replaced with specific information, typically italics.

Physical Styles

<B> Bold

<I> Italics

<TT> Typewriter Text, ie. fixed-width.
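
The difference between the two families of styles can be sketched with a few sentences that pair a logical tag with its usual physical equivalent. The sentence text is illustrative, and the actual rendering depends on the browser and any style sheet in use.

```html
<P>Press <KBD>Enter</KBD> to run <CODE>make install</CODE>.</P>
<P><EM>Logical emphasis</EM> and <I>physical italics</I> usually look the same.</P>
<P><STRONG>Logical strong emphasis</STRONG> and <B>physical bold</B> usually look the same.</P>
```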

Escape Sequences (Character Entities)

Character entities allow the presentation of special characters (those that would otherwise be interpreted as significant by the browser, eg. '<') and characters that are not available in the standard ASCII character set (eg. accented characters in the extended portion of the set).

The three characters that have special meanings in HTML can be entered as an escape sequence, as can many other characters:

&lt;	escape sequence for '<'
&gt;	escape sequence for '>'
&amp;	escape sequence for '&'
&ouml;	lowercase 'o' with an umlaut ('ö')
&ntilde;	lowercase 'n' with tilde ('ñ')
&Egrave;	uppercase 'E' with a grave accent ('È')

NOTE: Unlike the rest of HTML, the escape sequences are case sensitive (&LT; cannot be used in place of &lt;).
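
As a brief sketch, the entities &lt;, &gt;, &amp;, &ouml; and &ntilde; might be used in running text as shown here. The sentences themselves are illustrative.

```html
<P>Paragraphs start with the &lt;P&gt; tag.</P>
<P>Fish &amp; chips in K&ouml;ln, tapas in Espa&ntilde;a.</P>
```

The first line is displayed with a literal <P> in the text rather than starting a new paragraph, because the angle brackets are written as entities.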

Linking
The most powerful feature of HTML is its ability to link an element of one document (text or image) to another document or section of a document. A browser highlights the identified element of the first document (color, underline) identifying it as a hypertext link (hyperlink or just link). HTML's single hypertext-related tag is <A>, which stands for anchor:

<A HREF="target document">
Text/image that acts as the link
</A>

Relative vs Absolute Pathnames
You can link to documents in other directories by specifying the relative path from the location of the current document. Pathnames use the standard UNIX syntax. The UNIX syntax for the parent directory (the directory that contains the current directory) is "..". For example, a link to a document in a subdirectory of the current document's directory would be:

<A HREF="subdir/doc.html">Link</A>

The absolute pathname (complete URL) of the file can also be specified, but relative links are more efficient in accessing a server. They also have the advantage of making documents more portable. For instance, several web pages can be created in a single folder, linked to each other using relative paths, and then the entire folder can be uploaded to the web server. The pages on the server will link to the pages on the server (using the relative path). It is important to remember that UNIX is case-sensitive as far as filenames are concerned, so hyperlinks should always be checked before and after uploading to the Web server.
Relative links should be used wherever possible because it's easier to move documents between locations, the connections to the server are more efficient and they take up less space in the document. However, there are plenty of occasions where absolute pathnames have to be used, eg. where linking to a document on a different Web site.
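
The three kinds of link discussed above can be sketched side by side. The file names are hypothetical, and www.example.com is a placeholder host rather than a real site referenced by this guide.

```html
<A HREF="subdir/doc.html">A document one level down</A>
<A HREF="../index.html">A document in the parent directory</A>
<A HREF="http://www.example.com/docs/guide.html">A document on another site</A>
```

Only the last link needs to change if the pages are moved to a different server, provided the folder structure is kept intact.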

URLs
The World Wide Web uses Uniform Resource Locators (URLs) to specify the location of files on Web servers. A URL includes the type of resource being accessed (e.g., Web, gopher, FTP), the address of the server, and the location of the file. The syntax is:

scheme://host.domain[:port]/path/filename

where scheme is one of:

file	file on the local system
ftp	file on an anonymous FTP server
http	file on a WWW server
gopher	file on a Gopher service
wais	file on a WAIS server
news	Usenet newsgroup
telnet	connection to a Telnet-based service

The port number can generally be omitted. There is also a mailto scheme (discussed below), used to hyperlink email addresses, but this scheme is unique in that it uses only a colon (:) instead of :// between the scheme and the address.

Links to Specific Sections
Anchors can also be used to move to a particular section in a document (either the same or a different document) rather than to the top, which is the default. This type of an anchor is commonly called a named anchor because to create the links, you insert HTML names within the document.
Often it can be time-consuming navigating through a long document when only a particular section is of interest. Hyperlinks can be used to create a "table of contents" at the top of a document that can be used to move directly to another location in the same document. You can also link to a specific section in another document.

Links Between Sections of Different Documents
The coding that represents a link to a named anchor is:

<a href="doc2.html#PosA">PosA in doc2</a>

The value after the hash (#) mark represents a tab within the target file (doc2.html). This tab tells the browser what should be displayed at the top of the window when the link is activated. The named anchor (in this example "PosA") in the target document is created using:

<A NAME="PosA">doc2: PosA</A>

With both of these elements in place, the link navigates directly to the relevant position in the target file.
NOTE: You cannot make links to specific sections within a different document unless either you have write permission to the coded source of that document or that document already contains in-document named anchors. For example, you could include named anchors to this primer in a document you are writing because there are named anchors in this guide (use View Source in your browser to see the coding). But if this document did not have named anchors, you could not make a link to a specific section because you cannot edit the original file on NCSA's server.

Links to Specific Sections within the Current Document
The technique is the same except the filename is omitted. For example, to link to the 'PosA' anchor from within doc2.html, use:

<A HREF="#PosA">PosA in this document</a>
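
A "table of contents" built from named anchors within a single document might be sketched as follows. The anchor names Install and Usage and the section text are hypothetical.

```html
<H1>User Guide</H1>
<UL>
<LI> <A HREF="#Install">Installation</A>
<LI> <A HREF="#Usage">Usage</A>
</UL>

<H2><A NAME="Install">Installation</A></H2>
<P>Installation notes.</P>

<H2><A NAME="Usage">Usage</A></H2>
<P>Usage notes.</P>
```

Each HREF value after the hash mark must match the NAME value of its target exactly.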

Mailto
This anchor makes it possible for the user to create an email to a specific address. The format is:
 <A HREF="mailto:emailinfo@host">Contact us</a>
Additional fields can be pre-filled when the email is presented by adding a '?' after the address and a '&' between each additional option:
 subject=subject
 cc=address
Multiple entries under the address values (To: or Cc:) are separated by commas.
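
Putting these options together, a link that pre-fills two recipients, a copy address and a subject might be sketched like this. The addresses use the placeholder domain example.com and are assumptions for illustration. Within the HREF value the '&' separator is written as &amp; and the space in the subject as %20.

```html
<A HREF="mailto:info@example.com,help@example.com?subject=Price%20enquiry&amp;cc=sales@example.com">Contact us</A>
```

How the pre-filled fields are presented depends on the mail program that the browser hands the link to.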

Inline Images
Most browsers can display inline images (i.e. images mixed in with the text) that are in X Bitmap (XBM), GIF, or JPEG format. Other image formats are also being incorporated into browsers, e.g. the Portable Network Graphic (PNG) format. Each image takes additional time to download and slows down the initial display of a document.
To include an inline image, enter:

<IMG SRC="ImageName">

where ImageName is the URL of the image file. The syntax for <IMG SRC=...> URLs is identical to that used in an anchor HREF. The name of the image file must end with a recognised extension:

.gif	GIF file
.xbm	X Bitmap
.jpg, .jpeg	JPEG image file
.png	Portable Network Graphic files

The <IMG ...> tag supports the following attributes (all optional apart from 'SRC'):

HEIGHT The height of the image in pixels

WIDTH The width of the image in pixels

ALIGN Alignment with following text, values = TOP, MIDDLE, BOTTOM (default). An image that is published on its own as a paragraph will not be aligned with any following text.

ALT Specifies text to be displayed if the browser is not capable (or has been instructed to ignore) images. The text is also displayed while the image is being loaded and as a 'hint' when the cursor is over the image.

BORDER Defines the size of the border in pixels, appearing in the default text color. Setting this attribute to '0' when the image is a hyperlink avoids displaying the default coloured border.

NOTE: Some browsers use the HEIGHT and WIDTH attributes to stretch or shrink an image to fit into the allotted space when the image does not exactly match the attribute numbers. Not all browser publishers think stretching/shrinking is a good idea, so this feature should not be assumed as standard.
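
The attributes above might be combined as in the following sketch. The file names logo.gif, home.gif and index.html, the pixel sizes and the ALT text are all hypothetical.

```html
<P><IMG SRC="logo.gif" ALT="Company logo" WIDTH="120" HEIGHT="60" ALIGN="TOP"> Welcome to our site.</P>
<A HREF="index.html"><IMG SRC="home.gif" ALT="Home" BORDER="0"></A>
```

The first image is aligned with the top of the text that follows it. The second image acts as a hyperlink, with the border switched off so that no coloured frame is drawn around it.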

External Images, Sounds, and Animations
In order to have an image open as a separate document (called an external image) when a user activates a link on either a word or a smaller, inline version of the image, a reference is included to an external image as follows:

<A HREF="image.gif"><IMG SRC="smallimage.gif"></A>

The smallimage.gif image is displayed in the document, and clicking on it will navigate to the main image in a separate document. The same syntax is used for links to external animations and sounds; the only difference is the file extension of the linked file. For example,

<A HREF="demo.mov">play movie</A>

specifies a link to a QuickTime movie. Some common file types and their extensions are:

.txt	Plain text document
.html	HTML document
.gif	GIF image
.tiff	TIFF image
.xbm	X Bitmap image
.jpg, .jpeg	JPEG image
.ps	PostScript file
.aiff	AIFF sound file
.au	AU sound file
.wav	WAV sound file
.mov	QuickTime movie
.mpg, .mpeg	MPEG movie

Tables
Before HTML tags for tables were finalized, authors had to carefully format their tabular information within <PRE> tags, counting spaces and previewing their output. Tables are very useful for presentation of tabular information as well as a benefit to creative HTML authors who use the table tags to present their Web pages.
A table has a header to describe the columns/rows, rows for information, and cells for each item. In the following table, the first row contains the header information, and each subsequent row names an HTML table tag followed by an explanation of the tag's function.

Table Elements
Element	Description
<TABLE> ... </TABLE>	defines a table in HTML. If the BORDER attribute is present, your browser displays the table with a border.
<CAPTION> ... </CAPTION>	defines the caption for the title of the table. The default position of the title is centered at the top of the table. The attribute ALIGN=BOTTOM can be used to position the caption below the table. NOTE: Any kind of markup tag can be used in the caption.
<TR> ... </TR>	specifies a table row within a table. You may define default attributes for the entire row: ALIGN (LEFT, CENTER, RIGHT) and/or VALIGN (TOP, MIDDLE, BOTTOM). See Table Attributes at the end of this table for more information.
<TH> ... </TH>	defines a table header cell. By default the text in this cell is bold and centered. Table header cells may contain other attributes to determine the characteristics of the cell and/or its contents. See Table Attributes at the end of this table for more information.
<TD> ... </TD>	defines a table data cell. By default the text in this cell is aligned left and centered vertically. Table data cells may contain other attributes to determine the characteristics of the cell and/or its contents. See Table Attributes at the end of this table for more information.

The <TABLE> and </TABLE> tags must surround the entire table definition. The first item inside the table is the CAPTION, which is optional. Then you can have any number of rows defined by the <TR> and </TR> tags. Within a row you can have any number of cells defined by the <TD>...</TD> or <TH>...</TH> tags. Each row of a table is, essentially, formatted independently of the rows above and below it. This lets you easily display tables like the one above with a single cell, such as Table Attributes, spanning columns of the table.

Table Attributes
NOTE: Attributes defined within <TH> ... </TH> or <TD> ... </TD> cells override the default alignment set in a <TR> ... </TR>.
Attribute	Description
ALIGN (LEFT, CENTER, RIGHT)	Horizontal alignment of a cell.
VALIGN (TOP, MIDDLE, BOTTOM)	Vertical alignment of a cell.
COLSPAN=n	The number (n) of columns a cell spans.
ROWSPAN=n	The number (n) of rows a cell spans.
NOWRAP	Turn off word wrapping within a cell.
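
As a sketch of how the table elements and attributes combine, the following table has a caption, a header cell spanning two columns, and row-level alignment that is overridden in individual cells. The caption, region names and figures are invented for illustration.

```html
<TABLE BORDER>
<CAPTION>Quarterly Sales</CAPTION>
<TR>
  <TH>Region</TH>
  <TH COLSPAN=2>Sales</TH>
</TR>
<TR ALIGN=RIGHT>
  <TD ALIGN=LEFT>North</TD>
  <TD>120</TD>
  <TD>135</TD>
</TR>
<TR ALIGN=RIGHT>
  <TD ALIGN=LEFT NOWRAP>South West</TD>
  <TD>98</TD>
  <TD>101</TD>
</TR>
</TABLE>
```

The ALIGN=RIGHT on each data row right-aligns the figures, while the ALIGN=LEFT on the first cell of each row overrides that default, as described in the note above.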