XML documentation system

Document preparation and conversion in XML

Arjen Baart `<arjen@andromeda.nl>`

Oct 18, 2007

Document Information
Version: 0.6
Organization: Andromeda Technology & Automation

Abstract:

This guide explains the concepts of XMLDoc and discusses the features available to prepare documentation. XMLDoc uses XSLT transformations to turn the XML source document into a number of other formats.

1 Introduction

XMLDoc is a collection of stylesheets and utilities that together form a documentation system resembling sgmltools and Linuxdoc. The objective is to prepare documentation in XML format. Other formats, like LaTeX, PostScript or HTML, are then generated by transforming the XML source document. The transformation is performed by an XSLT processor, using an XSL stylesheet for each specific output format.

1.1 XMLDoc concepts

Writing documents in XML is rather like writing them in HTML. The content is just plain text and the markup, or layout, is defined with tags, which are marked in angle brackets (< and >). A tag marks the beginning or end of an element, where an element is something like a chapter, a section or a phrase of emphasized text. The tags are processed by the XML parser and translated to the output format commands, as defined by the XSL stylesheet.

You cannot use every tag just anywhere in your document. There is a certain amount of structure you must adhere to. This structure is defined in the Document Type Definition (DTD).

1.2 XMLDoc in practice

Using XMLDoc is fairly simple. All you need is a plain text editor, such as vim or emacs, and a working installation of the XMLDoc utilities and stylesheets. The utilities consist of a special form of XSLT processor and a few shell scripts that invoke this XSLT processor with the proper stylesheets.

Since there are three XSLT stylesheets, there are also three shell scripts: `xml2html`, `xml2latex` and `xml2text`. These scripts transform the XMLDoc document into HTML, LaTeX and plain text output, respectively. All you need to do is create your XML source document and use one of these scripts to transform the document into another format. For example, to transform this guide into a LaTeX document, you could use:

```
xml2latex guide.xml >guide.tex
```

Note that the transformed document is written to standard output.

2 Overall document structure

All XMLDoc documents are set up in a similar way. Just like every other XML file, a document starts with the XML and DOCTYPE declarations:

```
<?xml version="1.0"?>
<!DOCTYPE doc SYSTEM "doc.dtd">

```

These declarations are followed by the doc element, which contains the entire document. The next element within the doc element defines the type of document. This can be a book, an article or a report. Structurally, there is no real difference between these types of document, but in LaTeX they result in different layout styles. Each type of document contains any number of chapter elements (or, for an article, section elements). The example shows a minimal XMLDoc document:

```
<?xml version="1.0"?>
<!DOCTYPE doc SYSTEM "doc.dtd">

<doc>

<book>

<chapter>
<heading>The only chapter</heading>

</chapter>

</book>

</doc>

```

2.1 Preamble

The document preamble is everything that comes before the real page of text. Typical elements of the preamble are the title page and the table of contents. You can create a title page with the titlepage tag. As the name suggests, the title page holds at least the title of the document. Other elements in the title page are the author(s), the date and the abstract. Here is an example of a title page:

```
<titlepage>
<title>XML documentation system</title>
<author>Arjen Baart  &lt;arjen@andromeda.nl&gt;</author>
<date>April 8, 2002</date>
<abstract>
This guide explains the concepts of <strong>XMLDoc</strong> and discusses
the features available to prepare documentation.
<strong>XMLDoc</strong> uses XSLT transformations to turn the XML source
document into a number of other formats.
</abstract>
</titlepage>
```

The titlepage must have a title and at least one author element. The date and abstract elements are optional, but if you use them, you must put them in this order.

The table of contents is particularly easy to create. All it takes is an empty toc element:

```
<toc/>
```

The table of contents is optional, but if you use it, it must come between the title page and the first chapter. Note for LaTeX: to actually make the table of contents, you need to run latex twice. LaTeX does not do this in a single pass.
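To show how the preamble fits into a complete document, here is a minimal sketch that combines a title page, a table of contents and one chapter. The element order follows the rules given above; the title, author name and chapter text are placeholders, and the fragment has not been validated against `doc.dtd`.

```xml
<?xml version="1.0"?>
<!DOCTYPE doc SYSTEM "doc.dtd">

<doc>
<book>

<!-- Preamble: the title page comes first, with the title before the author -->
<titlepage>
<title>An example document</title>
<author>A. N. Author</author>
</titlepage>

<!-- The table of contents sits between the title page and the first chapter -->
<toc/>

<chapter>
<heading>The first chapter</heading>
<para>
The real text starts here.
</para>
</chapter>

</book>
</doc>
```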

2.2 Sectioning and Paragraphs

After the opening tag of the first chapter, the document really begins. The structure of the document is laid out by its chapters, sections within the chapters, subsections within the sections and so on. Just as in LaTeX and HTML, there are six levels of sectioning elements available:

1. `chapter`: For top-level chapters.
2. `section`: For second-level sections, i.e. 1.1, 1.2, 1.3 and so on.
3. `subsection`: For third-level sections, i.e. 1.1.1, 1.1.2 and so on.
4. `subsubsection`: For fourth-level sections (you get the idea).
5. `paragraph`: For fifth-level sections.
6. `subparagraph`: The sixth and final level.

Note that the names are equivalent to their counterparts in LaTeX.
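As an illustration, the first three levels could be nested as in the sketch below. It assumes that every sectioning element starts with a `heading`, just like the chapter in the minimal document shown earlier; the heading texts are made up for the example.

```xml
<chapter>
<heading>Installation</heading>

<section>
<heading>Requirements</heading>

<subsection>
<heading>Software</heading>
<para>
Text of the subsection.
</para>
</subsection>

</section>

</chapter>
```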

Just as in LaTeX, the `article` document type does not have a `chapter` element. The top-level sectioning element for an `article` is a `section`.
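A minimal article might therefore look like the following sketch, which mirrors the minimal book shown earlier but uses `section` as the top-level sectioning element. The heading and paragraph text are placeholders, and the fragment has not been checked against the DTD.

```xml
<?xml version="1.0"?>
<!DOCTYPE doc SYSTEM "doc.dtd">

<doc>

<article>

<!-- An article has no chapters; sections are the top level -->
<section>
<heading>The only section</heading>
<para>
Some text in the section.
</para>
</section>

</article>

</doc>
```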

2.3 Linking with style sheets

You can use a 'normal' CSS stylesheet with the HTML output from xml2html, by using the style attribute in the doc element. Here is an example:

```
<doc style="main.css">

```

3 Block-level content

The actual content of your document is organized in block-level elements, such as paragraphs, lists or tables.

3.1 Paragraphs

The most basic type of content block is an ordinary paragraph, contained in a para element. To make several separate paragraphs, you must enclose each paragraph in a para open tag and a para close tag. Here is an example of two small paragraphs:

```
<para>
This is an example of a small paragraph.
</para>
<para>
And here is another paragraph.
</para>

```

A second type of paragraph is a quote. You can make a quote by using the `quote` element:

```
<quote>
This is an example of a quote.
The text within a quoted paragraph is usually slightly indented on both
the left and the right margin.
</quote>
```

Which results in:

This is an example of a quote. The text within a quoted paragraph is usually slightly indented on both the left and the right margin.

A special kind of paragraph is the verbatim environment. Just as in LaTeX, this is used to include literal text output with spaces, indentation and line breaks preserved. The practical use for the verbatim element is to include coding examples, such as:

```
<verbatim>
struct complex
{
   double   real;
   double   imaginary;
};
</verbatim>
```

Which comes out like this:

```
struct complex
{
   double   real;
   double   imaginary;
};
```

A variation on the verbatim text is the example text. The only real difference is that example is placed inside a box to make it stand out a bit more. In fact, when converted to XHTML, only an attribute `class='example'` is added. It is up to the CSS linked to the XHTML page to add additional layout features. The default styling will only add a border. Here is the above example shown in an actual example element:

```
<example>
struct complex
{
   double   real;
   double   imaginary;
};
</example>
```

Which comes out like this:

```
struct complex
{
   double   real;
   double   imaginary;
};
```

3.2 Footnotes

A footnote is created with the `footnote` element:

```
<footnote>This is an example of a footnote</footnote>
```

Within a footnote, you can use inline content [2] to format the type styles of the text in the footnote. It is not possible to use the block content described in this chapter within a footnote.

Footnotes appear at the bottom of the page, with a small number in the running text referring to that footnote.
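The sketch below shows a footnote that uses inline markup. It assumes that the `footnote` element is placed in the running text of a paragraph, at the point where the footnote number should appear; the wording is only an illustration.

```xml
<para>
The transformation is done by an XSLT
processor<footnote>Inline markup, such as
<emph>emphasized text</emph>, is allowed here.</footnote>.
</para>
```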

3.3 Lists

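Three types of lists are supported:

- `itemize` for bulleted lists such as this one.
- `enumerate` for numbered lists.
- `description` for tagged lists.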

Each item in such a list must be in an `item` element. In fact, an `item` is the only element allowed in an `itemize`, `enumerate` or `description` element. You should not put ordinary text or any other element in a list without enclosing them in `<item>` and `</item>`. Here is an example of a numbered list:

```
<enumerate>
<item>First you need an enumerate, itemize or description tag.</item>
<item>Second, include one or more item elements.</item>
<item>Finally, put the content inside the items.</item>
</enumerate>

```

And this is what the list turns into:

1. First you need an enumerate, itemize or description tag.
2. Second, include one or more item elements.
3. Finally, put the content inside the items.

In a description list, you make your own tags for each item instead of the automatically generated bullets or numbers. The tags for each item go in the `tag` attribute of the `item` element. So, repeating the above list as a description list:

```
<description>
<item tag='itemize'> for bulleted lists.</item>
<item tag='enumerate'> for numbered lists.</item>
<item tag='description'> for tagged lists such as this one.</item>
</description>
```

Which creates the following output:

itemize
for bulleted lists.
enumerate
for numbered lists.
description
for tagged lists such as this one.

An item can contain inline content as well as block-level content.
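For instance, an item might hold a paragraph followed by a nested list, as in this sketch. It assumes that a list counts as block-level content and may therefore be nested inside an `item`; this has not been verified against the DTD.

```xml
<itemize>
<item>A short item with <strong>inline</strong> content only.</item>
<item>
<para>
An item that holds a paragraph, followed by a nested list:
</para>
<enumerate>
<item>First step.</item>
<item>Second step.</item>
</enumerate>
</item>
</itemize>
```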

3.4 Including graphics

The empty element picture is used to include graphics in your document, like this:

```
<picture src='diagram.png' eps='diagram' scale='0.5'/>
```

The `src` and `eps` attributes are each used for only one output format: `src` for HTML and `eps` for LaTeX.

3.5 Tables

Creating tables in XMLDoc is much like creating tables in HTML. First, there is the `table` element. The `table` element may contain an optional `thead` and any number of `row` elements. Both the `thead` and the `row` elements must contain one or more `col` elements. The `col` elements hold the actual content of the table, which must be inline content (see next chapter) or block content. To use the tables in LaTeX, you must supply a `cpos` attribute in the `table` tag.

An example of a table is shown below:

```
<table cpos='lr'>
<thead><col>Drink   </col><col>Price</col></thead>
<row><col>Beer      </col><col> 1.80</col></row>
<row><col>Whiskey   </col><col> 3.50</col></row>
<row><col>Wine      </col><col> 2.20</col></row>
</table>

```
Which comes out like this:

| Drink | Price |
|:---|---:|
| Beer | 1.80 |
| Whiskey | 3.50 |
| Wine | 2.20 |

4 Type styles and special characters

There are six type styles:

1. `emph` : Emphasized text
2. `strong` : Usually bold face
3. `code` : `For literal coding names`
4. `remark` : For highlighted remarks
5. `sub` : For subscript text.
6. `sup` : For superscript text.
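A paragraph that uses all six type styles could look like the sketch below. The text itself is invented for the example; only the element names come from the list above.

```xml
<para>
Use <emph>emphasis</emph> sparingly and <strong>strong text</strong>
for warnings. The function <code>main</code> is the entry point.
<remark>Check this paragraph before release.</remark>
Water is H<sub>2</sub>O and the area is measured in m<sup>2</sup>.
</para>
```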

4.1 Color and size

Normally, if not redefined by additional style sheets, the text is printed in black on a white background. You can create text in other colors with the color element. The color of the output text is specified in the colorname attribute, for example:

```
<color colorname='red'>Red text</color>
```

It is not possible to create just any color; only a few predefined colors are available: red, green, blue, cyan, magenta, yellow, orange, violet, purple, brown, pink, olive, black, darkgray, gray, lightgray and white. The last color, white, is probably invisible :-).

NOTE: To make colored text work with LaTeX, you probably need to install the latex-xcolor package.

Apart from the normal font size, there are two other sizes available: big and small. These elements will make the text slightly bigger or smaller, like this:

```
Here is <small>some small text</small> and <big>a big phrase</big>.
```

Here is some small text and a big phrase.

4.2 Line and page breaking

You can force the start of a new line of output with the empty `newline` element. This element has no attributes and no content. Starting a new page with the `newpage` element is only useful when you create LaTeX output.
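A short sketch of both elements follows. It assumes that `newline` and `newpage` are used as inline content inside a paragraph; where exactly the DTD allows them has not been verified.

```xml
<para>
Line one of an address<newline/>
Line two of an address
</para>
<para>
<!-- The page break only has an effect in LaTeX output -->
This paragraph ends a part of the document.<newpage/>
</para>
```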

4.3 Output escapes

You can insert any piece of text into a specific output target by using an escape element. An escape element is an element with the name of the output format and a single `command` attribute. For example, the `LaTeX` element is used to put arbitrary text into the output of `xml2latex`. The content of its `command` attribute is copied literally into the output. One of the applications of the `LaTeX` escape element is to control the first line indent of a paragraph in LaTeX:

```
<LaTeX command='\setlength{\parindent}{0cm}'/>
```

Note that it is not possible to create HTML tags in this manner.

4.4 Special characters

LaTeX special characters: #, $, %, &, ~, _, ^, \, { and }

You can include special characters, such as accented letters, Greek letters and mathematical symbols, with the `&symbol;` syntax, just as in HTML. The special characters defined in the XMLDoc DTD are listed in the following tables.
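As a brief illustration, the paragraph below uses a handful of the entities from the tables that follow. It is a fragment that only parses inside a document whose DOCTYPE refers to the XMLDoc DTD, since that is where the entities are defined; the sentence itself is made up.

```xml
<para>
A caf&eacute; au lait costs &euro; 2.50.
The angle &alpha; is 90&deg; &plusmn; 1&deg;, so cos &alpha; &asymp; 0.
</para>
```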

4.4.1 Special characters from the Latin-1 set

| code | symbol | Description |
|---|---|---|
| iexcl | ¡ | Inverted exclamation mark |
| cent | ¢ | Cent sign |
| pound | £ | Pound sign |
| curren | ¤ | Currency sign |
| yen | ¥ | Yen sign |
| brvbar | ¦ | Broken vertical bar |
| sect | § | Section sign |
| uml | ¨ | Spacing diaeresis |
| copy | © | Copyright sign |
| ordf | ª | Feminine ordinal indicator |
| laquo | « | Left pointing double angle quotation |
| not | ¬ | Not sign |
| shy | ­ | Soft hyphen |
| reg | ® | Registered sign |
| macr | ¯ | Spacing macron |
| deg | ° | Degree sign |
| plusmn | ± | Plus-minus sign |
| sup2 | ² | Superscript digit 2 |
| sup3 | ³ | Superscript digit 3 |
| acute | ´ | Spacing acute accent |
| micro | µ | Micro sign |
| para | ¶ | Paragraph sign |
| middot | · | Middot sign |
| cedil | ¸ | Spacing cedilla |
| sup1 | ¹ | Superscript digit 1 |
| ordm | º | Masculine ordinal indicator |
| raquo | » | Right pointing double angle quotation |
| frac14 | ¼ | Vulgar fraction one quarter |
| frac12 | ½ | Vulgar fraction one half |
| frac34 | ¾ | Vulgar fraction three quarters |
| iquest | ¿ | Inverted question mark |
| AElig | Æ | Capital ligature AE |
| aelig | æ | Small ligature ae |
| OElig | Œ | Capital ligature OE |
| oelig | œ | Small ligature oe |
| szlig | ß | Small letter sharp s |
| ETH | Ð | Capital letter ETH |
| THORN | Þ | Capital letter THORN |
| eth | ð | Small letter eth |
| thorn | þ | Small letter thorn |
| times | × | Multiplication sign |
| divide | ÷ | Division sign |

4.4.2 Accented letters

| code | symbol | Description |
|---|---|---|
| Agrave | À | Capital letter A with grave |
| Aacute | Á | Capital letter A with acute |
| Acirc | Â | Capital letter A with circumflex |
| Atilde | Ã | Capital letter A with tilde |
| Auml | Ä | Capital letter A with diaeresis |
| Aring | Å | Capital letter A with ring above |
| Ccedil | Ç | Capital letter C with cedilla |
| Egrave | È | Capital letter E with grave |
| Eacute | É | Capital letter E with acute |
| Ecirc | Ê | Capital letter E with circumflex |
| Euml | Ë | Capital letter E with diaeresis |
| Igrave | Ì | Capital letter I with grave |
| Iacute | Í | Capital letter I with acute |
| Icirc | Î | Capital letter I with circumflex |
| Iuml | Ï | Capital letter I with diaeresis |
| Ntilde | Ñ | Capital letter N with tilde |
| Ograve | Ò | Capital letter O with grave |
| Oacute | Ó | Capital letter O with acute |
| Ocirc | Ô | Capital letter O with circumflex |
| Otilde | Õ | Capital letter O with tilde |
| Ouml | Ö | Capital letter O with diaeresis |
| Oslash | Ø | Capital letter O with stroke |
| Ugrave | Ù | Capital letter U with grave |
| Uacute | Ú | Capital letter U with acute |
| Ucirc | Û | Capital letter U with circumflex |
| Uuml | Ü | Capital letter U with diaeresis |
| Yacute | Ý | Capital letter Y with acute |
| Yuml | Ÿ | Capital letter Y with diaeresis |
| agrave | à | Small letter a with grave |
| aacute | á | Small letter a with acute |
| acirc | â | Small letter a with circumflex |
| atilde | ã | Small letter a with tilde |
| auml | ä | Small letter a with diaeresis |
| aring | å | Small letter a with ring above |
| ccedil | ç | Small letter c with cedilla |
| egrave | è | Small letter e with grave |
| eacute | é | Small letter e with acute |
| ecirc | ê | Small letter e with circumflex |
| euml | ë | Small letter e with diaeresis |
| igrave | ì | Small letter i with grave |
| iacute | í | Small letter i with acute |
| icirc | î | Small letter i with circumflex |
| iuml | ï | Small letter i with diaeresis |
| ntilde | ñ | Small letter n with tilde |
| ograve | ò | Small letter o with grave |
| oacute | ó | Small letter o with acute |
| ocirc | ô | Small letter o with circumflex |
| otilde | õ | Small letter o with tilde |
| ouml | ö | Small letter o with diaeresis |
| oslash | ø | Small letter o with stroke |
| ugrave | ù | Small letter u with grave |
| uacute | ú | Small letter u with acute |
| ucirc | û | Small letter u with circumflex |
| uuml | ü | Small letter u with diaeresis |
| yacute | ý | Small letter y with acute |
| yuml | ÿ | Small letter y with diaeresis |
| Scaron | Š | Capital letter S with caron |
| scaron | š | Small letter s with caron |

4.4.3 Greek Letters

| code | symbol | Description |
|---|---|---|
| Alpha | Α | Greek capital letter Alpha |
| Beta | Β | Greek capital letter Beta |
| Gamma | Γ | Greek capital letter Gamma |
| Delta | Δ | Greek capital letter Delta |
| Epsilon | Ε | Greek capital letter Epsilon |
| Zeta | Ζ | Greek capital letter Zeta |
| Eta | Η | Greek capital letter Eta |
| Theta | Θ | Greek capital letter Theta |
| Iota | Ι | Greek capital letter Iota |
| Kappa | Κ | Greek capital letter Kappa |
| Lambda | Λ | Greek capital letter Lambda |
| Mu | Μ | Greek capital letter Mu |
| Nu | Ν | Greek capital letter Nu |
| Xi | Ξ | Greek capital letter Xi |
| Omicron | Ο | Greek capital letter Omicron |
| Pi | Π | Greek capital letter Pi |
| Rho | Ρ | Greek capital letter Rho |
| Sigma | Σ | Greek capital letter Sigma |
| Tau | Τ | Greek capital letter Tau |
| Upsilon | Υ | Greek capital letter Upsilon |
| Phi | Φ | Greek capital letter Phi |
| Chi | Χ | Greek capital letter Chi |
| Psi | Ψ | Greek capital letter Psi |
| Omega | Ω | Greek capital letter Omega |
| alpha | α | Greek small letter alpha |
| beta | β | Greek small letter beta |
| gamma | γ | Greek small letter gamma |
| delta | δ | Greek small letter delta |
| epsilon | ε | Greek small letter epsilon |
| zeta | ζ | Greek small letter zeta |
| eta | η | Greek small letter eta |
| theta | θ | Greek small letter theta |
| iota | ι | Greek small letter iota |
| kappa | κ | Greek small letter kappa |
| lambda | λ | Greek small letter lambda |
| mu | μ | Greek small letter mu |
| nu | ν | Greek small letter nu |
| xi | ξ | Greek small letter xi |
| omicron | ο | Greek small letter omicron |
| pi | π | Greek small letter pi |
| rho | ρ | Greek small letter rho |
| sigmaf | ς | Greek small letter final sigma |
| sigma | σ | Greek small letter sigma |
| tau | τ | Greek small letter tau |
| upsilon | υ | Greek small letter upsilon |
| phi | φ | Greek small letter phi |
| chi | χ | Greek small letter chi |
| psi | ψ | Greek small letter psi |
| omega | ω | Greek small letter omega |

4.4.4 Special Symbols

| code | symbol | Description |
|---|---|---|
| euro | € | Euro symbol |
| dagger | † | Dagger |
| Dagger | ‡ | Double dagger |
| bull | • | Bullet |
| hellip | … | Horizontal ellipsis |
| prime | ′ | Prime |
| Prime | ″ | Double prime |
| oline | ‾ | Overline |
| weierp | ℘ | Power set |
| image | ℑ | Imaginary part |
| real | ℜ | Real part |
| trade | ™ | Trade mark sign |
| alefsym | ℵ | First transfinite cardinal |
| larr | ← | Leftwards arrow |
| uarr | ↑ | Upwards arrow |
| rarr | → | Rightwards arrow |
| darr | ↓ | Downwards arrow |
| harr | ↔ | Left right arrow |
| crarr | ↵ | Carriage return |
| lArr | ⇐ | Leftwards double arrow |
| uArr | ⇑ | Upwards double arrow |
| rArr | ⇒ | Rightwards double arrow |
| dArr | ⇓ | Downwards double arrow |
| hArr | ⇔ | Left right double arrow |
| loz | ◊ | Lozenge |
| spades | ♠ | Spade suit |
| clubs | ♣ | Club suit |
| hearts | ♥ | Heart suit |
| diams | ♦ | Diamond suit |

4.4.5 Mathematical Symbols

| code | symbol | Description |
|---|---|---|
| forall | ∀ | For all |
| part | ∂ | Partial differential |
| exist | ∃ | There exists |
| empty | ∅ | Empty set |
| nabla | ∇ | Nabla, backward difference |
| isin | ∈ | Element of |
| notin | ∉ | Not an element of |
| ni | ∋ | Contains as member |
| prod | ∏ | Product sign |
| sum | ∑ | Summation sign |
| minus | − | Minus sign |
| lowast | ∗ | Asterisk operator |
| radic | √ | Square root |
| prop | ∝ | Proportional to |
| infin | ∞ | Infinity |
| ang | ∠ | Angle |
| and | ∧ | Logical AND |
| or | ∨ | Logical OR |
| cap | ∩ | Intersection |
| cup | ∪ | Union |
| int | ∫ | Integral |
| there4 | ∴ | Therefore |
| sim | ∼ | Similar to |
| cong | ≅ | Approximately equal to |
| asymp | ≈ | Asymptotic to |
| ne | ≠ | Not equal to |
| equiv | ≡ | Equivalent to |
| le | ≤ | Less than or equal to |
| ge | ≥ | Greater than or equal to |
| sub | ⊂ | Subset of |
| sup | ⊃ | Superset of |
| nsub | ⊄ | Not a subset of |
| sube | ⊆ | Subset of or equal to |
| supe | ⊇ | Superset of or equal to |
| oplus | ⊕ | Direct sum |
| otimes | ⊗ | Vector product |
| perp | ⊥ | Perpendicular to |
| sdot | ⋅ | Dot operator |
| lceil | ⌈ | Left ceiling |
| rceil | ⌉ | Right ceiling |
| lfloor | ⌊ | Left floor |
| rfloor | ⌋ | Right floor |
| lang | 〈 | Left pointing angle bracket |
| rang | 〉 | Right pointing angle bracket |

Note that some symbols are not available in LaTeX.

5 References

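External references are made with the `reference` element. Its `href` attribute points to the external document:

```
<reference xml:link="simple"
href="http://www.andromeda.nl/project/xmldoc/xmldoc.html">
The XMLDoc website
</reference>
provides the installation instructions.
```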

The XMLDoc website provides the installation instructions.

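For internal references, mark a point in the document with a `label` element. Its `name` attribute sets the name that can be referred to:

```
<label name='example'/>
```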

You can refer to a label from any other place in the document by using a `ref` or a `page` element. The `page` element creates a reference to the page number on which the `label` is printed. This is of course only useful on printed media, such as LaTeX output. The `ref` and `page` elements also require one attribute:

```
<ref to='example'>example reference</ref>
```

The required attribute `to` holds the name of the label to which the reference refers. The `ref` element is usually not empty. The inline content of the `ref` element is used to create the reference. For example, when the document is transformed into HTML, the content will become a clickable link. The content of the `page` element is only rendered in LaTeX output.
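Put together, a label and a reference to it might be used as in the following sketch. The label name `install` and the section texts are invented for the example, and placing the `label` inside a paragraph is an assumption; the DTD may allow or require other positions.

```xml
<section>
<heading>Installation</heading>
<para>
<!-- Marks the point that other parts of the document can refer to -->
<label name='install'/>
How to install the utilities.
</para>
</section>

<section>
<heading>Usage</heading>
<para>
Make sure you have completed the
<ref to='install'>installation</ref> first.
</para>
</section>
```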

6 Using multiple files

The XMLDoc system allows you to use multiple files for both input and output. Well, not really. Multiple output files are not implemented yet :-)

6.1 Multiple input files

You do not need to create one single big XML file that contains all of your story. Especially when writing a large book or report, it is often convenient to organize your document in multiple files: for example, one file for each chapter and a single 'root' document that holds the header material, such as a title page and a table of contents, and binds all chapters together. As a matter of fact, this XMLDoc guide is organized in such a way.

To include another XML file in your root document, use the standard XML XInclude system with a single `href` attribute to specify the name of the file to be included. Other attributes, as well as the fallback element, are supported as well. Refer to the XML Inclusions standard for more information. For example, this chapter is in a file called "multifiles.xml", which is included in the main document with:

```
<xi:include href="multifiles.xml"
xmlns:xi="http://www.w3.org/2001/XInclude"/>
```

Remember to put the xi:include in the proper namespace. Normally, the included files must be valid XML files as well. To include other kinds of files, such as raw text or source code, the attribute parse='text' must be added to the include element. When including another XML document, the included XML file starts with the usual XML declaration, but has a different root element declared in the <!DOCTYPE...> declaration:

```
<?xml version="1.0"?>
<!DOCTYPE chapter SYSTEM "../doc.dtd">
<chapter>
<!--  The content of this chapter -->
</chapter>
```

6.2 Multiple output files

Creating multiple output files would be a handy feature. Certainly when you create your documents for the web, it would be nice to have one title page with an index or table of contents and put each chapter or section in a separate HTML or XML file. Unfortunately, implementing this requires the <xsl:document> element, which was proposed in the XSLT 1.1 working draft. We'll have to wait until this is supported by the XSLT processor; just hang in there for a while.

7 Glossary

Alphabetical list of elements:

abstract
Optional part of the title page. States in a few sentences what the document is about.
article
One of the document styles. Defines what kind of document this is.
author
Part of the title page. States by whom the document was written. There must be at least one author on a title page.
book
One of the document styles. Defines what kind of document this is.
chapter
The first level of sectioning elements.
code
A form of inline content for making `code` examples. Usually renders to a monospace font.
col
A single cell of content in a table.
date
Part of the title page. States when the document was written.
description
Creates a list of descriptive items.
doc
The single root element of all XMLDoc documents.
docinfo
Optional block of information about the document. Contains a list of one or more `infoitem` elements.
emph
A form of inline content for making emphasized text. Usually renders to some form of italic.
enumerate
Creates a list of numbered items.
example
A block-level element to hold text with a pre-determined layout. The block of text is surrounded by a border.
footnote
Creates a numbered footnote at the bottom of the page.
heading
Holds the heading of any one of the sectioning elements. This heading is printed in larger font at the top of the sectioning element and is used to create the table of contents (`toc`).
include
Includes part of the document from another file. Deprecated. Use the standard XML Inclusion method instead.
infoitem
An informational item on the title page. Holds additional information about the document, such as version number or organization.
item
One item in a list of items.
itemize
Creates a list of bulleted items.
label
Used for internal references. Marks a point in the document that can be referred to by a `ref` element. The `name` attribute sets the name that can be referred to.
LaTeX
LaTeX escape element. Copies the content of the `command` attribute literally to the LaTeX output.
newline
Forces a line break in the output document.
newpage
Forces a page break in the output document. Used only for LaTeX output.
page
Used for internal references. Creates a page number reference to a `label` element. The name of the `label` is put in the `to` attribute. Used only for LaTeX output.
para
The primary block-level element to hold the textual content of the document.
paragraph
The fifth level of sectioning elements.
picture
Imports pictures into the document.
quote
A quotation is a slightly indented block of text.
ref
Used for internal references. Creates a reference to a `label` element. The name of the `label` is put in the `to` attribute.
reference
Used for external references. Creates a reference to an external document pointed to by the `href` attribute.
remark
A form of inline content for making a remark that stands out from the rest of the text. The final rendering is not really defined and may depend on other stylesheets. E.g. for HTML output, this creates a `span` tag with a `remark` class attribute. The way this looks is determined by a CSS stylesheet.
report
One of the document styles. Defines what kind of document this is.
row
A row of columns in a table.
section
The second level of sectioning elements.
strong
A form of inline content for making strong text. Usually rendered in bold.
sub
A form of inline content for making subscript text.
subparagraph
The sixth level of sectioning elements.
subsection
The third level of sectioning elements.
subsubsection
The fourth level of sectioning elements.
subtitle
Optionally, a document can have any number of subtitles added to the title.
sup
A form of inline content for making superscript text.
table
Creates tabular content.
thead
Optional first header row of a table.
title
Mandatory first element of the title page.
titlepage
Optional title page to hold the document's title and author, along with some other information.
toc
Automatically generated table of contents.
verbatim
A block-level element to hold text with a pre-determined layout. Mainly used for coding examples.

8 Other XML applications

8.1 MathML

${\left[\theta +b\right]}^{260}+{\left\{a+b\right\}}_{i}$

9 Things to do

[1] This is an example of a footnote

[2] Described in the next chapter