XML documentation system

Document preparation and conversion in XML

Arjen Baart <arjen@andromeda.nl>

Oct 18, 2007

Document Information
Version       0.6
Organization  Andromeda Technology & Automation

Abstract:

This guide explains the concepts of XMLDoc and discusses the features available to prepare documentation. XMLDoc uses XSLT transformations to turn the XML source document into a number of other formats.

Table Of Contents

1 Introduction

1.1 XMLDoc concepts

1.2 XMLDoc in practice

2 Overall document structure

2.1 Preamble

2.2 Sectioning and Paragraphs

2.3 Linking with style sheets

3 Block-level content

3.1 Paragraphs

3.2 Footnotes

3.3 Lists

3.4 Including graphics

3.5 Tables

4 Type styles and special characters

4.1 Color and size

4.2 Line and page breaking

4.3 Output escapes

4.4 Special characters

5 References

6 Using multiple files

6.1 Multiple input files

6.2 Multiple output files

7 Glossary

8 Other XML applications

8.1 MathML

8.2 SVG

9 Things to do


1 Introduction

XMLDoc is a collection of stylesheets and utilities that together form a documentation system resembling sgmltools and Linuxdoc. The objective is to prepare documentation in XML format. Other formats, like LaTeX, PostScript or HTML, are then generated from the XML source document. The transformation is performed by an XSLT processor, using an XSL stylesheet for each specific output format.

1.1 XMLDoc concepts

Writing documents in XML is rather like writing them in HTML. The content is just plain text and the markup, or layout, is defined with tags, which are marked in angle brackets (< and >). A tag marks the beginning or end of an element, where an element is something like a chapter, a section or a phrase of emphasized text. The tags are processed by the XML parser and translated to the output format commands, as defined by the XSL stylesheet.

You cannot use every tag just anywhere in your document. There is a certain amount of structure you must adhere to. This structure is defined in the Document Type Definition (DTD).

1.2 XMLDoc in practice

Using XMLDoc is fairly simple. All you need is a plain text editor, such as vim or emacs, and a working installation of the XMLDoc utilities and stylesheets. The utilities consist of a special form of XSLT processor and a few shell scripts that invoke this XSLT processor with the proper stylesheets.

Since there are three XSLT stylesheets, there are also three shell scripts: xml2html, xml2latex and xml2text. These scripts transform the XMLDoc document into HTML, LaTeX and plain text output, respectively. All you need to do is create your XML source document and use one of these scripts to transform the document into another format. For example, to transform this guide into a LaTeX document, you could use:

   xml2latex guide.xml >guide.tex

Note that the transformed document is written to standard output.

2 Overall document structure

All XMLDoc documents are set up in a similar way. Just like every other XML file, a document starts with the XML and DOCTYPE declarations:


<?xml version="1.0"?>
<!DOCTYPE doc SYSTEM "doc.dtd">

These declarations are followed by the doc element, which contains the entire document. The first element within the doc element defines the type of document: a book, an article or a report. Structurally, there is hardly any difference between these types of document, but in LaTeX they result in different layout styles. A book or a report contains any number of chapter elements; an article starts at the section level instead (see section 2.2). The example shows a minimal XMLDoc document:


<?xml version="1.0"?>
<!DOCTYPE doc SYSTEM "doc.dtd">

<doc>

   <book>

      <chapter>
         <heading>The only chapter</heading>

      </chapter>

   </book>

</doc>

2.1 Preamble

The document preamble is everything that comes before the real page of text. Typical elements of the preamble are the title page and the table of contents. You can create a title page with the titlepage tag. As the name suggests, the title page holds at least the title of the document. Other elements in the title page are the author(s), the date and the abstract. Here is an example of a title page:

<titlepage>
   <title>XML documentation system</title>
   <author>Arjen Baart  &lt;arjen@andromeda.nl&gt;</author>
   <date>April 8, 2002</date>
   <abstract>
   This guide explains the concepts of <strong>XMLDoc</strong> and discusses
   the features available to prepare documentation.
   <strong>XMLDoc</strong> uses XSLT transformations to turn the XML source
   document into a number of other formats.
   </abstract>
</titlepage>

The titlepage must have a title and at least one author element. The date and abstract elements are optional, but if you use them, you must put them in this order.

The table of contents is particularly easy to create. All it takes is an empty toc element:

   <toc/>

The table of contents is optional, but if you use it, it must come between the title page and the first chapter. Note for LaTeX: to actually make the table of contents, you need to run latex twice. LaTeX does not do this in a single pass.
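
As a sketch of how the parts of the preamble fit together, the following skeleton puts a title page and a table of contents in front of the first chapter. Placing the titlepage and toc elements inside the book element is an assumption based on the ordering rules above; check doc.dtd if the parser complains. The title, author and text are just placeholders.

   <?xml version="1.0"?>
   <!DOCTYPE doc SYSTEM "doc.dtd">

   <doc>
      <book>
         <!-- Title page: title and author are required, -->
         <!-- date and abstract are optional, in this order -->
         <titlepage>
            <title>My first XMLDoc book</title>
            <author>A. N. Author</author>
            <date>January 1, 2008</date>
            <abstract>
            A short summary of what the book is about.
            </abstract>
         </titlepage>

         <!-- Table of contents: after the title page, before chapter 1 -->
         <toc/>

         <chapter>
            <heading>Introduction</heading>
            <para>The first paragraph of the book.</para>
         </chapter>
      </book>
   </doc>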

2.2 Sectioning and Paragraphs

After the opening tag of the first chapter, the document really begins. The structure of the document is laid out by its chapters, sections within the chapters, subsections within the sections and so on. Just as in LaTeX and HTML, there are six levels of sectioning elements available:

  1. chapter: For top-level chapters.
  2. section: For second-level sections, i.e. 1.1, 1.2, 1.3 and so on.
  3. subsection: For third-level sections, i.e. 1.1.1, 1.1.2 and so on.
  4. subsubsection: For fourth-level sections (you get the idea).
  5. paragraph: The fifth-level sections
  6. subparagraph: The sixth and final level.

Note that the names are equivalent to their counterparts in LaTeX.

Just as in LaTeX, the article document type does not have a chapter element. The top-level sectioning element for an article is a section.

After the opening tag of a sectioning element (chapter, section, subsection, subsubsection, etc.) comes the heading element, followed by the block-level content as discussed in chapter 3.
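
To illustrate the nesting, here is a sketch of a chapter with one section and one subsection. The headings and paragraphs are invented for the example, and it is an assumption that block-level content and nested sectioning elements may be mixed in this order; the element names are the ones listed above.

   <chapter>
      <heading>Installation</heading>
      <para>This chapter explains how to install the software.</para>

      <!-- Second level: numbered 1.1, 1.2, ... -->
      <section>
         <heading>Requirements</heading>
         <para>You need a working XSLT processor.</para>

         <!-- Third level: numbered 1.1.1, 1.1.2, ... -->
         <subsection>
            <heading>Optional packages</heading>
            <para>LaTeX is only needed for printed output.</para>
         </subsection>
      </section>
   </chapter>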

2.3 Linking with style sheets

You can use a 'normal' CSS stylesheet with the HTML output from xml2html, by using the style attribute in the doc element. Here is an example:


<doc style="main.css">

3 Block-level content

The actual content of your document is organized in block-level elements, such as paragraphs, lists or tables.

3.1 Paragraphs

The most basic type of content block is an ordinary paragraph, contained in a para element. To make several separate paragraphs, you must enclose each paragraph in a para open tag and a para close tag. Here is an example of two small paragraphs:


<para>
  This is an example of a small paragraph.
</para>
<para>
  And here is another paragraph.
</para>

A second type of paragraph is a quote. You can make a quote by using the quote element:

   <quote>
   This is an example of a quote.
   The text within a quoted paragraph is usually slightly indented on both
   the left and the right margin.
   </quote>

Which results in:

This is an example of a quote. The text within a quoted paragraph is usually slightly indented on both the left and the right margin.

A special kind of paragraph is the verbatim environment. Just as in LaTeX, this is used to include literal text output with spaces, indentation and line breaks preserved. The practical use for the verbatim element is to include coding examples, such as:

   <verbatim>
      struct complex
      {
         double   real;
         double   imaginary;
      };
   </verbatim>

Which comes out like this:

   struct complex
   {
      double   real;
      double   imaginary;
   };

A variation on the verbatim text is the example text. The only real difference is that example is placed inside a box to make it stand out a bit more. In fact, when converted to XHTML, only an attribute class='example' is added. It is up to the CSS linked to the XHTML page to add additional layout features. The default styling will only add a border. Here is the above example shown in an actual example element:

   <example>
      struct complex
      {
         double   real;
         double   imaginary;
      };
   </example>

Which comes out like this:

   struct complex
   {
      double   real;
      double   imaginary;
   };

3.2 Footnotes

Footnotes are created with the footnote element: [1]

<footnote>This is an example of a footnote</footnote>

Within a footnote, you can use inline content [2] to format the type styles of the text in the footnote. It is not possible to use the block content described in this chapter within a footnote.

Footnotes appear at the bottom of the page, with a small number in the running text referring to that footnote.
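
A footnote is normally written in the running text of a paragraph, at the point where the footnote number should appear. The following sketch, with made-up text, shows a footnote that uses inline content (the emph element) for part of its text:

   <para>
     The stylesheets are written in XSLT<footnote>Short for
     <emph>Extensible Stylesheet Language Transformations</emph>.</footnote>.
   </para>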

3.3 Lists

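Three types of lists are supported:

  * itemize for bulleted lists such as this one.
  * enumerate for numbered lists.
  * description for tagged lists.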

Each item in such a list must be in an item element. In fact, an item is the only element allowed in an itemize, enumerate or description element. You should not put ordinary text or any other element in a list without enclosing them in <item> and </item>. Here is an example of a numbered list:


<enumerate>
   <item>First you need an enumerate, itemize or description tag.</item>
   <item>Second, include one or more item elements.</item>
   <item>Finally, put the content inside the items.</item>
</enumerate>

And this is what the list turns into:

  1. First you need an enumerate, itemize or description tag.
  2. Second, include one or more item elements.
  3. Finally, put the content inside the items.

In a description list, you make your own tags for each item instead of the automatically generated bullets or numbers. The tag for each item goes in the tag attribute of the item element. So, repeating the list of list types as a description list:

<description>
   <item tag='itemize'> for bulleted lists.</item>
   <item tag='enumerate'> for numbered lists.</item>
   <item tag='description'> for tagged lists such as this one.</item>
</description>

Which creates the following output:

itemize
for bulleted lists.
enumerate
for numbered lists.
description
for tagged lists such as this one.

An item can contain inline content as well as block-level content.
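
For instance, an item may hold a complete nested list next to its own paragraph. This is only a sketch with invented text; it assumes that para and itemize are accepted as block-level content inside an item, as the sentence above suggests.

   <enumerate>
      <item>
         <!-- Block-level content: a paragraph and a nested list -->
         <para>Prepare the source document:</para>
         <itemize>
            <item>Write the <emph>title page</emph>.</item>
            <item>Write the chapters.</item>
         </itemize>
      </item>
      <!-- Inline content only -->
      <item>Run <code>xml2html</code> to create the HTML version.</item>
   </enumerate>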

3.4 Including graphics

The empty element picture is used to include graphics in your document, like this:

   <picture src='diagram.png' eps='diagram' scale='0.5'/>

The src and eps attributes are each used for one output format only: src for HTML and eps for LaTeX.

3.5 Tables

Creating tables in XMLDoc is much like creating tables in HTML. First, there is the table element. The table element may contain an optional thead and any number of row elements. Both the thead and the row elements must contain one or more col elements. The col elements hold the actual content of the table, which must be inline content (see next chapter) or block content. To use the tables in LaTeX, you must supply a cpos attribute in the table tag.

An example of a table is shown below:


<table cpos='lr'>
  <thead><col>Drink   </col><col>Price</col></thead>
  <row><col>Beer      </col><col> 1.80</col></row>
  <row><col>Whiskey   </col><col> 3.50</col></row>
  <row><col>Wine      </col><col> 2.20</col></row>
</table>

Drink     Price
Beer       1.80
Whiskey    3.50
Wine       2.20
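
Because a col may hold inline content, you can use type styles inside a table. The sketch below lists the three conversion scripts from chapter 1. It assumes that cpos takes one letter per column (here two left-aligned columns), which is how the 'lr' value in the example above reads.

   <table cpos='ll'>
     <thead><col>Script</col><col>Output format</col></thead>
     <row><col><code>xml2html</code></col><col>HTML</col></row>
     <row><col><code>xml2latex</code></col><col>LaTeX</col></row>
     <row><col><code>xml2text</code></col><col><emph>Plain</emph> text</col></row>
   </table>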

4 Type styles and special characters

There are six type styles:

  1. emph : Emphasized text
  2. strong : Usually bold face
  3. code : For literal coding names
  4. remark : For highlighted remarks
  5. sub : For subscript text.
  6. sup : For superscript text.
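
Here is a small sketch of how these elements might be combined in a paragraph. The text is made up, and the exact rendering depends on the stylesheet of the output format:

   <para>
     Use <emph>emphasis</emph> sparingly and <strong>strong text</strong>
     even less. The formula for water is H<sub>2</sub>O and the area of
     a square is x<sup>2</sup>. Output is written with the
     <code>printf</code> function.
     <remark>Check this paragraph before the next release.</remark>
   </para>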

4.1 Color and size

Normally, if not redefined by additional style sheets, the text is printed in black on a white background. You can create text in other colors with the color element. Which color the output text should become is specified in the colorname attribute, for example:

  <color colorname='red'>Red text</color>

It is not possible to create arbitrary colors; there are just a few predefined colors available: red, green, blue, cyan, magenta, yellow, orange, violet, purple, brown, pink, olive, black, darkgray, gray, lightgray and white. The last color (white) is probably invisible :-).

NOTE: To make colored text work with LaTeX, you probably need to install the latex-xcolor package.

Apart from the normal font size, there are two other sizes available: big and small. These elements will make the text slightly bigger or smaller, like this:

Here is <small>some small text</small> and <big>a big phrase</big>.

Here is some small text and a big phrase.

4.2 Line and page breaking

You can force the start of a new line of output with the empty newline element. This element has no attributes and no content. Starting a new page with the newpage element is only useful when you create LaTeX output.
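
Both elements are written as empty tags. In the sketch below, two lines of an invented address are kept apart with a newline element. Placing newpage between two paragraphs is an assumption about where the DTD allows it.

   <para>
     Andromeda Technology &amp; Automation<newline/>
     Documentation department
   </para>

   <!-- Only has an effect in the LaTeX output -->
   <newpage/>

   <para>
     This paragraph starts on a new page in the LaTeX output.
   </para>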

4.3 Output escapes

You can insert any piece of text into a specific output target by using an escape element. An escape element is an element with the name of the output format and a single command attribute. For example, the LaTeX element is used to put arbitrary text into the output of xml2latex. The content of its command attribute is copied literally into the output. One of the applications of the LaTeX escape element is to control the first line indent of a paragraph in LaTeX:

<LaTeX command='\setlength{\parindent}{0cm}'/>

Note that it is not possible to create HTML tags in this manner.

4.4 Special characters

LaTeX special characters: #, $, %, &, ~, _, ^, \, { and }

You can include special characters, such as accented letters, Greek letters and mathematical symbols, with the &symbol; syntax, just as in HTML. The special characters defined in the XMLDoc DTD are listed in the following tables.

4.4.1 Special characters from the Latin-1 set

code      symbol   Description
iexcl     ¡        Inverted exclamation mark
cent      ¢        Cent sign
pound     £        Pound sign
curren    ¤        Currency sign
yen       ¥        Yen sign
brvbar    ¦        Broken vertical bar
sect      §        Section sign
uml       ¨        Spacing diaeresis
copy      ©        Copyright sign
ordf      ª        Feminine ordinal indicator
laquo     «        Left pointing double angle quotation
not       ¬        Not sign
shy                Soft hyphen
reg       ®        Registered sign
macr      ¯        Spacing macron
deg       °        Degree sign
plusmn    ±        Plus-minus sign
sup2      ²        Superscript digit 2
sup3      ³        Superscript digit 3
acute     ´        Spacing acute accent
micro     µ        Micro sign
para      ¶        Paragraph sign
middot    ·        Middot sign
cedil     ¸        Spacing cedilla
sup1      ¹        Superscript digit 1
ordm      º        Masculine ordinal indicator
raquo     »        Right pointing double angle quotation
frac14    ¼        Vulgar fraction one quarter
frac12    ½        Vulgar fraction one half
frac34    ¾        Vulgar fraction three quarters
iquest    ¿        Inverted question mark
AElig     Æ        Capital ligature AE
aelig     æ        Small ligature ae
OElig     Œ        Capital ligature OE
oelig     œ        Small ligature oe
szlig     ß        Small letter sharp s
ETH       Ð        Capital letter ETH
THORN     Þ        Capital letter THORN
eth       ð        Small letter eth
thorn     þ        Small letter thorn
times     ×        Multiplication sign
divide    ÷        Division sign

4.4.2 Accented letters

code      symbol   Description
Agrave    À        Capital letter A with grave
Aacute    Á        Capital letter A with acute
Acirc     Â        Capital letter A with circumflex
Atilde    Ã        Capital letter A with tilde
Auml      Ä        Capital letter A with diaeresis
Aring     Å        Capital letter A with ring above
Ccedil    Ç        Capital letter C with cedilla
Egrave    È        Capital letter E with grave
Eacute    É        Capital letter E with acute
Ecirc     Ê        Capital letter E with circumflex
Euml      Ë        Capital letter E with diaeresis
Igrave    Ì        Capital letter I with grave
Iacute    Í        Capital letter I with acute
Icirc     Î        Capital letter I with circumflex
Iuml      Ï        Capital letter I with diaeresis
Ntilde    Ñ        Capital letter N with tilde
Ograve    Ò        Capital letter O with grave
Oacute    Ó        Capital letter O with acute
Ocirc     Ô        Capital letter O with circumflex
Otilde    Õ        Capital letter O with tilde
Ouml      Ö        Capital letter O with diaeresis
Oslash    Ø        Capital letter O with stroke
Ugrave    Ù        Capital letter U with grave
Uacute    Ú        Capital letter U with acute
Ucirc     Û        Capital letter U with circumflex
Uuml      Ü        Capital letter U with diaeresis
Yacute    Ý        Capital letter Y with acute
Yuml      Ÿ        Capital letter Y with diaeresis
agrave    à        Small letter a with grave
aacute    á        Small letter a with acute
acirc     â        Small letter a with circumflex
atilde    ã        Small letter a with tilde
auml      ä        Small letter a with diaeresis
aring     å        Small letter a with ring above
ccedil    ç        Small letter c with cedilla
egrave    è        Small letter e with grave
eacute    é        Small letter e with acute
ecirc     ê        Small letter e with circumflex
euml      ë        Small letter e with diaeresis
igrave    ì        Small letter i with grave
iacute    í        Small letter i with acute
icirc     î        Small letter i with circumflex
iuml      ï        Small letter i with diaeresis
ntilde    ñ        Small letter n with tilde
ograve    ò        Small letter o with grave
oacute    ó        Small letter o with acute
ocirc     ô        Small letter o with circumflex
otilde    õ        Small letter o with tilde
ouml      ö        Small letter o with diaeresis
oslash    ø        Small letter o with stroke
ugrave    ù        Small letter u with grave
uacute    ú        Small letter u with acute
ucirc     û        Small letter u with circumflex
uuml      ü        Small letter u with diaeresis
yacute    ý        Small letter y with acute
yuml      ÿ        Small letter y with diaeresis
Scaron    Š        Capital letter S with caron
scaron    š        Small letter s with caron
scaronšSmall letter s with caron

4.4.3 Greek Letters

code      symbol   Description
Alpha     Α        Greek capital letter Alpha
Beta      Β        Greek capital letter Beta
Gamma     Γ        Greek capital letter Gamma
Delta     Δ        Greek capital letter Delta
Epsilon   Ε        Greek capital letter Epsilon
Zeta      Ζ        Greek capital letter Zeta
Eta       Η        Greek capital letter Eta
Theta     Θ        Greek capital letter Theta
Iota      Ι        Greek capital letter Iota
Kappa     Κ        Greek capital letter Kappa
Lambda    Λ        Greek capital letter Lambda
Mu        Μ        Greek capital letter Mu
Nu        Ν        Greek capital letter Nu
Xi        Ξ        Greek capital letter Xi
Omicron   Ο        Greek capital letter Omicron
Pi        Π        Greek capital letter Pi
Rho       Ρ        Greek capital letter Rho
Sigma     Σ        Greek capital letter Sigma
Tau       Τ        Greek capital letter Tau
Upsilon   Υ        Greek capital letter Upsilon
Phi       Φ        Greek capital letter Phi
Chi       Χ        Greek capital letter Chi
Psi       Ψ        Greek capital letter Psi
Omega     Ω        Greek capital letter Omega
alpha     α        Greek small letter alpha
beta      β        Greek small letter beta
gamma     γ        Greek small letter gamma
delta     δ        Greek small letter delta
epsilon   ε        Greek small letter epsilon
zeta      ζ        Greek small letter zeta
eta       η        Greek small letter eta
theta     θ        Greek small letter theta
iota      ι        Greek small letter iota
kappa     κ        Greek small letter kappa
lambda    λ        Greek small letter lambda
mu        μ        Greek small letter mu
nu        ν        Greek small letter nu
xi        ξ        Greek small letter xi
omicron   ο        Greek small letter omicron
pi        π        Greek small letter pi
rho       ρ        Greek small letter rho
sigmaf    ς        Greek small letter final sigma
sigma     σ        Greek small letter sigma
tau       τ        Greek small letter tau
upsilon   υ        Greek small letter upsilon
phi       φ        Greek small letter phi
chi       χ        Greek small letter chi
psi       ψ        Greek small letter psi
omega     ω        Greek small letter omega

4.4.4 Special Symbols

code      symbol   Description
euro      €        Euro symbol
dagger    †        Dagger
Dagger    ‡        Double dagger
bull      •        Bullet
hellip    …        Horizontal ellipsis
prime     ′        Prime
Prime     ″        Double prime
oline     ‾        Overline
weierp    ℘        Power set
image     ℑ        Imaginary part
real      ℜ        Real part
trade     ™        Trade mark sign
alefsym   ℵ        First transfinite cardinal
larr      ←        Leftwards arrow
uarr      ↑        Upwards arrow
rarr      →        Rightwards arrow
darr      ↓        Downwards arrow
harr      ↔        Left right arrow
crarr     ↵        Carriage return
lArr      ⇐        Leftwards double arrow
uArr      ⇑        Upwards double arrow
rArr      ⇒        Rightwards double arrow
dArr      ⇓        Downwards double arrow
hArr      ⇔        Left right double arrow
loz       ◊        Lozenge
spades    ♠        Spade suit
clubs     ♣        Club suit
hearts    ♥        Heart suit
diams     ♦        Diamond suit

4.4.5 Mathematical Symbols

code      symbol   Description
forall    ∀        For all
part      ∂        Partial differential
exist     ∃        There exists
empty     ∅        Empty set
nabla     ∇        Nabla, backward difference
isin      ∈        Element of
notin     ∉        Not an element of
ni        ∋        Contains as member
prod      ∏        Product sign
sum       ∑        Summation sign
minus     −        Minus sign
lowast    ∗        Asterisk operator
radic     √        Square root
prop      ∝        Proportional to
infin     ∞        Infinity
ang       ∠        Angle
and       ∧        Logical AND
or        ∨        Logical OR
cap       ∩        Intersection
cup       ∪        Union
int       ∫        Integral
there4    ∴        Therefore
sim       ∼        Similar to
cong      ≅        Approximately equal to
asymp     ≈        Asymptotic to
ne        ≠        Not equal to
equiv     ≡        Equivalent to
le        ≤        Less than or equal to
ge        ≥        Greater than or equal to
sub       ⊂        Subset of
sup       ⊃        Superset of
nsub      ⊄        Not a subset of
sube      ⊆        Subset of or equal to
supe      ⊇        Superset of or equal to
oplus     ⊕        Direct sum
otimes    ⊗        Vector product
perp      ⊥        Perpendicular to
sdot      ⋅        Dot operator
lceil     ⌈        Left ceiling
rceil     ⌉        Right ceiling
lfloor    ⌊        Left floor
rfloor    ⌋        Right floor
lang      ⟨        Left pointing angle bracket
rang      ⟩        Right pointing angle bracket

Note that some symbols are not available in LaTeX.
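
As an illustration, the following paragraph (with invented text) uses a few of the codes from the tables above. Each code is written between an ampersand and a semicolon:

   <para>
     A caf&eacute; au lait costs &euro; 2.50, the angle &alpha; is
     90&deg; and x &ne; y.
   </para>

In the HTML output, this should read roughly as: A café au lait costs € 2.50, the angle α is 90° and x ≠ y.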

5 References

Creating hypertext references is as simple as it is in HTML. The element used for references is reference, which adheres to the XLink syntax. We need to add one more attribute to the usual href attribute. For example:

<reference xml:link="simple"
    href="http://www.andromeda.nl/project/xmldoc/xmldoc.html">
The XMLDoc website
</reference>
provides the installation instructions.

The XMLDoc website provides the installation instructions.

An internal reference requires at least two elements: the reference itself, which points to another place in the document, and a label to which the reference refers. There can of course be multiple references to a single label. The point in the document that a reference can point to is marked by a label element. A label is an empty element with exactly one attribute, the name of the label. Each label must have a name that is unique throughout the document. Here is an example of a label:

   <label name='example'/>

You can refer to a label from any other place in the document by using a ref or a page element. The page element creates a reference to the page number on which the label is printed. This is of course only useful on printed media, such as LaTeX. The ref and page elements also require one attribute:

   <ref to='example'>example reference</ref>

The required to attribute holds the name of the label to which the reference refers. The ref element is usually not empty. The inline content of the ref element is used to create the reference. For example, when the document is transformed into HTML, the content will become a clickable link. The content of the page element is only rendered in LaTeX output.
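
Putting the three elements together, a cross reference could look like the sketch below. The label name 'install' and the text are invented. It is an assumption that a label may be placed inside a para, and that the content of the page element is printed in front of the page number.

   <para>
     <label name='install'/>
     Installing the stylesheets takes only a few minutes.
   </para>

   <para>
     See <ref to='install'>the installation notes</ref>
     <page to='install'>on page </page> for details.
   </para>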

6 Using multiple files

The XMLDoc system allows you to use multiple files for both input and output. Well, not really. Multiple output files are not implemented yet :-)

6.1 Multiple input files

You do not need to create one single big XML file that contains all of your story. Especially when writing a large book or report, it is often convenient to organize your document in multiple files: for example, one file for each chapter and a single 'root' document with all the header stuff, such as a title page and a table of contents, which binds all chapters together. As a matter of fact, this XMLDoc guide is organized in such a way.

To include another XML file in your root document, use the standard XML XInclude system with a single attribute to specify the name of the file to be included. Other attributes, as well as the fallback element, are of course supported as well. Refer to the XML Inclusions standard for more information. For example, this chapter is in a file called "multifiles.xml", which is included in the main document with:

<xi:include href="multifiles.xml"
    xmlns:xi="http://www.w3.org/2001/XInclude"/>

Remember to put the xi:include in the proper namespace. Normally, the included files must be valid XML files as well. To include other kinds of files, such as raw text or source code, the attribute parse='text' must be added to the include element. When including another XML document, the included XML file starts with the usual XML declaration, but has a different root element declared in the <!DOCTYPE...> declaration:

<?xml version="1.0"?>
<!DOCTYPE chapter SYSTEM "../doc.dtd">
<chapter>
<!--  The content of this chapter -->
</chapter>
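
The following sketch shows both kinds of inclusion in one root document. The file names intro.xml and complex.h are hypothetical. The xi:fallback element and the parse attribute come from the XML Inclusions standard; that a text file may be included inside a verbatim element is an assumption.

   <doc>
      <book>
         <!-- An XML chapter, with a fallback if the file is missing -->
         <xi:include href="intro.xml"
             xmlns:xi="http://www.w3.org/2001/XInclude">
            <xi:fallback>
               <chapter>
                  <heading>Introduction (file not found)</heading>
               </chapter>
            </xi:fallback>
         </xi:include>

         <xi:include href="multifiles.xml"
             xmlns:xi="http://www.w3.org/2001/XInclude"/>

         <chapter>
            <heading>Source code</heading>
            <!-- A raw text file, included literally -->
            <verbatim><xi:include href="complex.h" parse="text"
                xmlns:xi="http://www.w3.org/2001/XInclude"/></verbatim>
         </chapter>
      </book>
   </doc>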

6.2 Multiple output files

Creating multiple output files would be a handy feature. Certainly when you create your documents for the web, it would be nice to have one title page with an index or table of contents and put each chapter or section in a separate HTML or XML file. Unfortunately, to implement this, we need the <xsl:document> element, which is proposed in the XSLT 1.1 working draft. We'll have to wait until this is supported by the XSLT processor; just hang in there for a while.

7 Glossary

Alphabetical list of elements:

abstract
Optional part of the title page. States in a few sentences what the document is about.
article
One of the document styles. Defines what kind of document this is.
author
Part of the title page. States by whom the document was written. There must be at least one author on a title page.
book
One of the document styles. Defines what kind of document this is.
chapter
The first level of sectioning elements.
code
A form of inline content for making code examples. Usually renders to a monospace font.
col
A single cell of content in a table.
date
Part of the title page. States when the document was written.
description
Creates a list of descriptive items.
doc
The single root element of all XMLDoc documents.
docinfo
Optional block of information about the document. Contains a list of one or more infoitem elements.
emph
A form of inline content for making emphasized text. Usually renders to some form of italic.
enumerate
Creates a list of numbered items.
example
A block-level element to hold text with a pre-determined layout. The block of text is surrounded by a border.
footnote
Creates a numbered footnote at the bottom of the page.
heading
Holds the heading of any one of the sectioning elements. This heading is printed in larger font at the top of the sectioning element and is used to create the table of contents (toc).
include
Includes part of the document from another file. Deprecated. Use the standard XML Inclusion method instead.
infoitem
An informational item on the title page. Holds additional information about the document, such as version number or organization.
item
One item in a list of items.
itemize
Creates a list of bulleted items.
label
Used for internal references. Marks a point in the document that can be referred to by a ref element. The name attribute sets the name that can be referred to.
LaTeX
LaTeX escape element. Copies the content of the command attribute literally to the LaTeX output.
newline
Forces a line break in the output document.
newpage
Forces a page break in the output document. Used only for LaTeX output.
page
Used for internal references. Creates a page number reference to a label element. The name of the label is put in the to attribute. Used only for LaTeX output.
para
The primary block-level element to hold the textual content of the document.
paragraph
The fifth level of sectioning elements.
picture
Imports pictures into the document.
quote
A quotation is a slightly indented block of text.
ref
Used for internal references. Creates a reference to a label element. The name of the label is put in the to attribute.
reference
Used for external references. Creates a reference to an external document pointed to by the href attribute.
remark
A form of inline content for making a remark that stands out from the rest of the text. The final rendering is not really defined and may depend on other stylesheets. E.g. for HTML output, this creates a span tag with a class attribute of remark. The way this looks is determined by a CSS stylesheet.
report
One of the document styles. Defines what kind of document this is.
row
A row of columns in a table.
section
The second level of sectioning elements.
strong
A form of inline content for making strong text. Usually rendered in bold.
sub
A form of inline content for making subscript text.
subparagraph
The sixth level of sectioning elements.
subsection
The third level of sectioning elements.
subsubsection
The fourth level of sectioning elements.
subtitle
Optionally, a document can have any number of subtitles added to the title.
sup
A form of inline content for making superscript text.
table
Creates tabular content.
thead
Optional first header row of a table.
title
Mandatory first element of the title page.
titlepage
Optional title page to hold the document's title and author, along with some other information.
toc
Automatically generated table of contents.
verbatim
A block-level element to hold text with a pre-determined layout. Mainly used for coding examples.

8 Other XML applications

8.1 MathML

θ + b 260 + a + b i

8.2 SVG

9 Things to do


[1] This is an example of a footnote

[2] described in the next chapter