<arjen@andromeda.nl>
Oct 18, 2007
Document Information | |
---|---|
Version | 0.6 |
Organization | Andromeda Technology & Automation |
Abstract:
This guide explains the concepts of XMLDoc and discusses the features available to prepare documentation. XMLDoc uses XSLT transformations to turn the XML source document into a number of other formats.
XMLDoc is a collection of stylesheets and utilities, resembling documentation systems like sgmltools and Linuxdoc. The objective is to prepare documentation in XML format. Other formats, like LaTeX, PostScript or HTML, are then generated from the XML source document. The transformation is performed by an XSLT processor, using an XSL stylesheet for each specific output format.
Writing documents in XML is rather like writing them in HTML. The content is just plain text and the markup, or layout, is defined with tags, which are marked in angle brackets (< and >). A tag marks the beginning or end of an element, where an element is something like a chapter, a section or a phrase of emphasized text. The tags are processed by the XML parser and translated to the output format commands, as defined by the XSL stylesheet.
You cannot use every tag just anywhere in your document. There is a certain amount of structure you must adhere to. This structure is defined in the Document Type Definition (DTD).
Using XMLDoc is fairly simple. All you need is a plain text editor, such as vim or emacs, and a working installation of the XMLDoc utilities and stylesheets. The utilities consist of a special form of XSLT processor and a few shell scripts that invoke this XSLT processor with the proper stylesheets.
Since there are three XSLT stylesheets, there are also three shell scripts: xml2html, xml2latex and xml2text. These scripts transform the XMLDoc document into HTML, LaTeX and plain text output, respectively. All you need to do is create your XML source document and use one of these scripts to transform the document into another format. For example, to transform this guide into a LaTeX document, you could use:
xml2latex guide.xml >guide.tex
Note that the transformed document is written to standard output.
All XMLDoc documents are set up in a similar way. Just like every other XML file, a document starts with the XML and DOCTYPE declarations:
<?xml version="1.0"?>
<!DOCTYPE doc SYSTEM "doc.dtd">
These declarations are followed by the doc element, which contains the entire document. The next element within the doc element defines the type of document. This can be a book, an article or a report. Structurally, there is little difference between these types of document, but in LaTeX they result in different layout styles. A book or a report contains any number of chapter elements; an article starts at the section level, as explained later. The example shows a minimal XMLDoc document:
<?xml version="1.0"?>
<!DOCTYPE doc SYSTEM "doc.dtd">
<doc>
  <book>
    <chapter>
      <heading>The only chapter</heading>
    </chapter>
  </book>
</doc>
The document preamble is everything that comes before the real page of text. Typical elements of the preamble are the title page and the table of contents. You can create a title page with the titlepage tag. As the name suggests, the title page holds at least the title of the document. Other elements in the title page are the author(s), the date and the abstract. Here is an example of a title page:
<titlepage>
  <title>XML documentation system</title>
  <author>Arjen Baart &lt;arjen@andromeda.nl&gt;</author>
  <date>April 8, 2002</date>
  <abstract>
    This guide explains the concepts of <strong>XMLDoc</strong> and discusses
    the features available to prepare documentation.
    <strong>XMLDoc</strong> uses XSLT transformations to turn the XML source
    document into a number of other formats.
  </abstract>
</titlepage>
The titlepage must have a title and at least one author element. The date and abstract elements are optional, but if you use them, you must put them in this order.
The table of contents is particularly easy to create. All it takes is an empty toc element:
<toc/>
The table of contents is optional, but if you use it, it must come between the title page and the first chapter. Note for LaTeX: to actually make the table of contents, you need to run latex twice. LaTeX does not do this in a single pass.
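To make the ordering rules of the preamble concrete, here is a small sketch of a complete document with a title page, a table of contents and one chapter. The title, author and date are placeholders, and placing the titlepage inside the book element is an assumption based on the rule that the toc sits between the title page and the first chapter; check doc.dtd if the parser complains.

```xml
<?xml version="1.0"?>
<!DOCTYPE doc SYSTEM "doc.dtd">
<doc>
  <book>
    <!-- Preamble: title, author(s), then the optional date and abstract, in this order -->
    <titlepage>
      <title>Sample manual</title>
      <author>A. N. Author</author>
      <date>January 1, 2008</date>
    </titlepage>
    <!-- Optional table of contents, between the title page and the first chapter -->
    <toc/>
    <chapter>
      <heading>Introduction</heading>
      <para>The first paragraph of the first chapter.</para>
    </chapter>
  </book>
</doc>
```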
After the opening tag of the first chapter, the document really begins. The structure of the document is laid out by its chapters, sections within the chapters, subsections within the sections and so on. Just as in LaTeX and HTML, there are six levels of sectioning elements available:
chapter
: For top-level chapters.
section
: For second-level sections, i.e. 1.1, 1.2, 1.3 and so on.
subsection
: For third-level sections, i.e. 1.1.1, 1.1.2 and so on.
subsubsection
: For fourth-level sections (you get the idea).
paragraph
: The fifth-level sections.
subparagraph
: The sixth and final level.

Note that the names are equivalent to their counterparts in LaTeX.
Just as in LaTeX, the article document type does not have a chapter element. The top-level sectioning element for an article is a section.
After the open tag of a sectioning element (chapter, section, subsection, subsubsection, etc.) comes the heading element, followed by the block-level content, which is discussed later in this guide.
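As an illustration of how the sectioning elements nest, the sketch below shows one chapter with a section and a subsection. The headings and paragraph texts are invented for the example, and mixing paragraphs with nested sections in this way is an assumption about what the DTD permits.

```xml
<chapter>
  <heading>Installation</heading>
  <para>Block-level content of the chapter.</para>
  <section>
    <!-- Second level: every sectioning element starts with its heading -->
    <heading>Requirements</heading>
    <para>Block-level content of the section.</para>
    <subsection>
      <!-- Third level -->
      <heading>Supported platforms</heading>
      <para>Block-level content of the subsection.</para>
    </subsection>
  </section>
</chapter>
```

In an article, the same structure would start at the section level, without the enclosing chapter.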
You can use a 'normal' CSS stylesheet with the HTML output from xml2html, by using the style attribute in the doc element. Here is an example:
<doc style="main.css">
The actual content of your document is organized in block-level elements, such as paragraphs, lists or tables.
The most basic type of content block is an ordinary paragraph, contained in a para element. To make several separate paragraphs, you must enclose each paragraph in a para open tag and a para close tag. Here is an example of two small paragraphs:
<para>
This is an example of a small paragraph.
</para>
<para>
And here is another paragraph.
</para>
A second type of paragraph is a quote. You can make a quote by using the quote element:
<quote>
This is an example of a quote. The text within a quoted paragraph is usually
slightly indented on both the left and the right margin.
</quote>
Which results in:
This is an example of a quote. The text within a quoted paragraph is usually slightly indented on both the left and the right margin.
A special kind of paragraph is the verbatim environment. Just as in LaTeX, this is used to include literal text output with spaces, indentation and line breaks preserved. The practical use for the verbatim element is to include coding examples, such as:
<verbatim>
struct complex {
    double real;
    double imaginary;
};
</verbatim>
Which comes out like this:
struct complex {
    double real;
    double imaginary;
};
A variation on the verbatim text is the example text. The only real difference is that example is placed inside a box to make it stand out a bit more. In fact, when converted to XHTML, only an attribute class='example' is added. It is up to the CSS linked to the XHTML page to add additional layout features. The default styling will only add a border. Here is the above example shown in an actual example element:
<example>
struct complex {
    double real;
    double imaginary;
};
</example>
Which comes out like this:
struct complex {
    double real;
    double imaginary;
};
Footnotes are created with the footnote element [1]:
<footnote>This is an example of a footnote</footnote>
Within a footnote, you can use inline content [2] to format the type styles of the text. It is not possible to use the block content described in this chapter within a footnote.
Footnotes appear at the bottom of the page, with a small number in the running text referring to that footnote.
Three types of lists are supported:
- itemize for bulleted lists such as this one.
- enumerate for numbered lists.
- description for tagged lists.
Each item in such a list must be in an item element. In fact, an item is the only element allowed in an itemize, enumerate or description element. You should not put ordinary text or any other element in a list without enclosing them in <item> and </item>.
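Here is an example of a numbered list:
<enumerate>
  <item>First you need an enumerate or itemize tag.</item>
  <item>Second, include one or more item elements.</item>
  <item>Finally, put the content inside the items.</item>
</enumerate>
And this is what the list turns into:
1. First you need an enumerate or itemize tag.
2. Second, include one or more item elements.
3. Finally, put the content inside the items.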
In a description list, you make your own tags for each item instead of the automatically generated bullets or numbers. The tag for each item goes in the tag attribute of the item element. So, repeating the above list as a description list:
<description>
  <item tag='itemize'> for bulleted lists such as this one.</item>
  <item tag='enumerate'> for numbered lists.</item>
  <item tag='description'> for tagged lists.</item>
</description>
Which creates the following output:
itemize
: for bulleted lists such as this one.
enumerate
: for numbered lists.
description
: for tagged lists.

An item can contain inline content as well as block-level content.
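The following sketch shows both kinds of item content side by side: the first item holds only inline content, the second holds block-level content in the form of a paragraph and a nested numbered list. That lists may be nested inside items follows from lists being block-level elements, but the exact combination shown here has not been checked against doc.dtd.

```xml
<itemize>
  <!-- Inline content only -->
  <item>A short item with <emph>inline</emph> content.</item>
  <!-- Block-level content: a paragraph followed by a nested list -->
  <item>
    <para>An item that starts with a paragraph.</para>
    <enumerate>
      <item>A first nested step.</item>
      <item>A second nested step.</item>
    </enumerate>
  </item>
</itemize>
```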
The empty element picture is used to include graphics in your document, like this:
<picture src='diagram.png' eps='diagram' scale='0.5'/>
The src and eps attributes are used for HTML and LaTeX output, respectively.
Creating tables in XMLDoc is much like creating tables in HTML. First, there is the table element. The table element may contain an optional thead and any number of row elements. Both the thead and the row elements must contain one or more col elements. The col elements hold the actual content of the table, which must be inline content (see next chapter) or block content. To use the tables in LaTeX, you must supply a cpos attribute in the table tag. An example of a table is shown below:
<table cpos='lr'>
  <thead><col>Drink </col><col>Price</col></thead>
  <row><col>Beer </col><col> 1.80</col></row>
  <row><col>Whiskey </col><col> 3.50</col></row>
  <row><col>Wine </col><col> 2.20</col></row>
</table>
Drink | Price |
---|---|
Beer | 1.80 |
Whiskey | 3.50 |
Wine | 2.20 |
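The cpos value in the table above has one letter per column. This guide does not spell out the allowed letters; the sketch below assumes that they follow the column letters of a LaTeX tabular, where l, c and r stand for left-aligned, centered and right-aligned columns. Under that assumption, a three-column table might look like this (the Size column and its values are invented for the example):

```xml
<!-- Assumption: one cpos letter per column, as in a LaTeX tabular (l, c, r) -->
<table cpos='lcr'>
  <thead><col>Drink</col><col>Size</col><col>Price</col></thead>
  <row><col>Beer</col><col>0.3 l</col><col>1.80</col></row>
  <row><col>Wine</col><col>0.2 l</col><col>2.20</col></row>
</table>
```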
There are six type styles:
emph
: Emphasized text.
strong
: Usually bold face.
code
: For literal coding names.
remark
: For highlighted remarks.
sub
: For subscript text.
sup
: For superscript text.

Normally, if not redefined by additional style sheets, the text is printed in black on a white background. You can create text in other colors with the color element. Which color the output text should become is specified in the colorname attribute, for example:
<color colorname='red'>Red text</color>
It is not possible to create arbitrary colors; only a few predefined colors are available: red, green, blue, cyan, magenta, yellow, orange, violet, purple, brown, pink, olive, black, darkgray, gray, lightgray and white. The last color, white, is probably invisible :-).
NOTE: To make colored text work with LaTeX, you probably need to install the latex-xcolor package.
Apart from the normal font size, there are two other sizes available: big and small. These elements will make the text slightly bigger or smaller, like this:
Here is <small>some small text</small> and <big>a big phrase</big>.
Here is some small text and a big phrase.
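The six type styles listed earlier in this chapter are used in the same way as big and small. As a sketch, a single paragraph that exercises all of them could be written like this (the sentence itself is made up for the example):

```xml
<para>
  Run <code>xml2html</code> to create the HTML version.
  <emph>Emphasized</emph> and <strong>strong</strong> phrases can be mixed
  with formulas such as H<sub>2</sub>O or x<sup>2</sup>.
  <remark>This sentence still needs to be reviewed.</remark>
</para>
```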
You can force the start of a new line of output with the empty newline element. This element has no attributes and no content. Starting a new page with the newpage element is only useful when you create LaTeX output.
You can insert any piece of text into a specific output target by using an escape element. An escape element is an element with the name of the output format and a single command attribute. For example, the LaTeX element is used to put arbitrary text into the output of xml2latex. The content of its command attribute is copied literally into the output. One of the applications of the LaTeX escape element is to control the first line indent of a paragraph in LaTeX:
<LaTeX command='\setlength{\parindent}{0cm}'/>
Note that it is not possible to create HTML tags in this manner.
The following characters have a special meaning in LaTeX: #, $, %, &, ~, _, ^, \, { and }.
You can include special characters, such as accented letters, Greek letters and mathematical symbols, with the &symbol; syntax, just as in HTML. The special characters defined in the XMLDoc DTD are listed in the following tables.
code | symbol | Description |
---|---|---|
iexcl | ¡ | Inverted exclamation mark |
cent | ¢ | Cent sign |
pound | £ | Pound sign |
curren | ¤ | Currency sign |
yen | ¥ | Yen sign |
brvbar | ¦ | Broken vertical bar |
sect | § | Section sign |
uml | ¨ | Spacing diaeresis |
copy | © | Copyright sign |
ordf | ª | Feminine ordinal indicator |
laquo | « | Left pointing double angle quotation |
not | ¬ | Not sign |
shy | | Soft hyphen |
reg | ® | Registered sign |
macr | ¯ | Spacing macron |
deg | ° | Degree sign |
plusmn | ± | Plus-minus sign |
sup2 | ² | Superscript digit 2 |
sup3 | ³ | Superscript digit 3 |
acute | ´ | Spacing acute accent |
micro | µ | Micro sign |
para | ¶ | Paragraph sign |
middot | · | Middot sign |
cedil | ¸ | Spacing cedilla |
sup1 | ¹ | Superscript digit 1 |
ordm | º | Masculine ordinal indicator |
raquo | » | Right pointing double angle quotation |
frac14 | ¼ | Vulgar fraction one quarter |
frac12 | ½ | Vulgar fraction one half |
frac34 | ¾ | Vulgar fraction three quarters |
iquest | ¿ | Inverted question mark |
AElig | Æ | Capital ligature AE |
aelig | æ | Small ligature ae |
OElig | Œ | Capital ligature OE |
oelig | œ | Small ligature oe |
szlig | ß | Small letter sharp s |
ETH | Ð | Capital letter ETH |
THORN | Þ | Capital letter THORN |
eth | ð | Small letter eth |
thorn | þ | Small letter thorn |
times | × | Multiplication sign |
divide | ÷ | Division sign |

code | symbol | Description |
---|---|---|
Agrave | À | Capital letter A with grave |
Aacute | Á | Capital letter A with acute |
Acirc | Â | Capital letter A with circumflex |
Atilde | Ã | Capital letter A with tilde |
Auml | Ä | Capital letter A with diaeresis |
Aring | Å | Capital letter A with ring above |
Ccedil | Ç | Capital letter C with cedilla |
Egrave | È | Capital letter E with grave |
Eacute | É | Capital letter E with acute |
Ecirc | Ê | Capital letter E with circumflex |
Euml | Ë | Capital letter E with diaeresis |
Igrave | Ì | Capital letter I with grave |
Iacute | Í | Capital letter I with acute |
Icirc | Î | Capital letter I with circumflex |
Iuml | Ï | Capital letter I with diaeresis |
Ntilde | Ñ | Capital letter N with tilde |
Ograve | Ò | Capital letter O with grave |
Oacute | Ó | Capital letter O with acute |
Ocirc | Ô | Capital letter O with circumflex |
Otilde | Õ | Capital letter O with tilde |
Ouml | Ö | Capital letter O with diaeresis |
Oslash | Ø | Capital letter O with stroke |
Ugrave | Ù | Capital letter U with grave |
Uacute | Ú | Capital letter U with acute |
Ucirc | Û | Capital letter U with circumflex |
Uuml | Ü | Capital letter U with diaeresis |
Yacute | Ý | Capital letter Y with acute |
Yuml | Ÿ | Capital letter Y with diaeresis |
agrave | à | Small letter a with grave |
aacute | á | Small letter a with acute |
acirc | â | Small letter a with circumflex |
atilde | ã | Small letter a with tilde |
auml | ä | Small letter a with diaeresis |
aring | å | Small letter a with ring above |
ccedil | ç | Small letter c with cedilla |
egrave | è | Small letter e with grave |
eacute | é | Small letter e with acute |
ecirc | ê | Small letter e with circumflex |
euml | ë | Small letter e with diaeresis |
igrave | ì | Small letter i with grave |
iacute | í | Small letter i with acute |
icirc | î | Small letter i with circumflex |
iuml | ï | Small letter i with diaeresis |
ntilde | ñ | Small letter n with tilde |
ograve | ò | Small letter o with grave |
oacute | ó | Small letter o with acute |
ocirc | ô | Small letter o with circumflex |
otilde | õ | Small letter o with tilde |
ouml | ö | Small letter o with diaeresis |
oslash | ø | Small letter o with stroke |
ugrave | ù | Small letter u with grave |
uacute | ú | Small letter u with acute |
ucirc | û | Small letter u with circumflex |
uuml | ü | Small letter u with diaeresis |
yacute | ý | Small letter y with acute |
yuml | ÿ | Small letter y with diaeresis |
Scaron | Š | Capital letter S with caron |
scaron | š | Small letter s with caron |

code | symbol | Description |
---|---|---|
Alpha | Α | Greek capital letter Alpha |
Beta | Β | Greek capital letter Beta |
Gamma | Γ | Greek capital letter Gamma |
Delta | Δ | Greek capital letter Delta |
Epsilon | Ε | Greek capital letter Epsilon |
Zeta | Ζ | Greek capital letter Zeta |
Eta | Η | Greek capital letter Eta |
Theta | Θ | Greek capital letter Theta |
Iota | Ι | Greek capital letter Iota |
Kappa | Κ | Greek capital letter Kappa |
Lambda | Λ | Greek capital letter Lambda |
Mu | Μ | Greek capital letter Mu |
Nu | Ν | Greek capital letter Nu |
Xi | Ξ | Greek capital letter Xi |
Omicron | Ο | Greek capital letter Omicron |
Pi | Π | Greek capital letter Pi |
Rho | Ρ | Greek capital letter Rho |
Sigma | Σ | Greek capital letter Sigma |
Tau | Τ | Greek capital letter Tau |
Upsilon | Υ | Greek capital letter Upsilon |
Phi | Φ | Greek capital letter Phi |
Chi | Χ | Greek capital letter Chi |
Psi | Ψ | Greek capital letter Psi |
Omega | Ω | Greek capital letter Omega |
alpha | α | Greek small letter alpha |
beta | β | Greek small letter beta |
gamma | γ | Greek small letter gamma |
delta | δ | Greek small letter delta |
epsilon | ε | Greek small letter epsilon |
zeta | ζ | Greek small letter zeta |
eta | η | Greek small letter eta |
theta | θ | Greek small letter theta |
iota | ι | Greek small letter iota |
kappa | κ | Greek small letter kappa |
lambda | λ | Greek small letter lambda |
mu | μ | Greek small letter mu |
nu | ν | Greek small letter nu |
xi | ξ | Greek small letter xi |
omicron | ο | Greek small letter omicron |
pi | π | Greek small letter pi |
rho | ρ | Greek small letter rho |
sigmaf | ς | Greek small letter final sigma |
sigma | σ | Greek small letter sigma |
tau | τ | Greek small letter tau |
upsilon | υ | Greek small letter upsilon |
phi | φ | Greek small letter phi |
chi | χ | Greek small letter chi |
psi | ψ | Greek small letter psi |
omega | ω | Greek small letter omega |

code | symbol | Description |
---|---|---|
euro | € | Euro symbol |
dagger | † | Dagger |
Dagger | ‡ | Double dagger |
bull | • | Bullet |
hellip | … | Horizontal ellipsis |
prime | ′ | Prime |
Prime | ″ | Double prime |
oline | ‾ | Overline |
weierp | ℘ | Power set |
image | ℑ | Imaginary part |
real | ℜ | Real part |
trade | ™ | Trade mark sign |
alefsym | ℵ | First transfinite cardinal |
larr | ← | Leftwards arrow |
uarr | ↑ | Upwards arrow |
rarr | → | Rightwards arrow |
darr | ↓ | Downwards arrow |
harr | ↔ | Left right arrow |
crarr | ↵ | Carriage return |
lArr | ⇐ | Leftwards double arrow |
uArr | ⇑ | Upwards double arrow |
rArr | ⇒ | Rightwards double arrow |
dArr | ⇓ | Downwards double arrow |
hArr | ⇔ | Left right double arrow |
loz | ◊ | Lozenge |
spades | ♠ | Spade suit |
clubs | ♣ | Club suit |
hearts | ♥ | Heart suit |
diams | ♦ | Diamond suit |

code | symbol | Description |
---|---|---|
forall | ∀ | For all |
part | ∂ | Partial differential |
exist | ∃ | There exists |
empty | ∅ | Empty set |
nabla | ∇ | Nabla, backward difference |
isin | ∈ | Element of |
notin | ∉ | Not an element of |
ni | ∋ | Contains as member |
prod | ∏ | Product sign |
sum | ∑ | Summation sign |
minus | − | Minus sign |
lowast | ∗ | Asterisk operator |
radic | √ | Square root |
prop | ∝ | Proportional to |
infin | ∞ | Infinity |
ang | ∠ | Angle |
and | ∧ | Logical AND |
or | ∨ | Logical OR |
cap | ∩ | Intersection |
cup | ∪ | Union |
int | ∫ | Integral |
there4 | ∴ | Therefore |
sim | ∼ | Similar to |
cong | ≅ | Approximately equal to |
asymp | ≈ | Asymptotic to |
ne | ≠ | Not equal to |
equiv | ≡ | Equivalent to |
le | ≤ | Less than or equal to |
ge | ≥ | Greater than or equal to |
sub | ⊂ | Subset of |
sup | ⊃ | Superset of |
nsub | ⊄ | Not a subset of |
sube | ⊆ | Subset of or equal to |
supe | ⊇ | Superset of or equal to |
oplus | ⊕ | Direct sum |
otimes | ⊗ | Vector product |
perp | ⊥ | Perpendicular to |
sdot | ⋅ | Dot operator |
lceil | ⌈ | Left ceiling |
rceil | ⌉ | Right ceiling |
lfloor | ⌊ | Left floor |
rfloor | ⌋ | Right floor |
lang | 〈 | Left pointing angle bracket |
rang | 〉 | Right pointing angle bracket |
Note that some symbols are not available in LaTeX.
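As a short sketch of the &symbol; syntax in running text, the paragraph below uses a few of the codes from the tables above (eacute, deg, alpha, le, plusmn and euro). The sentences are invented for the example, and the fragment only parses as part of a document that declares doc.dtd, because that is where the entities are defined.

```xml
<para>
  The caf&eacute; opens its terrace when the temperature is above 20&deg;C.
  An angle &alpha; &le; 90&deg; is measured with a tolerance of &plusmn;1&deg;.
  A ticket costs &euro;5.
</para>
```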
Creating hypertext references is as simple as it is in HTML. The element used for references is reference, which adheres to the XLink syntax. In addition to the usual href attribute, the xml:link attribute is required. For example:
<reference xml:link="simple" href="http://www.andromeda.nl/project/xmldoc/xmldoc.html">
The XMLDoc website
</reference>
provides the installation instructions.
The XMLDoc website provides the installation instructions.
An internal reference requires at least two elements: the reference itself, which points to another place in the document, and a label to which the reference refers. There can of course be multiple references to a single label. The point in the document to which a reference can point is marked by a label element. A label is an empty element with exactly one attribute, the name of the label. Each label must have a name that is unique throughout the document.
Here is an example of a label:
<label name='example'/>
You can refer to a label from any other place in the document by using a ref or a page element. The page element creates a reference to the page number on which the label is printed. This is of course only useful on printed media, such as LaTeX output. The ref and page elements also require one attribute:
<ref to='example'>example reference</ref>
The required attribute to holds the name of the label to which the reference refers. The ref element is usually not empty. The inline content of the ref element is used to create the reference. For example, when the document is transformed into HTML, the content will become a clickable link. The content of the page element is only rendered in LaTeX output.
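To show how labels and references work together, here is a sketch with one label and two ref elements that point to it. The headings, texts and the label name install are invented for the example, and placing the label at the start of a paragraph is an assumption; the DTD may allow or require it elsewhere.

```xml
<chapter>
  <heading>Getting started</heading>
  <section>
    <heading>Installing the scripts</heading>
    <!-- The label marks the target; its name must be unique in the document -->
    <para><label name='install'/>Copy the shell scripts to a directory in your path.</para>
  </section>
  <section>
    <heading>Troubleshooting</heading>
    <para>
      If xml2html is not found, check
      <ref to='install'>the installation steps</ref> again.
      The <ref to='install'>same section</ref> also applies to xml2latex.
    </para>
  </section>
</chapter>
```

Both ref elements use the same value for the to attribute, which matches the name attribute of the single label.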
The XMLDoc system allows you to use multiple files for both input and output. Well, not really: multiple output files are not implemented yet :-)
You do not need to create one single big XML file that contains all of your story. Especially when writing a large book or report, it is often convenient to organize your document in multiple files: for example, one file for each chapter and a single 'root' document that holds the preamble, such as the title page and the table of contents, and binds all chapters together. As a matter of fact, this XMLDoc guide is organized in such a way.
To include another XML file in your root document, use the standard XML XInclude mechanism, with the href attribute specifying the name of the file to be included. Other attributes, as well as the fallback element, are of course supported as well. Refer to the XML Inclusions standard for more information. For example, this chapter is in a file called "multifiles.xml", which is included in the main document with:
<xi:include href="multifiles.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
Remember to put the xi:include in the proper namespace. Normally, the included files must be valid XML files as well. To include other kinds of files, such as raw text or source code, the attribute parse='text' must be added to the include element. When including another XML document, the included XML file starts with the usual XML declaration, but has a different root element declared in the <!DOCTYPE...> declaration:
<?xml version="1.0"?>
<!DOCTYPE chapter SYSTEM "../doc.dtd">
<chapter>
  <!-- The content of this chapter -->
</chapter>
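Putting the pieces together, a root document for a multi-file book could be sketched as follows. The title, the author and the file name introduction.xml are placeholders; only multifiles.xml is taken from this guide. Each xi:include carries its own namespace declaration, as in the example above, and each included file is assumed to have chapter as its root element.

```xml
<?xml version="1.0"?>
<!DOCTYPE doc SYSTEM "doc.dtd">
<doc>
  <book>
    <titlepage>
      <title>A multi-file manual</title>
      <author>A. N. Author</author>
    </titlepage>
    <toc/>
    <!-- One include per chapter file -->
    <xi:include href="introduction.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
    <xi:include href="multifiles.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
  </book>
</doc>
```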
Creating multiple output files would be a handy feature. Certainly when you create your documents for the web, it would be nice to have one title page with an index or table of contents and put each chapter or section in a separate HTML or XML file. Unfortunately, to implement this, we need the <xsl:document> element which is proposed in version 1.1 of the XSLT recommendation. We'll have to wait until this is supported by the XSLT processor; just hang in there for a while.
Alphabetical list of elements:
code examples. Usually renders to a monospace font.
infoitem elements.
toc).
ref element. The name attribute sets the name that can be referred to.
command attribute literally to the LaTeX output.
label element. The name of the label is put in the to attribute. Used only for LaTeX output.
label element. The name of the label is put in the to attribute.
href attribute.
span tag with remark class attribute. The way this looks is determined by a CSS stylesheet.
[1] This is an example of a footnote
[2] described in the next chapter