The "Tag Manual" for JafSoft's text conversion utilities

The most recent version of this document can always be found online



2. Using the pre-processor

The pre-processor was originally introduced to give users more flexibility in the HTML they generate.

The pre-processor allows AscToHTM and AscToRTF to be used as authoring tools, as opposed to simple text conversion or migration tools.

Pre-processor lines are not normally output to the generated HTML or RTF. Instead they are used to modify the conversion process in a number of ways.


2.1 Marking up sections of text

The pre-processor can be used to mark sections in your document so that the program will correctly process them as you wish.

Note
The software does attempt to spot much user-formatted text automatically, but this is a difficult area and prone to error. Hence the use of these directives can reduce the error rate on such occasions.

Examples include :-

SECTION

This directive is used to divide the document up into named sections that may then be conditionally included/excluded from a particular conversion.

BEGIN/END_TABLE
BEGIN/END_DELIMITED_TABLE
BEGIN/END_COMMA_DELIMITED_TABLE
BEGIN/END_USER_TABLE New in version 5.0

These pairs of directives are used to bracket tables of various types in the source text. The software will attempt to detect plain text tables, but if this goes wrong, adding these commands can correct the analysis.

Within these tables you can use other TABLE pre-processor commands to tailor the HTML generated (see "The TABLE commands").

BEGIN/END_CONTENTS

Used to mark up a contents list in the source document. The software will attempt to automatically detect the presence and location of any contents list in the document, but the algorithm can be problematic, and only really works for numbered headings.

BEGIN/END_HTML

Delimits a section of raw HTML code to be copied to the output file unchanged.

BEGIN/END_CODE
BEGIN/END_DIAGRAM
BEGIN/END_PRE

Delimits sections of pre-formatted text. CODE refers to software samples whilst DIAGRAM refers to ASCII art. PRE is the more general "pre-formatted" text, although currently all 3 have the same implementation.

BEGIN/END_IGNORE

Delimits text that should be ignored. This could be anything from comments to copyright statements in the original source file that shouldn't appear in the converted document.
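
As a rough illustration of how a bracketing pair behaves, the Python sketch below drops everything between a BEGIN_IGNORE line and the matching END_IGNORE line. It is not JafSoft's implementation: the function name strip_ignored is hypothetical, and the sketch assumes each directive sits on its own line and carries the $_$_ prefix used elsewhere in this manual.

```python
def strip_ignored(lines):
    """Return the lines that fall outside BEGIN_IGNORE/END_IGNORE pairs."""
    kept = []
    ignoring = False
    for line in lines:
        directive = line.strip()
        if directive.startswith("$_$_BEGIN_IGNORE"):
            ignoring = True    # start skipping; the directive itself is not output
        elif directive.startswith("$_$_END_IGNORE"):
            ignoring = False   # resume copying after the closing directive
        elif not ignoring:
            kept.append(line)
    return kept


source = [
    "Chapter 1",
    "$_$_BEGIN_IGNORE",
    "Internal note: do not publish.",
    "$_$_END_IGNORE",
    "The text continues here.",
]
print(strip_ignored(source))
```

The other BEGIN/END pairs could be modelled the same way, with the bracketed lines handed to a different handler instead of being discarded.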


2.2 Commands that influence the indexing of the document

Certain directives can be used to alter the document properties. Often these affect how the document will be searched and indexed.

In HTML these mostly lead to tags in the <HEAD>..</HEAD> of each page. Often these tags produce no visible effect.

In RTF these lead to fields in the document properties being filled in.

Examples include :-

TITLE
DESCRIPTION
KEYWORDS
STYLE_SHEET (HTML Only)

The DESCRIPTION and KEYWORDS commands may be continued on subsequent lines provided they also begin with the same $_$_<command> directive.
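
The continuation rule can be pictured with a small Python sketch. This is an illustration only, under stated assumptions: the helper name collect_directive is hypothetical, the value is assumed to follow the directive after a space, and the pieces are assumed to be joined with single spaces. The actual program may combine them differently.

```python
PREFIX = "$_$_"


def collect_directive(lines, command):
    """Join the values of consecutive lines that start with the same directive."""
    marker = PREFIX + command
    parts = []
    for line in lines:
        stripped = line.strip()
        if stripped == marker or stripped.startswith(marker + " "):
            parts.append(stripped[len(marker):].strip())
        elif parts:
            break  # the run of continuation lines has ended
    return " ".join(p for p in parts if p)


source = [
    "$_$_TITLE Tag manual",
    "$_$_DESCRIPTION A guide to the pre-processor",
    "$_$_DESCRIPTION and the in-line tags.",
    "Body text starts here.",
]
description = collect_directive(source, "DESCRIPTION")
print(description)
```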


2.3 Useful one-line pre-processor commands

A large number of one-line directives exist. Those for tables are listed in the section on "The TABLE commands". Others include :-

CONTENTS_LIST
HTML_LINE
INCLUDE
LINERULE
NAVIGATION_BAR
TOC


2.4 Useful in-line tags

A large number of in-line tags are available. These can be used to produce a number of useful effects. They include :-

BR (line break)
GOTO
HYPERLINK
TIMESTAMP
SPACES
SUPER and SUB
VARIABLE


2.5 The TABLE commands

These directives are used to tailor the HTML generated in any tables the software creates. They are placed either

At the top of the file
Directives placed here become defaults for the whole file, and will replace any policies that have been set (see the section on "Table Generation" in the AscToHTM manual)

Inside a BEGIN_TABLE ... END_TABLE section
Directives placed here will apply only to the table marked up by these commands (see 7.1.2).

The table commands are described (naturally enough) in the following table.

Directive                    Value   Effect
TABLE_ALIGN                  Align   Specifies the alignment of the whole table.
TABLE_BGCOLOR                Colour  Colour of background
TABLE_BORDER                 Number  Size of border. 0 = None
TABLE_BORDERCOLOR            Colour  Colour of border
TABLE_CAPTION                Text    Table caption. Added centred at the top
TABLE_CELL_ALIGN             Align   Specifies the default alignment of
                                     cells. Left, right or center
TABLE_CELLSPACING            Number  Spacing between cells.
TABLE_CELLPADDING            Number  Padding inside each cell
TABLE_COLO(U)R_ROWS          (none)  If present this specifies that the
                                     odd and even rows of the table should
                                     be coloured differently. See also the
                                     "Colour data rows" policy.
TABLE_CONVERT_XREFS          (none)  If present, indicates that any section
                                     cross-references in the table may
                                     be converted to hyperlinks
                                     (see also the policy line
                                     "Convert TABLE X-refs to links")
TABLE_EVEN_ROW_COLO(U)R      Colour  When data rows are to be coloured
                                     this specifies the colour of the
                                     even numbered rows.
TABLE_HEADER_ROWS            Number  Number of header rows. These
                                     will be placed in <TH> .. </TH> markup
TABLE_HEADER_COLS            Number  Number of header columns.
                                     These will be marked up in bold
TABLE_IGNORE_HEADER          (none)  If present, indicates that the first
                                     few lines (i.e. the header) should be ignored
                                     when calculating the column structure of the table.
                                     See also policy "Ignore table header during analysis"
TABLE_LAYOUT                 Layout  Explicit structure of table in terms of
                                     number of columns and their widths.
                                     See also policy "Default TABLE layout"
TABLE_MAY_BE_SPARSE          (none)  If present, indicates that the TABLE
                                     may be sparse (see also the policy
                                     "Expect sparse tables")
TABLE_MIN_COLUMN_SEPARATION  Number  Number of spaces to be taken as a
                                     column separator when analysing the
                                     table (see also the policy
                                     "Minimum TABLE column separation").
TABLE_ODD_ROW_COLO(U)R       Colour  When data rows are to be coloured
                                     this specifies the colour of the
                                     odd numbered rows.
TABLE_WIDTH                  Text    The width of the table (see also the
                                     policy "Default TABLE width")

Colours should be HTML colours, which will be placed in the various attributes of the <BODY> tag and other tags. The program simply transcribes your value into the output file.


2.6 The CHANGE_POLICY command

NOTE
This feature has the potential to cause mayhem, and as such is offered to users on an "as is" basis. That is, we offer no support for getting this feature to have the effect a user may desire. That said, it's one of the most useful tags we know :-)

This directive allows you to change a particular policy in part of a document. This is a potentially powerful feature, allowing you to tailor the conversion of your file in different sections of that file, or to embed the policy particular to a file in commands inserted at the top of the file itself.

The syntax of the command line is

$_$_CHANGE_POLICY <Policy Line>

where <Policy Line> is a policy line as it would appear in a policy file, and (usually) as it appears in the Policy manual.

For example the following would all be valid directives

        $_$_CHANGE_POLICY Background Colour : red
        $_$_CHANGE_POLICY Ignore multiple blank lines : Yes

How and when they would take effect will, however, depend on the policy.

For example, the background colour would only take effect if splitting the file up, and only on the next file generation. This works, BTW, so if anyone wants to split a file into many pages, all different colours, then be my guest.

There are many caveats to this behaviour :-

Not all policies may be changed in this way. In particular policies that open other policy files are not supported. Even if a policy is "changed", it does not follow that changing the policy will have an effect.

It is unlikely that this feature can be sensibly used to influence the analysis of a file, other than when placed at the top of the file. Used in such a manner it is simply an alternative to using a separate policy file.

Output policies are referenced at different times. Only those that are referenced after the line is read from the source file may be influenced, thus things like output file name may have no effect.

Not all policies, once changed, can be changed back. This is particularly true of policies that contain values to be added to a list. This is an issue that may be addressed in later versions.

Messing with policies can cause unpredictable behaviour. For example if you alter the section splitting parameters, then the chances of a section cross-reference elsewhere in the document being calculated as a correct hyperlink diminish.

That's why this feature is offered UNSUPPORTED.

To further complicate matters, the software uses a read-ahead, write-behind buffer, which means that you may need to experiment with the placing of your policy change to within 40 lines (the size of the buffer).

This problem has been alleviated since version 3.2.


2.7 Definition blocks and variables

Using pre-processor tags you can define "blocks" of text known as "definition blocks".

Definition blocks allow blocks of output to be defined out of sequence. That is, the content is defined in one location, and may then be instantiated at a number of different locations.

A definition block has the form

        $_$_DEFINE_BLOCK <block name>
        ..
        text that forms the block
        ..
        $_$_END_BLOCK

The text inside the block may contain in-line tags, but it cannot contain any other tag directives.

To invoke a block use the EMBED_BLOCK or INSERT_BLOCK commands.

One tag that is particularly useful inside blocks is the VARIABLE tag. You can define variables throughout the document and then quote them inside a definition block.

A possible example of use would be the addition of "page" footers. You could define the text that goes inside a page footer, and include in it a variable called PAGE_NUMBER. You can then re-define the PAGE_NUMBER and output a new page boundary with the commands

        $_$_DEFINE_VARIABLE PAGE_NUMBER 21
        $_$_INSERT_BLOCK PAGE_FOOTER

having previously defined a PAGE_FOOTER block.

It should perhaps be pointed out that "pages" are anathema to HTML, but should you want this feature this is a possible implementation.
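
The footer idea can be modelled in a few lines of Python. This is only a sketch of the behaviour described above, not the program's implementation. The render_block helper is hypothetical, and the [[VARIABLE PAGE_NUMBER]] spelling of the in-line tag is an assumption made for illustration, since this section does not show the tag's syntax.

```python
import re


def render_block(block_text, variables):
    """Substitute the current variable values into a stored block of text."""
    def lookup(match):
        # Unknown variables are left untouched rather than guessed at.
        return str(variables.get(match.group(1), match.group(0)))
    # ASSUMPTION: placeholder syntax chosen for this sketch only.
    return re.sub(r"\[\[VARIABLE (\w+)\]\]", lookup, block_text)


# The block is defined once, out of sequence (like DEFINE_BLOCK ... END_BLOCK).
blocks = {"PAGE_FOOTER": "--- page [[VARIABLE PAGE_NUMBER]] ---"}
variables = {}

pages = []
for number in (21, 22):
    variables["PAGE_NUMBER"] = number                 # like DEFINE_VARIABLE
    pages.append(render_block(blocks["PAGE_FOOTER"], variables))  # like INSERT_BLOCK
print(pages)
```

The point of the sketch is that the block text is stored once and the variable is resolved each time the block is inserted, so re-defining the variable changes the next footer without touching the block.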


2.8 HTML colours

Some tags accept colour values. These values should be HTML colours which - for example - may be placed in the various attributes of the <BODY> tag.

You can enter any value acceptable to HTML. Normally a value is expressed as a 6-digit hexadecimal value in the range 000000 (black) to FFFFFF (white), but certain colours such as "white", "blue", "red" etc may also be recognised by HTML. The software (AscToHTM) simply transcribes your value into the output file. The list of colours recognised in the HTML standard is

Colour    HTML Hex value
Black     #000000
Silver    #C0C0C0
Gray      #808080
White     #FFFFFF
Maroon    #800000
Red       #FF0000
Purple    #800080
Fuchsia   #FF00FF
Green     #008000
Lime      #00FF00
Olive     #808000
Yellow    #FFFF00
Navy      #000080
Blue      #0000FF
Teal      #008080
Aqua      #00FFFF

Only these values will be converted by the software to the equivalent names. Other names exist outside the standard which may not be universally supported.
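
The table above can be expressed as a simple lookup. The Python sketch below is an assumption about how such a conversion might work, not a description of the software's code: the names HTML_COLOUR_NAMES and colour_name are hypothetical, and the choice to accept values with or without a leading "#" is made for this example only.

```python
# The 16 colour names from the table above, keyed by hex value.
HTML_COLOUR_NAMES = {
    "#000000": "Black",  "#C0C0C0": "Silver", "#808080": "Gray",
    "#FFFFFF": "White",  "#800000": "Maroon", "#FF0000": "Red",
    "#800080": "Purple", "#FF00FF": "Fuchsia", "#008000": "Green",
    "#00FF00": "Lime",   "#808000": "Olive",  "#FFFF00": "Yellow",
    "#000080": "Navy",   "#0000FF": "Blue",   "#008080": "Teal",
    "#00FFFF": "Aqua",
}


def colour_name(value):
    """Return the standard name for a hex colour, or the value unchanged."""
    key = value.strip().upper()
    if not key.startswith("#"):
        key = "#" + key
    # Anything outside the 16 standard colours is passed through as given.
    return HTML_COLOUR_NAMES.get(key, value)


print(colour_name("ff0000"))
print(colour_name("#123456"))
```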


2.9 English/American spellings

As far as possible tags support both British English and American English spellings. This mainly occurs with the word "colour" (or "color"), so for example the directives

$_$_TABLE_ODD_ROW_COLOUR ....

and

$_$_TABLE_ODD_ROW_COLOR ....

are equivalent.
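
One way to picture this equivalence is as a normalisation step applied before a directive name is looked up. The Python sketch below is illustrative only: the function name normalise_directive is hypothetical, and nothing here is taken from the software itself.

```python
def normalise_directive(name):
    """Map the British spelling of a directive name onto the American one."""
    return name.replace("COLOUR", "COLOR")


british = normalise_directive("$_$_TABLE_ODD_ROW_COLOUR")
american = normalise_directive("$_$_TABLE_ODD_ROW_COLOR")
print(british == american)
```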




Converted from a single text file by AscToHTM
© 1997-2004 John A. Fotheringham