AscToHTM

Documentation for the AscToHTM conversion utility




6 Using Document Policy files

This chapter has been largely superseded by the Policy manual.

Document policy files are ordinary text files that list the "policies" that AscToHTM should implement when converting your document. The file can have added comment lines (starting with a "!" or "#" character) and headings for clarity.

A summary of the recognised policy lines is given in the Policy manual.

In most cases recognised policy lines are identical to those listed in the generated policy file (see 4.1). This is usually a good place to start when making your own policy.

Only those lines that are recognised policies are acted upon.
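
As a rough illustration of the file layout described above, the following Python sketch reads the text of a policy file into key/value pairs. It assumes that each policy line has the form "key phrase : value", with the key phrase being everything before the first ":" character. Lines starting with "!" or "#" are treated as comments, and lines with no ":" (such as headings) are skipped. The function name, the "[Analysis]" heading and the sample values are hypothetical and are not part of AscToHTM. The sketch also makes no attempt to decide which key phrases the program would actually recognise.

```python
def parse_policy_text(text):
    """Return {key phrase: value} for the policy lines found in text.

    Assumption: a policy line looks like "Key phrase : value".
    """
    policies = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines ("!" or "#").
        if not line or line[0] in "!#":
            continue
        # Lines without a ":" (headings, underlining) are not policies.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        policies[key.strip()] = value.strip()
    return policies


# Made-up file content, for illustration only.
sample = """\
! Illustrative policy file (values are made up)
[Analysis]
----------
Page width          : 80
# another comment
Look for bullets    : Yes
"""

print(parse_policy_text(sample))
# prints {'Page width': '80', 'Look for bullets': 'Yes'}
```

Splitting on the first ":" only means that values which themselves contain a colon, such as URLs, are kept whole.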

To use a policy file, simply list it on the command line after the name of the file being converted (see 4.2.2.3).

Document policies have two main uses :

  1. To correct any failure of analysis that AscToHTM makes. Hopefully this won't be needed too much as the core analysis engine improves.

Examples include page width, whether or not underlined section headings are expected etc.

  2. To tell the program how to produce a better HTML end product in ways that couldn't possibly be inferred from the original text.

Examples include adding colour and titles to the page, as well as requesting that a large document be split into several pages and that a contents list be created.

The document sections in this chapter that described the policies in detail have been moved to a standalone document called the "Policy manual". That document describes the scope, effect, location and default values for all policies recognised by the program.


6.1 An example conversion

This documentation has itself been converted using AscToHTM. The files used were

This policy file "includes" the link dictionary a2hlinks.dat.

These files are included in the distribution kit as an example set of documentation.

You can, of course, use AscToHTM to convert this doco into whatever format, colour etc that you wish.


6.2 Analysis policies

These policies are used to control and correct the analysis of files during conversion. Full descriptions of these policies can be found in the Policy manual.

6.2.1 Overview ("look for") policies

The following analysis policies give you an overview of what the program is looking for, and allow you to enable or disable each feature being looked for.

"Look for indentation"
"Look for hanging paragraphs"
"Look for white space"
"Look for short lines"
"Look for horizontal rulers"
"Minimum ruler length"
"Look for bullets"
"Search for definitions"
"Look for quoted text"
"Look for MAIL and USENET headers"
"Look for preformatted text"
"Attempt TABLE generation"
"Look for diagrams"


6.2.2 General Layout policies

The following analysis policies help control general layout parameters:-

"Page width"
"TAB size"
"Short line length"
"Min chapter size"

"Expect blank lines between paras"
"Hanging paragraph position(s)"

"Search for Definitions"
"New Paragraph Offset"
"Definition Char"

"Indent position(s)"


6.2.3 Bullet policies

AscToHTM has the following bullet point policies that will normally be correctly calculated on the analysis pass :-

"Look for bullets"

"Expect alphabetic bullets"
"Expect numbered bullets"
"Expect roman numeral bullets"

"Recognise '-' as a bullet"

"Recognise 'o' as a bullet"

"Bullet char"


AscToHTM tries hard not to get confused by the "1", "a" and "I" that happen to end up at the start of lines by chance. These could get mistaken for bullet points.


6.2.4 Contents analysis policies

There is only one contents analysis policy:-

"Expect contents list"

This is described together with all the output contents list policies in "Contents generation policies" (see 6.3.3).

For more information on content list generation see 5.6.2.


6.2.5 File Structure policies

AscToHTM has the following file structure policies that will normally need to be set manually :-

"Keep it simple"

"Expect code samples"
"Input file contains DOS characters"
"Input file contains MIME encoding"
"Input file contains PCL codes"
"Input file contains Japanese characters"
"Input file has change bars"
"Input file has page markers"
"Page marker size (in lines)"

"Text Justification"
"Input file is double spaced"


6.2.6 Heading policies

AscToHTM has the following section heading policies that will normally be correctly calculated on the analysis pass :-

"Expect Numbered Headings"
"Expect Underlined Headings"
"Expect Capitalised Headings"
"Expect Embedded Headings"
"Heading key phrases"

"Check indentation for consistency"

"Expect Second Word Headings"
"First Section Number"
"Smallest possible section number"
"Largest possible section number"

"Preserve underlining of headings"

Section headers are far and away the most complex things the analysis pass has to detect, and the most likely area for errors to occur.

AscToHTM will also document to a policy file the headings it finds. This is still to be finalised, but currently has the format

      We have 4 recognised headings
          Heading level 0 = "" N at indent 0
          Heading level 1 = "" N.N at indent 0
          Contents level 0 = "" N at indent 0
          Contents level 1 = "" N.N at indent 2

AscToHTM will read in such lines from a policy text file, but does not yet fully support editing these via the Windows interface.

The syntax is explained below, but this will probably change in future releases. You can edit these lines in your policy file, and through the policy options in Windows.

The lines are currently structured as follows

      Line component    Value
      --------------    -----
      xxxx              Either "Heading" or "Contents" according
                        to the part of the policy being described.

      Level n           Level number, starting at 0 for chapters,
                        1 for level 1 headings etc.

      "Some_word"       Any text that may be expected to occur before
                        the heading number. E.g. "Chapter" or "Section"
                        or "[". The case is unimportant.

      N.Nx              The style of the heading number. This will
                        ultimately (in later versions) be read
                        as a series of number/separator pairs.

                        The proposed format is
                        "N" = number
                        "i" / "I" = lower/upper case roman numeral
                        with an 'x' at the end signalling that trailing
                        letters may be expected (e.g. 5.6a, 5.6b)

      at indent n       The indentation that this heading is expected
                        at. This is important in helping to eliminate
                        false candidates.
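
To make the line structure above more concrete, here is a small Python sketch that splits one such heading line into its components. It is only an illustration: the regular expression is inferred from the sample lines shown earlier in this section, the function and field names are hypothetical, and since the format is described as not yet finalised, it may not match what a given release of AscToHTM actually writes.

```python
import re

# Pattern inferred from lines such as:
#   Heading level 1 = "" N.N at indent 0
HEADING_LINE = re.compile(
    r'^\s*(Heading|Contents)\s+level\s+(\d+)\s*=\s*"([^"]*)"'
    r'\s+(\S+)\s+at\s+indent\s+(\d+)\s*$',
    re.IGNORECASE,
)


def parse_heading_line(line):
    """Return the components of a heading line, or None if it does not match."""
    m = HEADING_LINE.match(line)
    if m is None:
        return None
    kind, level, prefix, style, indent = m.groups()
    return {
        "kind": kind.capitalize(),        # "Heading" or "Contents"
        "level": int(level),              # 0 for chapters, 1 for level 1 etc.
        "prefix": prefix,                 # text expected before the number
        "style": style,                   # e.g. "N", "N.N" or "N.Nx"
        "trailing_letters": style.endswith("x"),
        "indent": int(indent),
    }


for text in (
    'Heading level 0 = "" N at indent 0',
    'Contents level 1 = "" N.N at indent 2',
    "We have 4 recognised headings",      # not a heading line, gives None
):
    print(parse_heading_line(text))
```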


6.2.7 Pre-formatted text policies

AscToHTM has the following pre-formatted text policy that will normally be correctly calculated on the analysis pass :-

"Minimum automatic <PRE> size"


6.2.8 Table analysis policies

New in version 4

AscToHTM uses the following policies to control the detection and analysis of tables :-

"Attempt TABLE generation"

"Table extending factor"

"Expect sparse tables"
"Ignore table header during analysis"
"Column merging factor"
"Minimum TABLE column separation"

"Default TABLE layout"
"Tables could be blank line separated"


6.3 Output policies

These policies are used to control the output and generation of files during conversion. Full descriptions of these policies can be found in the Policy manual.

6.3.1 Added HTML policies

AscToHTM has the following HTML policies that will only ever take effect if supplied in a user policy file :-

"Use first heading as title"
"Use first line as title"
"Document title"

"Document description"
"Document keywords"
"Background Image"

"HTML header file"
"HTML footer file"
"HTML Script file"

"Omit <HEAD> and <BODY> from output"
"Document Base URL"
"Comment generation code"
"HTML fragments file"

These "policies" allow you to start "adding value" to the HTML generated. That is, they allow you to specify things that cannot be inferred from the original text.

You can also add HTML to your files by using the HTML preprocessor command (see 7.1.1).


6.3.2 Cascading Style sheet policies (CSS)

New in version 4

AscToHTM has the following HTML policies that influence the use of CSS in the HTML generated :-

"Document Style Sheet"

Not visible in the user interface is :-

"Create embedded style sheet"


6.3.3 Contents generation policies

AscToHTM has the following HTML policies that influence the detection and generation of contents lists :-

"Expect Contents List"

"Add contents list"
"Maximum level to show in contents"

"Use any existing contents list"

"Generate external contents file"
"External contents list filename"

"Hyperlinks on numbers"

See also the discussion in 5.6.2


6.3.4 Document Colour policies

New in version 4

AscToHTM has a large number of HTML policies that can control the colouring of the files. These policies are spread across a number of areas of functionality.

General

"Suppress all colour markup"

"Active Link Colour"
"Background Colour"
"Text Colour"
"Unvisited Link Colour"
"Visited Link Colour"

Frames

"Header frame background colour"
"Header frame text colour"
"Contents frame background colour"
"Contents frame text colour"
"Footer frame background colour"
"Footer frame text colour"

Tables

"Colour data rows"
"Default TABLE border colour"
"Default TABLE colour"
"Default TABLE even row colour"
"Default TABLE odd row colour"


6.3.5 Directory Page policies

AscToHTM has the following policies that can be used to influence whether or not AscToHTM will attempt to generate a Directory page for the files being converted. This is really only appropriate when converting more than one file at once (see 4.3.3).

The Directory Page will consist of entries for each file being converted (in order of conversion), and can have hyperlinks to the files, and to recognised headings in the files. This makes it suitable for use as a master index to a set of files converted in a single directory.

"Make Directory"
"Indent headings in Directory"
"Show file titles in Directory"
"Directory filename"

"Directory title"
"Directory description"
"Directory keywords"
"Directory return hyperlink text"

"Directory header file"
"Directory footer file"
"Directory script file"


6.3.6 File generation policies

AscToHTM has the following HTML policies that affect the file generation process :-

"Input directory"
"Output directory"
"Use .HTM extension"
"Output file extension"

"Preserve file structure using <PRE>"
"Preserve line structure"
"Treat each line as a paragraph"

"Generate diagnostics files"
"Output policy file"
"Output policy filename"

"DOS filename root"
"Use DOS filenames"

"Split level"
"Min HTML File size"
"Add navigation bar"
"Minimise HTML file size"

"Break up long HTML lines"

These policies specify how your document is divided into one or more HTML files, and how those files are to be named and linked together with hyperlinks.


6.3.7 Font policies

AscToHTM supports the implementation of fonts via either Cascading style sheets (CSS) or via the <FONT> tag.

Related policies are :-

"Use CSS to implement fonts"
"Default font"


6.3.8 Frames policies

New in version 4

From version 4 onwards AscToHTM will support the output of HTML as a set of HTML FRAMES. A large number of policies support this process.

General

"Place document in frames"

"Output frame name"
"Add Frame border"

"Open frame links in new window"
"New frame link window name"

"Add NOFRAMES links"
"NOFRAMES link URL"

Header and Footer frame policies

"Use main header in header frame"
"Header Frame depth"

"Use main footer in footer frame"
"Footer Frame depth"

Contents frame

"Add contents frame if possible"
"Contents Frame width"
"Number of levels in contents frame"

Main Frame

"Split level"
"Min HTML File size"
"First frame page number"

Frame colours

"Header frame background colour"
"Header frame text colour"
"Contents frame background colour"
"Contents frame text colour"
"Footer frame background colour"
"Footer frame text colour"


6.3.9 Hyperlink policies

AscToHTM has the following hyperlink policies set as defaults :-

"Create hyperlinks"
"Create mailto links"
"Allow email beginning with numbers"
"Check domain name syntax"

"Create gopher links"
"Create FTP links"
"Only allow explicit FTP links"

"Create NEWS links"
"Only use known groups"
"Recognised USENET groups"

"Add <BR> to lines with URLs"

"Cross-refs at level"

"Open link in new browser window"
"new browser window name"

Hyperlinks can also be added by using a link dictionary (see 4.3.2.2 and 4.4.2).


6.3.10 Link Dictionary policies

Link definitions appear in a policy file as follows :-

        [Link Dictionary]
        -----------------
        Link definition       :  "a2hdoco.txt" = "Source text" + "/~jaf/A2HDOCO

That is, the text to be matched, the text to be used in its place as the highlighted text, and the URL this link is to point to (in this case a relative URL).

See the discussions in 4.3.2.2 and 4.4.2.
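
The three parts of a link definition can be picked apart with a short Python sketch such as the one below. It assumes the layout "Link definition : "match text" = "link text" + "URL"" inferred from the example above. The function name, the dictionary keys and the sample line are hypothetical, and the sample URL is made up for illustration.

```python
import re

# Layout inferred from the example: "match" = "link text" + "URL"
LINK_DEF = re.compile(
    r'^\s*Link definition\s*:\s*"([^"]*)"\s*=\s*"([^"]*)"\s*\+\s*"([^"]*)"\s*$'
)


def parse_link_definition(line):
    """Return the three parts of a link definition, or None if malformed."""
    m = LINK_DEF.match(line)
    if m is None:
        return None
    match_text, link_text, url = m.groups()
    return {"match": match_text, "text": link_text, "url": url}


# Hypothetical definition with a made-up relative URL.
example = 'Link definition : "AscToHTM" = "the converter" + "docs/index.html"'
print(parse_link_definition(example))
```

A line whose final quoted part is incomplete does not match the pattern, so the function returns None for it rather than guessing at the missing text.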


6.3.11 Preprocessor policies

AscToHTM has the following policies that can be used to influence the preprocessor (see Using the preprocessor), and hence the HTML output :-

"Use Preprocessor"
"Include document section(s)"

"Allow definitions inside PRE"


6.3.12 HTML styling policies

AscToHTM has the following "styling" policies that can be used to influence the HTML output :-

"Allow automatic centring"
"Automatic centring tolerance"
"Ignore multiple blank lines"

"Highlight definition text"
"Use <DL> markup for defn. paras"

"Largest allowed <Hn> tag"
"Smallest allowed <Hn> tag"
"Headings colour"
"Preserve underlining of headings"

"Search for emphasis"

"Use <EM> and <STRONG> markup"

"Preserve New Paragraph Offset"

Also, not available in the user interface is :-

"First line indentation (in blocks)"

6.3.13 Table Generation policies

AscToHTM has the following policies that can be used to influence whether or not AscToHTM will attempt to detect and generate HTML tables, and the attributes of any tables generated.

Tables may be tailored individually by adding pre-processor commands to your source text (see 7.1.4)

"Attempt TABLE generation"

"Default TABLE cell spacing"
"Default TABLE cell padding"
"Default TABLE border size"
"Default TABLE width"

"Default TABLE colour"
"Default TABLE border colour"

"Colour data rows"
"Default TABLE even row colour"
"Default TABLE odd row colour"

"Default TABLE alignment"
"Default TABLE cell alignment"

"Convert TABLE X-refs to links"

The following policies can only be changed through a policy file, but are probably best avoided in favour of their equivalent preprocessor tags.

"Default TABLE caption"

"Default TABLE header rows"
"Default TABLE header cols"

"Column boundaries have zero width"

"Use <CODE>..</CODE> markup"


6.3.14 Miscellaneous policies

AscToHTM supports the following policies which currently can only be added by editing the .policy file

Contents List
"Add mail headers to contents list"

CSS
"Create embedded style sheet"

File generation
"Break up long HTML lines"
"HTML version to be targeted"
"Lines to ignore at end of file"
"Lines to ignore at start of file"

Fonts
"Suppress all font markup"

Headings
"Expect Second Word Headings"
"First Section Number"
"Number of words to include in filename"

HTML Generation
"HTML version to be targeted"

Style
"First line indentation (in blocks)"

Tables
"Default TABLE caption"

"Default TABLE header rows"
"Default TABLE header cols"

"Column boundaries have zero width"

"Use <CODE>..</CODE> markup"


6.4 Settings policies

New in version 4

These policies are used to control the behaviour of the program during the conversion process. Most program settings are not available as policies, but those that are available are listed here. Full descriptions of these policies can be found in the Policy manual.


6.4.1 Error reporting

The following policies can be used to tailor the number and type of messages displayed during conversion.

"Error reporting level"

"Suppress INFO messages"
"Suppress TAG ERROR messages"
"Suppress URL messages"
"Suppress WARNING messages"
"Suppress program ERROR messages"


6.5 Saving and loading policy files

This section has been copied into the Policy manual section on placing policies in a file

6.5.1 Overview

AscToHTM allows you to save policies to file so that you can later reload them. This allows you to easily define different ways of doing conversions, either for different types of files, or to produce different types of output.

The policy files have a .pol extension by default, and are simple text files, with one policy on each line. You can, if you wish, edit these policies in a text editor... this is sometimes easier than using all the dialogs in the Windows version.

When editing policies, it is important not to change the key phrase (the bit before the ":" character), as this needs to be matched exactly by AscToHTM.

For best results, it is advisable to put in your policy file only those policies you want to fix. This leaves AscToHTM to calculate document-by-document policies that suit the files being converted.

Note:
Avoid using a "full" policy file for your conversions. Such files prevent the program from adjusting to each source file, often leading to unwanted results.

6.5.2 Generating policy files for your document

The normal way to create a policy file is by setting options and then saving them using the "save policy file" dialog. This will offer you the choice of creating a partial policy file or a full policy file (see 6.5.2.1 and 6.5.2.2).

Alternatively, you can set the "Output policy file" policy which will generate a full policy file resulting from the analysis of the converted document.

Once a file is generated you can either edit it in a text editor - deleting policies that are of little interest to you, and editing those that are - or reload it into the program, change it and save it again.
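
If you prefer to script the "deleting policies that are of little interest" step, a sketch along the following lines may help. It keeps only the lines whose key phrase (the text before the first ":") is in a set you supply, and it copies those lines unchanged so the key phrases still match exactly. The function name and the sample file content are hypothetical, and the "key phrase : value" layout is an assumption about the file format.

```python
def keep_policies(text, wanted):
    """Return policy file text containing only the wanted key phrases.

    Assumption: each policy line has the form "Key phrase : value".
    Kept lines are copied unchanged.
    """
    kept = []
    for line in text.splitlines():
        key, sep, _value = line.partition(":")
        # Keep the line only if it is a policy line with a wanted key.
        if sep and key.strip() in wanted:
            kept.append(line)
    return "\n".join(kept) + ("\n" if kept else "")


# Made-up "full" policy file content, for illustration only.
full_text = """\
Page width : 80
TAB size : 8
Document title : My manual
Background Colour : White
"""

partial = keep_policies(full_text, {"Document title", "Background Colour"})
print(partial, end="")
```

Everything not named in the set is dropped, which leaves those policies for the program to calculate for each document.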


6.5.2.1 Partial policy files

Partial policy files are files which have values for some, not all, policies.

These are recommended, because they leave AscToHTM free to adjust all the other policies not set in the file, allowing it to adapt to the details of the document being converted.

For example, you should only set the indentation policy if you know what indents you are using, or if you want to override those calculated by AscToHTM. Normally it is best to omit this policy, and allow AscToHTM to work it out itself.

When you save a policy file from inside AscToHTM, a partial policy file will contain

6.5.2.2 Full policy files

A "full" policy file contains a value for almost every possible policy. Such files are usually only useful for documentation and analysis reasons, and should almost never be expected to be reloaded as input into a conversion, as this would totally fix the conversion details.


6.5.3 Naming policy files

Whenever the "Output policy file" policy is set the generated "full" policy file is usually called

<filename>.pol

where <filename> is the name of the file being created. When this happens any existing file of that name will be overwritten.

For this reason we strongly advise you adopt a naming convention of the form

in_<filename>.pol or i<filename>.pol

or place your input policies in a different directory and ensure they are backed up.
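
The naming convention above is easy to apply mechanically. The following Python sketch derives an "in_" prefixed policy filename from a source filename, and checks whether a given policy file would collide with the generated one. It assumes that the generated file takes the source file's name with a .pol extension and is written alongside it. That assumption and both function names are hypothetical.

```python
from pathlib import Path


def input_policy_name(source_file, prefix="in_"):
    """Return a policy filename of the form in_<filename>.pol."""
    src = Path(source_file)
    return src.with_name(f"{prefix}{src.stem}.pol")


def would_be_overwritten(policy_file, source_file):
    """True if policy_file has the name assumed for the generated policy file."""
    generated = Path(source_file).with_suffix(".pol")
    return Path(policy_file) == generated


print(input_policy_name("docs/manual.txt"))
print(would_be_overwritten("docs/manual.pol", "docs/manual.txt"))
print(would_be_overwritten("docs/in_manual.pol", "docs/manual.txt"))
```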





Converted from a single text file by AscToHTM
© 1997-2001 John A Fotheringham