Documentation for the AscToHTM Text to HTML converter



Change History

Contents of this section

Version 5.0 (November 2004)
New functions in version 5.0
Ability to fully control table generation
Ability to "tag" your own tables for greater accuracy
Input text manipulation and labelling using "Text commands"
Support for non-ASCII character types and character encodings
Support for Definition Blocks
Support for comma-delimited and tab-delimited tables
New policies in version 5.0
New added HTML policies
New general analysis policies
New configuration file policies
New contents list policies
New file, page, paragraph and line structure policies
New file splitting policies
New font policies
New heading policies
New hyperlink policies
New styling policies
New table analysis policies
New table generation policies
New 'what to look for' policies
Other new policies
New programs in version 5.0
API version now available
New utility A2HDETAG
Other changes in version 5.0
New Pre-processor tags
Changes to the Windows version
Changes to the command line version
Changes to document analysis
Changes to documentation
Other new options
Version 4.1 (August 2001)
New functions in version 4.1
Other Changes in version 4.1
Version 4.0 (May 2001)
New functions in version 4.0
Other changes in version 4.0
Version 3.3 (June 2000)
New functions in version 3.3
Other changes in version 3.3
Version 3.2 (October 1999)
New functions in version 3.2
Other changes in version 3.2
Version 3.0 (August 1998)
New functions in version 3.0
Other changes in version 3.0
Version 2.3 (April 1998)
New functions in version 2.3
Other changes in version 2.3
Version 2.2 (Feb 1998)
New functions in version 2.2
Other changes in version 2.2
Version 2.1 (never officially released)
New functions in version 2.1
Other changes in version 2.1
Version 2.0 (October 1997)
New functions in version 2.0
Other changes in version 2.0
Version 1.1 (August 1997)
New functions in version 1.1
Other changes in version 1.1
Version 1.05 (July 1997)
New functions in version 1.05
Other changes in version 1.05
Version 1.04 (July 1997)
New functions in version 1.04
Other changes in version 1.04
Version 1.01 (April 1997)

Version 5.0 (November 2004)

Version 5.0 is the first major update to AscToHTM in 3 years. As such it contains a large number of enhancements and changes from the previous version 4.1.

New functions in version 5.0

Several major new features are added in version 5.0.

Ability to fully control table generation

To aid in processing tables, the program now allows you to identify various table structures by specifying various match conditions. Each time the software encounters a candidate table, it tests this against the match conditions to see if the "table" is of a known type.

For each table you can specify its structure, and various formatting rules to be used in its conversion. These structure and formatting definitions can be shared between multiple table types for your convenience.

All the table types, structures and formatting rules should be placed in an external text file, known as a Table Definition File (or TDF for short). A new policy allows you to identify which Table Definition File is to be used, and you can select this from the new Config File Location menu.

When a table matches a known table definition it is possible to:-

For full details see Using Table Definition Files (TDF).

Ability to "tag" your own tables for greater accuracy

The program now supports Tagged Table commands. These commands allow you to completely markup a table, specifying the column details, the row details and the contents of each table cell.

This approach can be used by those who want complete control over how their tables are constructed, or who are generating text files from a source which knows the table layout and can explicitly state it.

By using the tagged approach, you avoid the prospect of the program making mistakes when analysing the layout of the table.

As an example of using tagged table commands, the following sequence in the source file

        $_$_BEGIN_USER_TABLE C,1 in
        $_$_COLUMN_DETAILS 1,,,L, 2 in
        $_$_COLUMN_DETAILS 2,,,C, 1 ins
        $_$_TABLE_BORDER 1

        $_$_NEW_ROW HEAD
        $_$_NEW_CELL
        Substance (units)
        $_$_NEW_CELL
        Year
        Sampled

        $_$_NEW_ROW DATA
        $_$_NEW_CELL
        Alpha emitters (pCi/L)
        $_$_NEW_CELL
        1999

        $_$_NEW_ROW DATA
        $_$_NEW_CELL
        Asbestos (MFL)
        $_$_NEW_CELL
        1993
        $_$_END_TABLE

becomes

Substance (units)         Year Sampled
Alpha emitters (pCi/L)    1999
Asbestos (MFL)            1993
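For a source that already knows its table layout, the tagged commands shown above could be generated by a script. The following is a minimal Python sketch, not part of AscToHTM. The function name `tagged_table` and its parameters are hypothetical, it uses only commands that appear in the example above, and it assumes the COLUMN_DETAILS lines can be left out.

```python
def tagged_table(header, rows, table_options="C,1 in", border=1):
    """Emit tagged-table commands for one header row and any number of data rows.

    Only commands seen in the example above are used. `table_options` is
    passed through verbatim because its syntax is not described here, and
    omitting $_$_COLUMN_DETAILS is an assumption, not documented behaviour.
    """
    lines = [
        f"$_$_BEGIN_USER_TABLE {table_options}",
        f"$_$_TABLE_BORDER {border}",
    ]
    tagged_rows = [("HEAD", header)] + [("DATA", row) for row in rows]
    for kind, cells in tagged_rows:
        lines.append("")                      # blank line between rows, as in the example
        lines.append(f"$_$_NEW_ROW {kind}")
        for cell in cells:
            lines.append("$_$_NEW_CELL")
            lines.append(str(cell))
    lines.append("$_$_END_TABLE")
    return "\n".join(lines)


if __name__ == "__main__":
    print(tagged_table(
        ["Substance (units)", "Year Sampled"],
        [["Alpha emitters (pCi/L)", 1999], ["Asbestos (MFL)", 1993]],
    ))
```

The printed text can then be pasted into, or written out as, the source file that is passed to the converter.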

Input text manipulation and labelling using "Text commands"

The program now allows you to apply "text commands" to the input text, before it is converted. There are several commands possible, which allow you to identify lines in the input text that should be ignored, and text in the input file that should be removed or replaced.

You can also use commands to tell the software how to interpret certain types of line, for example to say which lines are headings and which should be regarded as bullet points. The Text Commands to be used should be placed in an external Text Command File. A new policy allows you to identify which Text Command File is to be used, and you can select this from the new Config File Location menu.

For full details see Using Text Command Files

Support for non-ASCII character types and character encodings

Non-latin and Unicode character sets

Some support has been added for non-latin character sets. The character set names are based on those used in HTML charsets.

Support has been added for auto-detecting the character set used, but this is far from foolproof. If you are using non-latin character sets you may need to set the character set manually.

It is not possible at present to support multiple character sets in one document (unless you are using Unicode).

To support this feature the following policies have been added

See Working with Unicode.


Other special characters

There is a limited auto-detect of DOS characters when diagrams are present.

Support for Definition Blocks

Definition blocks allow you to define blocks of text that you may then insert at any point in the text (e.g. to give an "end of page" effect). You can also "define variables" whose value is then inserted wherever a VARIABLE tag is used.

This feature, though supported by the core analysis engine, is expected to be used more by the users of the AscToHTM converter.

The pre-processor commands involved are:

Support for comma-delimited and tab-delimited tables

Pre-processor commands have been added to allow you to mark up a section of comma-delimited or tab-delimited data you want turning into a table.

The new pre-processor directives are the COMMA_DELIMITED_TABLE command and the DELIMITED_TABLE command.

In addition to this, the software now has the ability to automatically detect tab-delimited data tables.
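To illustrate the kind of transformation involved, here is a minimal Python sketch that turns comma-delimited or tab-delimited text into an HTML table. It is not AscToHTM's implementation; the function name `delimited_to_html` is hypothetical, and the output deliberately omits borders, headers and alignment.

```python
import csv
import html
import io


def delimited_to_html(text, delimiter=","):
    """Convert delimited text into a bare HTML table, one <TR> per input line."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    out = ["<TABLE>"]
    for row in reader:
        if not row:                # skip blank lines
            continue
        # Escape each cell so that characters like < and & survive in HTML.
        cells = "".join(f"<TD>{html.escape(cell.strip())}</TD>" for cell in row)
        out.append(f"<TR>{cells}</TR>")
    out.append("</TABLE>")
    return "\n".join(out)


if __name__ == "__main__":
    print(delimited_to_html("Substance,Year\nAsbestos (MFL),1993"))
    print(delimited_to_html("Substance\tYear", delimiter="\t"))
```

Using the `csv` module rather than a plain `split(",")` means quoted fields that contain the delimiter are kept as a single cell.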


New policies in version 5.0

A large number of new policy options have been added in version 5.0, and several that weren't previously accessible via the user interface can now be accessed.

New added HTML policies

The "added HTML" section acquires a number of new policies that allow you to create appropriate META tags in the HTML <HEAD> sections.

New general analysis policies

New configuration file policies

In version 5.0 there is a new menu option under Conversion options pointing to Configuration Files, that is, files loaded in addition to the policy file to control different aspects of the conversion. The choices made on that menu are also saved as options in your policy file.

New contents list policies

New file, page, paragraph and line structure policies

File structure

Page structure

Page markers

Paragraph structure

Line structure

Added options to allow more control over how the original document's file structure should be preserved

File generation

New file splitting policies

Added policies to allow greater control over splitting large files into a set of smaller linked HTML pages

New font policies

Added policies to allow different fonts to be applied to different types of text as follows

Normal text          Default font
Headings             Heading Font
Text in tables       Table font
Table of contents    Table of contents Font
Fixed-pitch text     Fixed font

New heading policies

There are two new heading types that can be supported :-

Also added :-

New hyperlink policies

http://3640005069/
http://7934972365/
http://0330.0366.0021.0315/
http://%6c%6f%63%6b%65%72%67%6e%6f%6d%65%2e%63%6f%6d/

Although the display text is left unchanged, the hyperlink will point to a non-obfuscated URL (either the domain name, or an IP address). This is because obfuscated URLs such as these are often used by spammers, and the author has no intention of allowing his software to aid spammers in their goals.

If someone cares to give me a valid reason for using such URLs I may reconsider this behaviour.
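The obfuscation styles in the examples above (a single decimal number, dotted octal, and percent-encoded characters) can be decoded with a few lines of code. The following Python sketch is an illustration only, not the routine AscToHTM uses; the function name `deobfuscate_host` is hypothetical, and other forms such as hexadecimal are not handled.

```python
from urllib.parse import unquote


def deobfuscate_host(host):
    """Return a readable host name or dotted-decimal IP for an obfuscated host."""
    host = unquote(host)                      # undo %xx escapes first
    parts = host.split(".")
    try:
        # A leading zero marks an octal component; otherwise treat as decimal.
        nums = [int(p, 8) if len(p) > 1 and p.startswith("0") else int(p)
                for p in parts]
    except ValueError:
        return host                           # not numeric, so a domain name

    if len(nums) == 1:
        # A single number is the whole 32-bit address; larger values wrap round.
        value = nums[0] % 2**32
        return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))
    if len(nums) == 4 and all(0 <= n <= 255 for n in nums):
        return ".".join(str(n) for n in nums)
    return host                               # unrecognised form, leave unchanged


if __name__ == "__main__":
    for h in ("3640005069", "7934972365", "0330.0366.0021.0315",
              "%6c%6f%63%6b%65%72%67%6e%6f%6d%65%2e%63%6f%6d"):
        print(h, "->", deobfuscate_host(h))
```

The first three examples all decode to the same dotted-decimal address, 216.246.17.205, and the fourth decodes to the domain name lockergnome.com.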

New styling policies

New table analysis policies

New table generation policies

New 'what to look for' policies

Other new policies

The following policies can't be accessed via the User Interface, but are listed here for completeness.

New programs in version 5.0

As well as the Windows and console versions, AscToHTM is also available (under separate licence) in API form so that developers can harness the power of AscToHTM's conversion abilities for use in their own software.

Also, the new utility A2HDETAG allows documents marked up using the AscToHTM pre-processor to be converted into plain text files (e.g. for sending out as newsletters).

API version now available

As with all JafSoft converters, AscToHTM is available under separate license as an Application Programming Interface (API). This API allows software developers to harness the powerful abilities of AscToHTM from within their own software products.

The API is written in C++, and is supplied as either a library or a DLL under Windows. As such it can easily be invoked from C, C++ and Visual Basic software and has also been successfully invoked from inside Java and C# programs.

New utility A2HDETAG

For users who register, there is a new, separate command line utility called A2HDETAG available so they can "de-tag" their source files of all AscToHTM pre-processor tags, leaving a plain text fit for publishing, e.g. on Usenet.

In conjunction with this, new BEGIN_ASCII ... END_ASCII pre-processor tags have been added. These identify text that will be copied to the output of A2HDETAG. The text is ignored in all other conversions, and is intended to allow alternative text to be placed in the text and HTML versions of a document.


Other changes in version 5.0

Other changes include :-

New Pre-processor tags

Added several new pre-processor in-line tags :-

FILENAME       outputs name of file being converted
FRACTION       outputs a fraction
VERSION        outputs AscToHTM program name and version number
IGNORE         multi-line text to be ignored
IGNORE_THIS    in-line text to be ignored

Changes to the Windows version

Changes to the command line version

Changes to document analysis

Changes to documentation

Other new options

Version 4.1 (August 2001)

Version 4.1 was a major update from the previous release 4.0.


New functions in version 4.1

The auto-detect of character sets can be switched off by using the Look for character encodings policy

Normal text          Default font
Headings             Heading Font
Text in tables       Table font
Table of contents    TOC Font
Fixed-pitch text     Fixed font

The "Default Font" policy existed previously; the other four policies are new in this version.

Other Changes in version 4.1

Windows version

Documentation

All versions

Version 4.0 (May 2001)

Version 4.0 represents a major update over the previous version 3.3.


New functions in version 4.0

API version

Linux version

Windows version

All versions

Place document in frames
Output frame name

Header Frame depth
Footer Frame depth
Contents Frame width

Use main header in header frame
Use main footer in footer frame


Add contents frame if possible
Add Frame border

Open frame links in new window
New frame link window name

Add NOFRAMES links
NOFRAMES link URL

Number of levels in contents frame

First frame page number

Header frame background colour
Header frame text colour
Contents frame background colour
Contents frame text colour
Footer frame background colour
Footer frame text colour

Although the display text is left unchanged, the hyperlink will point to a non-obfuscated URL (either the domain name, or an IP address). This is because obfuscated URLs such as these are often used by spammers, and the author has no intention of allowing his software to aid spammers in their goals.

If someone cares to give me a valid reason for using such URLs I may reconsider this behaviour.

allow lines at the start and end of the source file to be discarded. This can be useful if your source text is coming from a third party source that adds extra, unwanted, lines.

Other changes in version 4.0

Windows version

NOTE: DDE won't work with Netscape 6.0 (it doesn't support it)

All versions

Version 3.3 (June 2000)

The AscToHTM 3.3 release follows 6 "micro-releases" announced via the updates page on the Web. As such it will appear as a small step forward over 3.2.06, but in fact it offers a fair amount of new functionality over version 3.2.

Major changes in version 3.3 include :-

If anyone wants to correct these files and send them back to me, feel free.

New functions in version 3.3

Fonts

font, and will scale with the selected font size, although the <H1> headers are slightly smaller than the default.

You can choose to have the fonts implemented using <FONT> tags or CSS (e.g. according to your target audience) using the Use CSS to implement fonts policy.


Tables

Default TABLE layout
(also pre-processor tag TABLE_LAYOUT)

This allows you to specify the number of columns in each table, and the attributes of each column, specifically the character position that marks the end of each column. Rather than use this policy, it is probably better to use the related directive $_$_TABLE_LAYOUT in the source text on a per-table basis.
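The idea of describing a table by the character position at which each column ends can be sketched in a few lines of Python. This is an illustration only: the function name `split_columns` is hypothetical, the positions are treated as Python slice offsets, and the actual syntax of the TABLE_LAYOUT directive is not shown here.

```python
def split_columns(line, column_ends):
    """Cut one fixed-width line into cells at the given column end positions."""
    cells = []
    start = 0
    for end in column_ends:
        # Each cell runs from the previous column end up to this one.
        cells.append(line[start:end].strip())
        start = end
    return cells


if __name__ == "__main__":
    row = "Asbestos (MFL)".ljust(20) + "1993"
    print(split_columns(row, [20, 24]))
```

Because the positions are fixed in advance, every row of the table is cut the same way, whatever the spacing inside an individual cell.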


Default TABLE alignment
(also pre-processor tag TABLE_ALIGN)

Allows the alignment of the table to be specified (left, right, center)


Ignore table header during analysis
(also pre-processor tag TABLE_IGNORE_HEADER)

Specifies that table headers should be ignored when columns are being auto-detected. Some tables have complex headers that confuse the analysis. This policy can be used to help them be ignored.


Table extending factor

Controls the degree to which pre-formatted lines should be expanded into adjacent text.


Column merging factor

Controls the degree to which columns which don't appear to be very clear should be "merged" together


Could be blank line separated

Indicates that tables could be using blank lines to separate rows of data. This affects the analysis and detection of the table's extent.


The new pre-processor directives :-

BEGIN/END_COMMA_DELIMITED_TABLE
BEGIN/END_DELIMITED_TABLE


Other

Treat each line as a paragraph

If this option is selected, every line in the source file is treated as a paragraph. This may be suitable if the file has been authored using an editor that wraps the lines (i.e. doesn't put in hard breaks) and which doesn't add blank lines between paragraphs.

Preserve line structure

If this option is selected a <BR> is added to every line, thereby preserving the line structure of the original and giving the resulting HTML file an "A4 look" that hugs the left margin regardless of how wide the window is made.

Preserve file structure using <PRE>

If this option is selected the whole document is placed in <PRE> markup, and very few conversions are attempted. This is really a "last resort" option that you may want to use if the file has complex structures which the program is failing to understand. This option was added for a customer who wanted to convert all 2800 RFCs without having to manually correct each one.
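The difference between the three options above can be seen by applying each to the same text. The following Python sketch only mimics the effect described for each option; the function names are hypothetical and the real program does considerably more analysis than this.

```python
import html


def each_line_as_paragraph(text):
    """Every non-blank source line becomes its own paragraph."""
    return "\n".join(f"<P>{html.escape(line)}</P>"
                     for line in text.splitlines() if line.strip())


def preserve_line_structure(text):
    """Every source line is followed by <BR>, keeping the original line breaks."""
    return "\n".join(f"{html.escape(line)}<BR>" for line in text.splitlines())


def preserve_file_structure(text):
    """The whole document is wrapped in <PRE>, with no further conversion."""
    return f"<PRE>\n{html.escape(text)}\n</PRE>"


if __name__ == "__main__":
    sample = "First line\nSecond line"
    for convert in (each_line_as_paragraph, preserve_line_structure,
                    preserve_file_structure):
        print(convert(sample))
        print()
```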

Analysis->File Structure.

There is a limited auto-detect of DOS characters when diagrams are present.

FILENAME - output name of converted file
FRACTION - output a fraction
VERSION - output program version number

IGNORE_THIS - for comments in the source code

Only "HTML 3.2" and "HTML 4.0 Transitional" are currently supported.


Other changes in version 3.3

Windows

All

Also reversed the order of sections in this "Change History" section.


Version 3.2 (October 1999)

(Version 3.1 was never released, but a release of AscToTab occurred sometime after version 3.0, and so in keeping with the policy of synchronizing version numbers that was labelled version 3.1)

Over a year after the last release, version 3.2 is a major upgrade, but is only given a minor version number change because the remainder of the functionality produced in that time will be revealed in version 4.0.

Version 3.2 starts to prepare the groundwork for Cascading Style Sheet (CSS) and general font support that will be introduced in version 4.0. This has required a fairly radical change to the type of HTML code generated and how this is put together.

For example, the HTML is now more standards compliant (this is now a stated goal of the software, although I can't always promise full compliance; see Standards compliance), and as an aid towards CSS support "optional" end tags such as </P> are now being placed in the generated HTML.

Note that the use of the <FONT> tag is deprecated in HTML 4.0, and if you choose to add FONT markup to your pages they'll become much bigger, especially if they contain tables. This is because the HTML standard requires the FONT tag to continually be re-expressed to achieve the right appearance in all browsers (believe me, I only accepted this through bitter experience and grudgingly).

Major changes in version 3.2 include :-

New functions in version 3.2

Windows Version

All versions

Other changes in version 3.2

On the web site, and documentation

Windows version

VMS version

All versions

This should make it easier to rename files after production without breaking local hyperlinks. Links to/from other files would still stop working though.

Version 3.0 (August 1998)

There are a fair number of small changes in functionality over V2.3, together with a fair number of bug fixes and refined algorithms. A lot of development during this time was directed towards the production of a text-to-RTF converter (AscToRTF) using the same analysis engine. Consequently there are a lot of changes "under the bonnet".

The main functional change has been the revamp of the Windows User Interface. A new section (4.1.2) has been added to this document describing the Windows interface in some detail. The changes include :-

New functions in version 3.0

Windows Version

All Versions

Other changes in version 3.0

An example piece of HTML code would be

        <A HREF="http://www.jafsoft.com/asctohtm/?from=doco">
        <IMG SRC="a2hlogo.jpg" WIDTH=100 HEIGHT=36 BORDER=0
        ALT="Converted by AscToHTM"></A>

Version 2.3 (April 1998)

Minor bug fixes and upgraded functionality over V2.2. The main functional changes have been

  1. The introduction of wildcard support to allow conversion of multiple files at once.

  2. (related to the above) the introduction of the Directory Page
    feature that allows the generation of a hyperlinked document spanning
    all the files in a directory.

  3. Major re-write of the contents-list generating routines. The
    program now makes a third, intermediate, pass through the document
    to analyse the contents structure. This means that contents lists
    are now placed at the top of the HTML file by default, rather than
    in a separate file as previously - though that behaviour is still
    supported if wanted.

This approach is expected to pay further dividends in later releases.


New functions in version 2.3

Windows Version

All versions

Make Directory
Directory filename
Show file titles in Directory
Indent headings in Directory
Directory title
Directory keywords
Directory description
Directory return hyperlink text
Directory Script file
Directory header file
Directory footer file

Other changes in version 2.3

Sometimes I just work too hard :^)


Version 2.2 (Feb 1998)

First major release after V2.0 (when AscToHTM first went fully-Windowed). Major change this time has been the introduction of TABLE generating algorithms. These were first made available as a separate freeware utility AscToTab.

This version was reviewed by ZDNet and awarded 5 stars, their highest award.


New functions in version 2.2

Table generation

This is the biggest change in this version. AscToHTM now incorporates the technology first introduced in AscToTab. To support this the detection of pre-formatted text has been improved, new policies added, and new pre-processor commands added.

New policies include :-

Attempt TABLE generation
Default TABLE border size
Default TABLE header rows
Default TABLE header cols
Default TABLE cell spacing
Default TABLE cell padding
Default TABLE colour
Default TABLE border colour
Default TABLE caption


New Pre-processor commands include :-

BEGIN/END_CODE
BEGIN/END_DIAGRAM
BEGIN/END_TABLE
TABLE_BORDER
TABLE_BORDERCOLOR
TABLE_BGCOLOR
TABLE_CAPTION
TABLE_CELLSPACING
TABLE_CELLPADDING
TABLE_HEADER_ROWS
TABLE_HEADER_COLS


Other changes

Other changes in version 2.2

Documentation

All versions