Contents of this section
Version 5.0 (November 2004)
  New functions in version 5.0
    Ability to fully control table generation
    Ability to "tag" your own tables for greater accuracy
    Input text manipulation and labelling using "Text commands"
    Support for non-ASCII character types and character encodings
    Support for Definition Blocks
    Support for comma-delimited and tab-delimited tables
  New policies in version 5.0
    New added HTML policies
    New general analysis policies
    New configuration file policies
    New contents list policies
    New file, page, paragraph and line structure policies
    New file splitting policies
    New font policies
    New heading policies
    New hyperlink policies
    New styling policies
    New table analysis policies
    New table generation policies
    New 'what to look for' policies
    Other new policies
  New programs in version 5.0
    API version now available
    New utility A2HDETAG
  Other changes in version 5.0
    New Pre-processor tags
    Changes to the Windows version
    Changes to the command line version
    Changes to document analysis
    Changes to documentation
    Other new options
Version 4.1 (August 2001)
  New functions in version 4.1
  Other changes in version 4.1
Version 4.0 (May 2001)
  New functions in version 4.0
  Other changes in version 4.0
Version 3.3 (June 2000)
  New functions in version 3.3
  Other changes in version 3.3
Version 3.2 (October 1999)
  New functions in version 3.2
  Other changes in version 3.2
Version 3.0 (August 1998)
  New functions in version 3.0
  Other changes in version 3.0
Version 2.3 (April 1998)
  New functions in version 2.3
  Other changes in version 2.3
Version 2.2 (Feb 1998)
  New functions in version 2.2
  Other changes in version 2.2
Version 2.1 (never officially released)
  New functions in version 2.1
  Other changes in version 2.1
Version 2.0 (October 1997)
  New functions in version 2.0
  Other changes in version 2.0
Version 1.1 (August 1997)
  New functions in version 1.1
  Other changes in version 1.1
Version 1.05 (July 1997)
  New functions in version 1.05
  Other changes in version 1.05
Version 1.04 (July 1997)
  New functions in version 1.04
  Other changes in version 1.04
Version 1.01 (April 1997)
Version 5.0 is the first major update to AscToHTM in 3 years. As such it contains a large number of enhancements and changes from the previous version 4.1.
Several major new features are added in version 5.0.
To aid in processing tables, the program now allows you to identify various table structures by specifying various match conditions. Each time the software encounters a candidate table, it tests this against the match conditions to see if the "table" is of a known type.
For each table you can specify its structure, and various formatting rules to be used in its conversion. These structure and formatting definitions can be shared between multiple table types for your convenience.
All the table types, structures and formatting rules should be placed in an external text file, known as a Table Definition File (or TDF for short). A new policy allows you to identify which Table Definition File is to be used, and you can select this from the new Config File Location menu.
When a table matches a known table definition it is possible to:-
For full details see Using Table Definition Files (TDF).
The program now supports Tagged Table commands. These commands allow you to completely markup a table, specifying the column details, the row details and the contents of each table cell.
This approach can be used by those who want complete control over how their tables are constructed, or who are generating text files from a source which knows the table layout and can explicitly state it.
By using the tagged approach, you avoid the prospect of the program making mistakes when analysing the layout of the table.
As an example of using tagged table commands, the following sequence in the source file
$_$_BEGIN_USER_TABLE C,1 in
$_$_COLUMN_DETAILS 1,,,L, 2 in
$_$_COLUMN_DETAILS 2,,,C, 1 ins
$_$_TABLE_BORDER 1
$_$_NEW_ROW HEAD
$_$_NEW_CELL Substance (units)
$_$_NEW_CELL Year Sampled
$_$_NEW_ROW DATA
$_$_NEW_CELL Alpha emitters (pCi/L)
$_$_NEW_CELL 1999
$_$_NEW_ROW DATA
$_$_NEW_CELL Asbestos (MFL)
$_$_NEW_CELL 1993
$_$_END_TABLE
becomes
Substance (units) | Year Sampled |
---|---|
Alpha emitters (pCi/L) | 1999 |
Asbestos (MFL) | 1993 |
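Where the text file is produced by another program that already knows the table layout, the tag sequence can be emitted mechanically. The Python sketch below is illustrative only. The function name `tagged_table` is hypothetical, it assumes each directive sits on its own line with the cell text following `$_$_NEW_CELL` on the same line, and it passes the table and column arguments through exactly as they appear in the example above without interpreting them.

```python
def tagged_table(header, rows, table_args="C,1 in",
                 column_args=("1,,,L, 2 in", "2,,,C, 1 ins"), border=1):
    """Return tagged-table source lines for a header row and data rows.

    The argument strings are copied through untouched; their meaning is
    defined by the Tag manual, not by this sketch.
    """
    lines = ["$_$_BEGIN_USER_TABLE " + table_args]
    lines += ["$_$_COLUMN_DETAILS " + args for args in column_args]
    lines.append("$_$_TABLE_BORDER %d" % border)
    # One HEAD row first, then one DATA row per record.
    for kind, cells in [("HEAD", header)] + [("DATA", row) for row in rows]:
        lines.append("$_$_NEW_ROW " + kind)
        lines += ["$_$_NEW_CELL " + str(cell) for cell in cells]
    lines.append("$_$_END_TABLE")
    return lines


if __name__ == "__main__":
    print("\n".join(tagged_table(
        ["Substance (units)", "Year Sampled"],
        [["Alpha emitters (pCi/L)", 1999], ["Asbestos (MFL)", 1993]],
    )))
```

Generating the tags this way keeps the layout decisions in the program that owns the data, which is the situation the tagged approach is aimed at.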
The program now allows you to apply "text commands" to the input text, before it is converted. There are several commands possible, which allow you to identify lines in the input text that should be ignored, and text in the input file that should be removed or replaced.
You can also use commands to tell the software how to interpret certain types of line, for example which lines are headings and which should be regarded as bullet points. The Text Commands to be used should be placed in an external Text Command File. A new policy allows you to identify which Text Command File is to be used, and you can select this from the new Config File Location menu.
For full details see Using Text Command Files
Non-latin and Unicode character sets
Some support has been added for non-latin character sets. The character set names are based on those used in HTML charsets.
Support has been added for auto-detecting the character set used, but this is far from foolproof. If you are using non-latin character sets you may need to set the character set manually.
It is not possible at present to support multiple character sets in one document (unless you are using Unicode)
To support this feature the following policies have been added
- the character encoding policy to allow the character encoding of a document to be set. The software has limited ability to detect Japanese ("x-sjis") and Cyrillic ("koi-8") text, but in some cases this will need to be set.
- The auto-detect of character sets can be switched off by using the Look for character encodings policy. You might want to do this if the software wrongly suspects your document is a non-latin character set.
See Working with Unicode.
Other special characters
- Added support for parsing files with some Mime-encoded quotable strings in them. The new policy Input file contains MIME encoding can be found under Analysis->File structure. At present there is some (very limited) auto-detect for this feature.
- Added support for documents with change bars. By default change bars are stripped out and the changed text is coloured red; this behaviour may be changed in later versions. Added the new policy Input file has change bars which can be found under Analysis->File Structure.
- Added support for converting DOS characters. The new policy Input file contains DOS characters can be found under Analysis->File Structure.
There is a limited auto-detect of DOS characters when diagrams are present.
- Added Input file contains PCL codes policy. Again there is a limited ability to detect these codes. A few of the PCL codes are interpreted. Most are just discarded.
- Improved handling of VT escape characters. These are either removed from the output or converted to "line" characters
Definition blocks allow you to define blocks of text that you may then insert at any point in the text (e.g. to give an "end of page" effect). You can also "define variables" whose value is then inserted wherever a VARIABLE tag is used.
This feature, though supported by the core analysis engine, is expected to be used more by the users of the AscToHTM converter.
The pre-processor commands involved are:
Pre-processor commands have been added to allow you to mark up a section of comma-delimited or tab-delimited data you want turning into a table.
The new pre-processor directives are the COMMA_DELIMITED_TABLE command and the DELIMITED_TABLE command.
In addition to this, the software now has the ability to automatically detect tab-delimited data tables.
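As a rough illustration of what turning delimited data into a table involves, the Python sketch below converts delimited text into a bare HTML table and includes a simple check for tab-delimited data. This is not the program's own algorithm. The names `looks_tab_delimited` and `delimited_to_html` are hypothetical, and the detection rule (every non-blank line has the same, non-zero number of tabs) is an assumption made for the example.

```python
import csv
import html
import io


def looks_tab_delimited(text):
    """Guess whether every non-blank line has the same, non-zero tab count."""
    counts = {line.count("\t") for line in text.splitlines() if line.strip()}
    return len(counts) == 1 and counts.pop() > 0


def delimited_to_html(text, delimiter=","):
    """Convert delimited text into a simple HTML table, one row per record."""
    out = ["<TABLE>"]
    for record in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not record:
            continue  # skip blank lines
        cells = "".join("<TD>%s</TD>" % html.escape(c.strip()) for c in record)
        out.append("<TR>%s</TR>" % cells)
    out.append("</TABLE>")
    return "\n".join(out)
```

Using a CSV parser instead of splitting on the delimiter means quoted fields that contain commas stay in one cell.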
A large number of new policy options have been added in version 5.0, and several that weren't previously accessible via the user interface can now be accessed.
The "added HTML" section acquires a number of new policies that allow you to create appropriate META tags in the HTML <HEAD> sections
In version 5.0 there is a new menu option under Conversion options pointing to Configuration Files, that is, files loaded in addition to the policy file to control different aspects of the conversion. The choices made on that menu are also saved as options in your policy file.
File structure
- Added Lines to ignore at start of file and Lines to ignore at end of file policies to allow lines at the start and end of the source file to be discarded. This can be useful if your source text is coming from a third party source that adds extra, unwanted lines.
- Added auto-detect of double spaced files (files where every second line is blank). This will set the Input file is double spaced policy whenever double-spaced text is detected (unless the policy has already been set).
- Added Input file contains UNICODE characters policy. When enabled the program will create a UTF-8 output file.
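The effect of the ignore-lines policies and the double-spacing check can be pictured with a short Python sketch. It is an approximation under stated assumptions, not the program's code: `trim_lines` and `is_double_spaced` are hypothetical names, and the double-spacing test assumes the file begins with a text line and needs at least four lines before it will commit to an answer.

```python
def trim_lines(lines, skip_start=0, skip_end=0):
    """Drop a fixed number of lines from the start and end of a file."""
    end = len(lines) - skip_end
    return lines[skip_start:max(end, skip_start)]


def is_double_spaced(lines):
    """True if every second line is blank and the others are not."""
    if len(lines) < 4:
        return False  # too short to judge
    text, blanks = lines[0::2], lines[1::2]
    return all(l.strip() for l in text) and not any(l.strip() for l in blanks)
```

Trimming before the double-spacing check would keep unwanted header or footer lines from a third party source out of the analysis.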
Page structure
- Added PAGE command. This marks a page boundary. In the HTML this creates a <HR> page separator
Page markers
- Added Input file has page markers and Page marker size (in lines) policies. These allow you to identify that the file has page markers containing form feeds and that the first so many lines after the form feed should be discarded.
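A minimal Python sketch of the page-marker idea follows. It assumes a marker starts at a form feed character and that the marker size counts lines from the form feed onwards, including any text on the same line as the form feed. The function name `strip_page_markers` is hypothetical and the program's actual handling may differ.

```python
def strip_page_markers(text, marker_size):
    """Remove each form feed and the first `marker_size` lines that follow it."""
    pages = text.split("\f")
    kept = [pages[0]]  # text before the first form feed is kept as-is
    for page in pages[1:]:
        lines = page.split("\n")
        kept.append("\n".join(lines[marker_size:]))
    return "".join(kept)
```

For example, with a marker size of 2, a page that begins with a "Page 2" line and a running header line after the form feed would lose both lines and keep the body text that follows.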
Paragraph structure
- Added Preserve new paragraph offset policy. In documents where a first line offset is detected at the start of each paragraph you can elect to have this preserved in the output.
Line structure
Added options to allow more control over how the original document's file structure should be preserved
- Added Treat each line as a paragraph policy. If this option is selected, every line in the source file is treated as a paragraph. This may be suitable if the file has been authored using an editor that wraps the lines (i.e. doesn't put in hard breaks) and which doesn't add blank lines between paragraphs.
- Added Preserve line structure policy. If this option is selected a line break is added to every line, thereby preserving the line structure of the original.
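The difference between the two line structure policies can be sketched in a few lines of Python. This is only an illustration of the output shape: `lines_to_html` and its `mode` values are hypothetical, and the sketch assumes the line break is written as a <BR> tag and that blank lines produce no paragraph.

```python
import html


def lines_to_html(lines, mode="br"):
    """Render source lines as HTML.

    mode="br"        -> end every line with <BR> (line structure preserved)
    mode="paragraph" -> wrap every non-blank line in its own <P>...</P>
    """
    if mode == "br":
        return "\n".join(html.escape(line) + "<BR>" for line in lines)
    if mode == "paragraph":
        return "\n".join("<P>%s</P>" % html.escape(line)
                         for line in lines if line.strip())
    raise ValueError("unknown mode: %r" % mode)
```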
File generation
- Added Break up long HTML lines policy. If this option is selected, the output HTML will be broken into smaller lines to make it more readable.
Added policies to allow greater control over splitting large files into a set of smaller linked HTML pages
Added policies to allow different fonts to be applied to different types of text as follows
Normal text | Default font |
Headings | Heading Font |
Text in tables | Table font |
Table of contents | Table of contents Font |
Fixed-pitch text | Fixed font |
There are two new heading types that can be supported :-
Also added :-
http://3640005069/
http://7934972365/
http://0330.0366.0021.0315/
http://%6c%6f%63%6b%65%72%67%6e%6f%6d%65%2e%63%6f%6d/
Although the display text is left unchanged, the hyperlink will point to a non-obfuscated URL (either the domain name, or an IP address). This is because obfuscated URLs such as these are often used by spammers, and the author has no intention of allowing his software to aid spammers in their goals.
If someone cares to give me a valid reason for using such URLs I may reconsider this behaviour.
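To show what de-obfuscation means for the four URLs listed above, here is a small Python sketch. It is not the program's implementation. The function name `deobfuscate_host` is hypothetical, and two rules are assumptions made for the example: a single large number is treated as a 32-bit address wrapped modulo 2**32, and a dotted component with a leading zero is read as octal.

```python
import ipaddress
from urllib.parse import unquote, urlsplit


def deobfuscate_host(url):
    """Return a plain host name or dotted-decimal IPv4 address for `url`."""
    # Undo %xx escapes first, so an encoded domain name becomes readable.
    host = unquote(urlsplit(url).hostname or "")
    parts = host.split(".")
    try:
        if len(parts) == 1:
            # Single number: a 32-bit address, wrapped modulo 2**32.
            value = int(parts[0], 10) % 2**32
        elif len(parts) == 4:
            # Dotted form: a leading zero marks an octal component.
            octets = [int(p, 8) if p.startswith("0") and len(p) > 1
                      else int(p, 10) for p in parts]
            if any(o > 255 for o in octets):
                return host
            value = int.from_bytes(bytes(octets), "big")
        else:
            return host
    except ValueError:
        return host  # not numeric, so treat it as an ordinary domain name
    return str(ipaddress.IPv4Address(value))
```

Under these assumptions the three numeric examples all resolve to the same dotted address, 216.246.17.205, and the percent-encoded example decodes to lockergnome.com.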
The following policies can't be accessed via the User Interface, but are listed here for completeness.
As well as the Windows and console versions, AscToHTM is also available (under separate licence) in API form so that developers can harness the power of AscToHTM's conversion abilities for use in their own software.
Also, the new utility A2HDETAG allows documents marked up using the AscToHTM pre-processor to be converted into plain text files (e.g. for sending out as newsletters).
As with all JafSoft converters, AscToHTM is available under separate license as an Application Programming Interface (API). This API allows software developers to harness the powerful abilities of AscToHTM from within their own software products.
The API is written in C++, and is supplied as either a library or a DLL under Windows. As such it can easily be invoked from C, C++ and Visual Basic software and has also been successfully invoked from inside Java and C# programs.
For users who register, there is a new, separate command line utility called A2HDETAG available so they can "de-tag" their source files of all AscToHTM pre-processor tags, leaving a plain text fit for publishing, e.g. on Usenet.
In conjunction with this, new BEGIN_ASCII ... END_ASCII pre-processor tags have been added. These identify text that will be copied to the output of A2HDETAG. It is ignored in all other conversions, and is intended to allow alternative text to be placed in text and HTML versions of a document.
Other changes include :-
Added several new pre-processor in-line tags :-
FILENAME | outputs name of file being converted |
FRACTION | outputs a fraction |
VERSION | outputs AscToHTM program name and version number |
IGNORE | multi-line text to be ignored |
IGNORE_THIS | in-line text to be ignored |
- The main screen now allows access to Policy file selection. Previously this was only available on the menu structure. The Menu structure has been left unchanged, meaning you now have two ways of choosing your policy files.
- The main screen now allows you to search sub folders when using wildcards.
- The main screen also allows you to specify the File conversion type. You can choose to treat the input file as a number of different table types (e.g. tab-delimited data).
- You no longer get prompted to "save policy" just because you pressed OK on one of the policy sheets. Now this only happens when something has been changed.
- The main menu now has a "check for updates" option. If you select this you'll be taken to the JafSoft website where you'll be told if any newer versions of the software have been released.
- Program now remembers positions of windows from one invocation to the next.
- The user interface is now available in Italian, French and Swedish.
- Command line now allows multiple filespecs, separated by spaces. Policy file must now be a .pol file, rather than the second argument.
- More changes on bullet characters, in particular to disallow 'O' (upper case) from becoming a bullet character through analysis. This really doesn't work in Portuguese documents :-) 'o' (lower case) may still be detected. If upper case 'O' is wanted this can still be manually switched on.
- Horizontal lines are now implemented as line rules whose length attempts to approximate the original (e.g. 50% or whatever). Previously lines would become full width.
- Bookmark names from filename are now lower case (to reduce possible mismatches)
- Shareware version now expires after 30 days + 5 uses. This will allow people to use the software on 5 different days after the first 30 days, giving people more time to evaluate the software at their leisure.
- Now strip out leading and trailing "---" from heading text to make them more presentable in HTML
- Changed emphasis handling to allow hyphenated parts to be emphasised independently, e.g. pre-formatted or pre-formatted.
- Fine-tuned the detection of whether or not a file has an in-situ contents list
- The LINKPOINT pre-processor tag can now be used as a directive as well as an in-line tag. (see the Tag manual for details).
- Increased maximum width allowed for input lines in tables to 5.0 (after encountering a sample at 165). Lines longer than this are still disregarded as candidate table lines.
- Improved analysis for tables using bar ('|') column separators
- Improved detection of ASCII art diagrams.
- Improved handling of heavily indented blocks of text. Previously these were (poorly) rendered as tables. Now the tables more accurately preserve the large indentation (see Text block detection).
- The software will now automatically detect where a table is in fact tab-delimited data. Where detected, it will then use that tab structure to calculate columns.
- This document has been completely re-written. It is converted from a single text file into the HTML pages, an RTF file and the Windows Help file using the AscToHTM and AscToRTF programs. You can view the source file for this document as file "AscToHTM.txt".
- The Tag manual describes the tagging systems available to JafSoft conversion utilities. Note that not all of the tags described there are relevant (or supported) in HTML generation. However many are common between the converters, should you wish to convert the same text file into other formats
- A "Table manual" is under production to explain how to get the most from tables in your conversions. This is expected to appear some time after AscToHTM 5.0 is released.
Version 4.1 was a major update from the previous release 4.0.
- New /TABLE command line qualifier that allows the input file to be treated as a single plain text table
- Added support for HEAD_SCRIPT HTML fragment. This allows HTML to be defined that can be copied into the <HEAD> of a document. This can include <META> tags or <SCRIPT>...</SCRIPT> sections.
- Added Swedish interface. Many thanks to Dan Sverraby.
- Added new policy Only allow pages to be viewed in frames
- New utility A2HDETAG is available to registered users so they can "de-tag" their source files to remove all AscToHTM pre-processor tags, leaving a plain text fit for publishing, e.g. on Usenet.
- Added BEGIN_ASCII ... END_ASCII pre-processor tags. These identify text that will be copied to the output of A2HDETAG. It is ignored in all other conversions, and is intended to allow alternative text to be placed in text and HTML versions of a document.
- Added character encoding policy to allow the character encoding of a document to be set. The software has limited ability to detect Japanese ("x-sjis") and Cyrillic ("koi-8") text, but in some cases this will need to be set.
The auto-detect of character sets can be switched off by using the Look for character encodings policy
- Added policies to allow different fonts to be applied to different types of text as follows
Normal text | Default font |
Headings | Heading Font |
Text in tables | Table font |
Table of contents | TOC Font |
Fixed-pitch text | Fixed font |
The "Default Font" policy existed previously, the other four policies are new in this version.
- Added PAGE directive. This marks a page boundary. In HTML this simply results in a <HR> tag, since HTML doesn't really support pages. This may be expanded in future to allow page numbers and the like to be displayed.
Windows version
- Loading a policy file with "place policy in frames" policy will now toggle the Conversion type
- You no longer get prompted to "save policy" just because you pressed OK on one of the policy sheets. Now this only happens when something has been changed.
- The main menu now has a "check for updates" option. If you select this you'll be taken to the JafSoft website where you'll be told if any newer versions of the software have been released.
Documentation
- The list of bug fixes is removed from this document and is now to be found on-line at http://www.jafsoft.com/doco/asctohtm_bug_history.html
All versions
- Added support for HTML fragment files to $_$_INCLUDE other HTML fragment files. This allows common fragments to be shared.
- Fine-tuned the detection of whether or not a file has an in-situ contents list
- When Frames generation is selected the default "Split level" is set to 1 instead of 2. This means you'll get fewer files generated and - depending on the type of headings you have - no splitting may occur unless you manually increase the split level.
- The LINKPOINT pre-processor tag can now be used as a directive as well as an in-line tag. (see the Tag manual for details).
- Added a "Range" attribute to the CONTENTS_LIST tag. This allows mini-contents lists to be generated which contain only entries for a part of the document, rather than the whole document, e.g. for just a single chapter. This should help those who want to split large files into pages and to have a mini-contents list for each section.
- Improved handling of VT escape characters. These are either removed from the output or converted to "line" characters
- Added auto-detect of double spaced files (files where every second line is blank). This will set the Input file is double spaced policy whenever double-spaced text is detected (unless the policy has already been set).
Version 4.0 represents a major update over the previous version 3.3.
API version
- For those wishing to call AscToHTM programmatically, an API has been developed. This is sold under separate license. Contact info<at>jafsoft.com (replace "<at>" by "@") if you're interested.
Linux version
- A Linux command line version will soon be available. Beta versions have been tested, and I hope to do a Linux command line release just after version 4 is released.
Windows version
- You can now choose from the main screen whether you want your HTML output as one or more HTML file(s), sent to the Windows Clipboard (see Output to the Windows clipboard), or turned into a set of HTML frames (see Frames).
- Program now remembers positions of windows from one invocation to the next.
- The user interface is now available in Italian.
All versions
- Version 4 introduces frames support (see Frames). This introduces a large number of supporting policies :-
Use main header in header frame
Use main footer in footer frame
Add contents frame if possible
Add Frame border
Open frame links in new window
New frame link window name
Add NOFRAMES links
NOFRAMES link URL
Number of levels in contents frame
Header frame background colour
Header frame text colour
Contents frame background colour
Contents frame text colour
Footer frame background colour
Footer frame text colour
- Added HTML fragments feature, with HTML fragments file policy and DEFINE_HTML_FRAGMENT, RESET_HTML_FRAGMENT pre-processor commands. This allows you to define HTML fragments that can be used to replace the standard HTML generated by the program. This allows you to customize headers, footers, horizontal rules, contents lists, navigation bars and more.
- Added support for URL parsing, including :-
- new top level domains (.info, .biz etc) are supported
- the "snews://" secure news server protocol type is now supported
- URLs of the form http://username@domain_name/... are now supported
- Added Check domain name syntax policy
- Added Create Telnet links policy
- Added support for "obfuscated" URLs such as
http://3640005069/
http://7934972365/
http://0330.0366.0021.0315/
http://%6c%6f%63%6b%65%72%67%6e%6f%6d%65%2e%63%6f%6d/
Although the display text is left unchanged, the hyperlink will point to a non-obfuscated URL (either the domain name, or an IP address). This is because obfuscated URLs such as these are often used by spammers, and the author has no intention of allowing his software to aid spammers in their goals.
If someone cares to give me a valid reason for using such URLs I may reconsider this behaviour.
- Added support for embedded headings with the Expect embedded headings policy (see embedded heading detection). These are "headings" that are embedded as the first sentence in a paragraph.
- Added support for headings that start with particular words or phrases via the Heading key phrases policy (see section on detecting key phrase headings).
- New /COMMA and /TABBED command line qualifiers that allow comma-delimited and tab-delimited files to be converted into tables.
- Added Check indentation for consistency policy to allow checking of headings to be relaxed (e.g. when they're centred on the page).
- Added Look for diagrams policy
- Added Input file contains PCL codes policy
- Added Input file contains Japanese characters support.
- Added Preserve new paragraph offset policy
- Added Omit <HEAD> and <BODY> from output policy
- Added Document Base URL policy
- Added Comment generation code policy
- Added Number of words to include in filename policy to allow filenames to be generated from the first few words of the title when splitting documents with underlined or capitalised headings at each heading.
- Added Lines to ignore at end of file and Lines to ignore at start of file policies to allow lines at the start and end of the source file to be discarded. This can be useful if your source text is coming from a third party source that adds extra, unwanted lines.
- Added Suppress all colour markup policy
- Added Column boundaries have zero width policy
Windows version
- On some systems DDE doesn't always work properly. This would cause the program to hang when it attempted to display results. In such cases you would need to stop the program from the task manager. In version 4 the program will now detect when this has happened and disable use of DDE next time it runs.
NOTE: DDE won't work with Netscape 6.0 (it doesn't support it)
- Added the policy Suppress URL messages to the Settings | Diagnostics menu option. When enabled all URLs, email addresses etc will be listed in the log file. Since this file can be saved to disk, this is one way of identifying all the candidate hyperlinks from your text file.
All versions
- Improved analysis for tables using bar ('|') column separators
- Improved detection of ASCII art diagrams.
- Improved handling of heavily indented blocks of text. Previously these were (poorly) rendered as tables. Now the tables more accurately preserve the large indentation (see Text blocks).
- The first three words of an underlined heading are now used to generate the filename. Previously only the first word was used, leading to less meaningful names, with more chances of duplication.
- VMS command line now allows multiple filespecs, separated by spaces. Policy file must now be a .pol file, rather than the second argument.
- Anchor names from filename are now lower case (to reduce possible mismatches)
- Shareware version now expires after 30 days + 5 uses. This will allow people to use the software on 5 different days after the first 30 days, giving people more time to evaluate the software at their leisure.
- Now strip out leading and trailing "---" from heading text to make them more presentable in HTML or RTF
- Added support for headings that span up to 3 lines, previously this was only 2.
- Changed heading to allow <H4> markup to be used. Previously "level 4" headings would get <H3> markup since anything smaller would end up smaller than the main text. With the advent of CSS style sheets this should be less of a problem.
- Changed emphasis handling to allow hyphenated parts to be emphasised independently, e.g. pre-formatted or pre-formatted.
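Several of the changes above concern how heading text is tidied and turned into file names: leading and trailing "---" are stripped, the first three words of the heading are used, and the result is lower case. The Python sketch below puts those three rules together. It is an assumption-laden illustration: `heading_to_filename` is a hypothetical name, and the underscore separator and ".html" extension are choices made for the example, not documented behaviour.

```python
import re


def heading_to_filename(heading, max_words=3, extension=".html"):
    """Build a lower-case file name from the first few words of a heading."""
    # Drop surrounding whitespace and any leading/trailing "---" decoration.
    text = heading.strip().strip("-").strip()
    words = re.findall(r"[A-Za-z0-9]+", text)[:max_words]
    return "_".join(w.lower() for w in words) + extension
```

Using several words rather than one reduces the chance that two headings map to the same file name, which is the motivation given above.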
The AscToHTM 3.3 release follows 6 "micro-releases" announced via the updates page on the Web. As such it will appear as a small step forward over 3.2.06, but in fact it offers a fair amount of new functionality over version 3.2
Major changes in version 3.3 include :-
- Support for fonts. You can now choose a font for the whole document. By default this is implemented using CSS, but you can elect to use <FONT> tags should you prefer.
- Enhanced Language support. Portuguese has been added to the Spanish and German interfaces introduced in the last version. Also a new feature allows you to save the interface to a "language skin" text file which may be edited and then reloaded. Using this feature we can now offer
- American English (simply a spell-checked UK English file)
- "Babelfish" French. A French translation from http://babelfish.altavista.com/
- "Babelfish" Italian. An Italian translation from Babelfish.
If anyone wants to correct these files and send them back to me, feel free.
- More table generation controls. Several new controls have been added to give you more control over the detection, analysis and generation of tables in the text.
- Support for comma and tab delimited tables. Pre-processor commands have been added to allow you to mark up a section of comma-delimited or tab-delimited data you want turning into a table.
- Support for preserving file/line structures. You can now elect to preserve the original line structure of a file, or to place the whole file in <PRE> markup (which is a little defeatist, but has its uses)
- Support for non-standard characters. The program can now recognize, to a limited extent, DOS line-drawing characters, MIME-encoded text and text documents with "change bars" in them.
- New "Tag manual". The Using the pre-processor and in-line tags sections of this document have now been re-merged and their contents largely moved to a new document called the Tag manual.
Fonts
- The default font for the whole document can now be set via the Default font policy. Headings will also adopt the selected font, and will scale with the selected font size, although the <H1> headers are slightly smaller than the default.
You can choose to have the fonts implemented using <FONT> tags or CSS (e.g. according to your target audience) using the Use CSS to implement fonts policy.
Tables
- Added several new policies and tags to help with table analysis. Policies added include
Default TABLE layout (also pre-processor tag TABLE_LAYOUT)
This allows you to specify the number of columns in each table, and the attributes of each column, specifically the character position that marks the end of each column. Rather than use this policy, it is probably better to use the related directive $_$_TABLE_LAYOUT in the source text on a per-table basis.
Default TABLE alignment (also pre-processor tag TABLE_ALIGN)
Allows the alignment of the table to be specified (left, right, center).
Ignore table header during analysis (also pre-processor tag TABLE_IGNORE_HEADER)
Specifies that table headers should be ignored when columns are being auto-detected. Some tables have complex headers that confuse the analysis. This policy can be used to help them be ignored.
Controls the degree to which pre-formatted lines should be expanded into adjacent text.
Controls the degree to which columns which don't appear to be very clear should be "merged" together
Indicates that tables could be using blank lines to separate rows of data. This affects the analysis and detection of the tables extent.
- Added support for embedding comma-delimited and tab-delimited table data in your source file (e.g. data exported from Excel and the like).
The new pre-processor directives :-
BEGIN/END_COMMA_DELIMITED_TABLE
BEGIN/END_DELIMITED_TABLE
Other
- Added options to allow more control over how the original document's file structure should be preserved
Treat each line as a paragraph
If this option is selected, every line in the source file is treated as a paragraph. This may be suitable if the file has been authored using an editor that wraps the lines (i.e. doesn't put in hard breaks) and which doesn't add blank lines between paragraphs.
Preserve line structure
If this option is selected a <BR> is added to every line, thereby preserving the line structure of the original and giving the resulting HTML file an "A4 look" that hugs the left margin regardless of how wide the window is made.
Preserve file structure using <PRE>
If this option is selected the whole document is placed in <PRE> markup, and very few conversions are attempted. This is really a "last resort" option that you may want to use if the file has complex structures which the program is failing to understand. This option was added for a customer who wanted to convert all 2800 RFCs without having to manually correct each one.
- Added support for parsing files with some Mime-encoded quotable strings in them. The new policy Input file contains mime encoding can be found under Analysis->File structure. At present there is some (very limited) auto-detect for this feature.
- Added support for documents with change bars. By default change bars are stripped out and the changed text is coloured red; this behaviour may be changed in later versions. Added the new policy Input file has change bars which can be found under Analysis->File Structure.
- Added support for converting DOS characters. The new policy Input file contains DOS characters can be found under Analysis->File Structure.
There is a limited auto-detect of DOS characters when diagrams are present.
- Changed hyperlink detection to only allow explicit FTP URLs and email addresses that don't start with numbers. These behaviours can be reversed using the new policies Only allow explicit FTP links and Allow email beginning with numbers, both of which are on the Output->Hyperlinks tab.
- Added the policy Create gopher links to toggle the conversion of gopher links into hyperlinks.
- Added the policy Check indentation for consistency so that it could be disabled in documents where headings were centred (and thus all at different indentations)
- Added several new pre-processor in-line tags :-
FILENAME - output name of converted file
FRACTION - output a fraction
VERSION - output program version number
IGNORE_THIS - for comments in the source code
- Added policy to allow selection of which version of HTML should be generated. Policy is "HTML version to be targeted".
Only "HTML 3.2" and "HTML 4.0 Transitional" are currently supported.
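The file structure options described in this list (each line as a paragraph, a <BR> on every line, and wrapping everything in <PRE>) can be pictured with a small Python sketch. This is an illustration only, not the program's implementation; the function name and the mode strings are made up for the example.

```python
import html


def preserve_structure(text, mode):
    """Apply one of three line-structure strategies (hypothetical sketch)."""
    lines = [html.escape(line) for line in text.splitlines()]
    if mode == "paragraph_per_line":
        # Every non-blank source line becomes its own paragraph.
        return "\n".join("<P>{}</P>".format(line) for line in lines if line.strip())
    if mode == "line_breaks":
        # Keep the original line structure by ending each line with <BR>.
        return "\n".join(line + "<BR>" for line in lines)
    if mode == "pre":
        # Last resort: leave the text alone inside <PRE> markup.
        return "<PRE>\n{}\n</PRE>".format("\n".join(lines))
    raise ValueError("unknown mode: {}".format(mode))


print(preserve_structure("first line\nsecond line", "line_breaks"))
```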
Windows
- The main screen now allows access to Policy file selection. Previously this was only available on the menu structure. The Menu structure has been left unchanged, meaning you now have two ways of choosing your policy files.
All
- The contents list styling has been changed slightly. For example only the major section headings are now shown in bold. People were complaining :-)
- Now add BORDER=0 attribute to tables with no border, rather than just omitting the attribute. This is a workaround for a bug in Netscape where a gap appears where a border would be when coloured rows are selected.
- Support for IE 3.0 as the browser of choice is added, by allowing the filename rather than file URL to be passed to the browser. To do this disable the "file://localhost/" option on the Settings->Viewers dialog screen.
- More changes on bullet characters, in particular to disallow 'O' (upper case) from becoming a bullet character through analysis. This really doesn't work in Portuguese documents :-) 'o' (lower case) may still be detected. If upper case 'O' is wanted this can still be manually switched on.
- Increased maximum width allowed in tables to 200 (after encountering a sample at 165). Lines longer than this are disregarded as candidate table lines.
- Introduction of German and Portuguese user interface, with extension of the Spanish user interface.
- Horizontal lines are now implemented as <HR> tags whose length attempts to approximate the original (e.g. 50% or whatever). Previously lines would become full width.
- Chapters 7 and 8 of this document were merged into a single chapter 7 (about the pre-processor). Most of that material has now been moved to the new Tag manual. Subsequent chapters have thus been renumbered which may lead to invalid references to chapter 11... especially if you keep old versions of the doco lying around.
Also reversed the order of sections in this "Change History" section
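The <HR> change listed above (rule length approximating the original) could be approximated as in the following Python sketch. The 80-column page width and both function names are assumptions made for the example; the program's actual calculation is not documented here.

```python
def rule_width_percent(line, page_width=80):
    """Estimate a rule's width as a percentage of an assumed page width."""
    length = len(line.strip())
    percent = round(100 * length / page_width)
    return max(1, min(100, percent))  # clamp to a sensible range


def rule_to_hr(line, page_width=80):
    """Render a line of rule characters as an <HR> tag of similar length."""
    return '<HR WIDTH="{}%">'.format(rule_width_percent(line, page_width))


print(rule_to_hr("-" * 40))
```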
(Version 3.1 was never released, but a release of AscToTab occurred sometime after version 3.0, and so in keeping with the policy of synchronizing version numbers that was labelled version 3.1)
Over a year after the last release, version 3.2 is a major upgrade, but is only given a minor version number change because the remainder of the functionality produced in that time will be revealed in version 4.0.
Version 3.2 starts to prepare the groundwork for Cascading Style Sheet (CSS) and general font support that will be introduced in version 4.0. This has required a fairly radical change to the type of HTML code generated and how this is put together.
For example the HTML is now more standards compliant (this is now a stated goal of the software, although I can't always promise full compliance; see Standards compliance), and as an aid towards CSS support "optional" end tags such as </P> are now being placed in the generated HTML.
Note that the use of the <FONT> tag is deprecated in HTML 4.0, and if you choose to add FONT markup to your pages they'll become much bigger, especially if they contain tables. This is because the HTML standard requires the FONT tag to continually be re-expressed to achieve the right appearance in all browsers (believe me, I only accepted this through bitter experience and grudgingly).
Major changes in version 3.2 include :-
- The program now always makes three passes through the document - previously it only did this if a contents list was requested. This may make the conversion a little slower. The middle pass calculates how the file will be split into sections, where all the hyperlinks should point to and what the contents list should be. This approach should be less error prone than previously.
- New "overview" options (see 'What to look for' Policies). These allow you to easily enable and disable the program's search for certain features.
- Introduction of in-line tagging (see in-line tags). These allow you to get more out of your conversion by inserting commands into your source text.
- Addition of DDE support (in Windows)
- New and improved command line options, and full command line support built into the Windows version
- Improved message filtering. Each message is now labelled according to its type (information, warning etc), and may be optionally suppressed or filtered by severity. A new /SILENT command qualifier allows complete suppression of messages.
- Improved log file capability
- Added support for mail and USENET headers
- (Limited) support added for stripping out page markers, converting "double spaced" files, and converting .prn and VT escape sequences. This functionality may be improved in later versions.
- New options to colour the odd and even rows of tables differently (see Table generation policies and 7.1.4)
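To show what colouring odd and even rows differently amounts to in the generated HTML, here is a short Python sketch. The function name and the default colours are placeholders chosen for the example, not the program's defaults.

```python
import html


def coloured_table(rows, odd_colour="#FFFFFF", even_colour="#E0E0E0"):
    """Build a borderless table whose rows alternate between two colours."""
    out = ["<TABLE BORDER=0>"]
    for number, row in enumerate(rows, start=1):
        colour = odd_colour if number % 2 else even_colour
        cells = "".join(
            "<TD>{}</TD>".format(html.escape(str(cell))) for cell in row
        )
        out.append('<TR BGCOLOR="{}">{}</TR>'.format(colour, cells))
    out.append("</TABLE>")
    return "\n".join(out)


print(coloured_table([["Jan", 10], ["Feb", 12], ["Mar", 9]]))
```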
Windows Version
- Added "Save" option to status dialog, so that the messages can be saved into a .log file
- Added DDE support to display results in existing browser window
- Full drag and drop support added. You can now drag files onto the program when it is visible.
- New "browse for directory" buttons added.
- More menu options added to make finding policies easier.
All versions
- Now support tab-delimited tables
- Support for stripping out mail and USENET headers
- New pre-processor directives :-
- BEGIN/END_DELIMITED_TABLE section delimiters
- BEGIN/END_IGNORE command
- CONTENTS_LIST command
- NAVIGATION_BAR command
- LINERULE command
- TOC command
- New and improved command line qualifiers
- New overview "look for" analysis policies :-
- Other new analysis policies :-
- New diagnostic policies :-
- "Monitor tag generation"
- Display messages policy and /SILENT qualifier
- Suppress INFO messages,
- Suppress TAG ERROR messages
- Suppress URL messages
- Suppress WARNING messages
- Suppress program ERROR messages
- Other new output policies :-
- Maximum level to show in contents
- Preserve underlining of headings
- Use <EM> and <STRONG> markup
- Colour data rows and related policies (see TABLE generation policies).
- Default TABLE cell alignment and TABLE_CELL_ALIGN directive
- Suppress all colour markup
- Open links in new browser window and new browser window name
- Break up long HTML lines
On the web site, and documentation
- A dedicated site www.jafsoft.com now deals with AscToHTM and related products.
- An updates page has been added to the Web site. This will list all the updates available for AscToHTM, although in most cases you'll need to be a registered user to receive details of how to obtain the update.
- An AscToHTM FAQ has been added to the web site. It's not finished yet (what part of the web is?), but it may help answer some of your questions.
- Created a new document called "The Policy manual". This replaces what was becoming the largest section of this document.
Windows version
- The Windows help file now has a better Index. It also has a full contents list as a topic, showing you the structure of the RTF file used to generate the Help file. Unfortunately I've been unable to hyperlink this topic.
- The Windows version now "remembers" which options page you were on so that each time you go back there the same sheet is shown.
- The Windows version is now "statically linked" against the necessary .DLLs. This makes the program slightly larger, but makes the download smaller as it is no longer necessary to ship .DLLs with the program. This makes overall version management simpler.
VMS version
- The VMS version now converts all filenames to lower case internally. This is so that all hyperlinks and references to the file are in lower case, making them more Internet-friendly and portable to other systems.
All versions
- Changes to the tagging to aid standards compliance and CSS support. This includes the addition of the </P> tag which was previously omitted. These changes have introduced slight differences in the amount of vertical white spacing produced in places.
- Improvements have been made to the file splitting algorithms. In particular
- The program will no longer generate two output pages with the same name. Where duplicate names are detected, the second file is given a generated name, usually by appending "_n" (n=1,2,3...) to the filename. All hyperlinks pointing to sections in the duplicate file will be adjusted accordingly.
- A file with underlined headings can now be split into pages at the heading boundaries. The subsequent pages have _U1, _U2... appended to the name of the first page.
- Local links (i.e. to anchors in the same file) are now recognised as such, and the filename is omitted.
This should make it easier to rename files after production without breaking local hyperlinks. Links to/from other files would still stop working though.
- Link names for underlined or capitalised headings that are more than 60 characters long are now truncated. They are given a link name derived from the first 30 characters of the section name with a unique identifier tagged on the end. This avoids long link names being split over two or more lines and becoming unusable.
- Allow relative links to subtract out filename (e.g. in contents list) when target is in same file
- Can now recognize URLs with commas in them, such as http://cgi.pathfinder.com/netly/opinion/0,1042,1692,00.html, in addition to comma-separated lists of URLs.
- The KEYWORDS, DESCRIPTION and TITLE pre-processor commands can now be multi-line. This allows long lists of keywords to be placed over several lines (each beginning with the command), making them easier to manage.
- The default name for the directory index file is now "dirindex.html" rather than "index.html" to prevent overwriting of any existing index file.
- Program now always does a "contents pass". Benefits of this are
- can now generate in situ contents lists /contents bars
- can now generate navigation bars wherever wanted
- can now eliminate duplicate filename generation
- can check hyperlink cross references are correct
- Improved table/diagram recognition
- Now support conversion of tab-delimited data into tables, provided it's placed inside BEGIN/END_DELIMITED_TABLE directives
- Relaxed indentation test on "n.n" headings. Heading can now be 2 characters to the left, or 1 character to the right of the expected position
- Now recognize use of asterisk and underscore combined to produce bold-italic emphasis. Previously only asterisk (bold) and underscore (italic) by themselves were recognised.
- Now recognize "]" as a possible "quoting" character.
- Now recognize '+' as an underlining character
- Improved error reporting when file errors occur. The program will now abort the conversion on error, instead of continuing and reporting errors for each line.
- Now detect read-only output directories and abort conversion. This would occur if you tried to convert a file on CD.
- Definitions now use <DL compact> offering a more-faithful rendition of the original text
- Underlined heading and text will now be rendered as underlined by default. Previously this either promoted the previous line to be a heading, or was drawn as a line.
- Improved handling of first line indents on paragraphs. Now these are preserved in the output by the inclusion of &nbsp; characters, and the error whereby the following line was deemed to be a different indentation (and thus acquire a <BLOCKQUOTE>) has been largely solved.
- Introduction of the TEXT in-line tag now allows numbers like "Windows 3.1" to be protected from conversion into a hyperlink to section 3.1.
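The duplicate page name handling described in the file splitting item above (appending "_n" to later duplicates) might look something like this Python sketch. It is a guess at the general idea rather than the program's algorithm; the function name is hypothetical, and treating names case-insensitively is an assumption.

```python
def unique_page_names(names):
    """Rename later duplicates by appending _1, _2, ... before the extension."""
    seen = set()
    result = []
    for name in names:
        candidate = name
        n = 0
        # Assumption: names differing only in case count as duplicates.
        while candidate.lower() in seen:
            n += 1
            stem, dot, ext = name.rpartition(".")
            if dot:
                candidate = "{}_{}.{}".format(stem, n, ext)
            else:
                candidate = "{}_{}".format(name, n)
        seen.add(candidate.lower())
        result.append(candidate)
    return result


print(unique_page_names(["intro.html", "notes.html", "intro.html"]))
```

A real implementation would also need to keep a map from old to new names so that hyperlinks into the renamed pages can be adjusted.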
There are a fair number of small changes in functionality over V2.3, together with a fair number of bug fixes and refined algorithms. A lot of development during this time was directed towards the production of a text-to-RTF converter (AscToRTF) using the same analysis engine. Consequently there are a lot of changes "under the bonnet".
The main functional change has been the revamp of the Windows User Interface. A new section (4.1.2) has been added to this document describing the Windows interface in some detail. The changes include :-
- the button bar is replaced by a proper Windows menu, allowing easier access to the program's functions.
- under the Help menu a link to the HTML documentation shipped with the software is now provided.
- the policy sheets are now "non-modal". This means you no longer have to dismiss them in order to do a conversion, you can leave them up whilst the conversion is going on, making it easier to go through the convert-change policy-convert cycle.
Windows Version
- Major re-structuring of the user interface
- Program's Help options now provide access to the online and offline versions of the HTML doco. A lot of people were downloading the software and then picking up a version of the doco, unaware that they already had it. Don't you people read README.TXT files or what? :-)
All Versions
- New Search for Definitions policy
- New TAB size policy
- New Expect sparse tables policy and TABLE_MAY_BE_SPARSE pre-processor command
- New Add <BR> to lines with URLs policy
- New Output file extension policy
- New Minimise HTML file size policy
- New Headings colour policy. Eventually I hope to add a whole suite of heading styling options, as these have been requested by a number of people.
- New Convert TABLE X-refs to links policy and TABLE_CONVERT_XREFS pre-processor command
- New CHANGE_POLICY pre-processor command
- New Error reporting level policy
- Improved Windows interface
- Empty lines in a table cell now get an extra &nbsp; added, in addition to the <BR>. This is to compensate for a bug in Internet Explorer 3 which would ignore the <BR> otherwise, leading to alignment errors.
- Now treat phrases with all the words connected by underscores, and with underscores at both ends as well, as underlined, e.g. _this_type_of_thing_
- Improved handling of tables with long URLs in them. Previously these would not be recognised as part of a table. Increased "long line" limit inside tables to 110 characters
- Improved error reporting/handling
- Report unrecognised pre-processor lines
- Report results of table analysis (e.g. if diagrams are detected)
- Report failure to find requested files
- Abort conversion if can't find requested policy file
- Improved detection of "mal-formed" tables. Previously this was over-cautious, especially on short tables.
- Now add a trailing "/" to www etc URLs if none present (e.g. www.jafsoft.com). This is a more correct URL, which should be accessed slightly more efficiently.
- Now recognise "....." underlining, although why people do this is beyond me :)
- Improved contents list detection in short documents with only level one headings, and documents with a chapter "0".
- Improved headings detection in small files. Made this less trigger happy.
- Improved code detection, and now add bold emphasis of C++ like comments inside a code section
- No longer allow "{" and "}" to be detected as probable bullet characters when code is expected
- I've produced (with help from antipodean friends) an icon for files converted by AscToHTM. It's called a2hlogo.gif. Feel free to use it should you wish on any pages created with AscToHTM.
An example piece of HTML code would be
<A HREF="http://www.jafsoft.com/asctohtm/?from=doco"> <IMG SRC="a2hlogo.jpg" WIDTH=100 HEIGHT=36 BORDER=0 ALT="Converted by AscToHTM"></A>
- With the introduction of the Add <BR> to lines with URLs policy this behaviour is no longer default. That is, if you do want <BR> added at the end of all lines containing URLs you will need to switch this behaviour on using the new policy.
- With the introduction of the Convert TABLE X-refs to links policy this behaviour is no longer default. That is, if you do want section links inside your tables, you will need to switch this behaviour on using the new policy.
- ".htm" files are now created with a lowercase extension, unless the Use DOS filenames policy is selected
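The trailing "/" rule mentioned in the list above can be sketched in Python as follows. This is an illustration under assumptions: the function name is hypothetical, and prefixing a bare "www..." address with "http://" is a choice made for the example rather than documented behaviour.

```python
from urllib.parse import urlsplit, urlunsplit


def add_trailing_slash(url):
    """Give a site-level URL an explicit "/" path if it has none."""
    if "://" not in url:
        url = "http://" + url  # assumption: bare www addresses are HTTP
    parts = urlsplit(url)
    if parts.path == "":
        parts = parts._replace(path="/")
    return urlunsplit(parts)


print(add_trailing_slash("www.jafsoft.com"))
```

URLs that already name a path or file are returned unchanged.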
Minor bug fixes and upgraded functionality over V2.2. The main functional changes have been
- The introduction of wildcard support to allow conversion of multiple files at once.
- (related to the above) the introduction of the Directory Page feature that allows the generation of a hyperlinked document spanning all the files in a directory.
- Major re-write of the contents-list generating routines. The program now makes a third, intermediate, pass through the document to analyse the contents structure. This means that contents lists are now placed at the top of the HTML file by default, rather than in a separate file as previously - though that behaviour is still supported if wanted. This approach is expected to pay further dividends in later releases.
Windows Version
- Added a "Perform simple conversion" tick box on the front panel. This does exactly the same as the Keep it simple policy.
- Improved the Headings dialog to allow headings policies to be more easily edited now.
- Pre-processor document sections now working.
All versions
- Wildcard support has been added (see Using Wildcards).
- Major re-writing of contents list generation has occurred. Includes new Use any existing contents list and Generate external contents file. More changes are expected here in later versions.
- New Directory Page feature. Supporting policies include:-
Make Directory
Directory filename
Show file titles in Directory
Indent headings in Directory
Directory title
Directory keywords
Directory description
Directory return hyperlink text
Directory Script file
Directory header file
Directory footer file
- New Minimum TABLE column separation policy and TABLE_MIN_COLUMN_SEPARATION pre-processor command to allow some tuning of table analysis.
- New Use first heading as title policy
- New Use first line as title policy
- New Recognised USENET groups policy
- New Automatic centring tolerance policy
- New Use <P> markup for paragraphs policy to allow choice of either <P> or <BR> markup to be used for paragraphs.
- New Default table width policy and TABLE_WIDTH pre-processor command to allow table widths to be specified as percentages
- New pre-processor command HTML_LINE
- Reinstated some of the "error" messages removed in the last version, to do with section numbering. This should make it more visible when the section heading analysis goes wrong.
- Added error reporting to file open. You should now get an error message if the program fails to find/open a file somewhere.
- Now support headings down to 5 levels (previously this was 4). Note, if you only have a couple at this level, the program may still ignore them as statistically insignificant.
- Removed certain policies (such as "generate policy file") from the output when generating a full policy file. This is because, when they were read back in, they could cause problems.
- The "Include document section" policy is now renamed to "Include document section(s)" reflecting the fact that you can now enter multiple values on one line, rather than requiring multiple lines with one value each as previously.
- Major re-structuring and additions to HTML markup produced to make the section more coherent and up to date. Some of the sections marked as new in this version are simply the documentation catching up on the features added in earlier releases.
Sometimes I just work too hard :^)
First major release after V2.0 (when AscToHTM first went fully-Windowed). Major change this time has been the introduction of TABLE generating algorithms. These were first made available as a separate freeware utility AscToTab.
This version was reviewed by ZDNet and awarded 5 stars, their highest award.
Table generation
This is the biggest change in this version. AscToHTM now incorporates the technology first introduced in AscToTab. To support this the detection of pre-formatted text has been improved, new policies added, and new pre-processor commands added.
New policies include :-
Attempt TABLE generation
Default TABLE border size
Default TABLE header rows
Default TABLE header cols
Default TABLE cell spacing
Default TABLE cell padding
Default TABLE colour
Default TABLE border colour
Default TABLE caption
New Pre-processor commands include :-
BEGIN/END_CODE
BEGIN/END_DIAGRAM
BEGIN/END_TABLE
TABLE_BORDER
TABLE_BORDERCOLOR
TABLE_BGCOLOR
TABLE_CAPTION
TABLE_CELLSPACING
TABLE_CELLPADDING
TABLE_HEADER_ROWS
TABLE_HEADER_COLS
Other changes
- Added pre-processor BEGIN/END_CODE commands to allow sections of code samples to be identified and distinguished from tables
- Added pre-processor BEGIN/END_DIAGRAM commands to allow diagrams and sections of ASCII art to be identified and distinguished from tables
Documentation
- Added the "Policy Dictionary" (since superseded by the Policy manual), and renumbered the document accordingly.
All versions
- "tables/pre-formatted text"
- Various improvements to detecting the start and end of pre-formatted regions of text.
- Shareware now expires after 30 days, rather than after a fixed date.
- Headings policies have been revised. Still more work to be done in this area.
- Slight improvement in detection of centred text. Still