


Written By Fintan Culwin



Compacter

Rem statements, variable names, spaces and lines waste space in your programs. Fintan Culwin piles on the pressure

The program presented in this article contains four methods of saving space. First, it removes Rem statements; second, it renames all variables and reduces function names to optimised two-character codes - this procedure is known as re-variable; third, it removes all spaces; and fourth, it packs up lines.

As compacting is similar to using a compiler, I will borrow compiler terminology for the rest of this article. The programs that do the compacting I will call the compactor. The program to be compacted will be called the source program and the compacted program produced will be called the object code. Where a variable name is discussed it also means string, floating-point and integer names and arrays. Where a procedure name is referred to it applies to procedures and functions equally.

The main program is given in listing 1; it requires the machine-code routine produced by listing 2 to be loaded into the machine before the line-pack section is called. The code can be loaded into various places in memory. The most useful place is below HIMEM for Mode 7, but it can be relocated by changing the value of P% in listing 2; this is catered for in the main program's initialisation section.

The most suitable source files for the compactor will be those which require large amounts of screen memory. The compactor program itself occupies about 11K in source form and around 6K once it has itself been compacted. In its compacted form it should run easily in 16K.

The procedure is first to load the source program. Then reset PAGE above it by typing:

PAGE=PAGE+256

then Load and Run the compactor program. The compactor asks if the machine-code routine needs to be loaded and, if so, asks where it is to be loaded and then *LOADs it. If the source file does not extend beyond &4000 there should be enough space for the compactor program to run. If there is not enough space, then there are two possibilities.

Firstly, the source program can be loaded from a lower address: PAGE can be reset downward before loading the source program. It is important to remember that &0D00 is not used; &0C00 holds the user-defined graphics; &0B00 holds the user key definitions and &0900 is the RS423 buffer.

To accommodate this the compactor program prompts for the start address of the Basic program to be input. If this is still not enough for your source program, the compactor itself can be split up. Each of the major sections is complete in itself and draws on some of the utility functions included in the utilities section. This is made clear in the program listing.

After the compactor program has been run, it is wise to renumber the file before saving it as a normal Basic program. The object file is virtually unreadable and definitely uneditable so a copy of the source file should be retained for any future development or maintenance.

In order for a program file to be successfully compacted it has to be prepared with the compactor in mind. The rules are:

  1. No computed GOSUBs or GOTOs.
  2. No variable names of two characters - three characters within the assembler - not including the terminal % or $.
  3. No two-character variable or procedure names.
  4. No use of variable names that are identical with assembler mnemonics, such as LDA, STA and EOR.
  5. A space in the assembler after every mnemonic, including those that do not require an argument: NOP, ASL, CLC and so on.
  6. Variables cannot be used in any * commands.

If the assembler is not being used then point 4 can be safely ignored.
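The naming rules can be checked mechanically before a program is handed to the compactor. The following is a minimal Python sketch, not part of the published listings: the function name `name_problems`, the `uses_assembler` flag and the short mnemonic list are all assumptions made for illustration, and the three-character rule is interpreted here as applying whenever the program uses the assembler.

```python
# Partial, illustrative list: only the mnemonics named in the article.
MNEMONICS = {"LDA", "STA", "EOR", "NOP", "ASL", "CLC"}


def name_problems(name, uses_assembler=False):
    """Return a list of rule violations for one variable or procedure name."""
    core = name.rstrip("%$")  # the terminal % or $ does not count
    problems = []
    if len(core) == 2:
        problems.append("two-character name")
    if uses_assembler:
        if len(core) == 3:
            problems.append("three-character name with the assembler in use")
        if core.upper() in MNEMONICS:
            problems.append("same as an assembler mnemonic")
    return problems


print(name_problems("AB%"))                       # a two-character name
print(name_problems("score%"))                    # acceptable
print(name_problems("LDA", uses_assembler=True))  # clashes with a mnemonic
```

A full check would need the complete 6502 mnemonic set and would have to find the names in the tokenised program first; this sketch only covers the test applied to each name.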

It is necessary to explain how the Basic interpreter stores the program and organises its variables. Although the program is typed in and displayed as a sequence of ASCII characters, it is stored within the machine in a shorter form.

To achieve this, each Basic keyword is replaced by one or two tokens. These tokens have values greater than 123 (&7B) in order not to be confused with the other alphanumeric parts of the file. Each line of the Basic program is prefaced by four bytes.

The first of these is an end-of-line delineator (&0D). The following two bytes are the line number organised as two parts, high part and low part, to the base 256. That is, the line number in decimal is 256 times the high part plus the low part. The last of the four bytes is the line length in bytes, including the four-byte overhead, and has a maximum value of &EF (239).
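The four-byte line header can be explored away from the machine with a short Python sketch. It is only a model of the layout described above, working on a byte string rather than on memory; the names `walk_lines` and `make_line` are hypothetical, and the end-of-program test (&0D followed by &FF) is an assumption about the file format rather than something taken from the listings.

```python
def make_line(number, body):
    """Build one stored line: &0D, high byte, low byte, length, then the body."""
    return bytes([0x0D, number >> 8, number & 0xFF, len(body) + 4]) + body


def walk_lines(image):
    """Yield (line_number, body) for each line of a tokenised program image."""
    pos = 0
    while True:
        if image[pos] != 0x0D:
            raise ValueError(f"expected &0D at offset {pos}")
        high = image[pos + 1]
        if high == 0xFF:  # assumed end-of-program marker
            return
        low = image[pos + 2]
        length = image[pos + 3]  # includes the four-byte overhead
        yield high * 256 + low, image[pos + 4:pos + length]
        pos += length


program = make_line(10, b"A%=1") + make_line(300, b"B%=2") + b"\x0d\xff"
for number, body in walk_lines(program):
    print(number, body)
```

Because the length byte includes the header, adding it to the current position always lands on the next line's &0D, which is how every section of a compactor can step from line to line.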

There are a few other points worth noting. The way in which line numbers are referenced is not at all obvious. Referenced line numbers are the line numbers used in GOTO and GOSUB commands. These numbers are stored as a sequence of four bytes.

The first of these bytes is a token marker having the value &8D (141).

The following three bytes are the line number itself, coded from two into three bytes. Acorn gives two reasons for this. Firstly, the coding avoids any confusion between line codes and tokens. Secondly, the coding allows for a rapid renumbering algorithm to be used. The decoding algorithm is:

Assembler        Basic
LDA BYTE1        TEMP%=?BYTE1%
ASL A            TEMP%=TEMP%*4
ASL A
STA TEMP
AND #&C0         FACTOR%=TEMP% AND &C0
EOR BYTE2        LOW%=FACTOR% EOR ?BYTE2%
STA LOW
LDA TEMP         TEMP%=TEMP%*4
ASL A
ASL A
EOR BYTE3        HIGH%=(TEMP% AND &C0) EOR ?BYTE3%
STA HIGH         LINE%=256*HIGH%+LOW%

where bytes 1, 2 and 3 are the three locations following the &8D token.

The method by which the variables are stored is also worth considering when trying to minimise the execution time of a Basic program.
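The line-number decoding given above can be checked with a short Python sketch. `decode_line_ref` follows the assembler column step by step, masking to eight bits to imitate the accumulator. The article does not give the encoder; `encode_line_ref` is derived here as the inverse of the decoder purely so that the round trip can be tested, and both function names are hypothetical.

```python
def decode_line_ref(byte1, byte2, byte3):
    """Recover a line number from the three bytes that follow the token."""
    temp = (byte1 << 2) & 0xFF   # ASL A : ASL A : STA TEMP
    low = (temp & 0xC0) ^ byte2  # AND #&C0 : EOR BYTE2 : STA LOW
    temp = (temp << 2) & 0xFF    # LDA TEMP : ASL A : ASL A
    high = temp ^ byte3          # EOR BYTE3 : STA HIGH
    return high * 256 + low


def encode_line_ref(number):
    """Inverse of the decoder (derived for this sketch, not from the article)."""
    high, low = number >> 8, number & 0xFF
    byte1 = (((low & 0xC0) >> 2) | ((high & 0xC0) >> 4)) ^ 0x54
    byte2 = (low & 0x3F) | 0x40
    byte3 = (high & 0x3F) | 0x40
    return byte1, byte2, byte3


coded = encode_line_ref(1000)
print([hex(b) for b in coded], decode_line_ref(*coded))
```

All three coded bytes stay in the range &40 to &7F, which is why they can never be mistaken for a token or for the quote character while a program is being scanned.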

The resident integer variables are always stored in fixed locations, four bytes each, from &0400 (@%) to &0468 (Z%). Other variables are identified by using their initial character as a pointer to an entry address lying in the range &0480 to &04F5. Each of these entry points indicates the location of the first variable having that initial letter, where the text of the variable name and its value are held.

That entry also contains a pointer to the next variable with the same initial letter. To look up the value of a variable, the interpreter uses the initial character to find the first name, attempts to match the names and carries on down the list until the variable is matched, or the end of the list is encountered.
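The cost of this look-up can be illustrated with a small Python model. It is a sketch of the idea only, not the interpreter's actual memory layout: the class names are hypothetical, a dictionary stands in for the table of entry addresses, and new variables are assumed to be added at the end of their list.

```python
class VarRecord:
    """One variable: its name, its value and a link to the next record."""

    def __init__(self, name, value):
        self.name = name
        self.value = value
        self.next = None


class VariableTable:
    def __init__(self):
        self.heads = {}  # initial character -> first record with that initial

    def assign(self, name, value):
        record = self.heads.get(name[0])
        if record is None:
            self.heads[name[0]] = VarRecord(name, value)
            return
        while True:
            if record.name == name:
                record.value = value
                return
            if record.next is None:
                record.next = VarRecord(name, value)  # append to the list
                return
            record = record.next

    def lookup(self, name):
        """Return (value, number of name comparisons made)."""
        steps = 0
        record = self.heads.get(name[0])
        while record is not None:
            steps += 1
            if record.name == name:
                return record.value, steps
            record = record.next
        raise KeyError(name)


table = VariableTable()
for name, value in [("score", 1), ("speed", 2), ("total", 3)]:
    table.assign(name, value)
print(table.lookup("speed"))  # second in the "s" list
print(table.lookup("total"))  # first in the "t" list
```

In this model a variable that shares its initial letter with many others takes more comparisons to find, and each comparison is cheaper when the names are short.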

The program commences its run by asking if the machine-code routine is installed and, if not, where to load it. If the source file does not occupy space below &E00, then it is probably wisest to load it into page &0D00, where it is safe against an accidental mode change or hard reset. If this is not possible, then it can be loaded below HIMEM for mode 7, but it will be lost if a change of mode or a hard reset is made. The program then asks if you wish to use all the options. If you do not, then all the sections are presented separately.

The first of these is the de-Rem option, which merely removes Rem statements where they occur. But if the debug option is chosen and the first word after the REM is "debug", it will remove the whole line. This is useful for the development of programs where some sections or lines are left in for debug purposes only. It is followed by a down-copy option which leaves one space only between statements.

The re-variable option, which renames all variables and procedures which are above the minimal length, follows. The down-packing option which follows it does not allow any spaces to be left in the program. If the line-packing option is not chosen following this, then:

PROC-DOWN-COPY (FINISH%)

should be entered from the keyboard after the program has finished, if the assembler is involved in the source program. The final option presented is to pack lines together. If this option is chosen then the machine-code routine must be installed in the computer.

The first of the working sections is de-REM - Option%. The option is either to debug or de-REM as already explained. The section proceeds by initialising a local variable, Address, to the Start address and then stepping through the whole of the source file in two repeat-until loops. The inner loop steps through each line; the outer loop terminates when the end-of-file marker - &0D followed by &FF - is found.

Within each line the address is incremented, skipping three positions if a reference line-number token (&8D) is found, and to the end of quotes if a quote symbol (&22, ASCII 34) is found. If the Rem token (&F4) is detected then, depending on the option, either the rest of the line or the whole line is replaced with spaces. This is done by:

FN-REM-CRUNCH

which uses FN-Get-String to examine the first word of the REM statement. If the debug option is chosen and the first word is Debug, then FN-Start-Line is followed by FN-End-Line; otherwise FN-End-Line is called directly.
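The stepping logic just described can be sketched in Python for a single line body. This is a model under stated assumptions, not a transcription of listing 1: the function name `de_rem` is hypothetical, it works on a byte string rather than on memory, and the line-number token is taken to be &8D.

```python
REM_TOKEN = 0xF4
LINE_REF_TOKEN = 0x8D  # assumed value of the line-number reference token
QUOTE = 0x22


def de_rem(body, debug_option=False):
    """Blank out a REM statement, or the whole line for REM debug."""
    out = bytearray(body)
    i = 0
    while i < len(out):
        byte = out[i]
        if byte == LINE_REF_TOKEN:
            i += 4  # the token plus its three coded bytes
        elif byte == QUOTE:
            close = out.find(QUOTE, i + 1)
            i = len(out) if close < 0 else close + 1  # skip the string
        elif byte == REM_TOKEN:
            words = bytes(out[i + 1:]).split()
            if debug_option and words and words[0].lower() == b"debug":
                return b" " * len(out)  # remove the whole line
            out[i:] = b" " * (len(out) - i)  # remove the rest of the line
            break
        else:
            i += 1
    return bytes(out)


line = b'\xf1"HELLO":\xf4 greeting'
print(de_rem(line))
```

Replacing the text with spaces rather than deleting it keeps every line the same length, so the line-length bytes stay valid until a later down-copy pass squeezes the spaces out.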

The routine also contains a switch called Assembler% which is turned on or off by the occurrence of the assembler markers. If the switch is on, then the assembler comment delineator is acted on in the same way; but the blanking-out of lines can finish when a multi-line delineator is found.

This section is followed by the re-variable section which renames all variables. Its stepping routine is largely identical to that of de-Rem. The major differences are that lines beginning with a * are left intact and that hex numbers are skipped over, as the system cannot distinguish between the variable ABCD and the hex number &ABCD.

The assembler delineators are also used to change the value of the variable string-length, which is used to decide if an encountered variable is long enough to be replaced. The main action routine

FN-ONE-VAR

is called when a valid start character is encountered. One-Var firstly attempts to identify the type of variable/name by looking backwards for the FN or PROC token (&A4 and &F2). If these are found, then the Type attribute can be set. After the string has been extracted, then the new string is produced by

FN-MAKE-STRING

The new string is produced from a number held in the array string-array%(2): element 0 is for functions, 1 for procedures and 2 for variables.
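One way of turning such a counter into a two-character name is sketched below in Python. The article does not describe the scheme used by FN-MAKE-STRING, so the alphabet and ordering here are assumptions, as are the names `make_string` and `rename`; lower-case letters are chosen only so that a generated name cannot coincide with a Basic keyword.

```python
import string

FIRST = string.ascii_lowercase                   # first character: a letter
SECOND = string.ascii_lowercase + string.digits  # second: letter or digit
FUNCTION, PROCEDURE, VARIABLE = 0, 1, 2


def make_string(counters, kind):
    """Return the next unused two-character name for the given kind."""
    n = counters[kind]
    if n >= len(FIRST) * len(SECOND):
        raise ValueError("ran out of two-character names")
    counters[kind] = n + 1
    return FIRST[n // len(SECOND)] + SECOND[n % len(SECOND)]


def rename(name, kind, counters, seen):
    """Give each distinct source name one replacement and reuse it."""
    key = (kind, name)
    if key not in seen:
        seen[key] = make_string(counters, kind)
    return seen[key]


counters, seen = [0, 0, 0], {}
for name in ["score", "lives", "score"]:
    print(name, "->", rename(name, VARIABLE, counters, seen))
```

Keeping a separate counter for each kind mirrors the three elements of string-array%(2): a function, a procedure and a variable can share the same short name because the FN or PROC token distinguishes them.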

Within the assembler two other considerations apply. Firstly, the interpreter stores op-codes as three ASCII characters, not as a token. To avoid these being re-variabled, the minimum length of variable which will trigger PROC-Replace is increased from three to four within the assembler. Any three-character variable outside the assembler that matches an op-code will cause the op-codes to be re-variabled, with disastrous consequences. Accordingly, variables such as LDA, ASL, etc, should not be used if the assembler is being used. Secondly, a space must separate the code from the address in assembler, to avoid the compactor recognising it as a variable. To prevent this space from being removed by the line-pack routine it is replaced by CHR$ 0 in re-variable and changed back in down-copy (Finish%).

The system does not discriminate between codes which require an address and those which do not - so a space must follow all codes.

The line-packing routine works by replacing the four-byte line delineator with a colon and three spaces. Lines which start with an asterisk have to be left alone in their entirety. Lines which include an IF or REM statement have to be the last old line packed on to the end of the new line. Any line which starts with a DEF statement or which is referenced by a GOSUB or GOTO has to be put at the start of a new line.
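The packing rules can be tried out with a Python sketch that works on plain text lines rather than on tokenised memory. It is an illustration of the rules only, not the machine-code routine of listing 2: the function name `pack_lines` is hypothetical, the keyword tests are crude substring checks, and the 239-byte limit on a packed line is an assumption carried over from the maximum line length.

```python
MAX_LEN = 239  # assumed limit, including the four-byte line overhead


def pack_lines(lines, referenced):
    """Pack (number, text) lines; `referenced` holds GOTO/GOSUB targets."""
    packed = []
    open_line = False  # may the last packed line accept another statement?
    for number, text in lines:
        starts_new = (text.startswith("*") or text.startswith("DEF")
                      or number in referenced)
        fits = bool(packed) and len(packed[-1][1]) + 4 + len(text) + 4 <= MAX_LEN
        if open_line and not starts_new and fits:
            last_number, last_text = packed[-1]
            # the four-byte delineator becomes a colon and three spaces
            packed[-1] = (last_number, last_text + ":   " + text)
        else:
            packed.append((number, text))
        # a * line, or a line holding IF or REM, closes the packed line
        open_line = not (text.startswith("*") or "IF" in text or "REM" in text)
    return packed


source = [(10, "A%=1"), (20, "B%=2"), (30, "IF A%=1 PRINT B%"),
          (40, "C%=3"), (50, "*FX 4,1"), (60, "D%=4"), (70, "E%=5"),
          (80, "DEF PROCx"), (90, "ENDPROC")]
for number, text in pack_lines(source, referenced={70}):
    print(number, text)
```

Swapping the four-byte delineator for four characters keeps the program the same size during packing; the three spaces are then left for a down-copy pass to remove.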