Chapter 13 : ARM Assembler
Although this book is not concerned with programming in ARM assembler as such, it is appropriate to look at how assembler may be incorporated into BASIC programs, and how assembled routines may be called. BBC BASIC has always included a full assembler as part of its facilities, so we shall concentrate on the changes that have come about through the use of ARM assembler (instead of the 6502 assembler of earlier BBC Micros), and its incorporation into BASIC V.
Embedding ARM assembly language code in a BASIC program follows the same pattern as before. An area of memory must be defined for the assembled code using the DIM statement. The assembly code is enclosed in square brackets, normally in a FOR...NEXT loop which makes two passes through the code. The typical outline of this is shown below:
DIM code_area% 400
FOR pass%=0 TO 3 STEP 3
P%=code_area%
[
OPT pass%
]
NEXT pass%
In addition to the use of P% to act as a program counter during assembly, the variable O% can be used so that the code is assembled to run from an address different to that at which the code is being assembled. In that case, P% is set to the address that the code will be run from, and O% is set to the address at which the code will be (temporarily) assembled. This all follows exactly the same format as in previous versions of BBC BASIC.
If the machine code routine is to be called from BASIC, and a return made back to BASIC, then the address stored in register R14 should always be used to return to BASIC. Typically this would be effected by specifying MOV PC,R14 as the last executable instruction.
BASIC V also provides a number of assembler directives, but because we are now dealing with ARM rather than 6502 assembler, there are some differences compared with those used previously. Their functions are broadly the same as before: they provide the facility to insert specified values, characters and strings as part of the assembled code. The directives now available are as follows:
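As a minimal sketch of assembling with both P% and O%, the listing below assembles a short routine into a buffer while its labels are calculated for a different run address. The names buffer% and run_at%, and the address &9000, are hypothetical and chosen only for illustration. Bit 2 of the OPT value selects this offset mode, so the two passes use 4 and 6 rather than 0 and 2.

```
DIM buffer% 100
run_at%=&9000 : REM hypothetical address the code will eventually run from
FOR pass%=4 TO 6 STEP 2
P%=run_at% : REM labels and branches are calculated from P%
O%=buffer% : REM the bytes themselves are stored from O%
[
OPT pass%
.entry
ADR R0,value   ; position-independent, so still correct at run_at%
LDR R0,[R0]
MOV PC,R14
.value
EQUD 99
]
NEXT pass%
PRINT ~entry, ~buffer%
```

The code is not called here, because it has been assembled to run at run_at% and not at buffer%. It would first have to be copied or saved and reloaded at that address.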
EQUB <int> | Define one byte of memory from the LSB (least significant byte) of the integer value 'int'. |
EQUW <int> | Define two bytes of memory from 'int' |
EQUD <int> | Define four bytes of memory from 'int' |
EQUS <string> | Load between 0 and 256 bytes of memory with the given string. |
ALIGN | Align the values of P% (and O%) to the next word boundary. |
ADR <reg>,<addr> | Assemble an instruction to load the specified address into the given register (R0 to R15) in position independent format. |
The first four directives have alternative mnemonics (DCB, DCW, DCD, and DCS respectively) which may be used in place of those given above. These directives are very similar to their 6502 equivalents. The values or strings specified may be expressions. The ALIGN directive is used, after EQUS for example, to ensure that the next instruction starts on a word boundary.
The Archimedes is a 32-bit (four-byte) machine, compared to the 8 bits of the 6502. This matters here because every ARM instruction occupies one four-byte word and must start on a word boundary, which is why ALIGN is needed after data of arbitrary length.
The ADR directive looks very much like an ARM assembler mnemonic, but it is not one. Its purpose is to generate an assembled instruction as part of the resulting machine code program which, when executed, will cause the address specified in the directive to be loaded into a particular register. The address is in position independent format, which means that it is specified as an offset from the current value of the PC (program counter). Thus:
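The short listing below is an illustrative sketch of the data directives in use. The labels message and number are invented for the example. A string is followed by a carriage return so that BASIC's $ operator can read it back, ALIGN then moves P% up to a word boundary, and a four-byte value follows.

```
DIM code% 100
FOR pass%=0 TO 2 STEP 2
P%=code%
[
OPT pass%
.message
EQUS "Hello"   ; five bytes of text
EQUB 13        ; carriage return, so that $message can read the string
ALIGN          ; pad P% up to the next word boundary
.number
EQUD 42        ; one four-byte word
]
NEXT pass%
PRINT $message
PRINT !number
```

Without the ALIGN, the label number would fall on an odd boundary after the six bytes of string data, and any instruction placed there would not be on a word boundary.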
ADR R0,data
will load the address at which 'data' is assembled into R0, whereas:
LDR R0,data
will load the value at address 'data' into R0.
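To make the difference concrete, here is a small sketch with two routines that differ only in using ADR or LDR. The label names get_addr, get_value and data are assumptions made for the example. Each routine returns whatever it leaves in R0 through USR.

```
DIM code% 100
FOR pass%=0 TO 2 STEP 2
P%=code%
[
OPT pass%
.get_addr
ADR R0,data    ; R0 = the address of the label 'data'
MOV PC,R14
.get_value
LDR R0,data    ; R0 = the word stored at 'data'
MOV PC,R14
.data
EQUD 1234
]
NEXT pass%
PRINT ~USR(get_addr), ~data
PRINT USR(get_value)
```

The first PRINT shows the same address twice in hexadecimal, once as returned by the routine and once as the value of the BASIC variable data. The second shows the contents of that location.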
All of the above information is covered in some detail in Part Two of the Programmer's Reference Manual. Another useful reference is 'Archimedes Assembly Language' by Mike Ginns, published by Dabs Press.
Calling Machine Code Routines
BASIC programs can call machine code routines by using USR or CALL, much as with earlier versions of BBC BASIC. USR is by far the simpler of the two. It is effectively a function whose only parameter is the start address of the machine code routine being called, and which returns the value left in the first register, R0. No parameters other than the address of the routine itself are allowed.
The CALL statement is more comprehensive, allowing parameters as well as the start address to be specified. The parameters are accessed through a parameter block which is set up by BASIC before the routine is entered. On entry to the machine code routine, the registers are set up as follows:
Register | Set up as |
R0-R7 | Values of built-in integers A% - H% |
R8 | Pointer to BASIC's workspace (start address referred to as ARGP) |
R9 | Pointer to a list of L-values of the parameters |
R10 | Number of parameters |
R11 | Pointer to BASIC's string accumulator |
R12 | Pointer to current BASIC statement |
R13 | Pointer to BASIC's stack |
R14 | Return address for BASIC |
On the Archimedes, up to eight values may be passed in the first eight registers (R0 to R7) by assigning them to the integer variables A% to H% before the call. This applies equally to routines called with USR and with CALL, so a routine which is CALLed has two ways of receiving values: through A% to H%, and through the parameters that follow CALL.
Variables that are specified as parameters following CALL can be used both to supply values to the machine code routine, and as a means for the routine to return values. Note too, that if the routine called were to change the values of any variables passed as parameters, then these will retain their new values on a return to BASIC.
Writing machine code routines is a book in itself, and unravelling the parameters and other information can be quite a complex and daunting task. The following sections deal with what is involved, but unless you are familiar with parameter passing, on the 6502-based BBC Micro for example, you may well wish to skip the rest of this chapter.
For a simple approach, use A% to H% to pass values to a routine. To return values, use indirection operators to access memory locations defined within the assembler routine to which the results have been assigned. The following, very simple, example adds together the contents of R0 and R1 (A% and B% in the calling program), and leaves the result at the address 'result'.
.start
ADD R2,R0,R1
STR R2,result
MOV PC,R14
.result
EQUD 0
If this were embedded in a BASIC program, and assembled as explained previously, it could then be called as in the following example:
A%=123
B%=456
CALL start
PRINT !result
Several results could be left in a table starting at 'result', and then accessed as:
PRINT !result
PRINT result!4
PRINT result!8
and so on. Of course, if a routine is only going to return a single value, it may be best to assign this to R0 and use the USR function which returns the contents of this register. The routine above could have been written:
.start
ADD R0,R0,R1
MOV PC,R14
and called by typing:
A%=123
B%=456
PRINT USR(start)
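For reference, the two approaches can be combined into one complete program, sketched below. The labels sum and sum_store are invented names standing in for the two versions of 'start'. The assembly loop follows the outline given at the beginning of the chapter.

```
DIM code% 100
FOR pass%=0 TO 2 STEP 2
P%=code%
[
OPT pass%
.sum
ADD R0,R0,R1    ; R0 = A% + B%, returned to BASIC by USR
MOV PC,R14
.sum_store
ADD R2,R0,R1    ; R2 = A% + B%
STR R2,result   ; leave the answer in memory for BASIC to read
MOV PC,R14
.result
EQUD 0
]
NEXT pass%
A%=123
B%=456
PRINT USR(sum)
CALL sum_store
PRINT !result
```

Both PRINT statements display 579, the first from the value left in R0 and the second from the word stored at 'result'.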
Using CALL with Parameter Passing
We will now examine the details of parameter passing using the CALL statement. This is also covered in the User Guide, under the keyword CALL, in some detail, but this manual gives virtually no other information on ARM assembler at all. You will need to refer to the Programmer's Reference Manual, or to any of the books on this subject, for more detailed information and guidance.
Refer once again to the table showing the settings of the registers on entry. The L-value of a variable is the address in memory at which the value is stored. R9 is a pointer to a table, held in reverse order, consisting of two words for each parameter passed, its L-value and its type. The range of possible types is listed below.
BASIC | Type | Address points to | Comment |
?name | 0 | byte-aligned byte | byte value |
!name | 4 | byte-aligned word | four-byte integer |
name% | 4 | word-aligned word | four-byte integer |
name%(n) | 4 | word-aligned word | integer array element |
|name | 5 | byte-aligned 5 bytes | five-byte real |
name | 5 | byte-aligned 5 bytes | five-byte real |
name(n) | 5 | byte-aligned 5 bytes | real array element |
name$ | 128 | byte-aligned 5 bytes | address (4 bytes) and length (1 byte) of string |
name$(n) | 128 | byte-aligned 5 bytes | address (4 bytes) and length (1 byte) of string array element |
$name | 129 | byte-aligned bytes | string terminated by ASCII 13 |
name%() | 256+4 | word-aligned word | pointer to integer array |
name() | 256+5 | word-aligned word | pointer to real array |
name$() | 256+128 | word-aligned word | pointer to string array |
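As a hedged sketch of how the parameter table might be used, the routine below adds one to a single integer variable passed after CALL. It assumes that within each two-word entry the L-value comes first and the type second, as the order of the description suggests. The label increment and the variable n% are invented for the example. The routine returns without changing anything unless exactly one parameter of type 4 was supplied.

```
DIM code% 100
FOR pass%=0 TO 2 STEP 2
P%=code%
[
OPT pass%
.increment
CMP R10,#1       ; exactly one parameter?
MOVNE PC,R14     ; if not, return without doing anything
LDR R2,[R9,#4]   ; second word of the entry: assumed to be the type
CMP R2,#4        ; type 4 is a four-byte integer
MOVNE PC,R14
LDR R0,[R9]      ; first word of the entry: the L-value (address)
LDR R1,[R0]      ; fetch the integer stored there
ADD R1,R1,#1
STR R1,[R0]      ; store it back into the BASIC variable
MOV PC,R14
]
NEXT pass%
n%=41
CALL increment, n%
PRINT n%
```

Because n% is passed by its address, the change made by the routine is visible in BASIC after the return. The word loads used here are only suitable for word-aligned types such as name%; a parameter written as !name may be byte-aligned and would need to be read a byte at a time.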
For each variable passed as a parameter, the table shows the type number and the information pointed to by the associated address. Any of the data types recognised by BASIC, from a single byte to a whole array, may be passed as a parameter. For whole arrays, the address provides a pointer to a list of all the values needed to access the array from within the called machine code routine. If the array has not previously been dimensioned, the pointer (word-aligned word) contains zero. Otherwise, the list of words pointed to by the word pointer is as follows:
word+0 first dimension limit (+1)
word+4 second dimension limit (+1)
........
word+n 0
word+n+4 total number of entries in array
word+n+8 the zero element of the array
The dimension limits are the limits to which the array was dimensioned in a DIM statement, with 1 added in each case. The list of limits is terminated by a zero word. The following two words contain the total number of possible entries in the array, and the value of the first (position 0) element.
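The following is a sketch only, assuming the layout just described and a parameter entry in which the L-value is the first word. It steps over the list of dimension limits of a whole integer array and stores the total number of entries at 'size'. The names array_size, skip_limits, size and table%() are hypothetical, and the routine makes no check for an undimensioned array.

```
DIM code% 200
FOR pass%=0 TO 2 STEP 2
P%=code%
[
OPT pass%
.array_size
LDR R0,[R9]       ; L-value: address of the word that points to the list
LDR R0,[R0]       ; pointer to the first dimension limit
.skip_limits
LDR R1,[R0],#4    ; read a limit, then step on to the next word
CMP R1,#0
BNE skip_limits   ; the list of limits ends with a zero word
LDR R1,[R0]       ; next word: total number of entries in the array
STR R1,size
MOV PC,R14
.size
EQUD 0
]
NEXT pass%
DIM table%(9)
CALL array_size, table%()
PRINT !size
```

After the loop, R0 points at the word holding the total, and the word after that is the zero element of the array, so the same pointer could be used to start reading the array's contents.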
Register R14 contains the address of a branch instruction to return to BASIC. The last instruction in a machine code routine, MOV PC,R14, copies that address to the PC, which causes the branch instruction to be executed to effect the return. The words following the address held in R14 contain various values relating to the current BASIC program as follows. Each word is an offset from ARGP whose location is stored in R8.
Address | Contents |
R14 | return address to BASIC |
R14+4 | string accumulator |
R14+8 | current value of PAGE |
R14+12 | current value of TOP |
R14+16 | current value of LOMEM (start of BASIC variable table) |
R14+20 | current value of HIMEM (BASIC end of stack) |
R14+24 | limit of available memory (MEMLIMIT) |
R14+28 | start of free space |
R14+32 | current value of COUNT |
R14+36 | not used |
R14+40 | exception flag (contains byte-aligned bytes) |
R14+44 | current value of WIDTH |
Locations from R14+48 onwards contain the addresses of a set of internal routines as follows.
VARIND | returns the value and type of a parameter |
STOREA | converts between formats, eg, integer and real |
STSTORE | store a string |
LVBLNK | obtain address and type of variable for VARIND |
CREATE | create a variable if LVBLNK fails |
EXPR | evaluates an expression supplied as a string |
MATCH | analyse source string lexically |
TOKENADDR | returns a string corresponding to a given token |
We have been getting into progressively deeper waters here, and are now well and truly into the realms of the machine code programmer. We will therefore call a halt, and refer the reader to the Programmer's Reference Manual and other books on this subject for more information.