www.riscos.com Technical Support
Assembly language is a programming language in which each statement translates directly into a single machine code instruction or piece of data. An assembler is a piece of software which converts these statements into their machine code counterparts.
Writing in assembly language has its disadvantages. The code is more verbose than the equivalent high-level language statements, more difficult to understand and therefore harder to debug. High-level languages were invented so that programs could be written to look more like English so we could talk to computers in our language rather than directly in their own.
There are two reasons why, in certain circumstances, assembly language is used in preference to high-level languages. The first reason is that the machine code program produced by it executes more quickly than its high-level counterparts, particularly those in languages such as BASIC which are interpreted. The second reason is that assembly language offers greater flexibility. It allows certain operating system routines to be called or replaced by new pieces of code, and it allows greater access to the hardware devices and controllers.
The BBC BASIC interpreter, supplied as a standard part of RISC OS, includes an ARM assembler. This supports the full instruction set of the ARM 2 processor. At present it supports neither the extra instructions first implemented by the ARM 3 processor nor coprocessor instructions.
It is the BASIC assembler that is described below, serving as an introduction to ARM assembler.
The Acorn Desktop Assembler is a separate product that provides much more powerful facilities than the BASIC assembler. With it you can develop assembler programs under the desktop, in an environment common to all Acorn desktop languages. It contains two different assemblers:
These assemblers are not described in this appendix, but use a broadly similar syntax to the BASIC assembler described below. For full details, see the Acorn Assembler Release 2 manual, which is supplied with Acorn Desktop Assembler, or is separately available.
The assembler is part of the BBC BASIC language. Square brackets '[' and ']' are used to enclose all the assembly language instructions and directives and hence to inform BASIC that the enclosed instructions are intended for its assembler. However, there are several operations which must be performed from BASIC itself to ensure that a subsequent assembly language routine is assembled correctly.
The assembler allows the use of BASIC variables as addresses or data in instructions and assembler directives. For example variables can be set up in BASIC giving the numbers of any SWI routines which will be called:
OS_WriteI = &100
...
[
...
SWI OS_WriteI+ASC">"
...
The machine code generated by the assembler is stored in memory. However, the assembler does not automatically set memory aside for this purpose. You must reserve sufficient memory to hold your assembled machine code by using the DIM statement. For example:
DIM code% 100
The start address of the memory area reserved is assigned to the variable code%. The address of the last memory location is code%+100. Hence, this example reserves a total of 101 bytes of memory. In future examples, the size of memory reserved is shown as required_size, to emphasise that you must substitute a value appropriate to the size of your code.
You need to tell the assembler the start address of the area of memory you have reserved. The simplest way to do this is to assign P% to point to the start of this area. For example:
DIM code% required_size
...
P% = code%
P% is then used as the program counter. The assembler places the first assembler instruction at the address P% and automatically increments the value of P% by four so that it points to the next free location. When the assembler has finished assembling the code, P% points to the byte following the final location used. Therefore, the number of bytes of machine code generated is given by:
P% - code%
This method assumes that you wish subsequently to execute the code at the same location.
The position in memory at which you load a machine code program may be significant. For example, it might refer directly to data embedded within itself, or expect to find routines at fixed addresses. Such a program only works if it is loaded in the correct place in memory. However, it is often inconvenient to assemble the program directly into the place where it will eventually be executed. This memory may well be used for something else whilst you are assembling the program. The solution to this problem is to use a technique called 'offset assembly' where code is assembled as if it is to run at a certain address but is actually placed at another.
To do this, set O% to point to the place where the first machine code instruction is to be placed and P% to point to the address where the code is to be run.
To notify the assembler that this method of generating code is to be used, the directive OPT, which is described in more detail below, must have bit 2 set.
It is usually easy, and always preferable, to write ARM code that is position independent.
Normally, when the processor is executing a machine code program, it executes one instruction and then moves on automatically to the one following it in memory. You can, however, make the processor move to a different location and start processing from there instead by using one of the 'branch' instructions. For example:
.result_was_0
...
BEQ result_was_0
The full stop in front of the name result_was_0 identifies this string as the name of a 'label'. This is a directive to the assembler which tells it to assign the current value of the program counter (P%) to the variable whose name follows the full stop.
BEQ means 'branch if the result of the last calculation that updated the PSR was zero'. The location to be branched to is given by the value previously assigned to the label result_was_0.
The label can, however, occur after the branch instruction. This causes a slight problem for the assembler since when it reaches the branch instruction, it hasn't yet assigned a value to the variable, so it doesn't know which value to replace it with.
You can get around this problem by assembling the source code twice. This is known as two-pass assembly. During the first pass the assembler assigns values to all the label variables. In the second pass it is able to replace references to these variables by their values.
It is only when the text contains no forward references of labels that just a single pass is sufficient.
These two passes may be performed by a FOR...NEXT loop as follows:
DIM code% required_size
FOR pass% = 0 TO 3 STEP 3
  P% = code%
  [
  OPT pass%
  ... further assembly language statements and assembler directives
  ]
NEXT pass%
Note that the pointer(s), in this case just P%, must be set at the start of both passes.
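To see why a second pass resolves forward references, here is a minimal sketch in Python rather than BBC BASIC. It is a toy model only: the source format and the names `SOURCE`, `assemble_pass` and `labels` are invented for illustration, and each instruction is assumed to occupy four bytes.

```python
# Toy model of two-pass assembly. This is not the BASIC assembler itself:
# the source format and all names here are illustrative assumptions.
SOURCE = [
    ("B", "done"),      # forward reference: 'done' is defined later
    ("MOV", None),
    ("LABEL", "done"),  # plays the role of '.done'
    ("MOV", None),
]

def assemble_pass(source, start, labels, report_errors):
    """Run one pass; return a list of (address, opcode, resolved_target)."""
    pc = start                      # plays the role of P%
    output = []
    for opcode, operand in source:
        if opcode == "LABEL":
            labels[operand] = pc    # a label takes the current value of P%
            continue
        target = None
        if operand is not None:
            if operand in labels:
                target = labels[operand]
            elif report_errors:     # corresponds to OPT bit 1 being set
                raise NameError("Unknown or missing variable: " + operand)
        output.append((pc, opcode, target))
        pc += 4                     # each ARM instruction is four bytes
    return output

labels = {}
first = assemble_pass(SOURCE, 0x8000, labels, report_errors=False)
second = assemble_pass(SOURCE, 0x8000, labels, report_errors=True)
print(first[0])   # branch target is still unknown on the first pass
print(second[0])  # branch target is resolved on the second pass
```

Note that the start address is passed in afresh for each pass, mirroring the need to reset P% at the start of both passes.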
OPT is an assembler directive whose bits have the following meanings:
Bit | Meaning |
---|---|
0 | Assembly listing enabled if set |
1 | Assembler errors enabled |
2 | Assembled code placed in memory at O% instead of P% |
3 | Check that assembled code does not exceed memory limit L% |
Bit 0 controls whether a listing is produced. Whether you want one is up to you.
Bit 1 determines whether or not assembler errors are to be flagged or suppressed. For the first pass, bit 1 should be zero since otherwise any forward-referenced labels will cause the error 'Unknown or missing variable' and hence stop the assembly. During the second pass, this bit should be set to one, since by this stage all the labels defined are known, so the only errors it catches are 'real ones' - such as labels which have been used but not defined.
Bit 2 allows 'offset assembly', ie the program may be assembled into one area of memory, pointed to by O%, whilst being set up to run at the address pointed to by P%.
Bit 3 checks that the assembled code does not exceed the area of memory that has been reserved (ie none of it is held in an address greater than the value held in L%). When reserving space, L% might be set as follows:
DIM code% required_size
L% = code% + required_size
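As an illustration of how the four OPT bits combine into a single value, the following Python sketch decodes an OPT number into the options listed in the table above. The function name `decode_opt` and the dictionary keys are invented for this example; they are not part of BBC BASIC.

```python
# Decode an OPT value into the four documented option bits.
# The function and key names are illustrative, not part of BBC BASIC.
def decode_opt(value):
    return {
        "listing": bool(value & 1),          # bit 0: assembly listing
        "errors": bool(value & 2),           # bit 1: assembler errors
        "offset_assembly": bool(value & 4),  # bit 2: code placed at O%
        "limit_check": bool(value & 8),      # bit 3: check against L%
    }

# The loop FOR pass% = 0 TO 3 STEP 3 uses these two values:
for pass_value in (0, 3):
    print(pass_value, decode_opt(pass_value))
```

Under this reading, a two-pass loop that also wants offset assembly and limit checking would use values with bits 2 and 3 set on both passes, for example 12 on the first pass and 15 on the second.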
Once an assembly language routine has been successfully assembled, you can then save it to file. To do so, you can use the *Save command. In our above examples, code% points to the start of the code; after assembly, P% points to the byte after the code. So we could use this BASIC command:
OSCLI "Save "+outfile$+" "+STR$~(code%)+" "+STR$~(P%)
after the above example to save the code in the file named by outfile$.
From memory, the resulting machine code can be executed in a variety of ways:
CALL address
USR address
These may be used from inside BASIC to run the machine code at a given address. See the BBC BASIC Guide for more details on these statements.
The commands below will load and run the named file, using either its filetype (such as &FF8 for absolute code) and the associated Alias$@LoadType_xxx and Alias$@RunType_xxx system variables, or the load and execution addresses defined when it was saved.
*name
*RUN name
*/name
We strongly advise you to use file types in preference to load and execution addresses.
The assembly language statements and assembler directives should be between the square brackets.
There are very few rules about the format of assembly language statements; those which exist are given below:
The BASIC assembler contains the following directives:
Directive | Meaning |
---|---|
EQUB int | Define 1 byte of memory from LSB of int (DCB, =) |
EQUW int | Define 2 bytes of memory from int (DCW) |
EQUD int | Define 4 bytes of memory from int (DCD) |
EQUS str | Define 0 - 255 bytes as required by string expression (DCS) |
ALIGN | Align P% (and O%) to the next word (4 byte) boundary |
ADR reg, addr | Assemble instruction to load addr into reg |
At any particular time there are sixteen 32-bit registers available for use, R0 to R15. However, R15 is special since it contains the program counter and the processor status register.
R15 is split up with 24 bits used as the program counter (PC) to hold the word address of the next instruction. 8 bits are used as the processor status register (PSR) to hold information about the current values of flags and the current mode/register bank. These bits are arranged as follows:
The top six bits hold the following information:
Bit | Flag | Meaning |
---|---|---|
31 | N | Negative flag |
30 | Z | Zero flag |
29 | C | Carry flag |
28 | V | Overflow flag |
27 | I | Interrupt request disable |
26 | F | Fast interrupt request disable |
The bottom two bits can hold one of four different values:
M | Meaning |
---|---|
0 | User mode |
1 | Fast interrupt processing mode (FIQ mode) |
2 | Interrupt processing mode (IRQ mode) |
3 | Supervisor mode (SVC mode) |
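The layout described above (six flag bits at the top, two mode bits at the bottom, and the 24-bit word address in between) can be sketched as a small Python decoder. The names `decode_r15`, `FLAG_BITS` and `MODES` are invented for illustration, and the placement of the program counter in bits 2 to 25 is inferred from the bit counts given in the text.

```python
# Split a 32-bit R15 value into flags, program counter and mode.
# Names are illustrative; bit positions follow the tables above.
MODES = ["User", "FIQ", "IRQ", "SVC"]
FLAG_BITS = {"N": 31, "Z": 30, "C": 29, "V": 28, "I": 27, "F": 26}

def decode_r15(r15):
    flags = {name: bool((r15 >> bit) & 1) for name, bit in FLAG_BITS.items()}
    byte_address = r15 & 0x03FFFFFC   # bits 2-25: the word address times 4
    mode = MODES[r15 & 3]             # bottom two bits
    return flags, byte_address, mode

flags, address, mode = decode_r15(0x80008003)
print(flags["N"], hex(address), mode)
```

In this example only the N flag is set, the program counter addresses &8000, and the mode bits select SVC mode.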
User mode is the normal program execution state. SVC mode is a special mode which is entered when calls to the supervisor are made using software interrupts (SWIs) or when an exception occurs. From within SVC mode certain operations can be performed which are not permitted in user mode, such as writing to hardware devices and peripherals. SVC mode has its own private registers R13 and R14. So after changing to SVC mode, the registers R0 - R12 are the same, but new versions of R13 and R14 are available. The values contained by these registers in user mode are not overwritten or corrupted.
Similarly, IRQ and FIQ modes have their own private registers (R13 - R14 and R8 - R14 respectively).
Although only 16 registers are available at any one time, the processor actually contains a total of 27 registers.
For a more complete description of the registers, see the chapter entitled ARM Hardware.
All the machine code instructions can be performed conditionally according to the status of one or more of the following flags: N, Z, C, V. The sixteen available condition codes are:
Code | Meaning | Condition |
---|---|---|
AL | Always | This is the default |
CC | Carry clear | C clear |
CS | Carry set | C set |
EQ | Equal | Z set |
GE | Greater than or equal | (N set and V set) or (N clear and V clear) |
GT | Greater than | ((N set and V set) or (N clear and V clear)) and Z clear |
HI | Higher (unsigned) | C set and Z clear |
LE | Less than or equal | (N set and V clear) or (N clear and V set) or Z set |
LS | Lower or same (unsigned) | C clear or Z set |
LT | Less than | (N set and V clear) or (N clear and V set) |
MI | Negative | N set |
NE | Not equal | Z clear |
NV | Never | |
PL | Positive | N clear |
VC | Overflow clear | V clear |
VS | Overflow set | V set |

Two of these may be given alternative names as follows:

Code | Meaning | Equivalent |
---|---|---|
LO | Lower unsigned | is equivalent to CC |
HS | Higher / same unsigned | is equivalent to CS |
You should not use the NV (never) condition code; see the section entitled Any instruction that uses the 'NV' condition flag.
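The flag tests in the condition code table can be checked with a short Python model. The function name `condition_passes` is invented for this sketch; the tests themselves are transcribed from the table, with 'N set and V set, or N clear and V clear' written as `n == v`.

```python
# Evaluate an ARM condition code against the N, Z, C and V flags.
# Flags are passed as Python booleans; the function name is illustrative.
def condition_passes(code, n, z, c, v):
    tests = {
        "AL": True,
        "NV": False,
        "EQ": z, "NE": not z,
        "CS": c, "HS": c,           # HS is an alternative name for CS
        "CC": not c, "LO": not c,   # LO is an alternative name for CC
        "MI": n, "PL": not n,
        "VS": v, "VC": not v,
        "HI": c and not z,
        "LS": (not c) or z,
        "GE": n == v,
        "LT": n != v,
        "GT": (n == v) and not z,
        "LE": (n != v) or z,
    }
    return tests[code]

# Flags as they might be after a comparison whose result was negative:
print(condition_passes("LT", n=True, z=False, c=False, v=False))
print(condition_passes("GE", n=True, z=False, c=False, v=False))
```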
The available instructions are introduced below in categories indicating the type of action they perform and their syntax. The description of the syntax obeys the following standards:
opcode«cond»«S» Rd, (#exp|Rm)«,shift»
There are two move instructions. 'Op2' means '(#exp|Rm)«,shift»':
Instruction | Calculation performed | |
---|---|---|
MOV | Move | Rd = Op2 |
MVN | Move NOT | Rd = NOT Op2 |
Each of these instructions produces a result which it places in a destination register (Rd). The instructions do not affect bytes in memory directly.
Again, all of these instructions can be performed conditionally. In addition, if the 'S' is present, they can cause the condition codes to be set or cleared. These instructions set N and Z from the ALU, C from the shifter (but only if it is used), and do not affect V.
MOV R0, #10 ; Load R0 with the value 10.
Special actions are taken if the source register is R15; the action is as follows:
If the destination register is R15, then the action depends on whether the optional 'S' has been used:
opcode«cond»«S» Rd, Rn, (#exp|Rm)«,shift»
The instructions available are given below; again, 'Op2' means '(#exp|Rm)«,shift»':
Instruction | Calculation performed | |
---|---|---|
ADC | Add with carry | Rd = Rn + Op2 + C |
ADD | Add without carry | Rd = Rn + Op2 |
SBC | Subtract with carry | Rd = Rn - Op2 - (1 - C) |
SUB | Subtract without carry | Rd = Rn - Op2 |
RSC | Reverse subtract with carry | Rd = Op2 - Rn - (1 - C) |
RSB | Reverse subtract without carry | Rd = Op2 - Rn |
AND | Bitwise AND | Rd = Rn AND Op2 |
BIC | Bitwise AND NOT | Rd = Rn AND NOT (Op2) |
ORR | Bitwise OR | Rd = Rn OR Op2 |
EOR | Bitwise EOR | Rd = Rn EOR Op2 |
Each of these instructions produces a result which it places in a destination register (Rd). The instructions do not affect bytes in memory directly.
As was seen above, all of these instructions can be performed conditionally. In addition, if the 'S' is present, they can cause the condition codes to be set or cleared. The condition codes N, Z, C and V are set by the arithmetic logic unit (ALU) in the arithmetic operations. The logical (bitwise) operations set N and Z from the ALU, C from the shifter (but only if it is used), and do not affect V.
ADDEQ R1, R1, #7         ; If the zero flag is set then add 7
                         ; to the contents of register R1.
SBCS  R2, R3, R4         ; Subtract with carry the contents of register R4 from
                         ; the contents of register R3 and place the result in
                         ; register R2. The flags will be updated.
AND   R3, R1, R2, LSR #2 ; Perform a logical AND on the contents of register R1
                         ; and the contents of register R2 / 4, and place the
                         ; result in register R3.
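The calculations in the table above can be tried out with a small Python model. This is only a sketch: results are truncated to 32 bits, flag setting is not modelled, and the function names `adc`, `sbc` and `lsr` are chosen here to mirror the mnemonics.

```python
MASK = 0xFFFFFFFF  # registers are 32 bits wide

def adc(rn, op2, carry):
    """Rd = Rn + Op2 + C"""
    return (rn + op2 + carry) & MASK

def sbc(rn, op2, carry):
    """Rd = Rn - Op2 - (1 - C)"""
    return (rn - op2 - (1 - carry)) & MASK

def lsr(value, places):
    """Logical shift right, as used to form Op2 in the AND example."""
    return (value & MASK) >> places

print(sbc(10, 3, 1))             # carry set: plain subtraction
print(sbc(10, 3, 0))             # carry clear: one extra is subtracted
print(hex(0xFF & lsr(0x3C, 2)))  # AND R3, R1, R2, LSR #2 with R1=&FF, R2=&3C
```

The SBC lines show why the carry flag must be set before a subtraction that is not continuing a longer multi-word calculation.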
Special actions are taken if any of the source registers are R15; the action is as follows:
If the destination register is R15, then the action depends on whether the optional 'S' has been used:
opcode«cond»«S|P» Rn, (#exp|Rm)«,shift»
There are four comparison instructions; again, 'Op2' means '(#exp|Rm)«,shift»':
Instruction | Calculation performed | |
---|---|---|
CMN | Compare negated | Rn + Op2 |
CMP | Compare | Rn - Op2 |
TEQ | Test equal | Rn EOR Op2 |
TST | Test | Rn AND Op2 |
These are similar to the arithmetic and logical instructions listed above except that they do not take a destination register since they do not return a result. Also, they automatically set the condition flags (since they would perform no useful purpose if they didn't). Hence, the 'S' of the arithmetic instructions is implied. You can put an 'S' after the instruction to make this clearer.
These instructions have an additional function, which is to set the whole of the PSR to a given value. This is done by using a 'P' after the opcode, for example TEQP.
Normally the flags are set depending on the value of the comparison. The I and F bits and the mode and register bits are unaltered. The 'P' option allows the corresponding eight bits of the result of the calculation performed by the comparison to overwrite those in the PSR (or just the flag bits in user mode).
TEQP PC, #&80000000 ; Set N flag, clear all others. Also enable
                    ; IRQs, FIQs, select User mode if privileged
The above example (as well as setting the N flag and clearing the others) will alter the IRQ, FIQ and mode bits of the PSR - but only if you are in a privileged mode.
The 'P' option is also useful in user mode, for example to collect errors:
STMFD  sp!, {r0, r1, r14}
...
BL     routine1
STRVS  r0, [sp, #0]        ; save error block ptr in return r0
                           ; in stack frame if error
MOV    r1, pc              ; save psr flags in r1
BL     routine2            ; called even if error from routine1
STRVS  r0, [sp, #0]        ; to do some tidy up action etc.
TEQVCP r1, #0              ; if routine2 didn't give error,
LDMFD  sp!, {r0, r1, pc}   ; restore error indication from r1
MUL«cond»«S» Rd,Rm,Rs
MLA«cond»«S» Rd,Rm,Rs,Rn
There are two multiply instructions:
Instruction | Calculation performed | |
---|---|---|
MUL | Multiply | Rd = Rm × Rs |
MLA | Multiply-accumulate | Rd = Rm × Rs + Rn |
The multiply instructions perform integer multiplication, giving the least significant 32 bits of the product of two 32-bit operands.
The destination register must not be R15 or the same as Rm. Any other register combinations can be used.
If the 'S' is given in the instruction, the N and Z flags are set on the result, and the C and V flags are undefined.
MUL    R1,R2,R3
MLAEQS R1,R2,R3,R4
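The truncation to the least significant 32 bits can be illustrated with a brief Python sketch. The function names `mul` and `mla` simply mirror the mnemonics; flags and the register restrictions are not modelled.

```python
MASK = 0xFFFFFFFF  # keep only the least significant 32 bits

def mul(rm, rs):
    """Model of MUL: Rd = Rm x Rs, truncated to 32 bits."""
    return (rm * rs) & MASK

def mla(rm, rs, rn):
    """Model of MLA: Rd = Rm x Rs + Rn, truncated to 32 bits."""
    return (rm * rs + rn) & MASK

print(hex(mul(0x10000, 0x10000)))  # the full product does not fit in 32 bits
print(mla(3, 4, 5))
```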
B«cond» expression
BL«cond» expression
There are essentially only two branch instructions but in each case the branch can take place as a result of any of the 15 usable condition codes:
Instruction | |
---|---|
B | Branch |
BL | Branch and link |
The branch instruction causes the execution of the code to jump to the instruction given at the address to be branched to. This address is held relative to the current location.
BEQ label1 ; branch if zero flag set
BMI minus  ; branch if negative flag set
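To make 'relative to the current location' concrete, the sketch below computes the 24-bit word offset stored in a branch instruction. It relies on details of the ARM instruction encoding that are not stated in the text above and should be read as assumptions here: the offset is measured from the address of the branch plus 8 (because of the pipeline), bits 27 to 25 are 101, bit 24 is the link bit, and condition value &E means 'always'. The function name `encode_branch` is invented for this example.

```python
def encode_branch(address, target, link=False, cond=0xE):
    """Return the 32-bit encoding of B or BL placed at 'address'."""
    offset = (target - (address + 8)) >> 2   # signed word offset from PC+8
    return ((cond << 28)
            | (0b101 << 25)
            | (int(link) << 24)
            | (offset & 0xFFFFFF))           # keep the low 24 bits

print(hex(encode_branch(0x8000, 0x8000)))             # branch to itself
print(hex(encode_branch(0x8000, 0x8010, link=True)))  # BL two words past PC+8
```

Because only the offset is stored, the same encoding works wherever the code is loaded, which is one reason position-independent ARM code is easy to write.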
The branch and link instruction performs the additional action of copying the address of the instruction following the branch, and the current flags, into register R14. R14 is known as the 'link register'. This means that the routine branched to can be returned from by transferring the contents of R14 into the program counter and can restore the flags from this register on return. Hence instead of being a simple branch the instruction acts like a subroutine call.
BLEQ equal
.........        ; address of this instruction
.........        ; moved to R14 automatically
.equal           ; start of subroutine
.........
MOVS R15,R14     ; end of subroutine
opcode«cond»«B»«T» Rd, address
The single register load/save instructions are as follows:
Instruction | |
---|---|
LDR | Load register |
STR | Store register |
These instructions allow a single register to load a value from memory or save a value to memory at a given address.
The instruction has two possible forms:
The simplest form of address is a register number, in which case the contents of the register are used as the address to load from or save to. There are two other alternatives:
With pre-indexed addressing the contents of another register, or an immediate value, are added to the contents of the first register. This sum is then used as the address. It is known as pre-indexed addressing because the address being used is calculated before the load/save takes place. The first register (Rn below) can be optionally updated to contain the address which was actually used by adding a '!' after the closing square bracket.
Address syntax | Address |
---|---|
[Rn] | Contents of Rn |
[Rn,#m]«!» | Contents of Rn + m |
[Rn,«-»Rm]«!» | Contents of Rn ± contents of Rm |
[Rn,«-»Rm,shift #s]«!» | Contents of Rn ± (contents of Rm shifted by s places) |
With post-indexed addressing the address being used is given solely by the contents of the register Rn. The rest of the instruction determines what value is written back into Rn. This write back is performed automatically; no '!' is needed. Post-indexing gets its name from the fact that the address that is written back to Rn is calculated after the load/save takes place.
Address syntax | Value written back |
---|---|
[Rn],#m | Contents of Rn + m |
[Rn],«-»Rm | Contents of Rn ± contents of Rm |
[Rn],«-»Rm,shift #s | Contents of Rn ± (contents of Rm shifted by s places) |
If the address is given as a simple expression, the assembler will generate a pre-indexed instruction using R15 (the PC) as the base register. If the address is out of the range of the instruction (±4095 bytes), an error is given.
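The difference between the pre-indexed and post-indexed forms with an immediate offset can be summarised in a short Python sketch. The function names `pre_indexed` and `post_indexed` are invented for illustration; each returns the address used for the transfer together with the value left in Rn afterwards.

```python
def pre_indexed(rn, offset, writeback=False):
    """[Rn,#offset] with optional '!': returns (address used, new Rn)."""
    address = rn + offset                 # offset applied before the transfer
    return address, (address if writeback else rn)

def post_indexed(rn, offset):
    """[Rn],#offset: returns (address used, new Rn)."""
    return rn, rn + offset                # Rn is always written back

print([hex(x) for x in pre_indexed(0x1000, 4)])
print([hex(x) for x in pre_indexed(0x1000, 4, writeback=True)])
print([hex(x) for x in post_indexed(0x1000, 4)])
```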
If the 'B' option is specified after the condition, only a single byte is transferred, instead of a whole word. The top 3 bytes of the destination register are cleared by an LDRB instruction.
If the 'T' option is specified after the condition, then the TRANs pin on the ARM processor will be active during the transfer, forcing an address translation. This allows you to access User mode memory from a privileged mode. This option is invalid for pre-indexed addressing.
If you use the program counter (PC, or R15) as one of the registers, a number of special cases apply:
opcode«cond»type Rn«!», {Rlist}«^»
These instructions allow the loading or saving of several registers:
Instruction | |
---|---|
LDM | Load multiple registers |
STM | Store multiple registers |
The contents of register Rn give the base address from/to which the value(s) are loaded or saved. This base address is effectively updated during the transfer, but is only written back to if you follow it with a '!'.
Rlist provides a list of registers which are to be loaded or saved. The order the registers are given, in the list, is irrelevant since the lowest numbered register is loaded/saved first, and the highest numbered one last. For example, a list comprising {R5,R3,R1,R8} is loaded/saved in the order R1, R3, R5, R8, with R1 occupying the lowest address in memory. You can specify consecutive registers as a range; so {R0-R3} and {R0,R1,R2,R3} are equivalent.
The type is a two-character mnemonic specifying either how Rn is updated, or what sort of a stack results:
Mnemonic | Meaning |
---|---|
DA | Decrement Rn After each store/load |
DB | Decrement Rn Before each store/load |
IA | Increment Rn After each store/load |
IB | Increment Rn Before each store/load |
EA | Empty Ascending stack is used |
ED | Empty Descending stack is used |
FA | Full Ascending stack is used |
FD | Full Descending stack is used |
In fact these are just different ways of looking at the situation - the way Rn is updated governs what sort of stack results, and vice versa. So, for each type of instruction in the first group there is an equivalent in the second:
Stack form | | Update form |
---|---|---|
LDMEA | is the same as | LDMDB |
LDMED | is the same as | LDMIB |
LDMFA | is the same as | LDMDA |
LDMFD | is the same as | LDMIA |
STMEA | is the same as | STMIA |
STMED | is the same as | STMDA |
STMFA | is the same as | STMIB |
STMFD | is the same as | STMDB |
All Acorn software uses an FD (full, descending) stack. If you are writing code for SVC mode you should try to use a full descending stack as well - although you can use any type you like.
A '^' at the end of the register list has two possible meanings:
LDMIA R5, {R0,R1,R2} ; where R5 contains the value &1484
                     ; This will load R0 from &1484
                     ;                R1 from &1488
                     ;                R2 from &148C
LDMDB R5, {R0-R2}    ; where R5 contains the value &1484
                     ; This will load R0 from &1478
                     ;                R1 from &147C
                     ;                R2 from &1480
If there were a '!' after R5, so that it were written back to, then this would leave R5 containing &1490 and &1478 after the first and second examples respectively.
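The addresses in these examples can be reproduced with a small Python model of the four update types. The function name `transfer_addresses` is invented for this sketch; registers are given as plain numbers, and the model returns the address used for each register together with the value that would be written back to the base if '!' were present.

```python
def transfer_addresses(base, registers, mode):
    """Return ({register: address}, written-back base) for IA, IB, DA or DB."""
    regs = sorted(set(registers))   # lowest register uses the lowest address
    size = 4 * len(regs)
    if mode == "IA":
        start, new_base = base, base + size
    elif mode == "IB":
        start, new_base = base + 4, base + size
    elif mode == "DA":
        start, new_base = base - size + 4, base - size
    elif mode == "DB":
        start, new_base = base - size, base - size
    else:
        raise ValueError("unknown mode: " + mode)
    return {reg: start + 4 * i for i, reg in enumerate(regs)}, new_base

for mode in ("IA", "DB"):
    addresses, new_base = transfer_addresses(0x1484, [0, 1, 2], mode)
    print(mode, {r: hex(a) for r, a in addresses.items()}, hex(new_base))
```

The sorting step reflects the rule that the order in which registers are written in the list is irrelevant.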
The examples below show directly equivalent ways of implementing a full descending stack. The first uses mnemonics describing how the stack pointer is handled:
STMDB Stackpointer!, {R0-R3} ; push onto stack
...
LDMIA Stackpointer!, {R0-R3} ; pull from stack
and the second uses mnemonics describing how the stack behaves:
STMFD Stackpointer!, {R0,R1,R2,R3} ; push onto stack
...
LDMFD Stackpointer!, {R0,R1,R2,R3} ; pull from stack
However, you should see Appendix B: Warnings on the use of ARM assembler for notes on using writeback when doing so.
So, if the base register is the lowest-numbered one in the list, its original value is stored:
STMIA R2!, {R2-R6} ; R2 stored is value before write back
Otherwise its written back value is stored:
STMIA R2!, {R1-R5} ; R2 stored is value after write back
If you use the program counter (PC, or R15) in the list of registers:
It is generally not sensible to use the PC as the base register. If you do:
SWI«cond» expression
SWI«cond» "SWIname" (BBC BASIC assembler)
The SWI mnemonic stands for SoftWare Interrupt. On encountering a SWI, the ARM processor changes into SVC mode and stores the address of the next location in R14_svc - so the User mode value of R14 is not corrupted. The ARM then goes to the SWI routine handler via the hardware SWI vector containing its address.
The first thing that this routine does is to discover which SWI was requested. It finds this out by using the location addressed by (R14_svc - 4) to read the current SWI instruction. The opcode for a SWI is 32 bits long; 4 bits identify the opcode as being for a SWI, 4 bits hold the condition code and the bottom 24 bits identify which SWI it is. Hence 2^24 (16,777,216) different SWI routines can be distinguished.
When it has found which particular SWI it is, the routine executes the appropriate code to deal with it and then returns by placing the contents of R14_svc back into the PC, which restores the mode the caller was in.
This means that R14_svc will be corrupted if you execute a SWI in SVC mode - which can have disastrous consequences unless you take precautions.
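The way a handler can recover the SWI number from the instruction word can be sketched in Python as follows. The field positions used here (condition code in bits 31 to 28, the SWI marker as all ones in bits 27 to 24) and the value &E for 'always' come from the ARM instruction encoding rather than from the text above, so treat them as assumptions; the function name `decode_swi` is invented for illustration.

```python
def decode_swi(instruction):
    """Split a 32-bit SWI instruction word into its three fields."""
    cond = (instruction >> 28) & 0xF                # condition code field
    is_swi = ((instruction >> 24) & 0xF) == 0xF     # 4 bits marking a SWI
    number = instruction & 0xFFFFFF                 # bottom 24 bits
    return cond, is_swi, number

# An unconditional SWI whose number is &100 + ASC">", that is &13E:
cond, is_swi, number = decode_swi(0xEF00013E)
print(hex(cond), is_swi, hex(number))
print(2 ** 24)   # how many SWI numbers 24 bits can distinguish
```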
The most common way to call this instruction is by using the SWI name, and letting the assembler translate this to a SWI number. The BBC BASIC assembler can do this translation directly:
SWINE "OS_WriteC"
See the chapter entitled An introduction to SWIs for a full description of how RISC OS handles SWIs, and the index of SWIs for a full list of the operating system SWIs.