The ARM processor family uses Reduced Instruction Set (RISC) techniques to maximise performance; as such, the instruction set allows some instructions and code sequences to be constructed that will give rise to unexpected (and potentially erroneous) results. These cases must be avoided by all machine code writers and generators if correct program operation across the whole range of ARM processors is to be obtained.
In order to be upwards compatible with future versions of the ARM processor family, never use any of the undefined instruction formats.
In addition the 'NV' (never executed) instruction class should not be used (it is recommended that the instruction 'MOV R0,R0' be used as a general purpose no-op).
This chapter lists the instructions and code sequences to be avoided. It is strongly recommended that you take the time to familiarise yourself with these cases because some will only fail under particular circumstances which may not arise during testing.
For more details on the ARM chip see the Acorn RISC Machine family Data Manual. VLSI Technology Inc. (1990) Prentice-Hall, Englewood Cliffs, NJ, USA: ISBN 0-13-781618-9.
There are three main reasons for restricting the use of certain parts of the instruction set:
Such instructions can cause a program to fail unexpectedly, for example:
LDM R15,Rlist
uses PC+PSR as the base and so can cause an unexpected address exception.
It is better to reserve the instruction space occupied by existing 'useless' instructions for instruction expansion in future processors. For example:
MUL R15,Rm,Rs
only serves to scramble the PSR.
This category also includes ineffective instructions, such as a PC relative LDC/STC with writeback; the PC cannot be written back in these instructions, so the writeback bit is ineffective (and an attempt to use it should be flagged as an error).
Note: LDC/STC are instructions to load/store a coprocessor register from/to memory; since they are not supported by the BASIC assembler, they were not described in Appendix A: ARM assembler.
It is hard to guarantee the side-effects of instructions across different processors. If, for example, the following is used:
LDR Rd,[R15,#expression]!
the PC writeback will produce different results on different types of processor.
The instructions and code sequences are split into a number of categories. Each category starts with an indication of which of the two main ARM variants (ARM2, ARM3) it applies to, and is followed by a recommendation or warning. The text then goes on to explain the conditions in more detail and to supply examples where appropriate.
Unless a program is being targeted specifically for a single version of the ARM processor family, all of these recommendations should be adhered to.
Applicability: ARM2
When the processor's mode is changed by altering the mode bits in the PSR using a data processing operation, care must be taken not to access a banked register (R8-R14) in the following instruction. Accesses to the unbanked registers (R0-R7, R15) are safe.
The following instructions are affected, but note that mode changes can only be made when the processor is in a non-user mode:
TSTP Rn,Op2
TEQP Rn,Op2
CMPP Rn,Op2
CMNP Rn,Op2
These are the only operations that change all the bits in the PSR (including the mode bits) without affecting the PC. An operation that does alter the PC forces a pipeline refill, during which time the register bank select logic settles.
The following examples assume the processor starts in Supervisor mode:
a) TEQP PC,#0
   MOV R0,R0
   ADD R0,R1,R13_usr ; Safe: NOP added between mode change and access to a banked register (R13_usr)

b) TEQP PC,#0
   ADD R0,R1,R2 ; Safe: No access made to a banked register

c) TEQP PC,#0
   ADD R0,R1,R13_usr ; Fails: Data not read from Register R13_usr!
The safest default is always to add a NOP (e.g. MOV R0,R0) after a mode changing instruction; this will guarantee correct operation regardless of the code sequence following it.
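The rule above lends itself to a mechanical check. The following Python sketch is illustrative only: it scans a list of assembler source lines and reports any instruction that names a banked register (R8-R14, with or without a mode suffix such as _usr) directly after a TSTP, TEQP, CMPP or CMNP. The function name find_mode_change_hazards and the simple text matching are assumptions made for this example; labels, register aliases and macros are not handled.

```python
import re

# Matches TSTP/TEQP/CMPP/CMNP, optionally with a two-letter condition code.
MODE_CHANGE = re.compile(r"^(TST|TEQ|CMP|CMN)([A-Z]{2})?P$")
# Matches R8..R14, optionally with a mode suffix such as R13_usr.
BANKED = re.compile(r"\bR(8|9|1[0-4])(_\w+)?\b", re.IGNORECASE)


def find_mode_change_hazards(lines):
    """Return 1-based numbers of lines that name a banked register
    immediately after a PSR-writing compare (hypothetical helper)."""
    hazards = []
    after_mode_change = False
    for number, line in enumerate(lines, 1):
        code = line.split(";", 1)[0].strip()  # drop any comment
        if not code:
            continue  # blank lines are not instructions
        parts = code.split(None, 1)
        mnemonic = parts[0].upper()
        operands = parts[1] if len(parts) > 1 else ""
        if after_mode_change and BANKED.search(operands):
            hazards.append(number)
        after_mode_change = bool(MODE_CHANGE.match(mnemonic))
    return hazards
```

Applied to examples a), b) and c) above, only the second line of c) would be reported.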
Do not use writeback when forcing user bank transfer in LDM/STM.
For STM instructions the S bit is redundant as the PSR is always stored with the PC whenever R15 is in the transfer list. In user mode programs the S bit is ignored, but in other modes it has a second interpretation; S=1 is used to force transfers to take values from the user register bank instead of from the current register bank. This is useful for saving the user state on process switches.
Similarly, in LDM instructions the S bit is redundant if R15 is not in the transfer list. In user mode programs, the S bit is ignored, but in non-user mode programs where R15 is not in the transfer list, S=1 is used to force loaded values to go to the user registers instead of the current register bank.
In both cases where the processor is in a non-user mode and transfer to or from the user bank is forced by setting the S bit, writeback of the base will also be to the user bank though the base will be fetched from the current bank. Therefore don't use writeback when forcing user bank transfer in LDM/STM.
The following examples assume the processor to be in a non-user mode and Rlist not to include R15:
STMxx Rn!,Rlist ; Safe: Storing non-user registers with writeback to the non-user base register
LDMxx Rn!,Rlist ; Safe: Loading non-user registers with writeback to the non-user base register
STMxx Rn,Rlist^ ; Safe: Storing user registers, but no base writeback
STMxx Rn!,Rlist^ ; Fails: Base fetched from non-user register, but written back into user register
LDMxx Rn!,Rlist^ ; Fails: Base fetched from non-user register, but written back into user register
Applicability: ARM2, ARM3
When loading user bank registers with an LDM in a non-user mode, care must be taken not to access a banked register (R8-R14) in the following instruction. Accesses to the unbanked registers (R0-R7,R15) are safe.
Because the register bank switches from user mode to non-user mode during the first cycle of the instruction following an LDM Rn,Rlist^, an attempt to access a banked register in that cycle may cause the wrong register to be accessed.
The following examples assume the processor to be in a non-user mode and Rlist not to include R15:
LDM Rn,Rlist^
ADD R0,R1,R2 ; Safe: Access to unbanked registers after LDM^

LDM Rn,Rlist^
MOV R0,R0
ADD R0,R1,R13_svc ; Safe: NOP inserted before banked register used following an LDM^

LDM Rn,Rlist^
ADD R0,R1,R13_svc ; Fails: Accessing a banked register immediately after an LDM^ returns the wrong data

ADR R14_svc, saveblock
LDMIA R14_svc, {R0 - R14_usr}^
LDR R14_svc, [R14_svc,#15*4] ; Fails: Banked base register (R14_svc) used immediately after the LDM^
MOVS PC, R14_svc

ADR R14_svc, saveblock
LDMIA R14_svc, {R0 - R14_usr}^
MOV R0,R0
LDR R14_svc, [R14_svc,#15*4] ; Safe: NOP inserted before banked register (R14_svc) used
MOVS PC, R14_svc
Note: The ARM2 and ARM3 processors usually give the expected result, but cannot be guaranteed to do so under all circumstances, therefore this code sequence should be avoided in future.
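As a rough illustration of the rule, the Python sketch below looks for an LDM that forces a user bank transfer (a register list followed by ^ that does not name R15) and reports the next instruction if it names a banked register. The helper names and the purely textual matching are assumptions made for this example; it is not an assembler feature.

```python
import re

# R8..R14, optionally with a mode suffix such as R14_svc.
BANKED = re.compile(r"\bR(8|9|1[0-4])(_\w+)?\b", re.IGNORECASE)
USES_R15 = re.compile(r"\b(R15|PC)\b", re.IGNORECASE)


def is_user_bank_ldm(mnemonic, operands):
    """True for LDM ... Rlist^ where Rlist does not name R15 (hypothetical helper)."""
    if not mnemonic.upper().startswith("LDM"):
        return False
    if not operands.rstrip().endswith("^"):
        return False
    start = operands.find("{")
    register_list = operands[start:] if start >= 0 else operands
    return not USES_R15.search(register_list)


def find_ldm_hazards(lines):
    """Return 1-based numbers of lines that name a banked register
    in the instruction directly after a user bank LDM."""
    hazards = []
    after_user_ldm = False
    for number, line in enumerate(lines, 1):
        code = line.split(";", 1)[0].strip()  # drop any comment
        if not code:
            continue
        parts = code.split(None, 1)
        mnemonic = parts[0]
        operands = parts[1] if len(parts) > 1 else ""
        if after_user_ldm and BANKED.search(operands):
            hazards.append(number)
        after_user_ldm = is_user_bank_ldm(mnemonic, operands)
    return hazards
```

With the saveblock examples above, the sequence without the NOP would be reported at its LDR line, and the sequence with the NOP would pass.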
Applicability: ARM2
Care must be taken when writing an undefined instruction handler to allow for an unexpected call from a SWI instruction. The erroneous SWI call should be intercepted and redirected to the software interrupt handler.
The implementation of the CDP instruction on ARM2 may cause - under certain circumstances - a Software Interrupt (SWI) to take the Undefined Instruction trap if the SWI was the next instruction after the CDP. For example:
SIN F0
SWI &11 ; Fails: ARM2 may take the undefined instruction trap instead of software interrupt trap.
All Undefined Instruction handler code should check the failed instruction to see if it is a SWI, and if so pass it over to the software interrupt handler by branching to the SWI hardware vector at address 8.
Note: CDP is a Coprocessor Data Operation instruction; since it is not supported by the BASIC assembler, it was not described in Appendix A: ARM assembler.
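The test that the handler needs is a simple one: a SWI is any instruction word with bits 27 to 24 all set. The Python sketch below models that decision only; the function names are hypothetical, and how a real handler fetches the failed instruction and branches to the vector is outside its scope.

```python
SWI_MASK = 0x0F000000  # bits 27:24 of an ARM instruction word


def is_swi(instruction):
    """True if the 32-bit instruction word encodes a SWI (bits 27:24 all set)."""
    return (instruction & SWI_MASK) == SWI_MASK


def undefined_instruction_dispatch(instruction):
    """Hypothetical dispatcher: name the handler that should deal with the word."""
    return "swi_vector" if is_swi(instruction) else "undefined_handler"
```

For example, the word &EF000011 (SWI &11 with the AL condition) would be passed to the SWI vector, whereas a genuine coprocessor operation such as &EE000100 would stay with the undefined instruction handler.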
Applicability: ARM2, ARM3
Care must be taken when writing the Prefetch abort trap handler to allow for an unexpected call due to an undefined instruction.
When an undefined instruction is fetched from the last word of a page, where the next page is absent from memory, the undefined instruction will cause the undefined instruction trap to be taken, and the following (aborted) instructions will cause a prefetch abort trap. One might expect the undefined instruction trap to be taken first, then the return to the succeeding code will cause the abort trap. In fact the prefetch abort has a higher priority than the undefined instruction trap, so the prefetch abort handler is entered before the undefined instruction trap, indicating a fault at the address of the undefined instruction (which is in a page which is actually present). A normal return from the prefetch abort handler (after loading the absent page) will cause the undefined instruction to execute and take the trap correctly. However the indicated page is already present, so the prefetch abort handler may simply return control, causing an infinite loop to be entered.
Therefore, the prefetch abort handler should check whether the indicated fault is in a page which is actually present, and if so it should suspect the above condition and pass control to the undefined instruction handler. This will restore the expected sequential nature of the execution sequence. A normal return from the undefined instruction handler will cause the next instruction to be fetched (which will abort), the prefetch abort handler will be re-entered (with an address pointing to the absent page), and execution can proceed normally.
Applicability: ARM2, ARM3
The following single instructions and code sequences should be avoided in writing any ARM code.
Avoid using the NV (execute never) condition code:
opcodeNV ...
i.e. any operation where {cond}= NV
By avoiding the use of the 'NV' condition code, 2^28 instructions become free for future expansion.
Note: It is recommended that the instruction MOV R0,R0 be used as a general purpose NOP.
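The condition code occupies the top four bits of every instruction word, which leaves 28 bits (2^28 encodings) for each condition. A minimal Python sketch of a check for NV instructions follows; the names are invented for this illustration.

```python
COND_NV = 0xF               # condition field value for NV (never)
NOP_MOV_R0_R0 = 0xE1A00000  # encoding of MOV R0,R0 with the AL condition
NV_ENCODINGS = 1 << 28      # instruction words whose condition field is NV


def uses_nv(instruction):
    """True if the 32-bit instruction word carries the NV condition."""
    return ((instruction >> 28) & 0xF) == COND_NV
```

The recommended NOP, MOV R0,R0, uses the AL condition and so is not flagged, while the same word with its condition field changed to NV (&F1A00000) is.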
Avoid using R15 in the Rs position of a data processing instruction:
MOV|MVN{cond}{S} Rd,Rm,shiftname R15
CMP|CMN|TEQ|TST{cond}{P} Rn,Rm,shiftname R15
ADC|ADD|SBC...|EOR{cond}{S} Rd,Rn,Rm,shiftname R15
Shifting a register by an amount dependent upon the code position should be avoided.
Do not specify R15 as the destination register as only the PSR will be affected by the result of the operation:
MUL{cond}{S} R15,Rm,Rs
MLA{cond}{S} R15,Rm,Rs,Rn
Do not use the same register in the Rd and Rm positions, as the result of the operation will be incorrect:
MUL{cond}{S} Rd,Rd,Rs
MLA{cond}{S} Rd,Rd,Rs,Rn
Do not use a PC relative load or store with base writeback as the effects may vary in future processors:
LDR|STR{cond}{B}{T} Rd,[R15,#expression]!
LDR|STR{cond}{B}{T} Rd,[R15,{-}Rm{,shift}]!
LDR|STR{cond}{B}{T} Rd,[R15],#expression
LDR|STR{cond}{B}{T} Rd,[R15],{-}Rm{,shift}
Note: It is safe to use pre-indexed PC relative loads and stores without base writeback.
Avoid using R15 as the register offset (Rm) in single data transfers as the value used will be PC+PSR which can lead to address exceptions:
LDR|STR{cond}{B}{T} Rd,[Rn,{-}R15{,shift}]{!}
LDR|STR{cond}{B}{T} Rd,[Rn],{-}R15{,shift}
A byte load or store operation on R15 must not be specified, as R15 contains the PC, and should always be treated as a 32 bit quantity:
LDR|STR{cond}B{T} R15,Address
A post-indexed LDR|STR where Rm=Rn must not be used (this instruction is very difficult for the abort handler to unwind when late aborts are configured - which do not prevent base writeback):
LDR|STR{cond}{B}{T} Rd,[Rn],{-}Rn{,shift}
Do not use the same register in the Rd and Rn positions of an LDR which specifies (or implies) base writeback; such an instruction is ambiguous, as it is not clear whether the end value in the register should be the loaded data or the updated base:
LDR{cond}{B}{T} Rn,[Rn,#expression]!
LDR{cond}{B}{T} Rn,[Rn,{-}Rm{,shift}]!
LDR{cond}{B}{T} Rn,[Rn],#expression
LDR{cond}{B}{T} Rn,[Rn],{-}Rm{,shift}
Do not specify base writeback when forcing user mode block data transfer as the writeback may target the wrong register:
STM{cond}<FD|ED...|DB> Rn!,Rlist^
LDM{cond}<FD|ED...|DB> Rn!,Rlist^
where Rlist does not include R15.
Note: The instruction:
LDM{cond}<FD|ED...|DB> Rn!,<Rlist,R15>^
does not force user mode data transfer, and can be used safely.
Do not perform a PC relative block data transfer, as the PC+PSR is used to form the base address which can lead to address exceptions:
LDM|STM{cond}<FD|ED...|DB> R15{!},Rlist{^}
Do not perform a PC relative swap as its behaviour may change in the future:
SWP{cond}{B} Rd,Rm,[R15]
Avoid specifying R15 as the source or destination register:
SWP{cond}{B} R15,Rm,[Rn]
SWP{cond}{B} Rd,R15,[Rn]
Note: SWP is a Single Data Swap instruction, typically used to implement semaphores, and introduced in the ARM3; since it is not supported by the BASIC assembler, it was not described in Appendix A: ARM assembler.
When performing a PC relative coprocessor data transfer, writeback to R15 is prevented so the W bit should not be set:
LDC|STC{cond}{L} CP#,CRd,[R15]!
LDC|STC{cond}{L} CP#,CRd,[R15,#expression]!
LDC|STC{cond}{L} CP#,CRd,[R15],#expression!
ARM2 has two undefined instructions, and ARM3 has only one (the other ARM2 undefined instruction has been defined as the Single data swap operation).
Undefined instructions should not be used in programs, as they may be defined as a new operation in future ARM variants.
Care must be taken not to access a banked register (R8-R14) in the cycle following an in-line mode change. Thus a code sequence such as TEQP PC,#0 followed immediately by ADD R0,R1,R13_usr should be avoided.
The banked registers (R8-R14) should not be accessed in the cycle immediately after an LDM that forces user mode data transfer. Thus a code sequence such as LDM Rn,Rlist^ followed immediately by ADD R0,R1,R13_svc should be avoided.
This section highlights some obscure cases of ARM operation which should be borne in mind when writing code.
Applicability: ARM2, ARM3
Warning: When the PC is used as a destination, operand, base or shift register, different results will be obtained depending on the instruction and the exact usage of R15.
Full details of the value derived from or written into R15+PSR for each instruction class are given in the Acorn RISC Machine family Data Manual. Care must be taken when using R15 because small changes in the instruction can yield significantly different results. For example, consider data operations of the type:
opcode{cond}{S} Rd,Rn,Rm
opcode{cond}{S} Rd,Rn,Rm,shiftname Rs
MOV R0,#0
ORR R1,R0,R15 ; R1:=PC+PSR (bits 31:26,1:0 reflect PSR flags)
ORR R2,R15,R0 ; R2:=PC (bits 31:26,1:0 set to zero)
Note: The relevant instruction description in the Acorn RISC Machine family Data Manual should be consulted for full details of the behaviour of R15.
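The two values in the example differ only in the PSR bits. As a small illustration, the Python sketch below separates a 32-bit R15 value into its PC part (bits 25:2) and its PSR part (bits 31:26 and 1:0); the constant and function names are invented for this example.

```python
PSR_MASK = 0xFC000003  # bits 31:26 (flags) and bits 1:0 (mode)
PC_MASK = 0x03FFFFFC   # bits 25:2, the word-aligned program counter


def split_r15(r15):
    """Return (pc, psr) for a combined 32-bit PC+PSR value."""
    return r15 & PC_MASK, r15 & PSR_MASK
```

In terms of the example above, R1 would correspond to the full value passed in, and R2 to the first element of the returned pair.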
Applicability: ARM2, ARM3
Warning: In the case of a STM with writeback that includes the base register in the register list, the value of the base register stored depends upon its position in the register list.
During an STM, the first register is written out at the start of the second cycle of the instruction. When writeback is specified, the base is written back at the end of the second cycle. An STM which includes storing the base, with the base as the first register to be stored, will therefore store the unchanged value, whereas with the base second or later in the transfer order, it will store the modified value.
For example:
MOV R5,#&1000
STMIA R5!,{R5-R6} ; Stores value of R5=&1000

MOV R5,#&1000
STMIA R5!,{R4-R5} ; Stores value of R5=&1008
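The two cases can be reproduced with a small model. The Python sketch below is an illustration of the behaviour described above, not a description of the hardware: it stores registers in ascending order, lowest register at the lowest address, and substitutes the written-back base value for any position after the first. The function name and the use of a dictionary for memory are assumptions of the example.

```python
def stmia_with_writeback(registers, base, reg_list):
    """Model STMIA Rbase!,{reg_list}; returns {address: stored value}
    and updates registers[base] (hypothetical helper)."""
    order = sorted(reg_list)  # lowest register goes to the lowest address
    start = registers[base]
    updated_base = start + 4 * len(order)
    memory = {}
    for index, reg in enumerate(order):
        value = registers[reg]
        if reg == base and index > 0:
            # Base is second or later in the transfer order, so writeback
            # has already happened and the modified value is stored.
            value = updated_base
        memory[start + 4 * index] = value
    registers[base] = updated_base
    return memory
```

Called with R5=&1000 as the base, the list {R5,R6} stores &1000 for R5, while the list {R4,R5} stores &1008.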
Given  MUL Rd,Rm,Rs
or     MLA Rd,Rm,Rs,Rn
Then   Rd and Rm must be different registers
       Rd must not be R15
Due to the way Booth's algorithm has been implemented, certain combinations of operand registers should be avoided. (The assembler will issue a warning if these restrictions are overlooked.)
The destination register (Rd) should not be the same as the Rm operand register, as Rd is used to hold intermediate values and Rm is used repeatedly during the multiply. A MUL will give a zero result if Rm=Rd, and a MLA will give a meaningless result.
The destination register (Rd) should also not be R15. R15 is protected from modification by these instructions, so the instruction will have no effect, except that it will put meaningless values in the PSR flags if the S bit is set.
All other register combinations will give correct results, and Rd, Rn and Rs may use the same register when required.
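A code generator can apply both restrictions directly to the instruction word. The Python sketch below assumes the usual multiply encoding (bits 27:22 zero, bits 7:4 equal to 1001, Rd in bits 19:16 and Rm in bits 3:0); the function name and the wording of the messages are invented for this illustration.

```python
MUL_MASK = 0x0FC000F0  # bits 27:22 and bits 7:4 identify MUL and MLA
MUL_BITS = 0x00000090  # 000000 in bits 27:22, 1001 in bits 7:4


def check_multiply(instruction):
    """Return a list of restriction violations for a MUL or MLA word;
    other instructions give an empty list (hypothetical helper)."""
    if (instruction & MUL_MASK) != MUL_BITS:
        return []  # not a multiply instruction
    rd = (instruction >> 16) & 0xF
    rm = instruction & 0xF
    problems = []
    if rd == rm:
        problems.append("Rd and Rm are the same register")
    if rd == 15:
        problems.append("Rd is R15")
    return problems
```

For instance, &E0000291 (MUL R0,R1,R2) passes, whereas &E0000190 (MUL R0,R0,R1) and &E00F0291 (MUL R15,R1,R2) are each reported.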
Warning: Illegal addresses formed during a LDM or STM operation will not cause an address exception.
Only the address of the first transfer of a LDM or STM is checked for an address exception; if subsequent addresses over-flow or under-flow into illegal address space they will be truncated to 26 bits but will not cause an address exception trap.
The following examples assume the processor is in a non-user mode and MEMC is being accessed:
MOV R0,#&04000000 ; R0=&04000000
STMIA R0,{R1-R2} ; Address exception reported (base address illegal)

MOV R0,#&04000000
SUB R0,R0,#4 ; R0=&03FFFFFC
STMIA R0,{R1-R2} ; No address exception reported (base address legal)
                 ; code will overwrite data at address &00000000
Note: The exact behaviour of the system depends upon the memory manager to which the processor is attached; in some cases, the wraparound may be detected and the instruction aborted.
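The address arithmetic behind the example can be modelled in a few lines. The Python sketch below is a simplification that assumes the memory manager does not detect the wraparound: only the base address is checked, and every later address is truncated to 26 bits. The names used are hypothetical.

```python
ADDRESS_LIMIT = 1 << 26  # 26-bit address space: &0000000 to &3FFFFFF


class AddressException(Exception):
    """Raised when the first transfer address is outside the 26-bit space."""


def stmia_addresses(base, count):
    """Return the addresses used by STMIA base,{count registers}."""
    if base >= ADDRESS_LIMIT:
        raise AddressException(hex(base))  # only the first address is checked
    # Later addresses wrap round silently instead of trapping.
    return [(base + 4 * i) % ADDRESS_LIMIT for i in range(count)]
```

With a base of &03FFFFFC and two registers, the second address comes out as &00000000, matching the second example above.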
Applicability: ARM2, ARM3
Warning: Illegal addresses formed during a LDC or STC operation will not cause an address exception (affects LDF/STF).
The coprocessor data transfer operations act like STM and LDM with the processor generating the addresses and the coprocessor supplying/reading the data. As with LDM/STM, only the address of the first transfer of a LDC or STC is checked for an address exception; if subsequent addresses over-flow or under-flow into illegal address space they will be truncated to 26 bits but will not cause an address exception trap.
Note that the floating point LDF/STF instructions are forms of LDC and STC.
The following examples assume the processor is in a non-user mode and MEMC is being accessed:
MOV R0,#&04000000 ; R0=&04000000
STC CP1,CR0,[R0] ; Address exception reported (base address illegal)

MOV R0,#&04000000
SUB R0,R0,#4 ; R0=&03FFFFFC
STFD F0,[R0] ; No address exception reported (base address legal)
             ; code will overwrite data at address &00000000
Note: The exact behaviour of the system depends upon the memory manager to which the processor is attached; in some cases, the wraparound may be detected and the instruction aborted.
Applicability: ARM3
Data to be transferred to a coprocessor with the LDC instruction should never be placed in the last word of an addressable chunk of memory, nor in the word of memory immediately preceding a read-sensitive memory location.
Due to the pipelining introduced into the ARM3 coprocessor interface, an LDC operation will cause one extra word of data to be fetched from the internal cache or external memory by ARM3 and then discarded; if the extra data is fetched from an area of external memory marked as cacheable, a whole line of data will be fetched and placed in the cache.
A particular case in point is that an LDC whose data ends at the last word of a memory page will load and then discard the first word (and hence the first cache line) of the next page. A minor effect of this is that it may occasionally cause an unnecessary page swap in a virtual memory system. The major effect of it is that (whether in a virtual memory system or not), the data for an LDC should never be placed in the last word of an addressable chunk of memory: the LDC will attempt to read the immediately following non-existent location and thus produce a memory fault.
The following example assumes the processor is in a non-user mode, FPU hardware is attached and MEMC is being accessed:
MOV R13,#&03000000 ; R13=Address of I/O space
STFD F0,[R13,#-8]! ; Store F.P. register 0 at top of physical memory
                   ; (two words of data transferred)
LDFD F1,[R13],#8   ; Load F.P. register 1 from top of physical
                   ; memory, but three words of data are
                   ; transferred, and the third access will read
                   ; from I/O space which may be read sensitive
The static ARM is a variant of the ARM processor designed for low power consumption, that is built using static CMOS technology. (The difference between it and the standard ARM is similar to that between SRAM and DRAM.)
The static ARM exhibits different behaviour to ARM2 and ARM3 when executing a PC relative LDR with base writeback. This class of instruction has very limited application, so the discrepancy should not be a problem, but if you wish to use any of the following instructions in your code you are advised to contact Acorn Computers.
LDR Rd,[PC,#expression]!
LDR Rd,[PC],#expression
LDR Rd,[PC,{-}Rm{,shift}]!
LDR Rd,[PC],{-}Rm{,shift}
Note: A PC relative LDR without writeback works exactly as expected.
Provided that this instruction class is unused, it is likely that writeback to the PC on LDR and STR will be disabled completely in the future. The fewer incidental ways there are to modify the PC the better.
The instructions affected are:

LDR Rd,[PC,#expression]!
Expected result:    Rd := (PC+8+expression)
                    PC := PC+8+expression
                    ...so execution continues from PC+8+expression
Actual ARM2 result: Rd := Rd {no change}
                    PC := PC+8+expression+4
                    ...so execution continues from PC+12+expression

LDR Rd,[PC],#expression
Expected result:    Rd := (PC+8)
                    PC := PC+8+expression
                    ...so execution continues from PC+8+expression
Actual ARM2 result: Rd := Rd {no change}
                    PC := PC+8+expression+4
                    ...so execution continues from PC+12+expression