www.riscos.com Technical Support:
Acorn Assembler

 


Warnings on the use of ARM assembler


The ARM processor family uses Reduced Instruction Set Computer (RISC) techniques to maximise performance. As a consequence, the instruction set allows some instructions and code sequences to be constructed that give rise to unexpected (and potentially erroneous) results. These cases must be avoided by all machine code writers and generators if programs are to operate correctly across the whole range of ARM processors.

To remain upwards compatible with future versions of the ARM processor family, never use any of the undefined instruction formats:

  • those shown in Undefined instructions, which the processor traps;
  • those which are not shown in the manual and which don't trap (for example, a Multiply instruction where bit 5 or 6 of the instruction is set).

In addition the condition code 1111 (which was given the mnemonic 'NV', i.e. never) should not be used. We recommend that you use the instruction 'MOV R0,R0' as a general purpose no-op.

This appendix lists the instructions and code sequences to be avoided. It is strongly recommended that you take the time to familiarise yourself with these cases because some will only fail under particular circumstances which may not arise during testing.

For more details on the ARM chip and its instruction set see the chapters The ARM CPU and CPU instruction set, and the datasheets for the different versions of the ARM chip.

Restrictions to the ARM instruction set

There are three main reasons for restricting the use of certain parts of the instruction set:

  • Dangerous instructions

    Such instructions can cause a program to fail unexpectedly, for example:

            LDM R15,Rlist

    uses PC+PSR as the base and so can cause an unexpected address exception.

  • Useless instructions

    It is better to reserve the instruction space occupied by existing 'useless' instructions for instruction expansion in future processors. For example:

            MUL R15,Rm,Rs

    only serves to scramble the PSR.

    This category also includes ineffective instructions, such as a PC relative LDC/STC with writeback; the PC cannot be written back in these instructions, so the writeback bit is ineffective (and an attempt to use it should be flagged as an error).

  • Instructions with undesirable side-effects

    It is hard to guarantee the side-effects of instructions across different processors. If, for example, the following is used:

            LDR Rd,[R15,#expression]!

    the PC writeback will produce different results on different types of processor.

Instructions and code sequences to avoid

The instructions and code sequences are split into a number of categories. Each category starts with an indication of which of the two main ARM variants (ARM2, ARM3) it applies to, and is followed by a recommendation or warning. The text then goes on to explain the conditions in more detail and to supply examples where appropriate.

Unless a program is being targeted specifically for a single version of the ARM processor family, all of these recommendations should be adhered to.

TSTP/TEQP/CMPP/CMNP: Changing mode

Applicability: ARM2

When the processor's mode is changed by altering the mode bits in the PSR using a data processing operation, care must be taken not to access a banked register (R8-R14) in the following instruction. Accesses to the unbanked registers (R0-R7, R15) are safe.

The following instructions are affected, but note that mode changes can only be made when the processor is in a non-user mode:

        TSTP Rn,Op2
        TEQP Rn,Op2
        CMPP Rn,Op2
        CMNP Rn,Op2

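These are the only operations that change all the bits in the PSR (including the mode bits) without affecting the PC. Instructions that do alter the PC force a pipeline refill, during which the register bank select logic has time to settle; these four do not, so the instruction that follows them may access the wrong register bank.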

The following examples assume the processor starts in Supervisor mode:

a)  TEQP PC,#0
    MOV  R0,R0          Safe: NOP added between mode change and
    ADD  R0,R1,R13_usr        access to a banked register (R13_usr)

b)  TEQP PC,#0
    ADD  R0,R1,R2       Safe: No access made to a banked register

c)  TEQP PC,#0
    ADD  R0,R1,R13_usr  Fails: Data not read from Register R13_usr!

The safest default is always to add a NOP (e.g. MOV R0,R0) after a mode changing instruction; this will guarantee correct operation regardless of the code sequence following it.
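
A code generator can apply this rule mechanically. The following is a minimal Python sketch, assuming a hypothetical simplified representation in which each instruction is a (mnemonic, registers) pair. It is deliberately conservative: it flags any use of R8-R14 in the instruction after a mode change rather than modelling individual cycles.

```python
# Hypothetical simplified model: an instruction is (mnemonic, [register numbers]).
BANKED_REGISTERS = set(range(8, 15))              # R8-R14
MODE_CHANGERS = {"TSTP", "TEQP", "CMPP", "CMNP"}  # the four affected operations


def find_mode_change_hazards(program):
    """Return the indexes of instructions that use R8-R14 directly after a mode change."""
    hazards = []
    for index in range(1, len(program)):
        previous_mnemonic, _ = program[index - 1]
        _, registers = program[index]
        if previous_mnemonic in MODE_CHANGERS and BANKED_REGISTERS & set(registers):
            hazards.append(index)
    return hazards


# The three examples above, with R13 standing for the banked register.
example_a = [("TEQP", [15]), ("MOV", [0, 0]), ("ADD", [0, 1, 13])]
example_b = [("TEQP", [15]), ("ADD", [0, 1, 2])]
example_c = [("TEQP", [15]), ("ADD", [0, 1, 13])]
```

Applied to examples a), b) and c), only the ADD in c) is reported.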

LDM/STM: Forcing transfer of the user bank (Part 1)

Applicability: ARM2, ARM3

Do not use writeback when forcing user bank transfer in LDM/STM.

For STM instructions the S bit is redundant as the PSR is always stored with the PC whenever R15 is in the transfer list. In user mode programs the S bit is ignored, but in other modes it has a second interpretation; S=1 is used to force transfers to take values from the user register bank instead of from the current register bank. This is useful for saving the user state on process switches.

Similarly, in LDM instructions the S bit is redundant if R15 is not in the transfer list. In user mode programs, the S bit is ignored, but in non-user mode programs where R15 is not in the transfer list, S=1 is used to force loaded values to go to the user registers instead of the current register bank.

In both cases where the processor is in a non-user mode and transfer to or from the user bank is forced by setting the S bit, writeback of the base will also be to the user bank though the base will be fetched from the current bank. Therefore don't use writeback when forcing user bank transfer in LDM/STM.

The following examples assume the processor to be in a non-user mode and Rlist not to include R15:

    STMxx Rn!,Rlist  Safe: Storing non-user registers with write
                           back to the non-user base register

    LDMxx Rn!,Rlist  Safe: Loading non-user registers with write
                           back to the non-user base register

    STMxx Rn,Rlist^  Safe: Storing user registers, but no base
                           write-back

    STMxx Rn!,Rlist^ Fails: Base fetched from non-user register,
                            but written back into user register

    LDMxx Rn!,Rlist^ Fails: Base fetched from non-user register,
                            but written back into user register

LDM: Forcing transfer of the user bank (Part 2)

Applicability: ARM2, ARM3

When loading user bank registers with an LDM in a non-user mode, care must be taken not to access a banked register (R8-R14) in the following instruction. Accesses to the unbanked registers (R0-R7,R15) are safe.

Because the register bank switches from user mode to non-user mode during the first cycle of the instruction following an LDM Rn,Rlist^, an attempt to access a banked register in that cycle may cause the wrong register to be accessed.

The following examples assume the processor to be in a non-user mode and Rlist not to include R15:

    LDM Rn,Rlist^
    ADD R0,R1,R2       Safe: Access to unbanked registers after
                             LDM^

    LDM Rn,Rlist^
    MOV R0,R0          Safe: NOP inserted before banked register
    ADD R0,R1,R13_svc        used following an LDM^

    LDM Rn,Rlist^
    ADD R0,R1,R13_svc  Fails: Accessing a banked register
                              immediately after an LDM^ returns the
                              wrong data

    ADR   R14_svc, saveblock
    LDMIA R14_svc, {R0 - R14_usr}^
    LDR   R14_svc, [R14_svc,#15*4]  Fails: Banked base register used
    MOVS  PC, R14_svc                      immediately after the LDM^

    ADR   R14_svc, saveblock
    LDMIA R14_svc, {R0 - R14_usr}^
    MOV   R0,R0                     Safe: NOP inserted before
    LDR   R14_svc, [R14_svc,#15*4]        banked register
    MOVS  PC, R14_svc                     used

Note: The ARM2 and ARM3 processors usually give the expected result, but cannot be guaranteed to do so under all circumstances; this code sequence should therefore be avoided in future.

SWI/Undefined Instruction trap interaction

Applicability: ARM2

Care must be taken when writing an undefined instruction handler to allow for an unexpected call from a SWI instruction. The erroneous SWI call should be intercepted and redirected to the software interrupt handler.

The implementation of the CDP instruction on ARM2 may, under certain circumstances, cause a Software Interrupt (SWI) to take the Undefined Instruction trap if the SWI is the next instruction after the CDP. For example (SIN is a floating point operation encoded as a CDP):

    SIN F0
    SWI &11  Fails: ARM2 may take the undefined instruction
                    trap instead of software interrupt trap.

All Undefined Instruction handler code should check the failed instruction to see if it is a SWI, and if so pass it over to the software interrupt handler by branching to the SWI hardware vector at address 8.
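
The check itself is a test of four bits. The following Python sketch models the decision only; a real handler would be written in assembler and would also have to recover the failed instruction word. The function names and the handler address parameter are illustrative assumptions.

```python
SWI_VECTOR = 0x08  # SWI hardware vector, as stated in the text


def is_swi(instruction_word):
    """True if bits 27:24 of a 32-bit ARM instruction word are 1111 (SWI)."""
    return ((instruction_word >> 24) & 0xF) == 0xF


def undefined_trap_target(instruction_word, undefined_handler_address):
    """Pick where execution should continue after an Undefined Instruction trap.

    undefined_handler_address is a hypothetical address of the normal
    undefined instruction handling code.
    """
    if is_swi(instruction_word):
        return SWI_VECTOR
    return undefined_handler_address
```

For instance, SWI &11 assembles to &EF000011, which this check redirects to the vector at address 8.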

Undefined instruction/Prefetch abort trap interaction

Applicability: ARM2, ARM3

Care must be taken when writing the Prefetch abort trap handler to allow for an unexpected call due to an undefined instruction.

When an undefined instruction is fetched from the last word of a page, where the next page is absent from memory, the undefined instruction will cause the undefined instruction trap to be taken, and the following (aborted) instructions will cause a prefetch abort trap. One might expect the undefined instruction trap to be taken first, then the return to the succeeding code will cause the abort trap. In fact the prefetch abort has a higher priority than the undefined instruction trap, so the prefetch abort handler is entered before the undefined instruction trap, indicating a fault at the address of the undefined instruction (which is in a page which is actually present). A normal return from the prefetch abort handler (after loading the absent page) will cause the undefined instruction to execute and take the trap correctly. However the indicated page is already present, so the prefetch abort handler may simply return control, causing an infinite loop to be entered.

Therefore, the prefetch abort handler should check whether the indicated fault is in a page which is actually present, and if so it should suspect the above condition and pass control to the undefined instruction handler. This will restore the expected sequential nature of the execution sequence. A normal return from the undefined instruction handler will cause the next instruction to be fetched (which will abort), the prefetch abort handler will be re-entered (with an address pointing to the absent page), and execution can proceed normally.
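
The decision described above can be sketched as follows. This is a Python model of the control flow only, under stated assumptions: page_is_present is a hypothetical callable supplied by the memory management code, and the returned strings stand in for the real actions.

```python
def prefetch_abort_action(fault_address, page_is_present):
    """Decide how a prefetch abort handler should respond.

    page_is_present: hypothetical callable taking an address and returning
    True if the page containing it is already in memory.
    """
    if page_is_present(fault_address):
        # The page is there, so the abort really belongs to the following
        # instruction: suspect an undefined instruction in the last word
        # of the page and hand over to the undefined instruction handler.
        return "pass to undefined instruction handler"
    # Genuine missing page: load it and return normally.
    return "load page and return"
```

Without the first branch, a handler that simply returns when the page is present would re-enter itself indefinitely.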

Single instructions to avoid

Applicability: ARM2, ARM3

The following single instructions and code sequences should be avoided in writing any ARM code.

Any instruction that uses the 1111 condition code

Avoid using the condition code 1111 (which was given the mnemonic 'NV', i.e. never):

    opcodeNV ...

i.e. any operation where «cond»=NV

By avoiding the use of the 'NV' condition code, 2^28 instructions become free for future expansion.

Note: It is recommended that the instruction MOV R0,R0 be used as a general purpose NOP.
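
A machine code generator can reject such instructions by inspecting the top four bits of each word. The following is a minimal Python sketch; the function and constant names are illustrative.

```python
NV_CONDITION = 0b1111   # condition field value formerly given the mnemonic NV
NOP = 0xE1A00000        # MOV R0,R0 with the AL (always) condition


def uses_nv_condition(instruction_word):
    """True if bits 31:28 (the condition field) of the word are 1111."""
    return ((instruction_word >> 28) & 0xF) == NV_CONDITION


# With the 4-bit condition field fixed, the remaining 28 bits are free.
RESERVED_ENCODINGS = 2 ** 28
```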

Data processing

Avoid using R15 in the Rs position of a data processing instruction:

    MOV|MVN«cond»«S» Rd,Rm,shiftname R15

    CMP|CMN|TEQ|TST«cond»«P» Rn,Rm,shiftname R15

    ADC|ADD|SBC...|EOR«cond»«S» Rd,Rn,Rm,shiftname R15

Shifting a register by an amount dependent upon the code position should be avoided.

Multiply and multiply-accumulate

Do not specify R15 as the destination register as only the PSR will be affected by the result of the operation:

    MUL«cond»«S» R15,Rm,Rs
    MLA«cond»«S» R15,Rm,Rs,Rn

Do not use the same register in the Rd and Rm positions, as the result of the operation will be incorrect:

    MUL«cond»«S» Rd,Rd,Rs
    MLA«cond»«S» Rd,Rd,Rs,Rn

Single data transfer

Do not use a PC relative load or store with base writeback as the effects may vary in future processors:

    LDR|STR«cond»«B»«T» Rd,[R15,#expression]!
    LDR|STR«cond»«B»«T» Rd,[R15,«-»Rm«,shift»]!

    LDR|STR«cond»«B»«T» Rd,[R15],#expression
    LDR|STR«cond»«B»«T» Rd,[R15],«-»Rm«,shift»

Note: It is safe to use pre-indexed PC relative loads and stores without base writeback.

Avoid using R15 as the register offset (Rm) in single data transfers as the value used will be PC+PSR which can lead to address exceptions:

    LDR|STR«cond»«B»«T» Rd,[Rn,«-»R15«,shift»]«!»
    LDR|STR«cond»«B»«T» Rd,[Rn],«-»R15«,shift»

A byte load or store operation on R15 must not be specified, as R15 contains the PC, and should always be treated as a 32 bit quantity:

    LDR|STR«cond»B«T» R15,address

A post-indexed LDR|STR where Rm=Rn must not be used (this instruction is very difficult for the abort handler to unwind when late aborts are configured - which do not prevent base writeback):

    LDR|STR«cond»«B»«T» Rd,[Rn],«-»Rn«,shift»

Do not use the same register in the Rd and Rn positions of an LDR which specifies (or implies) base writeback; such an instruction is ambiguous, as it is not clear whether the end value in the register should be the loaded data or the updated base:

    LDR«cond»«B»«T» Rn,[Rn,#expression]!
    LDR«cond»«B»«T» Rn,[Rn,«-»Rm«,shift»]!

    LDR«cond»«B»«T» Rn,[Rn],#expression
    LDR«cond»«B»«T» Rn,[Rn],«-»Rm«,shift»

Block data transfer

Do not specify base writeback when forcing user mode block data transfer as the writeback may target the wrong register:

    STM«cond»<FD|ED...|DB> Rn!,Rlist^
    LDM«cond»<FD|ED...|DB> Rn!,Rlist^ 

where Rlist does not include R15.

Note: The instruction:

    LDM«cond»<FD|ED...|DB> Rn!,<Rlist,R15>^

does not force user mode data transfer, and can be used safely.

Do not perform a PC relative block data transfer, as the PC+PSR is used to form the base address which can lead to address exceptions:

    LDM|STM«cond»<FD|ED...|DB> R15«!»,Rlist«^»

Single data swap

Do not perform a PC relative swap as its behaviour may change in the future:

    SWP«cond»«B» Rd,Rm,[R15]

Avoid specifying R15 as the source or destination register:

    SWP«cond»«B» R15,Rm,[Rn]
    SWP«cond»«B» Rd,R15,[Rn]

Coprocessor data transfers

When performing a PC relative coprocessor data transfer, writeback to R15 is prevented so the W bit should not be set:

    LDC|STC«cond»«L» CP#,CRd,[R15]!

    LDC|STC«cond»«L» CP#,CRd,[R15,#expression]!

    LDC|STC«cond»«L» CP#,CRd,[R15],#expression!

Undefined instructions

ARM2 has two undefined instructions, and ARM3 has only one (the other ARM2 undefined instruction has been defined as the Single data swap operation).

Undefined instructions should not be used in programs, as they may be defined as a new operation in future ARM variants.

Register access after an in-line mode change

Care must be taken not to access a banked register (R8-R14) in the cycle following an in-line mode change. Thus the following code sequences should be avoided:

  1. TSTP|TEQP|CMPP|CMNP«cond» Rn,Op2
  2. any instruction that uses R8-R14 in its first cycle.

Register access after an LDM that forces user mode data transfer

The banked registers (R8-R14) should not be accessed in the cycle immediately after an LDM that forces user mode data transfer. Thus the following code sequence should be avoided:

  1. LDM«cond»<FD|ED...|DB> Rn,Rlist^
    where Rlist does not include R15
  2. any instruction that uses R8-R14 in its first cycle.

Other points to note

This section highlights some obscure cases of ARM operation which should be borne in mind when writing code.

Use of R15

Applicability: ARM2, ARM3

Warning: When the PC is used as a destination, operand, base or shift register, different results will be obtained depending on the instruction and the exact usage of R15.

Full details of the value derived from or written into R15+PSR for each instruction class are given in the chapter CPU instruction set. Care must be taken when using R15 because small changes in the instruction can yield significantly different results. For example, consider data operations of the type:

    opcode«cond»«S» Rd,Rn,Rm
 or opcode«cond»«S» Rd,Rn,Rm,shiftname Rs

  • When R15 is used in the Rm position, it will give the value of the PC together with the PSR flags.
  • When R15 is used in the Rn or Rs positions, it will give the value of the PC without the PSR flags (PSR bits replaced by zeros).

    MOV R0,#0
    ORR R1,R0,R15  ; R1:=PC+PSR (bits 31:26,1:0 reflect PSR flags)
    ORR R2,R15,R0  ; R2:=PC     (bits 31:26,1:0 set to zero)

Note: The relevant instruction description in the CPU instruction set should be consulted for full details of the behaviour of R15.
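
The two views of R15 in the example above differ only by a mask. The following Python sketch models this for data operations, assuming the layout given in the comments on the ORR example (PSR in bits 31:26 and 1:0, PC in bits 25:2); the function names are illustrative.

```python
PC_MASK = 0x03FFFFFC  # bits 25:2 hold the PC; bits 31:26 and 1:0 hold the PSR


def r15_in_rm_position(r15):
    """R15 read in the Rm position: PC together with the PSR bits."""
    return r15 & 0xFFFFFFFF


def r15_in_rn_or_rs_position(r15):
    """R15 read in the Rn or Rs position: PSR bits replaced by zeros."""
    return r15 & PC_MASK
```

For a combined value of &80008003 (top flag bit set, both mode bits set), the Rm view is &80008003 and the Rn view is &00008000.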

STM: Inclusion of the base in the register list

Applicability: ARM2, ARM3

Warning: In the case of a STM with writeback that includes the base register in the register list, the value of the base register stored depends upon its position in the register list.

During an STM, the first register is written out at the start of the second cycle of the instruction. When writeback is specified, the base is written back at the end of the second cycle. An STM which includes storing the base, with the base as the first register to be stored, will therefore store the unchanged value, whereas with the base second or later in the transfer order, it will store the modified value.

For example:

    MOV   R5,#&1000
    STMIA R5!,{R5-R6}  ; Stores value of R5=&1000

    MOV   R5,#&1000
    STMIA R5!,{R4-R5}  ; Stores value of R5=&1008
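
The rule can be modelled in a few lines. This Python sketch covers STMIA with writeback only, follows the behaviour described above for ARM2 and ARM3, and relies on registers being transferred in ascending register number order; the function name is illustrative.

```python
def stmia_stored_base(base_register, register_list, base_value):
    """Value stored for the base register by STMIA Rn!,{register_list}."""
    registers = set(register_list)
    if base_register not in registers:
        raise ValueError("base register is not in the register list")
    if base_register == min(registers):
        # Base is stored first, before writeback: unchanged value.
        return base_value
    # Base is stored second or later: value after writeback.
    return base_value + 4 * len(registers)
```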

MUL/MLA: Register restrictions

Applicability: ARM2, ARM3

Given

    MUL Rd,Rm,Rs

or

    MLA Rd,Rm,Rs,Rn

Then:

  • Rd and Rm must be different registers;
  • Rd must not be R15.

Due to the way Booth's algorithm has been implemented, certain combinations of operand registers should be avoided. (The assembler will issue a warning if these restrictions are overlooked.)

The destination register (Rd) should not be the same as the Rm operand register, as Rd is used to hold intermediate values and Rm is used repeatedly during the multiply. A MUL will give a zero result if Rm=Rd, and a MLA will give a meaningless result.

The destination register (Rd) should also not be R15. R15 is protected from modification by these instructions, so the instruction will have no effect, except that it will put meaningless values in the PSR flags if the S bit is set.

All other register combinations will give correct results, and Rd, Rn and Rs may use the same register when required.
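
A code generator that emits raw instruction words can enforce both restrictions itself. The following Python sketch assumes the standard multiply encoding (bits 27:22 zero, bits 7:4 equal to 1001, Rd in bits 19:16, Rm in bits 3:0); the function name and message strings are illustrative.

```python
def multiply_register_problems(instruction_word):
    """Return a list of problems for a MUL/MLA word ([] if none, or not a multiply)."""
    # Multiply class: bits 27:22 are 000000 and bits 7:4 are 1001.
    if (instruction_word & 0x0FC000F0) != 0x00000090:
        return []
    rd = (instruction_word >> 16) & 0xF
    rm = instruction_word & 0xF
    problems = []
    if rd == rm:
        problems.append("Rd and Rm are the same register")
    if rd == 15:
        problems.append("Rd is R15")
    return problems
```

For example, MUL R0,R1,R2 (&E0000291) passes, while MUL R1,R1,R2 (&E0010291) and MUL R15,R1,R2 (&E00F0291) are each reported.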

LDM/STM: Address Exceptions

Applicability: ARM2, ARM3

Warning: Illegal addresses formed during a LDM or STM operation will not cause an address exception.

Only the address of the first transfer of a LDM or STM is checked for an address exception; if subsequent addresses over-flow or under-flow into illegal address space they will be truncated to 26 bits but will not cause an address exception trap.

The following examples assume the processor is in a non-user mode and MEMC is being accessed:

    MOV   R0,#&04000000  ; R0=&04000000
    STMIA R0,{R1-R2}     ; Address exception reported
                         ;  (base address illegal)

    MOV   R0,#&04000000
    SUB   R0,R0,#4       ; R0=&03FFFFFC
    STMIA R0,{R1-R2}     ; No address exception reported
                         ;  (base address legal)
                         ; code will overwrite data at address &00000000

Note: The exact behaviour of the system depends upon the memory manager to which the processor is attached; in some cases, the wraparound may be detected and the instruction aborted.
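
The address arithmetic in the two examples can be reproduced with a small model. This Python sketch assumes an STMIA without writeback and ignores any checking done by the memory manager; the names are illustrative.

```python
ADDRESS_MASK = 0x03FFFFFF  # 26-bit address space


def stmia_transfer(base, register_count):
    """Return (address_exception, addresses written) for STMIA base,{...}."""
    # Only the first transfer address is checked for an address exception.
    if (base >> 26) != 0:
        return True, []
    # Later addresses are truncated to 26 bits without any trap.
    addresses = [(base + 4 * i) & ADDRESS_MASK for i in range(register_count)]
    return False, addresses
```

With a base of &03FFFFFC and two registers, the model reports no exception and the second word lands at &00000000.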

LDC/STC: Address Exceptions

Applicability: ARM2, ARM3

Warning: Illegal addresses formed during a LDC or STC operation will not cause an address exception (affects LDF/STF).

The coprocessor data transfer operations act like STM and LDM with the processor generating the addresses and the coprocessor supplying/reading the data. As with LDM/STM, only the address of the first transfer of a LDC or STC is checked for an address exception; if subsequent addresses over-flow or under-flow into illegal address space they will be truncated to 26 bits but will not cause an address exception trap.

Note that the floating point LDF/STF instructions are forms of LDC and STC.

The following examples assume the processor is in a non-user mode and MEMC is being accessed:

    MOV  R0,#&04000000  ; R0=&04000000
    STC  CP1,CR0,[R0]   ; Address exception reported
                        ;  (base address illegal)

    MOV  R0,#&04000000
    SUB  R0,R0,#4       ; R0=&03FFFFFC
    STFD F0,[R0]        ; No address exception reported
                        ;  (base address legal)
                        ;  code will overwrite data at address &00000000

Note: The exact behaviour of the system depends upon the memory manager to which the processor is attached; in some cases, the wraparound may be detected and the instruction aborted.

LDC: Data transfers to a coprocessor fetch more data than expected

Applicability: ARM3

Data to be transferred to a coprocessor with the LDC instruction should never be placed in the last word of an addressable chunk of memory, nor in the word of memory immediately preceding a read-sensitive memory location.

Due to the pipelining introduced into the ARM3 coprocessor interface, an LDC operation will cause one extra word of data to be fetched from the internal cache or external memory by ARM3 and then discarded; if the extra data is fetched from an area of external memory marked as cacheable, a whole line of data will be fetched and placed in the cache.

A particular case in point is that an LDC whose data ends at the last word of a memory page will load and then discard the first word (and hence the first cache line) of the next page. A minor effect of this is that it may occasionally cause an unnecessary page swap in a virtual memory system. The major effect of it is that (whether in a virtual memory system or not), the data for an LDC should never be placed in the last word of an addressable chunk of memory: the LDC will attempt to read the immediately following non-existent location and thus produce a memory fault.

The following example assumes the processor is in a non-user mode, FPU hardware is attached and MEMC is being accessed:

    MOV  R13,#&03000000  ; R13=Address of I/O space
    STFD F0,[R13,#-8]!   ; Store F.P. register 0 at top of physical memory
                         ;  (two words of data transferred)
    LDFD F1,[R13],#8     ; Load F.P. register 1 from top of physical
                         ; memory, but three words of data are
                         ; transferred, and the third access will read
                         ; from I/O space which may be read sensitive 
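
The addresses touched in this example can be listed with a small model. This Python sketch only reproduces the "one extra word" behaviour described above for ARM3; the function name is illustrative, and IO_SPACE_START is the I/O space address used in the example.

```python
IO_SPACE_START = 0x03000000  # address of I/O space in the example above


def arm3_ldc_addresses(base, words_requested):
    """Addresses read by an LDC on ARM3: the requested words plus one extra."""
    return [base + 4 * i for i in range(words_requested + 1)]


# LDFD transfers two words; R13 is &02FFFFF8 after the STFD with writeback.
fetched = arm3_ldc_addresses(0x02FFFFF8, 2)
reads_io_space = any(address >= IO_SPACE_START for address in fetched)
```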

Static ARM problems

The static ARM is a variant of the ARM processor designed for low power consumption, that is built using static CMOS technology. (The difference between it and the standard ARM is similar to that between SRAM and DRAM.)

The static ARM exhibits different behaviour to ARM2 and ARM3 when executing a PC relative LDR with base writeback. This class of instruction has very limited application, so the discrepancy should not be a problem, but if you wish to use any of the following instructions in your code you are advised to contact Acorn Computers.

        LDR Rd,[PC,#expression]!
        LDR Rd,[PC],#expression
        LDR Rd,[PC,«-»Rm«,shift»]!
        LDR Rd,[PC],«-»Rm«,shift»

Note: A PC relative LDR without writeback works exactly as expected.

Provided that this instruction class is unused, it is likely that writeback to the PC on LDR and STR will be disabled completely in the future. The fewer incidental ways there are to modify the PC the better.

Unexpected Static ARM2 behaviour when executing a PC relative LDR with writeback

The instructions affected are:

  • LDR Rd,[PC,#expression]!
  • LDR Rd,[PC],#expression

Case 1: LDR Rd,[PC,#expression]!

Expected result:

Rd <- (PC+8+expression)
PC <- PC+8+expression
...so execution continues from PC+8+expression

Actual ARM2 result:

Rd <- Rd {no change}
PC <- PC+8+expression+4
...so execution continues from PC+12+expression

Case 2: LDR Rd,[PC],#expression

Expected result:

Rd <- (PC+8)
PC <- PC+8+expression
...so execution continues from PC+8+expression

Actual ARM2 result:

Rd <- Rd {no change}
PC <- PC+8+expression+4
...so execution continues from PC+12+expression

This edition Copyright © 3QD Developments Ltd 2015
Last Edit: Tue,03 Nov 2015