www.riscos.com Technical Support
This chapter is split into parts, each of which details certain aspects of Acorn C's implementation of the ANSI C standard.
Appendix A.6 of the standard X3.159-1989 collects together information about portability issues; section A.6.3 lists those points which are implementation defined, and directs that each implementation shall document its behaviour in each of the areas listed. This part corresponds to appendix A.6.3, answering the points listed in the appendix, under the same headings and in the same order.
Identifiers can be of any length. They are truncated by the compiler to 256 characters, all of which are significant (the standard requires a minimum of 31).
The source character set expected by the compiler is 7 bit ASCII, except that within comments, string literals, and character constants, the full ISO 8859-1 8 bit character set is recognised. At run time, the C library processes the full ISO 8859-1 8 bit character set, except that the default locale is the C locale (see the chapter entitled Standard implementation definition). The ctype functions therefore all return 0 when applied to codes in the range 160-255. By calling setlocale(LC_CTYPE,"ISO8859-1") you can cause the ctype functions such as isupper() and islower() to behave as expected over the full 8 bit Latin alphabet, rather than just over the 7 bit ASCII subset.
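The effect of the locale on the ctype functions can be probed with a small helper. This is a minimal sketch: the function name upper_in_locale is hypothetical, and the locale name "ISO8859-1" is the one given in the text above; other C libraries may not accept it, in which case setlocale() returns NULL and the C locale stays in force.

```c
#include <ctype.h>
#include <locale.h>

/* Hypothetical helper: returns 1 if `code` (0..255) is classified as an
   upper case letter under the named LC_CTYPE locale, otherwise 0.
   The C locale is restored before returning. */
int upper_in_locale(const char *locale_name, int code)
{
    int result;

    /* If the name is not accepted, setlocale() returns NULL and the
       current locale is left unchanged. */
    (void)setlocale(LC_CTYPE, locale_name);
    result = isupper((unsigned char)code);
    (void)setlocale(LC_CTYPE, "C");
    return result != 0;
}
```

Under the default C locale, upper_in_locale("C", 0xC9) yields 0, in line with the statement that codes in the range 160-255 are not classified. With the Latin-1 locale selected, the text above implies that the same code would be reported as upper case.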
Upper and lower case characters are distinct in all identifiers, both internal and external.
In -pcc and -fc modes an identifier may also contain a dollar character.
The sizes of data elements are as follows:
Type | Size in bits |
---|---|
char | 8 |
short | 16 |
int | 32 |
long | 32 |
float | 32 |
double | 64 |
long double | 64 (subject to future change) |
all pointers | 32 |
Integers are represented in two's complement form.
Data items of type char are unsigned by default, though they may be explicitly declared as signed char or unsigned char. (In -pcc mode there is no signed keyword, so chars are signed by default and may be declared unsigned if required.) Single-character constants are thus always positive.
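Code that must also compile elsewhere should not rely on plain char being unsigned. The following sketch (function names are illustrative, not part of any library) detects the signedness from limits.h and widens a char to a non-negative code in a way that works under either convention.

```c
#include <limits.h>

/* Nonzero where plain char is unsigned (the Acorn C default described
   above); zero where plain char is signed. */
int plain_char_is_unsigned(void)
{
    return CHAR_MIN == 0;
}

/* Widen a char to an int in the range 0..UCHAR_MAX, whatever the
   signedness of plain char. */
int char_to_code(char c)
{
    return (int)(unsigned char)c;
}
```

Going through unsigned char before widening keeps table look-ups and calls to the ctype functions safe when the same source is built with a compiler whose plain char is signed.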
Floating point quantities are stored in the IEEE format. In double and long double quantities, the word containing the sign, the exponent and the most significant part of the mantissa is stored at the lower machine address.
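The word order of a double can be checked at run time. The sketch below assumes an 8 byte IEEE double and a 4 byte unsigned int, and returns -1 if either assumption fails; the function name is hypothetical. From the description above, the expected result under Acorn C is 0 (sign word at the lower address); other platforms may give 1.

```c
#include <string.h>

/* Returns 0 if the 32-bit word holding the sign and exponent of a double
   is stored at the lower address, 1 if it is at the higher address, and
   -1 if the layout is not the one assumed. */
int sign_word_index(void)
{
    double d = -1.0;        /* IEEE bit pattern: 0xBFF00000 0x00000000 */
    unsigned int w[2];

    if (sizeof(double) != 8 || sizeof(unsigned int) != 4)
        return -1;
    memcpy(w, &d, sizeof d);
    if (w[0] == 0xBFF00000u && w[1] == 0u)
        return 0;
    if (w[1] == 0xBFF00000u && w[0] == 0u)
        return 1;
    return -1;
}
```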
The standard defines two header files, limits.h and float.h, which contain constant declarations describing the ranges of values which can be represented by the arithmetic types. The standard also defines minimum values for many of these constants.
The following table sets out the values in these two headers on the ARM, and a brief description of their significance. See the standard for a full definition of their meanings.
Number of bits in smallest object that is not a bit field (ie a byte):
CHAR_BIT 8
Maximum number of bytes in a multibyte character, for any supported locale:
MB_LEN_MAX 1
Numeric ranges of integer types:
The middle column gives the numerical value of each range's endpoint, while the right hand column gives the bit patterns (in hexadecimal) that would be interpreted as this value in C. When entering constants you must be careful about the size and signed-ness of the quantity. Furthermore, constants are interpreted differently in decimal and hexadecimal/octal. See the ANSI standard or any of the recommended textbooks on the C programming language for more details.
Range | End-point | Hex representation |
---|---|---|
CHAR_MAX | 255 | 0xff |
CHAR_MIN | 0 | 0x00 |
SCHAR_MAX | 127 | 0x7f |
SCHAR_MIN | -128 | 0x80 |
UCHAR_MAX | 255 | 0xff |
SHRT_MAX | 32767 | 0x7fff |
SHRT_MIN | -32768 | 0x8000 |
USHRT_MAX | 65535 | 0xffff |
INT_MAX | 2147483647 | 0x7fffffff |
INT_MIN | -2147483648 | 0x80000000 |
UINT_MAX | 4294967295 | 0xffffffff |
LONG_MAX | 2147483647 | 0x7fffffff |
LONG_MIN | -2147483648 | 0x80000000 |
ULONG_MAX | 4294967295 | 0xffffffff |
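The warning about hexadecimal constants can be illustrated as follows. The constant 0x80000000 denotes the value 2147483648; with a 32 bit int it does not fit in int, so it takes an unsigned type and is positive, even though its bit pattern is the one listed for INT_MIN. The helper from_bits32 is a hypothetical name for a routine that converts such a bit pattern to the signed value it represents in two's complement.

```c
#include <limits.h>

/* A hexadecimal constant always denotes a non-negative value. */
int hex_constant_is_positive(void)
{
    return 0x80000000 > 0;
}

/* INT_MIN, by contrast, is negative. */
int int_min_is_negative(void)
{
    return INT_MIN < 0;
}

/* Interpret the low 32 bits of `bits` as a two's complement value.
   Written so that no intermediate result overflows a 32-bit long. */
long from_bits32(unsigned long bits)
{
    bits &= 0xFFFFFFFFUL;
    if (bits & 0x80000000UL)
        return -(long)(0xFFFFFFFFUL - bits) - 1L;
    return (long)bits;
}
```

For example, from_bits32(0x80000000UL) gives -2147483648 and from_bits32(0xFFFFFFFFUL) gives -1, matching the right hand column of the table.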
Characteristics of floating point:
FLT_RADIX | 2 |
FLT_ROUNDS | 1 |
The base (radix) of the ARM floating point number representation is 2, and floating point addition rounds to nearest.
Ranges of floating types:
FLT_MAX | 3.40282347e+38F |
FLT_MIN | 1.17549435e-38F |
DBL_MAX | 1.79769313486231571e+308 |
DBL_MIN | 2.22507385850720138e-308 |
LDBL_MAX | 1.79769313486231571e+308 |
LDBL_MIN | 2.22507385850720138e-308 |
Ranges of base two exponents:
FLT_MAX_EXP | 128 |
FLT_MIN_EXP | (-125) |
DBL_MAX_EXP | 1024 |
DBL_MIN_EXP | (-1021) |
LDBL_MAX_EXP | 1024 |
LDBL_MIN_EXP | (-1021) |
Ranges of base ten exponents:
FLT_MAX_10_EXP | 38 |
FLT_MIN_10_EXP | (-37) |
DBL_MAX_10_EXP | 308 |
DBL_MIN_10_EXP | (-307) |
LDBL_MAX_10_EXP | 308 |
LDBL_MIN_10_EXP | (-307) |
Decimal digits of precision:
FLT_DIG | 6 |
DBL_DIG | 15 |
LDBL_DIG | 15 |
Digits (base two) in mantissa:
FLT_MANT_DIG | 24 |
DBL_MANT_DIG | 53 |
LDBL_MANT_DIG | 53 |
Smallest positive values such that (1.0 + x != 1.0):
FLT_EPSILON | 1.19209290e-7F |
DBL_EPSILON | 2.2204460492503131e-16 |
LDBL_EPSILON | 2.2204460492503131e-16L |
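The meaning of the epsilon values can be checked directly. This is a sketch with illustrative function names; it relies on round-to-nearest arithmetic, as stated above for FLT_ROUNDS, and stores each sum in a volatile double so that it is rounded to double precision before the comparison.

```c
#include <float.h>

/* 1.0 + DBL_EPSILON is the next representable double after 1.0. */
int epsilon_is_distinguishable(void)
{
    volatile double sum = 1.0 + DBL_EPSILON;
    return sum != 1.0;
}

/* Half of DBL_EPSILON is too small to survive: the sum rounds back
   to 1.0 under round-to-nearest. */
int half_epsilon_is_lost(void)
{
    volatile double sum = 1.0 + DBL_EPSILON / 2.0;
    return sum == 1.0;
}
```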
The standard leaves details of the layout of the components of structured data types to each implementation. The following points apply to the Acorn C compiler:
The following remarks apply to pointer types:
When two pointers are subtracted, the difference is obtained as if by the expression:
((int)a - (int)b) / (int)sizeof(type pointed to)
If the pointers point to objects whose size is no greater than four bytes, word alignment of data ensures that the division will be exact in all cases. For longer types, such as doubles and structures, the division may not be exact unless both pointers are to elements of the same array. Moreover the quotient may be rounded up or down at different times, leading to potential inconsistencies.
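For pointers into the same array, the built-in subtraction and the byte-based calculation agree. The sketch below uses hypothetical function names and takes the byte difference through char pointers as a portable stand-in for the (int) casts in the expression above, which assume 32 bit pointers.

```c
#include <stddef.h>

/* Element distance using ordinary pointer subtraction. */
long element_distance(const double *a, const double *b)
{
    return (long)(a - b);
}

/* The same distance computed from the byte difference divided by the
   element size, mirroring the expression given in the text. */
long byte_based_distance(const double *a, const double *b)
{
    ptrdiff_t bytes = (const char *)a - (const char *)b;
    return (long)(bytes / (ptrdiff_t)sizeof(double));
}
```

Both functions are only meaningful when a and b point to elements of the same array; this is also the case in which the text above guarantees an exact division.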
The compiler performs all of the 'usual arithmetic conversions' set out in the standard.
The following points apply to operations on the integral types:
The following points apply to operations on floating types:
The compiler performs the 'usual arithmetic conversions' (promotions) set out in the standard before evaluating any expression.
The standard sets out certain minimum translation limits which a conforming compiler must cope with; you should be aware of these if you are porting applications to other compilers. A summary is given here. The 'mem' limit indicates that no limit is imposed other than that of available memory.
Description | Requirement | Acorn C |
---|---|---|
Nesting levels of compound statements and iteration/selection control structures | 15 | mem |
Nesting levels of conditional compilation | 8 | mem |
Declarators modifying a basic type | 31 | mem |
Expressions nested by parentheses | 32 | mem |
Significant characters in internal identifiers and macro names | 31 | 256 |
Significant characters in external identifiers | 6 | 256 |
External identifiers in one source file | 511 | mem |
Identifiers with block scope in one block | 127 | mem |
Macro identifiers in one source file | 1024 | mem |
Parameters in one function definition/call | 31 | mem |
Parameters in one macro definition/invocation | 31 | mem |
Characters in one logical source line | 509 | no limit |
Characters in a string literal | 509 | mem |
Bytes in a single object | 32767 | mem |
Nesting levels for #included files | 8 | mem |
Case labels in a switch statement | 257 | mem |
Members in a single struct or union, enumeration constants in a single enum | 127 | mem |
Nesting of struct/union in a single declaration | 15 | mem |
Diagnostic messages produced by the compiler are of the form
"source-file", line #: severity: explanation
where severity is one of
The mapping of a command line from the ARM-based environment into arguments to main() is implementation-specific. The shared C library supports the following:
A double quote or backslash character (\) inside double quotes must be preceded by a backslash character. An I/O redirection will not be recognised inside double quotes.
The shared C library supports a pair of interactive devices, both called :tt, that handle the keyboard and the VDU screen:
Using the shared C library, the standard input, output and error streams, stdin, stdout, and stderr can be redirected at runtime in the ways shown below. For example, if mycopy is a compiled and linked program which simply copies the standard input to the standard output, the following line:
*mycopy < infile > outfile 2> errfile
runs the program, redirecting stdin to the file infile, stdout to the file outfile and stderr to the file errfile.
The following shows the allowed redirections:
0< filename | read stdin from filename |
< filename | read stdin from filename |
1> filename | write stdout to filename |
> filename | write stdout to filename |
2> filename | write stderr to filename |
2>&1 | write stderr to wherever stdout is currently going |
>& filename | write both stdout and stderr to filename |
>> filename | append stdout to filename |
>>& filename | append both stdout and stderr to filename |
1>&2 | write stdout to wherever stderr is currently going |
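A program such as the mycopy example needs no code of its own for redirection, since the library has set up the streams before main() is entered. The core of such a program might look like the sketch below; copy_stream is an illustrative name, not a library function.

```c
#include <stdio.h>

/* Copy every byte from `in` to `out`.
   Returns the number of bytes copied, or -1 on a write error. */
long copy_stream(FILE *in, FILE *out)
{
    long copied = 0;
    int c;

    while ((c = getc(in)) != EOF) {
        if (putc(c, out) == EOF)
            return -1L;
        copied++;
    }
    return copied;
}
```

A mycopy built on this would simply call copy_stream(stdin, stdout) from main(); diagnostics written to stderr would then go to errfile in the command line shown earlier.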
256 characters are significant in identifiers without external linkage. (Allowed characters are letters, digits, and underscores.)
256 characters are significant in identifiers with external linkage. (Allowed characters are letters, digits, and underscores.)
Case distinctions are significant in identifiers with external linkage.
In -pcc and -fc modes, the character '$' is also valid in identifiers.
The characters in the source character set are ISO 8859-1 (Latin-1 Alphabet), a superset of the ASCII character set. The printable characters are those in the range 32 to 126 and 160 to 255. Any printable character may appear in a string or character constant, and in a comment.
The compiler has no support for multibyte character sets.
The ARM C library supports the ISO 8859-1 (Latin-1) character set, so the following points hold:
The representations and sets of values of the integral types are set out in the section entitled Data elements. Note also that:
The representations and ranges of values of the floating point types have been given in the section entitled Data elements. Note also that:
The ANSI standard specifies three areas in which the behaviour of arrays and pointers must be documented. The points to note are:
In the Acorn C compiler, you can declare any number of objects to have the storage class register. Depending on which variant of the ARM Procedure Call Standard is in use, there are between five and seven registers available. (There are six available in the default APCS variant, as used by RISC OS.) Declaring more than this number of objects with register storage class must result in at least one of them not being held in a register. It is advisable to declare no more than four. The valid types are:
Note that other variables, not declared with the register storage class, may be held in registers for extended periods; and that register variables may be held in memory for some periods.
Note also that there is a #pragma which assigns a file-scope variable to a specified register everywhere within a compilation unit.
The Acorn C compiler handles structures in the following way:
An object that has volatile-qualified type is accessed if any word or byte of it is read or written. For volatile-qualified objects, reads and writes occur as directly implied by the source code, in the order implied by the source code.
The effect of accessing a volatile-qualified short is undefined.
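A typical use of volatile is polling a status word that can change outside the program's control. The sketch below uses a hypothetical function name and deliberately takes a word-sized object, since the effect of accessing a volatile short is undefined here. Each evaluation of *status in the source is one read of the object, in source order.

```c
/* Read *status repeatedly until one of the bits in `mask` is set, or
   until `max_polls` further reads have been made.
   Returns the last value read. */
unsigned int wait_for_bits(volatile unsigned int *status,
                           unsigned int mask,
                           unsigned long max_polls)
{
    unsigned int value = *status;       /* first read */

    while ((value & mask) == 0 && max_polls-- > 0)
        value = *status;                /* one read per iteration */
    return value;
}
```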
The number of declarators that may modify an arithmetic, structure or union type is limited only by available memory.
A single-character constant in a preprocessor directive cannot have a negative value.
The standard header files are contained within the compiler itself. The mechanism for translating the standard suffix notation to an Acorn filename is described in the chapter entitled CC and C++.
Quoted names for includable source files are supported. The rules for directory searching are given in the chapter entitled CC and C++.
The recognized #pragma directives and their meaning are described in the section entitled #pragma directives.
The date and time of translation are always available, so __DATE__ and __TIME__ always give respectively the date and time.
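Because both macros expand to string literals, they can be joined at translation time to form a build stamp. A minimal sketch, with build_stamp as an illustrative name:

```c
/* Adjacent string literals are concatenated by the compiler, giving a
   string of the form "Mmm dd yyyy hh:mm:ss". */
const char *build_stamp(void)
{
    return __DATE__ " " __TIME__;
}
```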
The C library has or supports the following features:
*** assertion failed: expression, file filename, line, line-number
and then calls the function abort().
After the call setlocale(LC_CTYPE,"ISO8859-1") the following statements also apply to character codes and affect the results returned by the ctype functions:
The mathematical functions return the following values on domain errors:
Function | Condition | Returned value |
---|---|---|
log(x) | x <= 0 | -HUGE_VAL |
log10(x) | x <= 0 | -HUGE_VAL |
sqrt(x) | x < 0 | -HUGE_VAL |
atan2(x,y) | x = y = 0 | -HUGE_VAL |
asin(x) | abs(x) > 1 | -HUGE_VAL |
acos(x) | abs(x) > 1 | -HUGE_VAL |
Where -HUGE_VAL is written above, a number is returned which is defined in the header h.math. Consult the errno variable for the error number.
The mathematical functions set errno to ERANGE on underflow range errors.
A domain error occurs if the second argument of fmod is zero, and -HUGE_VAL is returned.
The set of signals for the generic signal() function is as follows:
SIGABRT | Abort |
SIGFPE | Arithmetic exception |
SIGILL | Illegal instruction |
SIGINT | Attention request from user |
SIGSEGV | Bad memory access |
SIGTERM | Termination request |
SIGSTAK | Stack overflow |
The default handling of all recognised signals is to print a diagnostic message and call exit. This default behaviour applies at program start-up.
When a signal occurs, if func points to a function, the equivalent of signal(sig, SIG_DFL); is first executed.
If the SIGILL signal is received by a handler specified to the signal function, the default handling is reset.
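Since the default handling is restored before a handler is entered, a handler that is to catch more than one occurrence must re-install itself. The sketch below shows one way of doing so for SIGINT; the function and variable names are illustrative.

```c
#include <signal.h>

static volatile sig_atomic_t interrupts_seen = 0;

/* Handler: re-install first, because the library has already reset the
   signal to SIG_DFL by the time this function is entered. */
static void on_interrupt(int sig)
{
    signal(sig, on_interrupt);
    interrupts_seen++;
}

/* Returns nonzero if the handler was installed. */
int install_interrupt_handler(void)
{
    return signal(SIGINT, on_interrupt) != SIG_ERR;
}

/* Number of SIGINT signals handled so far. */
int interrupt_count(void)
{
    return (int)interrupts_seen;
}
```

After install_interrupt_handler() has been called, two successive calls of raise(SIGINT) leave interrupt_count() at 2. Without the re-installation, the second signal would receive the default handling described above.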
The C library also has the following characteristics relating to I/O:
Note also the following points about library functions:
Pragmas recognised by the compiler come in two forms:
#pragma -letter«digit»
and
#pragma «no_»feature-name
A short-form pragma given without a digit resets that pragma to its default state; one given with a digit sets it to the state specified.
For example, each short-form pragma below is equivalent to the long form beside it:
#pragma -s1 | #pragma no_check_stack |
#pragma -p2 | #pragma profile_statements |
The set of pragmas recognised by the compiler, together with their default settings, varies from release to release of the compiler. The current list of recognised pragmas is:
Pragma name | Short form | Short 'No' form | Command line option |
---|---|---|---|
warn_implicit_fn_decls | a1 * | a0 | -Wf |
check_memory_accesses | c1 | c0 * | -zpc0|1 |
warn_deprecated | d1 * | d0 | -Wd |
continue_after_hash_error | e1 | e0 * | |
(FP register variable) | f1-f4 | f0 * | |
include_only_once | i1 | i0 * | |
optimise_crossjump | j1 * | j0 | -zpj0|1 |
optimise_multiple_loads | m1 * | m0 | -zpm0|1 |
profile | p1 | p0 * | -p |
profile_statements | p2 | p0 * | -px |
(integer register variable) | r1-r7 | r0 * | |
check_stack | s0 * | s1 | -zps0|1 |
force_top_level | t1 | t0 * | |
check_printf_formats | v1 | v0 * | |
check_scanf_formats | v2 | v0 * | |
side_effects | y0 * | y1 | |
optimise_cse | z1 * | z0 | -zpz0|1 |
In each case, the default setting is starred.
You can also globally set pragmas by options set in the command line passed to the cc program (see the chapter entitled Command lines); the preferred option to use is shown above. Where no option is shown for a pragma, it is because that pragma may only sensibly be used locally, and should be enabled/disabled around the particular program statements it is to affect.
The pragma continue_after_hash_error in effect implements a #warning ... preprocessor directive. Pragma include_only_once asserts that the containing #include file is to be included only once, and that if its name recurs in a subsequent #include directive then the directive is to be ignored.
The pragma force_top_level asserts that the containing #include file should only be included at the top level of a file. A syntax error will result if the file is included, say, within the body of a function.
The pragmas check_printf_formats and check_scanf_formats control whether the actual arguments to printf and scanf, respectively, are type-checked against the format designators in a literal format string.
Of course, calls using non-literal format strings cannot be checked. By default, all calls involving literal format strings are checked.
The pragmas optimise_crossjump, optimise_multiple_loads and optimise_cse give fine control over where these optimisations are applied. For example, it is sometimes advantageous to disable cross-jumping (the common tail optimisation) in the critical loop of an interpreter; and it may be helpful in a timing loop to disable common subexpression elimination and the opportunistic optimisation of multiple load instructions to load multiples. Note that the correct use of the volatile qualifier should remove most of the more obvious needs for this degree of control (and volatile is also available in the Acorn C compiler's -pcc mode unless -strict is specified).
By default, functions are assumed to be impure, so function invocations are not candidates for common subexpression elimination. Pragma noside_effects asserts that the following function declarations (until the next #pragma side_effects) describe pure functions, invocations of which can be common subexpressions. See also the description of __pure.
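The following sketch brackets a declaration with the short forms -y1 and -y0 from the table above. The guard macro __CC_NORCROFT is an assumption made here to identify the Acorn compiler, so that other compilers skip the pragmas; the function names are illustrative.

```c
/* Assumption: __CC_NORCROFT is defined only by the Acorn (Norcroft)
   compiler, so other compilers never see the pragmas. */
#ifdef __CC_NORCROFT
#pragma -y1     /* short 'no' form of side_effects: declarations are pure */
#endif
extern int square(int x);
#ifdef __CC_NORCROFT
#pragma -y0     /* back to the default: functions assumed impure */
#endif

int square(int x)
{
    return x * x;
}

/* With square() declared pure, the two identical calls below become
   candidates for common subexpression elimination. */
int sum_of_squares_twice(int x)
{
    return square(x) + square(x);
}
```

The result is the same whether or not the optimisation is applied; only the number of calls made may differ, which is why the assertion is safe only for functions without side effects.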
The pragma no_check_stack disables the generation of code at function entry which checks for stack limit violation. In reality there is little advantage to turning off this check: it typically costs only two instructions and two machine cycles per function call. The one circumstance in which no_check_stack must be used is in writing a signal handler for the SIGSTAK event. When this occurs, stack overflow has already been detected, so checking for it again in the handler would result in a fatal circular recursion.
The pragma check_memory_accesses instructs the compiler to precede each access to memory by a call to the appropriate one of:
__rt_rdnchk where n is 1, 2, or 4, for byte, short, or long reads (respectively)
__rt_wrnchk where n is 1, 2, or 4, for byte, short, or long writes (respectively).
The pragmas f0-f4 and r0-r7 have no long form counterparts. Each introduces or terminates a list of extern, file-scope variable declarations. Each such declaration declares a name for the same register variable. For example:
#pragma r1 /* 1st global register */
extern int *sp;
#pragma r2 /* 2nd global register */
extern int *fp, *ap; /* Synonyms */
#pragma r0 /* End of global declaration */
#pragma f1 /* 1st global FP register */
extern double pi;
#pragma f0 /* End of global declaration */
Any type that can be allocated to a register (see the chapter entitled Registers (A.6.3.8)) can be allocated to a global register. Similarly, any floating point type can be allocated to a floating point register variable.
Global register r1 is the same as register v1 in the ARM Procedure Call Standard (APCS); similarly, r2 equates to v2, and so on. Depending on the APCS variant, between five and seven integer registers (v1-v7, machine registers R4-R10) and four floating point registers (F4-F7) are available as register variables. (There are six integer registers available in the default APCS variant, as used by RISC OS.) In practice it is probably unwise to use more than three global integer register variables and two global floating point register variables.
Provided the same declarations are made in each compilation unit, a global register variable may exist program-wide.
Otherwise, because a global register variable maps to a callee-saved register, its value will be saved and restored across a call to a function in a compilation unit which does not use it as a global register variable, such as a library function.
A corollary of the safety of direct calls out of a global-register-using compilation unit is that calls back into it are dangerous. In particular, a global-register-using function called from a compilation unit which uses that register as a compiler-allocated register will probably read the wrong values from its supposed global register variables.
Currently, there is no link-time check that direct calls are sensible. And even if there were, indirect calls via function arguments pose a hazard which is harder to detect. This facility must be used with care. Preferably, the declaration of a global register variable should be made in each compilation unit of the program. See also the description of __global_reg(n).
Several special function declaration options are available to tell the Acorn C compiler to treat that function in a special way. None of these are portable to other machines.
The __value_in_regs option allows the compiler to return a structure in registers rather than returning a pointer to the structure. For example:
typedef struct int64_struct { unsigned int lo; unsigned int hi; } int64;
__value_in_regs extern int64 mul64(unsigned a, unsigned b);
See the chapter entitled ARM procedure call standard of the Desktop Tools guide for details of the default way in which structures are passed and returned.
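A possible definition of a function like mul64 is sketched below. It is not taken from any library: the wrapper macro VALUE_IN_REGS and the guard __CC_NORCROFT are assumptions introduced here so that the same source compiles with compilers that lack __value_in_regs, and the code assumes that unsigned int is 32 bits wide.

```c
typedef struct int64_struct { unsigned int lo; unsigned int hi; } int64;

/* Assumption: __CC_NORCROFT identifies the Acorn (Norcroft) compiler. */
#ifdef __CC_NORCROFT
#define VALUE_IN_REGS __value_in_regs
#else
#define VALUE_IN_REGS
#endif

/* 32 x 32 -> 64 bit unsigned multiply built from 16-bit partial products,
   so that no intermediate result exceeds 32 bits. */
VALUE_IN_REGS int64 mul64(unsigned a, unsigned b)
{
    unsigned long a_lo = a & 0xFFFFu, a_hi = (a >> 16) & 0xFFFFu;
    unsigned long b_lo = b & 0xFFFFu, b_hi = (b >> 16) & 0xFFFFu;
    unsigned long low  = a_lo * b_lo;
    unsigned long mid1 = a_hi * b_lo;
    unsigned long mid2 = a_lo * b_hi;
    unsigned long high = a_hi * b_hi;
    /* Bits 16..31 of the result, plus any carry into bit 32. */
    unsigned long mid  = (low >> 16) + (mid1 & 0xFFFFUL) + (mid2 & 0xFFFFUL);
    int64 r;

    r.lo = (unsigned int)(((mid & 0xFFFFUL) << 16) | (low & 0xFFFFUL));
    r.hi = (unsigned int)((high + (mid1 >> 16) + (mid2 >> 16) + (mid >> 16))
                          & 0xFFFFFFFFUL);
    return r;
}
```

For example, mul64(0xFFFFFFFF, 0xFFFFFFFF) yields hi = 0xFFFFFFFE and lo = 1. Callers are unaffected by whether the result travels in registers or through memory; only the generated calling sequence changes.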
By default, functions are assumed to be impure (i.e. they have side effects), so function invocations are not candidates for common subexpression elimination. __pure has the same effect as pragma noside_effects, and asserts that the function declared is a pure function, invocations of which can be common subexpressions.
Allocates the declared variable to a global integer register variable, in the same way as #pragma rn. The variable must have an integral or pointer type. See also the section entitled Global (program-wide) register variables.
Allocates the declared variable to a global floating point register variable, in the same way as #pragma fn. The variable must have type float or double. See also the section entitled Global (program-wide) register variables.
Note that the global register, whether specified by keyword or pragmas, must be declared in all declarations of the same variable. Thus:
int x; __global_reg(1) x;
is an error.