www.riscos.com Technical Support
The C programming language has gained a reputation for being portable across machines, while still providing capabilities at a machine-specific level. The fact that a program is written in C by no means indicates the effort required to port software from one machine to another, or indeed from one compiler to another. Obviously the most time-consuming task is porting between two entirely different hardware environments, running different operating systems with different compilers. Since many users of the Acorn C compiler will find themselves in this situation, this chapter deals with a number of issues you should be aware of when porting software to or from our environment. The chapter covers the following:
If you intend your code to be used on a variety of different systems, there are certain aspects which you should bear in mind in order to make porting an easy and relatively error-free process. It is essential to single out items which may make software system-specific, and to employ techniques to avoid non-portable use of such items. In this section, we describe general portability issues for C programs.
The size of fundamental data types such as char, int, long int, short int and float will depend mainly on the underlying architecture of the machine on which the C program is to run. Compiler writers usually implement these types in a manner which best fits the architectures of machines for which their compilers are targeted. For example, Release 5 of the Microsoft C Compiler has int, short int and long int occupying 2, 2 and 4 bytes respectively, where the Acorn C Compiler uses 4, 2 and 4 bytes. Certain relations are guaranteed by the ANSI C Standard (such as the fact that the size of long int is at least that of short int), but code which makes any assumptions regarding implementation-defined issues such as whether int and long int are the same size will not be maximally portable.
A common non-portable assumption is embedded in the use of hexadecimal constant values. For example:
    int i;
    i = i & 0xfffffff8;   /* set bottom 3 bits to zero, assuming 32-bit int */
Such non-portability can be avoided by using:
    int i;
    i = i & ~0x07;   /* set bottom 3 bits to zero, whatever sizeof(int) */
If you find that some size assumptions are inevitable, then at least use a series of assert calls when the program starts up, to indicate any conditions under which successful operation is not guaranteed. Alternatively, write macros for frequently-used operations so that size assumptions are localised and can be altered locally.
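As a minimal sketch of both suggestions, the fragment below checks size assumptions once at start-up and confines a bit-masking operation to a single macro. The names check_size_assumptions and CLEAR_LOW_BITS are invented for illustration, and the particular assumptions tested are only examples.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical macro: clears the bottom n bits of an unsigned long value
   without assuming how many bits the type holds. */
#define CLEAR_LOW_BITS(x, n)  ((x) & ~((1UL << (n)) - 1UL))

/* Hypothetical start-up check: call once at the beginning of main(). */
void check_size_assumptions(void)
{
    assert(CHAR_BIT == 8);                /* code assumes 8-bit bytes */
    assert(sizeof(int) >= 4);             /* code assumes at least 32-bit int */
    assert(sizeof(long) >= sizeof(int));  /* guaranteed by ANSI; documents intent */
}
```

If an assumption fails on a new machine, the program stops at once with a diagnostic naming the failed condition, rather than misbehaving later.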
A highly non-portable feature of many C programs is the implicit or explicit exploitation of byte ordering within a word of store. Such assumptions tend to arise when copying objects word by word (rather than byte by byte), when inputting and outputting binary values, and when extracting bytes from or inserting bytes into words using a mix of shift-and-mask and byte addressing. A contrived example is the following code, which copies individual bytes from an int variable w into an int variable a through the char pointer p, until a null byte is encountered. The code assumes that w does contain a null byte.
    int a;
    char *p = (char *)&a;
    int w = AN_ARBITRARY_VALUE;
    for (;;)
    {
        if ((*p++ = w) == 0) break;
        w >>= 8;
    }
This code will only work on a machine with even (or little-endian) byte-sex, and so is not portable. The best solution to such problems is either to write code which does not rely on byte-sex, or to have different code to deal appropriately with different byte-sex and to compile the correct variant conditionally, depending on your target machine architecture.
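One way to avoid relying on byte-sex is to move values to and from byte buffers using shifts and masks only, so that the order of bytes is fixed by the code rather than by the machine. The sketch below assumes a 32-bit quantity stored least significant byte first; the function names put_u32_le and get_u32_le are invented for illustration.

```c
/* Store the low 32 bits of v into buf[0..3], least significant byte first.
   The result is the same whatever the byte-sex of the host machine. */
void put_u32_le(unsigned char *buf, unsigned long v)
{
    buf[0] = (unsigned char)(v & 0xffUL);
    buf[1] = (unsigned char)((v >> 8) & 0xffUL);
    buf[2] = (unsigned char)((v >> 16) & 0xffUL);
    buf[3] = (unsigned char)((v >> 24) & 0xffUL);
}

/* Rebuild the value from four bytes written by put_u32_le(). */
unsigned long get_u32_le(const unsigned char *buf)
{
    return  (unsigned long)buf[0]
         | ((unsigned long)buf[1] << 8)
         | ((unsigned long)buf[2] << 16)
         | ((unsigned long)buf[3] << 24);
}
```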
The only guarantee given in the ANSI C Standard regarding alignment of members of a struct is that a 'hole' (caused by padding) cannot exist at the beginning of the struct. The values of 'holes' created by alignment restrictions are undefined, and you should not make assumptions about these values. In particular, two structures with identical members, each having identical values, can only reliably be found equal by field-by-field comparison; a byte-by-byte or word-by-word comparison may not indicate equality.
This may also have implications on the size requirements of large arrays of structs. Given the following declarations:
    #define ARRSIZE 10000
    typedef struct
    {
        int i;
        short s;
    } ELEM;
    ELEM arr[ARRSIZE];
this may require significantly different amounts of store under, say, a compiler which aligns ints on even boundaries, as opposed to one which aligns them on word boundaries.
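A field-by-field comparison of the kind recommended above might look like the following sketch. The type REC and the function rec_equal are invented for illustration; the point is only that padding bytes never take part in the comparison.

```c
/* Hypothetical record type: most compilers insert padding after c. */
typedef struct
{
    char c;
    int i;
} REC;

/* Compare two records member by member, ignoring any padding 'holes'. */
int rec_equal(const REC *a, const REC *b)
{
    return a->c == b->c && a->i == b->i;
}
```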
A deficiency of the original definition of C, and of its subsequent use, has been the relatively unrestrained interchanging between pointers to different data types and integers or longs. Much existing code makes the assumption that a pointer can safely be held in either a long int or int variable. While such an assumption may indeed be true in many implementations on many machines, it is a highly non-portable feature on which to rely.
This problem is further compounded when taking the difference of two pointers by performing a subtraction. When the difference is large, this approach is full of possible errors. For this purpose, ANSI C defines a type ptrdiff_t, which is capable of reliably storing the result of subtracting two pointer values of the same type; a typical use of this mechanism would be to apply it to pointers into the same array.
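As a small sketch of this use, the function below returns the position of an element as the difference between two pointers into the same array, held in a ptrdiff_t. The name find_index is invented for illustration.

```c
#include <stddef.h>

/* Return the index of the first element equal to key, or -1 if absent. */
ptrdiff_t find_index(const int *base, size_t n, int key)
{
    const int *p;

    for (p = base; p != base + n; p++)
        if (*p == key)
            return p - base;   /* both pointers are into the same array */
    return -1;
}
```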
Whilst the evaluation of operands to such operators as && and || is defined to be strictly left-to-right (including all side-effects), the same does not apply to function argument evaluation. For example, in the function call f(i, i++);, the issue of whether the post-increment of i is performed after the first use of i is implementation-dependent. In any case, this is an unwise form of statement, since it may be decided later to implement f as a macro, instead of a function.
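The ambiguity disappears if the side-effect is moved into a statement of its own. The sketch below assumes the intention was to pass the old value of i twice and increment afterwards; f here is a stand-in that merely records its arguments, and all names are invented for illustration.

```c
int record_a, record_b;

/* Stand-in for f: records what it was called with. */
void f(int a, int b)
{
    record_a = a;
    record_b = b;
}

/* Portable replacement for f(i, i++): returns the updated value of i. */
int call_then_increment(int i)
{
    f(i, i);   /* both arguments see the old value */
    i++;       /* the increment is now a separate statement */
    return i;
}
```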
The direct use of operating system calls is, as you would expect, non-portable. If you use code which is obviously targeted for a particular environment, then it should be clearly documented as such, and should preferably be isolated into a system-specific module, which needs to be modified when porting to a new machine or operating system. Pathnames of system files should be #defined and not hard-coded into the program, and, as far as possible, all processing of filenames should be made easy to modify. Many file operations can be written in terms of the ANSI input/output library functions, which will make an application more portable. Obviously, binary data files are inherently non-portable, and the only solution to this problem may be the use of some portable external representation.
The ANSI C Standard has succeeded in tightening up many of the vague areas of K&R C. This results in a much clearer definition of a correct C program. However, if programs have been written to exploit particular vague features of K&R C, then their authors may find surprises when porting to an ANSI C environment. In the following sections, we present a list of what we consider to be the major differences between ANSI and K&R C. These differences are at the language level, and we defer discussion of library differences until a later section. The order in which this list is presented approximately follows that of the relevant parts of the ANSI C Standard Document.
The ordering of phases of translation is well-defined. Of special note is the preprocessor which is conceptually token-based (which does not yield the same results as might naively be expected from pure text manipulation).
A number of new keywords have been introduced with the following meanings:
char *p1 = "hello"; char *p2 = "hello";
p1 and p2 will point at the same store location, where the string hello is held. Programs should not therefore modify literal strings.
ANSI C uses value-preserving rules for arithmetic conversions (whereas K&R C implementations tend to use unsigned-preserving rules). Thus, for example:
int f(int x, unsigned char y) { return (x+y)/2; }
does signed division, where unsigned-preserving implementations would do unsigned division.
Aside from value-preserving rules, arithmetic conversions follow those of K&R C, with additional rules for long double and unsigned long int. It is now also possible to perform float arithmetic without widening to double. Floating-point values truncate towards zero when they are converted to integral types.
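The effect of the value-preserving rule, and of truncation towards zero, can be seen in the sketch below. It assumes that int is wider than unsigned char, as it is in Acorn C; f_unsigned and to_int are names invented for illustration, with f_unsigned forcing the unsigned arithmetic that an unsigned-preserving compiler would have chosen.

```c
/* Under ANSI rules y is promoted to int, so the division is signed. */
int f(int x, unsigned char y)
{
    return (x + y) / 2;
}

/* Explicitly unsigned version, mimicking an unsigned-preserving compiler. */
unsigned int f_unsigned(int x, unsigned char y)
{
    return ((unsigned int)x + y) / 2u;
}

/* Conversion to an integral type discards the fractional part. */
int to_int(double d)
{
    return (int)d;
}
```

With x negative the two functions give quite different answers: f(-3, 1) is -1, whereas f_unsigned(-3, 1) is a large positive number.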
It is illegal to attempt to assign function pointers to data pointers and vice versa (even using explicit casts). The only exception to this is the value 0, as in:
int (*pfi)(); pfi = 0;
Assignment compatibility between structs and unions is now stricter. For example, consider the following:
    struct {char a; int b;} v1;
    struct {char a; int b;} v2;
    v1 = v2;   /* illegal because v1 and v2 strictly have different types */
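The usual remedy is to give the structure a tag (or a typedef name) and declare both variables with it, so that they share one type. A minimal sketch, in which the tag pair and the function copy_pair are invented for illustration:

```c
struct pair
{
    char a;
    int b;
};

struct pair v1;
struct pair v2;

void copy_pair(void)
{
    v1 = v2;   /* legal: v1 and v2 both have type struct pair */
}
```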
((struct io_space *)0x00ff)->io_buf;
Perhaps the greatest impact on C of the ANSI Standard has been the adoption of function prototypes. A function prototype declares the return type and argument types of a function. For example, int f(int, float); declares a function returning int with one int and one float argument. This means that a function's argument types are part of the type of that function, thus giving the advantage of stricter argument type-checking, especially across source files. A function definition (which is also a prototype) is similar except that identifiers must be given for the arguments and the declarator is followed by the function body rather than a semicolon; for example, a definition beginning int f(int i, float f). It is still possible to use 'old style' function declarations and definitions, but you are advised to convert to the 'new style'. It is also possible to mix old and new styles of function declaration. If the function declaration which is in scope is an old style one, normal integral promotions are performed for integral arguments, and floats are converted to double. If the function declaration which is in scope is a new style one, arguments are converted as in normal assignment statements.
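The following sketch shows the new style in use, with the old-style equivalent given only in a comment for comparison. The parameter names and the function call_f are invented for illustration.

```c
/* New-style declaration (prototype): argument types are part of f's type. */
int f(int, float);

/* New-style definition: identifiers are given for the arguments. */
int f(int i, float x)
{
    return i + (int)x;
}

/* Old-style equivalent, shown only for comparison:
 *     int f();                          declaration, no argument information
 *     int f(i, x) int i; float x; { ... }
 */

int call_f(void)
{
    return f(2, 3);   /* the int 3 is converted to float, as if by assignment */
}
```

With only the old-style declaration in scope, the call f(2, 3) would pass two ints and the function would misinterpret its second argument; the prototype lets the compiler insert the conversion.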
Any #include files are passed through steps 1-4 recursively.
The macro __STDC__ is #defined to 1 in ANSI-conforming compilers.
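A common use of __STDC__ is to let one source file compile under both ANSI and K&R compilers. The sketch below uses a wrapper macro for declarations; PROTO and add are names invented for illustration, not part of any standard header.

```c
#ifdef __STDC__
#define PROTO(args) args   /* ANSI: keep the argument types */
#else
#define PROTO(args) ()     /* K&R: declare with empty parentheses */
#endif

/* Note the doubled parentheses: the whole argument list is one macro argument. */
extern int add PROTO((int a, int b));

#ifdef __STDC__
int add(int a, int b)
#else
int add(a, b) int a; int b;
#endif
{
    return a + b;
}
```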
This section discusses the differences apparent when the compiler is used in 'PCC' mode. When the UNIX pcc setup option is enabled, the C compiler will accept (Berkeley) UNIX-compatible C, as defined by the implementation of the Portable C Compiler and subject to the restrictions which are noted below.
In essence, PCC-style C is K&R C, as defined by B Kernighan and D Ritchie in their book The C Programming Language, with a small number of extensions and clarifications of language features that the book leaves undefined.
In UNIX pcc mode, the Acorn C compiler accepts K&R C, but it does not accept many of the old-style compatibility features, the use of which has been deprecated and warned against for many years. Differences are listed briefly below:
    struct {int a, b;};
    double d;
    d.a = 0;
    d.b = 0x....;
This is accepted by some UNIX PCCs and may cause problems when porting old (and badly written) code.
If the verbosity of CC in UNIX pcc mode is found undesirable, all warnings and/or errors can be turned off using the Suppress warnings and/or Suppress errors setup options.
Use of the compiler in UNIX pcc mode precludes neither the use of the standard ANSI headers built in to the compiler nor the use of the run-time library supplied with the C compiler. Of course, the ANSI library does not contain the whole of the UNIX C library, but it does contain almost all the commonly used functions. However, look out for functions with different names, or a slightly different definition, or those in different 'standard' places. Unless the user directs otherwise using Default path, the C compiler will attempt to satisfy references to, say, <stdio.h> from its in-store filing system.
Listed below are a number of differences between the ANSI C Library, and the BSD UNIX library. They are placed under headings corresponding to the ANSI header files:
There are no isascii() and toascii() functions, since ANSI C is not character-set specific.
On BSD systems there are sys_nerr and sys_errlist[] defined to give error messages corresponding to error numbers. ANSI C does not have these, but provides similar functionality via perror(const char *s), which displays the string pointed to by s followed by a system error message corresponding to the current value of errno.
There is also char *strerror(int errnum) which, when given a purported value of errno, returns its textual equivalent.
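Code which indexed sys_errlist[] directly can usually be rewritten with strerror(), as in this sketch. The function open_or_report is invented for illustration, and the example assumes that a failing fopen() sets errno, which is common but not required by the ANSI Standard.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Open a file for reading; on failure, report why and return NULL. */
FILE *open_or_report(const char *name)
{
    FILE *fp;

    errno = 0;
    fp = fopen(name, "r");
    if (fp == NULL)
    {
        int saved = errno;   /* later library calls may change errno */
        fprintf(stderr, "cannot open %s: %s\n", name, strerror(saved));
    }
    return fp;
}
```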
The #defined value HUGE, found in BSD libraries, is called HUGE_VAL in ANSI C. ANSI C does not have asinh(), acosh(), atanh().
In ANSI C the signal() function's prototype is:
extern void (*signal(int, void(*func)(int)))(int);
signal() therefore expects its second argument to be a pointer to a function returning void with one int argument. In BSD-style programs it is common to use a function returning int as a signal handler. The PCC-style function definitions shown below will therefore produce a compiler warning about an implicit cast between different function pointers (since f() defaults to int f()). This is just a warning, and correct code will be generated anyway.
    f(signo)
    int signo;
    {
        .........
    }

    main()
    {
        extern f();
        signal(SIGINT, f);
    }
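The warning goes away if the handler is written in the ANSI style, returning void and taking one int argument. A minimal sketch follows; the names on_sigint, install_sigint_handler and sigint_seen are invented for illustration.

```c
#include <signal.h>

static volatile sig_atomic_t got_sigint = 0;

/* ANSI-style handler: returns void and takes one int argument. */
static void on_sigint(int signo)
{
    (void)signo;      /* unused */
    got_sigint = 1;   /* only set a flag inside the handler */
}

/* Returns 0 on success, -1 if the handler could not be installed. */
int install_sigint_handler(void)
{
    return signal(SIGINT, on_sigint) == SIG_ERR ? -1 : 0;
}

/* Returns nonzero once SIGINT has been received. */
int sigint_seen(void)
{
    return got_sigint != 0;
}
```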
sprintf() now returns the number of characters 'printed' (following UNIX System V), whereas the BSD sprintf() returns a pointer to the start of the character buffer.
The BSD functions ecvt(), fcvt() and gcvt() are not included in ANSI C, since their functionality is provided by sprintf().
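As a sketch of both points, the function below does roughly the job of gcvt() using sprintf() with the %g conversion, and passes on the character count that the ANSI sprintf() returns. The name format_double is invented for illustration, and the caller is assumed to supply a buffer large enough for the result.

```c
#include <stdio.h>

/* Format value with ndigit significant digits into buf.
   Returns the number of characters written, excluding the terminating null. */
int format_double(char *buf, double value, int ndigit)
{
    return sprintf(buf, "%.*g", ndigit, value);
}
```

BSD code which uses the value returned by sprintf() as a pointer to the buffer must be changed to use the buffer pointer itself.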
On BSD systems, string manipulation functions are found in <strings.h>, whereas ANSI C places them in <string.h>. The Acorn C Compiler also has <strings.h> for PCC-compatibility.
The BSD functions index() and rindex() are replaced by the ANSI functions strchr() and strrchr() respectively.
Functions which refer to string lengths (and other sizes) now use the ANSI type size_t, which in our implementation is unsigned int.
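The sketch below shows the ANSI functions in places where BSD code would call index() and rindex(). The function names leaf_name and contains_char are invented for illustration.

```c
#include <string.h>

/* Return the part of path after the last occurrence of sep,
   or path itself if sep does not occur. */
const char *leaf_name(const char *path, int sep)
{
    const char *p = strrchr(path, sep);   /* BSD code would call rindex() */

    return p != NULL ? p + 1 : path;
}

/* Return nonzero if the character c occurs in s. */
int contains_char(const char *s, int c)
{
    return strchr(s, c) != NULL;          /* BSD code would call index() */
}
```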
malloc() returns void *, rather than the char * of the BSD malloc().
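Because void * converts implicitly to any object pointer type, the casts which BSD code applies to the result of malloc() are no longer needed. A minimal sketch, with the function name make_int_array invented for illustration:

```c
#include <stdlib.h>

/* Allocate an array of n ints, all set to zero; returns NULL on failure. */
int *make_int_array(size_t n)
{
    int *p = malloc(n * sizeof *p);   /* no cast needed from void * */
    size_t i;

    if (p != NULL)
        for (i = 0; i < n; i++)
            p[i] = 0;
    return p;
}
```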
<float.h> is a new header added by ANSI, giving details of floating-point precision and related characteristics.
<limits.h> is a new header added by ANSI to give maximum and minimum limit values for data types.
<locale.h> is a new header added by ANSI to provide local environment-specific features.
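The values in <limits.h> and <float.h> let a program adapt to the sizes of its data types instead of assuming them. The sketch below prints a few of the limits and uses INT_MAX to test for overflow before it happens; print_limits and add_would_overflow are names invented for illustration.

```c
#include <float.h>
#include <limits.h>
#include <stdio.h>

/* Print a selection of implementation limits. */
void print_limits(void)
{
    printf("int range: %d to %d\n", INT_MIN, INT_MAX);
    printf("long maximum: %ld\n", LONG_MAX);
    printf("bits per char: %d\n", CHAR_BIT);
    printf("decimal digits: float %d, double %d\n", FLT_DIG, DBL_DIG);
}

/* Return nonzero if a + b would overflow an int (a and b non-negative). */
int add_would_overflow(int a, int b)
{
    return a > INT_MAX - b;
}
```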
When porting an application, the most extensive changes will probably need to be made at the operating system interface level. The following is a brief description of aspects of RISC OS and Acorn C which differ from systems such as UNIX and MS-DOS.
The most apparent interface between a C program and its environment is via the arguments to main(). The ANSI Standard declares that main() is a function defined as the program entry point with either no arguments or two arguments (one giving a count of command line arguments, commonly called int argc, the other an array of pointers to the text of the arguments themselves, after removal of input/output redirection, commonly called char *argv[]). As discussed in the Environment (A.6.3.2), Acorn C supports the style of input/output redirection used by UNIX BSD4.3, but does not support filename wildcarding. Further parameters to main() are not supported.
Under UNIX and MS-DOS, it is common to use a third parameter, normally called char *environ[] under UNIX and char *envp[] under Microsoft C for MS-DOS, to give access to environment variables. The same effect can be achieved in our system by using getenv() to request system variable values explicitly; the names of these variables are as they appear from a RISC OS *Show command. The string pointed at by argv[0] is the program name (similar to UNIX and MS-DOS, except the name is exactly that typed on invocation, so if a full pathname is used to invoke the program, this is what appears in argv[0]).
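A program which scanned the third parameter of main() for a particular variable can call getenv() instead, as in this sketch. The function env_or_default is invented for illustration.

```c
#include <stdlib.h>

/* Return the value of the named variable, or fallback if it is not set. */
const char *env_or_default(const char *name, const char *fallback)
{
    const char *value = getenv(name);

    return value != NULL ? value : fallback;
}
```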
File naming is one of the least portable aspects in any programming environment. RISC OS uses a full stop (.) as a separator in pathnames and does not support filename extensions (nor does UNIX, but existing UNIX tools make assumptions about file naming conventions). The best way to simulate extensions is to create a directory whose name corresponds to the required extension (in a manner similar to the use of c and h directories for C source and header files). RISC OS filename components are limited to 10 characters.
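One way to keep such filename processing easy to modify is to confine it to a single routine. The sketch below turns a leaf name of the form name.ext into ext.name, following the directory convention described above. The function ext_to_directory is invented for illustration; it handles only a single leaf name and does not enforce the 10-character limit.

```c
#include <stdio.h>
#include <string.h>

/* Convert "name.ext" to "ext.name"; a name with no extension is copied
   unchanged. Returns 0 on success, -1 if the result does not fit in out. */
int ext_to_directory(const char *leaf, char *out, size_t outsize)
{
    const char *dot = strrchr(leaf, '.');
    size_t len = strlen(leaf);

    if (len + 1 > outsize)   /* the result has the same length as the input */
        return -1;
    if (dot == NULL || dot == leaf || dot[1] == '\0')
    {
        strcpy(out, leaf);
        return 0;
    }
    sprintf(out, "%s.%.*s", dot + 1, (int)(dot - leaf), leaf);
    return 0;
}
```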
The Acorn C compiler has support for making Software Interrupt (SWI) calls to RISC OS routines, which can be used to replace any system calls which you make under UNIX or MS-DOS. The include file kernel.h has function prototypes and appropriate typedefs for issuing SWIs. Briefly, the type _kernel_swi_regs allows values to be placed in registers R0-R9, and _kernel_swi() can then be used to issue the SWI; a list of SWI numbers can be found in the include file swis.h. File information, for example, can be obtained in a way similar to stat() under UNIX, by making an OS_GBPB SWI with R0 set to the reason code 11 (full file information). Most of the UNIX/MS-DOS low-level I/O can be simulated in this way, but the ANSI C run-time library provides sufficient support for most applications to be written in a portable style.
You'll find some more information on kernel.h in comments within the header file itself.
RISC OS does not support different memory models as in MS-DOS, so programs which have been written to exploit this will need modification; this should only require the removal of Microsoft C keywords such as near, far and huge, if the program has otherwise been written with portability in mind.
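Where editing every declaration is impractical, one possible shortcut is to define the keywords away as empty macros, as sketched below. MSDOS_MEMORY_MODELS is a hypothetical macro, assumed to be defined only when building with Microsoft C, and sum_bytes stands for code originally written for that compiler. The definitions should follow any standard #include lines, in case a header uses one of these words as an identifier.

```c
#ifndef MSDOS_MEMORY_MODELS   /* hypothetical: set only for Microsoft C builds */
#define near                  /* expands to nothing */
#define far
#define huge
#endif

/* Code originally written for Microsoft C, compiled unchanged. */
unsigned long sum_bytes(const unsigned char far *p, unsigned long n)
{
    unsigned long total = 0;

    while (n-- > 0)
        total += *p++;
    return total;
}
```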