Chapter 3 : String Handling
String Functions
Probably the most frequently used string functions in BBC BASIC are LEFT$, MID$ and RIGHT$. In BASIC V all three offer additional facilities, while their existing forms work exactly as before. Both LEFT$ and RIGHT$ may now take a string as their sole parameter, with no length specified. For example, given:
data$="Appalachians"
then the assignment:
one$=LEFT$(data$)
would assign the character string "Appalachian" to the variable one$. In effect, the string specified is shortened by one character from the right, which avoids the previous requirement of:
one$=LEFT$(data$,LEN(data$)-1)
RIGHT$ can be used in the equivalent way:
two$=RIGHT$(data$)
and would result in two$ being assigned the right-most character of data$. Thus given a string data$, there is a simple relationship between these functions in the form:
data$=LEFT$(data$)+RIGHT$(data$)
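As a minimal sketch of the single-parameter forms, the following lines print both results and check the relationship above, then use the same two functions to reverse a string one character at a time. The variable rev$ is introduced purely for illustration.

```basic
REM Single-parameter LEFT$ and RIGHT$ (BASIC V)
data$="Appalachians"
PRINT LEFT$(data$)  : REM all but the last character
PRINT RIGHT$(data$) : REM the last character only
IF data$=LEFT$(data$)+RIGHT$(data$) THEN PRINT "Relationship holds"
REM Reverse the string by repeatedly moving its last character
rev$=""
WHILE data$<>""
  rev$=rev$+RIGHT$(data$)
  data$=LEFT$(data$)
ENDWHILE
PRINT rev$
```

The first two PRINT statements should display "Appalachian" and "s", and the final one "snaihcalappA".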
One possible use of these new formats is in stripping characters from the right-hand end of a string. This process is the reverse of padding out a string for justification or for file handling using fixed-length fields. For example, a suitable function could be written as:
DEF FNstrip(data$,pad$)
WHILE RIGHT$(data$)=pad$
data$=LEFT$(data$)
ENDWHILE
=data$
where pad$ is the character originally used to pad out the string (this could be a space - ASCII 32 - or any other character, depending on the context). Notice, too, how well the new WHILE construction (see Chapter Four) suits this purpose compared with REPEAT...UNTIL. Because WHILE tests its condition before the loop body is executed, the function still works correctly even if the string contains no pad characters, whereas REPEAT...UNTIL, which always executes its loop at least once, would need to treat this as a special case.
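A short test of such a function might look like the sketch below. The field contents are invented for illustration, and the function is repeated, complete with the line that returns its result, so that the listing can be run on its own.

```basic
REM Sketch: remove trailing padding from fixed-length fields
field$="Smith     "
PRINT "[";FNstrip(field$," ");"]"
PRINT "[";FNstrip("*****","*");"]"
PRINT "[";FNstrip("Jones"," ");"]"
END
:
DEF FNstrip(data$,pad$)
WHILE RIGHT$(data$)=pad$
  data$=LEFT$(data$)
ENDWHILE
=data$
```

The three lines displayed should be [Smith], [] and [Jones]: a string consisting entirely of pad characters is reduced to the null string, and a string with no padding is returned unchanged.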
A new variation on the MID$ function has also been provided. It differs from other forms in that the MID$ function appears on the left-hand side of an assignment statement. Again, this is most easily explained by means of an example. Given:
town$="Newcastle-under-Lyme"
the assignment:
MID$(town$,11,10)="upon-Tyne"
would result in the variable town$ containing the string "Newcastle-upon-Tynee". The nine characters of "upon-Tyne" overwrite the characters in positions 11 to 19, but this form of MID$ never alters the length of the string, so the final "e" of "Lyme" is left in position 20.
Any expression which evaluates to a string may appear to the right of the equals sign. Its characters replace the specified characters within the string variable supplied as the first parameter of MID$. The second parameter indicates the position of the first character to be replaced, while the third parameter, which is optional, specifies the maximum number of characters to be replaced. If this third parameter is omitted, then all the following characters may potentially be replaced. Note, however, that the string variable is never lengthened: no characters are replaced beyond its existing end, however many characters appear to the right of the '=' sign, and no more characters are replaced than the third parameter allows. If the example above had been given the other way round as:
town$="Newcastle-upon-Tyne"
then the assignment:
MID$(town$,11,9)="under-Lyme"
would result in town$ containing just "Newcastle-under-Lym". Starting at the 11th character of the original string assigned to town$, there are only 9 characters marked to be replaced. The string "under-Lyme" contains 10 characters, so the last one remains unused.
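The two limits on replacement can be seen in isolation in the following sketch, which uses an arbitrary eight-character string chosen only for illustration.

```basic
REM MID$ on the left of an assignment overwrites in place
a$="ABCDEFGH"
MID$(a$,3,2)="xyz" : REM at most 2 characters are replaced
PRINT a$           : REM ABxyEFGH
MID$(a$,7)="12345" : REM only 2 characters of room remain
PRINT a$           : REM ABxyEF12
PRINT LEN(a$)      : REM still 8
```

In the first assignment the third parameter sets the limit, and in the second the end of the string does. In neither case does the length of a$ change.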
Another way of looking at this use of MID$ is to consider how the same function might have been written previously. Thus:
MID$(string1$,p,q)=string2$
is the equivalent of:
string1$=LEFT$(string1$,p-1)+LEFT$(string2$,q)+RIGHT$(string1$,LEN(string1$)-p-q+1)
provided that string2$ contains at least q characters and that all q characters to be replaced lie within string1$. Although this conveys quite well the idea of keeping the left-hand part, replacing the middle part, and then keeping the right-hand part, it would more likely be written as:
string1$=LEFT$(string1$,p-1)+LEFT$(string2$,q)+MID$(string1$,p+q)
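The general case, in which the replacement may be shorter than q characters or may run past the end of the string, needs a little more care. The sketch below is one way of reproducing the effect by concatenation. The name FNoverwrite and its parameters are hypothetical, and the function assumes that p% is at least 1.

```basic
REM Compare the MID$ statement with a concatenation equivalent
a$="Newcastle-upon-Tyne"
b$=a$
MID$(a$,11,9)="under-Lyme"
b$=FNoverwrite(b$,11,9,"under-Lyme")
PRINT a$
PRINT b$
END
:
REM Hypothetical helper: the effect of MID$(s$,p%,q%)=r$
DEF FNoverwrite(s$,p%,q%,r$)
LOCAL n%
n%=LEN(s$)-p%+1 : REM room left in s$ from position p%
IF q%<n% THEN n%=q%
IF LEN(r$)<n% THEN n%=LEN(r$)
IF n%<1 THEN =s$
=LEFT$(s$,p%-1)+LEFT$(r$,n%)+MID$(s$,p%+n%)
```

Both PRINT statements should display the same string. Here n% is the number of characters actually replaced, which is the smallest of the count requested, the length of the replacement, and the room remaining in the original string.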
String Handling in Files
Two further keywords have been enhanced in BASIC V for handling strings when reading and writing files. The function:
GET$#C
will read a string of characters from an open file until a linefeed (CHR$10), carriage return (CHR$13) or null character (CHR$0) is found, or until the end of the file is reached (EOF# returns true). However, no more than 255 characters may be read in. As with other instructions for reading and writing data, the instruction must specify the channel number of the file to be used, which should already be open, for example:
data$=GET$#C
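A typical use is to list a text file one line at a time, as in the following sketch. The file name "Notes" is purely an assumption for the sake of the example.

```basic
REM Sketch: display a text file line by line
C=OPENIN("Notes")
IF C=0 THEN PRINT "Cannot open file" : END
WHILE NOT EOF#C
  data$=GET$#C
  PRINT data$
ENDWHILE
CLOSE#C
```

OPENIN returns zero if the file cannot be found, so the channel number is checked before any attempt is made to read from it. The terminating character is not included in the string returned.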
The ability to read a string from a file with GET$# has been complemented by a new version of the BPUT# statement, which may now also be used to write a string to a file. The format is:
BPUT#C,data$
where C is the channel number of the file to be used and data$ is any expression which evaluates to a character string. The resultant string plus a terminating linefeed character (ASCII 10) is sent to the file. If the string specified is terminated by a semi-colon as in:
BPUT#C,data$;
then the terminating linefeed character is omitted. In either case, the maximum string length is again 255 characters.
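The two forms can be combined to build up a line in stages, as in this sketch (the file name "Notes" is again an assumption):

```basic
REM Sketch: write two lines of text to a file
C=OPENOUT("Notes")
BPUT#C,"First line"
BPUT#C,"Second "; : REM no linefeed is sent after this string
BPUT#C,"line"
CLOSE#C
```

The file should then contain the text "First line" and "Second line", each followed by a single linefeed character, in a form suitable for reading back with GET$#.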
Organisation and Allocation of String Storage
The organisation and allocation of memory for string storage in BASIC V has been completely changed compared with the system used for BBC BASIC. The earlier system was potentially wasteful of space if character strings of varying sizes were to be assigned to the same string variable at frequent intervals. If a character string was to be assigned to a string variable, then new space would be allocated from remaining free memory if it was larger than the space previously allocated to that string variable. The old memory allocation would then be 'thrown away', and would effectively be lost to the program for any future use.
Thus, a situation in which progressively larger strings are assigned to a string variable will cause new memory to be assigned each time and the old allocation wasted. This system was 'improved' by two particular techniques. First, new memory was always allocated at four bytes more than the immediate requirement to allow some room for future expansion of that string. Second, if the existing memory was contiguous with free space then it was kept and extended. BASIC programmers, aware of the resulting memory problems, developed various techniques to minimise the effects. The principal method used was to assign to any frequently used string variable a character string of the maximum length that the program would be called upon to handle. This avoided any need for future allocations of memory space. Of course, if the string was not to grow to that extent, then memory space would be wasted anyway.
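In practice this technique usually amounted to a pair of assignments near the start of the program, along the lines of the sketch below. The variable name buffer$ and the maximum of 80 characters are assumptions made for illustration.

```basic
REM Older BBC BASIC idiom: reserve the maximum space once
buffer$=STRING$(80," ")
buffer$=""
```

The first assignment causes space for 80 characters to be allocated, and the second empties the string while leaving that allocation attached to the variable, ready for later use.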
New Memory Allocation System
BASIC V implements a totally new method of string storage. When a new character string is to be assigned to a string variable which is greater in length than the space already allocated, then that space is de-allocated and new string space used. However, the discarded string space is now added to a linked list of similar space of the same length. The system maintains a set of linked lists, each one containing string space of the same size.
When new string space is required, BASIC first checks with the appropriate linked list and if there is an entry it takes it. If no space of the right size is available it then assigns string space from free memory. The new system will still extend existing string space if this is contiguous with free memory. However, only the current string length (CLEN) is stored, whereas before, both this and maximum string length (MLEN) were stored for every string. This represents a useful memory saving, particularly for string arrays.
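The idea of one free list per block size can be modelled in BASIC V itself. The sketch below is only a toy model under stated assumptions and is not the interpreter's own code: the names heap%, top%, limit%, free%(), FNalloc and PROCrelease are all hypothetical, sizes are rounded up to whole words of four bytes, and the first word of each discarded block is used to hold the link to the next block on its list.

```basic
REM Toy model of per-size free lists (not the interpreter's code)
DIM heap% 1023
top%=heap% : limit%=heap%+1024
DIM free%(64) : REM free%(w%) heads the list of spare w%-word blocks
a%=FNalloc(10) : REM 3 words taken from free memory
PROCrelease(a%,10)
b%=FNalloc(12) : REM also 3 words
IF a%=b% THEN PRINT "Discarded block re-used"
c%=FNalloc(20) : REM 5 words: no spare block of this size
IF c%<>a% THEN PRINT "New block taken from free memory"
END
:
DEF FNalloc(size%)
LOCAL w%,b%
w%=(size%+3) DIV 4
IF w%<1 THEN w%=1
b%=free%(w%)
REM Take the first block on the list if there is one
IF b%<>0 THEN free%(w%)=!b% : =b%
IF top%+4*w%>limit% THEN PRINT "Model heap full" : END
b%=top% : top%+=4*w%
=b%
:
DEF PROCrelease(b%,size%)
LOCAL w%
w%=(size%+3) DIV 4
IF w%<1 THEN w%=1
REM Push the block onto the front of the list for its size
!b%=free%(w%) : free%(w%)=b%
ENDPROC
```

Because each list holds blocks of one size only, a request never needs to search: it either takes the first entry on the appropriate list or falls back on free memory.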
The new system is intended to reduce significantly the loss of memory through the inability to re-use discarded string space, while the multiple linked lists of free string space provide a faster method of locating and assigning string space as required.
String space is always allocated in multiples of words (not bytes), and strings are always aligned on word boundaries. The new system improves on the previous system of memory allocation, but is still less than perfect. Most programs are still likely to generate unusable string space, but the much greater memory of the Archimedes (compared with the older BBC Micros) should reduce the likely incidence of insufficient memory. Certainly there is now much less advantage to be gained from the previous technique of initialising a string to the longest length needed.