Both the original C language and ANSI C handle vectors of indefinite length, but neither caters for multidimensional arrays whose bounds are determined during execution. Such a facility is especially useful for numerical library routines, but is currently painful to program. Many of the issues, and some suggested language changes, were discussed by Tom MacDonald in J. C Language Translation 1:3 (December, 1989, pp. 215-233). Some compilers, for example the GCC compiler from the Free Software Foundation, already have extensions in this area. This note discusses problems in these approaches, and offers a differing proposal for an extension.
Previous thoughts (including my own) have fixed on the idea of allowing general expressions, and not merely constants, as bounds for automatic arrays. Although this extension, with careful restrictions, seems to fit into the structure of the existing language, and is useful by itself, it presents implementation difficulties in some run-time systems. Moreover, I argue that this proposal alone does not properly resolve the question of function parameters that involve varying arrays. The rules for both the GCC and MacDonald schemes are difficult to use and comprehend, and are difficult to formalize even to the level of the current ANSI standard; in particular, the type calculus for variable-sized arrays is murky for both. In the existing ANSI C language, the type and value of an object p suffice to determine the evaluation of operations on it. In particular, if p is a pointer, the code generated for expressions like p[i] and p[i][j] depend only on its type, because any necessary array bounds are part of the type of p. In the MacDonald and GCC extensions, the values of non-constant array bounds are not tied firmly to its type.
My proposal, by contrast, is to avoid allowing variable-sized arrays as declared objects, but to provide pointers to such arrays; the pointers carry the array bounds inside themselves. This will preserve the property that pointer arithmetic and dereferencing depend only on the type and value of the pointer. Allocation of varying-size arrays can be done by already existing facilities.
I intend that the proposed amendments to the Standard do not change any currently conforming programs.
To simplify the description, let us call the types `array of T,' `array of array of T,' and so forth matrix types.
The first extension permits array declarators to be written
D [ ? ]
The second extension permits any integral expression to appear as a bound in an array declarator that occurs in a cast (but nowhere else). Such a cast may appear only in a function, and the expression is evaluated in the scope in which the cast appears. The type of the resulting array is the same as if the bound had been the constant expression of equal value. (The implied constraints here are not statically checkable.)
The final extension specifies that the sizeof operator, when applied to an expression involving a pointer to an adjustable matrix, need not yield a constant; there is a corresponding constraint that the value of such a sizeof expression is undefined unless the value of the pointer is defined. (This implies that the argument of sizeof must be evaluated when its type involves an adjustable array.)
Pointers to adjustable matrices (that is, matrices at least one bound of which is adjustable) are `fat pointers' whose representation will include the space to store the adjustable bounds at run-time. It is crucial to arrange the rules so that these bounds can always be filled in properly. Therefore, the following restrictions are necessary.
The type of an adjustable array of an element type is compatible only with another adjustable array, and only if the element types are compatible. In particular, an adjustable array is not compatible with an ordinary array or an incomplete array.
Pointers to adjustable matrices may be assigned (or initialized or passed as arguments). If p is a pointer to an adjustable matrix, then p = q is permitted if the types are compatible, or if q fails to be compatible with p only because one matrix has a known array bound where the other has an adjustable bound. If the LHS p has a known bound at a particular position and q has an adjustable bound, it is necessary (though not statically checkable) that the actual bound for q at that position be the equal to the known bound of p.
Although the ANSI standard made void * pointers a generic type for pointers to objects, it exempted function pointers from this universality. Similarly, this proposal forbids automatic coercion of void * pointers to pointers to adjustable matrices; a cast specifying the bounds must be used.
These rules for casts and assignments are not simple generalizations of existing rules; they are formulated so that no pointer to an adjustable matrix can be constructed unless there is a way of finding the appropriate bounds at run time.
A function with a general two-dimensional array as a
parameter might be declared
void f1(int (*a)[?][?]);
void f1(int (*a)[?][?]) { int i, j; ... (*a)[i][j] ... }
sizeof(*a) / sizeof((*a)[0]) sizeof((*a)[0]) / sizeof((*a)[0][0])
#define UPB1(x) (sizeof(*x) / sizeof((*x)[0])) #define UPB2(x) (sizeof((*x)[0]) / sizeof((*x)[0][0]))
int (*b)[?][?] = (int (*)[UPB1(a)][UPB2(a)])malloc(sizeof(*a));
int aa[100][100]; f1(&aa);
The ugliest aspect of this formulation is that uses of the
argument need to be explicitly dereferenced; we must write
(*a)[i][j]
inside of
f1.
One way of ameliorating the effect, within the function,
is to generate a pointer to the first row of the array:
void f1(int (*a)[?][?]) { int (*ap)[?] = *a; ... ap[i][j] ... }
void f2(int a[][?]) { ... a[i][j] ... }
void f2(int (*a)[?]) { ... a[i][j] ... }
int aa[100][100]; f2(aa);
Previous attempts to generalize C to adjustable arrays have
taken the apparently straightforward approach of allowing
array bounds (in carefully chosen circumstances) to be
arbitrary expressions, instead of constants.
I suggest that the straightforwardness is only apparent,
and that the necessary rules are in fact quite complicated;
neither the FSF nor MacDonald have really worked out their
consequences.
For example, the GCC language extension envisions function definitions like
int f(int n, int m, int (*a)[n][m]) { ... }
For example, what is the type of this function
f,
and how does one write a
prototype for it?
In GCC, one says
void f(int, int, int (*)[][]);
void f(int, int, int [*][*]);
Other problems occur because of the way the function
definition is written. In the example,
void f(int n, int m, int (*a)[n][m]) { ... }
void f(int (*a)[n][m], int n, int m) { ... }
MacDonald suggests rules that cause all the arguments
be processed simultaneously, or in two passes, but the rules would be messy,
and dissimilar to the approach taken in the rest of the language.
One example of the problems
is arguments that depend on each other:
void g((*p)[sizeof(*q)], (*q)[sizeof(*p)]) { ... }
This paper proposes to extend C by allowing pointers to adjustable arrays and arranging that the pointers contain the array bounds necessary to do subscript calculations and compute sizes. In this way, the problem of handling variable-size array references is solved in a way that can be made consistent with the rest of the language. By contrast, the apparently most obvious language generalization (that is, allowing general expressions instead of constants in array bounds) complicates the type structure of the language, causes difficulties with the calculus of datatypes, and causes nonuniformity in discovering the sizes of arrays.