Understanding Complex C/C++ Declarations

For novice programmers diving into the world of C/C++ programming, understanding complex declarations might seem like a daunting task. With asterisks, parentheses, and qualifiers, these declarations can appear cryptic at first glance. But fear not! With a few simple guidelines, complex C/C++ declarations become much more manageable.

Declarations in the C programming language have a reputation for being difficult to understand. The language was designed over 50 years ago, and its creators didn't prioritize making declarations easy to read. Take, for example, the declaration:

int *p[4];

How should we read this? Is p an array of four elements, each pointing to an integer, or is it a pointer to an array of four integers?

Let's break it down:

Declarator

A declarator is a simple identifier (also called a variable name), an array identifier (also called an array variable name), a function name, or a pointer to any of these. It can optionally be followed by an equal sign and an initial value or values. For example:

int a = 0;
int b[4] = {1, 2, 3, 4};
int c();
int *d;
int *e[4];
int *f();

All of the above declarators are valid.

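A declarator may have any number of pointers, such as ***g, and any number of array dimensions, such as h[1][2][3]. However, a function cannot return a function or an array, and an array cannot contain functions. The declarator func()() (a function returning a function) is therefore invalid. The declarators (*p)()[] (a pointer to a function returning an array) and (*p)[]() (a pointer to an array of functions) are also invalid.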
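A function cannot return a function, and an array cannot hold functions. Both effects are still available through pointers. The minimal sketch below shows a function that returns a pointer to a function, and an array of pointers to functions. It assumes a C99-or-later compiler, and the names forty_two, seven, pick, and handlers are hypothetical.

```c
/* Two ordinary functions of type int (void). */
static int forty_two(void) { return 42; }
static int seven(void) { return 7; }

/* pick is a function taking int, returning a pointer to a function
   taking void and returning int. Writing "int pick(int)(void)" would be
   the invalid "function returning a function". */
int (*pick(int which))(void) {
    return which ? forty_two : seven;
}

/* An array of functions is invalid, but an array of 2 pointers to
   functions returning int is fine. */
int (*handlers[2])(void) = { forty_two, seven };
```

Reading int (*pick(int which))(void) with the rules below gives: pick is a function returning a pointer to a function returning int.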

Type Specifier

This specifies the data type of the declared entity, such as char, short, int, long, float, double, signed, unsigned, void, or enum. The keywords struct and union declare user-defined structure and union types.

Storage Class

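The storage class of a variable tells the compiler how to allocate memory for it (its storage duration) and whether its name can be shared with other source files (its linkage). Classic C has five storage-class specifiers (C11 adds a sixth, _Thread_local):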

  1. auto
  2. extern
  3. register
  4. static
  5. typedef

The typedef storage class doesn't tell the compiler anything about memory allocation. It is grouped with the storage classes only for syntactic convenience, and it simply defines a new name for an existing data type.
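Here is a small C sketch of several storage classes in action. The names file_total, counter_type, next_id, and add_to_total are invented for illustration. extern appears only in a comment, because demonstrating it requires a second source file.

```c
/* static at file scope: internal linkage. Writing
   "extern long file_total;" in another source file would NOT refer to
   this object, because static hides it from other files. */
static long file_total = 0;

/* typedef reserves no storage; it only introduces the name counter_type. */
typedef unsigned long counter_type;

/* A static local is initialized once and keeps its value between calls. */
counter_type next_id(void) {
    static counter_type id = 0;
    return ++id;
}

/* auto is the default for block-scope variables, so it is rarely written. */
long add_to_total(long amount) {
    auto long before = file_total;   /* same as: long before = file_total; */
    file_total = before + amount;
    return file_total;
}
```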

Type Qualifier

A type qualifier in C/C++ is a keyword that modifies the properties of a data type. Type qualifiers provide additional information about how the data can be accessed or manipulated.

1 const: This qualifier indicates that the data associated with the declared identifier cannot be modified. For example:

const int x = 5;

Here, x is a constant integer whose value cannot be changed.

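2 volatile: This qualifier indicates that the data associated with the declared identifier may change in ways the compiler cannot see. Typical sources are memory-mapped hardware registers, interrupt handlers, and signal handlers. The compiler must therefore perform every read and write exactly as written instead of caching the value in a register. Note that volatile does not make an object thread-safe; use _Atomic or locks for data shared between threads. For example:

volatile int sensor_reading;

Here, sensor_reading is a volatile integer whose value may change outside of the program's control, so every access actually reads it.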

3 restrict: This qualifier (added in C99 and not part of standard C++) is used in pointer declarations. It tells the compiler that the memory regions pointed to by different restrict-qualified pointers do not overlap, which can enable certain optimizations. For example:

void func(int *restrict arr1, int *restrict arr2);

Here, arr1 and arr2 are pointers to integer arrays, and the restrict qualifier is the programmer's promise that they do not overlap in memory.
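As a minimal sketch of restrict in practice, the hypothetical function add_into below adds one array into another. Because both parameters are restrict-qualified, the compiler may assume that writes through dst never change what src points to.

```c
#include <stddef.h>

/* Adds src[i] into dst[i] for i in [0, n).
   restrict: the caller promises dst and src do not overlap, so the
   compiler may keep loaded values in registers or vectorize the loop
   without re-checking for aliasing. */
void add_into(int *restrict dst, const int *restrict src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}
```

The promise is not checked. A call with overlapping arrays, such as add_into(a, a + 1, 2), would break it, and the behavior would be undefined.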

4 _Atomic: Introduced in C11, this qualifier specifies atomic types. Objects of these types can be read and modified atomically (as indivisible operations) in a multi-threaded environment without causing data races. For example:

_Atomic int atomic_counter;

Here, atomic_counter is an atomic integer that can be safely accessed and modified by multiple threads simultaneously.
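The following single-threaded sketch only shows which operations the _Atomic qualifier and <stdatomic.h> provide. It does not start any threads, so it demonstrates the API rather than the absence of data races. The function names are hypothetical.

```c
#include <stdatomic.h>

/* The same declaration as above, at file scope. */
static _Atomic int atomic_counter = 0;

/* ++ on an _Atomic object is an atomic read-modify-write (sequentially
   consistent), so concurrent callers would never lose an increment. */
int counter_bump(void) {
    return ++atomic_counter;
}

/* The generic function from <stdatomic.h> does the same explicitly;
   atomic_fetch_add returns the value held before the addition. */
int counter_bump_explicit(void) {
    return atomic_fetch_add(&atomic_counter, 1) + 1;
}

/* An atomic read of the current value. */
int counter_value(void) {
    return atomic_load(&atomic_counter);
}
```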

Type qualifiers provide valuable information to both programmers and compilers, helping ensure code correctness, optimize performance, and manage concurrency effectively in multi-threaded environments.

Type Qualifier Rule

If one or more type qualifiers appear next to a type specifier (such as int, char, or float), they apply to that type specifier. Otherwise, they apply to the pointer asterisk immediately to their left. The restrict qualifier applies only to pointers.

Consider the following declarations:

int const *ptr;
const int *ptr2;

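In both declarations the const keyword is next to a type specifier (int), so it applies to the type, not to the pointer asterisk. Both ptr and ptr2 are pointers to a constant integer. The pointers themselves can be reassigned, but the integer cannot be modified through them.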

Now consider this declaration:

char * const ptr3;

Here, the const keyword is not next to the type specifier, so it applies to the pointer asterisk immediately to its left. ptr3 is therefore a constant pointer to a character. The pointer cannot be changed to point elsewhere, but the character it points to can be modified.
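A minimal C sketch of the difference, reusing the declarations above inside two hypothetical functions. The lines that would fail to compile are left as comments.

```c
/* ptr and ptr2 have the same type: pointer to const int. */
int reseat_pointer_to_const(void) {
    static const int first = 1, second = 2;
    int const *ptr = &first;
    const int *ptr2 = &second;
    ptr = ptr2;            /* OK: the pointer itself is not const */
    /* *ptr = 3;              error: the int is read-only through ptr */
    return *ptr;
}

/* ptr3 is a const pointer to a (non-const) char. */
char write_through_const_pointer(void) {
    char letter = 'a';
    char * const ptr3 = &letter;
    *ptr3 = 'b';           /* OK: the char it points to is not const */
    /* ptr3 = &letter;        error: ptr3 itself is read-only */
    return letter;
}
```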

Rules for Understanding C/C++ Declarations

  1. Locate the Identifier: Start by identifying the identifier (variable or function name) in the declaration. This serves as the anchor point for understanding the declaration.
  2. Follow Precedence Rules: Apply precedence rules to determine the order of operations within the declaration. The following precedence rules should be followed:
    1. Rule 1: Read postfix operators (e.g., square brackets for arrays, parentheses for function parameters) from left to right until reaching a semicolon or an unmatched closing parenthesis.
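    2. Rule 2: Read prefix asterisk (*) operators (indicating pointers) from right to left until reaching the beginning of the declaration or an opening parenthesis. If you stop at an opening parenthesis, step outside the matching pair of parentheses and apply Rule 1 again.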
  3. Understand Type Qualifiers: If one or more type qualifiers appear next to a type specifier (int, char, float, double, etc.), they apply to that type specifier. Otherwise, they apply to the pointer asterisk immediately to their left. The restrict qualifier applies only to pointers.
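The two precedence rules can be sketched as a tiny decoder in the spirit of the classic K&R "dcl" program. This is an illustration under simplifying assumptions, not a complete parser:

- the base type is passed separately, so multi-word types such as unsigned long need no parsing;
- qualifiers are not handled;
- parameter lists are skipped rather than described, and nested parentheses inside them are not supported.

The names explain_decl and struct decoder are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Parser state: read position in the declarator and the output buffer. */
struct decoder {
    const char *p;   /* next unread character of the declarator */
    char *out;       /* English description built so far */
    size_t cap;      /* capacity of out, including the terminating '\0' */
    size_t len;      /* characters written so far */
};

/* Append n characters of s, truncating silently if out is full. */
static void emit(struct decoder *d, const char *s, size_t n) {
    while (n-- > 0 && d->len + 1 < d->cap)
        d->out[d->len++] = *s++;
    d->out[d->len] = '\0';
}

static void emit_str(struct decoder *d, const char *s) { emit(d, s, strlen(s)); }

static void skip_ws(struct decoder *d) {
    while (isspace((unsigned char)*d->p)) d->p++;
}

static void declarator(struct decoder *d);

/* Identifier or "( declarator )", followed by postfix [N] and (...). */
static void direct_declarator(struct decoder *d) {
    skip_ws(d);
    if (*d->p == '(') {                  /* grouping parentheses */
        d->p++;
        declarator(d);
        skip_ws(d);
        if (*d->p == ')') d->p++;        /* step outside the pair */
    } else {                             /* start at the identifier */
        const char *start = d->p;
        while (isalnum((unsigned char)*d->p) || *d->p == '_') d->p++;
        emit(d, start, (size_t)(d->p - start));
        emit_str(d, " is");
    }
    for (;;) {                           /* Rule 1: postfix, left to right */
        skip_ws(d);
        if (*d->p == '[') {
            const char *start = ++d->p;
            while (*d->p && *d->p != ']') d->p++;
            emit_str(d, " array of");
            if (d->p > start) {
                emit_str(d, " ");
                emit(d, start, (size_t)(d->p - start));
            }
            if (*d->p == ']') d->p++;
        } else if (*d->p == '(') {       /* parameter list is skipped */
            while (*d->p && *d->p != ')') d->p++;
            if (*d->p == ')') d->p++;
            emit_str(d, " function returning");
        } else {
            break;                       /* ')', ';' or end of input */
        }
    }
}

/* Rule 2: leading '*'s are read only after everything to their right. */
static void declarator(struct decoder *d) {
    int stars = 0;
    for (skip_ws(d); *d->p == '*'; skip_ws(d)) {
        stars++;
        d->p++;
    }
    direct_declarator(d);
    while (stars-- > 0) emit_str(d, " pointer to");
}

/* Describe base_type + decl in English, writing into buf. */
const char *explain_decl(const char *base_type, const char *decl,
                         char *buf, size_t cap) {
    struct decoder d = { decl, buf, cap, 0 };
    if (cap == 0) return buf;
    buf[0] = '\0';
    declarator(&d);
    emit_str(&d, " ");
    emit_str(&d, base_type);
    return buf;
}
```

For example, explain_decl("int", "*p[4]", buf, sizeof buf) yields "p is array of 4 pointer to int", and explain_decl("int", "(*p)[4]", buf, sizeof buf) yields "p is pointer to array of 4 int". The recursion does the "step outside the parentheses" work: whatever is inside a pair of parentheses is fully described before the postfix operators that follow it.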

Basic Type Specifiers:

  1. char
  2. signed char
  3. unsigned char
  4. short
  5. unsigned short
  6. int
  7. unsigned int
  8. long
  9. unsigned long
  10. float
  11. double
  12. void
  13. struct tag
  14. union tag
  15. enum tag
  16. long long
  17. unsigned long long
  18. long double

A declaration has exactly one basic type, and it always appears at the far left of the declaration (storage classes and qualifiers such as static or const may appear alongside it).

The “basic types” are augmented with “derived types”. C has three of them (C++ adds references):

  1. *  = pointer to…
    1. This is denoted by the familiar * character, and it should be self-evident that a pointer always has to point to something.
  2. [] = array of…
    1. An array can be undimensioned ([]) or dimensioned ([15]), but the size doesn't play a significant role in reading a declaration; we typically just include it in the description. It should be clear that an array has to be an array of something.
  3. () = function returning…
    1. This is usually denoted by a pair of parentheses together – () – though it's also possible to find a prototype parameter list inside.
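One way to see the three derived types as building blocks is to add them one at a time with typedef. This is a C11 sketch with illustrative names (int_ptr, four_int_ptrs, int_fn, fn_table, sum_table). The _Static_assert lines use _Generic to confirm at compile time that each layered name is exactly the type written out in one line.

```c
#include <stddef.h>

/* Each typedef adds one derived type to the previous one. */
typedef int *int_ptr;             /* pointer to int                     */
typedef int_ptr four_int_ptrs[4]; /* array of 4 pointer to int          */
typedef int int_fn(void);         /* function returning int             */
typedef int_fn *fn_table[3];      /* array of 3 pointer to function ... */

/* Compile-time checks that the layered names match the one-line forms. */
_Static_assert(_Generic((four_int_ptrs *)0, int *(*)[4]: 1, default: 0),
               "four_int_ptrs is int *[4]");
_Static_assert(_Generic((fn_table *)0, int (*(*)[3])(void): 1, default: 0),
               "fn_table is int (*[3])(void)");

static int one(void) { return 1; }
static int two(void) { return 2; }

/* Calls every function in a fn_table and adds up the results. */
int sum_table(void) {
    fn_table table = { one, two, one };
    int total = 0;
    for (size_t i = 0; i < 3; i++)
        total += table[i]();   /* index first, then call */
    return total;
}
```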

Application of Rules

Let's apply the above rules to understand the very first declaration we talked about, i.e. int *p[4];.

The identifier in the above declaration is p.

  • We apply Rule 1 and read the postfix operator (in this case square brackets indicating an array) until we reach the semicolon: “p is an array of 4 …”.
  • Since a semicolon marks the end of the declaration, we stop applying Rule 1 and apply Rule 2. We read the prefix operator (an asterisk indicating a pointer), preceded by the type specifier int, until we reach the beginning of the declaration: “p is an array of 4 pointers to integers”.
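The reading “p is an array of 4 pointers to integers” can be checked with a short C sketch. Each element stores the address of a separate int, and *p[i] parses as *(p[i]) because the postfix [] is applied first. The function name is hypothetical.

```c
/* p is an array of 4 pointers to int: each element holds an address. */
int sum_via_pointer_array(void) {
    int a = 1, b = 2, c = 3, d = 4;
    int *p[4] = { &a, &b, &c, &d };
    _Static_assert(sizeof p == 4 * sizeof(int *), "p holds 4 pointers");
    int total = 0;
    for (int i = 0; i < 4; i++)
        total += *p[i];      /* [] binds first: *(p[i]) */
    return total;
}
```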

Another example:

int (*p)[4];

Let's start with p and read the declaration.

  • We attempt to apply Rule 1 but find no postfix operator after p, only an unmatched closing parenthesis, so Rule 1 does not apply yet.
  • We apply Rule 2 to the prefix asterisk operator (pointer) until we reach the opening parenthesis, and read “p is a pointer to …”.
  • Having read everything inside the parentheses, we apply Rule 1 to the part of the declaration outside them. We find a postfix operator (in this case square brackets indicating an array), followed by the semicolon marking the end of the declaration, and read “p is a pointer to an array of 4 …”.
  • We have reached the end of the declaration but still have part of it left to read. We apply Rule 2 but find no prefix operator. Instead, we find the type specifier int before reaching the beginning of the declaration, and read “p is a pointer to an array of 4 integers”.
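A short C sketch shows what “pointer to an array of 4 integers” means in practice. Incrementing p skips a whole row of 4 ints, and (*p)[2] needs the parentheses so that the dereference happens before the indexing. Both function names are hypothetical.

```c
#include <stddef.h>

/* p is a pointer to an array of 4 ints: one pointer to a whole row. */
int second_row_third_item(void) {
    int grid[2][4] = { {1, 2, 3, 4}, {5, 6, 7, 8} };
    int (*p)[4] = grid;    /* grid decays to &grid[0], type int (*)[4] */
    p++;                   /* steps over one whole row of 4 ints */
    return (*p)[2];        /* dereference first, then index: grid[1][2] */
}

/* Pointer arithmetic on p moves by sizeof(int[4]) bytes. */
ptrdiff_t row_stride_bytes(void) {
    int grid[2][4] = {0};
    int (*p)[4] = grid;
    return (char *)(p + 1) - (char *)p;
}
```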

Operator Precedence

The “array of” [] and “function returning” () type operators have higher precedence than “pointer to” *, which leads to some fairly straightforward rules for decoding.

Always start with the variable name:

foo is

and always end with the basic type:

foo is … int

The “filling in the middle” part is usually the trickiest, but it can be summarized with this rule:

go right when you can, go left when you must

A Simple Example

Let's start with a simple example:

long **foo[7];

We will approach this systematically, focusing on just one or two small parts at a time as we develop the description in English.

1 long **foo[7];

Start with the variable name and end with the basic type:

foo is … long

2 long **foo[7];

At this point, the variable name is touching two derived types: “array of 7” and “pointer to”. The rule is to go right when you can, so in this case we consume “array of 7”:

foo is array of 7 … long

3 long **foo[7];

Now we have gone as far right as possible, so the innermost part is only touching the “pointer to” next to foo. Consume it:

foo is array of 7 pointer to … long

4 long **foo[7];

The innermost part is now touching only the remaining “pointer to”, so consume it as well:

foo is array of 7 pointer to pointer to long
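To see the final reading in use, here is a small C sketch with a hypothetical function name. One element of foo is set to the address of a long *, and **foo[3] then indexes first and dereferences twice, which mirrors the right-then-left order used above.

```c
/* foo is an array of 7 pointers to pointers to long. */
long read_through_foo(void) {
    long value = 99;
    long *middle = &value;         /* pointer to long                     */
    long **foo[7] = { 0 };         /* array of 7 pointer to pointer to long */
    foo[3] = &middle;              /* element 3 points at middle          */
    return **foo[3];               /* **(foo[3]): index, then two derefs  */
}
```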
