Chars

In programming, numbers and true/false values are essential, but what about the need to represent individual letters, numbers, symbols, or even whitespace? Enter the char data type, specially designed to hold a single character.

Understanding Char

A character can be a single letter, digit, symbol, or whitespace character.

The char data type is an integral type, meaning the underlying value is stored as an integer. Similar to how a Boolean value 0 is interpreted as false and non-zero is interpreted as true, the integer stored by a char variable is interpreted as an ASCII character.

ASCII stands for American Standard Code for Information Interchange, and it defines a particular way to represent English characters (plus a few other symbols) as numbers between 0 and 127 (called an ASCII code or code point). For example, ASCII code 97 is interpreted as the character a.

Character literals are always placed between single quotes (e.g. 'g', '1', ' ').

Char

The char data type is tailor-made to store single characters. A character, in this context, can be any single letter, digit, symbol, or whitespace character. Although the value is stored as an integer, a char is meant to represent one character at a time rather than a quantity.

char myChar1 = 'A';
char myChar2{ 'a' }; // initialize with code point for 'a' (stored as integer 97) (preferred)

You can initialize chars with integers as well, but this should be avoided if possible.

char ch1{ 97 }; // initialize with integer 97 ('a') (not preferred)

Be careful not to mix up character numbers with integer numbers. The following two initializations are not the same:
char ch1{5};   // initialize with integer 5 (stored as integer 5)
char ch2{'5'}; // initialize with code point for '5' (stored as integer 53)

ASCII and Unicode Representation

The char data type is usually used to store ASCII values. Each ASCII character is represented by an integer between 0 and 127. For example:

Character      ASCII Value
'A'            65
'a'            97
'0'            48
' ' (space)    32

When you assign a character like 'A' to a char variable, the system stores the ASCII value 65.

Example:

// In C
#include <stdio.h>

int main()
{
  char ch = 'A';
  printf("%d", ch);
  return 0;
  
}


// Output
65

// In C++
#include <iostream>

using namespace std;

int main()
{
  char ch = 'A';
  cout << static_cast<int>(ch); // convert to int so the code is printed, not the character
  return 0;
  
}

// Output
65

Printing chars

When using std::cout to print a char, std::cout outputs the char variable as an ASCII character:

#include <iostream>

int main()
{
    char ch1{ 'a' }; // (preferred)
    std::cout << ch1; // cout prints character 'a'

    char ch2{ 98 }; // code point for 'b' (not preferred)
    std::cout << ch2; // cout prints a character ('b')


    return 0;
}

// Output
ab

We can also output char literals directly:

std::cout << 'c';

// Output
c

Inputting chars

The following program asks the user to input a character, then prints out the character:

#include <iostream>

int main()
{
    std::cout << "Input a keyboard character: ";

    char ch{};
    std::cin >> ch;
    std::cout << "You entered: " << ch << '\n';

    return 0;
}

// Output
Input a keyboard character: q
You entered: q

Char size, range and default sign

Char is defined by C++ to always be 1 byte in size. By default, a char may be signed or unsigned (though it's usually signed). If you are using chars to hold ASCII characters, you don't need to specify a sign (since both signed and unsigned chars can hold values between 0 and 127).

If you are using char to hold small integers (something you should not do unless you are explicitly optimizing for space), you should always specify whether it is signed or unsigned. A signed char can hold a number between -128 and 127. An unsigned char can hold a number between 0 and 255.

Escape Sequences

Some character sequences in C++ have special meaning. These are called escape sequences. An escape sequence starts with a '\' (backslash) character, followed by a letter or number.

You have already seen the most common escape sequence: \n, which can be used to print a newline:

#include <iostream>

int main()
{
    int x { 7 };
    std::cout << "The value of x is: " << x << '\n'; // standalone \n goes in single quotes
    std::cout << "First line\nSecond line\n";        // \n can be embedded in double quotes
    return 0;
}

// Output
The value of x is: 7
First line
Second line

Another commonly used escape sequence is \t, which embeds a horizontal tab:

#include <iostream>

int main()
{
    std::cout << "First part\tSecond part";
    return 0;
}

// Output
First part	Second part

Three other notable escape sequences are:

  • \' prints a single quote
  • \" prints a double quote
  • \\ prints a backslash

Newline (\n) vs std::endl

TODO.

Difference between putting symbols in single and double quotes?

Single quotes are used to represent individual characters. For example:

char myChar = 'A';

The character literal 'A' signifies the character A and is enclosed in single quotes.

Double quotes are used to represent strings, which are sequences of multiple characters. For example:

const char* myString = "Hello, World!";

The double-quoted literal "Hello, World!" represents a null-terminated string.

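Put stand-alone chars in single quotes (e.g., 't' or '\n', not "t" or "\n"). This helps the compiler optimize more effectively, since a single char is simpler to handle than a one-character string stored as a null-terminated array.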

Avoid multichar literals

For backwards compatibility reasons, many C++ compilers support multicharacter literals, which are char literals that contain multiple characters (e.g. '56'). If supported, these have an implementation-defined value (meaning it varies depending on the compiler). Because compilers are not required to support them, and their value is not strictly defined, multicharacter literals should be avoided.

Avoid multicharacter literals (e.g. '56').

#include <iostream>

int add(int x, int y)
{
	return x + y;
}

int main()
{
	std::cout << add(1, 2) << '/n';

	return 0;
}

The programmer expects this program to print the value 3 and a newline. But instead, on the author's machine, it outputs the following:

312142

The issue here is that the programmer accidentally used '/n' (a multicharacter literal consisting of a forward slash and an 'n' character) instead of '\n' (the escape sequence for a newline). The program first prints 3 (the result of add(1, 2)) correctly. But then it prints the value of '/n', which on the author's machine had the implementation-defined value 12142.