In addition to numeric data, symbolic data is often required. Symbolic or non-numeric data might include an important message such as “Hello World”, a common greeting for first programs. Such symbols are well understood by English language speakers. However, computer memory is designed to store and retrieve numbers. Consequently, symbols are represented by assigning a numeric value to each symbol or character.
Character Representation
In a computer, a character is a unit of information that corresponds to a symbol such as a letter in the alphabet. Examples of characters include letters, numeric digits, common punctuation marks (such as “.” or “~”), and whitespace.
American Standard Code for Information Interchange
Characters are represented using the American Standard Code for Information Interchange (ASCII). Based on the ASCII table, each character and control character is assigned a numeric value. When using ASCII, the character displayed is based on the assigned numeric value. This only works if everyone agrees on common values, which is the purpose of the ASCII table. For example, the letter “A” is defined as 65 in decimal (0x41). The value 0x41 is stored in computer memory, and when sent to the console, the letter “A” is displayed.
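As a quick illustration, the following minimal C sketch (the variable name letter is illustrative) shows that the character “A” and the numeric value 65 (0x41) are the same byte in memory, interpreted differently on output:

#include <stdio.h>

int main(void)
{
    char letter = 'A';   /* stored in memory as 0x41 (65 decimal) */

    /* %c interprets the byte as an ASCII character; %d and %X show the raw value */
    printf("character: %c\n", letter);             /* prints: A    */
    printf("decimal:   %d\n", letter);             /* prints: 65   */
    printf("hex:       0x%X\n", (unsigned)letter); /* prints: 0x41 */
    return 0;
}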
Additionally, numeric symbols can be represented in ASCII. For example, “9” is represented as 57 in decimal (0x39) in computer memory, and can be displayed as output to the console. However, if the integer value 9 (0x09) were sent to the console, it would be interpreted as an ASCII value, which in this case is the tab control character.
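The following C sketch contrasts the two cases: printing the ASCII character “9” (0x39) versus printing the raw value 9 (0x09), which the console interprets as a tab:

#include <stdio.h>

int main(void)
{
    char digit_char = '9';   /* ASCII 0x39: displays as the symbol 9 */
    char raw_nine   = 9;     /* 0x09: the ASCII horizontal tab control character */

    printf("[%c]\n", digit_char);  /* prints: [9] */
    printf("[%c]\n", raw_nine);    /* prints a tab between the brackets, not a 9 */
    return 0;
}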
It is very important to understand the difference between characters (such as “2”) and integers (such as 2). Characters can be displayed to the console, but cannot be used for calculations. Integers can be used for calculations, but cannot be displayed to the console (without changing the representation).
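One common way to change the representation, shown in the C sketch below (the names ch and value are illustrative), is to subtract the ASCII value of “0” to obtain an integer usable in calculations, and to add it back to obtain a displayable character:

#include <stdio.h>

int main(void)
{
    char ch = '2';             /* ASCII 0x32, a symbol, not the number 2 */

    int value = ch - '0';      /* 0x32 - 0x30 = 2: now usable in arithmetic */
    int doubled = value * 2;   /* 4 */

    char back = doubled + '0'; /* 4 + 0x30 = 0x34, the character '4' */
    printf("%c\n", back);      /* prints: 4 */
    return 0;
}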
A character is typically stored in a byte (8 bits) of space. This works well since memory is byte-addressable.
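For example, in C the sizeof operator confirms that a char occupies exactly one byte:

#include <stdio.h>

int main(void)
{
    /* a char occupies exactly one byte, matching byte-addressable memory */
    printf("%zu\n", sizeof(char));   /* prints: 1 */
    return 0;
}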
Unicode
It should be noted that Unicode is a current standard that includes support for different languages. The Unicode Standard provides a series of different encoding schemes (UTF-8, UTF-16, UTF-32, etc.) in order to provide a unique number for every character, no matter the platform, device, application, or language. In the most common encoding scheme, UTF-8, ASCII text looks exactly the same as it does in ASCII. Additional bytes are used for other characters as needed.
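A brief C sketch illustrating the point, using the explicit UTF-8 byte sequence 0xC3 0xA9 for the character “é” (U+00E9) so the result does not depend on the source file encoding:

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* plain ASCII: one byte per character in UTF-8 */
    printf("%zu\n", strlen("Hello"));       /* prints: 5 */

    /* U+00E9 (e with acute accent) encodes as two bytes in UTF-8 */
    printf("%zu\n", strlen("\xC3\xA9"));    /* prints: 2 */
    return 0;
}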
String Representation
A string is a series of ASCII characters, typically terminated with a NULL. The NULL is a non-printable ASCII control character with a value of 0. Since it is not printable, it can be used to mark the end of a string.
For example, the string “Hello” would be represented as follows:
Character             | “H”  | “e”  | “l”  | “l”  | “o”  | NULL |
ASCII Value (decimal) | 72   | 101  | 108  | 108  | 111  | 0    |
ASCII Value (hex)     | 0x48 | 0x65 | 0x6C | 0x6C | 0x6F | 0x00 |
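The layout above can be verified with a short C sketch that walks the string until the NULL terminator is reached, printing each character with its decimal and hex values:

#include <stdio.h>

int main(void)
{
    char str[] = "Hello";   /* 6 bytes: 'H','e','l','l','o',NULL */

    /* walk the string until the NULL terminator (value 0) is reached */
    for (int i = 0; str[i] != '\0'; i++) {
        printf("'%c' = %d (0x%X)\n", str[i], str[i], (unsigned)str[i]);
    }
    printf("terminated by NULL (0x00), total %zu bytes\n", sizeof(str));
    return 0;
}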
A string may consist partially or completely of numeric symbols. For example, the string “19653” would be represented as follows:
Character             | “1”  | “9”  | “6”  | “5”  | “3”  | NULL |
ASCII Value (decimal) | 49   | 57   | 54   | 53   | 51   | 0    |
ASCII Value (hex)     | 0x31 | 0x39 | 0x36 | 0x35 | 0x33 | 0x00 |
Again, it is very important to understand the difference between the string “19653” (using 6 bytes) and the single integer 19,653 in decimal (which can be stored in a single word, 2 bytes).
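The sketch below (a simplified version of what library routines such as atoi perform) converts the 6-byte string “19653” into a single integer so it can be used in calculations; a 2-byte short is wide enough to hold 19,653:

#include <stdio.h>

int main(void)
{
    char str[] = "19653";   /* 6 bytes of ASCII, including the NULL */

    /* accumulate the digits left to right, same idea as atoi() */
    short value = 0;
    for (int i = 0; str[i] != '\0'; i++) {
        value = value * 10 + (str[i] - '0');
    }

    printf("%d\n", value + 1);   /* prints: 19654, arithmetic now works */
    return 0;
}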