In addition to numeric data, symbolic data is often required. Symbolic or non-numeric data might include an important message such as “Hello World”, a common greeting for first programs. Such symbols are well understood by English language speakers. However, computer memory is designed to store and retrieve numbers. Consequently, symbols are represented by assigning a numeric value to each symbol or character.
Character Representation
In a computer, a character is a unit of information that corresponds to a symbol such as a letter in the alphabet. Examples of characters include letters, numeric digits, common punctuation marks (such as “.” or “~”), and whitespace.
American Standard Code for Information Interchange
Characters are represented using the American Standard Code for Information Interchange (ASCII). Based on the ASCII table, each character and control character is assigned a numeric value. When using ASCII, the character displayed is based on the assigned numeric value. This only works if everyone agrees on common values, which is the purpose of the ASCII table. For example, the letter “A” is defined as 65 in decimal (0x41). The value 0x41 is stored in computer memory, and when sent to the console, the letter “A” is displayed.
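As a quick illustration, the following minimal C sketch (the variable name letter is illustrative) shows that the character “A” and the numeric value 65 (0x41) are the same byte in memory, interpreted differently on output:

#include <stdio.h>

int main(void)
{
    char letter = 'A';   /* stored in memory as 0x41 (65 decimal) */

    /* %c interprets the byte as an ASCII character; %d and %X show the raw value */
    printf("character: %c\n", letter);             /* prints: A    */
    printf("decimal:   %d\n", letter);             /* prints: 65   */
    printf("hex:       0x%X\n", (unsigned)letter); /* prints: 0x41 */
    return 0;
}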
Additionally, numeric symbols can be represented in ASCII. For example, “9” is represented as 57 in decimal (0x39) in computer memory, and can be displayed as output to the console. However, if the integer value 9 (0x09) were sent to the console, it would be interpreted as an ASCII value, which in this case is the tab control character.
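The following C sketch contrasts the two cases: printing the ASCII character “9” (0x39) versus printing the raw value 9 (0x09), which the console interprets as a tab:

#include <stdio.h>

int main(void)
{
    char digit_char = '9';   /* ASCII 0x39: displays as the symbol 9 */
    char raw_nine   = 9;     /* 0x09: the ASCII horizontal tab control character */

    printf("[%c]\n", digit_char);  /* prints: [9] */
    printf("[%c]\n", raw_nine);    /* prints a tab between the brackets, not a 9 */
    return 0;
}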
It is very important to understand the difference between characters (such as “2”) and integers (such as 2). Characters can be displayed to the console, but cannot be used for calculations. Integers can be used for calculations, but cannot be displayed to the console (without changing the representation).
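One common way to change the representation, shown in the C sketch below (the names ch and value are illustrative), is to subtract the ASCII value of “0” to obtain an integer usable in calculations, and to add it back to obtain a displayable character:

#include <stdio.h>

int main(void)
{
    char ch = '2';             /* ASCII 0x32, a symbol, not the number 2 */

    int value = ch - '0';      /* 0x32 - 0x30 = 2: now usable in arithmetic */
    int doubled = value * 2;   /* 4 */

    char back = doubled + '0'; /* 4 + 0x30 = 0x34, the character '4' */
    printf("%c\n", back);      /* prints: 4 */
    return 0;
}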
A character is typically stored in a byte (8 bits) of space. This works well since memory is byte-addressable.
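For example, in C the sizeof operator confirms that a char occupies exactly one byte:

#include <stdio.h>

int main(void)
{
    /* a char occupies exactly one byte, matching byte-addressable memory */
    printf("%zu\n", sizeof(char));   /* prints: 1 */
    return 0;
}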
Unicode
It should be noted that Unicode is a current standard that includes support for different languages. The Unicode Standard provides a series of different encoding schemes (UTF-8, UTF-16, UTF-32, etc.) in order to provide a unique number for every character, no matter the platform, device, application, or language. In the most common encoding scheme, UTF-8, ASCII text looks exactly the same as it does in ASCII. Additional bytes are used for other characters as needed.
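A brief C sketch illustrating the point, using the explicit UTF-8 byte sequence 0xC3 0xA9 for the character “é” (U+00E9) so the result does not depend on the source file encoding:

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* plain ASCII: one byte per character in UTF-8 */
    printf("%zu\n", strlen("Hello"));       /* prints: 5 */

    /* U+00E9 (e with acute accent) encodes as two bytes in UTF-8 */
    printf("%zu\n", strlen("\xC3\xA9"));    /* prints: 2 */
    return 0;
}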
String Representation
A string is a series of ASCII characters, typically terminated with a NULL. The NULL is a non-printable ASCII control character with a value of 0. Since it is not printable, it can be used to mark the end of a string.
For example, the string “Hello” would be represented as follows:
Character             | “H”  | “e”  | “l”  | “l”  | “o”  | NULL |
ASCII Value (decimal) | 72   | 101  | 108  | 108  | 111  | 0    |
ASCII Value (hex)     | 0x48 | 0x65 | 0x6C | 0x6C | 0x6F | 0x00 |
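The layout above can be verified with a short C sketch that walks the string until the NULL terminator is reached, printing each character with its decimal and hex values:

#include <stdio.h>

int main(void)
{
    char str[] = "Hello";   /* 6 bytes: 'H','e','l','l','o',NULL */

    /* walk the string until the NULL terminator (value 0) is reached */
    for (int i = 0; str[i] != '\0'; i++) {
        printf("'%c' = %d (0x%X)\n", str[i], str[i], (unsigned)str[i]);
    }
    printf("terminated by NULL (0x00), total %zu bytes\n", sizeof(str));
    return 0;
}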
A string may consist partially or completely of numeric symbols. For example, the string “19653” would be represented as follows:
Character             | “1”  | “9”  | “6”  | “5”  | “3”  | NULL |
ASCII Value (decimal) | 49   | 57   | 54   | 53   | 51   | 0    |
ASCII Value (hex)     | 0x31 | 0x39 | 0x36 | 0x35 | 0x33 | 0x00 |
Again, it is very important to understand the difference between the string “19653” (using 6 bytes) and the single integer 19,653 in decimal (which can be stored in a single word, 2 bytes).
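The sketch below (a simplified version of what library routines such as atoi perform) converts the 6-byte string “19653” into a single integer so it can be used in calculations; a 2-byte short is wide enough to hold 19,653:

#include <stdio.h>

int main(void)
{
    char str[] = "19653";   /* 6 bytes of ASCII, including the NULL */

    /* accumulate the digits left to right, same idea as atoi() */
    short value = 0;
    for (int i = 0; str[i] != '\0'; i++) {
        value = value * 10 + (str[i] - '0');
    }

    printf("%d\n", value + 1);   /* prints: 19654, arithmetic now works */
    return 0;
}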