Chapter 5
Character Expressions

In This Chapter
▶ Defining character variables and constants
▶ Encoding characters
▶ Declaring a string
▶ Outputting characters to the console

Chapter 4 introduces the concept of the integer variable. This chapter introduces the integer’s smaller sibling, the character or char (pronounced variously as care, chair, or as in the first syllable of charcoal) to us insiders. I have used characters in programs appearing in earlier chapters; now it’s time to introduce them formally.

Defining Character Variables

Character variables are declared just like integers except with the keyword char in place of int:

    char inputCharacter;

Character constants are defined as a single character enclosed in single quotes, as in the following:

    char letterA = 'A';

This may seem like a silly question, but what exactly is “A”? To answer that, I need to explain what it means to encode characters.
Part II: Writing a Program: Decisions, Decisions

Encoding characters

As I mentioned in Chapter 1, everything in the computer is represented by a pattern of ones and zeros that can be interpreted as numbers. Thus, the
bit pattern 0000 0001 is the number 1 when interpreted as an integer. However, this same bit pattern means something completely different when
interpreted as an instruction by the processor. So it should come as no surprise that the computer encodes the characters of the alphabet by assigning
each a number.
Consider the character ‘A’. You could assign it any value you want as long as
we all agree. For example, you could assign a value of 1 to ‘A’, if you wanted
to. Logically, you might then assign the value 2 to ‘B’, 3 to ‘C’, and so on. In
this scheme, ‘Z’ would get the value 26. You might then start over by assigning the value 27 to ‘a’, 28 to ‘b’, right down to 52 for ‘z’. That still leaves the
digits ‘0’ through ‘9’ plus all the special symbols like space, period, comma,
slash, semicolon, and the funny characters you see when you press the
number keys while holding Shift down. Add to that the unprintable characters like tab and newline. When all is said and done, you could encode the
entire English keyboard using numbers between 1 and 127.
I say “you could” assign a value for ‘A’, ‘B’, and the remaining characters; however, that wouldn’t be a very good idea because it has already been
done. Sometime around 1963, there was a general agreement on how characters should be encoded in English. The ASCII (American Standard Code for
Information Interchange) character encoding shown in Table 5-1 was adopted
pretty much universally except for one company. IBM published its own standard in 1963 as well. The two encoding standards duked it out for about ten
years, but by the early 1970s when C (the predecessor of C++) was being created, ASCII had
just about won the battle. The char type was created with ASCII character encoding in mind.

Table 5-1 The ASCII Character Set

Value  Char                       Value  Char
0      NULL                       64     @
1      Start of Heading           65     A
2      Start of Text              66     B
3      End of Text                67     C
4      End of Transmission        68     D
5      Enquiry                    69     E
6      Acknowledge                70     F
7      Bell                       71     G
8      Backspace                  72     H
9      Tab                        73     I
10     Newline                    74     J
11     Vertical Tab               75     K
12     New Page; Form Feed        76     L
13     Carriage Return            77     M
14     Shift Out                  78     N
15     Shift In                   79     O
16     Data Link Escape           80     P
17     Device Control 1           81     Q
18     Device Control 2           82     R
19     Device Control 3           83     S
20     Device Control 4           84     T
21     Negative Acknowledge       85     U
22     Synchronous Idle           86     V
23     End of Transmission Block  87     W
24     Cancel                     88     X
25     End of Medium              89     Y
26     Substitute                 90     Z
27     Escape                     91     [
28     File Separator             92     \
29     Group Separator            93     ]
30     Record Separator           94     ^
31     Unit Separator             95     _
32     Space                      96     `
33     !                          97     a
34     "                          98     b
35     #                          99     c
36     $                          100    d
37     %                          101    e
38     &                          102    f
39     '                          103    g
40     (                          104    h
41     )                          105    i
42     *                          106    j
43     +                          107    k
44     ,                          108    l
45     -                          109    m
46     .                          110    n
47     /                          111    o
48     0                          112    p
49     1                          113    q
50     2                          114    r
51     3                          115    s
52     4                          116    t
53     5                          117    u
54     6                          118    v
55     7                          119    w
56     8                          120    x
57     9                          121    y
58     :                          122    z
59     ;                          123    {
60     <                          124    |
61     =                          125    }
62     >                          126    ~
63     ?                          127    DEL

The first thing that you’ll notice is that the first 32 characters are the “unprintable” characters. That doesn’t mean that these characters are so
naughty that the censor won’t allow them to be printed — it means that they
don’t display as a symbol when printed on the printer (or on the console
for that matter). Many of these characters are no longer used or only used
in obscure ways. For example, character 25 “End of Medium” was probably printed as the last character before the end of a reel of magnetic tape. That
was a big deal in 1963, but today it has limited use. My favorite is character
7, the Bell, which used to ring the bell on the old teletype machines. (A
program built with Code::Blocks generates a beep when it displays the bell character.)
The characters starting with 32 are all printable with the exception of the last
one, 127, which is the Delete character.

Example of character encoding

The following simple program allows you to play with the ASCII character set:

    // CharacterEncoding - allow the user to enter a
    //                     numeric value then print that value
    //                     out as a character
    #include <cstdio>
    #include <cstdlib>
    #include <iostream>
    using namespace std;

    int main(int nNumberofArgs, char* pszArgs[])
    {
        // Prompt the user for a value
        int nValue;
        cout << "Enter decimal value of char to print:";
        cin >> nValue;

        // Now print that value back out as a character
        char cValue = (char)nValue;
        cout << "The char you entered was [" << cValue
             << "]" << endl;

        // wait until user is ready before terminating program
        // to allow the user to see the program results
        // (the PAUSE command is specific to Windows)
        system("PAUSE");
        return 0;
    }

This program begins by prompting the user to “Enter decimal value of char to print”. The program then reads the value entered by the user into the int variable nValue.

The program then assigns this value to a char variable cValue.

The (char) appearing in front of nValue is called a cast. In this case, it casts the value of nValue from an int to a char. I could have performed the assignment without the cast as in

    cValue = nValue;

However, the type of the variables wouldn’t match: The value on the right of the assignment is an int, while the value on the left is a char. C++ will perform the assignment anyway, but it will generally complain about such conversions by generating a warning during the build step. The cast converts the value in nValue to a char before performing the assignment:

    cValue = (char)nValue; // cast nValue to a char before
                           // assigning the value to cValue

The final line outputs the character cValue within a set of square brackets.

The following shows a few sample runs of the program. In the first run, I
entered the value 65, which Table 5-1 shows as the character ‘A’:

    Enter decimal value of char to print:65
    The char you entered was [A]
    Press any key to continue . . .

The second time I entered the value 97, which corresponds to the character ‘a’:

    Enter decimal value of char to print:97
    The char you entered was [a]
    Press any key to continue . . .

On subsequent runs, I tried special characters:

    Enter decimal value of char to print:36
    The char you entered was [$]
    Press any key to continue . . .

The value 7 didn’t print anything, but did cause my PC to issue a loud beep that scared the heck out of me.

The value 10 generated the following odd output:

    Enter decimal value of char to print:10
    The char you entered was [
    ]
    Press any key to continue . . .

Referring to Table 5-1, you can see that 10 is the newline character. This character doesn’t actually print anything but causes subsequent output to start at the beginning of the next line, which is exactly what happened in this case: The closing bracket appears by itself at the beginning of the next line when following a newline character.
The endl that appears at the end of many of the output commands that you’ve seen so far generates a newline. It also does a few other things, which you’ll see in Chapter 31.

Encoding Strings of Characters

Theoretically, you could print anything you want using individual characters. However, that could get really tedious as the following code snippet demonstrates:

    cout << 'E' << 'n' << 't' << 'e' << 'r' << ' '
         << 'd' << 'e' << 'c' << 'i' << 'm' << 'a'
         << 'l' << ' ' << 'v' << 'a' << 'l' << 'u'
         << 'e' << ' ' << 'o' << 'f' << ' ' << 'c'
         << 'h' << 'a' << 'r' << ' ' << 't' << 'o'
         << ' ' << 'p' << 'r' << 'i' << 'n' << 't'
         << ':';

C++ allows you to encode a sequence of characters by enclosing the string in double quotes:

    cout << "Enter decimal value of char to print:";

I’ll have a lot more to say about character strings in Chapter 16.

Special Character Constants

You can code a normal, printable character by placing it in single quotes:

    char cSpace = ' ';

You can code any character you want, whether printable or not, by placing its octal value after a backslash:

    char cSpace = '\040';

Digits that directly follow the backslash are interpreted as octal, also known as base 8. (Similarly, an integer constant that starts with a leading zero, such as 040, is assumed to be octal.)
You can code characters in base 16, hexadecimal, by preceding the number with a backslash followed by a small x as in the following example:

    char cSpace = '\x20';

The decimal value 32 is equal to 40 in base 8 and 20 in base 16. Don’t worry if you don’t feel comfortable with octal or hexadecimal. C++ provides shortcuts
for the most common characters.
C++ provides a name for some of the unprintable characters that are particularly useful. Some of the more common ones are shown in Table 5-2.

Table 5-2 Some of the Special C++ Characters

Char    Special Symbol    Char               Special Symbol
'       \'                Newline            \n
"       \"                Carriage Return    \r
\       \\                Tab                \t
NULL    \0                Bell               \a

The most common is the newline character, which is nicknamed ‘\n’. In addition, you must use the backslash if you want to print the single quote character:

    char cQuote = '\'';

Since C++ normally interprets a single quote mark as enclosing a character, you have to precede a single quote mark with a backslash character to tell it,
“Hey, this single quote is not enclosing a character, this is the character.”
In addition, the character ‘\\’ is a single backslash.
This leads to one of the more unfortunate coincidences in C++. In Windows,
the backslash is used in filenames as in the following:

    C:\Base Directory\Subdirectory\File Name

This is encoded in C++ with each backslash replaced by a pair of backslashes as follows:

    "C:\\Base Directory\\Subdirectory\\File Name"

Wide load ahead

By the early 1970s when C (the predecessor of C++) was invented, the 128-character ASCII character
set had pretty much beat out all rivals. So it
was logical that the char type was defined to accommodate the ASCII character set.
This character set was fine for English but
became overly restrictive when programmers
tried to write applications for other European
languages.
Fortunately, C and C++ had provided enough
room in the char for 256 different characters. Standards committees got busy and used the
characters between 128 and 255 for characters that occur in European languages but not
English, such as umlauts and accented characters. You can see the results of their handiwork
using the example CharacterEncoding program from this chapter: Enter 142 and, on a Windows console that uses a Western code page such as 437 or 850, the program prints out an Ä.

No matter what you do, the char variable is just not large enough to handle all of the many
different alphabets, such as Cyrillic, Hebrew,
Arabic, and Korean, not to mention the many
thousands of Chinese characters. Something
had to give.
C++ responded first by introducing the “wide
character” of type wchar_t. This was intended to implement whatever wide character set is native to the host operating
system. On Windows, that would be the variant of Unicode known as UCS-2 or UTF-16.
(Here the 2 stands for two bytes, the size of
each wide character, whereas the 16 stands
for 16 bits.) However, Macintosh’s OS X uses
a different variant of Unicode known as UTF-8.
Unicode can represent not only every alphabet on
the planet but also the characters used in Chinese
and Japanese. The 2011 update to the C++
standard added two further types, char16_t and char32_t, which implement specifically UTF-16 and UTF-32.
For almost every feature that I describe in this
book for handling character variables, there
is an equivalent feature for the wide character types; programming Unicode, however, is
beyond the scope of a beginning text.