Intro to Characters and Strings
Up to now, we've been mostly dealing with the int type...great for numbers. What if we want to do text? We need a type that can represent a character, and that's char. Here's a sample declaration and usage...much like our previous int example:
char my_char;
char big_g = 'G';
my_char = big_g;
Serial.print("My char is ");
Serial.println(my_char);
Note that we use single quotes to specify which character we're using.
Characters are one byte long (8 bits). How does the machine, with 8 ones and zeros, know that we want a "capital G" above? Think about your keyboard...there are 26 letters, which can be upper or lower case. Then we've got the digits 0 through 9. Then there's a bunch of special characters...exclamation points, periods, commas, semicolons, etc. We use a code to map each of these characters to a number...for example, the number 71 maps to our capital G. This mapping is called the ASCII table.
ASCII stands for "American Standard Code for Information Interchange". You can find this table in many computer science textbooks and throughout the internet...here's a link: http://www.asciitable.com
Remember that numbers in our computer can be thought of in decimal, hex, or binary. For our capital G example, the number we're using to represent that character is 71...which is 0x47 in hex, or, in binary, 0b01000111. So, if the machine has a character defined, and its bits are 01000111, it knows that represents a capital G.
With that knowledge, you, as the programmer, have many different ways you can specify that capital G. You can use:
my_char = 'G'; // The single quote way
my_char = 71; // You can use the number in decimal
my_char = 0x47; // ...or in hex
my_char = 0b01000111; // ...or in binary
All of these represent a capital G.
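To convince yourself that these spellings really store the same byte, here is a minimal sketch in plain C. The function name is_capital_g is made up for this example. The binary form is left out because 0b literals are not part of every C standard (the Arduino compiler accepts them).

```c
/* Hypothetical helper for illustration only:
   returns 1 if c holds a capital G, otherwise 0. */
int is_capital_g(char c)
{
    return c == 'G';
}
```

Passing 71 or 0x47 to is_capital_g gives the same answer as passing 'G', because all three are the same value once they land in a char.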
In our example above, Serial.println(my_char) knows that my_char is a character...and so it's going to default to printing out the actual character specified (a capital G in this case). But what if we really wanted to print 71? C gives us a mechanism called "casting" to convert one type to another. Casting looks like this:
int char_as_number;
char my_char = 'G';
char_as_number = (int) my_char;
To cast, you put the desired type (in parentheses) in front of the variable or value you want to "reinterpret". The value itself doesn't change...in this example, the machine takes the bits representing my_char (01000111) and treats them as an integer rather than a character...the number 71 in this case.
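Here is the same cast wrapped in a tiny function, as a sketch in plain C. The name char_to_code is invented for this example.

```c
/* Hypothetical helper: returns the ASCII code stored in a char. */
int char_to_code(char c)
{
    return (int) c;   /* same value, now typed as an int */
}
```

On an Arduino, Serial.println((int) my_char); should print 71 rather than G, since println now sees an integer.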
At this point, you are probably asking yourself "Okay, what about going the other way? What if I want to convert the number 7 to the character '7'?" Hold on to that question...we'll get to it below, but we've got a couple steps we need to hit first.
Doing one character is nifty, but we need more than one if we want to really communicate. This is where c-strings come in.
One quick aside: There are many ways that various languages and libraries inside those languages represent strings. Because this varies greatly, I'm going to focus only on what I call "c-strings"...the basic C definition of what a string is. Those aforementioned libraries give lots of useful helper functions for "strings in general", but because they aren't ubiquitous, we'll focus only on building the basics here...when I call something a "string", I mean a "c-string".
A c-string is an array of characters. We can define one like so:
char my_string[]="Hello world";
Serial.println(my_string);
Note that our definition is an array (the square brackets), and we're letting the compiler figure out how much space to use for our array of characters (no number inside the square brackets).
Also note that we're using double-quotes here to show that it's an array of characters, rather than the single quote for a single character. Think: single quote, single character.
Because it's an array, we can tweak it just like we would a normal array:
char my_string[]="Hello world";
Serial.println(my_string);
my_string[0]='J';
Serial.println(my_string);
One final note: you've used strings before this...consider:
Serial.print("my number is ");
You are passing an array of characters to Serial.print. Nifty, huh?
So how do you know when your string is done? C defines a special character that means "nothing"...it's the null character, written '\0', and it has the value 0 (meaning when we look at the byte representing the character, it's all zeros, or 00000000). You'll often see it called NULL, though strictly speaking NULL is C's name for a null pointer. C helps us out by automatically putting a null after the end of any string we define with double-quotes, but if we're making a string by hand, we have to do it ourselves. Here's an example:
char my_char='A';
int i;
char my_string[27];
// This loop populates our string with capital-A to capital-Z
for (i = 0; i<26; i++)
{
my_string[i] = my_char;
my_char++;
}
// null terminate the string...i should now be 26.
my_string[i] = '\0';
Serial.println(my_string);
Under the hood of Serial.print, it's walking through the array we passed in (my_string above), and going until it sees this null character.
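That "walk until you hit the null" idea can be sketched in plain C like this. It is not the real Serial.print code, just an illustration; the name count_chars is made up, and const only means the function promises not to change the string.

```c
#include <stddef.h>

/* Hypothetical helper, not the real Serial.print code:
   counts the characters that come before the terminating null. */
size_t count_chars(const char text[])
{
    size_t i = 0;

    while (text[i] != '\0') {   /* keep walking until we see the null */
        i++;
    }
    return i;
}
```

The standard library's strlen does this same job, so in real code you would normally call that instead.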
The cool thing about using a null (value 0) as the end of our string is how we use it in conditionals...since C treats 0 as false and anything non-zero as true, we can do things like this:
while (current_char != '\0') // or simply: while (current_char)
{
// do stuff
...
}
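To make the "do stuff" part concrete, here is a minimal sketch in plain C that uses the shorter while (text[i]) form. The function replace_char and its parameter names are invented for this example.

```c
/* Hypothetical helper: replaces every 'from' character with 'to'
   and returns how many characters were changed. */
int replace_char(char text[], char from, char to)
{
    int i = 0;
    int changed = 0;

    while (text[i]) {            /* the null is 0, so it reads as false */
        if (text[i] == from) {
            text[i] = to;
            changed++;
        }
        i++;
    }
    return changed;
}
```

Note that the string has to live in an array you own (like char my_string[] = "Hello world";) so that it is safe to modify.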
If you are feeling confused, skip over this next bit to the "Digits and Numbers" section, but here's a different way of doing that loop above:
int i;
char my_string[27];
for (i = 0; i < 26; i++)
{
my_string[i] = (char) (i + 'A');
}
my_string[i] = '\0';
Serial.println(my_string);
The middle line is the tricky bit. We're taking advantage of the fact that the ASCII table lays out letters sequentially...for example, 'A' is 65, 'B' is 66, etc. Then, because we don't want to remember that 65 is 'A', we can refer to that character as just 'A'...and add our i value to it to make it walk up from A to Z.
Digits and Numbers
Let's talk about the digits 0 through 9. One common confusion is that the digit 0 in the ASCII table has value 48...and you may ask yourself, "Wait, why not make the digit zero have value 0 and the digit one have value 1 and so on???" The reason is that we want to use that null indicator to help us with loops...so we need the value 0 reserved for the null.
(This is where I usually say "If you don't like it, go write your own programming language")
Note that Serial.print helps us here...if it sees a type that's an integer, it does the conversion for us...like in the following:
int my_number = 42;
Serial.print("My number is ");
Serial.println(my_number);
The first print takes a string...so Serial.print deals with it as an array of characters. The second print takes an integer...under the hood, Serial.print converts that number to a two-digit string: "42".
There are going to be cases where we need to do this ourselves, so let's look at how that happens...first with a single digit.
char my_digit = '5';
int my_number;
my_number = my_digit - '0';
So what's happening here? Let's go line by line:
- First, we set my_digit to the character '5'. Looking at the ASCII table, we know that byte has the value of 53 (decimal).
- We'll be subtracting the digit '0'...which, looking back at the ASCII table, has the value of 48 (decimal).
- my_number therefore gets 53 minus 48, which is.......yup, 5.
We're taking advantage of the fact that the ASCII table lays out its digits in sequential order...'0' is 48, '1' is 49, etc.
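The subtraction only makes sense when the character really is a digit, so a slightly more careful version might check first. A minimal sketch in plain C; the name digit_to_int and the choice of -1 as the "not a digit" answer are assumptions made for this example.

```c
/* Hypothetical helper: converts '0'..'9' to 0..9.
   Returns -1 if the character is not a digit. */
int digit_to_int(char c)
{
    if (c < '0' || c > '9') {
        return -1;          /* e.g. 'G' or '!' */
    }
    return c - '0';         /* '5' is 53, '0' is 48, so we get 5 */
}
```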
We can go the other way as well:
int my_number = 1;
char my_digit;
my_digit = my_number + '0';
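Repeating that + '0' step once per digit is enough to turn a whole number into a string, which is roughly the job Serial.print was described as doing for 42. Here is a minimal sketch in plain C, handling only non-negative values. It is not the real Serial.print code, and the name uint_to_string is made up for this example.

```c
/* Hypothetical helper, not the real Serial.print code: writes the decimal
   digits of value into out and null terminates it.
   The caller must make out big enough for every digit plus the null. */
void uint_to_string(unsigned long value, char out[])
{
    char reversed[20];   /* 20 digits covers a 64-bit value */
    int count = 0;
    int i = 0;

    /* Peel digits off from the right: 42 gives '2', then '4'. */
    do {
        reversed[count] = (char) ('0' + (value % 10));
        count++;
        value = value / 10;
    } while (value != 0);

    /* Copy them back in reading order. */
    while (count > 0) {
        count--;
        out[i] = reversed[count];
        i++;
    }
    out[i] = '\0';
}
```

The digits come out backwards because % 10 always gives the rightmost one, which is why the sketch stores them in a scratch array first and then reverses them.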
Homework assignment: Using some sort of looping construct, make a function that takes a three digit string (number), and returns the appropriate integer value. Here's the prototype:
int convert_3digit_to_int( char my_number[] )
{
// your code goes here
}
I'm going to call it inside of setup with something like this:
void setup()
{
char test_string[] = "314";
int answer;
answer = convert_3digit_to_int(test_string);
Serial.println(answer);
}
Hint: You'll need to multiply the first digit by 100, second by 10, and then add all three together to get the answer...
We talked about the null character; there are other special characters in the ASCII table. One common one is the "newline" character:
'\n'
Note that to represent this character, we start with a backslash...this means "escape to a different code". If we look at that ASCII table, we see newline at decimal 10.
You can use this \n in both character and string notation. Try it like this:
Serial.print("Newline test\nLine2\nLine3\n");
Notice something cool? Serial.println works just like Serial.print, but it appends a line ending for us (on Arduino, a carriage return '\r' followed by a newline '\n').
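Since '\n' is just another character (value 10), you can look for it the same way you'd look for any letter. A minimal sketch in plain C; the name count_newlines is invented for this example.

```c
/* Hypothetical helper: counts the newline characters in a string. */
int count_newlines(const char text[])
{
    int i = 0;
    int lines = 0;

    while (text[i] != '\0') {
        if (text[i] == '\n') {   /* '\n' is one character, value 10 */
            lines++;
        }
        i++;
    }
    return lines;
}
```

For the test string above, "Newline test\nLine2\nLine3\n", this should come back with 3.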