-
-
Notifications
You must be signed in to change notification settings - Fork 3
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Adds array-from string splitting tip.
- Loading branch information
Showing
1 changed file
with
109 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,109 @@ | ||
--- | ||
{ | ||
"datetime": "2021-02-06T16:55:46Z", | ||
"updatedAt": null, | ||
"draft": false, | ||
"title": "Tip: get an array of characters from a string", | ||
"description": "Splitting strings into an array of individual characters is a common source of bugs in JavaScript. I'll show you three ways of doing it reliably.", | ||
"tags": [ | ||
"JavaScript" | ||
] | ||
} | ||
--- | ||
Splitting strings into an array of individual characters is a common source of | ||
bugs in JavaScript. I'll show you three ways of doing it reliably. | ||
|
||
When asked to split a string in JavaScript, many programmers will try to split | ||
the string using an empty string: | ||
|
||
```javascript | ||
// DON'T DO THIS. | ||
const string = 'Hello, world!'; | ||
const characters = string.split(''); | ||
``` | ||
|
||
The `characters` variable appears to be what we wanted, i.e. an array populated | ||
with length one strings, each with a character of the original strings. | ||
|
||
So what's the problem with this? To illustrate the problem, let's try the same | ||
string with an emoji in it: | ||
|
||
```javascript | ||
// DON'T DO THIS. | ||
const string = 'Hello, world! π'; | ||
const characters = string.split(''); | ||
``` | ||
|
||
The beginning of the array looks fine, but where the emoji should be there are | ||
instead two weird characters. This is also true of lots of runes, not just some | ||
emoji! The encoding of unicode characters and their interaction of JavaScript is | ||
beyond the scope of this article (I'm only concerned with the solution) but the | ||
gist of it is that JavaScript internally represents strings in UTF-16, but | ||
unicode characters cannot in general be represented as a single UTF-16 | ||
character, so some spill over into two. | ||
|
||
This problem tends to escape notice until it's too late. This happens because | ||
basic latin characters are represented as a single UTF-16 character, and unit | ||
tests tend to be written with strings using the same basic character set. | ||
|
||
The same problem happens when you loop over strings by index: | ||
|
||
```javascript | ||
// DON'T DO THIS. | ||
const characters = []; | ||
|
||
for (let i = 0; i < s.length; i++) { | ||
characters.push(s[i]); | ||
} | ||
``` | ||
|
||
So what's the solution? | ||
|
||
## Solution 1 | ||
|
||
ES2015 introduced _iterables_ to JavaScript. Strings are such a type. One way in | ||
which this helps is that the `Array` constructor can consume them to build an | ||
array. | ||
|
||
```javascript | ||
const string = 'Hello, world! π'; | ||
const characters = Array.from(string); // This works! | ||
``` | ||
|
||
This works out of the box for ES2015 or above compliant browsers. The one | ||
browser this tends to be an issue for at the time of writing is IE11. If you | ||
need to support IE11, then you can polyfill `Array.from` or you can compile it | ||
your JavaScript to ES5. | ||
|
||
## Solution 2 | ||
|
||
ES2015 also gave us some special syntax called _spread_ for this. You can spread | ||
an iterable into a new array. | ||
|
||
```javascript | ||
const string = 'Hello, world! π'; | ||
const characters = [...string]; // This works! | ||
``` | ||
|
||
Unlike solution 1, solution 2 will require compilation to ES5 to work in older | ||
browsers like IE11. I prefer solution 1 to solution 2 because solution 1 | ||
documents itself, whereas solution 2 can be cryptic if you're not familiar with | ||
the spread operator. | ||
|
||
## Solution 3 | ||
|
||
If you want to loop over strings by character, the `for-of` loop was created to | ||
work with iterables. | ||
|
||
```javascript | ||
const string = 'Hello, world! π'; | ||
const characters = []; | ||
|
||
for (const character of string) { | ||
characters.push(character); // This works! | ||
} | ||
``` | ||
|
||
Like the spread operator, `for-of` loops use new syntax. If you need to support | ||
old browsers which don't support ES2015, then you'll need to compile your | ||
JavaScript to ES5. |