Adds array-from string splitting tip. #320

109 changes: 109 additions & 0 deletions content/posts/2021-02-06.md
@@ -0,0 +1,109 @@
---
{
"datetime": "2021-02-06T16:55:46Z",
"updatedAt": null,
"draft": false,
"title": "Tip: get an array of characters from a string",
"description": "Splitting strings into an array of individual characters is a common source of bugs in JavaScript. I'll show you three ways of doing it reliably.",
"tags": [
"JavaScript"
]
}
---
Splitting strings into an array of individual characters is a common source of
bugs in JavaScript. I'll show you three ways of doing it reliably.

When asked to split a string into individual characters in JavaScript, many
programmers will reach for `split` with an empty string as the separator:

```javascript
// DON'T DO THIS.
const string = 'Hello, world!';
const characters = string.split('');
```

The `characters` variable appears to be what we wanted, i.e. an array populated
with length-one strings, each holding one character of the original string.

So what's the problem with this? To illustrate the problem, let's try the same
string with an emoji in it:

```javascript
// DON'T DO THIS.
const string = 'Hello, world! 👋';
const characters = string.split('');
```

The beginning of the array looks fine, but where the emoji should be there are
instead two unprintable characters. This is true of many Unicode characters, not
just some emoji! The encoding of Unicode characters and their interaction with
JavaScript is beyond the scope of this article (I'm only concerned with the
solution), but the gist of it is that JavaScript represents strings as sequences
of UTF-16 code units, and not every Unicode character fits in a single code
unit, so some spill over into two. `split('')` cuts between code units, which
tears those pairs apart.
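
To make the "spill over" concrete, here is a small sketch that inspects the waving-hand emoji from the example above. The variable names are mine and chosen only for illustration.

```javascript
const wave = '👋';

// split('') cuts between UTF-16 code units, so the emoji comes apart.
const codeUnits = wave.split('');
const unitHex = codeUnits.map((unit) => unit.charCodeAt(0).toString(16));

// codePointAt reads the whole character starting at index 0.
const codePointHex = wave.codePointAt(0).toString(16);

console.log(wave.length);  // 2
console.log(unitHex);      // [ 'd83d', 'dc4b' ]
console.log(codePointHex); // 1f44b
```

The two hexadecimal values are the halves of a surrogate pair. Neither is a printable character on its own, which is why the array shows two odd-looking entries.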

This problem tends to escape notice until it's too late, because basic Latin
characters each fit in a single UTF-16 code unit, and unit tests tend to be
written with strings drawn from that same basic character set.
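
As a sketch of how this slips through, consider a hypothetical `reverseNaive` helper (not from any library, named here only for illustration). A check written with basic Latin input passes, while the same check with an emoji fails.

```javascript
// Hypothetical helper, for illustration only.
function reverseNaive(text) {
  return text.split('').reverse().join('');
}

// A typical unit test input: passes.
const latinOk = reverseNaive('abc') === 'cba';

// Same function, input with an emoji: the surrogate halves get swapped.
const emojiOk = reverseNaive('a👋') === '👋a';

console.log(latinOk); // true
console.log(emojiOk); // false
```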

The same problem happens when you loop over strings by index:

```javascript
// DON'T DO THIS.
const s = 'Hello, world! 👋';
const characters = [];

for (let i = 0; i < s.length; i++) {
  characters.push(s[i]); // s[i] is a single UTF-16 code unit
}
```

So what's the solution?

## Solution 1

ES2015 introduced _iterables_ to JavaScript, and strings are one of them.
Iterating a string yields whole Unicode code points rather than UTF-16 code
units. One way in which this helps is that `Array.from` can consume an iterable
to build an array.

```javascript
const string = 'Hello, world! 👋';
const characters = Array.from(string); // This works!
```

This works out of the box in browsers that support ES2015 or later. The one
browser this tends to be an issue for at the time of writing is IE11. If you
need to support IE11, you'll need to polyfill `Array.from`; compiling your
JavaScript to ES5 rewrites syntax but does not by itself supply missing methods.
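
To see what `Array.from` is consuming, the sketch below steps through a string's iterator by hand and then uses the optional mapping function that `Array.from` accepts as its second argument. The sample string and variable names are assumptions for illustration.

```javascript
const sample = '👋!';

// Strings expose an iterator that walks by code point.
const iterator = sample[Symbol.iterator]();
const first = iterator.next();  // { value: '👋', done: false }
const second = iterator.next(); // { value: '!', done: false }
const third = iterator.next();  // { value: undefined, done: true }

// Array.from can also map each character as it builds the array.
const codePoints = Array.from(sample, (character) => character.codePointAt(0));

console.log(first.value, second.value, third.done);
console.log(sample.length);             // 3 code units
console.log(Array.from(sample).length); // 2 characters
console.log(codePoints);                // [ 128075, 33 ]
```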

## Solution 2

ES2015 also gave us some special syntax called _spread_ for this. You can spread
an iterable into a new array.

```javascript
const string = 'Hello, world! 👋';
const characters = [...string]; // This works!
```

Unlike solution 1, solution 2 will require compilation to ES5 to work in older
browsers like IE11. I prefer solution 1 to solution 2 because solution 1
documents itself, whereas solution 2 can be cryptic if you're not familiar with
spread syntax.
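
Both solutions go through the same string iterator, so they should produce the same array. A quick sketch to check that on the example string (variable names are mine):

```javascript
const greeting = 'Hello, world! 👋';

const viaSpread = [...greeting];
const viaFrom = Array.from(greeting);

// Compare the two arrays element by element.
const sameResult =
  viaSpread.length === viaFrom.length &&
  viaSpread.every((character, index) => character === viaFrom[index]);

console.log(sameResult);                      // true
console.log(viaSpread[viaSpread.length - 1]); // 👋
```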

## Solution 3

If you want to loop over strings by character, the `for...of` loop was created
to work with iterables.

```javascript
const string = 'Hello, world! 👋';
const characters = [];

for (const character of string) {
  characters.push(character); // This works!
}
```

Like spread, `for...of` loops use new syntax. If you need to support old
browsers which don't support ES2015, then you'll need to compile your
JavaScript to ES5.
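
One place a loop is handier than building the whole array is when you want to stop early. The sketch below uses a hypothetical `takeCharacters` helper (my own name, for illustration) to keep the first few characters of a string without cutting an emoji in half, and compares it with `slice`, which counts UTF-16 code units.

```javascript
// Hypothetical helper: keep at most `limit` characters of `text`.
function takeCharacters(text, limit) {
  let result = '';
  let count = 0;

  for (const character of text) {
    if (count >= limit) break; // stop as soon as we have enough
    result += character;
    count++;
  }

  return result;
}

const waves = '👋👋👋';
const kept = takeCharacters(waves, 2);
const sliced = waves.slice(0, 3); // one emoji plus half of the next

console.log(kept);          // 👋👋
console.log(sliced.length); // 3
```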