Add byteLength method and hasState property #258

Status: Open (wants to merge 19 commits into base: master)
16 changes: 16 additions & 0 deletions .eslintignore
@@ -0,0 +1,16 @@
# Node.js stuff
node_modules
package-lock.json

# Editors
*~
*sublime-*
/.idea

# Development environment
/coverage
/benchmarks/node_envs
/generation/source-data

# Temporarily excluded
/generation/research
29 changes: 29 additions & 0 deletions .eslintrc.yml
@@ -0,0 +1,29 @@
env:
commonjs: true
es6: true
node: true
mocha: true
extends: "eslint:recommended"
parserOptions:
ecmaVersion: 6
plugins:
- es
rules:
strict: "error"
no-console: "error"
no-template-curly-in-string: "error"
consistent-return: "error"
eqeqeq: ["error", "smart"]
no-throw-literal: "error"
no-eval: "error"
no-implied-eval: "error"
# no-use-before-define: "error" # TODO
# no-var: "error" # TODO
block-scoped-var: "error"
prefer-const: "error"
yoda: ["error", "never", { "exceptRange": true }]

# 'es' plugin rules. See https://mysticatea.github.io/eslint-plugin-es/rules/
# Disallow the ones that are not supported by Node 4.5
es/no-destructuring: "error"
es/no-default-parameters: "error"
12 changes: 6 additions & 6 deletions .github/dependabot.yml
@@ -3,9 +3,9 @@

version: 2
updates:
- package-ecosystem: "npm"
directory: "/"
schedule:
interval: "daily"
allow:
- dependency-type: production
- package-ecosystem: "npm"
directory: "/"
schedule:
interval: "daily"
allow:
- dependency-type: production
10 changes: 10 additions & 0 deletions .lintstagedrc.js
@@ -0,0 +1,10 @@
const { CLIEngine } = require("eslint");
const cli = new CLIEngine({});

// This hack is recommended by lint-staged authors:
// https://github.com/okonet/lint-staged#how-can-i-ignore-files-from-eslintignore-
module.exports = {
"*.js": (files) =>
"eslint --max-warnings=0 " + files.filter((file) => !cli.isPathIgnored(file)).join(" "),
"*.{js,json,yml,md,ts}": "prettier --write",
};
21 changes: 21 additions & 0 deletions .prettierignore
@@ -0,0 +1,21 @@
# Node.js stuff
node_modules
package-lock.json

# Editors
*~
*sublime-*
/.idea

# Development environment
/coverage
/benchmarks/node_envs
/generation/source-data

# Generated data
/encodings/tables
/encodings/sbcs-data-generated.js
Changelog.md

# Temporarily excluded
/generation/research
3 changes: 3 additions & 0 deletions .prettierrc.yml
@@ -0,0 +1,3 @@
tabWidth: 4
useTabs: false
printWidth: 100
42 changes: 27 additions & 15 deletions .travis.yml
@@ -1,19 +1,31 @@
language: node_js
node_js:
- "0.10"
- "0.11"
- "0.12"
- "iojs"
- "4"
- "6"
- "8"
- "10"
- "12"
- "node"
- "4.5.0" # Oldest supported version
- "5.10.0" # Oldest supported version from version 5.x
- "4"
- "6"
- "8"
- "10"
- "12"
- "node"

# Only install test-related modules on older Node versions
install: npm run-script test-install

jobs:
include:
- name: webpack
node_js: "12"
install: cd test/webpack; npm install
script: npm test
include:
- name: webpack
node_js: "12"
install: cd test/webpack; npm install
script: npm test

- name: node-web-backend
node_js: "12"
script: npm run-script test-node-web

- name: linters
node_js: "12"
install: npm install
script:
- eslint --max-warnings 0 .
- prettier --check .
119 changes: 73 additions & 46 deletions README.md
@@ -1,14 +1,14 @@
## iconv-lite: Pure JS character encoding conversion

* No need for native code compilation. Quick to install, works on Windows and in sandboxed environments like [Cloud9](http://c9.io).
* Used in popular projects like [Express.js (body_parser)](https://github.com/expressjs/body-parser),
[Grunt](http://gruntjs.com/), [Nodemailer](http://www.nodemailer.com/), [Yeoman](http://yeoman.io/) and others.
* Faster than [node-iconv](https://github.com/bnoordhuis/node-iconv) (see below for performance comparison).
* Intuitive encode/decode API, including Streaming support.
* In-browser usage via [browserify](https://github.com/substack/node-browserify) or [webpack](https://webpack.js.org/) (~180kb gzip compressed with Buffer shim included).
* Typescript [type definition file](https://github.com/ashtuchkin/iconv-lite/blob/master/lib/index.d.ts) included.
* React Native is supported (need to install `stream` module to enable Streaming API).
* License: MIT.
- No need for native code compilation. Quick to install, works on Windows and in sandboxed environments like [Cloud9](http://c9.io).
- Used in popular projects like [Express.js (body_parser)](https://github.com/expressjs/body-parser),
[Grunt](http://gruntjs.com/), [Nodemailer](http://www.nodemailer.com/), [Yeoman](http://yeoman.io/) and others.
- Faster than [node-iconv](https://github.com/bnoordhuis/node-iconv) (see below for performance comparison).
- Intuitive encode/decode API, including Streaming support.
- In-browser usage via [browserify](https://github.com/substack/node-browserify) or [webpack](https://webpack.js.org/) (~180kb gzip compressed with Buffer shim included).
- Typescript [type definition file](https://github.com/ashtuchkin/iconv-lite/blob/master/lib/index.d.ts) included.
- React Native is supported (need to install `stream` module to enable Streaming API).
- License: MIT.

[![NPM Stats](https://nodei.co/npm/iconv-lite.png)](https://npmjs.org/package/iconv-lite/)
[![Build Status](https://travis-ci.org/ashtuchkin/iconv-lite.svg?branch=master)](https://travis-ci.org/ashtuchkin/iconv-lite)
@@ -17,67 +17,92 @@
[![npm bundle size](https://img.shields.io/bundlephobia/min/iconv-lite.svg)](https://npmjs.org/package/iconv-lite/)

## Usage

### Basic API

```javascript
var iconv = require('iconv-lite');
var iconv = require("iconv-lite");

// Convert from an encoded buffer to a js string.
str = iconv.decode(Buffer.from([0x68, 0x65, 0x6c, 0x6c, 0x6f]), 'win1251');
str = iconv.decode(Buffer.from([0x68, 0x65, 0x6c, 0x6c, 0x6f]), "win1251");

// Convert from a js string to an encoded buffer.
buf = iconv.encode("Sample input string", 'win1251');
buf = iconv.encode("Sample input string", "win1251");

// Check if encoding is supported
iconv.encodingExists("us-ascii")
iconv.encodingExists("us-ascii");

// Calculate the actual length in bytes.
len = iconv.byteLength("Hello, world! 😀", "utf16be");

// Get a decoder and decode two buffers into a single string.
// The decoder keeps state between buffers.
var utf8Decoder = iconv.getDecoder("utf8");
var bytes1 = Buffer.from([0x20, 0x23, 0xe2]); // space, # and the first byte of ☣
var bytes2 = Buffer.from([0x98, 0xa3]); // the rest of ☣
var str = utf8Decoder.write(bytes1);
// You can check whether the decoder currently holds state
var hasState = utf8Decoder.hasState; // true
str += utf8Decoder.write(bytes2);
hasState = utf8Decoder.hasState; // false

// Encoders keep state too. You rarely need to care about it,
// except for some special encoders and for surrogate pairs split across writes.
var utf8Encoder = iconv.getEncoder("utf8");
var bytes = utf8Encoder.write("Hi \uD83D");
var hasState = utf8Encoder.hasState; // true
bytes = Buffer.concat([bytes, utf8Encoder.write("\uDE00")]);
hasState = utf8Encoder.hasState; // false

// Use the "end" method to get the remaining data in the encoder/decoder's state and clear the state
var remainingBytes = utf8Encoder.end();
var remainingStr = utf8Decoder.end();
```
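
For comparison, the same two ideas (state carried between writes, and byte length that depends on the encoding) can be seen with Node's built-in `string_decoder` module and `Buffer.byteLength`. This is only an analogy using the standard library; it does not show how iconv-lite implements `hasState` or `byteLength`, and the variable names are illustrative.

```javascript
"use strict";
const StringDecoder = require("string_decoder").StringDecoder;

// Node's built-in decoder buffers an incomplete multi-byte sequence between writes.
const decoder = new StringDecoder("utf8");
const first = decoder.write(Buffer.from([0x20, 0x23, 0xe2])); // " #": 0xe2 is held back
const second = decoder.write(Buffer.from([0x98, 0xa3])); // "☣": the sequence is now complete
const tail = decoder.end(); // "": nothing is left in the decoder's state

// Byte length depends on the target encoding, not on the string length.
const text = "Hello, world! 😀";
const utf8Length = Buffer.byteLength(text, "utf8"); // 18
const utf16Length = Buffer.byteLength(text, "utf16le"); // 32
```

The held-back byte after the first write is the kind of pending state that a `hasState` check is meant to expose.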

### Streaming API
```javascript

```javascript
// Decode stream (from binary data stream to js strings)
http.createServer(function(req, res) {
var converterStream = iconv.decodeStream('win1251');
http.createServer(function (req, res) {
var converterStream = iconv.decodeStream("win1251");
req.pipe(converterStream);

converterStream.on('data', function(str) {
converterStream.on("data", function (str) {
console.log(str); // Do something with decoded strings, chunk-by-chunk.
});
});

// Convert encoding streaming example
fs.createReadStream('file-in-win1251.txt')
.pipe(iconv.decodeStream('win1251'))
.pipe(iconv.encodeStream('ucs2'))
.pipe(fs.createWriteStream('file-in-ucs2.txt'));
fs.createReadStream("file-in-win1251.txt")
.pipe(iconv.decodeStream("win1251"))
.pipe(iconv.encodeStream("ucs2"))
.pipe(fs.createWriteStream("file-in-ucs2.txt"));

// Sugar: all encode/decode streams have .collect(cb) method to accumulate data.
http.createServer(function(req, res) {
req.pipe(iconv.decodeStream('win1251')).collect(function(err, body) {
assert(typeof body == 'string');
http.createServer(function (req, res) {
req.pipe(iconv.decodeStream("win1251")).collect(function (err, body) {
assert(typeof body == "string");
console.log(body); // full request body string
});
});
```

## Supported encodings

* All node.js native encodings: utf8, ucs2 / utf16-le, ascii, binary, base64, hex.
* Additional unicode encodings: utf16, utf16-be, utf-7, utf-7-imap, utf32, utf32-le, and utf32-be.
* All widespread singlebyte encodings: Windows 125x family, ISO-8859 family,
IBM/DOS codepages, Macintosh family, KOI8 family, all others supported by iconv library.
- All node.js native encodings: utf8, ucs2 / utf16-le, ascii, binary, base64, hex.
- Additional unicode encodings: utf16, utf16-be, utf-7, utf-7-imap, utf32, utf32-le, and utf32-be.
- All widespread singlebyte encodings: Windows 125x family, ISO-8859 family,
IBM/DOS codepages, Macintosh family, KOI8 family, all others supported by iconv library.
Aliases like 'latin1', 'us-ascii' also supported.
* All widespread multibyte encodings: CP932, CP936, CP949, CP950, GB2312, GBK, GB18030, Big5, Shift_JIS, EUC-JP.
- All widespread multibyte encodings: CP932, CP936, CP949, CP950, GB2312, GBK, GB18030, Big5, Shift_JIS, EUC-JP.

See [all supported encodings on wiki](https://github.com/ashtuchkin/iconv-lite/wiki/Supported-Encodings).

Most singlebyte encodings are generated automatically from [node-iconv](https://github.com/bnoordhuis/node-iconv). Thank you Ben Noordhuis and libiconv authors!

Multibyte encodings are generated from [Unicode.org mappings](http://www.unicode.org/Public/MAPPINGS/) and [WHATWG Encoding Standard mappings](http://encoding.spec.whatwg.org/). Thank you, respective authors!


## Encoding/decoding speed

Comparison with node-iconv module (1000x256kb, on MacBook Pro, Core i5/2.6 GHz, Node v0.12.0).
Comparison with node-iconv module (1000x256kb, on MacBook Pro, Core i5/2.6 GHz, Node v0.12.0).
Note: your results may vary, so please always check on your hardware.

operation [email protected] [email protected]
@@ -87,31 +112,33 @@ Note: your results may vary, so please always check on your hardware.

## BOM handling

* Decoding: BOM is stripped by default, unless overridden by passing `stripBOM: false` in options
(f.ex. `iconv.decode(buf, enc, {stripBOM: false})`).
A callback might also be given as a `stripBOM` parameter - it'll be called if BOM character was actually found.
* If you want to detect UTF-8 BOM when decoding other encodings, use [node-autodetect-decoder-stream](https://github.com/danielgindi/node-autodetect-decoder-stream) module.
* Encoding: No BOM added, unless overridden by `addBOM: true` option.
- Decoding: BOM is stripped by default, unless overridden by passing `stripBOM: false` in options
(f.ex. `iconv.decode(buf, enc, {stripBOM: false})`).
A callback might also be given as a `stripBOM` parameter - it'll be called if BOM character was actually found.
- If you want to detect UTF-8 BOM when decoding other encodings, use [node-autodetect-decoder-stream](https://github.com/danielgindi/node-autodetect-decoder-stream) module.
- Encoding: No BOM added, unless overridden by `addBOM: true` option.
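
To make the `stripBOM` / `addBOM` defaults concrete, here is a minimal sketch that mimics them for UTF-8 using only Node's `Buffer`. `decodeUtf8` and `encodeUtf8` are hypothetical helpers written for this illustration; they are not iconv-lite functions and do not support the callback form of `stripBOM`.

```javascript
"use strict";

const BOM = "\uFEFF";

// Hypothetical helper: decode UTF-8 and drop a leading BOM unless stripBOM is false.
function decodeUtf8(buf, options) {
    const stripBOM = !(options && options.stripBOM === false);
    const str = buf.toString("utf8"); // Node keeps the BOM as U+FEFF here
    return stripBOM && str.charCodeAt(0) === 0xfeff ? str.slice(1) : str;
}

// Hypothetical helper: encode to UTF-8 and prepend a BOM only when addBOM is true.
function encodeUtf8(str, options) {
    const addBOM = Boolean(options && options.addBOM);
    return Buffer.from((addBOM ? BOM : "") + str, "utf8");
}

const withBom = Buffer.from([0xef, 0xbb, 0xbf, 0x68, 0x69]); // BOM + "hi"
const stripped = decodeUtf8(withBom); // "hi"
const kept = decodeUtf8(withBom, { stripBOM: false }); // "\uFEFFhi"
const encoded = encodeUtf8("hi", { addBOM: true }); // ef bb bf 68 69
```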

## UTF-16 Encodings

This library supports UTF-16LE, UTF-16BE and UTF-16 encodings. First two are straightforward, but UTF-16 is trying to be
smart about endianness in the following ways:
* Decoding: uses BOM and 'spaces heuristic' to determine input endianness. Default is UTF-16LE, but can be
overridden with `defaultEncoding: 'utf-16be'` option. Strips BOM unless `stripBOM: false`.
* Encoding: uses UTF-16LE and writes BOM by default. Use `addBOM: false` to override.

- Decoding: uses BOM and 'spaces heuristic' to determine input endianness. Default is UTF-16LE, but can be
overridden with `defaultEncoding: 'utf-16be'` option. Strips BOM unless `stripBOM: false`.
- Encoding: uses UTF-16LE and writes BOM by default. Use `addBOM: false` to override.
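
The BOM part of the decoding rule above can be sketched with plain Node.js buffers. `detectUtf16` and `decodeUtf16` are hypothetical helpers, not iconv-lite functions; the 'spaces heuristic' is left out, and the input is assumed to have an even number of bytes.

```javascript
"use strict";

// Hypothetical helper: pick the endianness from a BOM, else fall back to a default.
function detectUtf16(buf, defaultEncoding) {
    if (buf.length >= 2) {
        if (buf[0] === 0xff && buf[1] === 0xfe) return { encoding: "utf-16le", bomLength: 2 };
        if (buf[0] === 0xfe && buf[1] === 0xff) return { encoding: "utf-16be", bomLength: 2 };
    }
    return { encoding: defaultEncoding || "utf-16le", bomLength: 0 };
}

// Hypothetical helper: decode UTF-16 with BOM detection, always stripping the BOM.
function decodeUtf16(buf, defaultEncoding) {
    const detected = detectUtf16(buf, defaultEncoding);
    // Copy the payload so that swap16() does not mutate the caller's buffer.
    const body = Buffer.from(buf.subarray(detected.bomLength));
    if (detected.encoding === "utf-16be") body.swap16(); // Node only decodes little-endian
    return body.toString("utf16le");
}

const fromBE = decodeUtf16(Buffer.from([0xfe, 0xff, 0x00, 0x68, 0x00, 0x69])); // "hi"
const fromLE = decodeUtf16(Buffer.from([0xff, 0xfe, 0x68, 0x00, 0x69, 0x00])); // "hi"
const noBom = decodeUtf16(Buffer.from([0x00, 0x68, 0x00, 0x69]), "utf-16be"); // "hi"
```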

## UTF-32 Encodings

This library supports UTF-32LE, UTF-32BE and UTF-32 encodings. Like the UTF-16 encoding above, UTF-32 defaults to UTF-32LE, but uses BOM and 'spaces heuristics' to determine input endianness.
* The default of UTF-32LE can be overridden with the `defaultEncoding: 'utf-32be'` option. Strips BOM unless `stripBOM: false`.
* Encoding: uses UTF-32LE and writes BOM by default. Use `addBOM: false` to override. (`defaultEncoding: 'utf-32be'` can also be used here to change encoding.)
This library supports UTF-32LE, UTF-32BE and UTF-32 encodings. Like the UTF-16 encoding above, UTF-32 defaults to UTF-32LE, but uses BOM and 'spaces heuristics' to determine input endianness.

- The default of UTF-32LE can be overridden with the `defaultEncoding: 'utf-32be'` option. Strips BOM unless `stripBOM: false`.
- Encoding: uses UTF-32LE and writes BOM by default. Use `addBOM: false` to override. (`defaultEncoding: 'utf-32be'` can also be used here to change encoding.)
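
As an illustration of the encoding defaults above, the following sketch writes UTF-32 by hand with Node's `Buffer`. `encodeUtf32` is a hypothetical helper, not the library's encoder; it assumes well-formed input (a lone surrogate would be written as its raw code unit value).

```javascript
"use strict";

// Hypothetical helper: little-endian with a BOM by default, as described above.
function encodeUtf32(str, options) {
    const opts = options || {};
    const bigEndian = opts.defaultEncoding === "utf-32be";
    // Array.from iterates by code point, so surrogate pairs become one entry.
    const codePoints = Array.from(str, function (ch) {
        return ch.codePointAt(0);
    });
    if (opts.addBOM !== false) codePoints.unshift(0xfeff);

    const buf = Buffer.alloc(codePoints.length * 4);
    codePoints.forEach(function (cp, i) {
        if (bigEndian) buf.writeUInt32BE(cp, i * 4);
        else buf.writeUInt32LE(cp, i * 4);
    });
    return buf;
}

const le = encodeUtf32("A"); // ff fe 00 00 41 00 00 00
const be = encodeUtf32("A", { defaultEncoding: "utf-32be" }); // 00 00 fe ff 00 00 00 41
const emoji = encodeUtf32("😀", { addBOM: false }); // 00 f6 01 00 (U+1F600)
```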

## Other notes

When decoding, be sure to supply a Buffer to decode() method, otherwise [bad things usually happen](https://github.com/ashtuchkin/iconv-lite/wiki/Use-Buffers-when-decoding).
Untranslatable characters are set to � or ?. No transliteration is currently supported.
Node versions 0.10.31 and 0.11.13 are buggy, don't use them (see #65, #77).
- When decoding, be sure to supply a Buffer to decode() method, otherwise [bad things usually happen](https://github.com/ashtuchkin/iconv-lite/wiki/Use-Buffers-when-decoding).
- Untranslatable characters are set to � or ?. No transliteration is currently supported.
- Node versions 0.10.31 and 0.11.13 are buggy, don't use them (see #65, #77).

## Testing

@@ -120,7 +147,7 @@ $ git clone git@github.com:ashtuchkin/iconv-lite.git
$ cd iconv-lite
$ npm install
$ npm test

$ # To view performance:
$ node test/performance.js
