Use Buffers when decoding
Decoding a string instead of a Buffer is probably the most common mistake when working with legacy-encoded resources. Why? Let's see.
This is wrong:
var http = require('http'),
    iconv = require('iconv-lite');
http.get("http://website.com/", function(res) {
    var body = '';
    res.on('data', function(chunk) {
        body += chunk;
    });
    res.on('end', function() {
        var decodedBody = iconv.decode(body, 'win1252');
        console.log(decodedBody);
    });
});
Before being decoded with the iconv.decode function, the original resource was already (unintentionally) decoded in body += chunk via JavaScript type conversion. What really happens here is:
res.on('data', function(chunkBuffer) {
    body += chunkBuffer.toString('utf8');
});
The same conversion is done behind the scenes if you call res.setEncoding('utf8').
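The implicit conversion can be checked without any network access. The following is a minimal sketch using only Node's built-in Buffer; the sample bytes are an assumption chosen for illustration ("café" as it would be encoded in windows-1252, where é is the single byte 0xE9).

```javascript
// Hypothetical sample chunk: "café" in windows-1252 (é is the single byte 0xE9).
var chunk = Buffer.from([0x63, 0x61, 0x66, 0xE9]);

// What `body += chunk` does: the Buffer is converted to a string implicitly.
var implicit = '' + chunk;

// The explicit equivalent of that conversion.
var explicit = chunk.toString('utf8');

console.log(implicit === explicit);        // true
console.log(JSON.stringify(implicit));     // the é byte is no longer an é
console.log(implicit.charCodeAt(3).toString(16)); // fffd (replacement character)
```

The lone byte 0xE9 is not valid UTF-8, so it is turned into the replacement character U+FFFD before iconv.decode ever sees it.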
Not only will the double decoding lead to wrong results, it is also nearly impossible to restore the original bytes (the UTF-8 conversion is lossy: bytes that do not form valid UTF-8 are replaced with U+FFFD), so even iconv.decode(Buffer.from(body, 'utf8'), 'win1252') will not help.
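To see why the original bytes cannot be recovered, here is a small sketch using only the built-in Buffer. The two sample bytes (0xE8 and 0xE9, which are è and é in windows-1252) are assumptions picked for illustration.

```javascript
// Two different legacy-encoded bytes (è and é in windows-1252).
var byteA = Buffer.from([0xE8]);
var byteB = Buffer.from([0xE9]);

// After a UTF-8 conversion both collapse into the same replacement character.
var strA = byteA.toString('utf8');
var strB = byteB.toString('utf8');
console.log(strA === strB); // true: the distinction between the bytes is gone

// Re-encoding the string does not give the original byte back.
var restored = Buffer.from(strB, 'utf8');
console.log(restored);               // <Buffer ef bf bd>
console.log(restored.equals(byteB)); // false
```

Since distinct inputs map to the same string, no later call can tell which byte was originally there.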
Keep the original Buffers and provide them to iconv.decode. Use Buffer.concat() if needed.
In general, keep in mind that all JavaScript strings are already decoded and should not be decoded again.
http.get("http://website.com/", function(res) {
    var chunks = [];
    res.on('data', function(chunk) {
        chunks.push(chunk);
    });
    res.on('end', function() {
        var decodedBody = iconv.decode(Buffer.concat(chunks), 'win1252');
        console.log(decodedBody);
    });
});
// Or, with an iconv-lite version that supports streaming and Node v0.10+, you can use the streaming API with the `collect` helper
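http.get("http://website.com/", function(res) {
    res.pipe(iconv.decodeStream('win1252')).collect(function(err, decodedBody) {
        // Report stream or decoding errors instead of silently ignoring them.
        if (err) return console.error(err);
        console.log(decodedBody);
    });
});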
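The Buffer-collecting pattern can also be tried offline. The sketch below is an illustration under stated assumptions, not part of iconv-lite: `collectBody` is a hypothetical helper, a plain EventEmitter stands in for the HTTP response, and the built-in 'latin1' decoding stands in for iconv.decode(rawBody, 'win1252') so that the example runs without installing anything (the sample byte 0xE9 is é in both encodings).

```javascript
var EventEmitter = require('events').EventEmitter;

// Hypothetical helper: gathers raw chunks and hands back a single Buffer.
function collectBody(res, callback) {
    var chunks = [];
    res.on('data', function(chunk) {
        chunks.push(chunk); // keep the Buffer as-is, no string conversion
    });
    res.on('end', function() {
        callback(null, Buffer.concat(chunks));
    });
    res.on('error', callback);
}

// Stand-in for an HTTP response that delivers "café" in two chunks.
var fakeRes = new EventEmitter();
var decodedBody = null;
collectBody(fakeRes, function(err, rawBody) {
    if (err) throw err;
    // In real code this would be: iconv.decode(rawBody, 'win1252')
    decodedBody = rawBody.toString('latin1');
});
fakeRes.emit('data', Buffer.from([0x63, 0x61]));
fakeRes.emit('data', Buffer.from([0x66, 0xE9]));
fakeRes.emit('end');

console.log(decodedBody); // café
```

Decoding happens exactly once, on the concatenated bytes, so the é survives intact.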