Modifying encoding tables

Alexander Shtuchkin edited this page Dec 9, 2019 · 2 revisions

Sometimes you need to adjust existing encoding tables. Here's a small overview of how to do that (thanks @btsimonh in #226 for writing it down).

In order to add a DBCS table based on another, you need to do a few things:

  • 1 Call iconv.getCodec() so that iconv.encodings exists (the encoding definitions are loaded lazily on the first call).
  • 2 Create a table (or only the extra parts you want to add to an existing table).
  • 3 Create a new encoding definition (like those in dbcs-data.js). Note that the example bases it on an existing table (cp950) without directly requiring the relevant table file; requiring it was difficult because of paths.
  • 4 Add the new definition directly to iconv.encodings.
  • 5 Use your sparkly new table :).

Example code snippet:

var iconv = require('iconv-lite');

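// Extra rows mapping Big5 codes 0xFA40-0xFEFE to the Unicode Private Use Area.
var privateUse = [
    ["fa40","\ue000", 62],
    ["faa1","\ue03f", 93],
    ["fb40","\ue09d", 62],
    ["fba1","\ue0dc", 93],
    ["fc40","\ue13a", 62],
    ["fca1","\ue179", 93],
    ["fd40","\ue1d7", 62],
    ["fda1","\ue216", 93],
    ["fe40","\ue274", 62],
    ["fea1","\ue2b3", 93],
];

try {
    // Called without a name, getCodec() throws, but only after it has loaded
    // iconv.encodings. Passing any valid encoding name avoids the exception.
    iconv.getCodec();
} catch(e) {
    console.log('ignored:', e);
}

// New encoding: the cp950 table plus the Private Use Area rows above.
var big5pua = {
    type: '_dbcs',
    table: function() {
        var tab = iconv.encodings['cp950'].table();
        return tab.concat(privateUse);
    },
    // Duplicate codes: still decoded, but skipped when building the encoder.
    encodeSkipVals: [0xa2cc, 0xa2ce],
};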

iconv.encodings['big5pua'] = big5pua;

// test our two duplicate characters and the first PUA character
const buf = Buffer.from('fa4020fa7efaa120fafefb4020fb7efba120fbfe20fefe20a2cca451a2cea4ca', 'hex');
const str = iconv.decode(buf, 'big5pua');
const buf2 = iconv.encode(str, 'big5pua');
console.log('src:',buf);
console.log('string:['+str+']');
var be = Buffer.from(str, 'utf16le').swap16();
console.log('string in utf16be:', be);
console.log('back to big5:',buf2);
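
The table rows in the snippet, such as ["fa40","\ue000", 62], use a compact range notation: the first element is the starting byte code in hex, a string assigns its characters to consecutive codes, and a number continues the sequence for that many more characters. The sketch below illustrates how one such row expands. expandRow is a hypothetical helper written for this page, not part of iconv-lite, and it assumes a simplified model that ignores surrogate pairs and rows crossing a lead-byte boundary.

```javascript
// Hypothetical helper (not an iconv-lite API): expand one DBCS table row
// into [byteCode, unicodeCodePoint] pairs. Simplified: BMP characters only.
function expandRow(row) {
    var code = parseInt(row[0], 16); // starting multi-byte code
    var lastChar = -1;               // last Unicode code unit assigned
    var pairs = [];
    for (var k = 1; k < row.length; k++) {
        var part = row[k];
        if (typeof part === 'string') {
            // A string assigns each of its characters to consecutive codes.
            for (var i = 0; i < part.length; i++) {
                lastChar = part.charCodeAt(i);
                pairs.push([code++, lastChar]);
            }
        } else if (typeof part === 'number') {
            // A number continues the increasing sequence for `part` more chars.
            for (var j = 0; j < part; j++) {
                pairs.push([code++, ++lastChar]);
            }
        }
    }
    return pairs;
}

var pairs = expandRow(["fa40", "\ue000", 62]);
var first = pairs[0];
var last = pairs[pairs.length - 1];
console.log(pairs.length);                                  // 63
console.log(first[0].toString(16), first[1].toString(16));  // fa40 e000
console.log(last[0].toString(16), last[1].toString(16));    // fa7e e03e
```

Under this reading, the first row covers fa40 through fa7e, which is why the next row starts its characters at \ue03f.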