Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIN-1255 Encoding for Type 2 Fields #32

Closed
Doreen-Schwartz opened this issue Aug 6, 2024 · 7 comments · Fixed by #33
Closed

WIN-1255 Encoding for Type 2 Fields #32

Doreen-Schwartz opened this issue Aug 6, 2024 · 7 comments · Fixed by #33
Assignees
Labels
enhancement New feature or request

Comments

@Doreen-Schwartz
Copy link
Contributor

Doreen-Schwartz commented Aug 6, 2024

Hello, I understood that currently, the only supported encoding for fields that are not Type 1 is UTF-8. I was wondering if there are any intentions to add support for additional encoding?

If not, I would love to try implementing this myself. Would appreciate getting pointers to how to implement this- which parts of the encoding process would require adjusting to enable this functionality?

From what I've gathered, the default Node Buffer currently used to construct the NIST file only supports very specific types of encoding. Looking at possible alternatives, iconv-lite seems like a decent package to utilize for this.

@JonathanZahavii

@ivosh ivosh self-assigned this Aug 6, 2024
@ivosh
Copy link
Owner

ivosh commented Aug 6, 2024

Hi @Doreen-Schwartz, this is an interesting subject.
Honestly, throughout various law-enforcement agencies across the world I have not encountered such a requirement.
Nevertheless, adding a conversion process into the library would be fine with me.

A question: how the charset encoding is conveyed between the NIST file producer and the NIST file consumer? Is there a special field in the NIST file? Or just by convention? That would somehow influence the futher discussion.

I don't want to tie node-nist to any specific conversion library (such as iconv-lite, node-iconv...).
I am thinking to add a new property (callback function) to NistCodecOptions interface and then use that instead of Buffer.toString and Buffer.write.
If you are willing to prepare a PR, please do it in several steps:

  1. Propose the updated interface and example usage
  2. Once the interface is discussed/reviewed/agreed, flesh in the implementation, unit tests and documentation.

@ivosh ivosh added the enhancement New feature or request label Aug 7, 2024
@Doreen-Schwartz
Copy link
Contributor Author

As far as I have seen, there is no specific field that the consumer takes into consideration. We have an existing producer that creates NIST files which the consumer does successfully read, and I didn't notice any fields that provided this information- not in the file or the documentation for the consumer. So it is just a convention for this specific consumer.

Just to give some context, when I tried uploading a NIST produced using the node-nist library to our consumer, any Hebrew characters (which is the main reason non-ASCII characters are needed) would appear as gibberish. Speaking to someone from the team that worked on the consumer, they said that it only accepts WIN-1255 encoding for Hebrew characters.

As for the potential changes, I was thinking I could add callbacks for the NistFieldEncodeOptions and NistFieldDecodeOptions interfaces, one for writing fields into the buffer and one for decoding fields from the buffer.

Something like this, maybe:

/** Encoding options for a single NIST Field. */
interface NistFieldEncodeOptions extends NistFieldCodecOptions {
  formatter?: (field: NistField, nist: NistFile) => NistFieldValue;
  informationWriter?: (informationItem: NistInformationItem, data: EncodeTracking) => EncodeTracking;
}

Which would then be used in place data.buf.write in encodeNistInformationItem, and of course a parallel for decode. Does this make sense?

@Doreen-Schwartz
Copy link
Contributor Author

Another thing I noted is that there is usage of the Buffer.byteLength function in a few places. I imagine this would also require some changes.

@ivosh
Copy link
Owner

ivosh commented Aug 12, 2024

For decoding logic, the proposed interface enhancement looks good, such as:

/** Decoding options for a single NIST Field. */
export interface NistFieldDecodeOptions extends NistFieldCodecOptions {
  parser?: (field: NistField, nist: NistFile) => Result<NistFieldValue, NistParseError>;
  informationDecoder?: (buffer: Buffer, startOffset: number, endOffset: number): string;
}

For encoding logic, consider that the the NistFile tree gets visited couple of times, in particular first for getting right the lengths of all the fields, records, target buffer etc. And second time for actually writing the data into the buffer.
So either you need two new properties (one for determining the length and another one for writing the converted data) or the new property should be rather along a generic lines of informationWritter?: (informationItem: NistInformationItem): Buffer.

Performance-wise, I'd say a charset conversion process is computationally comparable to Buffer.byteLength of an ASCII string. This means it's probably fine to perform the character conversion twice: first for determining the length and then for the serialization. Anyway, you should do at least some rudimentary performance comparison between plain 'UTF-8' and 'WIN-1255' enabled workloads.

@ivosh
Copy link
Owner

ivosh commented Sep 11, 2024

@Doreen-Schwartz I've finished updating the documentation. Please check 8473957.
For consistency, I've renamed informationWriter to informationEncoder. I hope it's fine with you.
Once I get a green light from you, I'll publish a new npm version.
Thanks again for the PR.

@ivosh ivosh linked a pull request Sep 11, 2024 that will close this issue
@Doreen-Schwartz
Copy link
Contributor Author

@ivosh Overall looks great! Only one thing I noticed is that in the JSDoc for informationEncoder, the parameters are the same as for formatter instead of the function's actual parameters.

Otherwise seems to cover things well 👍

@ivosh
Copy link
Owner

ivosh commented Sep 12, 2024

@Doreen-Schwartz Thank you for the review. I am always glad to have another pair of eyeballs ;-)
New release 0.10.0 is out.

@ivosh ivosh closed this as completed Sep 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants