Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extension for handling Yojson data with code positions #63

Open
wants to merge 27 commits into
base: master
Choose a base branch
from

Conversation

gfngfn
Copy link

@gfngfn gfngfn commented Oct 22, 2018

This PR proposes an extension that enables us to handle Yojson data equipped with code positions the data come from. It adds

  • a submodule Yojson.SafePos and
  • a type Yojson.position for code positions,

and thereby maintains backward compatibility with the latest version (i.e. yojson 1.4.1). It probably be useful in situations like the following: where one expects the value corresponding to a certain name (such as "title") in objects to be a specific kind of data (such as string) but some data in a given (possibly handwritten) Yojson do not meet the expectation. In such case, one can report the code position where the unexpected data occur by using Yojson.SafePos in order to ask for correction in an easy-to-understand manner.

For an example of usage, see examples/filtering_pos.ml (which can be run by invoking the updated run-examples.sh).

@mjambon
Copy link
Member

mjambon commented Nov 19, 2018

Hi! Thanks for looking into this.

It seems that keeping track of locations would be useful to people who use yojson directly i.e. not from atdgen and don't have strict performance requirements. I imagine this fits the scenarios where json is used as a simple config file. Note that there's a bit of tension between achieving the best performance when working with large amounts of machine-generated json data and with human-written json which benefits from better error reporting.

A thing I never particularly like about yojson is that we have already 3 modules to pick from (Basic, Safe, and Raw) and that they're generated by cppo. Are you confident that this fourth module is the right one? Is there anything else we should add to it or remove while we're at it?

Other than that, my concern is about the performance loss introduced by creating and discarding location objects. Did you run performance benchmarks? I haven't done that recently, but based on results I got in the past, it would be significantly faster (10% or more) to call get_raw_position only if the position is not discarded. Can you try getting rid of these calls by using macros or preprocessor conditionals?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants