lib/json: allow encoding dicts with int keys #468

Closed (1 commit)
lib/json/json.go: 38 changes (32 additions, 6 deletions)

```diff
@@ -155,19 +155,45 @@ func encode(thread *starlark.Thread, b *starlark.Builtin, args starlark.Tuple, k
 			// e.g. dict (must have string keys)
 			buf.WriteByte('{')
 			items := x.Items()
+			stringKeys := false
+			intKeys := false
 			for _, item := range items {
-				if _, ok := item[0].(starlark.String); !ok {
-					return fmt.Errorf("%s has %s key, want string", x.Type(), item[0].Type())
+				_, isString := item[0].(starlark.String)
+				_, isInt := item[0].(starlark.Int)
+				if !isString && !isInt {
+					return fmt.Errorf("%s has %s key, want string or int", x.Type(), item[0].Type())
 				}
+				stringKeys = stringKeys || isString
+				intKeys = intKeys || isInt
 			}
+			// Sort the keys. This is useful if they come from a dict, which presents
```
Contributor Author

I was wondering what this key sorting business is about, and whether it should be pushed into the particular IterableMapping implementations that want it.
I'm thinking that by default an arbitrary implementation does not want it (i.e. it wants the keys rendered in the order it produces them in Items()), but notably starlark.Dict probably does (because I think it currently produces its items in a random order).

Collaborator

Very little is random in Starlark, as it was originally designed for a build system where reproducibility of computation is paramount. For that reason, Items() yields dict elements in insertion order.

This does mean that sorting isn't necessary for determinism, nor is it part of the contract specified by the doc comment. But it is the existing behavior in starlark-go and the documented behavior of starlark-java, so I think removing it would in spirit be a breaking change, and thus warrants an issue in the bazelbuild/starlark repo before we implement any change. Supporting int keys is also a behavior change that needs discussion.

Personally I'm loath to remove it, even if removing it would be a minor optimization. Even Go, where performance generally matters more than in Starlark, sorts map keys in its json.Marshal implementation.

I suspect that using Go's sort.Strings instead of sorting a slice of abstract starlark.Values is a significant optimization that would make dropping the sort unnecessary. I suggest you evaluate that first.
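A rough sketch of the comparison being suggested, written in plain Go rather than starlark-go code. The hypothetical `item` type, and the use of `any` in place of `starlark.Value`, are assumptions made so the example runs on its own. It contrasts sorting the items with `sort.Slice` (a type assertion inside every comparison) against copying the keys into a `[]string` once and sorting that with `sort.Strings`. No performance result is implied either way.

```go
package main

import (
	"fmt"
	"sort"
)

// item is a hypothetical stand-in for one starlark.Tuple returned by
// Items(): item[0] is the key and item[1] the value, both held in an
// interface type, as starlark.Value values are.
type item [2]any

// sortItemsAbstract mirrors the current encoder: it sorts the items
// themselves, type-asserting both keys inside every comparison.
func sortItemsAbstract(items []item) {
	sort.Slice(items, func(i, j int) bool {
		return items[i][0].(string) < items[j][0].(string)
	})
}

// sortedStringKeys copies the keys into a []string once and sorts that
// slice with sort.Strings, so each comparison is a plain string compare.
func sortedStringKeys(items []item) []string {
	keys := make([]string, len(items))
	for i, it := range items {
		keys[i] = it[0].(string)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	items := []item{{"y", "two"}, {"x", 1}}
	fmt.Println(sortedStringKeys(items)) // [x y]

	sortItemsAbstract(items)
	fmt.Println(items[0][0], items[1][0]) // x y
}
```

The catch is that sorting bare keys separates them from their values, so an encoder written this way would still need to fetch each value afterwards (for instance with the mapping's `Get` method) or sort a slice of indices instead; whether the net effect is faster is exactly what the reviewer suggests measuring.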

Contributor Author

Makes sense about the key sorting. I wasn't particularly interested in doing anything about it.
As you can see, the patch kept the sorting, although not for the case of mixed int/string keys. I'm happy to do whatever you prefer in that case: sort as strings, or even reject the whole marshalling.

> Supporting int keys is also a behavior change that needs discussion.

So you're saying I should necessarily discuss the int keys support on https://github.com/bazelbuild/starlark/ for this patch to move forward (i.e. "extras" in the Go implementation are not kosher)? In the happy scenario that the proposal gets accepted, would a Java implementation also be required? Because that might be beyond my personal interest :)

Collaborator

> I should necessarily discuss the int keys support on https://github.com/bazelbuild/starlark/ for this patch to move forward...?

Yes, the two implementations should be consistent in the behavior of all their core functions. (This is not a theoretical concern: at Google we have tools in Go and Java that process Bazel/Blaze BUILD files, and any divergence in their behaviors is problematic.)

> would a Java implementation also be required?

No. Someone else will implement that if the proposal is accepted.

Contributor Author

I'm happy to discuss the int keys as a wider Starlark proposal, assuming it has your support in principle.
But just to check: I've found this recent comment on the starlark repo that seems to say that the JSON library is not under their purview: bazelbuild/starlark#253 (comment).
Should I not read too much into it?

Collaborator

> I'm happy to discuss the int keys as a wider Starlark proposal, assuming it has your support in principle.

Personally I don't see much value in json.encode supporting dicts with int keys, since you can't round-trip encode+decode them: JSON doesn't support objects with int keys, so the keys end up being coerced to strings. And while it's true that Go's json.Marshal supports this behavior (again by coercing to strings), I've never seen anyone use it, and I can't really imagine why you'd want to. Plus it complicates the sorting. But feel free to propose it if you like.
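For reference, a minimal Go sketch of the coercion mentioned here, using only `encoding/json` (`marshalIntKeys` is a hypothetical helper name). Because `encoding/json` sorts map keys after converting them to strings, int keys come out in string order, so 10 precedes 2. The patch under review, by contrast, sorts all-int keys numerically, which illustrates how the sorting question gets more involved.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// marshalIntKeys encodes a map with int keys using encoding/json.
// json.Marshal writes each int key as its decimal string and sorts the
// resulting strings, so the order is lexicographic, not numeric.
func marshalIntKeys(m map[int]string) (string, error) {
	b, err := json.Marshal(m)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	s, err := marshalIntKeys(map[int]string{2: "two", 10: "ten", 1: "one"})
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // {"1":"one","10":"ten","2":"two"}
}
```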

> Should I not read too much into it?

Yep, don't worry about that comment.

Contributor Author

If you don't care about it, I probably will not propose anything; I'm not blocked on this.
But,

> since you can't round-trip encode+decode

They do round-trip, though, at least in Go:
https://go.dev/play/p/v07cIJJNZLA
Let me know if this changes your interest; if it doesn't, I'll close this PR.

Collaborator (adonovan, May 31, 2023)

Huh, I had no idea. Thanks for the correction. Reading threads golang/go#12146 and golang/go#12529, I'm honestly surprised the feature was accepted with so little resistance.

If you want this feature, by all means propose it. I'm curious though, what do you need it for?

Contributor Author

I'm extending Delve with an RPC that takes a Starlark script as an argument and returns the result of running that script. All my JSON interest comes from figuring out how exactly to return those results. My current thinking is that I'll mandate that the script return a string, specifically a JSON string, so it's up to the script to do the JSON encoding. On the client side, I want to easily map the result onto a struct (a Go struct, since the client is also a Go program).
In the one script that I'm playing with, I was naturally constructing a Starlark dict with int keys (goroutine IDs, in my case), and I was upset that it didn't "just work" when I attempted to json.encode() it. Otherwise, things work pretty well, partly courtesy of your struct package, which gives me JSON that nicely matches the Go-side structs.

I'm not married to constructing the result JSON inside the script, particularly since I still have the issue with the fallible attribute accessors that Delve implements (I'm running with a fork of starlark-go to handle that for the moment). Delve uses the stdlib rpc system with the JSON codec, so I could conceivably even have my RPC talk in terms of starlark.Value and let the RPC codec deal with encoding/decoding. I very briefly tried building the JSON outside of the script, in Delve, but the problem I ran into immediately was that the starlark-go hashtable implementation is not marshallable.

I've also discovered that you've made a proto library for starlark-go, so that's another option. But for now I'm interested in prototyping things more quickly, so I didn't jump at the opportunity to add protos to the mix.

So, anyway, I don't really know what I'm doing yet, and I don't particularly need this patch. It just struck me as something that ought to "just work".
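A hedged sketch of the client side described in this comment. The `ScriptResult` and `GoroutineInfo` types, their fields, and the JSON shape are hypothetical; the thread does not specify Delve's actual RPC types. Since JSON object keys are always strings, `encoding/json` can decode them into a map keyed by `int64` whether the script produced them from int keys or, as the current `json.encode` requires, from `str(id)` string keys.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// GoroutineInfo is a hypothetical per-goroutine record.
type GoroutineInfo struct {
	Status string `json:"status"`
	Frames int    `json:"frames"`
}

// ScriptResult is a hypothetical shape for the script's JSON output.
type ScriptResult struct {
	// Goroutines is keyed by goroutine ID. encoding/json parses the
	// object's string keys ("1", "17") into int64 values when decoding.
	Goroutines map[int64]GoroutineInfo `json:"goroutines"`
}

// decodeResult parses the JSON string returned by the script.
func decodeResult(s string) (ScriptResult, error) {
	var r ScriptResult
	err := json.Unmarshal([]byte(s), &r)
	return r, err
}

func main() {
	// Output a script could produce today by using str(id) as dict keys.
	out := `{"goroutines":{"1":{"status":"running","frames":3},"17":{"status":"waiting","frames":8}}}`
	r, err := decodeResult(out)
	if err != nil {
		panic(err)
	}
	fmt.Println(r.Goroutines[17].Status) // waiting
}
```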

```diff
+			// its items unordered. However, we only sort if all the keys have the
+			// same type.
+			if !(stringKeys && intKeys) {
+				sort.Slice(items, func(i, j int) bool {
+					// Compare as strings if all the keys are strings, or as ints if all
+					// the keys are ints.
+					if stringKeys {
+						return items[i][0].(starlark.String) < items[j][0].(starlark.String)
+					}
+					cmp, err := items[i][0].(starlark.Int).Cmp(items[j][0], 0 /* depth */)
+					if err != nil {
+						panic(fmt.Errorf("unexpected failure to compare ints: %w", err))
+					}
+					return cmp == -1
+				})
+			}
-			sort.Slice(items, func(i, j int) bool {
-				return items[i][0].(starlark.String) < items[j][0].(starlark.String)
-			})
 			for i, item := range items {
 				if i > 0 {
 					buf.WriteByte(',')
 				}
-				k, _ := starlark.AsString(item[0])
+				key := item[0]
+				k, ok := starlark.AsString(key)
+				if !ok {
+					// If the key is not a string, it must be an int as per the checks
+					// above.
+					k = key.(starlark.Int).BigInt().String()
+				}
 				quote(k)
 				buf.WriteByte(':')
 				if err := emit(item[1]); err != nil {
```
starlark/testdata/json.star: 4 changes (2 additions, 2 deletions)

```diff
@@ -22,20 +22,20 @@ assert.eq(json.encode((1, 2, 3)), "[1,2,3]")
 assert.eq(json.encode(range(3)), "[0,1,2]") # a built-in iterable
 assert.eq(json.encode(dict(x = 1, y = "two")), '{"x":1,"y":"two"}')
 assert.eq(json.encode(dict(y = "two", x = 1)), '{"x":1,"y":"two"}') # key, not insertion, order
+assert.eq(json.encode({1: "one", 2: "two"}), '{"1":"one","2":"two"}') # key, not insertion, order
 assert.eq(json.encode(struct(x = 1, y = "two")), '{"x":1,"y":"two"}') # a user-defined HasAttrs
 assert.eq(json.encode("😹"[:1]), '"\\ufffd"') # invalid UTF-8 -> replacement char
 
 def encode_error(expr, error):
     assert.fails(lambda: json.encode(expr), error)
 
 encode_error(float("NaN"), "json.encode: cannot encode non-finite float nan")
-encode_error({1: "two"}, "dict has int key, want string")
+encode_error({(1,2): "two"}, "dict has tuple key, want string or int")
 encode_error(len, "cannot encode builtin_function_or_method as JSON")
 encode_error(struct(x=[1, {"x": len}]), # nested failure
              'in field .x: at list index 1: in dict key "x": cannot encode...')
 encode_error(struct(x=[1, {"x": len}]), # nested failure
              'in field .x: at list index 1: in dict key "x": cannot encode...')
-encode_error({1: 2}, 'dict has int key, want string')
 
 recursive_map = {}
 recursive_map["r"] = recursive_map
```