Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"dhall resolve" does too much #2116

Open
georgefst opened this issue Dec 21, 2020 · 10 comments
Open

"dhall resolve" does too much #2116

georgefst opened this issue Dec 21, 2020 · 10 comments

Comments

@georgefst
Copy link

georgefst commented Dec 21, 2020

I was under the impression that dhall resolve would just substitute in the contents of import expressions. But it sometimes performs a normalisation step that can cause a huge blowup in output size. e.g.

-- 1.dhall
 let x = ./2.dhall in [ x.A, x.B ]
-- 2.dhall
< A | B | C >
> echo ./1.dhall | dhall resolve
[ < A | B | C >.A, < A | B | C >.B ]

I would have expected let x = < A | B | C > in [ x.A, x.B ].

In my actual use case, this is resulting in an output of over 100,000 lines, instead of under 1000 (there's a 540-member sum type involved...).

@georgefst
Copy link
Author

Hmm. Github's syntax highlighting appears to be slightly broken - wrong colour for the opening bracket in 2.dhall.

@sjakobi
Copy link
Collaborator

sjakobi commented Dec 22, 2020

dhall resolve's non-normalizing imports resolution is currently limited to its input expression – any transitive imports are resolved in the usual, normalizing way.

In your example, the output is fully normalized because the input expression is a single import: ./1.dhall. You can preserve the let-binding by letting dhall resolve work on the expression in 1.dhall directly:

$ cat 1.dhall | dhall resolve
let x = < A | B | C > in [ x.A, x.B ]
$ dhall resolve --file 1.dhall
let x = < A | B | C > in [ x.A, x.B ]

In principle, we could also extend dhall resolve so any transitive imports aren't normalized either. (The Status used during import resolution can use a custom _normalizer, which, in this case, would simply return its input.)

@georgefst
Copy link
Author

Ah, thanks, that suddenly makes a lot of sense.

Is there any particular reason for it to be the way it currently is? Keeping normalisation separate seems preferable on the face of it.

@Gabriella439
Copy link
Collaborator

@georgefst: There is a high likelihood that I will propose changing the standard to disable the behavior of pre-normalizing all imports. This is in part to fix the performance issues discussed in #1890 (comment)

@georgefst
Copy link
Author

@Gabriel439 That would be great!

There's still a potential blowup from diamond dependencies. But that seems like something I can experiment with downstream. e.g. introducing a new let-binding with a fresh name.

@Gabriella439
Copy link
Collaborator

Gabriella439 commented Jan 2, 2021

@georgefst: Yes, I plan to fix the diamond dependency issue, too. The idea is that if an import is protected by an integrity check then we don't have to resolve the import in order to cache it. We would just cache the import separately and then resolve it from the cache on the fly when decoding anything that imports it with an integrity check. That ensures that we only ever cache one copy of an import within the entire cache.

@georgefst
Copy link
Author

Has there been any progress towards this? It's still my main frustration with Dhall.

(I don't like to pester maintainers - I'm just genuinely curious whether there are any threads I've missed)

@Gabriella439
Copy link
Collaborator

@georgefst: No worries! There hasn't been progress on this recently

@georgefst
Copy link
Author

@georgefst: Yes, I plan to fix the diamond dependency issue, too. The idea is that if an import is protected by an integrity check then we don't have to resolve the import in order to cache it. We would just cache the import separately and then resolve it from the cache on the fly when decoding anything that imports it with an integrity check. That ensures that we only ever cache one copy of an import within the entire cache.

Re-reading, I realise this doesn't quite solve the issue I'm thinking of. My use case for dhall resolve is for distributing a single file which doesn't depend on any other Dhall files or on remote resources. So the blowup I'm talking about is where a single dependency might be inlined in multiple locations in the final file.

@Gabriella439
Copy link
Collaborator

@georgefst: I believe the combination of:

  • Not pre-β-reducing imports
  • Caching unresolved imports (so long as their transitive imports were protected by an integrity check)

… would suffice to give you the performance improvement you're looking for, at least when using the Haskell implementation of Dhall. This is because the implementation is smart enough to avoid duplicating sub-expressions when they are referenced multiple times.

The exception is if the final result display to the user or conversion tool contains duplicate sub-expressions, in which case it has to actually render all of them. That's unavoidable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants