-
Notifications
You must be signed in to change notification settings - Fork 180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: NonEmptySet and NonEmptyMap #616
base: master
Are you sure you want to change the base?
Conversation
My personal alignment preferences are not aligned with the historical standards of the package. If there are to be major alignment shifts, I would prefer to see the style change as well. In particular, I like fairly minimal alignment when readability doesn't demand more, and I like things to stay pretty far to the left. The classic "we can make it look like traditional mathematical notation, so we should" argument doesn't carry much weight with me. That said, try not to change much more than you have to. A readable diff is a high priority when you're doing something complicated like this. |
@treeowl Great! yes I think the current style is mainly a recipe for tons of merge conflicts. The "sed style" I did does keep the diff smaller, and when I changed it more it was to wrap lines and move to the left. |
There's a build failure. |
Oops. Took out the pattern synonyms but left the extension. |
Weird CI failures. Maybe restart it? |
Weird build error fixed. Real build errors remain. |
Looks like you still have some pattern synonyms going on. |
Sorry! I removed that one import----didn't see anything else and built it locally again---hopefully it works now. |
It's a good start, but there are some significant issues.
|
Er ... My second point is only valid if we end up exporting that stuff. Is there enough demand for nonempty sets and maps to justify it? That at least needs a libraries list discussion. |
Oh I made this intermediate state PR just because of the benchmarking of the type change alone. If you want to go all or nothing re exporting the stuff that's fine; I can email the list. I'll do the constructor renames in any event. |
OK fixed conflicts and did
as requested. |
I think it's pretty safe to assume we're going to expose the nonempty types (by some names or other). So if you have time, could you start working on breaking up the functions into mutually recursive pairs? Don't squash into the current commit yet, in case we change our minds. |
Use |
@treeowl OK sounds good. I renamed the PR accordingly. |
@treeowl OK pushed a select few Relatedly, the wording of |
@treeowl hold up, github is loosing comments, ugh. (It normally just marks outdated comments as such and force pushes are fine.) |
(9555fb4 some missing comments here) |
I'll just not force-push for now; that seems easiest. |
Yeah, GitHub is rough. I meant |
Oh no worries at all. I'll keep poking at it, taking those into account. |
Sync non-empty branch with upstream
@treeowl I'm helping with some legwork on this PR.
|
My personal wish is that E.g. there is no But I agree that filter :: (a -> Bool) -> NonEmpty a -> [a] in |
To be clear: I expect this change to cause minimal or even no changes to the |
@phadej Why do you think that's the right thing to expect? |
Has it already been decided that we really want non-empty maps/sets in As for the names, adding |
@treeowl I expect no breaking changes, which implies there won't be massive changes in @konsumlamm if you look at the implementation, it's almost free: +data Map k a = Bin {-# UNPACK #-} !Size !k a !(Map k a) !(Map k a)
-data Map k a = NE {-# UNPACK #-} !(NonEmptyMap k a)
| Tip
+data NonEmptyMap k a = Bin' {-# UNPACK #-} !Size !k a !(Map k a) !(Map k a) It's silly to duplicate the code in data NEMap k a =
NEMap { nemK0 :: !k -- ^ invariant: must be smaller than smallest key in map
, nemV0 :: a
, nemMap :: !(Map k a)
}
deriving (Typeable) OTOH. It might be a good idea to first only do the changes in |
The idea of doing only internal changes first also solves the name bikeshedding problem. As soon as internals are done, you can make a release. |
This PR does a lot more than just that though, it adds a bunch of new functions (more than 1000 additional LOC). |
@konsumlamm that is why I suggest to keep changes to the Internal modules, to reduce the size of the change. |
I think the same. I can see that it's nice to know statically whether some collection (sequence, list) is non-empty (although it feels a bit arbitrary - perhaps I want the size to be greater than 42, or prime) Specifically for Map and Set, we will (mostly?) ultimately look up some key, and we can't know statically whether it's there, even if we have static info on emptiness. Well, unless we only do bulk operations - but then intersection has a similar problem: the result can be empty. On the other hand, I do follow the "curried map" motivation (see beginning of #608 (comment)). But then, what will we do with these curried maps? Probably some look-up, so .. see above. In all, I probably won't use guaranteed-non-empty map/set much (in my code, in my teaching), so I prefer that the well-known API (Data.Map, Data.Set) does not change - as @phadej was already saying in #616 (comment) |
The existing public API can't change, but the implementation can, as long as performance doesn't suffer. |
Has anyone already done benchmarks to see how this change affects performance? |
I've ran Here's the result for
I would say it's not just lookups. A recurring use-case is to perform a bunch of My plan going forward is roughly to:
Once that's all done and merged we can start thinking about external facing API. Thoughts? |
@alexfmpe that sounds great! |
The proliferation of split functions and combinations has been gnawing at me, so I did a little experiment on 'emptiness polymorphism'. Looks like we can have both A bunch of functions that do not produce sets or preserve emptyness can be 'generalized' by simply changing the types and adding a dummy to argument to This approach would make the diff massively smaller, and less room for regressions since the implementations are essentially the same. Benchmark for the above commit in 9.4.2:
Notice the changed methods member/map/fold have the same time performance. Haven't yet looked into what's up with delete/union/intersection but since the only thing that actually changed for those was adding This lets a bunch of the current functions consume nonemptys, so the variants we need to add should significantly go down in size. To be able to actually create Another advantage is that since we change from "a set is either empty or an element and two sets" to "a non empty set is a set that can not be empty", that is, changed the nullary constructor instead of the recursive one, it now also works when there's structurally no element in the recursive case, as is the case for There's a drawback though - a breaking change.
If folks think this approach makes sense and that whether it should be pursued depends on the magnitude of breakage, I'd be up for doing impact assessment. |
Another advantage is that we can in theory expose
Since this uses GHC-specific extensions, it'd require the more general definitions (type/instances/code) to be behind CPP to preserve compatibility. AFAICT it should still result in a much smaller diff than the current approach, since It would only apply to the types/signatures, rather than split every function in half. Impact assessmentAn attempt to build 2660 packages via clc-stackage with a ghc using
Additionally,
This encoding is technically breaking as the assessment shows, but it seems much more useful than completely separate types which is likely to also lead to function proliferation downstream, and in principle can also be applied to other types: say, to de-duplicate |
It's been a while since I looked in on the progress here. Why do you want to make |
See #616 (comment) |
Why not use a newtype instead of a type synonym? There aren't that many wrapper functions to write. |
Don't think I understand what's being proposed. The attempt to parameterize emptiness had as goal unifying |
I mean instead of |
That'd be So that'd mean
I see. That would prevent breakage, allow also adding non-empty-ness for the Int-trees, and keep the implementations the same, up to coerce, so I do think it seems an improvement to the current approach. |
Oh that technically still breaks at least one package - as mentioned above, there's at least one use of Once this PR is in shape, I should run another impact analysis |
The newtype constructor would be available from the |
Trying to write the wrappers with
Do we un-mark the module as safe, or write the wrappers the old way? import Data.Set.Internal as S hiding (empty, insert, member)
import qualified Data.Set.Internal as S
empty :: Set a
empty = Set S.empty
insert :: Ord a => a -> Set a -> Set a
insert a (Set s) = Set $ S.insert a s
member :: Ord a => a -> Set a -> Bool
member a (Set s) = S.member a s |
Switching to |
Do #608.
One downside to this PR as writing is a made a mess of various vertical alignments in the code. I'd be happy to fix that if this looks promising.