Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MonthOfYear data type #134

Closed
NorfairKing opened this issue Aug 11, 2020 · 68 comments
Closed

MonthOfYear data type #134

NorfairKing opened this issue Aug 11, 2020 · 68 comments

Comments

@NorfairKing
Copy link

NorfairKing commented Aug 11, 2020

We already have DayOfWeek; this structure (and relevant instances and functions) would be handy as well

data MonthOfYear
  = January
  | February
  | March
  | April
  | May
  | June
  | July
  | August
  | September
  | October
  | November
  | December
@NorfairKing
Copy link
Author

The MonthDay functions, as well as toGregorian can then use MonthOfYear instead of Int.

@AshleyYakeley
Copy link
Member

Would prefer not to proliferate types.

@NorfairKing
Copy link
Author

NorfairKing commented Aug 11, 2020

I'm not sure what you mean.
Could you elaborate?

This would be safer than Int, which has 2^64-12 invalid values.
Types are cheap and make a real difference with respect to safety, so we should use them.

@AshleyYakeley
Copy link
Member

Should there be a DayOfMonth type? Currently we have Int, which has 2^64-31 invalid values.

@AshleyYakeley
Copy link
Member

data DayOfMonth
  = Day1
  | Day2
  | Day3
  ...
  | Day31

Is this a good idea?

@NorfairKing
Copy link
Author

Is this a good idea?

I wouldn't say so, just because [Day29..Day31] can all be invalid depending on what month and year they're from.
If all months were 31 days long then I would say yes (then we would likely also have individual names for them).

@AshleyYakeley
Copy link
Member

OK, so we have to allow invalid year-month-day representations anyway. I'm not seeing the benefit of a month type.

In any case, dates come from numbers (which can be invalid) or text (which can be invalid). Use fromGregorianValid if you want to validate year-month-day representations.

@NorfairKing
Copy link
Author

OK, so we have to allow invalid year-month-day representations anyway.

Maybe, but that doesn't mean you have to allow for invalid month representations.

@NorfairKing
Copy link
Author

@AshleyYakeley
Copy link
Member

Why does it matter if there are invalid month representations, when they have to be validated anyway?

@NorfairKing
Copy link
Author

NorfairKing commented Aug 11, 2020

The short answer: There are numerous reasons; for example to be able to generate them without writing the generator manually.
The long answer: https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/

@AshleyYakeley
Copy link
Member

AshleyYakeley commented Aug 11, 2020

In that case I recommend creating your own month type. In the time library, validation of year-month-date formats cannot be avoided, and it's easier to consistently use Int for parts (arguably year should be Int rather than Integer). YYYY-MM-DD is intuitively three numbers that don't require the extra step of converting to a type.

@ad-si
Copy link

ad-si commented Aug 11, 2020

This would also make it possible to add an Int-MonthOfYear (e.g. 2020-may) type to define a month without needing any validation because any Int-MonthOfYear is valid.

@AshleyYakeley
Copy link
Member

I would probably represent that as a newtype of an Integer (an absolute count of months).

@NorfairKing
Copy link
Author

I would probably represent that as a newtype of an Integer (an absolute count of months).

Then you have the worst of both worlds: It's both still mostly invalid (because it's still just an int) AND you have to deal with an extra type.

@AshleyYakeley
Copy link
Member

Oh, I meant for a year-and-month type for, for example, "May 2020". Using a newtype for an Integer would have no invalid values.

@AshleyYakeley
Copy link
Member

A new MonthOfYear type would mean having to duplicate various functions to use the new type. I'd prefer to keep things simple and consistent.

@mixphix
Copy link

mixphix commented Nov 26, 2020

I just don't see how this is any different from #69. The Enum instance for MonthOfYear is easily made circular just like DayOfWeek, and provides the benefit of a type with no invalid values as well as an easy way to change between it and any integral type the user needs.

Once an integer has been parsed as a Month, it should be reflected in the type system, especially when there are already built-in functions for converting to and from integers safely.

@AshleyYakeley
Copy link
Member

There's no canonical numbering of days of the week, so a type makes sense. By contrast, month-of-year and day-of-month have canonical numberings.

@ad-si
Copy link

ad-si commented Nov 26, 2020

ISO 8601 defines a numbering for weekdays: 1-7 (https://en.wikipedia.org/wiki/ISO_week_date). Nevertheless it's still better to use Monday-Sunday, as it would be better to use January-December. I can't believe that we even have to discuss this 🙄.

@ad-si
Copy link

ad-si commented Nov 26, 2020

Reminds me of this gem by @justinwoo 😂😭
image

@georgefst
Copy link

Any chance of this being re-opened? I think it's a real shame that a core library isn't exposing a more idiomatic Haskell-y API.

I've just hit this today. I get a MonthOfYear from toGregorian, and once I have it, I want to be sure that I'm passing around a valid value, perform exhaustive matches on it, and not have to worry about error handling. It's a textbook example of the "Parse, don't validate" approach, to quote the link further up the thread.

I've ended up defining my own sum type and conversion functions, as I imagine other people in this thread will have done. But, in general, if every user of a library is writing the same wrapper around it, that's usually a sign something should be upstreamed.

@AshleyYakeley
Copy link
Member

Hmm, should we also have a DayOfMonth type? You could pass it around as a valid value, perform exhaustive matches on it, and not have to worry about error handling.

data DayOfMonth =
   DayOfMonth01
   DayOfMonth02
   DayOfMonth03
...
   DayOfMonth29
   DayOfMonth30
   DayOfMonth31

@mixphix
Copy link

mixphix commented May 22, 2021

The difference is that every year always has twelve months, and the twelve months listed above are always present in a given year. There won't ever be invalid pattern matches like there will be with Day31, and when working with dates far in the past, certain guarantees about the Gregorian calendar don't hold (see old-style dates), regardless of whether they're represented as numbers or as members of their own type.

@NorfairKing
Copy link
Author

@bi-functor We're going in circles now: #134 (comment)

@AshleyYakeley
Copy link
Member

I see. What about MinuteOfHour? Every hour has sixty minutes, and there won't ever be invalid pattern matches.

@mixphix
Copy link

mixphix commented May 22, 2021

If each minute had its own name? I would agree, there should be a type to represent that. But they don't, they are literally named by their number. If there was an easy way of representing modulo 60 at the type level, that would be ideal.

Each month has its own name. I'd settle for a dozen pattern synonyms and an English monthName function if need be.

@NorfairKing
Copy link
Author

@AshleyYakeley If you ask "Why does it matter if there are invalid month representations, when they have to be validated anyway?", why even store these numbers as numbers and not just bytestrings? You have to validate the bytes anyway.

@ad-si
Copy link

ad-si commented Jan 11, 2024

I hope we’re still on this. The arguments against it are as weak as they were 3 years ago IMHO. ^^

@AshleyYakeley
Copy link
Member

QuarterOfYear should probably have been Int I guess? IDK what to tell you.

In any case, I've made the guiding principles quite clear.

@AshleyYakeley
Copy link
Member

Simplicity over safety is actually pretty important for a library used as much as this. Copying from another issue:

As an example of the cost in simplicity, consider two scenarios:

  1. You have year, month, day, (y,m,d) :: (Int,Int,Int) which you already know is valid, and you need to convert that to a Day.

    Currently:
    YearMonthDay (toInteger y) m d

    With WIP: Month of year type #246:
    YearMonthDay (toInteger y) (fromJust (parseMonthOfYearIndex m)) d
    note use of partial function fromJust

  2. You have year, month, day, (y,m,d) :: (Int,Int,Int) which you need to validate, converting to Maybe Day.

    Currently:
    fromGregorianValid (toInteger y) m d

    With WIP: Month of year type #246:
    parseMonthOfYearIndex m >>= \m' -> fromGregorianValid (toInteger y) m' d
    has to validate twice

@georgefst
Copy link

I'm sorry I don't have time for a detailed response right now, but quite simply, if every library followed that logic, I wouldn't use Haskell.

@AshleyYakeley
Copy link
Member

AshleyYakeley commented Jan 11, 2024

Does creating a MonthOfYear enumeration help with extremely common cases I mentioned? No. It can't. A brief reflection on where and how such MonthOfYear values are constructed should show that.

I mean, in a few cases, you might hard-code MonthOfYear values in your code, and then you get the full benefit of safety of type, the compiler checking you didn't create an invalid month.

But in every other possible construction of the MonthOfYear type, you are converting from a number obtained somewhere else, and my two examples apply.

Anyway I've explained my reasoning, the principles and how they apply, with no more response than a thumbs-down emoji. TBH I'm beginning to think you've all kind of dug in here, having run out of arguments?

@L0neGamer
Copy link

I agree with another user's point of "parse, don't validate". The ideal situation is once your inputs are parsed, you shouldn't have to make sure that they're correct in future. Type synonyms don't enforce that at all, and can disguise issues.

If I parse what is meant to be a date in JSON, I don't want -3 to be a month I have. The easiest way to solve that is to have the parser catch that, and instead of forcing every user of this library to do that for themselves it would make a lot more sense for the library to handle it.

I believe many people are just using reactions because the points being made are just being dismissed out of hand.

For point one you would only have that partial function of the m value is invalid; and if it is, you shouldn't be constructing the year month day value anyway!

For point two, you would only have to validate twice if you don't carry around the proof that you parsed it correctly.

Instead of a data type you could have an opaque newtype around an Int or Word8 or similar, with validateMonth :: Int -> Maybe NewtypeMonth being the only exposed way to construct it, which would ensure that the numeric value is maintained internally but we can only make them if it's correct.

@NorfairKing
Copy link
Author

NorfairKing commented Jan 11, 2024

Anyway I've explained my reasoning, the principles and how they apply, with no more response than a thumbs-down emoji. TBH I'm beginning to think you've all kind of dug in here, having run out of arguments?

I'll happily argue about this for as long as you want.

In my view there are two really good arguments for you to make that I'd begrudgingly accept:

  1. "Backward compatibility is more important than safety."
  2. "Sod off, it's my code and I do what I want."

Unfortunately this is a boot library. Otherwise I'd have forked it long ago and not bothered you with this.
So if you'd like to confirm that you are in fact interested in a technical argument that might convince you, then I'll happily write out all the arguments again and respond to your responses.

The only reason I haven't done that is because I am now convinced that you just don't like me.
(Which is fair, because you've clearly felt responsible for responding to this issue so in your view I've probably wasted a lot of your time.)

@Abastro
Copy link

Abastro commented Jan 11, 2024

I'm sorry I don't have time for a detailed response right now, but quite simply, if every library followed that logic, I wouldn't use Haskell.

Why are you still using haskell, then? There are safer, lower-level alternatives like rust.

@mixphix
Copy link

mixphix commented Jan 11, 2024

I really don't think we need to be suggesting to kick each other out of the community over these types.

I'll admit that my suggestion to add pattern synonyms for the MonthOfYear type was shortsighted -- especially the {-# COMPLETE #-} pragma. But that's not the real issue here. IIUC @NorfairKing & cie. are arguing for stronger guarantees about the values they want to use. You want to make your life easier by having an integer representation.

The reason I usually want a data MonthOfYear isn't for going from (y, m, d) to Day, it's the reverse. I want to pattern match on a constructor in the m value of YearMonthDay y m d :: Day. I want to write [October .. March] and get a list that wraps around January. The library makes it easy to obtain Day and LocalTime values from scratch -- but not to then use those values safely!

@AshleyYakeley
Copy link
Member

The reason I usually want a data MonthOfYear isn't for going from (y, m, d) to Day, it's the reverse.

OK, finally someone gives an actual realistic code scenario.

I want to pattern match on a constructor in the m value of YearMonthDay y m d :: Day.

You can already do this.

I want to write [October .. March] and get a list that wraps around January.

It's not clear this would be appropriate even with an enumeration type. I do this for DayOfWeek, but only because no-one can agree when the first and last days of the week are. But everyone agrees on the first and last months of the year.

The library makes it easy to obtain Day and LocalTime values from scratch -- but not to then use those values safely!

I don't understand -- if you obtain a MonthOfYear from a Day or a LocalTime, you know it's correct.

@AshleyYakeley
Copy link
Member

AshleyYakeley commented Jan 11, 2024

The fact of the matter is that the MonthOfYear type isn't used much. It's only really used "incoming" to construct Day values, where it already starts out as an Int, and "outgoing" extracted from Day values, when it's guaranteed to be in the correct range, and so is already safe. Day is your safe, correct, parsed, validated type, while month-of-year is best understood as an attribute of that, which in most cultures/uses is a number in [1..12].

I'm not convinced any of the people complaining are running into genuine safety issues in their own code, where using an enumeration type would actually catch potential defects at compile-time. Rather, you seem to be thinking entirely abstractly about the type. Haskell encourages this sort of thinking, of course, but it doesn't mean it's always appropriate for working code.

If I make this change now, obviously, lots of people will complain about the sudden (pointless!) incompatibility.

If I had done it this way from the beginning, lots of people would complain about how awkward it is to construct Day values from (y :: Int, m :: Int, d :: Int) values, because that is an extremely common use of month-of-year. You have to parse twice, or else I have to provide duplicate from-Gregorian functions for Int and for MonthOfYear.

And literally no actual safety would be gained. None of your code defects would be caught at compile-time. It's all abstraction.

@NorfairKing
Copy link
Author

NorfairKing commented Jan 12, 2024

And literally no actual safety would be gained. None of your code defects would be caught at compile-time. It's all abstraction.

I've already shown you several real footguns and actual dangers

Here are some more:

> read "-5" :: MonthOfYear
-5

Should be a parse error, and is silently both wrong and invalid instead.

^ This one is extra fun when it's part of a bigger string like 2023-5.

> minBound :: MonthOfYear
-9223372036854775808

Should be January, and is silently both wrong and instead.

ghci> pred January
0

Should be December or an error, and is silently invalid instead.

ghci> February * March  == June
True

Should be a compile error, and is silently nonsense instead.

ghci> January `quot` February
0

Should be a compile error, and is silently invalid instead.

ghci> May `divMod` February == (February, January)  
True

Should be a compile error, and is silently nonsense instead.

ghci> sum (February, January) == January
True

Should be a compiler error, but is silently nonsense instead.

ghci> sample (arbitrary :: Gen MonthOfYear)
0
2
2
5
6
-8
-12
-3
8
-11
0

Most of these are silently invalid.

Note that all of these perform operations that the compiler is convinced should "just work".
Some result in invalid values and others just produce nonsense because type MonthOfYear = Int is a lie.

Rather, you seem to be thinking entirely abstractly about the type. Haskell encourages this sort of thinking, of course, but it doesn't mean it's always appropriate for working code.

If you know anything about my work you'll know that the topic of mental masturbation is not one I take lightly.
I created this issue and am continuing to respond because I have had real problems with this lying to the compiler.

@AshleyYakeley
Copy link
Member

These are all, at most, arguments for removing the type synonym MonthOfYear and January etc. patterns, and simply forcing use of Int and numeric literals instead. I created the synonym and patterns as an expedient: I figured the convenience upside outweighs any downside of allowing defects. But apparently you disagree? I have to say, most of your examples look very contrived.

But regardless of the existence the type synonym and patterns, can you at least agree that month-of-year should be an Int and not a datatype in places such as fromGregorian and YearMonthDay?

@NorfairKing
Copy link
Author

But regardless of the existence the type synonym and patterns, can you at least agree that month-of-year should be an Int and not a datatype in places such as fromGregorian and YearMonthDay?

I don't know if it should but I agree that it's a good idea that the option exists.
There are good reasons to parse triples like that into Days.
The opposite direction then also has a good reason to exist.

For toGregorian, at least, I would definitely want a version that uses an ADT for MonthOfYear instead.

@AshleyYakeley
Copy link
Member

Also, should the DayOfMonth and DayOfYear type synonyms also be removed? You can contrive similar bad code for them too.

@AshleyYakeley
Copy link
Member

For toGregorian, at least, I would definitely want a version that uses an ADT for MonthOfYear instead.

This means duplicating a bunch of functions, which I definitely want to avoid. In order to keep things simple, I prefer to stick to one type for any given concept. And for month-of-year, it has to be Int, because that's the common use case.

@NorfairKing
Copy link
Author

Also, should the DayOfMonth and DayOfYear type synonyms also be removed? You can contrive similar bad code for them too.

Yes, actually.
Don't use type synonyms for documentation if it introduces footguns.

@AshleyYakeley AshleyYakeley closed this as not planned Won't fix, can't repro, duplicate, stale Jan 12, 2024
@NorfairKing
Copy link
Author

NorfairKing commented Jan 16, 2024

This means duplicating a bunch of functions, which I definitely want to avoid.

If that duplication can introduce safety that wasn't there before then I'm all for it.
You can even define one in terms of the other.

@AshleyYakeley
Copy link
Member

It introduces no more safety than an HourOfDay ADT.

@NorfairKing
Copy link
Author

@AshleyYakeley It does because not every day has 24 hours but every year has 12 months. And an HourOfDay ADT would also introduce more safety because you can prevent 2^64-60 invalid values by not using an Int.

@NorfairKing
Copy link
Author

@AshleyYakeley #134 (comment)
This is really the comment I'd like a reply to.

@jarekr-da
Copy link

jarekr-da commented Jan 17, 2024

Day is your safe, correct, parsed, validated type, while month-of-year is best understood as an attribute of that, which in most cultures/uses is a number in [1..12].

Actually, many systems (including popular Java / java.util.Date)) use [0..11] which leads to a further confusion when you work with multiple platforms. Having one int less to worry about, thanks to a proper type, is a nice idea.

@tomjaguarpaw
Copy link
Member

I'm in favour of @NorfairKing's API design. However, whether the current API should be changed to that design is a different matter. One downside is that it would be a breaking change. Another downside is simply that it goes against the preferences of the maintainer; that's an important consideration.

I don't use time in a big way myself. If I did, and I was frustrated by the API, then I would create a new library, a wrapper around time, with the API I preferred. It seems like @NorfairKing has considered this option and rejected it:

Unfortunately this is a boot library. Otherwise I'd have forked it long ago and not bothered you with this.

I don't understand why. process is also a type-unsafe boot library, and typed-process exists as a wrapper with more type safety. Why could we not similarly have typed-time? That seems like an easy option that would satisfy everyone. What am I missing?

@AshleyYakeley
Copy link
Member

It does because not every day has 24 hours but every year has 12 months.

This is not correct: every day has 24 hours. More specifically, every day has an hour-of-day in the range [0..23].

And an HourOfDay ADT would also introduce more safety because you can prevent 2^64-60 invalid values by not using an Int.

Do you even believe this? How would an HourOfDay ADT introduce any safety? All it would do is force the addition of conversion functions to/from the ADT, so no actual safety would be introduced. Exactly the same as a MonthOfYear ADT.

I think people here are being fooled by the fact that in their culture/language, there are names for the months, but not names for the hours of a day. So they think the first should be an ADT but the second should not be. But from a calendrical & software point of view, they're very similar.

@AshleyYakeley
Copy link
Member

AshleyYakeley commented Jan 17, 2024

@AshleyYakeley #134 (comment) This is really the comment I'd like a reply to.

So there are two issues here:

  1. Whether MonthOfYear should be an ADT and used in time-related functions such as fromGregorian, YearMonthDay etc.

  2. Whether MonthOfYear, DayOfMonth, DayOfYear synonyms, and January etc. patterns should exist, given that all those types are in fact Int.

(1.) is the actual issue described here. Let me repeat why this would add complication to actual practical code:

  • You have year, month, day, (y,m,d) :: (Int,Int,Int) which you already know is valid, and you need to convert that to a Day.

    Currently:
    YearMonthDay (toInteger y) m d

    With ADT:
    YearMonthDay (toInteger y) (fromJust (parseMonthOfYearIndex m)) d
    note use of partial function fromJust

  • You have year, month, day, (y,m,d) :: (Int,Int,Int) which you need to validate, converting to Maybe Day.

    Currently:
    fromGregorianValid (toInteger y) m d

    With ADT:
    parseMonthOfYearIndex m >>= \m' -> fromGregorianValid (toInteger y) m' d
    has to validate twice

And what would the safety benefit be? What defects would it prevent? All of your (highly contrived) examples of bad code have been due to issue (2.).

Regarding (2.): I feel this is a worthwhile trade-off of convenience for safety. It's true, someone might make the mistake of thinking they were ADTs if they don't read the documentation and use something like arbitrary :: Gen MonthOfYear. But (2.) is a separate issue, not the one described here.

@L0neGamer
Copy link

In the first case, there is no guarantee that your Int triple is correct, and use of a partial function shows how dangerous the current implementation is. If you can't prove that a month is a valid month, then you shouldn't be able to make it! Further, the point others have made about different systems using different indexing for months further shows how having a "valid" month can still be incorrect.

In the second case, this is very easily solved by just moving the validation for month inside that function, (which it looks like you already do in Data.Time.Calendar.MonthDay) or you could have different sorts of validation functions, where one takes a year and a valid month and an integral day and says whether that's valid, and that's wrapped up inside one that takes an integral month.

Neither of these cases are good arguments against using an ADT.

@AshleyYakeley
Copy link
Member

In the first case, there is no guarantee that your Int triple is correct,

Yes there is, you already have that guarantee as a premise to the case.

I'm giving two examples here, one where you already know it's correct (and therefore have no need to validate), and one where you don't (and therefore you need to validate).

In the second case, this is very easily solved by just moving the validation for month inside that function,

Right, we already have that with fromGregorianValid, which takes Int for month-of-year.

or you could have different sorts of validation functions

So this means duplicating all functions and patterns that work with month-of-year. As a matter of simplicity, I avoid having different types for the same concept.

@AshleyYakeley
Copy link
Member

AshleyYakeley commented Jan 17, 2024

Folks, this issue is closed. I recommend you create your own typed-time library as @tomjaguarpaw suggested if you wish to explore this space.

@haskell haskell locked as resolved and limited conversation to collaborators Jan 18, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants