dl_problem_Ahmed_Salama.txt
To simplify the problem, I divide it into two subproblems, with one model for each. The first submodel predicts the year digit (0-9, the position of the year within its decade) and uses 5 columns as input: leap_year_condition, "decade4", "decade100", "decade400", and decade.
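The input columns are not defined in this write-up. As a minimal sketch, assuming leap_year_condition follows the Gregorian leap-year rule and that the "decade4", "decade100", and "decade400" columns relate to the divisibility checks by 4, 100, and 400 (a guess based only on the names), the features could be computed like this. The function names and dictionary keys below are hypothetical.

```python
def is_leap_year(year: int) -> bool:
    """Gregorian rule: divisible by 4, except century years not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def leap_features(year: int) -> dict:
    # Hypothetical feature names; the original column definitions are not given,
    # so these divisibility flags are only an assumed reading of the column names.
    return {
        "div_by_4": year % 4 == 0,
        "div_by_100": year % 100 == 0,
        "div_by_400": year % 400 == 0,
        "leap_year_condition": is_leap_year(year),
        # Assumed reading of the 0-9 target: the last digit of the year.
        "year_in_decade": year % 10,
    }
```

For example, leap_features(1900) marks the year as divisible by 4 and by 100 but not by 400, so leap_year_condition is False.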
The second submodel predicts the day (1-31). Its inputs are:
1- the output of the first model (the year), split into century_code and century_year, where century_code is the century mapped to one of the values [0, 2, 4, 6] according to this website `https://artofmemory.com/blog/how-to-calculate-the-day-of-the-week/`
2- the year_code, which is engineered from the last 2 digits of the year according to the same site
3- the month column, also mapped to fixed values according to the same site
4- the week-day
5- the leap_year_condition column
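The code-based features above come from a well-known mental day-of-the-week method. The following is a sketch of that method as I understand it, not necessarily the exact feature engineering used for the model: the week-day is the sum of a year code, a month code, a century code, and the day, minus 1 for January or February of a leap year, all taken modulo 7 (0 = Sunday). The function and constant names are my own, and the century table assumes the Gregorian calendar.

```python
import datetime

# Month codes for January..December.
MONTH_CODES = {1: 0, 2: 3, 3: 3, 4: 6, 5: 1, 6: 4,
               7: 6, 8: 2, 9: 5, 10: 0, 11: 3, 12: 5}
# Gregorian century codes repeat every 400 years; keyed by (year // 100) % 4.
CENTURY_CODES = {0: 6, 1: 4, 2: 2, 3: 0}


def is_leap_year(year: int) -> bool:
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def year_code(year: int) -> int:
    """Code derived from the last two digits of the year."""
    yy = year % 100
    return (yy + yy // 4) % 7


def century_code(year: int) -> int:
    return CENTURY_CODES[(year // 100) % 4]


def weekday_code(year: int, month: int, day: int) -> int:
    """Return the week-day as 0 = Sunday, 1 = Monday, ..., 6 = Saturday."""
    leap_adjust = 1 if is_leap_year(year) and month in (1, 2) else 0
    total = year_code(year) + MONTH_CODES[month] + century_code(year) + day
    return (total - leap_adjust) % 7


def matches_datetime(year: int, month: int, day: int) -> bool:
    """Cross-check against the standard library (Monday = 0 there)."""
    expected = (datetime.date(year, month, day).weekday() + 1) % 7
    return weekday_code(year, month, day) == expected
```

Because the day enters the sum only modulo 7, the other features together with the week-day pin the day down only up to a multiple of 7, which is what motivates predicting a group of days rather than a single day.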
I convert the days into groups, putting days that fall on the same week-day into the same group. For example, if day "1" is a Sunday, then day "8" must also be a Sunday, and so must days "15" and "22", so I group all of them together and refer to the group by its index. The same holds for the second group, which includes days [2, 9, 16, 23], and so on. The result is 10 groups that cover all possibilities in all cases, so that if the model produces a given group index, every number in that group is guaranteed to fall on the same week-day.
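How the 10 groups are formed is not spelled out here. One reading that matches the examples [1, 8, 15, 22] and [2, 9, 16, 23] is 7 groups of four days covering days 1-28, plus one group each for days 29, 30, and 31, which do not exist in every month. The sketch below builds the groups under that assumption; the names are hypothetical.

```python
def build_day_groups() -> list:
    """Build 10 day groups (assumed layout, not confirmed by the write-up)."""
    # Seven groups of four days, each 7 apart, covering days 1..28.
    groups = [[start + 7 * k for k in range(4)] for start in range(1, 8)]
    # Assumption: days 29, 30, 31 each get their own group,
    # since they are missing from some months.
    groups += [[29], [30], [31]]
    return groups


DAY_GROUPS = build_day_groups()

# Lookup from a day of the month (1..31) to its group index (0..9),
# usable to turn day labels into group-index labels for training.
DAY_TO_GROUP = {day: idx
                for idx, group in enumerate(DAY_GROUPS)
                for day in group}
```

With this layout, DAY_TO_GROUP[15] is 0 because day 15 sits in the first group together with days 1, 8, and 22.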
- Why do I group the days into 10 groups?
1- In general, the fewer choices there are to select from, the higher the probability of selecting correctly: choosing between 2 options is much easier than choosing among 1000. Since 10 is less than 31, training the model to select among 31 choices (days) is harder than training it to select among 10 choices (groups of days).
2- Every number in a group falls on the same week-day. If I train the model to select among 31 days and the label for a given row is day "1", the model will adjust its parameters to produce day "1" and treat every other number as a wrong output. That is incorrect, because day "8", "15", or "22" is just as correct for that exact row. You would end up training the model on partially wrong labels, when it should be trained only on exactly correct labels.
Instead of selecting a single random number from the list, I leave the output as a group index, meaning that every value in that group is a possible correct day.
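The label ambiguity can be checked directly with the standard library. This is only an illustrative sketch with a hypothetical helper name: for a given year, month, and week-day it lists every day of the month that fits, showing that a single-day label is just one of several equally valid answers.

```python
import calendar
import datetime


def days_matching_weekday(year: int, month: int, weekday: int) -> list:
    """Return every day of the month that falls on `weekday`.

    `weekday` uses the datetime convention: Monday = 0, ..., Sunday = 6.
    """
    _, n_days = calendar.monthrange(year, month)
    return [day for day in range(1, n_days + 1)
            if datetime.date(year, month, day).weekday() == weekday]


# Wednesdays (weekday 2) in February 2023.
candidates = days_matching_weekday(2023, 2, 2)
print(candidates)
```

For February 2023 the candidates are days 1, 8, 15, and 22, so a row labeled with day "1" is equally consistent with the other three days.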