Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Date conversion dependent on local timezone #1804

Closed
lev-cx opened this issue Mar 18, 2020 · 2 comments
Closed

Date conversion dependent on local timezone #1804

lev-cx opened this issue Mar 18, 2020 · 2 comments

Comments

@lev-cx
Copy link

lev-cx commented Mar 18, 2020

This is related to a few existing issues: #1725, #1716, #1470.

The reason to writing up this one, is that I've tracked the bug to a particular location in the code, and so this issue is with that particular piece of logic. Issue manifests itself either in browser environments or in server environment running in a timezone other than UTC. It is NOT a problem if timezone is UTC (for example, serverless environments, like AWS Lambda).

It is known that Excel does not store timezone information, all dates are timezone-less. It is stored as an offset (serial number) from 1st Jan 1900 (or 1904 in some formats).

Following code is performing conversion at the moment from the serial number to JS Date object: https://github.com/SheetJS/sheetjs/blob/master/xlsx.js#L2683

var basedate = new Date(1899, 11, 30, 0, 0, 0); // 2209161600000
var dnthresh = basedate.getTime() + (new Date().getTimezoneOffset() - basedate.getTimezoneOffset()) * 60000;
function datenum(v, date1904) {
	var epoch = v.getTime();
	if(date1904) epoch -= 1462*24*60*60*1000;
	return (epoch - dnthresh) / (24 * 60 * 60 * 1000);
}
function numdate(v) {
	var out = new Date();
	out.setTime(v * 24 * 60 * 60 * 1000 + dnthresh);
	return out;
}

The problem is that JS Date is a local date in the timezone of the browser (or system timezone for Node.js). Which means that a straight timezone offset calculated in dnthresh is dependent on whether daylight saving is on or off at the time that the library is loaded (i.e. new Date().getTimezoneOffset()). This leads to unexpected dates being returned after parsing the XLSX - sometimes they are off by an hour, which could then be one day before the expected date.

Since time in a timezone with daylight savings is not serial (i.e. it jumps around daylight savings), the serial formula needs to be corrected.

Logically the fix is something like this (not taking performance into account):

function numdate(v) {
  var out = new Date();
  out.setTime(v * 24 * 60 * 60 * 1000 + dnthresh);
  // DST correction (!)
  out.setTime(out.getTime() + (out.getTimezoneOffset() - new Date().getTimezoneOffset()) * 60000);
  return out;
}

Obviously there would be some performance optimisation in the actual fix. For example, new Date() and new Date().getTimezoneOffset() should be calculated only once at the init time.

The main idea here is that applying the date serial, it is not clear whether it will fall into daylight saving according to the local timezone. That is why we have to check out.getTimezoneOffset() after setting it, and adjust accordingly.

Noting here - other solutions are also possible. We don't have to play with Unix epoch UTC offsets (i.e. getTime/setTime), and instead apply some math to the date serial to extract the actual year, month, day (logic widely available in popular date libraries). I am not sure of the performance implications here, perhaps setTime was chosen because of its speed? In any case, several approaches can be tried, and benchmarked.

@SheetJSDev
Copy link
Contributor

Same as suggested in #1333 , the current implementation is incorrect around DST

@lev-cx
Copy link
Author

lev-cx commented Mar 18, 2020

Oh wow, I honestly didn't see that one (with exact same solution)... Sorry for raising a duplicate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants