Time series data load from csv stalls if timestamps have timezone information appended #6589

Open
pmayostendorp opened this issue Nov 1, 2024 · 1 comment

@pmayostendorp

Describe the bug
I have two separate CSV files used to create "activity recognition tasks". The only difference between the two is the format of the timestamps. File 1 was generated directly from time series data stored in a pandas.DataFrame object with a timezone-aware pandas.DatetimeIndex as its index. It was then exported using pandas.DataFrame.to_csv(..., index=True). This generates timezone-aware timestamps in the format YYYY-MM-DD hh:mm:ss.f+00:00, which looks like 2024-04-11 12:25:59.487929+00:00. File 2 is the same file, but with the timezone information scrubbed from the datetimes (i.e. YYYY-MM-DD hh:mm:ss.f, which looks like 2024-04-11 12:25:59.487929).

When file 1 is loaded, the data will not load and a spinner shows indefinitely:
[screenshot: loading spinner]

The console logs several errors that are not particularly useful/diagnostic and may not even be related, mostly various "cannot read properties of undefined" errors, which leads me to think this could be masking issues identified in other bug reports like this one. Kudos to this comment which pointed me to this issue in the first place.

File 2 loads normally, so the format is obviously the culprit.

To Reproduce

  1. Create a simple time series csv file with the following format for datetimes in the first column: YYYY-MM-DD hh:mm:ss.f+00:00 which looks like 2024-04-11 12:25:59.487929+00:00
  2. Save a copy of this file, but with the timezone info removed from the datetimes. They should be in the format: YYYY-MM-DD hh:mm:ss.f which looks like 2024-04-11 12:25:59.487929.
  3. Generate tasks for both of these time series.
  4. Set up a time series activity recognition or similar annotation template.
  5. Try to annotate file 1 and observe spinner.
  6. Try to annotate file 2 and see data load. Load data, load.
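
Steps 1 and 2 can be sketched as the script below. It uses only the Python standard library as a stand-in for the pandas export described above, so the column names (`time`, `value`), the file names, and the helper functions are assumptions made for illustration, not part of the original report.

```python
import csv
from datetime import datetime, timedelta, timezone


def timestamp_rows(n_rows=5, with_offset=True):
    """Return (timestamp, value) rows in one of the two formats from the steps."""
    start = datetime(2024, 4, 11, 12, 25, 59, 487929, tzinfo=timezone.utc)
    rows = []
    for i in range(n_rows):
        moment = start + timedelta(seconds=i)
        if not with_offset:
            # Drop the tzinfo so the text no longer ends in "+00:00".
            moment = moment.replace(tzinfo=None)
        rows.append((moment.isoformat(sep=" "), i))
    return rows


def write_csv(path, rows):
    """Write the rows with a header; column names are hypothetical."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["time", "value"])
        writer.writerows(rows)


if __name__ == "__main__":
    write_csv("file1_with_timezone.csv", timestamp_rows(with_offset=True))
    write_csv("file2_without_timezone.csv", timestamp_rows(with_offset=False))
```

The two output files differ only in the trailing offset on each timestamp, which matches the difference between file 1 and file 2.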

Expected behavior
Data loads into template in step 5 as well.
OR

  • Documentation is exceedingly clear about accepted time series formats. E.g. "only use POSIX timestamps in UTC" or something of that nature.
  • If a time series is loaded and does not match one of the accepted datetime formats, the user receives a nudge on the frontend to correct it.
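
The kind of check behind the second bullet could look roughly like the sketch below. It is written in Python with `datetime.strptime` purely to illustrate the idea; it is not Label Studio's implementation, and the accepted format, the function name, and the `limit` parameter are all assumptions.

```python
from datetime import datetime

# Assumed accepted format, chosen to match file 2 in this report.
ACCEPTED_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def find_bad_timestamps(values, fmt=ACCEPTED_FORMAT, limit=5):
    """Return up to `limit` (row_number, value) pairs that do not match fmt."""
    bad = []
    for row_number, value in enumerate(values, start=1):
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            # strptime rejects strings with leftover text such as "+00:00".
            bad.append((row_number, value))
            if len(bad) >= limit:
                break
    return bad
```

Returning the offending row numbers and values, rather than a bare pass/fail, would give the user something concrete to correct instead of an endless spinner.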

Environment (please complete the following information):

  • Chrome 130.0.6723.70 on macOS 14.6.1 (Sonoma)
  • AWS EKS deployment from latest Helm chart
  • Label Studio 1.13.1
@heidi-humansignal
Collaborator

heidi-humansignal commented Nov 6, 2024

Hello,

Label Studio's TimeSeries tag requires that the timeFormat parameter in your labeling configuration matches the exact format of your timestamps. A trailing timezone offset such as +00:00 can cause parsing to fail when the timeFormat pattern does not account for it. Note that, per the TimeSeries tag documentation, timeFormat is parsed in the frontend by d3 using strftime-style patterns rather than by Python's strptime.

Solution:
To resolve this issue, you can adjust your timestamp format or modify the timeFormat parameter.

Option 1: Modify the Timestamp Format
Since the timezone information is causing the parsing issue, you can preprocess your CSV file to remove the timezone offset from the timestamps.
Here's how you can do it using Pandas:

import pandas as pd

# Read your original CSV with timezone info
# (assumes the time column is named 'timestamp'; adjust to your file)
df = pd.read_csv('file_with_timezone.csv')

# Parse as timezone-aware UTC, then format without the offset
df['timestamp'] = pd.to_datetime(df['timestamp'], utc=True).dt.strftime('%Y-%m-%d %H:%M:%S.%f')

# Save the modified CSV without timezone info
df.to_csv('file_without_timezone.csv', index=False)
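
If the timestamps live in the DataFrame index, as in the original report, the offset can also be dropped before export instead of rewriting the CSV afterwards. The following is a minimal sketch under that assumption; `drop_index_timezone` is a hypothetical helper name, and it normalizes to UTC before removing the timezone.

```python
import pandas as pd


def drop_index_timezone(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy whose DatetimeIndex holds naive UTC wall-clock times."""
    if not isinstance(df.index, pd.DatetimeIndex):
        raise TypeError("expected a DataFrame indexed by a DatetimeIndex")
    if df.index.tz is None:
        # Already naive: nothing to strip.
        return df.copy()
    # Convert to UTC first so the naive values are unambiguous, then drop tz.
    return df.tz_convert("UTC").tz_localize(None)
```

Usage would then be `drop_index_timezone(df).to_csv('file_without_timezone.csv', index=True)`, which should write timestamps without the `+00:00` suffix.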

Thank you,
Abu

Comment by Abubakar Saad