-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/time series dataframe #431
Feature/time series dataframe #431
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please leave in-line comments for what to review. Have all CI passing before requesting for review. (You can turn on Github Actions in your fork. It is also productive to run linter locally.)
- For a simple change like this, please use a single commit. Squash properly.
-
timeseries_dataframe.py:L31
: Use relative import insidemodmesh
. -
timeseries_dataframe.py:L39
: Move simple prototype to a test file. -
timeseries_dataframe.py:L50
: Do not use type annotation in Python.
modmesh/timeseries_dataframe.py
Outdated
import os | ||
import numpy as np | ||
|
||
from modmesh import SimpleArrayUint64, SimpleArrayFloat64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When inside modmesh
, use relative import. You need the .core
module for the relative import.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The latest commit contains this modification
modmesh/timeseries_dataframe.py
Outdated
] | ||
|
||
|
||
class TimeSeriesDataFrame(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the very early stage prototype code, put it in unit tests. It also allows you to add basic tests for your prototype.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test file and data have been added into the repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class needs to be put in the test file because it is a prototype. The prototype is not mature enough to be put in modmesh
namespace.
modmesh/timeseries_dataframe.py
Outdated
|
||
def read_from_text_file( | ||
self, | ||
txt_path: str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't use type annotation in Python. It's bad smell. If something needs typing information, it should go to C++.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed and already contained in my latest commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The spaces are removed in my latest commit. Similar change is also made to Line No. 96 in my latest commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have not addressed all points in the previous review. When requesting for the next review, please address all points.
When requesting for the next review, a global comment should be added and summarize the changes. Inline comments should be added to highlight what to review.
It is ideal for each review to use a commit. Sometimes a couple of commits are OK. But so far you are using 16. They are too many. Please do squash for this batch. Be kind to your reviewer.
Additional points:
- Since the time series prototype should go to test file, no change should be made in
__init__.py
.
modmesh/timeseries_dataframe.py
Outdated
] | ||
|
||
|
||
class TimeSeriesDataFrame(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class needs to be put in the test file because it is a prototype. The prototype is not mature enough to be put in modmesh
namespace.
modmesh/__init__.py
Outdated
@@ -41,6 +41,8 @@ | |||
from . import onedim # noqa: F401 | |||
from . import system # noqa: F401 | |||
from . import toggle # noqa: F401 | |||
from . import timeseries_dataframe # noqa: F401, F403 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not surpress F403.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted. This line is removed since I moved prototype into test file.
modmesh/timeseries_dataframe.py
Outdated
SimpleArrayFloat64(array=nd_arr[:, i].copy()) | ||
) | ||
|
||
def __getitem__(self, column_name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it a convention of Pandas or any popular package to name the argument this long?
It should only be the name for column so name
alone seems to suffice. But I can be wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point. Since in the normal practice of a DataFrame, the "[]" operator usually extracts a column, we can just use "name" instead of column_name. I will change in my next commit.
8769873
to
745ea15
Compare
tests/test_timeseries_dataframe.py
Outdated
os.path.join(self.DATADIR, "dlc_trimmed.csv") | ||
) | ||
|
||
one_column_data = tsdf['DATA_DELTA_VEL[1] '] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m wondering if the user needs to include a space when entering the column name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @j8xixo12
My design is to preserve whatever there is in the CSV. I do not want to assume what the data "should be". If there is a space in the column name in the original file, I will read as is. This is also the behavior of Pandas if you use it to read my test CSVs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not user friendly to require them remember the whitespaces in text file. It is especially difficult when the whitespaces are at the end of a line. Trailing whitespaces are invisible in many editors and web browsers.
In this early prototype it's OK to keep the space in key names. But I don't think it's a good design to force keeping whitespaces. We need to think about it in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @yungyuc
I think we can discuss further when we have F2F time, and I still want to keep the trailing white spaces since user can access the list of columns through "columns" property. I think there could be scenario where user wants to keep the column names as is. Let's say user compose an automatic script to read the column names from another file (maybe the config of the data source instruments) and the trailing spaces is there in the beginning. Under this scenario the user may want the column name stays the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can keep it for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not mark conversation resolved. Doing it makes it harder to track.
Hi @yungyuc and reviewers My latest commit should have made following changes, and therefore, should have already addressed above issues.
All the CIs are passed, please help to review this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update, @xingularity . Additional changes are requested:
- Rebase instead of duplicate changes for CI update.
- Move the fixture contents to the test (code) file because there are not yet many lines.
.github/workflows/devbuild.yml
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for this time. Please rebase, do not duplicate change. The former is clearer than the latter.
tests/data/dlc_trimmed.csv
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please put the contents of the two short fixture csv files dlc_trimmed.csv
and dlc_trimmed_header_changed.csv
in the test file test_timeseries_dataframe.py
. They are short and distinct files are harder to read and maintain.
Distinct fixture files can be created later after the fixture (data) become larger/longer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @yungyuc
I found the numpy function I used to generate data from text supported StringIO. I will change my function to support streaming text input, and I will also put the CSV data into test file. I estimate this can be done on 11/3.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need more time to verify my new implementation, push the estimated time to 11/6.
tests/test_timeseries_dataframe.py
Outdated
os.path.join(self.DATADIR, "dlc_trimmed.csv") | ||
) | ||
|
||
one_column_data = tsdf['DATA_DELTA_VEL[1] '] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not mark conversation resolved. Doing it makes it harder to track.
modmesh/__init__.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
init.py in modmesh module is restored since TimeSeriesDataFrame class has been moved to test file.
modmesh/__init__.py
Outdated
@@ -41,6 +41,8 @@ | |||
from . import onedim # noqa: F401 | |||
from . import system # noqa: F401 | |||
from . import toggle # noqa: F401 | |||
from . import timeseries_dataframe # noqa: F401, F403 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted. This line is removed since I moved prototype into test file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Moved test data into the test file and removed CSV files.
- Refactored read_from_text_file method to accept file path (as string), iterable of strings or StringIO.
- Shorten several variable names to make code cleaner.
- column name now strips white spaces.
- re-initiate class members when reading a new file.
- Modify all tests to read CSV using StringIO in test file.
] | ||
nd_arr = np.genfromtxt(fhd, delimiter=delimiter) | ||
|
||
self._init_members() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-initiate object members when reading a new file using same object.
2. Added test data into tests/data folder 3. Added test file for timeseries_dataframe
2. Restore __init__.py in modmesh module 3. Fixed indentation in docstring 4. Changed parameter name in the TimeSeriesDataFrame.
2. Modified the implementation of read_from_text_file to support streaming and iterable text input. 3. Shorten various variable names.
b5024ae
to
b5da7ff
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks much cleaner. Thank you, @xingularity .
- Please rebase to get rid of the change in the directory of
.github/workflows
.
class TimeSeriesDataFrame(object): | ||
|
||
def __init__(self): | ||
self._init_members() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code will be more readable to have member data listed in __init__()
, but keeping them in a separate function is OK for now in the prototype.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's rebased. I'll merge once CI finishes running.
It looks like one of them has failed. But the all CI under devbuild have passed in my fork repo. Quiet weird. |
It's a sensitive profiling test and passed after rerun. Merged. |
This PR contains the first version of a prototype, TimeSeriesDataFrame which is to preserve and extract time series data for analysis. This feature is related to #380. Current features as following: