-
Notifications
You must be signed in to change notification settings - Fork 318
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(pandas): Flopy pandas support #1955
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #1955 +/- ##
=========================================
- Coverage 72.6% 71.7% -1.0%
=========================================
Files 257 258 +1
Lines 57800 57412 -388
=========================================
- Hits 42017 41179 -838
- Misses 15783 16233 +450
|
linking back to comment on original PR, sorry again for the accidental close |
flopy/mf6/data/mfdataplist.py
Outdated
""" | ||
self._set_data(data, check_data=check_data) | ||
|
||
def set_record(self, data_record, autofill=False, check_data=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be "set_data_record" to be consistent with the variable that it is setting? Or set_control_record?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the hidden and exposed method necessary for this? set_record()
is only calling _set_record()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variable name changed to be more consistent with method name. Changed variable name instead of method name since method name change would breaking existing interface.
Having the hidden method is not necessary. I was originally doing this to make sure the correct method (parent vs child class) got called. But it is better to just explicitly define this, which I am now doing.
flopy/mf6/data/mfdataplist.py
Outdated
""" | ||
self._set_record(data_record, autofill, check_data) | ||
|
||
def _set_record(self, data_record, autofill=False, check_data=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be _set_data_record to be consistent with the variable it is setting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed variables to be consistent with method name
flopy/mf6/data/mfdataplist.py
Outdated
self._resync() | ||
try: | ||
# convert to tuple | ||
tuple_record = () |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand why lists are being converted to tuples here. It looks like there is support for lists in .append_data
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And if this is a single list record, could this just be tuple(record)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the conversion code and now passing list instead of tuple.
ex, | ||
) | ||
|
||
def update_record(self, record, key_index): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this would be clearer if key_index
was kper
or stress_period
, etc...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This interface is also used for packages like Time-Array Series whose "TIME" block (BEGIN TIME <tas_time>) has a key that is not a stress period. Similarly, the Observation package "CONTINUOUS" block (BEGIN CONTINUOUS FILEOUT <obs_output_file_name>) has a key that is a file name. I therefore choose the generalized name "key_index".
flopy/mf6/data/mfdataplist.py
Outdated
message = ( | ||
f"ERROR: Data list {self._data_name} supplied the " | ||
f"wrong number of columns of data, expected " | ||
f"{len(self._data_item_names)} got {len(data[0])}." | ||
) | ||
type_, value_, traceback_ = sys.exc_info() | ||
raise MFDataException( | ||
self._data_dimensions.structure.get_model(), | ||
self._data_dimensions.structure.get_package(), | ||
self._data_dimensions.structure.path, | ||
"setting list data", | ||
self._data_dimensions.structure.name, | ||
inspect.stack()[0][3], | ||
type_, | ||
value_, | ||
traceback_, | ||
message, | ||
self._simulation_data.debug, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like we could either return the name of the missing column (or extra column) here to provide a more detailed error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can not easily determine the name of the missing column since this code accepts Pandas Dataframes with incorrect column names (it corrects the column names below, which is trivial to do given that the Dataframe has the correct number of columns). I did however make a more detailed error message that lists the data column names supplied and expected.
flopy/mf6/mfpackage.py
Outdated
if isinstance(dataset_one, mfdataplist.MFPandasList) or isinstance( | ||
dataset_one, mfdataplist.MFPandasTransientList | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be simplified to isinstance(dataset_one, (mfdataplist.MFPandasList, mfdataplist.MFPandasTransientList))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simplified
flopy/mf6/mfpackage.py
Outdated
assert isinstance( | ||
dataset, mfdataplist.MFPandasList | ||
) or isinstance(dataset, mfdataplist.MFPandasTransientList) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isinstance simplification here too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simplified
write_headers=True, | ||
lazy_io=False, | ||
use_pandas=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does write_headers write?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
write_headers: bool
When true flopy writes a header to each package file indicating that
it was created by flopy.
flopy/mf6/utils/model_splitter.py
Outdated
elif isinstance( | ||
value, mfdatalist.MFTransientList | ||
) or isinstance(value, mfdataplist.MFPandasTransientList): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The isinstance statement can be simplified `isinstance(value, (mfdatalist.MFTransientList, mfdataplist.MFPandasTransientList))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simplified
flopy/mf6/utils/model_splitter.py
Outdated
elif isinstance(value, mfdatalist.MFList) or isinstance( | ||
value, mfdataplist.MFPandasList |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same isinstance comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simplified
This PR is the first step towards integrating Pandas into Flopy. This integration takes place in the MFPandasList and MFPandasTransientList classes (MFPandas*), which are used instead of the MFList and MFTransientList classes under the following conditions:
The MFPandas* classes currently support the same interface as MFList and MFTransientList, and should behave similarly to the end-user. However, MFPandas* stores data internally in a Pandas Dataframe and reads and writes data using Pandas “read_csv” and “to_csv” methods, which can be significantly faster than flopy’s current file reading. The MFPandas* classes set_data methods support DataFrames and their new “get_dataframe” method returns data in a Panda’s Dataframe (“get_data” still returns a recarray).
Remaining work on this PR includes: