I think that ProjectTemplate should be a solid and reliable basis for building data analysis projects.
If a user creates a project with a set of data import functions in place, we don't want that data analysis project to break six months or a year later because the data import rules have changed. We want ProjectTemplate to be a solid basis for building projects. For many years, ProjectTemplate has provided this solid basis.
Specifically, I think that any change to data import rules should not break existing data analysis projects.
In contrast:

- converting data.frames to tibbles breaks existing code
- converting to tidyverse data import functions breaks existing code (see the post on read_csv and the tidyverse)
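To make the first point concrete, here is a minimal sketch of one way the tibble conversion breaks code: base R drops a single-column selection to a bare vector, while a tibble keeps it as a tibble, so downstream code that expects a vector fails. (The tibble half is guarded because it assumes the tibble package is installed.)

```r
# Base R: selecting one column with [ drops to a bare vector
df <- data.frame(x = 1:3, y = letters[1:3])
is.integer(df[, "x"])      # TRUE -- a plain integer vector
is.data.frame(df[, "x"])   # FALSE

# tibble: the same selection stays a one-column tibble, so code such as
# mean(df[, "x"]) that worked on the vector now errors or warns
if (requireNamespace("tibble", quietly = TRUE)) {
  tb <- tibble::tibble(x = 1:3, y = letters[1:3])
  is.data.frame(tb[, "x"]) # TRUE -- still a tibble, not a vector
}
```

This is exactly the kind of silent behavioural change that a switch of import functions would push into existing projects.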
As a general rule, data import functions have to make a wide range of choices: variable names, variable types, row names, NA conversion, tibbles versus data.frames, strings versus factors, use of metadata, and so on.
Thus, the starting assumption should be that whenever you change a data import function, you will break existing code. If the tests are not breaking, it's more likely that the tests are not thorough enough.
That said, several newer data import functions do offer benefits: readxl removes dependencies on Java, Perl, etc., and readr is faster than read.csv.
Possible resolutions
So, what happens if the ProjectTemplate community decides, for example, that readxl would be a better Excel import function because it has fewer dependencies?
- Use the project version number to choose the data import function. I suppose the code could include a conditional that looks at config$version, so that any modification to the data import rules only applies to projects created with a later version.
- Implement a function like archive.project(). This could create some kind of localised version of ProjectTemplate in a folder within the project. I'm not quite sure how this would work.
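The version-number idea might look something like the following sketch. `config$version` is the real ProjectTemplate config field; the wrapper name `read.excel.file` and the "0.8" cutoff are purely illustrative assumptions, not an actual proposal for the API.

```r
# Hypothetical sketch: dispatch on the version recorded in the project's
# config so that projects created before the switch keep the legacy reader.
# read.excel.file and the "0.8" cutoff are made up for illustration.
read.excel.file <- function(filename, config) {
  if (utils::compareVersion(config$version, "0.8") < 0) {
    # Legacy behaviour: older projects keep the Perl/Java-based reader
    gdata::read.xls(filename)
  } else {
    # Projects created with newer versions get readxl
    as.data.frame(readxl::read_excel(filename))
  }
}
```

The obvious cost, as noted below, is that every future change to an import rule adds another branch like this, and the branches themselves become a maintenance burden.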
Anyway, I don't really have the solution to this tension between improving data import functions and maintaining backwards compatibility. But I just thought I'd post this to emphasise the value of backwards compatibility and stability as a counterpoint to the desire to improve data import functions.
There is an option to dump the code when creating a new project, but I guess it makes more sense to do that in an archive.project function. I'm wondering to what extent our current project layout is compatible with packrat. It seems to me that archive.project could be a thin wrapper around packrat::init and packrat::snapshot (depending on whether packrat was already initialised in the project). I guess you could combine it with devtools::install_version if you need to install an older version in the packrat library, although I haven't tried this.
The major advantage of this is that we don't have to incorporate logic to simulate all previous versions of ProjectTemplate, which is bound to break. It also allows us to diverge from previous choices instead of going out of our way to maintain backward compatibility with every prior version.
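A thin wrapper along the lines suggested above might be sketched as follows. This is only an assumption about how the pieces could fit together: the function name archive.project and its logic are hypothetical, and only the packrat calls (packrat::init, packrat::snapshot) are real API.

```r
# Hypothetical sketch of archive.project() as a thin wrapper around packrat.
# The function name and the initialisation check are illustrative only.
archive.project <- function(project.dir = ".") {
  # If packrat has not been initialised in this project yet, do so now;
  # init() creates the packrat/ directory and a private library.
  if (!dir.exists(file.path(project.dir, "packrat"))) {
    packrat::init(project.dir, enter = FALSE)
  }
  # Record the exact package versions (including ProjectTemplate itself)
  # in packrat/packrat.lock, pinning the project to today's behaviour.
  packrat::snapshot(project.dir)
}
```

Because the snapshot pins the installed version of ProjectTemplate itself, a project archived this way keeps its original data import behaviour even after the package changes upstream.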