Contributions to ProjectTemplate documentation #211

Open
MattForshaw opened this issue Nov 6, 2017 · 18 comments

@MattForshaw
Contributor

Hi everyone,

Thank you for your ongoing hard work on ProjectTemplate.

At the end of the Getting Started guide is the following sentence:

In a future piece of documentation, we’ll describe some of the more advanced features that ProjectTemplate offers.

I would like to contribute to ProjectTemplate by writing additional documentation, but before embarking on this, I wanted to check whether there is a wish-list anywhere of the advanced features you would like documented next.

Matt Forshaw,
Lecturer in Data Science, Newcastle University

@KentonWhite
Owner

Thanks @MattForshaw for your offer. We do love having documentation contributions. There isn't a list of features that are prioritized for documentation. Could you make a list of the features you think should be added to the documentation, so that we can then discuss priority? Is that fair?

@rsangole
Collaborator

@MattForshaw I use ProjectTemplate almost every day. To that end, I can do my part and contribute to the documentation as well. I can follow your lead if you have a prioritized list of features that need documentation.

@Hugovdberg
Collaborator

Hugovdberg commented Nov 27, 2017

I just updated some of the changes I had in mind for the documentation on the website. I think the following sections still need work:

  • Expand "Introduction"

    • What is ProjectTemplate
    • How does ProjectTemplate work
      • Explain convention over configuration philosophy
      • Minimal usage example
    • Basic installation, plus link to separate page
    • What is ProjectTemplate NOT
      • Integrate "Building packages"
      • Remove separate page for building packages
  • Update "Getting started":

    • Remove preference of text editor, just keep it neutral (or perhaps move it to a new section with useful tools).
    • Instruct to download the file directly into the data directory: download.file('http://projecttemplate.net/letters.csv.bz2', 'data/letters.csv.bz2') would be a lot clearer I think and prevents dependencies on operating systems.
    • Update ddply example to equivalent dplyr example
  • Update "Mastering ProjectTemplate":

    • Instruct to download the philapd.db file directly
    • Combine instructions for SQL databases with those on "Supported File Formats"
      • Move to separate page "SQL databases"
      • Make better distinction between
        • general configuration flags: type, user, password, ...;
        • database specific flags: class, classpath, dsn, ...;
        • and data selection flags table and query
  • Expand "Configuring", add documentation on missing configuration flags (not all are listed!)

  • Remove "Updating" from website altogether, as the page is no longer relevant since version 0.3.5.

  • Clarify "Supported File Formats"

    • Combine .bz2, .zip, .gz variants of extensions into the main extension (more like ".csv: CSV files that use a comma separator (supports compressed variants)", and explain which compressed variants are accepted separately)
    • Link to new page "SQL databases"
  • Further improvements:

    • Add vignettes
    • Check documentation of existing functions for typos and unclear or ambiguous sentences.
    • Check documentation for information that should move to a vignette

These are just some possible updates to the current pages that I can think of right now, but perhaps you (as new users?) are missing something altogether. Please feel free to let me know, I can add it to this list. Also, if you think something is utter nonsense to change, then also let me know, I can just as well remove it again.
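As a rough sketch of the download suggestion under "Getting started" above: the helper name `fetch_letters` is hypothetical, and the code assumes the working directory is the project root, so that `data/` is the project's data directory. The URL is the one quoted in the list item.

```r
# Sketch only: download the example data straight into the project's data
# directory, so the instructions do not depend on the operating system.
# `fetch_letters` is a hypothetical helper name, not part of ProjectTemplate.
fetch_letters <- function(url = "http://projecttemplate.net/letters.csv.bz2",
                          dest = file.path("data", "letters.csv.bz2")) {
  # Create data/ if it does not exist yet (it normally does in a project).
  dir.create(dirname(dest), showWarnings = FALSE, recursive = TRUE)
  # mode = "wb" keeps the compressed file byte-for-byte intact on Windows.
  utils::download.file(url, destfile = dest, mode = "wb")
  invisible(dest)
}
```

Calling `fetch_letters()` from the project root would then leave the file where the automatic data loading expects it.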

@Hugovdberg
Collaborator

I just saw this video about creating good documentation: https://www.youtube.com/watch?v=azf6yzuJt54 We might consider restructuring our documentation that way, because it would give the current website a clearer structure.
The technical reference is kept pretty clean, so we might not need to add it to the website, although its contents are not easily browsable from a web browser.

@rsangole
Collaborator

Folks, any progress on this? I recommend we schedule a Skype session to create a quick plan of who does what, what's needed in the documentation, etc.

@Hugovdberg
Collaborator

I haven't done anything about the documentation recently, perhaps it's even easier if you just pick an item to update and mark it as done on the list. If you think you're making bigger changes that might conflict with other people's efforts then just shout out ahead of time ;-)

@KentonWhite
Owner

We're chipping away at this slowly. Every month or so I get someone who wants to help with documentation and can point them to this list. Any help on this is greatly appreciated!

@rsangole
Collaborator

I have adopted ProjectTemplate fully for my R projects. I teach it to my team at work too. (We might fork it and make customizations specific to our application.) Perhaps I can put together a vignette or blog post to show how I use it in a real-world project.

@maikol-solis

maikol-solis commented Feb 2, 2018

It's a great idea. For example, I'm lost with the cache function. I don't know where to invoke it in my projects.

@Hugovdberg
Collaborator

@rsangole What kind of changes would you like to make that require a separate fork from the main project?

@rsangole
Collaborator

rsangole commented Feb 5, 2018

@Hugovdberg Quite a few customizations actually. I'm using this format to develop projects that might go into a more 'production' environment. So along with /src/ for the source files to call, I need additional folder structure for error & log files, intermediate calculation outputs and final outputs, plots and algorithm performance metrics. I'm standardizing these structures within my team. Furthermore, I'd like to replace all the readme markdowns with customized starter Rmd documents, which will have our logo, color scheme css etc.

@Hugovdberg
Collaborator

@rsangole That sounds like you don't actually need to fork the project, but just need to create a custom template (using the new create.template function) ;-)

@rsangole
Collaborator

rsangole commented Feb 5, 2018

Ah, alright. I've yet to explore that function. I'll look at it over the weekend.

@KentonWhite
Owner

@maikol-solis Thanks for joining us. Actually, your questions about the cache function are exactly what we need. There should be documentation on caching. Since we are so familiar with the project, it is hard for us to see what is confusing.

Could you help us by commenting on what is confusing for you, so we can update the documentation there? It would be great if you could open a caching documentation issue so we can keep it all in one spot.

@maikol-solis

@KentonWhite Thanks for helping us to understand the software.

In the munge folder I process the data and create some clean data frames, depending on my project. Here is my question: where should I call the cache function so that the scripts in the munge folder do not recreate the data frames every time I run load.project? What would an example 01.A.R file look like?

#Load data
load(...)

#Preprocessing
MyDataFrame <- Some coding to process Raw data

#Is it correct?
cache(MyDataFrame)

Now, if I do some analysis in the src folder, could I call the cache function to save the results?

If I want to save the analysis results, should they go in the data folder or somewhere else?

Thanks for the help.

@Hugovdberg
Collaborator

@maikol-solis it appears your question wasn't answered yet. Data from the data directory is loaded automatically based on the file extension (unless you have a less common file type), so there should be no need to load it manually in the munge scripts. If you have the option cache_loaded_data enabled, the loaded files are cached automatically.
If you have expensive munge scripts, you might want to cache the results manually by calling cache. You might then also want to build a guard into the munge script to prevent it from running if the result was already loaded from the cache.
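A minimal sketch of such a guard, assuming the munge script runs in the global environment after the cache has been loaded. The names `clean_raw_data` and `raw_data` are hypothetical stand-ins for the expensive step and for a data set loaded from the data directory. The `requireNamespace`/`dir.exists` check is only there so the sketch can also be run outside a project; in a real munge script you would call `cache` directly.

```r
# munge/01-A.R (sketch, not an official example)

# Hypothetical stand-in for an expensive preprocessing step.
clean_raw_data <- function(raw) {
  raw[stats::complete.cases(raw), , drop = FALSE]
}

# Stand-in for a data set that would normally come from the data directory.
if (!exists("raw_data")) {
  raw_data <- data.frame(id = 1:4, value = c(10, NA, 30, 40))
}

# Guard: if MyDataFrame already exists, assume it was restored from the
# cache and skip the expensive step.
if (!exists("MyDataFrame")) {
  MyDataFrame <- clean_raw_data(raw_data)

  # cache() takes the variable *name* as a character string.
  if (requireNamespace("ProjectTemplate", quietly = TRUE) && dir.exists("cache")) {
    ProjectTemplate::cache("MyDataFrame")
  }
}
```

Note that the name is passed as a string, `cache("MyDataFrame")`, rather than as the object itself.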

Usually you call load.project from a file in the src directory, after which you can do the analysis based on the preprocessed data. If you want to store results, the graphs directory is created by default in the full template. You could also create an output directory to store other output (which I personally do in a custom template).

@maikol-solis

@Hugovdberg Thank you very much for the information. I was confused about how to use the cache function. One more thing: how is this function aware of changes in the data? I mean, if I re-run the munge scripts and re-create the clean data, do I have to call the cache function again, or is it aware of the change?

@Hugovdberg
Collaborator

The cache function only writes to cache if the data in memory has changed from the data in the cache. At the moment the variable is always read from the cache, even if the original data file was changed. In that case you need to clear the variable from the cache and reload manually.
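To illustrate the "write only if changed" behaviour described above, here is a toy cache in base R. This is an illustration only and not ProjectTemplate's implementation; `toy_cache_write` and the temporary directory are made up for the example.

```r
# Illustration only: a toy cache, NOT ProjectTemplate's implementation.
# Writes `value` under `name` only if it differs from what is already stored.
toy_cache_write <- function(name, value, dir) {
  path <- file.path(dir, paste0(name, ".rds"))
  if (file.exists(path) && identical(readRDS(path), value)) {
    return(FALSE)  # unchanged in memory: skip the write
  }
  saveRDS(value, path)
  TRUE             # new or changed: written to the cache
}

cache_dir <- tempfile("toy-cache-")
dir.create(cache_dir)

x <- data.frame(a = 1:3)
first  <- toy_cache_write("x", x, cache_dir)  # nothing cached yet
second <- toy_cache_write("x", x, cache_dir)  # same data as in the cache
x$a <- x$a * 2L
third  <- toy_cache_write("x", x, cache_dir)  # data changed in memory
```

The comparison is between the object in memory and the cached copy. Nothing in this scheme looks at the original data file, which is why a changed source file goes unnoticed until the cached variable is cleared.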
