
Commit cbe8501

committed
ecology lesson updates
1 parent 1d6696f commit cbe8501


8 files changed

+1744
-270
lines changed


EcologyLesson/.DS_Store

2 KB
Binary file not shown.

EcologyLesson/landing/.DS_Store

6 KB
Binary file not shown.

EcologyLesson/landing/index.qmd

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
1+
---
2+
title: "Data Organization in Spreadsheets for Ecologists"
3+
---
4+
5+
Good data organization is the foundation of any research project. Most researchers have data in spreadsheets, so it’s the place that many research projects start.
6+
7+
We organize data in spreadsheets in the ways that we as humans want to work with the data, but computers require that data be organized in particular ways. In order to use tools that make computation more efficient, such as programming languages like R or Python, we need to structure our data the way that computers need the data. Since this is where most research projects start, this is where we want to start too!
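As a minimal sketch of what "structured the way computers need" can look like, the Python snippet below parses a small table laid out with one row per observation and one column per variable. The column names and values are hypothetical and are not taken from the lesson data.

```python
import csv
import io

# Hypothetical example data: one header row, then one row per observation.
TIDY_CSV = """species,plot,weight_g
DM,2,40
DM,3,48
PF,2,7
"""


def read_observations(text):
    """Parse CSV text into a list of dicts, one per observation."""
    return list(csv.DictReader(io.StringIO(text)))


observations = read_observations(TIDY_CSV)

# Every row has the same columns, so one expression handles all of them.
weights = [int(row["weight_g"]) for row in observations]
print(len(observations), "observations; mean weight:", sum(weights) / len(weights))
```

Because each column holds exactly one kind of value, the program needs no special cases for merged cells, notes in the margin, or multiple tables on one sheet.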
8+
9+
In this lesson, you will learn:
10+
11+
* Good data entry practices - formatting data tables in spreadsheets
12+
* How to avoid common formatting mistakes
13+
* Approaches for handling dates in spreadsheets
14+
* Basic quality control and data manipulation in spreadsheets
15+
* Exporting data from spreadsheets
16+
17+
In this lesson, however, you will not learn about data analysis with spreadsheets. Much of your time as a researcher will be spent in the initial ‘data wrangling’ stage, where you need to organize the data to perform a proper analysis later. It’s not the most fun, but it is necessary. In this lesson you will learn how to think about data organization and some practices for more effective data wrangling. With this approach you can better format current data and plan new data collection so less data wrangling is needed.
18+
19+
## Getting Started
20+
21+
Data Carpentry’s teaching is hands-on, so participants are encouraged to use their own computers to ensure the proper setup of tools for an efficient workflow.
22+
**These lessons assume no prior knowledge of the skills or tools.**
23+
24+
To get started, follow the directions in the “[Setup](../setuppage/index.qmd)” tab to download data to your computer and follow any installation instructions.
25+
26+
27+
### Prerequisites
28+
29+
This lesson requires a working copy of spreadsheet software, such as Microsoft Excel, LibreOffice, or OpenOffice.org (see more details in “[Setup](../setuppage/index.qmd)”).
30+
To use these materials most effectively, please make sure to install everything before working through this lesson.
31+
32+
## For Instructors
33+
If you are teaching this lesson in a workshop, please see the [Instructor notes](../instructornotes/index.qmd).
34+
35+
## Schedule
36+
37+
| | [Setup](../setuppage/index.qmd) | Download files required for the lesson |
38+
| 00:00 | 1. [Introduction](../intro/index.qmd) | What are basic principles for using spreadsheets for good data organization? |
39+
| 00:18 | 2. [Formatting data tables in spreadsheets](../formattingtables/index.qmd) | How do we format data in spreadsheets for effective data use? |
40+
| 00:53 | 3. [Formatting problems](../formattingproblems/index.qmd) | What are some common challenges with formatting data in spreadsheets and how can we avoid them? |
41+
| 01:13 | 4. [Dates as data](../datesasdata/index.qmd) | What are good approaches for handling dates in spreadsheets? |
42+
| 01:26 | 5. [Quality Control](../qualitycontrol/index.qmd) | How can we carry out basic quality control and quality assurance in spreadsheets? |
43+
| 01:46 | 6. [Exporting Data](../exporting/index.qmd) | How can we export data from spreadsheets in a way that is useful for downstream applications? |
44+
| 01:56 | Finish | |
45+
46+
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
47+
48+
Licensed under [CC-BY 4.0 2018–2022](https://datacarpentry.org/spreadsheet-ecology-lesson/00-intro/index.html) by [The Carpentries](https://carpentries.org/)
49+
50+
Licensed under [CC-BY 4.0 2016–2018](https://datacarpentry.org/spreadsheet-ecology-lesson/00-intro/index.html) by [Data Carpentry](http://datacarpentry.org/)

EcologyLesson/setuppage/index.qmd

Lines changed: 46 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,46 @@
1+
---
2+
title: "Setup"
3+
subtitle: "Data Organization in Spreadsheets for Ecologists"
4+
---
5+
6+
## Data
7+
Download this data file to your computer: [https://ndownloader.figshare.com/files/2252083](https://ndownloader.figshare.com/files/2252083)
8+
9+
## About the data
10+
11+
The data for this lesson is a part of the Data Carpentry Ecology workshop. It is a teaching version of the Portal Database. The data in this lesson is a subset of the teaching version that has been intentionally ‘messed up’ for this lesson.
12+
13+
The data for this lesson and the workshop are in the [Portal Project Teaching Database](https://figshare.com/articles/Portal_Project_Teaching_Database/1314459) available on FigShare, with a CC-BY license available for reuse.
14+
15+
Ernest, M., Brown, J., Valone, T., and White, E.P. (2017). Portal Project Teaching Database. Version 6. Figshare. [DOI: 10.6084/m9.figshare.1314459.v6](https://figshare.com/articles/Portal_Project_Teaching_Database/1314459)
16+
17+
## Software
18+
19+
To interact with spreadsheets, we can use LibreOffice, Microsoft Excel, Gnumeric, OpenOffice.org, or other programs. Commands may differ a bit between programs, but the general ideas for thinking about spreadsheets are the same.
20+
21+
22+
For this lesson, if you don’t have a spreadsheet program already, you can use LibreOffice. It’s a free, open source spreadsheet program.
23+
24+
### Windows
25+
26+
* Download the Installer
27+
* Install LibreOffice by going to [the installation page](https://www.libreoffice.org/download/libreoffice-fresh/). The version for Windows should automatically be selected. Click Download Version X.X.X (whichever is the most recent version). You will go to a page that asks about a donation, but you don’t need to make one. Your download should begin automatically.
28+
* Install LibreOffice
29+
* Once the installer is downloaded, double click on it and LibreOffice should install.
30+
31+
### Mac OS X
32+
33+
* Download the Installer
34+
* Install LibreOffice by going to [the installation page](https://www.libreoffice.org/download/libreoffice-fresh/). The version for Mac should automatically be selected. Click Download Version X.X.X (whichever is the most recent version). You will go to a page that asks about a donation, but you don’t need to make one. Your download should begin automatically.
35+
* Install LibreOffice
36+
* Once the installer is downloaded, double click on it and LibreOffice should install.
37+
38+
### Linux
39+
* Download the Installer
40+
* Install LibreOffice by going to [the installation page](https://www.libreoffice.org/download/libreoffice-fresh/). The version for Linux should automatically be selected. Click Download Version X.X.X (whichever is the most recent version). You will go to a page that asks about a donation, but you don’t need to make one. Your download should begin automatically.
41+
* Install LibreOffice
42+
* Once the installer is downloaded, double click on it and LibreOffice should install.
43+
44+
Licensed under [CC-BY 4.0 2018–2022](https://datacarpentry.org/spreadsheet-ecology-lesson/00-intro/index.html) by [The Carpentries](https://carpentries.org/)
45+
46+
Licensed under [CC-BY 4.0 2016–2018](https://datacarpentry.org/spreadsheet-ecology-lesson/00-intro/index.html) by [Data Carpentry](http://datacarpentry.org/)

docs/EcologyLesson/Intro/index.qmd

Lines changed: 89 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,89 @@
1+
---
2+
title: "Introduction"
3+
subtitle: "Data Organization in Spreadsheets for Ecologists"
4+
---
5+
6+
## Overview
7+
* **Teaching:** 15 min
8+
* **Exercises:** 3 min
9+
* **Questions**
10+
* What are basic principles for using spreadsheets for good data organization?
11+
* **Objectives**
12+
* Describe best practices for organizing data so computers can make the best use of data sets.
13+
14+
Good data organization is the foundation of your research project. Most researchers have data or do data entry in spreadsheets. Spreadsheet programs are very useful graphical interfaces for designing data tables and handling very basic data quality control functions.
15+
16+
## Spreadsheet outline
17+
18+
After this lesson, you will be able to:
19+
20+
* Implement best practices in data table formatting
21+
* Identify and address common formatting mistakes
22+
* Understand approaches for handling dates in spreadsheets
23+
* Utilize basic quality control features and data manipulation practices
24+
* Effectively export data from spreadsheet programs
25+
## Overall good data practices
26+
27+
Spreadsheets are good for data entry. Therefore we have a lot of data in spreadsheets. Much of your time as a researcher will be spent in this ‘data wrangling’ stage. It's not the most fun, but it's necessary. We'll teach you how to think about data organization and some practices for more effective data wrangling.
28+
29+
## What this lesson will not teach you
30+
31+
* How to do statistics in a spreadsheet
32+
* How to do plotting in a spreadsheet
33+
* How to write code in spreadsheet programs
34+
35+
If you're looking to do this, a good reference is [Head First Excel](https://www.amazon.com/Head-First-Excel-learners-spreadsheets/dp/0596807694/), published by O’Reilly.
36+
37+
## Why aren't we teaching data analysis in spreadsheets?
38+
39+
* Data analysis in spreadsheets usually requires a lot of manual work. If you want to change a parameter or run an analysis with a new data set, you usually have to redo everything by hand. (We do know that you can create macros, but see the next point.)
40+
41+
* It is also difficult to track or reproduce statistical or plotting analyses done in spreadsheet programs when you want to go back to your work or someone asks for details of your analysis.
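To illustrate the contrast, here is a minimal sketch in Python of an analysis written as a script rather than done by hand. The function name `mean_weight`, the `min_weight` parameter, and the sample values are all hypothetical and are not part of the lesson data.

```python
from statistics import mean


def mean_weight(weights_g, min_weight=0):
    """Return the mean of all weights at or above min_weight (grams)."""
    kept = [w for w in weights_g if w >= min_weight]
    return mean(kept)


# Hypothetical measurements from two field seasons.
season_one = [40, 48, 7, 52]
season_two = [38, 9, 45]

# Changing the threshold or the data set is a one-line change, and the
# script itself is a record of exactly what was done.
print(mean_weight(season_one))
print(mean_weight(season_one, min_weight=10))
print(mean_weight(season_two, min_weight=10))
```

Rerunning the script reproduces the same numbers, which is what makes it straightforward to answer questions about the analysis later.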
42+
43+
44+
## Spreadsheet programs
45+
46+
Many spreadsheet programs are available. Since most participants utilize Excel as their primary spreadsheet program, this lesson will make use of Excel examples.
47+
48+
Free spreadsheet programs that can also be used include LibreOffice Calc and Google Sheets.
49+
50+
Commands may differ a bit between programs, but the general idea is the same.
51+
52+
Spreadsheets encompass a lot of the things we need to be able to do as researchers. We can use them for:
53+
54+
* Data entry
55+
* Organizing data
56+
* Subsetting and sorting data
57+
* Statistics
58+
* Plotting
59+
60+
We do a lot of different operations in spreadsheets. What kind of operations do you do in spreadsheets? Which ones do you think spreadsheets are good for?
61+
62+
## Problems with Spreadsheets
63+
64+
Spreadsheets are good for data entry, but in reality we tend to use spreadsheet programs for much more than data entry. We use them to create data tables for publications, to generate summary statistics, and make figures.
65+
66+
Generating tables for publications in a spreadsheet is not optimal. Often, when formatting a data table for publication, we're reporting key summary statistics in a way that is not really meant to be read as data, and that often involves special formatting (merging cells, creating borders, making it pretty). Cutting and pasting from a spreadsheet into document software (like Word) can have unpredictable results. We advise you to create such tables within the document software itself, using its own table editing tools.
67+
68+
The latter two applications, generating statistics and figures, should be used with caution: because of the graphical, drag and drop nature of spreadsheet programs, it can be very difficult, if not impossible, to replicate your steps (much less retrace anyone else's), particularly if your stats or figures require you to do more complex calculations. Furthermore, in doing calculations in a spreadsheet, it's easy to accidentally apply a slightly different formula to multiple adjacent cells. When using a command-line based statistics program like R or SAS, it's practically impossible to apply a calculation to one observation in your data set but not another unless you're doing it on purpose.
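A small sketch of that last point, assuming Python rather than R or SAS: the conversion is written once and applied to every observation in a single step. The function name and the values are hypothetical.

```python
def grams_to_kilograms(weight_g):
    """Convert a single weight from grams to kilograms."""
    return weight_g / 1000


# Hypothetical weights in grams, one per observation.
weights_g = [40, 48, 7, 52]

# The same function is applied to every element, so no observation can
# silently end up with a different formula.
weights_kg = [grams_to_kilograms(w) for w in weights_g]
print(weights_kg)
```

In a spreadsheet, the equivalent step is a formula copied down a column, where a single overwritten cell can go unnoticed.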
69+
70+
## Using Spreadsheets for Data Entry and Cleaning
71+
72+
However, there are circumstances where you might want to use a spreadsheet program to produce “quick and dirty” calculations or figures, and data cleaning will help you use some of these features. Data cleaning also puts your data in a better format prior to importation into a statistical analysis program. We will show you how to use some features of spreadsheet programs to check your data quality along the way and produce preliminary summary statistics.
73+
74+
In this lesson, we will assume that you are most likely using Excel as your primary spreadsheet program. There are others (Gnumeric, Calc from OpenOffice), and their functionality is similar, but Excel seems to be the program most used by biologists and ecologists.
75+
76+
In this lesson we're going to talk about:
77+
78+
1. [Formatting data tables in spreadsheets](../formattingtables/index.qmd)
79+
2. [Formatting problems](../formattingproblems/index.qmd)
80+
3. [Dates as data](../datesasdata/index.qmd)
81+
4. [Quality control](../qualitycontrol/index.qmd)
82+
5. [Exporting data](../exporting/index.qmd)
83+
84+
## Key Points
85+
* Good data organization is the foundation of any research project.
86+
87+
Licensed under [CC-BY 4.0 2018–2022](https://datacarpentry.org/spreadsheet-ecology-lesson/00-intro/index.html) by [The Carpentries](https://carpentries.org/)
88+
89+
Licensed under [CC-BY 4.0 2016–2018](https://datacarpentry.org/spreadsheet-ecology-lesson/00-intro/index.html) by [Data Carpentry](http://datacarpentry.org/)
