1208 data warning message
mvanrongen committed Aug 12, 2024
1 parent 4722a62 commit 6650e02
Showing 2 changed files with 11 additions and 2 deletions.
5 changes: 3 additions & 2 deletions _freeze/materials/execute-results/html.json
@@ -1,7 +1,8 @@
{
"hash": "81e5aa7cbb6e8c68398a3b970a16532a",
"hash": "3eefa49f221f833ccfad6601268ec8fa",
"result": {
"markdown": "---\ntitle: \"Data\"\nsubtitle: Detailed course materials can be found in this section, including exercises to practice. If you are a self-learner, make sure to check the [setup page](setup.qmd).\n---\n\n::: {.cell}\n\n:::\n\n\n## Data {#index-datasets}\nThe data we will be using throughout all the sessions are contained in a single ZIP file. They are all small CSV files (comma separated values). You can download the data below:\n\n\n::: {.cell}\n::: {.cell-output-display}\n```{=html}\n<a href=\"https://github.com/cambiotraining/corestats/raw/main/data_CS.zip\">\n<button class=\"btn btn-primary\"><i class=\"fa fa-save\"></i> Download ZIP file</button>\n</a>\n```\n:::\n:::\n\n\n## Tidy data\nFor two samples the data can be stored in one of three formats:\n\n1.\tas two separate vectors,\n2.\tin a stacked data frame,\n3.\tor in an unstacked data frame/list.\n\nTwo separate vectors case is (hopefully) obvious.\n\nWhen using a data frame we have different options to organise our data. The best way of formatting data is by using [the tidy data format](https://r4ds.had.co.nz/tidy-data.html).\n\n:::highlight\nTidy data has the following properties:\n\n- Each variable has its own column\n- Each observation has its own row\n- Each value has its own cell\n:::\n\nStacked form (or [long format data](https://tidyr.tidyverse.org/reference/pivot_longer.html)) is where the data is arranged in such a way that each variable (thing that we measured) has its own column. 
If we consider a dataset containing meerkat weights (in g) from two different countries then a stacked format of the data would look like:\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n```\n# A tibble: 6 × 2\n country weight\n <chr> <dbl>\n1 Botswana 514\n2 Botswana 568\n3 Botswana 519\n4 Uganda 624\n5 Uganda 662\n6 Uganda 633\n```\n:::\n:::\n\n\nIn the unstacked (or [wide format](https://tidyr.tidyverse.org/reference/pivot_wider.html)) form a variable (measured thing) is present in more than one column. For example, let's say we measured meerkat weight in two countries over a period of years. We could then organise our data in such a way that for each year the measured values are split by country:\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n```\n# A tibble: 3 × 3\n year Botswana Uganda\n <dbl> <dbl> <dbl>\n1 1990 514 624\n2 1992 568 662\n3 1995 519 633\n```\n:::\n:::\n\n\nHaving tidy data is the easiest way of doing analyses in programming languages and I would strongly encourage you all to start adopting this format as standard for data collection and processing.\n\n## Conditional operators\n\nTo set filtering conditions, use the following *relational operators*:\n\n- `>` is greater than\n- `>=` is greater than or equal to\n- `<` is less than\n- `<=` is less than or equal to\n- `==` is equal to\n- `!=` is different from\n- `%in%` is contained in\n\nTo combine conditions, use the following *logical operators*:\n\n- `&` AND\n- `|` OR\n",
"engine": "knitr",
"markdown": "---\ntitle: \"Data\"\nsubtitle: Detailed course materials can be found in this section, including exercises to practice. If you are a self-learner, make sure to check the [setup page](setup.qmd).\n---\n\n::: {.cell}\n\n:::\n\n\n## Data {#index-datasets}\nThe data we will be using throughout all the sessions are contained in a single ZIP file. They are all small CSV files (comma separated values). You can download the data below:\n\n\n::: {.cell}\n::: {.cell-output-display}\n\n```{=html}\n<a href=\"https://github.com/cambiotraining/corestats/raw/main/data_CS.zip\">\n<button class=\"btn btn-primary\"><i class=\"fa fa-save\"></i> Download ZIP file</button>\n</a>\n```\n\n:::\n:::\n\n\n::: {.callout-warning}\nThe data we use throughout the course is varied, covering many different topics.\nIn some cases the data on medical or socioeconomic topics may be uncomfortable\nto some, since they can touch on diseases or death.\n\nAll the data are chosen for their pedagogical effectiveness.\n:::\n\n## Tidy data\nFor two samples the data can be stored in one of three formats:\n\n1.\tas two separate vectors,\n2.\tin a stacked data frame,\n3.\tor in an unstacked data frame/list.\n\nTwo separate vectors case is (hopefully) obvious.\n\nWhen using a data frame we have different options to organise our data. The best way of formatting data is by using [the tidy data format](https://r4ds.had.co.nz/tidy-data.html).\n\n:::highlight\nTidy data has the following properties:\n\n- Each variable has its own column\n- Each observation has its own row\n- Each value has its own cell\n:::\n\nStacked form (or [long format data](https://tidyr.tidyverse.org/reference/pivot_longer.html)) is where the data is arranged in such a way that each variable (thing that we measured) has its own column. 
If we consider a dataset containing meerkat weights (in g) from two different countries then a stacked format of the data would look like:\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 6 × 2\n country weight\n <chr> <dbl>\n1 Botswana 514\n2 Botswana 568\n3 Botswana 519\n4 Uganda 624\n5 Uganda 662\n6 Uganda 633\n```\n\n\n:::\n:::\n\n\nIn the unstacked (or [wide format](https://tidyr.tidyverse.org/reference/pivot_wider.html)) form a variable (measured thing) is present in more than one column. For example, let's say we measured meerkat weight in two countries over a period of years. We could then organise our data in such a way that for each year the measured values are split by country:\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 3 × 3\n year Botswana Uganda\n <dbl> <dbl> <dbl>\n1 1990 514 624\n2 1992 568 662\n3 1995 519 633\n```\n\n\n:::\n:::\n\n\nHaving tidy data is the easiest way of doing analyses in programming languages and I would strongly encourage you all to start adopting this format as standard for data collection and processing.\n\n## Conditional operators\n\nTo set filtering conditions, use the following *relational operators*:\n\n- `>` is greater than\n- `>=` is greater than or equal to\n- `<` is less than\n- `<=` is less than or equal to\n- `==` is equal to\n- `!=` is different from\n- `%in%` is contained in\n\nTo combine conditions, use the following *logical operators*:\n\n- `&` AND\n- `|` OR\n",
"supporting": [
"materials_files"
],
8 changes: 8 additions & 0 deletions materials.qmd
@@ -25,6 +25,14 @@ download_link(
)
```

::: {.callout-warning}
The data we use throughout the course is varied, covering many different topics.
In some cases the data on medical or socioeconomic topics may be uncomfortable
to some, since they can touch on diseases or death.

All the data are chosen for their pedagogical effectiveness.
:::

## Tidy data
For two samples the data can be stored in one of three formats:
