uvastatlab
diff --git a/‎complex_reshaping_example.html‎
Lines changed: 3412 additions & 0 deletions b/‎complex_reshaping_example.html‎
Lines changed: 3412 additions & 0 deletions
diff --git a/‎complex_reshaping_example.qmd‎
Lines changed: 114 additions & 0 deletions b/‎complex_reshaping_example.qmd‎
Lines changed: 114 additions & 0 deletions
@@ -0,0 +1,114 @@
+---
+title: "Complicated reshaping example"
+author: "Clay Ford"
+date: "February 15, 2024"
+format: 
+  html:
+    embed-resources: true
+---
+
+## The task
+
+The data set in question had six different sets of variables that needed to be reshaped into "long" format.  Each set was collected over five waves.
+
+```{r echo=FALSE}
+library(haven)
+d <- read_sav("SupportandERI.sav")
+# reorder columns
+vars <- c("MEI_affirm", "w2MEI_affirm", "w3MEI_affirm", "w4MEI_affirm", 
+          "w5MEI_affirm", "MEI_explor", "w2MEI_explor", "w3MEI_explor",
+          "w4MEI_explor", "w5MEI_explor")
+d <- d[,c(names(d)[1:6], vars, names(d)[17:36])]
+names(d)[c(7, 12)] <- c("w1MEI_affirm", "w1MEI_explor")
+```
+
+For example, here's one set:
+
+```{r}
+d[1:2,7:11]
+```
+
+There are five more sets like that in the data. For each I needed to reshape such that there was a column for time and a column identifying the type of measure. Something like this...
+
+```
+time    affirm
+1       3
+2       2
+3       2.33
+4       3
+5       3
+1       4.33
+2       4
+...
+```
+
+## The original approach
+
+My first attempt was pretty verbose but it worked. You can tell my mastery of `pivot_longer()` did not exceed much beyond the basics. Yes, I knew how to use the `names_transform` argument to extract the numbers from the column labels, but that was about it.
+
+```{r}
+library(tidyr)
+
+# reshape affirm
+d1 <- pivot_longer(d[,1:11], -c(1:6), names_to = "time",
+                   values_to = "affirm", 
+                   names_transform = readr::parse_number)
+
+# reshape explor
+d2 <- pivot_longer(d[,c(1, 12:16)], -1, names_to = "time",
+                   values_to = "explor", 
+                   names_transform = readr::parse_number)
+
+
+# reshape appSUM
+d3 <- pivot_longer(d[,c(1, 17:21)], -1, names_to = "time",
+                   values_to = "appSUM", 
+                   names_transform = readr::parse_number)
+
+# reshape emoSUM
+d4 <- pivot_longer(d[,c(1, 22:26)], -1, names_to = "time",
+                   values_to = "emoSUM", 
+                   names_transform = readr::parse_number)
+
+# reshape infoSUM
+d5 <- pivot_longer(d[,c(1, 27:31)], -1, names_to = "time",
+                   values_to = "infoSUM", 
+                   names_transform = readr::parse_number)
+
+# reshape racematch
+d6 <- pivot_longer(d[,c(1, 32:36)], -1, names_to = "time",
+                   values_to = "racematch", 
+                   names_transform = readr::parse_number)
+
+# merge reshaped dataframes into one
+d_long1 <- merge(d1, d2, by = c("P_ID", "time")) |>
+  merge(d3, by = c("P_ID", "time")) |> 
+  merge(d4, by = c("P_ID", "time")) |> 
+  merge(d5, by = c("P_ID", "time")) |> 
+  merge(d6, by = c("P_ID", "time"))
+
+head(d_long1, n = 5)
+```
+
+This did the trick and I was able to move on, but it didn't sit right with me. I thought I remembered Jacob demonstrating in [his data wrangling workshop](https://virginia.box.com/s/u3ojzf0c0xuxe13yohk8noi4di8zb4ri) how to reshape datasets similar to this using just _one call_ to `pivot_longer()`. So today I decided to revisit this code and see what I could do with it.
+
+## The elegant approach
+
+After some fiddling around with the `pivot_longer()` function and reading the {tidyr} "Pivoting" vignette, I was finally able to implement this in a much more elegant fashion. (I should have referred to Jacob's workshop materials, but I didn't have them handy and decided to see if I could figure it out on my own. Of course, once I did it, I then had to go download his materials and check if this approach is what he demonstrated, and it is!)
+
+```{r}
+d_long2 <- pivot_longer(d, cols = !c(1:6), 
+                       names_to = c("time", ".value"), 
+                       names_pattern = "(w[1-5])(.+)",
+                       names_transform = list(time = readr::parse_number))
+head(d_long2, n = 5)
+```
+
+This is for my benefit so I remember how this works:
+
+- `cols = !c(1:6)` says to reshape all columns but the first six
+- `names_to = c("time", ".value")` uses the keyword `.value` which "indicates that the corresponding component of the column name defines the name of the output column containing the cell values" (from the help page).
+- `names_pattern = "(w[1-5])(.+)"` defines the pattern of the column names: "the letter w followed by a number 1-5, `(w[1-5])`, and then everything else, `(.+)`.
+- `names_transform = list(time = readr::parse_number)` says to transform the time column to a number using the `parse_number` function. 
+
+In the grand scheme, I'm not sure this makes a difference. Most people don't care about data wrangling but rather the analysis. Although my first attempt was inefficient, it was easy for me to verify that each set of columns was successfully reshaped. I don't know if I'll ever get to a place where it becomes routine for me to reshape data like this casually using one call to `pivot_longer()`. But I thought it might be of interest to see what `pivot_longer()` is capable of.