tutorial_pages/real-life-example.qmd

# Real-life example

This is a walk through one relatively simple simulation written to check whether the following two models would provide the same results: 

* A generalized linear model based on a contingency table of counts (Poisson distribution).  
* A generalized linear model with one line per observation and the occurrence of the variable of interest coded as 'Yes' or 'No' (binomial distribution).  

I created this code while preparing my preregistration for a simple behavioural ecology experiment about methods for independently manipulating palatability and colour in small insect prey ([article](https://doi.org/10.1371/journal.pone.0231205), [OSF preregistration](https://osf.io/f8uk9?view_only=3943e7bb9c5f4effbf119ca5b062fe80)).  

The R script screenshot below, `glm_Freq_vs_YN.R`, can be found in the folder [Ihle2020](https://github.com/lmu-osc/Introduction-Simulations-in-R/tree/main/Ihle2020).  

This walkthrough will use the steps as defined on the page '[General structure](./general-structure.qmd)'.


1. **Define sample sizes** (within a dataset and number of repetitions), **experimental design** (fixed dataset structure, e.g. treatment groups, factors), and **parameters** that will need to vary (here, the strength of the effect).  

    <img src="../assets/define.png" width="1000">  
    <br/>

2. **Generate data** (here, using `sample()` and the probabilities defined in step 1) and format it in two different ways to accommodate the two statistical tests to be compared. 

    <img src="../assets/generate.png" width="1000">  
    <br/>

3. **Run the statistical test and save the parameter estimate of interest for that iteration**. Here, this is done for both statistical tests to be compared.    

    <img src="../assets/test.png" width="1000">  
    <br/>


4. **Repeat** steps 2 (data simulation) and 3 (data analyses) to get the distribution of the parameter estimates by wrapping these steps in a function.  

    Definition of the function at the beginning: 
    <br/>
    <img src="../assets/replicate1.png" width="800">    
    <br/>
    Output returned from the function at the end: 
    <br/>
    <img src="../assets/replicate2.png" width="1000">  
    <br/>
    Repeat the function `nrep` times. Here, `pbreplicate()` is used to provide a bar of progress for R to run this command. 
    <br/>
    <img src="../assets/replicate3.png" width="1000">  
    <br/>

5. **Explore the parameter space**. Here, vary the probabilities of sampling between 0 and 1 depending on the treatment group category.

    <img src="../assets/explore.png" width="1000">  
    <br/>

6. **Analyse and interpret the combined results of many simulation repetitions**. In this case, the results of the two models were qualitatively the same (comparison of results for a few different parameter values), and both models gave the same expected 5% false positive results when no effect was simulated. Varying the effect (the probability of sampling 0 or 1 depending on the experimental treatment) allowed us to find the minimum effect size for which the number of positive results of the tests is over 80%. 

    <img src="../assets/conclude.png" width="1000">  
    <br/>


***