diff --git a/02_activities/assignments/a1_sampling_and_reproducibility.ipynb b/02_activities/assignments/a1_sampling_and_reproducibility.ipynb
index 11852458..f3ebad6a 100644
--- a/02_activities/assignments/a1_sampling_and_reproducibility.ipynb
+++ b/02_activities/assignments/a1_sampling_and_reproducibility.ipynb
@@ -16,7 +16,12 @@
    "cell_type": "markdown",
    "id": "4ea73db3",
    "metadata": {},
-   "source": []
+   "source": [
+    "1. Sampling occurs at three stages: infection, primary tracing, and secondary tracing.\n",
+    "2. At the infection stage, simple random sampling without replacement selects 100 of the 1000 attendees. The function used is np.random.choice with replace=False. The underlying distribution is hypergeometric.\n",
+    "3. At the primary tracing stage, a uniform random number is drawn for each infected person and compared to TRACE_SUCCESS = 0.2. The function used is np.random.rand(k), compared against TRACE_SUCCESS. The sampling frame is the infected people, and the sample size is random and follows a binomial distribution. The underlying distribution for each person is Bernoulli.\n",
+    "4. At the secondary tracing stage, traced cases are counted for each event type (wedding or brunch). If an event type has 2 or more traced cases, all infected people in that event type are marked as traced. The function used is value_counts() per event type. The sampling frame is the infected individuals within each event type, and the sample size depends on how many infected people exist in an event that meets the threshold. For example, if at least two wedding attendees were traced in the primary stage, then all infected wedding attendees become traced. The underlying distribution is binomial."
+   ]
   },
   {
    "cell_type": "markdown",
@@ -30,7 +35,10 @@
    "cell_type": "markdown",
    "id": "4cf5d993",
    "metadata": {},
-   "source": []
+   "source": [
+    "1. Changing the repetitions to 10: the graph appears sparse. There is only one bar for the proportion of infections from weddings. The output changes drastically between executions.\n",
+    "2. Changing the repetitions to 100: the graph is close to the original with 1000 repetitions. Compared to 10 repetitions, the graphs are more stable across executions."
+   ]
   },
   {
    "cell_type": "markdown",
@@ -44,7 +52,9 @@
    "cell_type": "markdown",
    "id": "77613cc3",
    "metadata": {},
-   "source": []
+   "source": [
+    "I added a call to np.random.seed(50) before running the simulations. This gives the random number generator a fixed starting point so that each execution produces the same output."
+   ]
   },
   {
    "cell_type": "markdown",
@@ -59,7 +69,18 @@
    "execution_count": null,
    "id": "ab8587a0",
    "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "data": {
+      "image/png": "",
+      "text/plain": [
+       ""
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
    "source": [
     "# Import necessary libraries\n",
     "import pandas as pd\n",
@@ -131,7 +152,9 @@
     " return p_wedding_infections, p_wedding_traces\n",
     "\n",
     "# Run the simulation 1000 times\n",
-    "results = [simulate_event(m) for m in range(1000)]\n",
+    "np.random.seed(50)\n",
+    "\n",
+    "results = [simulate_event(m) for m in range(10)]\n",
     "props_df = pd.DataFrame(results, columns=[\"Infections\", \"Traces\"])\n",
     "\n",
     "# Plotting the results\n",
@@ -193,7 +216,7 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "Python 3",
+   "display_name": "sampling-env",
    "language": "python",
    "name": "python3"
   },
@@ -207,7 +230,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.13.0"
+   "version": "3.11.13"
   }
  },
  "nbformat": 4,