diff --git a/.github/workflows/deploy.yml b/.github/workflows/deploy.yml index 143ed62f..3cb2da26 100644 --- a/.github/workflows/deploy.yml +++ b/.github/workflows/deploy.yml @@ -3,6 +3,8 @@ on: push: branches: - master + schedule: + - cron: '0 1 * * *' # run every day at 01:00 UTC jobs: build-and-deploy: runs-on: ubuntu-18.04 diff --git a/README.md b/README.md index 9d9cec67..df0475cc 100644 --- a/README.md +++ b/README.md @@ -13,16 +13,16 @@ situation for their local environment. The ultimate goal is to influence individual behavior, to decrease the spread. The goal is to reach the general public, not experts familiar with graphs -and numbers. For this reason, effort is put on simplifying the +and numbers. For this reason, we put a lot of effort into simplifying the visualization and putting it along simple text. The predictions and the associated text should be trustworthy, hence be solid and sober, rather than fancy and dramatic. -## Well thougt-out visualization on COVID-19 +## Well thought-out visualization on COVID-19 COVID-19 is a serious issue and our visualization and data analysis needs -to be thought through serious. The following is a good read: +to be approached in a thoughtful, serious manner. The following would be a good read:
https://medium.com/nightingale/ten-considerations-before-you-create-another-chart-about-covid-19-27d3bd691be8 # Development workflow @@ -70,3 +70,6 @@ The Makefile Care is taken to have a static page, to be able to handle the load with many visits. + +An automatic scheduled job is launched each day at 1:00 am (UTC) to build the +website and update it with the latest available data. diff --git a/make_figures.py b/make_figures.py index 088ae47f..663db9ac 100644 --- a/make_figures.py +++ b/make_figures.py @@ -45,7 +45,13 @@ def make_map(df, df_fatalities, df_recovered): df_recovered['value']], color_continuous_scale='Plasma_r', labels={'color': 'Active<br>cases<br>per<br>Million'}) + + fig.update_geos(lataxis_range=[-80, 90], + lonaxis_range=[-165, 180] + ) fig.update_layout(title='Click on map to add/remove a country', + yaxis=dict(scaleanchor='x', + scaleratio=10), coloraxis_colorbar_tickprefix='1.e', coloraxis_colorbar_len=0.6, coloraxis_colorbar_title_font_size=LABEL_FONT_SIZE, @@ -144,12 +150,18 @@ def make_timeplot(df_measure, df_prediction): method="update", ), dict( - args=["yaxis", {'type':'log'}], + args=[{'yaxis': {'type':'log'}, + "legend": {'x':0.65, 'y':0.1, + "font":{"size":18}, + }}], label="log", method="relayout", ), dict( - args=["yaxis", {'type':'linear'}], + args=[{'yaxis': {'type':'linear'}, + "legend": {'x':0.05, 'y':0.8, + "font":{"size":18}, + }}], label="lin", method="relayout", ), @@ -171,8 +183,27 @@ def make_timeplot(df_measure, df_prediction): # The legend position + font size # See https://plot.ly/python/legend/#style-legend legend=dict(x=.05, y=.8, font_size=LABEL_FONT_SIZE, - title="Active cases in"), -) + ) + ) + fig.add_annotation( + x=0.1, + y=0.95, + xref='paper', + yref='paper', + showarrow=False, + font_size=LABEL_FONT_SIZE, + text="Active cases") + fig.add_annotation( + x=1, + y=-0.13, + xref='paper', + yref='paper', + showarrow=False, + font_size=LABEL_FONT_SIZE - 6, + font_color="DarkSlateGray", + text="Drag handles below to change time window", + align="right") + return fig diff --git a/text_block.md b/text_block.md index 6ec67788..3effd5c2 100644 --- a/text_block.md +++ b/text_block.md @@ -2,23 +2,23 @@ These visualizations give predictions about the future number of active COVID-19 cases. The predictions are based on extrapolating the growth observed in a given country over the last two weeks. -These predictions are only short-term extrapolation: predicting the future is hard, and epidemic dynamics will change with changes in public health measures, social interaction patterns, or even weather. 
Please also keep in mind that each data point in these visualizations represents a person who has suffered or lost their life to this disease. +These predictions are only short-term extrapolations: predicting the future is hard, and epidemic dynamics will change with changes in public health measures, social interaction patterns, or even weather. Please also keep in mind that each data point in these visualizations represents a person who has suffered or lost their life to this disease. ## Understanding exponential growth -In their early stages, outbreaks display *exponential growth*: the number of cases grows as a multiple of itself. Let's say that Patient Zero infects two people, and then each of those infects two more people, and so on. The number of infected people will grow by a larger amount each day -- two on the first day, four on the second day, eight on the third day, and so on. This is what we call exponential growth, because the number of cases on each day is some number raised to the power of the number of days. +In their early stages, outbreaks display *exponential growth*: the number of cases grows as a multiple of itself. Let's say that Patient Zero infects two people, and then each of those infects two more people, and so on. The number of infected people will grow by a larger amount each day -- two on the first day, four on the second day, eight on the third day, and so on. This is what we call exponential growth because the number of cases on each day is some number raised to the power of the number of days. For a deeper explanation of how exponential growth relates to epidemics, see [this video](https://www.youtube.com/watch?v=Kas0tIxDvrg). ### The growth rate is not only a property of the virus -The local growth of an outbreaks is related to how likely one infected individual is to transmit the disease to another person. 
It is related to properties of the virus (such as how long it can stay on a surface), but also to how much people interact with each other, and public health measures such hand washing. +The local growth of an outbreak is related to how likely one infected individual is to transmit the disease to another person. It is related to properties of the virus (such as how long it can stay on a surface), but also to how much people interact with each other, and public health measures such as hand washing. ### Plotting in log scale The plot of cases over time includes two different options: The linear plot shows the actual count of cases, while the log plot shows the *logarithm* of the number of cases - which is basically the number of times one has to multiply the number 10 in order to get the number of cases. This logarithm view has a direct relationship with the exponential growth of the epidemic: in such a view, an exponential growth appears as a straight line. You can think of the logarithm as the opposite of the exponential. -In addition, the log plot lets us more easily see the relationships between trends over time when the actual numbers are very different. Because the logarithm increasingly compresses large numbers, it makes it easier to see whether the rate of increase is similar between two countries, even when one has many more cases than the other. +Besides, the log plot lets us more easily see the relationships between trends over time when the actual numbers are very different. Because the logarithm increasingly compresses large numbers, it makes it easier to see whether the rate of increase is similar between two countries, even when one has many more cases than the other. # Where do the data come from? @@ -36,7 +36,7 @@ Those who want to know more details about how the estimates are computed can fin ## How can you be sure that the forecast is accurate? -We cannot. We are simply using the data to project further growth. 
However, you can see that the model has done well at predicting the growth rate over the last two weeks. The model should be relatively accurate for the next few days, but becomes less accurate for farther-out days. +We cannot. We are simply using the data to project further growth. However, you can see that the model has done well at predicting the growth rate over the last two weeks. The model should be relatively accurate for the next few days but becomes less accurate for farther-out days. # What are the potential biases in the data? @@ -44,14 +44,18 @@ We cannot. We are simply using the data to project further growth. However, you Accurate measurements of health across populations are difficult. There are many sources of bias in the data. ## Reporting biases -Perhaps the greatest bias is that cases can only be counted if they seek out medical care or are tested. COVID-19 appears to cause mild or no symptoms in a sizeable proportion of people, which means that the reported counts underestimate the true total number of infected persons. This could also cause biases between countries --- for example, if people are told to stay home unless their disease worsens, then fewer cases will be detected than if people are told to seek medical care for mild symptoms and receive testing for the virus. In addition, some countries test systematically many individuals, while other countries only test individuals with severe symptoms. This testing strategy, as well as well as the diagnostic criteria, may vary across time in a given country. +Perhaps the greatest bias is that cases can only be counted if they seek out medical care or are tested. COVID-19 appears to cause mild or no symptoms in a sizeable proportion of people, which means that the reported counts underestimate the true total number of infected persons. 
This could also cause biases between countries --- for example, if people are told to stay home unless their disease worsens, then fewer cases will be detected than if people are told to seek medical care for mild symptoms and receive testing for the virus. Also, some countries systematically test many individuals, while other countries only test individuals with severe symptoms. This testing strategy, as well as the diagnostic criteria, may vary across time in a given country. ## Test accuracy -A perfect diagnostic test would provide a positive result for every infected person, and a negative result for every non-infected person. Unfortunately, it is almost impossible to create such a perfect test, so all diagnostic tests will result in some errors. These can either be *false positive* errors (that is, saying that someone is infected when they are not), or a *false negative* error (saying that a person is not infected when they actually are). For example, the commonly used rapid tests for flu viruses have false negative rates of 30-70% and false positive rates of about 10%. We don't yet know the error rates for the various testing methods in use for SARS-CoV-2, but we have already seen that the that test intially developed by the US Centers for Disease Control [had high rates of false positive results](https://www.propublica.org/article/cdc-coronavirus-covid-19-test). +A perfect diagnostic test would provide a positive result for every infected person, and a negative result for every non-infected person. Unfortunately, it is almost impossible to create such a perfect test, so all diagnostic tests will result in some errors. These can either be *false positive* errors (that is, saying that someone is infected when they are not), or a *false negative* error (saying that a person is not infected when they actually are). For example, the commonly used rapid tests for flu viruses have false negative rates of 30-70% and false positive rates of about 10%. 
We don't yet know the error rates for the various testing methods in use for SARS-CoV-2, but we have already seen that the test initially developed by the US Centers for Disease Control [had high rates of false positive results](https://www.propublica.org/article/cdc-coronavirus-covid-19-test). ## Population differences -There are differences between populations within and across countries that could affect the spread of the disease. For example, the prevalence of chronic lung diseases (which increase the risk of severe COVID-19 infection) [vary between countries and between urban and rural environments](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4693508). Differences in population density and in local customs (such as hand-shaking or face-kissing greetings) could also affect the rates of disease transmission between different countries. In addition, the age distribution varies across countries, and as a consequence a larger fraction of the population is at risk in certain countries compared to others. +There are differences between populations within and across countries that could affect the spread of the disease. For example, the prevalence of chronic lung diseases (which increase the risk of severe COVID-19 infection) [varies between countries and between urban and rural environments](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4693508). Differences in population density and in local customs (such as hand-shaking or face-kissing greetings) could also affect the rates of disease transmission between different countries. In addition, the age distribution varies across countries, and as a consequence, a larger fraction of the population is at risk in certain countries compared to others. + +# More detailed data + +An even more detailed visualization of the Coronavirus situation can be found here: [Coronavirus Disease (COVID-19) – Statistics and Research](https://ourworldindata.org/coronavirus)
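
A note on the "Plotting in log scale" text in `text_block.md`, not part of this change: the claim that exponential growth appears as a straight line in the log view can be checked with a few lines of Python. This is only a minimal sketch under stated assumptions. `toy_cases` is a hypothetical helper, and the doubling series is made-up illustration data following the "two, four, eight" example in the text, not data from this repository.

```python
import math


def toy_cases(days, initial=2, factor=2):
    """Hypothetical toy series: the case count is multiplied by `factor` each day."""
    return [initial * factor ** day for day in range(days)]


cases = toy_cases(6)
log_cases = [math.log10(count) for count in cases]

# Day-to-day increments on the linear scale keep growing...
linear_steps = [later - earlier for earlier, later in zip(cases, cases[1:])]

# ...while increments on the log scale stay constant, i.e. a straight line.
log_steps = [later - earlier for earlier, later in zip(log_cases, log_cases[1:])]

print("cases:       ", cases)
print("linear steps:", linear_steps)
print("log10 steps: ", [round(step, 4) for step in log_steps])
```

Each log-scale step equals `log10(factor)`, so the slope of the line in the log plot reflects the daily growth factor. This is why two countries with very different case counts but similar growth rates show up as roughly parallel lines.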