class: center, middle, inverse, title-slide

# Bio-demographic Laboratory
## Lecture 1
### Nicola Barban
Alma Mater Studiorum Università di Bologna
Department of Statistical Sciences

### 16 February 2021
---

# Outline

1. Why visualize data?
2. Ugly, bad, and wrong
3. Data matters
4. Some types of plots

---

# Data visualization

**If you are a scientist, an analyst, a consultant, or anybody else who has to prepare technical documents or reports, one of the most important skills you need to have is the ability to make compelling data visualizations, generally in the form of figures.** (Claus O. Wilke) [link](https://clauswilke.com/dataviz/)

* A plot must describe the data accurately. (If one quantity is twice as large as another but the two look the same in the plot, the plot is wrong.)
* A plot should not be ugly.
* Every visualization must be tailored to its audience.
* "Keep it simple."
* Pay attention to the title, and describe the quantities shown in your figures appropriately.

---

# Know your audience

* Who will you present your work to?
  + other statisticians (specialist audience)
  + policy makers
  + the press
* What kind of presentation?
  + written: scientific journal, report, etc.
  + oral
  + online (interactive?)
* What do you want to communicate?

---

# Ugly, bad, and wrong figures

.pull-left[
a) A bar plot showing three values (A = 3, B = 5, and C = 4).

b) An ugly version of part (a).

c) A bad version of part (a). Each bar is shown with its own y-axis scale. Because the scales don’t align, this makes the figure misleading.

d) A wrong version of part (a). Without an explicit y-axis scale, the numbers represented by the bars cannot be ascertained. The bars appear to be of lengths 1, 3, and 2, even though the values displayed are meant to be 3, 5, and 4.
*(Claus Wilke, Fundamentals of Data Visualization)*
]
.pull-right[
<img src="images/ugly-bad-wrong-examples-1.png" width="100%" style="display: block; margin: auto;" />
]

---

# Bad visualizations

* https://badvisualisations.tumblr.com
* https://analythical.com/blog/examples-of-awful-data-visualization
* https://cs.colby.edu/courses/S14/cs251/goodbad.php

---

# Good visualizations

---

[How John Burn-Murdoch’s Influential Dataviz Helped The World Understand Coronavirus](https://medium.com/nightingale/how-john-burn-murdochs-influential-dataviz-helped-the-world-understand-coronavirus-6cb4a09795ae)

---

# Spotting anomalies in the data

*Anscombe's quartet* consists of four datasets whose most common descriptive statistics are practically identical, yet which look very different from one another once plotted.

<div class="figure" style="text-align: center">
<img src="lezione1_files/figure-html/anscombe_fig-1.png" alt="Anscombe's quartet" width="504" />
<p class="caption">Anscombe's quartet</p>
</div>

---

# A real example

.pull-left[
* Hewitt (1977), in a paper published in *American Sociological Review*, reports a significant association between *voter turnout and income inequality* based on a quantitative analysis of 18 countries.
]
.pull-right[
<img src="images/ch-01-jackman-outlier.png" width="100%" style="display: block; margin: auto;" />
]

---

# A real example

.pull-left[
* Hewitt (1977), in a paper published in *American Sociological Review*, reports a significant association between *voter turnout and income inequality* based on a quantitative analysis of 18 countries.
* Jackman (1980) uses a scatterplot to show that the relationship depends **entirely** on the inclusion of South Africa in the sample.
]
.pull-right[
<img src="images/ch-01-jackman-outlier.png" width="100%" style="display: block; margin: auto;" />
]

---

# Simulation of data mechanisms

<img src="lezione1_files/figure-html/janhove_fig-1.png" width="504" style="display: block; margin: auto;" />

---

# Data dinosaur

---

# Know your data

| Type of variable | Examples | Appropriate scale | Description |
|---|---|---|---|
| quantitative/numerical continuous | 1.3, 5.7, 83, 1.5×10⁻² | continuous | Arbitrary numerical values. These can be integers, rational numbers, or real numbers. |
| quantitative/numerical discrete | 1, 2, 3, 4 | discrete | Numbers in discrete units. These are most commonly but not necessarily integers. For example, the numbers 0.5, 1.0, 1.5 could also be treated as discrete if intermediate values cannot exist in the given dataset. |
| qualitative/categorical unordered | dog, cat, fish | discrete | Categories without order. These are discrete and unique categories that have no inherent order. These variables are also called factors. |
| qualitative/categorical ordered | good, fair, poor | discrete | Categories with order. These are discrete and unique categories with an order. For example, “fair” always lies between “good” and “poor”. These variables are also called ordered factors. |
| date or time | Jan. 5 2018, 8:03am | continuous or discrete | Specific days and/or times. Also generic dates, such as July 4 or Dec. 25 (without year). |
| text | The quick brown fox jumps over the lazy dog. | none, or discrete | Free-form text. Can be treated as categorical if needed. |

---

“How Stable Are Democracies? ‘Warning Signs Are Flashing Red’” (Taub, 2016, *New York Times*).

---

# Original question in the survey

How important is it to live in a democracy? From (1) "Not important at all" to (10) "Absolutely important".

* **The NYT chart shows only the age differences in the proportion of people who gave a score of 10.**

---

# Graphical elements

---

# Quantities

---

# Quantities (multiple)

---

# Distributions

---

# Distributions (multiple)

---

# Proportions

---

# Proportions (multiple)

---

# Bivariate (scatterplots)

---

# Bivariate (lines)

---

# Bivariate (areas)

---

# Cartograms

---

# Error bars

---

# Confidence intervals

---

# Questions?
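---

# Appendix: Anscombe's quartet in numbers

The Anscombe's quartet slide claims that the four datasets share practically identical descriptive statistics. The minimal Python sketch below is one way to check that claim numerically; it is not part of the course material, and Python is used here only for illustration. The data are the values published by Anscombe (1973). The names `QUARTET` and `summarize` are illustrative helpers, not functions from any library.

```python
# Sketch: summary statistics of Anscombe's quartet (illustrative helper names).
from statistics import mean, variance

# Data values as published by Anscombe (1973); datasets I-III share the same x.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Means, sample variances, Pearson correlation and OLS line y = a + b*x."""
    mx, my = mean(x), mean(y)
    # Sums of cross-products and squared deviations around the means.
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "var_x": variance(x),
        "mean_y": my,
        "var_y": variance(y),
        "corr": sxy / (sxx * syy) ** 0.5,
        "intercept": my - slope * mx,
        "slope": slope,
    }


for name, (x, y) in QUARTET.items():
    stats = summarize(x, y)
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

All four rows come out essentially the same: mean of x equal to 9 with variance 11, mean of y close to 7.50 with variance about 4.12 to 4.13, correlation about 0.82, and a fitted line close to y = 3.00 + 0.50x. Only the plot on the earlier slide reveals how different the four datasets are.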