class: center, middle, inverse, title-slide

# Bio-demography Laboratory
## Lesson 3
### Nicola Barban
Alma Mater Studiorum Università di Bologna
Dipartimento di Scienze Statistiche
### 18 February 2021

---

# Outline

1. Visualizing quantities with ggplot2
2. Visualizing relationships between variables
3. COVID-19 open data: first steps

### Packages to install and load

* `library(ggplot2)`
* `library(gapminder)`
* `library(forcats)`

### ggplot2 cheatsheet

https://rstudio.com/wp-content/uploads/2016/11/ggplot2-cheatsheet-2.1.pdf

---

# US income data

```r
# importing data
US_income<-read.csv("US_income.csv")
# listing objects
ls()
```

```
## [1] "US_income"
```

```r
# first 6 rows of US_income
head(US_income)
```

```
##   X GEOID        name median_income median_income_moe population         area
## 1 1     1     Alabama         43623               281    4830620 133958437749
## 2 2     4     Arizona         50255               211    6641928 295232708152
## 3 3     5    Arkansas         41371               247    2958208 137792577218
## 4 4     6  California         61818               156   38421464 410516610493
## 5 5     8    Colorado         60629               252    5278906 269580118211
## 6 6     9 Connecticut         70331               409    3593222  12961831628
##        popdens
## 1 3.606059e-05
## 2 2.249726e-05
## 3 2.146856e-05
## 4 9.359296e-05
## 5 1.958196e-05
## 6 2.772156e-04
```

---

# My first ggplot

```r
# loading ggplot2 library
library(ggplot2)
# Basic barplot
plot1 <- ggplot(data=US_income, aes(x=name, y=median_income)) +
* geom_bar(stat="identity")
```

---

```r
plot1
```

<img src="lezione3_files/figure-html/unnamed-chunk-1-1.png" width="50%" />

---

# Flip the coordinates

```r
# Horizontal bar plot
plot1 + coord_flip()
```

<img src="lezione3_files/figure-html/unnamed-chunk-2-1.png" width="50%" />

---

# Anatomy of a ggplot

1. Load the library (only once per session) `library(ggplot2)`
2. Define the data `ggplot(data=US_income,`
3. Define the aesthetics `aes(x=name, y=median_income)` *links variables to the things you will see*
4. Add a layer to the graph `+geom_bar(`
5. Set the layer-specific options `stat="identity"` *`identity` applies no statistical transformation: the bars show the values as they are in the data*
6. Save into an object `plot1<-` and draw it with `plot1`
7. Add another component `+ coord_flip()`

---

# Exercise 1

* Reproduce the same plot for population

---

# Solution 1

```r
library(ggplot2)
# Basic barplot of population
plot2<-ggplot(data=US_income, aes(x=name, y=population)) +
  geom_bar(stat="identity")+
  coord_flip()
```

---

```r
plot2
```

<img src="lezione3_files/figure-html/unnamed-chunk-3-1.png" width="50%" />

---

# Dotplot

```r
# Basic dot plot: one point per state instead of a bar
dotplot<-ggplot(data=US_income, aes(x=name, y=population)) +
  geom_point()+
  coord_flip()
```

---

```r
dotplot
```

<img src="lezione3_files/figure-html/unnamed-chunk-4-1.png" width="50%" />

---

# Change the colour

```r
# Bar plot with blue outline (color) and blue filling (fill)
p3<-ggplot(data=US_income, aes(x=name, y=median_income)) +
* geom_bar(stat="identity", color="blue", fill="blue") +
  coord_flip()
```

---

```r
p3
```

<img src="lezione3_files/figure-html/unnamed-chunk-5-1.png" width="50%" />

---

# Ggplot theme

```r
# Bar plot with a minimal theme; x and y are swapped in aes(), so coord_flip() is not needed
p4<-ggplot(data=US_income, aes(y=name, x=median_income)) +
  geom_bar(stat="identity", fill="blue", color="blue") +
* theme_minimal()
p4
```

<img src="lezione3_files/figure-html/income-by-state-color-minimal-1.png" width="50%" />

---

# Ggplot themes

https://ggplot2-book.org/polishing.html

* `theme_bw()`: a variation on `theme_grey()` that uses a white background and thin grey grid lines.
* `theme_linedraw()`: a theme with only black lines of various widths on white backgrounds, reminiscent of a line drawing.
* `theme_light()`: similar to `theme_linedraw()` but with light grey lines and axes, to direct more attention towards the data.
* `theme_dark()`: the dark cousin of `theme_light()`, with similar line sizes but a dark background. Useful to make thin coloured lines pop out.
* `theme_minimal()`: a minimalistic theme with no background annotations.
* `theme_classic()`: a classic-looking theme, with x and y axis lines and no gridlines.
* `theme_void()`: a completely empty theme.

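---

As a sketch of how the themes listed above are applied: a theme can be added to a single plot with `+`, or made the default for the whole session with `theme_set()`. The data frame `toy` and its values are made up for illustration and are not part of `US_income.csv`.

```r
library(ggplot2)

# Hypothetical toy data, only for illustration
toy <- data.frame(state = c("A", "B", "C"), income = c(40, 55, 62))

base_plot <- ggplot(data = toy, aes(y = state, x = income)) +
  geom_bar(stat = "identity", fill = "blue")

# Option 1: apply a theme to one plot only
plot_bw <- base_plot + theme_bw()

# Option 2: change the default theme for every plot drawn afterwards;
# theme_set() returns the previous default, which we keep to restore it later
old_theme <- theme_set(theme_classic())
plot_default <- base_plot   # drawn with theme_classic() while it is the default
theme_set(old_theme)        # restore the previous default
```

Option 1 keeps the choice visible in the code of each plot; option 2 is convenient when all the figures of a report should share one look.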
---

# More themes

```r
install.packages('ggthemes', dependencies = TRUE)
```

```r
library(ggthemes)
plot2+ theme_economist()
```

<img src="lezione3_files/figure-html/unnamed-chunk-6-1.png" width="50%" />

---

# Reorder factors

```r
*library(forcats)
help(fct_reorder)
# Bar plot with the states ordered by median income
p5<-ggplot(data=US_income, aes(y=fct_reorder(name,median_income), x=median_income)) +
  geom_bar(stat="identity", fill="blue", color="blue") +
  theme_minimal()
p5
```

<img src="lezione3_files/figure-html/unnamed-chunk-7-1.png" width="50%" />

---

# Add axis labels

```r
# Labels read "Median income" (x) and "State" (y)
p6<-p5+
* xlab("Reddito mediano")+
  ylab("Stato")
p6
```

<img src="lezione3_files/figure-html/unnamed-chunk-8-1.png" width="50%" />

---

# Exercise 2

* Make a dot plot of population, with the points coloured red

---

# Solution 2

```r
p7<-ggplot(data=US_income, aes(y=fct_reorder(name,population), x=population)) +
  geom_point( color="red") +
  xlab("Population")+
  ylab("Country")+
  theme_minimal()
p7
```

<img src="lezione3_files/figure-html/unnamed-chunk-9-1.png" width="50%" />

---

# Add a title

```r
# Title reads "Population of the United States"
*p7+ggtitle("Popolazione Stati Uniti")
```

<img src="lezione3_files/figure-html/unnamed-chunk-10-1.png" width="50%" />

---

# Gapminder

```r
library(gapminder)
data(gapminder)
help(gapminder)
```

---

# Scatterplot

```r
# Labels read "Life expectancy" (y) and "GDP per capita" (x)
g1<-ggplot( data=gapminder, aes(y=lifeExp, x= gdpPercap))+
  geom_point(color="red")+
  theme_minimal()+
  ylab("Aspettativa di vita")+
  xlab("PIL pro capite")
g1
```

<img src="lezione3_files/figure-html/unnamed-chunk-12-1.png" width="50%" />

---

# Scatterplot

```r
# Add a linear regression line
g2<-g1+geom_smooth(method="lm")
g2
```

<img src="lezione3_files/figure-html/unnamed-chunk-13-1.png" width="50%" />

---

# Scatterplot with a logarithmic scale (base 10)

```r
g3<-g1+
  scale_x_log10()+
* geom_smooth(method="lm")
g3
```

<img src="lezione3_files/figure-html/unnamed-chunk-14-1.png" width="50%" />

---

# Scatterplot with a logarithmic scale

```r
# Same plot, with the x axis labelled in dollars; the title reads "Life expectancy and GDP"
g3<-g1+
  scale_x_log10(labels=scales::dollar)+
  geom_smooth(method="lm")+
  ggtitle("Aspettativa di vita e PIL")
g3
```

<img src="lezione3_files/figure-html/unnamed-chunk-15-1.png" width="50%" />

---

# Exercise 3

* Reproduce the previous plot with the following changes:
  + select only the data for countries in the continent "Asia"
  + colour the regression line `orange`
  + remove the confidence interval from the regression line

---

# Solution 3

```r
es2<-ggplot( data=subset(gapminder, continent=="Asia"), aes(y=lifeExp, x= gdpPercap))+
  geom_point(color="red")+
  theme_minimal()+
  ylab("Aspettativa di vita")+
  xlab("PIL pro capite")+
  scale_x_log10(labels=scales::dollar)+
  geom_smooth(method="lm", color="orange", se=FALSE)+
  ggtitle("Aspettativa di vita e PIL")
```

---

```r
es2
```

<img src="lezione3_files/figure-html/unnamed-chunk-17-1.png" width="50%" />

---

# A more polished version

```r
# Subtitle reads "Each point is a country in a year", caption "Source: Gapminder"
gapm<-ggplot( data=gapminder, aes(y=lifeExp, x= gdpPercap))+
  geom_point(alpha=0.3)+
* scale_x_log10(labels=scales::dollar)+
  geom_smooth(method="lm", color="orange", se=FALSE)+
  labs( y="Aspettativa di vita",
        x="PIL pro capite",
        title="Aspettativa di vita e PIL",
        subtitle = "Ogni punto rappresenta un paese in un anno",
        caption= "Fonte: Gapminder")
```

---

```r
gapm
```

<img src="lezione3_files/figure-html/unnamed-chunk-19-1.png" width="50%" />

---

# Nonparametric fit

```r
# Same plot, with a loess smoother instead of a straight line
gapm.np<-ggplot( data=gapminder, aes(y=lifeExp, x= gdpPercap))+
  geom_point(alpha=0.3)+
  scale_x_log10(labels=scales::dollar)+
  geom_smooth(method="loess", color="orange", se=FALSE)+
  labs( y="Aspettativa di vita",
        x="PIL pro capite",
        title="Aspettativa di vita e PIL",
        subtitle = "Ogni punto rappresenta un paese in un anno",
        caption= "Fonte: Gapminder")
```

---

```r
gapm.np
```

<img src="lezione3_files/figure-html/unnamed-chunk-21-1.png" width="50%" />

---

# Plot by category

```r
# Mapping col=continent gives one colour and one smoother per continent
ggplot( data=gapminder, aes(y=lifeExp, x= gdpPercap, col=continent))+
  geom_point(alpha=0.3)+
  scale_x_log10(labels=scales::dollar)+
  geom_smooth(method="loess")+
  theme_minimal()
```

<img src="lezione3_files/figure-html/unnamed-chunk-22-1.png" width="50%" />

---

# COVID-19 Vaccine Open Data

https://github.com/italia/covid19-opendata-vaccini

---

# COVID vaccination data for Italy

```r
vaccini_anagrafica <-read.csv("https://raw.githubusercontent.com/italia/covid19-opendata-vaccini/master/dati/anagrafica-vaccini-summary-latest.csv", header=TRUE)
# Labels read "Age group" (x) and "Doses administered (thousands)" (y)
vaccini1<-ggplot(data=vaccini_anagrafica, aes(x=fascia_anagrafica, y=totale/1000)) +
  geom_bar(stat="identity", fill="red") +
  xlab("Fascia Età")+
  ylab("Dosi somministrate (migliaia)")+
  theme_minimal()
```

---

```r
vaccini1
```

<img src="lezione3_files/figure-html/unnamed-chunk-23-1.png" width="50%" />

---

```r
vaccini_somministrazione <-read.csv("https://raw.githubusercontent.com/italia/covid19-opendata-vaccini/master/dati/vaccini-summary-latest.csv", header=TRUE)
# Drop the national total ("ITA") and order the regions by doses administered
v2<-ggplot(data=subset(vaccini_somministrazione,area!="ITA"), aes(y=fct_reorder(area,dosi_somministrate,sum), x=dosi_somministrate/1000)) +
  geom_bar(stat="identity", fill="darkgreen") +
  ylab("Regione")+
  xlab("Dosi somministrate (migliaia)")+
  theme_minimal()
```

---

```r
v2
```

<img src="lezione3_files/figure-html/unnamed-chunk-24-1.png" width="50%" />

---

# Download the doses administered over time

```r
vaccini_somministrazione_summary <-read.csv("https://raw.githubusercontent.com/italia/covid19-opendata-vaccini/master/dati/somministrazioni-vaccini-summary-latest.csv", header=TRUE)
dim(vaccini_somministrazione_summary)
```

```
## [1] 1045   15
```

---

# Date format

```r
str(vaccini_somministrazione_summary)
```

```
## 'data.frame':	1045 obs. of  15 variables:
##  $ data_somministrazione                     : chr  "2021-01-08" "2021-02-01" "2021-02-01" "2021-01-17" ...
##  $ area                                      : chr  "LIG" "PIE" "UMB" "ABR" ...
##  $ totale                                    : int  2685 8637 1260 187 349 713 9443 469 305 301 ...
##  $ sesso_maschile                            : int  927 3051 333 59 93 300 3073 191 119 110 ...
##  $ sesso_femminile                           : int  1758 5586 927 128 256 413 6370 278 186 191 ...
##  $ categoria_operatori_sanitari_sociosanitari: int  1911 6161 876 136 318 656 5757 397 277 301 ...
##  $ categoria_personale_non_sanitario         : int  429 1402 32 51 15 57 2043 72 27 0 ...
##  $ categoria_ospiti_rsa                      : int  345 1074 352 0 16 0 1643 0 1 0 ...
##  $ categoria_over80                          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ prima_dose                                : int  2685 577 47 180 227 713 654 469 1 301 ...
##  $ seconda_dose                              : int  0 8060 1213 7 122 0 8789 0 304 0 ...
##  $ codice_NUTS1                              : chr  "ITC" "ITC" "ITI" "ITF" ...
##  $ codice_NUTS2                              : chr  "ITC3" "ITC1" "ITI2" "ITF1" ...
##  $ codice_regione_ISTAT                      : int  7 1 10 13 4 3 1 4 6 10 ...
##  $ nome_area                                 : chr  "Liguria" "Piemonte" "Umbria" "Abruzzo" ...
```

```r
# data_somministrazione is read as text: convert it to Date so it can be used as a time axis
vaccini_somministrazione_summary$giorno<-as.Date(vaccini_somministrazione_summary$data_somministrazione)
```

---

# Line plot

```r
# Emilia-Romagna only; the y label reads "Doses administered"
v3<-ggplot(data=subset(vaccini_somministrazione_summary, area=="EMR"), aes(x=giorno, y=totale))+
  ylab("Dosi somministrate ")
```

---

```r
v3+geom_line(color="#69b3a2")+theme_minimal()
```

<img src="lezione3_files/figure-html/unnamed-chunk-27-1.png" width="50%" />

---

# Exercise 4

* Show the progress of the vaccination campaign in the regions:
  + Emilia Romagna
  + Lombardia
  + Veneto
  + Campania

---

# Solution 4

```r
# One line per region, distinguished by colour
sol4<-ggplot(data=subset(vaccini_somministrazione_summary,
                         area==("EMR") | area==("LOM") | area==("VEN") | area==("CAM") ),
             aes(x=giorno, y=totale))+
  geom_line(aes(group=area, color=area))+
  theme_minimal()
```

---

```r
sol4
```

<img src="lezione3_files/figure-html/unnamed-chunk-29-1.png" width="50%" />

---

# Solution 4 (alternative)

```r
# One panel per region instead of one colour per region
regioni<-ggplot(data=subset(vaccini_somministrazione_summary,
                            area==("EMR") | area==("LOM") | area==("VEN") | area==("CAM") ),
                aes(x=giorno, y=totale))+
* geom_line()+facet_wrap(~area)
*# to type the tilde: Alt+126 gives ~
```

---

```r
regioni+ylab("Dosi somministrate ")
```

<img src="lezione3_files/figure-html/unnamed-chunk-31-1.png" width="50%" />

---

# Save plots to a PDF document

```r
ggsave("ilmiografico.pdf", plot=regioni)
```
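
---

`ggsave()` chooses the output device from the file extension and also accepts an explicit size. A minimal sketch, assuming a made-up plot `demo_plot` and a temporary output path (both hypothetical, not part of the lesson's data):

```r
library(ggplot2)

# Made-up data, only for illustration
demo_data <- data.frame(day = 1:10,
                        doses = c(5, 8, 12, 20, 26, 31, 40, 44, 52, 60))

demo_plot <- ggplot(data = demo_data, aes(x = day, y = doses)) +
  geom_line() +
  theme_minimal()

# Write to a temporary folder so the working directory is left untouched
out_file <- file.path(tempdir(), "demo_plot.pdf")

# width and height are interpreted in the given units
ggsave(out_file, plot = demo_plot, width = 16, height = 10, units = "cm")

# TRUE if the file was written
file.exists(out_file)
```

Changing the extension to `.png` would produce a raster image instead; in that case the `dpi` argument controls the resolution.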