class: center, middle, inverse, title-slide # Laboratorio Bio-demografico ## Lezione 6 ### Nicola Barban
Alma Mater Studiorum Università di Bologna
Dipartimento di Scienze Statistiche ### 25 Febbraio 2021
--- # Outline * Tabelle * Scatteplots * Visualizzare distribuzioni * Visualizzare correlazioni ### Dati * General Social Survey * UK Biobank ### Pacchetti * `library(ggridges)` * `library(ggplot2)` * `library(viridis)` * `library(ggridges)` * `library(haven)` * `library(knitr)` * `library(gapminder)` --- # General Social Survey https://gss.norc.org ```r load(file="gss.Rdata") glimpse(gss_sm) ``` ``` ## Rows: 2,867 ## Columns: 32 ## $ year <dbl> 2016, 2016, 2016, 2016, 2016, 2… ## $ id <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, … ## $ ballot <labelled> 1, 2, 3, 1, 3, 2, 1, 3, 1,… ## $ age <dbl> 47, 61, 72, 43, 55, 53, 50, 23,… ## $ childs <dbl> 3, 0, 2, 4, 2, 2, 2, 3, 3, 4, 5… ## $ sibs <labelled> 2, 3, 3, 3, 2, 2, 2, 6, 5,… ## $ degree <fct> Bachelor, High School, Bachelor… ## $ race <fct> White, White, White, White, Whi… ## $ sex <fct> Male, Male, Male, Female, Femal… ## $ region <fct> New England, New England, New E… ## $ income16 <fct> $170000 or over, $50000 to 5999… ## $ relig <fct> None, None, Catholic, Catholic,… ## $ marital <fct> Married, Never Married, Married… ## $ padeg <fct> Graduate, Lt High School, High … ## $ madeg <fct> High School, High School, Lt Hi… ## $ partyid <fct> "Independent", "Ind,near Dem", … ## $ polviews <fct> Moderate, Liberal, Conservative… ## $ happy <fct> Pretty Happy, Pretty Happy, Ver… ## $ partners <fct> NA, 1 Partner, 1 Partner, NA, 1… ## $ grass <fct> NA, Legal, Not Legal, NA, Legal… ## $ zodiac <fct> Aquarius, Scorpio, Pisces, Canc… ## $ pres12 <labelled> 3, 1, 2, 2, 1, 1, NA, NA, … ## $ wtssall <dbl> 0.9569935, 0.4784968, 0.9569935… ## $ income_rc <fct> Gt $170000, Gt $50000, Gt $7500… ## $ agegrp <fct> Age 45-55, Age 55-65, Age 65+, … ## $ ageq <fct> Age 34-49, Age 49-62, Age 62+, … ## $ siblings <fct> 2, 3, 3, 3, 2, 2, 2, 6+, 5, 1, … ## $ kids <fct> 3, 0, 2, 4+, 2, 2, 2, 3, 3, 4+,… ## $ religion <fct> None, None, Catholic, Catholic,… ## $ bigregion <fct> Northeast, Northeast, Northeast… ## $ partners_rc <fct> NA, 1, 1, NA, 1, 1, NA, 1, NA, … ## $ obama <dbl> 0, 1, 0, 0, 1, 1, NA, NA, NA, 0… ``` --- # Numero figli per eta' ```r p<-gss_sm %>% ggplot(aes(x=age, y=childs)) p+geom_point(alpha=0.2)+ geom_smooth()+ facet_grid(sex~race) ``` <img src="lezione6_files/figure-html/unnamed-chunk-2-1.png" width="40%" /> --- # Istogramma ```r h<-ggplot(data=gss_sm,aes(x=age)) h+geom_histogram()+ facet_grid(~race) ``` <img src="lezione6_files/figure-html/unnamed-chunk-3-1.png" width="40%" /> --- # Grafico Densita' ```r h+ geom_density( fill="red")+ facet_grid(~race) ``` <img src="lezione6_files/figure-html/unnamed-chunk-4-1.png" width="40%" /> --- # Overlapping Densities ```r h+geom_density( aes(fill=race), alpha=0.3)+ ggtitle("Grafico Densità")+ * theme(plot.title = element_text(size=22)) ``` <img src="lezione6_files/figure-html/unnamed-chunk-5-1.png" width="40%" /> --- # UK Biobank data ```r library(haven) UKB<-read_dta("UKB_teaching.dta") ``` --- ```r names(UKB) ``` ``` ## [1] "sex" "year_birth" ## [3] "neb" "childless" ## [5] "smoking_pregnancy_mom" "educ" ## [7] "aam" "afs" ## [9] "afb" "meno" ## [11] "birthweight" "teen" ## [13] "bmi" "num_siblings" ## [15] "country" "gdp_spl" ## [17] "region_gdp" "pill_lad" ## [19] "idn" ``` --- # Variabili nel dataset * `neb` Number of childrem * `childless` (1= childless) * `smoking_pregnancy_mom` Mother smoked during pregnancy * `educ` Years of education * `aam` Age at Menarche * `afs` Age at First Sexual Intercourse * `afb` Age at First Birth * `meno` Age at Natural menopause * `birthweight` Birthweight * `teen` Teenage childbearing (1 yes, 0 no) * `bmi` Body Mass Index * `num_siblings` Number of Siblings * `country` Country (England, Scotland, Wales) * `gdp_spl` regional GDP at birth (aprox) * `region_gdp` Region of Birth * `pill_lad` Pill availability (% of women had use pill at age 18 in local Authority) * `idn` Personal ID number --- ```r glimpse(UKB) ``` ``` ## Rows: 161,468 ## Columns: 19 ## $ sex <dbl> 0, 0, 0, 0, 0, 0, 0, … ## $ year_birth <dbl> 1952, 1943, 1944, 194… ## $ neb <dbl> 0, 2, 2, 4, 2, 2, 0, … ## $ childless <dbl> 1, 0, 0, 0, 0, 0, 1, … ## $ smoking_pregnancy_mom <dbl> 1, 0, 0, 0, 0, 1, 1, … ## $ educ <dbl> 16, 16, 15, NA, 16, 1… ## $ aam <dbl> 17, 13, 14, 13, 15, 1… ## $ afs <dbl> 18, 23, 19, 24, 21, 1… ## $ afb <dbl> NA, 26, 21, 25, 23, 2… ## $ meno <dbl> NA, 50, 51, 52, 51, 5… ## $ birthweight <dbl> 2.75, NA, NA, 3.18, 2… ## $ teen <dbl> NA, 0, 0, 0, 0, 0, NA… ## $ bmi <dbl> 29.9974, 21.7784, 31.… ## $ num_siblings <dbl> NA, 3, 2, NA, 5, 5, 3… ## $ country <chr> "England", "England",… ## $ gdp_spl <dbl> 2.306183, 1.756356, 1… ## $ region_gdp <chr> "London Counties", "Y… ## $ pill_lad <dbl> 25.4014606, 0.1960784… ## $ idn <dbl> 1, 2, 3, 4, 5, 6, 7, … ``` --- # Tabella di frequenza ```r tab<-table(UKB$region_gdp) tab ``` ``` ## ## East Midlands London Counties ## 11996 16975 ## North Rest of SouthEast ## 45543 16397 ## Scotland SouthWest ## 14769 11349 ## Wales West Midlands ## 7653 14541 ## Yorks & Humberside ## 22245 ``` --- # Frequenze relative ```r tab2<-round(prop.table(tab),2) tab2 ``` ``` ## ## East Midlands London Counties ## 0.07 0.11 ## North Rest of SouthEast ## 0.28 0.10 ## Scotland SouthWest ## 0.09 0.07 ## Wales West Midlands ## 0.05 0.09 ## Yorks & Humberside ## 0.14 ``` --- # Tabelle con Kable ``` library(knitr) kable(tab2, "simple") Var1 Freq -------------------- ----- East Midlands 0.07 London Counties 0.11 North 0.28 Rest of SouthEast 0.10 Scotland 0.09 SouthWest 0.07 Wales 0.05 West Midlands 0.09 Yorks & Humberside 0.14 ``` --- # Summary tables ```r summ_table<- UKB %>% select(afs, afb, neb, childless, region_gdp) %>% group_by(region_gdp) %>% summarize_all(mean, na.rm=T) summ_table ``` ``` ## # A tibble: 9 x 5 ## region_gdp afs afb neb childless ## * <chr> <dbl> <dbl> <dbl> <dbl> ## 1 East Midlands 18.7 25.1 1.83 0.167 ## 2 London Counties 19.0 26.2 1.78 0.209 ## 3 North 19.0 24.8 1.87 0.161 ## 4 Rest of SouthEast 19.0 26.3 1.83 0.197 ## 5 Scotland 19.9 25.5 1.79 0.205 ## 6 SouthWest 18.7 25.5 1.89 0.163 ## 7 Wales 19.1 25.4 1.88 0.160 ## 8 West Midlands 18.8 25.4 1.81 0.178 ## 9 Yorks & Humberside 18.7 24.9 1.84 0.158 ``` --- # Tabelle HTML con kable ```r kable(summ_table, col.names =c("Regione","Età primo rapporto sessuale", "Età primo figlio", "Numero di figli", "Proporzione senza figli"), digits=2, caption = "Medie Regionali", format = "html") ``` <table> <caption>Medie Regionali</caption> <thead> <tr> <th style="text-align:left;"> Regione </th> <th style="text-align:right;"> Età primo rapporto sessuale </th> <th style="text-align:right;"> Età primo figlio </th> <th style="text-align:right;"> Numero di figli </th> <th style="text-align:right;"> Proporzione senza figli </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> East Midlands </td> <td style="text-align:right;"> 18.66 </td> <td style="text-align:right;"> 25.13 </td> <td style="text-align:right;"> 1.83 </td> <td style="text-align:right;"> 0.17 </td> </tr> <tr> <td style="text-align:left;"> London Counties </td> <td style="text-align:right;"> 19.00 </td> <td style="text-align:right;"> 26.22 </td> <td style="text-align:right;"> 1.78 </td> <td style="text-align:right;"> 0.21 </td> </tr> <tr> <td style="text-align:left;"> North </td> <td style="text-align:right;"> 19.02 </td> <td style="text-align:right;"> 24.84 </td> <td style="text-align:right;"> 1.87 </td> <td style="text-align:right;"> 0.16 </td> </tr> <tr> <td style="text-align:left;"> Rest of SouthEast </td> <td style="text-align:right;"> 19.01 </td> <td style="text-align:right;"> 26.29 </td> <td style="text-align:right;"> 1.83 </td> <td style="text-align:right;"> 0.20 </td> </tr> <tr> <td style="text-align:left;"> Scotland </td> <td style="text-align:right;"> 19.87 </td> <td style="text-align:right;"> 25.48 </td> <td style="text-align:right;"> 1.79 </td> <td style="text-align:right;"> 0.21 </td> </tr> <tr> <td style="text-align:left;"> SouthWest </td> <td style="text-align:right;"> 18.72 </td> <td style="text-align:right;"> 25.52 </td> <td style="text-align:right;"> 1.89 </td> <td style="text-align:right;"> 0.16 </td> </tr> <tr> <td style="text-align:left;"> Wales </td> <td style="text-align:right;"> 19.12 </td> <td style="text-align:right;"> 25.36 </td> <td style="text-align:right;"> 1.88 </td> <td style="text-align:right;"> 0.16 </td> </tr> <tr> <td style="text-align:left;"> West Midlands </td> <td style="text-align:right;"> 18.82 </td> <td style="text-align:right;"> 25.39 </td> <td style="text-align:right;"> 1.81 </td> <td style="text-align:right;"> 0.18 </td> </tr> <tr> <td style="text-align:left;"> Yorks & Humberside </td> <td style="text-align:right;"> 18.72 </td> <td style="text-align:right;"> 24.93 </td> <td style="text-align:right;"> 1.84 </td> <td style="text-align:right;"> 0.16 </td> </tr> </tbody> </table> --- # Risorse per tabelle in R https://haozhu233.github.io/kableExtra/awesome_table_in_html.html --- # Esercizio 1. Usare il dataset gapminder e creare una tabella con media `LifeEx` per ogni Continente e anno [Hint: usa `group_by` e `summarise`] 2. Trasforma il dataset in wide (ogni riga corrispondente ad un anno) [Hint: usa `pivot_wider`] 3. crea una tabella con kable in html --- # soluzione (1) ```r library(gapminder) TAB<-gapminder %>% group_by(continent , year) %>% summarise(LE=mean(lifeExp)) TAB ``` ``` ## # A tibble: 60 x 3 ## # Groups: continent [5] ## continent year LE ## <fct> <int> <dbl> ## 1 Africa 1952 39.1 ## 2 Africa 1957 41.3 ## 3 Africa 1962 43.3 ## 4 Africa 1967 45.3 ## 5 Africa 1972 47.5 ## 6 Africa 1977 49.6 ## 7 Africa 1982 51.6 ## 8 Africa 1987 53.3 ## 9 Africa 1992 53.6 ## 10 Africa 1997 53.6 ## # … with 50 more rows ``` --- # soluzione (2) ```r TAB %>% pivot_wider(names_from = continent, values_from = LE) -> TAB TAB ``` ``` ## # A tibble: 12 x 6 ## year Africa Americas Asia Europe Oceania ## <int> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 1952 39.1 53.3 46.3 64.4 69.3 ## 2 1957 41.3 56.0 49.3 66.7 70.3 ## 3 1962 43.3 58.4 51.6 68.5 71.1 ## 4 1967 45.3 60.4 54.7 69.7 71.3 ## 5 1972 47.5 62.4 57.3 70.8 71.9 ## 6 1977 49.6 64.4 59.6 71.9 72.9 ## 7 1982 51.6 66.2 62.6 72.8 74.3 ## 8 1987 53.3 68.1 64.9 73.6 75.3 ## 9 1992 53.6 69.6 66.5 74.4 76.9 ## 10 1997 53.6 71.2 68.0 75.5 78.2 ## 11 2002 53.3 72.4 69.2 76.7 79.7 ## 12 2007 54.8 73.6 70.7 77.6 80.7 ``` --- # soluzione (3) ```r kable(TAB, digits=2, caption = "Fonte: Gapminder", format = "html") ``` <table> <caption>Fonte: Gapminder</caption> <thead> <tr> <th style="text-align:right;"> year </th> <th style="text-align:right;"> Africa </th> <th style="text-align:right;"> Americas </th> <th style="text-align:right;"> Asia </th> <th style="text-align:right;"> Europe </th> <th style="text-align:right;"> Oceania </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 1952 </td> <td style="text-align:right;"> 39.14 </td> <td style="text-align:right;"> 53.28 </td> <td style="text-align:right;"> 46.31 </td> <td style="text-align:right;"> 64.41 </td> <td style="text-align:right;"> 69.25 </td> </tr> <tr> <td style="text-align:right;"> 1957 </td> <td style="text-align:right;"> 41.27 </td> <td style="text-align:right;"> 55.96 </td> <td style="text-align:right;"> 49.32 </td> <td style="text-align:right;"> 66.70 </td> <td style="text-align:right;"> 70.30 </td> </tr> <tr> <td style="text-align:right;"> 1962 </td> <td style="text-align:right;"> 43.32 </td> <td style="text-align:right;"> 58.40 </td> <td style="text-align:right;"> 51.56 </td> <td style="text-align:right;"> 68.54 </td> <td style="text-align:right;"> 71.09 </td> </tr> <tr> <td style="text-align:right;"> 1967 </td> <td style="text-align:right;"> 45.33 </td> <td style="text-align:right;"> 60.41 </td> <td style="text-align:right;"> 54.66 </td> <td style="text-align:right;"> 69.74 </td> <td style="text-align:right;"> 71.31 </td> </tr> <tr> <td style="text-align:right;"> 1972 </td> <td style="text-align:right;"> 47.45 </td> <td style="text-align:right;"> 62.39 </td> <td style="text-align:right;"> 57.32 </td> <td style="text-align:right;"> 70.78 </td> <td style="text-align:right;"> 71.91 </td> </tr> <tr> <td style="text-align:right;"> 1977 </td> <td style="text-align:right;"> 49.58 </td> <td style="text-align:right;"> 64.39 </td> <td style="text-align:right;"> 59.61 </td> <td style="text-align:right;"> 71.94 </td> <td style="text-align:right;"> 72.85 </td> </tr> <tr> <td style="text-align:right;"> 1982 </td> <td style="text-align:right;"> 51.59 </td> <td style="text-align:right;"> 66.23 </td> <td style="text-align:right;"> 62.62 </td> <td style="text-align:right;"> 72.81 </td> <td style="text-align:right;"> 74.29 </td> </tr> <tr> <td style="text-align:right;"> 1987 </td> <td style="text-align:right;"> 53.34 </td> <td style="text-align:right;"> 68.09 </td> <td style="text-align:right;"> 64.85 </td> <td style="text-align:right;"> 73.64 </td> <td style="text-align:right;"> 75.32 </td> </tr> <tr> <td style="text-align:right;"> 1992 </td> <td style="text-align:right;"> 53.63 </td> <td style="text-align:right;"> 69.57 </td> <td style="text-align:right;"> 66.54 </td> <td style="text-align:right;"> 74.44 </td> <td style="text-align:right;"> 76.94 </td> </tr> <tr> <td style="text-align:right;"> 1997 </td> <td style="text-align:right;"> 53.60 </td> <td style="text-align:right;"> 71.15 </td> <td style="text-align:right;"> 68.02 </td> <td style="text-align:right;"> 75.51 </td> <td style="text-align:right;"> 78.19 </td> </tr> <tr> <td style="text-align:right;"> 2002 </td> <td style="text-align:right;"> 53.33 </td> <td style="text-align:right;"> 72.42 </td> <td style="text-align:right;"> 69.23 </td> <td style="text-align:right;"> 76.70 </td> <td style="text-align:right;"> 79.74 </td> </tr> <tr> <td style="text-align:right;"> 2007 </td> <td style="text-align:right;"> 54.81 </td> <td style="text-align:right;"> 73.61 </td> <td style="text-align:right;"> 70.73 </td> <td style="text-align:right;"> 77.65 </td> <td style="text-align:right;"> 80.72 </td> </tr> </tbody> </table> # Boxplots ```r uk1<-UKB %>% filter(country!="NA") %>% ggplot(aes(y=afb, x=region_gdp)) uk1+geom_boxplot() ``` <img src="lezione6_files/figure-html/unnamed-chunk-16-1.png" width="40%" /> --- ```r uk1+geom_boxplot(aes(fill=country)) ``` <img src="lezione6_files/figure-html/unnamed-chunk-17-1.png" width="40%" /> --- # Ricodifico variabili (seleziono intervalli) ```r min(UKB$year_birth) ``` ``` ## [1] 1938 ``` ```r max(UKB$year_birth) ``` ``` ## [1] 1968 ``` ```r years<-seq(from = 1935, to = 1970, by = 5) years ``` ``` ## [1] 1935 1940 1945 1950 1955 1960 1965 1970 ``` --- # Ricodifico variabili ```r UKB<-UKB %>% mutate(birth_interval=cut(year_birth, breaks=years)) names(UKB) ``` ``` ## [1] "sex" "year_birth" ## [3] "neb" "childless" ## [5] "smoking_pregnancy_mom" "educ" ## [7] "aam" "afs" ## [9] "afb" "meno" ## [11] "birthweight" "teen" ## [13] "bmi" "num_siblings" ## [15] "country" "gdp_spl" ## [17] "region_gdp" "pill_lad" ## [19] "idn" "birth_interval" ``` --- # Ancora summary tables ```r summ_table2<- UKB %>% select(afs, afb, neb, childless, birth_interval) %>% group_by(birth_interval) %>% summarize_all(mean, na.rm=T) summ_table2 ``` ``` ## # A tibble: 7 x 5 ## birth_interval afs afb neb childless ## * <fct> <dbl> <dbl> <dbl> <dbl> ## 1 (1935,1940] 20.9 24.4 2.12 0.121 ## 2 (1940,1945] 20.2 24.5 2.01 0.125 ## 3 (1945,1950] 19.3 24.9 1.89 0.148 ## 4 (1950,1955] 18.4 25.6 1.76 0.194 ## 5 (1955,1960] 18.0 26.2 1.66 0.235 ## 6 (1960,1965] 18.1 26.9 1.55 0.265 ## 7 (1965,1970] 17.8 27.4 1.34 0.336 ``` --- # Creare binned scatter plot ### Un punto per ogni intervallo (media) ```r summ_table3<- UKB %>% select(aam, afs, afb, neb,teen, childless, year_birth) %>% group_by(year_birth) %>% summarize_all(.funs= list(media = ~mean(x=., na.rm=T), dev.std = ~sd(x=., na.rm=T), N = ~sum(!is.na(x=.)) )) ``` --- # Creo grafico da dataset wide ```r graf0<- summ_table3 %>% ggplot( aes(x=year_birth))+ theme_bw()+ labs( x="Anno di nascita", y= "Età", title="Comportamento riproduttivo in UK.", subtitle="Anni 1938-1968", caption= "Source: UK Biobank") graf1<-graf0 + geom_point(aes(y=afs_media, col="AFS"))+ geom_point(aes(y=afb_media, col="AFB"))+ geom_line(aes(y=afs_media, col="AFS"))+ geom_line(aes(y=afb_media, col="AFB")) ``` --- ```r graf1 ``` <img src="lezione6_files/figure-html/unnamed-chunk-23-1.png" width="40%" /> --- # Modificare leggenda **manualmente** `color` è un elemento di `aes()` in `geom_point` e `geom_line` ```r *graf2<-graf1+scale_colour_manual("Variabile", values = c('AFB' = 'turquoise', 'AFS' = 'tan1'), labels = c('AFB' = 'Età al primo figlio', 'AFS' = 'Età al primo rapporto sessuale') ) ``` --- ```r graf2 ``` <img src="lezione6_files/figure-html/unnamed-chunk-25-1.png" width="40%" /> --- # geom_ribbon() ```r graf3<-graf0+ geom_ribbon(aes(ymin=afb_media-1.96*afb_dev.std/sqrt(afb_N), ymax=afb_media+1.96*afb_dev.std/sqrt(afb_N), fill="AFB"))+ geom_ribbon(aes(ymin=afs_media-1.96*afs_dev.std/sqrt(afs_N), ymax=afs_media+1.96*afs_dev.std/sqrt(afs_N), fill="AFS"))+ scale_fill_manual("Variabile", values = c('AFB' = 'turquoise', 'AFS' = 'tan1'), labels = c('AFB' = 'Età al primo figlio', 'AFS' = 'Età al primo rapporto sessuale') ) ``` --- ```r graf3 ``` <img src="lezione6_files/figure-html/unnamed-chunk-27-1.png" width="40%" /> --- # Esercizio 1. Visualizza con `geom_ribbon` or `geom_errorbar` l'andamento della proporzione di donne senza figli. ( variabile `childless` ) --- # soluzione 1 ```r graf0+ geom_ribbon(aes(ymin=childless_media-1.96*childless_dev.std/sqrt(childless_N), ymax=childless_media+1.96*childless_dev.std/sqrt(childless_N)))+ylab("Proporzione") ``` <img src="lezione6_files/figure-html/unnamed-chunk-28-1.png" width="40%" /> --- # Density plots ```r uk2<-UKB %>% filter(country!="NA") %>% ggplot(aes(x=afb, group=birth_interval)) uk2+geom_density(aes(fill=birth_interval), alpha=.3) ``` <img src="lezione6_files/figure-html/unnamed-chunk-29-1.png" width="40%" /> --- # Violin Plots ```r uk2+geom_violin(aes(y=afb, x=birth_interval)) ``` <img src="lezione6_files/figure-html/unnamed-chunk-30-1.png" width="40%" /> --- # Ridge plots ```r library(ggridges) uk3<-ggplot(UKB, aes(x = afb, y = birth_interval, fill = birth_interval)) + geom_density_ridges() + theme_ridges() + theme(legend.position = "none") ``` --- ```r uk3 ``` <img src="lezione6_files/figure-html/unnamed-chunk-32-1.png" width="40%" /> --- ```r library(ggridges) library(ggplot2) library(viridis) # Plot uk4<-ggplot(UKB, aes(x = afb, y = birth_interval, fill = ..x..)) + geom_density_ridges_gradient() + scale_fill_viridis_c(name = "Anni") + labs(title = 'Età al primo figlio nel Regno Unito', caption="Dati UK Biobank")+ xlab("Età") + ylab("Anno di nascita") ``` --- ```r uk4 ``` <img src="lezione6_files/figure-html/unnamed-chunk-34-1.png" width="40%" /> --- # Matrici di correlazione ```r UKB %>% select("aam","afb", "afs", "neb", "bmi", "num_siblings", "educ", "birthweight", "smoking_pregnancy_mom") %>% cor(use="complete.obs") -> corMAT round(corMAT,2) ``` ``` ## aam afb afs neb bmi ## aam 1.00 0.04 0.08 0.01 -0.16 ## afb 0.04 1.00 0.37 -0.27 -0.14 ## afs 0.08 0.37 1.00 -0.07 -0.06 ## neb 0.01 -0.27 -0.07 1.00 0.08 ## bmi -0.16 -0.14 -0.06 0.08 1.00 ## num_siblings 0.06 -0.15 -0.02 0.10 0.06 ## educ -0.02 0.27 0.13 -0.05 -0.09 ## birthweight 0.00 0.03 0.00 0.00 0.00 ## smoking_pregnancy_mom -0.02 -0.09 -0.10 0.00 0.07 ## num_siblings educ birthweight ## aam 0.06 -0.02 0.00 ## afb -0.15 0.27 0.03 ## afs -0.02 0.13 0.00 ## neb 0.10 -0.05 0.00 ## bmi 0.06 -0.09 0.00 ## num_siblings 1.00 -0.16 -0.01 ## educ -0.16 1.00 0.02 ## birthweight -0.01 0.02 1.00 ## smoking_pregnancy_mom 0.04 -0.09 -0.09 ## smoking_pregnancy_mom ## aam -0.02 ## afb -0.09 ## afs -0.10 ## neb 0.00 ## bmi 0.07 ## num_siblings 0.04 ## educ -0.09 ## birthweight -0.09 ## smoking_pregnancy_mom 1.00 ``` --- # Correlation plot ```r library(corrplot) corrplot(corMAT, method="color") ``` <img src="lezione6_files/figure-html/unnamed-chunk-36-1.png" width="40%" /> --- ```r library(corrplot) corrplot(corMAT, type="upper") ``` <img src="lezione6_files/figure-html/unnamed-chunk-37-1.png" width="40%" /> --- ```r corrplot(corMAT, type="upper", order="hclust") ``` <img src="lezione6_files/figure-html/unnamed-chunk-38-1.png" width="40%" /> --- ```r uk3<-ggplot(data=UKB, aes(x=afs, y=afb, col=year_birth)) uk3+geom_point(alpha=.2) ``` <img src="lezione6_files/figure-html/unnamed-chunk-39-1.png" width="40%" /> --- # Rimuovo valori implausibili ```r uk3<-ggplot(data=subset(UKB, afb>=afs & afs>15 & afb>15), aes(x=afs, y=afb, col=year_birth)) uk3+geom_point(alpha=0.1) ``` <img src="lezione6_files/figure-html/unnamed-chunk-40-1.png" width="40%" /> --- ```r uk3+ geom_point(alpha=0.1)+ geom_smooth(method="lm", se=FALSE) ``` <img src="lezione6_files/figure-html/unnamed-chunk-41-1.png" width="40%" /> --- ```r uk3+geom_smooth(aes(col=birth_interval),method="lm", se=FALSE) ``` <img src="lezione6_files/figure-html/unnamed-chunk-42-1.png" width="40%" /> --- ```r library(GGally) median(UKB$year_birth) ``` ``` ## [1] 1950 ``` ```r UKB_sel<- UKB %>% mutate(year_binary=(year_birth>=1950)) %>% filter(afb>=afs & afs>15 & afb>15) %>% select(afb, afs, neb , aam, year_binary) ``` --- ```r ggpairs(UKB_sel, aes(col=year_binary)) ``` <img src="lezione6_files/figure-html/unnamed-chunk-44-1.png" width="40%" /> ---