Laboratorio Bio-demografico

class: center, middle, inverse, title-slide

# Laboratorio Bio-demografico
## Lezione 6
### Nicola Barban <br> Alma Mater Studiorum Università di Bologna <br> Dipartimento di Scienze Statistiche
### 25 Febbraio 2021<br> <img src="../img/UniBo-Universita-di-Bologna.png" style="width:20.0%" />

---

# Outline

* Tabelle
* Scatteplots
* Visualizzare distribuzioni
* Visualizzare correlazioni

### Dati
* General Social Survey
* UK Biobank

### Pacchetti
* `library(ggridges)`
* `library(ggplot2)`
* `library(viridis)`
* `library(ggridges)`
* `library(haven)`
* `library(knitr)`
* `library(gapminder)`

---
# General Social Survey
https://gss.norc.org

```r
load(file="gss.Rdata")
glimpse(gss_sm)
```

```
## Rows: 2,867
## Columns: 32
## $ year        <dbl> 2016, 2016, 2016, 2016, 2016, 2…
## $ id          <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, …
## $ ballot      <labelled> 1, 2, 3, 1, 3, 2, 1, 3, 1,…
## $ age         <dbl> 47, 61, 72, 43, 55, 53, 50, 23,…
## $ childs      <dbl> 3, 0, 2, 4, 2, 2, 2, 3, 3, 4, 5…
## $ sibs        <labelled> 2, 3, 3, 3, 2, 2, 2, 6, 5,…
## $ degree      <fct> Bachelor, High School, Bachelor…
## $ race        <fct> White, White, White, White, Whi…
## $ sex         <fct> Male, Male, Male, Female, Femal…
## $ region      <fct> New England, New England, New E…
## $ income16    <fct> $170000 or over, $50000 to 5999…
## $ relig       <fct> None, None, Catholic, Catholic,…
## $ marital     <fct> Married, Never Married, Married…
## $ padeg       <fct> Graduate, Lt High School, High …
## $ madeg       <fct> High School, High School, Lt Hi…
## $ partyid     <fct> "Independent", "Ind,near Dem", …
## $ polviews    <fct> Moderate, Liberal, Conservative…
## $ happy       <fct> Pretty Happy, Pretty Happy, Ver…
## $ partners    <fct> NA, 1 Partner, 1 Partner, NA, 1…
## $ grass       <fct> NA, Legal, Not Legal, NA, Legal…
## $ zodiac      <fct> Aquarius, Scorpio, Pisces, Canc…
## $ pres12      <labelled> 3, 1, 2, 2, 1, 1, NA, NA, …
## $ wtssall     <dbl> 0.9569935, 0.4784968, 0.9569935…
## $ income_rc   <fct> Gt $170000, Gt $50000, Gt $7500…
## $ agegrp      <fct> Age 45-55, Age 55-65, Age 65+, …
## $ ageq        <fct> Age 34-49, Age 49-62, Age 62+, …
## $ siblings    <fct> 2, 3, 3, 3, 2, 2, 2, 6+, 5, 1, …
## $ kids        <fct> 3, 0, 2, 4+, 2, 2, 2, 3, 3, 4+,…
## $ religion    <fct> None, None, Catholic, Catholic,…
## $ bigregion   <fct> Northeast, Northeast, Northeast…
## $ partners_rc <fct> NA, 1, 1, NA, 1, 1, NA, 1, NA, …
## $ obama       <dbl> 0, 1, 0, 0, 1, 1, NA, NA, NA, 0…
```
---
# Numero figli per eta'

```r
p<-gss_sm %>% 
  ggplot(aes(x=age, y=childs))

p+geom_point(alpha=0.2)+
  geom_smooth()+
  facet_grid(sex~race)
```

<img src="lezione6_files/figure-html/unnamed-chunk-2-1.png" width="40%" />
---

# Istogramma

```r
h<-ggplot(data=gss_sm,aes(x=age))

h+geom_histogram()+ facet_grid(~race)
```

<img src="lezione6_files/figure-html/unnamed-chunk-3-1.png" width="40%" />
---
# Grafico Densita'

```r
h+
  geom_density( fill="red")+
  facet_grid(~race)
```

<img src="lezione6_files/figure-html/unnamed-chunk-4-1.png" width="40%" />
---

# Overlapping Densities

```r
h+geom_density( aes(fill=race), alpha=0.3)+
  ggtitle("Grafico Densità")+
* theme(plot.title = element_text(size=22))
```

---

# UK Biobank data

```r
library(haven)
UKB<-read_dta("UKB_teaching.dta")
```
---

```r
names(UKB)
```

```
##  [1] "sex"                   "year_birth"           
##  [3] "neb"                   "childless"            
##  [5] "smoking_pregnancy_mom" "educ"                 
##  [7] "aam"                   "afs"                  
##  [9] "afb"                   "meno"                 
## [11] "birthweight"           "teen"                 
## [13] "bmi"                   "num_siblings"         
## [15] "country"               "gdp_spl"              
## [17] "region_gdp"            "pill_lad"             
## [19] "idn"
```

---
# Variabili nel dataset

* `neb` Number of childrem               
* `childless` (1= childless)          
* `smoking_pregnancy_mom` Mother smoked during pregnancy
* `educ`        Years of education         
*  `aam`    Age at Menarche
* `afs`      Age at First Sexual Intercourse            
* `afb`  Age at First Birth
* `meno`      Age at Natural menopause           
* `birthweight` Birthweight
* `teen`        Teenage childbearing (1 yes, 0 no)         
* `bmi`     Body Mass Index
* `num_siblings`     Number of Siblings    
* `country`     Country (England, Scotland, Wales)
* `gdp_spl`          regional GDP at birth (aprox)     
* `region_gdp`   Region of Birth
* `pill_lad`          Pill availability (% of women had use pill at age 18 in local Authority)   
* `idn` Personal ID number
---

```r
glimpse(UKB)
```

```
## Rows: 161,468
## Columns: 19
## $ sex                   <dbl> 0, 0, 0, 0, 0, 0, 0, …
## $ year_birth            <dbl> 1952, 1943, 1944, 194…
## $ neb                   <dbl> 0, 2, 2, 4, 2, 2, 0, …
## $ childless             <dbl> 1, 0, 0, 0, 0, 0, 1, …
## $ smoking_pregnancy_mom <dbl> 1, 0, 0, 0, 0, 1, 1, …
## $ educ                  <dbl> 16, 16, 15, NA, 16, 1…
## $ aam                   <dbl> 17, 13, 14, 13, 15, 1…
## $ afs                   <dbl> 18, 23, 19, 24, 21, 1…
## $ afb                   <dbl> NA, 26, 21, 25, 23, 2…
## $ meno                  <dbl> NA, 50, 51, 52, 51, 5…
## $ birthweight           <dbl> 2.75, NA, NA, 3.18, 2…
## $ teen                  <dbl> NA, 0, 0, 0, 0, 0, NA…
## $ bmi                   <dbl> 29.9974, 21.7784, 31.…
## $ num_siblings          <dbl> NA, 3, 2, NA, 5, 5, 3…
## $ country               <chr> "England", "England",…
## $ gdp_spl               <dbl> 2.306183, 1.756356, 1…
## $ region_gdp            <chr> "London Counties", "Y…
## $ pill_lad              <dbl> 25.4014606, 0.1960784…
## $ idn                   <dbl> 1, 2, 3, 4, 5, 6, 7, …
```
---
# Tabella di frequenza

```r
tab<-table(UKB$region_gdp)
tab
```

```
## 
##      East  Midlands     London Counties 
##               11996               16975 
##               North   Rest of SouthEast 
##               45543               16397 
##            Scotland           SouthWest 
##               14769               11349 
##               Wales       West Midlands 
##                7653               14541 
## Yorks  & Humberside 
##               22245
```
---

# Frequenze relative

```r
tab2<-round(prop.table(tab),2)
tab2
```

```
## 
##      East  Midlands     London Counties 
##                0.07                0.11 
##               North   Rest of SouthEast 
##                0.28                0.10 
##            Scotland           SouthWest 
##                0.09                0.07 
##               Wales       West Midlands 
##                0.05                0.09 
## Yorks  & Humberside 
##                0.14
```
---
# Tabelle con Kable

```
library(knitr)
kable(tab2, "simple")

Var1                   Freq
--------------------  -----
East  Midlands         0.07
London Counties        0.11
North                  0.28
Rest of SouthEast      0.10
Scotland               0.09
SouthWest              0.07
Wales                  0.05
West Midlands          0.09
Yorks  & Humberside    0.14
```
---

# Summary tables

```r
summ_table<- UKB %>% 
            select(afs, afb, neb, childless, region_gdp) %>% 
            group_by(region_gdp) %>% 
            summarize_all(mean, na.rm=T) 
summ_table
```

```
## # A tibble: 9 x 5
##   region_gdp            afs   afb   neb childless
## * <chr>               <dbl> <dbl> <dbl>     <dbl>
## 1 East  Midlands       18.7  25.1  1.83     0.167
## 2 London Counties      19.0  26.2  1.78     0.209
## 3 North                19.0  24.8  1.87     0.161
## 4 Rest of SouthEast    19.0  26.3  1.83     0.197
## 5 Scotland             19.9  25.5  1.79     0.205
## 6 SouthWest            18.7  25.5  1.89     0.163
## 7 Wales                19.1  25.4  1.88     0.160
## 8 West Midlands        18.8  25.4  1.81     0.178
## 9 Yorks  & Humberside  18.7  24.9  1.84     0.158
```
---
# Tabelle HTML con kable

```r
kable(summ_table, 
      col.names =c("Regione","Età primo rapporto sessuale", "Età primo figlio", "Numero di figli", "Proporzione senza figli"),
      digits=2,
      caption = "Medie Regionali",
      format = "html")
```

<table>
<caption>Medie Regionali</caption>
 <thead>
  <tr>
   <th style="text-align:left;"> Regione </th>
   <th style="text-align:right;"> Età primo rapporto sessuale </th>
   <th style="text-align:right;"> Età primo figlio </th>
   <th style="text-align:right;"> Numero di figli </th>
   <th style="text-align:right;"> Proporzione senza figli </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> East  Midlands </td>
   <td style="text-align:right;"> 18.66 </td>
   <td style="text-align:right;"> 25.13 </td>
   <td style="text-align:right;"> 1.83 </td>
   <td style="text-align:right;"> 0.17 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> London Counties </td>
   <td style="text-align:right;"> 19.00 </td>
   <td style="text-align:right;"> 26.22 </td>
   <td style="text-align:right;"> 1.78 </td>
   <td style="text-align:right;"> 0.21 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> North </td>
   <td style="text-align:right;"> 19.02 </td>
   <td style="text-align:right;"> 24.84 </td>
   <td style="text-align:right;"> 1.87 </td>
   <td style="text-align:right;"> 0.16 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Rest of SouthEast </td>
   <td style="text-align:right;"> 19.01 </td>
   <td style="text-align:right;"> 26.29 </td>
   <td style="text-align:right;"> 1.83 </td>
   <td style="text-align:right;"> 0.20 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Scotland </td>
   <td style="text-align:right;"> 19.87 </td>
   <td style="text-align:right;"> 25.48 </td>
   <td style="text-align:right;"> 1.79 </td>
   <td style="text-align:right;"> 0.21 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> SouthWest </td>
   <td style="text-align:right;"> 18.72 </td>
   <td style="text-align:right;"> 25.52 </td>
   <td style="text-align:right;"> 1.89 </td>
   <td style="text-align:right;"> 0.16 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Wales </td>
   <td style="text-align:right;"> 19.12 </td>
   <td style="text-align:right;"> 25.36 </td>
   <td style="text-align:right;"> 1.88 </td>
   <td style="text-align:right;"> 0.16 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> West Midlands </td>
   <td style="text-align:right;"> 18.82 </td>
   <td style="text-align:right;"> 25.39 </td>
   <td style="text-align:right;"> 1.81 </td>
   <td style="text-align:right;"> 0.18 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Yorks  &amp; Humberside </td>
   <td style="text-align:right;"> 18.72 </td>
   <td style="text-align:right;"> 24.93 </td>
   <td style="text-align:right;"> 1.84 </td>
   <td style="text-align:right;"> 0.16 </td>
  </tr>
</tbody>
</table>

---
# Risorse per tabelle in R
https://haozhu233.github.io/kableExtra/awesome_table_in_html.html

---
# Esercizio

1. Usare il dataset gapminder e creare una tabella con media `LifeEx` per ogni Continente e anno [Hint: usa `group_by` e `summarise`]
2. Trasforma il dataset in wide (ogni riga corrispondente ad un anno) [Hint: usa `pivot_wider`]
3. crea una tabella con kable in html 
---
# soluzione (1)

```r
library(gapminder)

TAB<-gapminder %>% 
  group_by(continent , year) %>% 
  summarise(LE=mean(lifeExp))
TAB
```

```
## # A tibble: 60 x 3
## # Groups:   continent [5]
##    continent  year    LE
##    <fct>     <int> <dbl>
##  1 Africa     1952  39.1
##  2 Africa     1957  41.3
##  3 Africa     1962  43.3
##  4 Africa     1967  45.3
##  5 Africa     1972  47.5
##  6 Africa     1977  49.6
##  7 Africa     1982  51.6
##  8 Africa     1987  53.3
##  9 Africa     1992  53.6
## 10 Africa     1997  53.6
## # … with 50 more rows
```
  
---
# soluzione (2)

```r
TAB %>% 
    pivot_wider(names_from = continent, values_from = LE) -> TAB
TAB
```

```
## # A tibble: 12 x 6
##     year Africa Americas  Asia Europe Oceania
##    <int>  <dbl>    <dbl> <dbl>  <dbl>   <dbl>
##  1  1952   39.1     53.3  46.3   64.4    69.3
##  2  1957   41.3     56.0  49.3   66.7    70.3
##  3  1962   43.3     58.4  51.6   68.5    71.1
##  4  1967   45.3     60.4  54.7   69.7    71.3
##  5  1972   47.5     62.4  57.3   70.8    71.9
##  6  1977   49.6     64.4  59.6   71.9    72.9
##  7  1982   51.6     66.2  62.6   72.8    74.3
##  8  1987   53.3     68.1  64.9   73.6    75.3
##  9  1992   53.6     69.6  66.5   74.4    76.9
## 10  1997   53.6     71.2  68.0   75.5    78.2
## 11  2002   53.3     72.4  69.2   76.7    79.7
## 12  2007   54.8     73.6  70.7   77.6    80.7
```

---
# soluzione (3)

```r
kable(TAB, 
      digits=2,
      caption = "Fonte: Gapminder",
      format = "html")
```

<table>
<caption>Fonte: Gapminder</caption>
 <thead>
  <tr>
   <th style="text-align:right;"> year </th>
   <th style="text-align:right;"> Africa </th>
   <th style="text-align:right;"> Americas </th>
   <th style="text-align:right;"> Asia </th>
   <th style="text-align:right;"> Europe </th>
   <th style="text-align:right;"> Oceania </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 1952 </td>
   <td style="text-align:right;"> 39.14 </td>
   <td style="text-align:right;"> 53.28 </td>
   <td style="text-align:right;"> 46.31 </td>
   <td style="text-align:right;"> 64.41 </td>
   <td style="text-align:right;"> 69.25 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 1957 </td>
   <td style="text-align:right;"> 41.27 </td>
   <td style="text-align:right;"> 55.96 </td>
   <td style="text-align:right;"> 49.32 </td>
   <td style="text-align:right;"> 66.70 </td>
   <td style="text-align:right;"> 70.30 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 1962 </td>
   <td style="text-align:right;"> 43.32 </td>
   <td style="text-align:right;"> 58.40 </td>
   <td style="text-align:right;"> 51.56 </td>
   <td style="text-align:right;"> 68.54 </td>
   <td style="text-align:right;"> 71.09 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 1967 </td>
   <td style="text-align:right;"> 45.33 </td>
   <td style="text-align:right;"> 60.41 </td>
   <td style="text-align:right;"> 54.66 </td>
   <td style="text-align:right;"> 69.74 </td>
   <td style="text-align:right;"> 71.31 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 1972 </td>
   <td style="text-align:right;"> 47.45 </td>
   <td style="text-align:right;"> 62.39 </td>
   <td style="text-align:right;"> 57.32 </td>
   <td style="text-align:right;"> 70.78 </td>
   <td style="text-align:right;"> 71.91 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 1977 </td>
   <td style="text-align:right;"> 49.58 </td>
   <td style="text-align:right;"> 64.39 </td>
   <td style="text-align:right;"> 59.61 </td>
   <td style="text-align:right;"> 71.94 </td>
   <td style="text-align:right;"> 72.85 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 1982 </td>
   <td style="text-align:right;"> 51.59 </td>
   <td style="text-align:right;"> 66.23 </td>
   <td style="text-align:right;"> 62.62 </td>
   <td style="text-align:right;"> 72.81 </td>
   <td style="text-align:right;"> 74.29 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 1987 </td>
   <td style="text-align:right;"> 53.34 </td>
   <td style="text-align:right;"> 68.09 </td>
   <td style="text-align:right;"> 64.85 </td>
   <td style="text-align:right;"> 73.64 </td>
   <td style="text-align:right;"> 75.32 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 1992 </td>
   <td style="text-align:right;"> 53.63 </td>
   <td style="text-align:right;"> 69.57 </td>
   <td style="text-align:right;"> 66.54 </td>
   <td style="text-align:right;"> 74.44 </td>
   <td style="text-align:right;"> 76.94 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 1997 </td>
   <td style="text-align:right;"> 53.60 </td>
   <td style="text-align:right;"> 71.15 </td>
   <td style="text-align:right;"> 68.02 </td>
   <td style="text-align:right;"> 75.51 </td>
   <td style="text-align:right;"> 78.19 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 2002 </td>
   <td style="text-align:right;"> 53.33 </td>
   <td style="text-align:right;"> 72.42 </td>
   <td style="text-align:right;"> 69.23 </td>
   <td style="text-align:right;"> 76.70 </td>
   <td style="text-align:right;"> 79.74 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 2007 </td>
   <td style="text-align:right;"> 54.81 </td>
   <td style="text-align:right;"> 73.61 </td>
   <td style="text-align:right;"> 70.73 </td>
   <td style="text-align:right;"> 77.65 </td>
   <td style="text-align:right;"> 80.72 </td>
  </tr>
</tbody>
</table>
# Boxplots

```r
uk1<-UKB %>% 
    filter(country!="NA") %>% 
    ggplot(aes(y=afb, x=region_gdp))
  
uk1+geom_boxplot()
```

<img src="lezione6_files/figure-html/unnamed-chunk-16-1.png" width="40%" />
---

```r
uk1+geom_boxplot(aes(fill=country))
```

<img src="lezione6_files/figure-html/unnamed-chunk-17-1.png" width="40%" />
---
# Ricodifico variabili (seleziono intervalli)

```r
min(UKB$year_birth)
```

```
## [1] 1938
```

```r
max(UKB$year_birth)
```

```
## [1] 1968
```

```r
years<-seq(from = 1935, to = 1970, by = 5)
years
```

```
## [1] 1935 1940 1945 1950 1955 1960 1965 1970
```

---
# Ricodifico variabili

```r
UKB<-UKB %>% mutate(birth_interval=cut(year_birth, breaks=years))
names(UKB)
```

# Ancora summary tables

```r
summ_table2<- UKB %>% 
            select(afs, afb, neb, childless, birth_interval) %>% 
            group_by(birth_interval) %>% 
            summarize_all(mean, na.rm=T)

summ_table2
```

```
## # A tibble: 7 x 5
##   birth_interval   afs   afb   neb childless
## * <fct>          <dbl> <dbl> <dbl>     <dbl>
## 1 (1935,1940]     20.9  24.4  2.12     0.121
## 2 (1940,1945]     20.2  24.5  2.01     0.125
## 3 (1945,1950]     19.3  24.9  1.89     0.148
## 4 (1950,1955]     18.4  25.6  1.76     0.194
## 5 (1955,1960]     18.0  26.2  1.66     0.235
## 6 (1960,1965]     18.1  26.9  1.55     0.265
## 7 (1965,1970]     17.8  27.4  1.34     0.336
```
---
# Creare binned scatter plot
### Un punto per ogni intervallo (media)

```r
summ_table3<- UKB %>% 
            select(aam, afs, afb, neb,teen,  childless, year_birth) %>% 
            group_by(year_birth) %>% 
            summarize_all(.funs= list(media  = ~mean(x=., na.rm=T), 
                                     dev.std = ~sd(x=., na.rm=T),
                                     N = ~sum(!is.na(x=.))
                                     ))
```
---
# Creo grafico da dataset wide

```r
graf0<-  summ_table3 %>% ggplot( aes(x=year_birth))+
  theme_bw()+
  labs(
  x="Anno di nascita",
  y= "Età",
  title="Comportamento riproduttivo in UK.",
  subtitle="Anni 1938-1968",
  caption= "Source: UK Biobank")

graf1<-graf0 +
  geom_point(aes(y=afs_media, col="AFS"))+
  geom_point(aes(y=afb_media, col="AFB"))+
  geom_line(aes(y=afs_media, col="AFS"))+
  geom_line(aes(y=afb_media, col="AFB"))
```
---

```r
graf1
```

<img src="lezione6_files/figure-html/unnamed-chunk-23-1.png" width="40%" />
---
# Modificare leggenda **manualmente**
`color` è un elemento di `aes()` in `geom_point` e `geom_line`

```r
*graf2<-graf1+scale_colour_manual("Variabile",
                        values = c('AFB' = 'turquoise',
                                   'AFS' = 'tan1'),
                        labels = c('AFB' = 'Età al primo figlio',
                                   'AFS' = 'Età al primo rapporto sessuale') )
```
---

```r
graf2
```

---
# geom_ribbon()

```r
graf3<-graf0+ geom_ribbon(aes(ymin=afb_media-1.96*afb_dev.std/sqrt(afb_N),
                         ymax=afb_media+1.96*afb_dev.std/sqrt(afb_N), fill="AFB"))+
       geom_ribbon(aes(ymin=afs_media-1.96*afs_dev.std/sqrt(afs_N),
                         ymax=afs_media+1.96*afs_dev.std/sqrt(afs_N), fill="AFS"))+
          scale_fill_manual("Variabile",
                        values = c('AFB' = 'turquoise',
                                   'AFS' = 'tan1'),
                        labels = c('AFB' = 'Età al primo figlio',
                                   'AFS' = 'Età al primo rapporto sessuale') )
```

---

```r
graf3
```

<img src="lezione6_files/figure-html/unnamed-chunk-27-1.png" width="40%" />
---
# Esercizio
1. Visualizza con `geom_ribbon` or `geom_errorbar` l'andamento della proporzione di donne senza figli. ( variabile `childless` )

---

# soluzione 1

```r
graf0+ geom_ribbon(aes(ymin=childless_media-1.96*childless_dev.std/sqrt(childless_N),
                         ymax=childless_media+1.96*childless_dev.std/sqrt(childless_N)))+ylab("Proporzione")
```

---

# Density plots

```r
uk2<-UKB %>% 
    filter(country!="NA") %>% 
    ggplot(aes(x=afb, group=birth_interval))
           
uk2+geom_density(aes(fill=birth_interval), alpha=.3)
```

<img src="lezione6_files/figure-html/unnamed-chunk-29-1.png" width="40%" />
---
# Violin Plots

```r
uk2+geom_violin(aes(y=afb, x=birth_interval))
```

<img src="lezione6_files/figure-html/unnamed-chunk-30-1.png" width="40%" />
---
# Ridge plots

```r
library(ggridges)

uk3<-ggplot(UKB, aes(x = afb, y = birth_interval, fill = birth_interval)) +
  geom_density_ridges() +
  theme_ridges() + 
  theme(legend.position = "none")
```
---

```r
uk3
```

---

```r
library(ggridges)
library(ggplot2)
library(viridis)

# Plot
uk4<-ggplot(UKB, aes(x = afb, y = birth_interval, fill = ..x..)) +
  geom_density_ridges_gradient() +
  scale_fill_viridis_c(name = "Anni") +
  labs(title = 'Età al primo figlio nel Regno Unito', caption="Dati UK Biobank")+ 
  xlab("Età") +
  ylab("Anno di nascita")
```
---

```r
uk4
```

---
# Matrici di correlazione

```r
UKB %>% select("aam","afb", "afs", "neb", "bmi", "num_siblings", "educ", "birthweight", "smoking_pregnancy_mom") %>% cor(use="complete.obs") -> corMAT

round(corMAT,2)
```

```
##                         aam   afb   afs   neb   bmi
## aam                    1.00  0.04  0.08  0.01 -0.16
## afb                    0.04  1.00  0.37 -0.27 -0.14
## afs                    0.08  0.37  1.00 -0.07 -0.06
## neb                    0.01 -0.27 -0.07  1.00  0.08
## bmi                   -0.16 -0.14 -0.06  0.08  1.00
## num_siblings           0.06 -0.15 -0.02  0.10  0.06
## educ                  -0.02  0.27  0.13 -0.05 -0.09
## birthweight            0.00  0.03  0.00  0.00  0.00
## smoking_pregnancy_mom -0.02 -0.09 -0.10  0.00  0.07
##                       num_siblings  educ birthweight
## aam                           0.06 -0.02        0.00
## afb                          -0.15  0.27        0.03
## afs                          -0.02  0.13        0.00
## neb                           0.10 -0.05        0.00
## bmi                           0.06 -0.09        0.00
## num_siblings                  1.00 -0.16       -0.01
## educ                         -0.16  1.00        0.02
## birthweight                  -0.01  0.02        1.00
## smoking_pregnancy_mom         0.04 -0.09       -0.09
##                       smoking_pregnancy_mom
## aam                                   -0.02
## afb                                   -0.09
## afs                                   -0.10
## neb                                    0.00
## bmi                                    0.07
## num_siblings                           0.04
## educ                                  -0.09
## birthweight                           -0.09
## smoking_pregnancy_mom                  1.00
```
---
# Correlation plot

```r
library(corrplot)
corrplot(corMAT, method="color")
```

<img src="lezione6_files/figure-html/unnamed-chunk-36-1.png" width="40%" />
---

```r
library(corrplot)
corrplot(corMAT, type="upper")
```

<img src="lezione6_files/figure-html/unnamed-chunk-37-1.png" width="40%" />
---

```r
corrplot(corMAT, type="upper", order="hclust")
```

<img src="lezione6_files/figure-html/unnamed-chunk-38-1.png" width="40%" />
---

```r
uk3<-ggplot(data=UKB, aes(x=afs, y=afb, col=year_birth))
uk3+geom_point(alpha=.2)
```

<img src="lezione6_files/figure-html/unnamed-chunk-39-1.png" width="40%" />
---
# Rimuovo valori implausibili

```r
uk3<-ggplot(data=subset(UKB, afb>=afs & afs>15 & afb>15), aes(x=afs, y=afb, col=year_birth))

uk3+geom_point(alpha=0.1)
```

<img src="lezione6_files/figure-html/unnamed-chunk-40-1.png" width="40%" />
---

```r
uk3+
  geom_point(alpha=0.1)+
  geom_smooth(method="lm", se=FALSE)
```

<img src="lezione6_files/figure-html/unnamed-chunk-41-1.png" width="40%" />
---

```r
uk3+geom_smooth(aes(col=birth_interval),method="lm", se=FALSE)
```

---

```r
library(GGally)
median(UKB$year_birth)
```

```
## [1] 1950
```

```r
UKB_sel<- UKB %>% 
  mutate(year_binary=(year_birth>=1950)) %>% 
  filter(afb>=afs & afs>15 & afb>15) %>%            
  select(afb, afs, neb , aam, year_binary)
```
---

```r
ggpairs(UKB_sel, aes(col=year_binary))
```

<img src="lezione6_files/figure-html/unnamed-chunk-44-1.png" width="40%" />
---