Laboratorio Bio-demografico

class: center, middle, inverse, title-slide

# Laboratorio Bio-demografico
## Lezione 11 - Modello di regressione lineare
### Nicola Barban <br> Alma Mater Studiorum Università di Bologna <br> Dipartimento di Scienze Statistiche
### 10 Marzo 2021<br> <img src="../img/UniBo-Universita-di-Bologna.png" style="width:20.0%" />

---

# Outline

- Modello di regressione lineare
- Variabili dummy
- regressione multipla
- Log-linear model
- "Fixed effects"

---
# Cenni storici
.pull-left[
* Il termine regressione è stato coniato dallo statistico Francis Galton.
* Galton osservò che genitori alti tendenzialemente avevano  figli più bassi e genitori bassi avevano figli più alti *regression to the mean*

* Il suo amico Karl Pearson raccolse  statistiche sull'altezza di membri della stessa famiglia. Trovò che figli di padri alti tendevano ad essere in media più altri dell'altezza media dei loro padri.

]
.pull-right[
![](images/galton_reg.png)

]

---

# Modello regressione lineare (bivariato)

`$$Y_i=\alpha + \beta X_i +\epsilon_i$$`

* `$Y_i$` variabile dipendente
* `$X_i$` variabile indipendente
* `$\epsilon_i$` residui i.i.d.
  - indipendenti
  - identicamente distribuiti
* `$E[Y_i| X_i=x]=$` Valore atteso condizionato.
* `$E[Y_i|X_i]=$` Insieme di tutti i valori attesi di Y dato X. *Conditional Expectation Function (CEF)*
---

# Metodo dei minimi quadrati (OLS)

`$$E[Yi|Xi] = α + β × X_i$$`

L'intercetta e la pendenza della retta di regressione sono i valori di `$\alpha$` e `$\beta$`  che minimizzano il *residual sum of squares:*

`$$RSS(\alpha, \beta) = E[Y_i − \alpha −\beta X_i]$$`

---

`$$RSS=\sum_{i=1}^N (\tilde{Y_i}-Y_i)^2$$`
![](images/xy08.pdf)
---
# Parametri di regressione

La soluzione per il caso bivariato è:
* `$\beta=  \frac{COV(Y_i, X_i)}{V(X_i)}$`
* `$\alpha=  E[Y_i]-\beta E[X_i].$`

* Un implicazione importante  è che se X e Y sono indipendenti,  **incorrelate** ( covarianza= 0) allora `$\beta=0$`
* L'intercetta `$\alpha$` indica il valore atteso di `$Y_i$` se `$X_i=0$`, i.e. `$E[Y_i|X_i=0]$`
*  La pendenza  `$\beta$` indica  l'effetto atteso di un cambio unitario in `$X_i$` su `$E[Y_i|X_i]$`, i.e. `$E[E[Y_i|X_i]-E[Y_i|X_i -1]]$`.

---
class: center

---
# Regression for dummies

Il valore atteso condizionato di  `$Y_i$` data una variabile dummy  `$Z_i={0,1}$`  che consiste in due valori è:

*	`$E[Y_i| Z_i]= \alpha + \beta \times Z_i$`

I cui parametri consistono in:

* `$E[Y_i| Z_i=0]=   \alpha$`
* `$E[Y_i| Z_i=1]=   \alpha + \beta$`

Quindi:
*	`$\beta=E[Y_i| Z_i=1]-E[Y_i| Z_i=0]$`

---
# Tidymodels Package

```r
# install.packages("tidymodels")
library(tidymodels)
```

```
#> ── Attaching packages ───────────────────────────────── tidymodels 0.0.1 ──
#> ✔ ggplot2   3.0.0     ✔ recipes   0.1.3
#> ✔ tibble    1.4.2     ✔ broom     0.5.0
#> ✔ purrr     0.2.5     ✔ yardstick 0.0.1
#> ✔ dplyr     0.7.6     ✔ infer     0.3.0
#> ✔ rsample   0.0.2
#> ── Conflicts ──────────────────────────────────── tidymodels_conflicts() ──
#> ✖ rsample::fill() masks tidyr::fill()
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()
#> ✖ recipes::step() masks stats::step()
```

---

```r
bin_mod<-lm(kid_score~mom_hs, data=kids)

kids %>%
  ggplot(aes(x=mom_hs, y=kid_score)) +
  geom_point() +
  theme_bw()+
  labs(
    title = "Test score and Mother HS",
    x= "Mother High School",
    y= "Test score",
    caption = "Data from Hill and Gelman 2007")+
  geom_hline(yintercept=coef(bin_mod)[1], col="darkgreen") +
    geom_hline(yintercept=coef(bin_mod)[1]+coef(bin_mod)[2], col="darkblue")
```
---
# Interactions
`$$Kids.score =\alpha + \beta Mom.IQ + \delta Mom.HS + \gamma Mom.IQ \times Mom.HS +\epsilon$$`

---
class: center

<img src="lezione11_files/figure-html/unnamed-chunk-4-1.png" width="80%" />
---
# Linear model

```r
library(broom)
lmfit<-lm(kid_score ~ mom_iq+mom_hs,  data=kids)
lmfit
```

```
## 
## Call:
## lm(formula = kid_score ~ mom_iq + mom_hs, data = kids)
## 
## Coefficients:
## (Intercept)       mom_iq      mom_hs1  
##     25.7315       0.5639       5.9501
```

---

---
# Linear model e *tidyverse*

```r
library(broom)
lmfit<-lm(kid_score ~ mom_iq+mom_hs,  data=kids)
tidy(lmfit)
```

```
## # A tibble: 3 x 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)   25.7      5.88        4.38 1.49e- 5
## 2 mom_iq         0.564    0.0606      9.31 6.61e-19
## 3 mom_hs1        5.95     2.21        2.69 7.42e- 3
```

---
# Deriving statistics from LM **glance()**

```r
glance(lmfit)
```

```
## # A tibble: 1 x 12
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1     0.214         0.210  18.1      58.7 2.79e-23     2 -1872. 3752. 3768.
## # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
```
---

# Predicted values from LM **augment()**

```r
augment(lmfit)
```

```
## # A tibble: 434 x 9
##    kid_score mom_iq mom_hs .fitted .resid .std.resid    .hat .sigma   .cooksd
##        <dbl>  <dbl> <fct>    <dbl>  <dbl>      <dbl>   <dbl>  <dbl>     <dbl>
##  1        65  121.  1        100.  -35.0      -1.94  0.00692   18.1 0.00870  
##  2        98   89.4 1         82.1  15.9       0.880 0.00477   18.1 0.00124  
##  3        85  115.  1         96.8 -11.8      -0.651 0.00489   18.1 0.000694 
##  4        83   99.4 1         87.8  -4.76     -0.263 0.00302   18.2 0.0000698
##  5       115   92.7 1         84.0  31.0       1.71  0.00393   18.1 0.00386  
##  6        98  108.  0         86.6  11.4       0.634 0.0136    18.1 0.00185  
##  7        69  139.  1        110.  -41.0      -2.28  0.0179    18.0 0.0317   
##  8       106  125.  1        102.    3.75      0.208 0.00880   18.2 0.000128 
##  9       102   81.6 1         77.7  24.3       1.34  0.00766   18.1 0.00465  
## 10        95   95.1 1         85.3   9.71      0.536 0.00350   18.2 0.000337 
## # … with 424 more rows
```

---
# Predict outcome da altri valori di X

```r
newdata <- kids %>%
  mutate(mom_iq = rnorm(434, mean=100, sd=15))
augment(lmfit, newdata = newdata)
```

```
## # A tibble: 434 x 7
##    kid_score mom_hs mom_iq mom_work mom_age .fitted .resid
##        <dbl> <fct>   <dbl>    <dbl>   <dbl>   <dbl>  <dbl>
##  1        65 1        87.0        4      27    80.8 -15.8 
##  2        98 1       103.         4      25    89.9   8.10
##  3        85 1        96.6        4      27    86.1  -1.14
##  4        83 1        88.3        3      25    81.5   1.54
##  5       115 1       109.         4      27    93.3  21.7 
##  6        98 0        97.2        1      18    80.6  17.4 
##  7        69 1        88.0        4      20    81.3 -12.3 
##  8       106 1       101.         3      23    88.9  17.1 
##  9       102 1        92.9        1      24    84.1  17.9 
## 10        95 1       106.         1      19    91.4   3.58
## # … with 424 more rows
```

---

# Grafico coefficienti

```r
tidy(lmfit, conf.int = TRUE) %>%
ggplot( aes(estimate, term, xmin = conf.low, xmax = conf.high, height = 0)) +
  geom_point() +
  geom_vline(xintercept = 0, lty = 4) +
  geom_errorbarh()
```

---

```r
tidy(lmfit, conf.int = TRUE) %>%
ggplot( aes(estimate, term, xmin = conf.low, xmax = conf.high, height = 0)) +
  geom_point() +
  geom_vline(xintercept = 0, lty = 4) +
  geom_errorbarh()+
  coord_flip()
```

<img src="lezione11_files/figure-html/unnamed-chunk-11-1.png" width="80%" />
---
# Le Variabli possono essere  rescalate

```r
  lm(kid_score ~ mom_iq, data=kids) %>%
  tidy()
```

```
## # A tibble: 2 x 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)   25.8      5.92        4.36 1.63e- 5
## 2 mom_iq         0.610    0.0585     10.4  7.66e-23
```

```r
kids %>%
  mutate(mom_iq_rescaled=mom_iq/10) ->kids

lm(kid_score ~ mom_iq_rescaled, data=kids) %>%
  tidy()
```

```
## # A tibble: 2 x 5
##   term            estimate std.error statistic  p.value
##   <chr>              <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)        25.8      5.92       4.36 1.63e- 5
## 2 mom_iq_rescaled     6.10     0.585     10.4  7.66e-23
```
---

# `$\beta$` è invariante rispetto a spostamenti di scala

```r
  lm(kid_score ~ mom_iq, data=kids) %>%
  tidy()
```

```r
kids %>%
  mutate(mom_iq_rescaled2=mom_iq+100) ->kids

lm(kid_score ~ mom_iq_rescaled2, data=kids) %>%
  tidy()
```

```
## # A tibble: 2 x 5
##   term             estimate std.error statistic  p.value
##   <chr>               <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)       -35.2     11.7        -3.00 2.87e- 3
## 2 mom_iq_rescaled2    0.610    0.0585     10.4  7.66e-23
```
---

# Trasformazioni logaritmiche

|        	| `$X$`                                  	| `$logX$`                                   	|
|--------	|--------------------------------------	|------------------------------------------	|
| `$Y$`    	| linear                               	| linear-log                               	|
|        	| `$\tilde{Y_i}=\alpha + \beta X_i$`     	| `$\tilde{Y_i}=\alpha + \beta log X_i$`     	|
|
| `$logY$` 	| log-linear                           	| log-log                                  	|
|        	| `$log \tilde{Y_i}=\alpha + \beta X_i$` 	| `$log \tilde{Y_i}=\alpha + \beta log X_i$` 	|

**natural logarithms,  `$e ≈ 2.71828$`**

---
# perchè usare trasformazioni logarimiche:

* E' comune usare trasformazioni logaritmiche quando  esistono relazioni non-lineare tra variabile dipendente e indipendente

* Trasformazioni logaritmiche sono usate inoltre per trasformare variabili asimmetriche. (e.g. reddito, dati di spesa ecc. )

---
# Modello Linear-log
* aumento di 1 unita in `$logX$` è associato con aumento atteso di `$\beta$` unità in Y.
* `$log X + 1 = log X + log e = log(eX )$`
* aggiungere   1 a $ X$ significa moltiplicare  `$X$` di `$e ≈ 2.72$`.
* Il cambio atteso in `$Y$` associato con un `$p%$` aumento in `$X$` può essere calcolato come `$\beta \times log([100 + p]/100)$`.
* Per valori piccoli di `$p$`, Si può usare la seguente approssimazione: `$log([100 + p]/100) ≈ p/100$`. Quindi per `$p=1$`, **L'aumento atteso in `$Y$` per un aumento di `$1$`% in `$X$` è uguale a:
`$\frac{\beta}{100}$`.**

---

# Modello Linear-log
* Aumento di 1 unita in `$X$` è associato con aumento atteso di `$e^\beta$` unità in `$Y$`.
* Il cambio atteso in `$Y$` associato con un aumento in `$X$` di `$c$` unità può essere calcolato come `$e^{c \beta}$`.
* Per valori piccoli di `$\tilde{\beta}$`, Si può usare la seguente approssimazione: `$e^\tilde{\beta} ≈ 1+\tilde{\beta}$`. **Quindi `$100 \times β$` è il cambio atteso in `$Y$` associato all'aumento di 1 unità in `$X$`.**

### Esempio:
*  `$β = 0,06$` quindi `$e^{0,06} ≈ 1.06$`
*   cambio di un'unità  in `$X$` corresponde ad un aumento in `$Y$` del `$6$`%.
---

```r
library(gapminder)
*library(broom)
data(gapminder)

gapminder$LogGdp=log(gapminder$gdpPercap)
linlog<-lm(lifeExp~LogGdp, data=gapminder)
tidy(linlog)
```

```
## # A tibble: 2 x 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)    -9.10     1.23      -7.41 1.93e-13
## 2 LogGdp          8.41     0.149     56.5  0.
```
### aumento di 1% in `gdpPercap` è associato con aumento  di 0.0841 anni in aspettativa di vita.
---
# Modello Log-lineare

```r
loglin<-lm(LogGdp~lifeExp, data=gapminder)
tidy(loglin)
```

```
## # A tibble: 2 x 5
##   term        estimate std.error statistic   p.value
##   <chr>          <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)   3.54     0.0836       42.4 1.20e-268
## 2 lifeExp       0.0776   0.00137      56.5 0.
```
### aumento di 1 anno in `lifeExp`  è associato con aumento  di 0.0776%  in `gdpPercap`

---
# Effetti fissi

### Potremmo definire un'intercetta per ogni paese
 * Cattura caratteristiche specifiche del paese invarianti nel tempo

```r
linlog2<-lm(lifeExp~log(gdpPercap)+as_factor(country), data=gapminder)
tidy(linlog2)[1:2,]
```

```
## # A tibble: 2 x 5
##   term           estimate std.error statistic   p.value
##   <chr>             <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)      -27.8      2.50      -11.1 1.22e- 27
## 2 log(gdpPercap)     9.77     0.297      32.9 3.80e-181
```
---

# UK Biobank data

```r
library(haven)
UKB<-read_dta("data/UKB_teaching.dta")
```
---

```r
names(UKB)
```

```
##  [1] "sex"                   "year_birth"            "neb"                  
##  [4] "childless"             "smoking_pregnancy_mom" "educ"                 
##  [7] "aam"                   "afs"                   "afb"                  
## [10] "meno"                  "birthweight"           "teen"                 
## [13] "bmi"                   "num_siblings"          "country"              
## [16] "gdp_spl"               "region_gdp"            "pill_lad"             
## [19] "idn"
```

---
# Variabili nel dataset

* `neb` Number of childrem               
* `childless` (1= childless)          
* `smoking_pregnancy_mom` Mother smoked during pregnancy
* `educ`        Years of education         
*  `aam`    Age at Menarche
* `afs`      Age at First Sexual Intercourse            
* `afb`  Age at First Birth
* `meno`      Age at Natural menopause           
* `birthweight` Birthweight
* `teen`        Teenage childbearing (1 yes, 0 no)         
* `bmi`     Body Mass Index
* `num_siblings`     Number of Siblings    
* `country`     Country (England, Scotland, Wales)
* `gdp_spl`          regional GDP at birth (aprox)     
* `region_gdp`   Region of Birth
* `pill_lad`          Pill availability (% of women had use pill at age 18 in local Authority)   
* `idn` Personal ID number
---
# Regression model 1

```r
UKmod1<-lm(afb~pill_lad, data=UKB)

UKmod1 %>% 
summary()
```

```
## 
## Call:
## lm(formula = afb ~ pill_lad, data = UKB)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.9813  -3.2164  -0.1568   2.7935  24.5899 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 2.455e+01  1.776e-02  1382.0   <2e-16 ***
## pill_lad    3.766e-02  5.554e-04    67.8   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.443 on 110223 degrees of freedom
##   (51243 observations deleted due to missingness)
## Multiple R-squared:  0.04004,	Adjusted R-squared:  0.04003 
## F-statistic:  4597 on 1 and 110223 DF,  p-value: < 2.2e-16
```
---
# Regression model (con effetti fissi regionali)

```r
UKmod2<-lm(afb~pill_lad+I(region_gdp), data=UKB)

UKmod2 %>% 
summary()
```

```
## 
## Call:
## lm(formula = afb ~ pill_lad + I(region_gdp), data = UKB)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.9763  -3.1390  -0.1582   2.8222  25.0723 
## 
## Coefficients:
##                                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      24.3591488  0.0502177 485.071  < 2e-16 ***
## pill_lad                          0.0367253  0.0005534  66.361  < 2e-16 ***
## I(region_gdp)London Counties      1.0078269  0.0642521  15.686  < 2e-16 ***
## I(region_gdp)North               -0.2475762  0.0548552  -4.513 6.39e-06 ***
## I(region_gdp)Rest of SouthEast    1.0520312  0.0645004  16.310  < 2e-16 ***
## I(region_gdp)Scotland             0.4537346  0.0664524   6.828 8.66e-12 ***
## I(region_gdp)SouthWest            0.3277391  0.0691255   4.741 2.13e-06 ***
## I(region_gdp)Wales                0.2354068  0.0777468   3.028 0.002463 ** 
## I(region_gdp)West Midlands        0.2440044  0.0661769   3.687 0.000227 ***
## I(region_gdp)Yorks  & Humberside -0.1846774  0.0604079  -3.057 0.002235 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.418 on 110215 degrees of freedom
##   (51243 observations deleted due to missingness)
## Multiple R-squared:  0.05104,	Adjusted R-squared:  0.05096 
## F-statistic: 658.7 on 9 and 110215 DF,  p-value: < 2.2e-16
```
---
# Regression model (background vars )

```r
UKmod3<-lm(afb~pill_lad+
             I(region_gdp)+
             smoking_pregnancy_mom+
             birthweight, data=UKB)

UKmod3 %>% 
summary()
```

```
## 
## Call:
## lm(formula = afb ~ pill_lad + I(region_gdp) + smoking_pregnancy_mom + 
##     birthweight, data = UKB)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -14.373  -3.036  -0.115   2.766  24.547 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      24.190146   0.114286 211.662  < 2e-16 ***
## pill_lad                          0.038899   0.000702  55.409  < 2e-16 ***
## I(region_gdp)London Counties      1.007286   0.084362  11.940  < 2e-16 ***
## I(region_gdp)North               -0.132193   0.072832  -1.815  0.06952 .  
## I(region_gdp)Rest of SouthEast    1.072189   0.084805  12.643  < 2e-16 ***
## I(region_gdp)Scotland             0.555858   0.088170   6.304 2.91e-10 ***
## I(region_gdp)SouthWest            0.245309   0.090447   2.712  0.00669 ** 
## I(region_gdp)Wales                0.296731   0.102652   2.891  0.00385 ** 
## I(region_gdp)West Midlands        0.282801   0.087740   3.223  0.00127 ** 
## I(region_gdp)Yorks  & Humberside -0.177092   0.079409  -2.230  0.02574 *  
## smoking_pregnancy_mom            -1.137578   0.038741 -29.363  < 2e-16 ***
## birthweight                       0.164841   0.028310   5.823 5.82e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.406 on 62605 degrees of freedom
##   (98851 observations deleted due to missingness)
## Multiple R-squared:  0.07183,	Adjusted R-squared:  0.07167 
## F-statistic: 440.4 on 11 and 62605 DF,  p-value: < 2.2e-16
```

---
# Coefficient plot

```r
library(coefplot)
multiplot(UKmod1, UKmod2, UKmod3, coefficients="pill_lad")+
  coord_flip()
```

<img src="lezione11_files/figure-html/unnamed-chunk-22-1.png" width="80%" />
---
# Regression model (interactions )

```r
UKmod4<-lm(afb~pill_lad*birthweight+
             I(region_gdp)+
             smoking_pregnancy_mom  ,data=UKB)

UKmod4 %>% 
summary()
```

```
## 
## Call:
## lm(formula = afb ~ pill_lad * birthweight + I(region_gdp) + smoking_pregnancy_mom, 
##     data = UKB)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.3527  -3.0201  -0.1072   2.7753  24.6270 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      24.682553   0.140551 175.613  < 2e-16 ***
## pill_lad                          0.016302   0.003821   4.266 1.99e-05 ***
## birthweight                       0.012164   0.038016   0.320  0.74898    
## I(region_gdp)London Counties      1.006663   0.084338  11.936  < 2e-16 ***
## I(region_gdp)North               -0.131131   0.072812  -1.801  0.07171 .  
## I(region_gdp)Rest of SouthEast    1.070450   0.084782  12.626  < 2e-16 ***
## I(region_gdp)Scotland             0.561074   0.088150   6.365 1.97e-10 ***
## I(region_gdp)SouthWest            0.249800   0.090425   2.763  0.00574 ** 
## I(region_gdp)Wales                0.296034   0.102624   2.885  0.00392 ** 
## I(region_gdp)West Midlands        0.282969   0.087715   3.226  0.00126 ** 
## I(region_gdp)Yorks  & Humberside -0.176468   0.079386  -2.223  0.02623 *  
## smoking_pregnancy_mom            -1.129728   0.038752 -29.152  < 2e-16 ***
## pill_lad:birthweight              0.006934   0.001153   6.015 1.80e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.405 on 62604 degrees of freedom
##   (98851 observations deleted due to missingness)
## Multiple R-squared:  0.07237,	Adjusted R-squared:  0.07219 
## F-statistic:   407 on 12 and 62604 DF,  p-value: < 2.2e-16
```

---
# Interaction Plot

```r
#install.packages("interactions")
library(interactions)
interact_plot(UKmod4, pred = pill_lad, modx = birthweight)
```

<img src="lezione11_files/figure-html/unnamed-chunk-24-1.png" width="80%" />
---
# Interactions with discrete var

```r
UKB<-UKB %>% 
  mutate(country=na_if(country, "NA"))

UKmod5<-lm(afb~pill_lad*I(country)+
             smoking_pregnancy_mom,data=UKB)

UKmod5 %>% 
summary()
```

```
## 
## Call:
## lm(formula = afb ~ pill_lad * I(country) + smoking_pregnancy_mom, 
##     data = UKB)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.2540  -2.9987  -0.0547   2.8795  24.1550 
## 
## Coefficients:
##                               Estimate Std. Error  t value Pr(>|t|)    
## (Intercept)                 24.9186609  0.0225216 1106.436  < 2e-16 ***
## pill_lad                     0.0389557  0.0006297   61.869  < 2e-16 ***
## I(country)Scotland           0.2017957  0.0661121    3.052  0.00227 ** 
## I(country)Wales              0.2331454  0.0880612    2.648  0.00811 ** 
## smoking_pregnancy_mom       -1.1819904  0.0310312  -38.090  < 2e-16 ***
## pill_lad:I(country)Scotland  0.0066229  0.0022529    2.940  0.00329 ** 
## pill_lad:I(country)Wales    -0.0080215  0.0027280   -2.940  0.00328 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.413 on 95289 degrees of freedom
##   (66172 observations deleted due to missingness)
## Multiple R-squared:  0.05618,	Adjusted R-squared:  0.05612 
## F-statistic: 945.4 on 6 and 95289 DF,  p-value: < 2.2e-16
```

---

```r
interact_plot(UKmod5, pred = pill_lad, modx = country )
```

---