class: center, middle, inverse, title-slide # Laboratorio Bio-demografico ## Lezione 11 - Modello di regressione lineare ### Nicola Barban
Alma Mater Studiorum Università di Bologna
Dipartimento di Scienze Statistiche ### 10 Marzo 2021
--- # Outline - Modello di regressione lineare - Variabili dummy - regressione multipla - Log-linear model - "Fixed effects" --- # Cenni storici .pull-left[ * Il termine regressione è stato coniato dallo statistico Francis Galton. * Galton osservò che genitori alti tendenzialemente avevano figli più bassi e genitori bassi avevano figli più alti *regression to the mean* * Il suo amico Karl Pearson raccolse statistiche sull'altezza di membri della stessa famiglia. Trovò che figli di padri alti tendevano ad essere in media più altri dell'altezza media dei loro padri. ] .pull-right[  ] --- # Modello regressione lineare (bivariato) `$$Y_i=\alpha + \beta X_i +\epsilon_i$$` * `\(Y_i\)` variabile dipendente * `\(X_i\)` variabile indipendente * `\(\epsilon_i\)` residui i.i.d. - indipendenti - identicamente distribuiti * `\(E[Y_i| X_i=x]=\)` Valore atteso condizionato. * `\(E[Y_i|X_i]=\)` Insieme di tutti i valori attesi di Y dato X. *Conditional Expectation Function (CEF)* --- # Metodo dei minimi quadrati (OLS) `$$E[Yi|Xi] = α + β × X_i$$` L'intercetta e la pendenza della retta di regressione sono i valori di `\(\alpha\)` e `\(\beta\)` che minimizzano il *residual sum of squares:* `$$RSS(\alpha, \beta) = E[Y_i − \alpha −\beta X_i]$$` --- `$$RSS=\sum_{i=1}^N (\tilde{Y_i}-Y_i)^2$$`  --- # Parametri di regressione La soluzione per il caso bivariato è: * `\(\beta= \frac{COV(Y_i, X_i)}{V(X_i)}\)` * `\(\alpha= E[Y_i]-\beta E[X_i].\)` * Un implicazione importante è che se X e Y sono indipendenti, **incorrelate** ( covarianza= 0) allora `\(\beta=0\)` * L'intercetta `\(\alpha\)` indica il valore atteso di `\(Y_i\)` se `\(X_i=0\)`, i.e. `\(E[Y_i|X_i=0]\)` * La pendenza `\(\beta\)` indica l'effetto atteso di un cambio unitario in `\(X_i\)` su `\(E[Y_i|X_i]\)`, i.e. `\(E[E[Y_i|X_i]-E[Y_i|X_i -1]]\)`. --- class: center <img src="lezione11_files/figure-html/unnamed-chunk-1-1.png" width="80%" /> --- # Regression for dummies Il valore atteso condizionato di `\(Y_i\)` data una variabile dummy `\(Z_i={0,1}\)` che consiste in due valori è: * `\(E[Y_i| Z_i]= \alpha + \beta \times Z_i\)` I cui parametri consistono in: * `\(E[Y_i| Z_i=0]= \alpha\)` * `\(E[Y_i| Z_i=1]= \alpha + \beta\)` Quindi: * `\(\beta=E[Y_i| Z_i=1]-E[Y_i| Z_i=0]\)` --- # Tidymodels Package ```r # install.packages("tidymodels") library(tidymodels) ``` ``` #> ── Attaching packages ───────────────────────────────── tidymodels 0.0.1 ── #> ✔ ggplot2 3.0.0 ✔ recipes 0.1.3 #> ✔ tibble 1.4.2 ✔ broom 0.5.0 #> ✔ purrr 0.2.5 ✔ yardstick 0.0.1 #> ✔ dplyr 0.7.6 ✔ infer 0.3.0 #> ✔ rsample 0.0.2 #> ── Conflicts ──────────────────────────────────── tidymodels_conflicts() ── #> ✖ rsample::fill() masks tidyr::fill() #> ✖ dplyr::filter() masks stats::filter() #> ✖ dplyr::lag() masks stats::lag() #> ✖ recipes::step() masks stats::step() ``` --- ```r bin_mod<-lm(kid_score~mom_hs, data=kids) kids %>% ggplot(aes(x=mom_hs, y=kid_score)) + geom_point() + theme_bw()+ labs( title = "Test score and Mother HS", x= "Mother High School", y= "Test score", caption = "Data from Hill and Gelman 2007")+ geom_hline(yintercept=coef(bin_mod)[1], col="darkgreen") + geom_hline(yintercept=coef(bin_mod)[1]+coef(bin_mod)[2], col="darkblue") ``` --- # Interactions `$$Kids.score =\alpha + \beta Mom.IQ + \delta Mom.HS + \gamma Mom.IQ \times Mom.HS +\epsilon$$` --- class: center <img src="lezione11_files/figure-html/unnamed-chunk-4-1.png" width="80%" /> --- # Linear model ```r library(broom) lmfit<-lm(kid_score ~ mom_iq+mom_hs, data=kids) lmfit ``` ``` ## ## Call: ## lm(formula = kid_score ~ mom_iq + mom_hs, data = kids) ## ## Coefficients: ## (Intercept) mom_iq mom_hs1 ## 25.7315 0.5639 5.9501 ``` --- --- # Linear model e *tidyverse* ```r library(broom) lmfit<-lm(kid_score ~ mom_iq+mom_hs, data=kids) tidy(lmfit) ``` ``` ## # A tibble: 3 x 5 ## term estimate std.error statistic p.value ## <chr> <dbl> <dbl> <dbl> <dbl> ## 1 (Intercept) 25.7 5.88 4.38 1.49e- 5 ## 2 mom_iq 0.564 0.0606 9.31 6.61e-19 ## 3 mom_hs1 5.95 2.21 2.69 7.42e- 3 ``` --- # Deriving statistics from LM **glance()** ```r glance(lmfit) ``` ``` ## # A tibble: 1 x 12 ## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC ## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 0.214 0.210 18.1 58.7 2.79e-23 2 -1872. 3752. 3768. ## # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int> ``` --- # Predicted values from LM **augment()** ```r augment(lmfit) ``` ``` ## # A tibble: 434 x 9 ## kid_score mom_iq mom_hs .fitted .resid .std.resid .hat .sigma .cooksd ## <dbl> <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 65 121. 1 100. -35.0 -1.94 0.00692 18.1 0.00870 ## 2 98 89.4 1 82.1 15.9 0.880 0.00477 18.1 0.00124 ## 3 85 115. 1 96.8 -11.8 -0.651 0.00489 18.1 0.000694 ## 4 83 99.4 1 87.8 -4.76 -0.263 0.00302 18.2 0.0000698 ## 5 115 92.7 1 84.0 31.0 1.71 0.00393 18.1 0.00386 ## 6 98 108. 0 86.6 11.4 0.634 0.0136 18.1 0.00185 ## 7 69 139. 1 110. -41.0 -2.28 0.0179 18.0 0.0317 ## 8 106 125. 1 102. 3.75 0.208 0.00880 18.2 0.000128 ## 9 102 81.6 1 77.7 24.3 1.34 0.00766 18.1 0.00465 ## 10 95 95.1 1 85.3 9.71 0.536 0.00350 18.2 0.000337 ## # … with 424 more rows ``` --- # Predict outcome da altri valori di X ```r newdata <- kids %>% mutate(mom_iq = rnorm(434, mean=100, sd=15)) augment(lmfit, newdata = newdata) ``` ``` ## # A tibble: 434 x 7 ## kid_score mom_hs mom_iq mom_work mom_age .fitted .resid ## <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 65 1 87.0 4 27 80.8 -15.8 ## 2 98 1 103. 4 25 89.9 8.10 ## 3 85 1 96.6 4 27 86.1 -1.14 ## 4 83 1 88.3 3 25 81.5 1.54 ## 5 115 1 109. 4 27 93.3 21.7 ## 6 98 0 97.2 1 18 80.6 17.4 ## 7 69 1 88.0 4 20 81.3 -12.3 ## 8 106 1 101. 3 23 88.9 17.1 ## 9 102 1 92.9 1 24 84.1 17.9 ## 10 95 1 106. 1 19 91.4 3.58 ## # … with 424 more rows ``` --- # Grafico coefficienti ```r tidy(lmfit, conf.int = TRUE) %>% ggplot( aes(estimate, term, xmin = conf.low, xmax = conf.high, height = 0)) + geom_point() + geom_vline(xintercept = 0, lty = 4) + geom_errorbarh() ``` <img src="lezione11_files/figure-html/unnamed-chunk-10-1.png" width="80%" /> --- ```r tidy(lmfit, conf.int = TRUE) %>% ggplot( aes(estimate, term, xmin = conf.low, xmax = conf.high, height = 0)) + geom_point() + geom_vline(xintercept = 0, lty = 4) + geom_errorbarh()+ coord_flip() ``` <img src="lezione11_files/figure-html/unnamed-chunk-11-1.png" width="80%" /> --- # Le Variabli possono essere rescalate ```r lm(kid_score ~ mom_iq, data=kids) %>% tidy() ``` ``` ## # A tibble: 2 x 5 ## term estimate std.error statistic p.value ## <chr> <dbl> <dbl> <dbl> <dbl> ## 1 (Intercept) 25.8 5.92 4.36 1.63e- 5 ## 2 mom_iq 0.610 0.0585 10.4 7.66e-23 ``` ```r kids %>% mutate(mom_iq_rescaled=mom_iq/10) ->kids lm(kid_score ~ mom_iq_rescaled, data=kids) %>% tidy() ``` ``` ## # A tibble: 2 x 5 ## term estimate std.error statistic p.value ## <chr> <dbl> <dbl> <dbl> <dbl> ## 1 (Intercept) 25.8 5.92 4.36 1.63e- 5 ## 2 mom_iq_rescaled 6.10 0.585 10.4 7.66e-23 ``` --- # `\(\beta\)` è invariante rispetto a spostamenti di scala ```r lm(kid_score ~ mom_iq, data=kids) %>% tidy() ``` ``` ## # A tibble: 2 x 5 ## term estimate std.error statistic p.value ## <chr> <dbl> <dbl> <dbl> <dbl> ## 1 (Intercept) 25.8 5.92 4.36 1.63e- 5 ## 2 mom_iq 0.610 0.0585 10.4 7.66e-23 ``` ```r kids %>% mutate(mom_iq_rescaled2=mom_iq+100) ->kids lm(kid_score ~ mom_iq_rescaled2, data=kids) %>% tidy() ``` ``` ## # A tibble: 2 x 5 ## term estimate std.error statistic p.value ## <chr> <dbl> <dbl> <dbl> <dbl> ## 1 (Intercept) -35.2 11.7 -3.00 2.87e- 3 ## 2 mom_iq_rescaled2 0.610 0.0585 10.4 7.66e-23 ``` --- # Trasformazioni logaritmiche | | `\(X\)` | `\(logX\)` | |-------- |-------------------------------------- |------------------------------------------ | | `\(Y\)` | linear | linear-log | | | `\(\tilde{Y_i}=\alpha + \beta X_i\)` | `\(\tilde{Y_i}=\alpha + \beta log X_i\)` | | | `\(logY\)` | log-linear | log-log | | | `\(log \tilde{Y_i}=\alpha + \beta X_i\)` | `\(log \tilde{Y_i}=\alpha + \beta log X_i\)` | **natural logarithms, `\(e ≈ 2.71828\)`** --- # perchè usare trasformazioni logarimiche: * E' comune usare trasformazioni logaritmiche quando esistono relazioni non-lineare tra variabile dipendente e indipendente * Trasformazioni logaritmiche sono usate inoltre per trasformare variabili asimmetriche. (e.g. reddito, dati di spesa ecc. ) --- # Modello Linear-log * aumento di 1 unita in `\(logX\)` è associato con aumento atteso di `\(\beta\)` unità in Y. * `\(log X + 1 = log X + log e = log(eX )\)` * aggiungere 1 a $ X$ significa moltiplicare `\(X\)` di `\(e ≈ 2.72\)`. * Il cambio atteso in `\(Y\)` associato con un `\(p%\)` aumento in `\(X\)` può essere calcolato come `\(\beta \times log([100 + p]/100)\)`. * Per valori piccoli di `\(p\)`, Si può usare la seguente approssimazione: `\(log([100 + p]/100) ≈ p/100\)`. Quindi per `\(p=1\)`, **L'aumento atteso in `\(Y\)` per un aumento di `\(1\)`% in `\(X\)` è uguale a: `\(\frac{\beta}{100}\)`.** --- # Modello Linear-log * Aumento di 1 unita in `\(X\)` è associato con aumento atteso di `\(e^\beta\)` unità in `\(Y\)`. * Il cambio atteso in `\(Y\)` associato con un aumento in `\(X\)` di `\(c\)` unità può essere calcolato come `\(e^{c \beta}\)`. * Per valori piccoli di `\(\tilde{\beta}\)`, Si può usare la seguente approssimazione: `\(e^\tilde{\beta} ≈ 1+\tilde{\beta}\)`. **Quindi `\(100 \times β\)` è il cambio atteso in `\(Y\)` associato all'aumento di 1 unità in `\(X\)`.** ### Esempio: * `\(β = 0,06\)` quindi `\(e^{0,06} ≈ 1.06\)` * cambio di un'unità in `\(X\)` corresponde ad un aumento in `\(Y\)` del `\(6\)`%. --- ```r library(gapminder) *library(broom) data(gapminder) gapminder$LogGdp=log(gapminder$gdpPercap) linlog<-lm(lifeExp~LogGdp, data=gapminder) tidy(linlog) ``` ``` ## # A tibble: 2 x 5 ## term estimate std.error statistic p.value ## <chr> <dbl> <dbl> <dbl> <dbl> ## 1 (Intercept) -9.10 1.23 -7.41 1.93e-13 ## 2 LogGdp 8.41 0.149 56.5 0. ``` ### aumento di 1% in `gdpPercap` è associato con aumento di 0.0841 anni in aspettativa di vita. --- # Modello Log-lineare ```r loglin<-lm(LogGdp~lifeExp, data=gapminder) tidy(loglin) ``` ``` ## # A tibble: 2 x 5 ## term estimate std.error statistic p.value ## <chr> <dbl> <dbl> <dbl> <dbl> ## 1 (Intercept) 3.54 0.0836 42.4 1.20e-268 ## 2 lifeExp 0.0776 0.00137 56.5 0. ``` ### aumento di 1 anno in `lifeExp` è associato con aumento di 0.0776% in `gdpPercap` --- # Effetti fissi ### Potremmo definire un'intercetta per ogni paese * Cattura caratteristiche specifiche del paese invarianti nel tempo ```r linlog2<-lm(lifeExp~log(gdpPercap)+as_factor(country), data=gapminder) tidy(linlog2)[1:2,] ``` ``` ## # A tibble: 2 x 5 ## term estimate std.error statistic p.value ## <chr> <dbl> <dbl> <dbl> <dbl> ## 1 (Intercept) -27.8 2.50 -11.1 1.22e- 27 ## 2 log(gdpPercap) 9.77 0.297 32.9 3.80e-181 ``` --- # UK Biobank data ```r library(haven) UKB<-read_dta("data/UKB_teaching.dta") ``` --- ```r names(UKB) ``` ``` ## [1] "sex" "year_birth" "neb" ## [4] "childless" "smoking_pregnancy_mom" "educ" ## [7] "aam" "afs" "afb" ## [10] "meno" "birthweight" "teen" ## [13] "bmi" "num_siblings" "country" ## [16] "gdp_spl" "region_gdp" "pill_lad" ## [19] "idn" ``` --- # Variabili nel dataset * `neb` Number of childrem * `childless` (1= childless) * `smoking_pregnancy_mom` Mother smoked during pregnancy * `educ` Years of education * `aam` Age at Menarche * `afs` Age at First Sexual Intercourse * `afb` Age at First Birth * `meno` Age at Natural menopause * `birthweight` Birthweight * `teen` Teenage childbearing (1 yes, 0 no) * `bmi` Body Mass Index * `num_siblings` Number of Siblings * `country` Country (England, Scotland, Wales) * `gdp_spl` regional GDP at birth (aprox) * `region_gdp` Region of Birth * `pill_lad` Pill availability (% of women had use pill at age 18 in local Authority) * `idn` Personal ID number --- # Regression model 1 ```r UKmod1<-lm(afb~pill_lad, data=UKB) UKmod1 %>% summary() ``` ``` ## ## Call: ## lm(formula = afb ~ pill_lad, data = UKB) ## ## Residuals: ## Min 1Q Median 3Q Max ## -14.9813 -3.2164 -0.1568 2.7935 24.5899 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 2.455e+01 1.776e-02 1382.0 <2e-16 *** ## pill_lad 3.766e-02 5.554e-04 67.8 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 4.443 on 110223 degrees of freedom ## (51243 observations deleted due to missingness) ## Multiple R-squared: 0.04004, Adjusted R-squared: 0.04003 ## F-statistic: 4597 on 1 and 110223 DF, p-value: < 2.2e-16 ``` --- # Regression model (con effetti fissi regionali) ```r UKmod2<-lm(afb~pill_lad+I(region_gdp), data=UKB) UKmod2 %>% summary() ``` ``` ## ## Call: ## lm(formula = afb ~ pill_lad + I(region_gdp), data = UKB) ## ## Residuals: ## Min 1Q Median 3Q Max ## -14.9763 -3.1390 -0.1582 2.8222 25.0723 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 24.3591488 0.0502177 485.071 < 2e-16 *** ## pill_lad 0.0367253 0.0005534 66.361 < 2e-16 *** ## I(region_gdp)London Counties 1.0078269 0.0642521 15.686 < 2e-16 *** ## I(region_gdp)North -0.2475762 0.0548552 -4.513 6.39e-06 *** ## I(region_gdp)Rest of SouthEast 1.0520312 0.0645004 16.310 < 2e-16 *** ## I(region_gdp)Scotland 0.4537346 0.0664524 6.828 8.66e-12 *** ## I(region_gdp)SouthWest 0.3277391 0.0691255 4.741 2.13e-06 *** ## I(region_gdp)Wales 0.2354068 0.0777468 3.028 0.002463 ** ## I(region_gdp)West Midlands 0.2440044 0.0661769 3.687 0.000227 *** ## I(region_gdp)Yorks & Humberside -0.1846774 0.0604079 -3.057 0.002235 ** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 4.418 on 110215 degrees of freedom ## (51243 observations deleted due to missingness) ## Multiple R-squared: 0.05104, Adjusted R-squared: 0.05096 ## F-statistic: 658.7 on 9 and 110215 DF, p-value: < 2.2e-16 ``` --- # Regression model (background vars ) ```r UKmod3<-lm(afb~pill_lad+ I(region_gdp)+ smoking_pregnancy_mom+ birthweight, data=UKB) UKmod3 %>% summary() ``` ``` ## ## Call: ## lm(formula = afb ~ pill_lad + I(region_gdp) + smoking_pregnancy_mom + ## birthweight, data = UKB) ## ## Residuals: ## Min 1Q Median 3Q Max ## -14.373 -3.036 -0.115 2.766 24.547 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 24.190146 0.114286 211.662 < 2e-16 *** ## pill_lad 0.038899 0.000702 55.409 < 2e-16 *** ## I(region_gdp)London Counties 1.007286 0.084362 11.940 < 2e-16 *** ## I(region_gdp)North -0.132193 0.072832 -1.815 0.06952 . ## I(region_gdp)Rest of SouthEast 1.072189 0.084805 12.643 < 2e-16 *** ## I(region_gdp)Scotland 0.555858 0.088170 6.304 2.91e-10 *** ## I(region_gdp)SouthWest 0.245309 0.090447 2.712 0.00669 ** ## I(region_gdp)Wales 0.296731 0.102652 2.891 0.00385 ** ## I(region_gdp)West Midlands 0.282801 0.087740 3.223 0.00127 ** ## I(region_gdp)Yorks & Humberside -0.177092 0.079409 -2.230 0.02574 * ## smoking_pregnancy_mom -1.137578 0.038741 -29.363 < 2e-16 *** ## birthweight 0.164841 0.028310 5.823 5.82e-09 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 4.406 on 62605 degrees of freedom ## (98851 observations deleted due to missingness) ## Multiple R-squared: 0.07183, Adjusted R-squared: 0.07167 ## F-statistic: 440.4 on 11 and 62605 DF, p-value: < 2.2e-16 ``` --- # Coefficient plot ```r library(coefplot) multiplot(UKmod1, UKmod2, UKmod3, coefficients="pill_lad")+ coord_flip() ``` <img src="lezione11_files/figure-html/unnamed-chunk-22-1.png" width="80%" /> --- # Regression model (interactions ) ```r UKmod4<-lm(afb~pill_lad*birthweight+ I(region_gdp)+ smoking_pregnancy_mom ,data=UKB) UKmod4 %>% summary() ``` ``` ## ## Call: ## lm(formula = afb ~ pill_lad * birthweight + I(region_gdp) + smoking_pregnancy_mom, ## data = UKB) ## ## Residuals: ## Min 1Q Median 3Q Max ## -14.3527 -3.0201 -0.1072 2.7753 24.6270 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 24.682553 0.140551 175.613 < 2e-16 *** ## pill_lad 0.016302 0.003821 4.266 1.99e-05 *** ## birthweight 0.012164 0.038016 0.320 0.74898 ## I(region_gdp)London Counties 1.006663 0.084338 11.936 < 2e-16 *** ## I(region_gdp)North -0.131131 0.072812 -1.801 0.07171 . ## I(region_gdp)Rest of SouthEast 1.070450 0.084782 12.626 < 2e-16 *** ## I(region_gdp)Scotland 0.561074 0.088150 6.365 1.97e-10 *** ## I(region_gdp)SouthWest 0.249800 0.090425 2.763 0.00574 ** ## I(region_gdp)Wales 0.296034 0.102624 2.885 0.00392 ** ## I(region_gdp)West Midlands 0.282969 0.087715 3.226 0.00126 ** ## I(region_gdp)Yorks & Humberside -0.176468 0.079386 -2.223 0.02623 * ## smoking_pregnancy_mom -1.129728 0.038752 -29.152 < 2e-16 *** ## pill_lad:birthweight 0.006934 0.001153 6.015 1.80e-09 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 4.405 on 62604 degrees of freedom ## (98851 observations deleted due to missingness) ## Multiple R-squared: 0.07237, Adjusted R-squared: 0.07219 ## F-statistic: 407 on 12 and 62604 DF, p-value: < 2.2e-16 ``` --- # Interaction Plot ```r #install.packages("interactions") library(interactions) interact_plot(UKmod4, pred = pill_lad, modx = birthweight) ``` <img src="lezione11_files/figure-html/unnamed-chunk-24-1.png" width="80%" /> --- # Interactions with discrete var ```r UKB<-UKB %>% mutate(country=na_if(country, "NA")) UKmod5<-lm(afb~pill_lad*I(country)+ smoking_pregnancy_mom,data=UKB) UKmod5 %>% summary() ``` ``` ## ## Call: ## lm(formula = afb ~ pill_lad * I(country) + smoking_pregnancy_mom, ## data = UKB) ## ## Residuals: ## Min 1Q Median 3Q Max ## -14.2540 -2.9987 -0.0547 2.8795 24.1550 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 24.9186609 0.0225216 1106.436 < 2e-16 *** ## pill_lad 0.0389557 0.0006297 61.869 < 2e-16 *** ## I(country)Scotland 0.2017957 0.0661121 3.052 0.00227 ** ## I(country)Wales 0.2331454 0.0880612 2.648 0.00811 ** ## smoking_pregnancy_mom -1.1819904 0.0310312 -38.090 < 2e-16 *** ## pill_lad:I(country)Scotland 0.0066229 0.0022529 2.940 0.00329 ** ## pill_lad:I(country)Wales -0.0080215 0.0027280 -2.940 0.00328 ** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 4.413 on 95289 degrees of freedom ## (66172 observations deleted due to missingness) ## Multiple R-squared: 0.05618, Adjusted R-squared: 0.05612 ## F-statistic: 945.4 on 6 and 95289 DF, p-value: < 2.2e-16 ``` --- ```r interact_plot(UKmod5, pred = pill_lad, modx = country ) ``` <img src="lezione11_files/figure-html/unnamed-chunk-26-1.png" width="80%" /> ---