class: center, middle, inverse, title-slide

# Bio-demographic Laboratory - Logistic regression and multinomial logit

### Nicola Barban
Alma Mater Studiorum Università di Bologna
Dipartimento di Scienze Statistiche

### 11 March 2021
---

# Logistic Regression

## Dataset: wells in Bangladesh

* Many drinking-water wells in Bangladesh and other South Asian countries are contaminated with arsenic. The problem affects more than 100 million people
* Arsenic exposure increases the risk of cancer and other diseases
* A research team from the United States and Bangladesh surveyed all the wells, measuring arsenic concentrations and labeling each well as **safe** or **unsafe**
* People living near unsafe wells were encouraged to switch to a safe well nearby

---

* A few years later, the researchers returned and recorded who had switched wells and who had not.
* We will use a **logistic regression analysis** to understand the factors associated with switching among households that were using unsafe wells.

$$ Y_i = 1 \text{ if household } i \text{ switched to a new well} $$

$$ Y_i = 0 \text{ if household } i \text{ continued using its own well} $$

---

# Independent variables

We are interested in the following factors:

* Distance to the closest safe well
* Arsenic level of the well currently in use
* Whether at least one household member is involved in a community organization
* Education level of the head of the household

---

# Binomial distribution

We can view `\(Y_i\)` as a random variable that takes the values 1 and 0 with probabilities `\(p_i\)` and `\(1-p_i\)`. The distribution of `\(Y_i\)` is called *Bernoulli* and can be written as:

$$ Pr(Y_i=y_i)= {p_i}^{y_i} (1-p_i)^{1-y_i} $$

---

# Modeling binary data

We would like the probabilities `\(p_i\)` to depend on a vector of observed covariates `\(X_i\)`.

* The simplest approach is to model `\(p_i\)` as a linear function of `\(X_i\)`. This is also called a *linear probability model.*
* The problem is that probabilities must lie between 0 and 1, while a linear predictor `\(X\beta\)` can take any real value.
* One solution is to transform the probability `\(p_i\)` into **odds**:

$$ odds_i= \frac{p_i}{1-p_i} $$

* Odds can take any positive value, and their logarithm — the **logit** — can take any real value:

$$ logit(p_i)= log \frac{p_i}{1-p_i} $$

---

# The logit function

<img src="images/logit.pdf" alt="The logit function">

---

# The logistic regression model

It would not make sense to fit a continuous linear model, `\(X\beta + \epsilon\)`, to data `\(y\)` that can only take the values 0 and 1. Instead, we model the probability that `\(y=1\)`,

$$ Pr(y_i=1) = logit^{-1}(X_i \beta), $$

assuming that the outcomes `\(y_i\)` are independent given their probabilities. `\(X\beta\)` is also called the *linear predictor*.

---

# The inverse logit function

The function `\(logit^{-1}(x)=\frac{e^x}{1+e^x}\)` maps continuous values to the interval `\((0,1)\)`.

* `\(Pr(y_i=1)= p_i\)`
* `\(logit(p_i) = X_i \beta\)`, where `\(logit(x) = log(x/(1-x))\)` is a function mapping the range `\((0,1)\)` to the range `\((-\infty, \infty)\)`.
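---

# The logit transformation in R

To make the transformation concrete, here is a minimal sketch using base R's built-in `qlogis()` (logit) and `plogis()` (inverse logit):

```r
# logit: probabilities in (0,1) -> real line
p <- c(0.1, 0.5, 0.9)
qlogis(p)    # log(p/(1-p)): -2.197  0.000  2.197

# inverse logit: real line -> probabilities in (0,1)
x <- c(-5, 0, 5)
plogis(x)    # exp(x)/(1+exp(x)): 0.0067  0.5  0.9933
```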
---

# Odds Ratio

Suppose we want to study the effect of being active in a community organization on the probability of switching. The variable `\(X_i\)` can take the values 0 or 1.

* Does `\(X_i\)` affect the probability of switching? By how much?
* We compare the *odds*, computing the ratio of `\((odds_i|X_i=1)\)` to `\((odds_i|X_i=0)\)`:

$$ OR_X = \frac{odds_i|X_i=1}{odds_i|X_i=0} = \frac{logit^{-1}(\alpha +\beta \times 1)}{1-logit^{-1}(\alpha +\beta \times 1 )} \times \frac{1 -logit^{-1}(\alpha +\beta \times 0)}{logit^{-1}(\alpha +\beta \times 0 )} $$

$$ = \frac{logit^{-1}(\alpha +\beta )}{1-logit^{-1}(\alpha +\beta )} \times \frac{1 -logit^{-1}(\alpha )}{logit^{-1}(\alpha )} \\ = e^{\alpha + \beta} \times \frac{1}{e^\alpha}=e^{\beta} $$

---

# Importing the data

```r
library(haven)
library(tidyverse)
library(tidymodels)

wells <- read_dta("data/all.dta") %>%
  filter(survey == 1)
#View(wells)
```

---

```r
mod0 <- glm(formula = switch ~ assn, family = "binomial", data = wells)
tidy(mod0)
```

```
## # A tibble: 2 x 5
##   term       estimate std.error statistic
##   <chr>         <dbl>     <dbl>     <dbl>
## 1 (Intercep…   -0.427    0.0418    -10.2 
## 2 assn         -0.114    0.0562     -2.02
## # … with 1 more variable: p.value <dbl>
```

```r
exp(-0.1136)
```

```
## [1] 0.8926149
```

---

```r
tidy(mod0) %>%
  mutate(OR = exp(estimate), OR.std = exp(std.error)) %>%
  select(OR, OR.std, p.value)
```

```
## # A tibble: 2 x 3
##      OR OR.std  p.value
##   <dbl>  <dbl>    <dbl>
## 1 0.652   1.04 1.62e-24
## 2 0.893   1.06 4.34e- 2
```

* The odds of switching to another well for people who are active in community organizations are 0.89 times the odds for people who are not active
* In other words, being active in a community organization is **associated with a decrease of (1-0.89)=11% in the odds of switching**

---

```r
mod1 <- glm(formula = switch ~ distcw, family = "binomial", data = wells)
summary(mod1)
```

```
## 
## Call:
## glm(formula = switch ~ distcw, family = "binomial", data = wells)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.277  -1.016  -0.767   1.192   2.617  
## 
## Coefficients:
##               Estimate Std. Error
## (Intercept)  0.2417606  0.0561907
## distcw      -0.0021560  0.0001473
##             z value Pr(>|z|)    
## (Intercept)   4.302 1.69e-05 ***
## distcw      -14.641  < 2e-16 ***
## ---
## Signif. codes:  
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.'
## 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 5330.9  on 4007  degrees of freedom
## Residual deviance: 5053.8  on 4006  degrees of freedom
## AIC: 5057.8
## 
## Number of Fisher Scoring iterations: 4
```

---

# Model with interactions

```r
wells <- wells %>% mutate(dist100 = distcw/100)

mod2 <- glm(formula = switch ~ dist100*arsenic + assn + ed4,
            family = "binomial", data = wells)
tidy(mod2)
```

```
## # A tibble: 6 x 5
##   term       estimate std.error statistic
##   <chr>         <dbl>     <dbl>     <dbl>
## 1 (Intercep…  -1.93      0.156     -12.3 
## 2 dist100     -0.175     0.0346     -5.05
## 3 arsenic      0.807     0.0643     12.5 
## 4 assn        -0.104     0.0628     -1.66
## 5 ed4          0.176     0.0400      4.41
## 6 dist100:a…   0.0566    0.0196      2.88
## # … with 1 more variable: p.value <dbl>
```

---

# Testing the parameters

We can test the hypothesis

$$ H_0: \beta_j=0 $$

by computing the statistic `\(z_j\)`, which in large samples is distributed as `\(N(0,1)\)`:

$$ z_j=\frac{ \hat{\beta}_j }{\sqrt{\hat{var}(\hat{\beta}_j)}}. $$

The test can be used to compute a 95% confidence interval for `\(\beta_j\)`:

$$ \hat{\beta}_j \pm 1.96 \sqrt{\hat{var}(\hat{\beta}_j)} $$

**The confidence intervals of the odds ratios are no longer symmetric!**
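---

# Odds-ratio confidence intervals in R

A minimal sketch of this asymmetry: build the Wald interval on the coefficient scale and then exponentiate (shown for `mod2` from the previous slides):

```r
tidy(mod2) %>%
  mutate(OR    = exp(estimate),
         lower = exp(estimate - 1.96 * std.error),  # interval built on the log-odds scale...
         upper = exp(estimate + 1.96 * std.error)) %>%  # ...is asymmetric around OR
  select(term, OR, lower, upper)
```

The interval is symmetric around `\(\hat{\beta}_j\)`, but after exponentiating it is no longer symmetric around `\(e^{\hat{\beta}_j}\)`.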
<img src="images/probit.pdf" alt=""> --- # Latent-data model * Logit: `\(Pr(y_i=1)=logit^{-1}(X_i \beta)= \frac{e^x}{1+e^x}\)` $$ y_i= \begin{cases} 1 & \text{if } z_i>0\\ 0 & \text{if } z_i<0\\ \end{cases} $$ $$ z_i=X_i\beta + \epsilon_i, \qquad \epsilon_i \sim logistic \text{ probability distribution} $$ * **Probit:** same model, but `\(\epsilon_i \sim N(0, 1)\)` * Probit coefs are logit coef divided by aprox. 1.6 [Why1.6?](http://www.johndcook.com/blog/2010/05/18/normal-approximation-to-logistic/) --- ```r glm(formula=switch~dist100*arsenic+assn+ed4, family=binomial(link = "probit"), data=wells) %>% summary() ``` ``` ## ## Call: ## glm(formula = switch ~ dist100 * arsenic + assn + ed4, family = binomial(link = "probit"), ## data = wells) ## ## Deviance Residuals: ## Min 1Q Median 3Q ## -3.4340 -0.7698 -0.5888 0.9990 ## Max ## 2.6235 ## ## Coefficients: ## Estimate Std. Error ## (Intercept) -1.12111 0.08737 ## dist100 -0.10430 0.01870 ## arsenic 0.45576 0.03513 ## assn -0.06536 0.03675 ## ed4 0.10630 0.02360 ## dist100:arsenic 0.03386 0.01093 ## z value Pr(>|z|) ## (Intercept) -12.832 < 2e-16 *** ## dist100 -5.577 2.44e-08 *** ## arsenic 12.973 < 2e-16 *** ## assn -1.778 0.07536 . ## ed4 4.504 6.68e-06 *** ## dist100:arsenic 3.097 0.00195 ** ## --- ## Signif. codes: ## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' ## 0.1 ' ' 1 ## ## (Dispersion parameter for binomial family taken to be 1) ## ## Null deviance: 5328.9 on 4005 degrees of freedom ## Residual deviance: 4346.6 on 4000 degrees of freedom ## (2 observations deleted due to missingness) ## AIC: 4358.6 ## ## Number of Fisher Scoring iterations: 5 ``` **Coefficients in a probit regression are typically close to logistic regression coefficients divided by 1.6** --- # Effetti Marginali Marginal effects for continuous variables measure the *instantaneous rate of change* (defined shortly). They are popular in some disciplines because they often provide a good approximation to the amount of change in `\(Y\)` that will be produced by a 1-unit change in `\(X_k\)`. 
---

# Marginal effects

Marginal effects for continuous variables measure the *instantaneous rate of change*. They are popular in some disciplines because they often provide a good approximation to the amount of change in `\(Y\)` produced by a 1-unit change in `\(X_k\)`.

In the linear regression model `\(Y_i=\alpha + \beta X_i + \epsilon_i\)`, the *marginal effect* is simply `\(\beta\)`.

---

<img src="images/scatter6.pdf" alt="">

---

<img src="images/scatter8.pdf" alt="">

---

# Predicted values

```r
library(margins)

mod1 <- glm(switch ~ arsenic, data = wells,
            family = binomial(link = "logit"))

augment(mod1, type.predict = "response") %>%
  ggplot(aes(x = arsenic, y = .fitted)) +
  geom_point()
```

<img src="lezione12_files/figure-html/unnamed-chunk-7-1.png" width="60%" />

---

# Conditional predicted value (cplot)

```r
cplot(mod1, x = "arsenic", se.type = "shade")
```

```
##       xvals     yvals     upper
## 1  1.000000 0.1953373 0.2109323
## 2  1.291667 0.2464163 0.2622056
## 3  1.583333 0.3057764 0.3217845
## 4  1.875000 0.3723702 0.3892481
## 5  2.166667 0.4441875 0.4630144
## 6  2.458333 0.5184150 0.5400169
## 7  2.750000 0.5918386 0.6162705
## 8  3.041667 0.6613798 0.6879197
## 9  3.333333 0.7245864 0.7520424
## 10 3.625000 0.7799208 0.8069980
## 11 3.916667 0.8267952 0.8523781
## 12 4.208333 0.8654088 0.8887081
## 13 4.500000 0.8964918 0.9170714
## 14 4.791667 0.9210512 0.9387777
## 15 5.083333 0.9401723 0.9551328
## 16 5.375000 0.9548891 0.9673086
## 17 5.666667 0.9661163 0.9762898
## 18 5.958333 0.9746235 0.9828677
## 19 6.250000 0.9810367 0.9876590
## 20 6.541667 0.9858527 0.9911339
##        lower
## 1  0.1797423
## 2  0.2306270
## 3  0.2897684
## 4  0.3554922
## 5  0.4253606
## 6  0.4968132
## 7  0.5674067
## 8  0.6348399
## 9  0.6971304
## 10 0.7528435
## 11 0.8012124
## 12 0.8421096
## 13 0.8759122
## 14 0.9033247
## 15 0.9252118
## 16 0.9424696
## 17 0.9559427
## 18 0.9663793
## 19 0.9744145
## 20 0.9805715
```

<img src="lezione12_files/figure-html/unnamed-chunk-8-1.png" width="60%" />

---

```r
wells_new <- wells %>%
  mutate(arsenic = runif(4008, min = 0, max = 10))

augment(mod1, type.predict = "response", newdata = wells_new) %>%
  ggplot(aes(x = arsenic, y = .fitted)) +
  geom_point()
```

<img src="lezione12_files/figure-html/unnamed-chunk-9-1.png" width="60%" />

---

# Marginal effects with interactions

```r
cplot(mod2, x = "dist100", dx = "arsenic", what = "effect",
      se.type = "shade")
```

<img src="lezione12_files/figure-html/unnamed-chunk-10-1.png" width="60%" />

---

```r
library(haven)
UKB <- read_dta("data/UKB_teaching.dta")
```

---

```r
names(UKB)
```

```
##  [1] "sex"
##  [2] "year_birth"
##  [3] "neb"
##  [4] "childless"
##  [5] "smoking_pregnancy_mom"
##  [6] "educ"
##  [7] "aam"
##  [8] "afs"
##  [9] "afb"
## [10] "meno"
## [11] "birthweight"
## [12] "teen"
## [13] "bmi"
## [14] "num_siblings"
## [15] "country"
## [16] "gdp_spl"
## [17] "region_gdp"
## [18] "pill_lad"
## [19] "idn"
```

---

```r
UKlog1 <- glm(childless ~ pill_lad + I(region_gdp), data = UKB,
              family = binomial(link = "logit"))

UKlog1 %>%
  tidy() %>%
  mutate(OR = exp(estimate), OR.std = exp(std.error)) %>%
  select(term, OR, OR.std, p.value)
```

```
## # A tibble: 10 x 4
##    term                OR OR.std  p.value
##    <chr>            <dbl>  <dbl>    <dbl>
##  1 (Intercept)      0.144   1.03 0.      
##  2 pill_lad         1.01    1.00 0.      
##  3 I(region_gdp)Lo… 1.27    1.03 1.24e-13
##  4 I(region_gdp)No… 0.971   1.03 3.08e- 1
##  5 I(region_gdp)Re… 1.19    1.03 1.53e- 7
##  6 I(region_gdp)Sc… 1.35    1.03 1.22e-19
##  7 I(region_gdp)So… 0.975   1.04 4.98e- 1
##  8 I(region_gdp)Wa… 0.967   1.04 4.15e- 1
##  9 I(region_gdp)We… 1.09    1.03 1.55e- 2
## 10 I(region_gdp)Yo… 0.935   1.03 3.66e- 2
```

---

```r
#cplot(UKlog1, x = "pill_lad", se.type = "shade")
```

---

```r
#cplot(UKlog1, x = "pill_lad", se.type = "shade", what = "effect")
```
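---

# Average marginal effects

Going back to the wells example, here is a minimal sketch using `margins()` from the **margins** package (loaded earlier), which computes average marginal effects — the derivative of the predicted probability averaged over the observed data:

```r
# average marginal effect of arsenic on Pr(switch = 1)
summary(margins(mod1))
```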
---

# Multinomial Logit Models

* An outcome is *nominal* when its categories are assumed to be unordered.
* For example, marital status can be grouped nominally into the categories divorced, never married, married, or widowed.
* Occupations might be organized as professional, white collar, or blue collar.
* The **multinomial logit model** (MNLM) is the most frequently used nominal regression model. It essentially fits separate binary logits for each pair of outcome categories (a sketch of this idea follows the descriptive slides below).

---

# Italian high school data

Data come from ITAGEN2.

```r
library(knitr)

itagen <- read_dta("data/itagen2_peso.dta") %>%
  filter(GRADE <= 4) %>%
  mutate(hschool = as_factor(hschool),
         generation = as_factor(generation),
         SEX = as_factor(SEX),
         educParents = as_factor(educParents),
         GRADE = as_factor(GRADE))
```

---

The table shows the choice of secondary school track for 1,294 students in Italy, by generation status.

```r
TAB <- table(itagen$hschool, itagen$generation) %>%
  prop.table(2) %>%
  round(2)
kable(TAB, format = "html")
```

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> </th>
   <th style="text-align:right;"> Natives </th>
   <th style="text-align:right;"> Second Generation </th>
   <th style="text-align:right;"> 1.5 Generation </th>
   <th style="text-align:right;"> Immigrants </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Vocational course </td>
   <td style="text-align:right;"> 0.00 </td>
   <td style="text-align:right;"> 0.00 </td>
   <td style="text-align:right;"> 0.00 </td>
   <td style="text-align:right;"> 0.00 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Vocational School </td>
   <td style="text-align:right;"> 0.15 </td>
   <td style="text-align:right;"> 0.23 </td>
   <td style="text-align:right;"> 0.37 </td>
   <td style="text-align:right;"> 0.53 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Polytechnic </td>
   <td style="text-align:right;"> 0.34 </td>
   <td style="text-align:right;"> 0.38 </td>
   <td style="text-align:right;"> 0.34 </td>
   <td style="text-align:right;"> 0.30 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> High school </td>
   <td style="text-align:right;"> 0.51 </td>
   <td style="text-align:right;"> 0.39 </td>
   <td style="text-align:right;"> 0.29 </td>
   <td style="text-align:right;"> 0.17 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Not applicable </td>
   <td style="text-align:right;"> 0.00 </td>
   <td style="text-align:right;"> 0.00 </td>
   <td style="text-align:right;"> 0.00 </td>
   <td style="text-align:right;"> 0.00 </td>
  </tr>
</tbody>
</table>

N. Barban and M. J. White (2011) Immigrants' Children's Transition to Secondary School in Italy. *International Migration Review*, 45(3):702–726

---

# Transition to Secondary School in Italy

* Choosing a secondary school is an important decision in tracked educational systems such as Italy's (or Germany's), since it influences educational achievement and labor-force trajectories.
* We analyze the effect of generational status and length of residence on the transition to secondary school among immigrants living in Italy.
* Children of immigrants *are more likely to have inferior outcomes* on the middle school exam and to *enroll in vocational and polytechnic schools*.
* After *controlling for the family's human capital* and other key background factors, immigrant students show a greater propensity to choose a vocational path.
* Differences remain when *previous scholastic results* are taken into account.
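---

# MNLM as paired binary logits

A minimal sketch of the idea that the MNLM "fits separate binary logits for each pair of outcome categories": restricting the data to two categories and fitting a binary logit gives coefficients close to — though not identical to — the corresponding multinomial contrast estimated with `multinom()` on the next slides. The object names `vp` and `voc` are introduced here for illustration:

```r
# binary logit on the Vocational School vs. High school subsample
vp <- itagen %>%
  filter(hschool %in% c("Vocational School", "High school")) %>%
  mutate(voc = as.numeric(hschool == "Vocational School"))

glm(voc ~ generation, family = "binomial", data = vp) %>% tidy()
```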
---

# The Multinomial Distribution

* Consider a random variable `\(Y_i\)` that may take one of several discrete values, which we index `\(1,2,\ldots,J\)`. In our example the response is **secondary school** and it takes the values Vocational, Polytechnic, and High school, which we index 1, 2, and 3.
* Let `\(\pi_{ij} = Pr(Y_i = j)\)` denote the probability that the i-th response falls in the j-th category. In the example, `\(\pi_{i1}\)` is the probability that the i-th respondent chooses the vocational track.
* Assuming that the response categories are mutually exclusive and exhaustive, we have `\(\sum_{j=1}^J \pi_{ij} = 1\)` for each `\(i\)`, i.e. the probabilities add up to one for each individual, and we have only `\(J-1\)` free parameters.
* The probability distribution of the counts `\(Y_{ij}\)` given the total `\(n_i\)` is given by the multinomial distribution.

---

# Multinomial logit model

In the multinomial logit model we assume that the log-odds of each response follow a linear model

$$\eta_{ij}=\log \frac{\pi_{ij}}{\pi_{iJ}} =\alpha_j+\sum_{k=1}^K x_{ik}\beta_{kj},$$

where `\(\alpha_j\)` is a constant and `\(\beta_{1j}, \ldots, \beta_{Kj}\)` is a vector of regression coefficients, for `\(j = 1, 2,\ldots, J-1\)`.

---

# Modeling the Probabilities

* The multinomial logit model may also be written in terms of the original probabilities `\(\pi_{ij}\)` rather than the log-odds.
* Starting from the previous equation and adopting the convention that `\(\eta_{iJ} = 0\)`, we can write:

$$\pi_{ij}=\frac{\exp(\eta_{ij})}{\sum_{k=1}^{J} \exp(\eta_{ik})}$$

---

```r
library("nnet")
itagen$hschool <- relevel(itagen$hschool, ref = "High school")
m <- multinom(hschool ~ generation, data = itagen)
```

```
## # weights:  15 (8 variable)
## initial  value 1379.857035 
## iter  10 value 1281.307255
## final  value 1278.258614 
## converged
```

```r
tidy(m)
```

```
## # A tibble: 8 x 6
##   y.level    term      estimate std.error
##   <chr>      <chr>        <dbl>     <dbl>
## 1 Vocationa… (Interce…   -1.22     0.103 
## 2 Vocationa… generati…    0.677    0.236 
## 3 Vocationa… generati…    1.47     0.224 
## 4 Vocationa… generati…    2.34     0.261 
## 5 Polytechn… (Interce…   -0.408    0.0777
## 6 Polytechn… generati…    0.357    0.201 
## 7 Polytechn… generati…    0.552    0.218 
## 8 Polytechn… generati…    0.986    0.272 
## # … with 2 more variables:
## #   statistic <dbl>, p.value <dbl>
```

```r
exp(2.319877)
```

```
## [1] 10.17442
```

---

```r
tidy(m) %>%
  mutate(expB = exp(estimate), expSE = exp(std.error)) %>%
  select(y.level, term, expB, expSE)
```

```
## # A tibble: 8 x 4
##   y.level      term           expB expSE
##   <chr>        <chr>         <dbl> <dbl>
## 1 Vocational … (Intercept)   0.296  1.11
## 2 Vocational … generationSe…  1.97  1.27
## 3 Vocational … generation1.…  4.35  1.25
## 4 Vocational … generationIm… 10.4   1.30
## 5 Polytechnic  (Intercept)   0.665  1.08
## 6 Polytechnic  generationSe…  1.43  1.22
## 7 Polytechnic  generation1.…  1.74  1.24
## 8 Polytechnic  generationIm…  2.68  1.31
```

`\(e^{2.32}\approx 10.2\)` tells us that the odds of ending up in vocational training rather than high school are about 10 times higher for immigrants **than for natives**.
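---

# From log-odds to probabilities

A minimal sketch of the formula `\(\pi_{ij}=\exp(\eta_{ij})/\sum_{k} \exp(\eta_{ik})\)`, recovering the predicted probabilities for natives (all covariates at the reference level, so `\(\eta_{ij}=\alpha_j\)`; the intercepts below are taken from `tidy(m)` above):

```r
eta <- c(`High school` = 0,            # reference category: eta_iJ = 0
         `Vocational School` = -1.22,  # intercepts from tidy(m)
         Polytechnic = -0.408)

exp(eta) / sum(exp(eta))               # softmax: probabilities sum to 1
```

These match the "Natives" column of the cross-tabulation shown earlier (0.51, 0.15, 0.34).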
---

# Model with interactions

```r
m2 <- multinom(hschool ~ GRADE*generation, data = itagen)
```

```
## # weights:  87 (56 variable)
## initial  value 1379.857035 
## iter  10 value 1066.535013
## iter  20 value 1046.522909
## iter  30 value 1044.911273
## iter  40 value 1044.708395
## iter  50 value 1044.704889
## final  value 1044.704870 
## converged
```

```r
tidy(m2)
```

```
## # A tibble: 56 x 6
##    y.level   term     estimate std.error
##    <chr>     <chr>       <dbl>     <dbl>
##  1 Vocation… (Interc…    11.1   2.17e- 1
##  2 Vocation… GRADEEx…   -16.2   8.07e- 1
##  3 Vocation… GRADEVe…   -13.4   3.07e- 1
##  4 Vocation… GRADEGo…   -11.7   2.59e- 1
##  5 Vocation… GRADESu…   -10.2   2.84e- 1
##  6 Vocation… GRADEI …     0     2.48e-15
##  7 Vocation… GRADENo…     0     NaN     
##  8 Vocation… generat…    -2.07  2.42e- 1
##  9 Vocation… generat…     6.00  2.53e- 1
## 10 Vocation… generat…     3.74  4.06e- 1
## # … with 46 more rows, and 2 more
## #   variables: statistic <dbl>,
## #   p.value <dbl>
```

---

```r
fitted(m2) %>% head()
```

```
##   High school Vocational School
## 2   0.3372108        0.17829337
## 3   0.1644749        0.40789088
## 4   0.8407941        0.00497500
## 5   0.6600994        0.06896491
## 6   0.8407941        0.00497500
## 7   0.8407941        0.00497500
##   Polytechnic
## 2   0.4844959
## 3   0.4276342
## 4   0.1542309
## 5   0.2709357
## 6   0.1542309
## 7   0.1542309
```

```r
newdata <- data.frame(
  generation = rep(c("Natives", "Second Generation",
                     "1.5 Generation", "Immigrants"), 4),
  GRADE = c(rep("Sufficient", 4), rep("Good", 4),
            rep("Very Good", 4), rep("Excellent", 4)))

P <- predict(m2, newdata = newdata, "probs") %>% data.frame()
```

---

```r
predictions1 <- cbind(newdata, P) %>%
  filter((GRADE == "Excellent" | GRADE == "Sufficient") &
           (generation == "Natives" | generation == "Immigrants")) %>%
  select(GRADE, generation, High.school)

predictions2 <- cbind(newdata, P) %>%
  filter((GRADE == "Excellent" | GRADE == "Sufficient") &
           (generation == "Natives" | generation == "Immigrants")) %>%
  select(GRADE, generation, Vocational.School)
```

---

```r
predictions1
```

```
##         GRADE generation High.school
## 1  Sufficient    Natives  0.16447492
## 4  Sufficient Immigrants  0.01563354
## 13  Excellent    Natives  0.84079410
## 16  Excellent Immigrants  0.64705153
```

* Natives who got Excellent on the previous exam have an 84% probability of choosing high school, while the same probability is 65% for recent immigrants

---

```r
predictions2
```

```
##         GRADE generation
## 1  Sufficient    Natives
## 4  Sufficient Immigrants
## 13  Excellent    Natives
## 16  Excellent Immigrants
##    Vocational.School
## 1       4.078909e-01
## 4       7.656167e-01
## 13      4.975000e-03
## 16      1.588798e-07
```

* Natives who got Sufficient on the previous exam have a 41% probability of choosing vocational school, while the same probability is 77% for recent immigrants

---
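# Visualizing the predicted probabilities

A minimal closing sketch (not part of the original analysis): reshape the predictions into long format and plot the probability of each track by grade and generation:

```r
cbind(newdata, P) %>%
  # column names follow data.frame() munging: "High school" -> High.school
  pivot_longer(cols = c(High.school, Vocational.School, Polytechnic),
               names_to = "track", values_to = "prob") %>%
  ggplot(aes(x = GRADE, y = prob, fill = track)) +
  geom_col(position = "dodge") +
  facet_wrap(~ generation)
```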