The ssdeR package provides an estimation function for sample selection models in which a common dummy endogenous regressor appears in both the selection equation and the censored outcome equation. The model is analyzed in the framework of an endogenous switching model. Following Kim (2006), it is estimated with a simple two-step estimator, which is easy to implement and numerically robust compared to other methods.
For an in-depth derivation of the statistical framework, readers are referred to Kim (2006); this vignette focuses mainly on applying the ssdeR package to the study of causal links between climate, conflict and asylum-seeking flows presented in Abel et al. (2018).
As in many other regression packages for R [@R], the main model-fitting function ssdeR() uses a formula-based interface and returns an (S3) object of class ssdeR:
ssdeR(formula, treatment, selection, data, subset,
na.action = FALSE, weights, cluster = NULL,
print.level = 0, control = ssdeR.control(...),
model = TRUE, x = FALSE, y = FALSE, ...)
A number of standard S3 methods are provided:
Method | Description |
---|---|
print() | Simple printed display with coefficients |
summary() | Standard regression summary; returns a summary object (with print() method) |
vcov() | Associated covariance matrix |
predict() | (Different types of) predictions for new data |
fitted() | Fitted values for observed data |
terms() | Extract terms |
model.matrix() | Extract model matrix (or matrices) |
nobs() | Extract number of observations |
logLik() | Extract fitted log-likelihood |
estfun() | Extract estimating functions (= gradient contributions) for sandwich covariances |
Due to these methods, a number of useful utilities work automatically, e.g., AIC(), BIC(), coeftest() (lmtest), etc.
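The mechanism behind this can be illustrated with a toy class in base R. The sketch below is not part of ssdeR: the class name `toyfit` and its fields are hypothetical. It only shows that once a `logLik()` method reports the `df` and `nobs` attributes, `AIC()` and `BIC()` need no further code.

```r
# Toy illustration (hypothetical class "toyfit", not part of ssdeR).
# AIC() and BIC() rely on a logLik() method that carries "df" and "nobs".
fit <- structure(list(loglik = -120.5, npar = 3L, n = 50L), class = "toyfit")

logLik.toyfit <- function(object, ...) {
  structure(object$loglik, df = object$npar, nobs = object$n, class = "logLik")
}
nobs.toyfit <- function(object, ...) object$n

aic <- AIC(fit)  # computed as -2 * logLik + 2 * df
bic <- BIC(fit)  # computed as -2 * logLik + log(nobs) * df
```

The same dispatch is what would let generic utilities operate on a fitted ssdeR object, provided the corresponding methods listed in the table are available.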
To illustrate its use in practice, the ssdeR package is applied to dyadic migration data in the context of Abel et al. (2018). As the paper is currently under revision, readers are encouraged to contact michael.brottrager@jku.at directly for a current version of the paper, including the detailed data description.
data(ConflictMigration, package="ssdeR")
library(ssdeR)
This data.frame contains cross-sectional information on 24336 country pairs covering the period 2011-2015.
Variable Name | Description |
---|---|
iso_i | ISO code of origin. |
iso_j | ISO code of destination. |
asylum_seekers_ij | log transformed number of asylum seekers from origin i in destination j. |
conflict_i | Conflict in origin i indicated by any reported battle related deaths in that country. |
isflow_ij | Non-zero flows between origin i and destination j. |
stock_ij | log transformed stock of origin natives in destination j before observational period. (t-1) |
dist_ij | Metric distance. (t-1) |
comlang_ij | Common Language in both origin and destination (Indicator). (t-1) |
colony_ij | Colonial relationship (Indicator). (t-1) |
polity_i | normalized (0-1) PolityIV score of origin i. (t-1) |
polity_j | normalized (0-1) PolityIV score of destination j. (t-1) |
pop_i | log transformed origin population. (t-1) |
pop_j | log transformed destination population. (t-1) |
gdp_j | log transformed GDP in destination. (t-1) |
diaspora_i | Origin diaspora outside. (t-1) |
ethMRQ_i | Ethnic Fractionalization measurement. (t-1) |
outmigration_i | log transformed total outmigration of origin i. (t-1) |
inmigration_j | log transformed total inmigration into destination j. (t-1) |
spei_i | 12 month average SPEI index. (t-1) |
battledeaths_i | log transformed battledeaths in i. (t-1) |
Our modelling framework aims to assess quantitatively the determinants of asylum-seeking flows using a gravity equation setting similar to that proposed for bilateral migration data (Cohen et al., 2008), while explicitly addressing the statistical problems caused by endogenous selection in origin-destination pairs and non-random treatments. In this sense, our statistical problem resembles those often encountered in health care studies, where, for example, enrollment in a healthcare maintenance organisation (treatment) affects both a person’s decision whether to use healthcare at all (extensive margin) and, given a positive decision, how much to spend on it (intensive margin). In our setting, however, conflict (treatment) is not randomly ‘assigned’ across the population of origin countries, so the treatment itself has to be considered endogenous as well. As in the healthcare example, the treatment (conflict) potentially affects the probability of observing non-zero flows between a given origin-destination country pair (extensive margin). In other words, we have to account for the selection of origin countries into sending migrants to a given destination country. Furthermore, conflict potentially affects the number of migrants seeking asylum in a destination country. These figures, however, are only observed when flows actually occur and thus have to be treated as potentially (non-randomly) censored.
This setting leaves us with three simultaneous equations, two of which contain our common endogenous binary regressor (i.e. conflict onset). To estimate this system of simultaneous equations, we apply the simple two-step estimation technique proposed by Kim (2006). Translated to our context, we are interested in the following sample selection model,
\[ \begin{aligned} c_i^* & = Z_{c,i}^{\prime} \gamma_1 + \epsilon_{c,i}, \quad c_i = I(c_i^*>0) \\ s_{ij}^* & = Z_{s,ij}^{\prime} \gamma_2 + c_i \beta_2 + \epsilon_{s,ij} , \quad s_{ij} = I(s_{ij}^*>0) \\ a_{ij}^* & = Z_{a,ij}^{\prime} \gamma_3 + c_i \beta_3 + \epsilon_{a,ij} , \quad a_{ij} = a_{ij}^*s_{ij} \\ \end{aligned} \]
where the first equation specifies the occurrence of conflict (\(c_i = 1\)) in country \(i\), the second equation addresses whether a non-zero flow of asylum applications takes place from country \(i\) to country \(j\) (\(s_{ij}=1\)), and the last equation models the size of the flow of applications in logs, \(a_{ij}\), to destination country \(j\) for origin-destination pairs with non-zero flows. \(I(x)\) is an indicator function taking the value one if \(x\) is true and zero otherwise, and the exogenous controls of the three equations are collected in the vectors \(Z_{c,i}\), \(Z_{s,ij}\) and \(Z_{a,ij}\), respectively. The error terms \(\epsilon_{c,i}\), \(\epsilon_{s,ij}\) and \(\epsilon_{a,ij}\) are assumed to be jointly multivariate normal and potentially correlated, thus capturing the endogenous selection of origin countries that present non-zero asylum applications to destination countries. Following Kim (2006), this sample selection model with a common endogenous regressor in the selection equation and the censored outcome equation is estimated as a hybrid of the bivariate probit and the type-II Tobit model containing the common endogenous binary conflict indicator. This implies that we have to control for the endogeneity caused by \(c_i\) and the selection bias caused by the censoring indicator \(s_{ij}\) at the same time.
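To make the structure of the three equations concrete, the following base R sketch simulates data from a system of this form. All parameter values, the error covariance matrix and the variable names are assumptions chosen for illustration, and each observation receives its own treatment draw, whereas in the application \(c_i\) varies only across origin countries.

```r
# Simulate the three-equation system with hypothetical parameter values.
set.seed(1)
n <- 5000

# Jointly normal errors (eps_c, eps_s, eps_a) with an assumed covariance matrix.
Sigma <- matrix(c(1.0, 0.4, 0.3,
                  0.4, 1.0, 0.5,
                  0.3, 0.5, 1.0), nrow = 3, byrow = TRUE)
eps <- matrix(rnorm(n * 3), ncol = 3) %*% chol(Sigma)

z_c <- rnorm(n)  # exogenous control in the treatment equation
z_s <- rnorm(n)  # exogenous control in the selection equation
z_a <- rnorm(n)  # exogenous control in the outcome equation

# Treatment: c = I(c* > 0)
c_i <- as.integer(0.5 * z_c + eps[, 1] > 0)
# Selection: s = I(s* > 0), with the treatment as a regressor
s_ij <- as.integer(-0.2 + 0.8 * z_s + 0.6 * c_i + eps[, 2] > 0)
# Latent outcome a*, again with the treatment as a regressor
a_star <- 1 + 0.7 * z_a + 1.2 * c_i + eps[, 3]
# Observed outcome: a = a* * s, i.e. zero whenever s = 0
a_ij <- a_star * s_ij
```

Because the three error terms are correlated, regressing `a_ij` on `c_i` within the selected sample alone would mix the treatment effect with both sources of bias described above.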
Instead of a simulation-assisted Full Information Maximum Likelihood (FIML) approach, we follow Kim (2006) and employ a simple two-step estimation technique: we first estimate the bivariate probit model with structural shift and then use the results of this first stage as control functions in the censored outcome equation, which is estimated with a simple Generalized Method of Moments (GMM) estimator. This way we can interpret the model as a Type V Tobit model with bivariate selection and parameter restrictions. This approach has the advantage of being numerically robust and easy to implement, since it relaxes the strong normality assumptions imposed when using the FIML approach.
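The control-function idea behind the second step can be illustrated with a deliberately simpler case. The sketch below is the textbook Heckman two-step with a single selection equation on simulated data; it is not Kim's (2006) estimator, which uses a bivariate probit first stage and several correction terms. All variable names and parameter values are made up for illustration.

```r
# Simplified control-function sketch: classic Heckman two-step,
# NOT the estimator implemented in ssdeR.
set.seed(42)
n   <- 20000
x   <- rnorm(n)                                # regressor in both equations
z   <- rnorm(n)                                # excluded selection regressor
rho <- 0.6                                     # assumed error correlation
u   <- rnorm(n)                                # selection error
e   <- rho * u + sqrt(1 - rho^2) * rnorm(n)    # outcome error, correlated with u

s <- as.integer(0.5 + z + 0.5 * x + u > 0)     # selection indicator
y <- ifelse(s == 1, 1 + 2 * x + e, NA)         # outcome observed only if s == 1

# Step 1: probit for selection, then the inverse Mills ratio as control function.
probit <- glm(s ~ z + x, family = binomial(link = "probit"))
index  <- predict(probit, type = "link")
imr    <- dnorm(index) / pnorm(index)

# Step 2: outcome regression on the selected sample, augmented by the control function.
step2 <- lm(y ~ x + imr, subset = s == 1)

# For comparison: the same regression without the correction term.
naive <- lm(y ~ x, subset = s == 1)

rbind(two_step = coef(step2)[c("(Intercept)", "x")],
      naive    = coef(naive)[c("(Intercept)", "x")])
```

The standard errors reported by `lm()` in the second step ignore that `imr` is itself estimated, which is one reason a dedicated estimator with an appropriate covariance matrix is needed in practice.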
Results <- ssdeR(formula = asylum_seekers_ij ~ stock_ij + dist_ij + I(dist_ij^2) +
comlang_ij + colony_ij +
polity_i + pop_i + polity_i + pop_i +
gdp_j,
treatment = conflict_i ~ battledeaths_i + spei_i +
polity_i + I(polity_i^2) +
diaspora_i + ethMRQ_i ,
selection = isflow_ij ~ dist_ij + I(dist_ij^2) +
outmigration_i + inmigration_j ,
cluster = c("iso_i","iso_j"),
data = ConflictMigration)
## conflict_i
summary(Results)
##
## Call:
## ssdeR(formula = asylum_seekers_ij ~ stock_ij + dist_ij + I(dist_ij^2) +
## comlang_ij + colony_ij + polity_i + pop_i + polity_i + pop_i +
## gdp_j, treatment = conflict_i ~ battledeaths_i + spei_i + polity_i +
## I(polity_i^2) + diaspora_i + ethMRQ_i, selection = isflow_ij ~
## dist_ij + I(dist_ij^2) + outmigration_i + inmigration_j, data = ConflictMigration,
## cluster = c("iso_i", "iso_j"))
##
##
## successive function values within tolerance limit
## Standardized residuals:
## Min 1Q Median 3Q Max
## -5.9093355 -1.3562449 -0.1294025 1.2351413 8.0190794
##
## Coefficients (treatment model):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.66131253 0.75088761 -2.21246 0.026935 *
## battledeaths_i 0.33939266 0.04993654 6.79648 1.0721e-11 ***
## spei_i -1.00959371 0.52210474 -1.93370 0.053150 .
## polity_i 3.68067113 3.02927898 1.21503 0.224354
## I(polity_i^2) -3.59877565 2.85551627 -1.26029 0.207565
## diaspora_i -3.14820563 3.54748134 -0.88745 0.374838
## ethMRQ_i -0.14824035 0.75184182 -0.19717 0.843695
##
## Coefficients (selection model):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.24573825 0.19527169 -16.62165 < 2.22e-16 ***
## dist_ij -0.24301450 0.03285985 -7.39548 1.4089e-13 ***
## I(dist_ij^2) -0.05142557 0.02208049 -2.32900 0.019859 *
## outmigration_i 0.19248499 0.02081563 9.24714 < 2.22e-16 ***
## inmigration_j 0.27055934 0.02292991 11.79941 < 2.22e-16 ***
## conflict_i 0.53659403 0.12314817 4.35730 1.3167e-05 ***
##
## Coefficients (outcome model):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.49365517 0.45900372 7.61139 2.7117e-14 ***
## stock_ij 0.20887866 0.53123694 0.39319 0.69417692
## dist_ij 0.38972670 0.18044433 2.15982 0.03078685 *
## I(dist_ij^2) -0.09448760 0.35260753 -0.26797 0.78872381
## comlang_ij 0.24992492 0.06903778 3.62012 0.00029447 ***
## colony_ij 0.35339862 0.15037328 2.35014 0.01876623 *
## polity_i -1.13785543 0.10389902 -10.95155 < 2.22e-16 ***
## pop_i -0.11696954 1.73621136 -0.06737 0.94628670
## gdp_j 0.19566249 1.02550189 0.19080 0.84868478
## y1 1.34985209 0.13074581 10.32425 < 2.22e-16 ***
##
## Auxiliary Parameters:
## Estimate Std. Error z value Pr(>|z|)
## rho120 -0.38884784 0.09008341 -4.31653 1.5850e-05 ***
## rho121 -0.14721524 0.10280387 -1.43200 0.15214
## m_11 -0.12410240 0.08514262 -1.45758 0.14496
## m_12 -2.52134623 0.10830152 -23.28080 < 2.22e-16 ***
## m_01 -0.49479919 0.08701260 -5.68652 1.2965e-08 ***
## m_02 -1.82254938 0.09737346 -18.71711 < 2.22e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## df.residual: 24322
## Log-likelihood: -33754.63 on 14 Df
##
## AIC: 67537.27
## BIC: 69693.27
## ---
## Log-likelihood (Bivariate Probit): -17229.64 on 15 Df
## Number of iterations in First Stage BHHH maximisation : 24
To compute the marginal effects of both the treatment and the selection model, the ssdeR package provides the marginal.effects.ssdeR() method. This method computes the marginal effects based on Mullahy (2017).
If the option model = "selection" is chosen, marginal.effects.ssdeR() returns the marginal effects in the bivariate probit model. With model = "treatment", the computation reduces to simple probit marginal effects, and with model = "outcome", the outcome-equation parameter estimates are simply returned.
As ssdeR estimates a bivariate probit model with structural shift, the indirect effects in the selection model are just the direct effects of the treatment model.
Standard errors are computed using the delta method.
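For the simple probit case, the combination of marginal effects and delta-method standard errors can be sketched in base R. The example uses simulated data and hypothetical names, and it averages the effect over observations; whether ssdeR averages or evaluates at a reference point is not stated here, so this should be read as an illustration of the technique rather than of the package internals. For a continuous regressor \(x_k\), the effect is \(\phi(x^{\prime}\beta)\beta_k\), and its variance is approximated by \(J V J^{\prime}\), where \(J\) is the gradient of the effect with respect to \(\beta\) and \(V\) the coefficient covariance matrix.

```r
# Average marginal effect in a simple probit with a delta-method standard error
# (simulated data, hypothetical names; not the ssdeR implementation).
set.seed(123)
n <- 2000
x <- rnorm(n)
d <- as.integer(-0.3 + 0.8 * x + rnorm(n) > 0)

fit  <- glm(d ~ x, family = binomial(link = "probit"))
beta <- coef(fit)
V    <- vcov(fit)
X    <- model.matrix(fit)

# g(beta): average over observations of dnorm(x'beta) * beta_x
ame <- function(b) mean(dnorm(X %*% b)) * b[["x"]]

# Numerical gradient of g at the estimate (central differences)
jac <- sapply(seq_along(beta), function(k) {
  h  <- 1e-6
  bp <- beta
  bm <- beta
  bp[k] <- bp[k] + h
  bm[k] <- bm[k] - h
  (ame(bp) - ame(bm)) / (2 * h)
})

ame_hat <- ame(beta)
ame_se  <- sqrt(drop(t(jac) %*% V %*% jac))  # delta method: sqrt(J V J')

c(effect = ame_hat, std.err = ame_se)
```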
marginal.effects.ssdeR(Results, "treatment")
## direct.effect std.err
## battledeaths_i 0.12702036 0.006342958
## spei_i -0.37784835 0.197276417
## polity_i 1.37752001 4.172892411
## I(polity_i^2) -1.34686998 3.846009158
## diaspora_i -1.17824062 4.179786627
## ethMRQ_i -0.05548011 0.041712269
marginal.effects.ssdeR(Results, "selection")
## direct.effect std.err
## dist_ij -0.05667176 0.0018622256
## I(dist_ij^2) -0.01199261 0.0002648027
## outmigration_i 0.04488811 0.0009343745
## inmigration_j 0.06309530 0.0014467696
## conflict_i 0.01898079 0.0023374493
marginal.effects.ssdeR(Results, "outcome")
## direct.effect std.err
## stock_ij 0.2088787 0.53123694
## dist_ij 0.3897267 0.18044433
## I(dist_ij^2) -0.0944876 0.35260753
## comlang_ij 0.2499249 0.06903778
## colony_ij 0.3533986 0.15037328
## polity_i -1.1378554 0.10389902
## pop_i -0.1169695 1.73621136
## gdp_j 0.1956625 1.02550189
## y1 1.3498521 0.13074581
Cameron, A. C. and P. K. Trivedi (2005) Microeconometrics: Methods and Applications, Cambridge University Press.
Greene, W. H. (2003) Econometric Analysis, Prentice Hall.
Heckman, J. (1976) The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models, Annals of Economic and Social Measurement, 5(4), p. 475-492.
Johnston, J. and J. DiNardo (1997) Econometric Methods, McGraw-Hill.
Kim, K. il (2006) Sample selection models with a common dummy endogenous regressor in simultaneous equations: A simple two-step estimation, Economics Letters, 91(2), p. 280-286.
Lee, L., G. Maddala and R. Trost (1980) Asymptotic covariance matrices of two-stage probit and two-stage tobit methods for simultaneous equations models with selectivity, Econometrica, 48, p. 491-503.
Mullahy, J. (2017) Marginal effects in multivariate probit models, Empirical Economics, 52, p. 447.
Petersen, S., G. Henningsen and A. Henningsen (2017) Unpublished Manuscript, Department of Management Engineering, Technical University of Denmark.
Toomet, O. and A. Henningsen (2008) Sample Selection Models in R: Package sampleSelection, Journal of Statistical Software, 27(7).
Wooldridge, J. M. (2003) Introductory Econometrics: A Modern Approach, Thomson South-Western.