ssdeR {ssdeR}    R Documentation

Sample selection models with a common dummy endogenous regressor in simultaneous equations: A simple two-step estimation

Description

Estimates sample selection models where a common dummy endogenous regressor appears both in the selection equation and in the censored equation. The model is interpreted as an endogenous switching model and estimated with a simple two-step procedure. For the model derivation, see Kim (2006) (https://doi.org/10.1016/j.econlet.2005.12.003).
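
To make the three equations concrete, the block below writes out the data-generating process used in the simulation in the Examples section. It is a transcription of that simulation code, not a statement of the general model in Kim (2006); the symbols u and \nu_k are names introduced here for the independent normal draws in that code.

```latex
\begin{aligned}
y_1^* &= Z_1\gamma_1 + \varepsilon,
  & y_1 &= \mathbf{1}[y_1^* > 0] && \text{(treatment)}\\
y_2^* &= Z_2\gamma_2 + \beta_2 y_1 + y_1\varepsilon_1 + (1 - y_1)\varepsilon_0,
  & y_2 &= \mathbf{1}[y_2^* > 0] && \text{(selection)}\\
y_3^* &= Z_3\gamma_3 + \beta_3 y_1 + u
         + \rho_{231}\, y_1 \varepsilon_1 + \rho_{230}\,(1 - y_1)\varepsilon_0,
  & y_3 &= y_3^* \ \text{if } y_2 = 1 && \text{(outcome)}
\end{aligned}
\qquad
\varepsilon_k = \rho_{12k}\,\varepsilon + \sqrt{1 - \rho_{12k}^2}\,\nu_k,\quad k \in \{0, 1\}
```

Here \varepsilon and \nu_k are standard normal, u is normal with standard deviation 1.5, and y_3 is unobserved (NA) whenever y_2 = 0. The dummy y_1 enters both the selection and the outcome equation, which is the "common dummy endogenous regressor" referred to above.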

Usage

ssdeR(formula, treatment, selection, data, subset,
      na.action = FALSE, weights, cluster = NULL,
      print.level = 0, control = ssdeR.control(...),
      model = TRUE, x = FALSE, y = FALSE, ...)

ssdeR.control(method = "BHHH", iterlim = NULL, start = NULL, robust = FALSE, ...)

Arguments

formula

formula, Outcome equation (Continuous dependent variable).

treatment

formula, Treatment Equation (Binary dependent variable).

selection

formula, Selection equation (Binary dependent variable).

data

an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which ssdeR is called.

subset

an optional index vector specifying a subset of observations to be used in the fitting process.

na.action

Restricted to na.pass

weights

an optional vector of ‘prior weights’ to be used in the fitting process. Should be NULL or a numeric vector.

cluster

Character vector of up to 2 cluster variables.

print.level

integer. Various debugging information, higher value gives more information. Not supported in this build.

control

Further controls for maxLik maximization etc. (see maxLik)

model

keep the model frame if model = TRUE.

x

keep the independent second-stage variables if x = TRUE.

y

keep the dependent outcome variable if y = TRUE.

method

Maximisation method used in the bivariate probit model (first stage). Default is "BHHH".

iterlim

User specified maximal number of iterations. Default is 5000.

start

User specified vector of starting values.

robust

Robust standard errors.

...
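
As a hedged illustration of how the control arguments fit together, the sketch below builds a control object with ssdeR.control() using only the arguments shown under Usage. The values chosen for iterlim and robust are arbitrary assumptions, and the call is guarded so that it only runs when the ssdeR package is installed.

```r
# Sketch: building a control object from the documented arguments.
# The values iterlim = 2000 and robust = TRUE are illustrative assumptions.
has_ssdeR <- requireNamespace("ssdeR", quietly = TRUE)

if (has_ssdeR) {
  ctrl <- ssdeR::ssdeR.control(method = "BHHH",  # documented default method
                               iterlim = 2000,   # user-specified iteration limit
                               robust = TRUE)    # request robust standard errors
  str(ctrl)  # inspect the object before passing it on
}
```

The resulting object would then be supplied as control = ctrl in a call to ssdeR().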

Details

This package provides an estimation function for sample selection models where a common dummy endogenous regressor appears both in the selection equation and in the censored equation. This model is analyzed in the framework of an endogenous switching model. Following Kim (2006), a simple two-step estimator is used for this model, which is easy to implement and numerically robust compared to other methods.
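
For readers unfamiliar with two-step estimation, the following base-R sketch shows the general idea on simulated data using the classic Heckman (1976) correction. It is not the Kim (2006) estimator implemented by ssdeR(): it has no endogenous treatment dummy and uses a univariate probit rather than a bivariate probit in the first step. All variable names and parameter values are assumptions made for illustration.

```r
# Illustrative only: classic Heckman two-step in base R.
# NOT the estimator used by ssdeR(); it only shows the two-step idea.
set.seed(1)
n <- 2000
w <- rnorm(n)                               # covariate of the selection equation
x <- rnorm(n)                               # covariate of the outcome equation
u <- rnorm(n)                               # selection error
e <- 0.5 * u + sqrt(1 - 0.5^2) * rnorm(n)   # outcome error, correlated with u

s <- as.numeric(0.5 + w + u > 0)            # selection indicator
y <- ifelse(s == 1, 1 + 2 * x + e, NA)      # outcome observed only if selected
dat <- data.frame(y = y, x = x, w = w, s = s)

# Step 1: probit for selection, then the inverse Mills ratio.
probit <- glm(s ~ w, family = binomial(link = "probit"), data = dat)
xb <- predict(probit, type = "link")
dat$imr <- dnorm(xb) / pnorm(xb)

# Step 2: OLS on the selected sample with the correction term added.
step2 <- lm(y ~ x + imr, data = dat, subset = s == 1)
coef(step2)
```

In this sketch the coefficient on x should be close to the simulated value of 2, and the coefficient on imr picks up the correlation between the two error terms.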

Value

ssdeR() returns an object of class ssdeR.

The first-stage model (firststage) is estimated by maximum likelihood. It has all the components of a 'maxLik' object, plus vcov, which contains the (cluster-) robust variance-covariance matrix of the first-stage model.

Furthermore, the returned 'ssdeR' object contains the following components:

coefficients

estimated coefficients of the outcome model. The coefficients for the auxiliary parameters μ_{ij} are treated as parameters as well.

residuals

estimated residuals of the outcome model.

fitted.values

fitted values of the outcome model.

loglik

log likelihood of the outcome model.

df.residual

degrees of freedom of the outcome model.

vcov

variance covariance matrix of the estimated coefficients.

n

total number of observations used in the 1st and 2nd stage.

controls

List of controls applied to the fit functions.

weights

Vector of weights. If no weights were supplied, weights is just a vector of ones with length n = censored observations.

param

List object. nParam is the number of covariates used in the outcome model. nObs is the total number of used observations, NT1 and NT0 refer to the number of treated and untreated, respectively. NS1, NS2 refer to the number of censored and uncensored observations, respectively.

firststage

object of class 'maxLik' that contains the results of the 1st step (bivariate probit estimation) and the (cluster-) robust variance-covariance matrix (if requested).

Author(s)

Michael Brottrager

References

Cameron, A. C. and Trivedi, P. K. (2005) Microeconometrics: Methods and Applications, Cambridge University Press.

Greene, W. H. (2003) Econometric Analysis, Fifth Edition, Prentice Hall.

Heckman, J. (1976) The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models, Annals of Economic and Social Measurement, 5(4), p. 475-492.

Johnston, J. and J. DiNardo (1997) Econometric Methods, Fourth Edition, McGraw-Hill.

Lee, L., G. Maddala and R. Trost (1980) Asymptotic covariance matrices of two-stage probit and two-stage tobit methods for simultaneous equations models with selectivity. Econometrica, 48, p. 491-503.

Mullahy, J. (2017) Marginal effects in multivariate probit models. Empirical Economics, 52: 447.

Kim, K. il (2006) Sample selection models with a common dummy endogenous regressor in simultaneous equations: A simple two-step estimation. Economics Letters, 91(2), p. 280-286.

Petersen, S., G. Henningsen and A. Henningsen (2017) Which Households Invest in Energy-Saving Home Improvements? Evidence From a Danish Policy Intervention. Unpublished Manuscript. Department of Management Engineering, Technical University of Denmark.

Toomet, O. and A. Henningsen (2008) Sample Selection Models in R: Package sampleSelection. Journal of Statistical Software, 27(7), http://www.jstatsoft.org/v27/i07/

Wooldridge, J. M. (2003) Introductory Econometrics: A Modern Approach, 2e, Thomson South-Western.

See Also

See maxLik or selection for further information.

Examples

# ----------------------------------------------------------------------------------- #
# 1. Climate, Conflict, Crossborder Migration Data
# ----------------------------------------------------------------------------------- #
# This is the data used in Abel, Crespo-Cuaresma, Brottrager, Muttarak (2018).
# As the paper is currently under revision, readers are recommended to directly
# contact <michael.brottrager@jku.at> for a current version of the paper including
# the detailed data description.
#
# Note that running the model on the data provided below takes a
# considerable amount of time.
#
# Please consider looking at the accompanying vignette.
#
# ----------------------------------------------------------------------------------- #
# data(ConflictMigration, package="ssdeR")
#
# Results <- ssdeR(formula = asylum_seekers_ij ~ stock_ij + dist_ij + I(dist_ij^2) +
#                    comlang_ij + colony_ij  +
#                    polity_i + pop_i + polity_i + pop_i +
#                    gdp_j,
#                  treatment = conflict_i ~ battledeaths_i + spei_i +
#                    polity_i + I(polity_i^2) +
#                    diaspora_i + ethMRQ_i ,
#                  selection = isflow_ij ~ dist_ij + I(dist_ij^2) +
#                    outmigration_i + inmigration_j ,
#                  cluster = c("iso_i","iso_j"),
#                  data = ConflictMigration)
#
# summary(Results)
# marginal.effects.ssdeR(Results, "treatment")
# marginal.effects.ssdeR(Results, "selection")
# marginal.effects.ssdeR(Results, "outcome")
# ----------------------------------------------------------------------------------- #



# ----------------------------------------------------------------------------------- #
# 2.  Simulation Data
# ----------------------------------------------------------------------------------- #
library(MASS)
set.seed(12072018)
n <- 5000
gamma1 <- c(-0.2, 0.2, 0.1)
gamma2 <- c(0.2, -0.1, 0.2)
gamma3 <- c(2, 0.5, 0.3)
beta2 <- 1
beta3 <- -2
rho120 <- -0.5
rho121 <- 0.5

muZ1 <- c(1,-1)
muZ2 <- c(1,1)
muZ3 <- c(2,0)

s_Z1 <- matrix(c(3, 0.1, 0.1, 3), nrow = 2, ncol = 2)
s_Z2 <- matrix(c(3, 0.1, 0.1, 3), nrow = 2, ncol = 2)
s_Z3 <- matrix(c(4, 0.3, 0.3, 4), nrow = 2, ncol = 2)

Z1 <- matrix(c(rep(1, n), mvrnorm(n = n, mu = muZ1, Sigma = s_Z1)), nrow = n, ncol = 3)
Z2 <- matrix(c(rep(1, n), mvrnorm(n = n, mu = muZ2, Sigma = s_Z2)), nrow = n, ncol = 3)
Z3 <- matrix(c(rep(1, n), mvrnorm(n = n, mu = muZ3, Sigma = s_Z3)), nrow = n, ncol = 3)

eps <- rnorm(n, 0, 1)


y1star <- Z1 %*% gamma1 + eps
y1 <- as.numeric(y1star>0)


eps0 <- rho120*eps + sqrt(1-rho120^2)*rnorm(n, 0, 1)
eps1 <- rho121*eps + sqrt(1-rho121^2)*rnorm(n, 0, 1)

y2star <- Z2 %*% gamma2 + y1*beta2 + y1*eps1 + (1-y1)*eps0
y2 <- as.numeric(y2star>0)

rho230 <- 0.5
rho231 <- 0.3

y3star <- Z3 %*% gamma3 + beta3*y1 + rnorm(n, 0, 1.5) +
  y1*eps1*rho231 + (1-y1)*eps0*rho230
cens <- ifelse(y2 == 1, 1, NA)
y3 <- y3star*cens

df <- matrix(cbind(y1, y2, y3, Z1[,c(2:3)],Z2[,c(2:3)],Z3[,c(2:3)]), nrow = n,
             dimnames = list(c(1:n), c("y1","y2", "y3",
                                       "d1", "d2", "s1",
                                       "s2", "x1", "x2")))

df <- as.data.frame(df)
rownames(df) <- 1:(n)

m1 <- ssdeR(formula = y3 ~ x1 + x2,
            treatment = y1 ~ d1 +d2,
            selection = y2 ~ s1 +s2,
            data = df)

summary(m1)
marginal.effects.ssdeR(m1, "treatment")
marginal.effects.ssdeR(m1, "selection")
marginal.effects.ssdeR(m1, "outcome")
# ----------------------------------------------------------------------------------- #

[Package ssdeR version 0.1.0 Index]