seqimpute {seqimpute}    R Documentation

Imputation of missing data in sequence analysis

Description

Imputation of missing data in a dataset through prediction based on a multinomial, a linear or an ordinal regression model. To specify the prediction more precisely, fixed as well as time-dependent covariates can be included in the model. The prediction of the missing values is based on the theory of Prof. Brendan Halpin. It takes into account a configurable amount of the surrounding available information to perform the prediction. In particular, the user can specify np (the number of past observations taken into account) and nf (the number of future observations taken into account).
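To make the roles of np and nf concrete, the sketch below shows which surrounding states could serve as predictors for one missing position inside an internal gap. build_window() is a hypothetical helper written for this illustration, not a seqimpute function, and it assumes the window fits inside the sequence; gaps too close to either edge are what the specially located gap types in Details deal with.

```r
# Hypothetical helper (not part of seqimpute): for one sequence and a target
# column t, return the np previous and nf future states that an imputation
# model for an internal gap could use as predictors.
build_window <- function(seq_row, t, np = 1, nf = 0) {
  # This sketch only handles windows that fit entirely inside the sequence
  stopifnot(t - np >= 1, t + nf <= length(seq_row))
  past   <- if (np > 0) seq_row[(t - np):(t - 1)] else integer(0)
  future <- if (nf > 0) seq_row[(t + 1):(t + nf)] else integer(0)
  list(past = past, future = future)
}

# A sequence of states coded 1..k (k = 3) with a missing value at column 4
s <- c(1L, 2L, 2L, NA, 3L, 3L)
w <- build_window(s, t = 4, np = 2, nf = 1)
w$past    # states at columns 2 and 3
w$future  # state at column 5
```

With the defaults np = 1 and nf = 0, only the state immediately before the gap would enter the window.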

Usage

seqimpute(
  OD,
  regr = "mlogit",
  k,
  np = 1,
  nf = 0,
  nfi = 1,
  npt = 1,
  available = TRUE,
  CO = matrix(NA, nrow = 1, ncol = 1),
  COt = matrix(NA, nrow = 1, ncol = 1),
  pastDistrib = FALSE,
  futureDistrib = FALSE,
  mi = 1,
  mice.return = FALSE,
  include = FALSE,
  noise = 0
)

Arguments

OD

matrix object containing sequences of a multinomial variable with missing data (coded as NA).

regr

character string specifying the type of regression model used to compute the prediction: multinomial ("mlogit"), linear ("lm") or ordinal ("lrm") (default "mlogit").

k

numeric object giving the number of categories of the multinomial variable, whose categories are numbered from 1 to k.
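Since the categories are expected to be numbered from 1 to k, sequences stored as text labels need recoding before the call. A minimal base R sketch, assuming a character matrix OD_chr whose state labels are invented purely for illustration:

```r
# Invented example: 2 sequences observed at 3 time points, with one missing state
OD_chr <- matrix(c("work", "school", NA, "work", "unemp", "work"), nrow = 2)

# Sorted list of the observed states; position in this list becomes the code
vals   <- as.vector(OD_chr)
states <- sort(unique(vals[!is.na(vals)]))

# Recode every cell to 1..k; NA cells stay NA because NA is not in `states`
OD <- matrix(match(OD_chr, states), nrow = nrow(OD_chr))
k  <- length(states)
```

Keeping the states vector makes it possible to map the imputed codes back to their labels afterwards.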

np

numeric object corresponding to the number of previous observations in the imputation model of the internal gaps (default 1).

nf

numeric object corresponding to the number of future observations in the imputation model of the internal gaps (default 0).

nfi

numeric object corresponding to the number of future observations in the imputation model of the initial gaps (default 1).

npt

numeric object corresponding to the number of previous observations in the imputation model of the terminal gaps (default 1).

available

logical object specifying whether already imputed values are used as predictors in the imputation model (available = TRUE) or not (available = FALSE) (default TRUE).

CO

data.frame object containing fixed covariates that can be included to specify the model more accurately (default: an empty 1x1 matrix, matrix(NA, nrow = 1, ncol = 1)).

COt

data.frame object containing time-dependent covariates that help specify the predictive model more accurately (default: an empty 1x1 matrix, matrix(NA, nrow = 1, ncol = 1)).

pastDistrib

logical object specifying whether the distribution of the past states is included in the multinomial logistic regression model (default FALSE).

futureDistrib

logical object specifying whether the distribution of the future states is included in the multinomial logistic regression model (default FALSE).
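As an illustration of what a past or future distribution could look like as a covariate, the sketch below computes the relative frequency of each of the k categories among the observed states before (or after) a given position. past_distrib() and future_distrib() are hypothetical helpers; whether seqimpute uses the whole past and future of the sequence in exactly this way is an assumption.

```r
# Hypothetical helpers (not seqimpute code): relative frequency of each of the
# k categories among the observed (non-NA) states before / after column t.
freq_of <- function(x, k) {
  counts <- tabulate(x[!is.na(x)], nbins = k)
  if (sum(counts) == 0) return(rep(0, k))  # nothing observed on that side
  counts / sum(counts)
}
past_distrib <- function(seq_row, t, k) {
  freq_of(seq_row[seq_len(t - 1)], k)
}
future_distrib <- function(seq_row, t, k) {
  n <- length(seq_row)
  freq_of(if (t < n) seq_row[(t + 1):n] else integer(0), k)
}

s <- c(1L, 2L, 2L, NA, 3L, 3L)
past_distrib(s, t = 4, k = 3)    # shares of states 1, 2, 3 before column 4
future_distrib(s, t = 4, k = 3)  # shares of states 1, 2, 3 after column 4
```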

mi

numeric object giving the number of multiple imputations to perform (default 1).

mice.return

logical object; if TRUE, an object of class mids, which can be used directly by the mice package, is returned (default FALSE).

include

logical object determining, when a data.frame is returned, whether the original dataset is included (default FALSE). This parameter does not apply if mice.return = TRUE.

noise

numeric object adding noise to the vector of predictions pred obtained from the multinomial model, by introducing a noise term of variance noise on each component of pred (default 0). Any value can be chosen, but a relatively small value in the interval [0.005, 0.03] is recommended.
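The sketch below shows one possible reading of the noise argument: Gaussian noise with variance noise is added to each component of pred, negative values are truncated at zero, and an imputed category is then drawn from the renormalised probabilities. perturb_and_draw() is hypothetical, and both the truncation and the random draw are assumptions rather than a description of seqimpute's internals.

```r
# Hypothetical illustration (not seqimpute code): perturb a vector of predicted
# category probabilities, then draw one category from it.
perturb_and_draw <- function(pred, noise = 0) {
  if (noise > 0) {
    # Gaussian noise with variance `noise` on each component (assumption)
    pred <- pred + rnorm(length(pred), mean = 0, sd = sqrt(noise))
    pred <- pmax(pred, 0)  # keep the perturbed values non-negative
  }
  if (sum(pred) == 0) pred <- rep(1, length(pred))  # degenerate case: uniform
  pred <- pred / sum(pred)
  sample(seq_along(pred), size = 1, prob = pred)
}

set.seed(1)
pred  <- c(0.7, 0.2, 0.1)
draws <- replicate(1000, perturb_and_draw(pred, noise = 0.01))
table(draws)  # category 1 should still dominate with such a small noise
```

With noise = 0 the draw depends only on pred, which matches the default behaviour of adding no perturbation.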

Details

The imputation process is divided into several steps. Depending on where the gaps (runs of NA) are located in the original dataset, five types of gaps are distinguished:

- Internal Gaps (ordinary gaps located inside a sequence)

- Initial Gaps (gaps situated at the very beginning of a sequence)

- Terminal Gaps (gaps situated at the very end of a sequence)

- Left-hand side SLG (Specially Located Gaps) (gaps whose starting position lies in the interval [0, np])

- Right-hand side SLG (Specially Located Gaps) (gaps whose ending position lies in the interval [ncol(OD)-nf, ncol(OD)])

The gap types are imputed in the following order: 1. Internal Gaps, 2. Initial Gaps, 3. Terminal Gaps, 4. Left-hand side SLG, 5. Right-hand side SLG.
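A gap's type depends only on its start and end columns, np and nf. The hypothetical helper classify_gaps() below finds the runs of NA in one sequence with rle() and labels them, applying the intervals above literally; the precedence used when a gap matches several definitions (for example an initial gap, which also starts in [0, np]) is an assumption of this sketch.

```r
# Hypothetical helper (not part of seqimpute): list the gaps (runs of NA) in
# one sequence and label each with one of the five gap types.
classify_gaps <- function(seq_row, np = 1, nf = 0) {
  n <- length(seq_row)
  r <- rle(is.na(seq_row))
  ends   <- cumsum(r$lengths)
  starts <- ends - r$lengths + 1
  starts <- starts[r$values]  # keep only the runs of NA
  ends   <- ends[r$values]
  type <- character(length(starts))
  for (i in seq_along(starts)) {
    # Precedence: initial, terminal, left-hand SLG, right-hand SLG, internal
    type[i] <- if (starts[i] == 1) {
      "initial"
    } else if (ends[i] == n) {
      "terminal"
    } else if (starts[i] <= np) {
      "left-hand SLG"
    } else if (ends[i] >= n - nf) {
      "right-hand SLG"
    } else {
      "internal"
    }
  }
  data.frame(start = starts, end = ends, type = type, stringsAsFactors = FALSE)
}

s <- c(NA, 1L, NA, 2L, 2L, NA, NA, 1L, 3L, NA, 3L, NA)
g <- classify_gaps(s, np = 3, nf = 2)
g
```

With the defaults np = 1 and nf = 0, both SLG intervals collapse onto the initial and terminal positions, so only internal, initial and terminal gaps remain.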

Value

Returns either an S3 object of class mids if mice.return = TRUE, or otherwise a data.frame in which the imputed datasets are stacked vertically. In the latter case, two columns are added: .imp, an integer giving the imputation number (0 corresponds to the original dataset if include = TRUE), and .id, a character column containing the row names of the dataset to impute.
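The stacked layout can be split back into individual datasets with base R. The data frame below is a small invented stand-in for the returned object (mice.return = FALSE, include = TRUE, mi = 2); the state columns V1 to V3 are hypothetical names.

```r
# Invented stand-in for the stacked result: 2 sequences, 3 time points,
# original data (.imp = 0) followed by 2 imputations
res <- data.frame(
  .imp = rep(0:2, each = 2),
  .id  = rep(c("s1", "s2"), times = 3),
  V1   = c(1, NA, 1, 2, 1, 1),
  V2   = c(2, 2, 2, 2, 2, 2),
  V3   = c(NA, 1, 2, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Keep only the state columns and split them by imputation number
state_cols <- setdiff(names(res), c(".imp", ".id"))
by_imp <- split(res[, state_cols], res$.imp)

# Drop the original data (.imp == 0) and analyse each completed dataset
completed <- by_imp[names(by_imp) != "0"]
p1 <- sapply(completed, function(d) mean(d$V1 == 1))
p1  # share of state 1 at the first time point, per imputation
```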

Author(s)

Andre Berchtold <andre.berchtold@unil.ch>, Kevin Emery

References

HALPIN, Brendan, March 2013. Imputing Sequence Data: Extensions to initial and terminal gaps, Stata's mi. University of Limerick Department of Sociology Working Paper Series, Working Paper WP2013-01, p. 3. Available at: http://www.ul.ie/sociology/pubs/wp2013-01.pdf

Examples

## Not run: 
data(OD, CO, COt)

RESULT <- seqimpute(OD=OD, k=2, np=1, nf=0, nfi=1, npt=1, CO=CO, COt=COt, mi=1)

## End(Not run)

[Package seqimpute version 1.2.1 Index]