Logit Regression Model

Benjamin Mayr

2018-07-21

Introduction

The [logitr] package fits logit regression models using maximum likelihood estimation. The model assumes an underlying latent Binomial variable.

For a more detailed and complete model with more available and extractable information use stats:::glm. This package was made for learning purposes only and using the standard glm functions instead of this is recommended.


Implementation

As usual in many other regression packages for R [@R], the main model fitting function logitr() uses a formula-based interface and returns an (S3) object of class logitr:

logitr(formula, data, subset, na.action,
  model = TRUE, x = FALSE, y = TRUE,
  control = logitr_control(...), ...)

The underlying workhorse function is logitr_fit() which has a matrix interface and returns an unclassed list.

A number of standard S3 methods are provided:

Method Description
print() Simple printed display with coefficients
summary() Standard regression summary; returns summary.logitr object (with print() method)
coef() Extract coefficients
vcov() Associated covariance matrix
predict() Predictions for new data
fitted() Fitted values for observed data
residuals() Extract the Pearson Residuals
terms() Extract terms
model.matrix() Extract model matrix (or matrices)
nobs() Extract number of observations
logLik() Extract fitted log-likelihood

Due to these methods some standard utilities also work automatically, e.g., AIC() and BIC()


Illustration

As this package was created for learning purposes only and merely replicates an already existing function in R the goal was to also get results that match those output by stats:::glm as closely as possible. Because I am not working with any specific data sets I created a dataset arbitrarily using some of the randomization available in R.

## generating data
set.seed(123)
x1 <- rnorm(30,3,2) + 0.1 * c(1:30)
x2 <- rbinom(30,1,0.3)
x3 <- rpois(n=30,lambda = 4)
x3[16:30] <- x3[16:30] - rpois(n=15, lambda = 2)
xdat <- cbind(x1,x2,x3)
ydat <- c(rbinom(5,1,0.1), rbinom(10,1,0.25), rbinom(10,1,0.75), rbinom(5,1,0.9))
## comparison of model outputs
(m0 <- logitr(ydat~xdat))
## Probit model
## 
## Coefficients:
## (Intercept)       xdatx1       xdatx2       xdatx3  
##    1.489967    -0.004592    -0.700656    -0.345386  
## 
## Log-likelihood: -18.69 on 4 Df
(m1 <- glm(ydat~xdat, family = "binomial"))
## 
## Call:  glm(formula = ydat ~ xdat, family = "binomial")
## 
## Coefficients:
## (Intercept)       xdatx1       xdatx2       xdatx3  
##    1.489967    -0.004593    -0.700655    -0.345385  
## 
## Degrees of Freedom: 29 Total (i.e. Null);  26 Residual
## Null Deviance:       41.46 
## Residual Deviance: 37.37     AIC: 45.37
## comparing AIC and BIC
AIC(m0)
## [1] 45.37385
AIC(m1)
## [1] 45.37385
BIC(m0)
## [1] 50.97864
BIC(m1)
## [1] 50.97864

While there are slight deviations on the last digit of the coefficients, the results are otherwise mostly identical with both the AIC and BIC also being a perfect match. With this I am fairly satisfied with the result of this project and although I won’t be publishing this I learned an incredible ammount of new things.