Quasi Variances for Model Coefficients

Computes a set of quasi variances (and corresponding quasi standard errors) for estimated model coefficients relating to the levels of a categorical (i.e., factor) explanatory variable. For details of the method see Firth (2000), Firth (2003) or Firth and de Menezes (2004). Quasi variances generalize and improve the accuracy of “floating absolute risk” (Easton et al., 1991). This device for economical model summary was first suggested by Ridout (1989).

Usage

qvcalc(object, ...)

# Default S3 method
qvcalc(
  object,
  factorname = NULL,
  coef.indices = NULL,
  labels = NULL,
  dispersion = NULL,
  estimates = NULL,
  modelcall = NULL,
  ...
)

# S3 method for class 'coxph'
qvcalc(object, factorname = NULL, coef.indices = NULL, ...)

# S3 method for class 'itempar'
qvcalc(object, ...)

# S3 method for class 'lm'
qvcalc(object, factorname = NULL, coef.indices = NULL, dispersion = NULL, ...)

# S3 method for class 'survreg'
qvcalc(object, factorname = NULL, coef.indices = NULL, ...)

Arguments

object: For qvcalc.default, this is the covariance (sub)matrix for the parameters of interest (including any that may have been constrained to zero). For the generic qvcalc, the object can be any object for which the relevant S3 method has been defined. These currently include many types of regression model (via qvcalc.lm), including objects of classes coxph and survreg; and also objects of class itempar.
...: other arguments to pass to qv.default
factorname: Either NULL, or a character vector of length 1
coef.indices: Either NULL, or a numeric vector of length at least 3
labels: An optional vector of row names for the qvframe component of the result (redundant if object is a model)
dispersion: an optional scalar multiplier for the covariance matrix, to cope with overdispersion for example
estimates: an optional vector of estimated coefficients (redundant if object is a model, for example)
modelcall: optional, the call expression for the model of interest (redundant if object is a model with its own call component)

Value

A list of class qv, with components

covmat: the full variance-covariance matrix for the estimated coefficients corresponding to the factor of interest
qvframe: a data frame with variables estimate, SE, quasiSE and quasiVar, the last two being a quasi standard error and quasi-variance for each level of the factor of interest
relerrs: relative errors for approximating the standard errors of all simple contrasts
factorname: the factor name if given
coef.indices: the coefficient indices if given
modelcall: if object is a model, object$call; otherwise NULL

Details

The qvcalc.default method is the computational backend for all other, class-specific methods.

In qvcalc.default, none of the arguments other than object is used in computing the result. The remaining arguments are simply passed through to the result object as components to help with record-keeping etc.

In qvcalc.lm, at least one of factorname or coef.indices must be non-NULL. The value of coef.indices, if non-NULL, determines which rows and columns of the model's variance-covariance matrix to use. If coef.indices contains a zero, then an extra row and column are included at the indicated position, to represent the zero variances and covariances associated with a reference level. If coef.indices is NULL, then factorname should be the name of a factor effect in the model, and is used in order to extract the necessary variance-covariance estimates.

For qvcalc.itempar, the "itempar" object must have the full variance-covariance matrix in its "vcov" attribute, and must have its "alias" attribute be TRUE. These attributes result from use of the default arguments vcov = TRUE, alias = TRUE when the itempar function is called.

Ordinarily the quasi variances are positive and so their square roots (the quasi standard errors) exist and can be used in plots, etc.

Occasionally one (and only one) of the quasi variances is negative, and so the corresponding quasi standard error does not exist (it appears as NaN). This is fairly rare in applications, and when it occurs it is because the factor of interest is strongly correlated with one or more other predictors in the model. It is not an indication that quasi variances are inaccurate. An example is shown below using data from the car package: the quasi variance approximation is exact (since type has only 3 levels), and there is a negative quasi variance. The quasi variances remain perfectly valid (they can be used to obtain inference on any contrast), but it makes no sense to plot `comparison intervals' in the usual way since one of the quasi standard errors is not a real number.

References

Easton, D. F, Peto, J. and Babiker, A. G. A. G. (1991) Floating absolute risk: an alternative to relative risk in survival and case-control analysis avoiding an arbitrary reference group. Statistics in Medicine 10, 1025–1035. doi:10.1002/sim.4780100703

Firth, D. (2000) Quasi-variances in Xlisp-Stat and on the web. Journal of Statistical Software 5.4, 1–13. doi:10.18637/jss.v005.i04

Firth, D. (2003) Overcoming the reference category problem in the presentation of statistical models. Sociological Methodology 33, 1–18. doi:10.1111/j.0081-1750.2003.t01-1-00125.x

Firth, D. and de Mezezes, R. X. (2004) Quasi-variances. Biometrika 91, 65–80. doi:10.1093/biomet/91.1.65

McCullagh, P. and Nelder, J. A. (1989) Generalized Linear Models. London: Chapman and Hall.

Menezes, R. X. de (1999) More useful standard errors for group and factor effects in generalized linear models. D.Phil. Thesis, Department of Statistics, University of Oxford.

Ridout, M.S. (1989). Summarizing the results of fitting generalized linear models to data from designed experiments. In: Statistical Modelling: Proceedings of GLIM89 and the 4th International Workshop on Statistical Modelling held in Trento, Italy, July 17–21, 1989 (A. Decarli et al., eds.), pp 262–269. New York: Springer.

Author

David Firth, d.firth@warwick.ac.uk

Examples


##  Overdispersed Poisson loglinear model for ship damage data
##  from McCullagh and Nelder (1989), Sec 6.3.2
if (require(MASS)) {
    data(ships)
    ships$year <- as.factor(ships$year)
    ships$period <- as.factor(ships$period)
    shipmodel <- glm(formula = incidents ~ type + year + period,
                     family = quasipoisson,
                     data = ships,
                     subset = (service > 0),
                     offset = log(service))
    shiptype.qv <- qvcalc(shipmodel, "type")

    ## We can plot "comparison intervals" as follows:
    ##   plot(shiptype.qv, xlab = "ship type")

    ## An equivalent result by using the coef.indices argument instead:
    ##   shiptype.qv2 <- qvcalc(shipmodel, coef.indices = c(0, 2:5))

    summary(shiptype.qv, digits = 4)
}
#> Model call:  glm(formula = incidents ~ type + year + period, family = quasipoisson,      data = ships, subset = (service > 0), offset = log(service)) 
#> Factor name:  type 
#>       estimate     SE quasiSE quasiVar
#>     A  0.00000 0.0000  0.2010  0.04039
#>     B -0.54334 0.2309  0.1127  0.01270
#>     C -0.68740 0.4279  0.3753  0.14081
#>     D -0.07596 0.3779  0.3239  0.10491
#>     E  0.32558 0.3067  0.2322  0.05390
#> Worst relative errors in SEs of simple contrasts (%):  -0.7 0.9 
#> Worst relative errors over *all* contrasts (%):  -2.1 1.6 

## Example of a "coxph" model
if(require(survival)) {
    data("veteran", package = "survival")
    cancer_model <- coxph(Surv(time,status) ~ celltype, data = veteran)
    celltype_qv <- qvcalc(cancer_model, "celltype")
    summary(celltype_qv)
}
#> Loading required package: survival
#> Warning: data set ‘veteran’ not found
#> Model call:  coxph(formula = Surv(time, status) ~ celltype, data = veteran) 
#> Factor name:  celltype 
#>                estimate        SE   quasiSE   quasiVar
#>     squamous  0.0000000 0.0000000 0.2023772 0.04095651
#>     smallcell 1.0012532 0.2535074 0.1501828 0.02255487
#>     adeno     1.1477130 0.2928805 0.2049427 0.04200152
#>     large     0.2301455 0.2772930 0.1991861 0.03967510
#> Worst relative errors in SEs of simple contrasts (%):  -1.7 2.4 
#> Worst relative errors over *all* contrasts (%):  -4.1 3.1 

## Example of a "survreg" model
if(require(survival)) {
    data("veteran", package = "survival")
    cancer_model2 <- survreg(Surv(time,status) ~ celltype, data = veteran,
                             dist = "weibull")
    celltype_qv2 <- qvcalc(cancer_model2, "celltype")
    summary(celltype_qv2)
}
#> Warning: data set ‘veteran’ not found
#> Model call:  survreg(formula = Surv(time, status) ~ celltype, data = veteran,      dist = "weibull") 
#> Factor name:  celltype 
#>                 estimate        SE   quasiSE   quasiVar
#>     squamous   0.0000000 0.0000000 0.1852658 0.03432341
#>     smallcell -1.0831923 0.2405345 0.1537001 0.02362371
#>     adeno     -1.2162022 0.2743860 0.2022180 0.04089213
#>     large     -0.2627843 0.2745644 0.2024398 0.04098189
#> Worst relative errors in SEs of simple contrasts (%):  -0.1 0.1 
#> Worst relative errors over *all* contrasts (%):  -0.2 0.1 

## Based on an example from ?itempar
if(require(psychotools)) {
    data("VerbalAggression", package = "psychotools")
    raschmod <- raschmodel(VerbalAggression$resp2)
    ip1 <- itempar(raschmod)
    qv1 <- qvcalc(ip1)
    summary(qv1) }
#> Loading required package: psychotools
#>                    estimate        SE   quasiSE   quasiVar
#>     S1WantCurse -1.38337955 0.1400077 0.1427898 0.02038892
#>     S1DoCurse   -1.38338563 0.1400078 0.1427899 0.02038896
#>     S1WantScold -0.73070582 0.1306459 0.1328148 0.01763977
#>     S1DoScold   -0.55660577 0.1293742 0.1314680 0.01728384
#>     S1WantShout -0.24905230 0.1283297 0.1303847 0.01700016
#>     S1DoShout    0.69813569 0.1349252 0.1376080 0.01893598
#>     S2WantCurse -1.90929270 0.1534737 0.1571194 0.02468649
#>     S2DoCurse   -1.03672005 0.1341048 0.1364969 0.01863141
#>     S2WantScold -0.87273748 0.1320543 0.1343125 0.01803984
#>     S2DoScold   -0.11314599 0.1283548 0.1304310 0.01701224
#>     S2WantShout -0.18103325 0.1283049 0.1303675 0.01699569
#>     S2DoShout    1.31200512 0.1478902 0.1515414 0.02296478
#>     S3WantCurse -0.69552829 0.1303489 0.1324995 0.01755613
#>     S3DoCurse    0.04031978 0.1287445 0.1308728 0.01712769
#>     S3WantScold  0.51356595 0.1324282 0.1349026 0.01819870
#>     S3DoScold    1.33478884 0.1485182 0.1522130 0.02316879
#>     S3WantShout  1.35770834 0.1491611 0.1529003 0.02337851
#>     S3DoShout    2.87092053 0.2219058 0.2297809 0.05279924
#>     S4WantCurse -1.24501930 0.1373885 0.1399977 0.01959934
#>     S4DoCurse   -0.87273748 0.1320543 0.1343125 0.01803984
#>     S4WantScold  0.17794106 0.1294241 0.1316244 0.01732499
#>     S4DoScold    0.21263744 0.1296453 0.1318676 0.01738906
#>     S4WantShout  0.87109769 0.1378342 0.1407482 0.01981006
#>     S4DoShout    1.84022315 0.1654484 0.1702428 0.02898261
#> Worst relative errors in SEs of simple contrasts (%):  -0.7 1.3 
#> Worst relative errors over *all* contrasts (%):  -6.9 1.4 

##  Example of a negative quasi variance
##  Requires the "car" package
if (FALSE) { # \dontrun{
    library(car)
    data(Prestige)
    attach(Prestige)
    mymodel <- lm(prestige ~ type + education)
    library(qvcalc)
    type.qvs <- qvcalc(mymodel, "type")
    ##  Warning message:
    ##  In sqrt(qv) : NaNs produced
    summary(type.qvs)
    ##  Model call:  lm(formula = prestige ~ type + education)
    ##  Factor name:  type
    ##          estimate       SE  quasiSE  quasiVar
    ##    bc    0.000000 0.000000 2.874361  8.261952
    ##    prof  6.142444 4.258961 3.142737  9.876793
    ##    wc   -5.458495 2.690667      NaN -1.022262
    ##  Worst relative errors in SEs of simple contrasts (%):  0 0
    ##  Worst relative errors over *all* contrasts (%):  0 0
    plot(type.qvs)
    ##  Error in plot.qv(type.qvs) :  No comparison intervals available,
    ##  since one of the quasi variances is negative.  See ?qvcalc for more.
} # }