r - Can I pool imputed random effect model estimates using the mi package? -


it appears mi package has had pretty big rewrite @ point within past couple of years.

the "old" way of doing things well-outlined in following tutorial: http://thomasleeper.com/rcourse/tutorials/mi.html

the "new" way of doing things (sticking leeper's simulation demo) looks this:

#load mi library(mi) #set seed set.seed(10) #simulate data (with observations missing) x1 <- runif(100, 0, 5) x2 <- rnorm(100) y <- 2*x1 + 20*x2 + rnorm(100) mydf <- cbind.data.frame(x1, x2, y) mydf$x1[sample(1:nrow(mydf), 20, false)] <- na mydf$x2[sample(1:nrow(mydf), 10, false)] <- na  # convert missing_data.frame mydf_mdf <- missing_data.frame(mydf)  # impute mydf_imp <- mi(mydf_mdf) 

although function names have changed, pretty similar "old" way of doing things.

the biggest change (from vantage) replacement of following "old" functions

lm.mi(formula, mi.object, ...)

glm.mi(formula, mi.object, family = gaussian, ...)

bayesglm.mi(formula, mi.object, family = gaussian, ...)

polr.mi(formula, mi.object, ...)

bayespolr.mi(formula, mi.object, ...)

lmer.mi(formula, mi.object, rescale=false, ...)

glmer.mi(formula, mi.object, family = gaussian, rescale=false, ...).

previously, user compute model each imputed dataset using 1 of these functions , pool results using mi.pooled() (or coef.mi() if following leeper example).

in current version of mi (i have v1.0 installed), these last steps appear have been combined single function, pool(). pool() function appears read family , link function assigned variable during imputation process above , estimate model bayesglm using specified formula shown below.

# run models on imputed data , pool results summary(pool(y ~ x1 + x2, mydf_imp))  ##  ## call: ## pool(formula = y ~ x1 + x2, data = mydf_imp) ##  ## deviance residuals:  ##      min        1q    median        3q       max   ## -1.98754  -0.40923   0.03393   0.46734   2.13848   ##  ## coefficients: ##             estimate std. error t value pr(>|t|)     ## (intercept) -0.34711    0.25979  -1.336    0.215     ## x1           2.07806    0.08738  23.783 1.46e-13 *** ## x2          19.90544    0.11068 179.844  < 2e-16 *** ## --- ## signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ##  ## (dispersion parameter gaussian family taken 0.7896688) ##  ##     null deviance: 38594.916  on 99  degrees of freedom ## residual deviance:    76.598  on 97  degrees of freedom ## aic: 264.74 ##  ## number of fisher scoring iterations: 7 

this looks coming close recovering our simulated beta values (2 , 20). in other words, behaving expected.

let's take larger set of data naively simulated random effect sake of getting grouping variable.

mydf2 <- data.frame(x1 = rep(runif(100, 0, 5), 20)                    ,x2 = rep(rnorm(100, 0, 2.5), 20)                    ,group_var = rep(1:20, each = 100)                    ,noise = rep(rnorm(100), 20))  mydf2$y <- 2*mydf2$x1 + 20*mydf2$x2 + mydf2$noise  mydf2$x1[sample(1:nrow(mydf2), 200, false)] <- na mydf2$x2[sample(1:nrow(mydf2), 100, false)] <- na  # convert missing_data.frame mydf2_mdf <- missing_data.frame(mydf2)  show(mydf2_mdf)  ## object of class missing_data.frame 2000 observations on 5 variables ##  ## there 4 missing data patterns ##  ## append '@patterns' missing_data.frame access corresponding pattern every observation or perhaps use table() ##  ##                 type missing method  model ## x1        continuous     200    ppd linear ## x2        continuous     100    ppd linear ## group_var continuous       0   <na>   <na> ## noise     continuous       0   <na>   <na> ## y         continuous       0   <na>   <na> ##  ##             family     link transformation ## x1        gaussian identity    standardize ## x2        gaussian identity    standardize ## group_var     <na>     <na>    standardize ## noise         <na>     <na>    standardize ## y             <na>     <na>    standardize 

since missing_data.frame() appears intepreting group_var continuous, use change() function mi reassign "un" "unordered categorical" , proceed above.

mydf2_mdf <- change(mydf2_mdf, y = "group_var", = "type", = "un"  )  # impute mydf2_imp <- mi(mydf2_mdf) 

now, unless version 1.0 of mi has removed functionality of previous versions (i.e. functionality available lmer.mi , glmer.mi), assume addition of random effect in formula should point pool() appropriate lme4 function. however, initial error message suggests not case.

# run models on imputed data , pool results summary(pool(y ~ x1 + x2 + (1|group_var), mydf2_imp)) ## warning in ops.factor(1, group_var): '|' not meaningful factors ## warning in ops.factor(1, group_var): '|' not meaningful factors ## error in if (prior.scale[j] < min.prior.scale) {: missing value true/false needed 

following warning message , extracting integers out of factor me estimate, results suggest pool() still estimating fixed-effect model bayesglm , holding attempted random-effect constant.

summary(pool(y ~ x1 + x2 + (1|as.numeric(as.character(group_var))), mydf2_imp))  ##  ## call: ## pool(formula = y ~ x1 + x2 + (1 | as.numeric(as.character(group_var))),  ##     data = mydf2_imp) ##  ## deviance residuals:  ##      min        1q    median        3q       max   ## -1.93633  -0.69923   0.01073   0.56752   2.12167   ##  ## coefficients: ##                                               estimate std. error  t value ## (intercept)                                  1.383e-01  2.596e+02    0.001 ## x1                                           1.995e+00  1.463e-02  136.288 ## x2                                           2.000e+01  8.004e-03 2499.077 ## 1 | as.numeric(as.character(group_var))true -3.105e-08  2.596e+02    0.000 ##                                             pr(>|t|)     ## (intercept)                                        1     ## x1                                            <2e-16 *** ## x2                                            <2e-16 *** ## 1 | as.numeric(as.character(group_var))true        1     ## --- ## signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ##  ## (dispersion parameter gaussian family taken 0.8586836) ##  ##     null deviance: 5384205.2  on 1999  degrees of freedom ## residual deviance:    1713.9  on 1996  degrees of freedom ## aic: 5377 ##  ## number of fisher scoring iterations: 4 

my questions are:

  1. is possible generate pooled random effects estimates using mi?, ,
  2. if yes, how?

you can specify fun argument pool() function change estimator. in case, summary(pool(y ~ x1 + x2 + (1|as.numeric(as.character(group_var))), data = mydf2_imp, fun = lmer)). may or may not work, legal syntax. if fails, can use complete function created completed data.frames, call lmer on each, , average results yourself, dfs <- complete(mydf2_imp) estimates <- lapply(dfs, fun = lme4, formula = y ~ x1 + x2 + (1|as.numeric(as.character(group_var)))) rowmeans(sapply(estimates, fun = fixef))


Comments

Popular posts from this blog

javascript - Laravel datatable invalid JSON response -

java - Exception in thread "main" org.springframework.context.ApplicationContextException: Unable to start embedded container; -

sql server 2008 - My Sql Code Get An Error Of Msg 245, Level 16, State 1, Line 1 Conversion failed when converting the varchar value '8:45 AM' to data type int -