[1] "Created: Wed Apr  1 16:28:54 2015"

See the introduction for an overview.

library(lme4)
library(ggplot2)
options(digits=5,show.signif.stars=FALSE)

Load in and plot the data:

data(irrigation, package="faraway")
summary(irrigation)
     field   irrigation variety     yield     
 f1     :2   i1:4       v1:8    Min.   :34.8  
 f2     :2   i2:4       v2:8    1st Qu.:37.6  
 f3     :2   i3:4               Median :40.1  
 f4     :2   i4:4               Mean   :40.2  
 f5     :2                      3rd Qu.:42.7  
 f6     :2                      Max.   :47.6  
 (Other):4                                    
ggplot(irrigation, aes(y=yield, x=field, shape=variety, color=irrigation)) + geom_point()

plot of chunk unnamed-chunk-3

Fit the model

mmod <- lmer(yield ~ irrigation * variety + (1|field),data=irrigation)
print(summary(mmod),correlation=FALSE)
Linear mixed model fit by REML ['lmerMod']
Formula: yield ~ irrigation * variety + (1 | field)
   Data: irrigation

REML criterion at convergence: 45.4

Scaled residuals: 
   Min     1Q Median     3Q    Max 
-0.745 -0.551  0.000  0.551  0.745 

Random effects:
 Groups   Name        Variance Std.Dev.
 field    (Intercept) 16.20    4.02    
 Residual              2.11    1.45    
Number of obs: 16, groups:  field, 8

Fixed effects:
                       Estimate Std. Error t value
(Intercept)               38.50       3.03   12.73
irrigationi2               1.20       4.28    0.28
irrigationi3               0.70       4.28    0.16
irrigationi4               3.50       4.28    0.82
varietyv2                  0.60       1.45    0.41
irrigationi2:varietyv2    -0.40       2.05   -0.19
irrigationi3:varietyv2    -0.20       2.05   -0.10
irrigationi4:varietyv2     1.20       2.05    0.58
anova(mmod)
Analysis of Variance Table
                   Df Sum Sq Mean Sq F value
irrigation          3   2.46   0.818    0.39
variety             1   2.25   2.250    1.07
irrigation:variety  3   1.55   0.517    0.25

No inferential information is provided although confidence intervals can be computed:

confint(mmod)
                          2.5 %   97.5 %
.sig01                  1.76705  5.22153
.sigma                  0.67350  1.84234
(Intercept)            33.80068 43.19933
irrigationi2           -5.44585  7.84585
irrigationi3           -5.94585  7.34585
irrigationi4           -3.14585 10.14585
varietyv2              -1.67942  2.87942
irrigationi2:varietyv2 -0.37198  0.43796
irrigationi3:varietyv2 -0.17198  0.63796
irrigationi4:varietyv2  1.22802  2.03796

aov approach

The model can be fit using aov:

lmod <- aov(yield ~ irrigation*variety + Error(field), irrigation)
summary(lmod)

Error: field
           Df Sum Sq Mean Sq F value Pr(>F)
irrigation  3   40.2    13.4    0.39   0.77
Residuals   4  138.0    34.5               

Error: Within
                   Df Sum Sq Mean Sq F value Pr(>F)
variety             1   2.25   2.250    1.07   0.36
irrigation:variety  3   1.55   0.517    0.25   0.86
Residuals           4   8.43   2.107               

The analysis takes account of the fact that the irrigation does not vary within the field. Note that the F-statistics are the same as the ANOVA table obtained originally from lmer.

pbkrtest package

We can test the interaction effect:

library(pbkrtest)
lmodr <- lmer(yield ~ irrigation * variety + (1|field),data=irrigation)
lmoda <- lmer(yield ~ irrigation + variety + (1|field),data=irrigation)
KRmodcomp(lmodr, lmoda)
F-test with Kenward-Roger approximation; computing time: 0.13 sec.
large : yield ~ irrigation * variety + (1 | field)
small : yield ~ irrigation + variety + (1 | field)
      stat  ndf  ddf F.scaling p.value
Ftest 0.25 3.00 4.00         1    0.86

Because of the balance in the data, the F-test requires no adjustment and the outcome is identical with that presented in the printed textbook. We can also test the main effect terms although we are not able to exactly reproduce the results in the text because we must frame the test as model comparisons in contrast to the ANOVA table in text.

First we test the variety term by removing it from the main effects model and making the comparison.

lmodi <- lmer(yield ~ irrigation + (1|field),data=irrigation)
KRmodcomp(lmoda, lmodi)
F-test with Kenward-Roger approximation; computing time: 0.16 sec.
large : yield ~ irrigation + variety + (1 | field)
small : yield ~ irrigation + (1 | field)
      stat  ndf  ddf F.scaling p.value
Ftest 1.58 1.00 7.00         1    0.25

The test-statistic and p-value are not identical with the text because the ANOVA table uses the denominator mean square and degrees of freedom from the full model with the interaction. We can test the irrigation effect in the same way:

lmodv <- lmer(yield ~  variety + (1|field),data=irrigation)
KRmodcomp(lmoda, lmodv)
F-test with Kenward-Roger approximation; computing time: 0.06 sec.
large : yield ~ irrigation + variety + (1 | field)
small : yield ~ variety + (1 | field)
      stat  ndf  ddf F.scaling p.value
Ftest 0.39 3.00 4.00         1    0.77

In this case, it is clear that neither irrigation or variety have an impact on the yield. The parametric bootstrap method can also be used here to test the fixed effects although in this case, due the balance of data, it is not going to provide any advantage over the KR-adjusted result.

RLRsim

We can test the field effect like this:

library(RLRsim)
pmod <- lmer(yield ~ irrigation + variety + (1|field),data=irrigation, REML=FALSE)
lmod <- lm(yield ~ irrigation + variety,data=irrigation)
exactLRT(pmod, lmod)

    simulated finite sample distribution of LRT. (p-value based on
    10000 simulated values)

data:  
LRT = 11.042, p-value = 4e-04

Clearly the fields do vary as is already apparent from the plot above.

lmerTest

library(lmerTest)
mmod <- lmer(yield ~ irrigation * variety + (1|field),data=irrigation)
print(summary(mmod),correlation=FALSE)
Linear mixed model fit by REML t-tests use Satterthwaite approximations
  to degrees of freedom [merModLmerTest]
Formula: yield ~ irrigation * variety + (1 | field)
   Data: irrigation

REML criterion at convergence: 45.4

Scaled residuals: 
   Min     1Q Median     3Q    Max 
-0.745 -0.551  0.000  0.551  0.745 

Random effects:
 Groups   Name        Variance Std.Dev.
 field    (Intercept) 16.20    4.02    
 Residual              2.11    1.45    
Number of obs: 16, groups:  field, 8

Fixed effects:
                       Estimate Std. Error    df t value Pr(>|t|)
(Intercept)               38.50       3.03  4.49   12.73  0.00011
irrigationi2               1.20       4.28  4.49    0.28  0.79159
irrigationi3               0.70       4.28  4.49    0.16  0.87716
irrigationi4               3.50       4.28  4.49    0.82  0.45458
varietyv2                  0.60       1.45  4.00    0.41  0.70058
irrigationi2:varietyv2    -0.40       2.05  4.00   -0.19  0.85502
irrigationi3:varietyv2    -0.20       2.05  4.00   -0.10  0.92708
irrigationi4:varietyv2     1.20       2.05  4.00    0.58  0.59026
anova(mmod)
Analysis of Variance Table of type 3  with  Satterthwaite 
approximation for degrees of freedom
                   Sum Sq Mean Sq NumDF DenDF F.value Pr(>F)
irrigation           2.46   0.818     3     4   0.388   0.77
variety              2.25   2.250     1     4   1.068   0.36
irrigation:variety   1.55   0.517     3     4   0.245   0.86

The p-values match those previously obtained.

We can test the random effect term like this:

rand(mmod)
Analysis of Random effects Table:
      Chi.sq Chi.DF p.value
field   6.11      1    0.01

We see that the field effect is significant although we would prefer the RLRsim version of this test as being more accurate.

MCMCglmm

See the One way anova analysis for an introduction to MCMCglmm. We proceed directly with the final prior (using parameter expansion):

library(MCMCglmm)
eprior <- list(R=list(V=1,nu=0.02),G=list(G1=list(V=1,nu=0.02,alpha.V=1000)))
bmod <- MCMCglmm(yield ~ irrigation + variety, ~field, data=irrigation,verbose=FALSE,prior=eprior,pr=TRUE)
xyplot(bmod$Sol[,1:5])

plot of chunk unnamed-chunk-13

xyplot(bmod$Sol[,6:13])

plot of chunk unnamed-chunk-13

xyplot(log(bmod$VCV))

plot of chunk unnamed-chunk-13

The diagnostics are satisfactory

summary(bmod)

 Iterations = 3001:12991
 Thinning interval  = 10
 Sample size  = 1000 

 DIC: 61.554 

 G-structure:  ~field

      post.mean l-95% CI u-95% CI eff.samp
field      36.1    0.163      115     1000

 R-structure:  ~units

      post.mean l-95% CI u-95% CI eff.samp
units      2.01    0.406     4.61     1000

 Location effects: yield ~ irrigation + variety 

             post.mean l-95% CI u-95% CI eff.samp  pMCMC
(Intercept)     38.491   31.056   47.836     1181 <0.001
irrigationi2     0.823  -11.532   11.328     1000   0.87
irrigationi3     0.497  -11.672   12.944     1137   0.93
irrigationi4     3.947   -7.948   14.630     1000   0.41
varietyv2        0.715   -0.730    2.169     1000   0.28

The output shows pMCMC which is twice the posterior probability observed on the other side of zero from the posterior mean. This is the same way in which a bootstrap p-value would be calculated and might be viewed as a “Bayesian p-value” (a dubious analogy). These suggest no strong evidence of a difference in the irrigation methods or varieties.

We can plot the posterior distributions.

library(reshape2)
ref <- data.frame(bmod$Sol[,2:4])
colnames(ref) <- c("2-1","3-1","4-1")
rdf <- melt(ref)
colnames(rdf) <- c("irrigation","yield")
ggplot(rdf, aes(x=yield,color=irrigation))+geom_density()

plot of chunk unnamed-chunk-15

These are differences with the method 1 irrigation level. As we see all three overlap zero substantially hence the large p-value above. Furthermore, the three distributions all overlap considerably so there is little evidence of any difference between the four irrigation methods.

We check the variety contrast between 1 and 2:

ref <- data.frame(bmod$Sol[,5])
colnames(ref) <- "yield"
ggplot(ref,aes(x=yield))+geom_density()

plot of chunk unnamed-chunk-16

We can also look at the field posterior distributions:

ref <- data.frame(bmod$Sol[,6:13])
colnames(ref) <- as.character(1:8)
rdf <- melt(ref)
colnames(rdf) <- c("field","yield")
ggplot(rdf, aes(x=yield,color=field))+geom_density()

plot of chunk unnamed-chunk-17

We see two kinds of field posterior distribution - (1,2,3,4) and (5,6,7,8). We can look at the posterior distribution for the blend SD:

hist(sqrt(bmod$VCV[,1]), 50, xlab="yield", main="Field SD")

plot of chunk unnamed-chunk-18

and also for the error SD:

hist(sqrt(bmod$VCV[,2]), 50, xlab="yield", main="Error SD")

plot of chunk unnamed-chunk-19

We can also look at the joint distribution:

rdf <- data.frame(bmod$VCV)
ggplot(rdf, aes(x=sqrt(field),y=sqrt(units)))+geom_density2d()+geom_abline(int=0,slope=1)

plot of chunk unnamed-chunk-20

Clear that the field variation is much greater than the error variation.

Check on versions of R packages used:

sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] reshape2_1.2.2   MCMCglmm_2.21    ape_3.1-4        coda_0.16-1     
 [5] lattice_0.20-29  lmerTest_2.0-20  RLRsim_3.0       pbkrtest_0.4-1  
 [9] MASS_7.3-33      ggplot2_0.9.3.1  lme4_1.1-7       Rcpp_0.11.3     
[13] Matrix_1.1-4     knitr_1.6        rmarkdown_0.2.68

loaded via a namespace (and not attached):
 [1] bitops_1.0-6        caTools_1.16        cluster_1.15.2     
 [4] colorspace_1.2-4    corpcor_1.6.7       dichromat_2.0-0    
 [7] digest_0.6.4        evaluate_0.5.3      formatR_0.10       
[10] Formula_1.1-1       gdata_2.13.3        gplots_2.13.0      
[13] grid_3.1.1          gtable_0.1.2        gtools_3.3.1       
[16] Hmisc_3.14-3        htmltools_0.2.4     KernSmooth_2.23-12 
[19] labeling_0.2        latticeExtra_0.6-26 minqa_1.2.3        
[22] munsell_0.4.2       nlme_3.1-117        nloptr_1.0.4       
[25] numDeriv_2012.9-1   parallel_3.1.1      plyr_1.8.1         
[28] proto_0.3-10        RColorBrewer_1.0-5  scales_0.2.3       
[31] splines_3.1.1       stringr_0.6.2       survival_2.37-7    
[34] tensorA_0.36        tools_3.1.1         yaml_2.1.11