Title: Experimental Design and Data Analysis for Biologists - Quinn & Keough - Cambridge 2002
Tags: Analysis Of Variance, Regression Analysis, Analysis Of Covariance, Hypothesis, Statistical Hypothesis Testing
File Size: 5.7 MB
Total Pages: 557
Table of Contents
Cover Page
Book Info
Title Page
ISBN 0521811287
Contents
	1 Introduction
	2 Estimation
	3 Hypothesis testing
	4 Graphical exploration of data
	5 Correlation and regression
	6 Multiple and complex regression
	7 Design and power analysis
	8 Comparing groups or treatments – analysis of variance
	9 Multifactor analysis of variance
	10 Randomized blocks and simple repeated measures: unreplicated two factor designs
	11 Split-plot and repeated measures designs: partly nested analyses of variance
	12 Analyses of covariance
	13 Generalized linear models and logistic regression
	14 Analyzing frequencies
	15 Introduction to multivariate analyses
	16 Multivariate analysis of variance and discriminant analysis
	17 Principal components and correspondence analysis
	18 Multidimensional scaling and cluster analysis
	19 Presentation of results
	References, Index
	In this book
	Learning by example
	This book is a bridge
	Some acknowledgments
Chapter 1 Introduction
	1.1 Scientific method
	1.2 Experiments and other tests
	1.3 Data, observations and variables
	1.4 Probability
	1.5 Probability distributions
Chapter 2 Estimation
	2.1 Samples and populations
	2.2 Common parameters and statistics
	2.3 Standard errors and confidence intervals for the mean
	2.4 Methods for estimating parameters
	2.5 Resampling methods for estimation
	2.6 Bayesian inference – estimation
Chapter 3 Hypothesis testing
	3.1 Statistical hypothesis testing
	3.2 Decision errors
	3.3 Other testing methods
	3.4 Multiple testing
	3.5 Combining results from statistical tests
	3.6 Critique of statistical hypothesis testing
	3.7 Bayesian hypothesis testing
Chapter 4 Graphical exploration of data
	4.1 Exploratory data analysis
	4.2 Analysis with graphs
	4.3 Transforming data
	4.4 Standardizations
	4.5 Outliers
	4.6 Censored and missing data
	4.7 General issues and hints for analysis
Chapter 5 Correlation and regression
	5.1 Correlation analysis
	5.2 Linear models
	5.3 Linear regression analysis
	5.4 Relationship between regression and correlation
	5.5 Smoothing
	5.6 Power of tests in correlation and regression
	5.7 General issues and hints for analysis
Chapter 6 Multiple and complex regression
	6.1 Multiple linear regression analysis
	6.2 Regression trees
	6.3 Path analysis and structural equation modeling
	6.4 Nonlinear models
	6.5 Smoothing and response surfaces
	6.6 General issues and hints for analysis
Chapter 7 Design and power analysis
	7.1 Sampling
	7.2 Experimental design
	7.3 Power analysis
	7.4 General issues and hints for analysis
Chapter 8 Comparing groups or treatments – analysis of variance
	8.1 Single factor (one way) designs
	8.2 Factor effects
	8.3 Assumptions
	8.4 ANOVA diagnostics
	8.5 Robust ANOVA
	8.6 Specific comparisons of means
	8.7 Tests for trends
	8.8 Testing equality of group variances
	8.9 Power of single factor ANOVA
	8.10 General issues and hints for analysis
Chapter 9 Multifactor analysis of variance
	9.1 Nested (hierarchical) designs
	9.2 Factorial designs
	9.3 Pooling in multifactor designs
	9.4 Relationship between factorial and nested designs
	9.5 General issues and hints for analysis
Chapter 10 Randomized blocks and simple repeated measures: unreplicated two factor designs
	10.1 Unreplicated two factor experimental designs
	10.2 Analyzing RCB and RM designs
	10.3 Interactions in RCB and RM models
	10.4 Assumptions
	10.5 Robust RCB and RM analyses
	10.6 Specific comparisons
	10.7 Efficiency of blocking (to block or not to block?)
	10.8 Time as a blocking factor
	10.9 Analysis of unbalanced RCB designs
	10.10 Power of RCB or simple RM designs
	10.11 More complex block designs
	10.12 Generalized randomized block designs
	10.13 RCB and RM designs and statistical software
	10.14 General issues and hints for analysis
Chapter 11 Split-plot and repeated measures designs: partly nested analyses of variance
	11.1 Partly nested designs
	11.2 Analyzing partly nested designs
	11.3 Assumptions
	11.4 Robust partly nested analyses
	11.5 Specific comparisons
	11.6 Analysis of unbalanced partly nested designs
	11.7 Power for partly nested designs
	11.8 More complex designs
	11.9 Partly nested designs and statistical software
	11.10 General issues and hints for analysis
Chapter 12 Analyses of covariance
	12.1 Single factor analysis of covariance (ANCOVA)
	12.2 Assumptions of ANCOVA
	12.3 Homogeneous slopes
	12.4 Robust ANCOVA
	12.5 Unequal sample sizes (unbalanced designs)
	12.6 Specific comparisons of adjusted means
	12.7 More complex designs
	12.8 General issues and hints for analysis
Chapter 13 Generalized linear models and logistic regression
	13.1 Generalized linear models
	13.2 Logistic regression
	13.3 Poisson regression
	13.4 Generalized additive models
	13.5 Models for correlated data
	13.6 General issues and hints for analysis
Chapter 14 Analyzing frequencies
	14.1 Single variable goodness-of-fit tests
	14.2 Contingency tables
	14.3 Log-linear models
	14.4 General issues and hints for analysis
Chapter 15 Introduction to multivariate analyses
	15.1 Multivariate data
	15.2 Distributions and associations
	15.3 Linear combinations, eigenvectors and eigenvalues
	15.4 Multivariate distance and dissimilarity measures
	15.5 Comparing distance and/or dissimilarity matrices
	15.6 Data standardization
	15.7 Standardization, association and dissimilarity
	15.8 Multivariate graphics
	15.9 Screening multivariate data sets
	15.10 General issues and hints for analysis
Chapter 16 Multivariate analysis of variance and discriminant analysis
	16.1 Multivariate analysis of variance (MANOVA)
	16.2 Discriminant function analysis
	16.3 MANOVA vs discriminant function analysis
	16.4 General issues and hints for analysis
Chapter 17 Principal components and correspondence analysis
	17.1 Principal components analysis
	17.2 Factor analysis
	17.3 Correspondence analysis
	17.4 Canonical correlation analysis
	17.5 Redundancy analysis
	17.6 Canonical correspondence analysis
	17.7 Constrained and partial “ordination”
	17.8 General issues and hints for analysis
Chapter 18 Multidimensional scaling and cluster analysis
	18.1 Multidimensional scaling
	18.2 Classification
	18.3 Scaling (ordination) and clustering for biological data
	18.4 General issues and hints for analysis
Chapter 19 Presentation of results
	19.1 Presentation of analyses
	19.2 Layout of tables
	19.3 Displaying summaries of the data
	19.4 Error bars
	19.5 Oral presentations
	19.6 General issues and hints

Page 278

can be reduced. First, the design is nearly always unreplicated, so there is only one replicate unit within each of the cells used. By definition, this means that there is no estimate of $\sigma^2_\varepsilon$, so some higher order interaction terms must be used as the residual for hypothesis tests. Second, the logical basis of these designs is the assumption that most of the important effects will be main effects or simple (e.g. two factor) interactions, and that complex interactions will be relatively unimportant. The experiment is conducted using a subset of cells that allows estimation of main effects and simple interactions but confounds these with higher order interactions that are assumed to be negligible.

The combination of factor levels to be used is tricky to determine but, fortunately, most statistical software now includes experimental design modules that generate fractional factorial design structures. This software often includes methods such as Plackett–Burman and Taguchi designs, which set up fractional factorial designs in ways that try to minimize confounding of main effects and simple interactions.
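
A minimal sketch of one such method, assuming the 12-run Plackett–Burman construction: the first 11 runs are cyclic shifts of a generator row and the last run sets every factor to its low level. The generator row used here is the one commonly tabulated for 12 runs; it is an assumption of this sketch rather than something stated in the text, so the code checks column balance and orthogonality instead of taking them on trust. The function names are hypothetical.

```python
# Generator row commonly tabulated for the 12-run Plackett-Burman design
GENERATOR = [1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1]


def plackett_burman_12():
    """Return 12 runs x 11 columns of -1/+1 levels.

    Runs 1-11 are cyclic shifts of GENERATOR; run 12 sets every factor to -1.
    """
    k = len(GENERATOR)
    rows = [[GENERATOR[(j - i) % k] for j in range(k)] for i in range(k)]
    rows.append([-1] * k)
    return rows


def is_orthogonal(design):
    """True if every column is balanced and every pair of columns is orthogonal."""
    cols = list(zip(*design))
    balanced = all(sum(c) == 0 for c in cols)
    pairwise = all(sum(x * y for x, y in zip(cols[a], cols[b])) == 0
                   for a in range(len(cols)) for b in range(a + 1, len(cols)))
    return balanced and pairwise


design = plackett_burman_12()
print(len(design), len(design[0]))  # 12 runs, up to 11 two-level factors
print(is_orthogonal(design))
```

With fewer than 11 real factors, the unused columns are simply left unassigned. In this 12-run design each two factor interaction is partially confounded with several main effects, which is one reason such screening designs depend on interactions being small.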

A recent biological example of such a design comes from Dufour & Berland (1999), who studied the effects of a variety of nutrients and other compounds on primary productivity in seawater collected near atolls and from ocean sites. Part of their experiment involved eight factors (nutrients N, P and Si; trace metals Fe, Mo and Mn; a combination of the vitamins B12, biotin and thiamine; and ethylene diamine tetra-acetic acid (EDTA)), each with two levels. This is a $2^8$ factorial experiment (256 treatment combinations). They had only 16 experimental units (test tubes on board ship), so they used a fractional factorial design that allowed tests of main effects, five of the six two factor interactions and two of the four three factor interactions.
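
The general idea of running eight two-level factors in 16 units can be sketched as a $2^{8-4}$ fractional factorial. This is an illustration under stated assumptions: the generators E = BCD, F = ACD, G = ABC and H = ABD are one standard choice giving a resolution IV design, not necessarily the layout Dufour & Berland (1999) used, and the letters A to H do not correspond to particular nutrients. The sketch builds a full $2^4$ factorial for four base factors, derives the other four from the generators, and then shows that main effect columns are balanced and orthogonal while some two factor interactions (here AB and CG) share an identical column and therefore cannot be separated.

```python
from itertools import product
from math import prod

# Hypothetical 2^(8-4) design: 8 two-level factors (A-H) in 16 runs.
# Base factors A-D form a full 2^4 factorial; E-H are defined by generators.
# These generators are an illustrative resolution IV choice (assumption).
BASE = "ABCD"
GENERATORS = {"E": "BCD", "F": "ACD", "G": "ABC", "H": "ABD"}
FACTORS = BASE + "".join(GENERATORS)


def fractional_factorial():
    """Return 16 runs, each a dict mapping factor name to level -1 or +1."""
    runs = []
    for levels in product((-1, 1), repeat=len(BASE)):
        run = dict(zip(BASE, levels))
        for factor, word in GENERATORS.items():
            run[factor] = prod(run[f] for f in word)
        runs.append(run)
    return runs


def column(runs, effect):
    """Contrast column for an effect: 'A' is a main effect, 'AB' an interaction."""
    return [prod(run[f] for f in effect) for run in runs]


design = fractional_factorial()
print(len(design))  # 16 runs instead of 2**8 = 256
# main effect columns are balanced and mutually orthogonal ...
print(all(sum(column(design, f)) == 0 for f in FACTORS))
print(all(sum(x * y for x, y in zip(column(design, f), column(design, g))) == 0
          for f in FACTORS for g in FACTORS if f < g))
# ... but some two factor interactions share a column, i.e. are confounded
print(column(design, "AB") == column(design, "CG"))
```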

It is difficult to recommend these designs for routine use in biological research. We know that interactions between factors are of considerable biological importance, and in most situations it is difficult to decide a priori which interactions are less likely than others. Possibly such designs have a role in tightly controlled laboratory experiments where previous experience suggests that higher order interactions are not important. However, the main application of these designs will continue to be in industrial settings where additivity between factor combinations is a realistic expectation. Good references include Cochran & Cox (1957), Kirk (1995) and Neter et al. (1996).

Mixed factorial and nested designs
Designs that combine both nested and crossed factors are common in biology. One such design has one or more factors, usually random, nested within two or more crossed factors. For example, Twombly (1996) used a clever experiment to examine the effects of food concentration for different sibships (eggs from the same female at a given time) on the development of the freshwater copepod Mesocyclops edax. There were four food treatments, a fixed factor: constant high food during development, and a switch from high food to low food at naupliar stage three, at stage four, or at stage five. There were 15 sibships, which represented a random sample of possible sibships. For each combination of food treatment and sibship, four replicate Petri dishes were used, with two individual nauplii in each dish. Two response variables were recorded: age at metamorphosis and size at metamorphosis. The analyses are presented in Table 9.23 and had treatment and sibship as main effects. Because sibship was random, the food treatment effect was tested against the food treatment by sibship interaction. Dishes were nested within the combinations of treatment and sibship, and this factor was the denominator for tests of sibship and of the food treatment by sibship interaction. For age at metamorphosis, individual nauplii provided the residual term and the linear model was:

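$$\text{(age at metamorphosis)}_{ijkl} = \mu + \text{(food treatment)}_i + \text{(sibship)}_j + \text{(food treatment} \times \text{sibship)}_{ij} + \text{(dish within food treatment and sibship)}_{k(ij)} + \varepsilon_{ijkl}$$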

For size at metamorphosis, replicate measurements were taken on each individual nauplius, so the effect of individuals nested within dishes nested within each treatment and sibship combination could also be tested against the residual term, the variation between replicate measurements. This linear model was:


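$$\text{(size at metamorphosis)}_{ijklm} = \mu + \text{(food treatment)}_i + \text{(sibship)}_j + \text{(food treatment} \times \text{sibship)}_{ij} + \text{(dish within food treatment and sibship)}_{k(ij)} + \text{(individual within dish within food treatment and sibship)}_{l(k(ij))} + \varepsilon_{ijklm}$$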

Note that both models could be simplified to a two factor ANOVA model by simply using the mean for each dish as a replicate within each treatment and sibship combination. We would end up with the same SS and F tests as in the factorial part of the complete analyses. Note also that individuals within each dish (and replicate measurements on each individual) simply contribute to the dish means but make no real contribution to the df for tests of main effects or their interaction. Power for the tests of sibship and the treatment by sibship interaction could only be improved by increasing the number of dishes, and power for the test of treatment only by increasing the number of sibships.
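
The claim that dish means reproduce the factorial part of the analysis can be checked numerically. The sketch below simulates hypothetical balanced data with the dimensions of the Twombly (1996) age analysis (4 treatments, 15 sibships, 4 dishes per combination, 2 nauplii per dish; the responses are random numbers, not the published data). It then computes the mixed model F ratios twice: once from individual values, with dishes as the denominator for sibship and the interaction, and once as an ordinary two factor ANOVA on dish means. In this sketch the F ratios agree to rounding error, and the sums of squares differ only by the constant factor N (nauplii per dish), which cancels in every ratio. All function names are illustrative.

```python
import math
import random
from statistics import mean

random.seed(1)
# Hypothetical balanced layout: treatments, sibships, dishes per cell, nauplii per dish
A, B, C, N = 4, 15, 4, 2
# data[i][j][k] = the N simulated responses in dish k of treatment i and sibship j
data = [[[[random.gauss(10.0, 1.0) for _ in range(N)] for _ in range(C)]
         for _ in range(B)] for _ in range(A)]


def f_ratios(ss_a, ss_b, ss_ab, ss_err, a, b, df_err):
    """Mixed model F ratios: fixed A over AxB; random B and AxB over the error term."""
    ms_ab = ss_ab / ((a - 1) * (b - 1))
    ms_err = ss_err / df_err
    return {"SS_A": ss_a,
            "F_A": (ss_a / (a - 1)) / ms_ab,
            "F_B": (ss_b / (b - 1)) / ms_err,
            "F_AB": ms_ab / ms_err}


def full_analysis(data):
    """Factorial part of the full nested analysis, computed from individual values."""
    a, b, c, n = len(data), len(data[0]), len(data[0][0]), len(data[0][0][0])
    cells = [[[v for dish in data[i][j] for v in dish] for j in range(b)] for i in range(a)]
    grand = mean(v for row in cells for cell in row for v in cell)
    a_m = [mean(v for cell in cells[i] for v in cell) for i in range(a)]
    b_m = [mean(v for i in range(a) for v in cells[i][j]) for j in range(b)]
    cell_m = [[mean(cell) for cell in row] for row in cells]
    ss_a = b * c * n * sum((m - grand) ** 2 for m in a_m)
    ss_b = a * c * n * sum((m - grand) ** 2 for m in b_m)
    ss_ab = c * n * sum((cell_m[i][j] - a_m[i] - b_m[j] + grand) ** 2
                        for i in range(a) for j in range(b))
    # dishes within cells: dish means around cell means, each dish holding n individuals
    ss_dish = n * sum((mean(dish) - cell_m[i][j]) ** 2
                      for i in range(a) for j in range(b) for dish in data[i][j])
    return f_ratios(ss_a, ss_b, ss_ab, ss_dish, a, b, a * b * (c - 1))


def dish_mean_analysis(data):
    """Ordinary two factor ANOVA using each dish mean as one replicate."""
    means = [[[mean(dish) for dish in cell] for cell in row] for row in data]
    a, b, c = len(means), len(means[0]), len(means[0][0])
    grand = mean(v for row in means for cell in row for v in cell)
    a_m = [mean(v for cell in means[i] for v in cell) for i in range(a)]
    b_m = [mean(v for i in range(a) for v in means[i][j]) for j in range(b)]
    cell_m = [[mean(cell) for cell in row] for row in means]
    ss_a = b * c * sum((m - grand) ** 2 for m in a_m)
    ss_b = a * c * sum((m - grand) ** 2 for m in b_m)
    ss_ab = c * sum((cell_m[i][j] - a_m[i] - b_m[j] + grand) ** 2
                    for i in range(a) for j in range(b))
    ss_res = sum((v - cell_m[i][j]) ** 2
                 for i in range(a) for j in range(b) for v in means[i][j])
    return f_ratios(ss_a, ss_b, ss_ab, ss_res, a, b, a * b * (c - 1))


full, reduced = full_analysis(data), dish_mean_analysis(data)
for key in ("F_A", "F_B", "F_AB"):
    print(key, round(full[key], 6), round(reduced[key], 6))  # identical F ratios
print(math.isclose(full["SS_A"], N * reduced["SS_A"]))  # SS differ only by the factor N
```

The equivalence relies on balance (equal numbers of dishes per cell and individuals per dish). With unequal numbers, as the smaller df in Table 9.23 suggest for the real data, the two analyses may differ somewhat.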

Some designs require models with more complex mixtures of nested and crossed factors. For example, factor B might be nested within factor A but crossed with factor C. These partly nested linear models will be examined in Chapter 11.

9.2.13 Power and design in factorial

For factorial designs, power calculations are simplest for designs in which all factors are fixed. Power for tests of main effects can be calculated using the principles described in the previous chapter, effectively treating each main effect as a one factor design. Power calculations for interaction terms are more difficult, mainly because it is harder to specify an appropriate form of the effect size. Just as different patterns of means lead to different non-centrality parameters in one factor designs, combining two or more factors generates a large number of treatment combinations and a great diversity of non-centrality parameters. Calculating the non-centrality parameter (and hence power) is not difficult, but specifying exactly which pattern of means would be expected under some alternative hypothesis is far more difficult. Despite the difficulty specifying effects, the fixed effect factorial models have the advantage that


Table 9.23 ANOVA table for the experiment from Twombly (1996) examining the effects of treatment (fixed factor) and sibship (random factor) on age at metamorphosis and size at metamorphosis of copepods. For age, dishes were randomly chosen for each combination of treatment and sibship; for size, individual copepods were also randomly chosen from each randomly chosen dish.

Age at metamorphosis

Source                                   Denominator                              df (numerator, denominator)
Treatment                                Treatment × Sibship                      3, 42
Sibship                                  Dish (Treatment × Sibship)               14, 153
Treatment × Sibship                      Dish (Treatment × Sibship)               42, 153
Dish (Treatment × Sibship)               Residual                                 153, 166

Size at metamorphosis

Source                                   Denominator                              df (numerator, denominator)
Treatment                                Treatment × Sibship                      3, 42
Sibship                                  Dish (Treatment × Sibship)               14, 101
Treatment × Sibship                      Dish (Treatment × Sibship)               42, 101
Dish (Treatment × Sibship)               Individual (Dish (Treatment × Sibship))  101, 141
Individual (Dish (Treatment × Sibship))  Residual                                 141, 698
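
As a worked sketch of the non-centrality parameters mentioned in section 9.2.13, assume a balanced design with two fixed factors, A with p levels and B with q levels, n replicates per cell, effects $\alpha_i$, $\beta_j$ and $(\alpha\beta)_{ij}$ constrained to sum to zero, and residual variance $\sigma^2_\varepsilon$. Under these assumptions the usual non-central F formulation gives:

$$\lambda_A = \frac{nq\sum_{i=1}^{p}\alpha_i^2}{\sigma^2_\varepsilon}, \qquad \lambda_{AB} = \frac{n\sum_{i=1}^{p}\sum_{j=1}^{q}(\alpha\beta)_{ij}^2}{\sigma^2_\varepsilon}$$

$$\text{power} = \Pr\left(F'_{\nu_1,\,\nu_2,\,\lambda} > F_{\text{crit}}\right), \qquad \nu_1 = p-1 \text{ (A) or } (p-1)(q-1) \text{ (A} \times \text{B)}, \qquad \nu_2 = pq(n-1)$$

where $F'$ is the non-central F distribution and $F_{\text{crit}}$ is the upper critical value of the central F distribution with the same df at the chosen significance level. The formulas make the point concrete: computing $\lambda$ is trivial once the $\alpha_i$ or $(\alpha\beta)_{ij}$ are specified, but for the interaction that means committing to pq effect values in advance. Some software parameterizes effect size differently (e.g. as Cohen's f or as $\phi$), so the convention should be checked before comparing numbers.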

Page 556

residuals (cont.)
	linear regression models 87, 95–6
	logistic regression models 368–370
	multiple regression models 125
	nested ANOVA models 213–14
	nonlinear models 152
	partly nested ANOVA models 313
	principal components analysis 453
	randomized complete block and repeated measures ANOVA models 271–2, 277–8
	single factor ANCOVA model 347
	single factor ANOVA model 184,
	two way contingency tables 387–8
response surfaces 153
restricted maximum likelihood estimation (REML) 190
ridge regression 129–30
RM designs see repeated measures (RM) designs
robust analysis of covariance (ANCOVA) 352–3
robust correlation 76
robust factorial ANOVA 250
robust MANOVA 434
robust pairwise multiple comparisons 201
robust parametric tests 45
robust partly nested ANOVA 320
robust principal components analysis
robust randomized complete block ANOVA 284–5
robust regression 104–6, 143
robust single factor analysis of variance (ANOVA) 195
	randomization tests 196
	rank-based (non-parametric) tests
	tests with heterogeneous variances
running means 107
Ryan’s test 200–1

sample coefficient of variation 17
sample range 16
sample size 14, 51–2, 157
sample space 52–3
sample standard deviation 17
sample variance 16, 20, 22

	and populations 14–15
	exploring 58–62
sampling designs 155–7
sampling distribution of the mean 18
scalable decision criteria 45

	and clustering for biological data
	constrained 469–70
	correspondence analysis 461–2
	multidimensional 473–88, 492
	principal components analysis
scanned images 509
scatterplot matrix (SPLOM) 61–2
scatterplots 61, 502–3
	linear regression 96–7
	multiple linear regression 125–6
Scheffé’s test 201
Schwarz Bayesian information criterion (BIC) 139
scientific method 1–5
scree diagram 452
screening multivariate data sets
	missing observations 419–20
	multivariate outliers 419
sequential Bonferroni 50
significance levels 33
	arbitrary 53
simple main effects test 252–3
simple random sampling 14, 155
single factor designs 173–6, 184–6
	assumptions 191–4
		independence 193–4
		normality 192
		variance homogeneity 193
	comparing models 186–7
	diagnostics 194–5
	linear models 178–84
	null hypothesis 186–7
	power analysis 204–6
	presentation of results 496
	unequal sample sizes 187–8
single factor MANOVA
	linear combination 426, 430
	null hypothesis 430–2
single variable goodness-of-fit tests
size of sample 14, 51–2, 157
skewed distributions 10–11, 62–3
	transformations 65–6
small sample sizes, two way contingency tables 388
smoothing functions 107–9, 152–3
Spearman’s rank correlation coefficient (r_s) 76
specific comparisons of means 196–7
	planned comparisons or contrasts
	specific contrasts versus unplanned pairwise comparisons 201
	unplanned pairwise comparisons
	partly nested designs 318–20
	randomized complete block and repeated measures designs
splines 108
split-plot designs 301–5, 309
spread 16–17, 60
square root transformation 65
standard deviation 16–17
standard error of the mean 16, 18–19
standard errors for other statistics
standard normal distribution 10
standard scores 18
standardizations 67–8
	multivariate data 415–17
standardized partial regression slopes 123–4
standardized regression slopes 86,
standardized residuals 95
statistical analysis, role in scientific method 5
statistical hypothesis testing 32–54
	alternatives to 53–4
	associated probability and Type I error 34–5
	classical 32–4
	critique 51
		arbitrary significance levels 53
		dependence on sample size and stopping rules 51–2
		null hypothesis always false 53
		P values as measure of evidence
		sample space – relevance of data not observed 52–3
	Fisher’s approach 33–4
	hybrid approach 34
	hypothesis tests for a single population 35–6
	hypothesis tests for two populations 37–9
	Neyman and Pearson’s approach
	one- and two-tailed tests 37
	parametric tests and their assumptions 39–42
statistical population 14
statistical significance versus biological significance 44
statistical software
	and MANOVA 433
	and partly nested designs 335–7
	and randomized complete block designs 298–9
statistics 14
	probability distributions 12–13
step-down analysis, MANOVA 432
stepwise variable selection 140
stopping rules 52
stratified sampling 156
structural equation modeling (SEM) 146–7, 150
Student–Newman–Keuls (SNK) test 200
studentized residuals 95, 194
Student’s t distribution 12
systematic component, generalized linear models 359
systematic sampling 156–7

t distribution 12–13, 19
t statistic 33, 35
t tests 35–6
	assumptions 39–42
tables, layout of 497–8
test statistics 32–3
theoretic models 2–3
three way contingency tables 388–9
	complete independence 393
	conditional independence and odds ratios 389–93
	log-linear models 395–400
	marginal independence and odds ratios 393
time as blocking factor 287
tolerance values, multiple regression
transformations 96, 218, 415
	and additivity 67, 280
	and distributional assumptions
	and linearity 67
	angular transformations 66
	arcsin transformation 66
	Box–Cox family of transformations
	factorial ANOVA models 249–50
	fourth root transformations 65
	linear regression models 98
	logarithmic transformation 65
	multiple linear regression models
	power transformations 65–6
	rank transformation 66–7
	reciprocal transformations 65
	square root transformation 65
transforming data 64–7
translation 67
treatment–contrast interactions 254
trends, tests for in single factor ANOVA 202–3
trimmed mean 15
truncated data 69–70
Tukey’s HSD test 199–200
Tukey’s test for (non)-additivity
two way contingency tables 381–2
	log-linear models 394–5
	null hypothesis 385–6, 394–5
	odds and odds ratios 386–7
	residuals 387–8
	small sample sizes 388
	table structure 382–5
two-tailed tests 37
Type I errors 34–5, 41–4
	and multiple testing 48–9
	graphical representation 43
Type II errors 34, 42–4, 164
	graphical representation 43

unbalanced data 69
unbalanced designs
	factorial designs 241–7
	nested designs 216–7
	partly nested designs 322–3
	randomized complete block designs 287–9
	single factor (one way) designs
uncertainty 7
unequal sample sizes 69, 187–8
	factorial designs 242–4
	nested designs 216
	partly nested designs 322–3
	single factor (one way) designs
univariate ANOVAs, MANOVA 432
univariate F tests, adjusted 282–3,
unplanned comparisons of adjusted means, ANCOVA 353
unplanned pairwise comparisons of means 199–201
	versus specific contrasts 201
unreplicated two factor experimental designs 263–8
	interactions in 277–80

variability 16–17
variable selection procedure, multiple regression 139–40
variables 7
	probability distributions 10–12
variance components 188–90, 216–18, 247, 249
variance inflation factor 128
variance–covariance matrix 402–3
variances 10, 16
	confidence intervals 22–3, 189
	homogeneity assumption
		factorial ANOVA models 249
		linear models 63–4
		linear regression models 93
		nested ANOVA models 218
		partly nested ANOVA models 318
		randomized complete block ANOVA models 280–2
		single factor ANOVA models 193
verbal models 2

Wald statistic 363–4, 367
Weibull distribution 11
weighted least squares 99–100, 142
Wilcox modification of Johnson–Neyman procedure
Wilcoxon signed-rank test 47
Wilks’ lambda 430–1
window width, smoothing functions
Winsorized mean 15

X random regression 100–4, 142–3

z distribution 10–13, 19
z scores 68
zero values 63, 69

