Q: How to ensure you get the correct sample size?
A: Leverage power analysis to determine sample size.
One of the pivotal aspects of planning an experiment is calculating the sample size. It is rarely practical or feasible to study the whole population. Instead, a set of users is selected from the population; it is smaller in size but should adequately represent the population from which it is drawn, so that true inferences about the population can be made from the results. This set of individuals is known as the “sample" in DOE (design of experiments).
It is a basic statistical principle to define the sample size before starting a study, so as to avoid bias in interpreting the results. If we include too few users, the results cannot be generalized to the population, because the sample will not adequately represent the target population; the study may also fail to detect a real difference between test groups, making it inconclusive and arguably unethical toward its participants. On the other hand, if we study more subjects than required, we expose more users than necessary and waste precious resources, including the researchers’ time.
Generally, the sample size for any study depends on the following four quantities (a numerical sketch follows this list):
- Acceptable level of significance
- Power of the study
- Expected effect size
- Expected variance
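To see how these four inputs interact, here is a minimal sketch using base R's power.t.test (from the built-in stats package). The numeric values are illustrative: the difference of 20 and standard deviation of 60 mirror the R example later in this post, and α = 0.05 with 80% power are the conventional choices.
# Minimal sketch: the four inputs that drive the sample size calculation.
# All numeric values are illustrative.
delta <- 20     # expected effect size: smallest difference worth detecting
sigma <- 60     # expected variability, expressed as a standard deviation
alpha <- 0.05   # acceptable level of significance (Type I error rate)
power <- 0.80   # desired power, i.e. 1 - beta
# Leaving n unspecified tells power.t.test to solve for it.
power.t.test(delta = delta, sd = sigma, sig.level = alpha, power = power,
             type = "two.sample", alternative = "two.sided")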
Level of Significance
This is the pre-defined threshold for rejecting the null hypothesis. Usually p < 0.05 is taken as statistically significant, meaning we accept up to a 5% chance that the observed result is due to chance alone. Put differently, we are willing to declare a difference 5 times out of 100 when no difference actually exists (i.e., a false positive result). This threshold is denoted by the letter α (alpha) and is the Type I error rate.
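As a rough illustration of what α = 0.05 means in practice, the sketch below (an illustrative simulation, not part of the original example code) repeatedly runs a t-test on two groups drawn from the same distribution, so the null hypothesis is true; about 5% of the p-values still fall below 0.05.
# Sketch: when no real difference exists, roughly alpha of all tests
# still come out "significant" -- these are the false positives.
set.seed(42)
p_values <- replicate(10000, {
  a <- rnorm(50)   # control group
  b <- rnorm(50)   # treatment group drawn from the same distribution
  t.test(a, b)$p.value
})
mean(p_values < 0.05)   # close to 0.05, the Type I error rate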
Power
Conversely, a Type II error means failing to detect a difference when a difference actually exists (i.e., a false negative result). The false negative rate is the proportion of real differences that are erroneously reported as non-significant, and is denoted in statistics by the letter β (beta). The power of the study is then equal to (1 − β) and is the probability of detecting a difference when one actually exists. The power of a study increases as the chance of committing a Type II error decreases. Most studies accept a power of 80%, which means we accept that one time in five (that is, 20%) we will miss a real difference.
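To make the 80% figure concrete, here is a hedged sketch in base R assuming the same illustrative scenario as the R example later in this post (a true difference of 20 with a standard deviation of 60): it first solves for the required n, then estimates the power by simulation.
# Sketch: power = 1 - beta is the probability of detecting a difference
# that really exists. Scenario values are illustrative only.
n <- ceiling(power.t.test(delta = 20, sd = 60, sig.level = 0.05,
                          power = 0.80, type = "two.sample")$n)
set.seed(42)
detected <- replicate(5000, {
  a <- rnorm(n, mean = 0,  sd = 60)
  b <- rnorm(n, mean = 20, sd = 60)   # a real difference of 20 exists
  t.test(a, b)$p.value < 0.05
})
mean(detected)   # empirical power; should come out close to the 0.80 target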
Expected effect size
The effect size is the minimum deviation from the null hypothesis that the hypothesis test is set up to detect. Suppose you are comparing responses between a treatment group and a control group, and the metric of interest is the group mean, denoted μ1 and μ2; the expected effect size is then |μ1 − μ2|. Generally speaking, the smaller the difference you want to detect, the larger the required sample size.
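The sketch below makes this concrete with assumed numbers: holding the standard deviation, α, and power fixed, the required per-group sample size grows rapidly as the difference |μ1 − μ2| to be detected shrinks.
# Sketch: required n per group versus the difference to detect,
# with sd = 60, alpha = 0.05 and power = 0.80 held fixed (illustrative values).
deltas <- c(30, 20, 10, 5)
n_per_group <- sapply(deltas, function(d)
  ceiling(power.t.test(delta = d, sd = 60, sig.level = 0.05,
                       power = 0.80, type = "two.sample")$n))
data.frame(difference = deltas, n_per_group = n_per_group)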
Expected variance
The variance describes the variability (noise) among measurement units. As the variance gets larger, it becomes harder to detect a significant difference, so a bigger sample size is required. A pilot study or a similar previous study can help determine the expected variance.
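The same kind of sketch, again with assumed numbers, shows the effect of noise: with the difference to detect, α, and power held fixed, the required per-group sample size grows as the standard deviation grows.
# Sketch: required n per group versus the noise level (standard deviation),
# with delta = 20, alpha = 0.05 and power = 0.80 held fixed (illustrative values).
sigmas <- c(30, 60, 90, 120)
n_per_group <- sapply(sigmas, function(s)
  ceiling(power.t.test(delta = 20, sd = s, sig.level = 0.05,
                       power = 0.80, type = "two.sample")$n))
data.frame(sd = sigmas, n_per_group = n_per_group)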
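The SAS and R examples below put these ideas into practice: PROC POWER handles one- and two-sample proportion tests, and the pwr package handles a two-sample t-test.
* Power with One Sample;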
proc power;
oneSampleFreq
/* hypotheses of interest */
nullProportion=0.1
proportion=0.2
sides=1
/* decision rule */
alpha=0.3
/* sample size */
nTotal=10 20 25 30 40 50 100
power=.
;
run;
*normal approximation;
proc power;
oneSampleFreq
/* hypotheses of interest */
nullProportion=0.1
proportion=0.2
sides=1
/* decision rule */
alpha=0.3
/* solve for sample size */
nTotal=.
power=.80
/* different distributional assumptions */
test=z
method=normal
;
run;
* Power with Two Samples;
** SAS
proc power;
twoSampleFreq
/* hypotheses of interest */
refProportion=0.01
proportionDiff=0.001/*first factor difference*/ .0025/*second factor difference*/
sides=1 2
/* decision rule */
alpha=0.05
/* sample size */
nTotal=.
power=.6 .99
;
plot y=power
yopts=(crossref=YES ref=.8)
vary(color by proportionDiff,
symbol by sides);
run;
proc power;
twoSampleFreq
/* hypotheses of interest */
refProportion=0.1
proportiondiff=0.1
sides=1
/* decision rule */
alpha=0.05
/* sample size */
nTotal=.
power=.6 .8 .99
/* how balanced the test is */
groupWeights=(1 1) (10 15) (1 2) (1 3) (1 10)
;
plot;
run;
** R
install.packages("pwr")   # run once if the pwr package is not yet installed
library(pwr)
delta <- 20        # difference in means we want to be able to detect
sigma <- 60        # expected standard deviation
d <- delta/sigma   # Cohen's d: standardized effect size
pwr.t.test(d = d, sig.level = .05, power = .80, type = 'two.sample')
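In this call, pwr.t.test solves for the per-group sample size because n is the one argument left unspecified: for a standardized effect size d = 20/60 ≈ 0.33 at α = 0.05 and 80% power (two-sided by default), it reports roughly 142 users per group, i.e. about 285 in total. Since the required n scales with σ²/δ², doubling the standard deviation or halving the detectable difference would roughly quadruple this number.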