Determining Sample Size
How many patients do we need in our study?
Term 2, 2006, Advanced Methods in Biostatistics, II

GOALS
- Review the inputs for determining sample size
- Compare sample sizes for parallel, crossover and factorial designs
- Increase understanding of the impact of assumptions and practical constraints
- Show the effect on sample size of uncertainty in the inputs and outline approaches to deal with uncertainty

Factors that affect sample size
- Study design
  - parallel, crossover, factorial, nesting structures
- Nature of the endpoint (measured, event, event time)
  - for events, “n” might be “# of events”
- Analysis plan
- Monitoring plan
- Inherent variability of response and co-variability of responses
- Adherence and dropouts
- Goals: test size & power; CI length; pr(correct decision)
- Balancing biologically reasonable assumptions, time, effort, expense and other practical constraints
Generally, you need to consider a range of sample size values linked to assumptions

Focused factors for Hypothesis Testing sample size (in the context of the general list)
Goal: have sufficient power to choose between two simple hypotheses
- Variability of the “building block” response (σ²)
- Type I error (α), the significance level
- Type II error (β); power = 1-β
- Size of the minimal difference considered important (Δ)

A review of hypothesis test type I and II errors and power

Conclusion from the analysis | “Truth”: Ho true   | “Truth”: Ha true
Reject Ho                    | Type I error (α)   | Correct conclusion (Power 1-β)
Fail to reject Ho            | Correct conclusion | Type II error (β)

Hypothesis testing sample size (single group, or paired differences)
$$n = \frac{\sigma^2 (Z_{\alpha/2} + Z_\beta)^2}{\Delta^2} = \left( \frac{\sigma (Z_{\alpha/2} + Z_\beta)}{\Delta} \right)^2$$
- Required sample size depends inversely on the square of the effect size
- Effect size = Δ (sometimes Δ/σ is referred to as the effect size)
- Decreasing it by a factor of 2 increases n by a factor of 4
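
The single-group formula can be checked numerically. The sketch below is illustrative only: the function name `n_single_group` and the inputs (σ = 1, Δ = 0.5 and 0.25) are assumptions, not values from the slides. It shows that halving Δ roughly quadruples n (rounding up makes the ratio slightly less than 4).

```python
from math import ceil
from statistics import NormalDist


def n_single_group(sigma, delta, alpha=0.05, power=0.80):
    """n = sigma^2 (z_{alpha/2} + z_beta)^2 / delta^2, rounded up (hypothetical helper)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = z.inv_cdf(power)           # quantile matching power = 1 - beta
    return ceil((sigma * (z_alpha + z_beta) / delta) ** 2)


# Illustrative inputs (assumed): sigma = 1, delta = 0.5, then delta halved
n_full = n_single_group(sigma=1.0, delta=0.5)
n_half = n_single_group(sigma=1.0, delta=0.25)
print(n_full, n_half, n_half / n_full)
```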

Sample size formula for a two group comparison
What drives sample size?
- Variability
- Type I error
- Type II error ⇔ Power
- Δ

Example: Prophylaxis for Toxoplasmosis*
- Primary endpoint: toxoplasmic encephalitis (TE)
  - Control (placebo) group event rate: 30% in 2.5 yrs
  - Treatment (pyrimethamine) group event rate: 15% in 2.5 yrs
  - ⇒ treatment effect: Δ = 15 percentage point rate reduction
- Death rate for causes unrelated to TE: 33% in 2.5 years
- Dropouts
- Adherence
- Type I rate and power: α = 0.05 (2-sided), 1-β = 0.80
These assumptions/guesses were based on very little information
* Jacobson M, Besch C, Child C, et al. Eur J Clin Microbiol Infect Dis., 10: 195-8, 1991

Influence of Effect Size Δ
TOXO Sample Size With α = 0.05 (2-sided) and power 1-β = 0.80

Event Rate (%)
Placebo | Trt | Δ  | Percent Reduction | Sample Size
30      | 10  | 20 | 67                | 130
30      | 15  | 15 | 50                | 265
30      | 17  | 12 | 42                | 400
30      | 20  | 10 | 33                | 650

A few sample size formulas (there are thousands of these!)
$$\Delta = \mu_2 - \mu_1$$
$$\mathrm{Var}(\text{estimate} \mid n_1, n_2, \ldots, \text{design}, \text{analysis}, \ldots) = \frac{L^2}{4 Z_{\alpha/2}^2}$$


Focused factors for Confidence Interval length sample size
- Most items on the “factors” slide
- Variance of a single observation (σ²)
- The maximal CI length (L)
  - some prefer to use the “margin of error,” the half-length: d = L/2
- The coverage probability (1-α) for a two-sided interval
  - you can do a one-sided interval

CI length sample size
$$n = 4 Z_{\alpha/2}^2 \left( \frac{\sigma}{L} \right)^2$$
- Required sample size is inversely related to the square of the maximal length
- Decreasing “L” by a factor of 3 increases n by a factor of 9
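
A minimal sketch of the CI length formula follows. The helper name `n_for_ci_length` and the inputs (σ = 6, L = 2) are assumptions chosen for illustration. Shrinking L by a factor of 3 should multiply n by about 9.

```python
from math import ceil
from statistics import NormalDist


def n_for_ci_length(sigma, length, conf=0.95):
    """n = 4 z_{alpha/2}^2 sigma^2 / L^2, rounded up (hypothetical helper)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil(4 * z**2 * sigma**2 / length**2)


# Illustrative inputs (assumed): sigma = 6, maximal CI length L = 2
n_base = n_for_ci_length(sigma=6.0, length=2.0)
n_third = n_for_ci_length(sigma=6.0, length=2.0 / 3.0)
print(n_base, n_third, n_third / n_base)
```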

USA TODAY/CNN/Gallup Poll results*
Poll results are based on telephone interviews with “National adults” conducted April 1-2, 2005
Q26. As a result of the recent rise in gas prices, would you say you have or have not done each of the following?
2005 APR 1-2 (sorted by Yes, have)

Item | Yes, have | No, have not
Seriously considered getting a more fuel-efficient car the next time you buy a vehicle | 57 | 42

For results based on the total sample of “National Adults,” one can say with 95% confidence that the margin of sampling error is 3 percentage points.
- What does this mean?
- If you are conducting a survey requiring the same precision, how many people do you need to interview?
* 5-04-03-poll.htm

Sample size for estimating a proportion p
- Estimate p with 3% margin of error (d = 0.03, L = 0.06)
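
One common way to answer the poll question is the Normal-approximation formula n = z² p(1-p) / d². The sketch below assumes that formula, since the slide's own equation did not survive extraction. The helper name `n_for_proportion` is hypothetical. Using p = 0.5 gives the most conservative (largest) n, and p = 0.57 plugs in the poll's observed "Yes" share.

```python
from math import ceil
from statistics import NormalDist


def n_for_proportion(p, margin, conf=0.95):
    """n = z^2 p (1 - p) / d^2, rounded up (Normal approximation; hypothetical helper)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil(z**2 * p * (1 - p) / margin**2)


n_worst_case = n_for_proportion(p=0.5, margin=0.03)   # conservative choice of p
n_poll = n_for_proportion(p=0.57, margin=0.03)        # observed "Yes, have" share
print(n_worst_case, n_poll)
```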

General CI length sample size when the Normal distribution is a good approximation
- Estimate the effect of interest using “estimate”
  - Difference in means
  - Regression slope
- Use this estimate for a CI by computing: CI = estimate ± 1.96 se(estimate)
- Find a sample size (and other features) so that:
$$\mathrm{Var}(\text{estimate} \mid n_1, n_2, \ldots, \text{design}, \text{analysis}, \ldots) = \frac{L^2}{4 Z_{\alpha/2}^2}$$
“Var” depends on all aspects of the design and analysis

Sample size for a rare outcome or for survival analysis
- In these situations, the number of events needs to be sufficiently large, not simply “n”
- For example, if we need a total of 50 events, with event probability “p” we need n = 50/p to, on average, generate the required number of events:

Event probability | Required “Sample Size”
1/10              | 500
1/100             | 5,000
1/1000            | 50,000

These sample sizes give the required expected number of events, but do not guarantee that the required number will occur (more on this later)
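
The caveat about expected versus guaranteed event counts can be made concrete. The sketch below assumes independent patients with a common event probability, so the event count is binomial. The helper names are hypothetical. It computes n = 50/p and the probability of actually observing at least 50 events, which comes out only a little above one half.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """Binomial pmf computed on the log scale to avoid overflow for large n."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))
    return exp(log_pmf)


def prob_enough_events(n, p, required):
    """P(number of events >= required) under a binomial model (an assumption)."""
    return 1.0 - sum(binom_pmf(k, n, p) for k in range(required))


required_events = 50
results = {}
for odds in (10, 100, 1000):          # event probability is 1/odds
    n = required_events * odds        # n = 50 / p, kept in integer arithmetic
    results[odds] = (n, prob_enough_events(n, 1 / odds, required_events))
    print(f"p = 1/{odds}: n = {n}, P(at least 50 events) = {results[odds][1]:.3f}")
```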

Sample size for a rare outcome or for survival analysis (Optional, technical details)

Sample size for a rare outcome or for survival analysis (Optional technical details, continued)

Sample Size in a Factorial Design

Example of the efficiency of a factorial design
Aspirin, sulfinpyrazone, or both in unstable angina. Results of a Canadian multicenter trial*
- A randomized trial of 555 patients, hospitalized in coronary care units with unstable angina
- Primary outcome was cardiac death or nonfatal myocardial infarction
- Patients received one of the four treatment combinations: aspirin, sulfinpyrazone, both or neither
  - Aspirin was included only when the study statistician, at the last minute, promoted the factorial design
* Cairns, J. et al., N Engl J Med. 1985;313:1369-75

Sample Size for a Factorial Design
Results from the Canadian Aspirin Study
Number of cardiac deaths and nonfatal MIs in two years (number of patients)

               | Placebo  | ASA      | Total
Placebo        | 18 (139) | 8 (139)  | 26 (278)
Sulfinpyrazone | 18 (140) | 9 (137)  | 27 (277)
Total          | 36 (279) | 17 (276) | 53 (555)

- Suppose we are designing a parallel study to detect a 50% reduction in the primary outcome with α = 0.05 and (1-β) = 0.8
- Assume p1 = 15% (observed rate 18/139 = 13%)
- Δ = p1 - p2 = 15% - 7.5% = 7.5%
- A parallel design requires 277 patients for each group; a total of 554 patients to evaluate a single treatment
- The factorial design delivers “2 for the price of 1”
  - Assuming assumptions are satisfied!!!
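
The parallel-design figure can be roughly reproduced with a standard two-proportion formula. The sketch below assumes the unpooled-variance version, n per group = (z_{α/2} + z_β)² [p1(1-p1) + p2(1-p2)] / Δ². The slides do not say which variant was used, and this one gives about 275 per group, close to but not identical to the 277 quoted above. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Unpooled Normal-approximation sample size per group (an assumed variant)."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil(z_sum**2 * variance_sum / (p1 - p2) ** 2)


# Inputs from the slide: p1 = 15%, 50% reduction, so p2 = 7.5%
n_parallel = n_per_group_two_proportions(0.15, 0.075)
print(n_parallel, 2 * n_parallel)
```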

Sample Size in a Crossover Design
A crossover study comparing bronchodilators
[Table not recovered: peak expiratory flow (L/min) 8 hours after ...; surviving fragments: 661648, Variance]

Sample Size in Crossover Design (cont’d)
- The variance of the estimated treatment effect in a parallel design is 6 times larger than that in a crossover study
  - Parallel study needs n to be 6 times larger
  - Parallel study needs 12.6 times more total patients to detect the same effect with the same size and power
- But, the crossover design needs two measurements for each patient, whereas the parallel design needs only one

Factors that influence Variability
- Biological variability
  - Depends on the target population
  - Depends on the choice of outcomes
    - Outcome may have high variability even within a homogeneous target population
    - Number of primary events (duration of follow-up & competing events)
- Missing data (losses to follow-up)
- Measurement variability
  - All factors identified in the measurement module
  - Measurement error

Choice of α: One-sided vs two-sided test
- Often α is specified at 0.05
  - For a two-sided test, z_{α/2} = 1.96
  - For a one-sided test, z_α = 1.645
- If we conduct a one-sided test,
  - for the same sample size, power increases
  - the required sample size to attain the same power decreases
- If we are only interested in the positive treatment effect, why waste α on the “other side?”
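
The size of the one-sided advantage can be quantified with the Normal-theory formulas. In this sketch (variable names are illustrative), n is proportional to (z + z_β)², so the ratio of the two squared sums gives the relative sample size. The second quantity is the one-sided power of a study sized for 80% two-sided power.

```python
from statistics import NormalDist

z = NormalDist()
alpha, power = 0.05, 0.80

z_two_sided = z.inv_cdf(1 - alpha / 2)   # about 1.96
z_one_sided = z.inv_cdf(1 - alpha)       # about 1.645
z_beta = z.inv_cdf(power)

# n is proportional to (z_crit + z_beta)^2, so this is n_one_sided / n_two_sided
ratio = (z_one_sided + z_beta) ** 2 / (z_two_sided + z_beta) ** 2

# One-sided power of a study whose n was chosen for 80% power, two-sided
one_sided_power = z.cdf((z_two_sided + z_beta) - z_one_sided)

print(f"sample size ratio = {ratio:.3f}, one-sided power = {one_sided_power:.3f}")
```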

Why two-sided?
- Protects the study against the unexpected
- Even if you aren’t interested in the “other side” and design for a one-sided test (or CI), there is a power penalty in that someone wanting to conduct a two-sided analysis will have reduced power
  - They would have needed a larger study to maintain an overall α of 0.05 and the desired power, but are stuck with what you give them
Reality test: Can you honestly say that if results are strongly in the “other direction” you are going to ignore them and not report them?

Cardiac Arrhythmia Suppression Trial (CAST)
- CAST demonstrated that ventricular arrhythmia suppression is a failed surrogate for death from arrhythmia
BUT
- Many thought it was unethical to do the study and “subject” participants to the placebo
- There was strong pressure to design CAST as a one-tailed test
- The DSMB statistician argued for two-sided, but eventually settled for one-sided at α = 0.025
  - “Let’s require very strong evidence” of the beneficial effect
  - not initially designed to report that the treatment can cause harm
- Of course, the other 0.025 was held in reserve to have some available for the harm conclusion
- Results showed that patients treated with active drug had a higher rate of death from arrhythmias than those taking placebo

Considerations in Specifying the Treatment Effect (Δ)
- Smallest difference of clinical significance/importance
- Stage of research
- Realistic estimates based on:
  - Previous research
  - Expected event rate
  - Expected non-compliance rate
  - Expected switchover rate
- “Instrumentation” variability
- Implications on sample size

Factors which influence “Realized Δ”
- Due to the squared effect of Δ on sample size, it is important to control in the design and/or incorporate in the sample size assessment:
  - Measurement variance
  - Non-compliance
  - Switchover from the assigned treatment regime
  - Lag time for the treatment effect
  - ...
Identify factors you can control and control them
Design to deal with those you can’t control

Choice of (1-β): Power of detecting a difference
- Power of a test is the probability of detecting a true, underlying difference
  - In practice, one that is both worth detecting and biologically reasonable
- α is usually set to 0.05; power (1-β) is often set to 0.8
- Sample size needs to be selected to ensure the desired power for the scientifically relevant difference of interest

Influence of 1-β
TOXO Sample Size With α = 0.05 (2-sided)

Event Rate (%)
Placebo | Trt | Δ (%) | Percent Reduction | Power (1-β) | Sample Size per group
30      | 15  | 15    | 50                | 0.90        | 161
30      | 15  | 15    | 50                | 0.80        | 120
30      | 15  | 15    | 50                | 0.70        | 95
30      | 20  | 10    | 33.3              | 0.90        | 392
30      | 20  | 10    | 33.3              | 0.80        | 293
30      | 20  | 10    | 33.3              | 0.70        | 231

Power and Effect size Trade-off
- Power depends on the alternative hypothesis defined by the effect size Δ
- When sample size n is limited, for any fixed α we can trade off Power and Effect Size
  - This trade-off is represented by the Operating Characteristic curve
- We can always increase the effect size to reach a certain power, but it is very important to check if the assumed effect size is biologically plausible

Power and Effect size trade-off (cont’d)
[Figure: Ho (centered at 0%) and Ha shown for three alternatives: 67% reduction (Δ = 20%), 50% reduction (Δ = 15%), 33% reduction (Δ = 10%)]

Operating Characteristic curve
[Figure: power (1-β) plotted against effect size Δ (%), for Δ from 0% to 30%; α and β marked on the power axis]
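
A curve of this kind can be tabulated directly. The sketch below is an approximation under stated assumptions: a two-proportion comparison with placebo rate 30% and 120 patients per group (values taken from the TOXO example), unpooled variance, and only the upper rejection tail counted, which is why the value at Δ = 0 is about 0.025 rather than 0.05. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power, counting only the tail in the direction of the effect."""
    z = NormalDist()
    se = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group)
    return z.cdf(abs(p1 - p2) / se - z.inv_cdf(1 - alpha / 2))


p_placebo, n_per_group = 0.30, 120
power_by_delta = {}
for delta_pct in range(0, 30, 5):     # effect sizes 0%, 5%, ..., 25%
    p_treated = p_placebo - delta_pct / 100
    power_by_delta[delta_pct] = power_two_proportions(p_placebo, p_treated, n_per_group)
    print(f"delta = {delta_pct:2d}%  power = {power_by_delta[delta_pct]:.3f}")
```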

Sometimes (frequently) we cannot get the “required” sample size
- The total number of available patients may be limited by factors such as money, size of target population and accrual time
- If the sample size is limited, we can try to reduce σ (more precise measurements), change Δ, adjust power
- But, need to be realistic!
- In situations where conducting a high power test is impossible, it may still be worthwhile to conduct a high-quality study as an input to a meta-analysis, or as a pilot, phase II study
- But, ...

Sample Size and Statistical Significance
- A large sample size can produce statistical significance when Δ is very small (not of practical, clinical or public health importance)
- A small sample size may fail to detect a difference, even when Δ is of practical, clinical or public health importance
- The literature is populated by false positives and under-represents the potentially true positives
Freiman et al., “The importance of beta, the type II error and sample size in the design and interpretation of the randomized control trial. Survey of 71 ‘negative’ trials.” NEJM 1978;299:690-4

Things to consider in determining sample size (testing template)
[Diagram labels: Study design; Types (label truncated); Analysis plan; Population; Power; Accrual time; # Eligible patients; Type I error; Clinical judgment; Lag time; Biologic plausibility; Non-compliance; “crossovers”]

Computer programs and approaches
- Sample size for a desired type I error, power and effect size
- Power for a given sample size, type I error and effect size
- Minimal detectable effect size for a given sample size, type I error and power
- Similar for CIs

Summary
- Sample size (or the monitoring plan) should be specified before conducting the study
- Inputs for the sample size calculation should be based on results from other studies or reasonable judgment
- Interim analysis provides an opportunity to check these assumptions and adapt
- Study design, the statistical analysis plan, missing data, ... all play an important role in determining the sample size
Do a comprehensive assessment and be realistic

Bayesian Experimental Design
- We are all Bayesians in the design phase
- We use previous information and our opinions to determine goals and inputs
- Generally, there is considerable uncertainty regarding the inputs
- So, use distributions and produce effective designs
- Bayesian design for either Bayesian or frequentist (traditional) goals

Bringing in uncertainty in determining Sample Size (to control CI length)
1. When you know the formula and can do the math
2. When you know the formula, but can’t do the math
3. When you don’t know the formula or there isn’t a formula

When you know the formula and can do the math
$$n = 4 Z_{\alpha/2}^2 \left( \frac{\sigma}{L} \right)^2$$
You pick these

What if you don’t know σ²
- Using some background information on σ², do some “what ifs” or pick a conservative value
  - e.g., assume background data indicate that σ² has a log-normal distribution, with mean “avg” and variance C² (avg)²
  - C is the coefficient of variation
- To control expected CI length, use
$$n = 4 Z_{\alpha/2}^2 \, \frac{\text{avg}}{L^2}$$
- This is just like using a “best guess” for σ²

“On average” length control can leave you with a too-wide CI
- So, consider controlling the probability that the CI is too wide
- That is, find an “n” so that, using the log-normal distribution, pr(CI length > L | n) = γ (= 0.10, for example)
- With this sample size we have only a 10% chance of obtaining a too-wide interval
- Could do a sequential study, continuing until the CI length falls below the maximal length
  - This guarantees control, but pays by not knowing the sample size up front
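
Under the log-normal assumption the probability criterion has a closed form, which the sketch below uses. It makes several assumptions beyond the slides: the CI length is taken as 2 z σ/√n with σ² drawn from the log-normal prior, so pr(length > L | n) = γ reduces to replacing "avg" by the (1-γ) quantile of σ². The function names and the inputs (avg = 36, C = 0.5, L = 2) are illustrative.

```python
from math import ceil, exp, log, sqrt
from statistics import NormalDist


def lognormal_params(mean, cv):
    """Log-scale mean and sd of a log-normal with the given mean and coefficient of variation."""
    s2 = log(1 + cv**2)
    return log(mean) - s2 / 2, sqrt(s2)


def n_average_control(avg, length, conf=0.95):
    """The 'on average' rule: plug avg in for sigma^2."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil(4 * z**2 * avg / length**2)


def n_probability_control(avg, cv, length, gamma=0.10, conf=0.95):
    """Plug in the (1 - gamma) quantile of sigma^2 so that pr(length > L | n) is about gamma."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    m, s = lognormal_params(avg, cv)
    sigma2_quantile = exp(m + s * NormalDist().inv_cdf(1 - gamma))
    return ceil(4 * z**2 * sigma2_quantile / length**2)


# Illustrative inputs (assumed): avg = 36 (sigma about 6), C = 0.5, L = 2
n_avg = n_average_control(avg=36.0, length=2.0)
n_prob = n_probability_control(avg=36.0, cv=0.5, length=2.0, gamma=0.10)
print(n_avg, n_prob)
```

With these inputs the probability criterion asks for roughly 60% more patients than the "on average" rule.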

[Figure: axis labels “Number of simulation replications” and “L”]

[Figure: axis labels “Number of simulation replications” and “L”]

For hypothesis testing:
In the region of “usual” powers, the average power is less than the power of the average
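
A small numerical illustration of this statement follows. The setup is entirely hypothetical: a single-group test with Δ = 1, and σ equally likely to be 4, 6 or 8 (average 6). The sample size is chosen to give 80% power at the average σ, and the power is then averaged over the three possible values.

```python
from math import ceil, sqrt
from statistics import NormalDist

z = NormalDist()
alpha, target_power, delta = 0.05, 0.80, 1.0
z_alpha = z.inv_cdf(1 - alpha / 2)


def power(sigma, n):
    """Approximate power of the single-group test (upper rejection tail only)."""
    return z.cdf(delta * sqrt(n) / sigma - z_alpha)


# Hypothetical uncertainty about sigma: three equally likely values, average 6
sigmas = [4.0, 6.0, 8.0]
sigma_avg = sum(sigmas) / len(sigmas)

# Size the study for the average sigma
n = ceil((sigma_avg * (z_alpha + z.inv_cdf(target_power)) / delta) ** 2)

power_at_average = power(sigma_avg, n)
average_power = sum(power(s, n) for s in sigmas) / len(sigmas)
print(n, round(power_at_average, 3), round(average_power, 3))
```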

A baseline/follow-up design
Randomize (or “assign” to treatment)
Baseline (t = 0) → Follow-up (t = 1)

Sample size for a baseline/follow-up design
Y_ijt = measurement for person i in group j (j = 1 or 2) at time t (t = 0 or 1)
G_j = 0, if group 1; G_j = 1, if group 2
Model:
$$Y_{ijt} = \mu + \alpha t + \gamma G_j + \beta t G_j + e_{ijt}, \qquad \mathrm{cor}(e_{ij0}, e_{ij1}) = \rho$$
- α is the time trend; γ is the group main effect
- β is the treatment effect
  - The treatment by time interaction

The transparent analysis (a difference of differences)
$$(Y_{ij1} - Y_{ij0}) = \alpha + \beta G_j + (e_{ij1} - e_{ij0})$$
With a bar indicating averaging:
$$\hat{\beta} = (\bar{Y}_{21} - \bar{Y}_{20}) - (\bar{Y}_{11} - \bar{Y}_{10}) = \beta + \text{residuals}$$
- The parameter γ disappears
- With n per treatment group:
$$\mathrm{Var}(\hat{\beta}) = \frac{4\sigma^2 (1 - \rho)}{n}$$

The “optimal” analysis (a difference of adjusted differences)
Use (Y_ij1 - ρY_ij0) as the building block
With a bar indicating averaging:
$$\hat{\beta} = (\bar{Y}_{21} - \rho \bar{Y}_{20}) - (\bar{Y}_{11} - \rho \bar{Y}_{10}) = (1 - \rho)\gamma + \beta + \text{residuals}$$
- The parameter γ does not disappear
- So, this is a biased estimate unless γ = 0
  - i.e., need a randomized study so that groups are comparable at baseline
- With n per treatment group:
$$\mathrm{Var}(\hat{\beta}) = \frac{2\sigma^2 (1 - \rho^2)}{n} \le \frac{4\sigma^2 (1 - \rho)}{n}$$

Applications & Comments
- The optimal analysis is dangerous unless the study is randomized so that the group effect is 0 at baseline
- So, need the “transparent” (FU - BL) analysis for a non-randomized study or a crossover study
  - The non-randomized study is still dangerous due to regression to the mean
  - The crossover study must use the transparent analysis because, though randomized, it is to the sequences AB and BA, so not comparable at baseline
- If ρ = 0, the optimal analysis compares the follow-up measurements only and so is really a “two group” comparison

Example for a weight loss study
- Goal is to reduce BMI among overweight/obese individuals
- Baseline BMI: mean = 33.1; σ = 6.0
- Correlation between baseline and 6 month follow-up: ρ = 0.90 or 0.95
- Effect size of interest: Δ = 1.0 is important and possible
- Let’s try some sample sizes, σ and ρ values and see what Δs emerge as “detectable” with acceptable power
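
The "try some values" exercise can be sketched as follows. This is not the R demonstration from the lecture; it is a Python illustration under stated assumptions: detectable Δ = (z_{α/2} + z_β) × sqrt(Var), with Var = 4σ²(1-ρ)/n for the transparent analysis and 2σ²(1-ρ²)/n for the optimal analysis, α = 0.05 two-sided, power 0.80, and a hypothetical grid of n per group.

```python
from math import sqrt
from statistics import NormalDist


def detectable_delta(variance, alpha=0.05, power=0.80):
    """Smallest effect detectable with the given power, for an estimator with this variance."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * sqrt(variance)


def var_transparent(sigma, rho, n):
    return 4 * sigma**2 * (1 - rho) / n       # difference of differences


def var_optimal(sigma, rho, n):
    return 2 * sigma**2 * (1 - rho**2) / n    # difference of adjusted differences


sigma = 6.0                                   # baseline BMI sd from the slide
results = {}
for rho in (0.90, 0.95):
    for n in (25, 50, 100):                   # hypothetical n per group
        results[(rho, n)] = (
            detectable_delta(var_transparent(sigma, rho, n)),
            detectable_delta(var_optimal(sigma, rho, n)),
        )
        t, o = results[(rho, n)]
        print(f"rho = {rho:.2f}, n = {n:3d}: transparent = {t:.2f}, optimal = {o:.2f}")
```

Under these assumptions, Δ = 1.0 becomes detectable with about 100 per group when ρ = 0.95, and the two analyses differ only slightly because ρ is close to 1.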

Demonstration with the “R” program

Bonus Slides

Five steps to obtain the sample size formula
γ is a generic quantity of interest

1. Create two simple hypotheses
Ho: γ = 0
Ha: γ = Δ

2. Find an estimator & calculate its properties
Ho: γ = 0
Ha: γ = Δ
Unbiased Estimator

3. Create a Test Statistic & find its distribution
Ho: γ = 0
Ha: γ = Δ
Test Statistic

4. Set up the rejection rule

5. Calculate the power of the test
Standardization

Use the standard error formula from step 2 to calculate the required sample size n
Ha: μ = Δ
Ho: μ = 0
Unbiased Estimator
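
The equations on the five-step slides did not survive extraction. As a sketch only, here is how the steps play out in the simplest case, a single-group mean with known σ. That case is an assumption chosen for illustration, not necessarily the one on the original slides; it arrives at the single-group formula given earlier.

```latex
\begin{align*}
&\text{1. Hypotheses:} && H_0: \mu = 0, \qquad H_a: \mu = \Delta \\
&\text{2. Estimator:} && \bar{Y}, \qquad E(\bar{Y}) = \mu, \qquad
   \mathrm{se}(\bar{Y}) = \sigma / \sqrt{n} \\
&\text{3. Test statistic:} && Z = \frac{\bar{Y}}{\sigma/\sqrt{n}} \sim
   \begin{cases} N(0, 1) & \text{under } H_0 \\
                 N(\Delta\sqrt{n}/\sigma,\ 1) & \text{under } H_a \end{cases} \\
&\text{4. Rejection rule:} && \text{reject } H_0 \text{ if } |Z| > Z_{\alpha/2} \\
&\text{5. Power:} && 1 - \beta \approx P(Z > Z_{\alpha/2} \mid H_a)
   = \Phi\!\left( \frac{\Delta\sqrt{n}}{\sigma} - Z_{\alpha/2} \right) \\
&\text{Solve for } n: && \frac{\Delta\sqrt{n}}{\sigma} = Z_{\alpha/2} + Z_\beta
   \;\Longrightarrow\; n = \frac{\sigma^2 (Z_{\alpha/2} + Z_\beta)^2}{\Delta^2}
\end{align*}
```

The approximation in step 5 ignores the small probability of rejecting in the wrong tail.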

Sample size for estimating Δ
- Goal: estimate Δ with a certain precision (“narrowness of CI”)
- Let the estimator be unbiased
- Standardization (assuming Normality, usually by the CLT)
- 95% C.I. for γ is: estimate ± 1.96 se(estimate)

Sample size for estimating (cont’d)
- The 95% C.I. is: estimate ± 1.96 se(estimate)
- To control the half width (d) of the CI, we need a sample size n such that 1.96 se(estimate) ≤ d
- Where is the sample size in the formula?
  - Inside the standard error!

Sample Size required to detect an effect size Δ between two proportions
Ha: μ = Δ
Ho: μ = 0
Unbiased Estimator
