EPIDEMIOLOGY AND BIOSTATISTICS: AN INTRODUCTION TO CLINICAL RESEARCH
Bryan Kestenbaum, MD MS
University of Washington, Seattle
TABLE OF CONTENTS.
Chapter Title
1 Measures of disease frequency
2 General considerations in clinical research design
3 Case reports and case series
4 Cross-sectional studies
5 Cohort studies
6 Case-control studies
7 Randomized trials
8 Misclassification
9 Confounding
10 Control of confounding
11 Effect modification
12 Screening
13 Diagnostic testing
BIOSTATISTICS
14 Summary measures (Abigail Shoben)
15 Introduction to statistical inference
16 Hypothesis testing
17 Interpreting hypothesis tests (Abigail Shoben)
18 Linear regression
19 Non-linear regression models
20 Survival analysis
EDITORS
Kathryn L. Adeney, MD MPH, Department of Epidemiology, University of Washington, Seattle
Noel S. Weiss, MD DrPH, Department of Epidemiology, University of Washington, Seattle
Abigail B. Shoben, MS, Department of Biostatistics, University of Washington, Seattle
PREFACE
This textbook was born from a disparate collection of written materials that were created to teach Epidemiology and Biostatistics to second year medical students at the University of Washington.
These materials included handouts, practice problems, guides to reading research articles, quizzes, notes from student help sessions, and student emails. The primary goal of these written materials, and now this book, is to recreate the perspective of learning Epidemiology and Biostatistics for the first time. With critical editing assistance from Epidemiology faculty, graduate students in Epidemiology and Biostatistics, and the students themselves, I have tried to preserve the innate logic and connectedness of clinical research methods and to demonstrate their application.
The textbook is designed to provide students with the tools necessary to form their own informed conclusions from the clinical research literature. More than ever, a clear understanding of the fundamental aspects of Epidemiology and Biostatistics is needed to successfully navigate the increasingly complex methods utilized by modern clinical research studies.
This book could not have been created without the dedicated help of the editors, the teaching assistants, and the students, who asked the important questions. I would especially like to thank my family, who patiently allowed me so much time to write.
1. Measures of disease frequency:
a. Clarify the significance of a particular health problem
b. Help guide resource allocation
c. Provide basic insight into the pathogenesis of disease
2. Point prevalence describes the amount of disease at a particular point in time.
3. Incidence describes the number of new cases of disease that develop over time.
4. Incidence may be expressed as incidence proportion or incidence rate.
5. Incidence rate accounts for follow-up time.
6. Measures of disease frequency can be stratified, or broken up, by person, place, or time characteristics to gain insight into a disease process.
In December 1998, a 55-year-old woman presented to her local emergency department complaining of profound weakness and difficulty walking. She first noticed pain and weakness in her shoulders about 7 days earlier. The weakness progressed to involve her thigh muscles; she then developed nausea and noticed that her urine appeared dark. Over the next 48 hours, her weakness further intensified and she became unable to stand under her own power.
Her previous medical conditions included high blood pressure, asthma, and high serum cholesterol levels. Her father had died at an early age from heart disease. She did not smoke cigarettes and rarely drank alcohol. Her regular medications included aspirin, diltiazem, and cerivastatin. She first started taking cerivastatin 2 weeks earlier to treat high cholesterol levels.
She appeared ill. There was no fever, the blood pressure was 140/95 mm Hg, and the pulse was 48 beats per minute. She was unable to raise her hips or her shoulders against gravity and her quadriceps muscles were diffusely tender. The rest of her physical examination, including neurologic function, was normal.
The urine was dark amber in color. Laboratory testing revealed a serum creatinine level of 8.9 mg/dl, indicating severe kidney failure, and a serum potassium level of 7.6 mEq/l (normal level is 3.5 – 5.0 mEq/l). She was admitted to the hospital for emergent dialysis.
Further diagnostic testing revealed a serum level of creatine kinase, an enzyme that normally exists inside of muscle tissue, of 178,000 Units per liter (normal level is less than 200 Units per liter).
The patient was diagnosed with acute rhabdomyolysis, a condition characterized by severe, systemic muscle breakdown with release of muscle contents into the blood. One of these muscle components, myoglobin, is toxic to the kidney and causes kidney failure.
Cerivastatin (Baycol), a synthetic inhibitor of 3-hydroxy-3-methylglutaryl-coenzyme-A reductase, belongs to a class of cholesterol-lowering medications called “statins.” The drug was approved in 1997 on the basis of lowering serum cholesterol levels. At the time of this patient’s presentation, no rhabdomyolysis cases associated with cerivastatin use had been reported in the literature. Could cerivastatin be causing this patient’s rare and potentially fatal condition?
Epidemiology is concerned with investigating the cause of disease. In this example, there are some reasons to suspect that cerivastatin might be causing rhabdomyolysis. The disease developed soon after initiation of cerivastatin. Similar cholesterol-lowering medications can also damage muscle tissue, though the severe rhabdomyolysis seen in this case would be rare.
During the first 100 days following approval, the cerivastatin manufacturer received 7 case reports of rhabdomyolysis among people using cerivastatin. Should these 7 cases be cause for concern? It is difficult to answer this question based on the case report data alone. The next step is to estimate the frequency of rhabdomyolysis in cerivastatin users. According to company sales data, there were 3100 cerivastatin prescriptions dispensed during the first 100 days after drug approval. Based on these data, the estimated frequency of rhabdomyolysis in cerivastatin users is 7 / 3100, or 0.2%.
This disease frequency seems relatively small, but rhabdomyolysis is a rare and potentially fatal condition. The next step is to compare the frequency of rhabdomyolysis among cerivastatin users to that of an appropriate control population. One possible control population might be people who were using similar cholesterol-lowering medications. In previous clinical trials, a total of 33,683 people had been assigned to a statin medication other than cerivastatin; 8 of these people developed rhabdomyolysis. Based on these trial data, the estimated frequency of rhabdomyolysis among people using other statin drugs is 8 / 33683, or 0.02%.
While these findings may be partially distorted by differences between the compared populations, the observed roughly 10-fold greater frequency of rhabdomyolysis among cerivastatin users is concerning.
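The comparison above reduces to two proportions and their ratio. The following is a minimal Python sketch using only the counts quoted in the text; the function name `frequency` is illustrative, and the cerivastatin denominator is prescriptions dispensed, which only approximates the number of people using the drug.

```python
def frequency(cases: int, denominator: int) -> float:
    """Return cases / denominator as a proportion."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return cases / denominator


# Counts taken from the text above.
cerivastatin = frequency(7, 3100)     # 7 case reports / 3100 prescriptions
other_statins = frequency(8, 33683)   # 8 cases / 33,683 trial participants

# How many times more frequent rhabdomyolysis is among cerivastatin users.
ratio = cerivastatin / other_statins

print(f"cerivastatin:  {cerivastatin:.2%}")
print(f"other statins: {other_statins:.3%}")
print(f"ratio:         {ratio:.1f}-fold")
```

The unrounded ratio is about 9.5, which the text rounds to 10-fold.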
Two years after cerivastatin was approved, the manufacturer conducted an internal investigation of rhabdomyolysis rates. They found cerivastatin use to be associated with a 20-fold greater risk of rhabdomyolysis compared with other approved statin medications. Their findings were not reported or published. Case reports of rhabdomyolysis associated with cerivastatin use began to surface in the medical literature in 2000-2001 [1]. By August of 2001, there were 31 fatal cases of rhabdomyolysis attributed to cerivastatin. At this time the company voluntarily removed the drug from the market [2].
I. IMPORTANCE OF MEASURES OF DISEASE FREQUENCY
The cerivastatin example demonstrates that measures of disease frequency represent key initial information needed to investigate the cause of disease. While it may be tempting to dive in and conduct novel discovery studies or high profile clinical trials, an important first question about a disease process is, “how frequently does the disease occur?” Measures of disease frequency can help answer several important questions:
1. Measures of disease frequency can provide big picture information about a disease, framing public health questions and guiding resource allocation. For example, years after the invention of chronic dialysis for kidney failure, researchers observed that rates of cardiovascular death among dialysis patients were approximately 30-fold greater than those of the general population [3]. These disease frequency data led to a dramatic increase in funding for research of links between kidney and cardiovascular diseases.
2. Measures of disease frequency describe the absolute risk of a disease. For example, many studies have reported that smoking causes a more than ten-fold increase in the relative risk of lung cancer. Rate data reveal that the cumulative lifetime risk of lung cancer for a person who smokes is approximately 18%. The rate data are important for counseling patients and for understanding the impact of the disease on the population.
3. Measures of disease frequency can be categorized, or stratified, by person, place, and/or time characteristics to gain insight into the pathogenesis (mechanism) of disease. For example, rates of multiple sclerosis, an autoimmune disease that affects the central nervous system, vary considerably by geographic region within the United States. Areas with the lowest sunlight exposure have the highest incidence of multiple sclerosis. These rate data led some researchers to investigate whether vitamin D deficiency might participate in the pathogenesis of multiple sclerosis [4]. Vitamin D is obtained from sunlight exposure, and can suppress inflammation and T-cell function.
The two most commonly used measures of disease frequency are prevalence and incidence.
II. PREVALENCE
Point prevalence measures the amount of a disease at one particular point in time. Prevalence is defined as the proportion of people who have the disease:
Prevalence = number of people with the disease / number of people in the population
Because prevalence is always a ratio of one number of people to another number of people, it has no units; prevalence estimates are often multiplied by 100% and expressed as a percentage.
What is the prevalence of anxiety disorder among 2nd year medical students?
Solution: Administer a standardized test for anxiety disorder to 200 2nd year medical students; find that 12 meet the definition of anxiety disorder. Prevalence = 12 / 200 x 100% = 6%.
In the medical literature, the term “prevalent” is also used to indicate a previous history of a chronic disease. For example, “prevalent diabetes” and “prevalent coronary disease” may be used in clinical research studies to indicate previous or current diagnoses of these conditions, because they are rarely cured and are considered to be present indefinitely after diagnosis. In contrast, a previous history of a short-lived disease, such as influenza, would not be considered to represent prevalent disease, unless that condition was found to exist at the time of measurement.
Prevalence measures help to describe the current burden of a disease in a population in order to facilitate planning and resource allocation. For example, if the prevalence of anxiety disorder was truly 6% among 2nd year medical students, the medical school might consider implementing specific counseling programs for students with this disorder. Analogously, if the prevalence of diabetes is found to be 40% among patients in a particular chronic kidney disease clinic, then that clinic might implement routine blood glucose monitoring.
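III. INCIDENCE
Incidence is a measure of the number of new cases of disease that develop over time. There are two definitions of incidence, differing only by the choice of the denominator:
Incidence proportion = number of new cases of disease / number of people at risk
Incidence rate = number of new cases of disease / total person-time at risk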
Another term for incidence rate is incidence density.
What is the incidence of influenza infection among UW medical students during a three-month period from January through March 2002?
Solution: Suppose that there are 500 UW medical students beginning in January 2002, and 5 new cases of influenza develop from January through March (three months of follow-up).
Incidence proportion = 5 cases / 500 people x 100% = 1%, or 1 per 100 people
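The same data also yield an incidence rate once a person-time denominator is chosen. The sketch below is illustrative only: it assumes that every one of the 500 students contributes the full three months at risk (a simplification, since a student who develops influenza stops contributing time at risk), and the function names are hypothetical.

```python
def incidence_proportion(new_cases: int, people_at_risk: int) -> float:
    """New cases divided by the number of people at risk."""
    return new_cases / people_at_risk


def incidence_rate(new_cases: int, person_time: float) -> float:
    """New cases divided by the total person-time at risk."""
    return new_cases / person_time


new_cases = 5
students = 500
months_followed = 3  # assumption: each student is followed for all 3 months

proportion = incidence_proportion(new_cases, students)
person_months = students * months_followed
rate = incidence_rate(new_cases, person_months)

print(f"incidence proportion: {proportion:.1%}")
print(f"person-months at risk: {person_months}")
print(f"incidence rate: {rate * 1000:.1f} cases per 1,000 person-months")
```

Under this assumption the denominator is 1,500 person-months, so the rate is about 3.3 cases per 1,000 person-months. The two measures differ only in whether follow-up time enters the denominator.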
Incidence rates are typically reported as the number of cases of disease per some rounded measurement of time at risk, such as 1,000 or 100,000 person-years. The inclusion of time at risk in the denominator of incidence rate provides a more precise description of incidence than incidence proportion, particularly if study subjects contribute different amounts of time at risk to a study. For the influenza example, suppose that some of the medical students in the study are assigned to a distant clinical rotation for part of the study period, and cannot report influenza to a research study center during that time. Because the study could not detect the development of influenza for these away months, time at risk should be adjusted to consider only months in which the disease could be captured. Table 1.1 presents data for the first 6 students in the study.
The total time at risk contributed by these 6 students is 12 months. If two cases of influenza developed in these 6 students, then the incidence rate of influenza for these 6 students would be 2 cases / 12 person-months = 16.7 cases per 100 person-months.
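The person-time calculation can be sketched in Python. The per-student values below are hypothetical: they are invented only so that they add up to the totals stated in the text (12 person-months at risk and 2 cases among 6 students) and are not the actual contents of Table 1.1.

```python
# Hypothetical follow-up records, NOT the real Table 1.1 data.
# months_at_risk counts only months in which influenza could be captured.
students = [
    {"id": 1, "months_at_risk": 3, "influenza": False},
    {"id": 2, "months_at_risk": 1, "influenza": True},
    {"id": 3, "months_at_risk": 2, "influenza": False},
    {"id": 4, "months_at_risk": 3, "influenza": False},
    {"id": 5, "months_at_risk": 1, "influenza": False},
    {"id": 6, "months_at_risk": 2, "influenza": True},
]

# Denominator: total person-time at risk across all students.
total_months = sum(s["months_at_risk"] for s in students)

# Numerator: number of new cases observed during that time.
cases = sum(1 for s in students if s["influenza"])

rate_per_100 = cases / total_months * 100

print(f"{cases} cases / {total_months} person-months "
      f"= {rate_per_100:.1f} cases per 100 person-months")
```

Any other split of follow-up time that sums to 12 person-months gives the same rate, because only the total person-time enters the denominator.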
The calculation of incidence rate from person-time data may be best appreciated using a diagram representing time at risk and disease status for each individual in a study, as shown in Figure 1.1.