«EPIDEMIOLOGY AND BIOSTATISTICS: AN INTRODUCTION TO CLINICAL RESEARCH Bryan Kestenbaum, MD MS University of Washington, Seattle TABLE OF CONTENTS. ...»
In this diagram, students one and three are followed for the full three-month study period and do not develop influenza. Student two is also followed for the full three-month period, but develops influenza at the end of the study. Student four develops influenza after only two months of follow-up. Since a study subject is no longer at risk for developing incident (new) disease once the disease has occurred, we would consider the total time at risk for student four to be 2 months.
Students five and six do not develop the disease and contribute approximately 1.25 and 0.25 months of time at risk, respectively.
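The person-time bookkeeping described above can be sketched in a few lines of Python. This is only an illustration: the variable names are our own, the follow-up times for students five and six are the approximate values given in the text, and student two is treated as contributing the full three months before developing influenza.

```python
# Months at risk contributed by each student, as described in the text.
# Students five and six use the approximate values 1.25 and 0.25 months.
months_at_risk = {1: 3.0, 2: 3.0, 3: 3.0, 4: 2.0, 5: 1.25, 6: 0.25}

# Students who developed incident influenza during follow-up.
developed_influenza = {2, 4}

# Total person-time is the sum of each student's time at risk.
total_person_months = sum(months_at_risk.values())

# Incidence rate = new cases / total person-time at risk.
new_cases = len(developed_influenza)
incidence_rate = new_cases / total_person_months

print(f"Total time at risk: {total_person_months} person-months")
print(f"Incidence rate: {incidence_rate:.2f} cases per person-month")
```

Under these assumptions the six students contribute 12.5 person-months in total, and the two cases give an incidence rate of 0.16 cases per person-month.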
The importance of counting time at risk is highlighted by examples in which follow-up time differs between comparison groups. For example, a study compares rates of cellulitis, a common skin infection, between children seen in primary care clinics at a county and a university hospital.
Investigators study 500 children from each site and follow them for up to 5 years. Results using incidence proportion data indicate that cellulitis is more common at the university hospital:
However, these raw proportions do not account for time at risk. It is possible that children from the county hospital are lost to follow-up or drop out of the study more frequently than those from the university hospital. If this were the case, then the incidence proportion data would be misleading.
Examining the same data using incidence rates reveals a different result:
The incidence rate of cellulitis is actually higher at the county hospital after accounting for person-time at risk.
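The reversal between the two measures can be illustrated with a short Python sketch. The counts below are hypothetical, invented only to show the arithmetic; they are not the values from the study tables. The two helper functions are likewise our own names.

```python
def incidence_proportion(new_cases, people_at_risk):
    """New cases divided by the number of people initially at risk."""
    return new_cases / people_at_risk


def incidence_rate(new_cases, person_years):
    """New cases divided by the total person-time at risk."""
    return new_cases / person_years


# HYPOTHETICAL data for illustration only (not the study's actual results).
# County children are assumed to accrue less follow-up time due to dropout.
sites = {
    "county": {"children": 500, "cases": 50, "person_years": 1000.0},
    "university": {"children": 500, "cases": 75, "person_years": 2500.0},
}

results = {}
for name, s in sites.items():
    results[name] = {
        "proportion_pct": 100 * incidence_proportion(s["cases"], s["children"]),
        "rate_per_1000_py": 1000 * incidence_rate(s["cases"], s["person_years"]),
    }
    print(
        f"{name}: incidence proportion {results[name]['proportion_pct']:.0f}%, "
        f"incidence rate {results[name]['rate_per_1000_py']:.0f} per 1000 person-years"
    )
```

With these invented numbers, the university hospital has the higher incidence proportion (15% versus 10%), yet the county hospital has the higher incidence rate (50 versus 30 cases per 1000 person-years), because its children contribute far less person-time.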
Incidence measures help to provide clues as to the cause or development of a disease. For the anxiety disorder example, suppose that the incidence proportion of anxiety disorder was 5% per year among medical students. This incidence measure would consider only new cases of anxiety disorder that developed during medical school; students with prevalent anxiety disorder at the beginning of medical school would not be counted. These incidence data suggest that certain aspects of medical school might contribute to anxiety disorder, prompting a more thorough search for possible causal factors. In contrast, the prevalence data for anxiety disorder pointed to a high burden of disease in the student population, motivating implementation of treatment programs.
IV. RELATIONSHIP BETWEEN PREVALENCE AND INCIDENCE
The prevalence of a disease is a function of how often new cases develop and how long the disease state lasts. For example, the incidence of influenza may be relatively high during influenza season, but the prevalence of influenza at any point in time is likely to be low because the illness is short-lived; people either recover quickly or, in rare cases, die from the disease.
In contrast, the prevalence of diabetes is likely to be high because there is a steady incidence of new cases, and the disease, though treatable, is rarely cured.
The mathematical relationship between prevalence and incidence is P = I x D, where P is prevalence, I is incidence, and D is the duration of disease. Figure 1.2 presents a graphical depiction of the relationship between incidence and prevalence.
Figure 1.2. Relationship between incidence and prevalence of disease.
Individuals in a population will acquire a disease at some rate (incidence). They will remain with the disease until they either get well, die, or leave the population (and cannot be counted).
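The relationship P = I x D can be explored numerically with a minimal Python sketch. The function name and all input values below are hypothetical and chosen only to contrast a short-lived illness with a long-lasting one. The formula is best read as an approximation that assumes a roughly steady state and a prevalence that is not too high.

```python
def approximate_prevalence(incidence_per_person_year, mean_duration_years):
    """Approximate prevalence as incidence times mean disease duration.

    Assumes a steady state (stable incidence and duration) and low prevalence.
    """
    return incidence_per_person_year * mean_duration_years


# HYPOTHETICAL inputs for illustration only.
# Short-lived illness: high incidence, but each episode lasts about one week.
acute = approximate_prevalence(incidence_per_person_year=0.10,
                               mean_duration_years=7 / 365)

# Chronic disease: low incidence, but the disease lasts about 20 years.
chronic = approximate_prevalence(incidence_per_person_year=0.005,
                                 mean_duration_years=20)

print(f"Short-lived illness: prevalence is about {100 * acute:.2f}%")
print(f"Chronic disease: prevalence is about {100 * chronic:.0f}%")
```

Even though the hypothetical short-lived illness has an incidence twenty times higher, its prevalence at any point in time (about 0.19%) is far lower than that of the chronic disease (about 10%), because each case lasts only a short time.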
V. STRATIFICATION OF DISEASE FREQUENCY BY PERSON, PLACE, AND TIME
Once we calculate measures of disease frequency, we can examine whether these measures vary by personal characteristics, geography, and/or time periods. Stratification refers to the process of separating analysis by subgroups. For example, the prevalence of diabetes among all United States adults is approximately 9.0%; the prevalence of diabetes stratified by race is 8.2% among whites and 14.9% among Native Americans.
A study was conducted to describe the epidemiology of latex allergy in healthcare workers. At the beginning of the study (baseline), 500 females and 540 males underwent skin-prick testing; 40 females and 22 males tested positive for latex sensitivity. Follow-up skin tests were performed two years later, and 29 new cases of latex sensitivity were detected.
The overall prevalence of latex sensitivity among healthcare workers at baseline is 62/1040 x 100% = 6%.
The prevalence stratified by sex is 40/500 x 100% = 8% in females, and 22/540 x 100% = 4% in males. Therefore, latex sensitivity appears to be twice as common in women compared to men at baseline. To calculate incidence, the number of new cases of latex sensitivity that develop over time, we will exclude the 62 people who already had prevalent latex sensitivity at baseline.
Incidence proportion of latex allergy = 29 / (1040 - 62) x 100% = 3%
Incidence rate of latex allergy = 29 / ((1040 - 62) x 2 years) = 15 cases per 1000 person-years
Information about sex is not provided for the new cases of latex sensitivity, so incidence data cannot be stratified by sex in this example.
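The latex allergy calculations can be reproduced with a short Python sketch. The counts come from the example above; the variable names are our own. As in the worked example, the incidence rate assumes that every worker who was free of latex sensitivity at baseline contributes the full two years of follow-up, which is a simplification.

```python
# Baseline skin-prick test results from the example.
females_tested, males_tested = 500, 540
females_positive, males_positive = 40, 22

# Follow-up results two years later.
new_cases = 29
follow_up_years = 2

total_tested = females_tested + males_tested
prevalent_cases = females_positive + males_positive

# Prevalence at baseline, overall and stratified by sex.
overall_prevalence = prevalent_cases / total_tested
prevalence_females = females_positive / females_tested
prevalence_males = males_positive / males_tested

# Incidence: exclude people who already had latex sensitivity at baseline.
at_risk = total_tested - prevalent_cases
incidence_proportion = new_cases / at_risk

# Simplifying assumption: each at-risk worker contributes the full 2 years.
person_years = at_risk * follow_up_years
incidence_rate_per_1000 = 1000 * new_cases / person_years

print(f"Overall prevalence: {100 * overall_prevalence:.0f}%")
print(f"Prevalence in females: {100 * prevalence_females:.0f}%")
print(f"Prevalence in males: {100 * prevalence_males:.0f}%")
print(f"Incidence proportion: {100 * incidence_proportion:.0f}%")
print(f"Incidence rate: {incidence_rate_per_1000:.0f} per 1000 person-years")
```

The rounded results match the figures in the text: 6%, 8%, 4%, 3%, and 15 cases per 1000 person-years.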
A. Disease frequency measurements stratified by characteristics of person
Examples of personal characteristics include age, race/ethnicity, and sex. For example, polycythemia vera is a myeloproliferative disorder characterized by an abnormal increase in red blood cell mass. The estimated prevalence of polycythemia vera among individuals aged 35-44 is 9 cases per 100,000, whereas the estimated prevalence in people aged 75-84 is 163 cases per 100,000. Polycythemia rates are also greater in men and in people of Jewish/Eastern European ancestry. These stratified disease frequency data begin to define risk factors for the disease.
B. Disease frequency measurements stratified by characteristics of place
The incidence of multiple sclerosis varies considerably by geographic region within the United States. Areas with the lowest sunlight exposure, such as Seattle, have the highest incidence of multiple sclerosis. Vitamin D is obtained from sunlight exposure and may play an important role in suppressing autoimmunity. Circulating vitamin D levels are particularly low in regions with reduced sunlight exposure. These disease frequency data, stratified by place, suggest the hypothesis that vitamin D deficiency may play a role in the pathogenesis of multiple sclerosis.
C. Disease frequency measurements stratified by characteristics of time
In 1970, approximately 5% of all births in the United States were by Cesarean section delivery.
By the year 2000, nearly 25% of U.S. babies were born by Cesarean section. These strong temporal changes in rates generate a number of hypotheses.5 One possibility is that maternal age has also increased during this time period, leading to more complicated pregnancies that may require Cesarean section. A second possibility is that improved fetal monitoring technology that can detect small changes in fetal status may prompt more surgical intervention. A third possibility is that the routine use of repeat Cesarean section has become standard practice in the United States because of data demonstrating an increased risk of uterine rupture in women who have a vaginal birth after a first Cesarean section.6 Disease frequency measurements stratified by time are often hypothesis generating, motivating further studies to uncover the true causes of a disease process.
Stratified measures of disease frequency can also be used to corroborate experimental data. For example, animal models have suggested that estrogen can slow the progression of chronic kidney disease by reducing expression of pro-inflammatory cytokines and decreasing the extent of fibrosis within the kidney.7 Can measures of disease frequency in humans be used to corroborate these provocative experimental data?
One possibility is to obtain estimates of the incidence of chronic kidney disease, stratified by sex and menopausal status, as depicted by the hypothetical data presented in Table 1.2.
Table 1.2. Rates of chronic kidney disease according to sex and menopausal status.
These data reveal a lower chronic kidney disease incidence rate among premenopausal women, compared with postmenopausal women and men. These disease frequency data support the hypothesis that estrogen protects against chronic kidney disease, and represent a first step toward investigation of this process in humans.
GENERAL CONSIDERATIONS IN CLINICAL RESEARCH DESIGN
1. The study population refers to all people who enter a study.
2. Common exclusion criteria in clinical studies include:
a. Exclusion of people who have prevalent disease to focus on incident outcomes
b. Exclusion of people who have major disease risk factors to focus on the exposure of interest
c. Exclusion of people whose disease development may be missed in the study
3. The choice of study population influences the generalizability of study findings.
4. The exposure is a factor that may explain or predict the presence of an outcome.
5. The outcome is a factor that is being explained or predicted in the study.
6. Observational studies observe the exposure; interventional studies assign the exposure.
7. Several factors favor causal inference in epidemiology research:
a. Randomized evidence
b. Strong associations
c. Temporal relationship
d. Exposure-varying response
e. Biologic plausibility
This chapter presents fundamental elements of a clinical / epidemiological research study: the study population, exposure, outcome, and the general study design. Specific study designs, along with their inherent strengths and weaknesses, are discussed in subsequent chapters. The chapter concludes with a discussion of factors that favor causal inference.
I. STUDY POPULATION
A. Definition of the study population
The term study population, or patient population in a research study, refers to all of the people who enter a study, regardless of whether they are treated, exposed, develop the disease, or drop out after the study has begun. Typically, a study population originates from some larger source population, which is then narrowed using exclusion criteria.
Consider a study to address the hypothesis that estrogen use increases the risk of developing venous thromboembolism (VTE). Biologic data suggest a link between estrogen use and VTE, because estrogen interferes with a circulating factor that normally inhibits blood clot formation.
One approach to studying this question in humans would be to identify a group of estrogen users and a group of non-users, and then to follow them prospectively for the development of VTE.
What exclusion criteria should be applied to best address the research question?
First, investigators may exclude women who have a previous history of VTE. Typically, studies of disease development focus on the incidence of disease, and therefore exclude people who have prevalent disease at the beginning of the study. Prevalent disease may be defined by any history of a chronic disease if that disease cannot be fully eradicated. For example, a previous history of coronary heart disease or diabetes is typically considered to represent ‘prevalent disease’ in a clinical research study because these conditions are rarely cured, though frequently treated.
Second, investigators may exclude women with known major VTE risk factors, such as cancer or recent major surgery. Excluding women who have known causes of VTE will increase confidence that any new VTE cases that develop during the study can be attributed to the use versus non-use of estrogen, rather than to some other factor. However, there are limits to using exclusion as a means to focus on a specific cause of disease. There are many risk factors for VTE, including genetic mutations, smoking, and kidney disease. Excluding women with any VTE risk factor would significantly diminish the size of the available study population, and would markedly restrict interpretation of study findings. In clinical practice, physicians do not test for a battery of rare mutations before prescribing estrogen. Results of a study that excluded women who had any predisposing mutation to VTE would have diminished generalizability to clinical practice.
Third, investigators may decide to exclude women whose VTE might be missed during the study.
For example, subjects who plan on moving from the area may develop VTE in another geographic location and might not be counted. Subjects who have a history of frequently missing clinic appointments might be difficult to contact and less likely to complete surveillance procedures, potentially developing VTE that could be missed. When selecting a suitable study population, it is important for investigators to consider how they will capture the disease in question, and consider limiting the study population to subjects whose disease would be counted if it were to develop during the study.
B. Choice of study population and generalizability of study findings
The choice of study population directly influences the generalizability or applicability of study findings. The following examples illustrate how different types of study populations influence the generalizability of the results.
Clinic-based descriptive study of resistant soft tissue infections in children
Study objective: Describe the prevalence of resistant organisms in children with soft tissue infection.
Study findings: Among 30 children with soft tissue infection demonstrated by wound culture, 7 (23%) had organisms that were resistant to first-line antibiotics.
Study population: We studied 30 consecutive children with soft tissue infection from our outpatient pediatric clinic in greater Minneapolis. Children were included if they were 2-16 years old, had a soft tissue infection that required incision and drainage, and demonstrated organisms by culture examination.