

Mining Social Media to Understand Consumers’ Health Concerns and the Public’s Opinion on Controversial Health Topics


Yang Liu

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Information) in the University of Michigan


Doctoral Committee:

Associate Professor Kai Zheng, Co-Chair
Associate Professor Qiaozhu Mei, Co-Chair
Associate Professor David A. Hanauer
Associate Professor Joyce M. Lee

© Yang Liu 2016
All Rights Reserved


I would like to thank my advisors Kai Zheng and Qiaozhu Mei, who have been a wonderful source of support, inspiration and encouragement during my PhD program.

I am greatly indebted to my committee members, David Hanauer and Joyce Lee, for their medical expertise and consistently high standards of research.

There are many other people without whom this dissertation would not have been possible: V.G. Vinod Vydiswaran, with whom I collaborated closely and from whom I learned a lot; Matthew Davis and Helen Levy, who brought their public health policy perspective; Maria Woodward and Shreya Prabhu, who generously gave their time and offered ophthalmic expertise; and Jia Liu, Tricia O’Brien, Esha Sondhi, and Sonia Zhang, who helped me with an enormous amount of annotation.

I am fortunate to have had many wonderful collaborators at the University of Michigan. Yan Chen, Roy Chen, and Wei Ai, with whom I worked closely on a series of economics projects, provided me with invaluable experience and knowledge of experimental economics. The Health Informatics Innovation group and the Foreseer group have been a great source of ideas, feedback, and friendship.

Finally, I would like to thank my parents, Aihong Cheng and Xianli Liu, for their love and continuous support.



ACKNOWLEDGEMENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
ABSTRACT

I. Introduction
II. Systematic Literature Review
    2.1 Methods

–  –  –


3.1 Distributions of the categories of site-defined and user-created groups
3.2 Frequency of tweets and users tweeting with those terms/hashtags
3.3 Frequency of the geo-tagged diabetes tweets in top countries

–  –  –

4.3 Top annotation disagreements on judging medical relevance
4.4 Top annotation disagreements between two error categories

–  –  –

ADR: Adverse Drug Reaction
ACA: Affordable Care Act
API: application programming interface
ATAM: Ailment Topic Aspect Model
BRFSS: Behavioral Risk Factor Surveillance System
CDC: U.S. Centers for Disease Control and Prevention
CHV: Consumer Health Vocabulary
CRF: conditional random field
ILI: Influenza-like Illness
LDA: Latent Dirichlet Allocation
LIWC: Linguistic Inquiry and Word Count
MMR: measles, mumps, and rubella
POMS: Profile of Mood States
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
RBF: radial basis function
SVM: Support Vector Machine
UMLS: Unified Medical Language System

–  –  –

Co-Chairs: Kai Zheng and Qiaozhu Mei

Social media websites are increasingly used by the general public as a venue to express health concerns and discuss controversial medical and public health issues.

This information could be used for public health surveillance as well as for gauging public opinion. In this thesis, I developed methods to extract health-related information from multiple sources of social media data, and conducted studies that use text-mining techniques to generate insights from the extracted information.

To understand the availability and characteristics of health-related information in social media, I first identified the users who seek health information online and participate in online health communities, and analyzed their motivations and behavior through two case studies: user-created groups on MedHelp and a diabetes online community on Twitter. Through a review of tweets mentioning eye-related medical concepts identified by MetaMap, I identified the common reasons why natural language processing tools tuned for biomedical text mislabel tweets, and trained a classifier to exclude tweets that are not medically relevant, increasing the precision of the extracted data.
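The relevance filter described above is a binary text classifier applied to tweets that a concept extractor has flagged. The abstract does not name the model (the abbreviations list mentions SVM and RBF, which suggests an SVM may have been used); the minimal sketch below instead uses a multinomial Naive Bayes classifier in plain Python so it runs without dependencies. The toy tweets, labels, and the helper names `train_nb` and `predict` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization (a stand-in for real tweet preprocessing)."""
    return text.lower().split()


def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model with add-one (Laplace) smoothing."""
    classes = sorted(set(labels))
    log_priors = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    word_counts = {c: Counter() for c in classes}
    for doc, label in zip(docs, labels):
        word_counts[label].update(tokenize(doc))
    vocab = set().union(*word_counts.values())
    totals = {c: sum(word_counts[c].values()) for c in classes}
    return classes, log_priors, word_counts, totals, vocab


def predict(model, doc):
    """Return the class with the highest log-posterior for one tweet."""
    classes, log_priors, word_counts, totals, vocab = model
    best, best_score = None, -math.inf
    for c in classes:
        score = log_priors[c]
        for tok in tokenize(doc):
            if tok in vocab:  # ignore words never seen in training
                score += math.log((word_counts[c][tok] + 1) / (totals[c] + len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best


# Hypothetical toy annotations: 1 = medically relevant, 0 = not relevant.
train_docs = [
    "my eye is red and itchy since yesterday",
    "doctor says I need glasses for blurry vision",
    "cataract surgery scheduled next week",
    "keep an eye on the game tonight",
    "my vision for this company is growth",
    "eye candy at the beach today",
]
train_labels = [1, 1, 1, 0, 0, 0]

model = train_nb(train_docs, train_labels)
flagged = ["blurry vision and red eye all day", "keep an eye on this stock"]
kept = [t for t in flagged if predict(model, t) == 1]
```

On this toy data the filter keeps the tweet about blurry vision and drops the idiomatic "keep an eye on" tweet. With real data, the classifier would be trained on the manually annotated sample and evaluated on held-out tweets.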

Furthermore, I conducted two studies to evaluate how effectively text-mining techniques can capture public opinion on controversial medical and public health issues from social media. The first study applied topic modeling and text summarization to automatically distill users’ key concerns about the purported link between autism and vaccines. The outputs of the two methods cover most of the public concerns about MMR vaccines reported in previous survey studies. In the second study, I estimated the public’s view of the Affordable Care Act (ACA) by applying sentiment analysis to four years of Twitter data, and demonstrated that the rates of positive and negative responses measured by tweet sentiment are generally in agreement with the results of the Kaiser Family Foundation poll. Finally, I designed and implemented a system that automatically collects and analyzes online news comments to help researchers, public health workers, and policy makers better monitor and understand public opinion on issues such as controversial health-related topics.
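Comparing tweet sentiment with a poll requires aggregating tweet-level labels into per-period rates. The sketch below assumes a sentiment classifier has already labeled each tweet `positive`, `negative`, or `neutral`; it computes the positive share among opinionated tweets for each period and correlates that series with a poll's favorable share. The aggregation rule, the use of Pearson correlation, and all numbers are illustrative assumptions, not the dissertation's procedure or data.

```python
import math
from collections import Counter


def sentiment_rates(labels):
    """Share of positive and negative labels among opinionated (non-neutral) tweets."""
    counts = Counter(labels)
    opinionated = counts["positive"] + counts["negative"]
    if opinionated == 0:
        return 0.0, 0.0
    return counts["positive"] / opinionated, counts["negative"] / opinionated


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical classifier output per period (toy values, not study data).
tweets_by_period = {
    "P1": ["positive"] * 30 + ["negative"] * 50 + ["neutral"] * 20,
    "P2": ["positive"] * 35 + ["negative"] * 45 + ["neutral"] * 20,
    "P3": ["positive"] * 40 + ["negative"] * 40 + ["neutral"] * 20,
}
# Hypothetical poll "favorable" shares for the same periods.
poll_favorable = [0.38, 0.42, 0.47]

twitter_positive = [sentiment_rates(tweets_by_period[p])[0] for p in sorted(tweets_by_period)]
r = pearson(twitter_positive, poll_favorable)
```

Dropping neutral tweets from the denominator is one design choice; keeping them would lower both rates and could change how closely the series tracks the poll.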

–  –  –

Social media has revolutionized the way people disclose their personal health concerns and express opinions on controversial public health issues. It provides a unique platform for sharing health-related information without time and location constraints.

According to a 2014 Pew Research Center survey, 74% of adults with Internet access use social media sites (Pew, 2014). Another Pew report shows that 11% of social network site users have posted comments, queries, or information about health or medical matters (Fox, 2011). Meanwhile, both the government and individual companies have spent tremendous resources and effort to track public health conditions,¹ risky health behaviors,² and public opinion on controversial public health issues³ through personal interviews or telephone surveys. Policy makers and public health researchers rely on these poll results to monitor population health and develop intervention strategies.

Despite their large sample sizes, traditional polling methods (Groves et al., 2011) have several disadvantages, including untimeliness, high cost, and the limited availability of respondents. Health-related information in social media is a valuable source that can help overcome these disadvantages. Content analysis of online discussions of controversial public health issues can generate insights about public opinion, and can help estimate trends in public sentiment in real time at very low cost. Collections of personal health concerns expressed in social media can also be translated into early signals of disease outbreaks (Ginsberg et al., 2009). Finally, statistical analysis of these large data sets can help clinical researchers discover new medical knowledge, such as adverse drug events (White et al., 2014) and disease comorbidities.

¹ http://www.cdc.gov/nchs/nhis.htm
² http://www.cdc.gov/brfss/about/index.htm
³ http://kff.org/report-section/kaiser-health-tracking-poll-april-2015-methodology/

Despite these opportunities, several challenges in mining social media text have prevented us from using this valuable information effectively. First, the availability and characteristics of medically relevant data in social media remain unclear. This makes it difficult for researchers to determine which questions such data can help answer, and to judge the validity and generalizability of the results. Second, compared with more traditional health information sources such as electronic health records, social media data, which can be generated by anyone on the Internet, are inherently noisy because of misspellings, casual language, and heterogeneous contexts. Extracting health-related information from such noisy data can be very challenging, and careless extraction can lead to false alarms of disease outbreaks or biased estimates of public opinion. Finally, the lack of efficient and effective methods to analyze and make sense of social media data further impedes the full use of this information. Since most existing text-mining and medical natural language processing techniques were designed for biomedical text (e.g., clinician notes and published scientific literature), their performance on social media data is questionable without careful evaluation against human-labeled ground truth.

In this thesis, I addressed each of these three challenges. First, I summarized previous work by conducting a systematic literature review of studies on the motivations behind online health information sharing and seeking, methods of extracting and analyzing health-related information in social media, and systems and tools leveraging such methods. I also investigated end-user motivations and behaviors in two scenarios, namely self-initiated user groups in a health forum and an online diabetes community on Twitter. Second, to extract health-related information from Twitter, I applied a state-of-the-art medical natural language processing tool, MetaMap, to identify potential mentions of medical concepts. I then evaluated the performance of MetaMap by comparing the eye-related concepts it identified with the results of a manual review of a sample of tweets. Using the manually annotated sample, I trained a classifier to correct errors introduced by MetaMap and achieve higher accuracy. Third, I applied text-mining and natural language processing techniques to study public opinion using different social media data, and demonstrated the effectiveness of these tools by comparing the machine-generated results with human-annotated data or traditional poll results. Finally, I built a system that incorporates the techniques mentioned above and automates information extraction and insight generation using the framework I developed.
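Evaluating an extractor against a manually reviewed sample, and measuring what a downstream filter adds, both reduce to computing precision over flagged mentions. A minimal sketch, assuming each mention is a `(tweet_id, concept)` pair and that annotators have marked which ones are medically relevant; the mentions, judgments, and the `keep` decisions standing in for a trained classifier are hypothetical toy values.

```python
def precision(flagged, relevant):
    """Fraction of flagged (tweet_id, concept) pairs that annotators judged relevant."""
    if not flagged:
        return 0.0
    return sum(1 for m in flagged if m in relevant) / len(flagged)


# Toy mentions an extraction tool such as MetaMap might flag (hypothetical).
flagged = [(1, "eye"), (2, "eye"), (3, "vision"), (4, "glaucoma"), (5, "eye")]
# Toy human judgments: mentions annotators considered medically relevant.
relevant = {(1, "eye"), (4, "glaucoma")}
# Toy output of a downstream relevance classifier (True = keep the mention).
keep = {(1, "eye"): True, (2, "eye"): False, (3, "vision"): False,
        (4, "glaucoma"): True, (5, "eye"): True}

before = precision(flagged, relevant)
after = precision([m for m in flagged if keep[m]], relevant)
```

A filter applied after extraction can raise precision, as it does here (from 0.4 to about 0.67), but it cannot recover mentions the extractor missed, so recall against the manual review has to be measured separately.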

Chapter II presents a literature review of existing techniques and tools for analyzing health-related information from social media discussions. Section 3.1 in Chapter III is based on part of our work published in ICWSM 2014 (Vydiswaran et al., 2014). Section 3.2 is based on unpublished work done in collaboration with Joyce Lee, David Hanauer and Qiaozhu Mei. Section 4.3 in Chapter IV is unpublished work done in collaboration with Vinod Vydiswaran, Kai Zheng, David Hanauer, Qiaozhu Mei, Trishia O’Brien, and Esha Sondhi. Section 5.1 in Chapter V is unpublished work done in collaboration with Vinod Vydiswaran, Kai Zheng, David Hanauer, and Qiaozhu Mei. Section 5.2 is ongoing work in collaboration with Matthew Davis, Kai Zheng, and Helen Levy.

–  –  –

The goal of this chapter is to summarize prior work in health sciences and computer science pertaining to the following four topics: (1) users’ motivations for and concerns about sharing health-related data on social media websites, (2) methods of distilling health-related data from social media content, including methods of identifying medical concepts expressed in consumer language, (3) quantitative and qualitative methods of analyzing health-related data, and (4) frameworks and applications using health-related data.

2.1 Methods

A systematic literature review was conducted according to the guidelines in the PRISMA statement (Moher et al., 2009). After consulting other interdisciplinary health and computer science literature reviews (Saha et al., 2007; Crutzen et al., 2011; Fry and Neff, 2009; Fernandez-Luque et al., 2011a), I chose to search four databases covering health sciences and computer science: PubMed, Web of Science, Google Scholar, and the ACM Digital Library. The following query was used to search the title and abstract fields (full text for Google Scholar): health AND (twitter or tweets or facebook or myspace or youtube or “social media” or “user generated content”). Publications had to be published after 2005 and written in English. Eligible publications had to analyze content from popular social media websites rather than from health-specific online communities. Furthermore, studies on the following topics were excluded: health policy research; the use of social media websites as a communication channel for health promotion or patient education; and health issues caused by using social media. In addition, the references of relevant articles were reviewed, leading to the inclusion of 20 more articles. The PRISMA diagram is shown in Figure 2.1.
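The Boolean query and the year and language limits are mechanical enough to express as a screening function. The sketch below is a local approximation, assuming records with `title`, `abstract`, `year`, and `language` fields (hypothetical field names and toy records); it does not reproduce how PubMed, Web of Science, Google Scholar, or the ACM Digital Library actually evaluate queries.

```python
import re

# Platform terms from the OR-clause of the search query.
PLATFORM_TERMS = ["twitter", "tweets", "facebook", "myspace", "youtube",
                  "social media", "user generated content"]


def has_term(text, term):
    """Case-insensitive whole-word (or whole-phrase) match."""
    return re.search(r"\b" + re.escape(term) + r"\b", text.lower()) is not None


def matches_query(text):
    """health AND (any platform term)."""
    return has_term(text, "health") and any(has_term(text, t) for t in PLATFORM_TERMS)


def passes_limits(record):
    """Publication year after 2005 and English language."""
    return record["year"] > 2005 and record["language"] == "English"


# Hypothetical bibliographic records (toy data for illustration only).
records = [
    {"title": "Tracking influenza with tweets",
     "abstract": "A public health surveillance study.", "year": 2010, "language": "English"},
    {"title": "Facebook use among teens",
     "abstract": "Patterns of online friendship.", "year": 2012, "language": "English"},
    {"title": "Health videos on YouTube",
     "abstract": "A content analysis.", "year": 2004, "language": "English"},
    {"title": "Social media and health",
     "abstract": "Eine Studie.", "year": 2013, "language": "German"},
]

screened = [r for r in records
            if matches_query(r["title"] + " " + r["abstract"]) and passes_limits(r)]
```

The topic-based exclusions (health policy research, social media as a promotion channel, health issues caused by social media use) require reading each paper and are not captured by a mechanical filter like this. Whole-word matching also means that terms such as "healthcare" or "user-generated content" would not match here, whereas a database's own tokenizer may treat them differently.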

–  –  –

2.2.1 Benefits and Concerns of Sharing Personal Health Data

Although social media has been widely adopted across populations regardless of gender, education, race, health status, or health care access (Chou et al., 2009b; Fisher and Clayton, 2012; Shaw and Johnson, 2011), understanding the benefits users gain from sharing personal health data, and their motivations for doing so, is still critical to inform future research that improves the design of social media systems and increases their actual benefits to users.
