Big today, normal tomorrow
ITU-T Technology Watch Report
This Technology Watch report looks at different examples and applications associated with the big data paradigm, identifies commonalities among them by describing their characteristics, and highlights some of the technologies enabling the upsurge of big data. As with many emerging technologies, several challenges need to be identified and addressed to facilitate the adoption of big data solutions in a wider range of scenarios. Big data standardization activities related to the ITU-T work programme are described in the final section of this report.
The rapid evolution of the telecommunication/information and communication technology (ICT) environment requires related technology foresight and immediate action in order to propose ITU-T standardization activities as early as possible.
ITU-T Technology Watch surveys the ICT landscape to capture new topics for standardization activities. Technology Watch Reports assess new technologies with regard to existing standards inside and outside ITU-T and their likely impact on future standardization.
Acknowledgements
This report was written by Martin Adolph of the ITU Telecommunication Standardization Bureau.
Please send your feedback and comments to email@example.com.
The opinions expressed in this report are those of the author and do not necessarily reflect the views of the International Telecommunication Union or its membership.
This report, along with other Technology Watch Reports, can be found at http://itu.int/techwatch.
Cover picture: Shutterstock
Technology Watch is managed by the Policy & Technology Watch Division, ITU Telecommunication Standardization Bureau.
Call for proposals
Experts from industry, research and academia are invited to submit topic proposals and abstracts for future reports in the Technology Watch series. Please contact us at firstname.lastname@example.org for details and guidelines.
© ITU 2013. All rights reserved. No part of this publication may be reproduced, by any means whatsoever, without the prior written permission of ITU.
November 2013
Table of contents
2. Big data everywhere – applications in health, science, transport and beyond
3. What makes data big – characteristics of big data
4. What makes data big – enablers
5. Challenges and opportunities for big data adoption
1. Introduction
In early 2013 several European countries were rocked by a food scandal which uncovered a network of fraud, mislabeling and sub-standard supply chain management.
This was not the first food scandal, and will surely not be the last. For restaurant chains with thousands of branches and hundreds of suppliers worldwide, it is nearly impossible to monitor the origin and quality of each ingredient. Data and sophisticated real-time analytics are means to discover early (or, better yet, prevent) irregularities.
The events leading to the discovery and resolution of the scandal point to the promises and challenges of data management for multiparty, multidimensional, international systems. Billions of individual pieces of data are amassed each day, from sources including supplier data, delivery slips, restaurant locations, employment records, DNA records, data from Interpol’s database of international criminals, and also customer complaints and user-generated content such as location check-ins, messages, photos and videos on social media sites. But more data does not necessarily translate into better information. Gleaning insight and knowledge requires ‘connecting the dots’ by aggregating data and analyzing it to detect patterns and distill accurate, comprehensive, actionable reports.
Big data – a composite term describing emerging technological capabilities in solving complex tasks – has been hailed by industry analysts, business strategists and marketing pros as a new frontier for innovation, competition and productivity. “Practically everything that deals with data or business intelligence can be rebranded into the new gold rush”1, and the hype around big data looks set to match the stir created by cloud computing (see Figure 1) where existing offerings were rebranded as ‘cloud-enabled’ overnight and whole organizations moved to the cloud.
Putting the buzz aside, big data motivates researchers from fields as diverse as physics, computer science, genomics and economics – where it is seen as an opportunity to invent and investigate new methods and algorithms capable of detecting useful patterns or correlations present in big chunks of data. Analyzing more data in shorter spaces of time can lead to better, faster decisions in areas spanning finance, health and research.
This Technology Watch report looks at different examples and applications associated with the big data paradigm (section 2), identifies commonalities among them by describing their characteristics (section 3), and highlights some of the technologies enabling the upsurge of big data (section 4). As with many emerging technologies, several challenges need to be identified (section 5) and addressed to facilitate the adoption of big data solutions in a wider range of scenarios. Global standardization can contribute to addressing such challenges and will help companies enter new markets, reduce costs and increase efficiency. Big data standardization activities related to the ITU-T work programme are described in the final section of this report.
1 Forbes: “Big Data, Big Hype: Big Deal,” 31 December 2012, http://www.forbes.com/sites/edddumbill/2012/12/31/bigdata-big-hype-big-deal/
Note: Numbers represent search interest relative to the highest point on the chart.
Source: Google Trends, http://www.google.com/trends/
2. Big data everywhere – applications in health, science, transport and beyond
Data is critical in the healthcare industry, where it documents the history and evolution of a patient’s illness and care, giving healthcare providers the tools they need to make informed treatment decisions. With medical image archives growing by 20 to 40 per cent annually, an average hospital is expected to be generating 665 terabytes of medical data each year by 2015.2 McKinsey analysts predict3 that, if large sets of medical data were routinely collected and electronic health records were filled with high-resolution X-ray images, mammograms, 3D MRIs, 3D CT scans, etc., we could better predict and cater to the healthcare needs of a population, which would not only drive gains in efficiency and quality, but also cut the costs of healthcare dramatically. Applications of big data analytics in the healthcare domain are as numerous as they are multifaceted, both in research and practice, and below we highlight just a few.
Remote patient monitoring, an emerging market segment of machine-to-machine communications (M2M), is proving a source of useful, quite literally lifesaving, information. People with diabetes, for instance, are at risk of long-term complications such as blindness, kidney disease, heart disease and stroke. Remote tracking of a glucometer (a blood sugar reader) helps monitor a patient’s compliance with the recommended glucose level. Electronic health records are populated with data in near real time. Time series of patient data can track a patient’s status, identify abnormalities and form the basis of treatment decisions. More generally, exploiting remote patient monitoring systems for chronically ill patients can reduce physician appointments, emergency department visits and in-hospital bed days, improving the targeting of care and reducing long-term health complications.
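As a rough illustration of how a time series of remote readings might be screened for abnormalities, the Python sketch below flags glucose values that fall outside a target band. The function name, the sample readings and the 70 to 180 mg/dL band are assumptions chosen for illustration only; they are not clinical guidance and do not describe any particular monitoring product.

```python
from datetime import datetime

# Hypothetical target band in mg/dL; real thresholds are set by a clinician.
LOW_MG_DL = 70
HIGH_MG_DL = 180


def flag_out_of_range(readings, low=LOW_MG_DL, high=HIGH_MG_DL):
    """Return the (timestamp, value) pairs whose value lies outside [low, high]."""
    return [(ts, value) for ts, value in readings if value < low or value > high]


# Made-up sample data standing in for a remote glucometer feed.
sample = [
    (datetime(2013, 11, 1, 8, 0), 95),
    (datetime(2013, 11, 1, 12, 0), 210),
    (datetime(2013, 11, 1, 18, 0), 64),
    (datetime(2013, 11, 1, 22, 0), 130),
]

abnormal = flag_out_of_range(sample)
for ts, value in abnormal:
    print(f"{ts:%Y-%m-%d %H:%M}: reading of {value} mg/dL is outside the target band")
```

A production system would more plausibly look at trends and rates of change rather than single readings, but the principle of turning a stream of measurements into actionable flags is the same.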
2 Forbes, http://www.forbes.com/sites/netapp/2013/04/17/healthcare-big-data/
3 McKinsey, http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation
It is not only the ill who use technology to monitor every detail of their biological processes.4 The term Quantified Self describes a movement in which people are exploiting wearable sensors to track, visualize, analyze and share their states, movements and performance.5 Fitness products and sleep monitors are some of the more popular self-quantification tools, and their users populate real-time data streams and global data factories.
Which treatment works best for specific patients?
Studies have shown that wide variations exist in healthcare practices, providers, patients, outcomes and costs across different regions. Analyzing large datasets of patient characteristics, outcomes of treatments and their cost can help identify the most clinically effective and cost-efficient treatments to apply. Comparative effectiveness research has the potential to reduce the incidence of ‘over-treatment’, where interventions do more harm than good, and ‘under-treatment’, where a specific therapy should have been prescribed but was not. In the long run, over- and under-treatment both have the potential for worse outcomes at higher costs.
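The kind of aggregation that underlies such comparisons can be sketched in a few lines of Python. The records, field names and treatment labels below are invented for illustration, and the sketch deliberately ignores the hard part of comparative effectiveness research, namely adjusting for differences between patient groups.

```python
from collections import defaultdict

# Made-up records; a real study would draw on large clinical datasets.
records = [
    {"treatment": "A", "recovered": True, "cost": 1200.0},
    {"treatment": "A", "recovered": False, "cost": 1500.0},
    {"treatment": "A", "recovered": True, "cost": 1100.0},
    {"treatment": "B", "recovered": True, "cost": 800.0},
    {"treatment": "B", "recovered": True, "cost": 900.0},
    {"treatment": "B", "recovered": False, "cost": 700.0},
    {"treatment": "B", "recovered": True, "cost": 1000.0},
]


def summarize(records):
    """Return the recovery rate and mean cost per treatment."""
    totals = defaultdict(lambda: {"n": 0, "recovered": 0, "cost": 0.0})
    for record in records:
        entry = totals[record["treatment"]]
        entry["n"] += 1
        entry["recovered"] += int(record["recovered"])
        entry["cost"] += record["cost"]
    return {
        name: {
            "recovery_rate": entry["recovered"] / entry["n"],
            "mean_cost": entry["cost"] / entry["n"],
        }
        for name, entry in totals.items()
    }


summary = summarize(records)
for name, stats in sorted(summary.items()):
    print(f"Treatment {name}: recovery rate {stats['recovery_rate']:.0%}, "
          f"mean cost {stats['mean_cost']:.2f}")
```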
Scaling up comparative effectiveness research can change how we view global health and improve the way public health crises are managed. Consider pneumonia, the single largest cause of child death worldwide. According to WHO data6, each year the disease claims the lives of more than 1.2 million children under the age of five – more than AIDS, malaria and tuberculosis combined.
Pneumonia is preventable with simple interventions and can be treated with low-cost, low-tech medication and care. However, the growing resistance of the bacterium to conventional antibiotics underlines the urgent need for vaccination campaigns to control the disease. Health data is vital in getting this message across to policy makers, aid organizations and donors, but, no matter how accurate and complete raw statistics and endless spreadsheets may be, their form is not one that lends itself to easy analysis and interpretation. Models, analytics and visualizations of deep oceans of data work together to provide a view of a particular problem in the context of other problems, as well as in the contexts of time and geography. Data starts ‘telling its life story’, in turn becoming a vital decision-making tool. The Global Health Data Exchange7 is one such go-to repository for population health data, enriched by a set of tools to visualize and explore the data.8
Analyzing global disease patterns and identifying trends at an early stage is mission-critical for actors in the pharmaceutical and medical products sector, allowing them to model future demand and costs for their products and so make strategic R&D investment decisions.
High-throughput biology harnesses advances in robotics, automated digital microscopy and other lab technologies to automate experiments in a way that makes large-scale repetition feasible. For example, work that might once have been done by a single lab technician with a microscope and a pipette can now be done at high speed, on a large scale. It is used to define better drug targets, i.e., nucleic acids or native proteins in the body whose activity can be modified by a drug to result in a desirable therapeutic effect.9
Automated experiments generate very large amounts of data about disease mechanisms and they deliver data of great importance in the early stages of drug discovery. Combined with other medical datasets, they allow scientists to analyze biological pathways systematically, leading to an understanding of how these pathways could be manipulated to treat disease.10
4 NY Times, http://bits.blogs.nytimes.com/2012/09/07/big-data-in-your-blood/
5 TED, http://www.ted.com/talks/gary_wolf_the_quantified_self.html
6 WHO, http://www.who.int/mediacentre/factsheets/fs331/en/index.html
7 Global Health Data Exchange, http://ghdx.healthmetricsandevaluation.org/
8 BBC, http://www.bbc.com/future/story/20130618-a-new-way-to-view-global-health
9 Wikipedia, http://en.wikipedia.org/wiki/Biological_target#Drug_targets
Data to solve the mysteries of the universe
Located just a few minutes’ drive from ITU headquarters, CERN is host to one of the biggest known experiments in the world, as well as an example of big data, par excellence. For over 50 years, CERN has been tackling the growing torrents of data produced by its experiments studying fundamental particles and the forces by which they interact. The Large Hadron Collider (LHC) consists of a 27-kilometer ring of superconducting magnets with a number of accelerating structures to boost the energy of the particles along the way.
The detector sports 150 million sensors and acts as a 3D camera, taking pictures of particle collision events at a rate of 40 million times per second.11 Recognizing that this data likely holds many of the long-sought answers to the mysteries of the universe, and responding to the need to store, distribute and analyze the up to 30 petabytes of data produced each year, the Worldwide LHC Computing Grid was established in 2002 to provide the necessary global distributed network of computer centers. Much of CERN’s data is unstructured and only indicates that something has happened. Scientists around the world now collaborate to structure, reconstruct and analyze what has happened and why.
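To put the annual figure in perspective, the back-of-the-envelope calculation below converts 30 petabytes per year into an average sustained data rate. It assumes decimal units (1 petabyte = 10^15 bytes) and a 365-day year; actual production is bursty and follows the accelerator's running schedule, so this is only an average.

```python
# Assumptions: decimal petabytes and a 365-day year.
BYTES_PER_PETABYTE = 10**15
SECONDS_PER_YEAR = 365 * 24 * 3600

annual_petabytes = 30  # upper figure quoted in the text
annual_bytes = annual_petabytes * BYTES_PER_PETABYTE

# Average number of bytes that must be stored every second of the year.
avg_bytes_per_second = annual_bytes / SECONDS_PER_YEAR
avg_gigabytes_per_second = avg_bytes_per_second / 10**9

print(f"Average rate: {avg_gigabytes_per_second:.2f} GB/s")
```

Under these assumptions the average works out to a little under one gigabyte per second, sustained around the clock.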
Understanding the movement of people
Mobility is a major challenge for modern, growing cities, and the transport sector is innovating to increase efficiency and sustainability. Passengers swiping their RFID-based public transport pass leave a useful trace that helps dispatchers to analyze and direct fleet movements. Companies, road operators and administrations possess enormous databases of vehicle movements based on GPS probe data, sensors and traffic cameras, and they are making full use of these data treasure chests to predict traffic jams in real time, route emergency vehicles more effectively, or, more generally, better understand traffic patterns and solve traffic-related problems.
Drivewise.ly and Zendrive are two California-based startups working on data-driven solutions aimed at making drivers better, safer and more eco-friendly. The assumption is that driving habits and commuting patterns can be recognized or learned by collecting the data captured with the sensors of a driver’s smartphone (e.g., GPS, accelerometer) and referencing it to datasets collected elsewhere.
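A simple example of recognizing a driving habit from sensor data is the detection of harsh braking. The Python sketch below is not based on either company's method; it assumes a hypothetical list of (time in seconds, speed in metres per second) samples, such as might be derived from smartphone GPS, and an illustrative deceleration threshold of 3 m/s².

```python
# Assumed threshold; what counts as "harsh" is a design choice, not a standard.
HARSH_BRAKE_MS2 = 3.0


def harsh_braking_events(samples, threshold=HARSH_BRAKE_MS2):
    """Return the times at which deceleration between two samples exceeded threshold.

    samples: list of (time_s, speed_m_s) tuples ordered by time.
    """
    events = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        deceleration = (v0 - v1) / dt
        if deceleration > threshold:
            events.append(t1)
    return events


# Made-up trip: one speed sample per second.
trip = [(0, 14.0), (1, 14.2), (2, 13.8), (3, 9.0), (4, 8.5), (5, 4.0)]
events = harsh_braking_events(trip)
print("Harsh braking detected at t =", events)
```

Counting such events per kilometre, and comparing the result against datasets collected from other drivers, is one plausible way to turn raw sensor readings into a driving profile.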