Big Data Analytics in Healthcare – How Laboratories Can Play a Leading Role

Kaufman, Harvey W., MD, FCAP, MBA
Senior Medical Director,
Information Ventures,
Quest Diagnostics

The capture and analysis of big data offers considerable potential to improve the quality of healthcare and gain insights into public health. Dr. Harvey Kaufman, Senior Medical Director, Quest Diagnostics, discusses this potential, and the important role laboratories can play in making it a reality.

The Evolving Role of Laboratories

The increasing volume of data from a multitude of sources offers opportunities for improving healthcare. Dr. Kaufman believes clinical laboratories have a key role to play in the analysis and interpretation of that data, thereby providing healthcare professionals with information to guide decision-making.

“Historically we’ve been a transactional enterprise,” he says. “Whether in a hospital or reference laboratory, we receive an order, collect the specimen, perform the test, report the results, and that’s the end of our involvement. While we certainly need to continue providing those services, we now also need to go beyond that traditional role and analyze the data in the context of other information about the patient or about the population – that’s what will provide both clinical and economic value. It’s no longer just a matter of what a patient’s cholesterol level is, but also what that level means for that particular patient.”

The Potential of Big Data

A 2011 McKinsey Global Institute report suggested that healthcare could realize $300 billion in savings annually by leveraging Big Data.1 Part of this amount is based on reducing waste and delays, but it is also based on delivering appropriate treatment to the right patients.

“Big Data is a term that, in part, refers to the large volume of data – both structured and unstructured – that is fast accumulating in data warehouses,” explains Dr. Kaufman. “It’s not only the amount of data that’s important, but also how organizations use that data. I differentiate large data, which just focuses on quantity of data, from Big Data that is multidimensional. That is to say, it links data from one source to data from other sources to provide greater insight about a patient’s status or trends in public health. Big Data can be analyzed for insights that lead to better decisions and strategic actions. The act of gathering and storing large amounts of information for eventual analysis is nothing new, but our ability to harness this data is rapidly improving.”

“Big Data challenges our traditional approach to organizing and interpreting data, namely garbage in, garbage out,” continues Dr. Kaufman. “Big Data indiscriminately takes in lots of data but finds patterns, which may or may not have significance. Our judgment is needed to relate observations with outcomes, and to evaluate which associations are coincidental and which are causal.”

How Labs Have Used Data Analytics

“Harvard Business Review identified ‘data scientists’ as the ‘sexiest job of the 21st century’,”2 notes Dr. Kaufman. “In fact, laboratory professionals have been masters of generating test results and large data for many decades. Collectively, we generate billions of test results each year and a large proportion of hospital information systems are focused on laboratories and laboratory data. There is more expertise for evaluating data among laboratorians than in any other group of healthcare professionals. We had our start by looking at test results, proficiency testing, quality control data, patient test result distributions, and method evaluations. We have been using data for determining root causes of failures, defects, and opportunities.

“In addition to those of us engaged in pathology and laboratory medicine, there are plenty of clinical researchers who publish studies with statistical analyses. More recently, with the advent of Next-Generation Sequencing, we have experts looking at complex genetic associations. We work with public health agencies, health plans, and others to provide insights into the health of populations.

“There is still an unmet need for aggregating patient data across different systems and even integrating activity, sleep, diet, and stress monitors to provide insights into patient care. There are more than 300 self-tracking devices on the market and we have enormous opportunities to harness that data, pulling it in from where it resides in different devices, systems and databases.” 

From Data to Insights

The large datasets collected by laboratories provide a basis for understanding emerging diseases and tracking chronic conditions. “By aggregating and analyzing data from these large repositories we gain insights, which allow us to focus interventions where they are most needed and to monitor the success of different interventions. An example of how the analysis of aggregated data can provide insights into public health trends is illustrated by the Quest Diagnostics Health Trends™ reports.3 These are studies based on mining aggregated, de-identified data from tests performed in a wide range of clinical areas.

“In a study on LDL cholesterol we included 247 million LDL cholesterol test results from 105 million people over an 11-year period,”4 says Dr. Kaufman. “The findings on H1N1 Influenza were reported as the initial outbreak occurred.5 We had access to a large set of data that allowed us to report on results that were as recent as the previous week. The study in Diabetes Care looked at newly identified patients with diabetes in the states that expanded Medicaid versus the states that had not.”6

Another example of how Quest Diagnostics harnesses large quantities of data is the Drug Testing Index™,7 which is possibly the longest-standing clinical laboratory data analysis initiative of its kind. Government agencies have used these studies to better allocate resources and evaluate the success of programs. For the 2013 report, more than 125 million urine drug tests administered from 1988 to 2012 were reviewed.8

Aggregated Patient Data

Other examples of how large quantities of data can be harnessed are studies based on the massive databases of the National Cancer Institute’s Surveillance, Epidemiology and End Results (SEER) and SEER-Medicare. “These contain highly complex data with information on demographics and diagnosis, and, in the case of SEER-Medicare, additional information on Medicare recipients,” explains Dr. Kaufman. “A 2014 study based on SEER-Medicare found that 31% of patients with a diagnosis of low-risk prostate cancer and 48% of those with intermediate-risk prostate cancer received a staging bone scan despite the lack of support in medical guidelines.9 The bone scans also led to additional radiographs, CT scans, and MRIs in more than 20% of the patients. All this costs money and anxiety with no demonstrated clinical benefit.”

Challenges to Overcome

A growing number of examples show how healthcare is using big data to gain valuable insights, but they are still the exception, not the norm. For the use of big data to become more widespread, Dr. Kaufman believes those working in the field must address certain challenges.

“What will we need to move to the next phase of healthcare data analytics?” he asks. “This was summarized nicely in a Harvard Business Review article at the end of 2014.10 Data integration remains a huge challenge. Even with LOINC codes that define laboratory results, we have too many LOINC codes and our data structure makes integration an enormous challenge. The National Institutes of Health launched the Big Data to Knowledge initiative (BD2K) to enable the biomedical research community to better access and manage Big Data. We need the same sort of leadership to better define standards for the laboratory community and those who use our data.

“Generating new knowledge means applying predictive analytics. Other industries have applied machine learning and visual graphics to gain new insights. Natural language processors that can also capture unstructured data such as that found in most tissue pathology reports will help in evaluating data. Memorial Sloan Kettering Cancer Center and others are teaming up with IBM Watson Health to sift through mountains of data to help physicians come to the right diagnosis more quickly and to provide the most likely effective interventions at the right time.

“The biggest challenge is getting people to respond to this new knowledge and to change their behavior. Financial incentives and disincentives that are part of the new landscape in healthcare may be the most important driver of change. Payers like CMS and health plans are pushing providers to demonstrate improved outcomes and to bend the cost curve. They will not be successful unless they improve how they capture and analyze the data they generate.”



  1. Big Data: the next frontier for innovation, competition and productivity. McKinsey Global Institute. May 2011. Accessed on March 22, 2016.
  2. Davenport TH, Patil DJ. Data Scientist: the sexiest job of the 21st century. Harvard Bus Rev. October 2012. Accessed on March 22, 2016.
  3. Quest Diagnostics Health Trends™. Accessed on March 22, 2016.
  4. Kaufman HW, Blatt AJ, Huang X, Odeh MA, Superko HR (2013) Blood Cholesterol Trends 2001–2011 in the United States: Analysis of 105 Million Patient Records. PLoS ONE 8(5): e63416.
  5. Quest Diagnostics Health Trends™ – H1N1 Influenza. Accessed on March 22, 2016.
  6. Kaufman HW, Chen Z, Fonseca VA, McPhaul MJ. Surge in Newly Identified Diabetes Among Medicaid Patients in 2014 Within Medicaid Expansion States Under the Affordable Care Act. Diabetes Care. Published Ahead of Print, published online March 22, 2015. Accessed on March 22, 2016.
  7. Quest Diagnostics Drug Testing Index™. Accessed on March 22, 2016.
  8. Drug Use Among American Workers Declined 74% Over Past 25 Years, Finds Unprecedented Analysis of More Than 125 Million Workplace Urine Drug Tests. Accessed on March 22, 2016.
  9. Falchook AD, Salloum RG, Hendrix LH, Chen RC. Use of bone scan during initial prostate cancer workup, downstream procedures, and associated Medicare costs. Int J Radiat Oncol Biol Phys. 2014 Jun 1;89(2):243-8.
  10. Shah ND, Pathak J. Why Health Care May Finally Be Ready for Big Data. Harvard Bus Rev. December 03, 2014. Accessed on March 22, 2016.

Released on Wednesday, July 06, 2016