Wednesday, May 3, 2017


On the Wisdom of Crowds:
Collective Predictive Analytics

“All great lies have a seed of truth” (James Cottrell, personal communication, 2004). In 1907, Sir Francis Galton (1822-1911), a British statistician whose body of research focused on human intelligence and who also happened to be Charles Darwin’s cousin, reported that at a livestock fair in Plymouth, where attendees paid to guess the weight of an ox, the aggregate of all guesses was remarkably close to the ox’s actual weight (Galton, 1907; Ball, 2014; Gega, 2000). Author James Surowiecki resurrected this observation for his 2005 book, The Wisdom of Crowds (Surowiecki, 2005).

While Galton is often considered the “father” of modern collective intelligence based on averaging, his 1907 publication in the journal Nature focused on how well the crowd predicted the median, which he favored because it is more robust to extreme guesses. In the event, the crowd’s mean was within one pound of the ox’s weight, while its median was within nine pounds. Responding to a reader’s letter, Galton explained his preference with more specificity: in small sets, excluding a single measurement can greatly alter the mean while affecting the median much less, so he found collective intelligence most applicable to predicting medians rather than means (Galton, Letters to the editor: The ballot box, 1907). Recent re-examination of Galton’s data, moreover, indicates errors in the original calculations: the 800-strong crowd’s mean estimate was not merely within one pound, but was the exact weight of the ox (Wallis, 2014).
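
Galton’s robustness point is easy to demonstrate. The following is a minimal Python sketch using synthetic guesses (the numbers are illustrative, not Galton’s raw data); it shows how excluding a single extreme guess from a small set shifts the mean far more than the median:

```python
# A minimal simulation (synthetic data, not Galton's): dropping one extreme
# guess from a small sample shifts the mean far more than the median.
import random
import statistics

random.seed(42)
TRUE_WEIGHT = 1198  # Galton's reported weight of the ox, in pounds

# A small crowd of 15 guesses centered on the truth, plus one wild outlier.
guesses = [round(random.gauss(TRUE_WEIGHT, 60)) for _ in range(15)] + [2400]

mean_all = statistics.mean(guesses)
median_all = statistics.median(guesses)

# Exclude the single extreme measurement, as in Galton's small-set concern.
trimmed = [g for g in guesses if g != 2400]
mean_trimmed = statistics.mean(trimmed)
median_trimmed = statistics.median(trimmed)

print(f"mean:   {mean_all:.1f} -> {mean_trimmed:.1f} (shift: {abs(mean_all - mean_trimmed):.1f} lb)")
print(f"median: {median_all:.1f} -> {median_trimmed:.1f} (shift: {abs(median_all - median_trimmed):.1f} lb)")
```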

This theory of “collective intelligence” has been studied and analyzed over the 110 subsequent years in an effort to establish how, when, and under what circumstances it predicts accurately or inaccurately. Critics of Surowiecki’s popularizing review of collective intelligence point out that crowds are skilled at optimizing but less skilled at innovation or creativity (Lanier, 2010). Moreover, collective intelligence is poor at defining the right question, often the most important aspect of inquiry, and at questions whose answers lie on a spectrum or scale (Lanier, 2010).

One example of collective or aggregated intelligence’s predictive failure is in economic measurements. Experts who opine on measurements of economic growth are often incorrect; moreover, the error is compounded because the prediction sets an expectation that, when missed, causes negative reactions in financial markets (Cassino, 2016). The failure of collective predictive intelligence in financial markets may be largely due to this double action: the crowd sets an expectation, and inaccuracy relative to that expectation makes markets respond disproportionately. While the origins and dynamics of irrational markets and “black swan” events are beyond the author’s scope here, these failings of collective intelligence may be an early causal event.

Other evidence suggests that collective intelligence can be effective creatively, though mostly when applied to brainstorming-type activities among a cohort of experts. For example, in a 2011 contest in the Harvard Medical School community, 40,000 faculty, staff, and students competed to pose the most important questions (what is unknown but needs to be known) to cure type 1 diabetes, with impressive results (Harvard Medical School, 2011).

For now, the lessons learned regarding collective intelligence appear to be these: it works best when an intellectually diverse cohort of experts answers a predefined question and focuses on optimization around medians or on brainstorming (Ball, 2014). It works worst when the crowd thinks alike, includes many non-experts, faces a spectrum or scale of possible answers, or produces a work product that becomes the basis for many future decisions, on which its error can be compounded.

Works Cited

Ball, P. (2014, July 8). ‘Wisdom of the crowd’: Myths and realities. BBC Future. Retrieved from http://www.bbc.com/future/story/20140708-when-crowd-wisdom-goes-wrong

Cassino, D. (2016, July 8). The ‘wisdom of the crowd’ has a pretty bad track record at predicting jobs reports. Harvard Business Review. Retrieved from https://hbr.org/2016/07/the-wisdom-of-the-crowd-has-a-pretty-bad-track-record-at-predicting-jobs-reports

Galton, F. (1907). Letters to the editor: The ballot box. Nature. Retrieved from http://galton.org/cgi-bin/searchImages/galton/search/essays/pages/galton-1907-ballot-box_1.htm

Galton, F. (1907). Vox populi. Nature, 450-451.

Gega, S. (2000, May). Sir Francis Galton. Retrieved from Muskingum College: http://muskingum.edu/~psych/psycweb/history/galton.htm

Harvard Medical School. (2011, April 6). The wisdom of crowds: Contest yields innovative strategies for conquering Type 1 diabetes. Retrieved from Harvard Medical School: https://hms.harvard.edu/news/wisdom-crowds-4-6-11

Lanier, J. (2010). You Are Not a Gadget: A Manifesto. London: Allen Lane.

Surowiecki, J. (2005). The Wisdom of Crowds: Why the Many Are Smarter Than the Few and How Collective Wisdom Shapes Business, Economies, Societies and Nations. New York City: Anchor Books.

Wallis, K. (2014). Revisiting Francis Galton's forecasting competition. Statistical Science, 420-424.

Tuesday, January 17, 2017

Big Data will Slow, Not Accelerate, Discovery

Those of us living between 1984 and 2020 are witnessing the largest and most rapid cycle of creation since the Big Bang: a single 36-year window competing with the 13.8 billion years of existence that humans know of. This creation guides what information is available to us, our options, and our choices in every area of life, thousands of times every day. It can determine life or death, the transference of wealth, the geopolitics of Earth, and the expansion of knowledge. We are its creators, and we can neither stop its creation nor know it in any tangible way. It is: data.

So, what happened? How did this come about? What made Big Data big? It has largely come about because of the ubiquity of processors in the late 20th century; the creation and growth of the Internet (from roughly 1,000 users in 1984 to 2.7 billion today, half of whom are on Facebook); and the miniaturization of processors into smart devices like phones and watches. Mostly, though, the exponential growth of data is now the result of unstructured, NoSQL-style databases (e.g., MongoDB) built to hold all our digital interactions and behaviors. This unstructured behavioral data currently accounts for about 75% of the data being created. But we haven’t seen anything yet, because data growth will explode even more over the next 15 years with the Internet-of-Things. (Wall, 2014)
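
To make “unstructured behavioral data” concrete, here is a minimal sketch of the kind of schemaless event document such databases hold; the field names are hypothetical illustrations, not any particular product’s schema:

```python
# A hypothetical behavioral event as a schemaless document, of the kind a
# document store such as MongoDB accepts without a predefined table layout.
import json

event = {
    "user_id": "u-8842",               # all fields here are hypothetical
    "timestamp": "2017-01-17T09:14:05Z",
    "action": "product_view",
    "device": {"type": "phone", "os": "android"},
    "context": {"referrer": "search", "session_clicks": 7},
}

# No schema migration is needed to add a brand-new field to the next event,
# which is part of why behavioral data accumulates so easily.
event["geo"] = {"lat": 41.88, "lon": -87.63}

print(json.dumps(event, indent=2))
```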

So, how big is “Big?”  According to IBM, in 2012 we created 2.5 quintillion bytes (roughly 2.5 exabytes) of data per day. At this rate, according to another much-cited statistic attributed to IBM but originally written by Åse Dragland of the Norwegian think-tank SINTEF in 2013, 90% of all the data in the world had been created in the prior two years. (Dragland, 2013) However, this growth rate, the speed of data creation itself, is accelerating. According to research data scientist Richard Ferres of the Australian National Data Service (ANDS), we are creating data 10x faster every two years. (Ferres, 2015) In other words, starting from a relative speed of 1 in 1985, we reached 1x10^15 in 2015 (one quadrillion “miles per hour”), and in 2017 our speed of data creation is 1x10^16 (ten quadrillion “miles per hour”).

If that acceleration weren’t fast enough, we will soon be creating data far faster still because of the Internet-of-Things (IoT). The IoT is the collective name for the billions of devices being embedded with sensors to communicate data across networks: think of the “smart” refrigerator by Samsung that tracks the groceries inside it, or the car, home alarm system, or baby monitor you can control from your mobile phone. Technology research group Gartner estimates there were 6.4 billion such devices or sensors online in 2016, (Gartner, 2015) and competitor research group IDC estimates there will be 30 billion by 2020. (IDC, 2014) Recall that there are approximately 2.7 billion Internet users, so the prediction is that the number of data creators will increase by roughly 10x within the next three years alone. Simplistic math would suggest that 10x the number of data creators, each accelerating at 10x every two years, may mean that within 3-4 years the speed of our annual data creation will accelerate 100x every two years.
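
To see what that compounding implies, here is a minimal Python sketch of the arithmetic; the growth factors are the round-number assumptions cited above, not measurements:

```python
# Back-of-the-envelope compounding: the "relative speed" of data creation,
# indexed to 1 in 1985, growing 10x every two years, then 100x every two
# years once the ~10x jump in IoT data creators lands (assumed here ~2017).
rate = 10      # 10x faster every two years (the article's assumption)
speed = 1.0    # relative speed of data creation in 1985
for year in range(1985, 2023, 2):
    print(f"{year}: relative speed of data creation = {speed:.0e}")
    if year >= 2017:
        rate = 100  # ~10x more IoT data creators => ~100x every two years
    speed *= rate
```

The output reproduces the figures above: a relative speed of 10^15 in 2015, 10^16 in 2017, and 100x growth every two years thereafter.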

But these numbers are at such a scale as to make them difficult for human brains to understand or imagine. A single gigabyte of plain text is roughly 10 yards (9 meters) of average-length books on a shelf. The 2.5 exabytes (2.5 billion gigabytes) we collectively created every day in 2012 therefore works out to roughly 23 million kilometers (14 million miles) of shelved books, enough to circle the Earth at the equator more than 500 times, every single day. And if, as predicted, data creation accelerates another 10x by 2018, and 100x every two years as suggested above by 2020, those shelves lengthen by the same factors.
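
The shelf arithmetic is a simple unit conversion; the sketch below makes the assumed factors explicit (the 10-yards-of-books-per-gigabyte figure is a rough rule of thumb, not a measured constant):

```python
# Illustrative unit conversion only; the shelf factor is a rule of thumb.
GB_PER_DAY_2012 = 2.5e9          # IBM's 2.5 quintillion bytes/day ~ 2.5 billion GB/day
METERS_OF_SHELF_PER_GB = 9.14    # ~10 yards of average-length books per gigabyte
EARTH_EQUATOR_KM = 40_075

shelf_km_per_day = GB_PER_DAY_2012 * METERS_OF_SHELF_PER_GB / 1000
print(f"2012: ~{shelf_km_per_day:,.0f} km of shelved books per day")
print(f"      ~{shelf_km_per_day / EARTH_EQUATOR_KM:,.0f} trips around the equator per day")
```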

Imagine if meaningful knowledge or discovery in Big Data were diamonds in the Earth. To mine or find them, we have to collect tens of thousands of cubic yards of soil. Then, someone comes along with an invention that enables us to collect billions of cubic yards of soil, premised on the theory that we will find orders of magnitude more diamonds in orders of magnitude more dirt. Maybe. But, for certain, it makes the mission of the diamond miners (for us, the data scientists) orders of magnitude harder too.

Worse yet, unless and until we become proficient in its use, big-data statistics often creates more false knowledge than true knowledge. The most common thing a researcher does when trying to discover meaningful new relationships in this data is calculate correlations (e.g., every time X changes, Y also changes); however, these correlations are often “false” in the sense that we presume they are causal (X changing causes Y to change), leading to misinformation. Determining a causal relationship requires Bayesian statistics, a rather advanced statistical toolbox with which many data scientists, let alone executive decision makers, are unfamiliar. And the error-prone process doesn’t end there, because there are two major categories of Bayesian models: naïve (assuming data points function independently of each other) and network (assuming data points influence each other). Even when data scientists are familiar with Bayes, roughly half the time they apply the wrong variant. The bottom line is that most of the correlations and basic statistics people initially apply to Big Data give false or misleading information.
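
The correlation trap is easy to reproduce. In the minimal, self-contained Python sketch below (illustrative only), two independent random walks share no causal link at all, yet will often show a sizable Pearson correlation purely by chance:

```python
# Spurious correlation demo: two independent random walks have no causal
# relationship, yet frequently show a large Pearson r purely by chance.
import random

random.seed(7)

def random_walk(n):
    walk, pos = [], 0.0
    for _ in range(n):
        pos += random.gauss(0, 1)   # each step is independent noise
        walk.append(pos)
    return walk

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

x, y = random_walk(500), random_walk(500)
print(f"correlation between two unrelated series: r = {pearson(x, y):+.2f}")
```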

In our analogy, not only is our speed of creating Big Data increasing at an increasing rate, meaning we must move millions of cubic yards of soil one year, billions the second year, and tens of billions the third, but the soil we’re “processing” is riddled with fake diamonds. While often misattributed to physicist Stephen Hawking or US Librarian of Congress Daniel Boorstin, it was actually historian Henry Thomas Buckle, in the second volume of his 1861 series “History of Civilization in England,” who first observed: “the greatest enemy of knowledge is not ignorance, it is the illusion of knowledge.” (Buckle, 1861)

We don’t necessarily need bigger data, although we’re certainly going to get it. We need more meaningful data. The exponential amassing of data now underway, unprecedented in the history of humankind and about to accelerate even more, is creating more noise to sift through to find meaningful knowledge than before the Big Data era. It is becoming harder to identify what is important. The evolution of humankind via the discovery of knowledge, therefore, will be accelerated not by the gluttonous creation of ever-bigger data but by focusing on the most meaningful data, and on creating and sequestering it.

Works Cited

Buckle, H. T. (1861). An Examination of the Scotch Intellect During the 18th Century. In H. T. Buckle, History of Civilization in England (Vol. 2, p. 408). New York: D. Appleton & Co.
Dragland, Å. (2013, May 22). Big Data - For Better or Worse. Retrieved from SINTEF: www.sintef.no/en/latest-news/
Ferres, R. (2015, July 14). The Growth Curve of Data. Australia: Quora.
Gartner. (2015, November 10). Gartner Says 6.4 Billion Connected "Things" Will Be in Use in 2016, Up 30 Percent From 2015. Retrieved from Gartner: http://www.gartner.com/newsroom/id/3165317
IDC. (2014, April). The Digital Universe of Opportunities: Rich Data & The Increasing Value of the Internet-of-Things. Retrieved from IDC - EMC: https://www.emc.com/leadership/digital-universe/2014iview/internet-of-things.htm
Wall, M. (2014, March 4). Big Data: Are You Ready for Blast-Off? BBC News. Retrieved from www.bbc.com/news/business-26383058




Thursday, January 12, 2017

DHHS Clinical Guidelines Are Harming Patients 5-22% of the Time

This article summarizes two randomly selected clinical guidelines and juxtaposes each, in tabular format, against the 23-point rubric published by the Appraisal of Guidelines for Research & Evaluation (AGREE) organization. Both guidelines are searchable by ontology-type categories describing the audience they pertain to (e.g., age ranges, bodily systems, etc.). However, there is no sorting mechanism that would allow providers to find the relevant portion of a guideline out of the dozens of pages that make up each “guideline.”
In both cases, the National Guideline Clearinghouse maintained by the US Department of Health and Human Services was found to be vastly inferior to the commercial service UpToDate, largely because of the frequency of updating. A cursory review of the guidelines in the US Government clearinghouse indicated that many were more than several years old. A 2014 study published in the journal CMAJ evaluating the survival of guideline validity (i.e., how long guidelines remain valid without updating) found that 5% of guidelines were invalid after one year, 14% after two years, 19% after three years, and 22% after four years. (Martínez García et al., 2014) In other words, a significant portion of the clinical guidelines being propagated by the US Department of Health & Human Services are now invalid and may actually harm, not help, patients. Conversely, UpToDate has thousands of researchers and physicians peer-reviewing entries frequently to ensure it is, in fact, “up to date.”
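
The study’s survival rates also allow a rough, back-of-the-envelope estimate of how stale a clearinghouse is as a whole. The Python sketch below is illustrative only: the age profile is hypothetical, the invalidation rates are from the CMAJ study, and rates beyond the study’s four-year horizon are simply capped (which, if anything, understates the problem):

```python
# Rough estimator: given the share of guidelines that are invalid by age
# (Martínez García et al., 2014), what fraction of a clearinghouse with a
# given age profile is likely out of date? Ages below are hypothetical.
INVALID_BY_AGE_YEARS = {1: 0.05, 2: 0.14, 3: 0.19, 4: 0.22}

def fraction_invalid(ages):
    """ages: guideline ages in whole years; capped at the study's 4-year horizon."""
    total = 0.0
    for age in ages:
        capped = min(max(age, 1), 4)   # no data past 4 years, so cap (understates risk)
        total += INVALID_BY_AGE_YEARS[capped]
    return total / len(ages)

# Hypothetical clearinghouse age profile: many guidelines several years old.
sample_ages = [1, 2, 2, 3, 3, 4, 4, 4, 5, 6]
print(f"estimated share of invalid guidelines: {fraction_invalid(sample_ages):.0%}")
```
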
The clinical guideline for the treatment of chronic hepatitis B (NGC: 010903) is a 45-page standard written by the American Association for the Study of Liver Diseases in January 2016. This clinical standard follows a format similar at first impression to, but different in quality of content from, the adult sinusitis example below. It begins with a general summary of treatment (e.g., specific anti-viral therapy medication combinations). It continues by differentiating recommended recovery treatments for adults who have certain co-morbidities (e.g., viremia, pregnancy, etc.) or who are non-responsive to first-line medications. It follows a format with sections for recommended algorithms (none), risk assessment, design methodologies (similar to study-design methodologies), and 15 outcome-of-treatment considerations. (American Association for the Study of Liver Diseases, 2017)
The clinical guideline for the treatment of adult sinusitis (NGC: 010703) is a 70-page standard written by the American Academy of Otolaryngology and Head and Neck Surgery Foundation in September 2007 and revised in April 2015. The standard is broken down into four sections to be performed in sequence: (1) differential diagnoses (e.g., acute bacterial rhinosinusitis (ABRS)); (2) symptomatic relief goals; (3) medication choice (e.g., amoxicillin); and (4) recovery therapies (contingencies if primary treatment recommendations fail). While the sinusitis standard has the same categories of description as the standard for hepatitis B, the answers are often perfunctory and lack development or detail. (American Academy of Otolaryngology - Head and Neck Surgery Foundation, 2017)

Works Cited

American Academy of Otolaryngology - Head and Neck Surgery Foundation. (2017, January 8). Clinical practice guideline (update): adult sinusitis. Retrieved from US DHHS: AHRQ National Guideline Clearinghouse: https://www.guideline.gov/summaries/summary/49207/clinical-practice-guideline-update-adult-sinusitis
American Association for the Study of Liver Diseases. (2017, January 8). AASLD guidelines for treatment of chronic hepatitis B. Retrieved from US DHHS: AHRQ National Guideline Clearinghouse: https://www.guideline.gov/search?f_Clinical_Specialty=Infectious+Diseases&fLockTerm=Infectious+Diseases&f_Meets_Revised_Inclusion_Criteria=yes&page=1
Martínez García, L., et al. (2014). The validity of recommendations from clinical guidelines: A survival analysis. CMAJ, 1211–1219.
National Quality Forum. (2017, January 8). Abdominal Aortic Aneurysm (AAA) Repair Mortality Rate (IQI 11). Retrieved from National Quality Forum: http://www.qualityforum.org/QPS/QPSTool.aspx#qpsPageState=%7B%22TabType%22%3A1,%22TabContentType%22%3A2,%22SearchCriteriaForStandard%22%3A%7B%22TaxonomyIDs%22%3A%5B%2216%3A389%22%5D,%22SelectedTypeAheadFilterOption%22%3Anull,%22Keyword%22%3A%22%22,%22Page
National Quality Forum. (2017, January 8). Accidental Puncture or Laceration Rate (PDI #1). Retrieved from National Quality Forum: http://www.qualityforum.org/QPS/QPSTool.aspx#qpsPageState=%7B%22TabType%22%3A1,%22TabContentType%22%3A2,%22SearchCriteriaForStandard%22%3A%7B%22TaxonomyIDs%22%3A%5B%2216%3A389%22%5D,%22SelectedTypeAheadFilterOption%22%3Anull,%22Keyword%22%3A%22%22,%22Page






Monday, January 9, 2017

The Unintended Consequences of Pay-for-Performance Healthcare

While pay-for-performance (P4P) is logical and all the rage, a deeper, more critical analysis supports a contrarian view. From that analysis, three concerns come to mind: (1) can incentives cause a reduction in intrinsic motivation; (2) how often, when, and why do incentives lead to abuse and corruption; and (3) why do we (and the government) assume that providers, who chose an altruistic career with a relatively low return on the time and money their training required, would be susceptible to financial incentives anyway?
In a 2013 study published in the journal Health Psychology, researchers examined the ability of financial incentives to undermine or “crowd out” intrinsic motivation. Ironically, they found that financial incentives for improved health behavior did not crowd out intrinsic motivation, but only because the intrinsic motivation of the patients being financially incentivized was so low to begin with. This result suggests the converse may be true: namely, that those with high intrinsic motivation (e.g., providers) may have that intrinsic, altruistic motivation crowded out by financial incentives. (Promberger & Marteau, 2013)
Moreover, any financial incentive that coerces providers to behave in a certain way can cross a free-will Rubicon: once providers agree to modify their treatment in exchange for money, they are being controlled by the incentive instead of by their best natural medical judgment. The New York Times examined this very issue in 2014, invoking the term “moral licensing,” described as when “the physician is able to rationalize forcing or withholding treatment, regardless of clinical judgment or patient preference, as acceptable for the good of the population.” (Hartzband & Groopman, 2014)
Finally, as the author has noted in other writing for Northwestern University, arguably the most comprehensive meta-analysis of pay-for-performance in healthcare, conducted by the RAND Corporation in 2016 and examining 49 studies published in peer-reviewed journals, found that pay-for-performance had minimal impact on the quality of care. (Damberg et al., 2016) While that may be because institutions chose the wrong metrics, definitions, or baselines for comparison, it may also be that providers took their Hippocratic Oath seriously and largely try to stay focused on each patient’s best interests without outside interference, coercion, or influence.
One viable and promising solution, in the form of a new standard of care, is decision-support systems. While such systems have been around for decades, improved processing power, artificial intelligence, cloud computing, the Internet-of-Things, and patient-generated mHealth data create a confluence wherein objective standards of care can be recommended for every patient. Beyond decision-support systems, but using their systems and methods, is personalized genomic medicine. One strategic goal of our work at Bioinformatix’ Rx&You is to collect Total Satisfaction Quality (TSQ) data from large cohorts of patients regarding the efficacy of their medication regimens. This information, when combined with adverse-event histories, cost data, and patients’ genomic variants, can, we believe, create a “Codex” via comparative effectiveness research (e.g., which medications work best for whom and when, which are the most dangerous, and which are the best value).
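
As a thought experiment only, the sketch below shows the kind of join such a “Codex” implies. Every field name, weight, and formula here is a hypothetical assumption for illustration, not the actual Rx&You or Codex design:

```python
# A hypothetical sketch of joining satisfaction, safety, cost, and genomic
# signals into one comparative-effectiveness score. Field names, weights,
# and the formula are illustrative assumptions, not the Rx&You design.
from dataclasses import dataclass

@dataclass
class MedicationRecord:
    drug: str
    tsq_score: float           # patient-reported Total Satisfaction Quality, 0-1
    adverse_event_rate: float  # adverse events per patient-year, 0-1
    monthly_cost_usd: float
    genomic_match: bool        # carries a variant associated with good response

def codex_value(rec: MedicationRecord, max_cost: float = 500.0) -> float:
    """Toy comparative-effectiveness score: higher means better value."""
    efficacy = rec.tsq_score * (1.25 if rec.genomic_match else 1.0)
    safety = 1.0 - rec.adverse_event_rate
    affordability = 1.0 - min(rec.monthly_cost_usd / max_cost, 1.0)
    return 0.5 * efficacy + 0.3 * safety + 0.2 * affordability

cohort = [
    MedicationRecord("drug_a", 0.82, 0.04, 35.0, True),
    MedicationRecord("drug_b", 0.74, 0.12, 220.0, False),
]
for rec in sorted(cohort, key=codex_value, reverse=True):
    print(f"{rec.drug}: value = {codex_value(rec):.2f}")
```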

Works Cited

Damberg, C. L., et al. (2016). Measuring Success in Health Care Value-Based Purchasing Programs. Santa Monica, CA: RAND Corporation.
Hartzband, P., & Groopman, J. (2014, November 18). How Medical Care is Being Corrupted. The New York Times. Retrieved from https://www.nytimes.com/2014/11/19/opinion/how-medical-care-is-being-corrupted.html?_r=0
Promberger, M., & Marteau, T. M. (2013). When Do Financial Incentives Reduce Intrinsic Motivation? Comparing Behaviors Studied in Psychological and Economic Literatures. Health Psychology, 950–957.