By Haroon Chaudhry MD
President and CEO, Xenon Health
The expansion of electronic health records across hospitals and hospital systems, other healthcare institutions, and physician offices has created a substantial body of data that can be used to streamline healthcare delivery and improve patient outcomes. Machine learning involves computer recognition of patterns in data sets and the use of previous computations to produce dependable decisions; in other words, the computer learns without being explicitly programmed to perform specific tasks. Machine learning can be leveraged in a variety of healthcare settings to assist or improve decision making.
Natural language processing, or NLP, is the manipulation of human language, including text, by software. Deep learning applications attempt to simulate the neuronal activity of the human brain and recognize digital representations of images, sound, and other information. Deep learning is being used to mine semantic interactions between radiologic images and reports stored in PACS, or picture archiving and communication systems. One study used natural language processing to mine 200,000 images and match them with their descriptions in an automated fashion. Semantic diagnostic knowledge can be derived by mapping patterns between radiologists' text reports and their related images. The objective is to extract and associate radiologic images with clinically semantic labels via interleaved text/image data mining and deep learning on PACS databases. Because hospital PACS hold enormous collections of radiology images and reports, this type of initiative has significant implications for future diagnostic protocols, potentially setting the stage for an automated system of quality control in diagnostic imaging interpretation.
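A first step in this kind of pipeline is turning free-text radiology reports into semantic labels that can be paired with their images. The sketch below is a deliberately simplified, keyword-based stand-in for that report-mining step (real systems use trained NLP models); the label vocabulary, phrases, and sample report are hypothetical.

```python
# Illustrative sketch: extracting coarse semantic labels from radiology
# report text so they can be paired with PACS images for training.
# The label vocabulary and the sample report are hypothetical examples.

FINDING_TERMS = {
    "cardiomegaly": ["cardiomegaly", "enlarged cardiac silhouette"],
    "effusion": ["pleural effusion", "effusion"],
    "normal": ["no acute findings", "unremarkable"],
}

def extract_labels(report_text):
    """Map free-text findings to a small set of semantic labels."""
    text = report_text.lower()
    return {label for label, phrases in FINDING_TERMS.items()
            if any(phrase in text for phrase in phrases)}

report = "Impression: Mild cardiomegaly with small right pleural effusion."
print(sorted(extract_labels(report)))  # ['cardiomegaly', 'effusion']
```

In a full system, these extracted labels would serve as weak supervision for a deep network trained on the corresponding images.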
In healthcare, computational risk stratification involves many technical challenges. However, the goal of using automated mining to improve clinical outcomes persists. Nosocomial infections are infections acquired during hospitalization; statistically, one in twenty-five patients in US acute care facilities acquires such an infection. Some studies have attempted to derive data-driven models that can assist in formulating effective prevention strategies. Such models are based on thousands of time-varying and time-invariant variables, and the complex temporal dependencies among these variables make constructing effective models significantly challenging. However, advances in computational power and the significantly expanding volume of health record data are allowing researchers to construct more accurate predictive models.
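At their simplest, such risk models combine time-invariant variables (such as age) with time-varying ones (such as days with an indwelling catheter) into a single risk score. The minimal sketch below uses a logistic function with made-up weights purely for illustration; real models are fit to data over thousands of variables, and nothing here is clinically derived.

```python
import math

# Hypothetical sketch of risk stratification: combine time-invariant
# (age) and time-varying (catheter days, recent antibiotics) variables
# into a logistic risk score. Weights are invented for illustration.

WEIGHTS = {"age": 0.02, "days_catheterized": 0.15,
           "recent_antibiotics": 0.8, "intercept": -4.0}

def infection_risk(age, days_catheterized, recent_antibiotics):
    """Return a probability-like nosocomial infection risk in (0, 1)."""
    z = (WEIGHTS["intercept"]
         + WEIGHTS["age"] * age
         + WEIGHTS["days_catheterized"] * days_catheterized
         + WEIGHTS["recent_antibiotics"] * recent_antibiotics)
    return 1.0 / (1.0 + math.exp(-z))

print(infection_risk(70, 5, 1))   # risk for one hypothetical patient
print(infection_risk(70, 10, 1))  # longer catheterization raises risk
```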
EHR phenotyping uses patient data to identify features that may represent computational clinical phenotypes. Such phenotypes can be used to identify high-risk patients and improve the prediction of morbidity and mortality. These computable phenotypes are similar in nature to decision trees. Sparse tensor factorization of multimodal patient data, transformed into count data, generates concise sets of sparse factors that medical professionals can recognize. Patients can then be treated as weighted composites of such factors.
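The factorization idea can be illustrated in two dimensions: a patient-by-feature count matrix is decomposed into non-negative factors, where each factor reads as a candidate "phenotype" over clinical features and each patient is a weighted composite of factors. The sketch below uses non-negative matrix factorization with Lee-Seung multiplicative updates as a simplified, matrix-valued stand-in for the sparse tensor factorization described above; the count data are randomly generated.

```python
import numpy as np

# Simplified stand-in for phenotyping via factorization of count data:
# V (patients x features) is approximated by W @ H, where each row of H
# is a non-negative "phenotype" over features and each row of W gives a
# patient's loadings on those phenotypes. Data are synthetic counts.

rng = np.random.default_rng(0)
V = rng.poisson(1.0, size=(20, 8)).astype(float)  # patients x features

k, eps = 3, 1e-9                 # number of phenotypes; zero-guard
W = rng.random((20, k))          # patient loadings on phenotypes
H = rng.random((k, 8))           # phenotype definitions over features
err0 = np.linalg.norm(V - W @ H)

for _ in range(200):             # Lee-Seung multiplicative updates
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)

err = np.linalg.norm(V - W @ H)
print(err0, "->", err)           # reconstruction error shrinks
```

Non-negativity is what keeps the factors readable: each phenotype can only add features, never subtract them, which is why clinicians can inspect the resulting factors directly.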
Deep topic modeling involves modeling data characterized by vectors of word counts. Examples of deep topic models include the hierarchical Dirichlet process (HDP) and the nested HDP (nHDP). In DP-based models, the general mechanism is parameterized by distributions over topics, with each topic characterized by a distribution over words. In non-DP-based models, modules are parameterized by a deep hierarchy of binary units. The deep Poisson factor analysis (DPFA) model characterizes documents by distributions over topics and words while utilizing a deep architecture based on binary units.
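The core parameterization shared by these models is easy to state: each topic is a distribution over words, each document a distribution over topics, and word counts arise from the resulting mixture. The toy sketch below shows that generative view with an invented clinical vocabulary and hand-set topic values; it illustrates the representation only, not HDP/nHDP inference.

```python
import numpy as np

# Toy illustration of the topic-model parameterization: topics are
# distributions over words, a document is a distribution over topics,
# and observed word counts are drawn from the mixture. Vocabulary and
# probabilities are invented examples, not fitted values.

rng = np.random.default_rng(1)
vocab = ["fever", "cough", "fracture", "cast", "insulin", "glucose"]
topics = np.array([          # rows: topics; columns: word probabilities
    [0.45, 0.45, 0.02, 0.02, 0.03, 0.03],   # "respiratory" topic
    [0.02, 0.02, 0.45, 0.45, 0.03, 0.03],   # "orthopedic" topic
    [0.03, 0.03, 0.02, 0.02, 0.45, 0.45],   # "metabolic" topic
])
doc_topic = np.array([0.7, 0.1, 0.2])       # one document's topic mix

word_dist = doc_topic @ topics              # document's word distribution
counts = rng.multinomial(50, word_dist)     # sample a 50-word document
print(dict(zip(vocab, counts)))
```

Inference runs this process in reverse: given only the count vectors, the model recovers the topics and per-document mixtures.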
Personalized medicine, also known as precision medicine, involves the integrative analysis of heterogeneous data from a patient's history to enhance care. The objective is to utilize clinical markers that are collected when a patient first presents and then longitudinally during follow-up encounters. The goal for machine learning in precision medicine is to use latent factors that influence disease expression, as opposed to standard regression models that rely on observed characteristics alone to explain variability. A prognostic tool that integrates various kinds of longitudinal data can help identify patients at greatest risk of disease progression and guide an appropriate course of action. A challenge for both the standard regression model and the latent factor model is that data collected during routine clinical visits can be inconsistent. In addition, predicting disease patterns can be difficult regardless of the model, due to unknown patient factors such as genetic mutations.
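One way to see the difference from plain regression is a latent-factor view of repeated measurements: a patient's longitudinal lab values are treated as noisy observations of an unobserved severity factor, and a shrinkage (empirical-Bayes) estimate pools the patient's own visits with the population. The sketch below uses synthetic numbers and assumed variances purely for illustration.

```python
# Sketch of the latent-factor idea: repeated measurements are noisy
# observations of an unobserved patient-specific severity. The posterior
# mean pools the patient's visits with the population mean; with more
# visits, the estimate trusts the patient's own data more. All numbers
# (population mean/variance, noise variance) are assumed for illustration.

POP_MEAN, POP_VAR, NOISE_VAR = 100.0, 25.0, 16.0

def latent_severity(visits):
    """Posterior mean of the latent factor given longitudinal values."""
    n = len(visits)
    precision = 1 / POP_VAR + n / NOISE_VAR
    return (POP_MEAN / POP_VAR + sum(visits) / NOISE_VAR) / precision

print(latent_severity([110.0]))       # one visit: pulled toward 100
print(latent_severity([110.0] * 6))   # six visits: closer to 110
```

This shrinkage behavior is what lets a longitudinal model remain stable when visit data are sparse or inconsistent, where a regression on a single observed value would not.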
Batch reinforcement learning methods are being used to aid in medical decision making. Batch RL is a reinforcement learning setting in which the set of transitions sampled from the system is fixed in advance. The goal of the learning system is then to derive an optimal policy from this sample batch.
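A minimal concrete picture of batch RL is fitted Q iteration over a fixed transition set: the learner never interacts with the system again, only sweeps repeatedly over the stored (state, action, reward, next state) tuples. The sketch below uses a toy two-state treatment example with invented states, actions, and rewards, not clinical data.

```python
from collections import defaultdict

# Sketch of batch RL: the transition batch below is fixed (no further
# interaction with the patient/system), and tabular fitted Q iteration
# is run over it to derive a treatment policy. All states, actions,
# and rewards are toy values invented for illustration.

# (state, action, reward, next_state); None marks a terminal outcome
batch = [
    ("sick", "treat_a", 0.0, "recovering"),
    ("sick", "treat_b", 0.0, "sick"),
    ("recovering", "treat_a", 1.0, None),
    ("recovering", "treat_b", 0.5, None),
]
ACTIONS = ("treat_a", "treat_b")
gamma = 0.9
Q = defaultdict(float)

for _ in range(50):                     # repeated sweeps over the batch
    for s, a, r, s2 in batch:
        target = r if s2 is None else r + gamma * max(
            Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] = target              # exact tabular fit to the batch

policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)])
          for s in ("sick", "recovering")}
print(policy)  # {'sick': 'treat_a', 'recovering': 'treat_a'}
```

Real clinical applications replace the table with a function approximator and the toy rewards with outcome measures, but the defining constraint is the same: the policy must be derived from the fixed batch alone.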
In medicine, human decision making weighs a variety of competing objectives, including not just the effectiveness of a treatment but also factors such as potential side effects and cost. As algorithmic studies in machine learning continue, the objective is to optimize these factors in the treatment decision-making process to enhance patient outcomes and deliver value to the healthcare system.