Predicting Hospital Readmission using Machine Learning

Electronic medical records contain text composed by hospital employees; this text often describes medical and socio-economic information that appears nowhere else in the electronic medical record. This data has historically been ignored by data analysts, as unstructured text is uniquely challenging to analyze: phrasing differs across authors and misspellings and punctuation errors are frequent.
The advent of GPU computing and new research in machine learning have given us new tools to improve healthcare through analysis of this text. We use these tools to predict and prevent unplanned hospital readmissions; when a patient is readmitted to the hospital, they suffer the emotional and physical stress of a prolonged health problem, and the hospital incurs a financial penalty imposed by the Centers for Medicare & Medicaid Services.
By using a combination of Word2Vec (developed by Google) and a convolutional neural network on our data, we were able to develop a model that predicts 30-day readmission more precisely than other well-established models.
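To make the architecture concrete, the following is a minimal sketch of the forward pass such a model performs: each word of a note is replaced by its embedding vector, a convolutional filter slides over consecutive words, the strongest response is kept (max-pooling), and a logistic output turns the pooled features into a readmission probability. Everything here is an illustrative assumption rather than our production model: the function names, the two-dimensional toy embeddings, and the hand-set filter weights are hypothetical. In practice the embeddings would come from Word2Vec trained on clinical notes and the filter weights would be learned from labeled admissions.

```python
import math

def embed(tokens, vectors, dim):
    """Look up each token's vector; unknown tokens map to a zero vector."""
    return [vectors.get(token, [0.0] * dim) for token in tokens]

def conv1d_max(embedded, kernel, bias):
    """Slide one filter over consecutive word windows, apply ReLU,
    and keep the largest response (max-pooling over the note)."""
    width = len(kernel)
    best = 0.0  # ReLU floor; also the result when the note is shorter than the filter
    for start in range(len(embedded) - width + 1):
        total = bias
        for k in range(width):
            total += sum(w * x for w, x in zip(kernel[k], embedded[start + k]))
        best = max(best, total)
    return best

def predict_readmission(tokens, vectors, filters, out_weights, out_bias, dim):
    """Return a probability from pooled filter responses via a logistic output."""
    embedded = embed(tokens, vectors, dim)
    features = [conv1d_max(embedded, kernel, bias) for kernel, bias in filters]
    logit = out_bias + sum(w * f for w, f in zip(out_weights, features))
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical toy values, chosen by hand for illustration only.
toy_vectors = {"fall": [1.0, 0.0], "alone": [0.0, 1.0]}
# One filter spanning two words; it responds to "fall" followed by "alone".
toy_filters = [([[1.0, 0.0], [0.0, 1.0]], 0.0)]

note = "recent fall alone at home".split()
probability = predict_readmission(
    note, toy_vectors, toy_filters, out_weights=[1.0], out_bias=-2.0, dim=2
)
print(probability)
```

A real model would use many filters of several widths and embeddings with hundreds of dimensions; the single filter above only shows how a phrase-level pattern in free text can become a numeric feature.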

Lab Reference Studies

Our original investigations during the invention of the Rothman Index were based on the associations between in-hospital patients’ clinical measurements and their risk of mortality one year after discharge. We discovered an increased risk of mortality associated with values of certain laboratory tests that are considered “normal” by the usual reference levels. This led us to further testing and to the proposal of a new methodology for determining reference levels based on clinical risks rather than on population norms, as has been customary. Abstracts include “A New Theory for Reference Intervals and Analyte Test Reporting based on Clinical Risks derived from Readily-Available EMR Data”, which was recognized as a “Best Abstract in the Informatics Division of the American Association of Clinical Chemistry”, and “High ‘Normal’ Potassium Poses Mortality Risk for All Patients”. Four papers are currently in process.
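The idea of a risk-based reference level can be illustrated with a short sketch. It is not the published methodology: the binning scheme, the function names, the 0.05 excess-risk tolerance, and the patient counts are all assumptions, and the records are synthetic. The sketch groups lab values into bins, computes one-year mortality per bin, and reports the contiguous range of bins whose mortality stays close to the lowest observed rate.

```python
import math
from collections import defaultdict

def mortality_by_bin(records, bin_width):
    """Map each bin index to its one-year mortality rate.
    records: iterable of (lab_value, died_within_one_year) pairs."""
    counts = defaultdict(lambda: [0, 0])  # bin index -> [deaths, patients]
    for value, died in records:
        index = math.floor(value / bin_width)
        counts[index][0] += 1 if died else 0
        counts[index][1] += 1
    return {index: deaths / patients for index, (deaths, patients) in counts.items()}

def risk_based_interval(rates, bin_width, max_excess):
    """Grow outward from the lowest-risk bin while the neighboring bin's
    mortality exceeds the minimum by no more than max_excess."""
    best = min(sorted(rates), key=lambda index: rates[index])
    low = high = best
    while low - 1 in rates and rates[low - 1] - rates[best] <= max_excess:
        low -= 1
    while high + 1 in rates and rates[high + 1] - rates[best] <= max_excess:
        high += 1
    return (low * bin_width, (high + 1) * bin_width)

def cohort(value, patients, deaths):
    """Build synthetic records: `patients` people with one lab value, `deaths` of whom died."""
    return [(value, i < deaths) for i in range(patients)]

# Synthetic data for illustration only; not drawn from any real patient population.
records = (
    cohort(3.2, 10, 4) + cohort(3.6, 10, 1) + cohort(4.2, 10, 1) + cohort(4.7, 10, 3)
)
rates = mortality_by_bin(records, bin_width=0.5)
interval = risk_based_interval(rates, bin_width=0.5, max_excess=0.05)
print(interval)
```

In this synthetic example the bin from 4.5 to 5.0 is left out of the interval because its mortality is higher than that of the lowest-risk bins, which is the kind of pattern a population-norm interval would not expose.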

Proactive Patient Acuity

Proactive Patient ACuity sTewardship (PACT) trial

The primary purpose is to evaluate (a) the difference in clinical outcomes, e.g., 30 day ReAdmissions (30 dReAdm) and supportive care consult rates, before and after implementation of the respective prompts in the Electronic Medical Record (EMR) that use Rothman Index (RI) monitoring thresholds for patients undergoing SMH hospitalist care, and (b) the 30 dReAdm rate in patients who continue to be monitored with the RI after discharge to the SMH-Nursing Rehabilitation Center. A historical cohort will be retrospectively matched with the prospective sample in the PACT trial. A secondary purpose of the study is to demonstrate that findings derived from RI monitoring protocols can also contribute to building a “learning health system” that leverages EMR data science methods to characterize clinical patterns preceding positive and adverse clinical events and to identify trends that can improve patient care and safety across the acute/post-acute care continuum.

A further secondary purpose is to conduct a sub-study testing the agreement between a registered nurse’s assessment and study participants’ own inventory of their body systems, obtained through a series of questions that identify signs and/or symptoms they are experiencing. A similar body system inventory is used to compute the RI. Independent living residents and residents undergoing rehabilitation care at Plymouth Harbor will be asked to participate. The goal is to transform the RI into a self-assessment tool that engages patients in tracking and reporting their condition. Incentivizing meaningful patient engagement may result in a bond between provider and patient that optimizes care and reduces unnecessary admissions to the hospital.
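One common way to quantify agreement between two raters is Cohen's kappa, which corrects the observed agreement for the agreement expected by chance. The sketch below is an illustration only: the sub-study protocol does not name a specific statistic here, and the function name, the 1/0 coding (finding present or absent per body system), and the ratings themselves are made up for the example.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Share of items on which the two raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected if each rater scored independently at their own base rates.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1.0 - expected)

# Made-up ratings for ten body systems: 1 = finding present, 0 = absent.
nurse = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
participant = [1, 1, 0, 1, 1, 0, 0, 0, 1, 1]
kappa = cohens_kappa(nurse, participant)
print(round(kappa, 3))
```

Here the raters agree on 8 of 10 systems, but because some of that agreement would occur by chance, the kappa value is lower than the raw 80% figure.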

Develop and Validate a Model to Identify Alzheimer’s Disease and other Dementias using Electronic Medical Record Data: A Feasibility Study

The Alzheimer’s Disease (AD) study is designed to develop and validate an index to identify undiagnosed patients and those at risk for developing AD and other dementias using EMR data collected at Sarasota Memorial Hospital since 1999. Secondary objectives are to gain a deeper understanding of the clinical correlates and disease progression of AD and to identify medicines not prescribed for AD that may have a protective or disease-modifying effect. We anticipate that applying our data science approach to hospitalized patients will reveal unknown clinical and pathophysiological correlates that provide insight into factors potentially involved in susceptibility or resilience to AD and other dementias. A better understanding of disease correlates may suggest strategies for primary and secondary prevention.


Specific aims are:

  1. Case finding, characterization and model building
    1. Identify patients with a current diagnosis of AD
    2. Characterize static and longitudinal patterns of pathophysiology and co-morbidities
    3. Isolate modifiable risks that are candidates for secondary prevention therapies
    4. Quantitative (e.g., demographics, vital signs, laboratory test results) and qualitative
      (e.g., clinical notes, physician orders) features will be employed to build a case-
      definition model
  2. Identify cognitive / functional resilience patterns in similar patients without
    diagnosis of AD
  3. Identify potential medicines for “repositioning” to blunt AD onset and/or

Data-driven Clinical Phenotyping of Hospitalized Patients with Multiple Chronic Conditions: A Natural EXperiment using Electronic Medical Records (MCC NatX Study)

Clinicians are encouraged to follow evidence-based guidelines in managing their patients’ conditions, yet they must frequently rely on guidelines designed for a single chronic condition. The presence of Multiple Chronic Conditions (MCC) creates many challenges for clinicians, including the need to decide what evidence to use in making clinical decisions and the need to consider patients’ context and personal preferences in those decisions. This study will examine the records of patients with MCC to test the feasibility of characterizing their clinical pathophysiological presentation using cross-sectional and longitudinal data. The aim is to encourage a change from an approach focused on single chronic diseases to an integrated approach that systematically generates practice-based evidence to inform quality improvement, clinical research, “institutional learning” about the population served, and on-going clinical practice guideline development and updates.
Capturing a more granular clinical pathophysiological presentation will lead to a better understanding of MCC groups and their clinical impact. Analyses of outcomes within and between MCC groups may reveal sub-groups at risk for different comorbidities, leading ultimately to the development of models that improve patient-centric diagnosis, prognosis, and prediction of treatment response. This study also provides an opportunity to examine the validity of the Rothman Index (RI) across a spectrum of patient characteristics and disease burden contexts. The RI was developed independently of demographics, medical history, diagnosis, and treatment regimens.

Specific aims are:
1. Identify and characterize the cross-sectional clinical pathophysiological presentation of MCC groups and examine associations with attendant outcomes for patients at least 18 years old.
2. Examine the longitudinal stability / instability of clinical pathophysiological presentation and outcome patterns in each patient undergoing multiple admissions in one of the five most prevalent MCC groups. The cross-sectional characterization (SA1) of the most recent admission will be the baseline.
3. Test the assumption that the excess 1-year mortality risk computed for each of the 26 variables included in the Rothman Index is invariant across age, sex, race, chronological periods of hospital care, and subgroups of patients experiencing various illnesses, including MCC.