Unsupervised Probabilistic Models for Sequential Electronic Health Records

Alan D. Kaplan; J Greene; Jean Louis Vincent; Priyadip Ray

doi:10.48550/arxiv.2204.07292

Abstract


We develop an unsupervised probabilistic model for heterogeneous Electronic Health Record (EHR) data. Using a mixture model formulation, our approach directly models sequences of arbitrary length, such as medications and laboratory results. This enables subgrouping of subjects and captures the dynamics underlying heterogeneous data types. The model consists of a layered set of latent variables that encode underlying structure in the data: subject subgroups at the top layer and unobserved states for sequences in the second layer. We train this model on episodic data from subjects receiving medical care in the Kaiser Permanente Northern California integrated healthcare delivery system. The properties of the trained model yield novel insight into these complex and multifaceted data. In addition, we show how the model can be used to analyze sequences that contribute to the assessment of mortality likelihood.
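The abstract describes a two-layer latent structure: a top-layer subgroup variable for each subject, and hidden states for each sequence in the second layer. It does not specify the sequence model. The sketch below assumes, purely for illustration, that each subgroup has its own discrete hidden Markov model (HMM) per sequence type. This is a "mixture of HMMs," not necessarily the authors' formulation. Under that assumption, a subject's likelihood is a weighted sum over subgroups of the product of per-sequence forward-algorithm likelihoods, so sequences of any length contribute naturally. The function names (`log_forward`, `subgroup_posterior`), the sequence type `"labs"`, and all parameter values are hypothetical.

```python
import numpy as np


def log_forward(obs, log_init, log_trans, log_emit):
    """Log-likelihood of one discrete sequence under an HMM (forward algorithm).

    obs: list of observed symbol indices (length >= 1, otherwise arbitrary)
    log_init: (S,) log initial-state probabilities
    log_trans: (S, S) log transition matrix, rows index the previous state
    log_emit: (S, V) log emission probabilities
    """
    alpha = log_init + log_emit[:, obs[0]]
    for o in obs[1:]:
        # alpha_new[j] = log sum_i exp(alpha[i] + log_trans[i, j]) + log_emit[j, o]
        alpha = np.logaddexp.reduce(alpha[:, None] + log_trans, axis=0) + log_emit[:, o]
    return np.logaddexp.reduce(alpha)


def subgroup_posterior(subject, log_weights, hmms):
    """Posterior over top-layer subgroups for one subject (hypothetical mixture of HMMs).

    subject: dict mapping sequence type -> list of symbol indices
    log_weights: (K,) log mixture weights over subgroups
    hmms: hmms[k][seq_type] = (log_init, log_trans, log_emit)
    Sequence types absent from `subject` simply contribute nothing to the sum.
    """
    log_joint = np.array([
        log_weights[k]
        + sum(log_forward(seq, *hmms[k][name]) for name, seq in subject.items())
        for k in range(len(log_weights))
    ])
    log_evidence = np.logaddexp.reduce(log_joint)
    return np.exp(log_joint - log_evidence), log_evidence


# Hypothetical toy setup: 2 subgroups, one sequence type "labs",
# 2 hidden states, 2 observed symbols (0 = normal result, 1 = abnormal result).
init = np.log([0.5, 0.5])
trans = np.log([[0.9, 0.1], [0.2, 0.8]])
hmms = [
    {"labs": (init, trans, np.log([[0.9, 0.1], [0.8, 0.2]]))},  # subgroup 0: mostly normal
    {"labs": (init, trans, np.log([[0.2, 0.8], [0.1, 0.9]]))},  # subgroup 1: mostly abnormal
]
log_weights = np.log([0.5, 0.5])

post, log_evidence = subgroup_posterior({"labs": [0, 0, 0, 0, 0]}, log_weights, hmms)
print(post)  # subgroup 0 receives nearly all of the posterior mass
```

Working in log space with `np.logaddexp` avoids underflow on long sequences. A trained model of this kind would estimate the weights and HMM parameters, for example with expectation-maximization. That training step is outside the scope of this sketch.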
