Aaron Schein


Assistant Professor of Stats & Data Science at UChicago




schein@uchicago.edu


Data Science Institute


University of Chicago


Chicago, IL



Doubly Non-Central Beta Matrix Factorization for DNA Methylation Data


Conference paper


Aaron Schein, Anjali Nagulpally, Hanna Wallach, Patrick Flaherty
Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), 2021

Cite

APA
Schein, A., Nagulpally, A., Wallach, H., & Flaherty, P. (2021). Doubly Non-Central Beta Matrix Factorization for DNA Methylation Data. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI).

Chicago/Turabian
Schein, Aaron, Anjali Nagulpally, Hanna Wallach, and Patrick Flaherty. “Doubly Non-Central Beta Matrix Factorization for DNA Methylation Data.” In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), 2021.

MLA
Schein, Aaron, et al. “Doubly Non-Central Beta Matrix Factorization for DNA Methylation Data.” Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), 2021.


Other materials: [Code] 
Abstract: We present a new non-negative matrix factorization model for (0, 1) bounded-support data based on the doubly non-central beta (DNCB) distribution, a generalization of the beta distribution. The expressiveness of the DNCB distribution is particularly useful for modeling DNA methylation datasets, which are typically highly dispersed and multi-modal; however, the model structure is sufficiently general that it can be adapted to many other domains where latent representations of (0, 1) bounded-support data are of interest. Although the DNCB distribution lacks a closed-form conjugate prior, several augmentations let us derive an efficient posterior inference algorithm composed entirely of analytic updates. Our model improves out-of-sample predictive performance on both real and synthetic DNA methylation datasets over state-of-the-art methods in bioinformatics. In addition, our model yields meaningful latent representations that accord with existing biological knowledge.
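As a rough illustration of the DNCB distribution named in the abstract, the sketch below draws samples using the standard representation of a doubly non-central beta as a Poisson mixture of betas. The function name `sample_dncb`, the parameter names, and the convention that the non-centrality parameters are used directly as Poisson rates are assumptions made for illustration; they are not taken from the paper's code, and this is not the paper's inference algorithm.

```python
import numpy as np


def sample_dncb(eps1, eps2, lam1, lam2, size, rng):
    """Draw DNCB samples via a Poisson mixture of betas (illustrative sketch).

    eps1, eps2: positive beta shape parameters.
    lam1, lam2: non-negative non-centrality parameters, assumed here to act
                directly as Poisson rates (a parameterization assumption).
    """
    # Latent Poisson counts shift each beta shape parameter upward.
    y1 = rng.poisson(lam1, size=size)
    y2 = rng.poisson(lam2, size=size)
    # Conditioned on the counts, the draw is an ordinary beta variable.
    return rng.beta(eps1 + y1, eps2 + y2)


rng = np.random.default_rng(0)
samples = sample_dncb(0.5, 0.5, 2.0, 1.0, size=10_000, rng=rng)
# With both non-centrality parameters at zero, this reduces to Beta(eps1, eps2).
central = sample_dncb(0.5, 0.5, 0.0, 0.0, size=10_000, rng=rng)
```

Setting `lam1 = lam2 = 0` recovers the ordinary beta distribution, which is the sense in which the DNCB generalizes it; non-zero values mix over many beta components, which is one way to see how the distribution can represent dispersed, multi-modal data on (0, 1).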
