Conference paper
Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), 2021
Assistant Professor of Stats & Data Science at UChicago
APA
Schein, A., Nagulpally, A., Wallach, H., & Flaherty, P. (2021). Doubly Non-Central Beta Matrix Factorization for DNA Methylation Data. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI).
Chicago/Turabian
Schein, Aaron, Anjali Nagulpally, Hanna Wallach, and Patrick Flaherty. “Doubly Non-Central Beta Matrix Factorization for DNA Methylation Data.” In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), 2021.
MLA
Schein, Aaron, et al. “Doubly Non-Central Beta Matrix Factorization for DNA Methylation Data.” Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), 2021.
BibTeX
@inproceedings{aaron2021a,
title = {Doubly Non-Central Beta Matrix Factorization for DNA Methylation Data},
year = {2021},
author = {Schein, Aaron and Nagulpally, Anjali and Wallach, Hanna and Flaherty, Patrick},
booktitle = {Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI)}
}
Abstract: We present a new non-negative matrix factorization model for (0, 1) bounded-support data based on the doubly non-central beta (DNCB) distribution, a generalization of the beta distribution. The expressiveness of the DNCB distribution is particularly useful for modeling DNA methylation datasets, which are typically highly dispersed and multi-modal; however, the model structure is sufficiently general that it can be adapted to many other domains where latent representations of (0, 1) bounded-support data are of interest. Although the DNCB distribution lacks a closed-form conjugate prior, several augmentations let us derive an efficient posterior inference algorithm composed entirely of analytic updates. Our model improves out-of-sample predictive performance on both real and synthetic DNA methylation datasets over state-of-the-art methods in bioinformatics. In addition, our model yields meaningful latent representations that accord with existing biological knowledge.
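For intuition about the DNCB distribution mentioned in the abstract, a doubly non-central beta variable can be represented as a Poisson mixture of ordinary beta distributions: two latent Poisson counts are added to the two beta shape parameters. The sketch below samples from that representation. It is illustrative only and is not the paper's code; the function name `sample_dncb`, its parameter names, and the convention that the Poisson rates are passed in directly (some references use half the non-centrality parameter as the rate) are assumptions.

```python
import numpy as np

def sample_dncb(shape1, shape2, rate1, rate2, size, rng):
    """Draw samples from a doubly non-central beta distribution.

    Assumed parameterization (hypothetical, for illustration):
    rate1 and rate2 are the Poisson rates of the two latent counts.
    """
    # Latent Poisson counts, one pair per sample.
    counts1 = rng.poisson(rate1, size=size)
    counts2 = rng.poisson(rate2, size=size)
    # Conditioned on the counts, the variable is an ordinary beta
    # whose shape parameters are shifted by those counts.
    return rng.beta(shape1 + counts1, shape2 + counts2)

rng = np.random.default_rng(0)
samples = sample_dncb(0.5, 0.5, 2.0, 2.0, size=10_000, rng=rng)
```

With both rates set to zero, the latent counts are always zero and the sampler reduces to a plain beta distribution, which is the sense in which the DNCB generalizes the beta.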