Doubly Non-Central Beta Matrix Factorization for DNA Methylation Data

Conference paper

Aaron Schein, Anjali Nagulpally, Hanna Wallach, Patrick Flaherty
Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), 2021

Cite

APA
Schein, A., Nagulpally, A., Wallach, H., & Flaherty, P. (2021). Doubly Non-Central Beta Matrix Factorization for DNA Methylation Data. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI).

Chicago/Turabian
Schein, Aaron, Anjali Nagulpally, Hanna Wallach, and Patrick Flaherty. “Doubly Non-Central Beta Matrix Factorization for DNA Methylation Data.” In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), 2021.

MLA
Schein, Aaron, et al. “Doubly Non-Central Beta Matrix Factorization for DNA Methylation Data.” Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), 2021.

BibTeX

@inproceedings{aaron2021a,
  title = {Doubly Non-Central Beta Matrix Factorization for DNA Methylation Data},
  year = {2021},
  author = {Schein, Aaron and Nagulpally, Anjali and Wallach, Hanna and Flaherty, Patrick},
  booktitle = {Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI)}
}

Other materials: [Code]

Abstract: We present a new non-negative matrix factorization model for (0, 1) bounded-support data based on the doubly non-central beta (DNCB) distribution, a generalization of the beta distribution. The expressiveness of the DNCB distribution is particularly useful for modeling DNA methylation datasets, which are typically highly dispersed and multi-modal; however, the model structure is sufficiently general that it can be adapted to many other domains where latent representations of (0, 1) bounded-support data are of interest. Although the DNCB distribution lacks a closed-form conjugate prior, several augmentations let us derive an efficient posterior inference algorithm composed entirely of analytic updates. Our model improves out-of-sample predictive performance on both real and synthetic DNA methylation datasets over state-of-the-art methods in bioinformatics. In addition, our model yields meaningful latent representations that accord with existing biological knowledge.
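To illustrate how the DNCB distribution generalizes the beta distribution, here is a minimal sketch that draws DNCB samples through a Poisson mixture of betas. It assumes the parameterization in which two Poisson counts with rates `lam1` and `lam2` are added to the beta shape parameters `a` and `b`; some references instead use half the non-centrality parameter as the Poisson rate. The function name `sample_dncb` and all parameter names are hypothetical and are not taken from the paper or its code.

```python
import numpy as np

def sample_dncb(a, b, lam1, lam2, size, rng):
    """Draw DNCB samples as a Poisson mixture of betas (assumed parameterization)."""
    # Latent Poisson counts act as random shifts of the two beta shape parameters.
    y1 = rng.poisson(lam1, size=size)
    y2 = rng.poisson(lam2, size=size)
    # Conditioned on the counts, each draw is an ordinary beta variable.
    return rng.beta(a + y1, b + y2)

rng = np.random.default_rng(0)

# With both non-centrality parameters at zero, the counts are all zero,
# so the draws reduce to plain Beta(a, b) samples.
central = sample_dncb(1.0, 1.0, 0.0, 0.0, 10_000, rng)

# A positive lam1 inflates the first shape parameter and pushes mass toward 1.
shifted = sample_dncb(1.0, 1.0, 5.0, 0.0, 10_000, rng)

print(central.mean(), shifted.mean())
```

Setting both non-centrality parameters to zero recovers the ordinary beta distribution, while unequal values skew the samples toward 0 or 1. Introducing latent counts in this way is also what makes each conditional draw a standard beta, which is the kind of structure that augmentation-based inference can exploit.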