Aaron Schein

Assistant Professor of Stats & Data Science at UChicago



schein@uchicago.edu


Data Science Institute

University of Chicago

Chicago, IL



Doubly Non-Central Beta Matrix Factorization for DNA Methylation Data


Conference paper


Aaron Schein, Anjali Nagulpally, Hanna Wallach, Patrick Flaherty
Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), 2021

APA
Schein, A., Nagulpally, A., Wallach, H., & Flaherty, P. (2021). Doubly Non-Central Beta Matrix Factorization for DNA Methylation Data. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI).


Chicago/Turabian
Schein, Aaron, Anjali Nagulpally, Hanna Wallach, and Patrick Flaherty. “Doubly Non-Central Beta Matrix Factorization for DNA Methylation Data.” In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), 2021.


MLA
Schein, Aaron, et al. “Doubly Non-Central Beta Matrix Factorization for DNA Methylation Data.” Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), 2021.


BibTeX

@inproceedings{aaron2021a,
  title = {Doubly Non-Central Beta Matrix Factorization for DNA Methylation Data},
  year = {2021},
  author = {Schein, Aaron and Nagulpally, Anjali and Wallach, Hanna and Flaherty, Patrick},
  booktitle = {Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI)}
}

Other materials: [Code] 
Abstract: We present a new non-negative matrix factorization model for (0, 1) bounded-support data based on the doubly non-central beta (DNCB) distribution, a generalization of the beta distribution. The expressiveness of the DNCB distribution is particularly useful for modeling DNA methylation datasets, which are typically highly dispersed and multi-modal; however, the model structure is sufficiently general that it can be adapted to many other domains where latent representations of (0, 1) bounded-support data are of interest. Although the DNCB distribution lacks a closed-form conjugate prior, several augmentations let us derive an efficient posterior inference algorithm composed entirely of analytic updates. Our model improves out-of-sample predictive performance on both real and synthetic DNA methylation datasets over state-of-the-art methods in bioinformatics. In addition, our model yields meaningful latent representations that accord with existing biological knowledge.
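Illustration: one standard way to see why the DNCB distribution generalizes the beta is its representation as a Poisson mixture of betas. Draw two Poisson counts, add them to the two shape parameters, then draw a beta. The sketch below samples under that representation. It is not the paper's inference algorithm, and the names `sample_poisson`, `sample_dncb`, `eps1`, `eps2`, `lam1`, and `lam2` are illustrative assumptions. Conventions also differ on whether the Poisson rates are the non-centrality parameters themselves or half of them; the sketch uses the rates directly.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw from Poisson(rate) using Knuth's multiplication method.

    Adequate for the small rates used here; not intended for large rates.
    """
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_dncb(eps1, eps2, lam1, lam2, rng):
    """Draw one value from the Poisson-mixture-of-betas representation.

    Assumed parameterization (hypothetical naming): eps1, eps2 are the
    shape parameters and lam1, lam2 are the Poisson rates.
    """
    y1 = sample_poisson(lam1, rng)  # latent count added to the first shape
    y2 = sample_poisson(lam2, rng)  # latent count added to the second shape
    return rng.betavariate(eps1 + y1, eps2 + y2)


rng = random.Random(0)
draws = [sample_dncb(0.5, 0.5, 3.0, 1.0, rng) for _ in range(5000)]
mean = sum(draws) / len(draws)
print(f"sample mean: {mean:.3f}")
```

With both rates set to zero the latent counts are always zero, so the draw reduces to an ordinary Beta(`eps1`, `eps2`) sample. Raising `lam1` relative to `lam2` shifts mass toward 1.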
