Kylie Bemis

Out-of-memory computing strategies for R and Bioconductor

Regular talk, 9:40 - 10:40 PM

Computing on larger-than-memory datasets is a common challenge in analytical workflows in R, especially for bioinformatics applications. While R provides no built-in tools for out-of-memory computing, a rich ecosystem of packages has grown to address these issues on both CRAN and Bioconductor. Despite the existence of such packages, computing on larger-than-memory datasets still poses challenges that can be difficult to diagnose. Many larger-than-memory problems lend themselves well to parallelization, but parallelization introduces a new set of memory management challenges. In this talk, I will present some of the common pitfalls of parallel computing on larger-than-memory data in R, and possible strategies to solve them. Examples will focus on bioinformatics and the Bioconductor ecosystem, but the strategies and lessons are applicable to any R or Python program analyzing larger-than-memory datasets.



Kylie Bemis
Pronouns: she/her
Boston, MA, USA
Kylie Bemis is an Assistant Teaching Professor in the Khoury College of Computer Sciences at Northeastern University. She holds a B.S. degree in Statistics and Mathematics, an M.S. degree in Applied Statistics, and a Ph.D. in Statistics from Purdue University. In 2013, she interned at the Canary Center at Stanford for Cancer Early Detection, where she developed the Cardinal software package for statistical analysis of mass spectrometry imaging experiments. In 2015, she was awarded the John M. Chambers Statistical Software Award by the American Statistical Association for her work on Cardinal. In 2016, she joined the Olga Vitek lab for Statistical Methods for Studies of Biomolecular Systems at Northeastern University as a postdoctoral fellow. In 2019, she joined Northeastern as faculty, where she now teaches data science and develops curriculum for the MS in Data Science program. Her research interests include machine learning and statistical computing for bioinformatics.