

Biostatistics & Bioinformatics offers a vibrant seminar series featuring researchers and statisticians from UC San Diego and other academic institutions. Seminars are typically held from 1 - 2 p.m. PST on the third Wednesday of each month in MTF 168. Check out the upcoming and past presentations below. 

Upcoming Seminars


Thomas A. Louis, PhD
1 - 2 p.m. PST



Bin Nan, PhD
1 - 2 p.m. PST

Recent Seminars


"Investigation of sexual contact networks by integrating multiple data sources." 

Ravi Goyal, PhD

ABSTRACT: To effectively mitigate the spread of communicable diseases, it is necessary to understand the interactions that enable disease transmission among individuals in a population; we refer to the set of these interactions as the community contact network. The structure of this network can profoundly affect both the spread of infectious disease and the effectiveness of control programs, so understanding the contact network allows resources to be used more efficiently. Measuring the structure of the network, however, is challenging. We present a Bayesian approach that integrates multiple data sources associated with the transmission of infectious disease to estimate important properties of the contact network more precisely and accurately. We show that integrating routinely collected data associated with infectious diseases can substantially increase the precision and accuracy of contact network estimates.



"Biostatistical methods for wearable devices with applications to NHANES and UK Biobank"

Ciprian Crainiceanu, PhD

ABSTRACT: Wearable devices, such as accelerometers and heart monitors, are used in health research because they provide objective, continuous, unbiased, and detailed information about human activity, whether in the laboratory or in free-living environments. In this talk, I will explore the different resolutions of the data, ways to summarize them, and inferential methods for studying their associations with health outcomes. We will illustrate these methods using large, publicly available datasets, including NHANES and the UK Biobank.
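As a rough illustration of what summarizing the data at different resolutions can mean, the sketch below takes one day of minute-level activity counts and summarizes it at three coarser levels: an hourly profile, a daily total activity count, and the number of minutes at or above an activity threshold. It assumes a day is stored as 1,440 minute-level values; the function name `summarize_day` and the default threshold of 100 counts per minute are hypothetical choices for illustration, not the speaker's methods or NHANES-specific definitions.

```python
import numpy as np

MINUTES_PER_DAY = 1440

def summarize_day(counts, active_threshold=100.0):
    """Summarize one day of minute-level counts at coarser resolutions.

    Assumes `counts` holds 1,440 values (one per minute, starting at midnight).
    `active_threshold` is a hypothetical cutoff, not a validated cut point.
    """
    counts = np.asarray(counts, dtype=float)
    if counts.shape != (MINUTES_PER_DAY,):
        raise ValueError("expected 1,440 minute-level values")
    return {
        # Day level: total activity count (TAC) is the sum over all minutes.
        "total_activity_count": float(counts.sum()),
        # Hour level: 24 rows of 60 minutes, summed within each hour.
        "hourly_profile": counts.reshape(24, 60).sum(axis=1),
        # Minutes spent at or above the (assumed) activity threshold.
        "active_minutes": int((counts >= active_threshold).sum()),
    }

# Usage: a synthetic day with one hour of activity from 8:00 to 9:00.
day = np.zeros(MINUTES_PER_DAY)
day[480:540] = 200.0
summary = summarize_day(day)
print(summary["total_activity_count"], summary["active_minutes"])  # 12000.0 60
print(summary["hourly_profile"][8])  # 12000.0
```

The daily total is the simplest summary to analyze but discards when activity happened; the hourly profile keeps within-day timing at the cost of 24 variables instead of one.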


"Robust functional principal components analysis with application to accelerometry data"

Chongzhi Di, PhD

ABSTRACT: Accelerometers are widely used to objectively measure physical activity in biomedical studies. They collect high-resolution functional data, which are often highly skewed and contain outliers. Standard functional principal component analysis (FPCA) is based on the empirical covariance operator and may not work well in these settings. To address these challenges, we propose a new robust approach to FPCA based on a functional pairwise spatial sign (PASS) operator. We establish theoretical properties of the proposed method; in particular, we show that the PASS operator has the same set of eigenfunctions as the standard covariance operator and that their corresponding eigenvalues are in the same order. Extensive simulation studies show that the proposed robust FPCA performs well for various types of functional data. We apply the method to an ancillary study of the Women’s Health Initiative that recorded 7-day accelerometry data on 6,500 women.
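For intuition, here is a discretized sketch of a pairwise spatial sign operator and a robust FPCA built on it. It assumes curves observed on a common, equally spaced grid and takes the operator to be the average of (X_i - X_j)(X_i - X_j)^T / ||X_i - X_j||^2 over all pairs of curves, which is one reading of "pairwise spatial sign"; the function names are hypothetical and this is not the speaker's implementation. Because every pairwise difference is rescaled to unit length, each pair contributes a term of bounded size no matter how extreme the curves are, which is the source of the robustness.

```python
import numpy as np

def pass_operator(X):
    """Discretized pairwise spatial sign operator (a sketch).

    X has shape (n, m): n curves observed on a common grid of m points.
    Returns the m x m average of s s^T over all pairs i < j, where
    s = (X[i] - X[j]) / ||X[i] - X[j]|| is the spatial sign of the difference.
    """
    n, m = X.shape
    K = np.zeros((m, m))
    n_pairs = 0
    for i in range(n - 1):
        D = X[i + 1:] - X[i]                  # differences with all later curves
        norms = np.linalg.norm(D, axis=1)
        keep = norms > 0                      # identical curves have a zero sign
        S = D[keep] / norms[keep][:, None]    # unit-length spatial signs
        K += S.T @ S
        n_pairs += D.shape[0]
    return K / n_pairs

def robust_fpca(X, k):
    """Top-k eigenvalues and eigenvectors (discretized eigenfunctions)."""
    vals, vecs = np.linalg.eigh(pass_operator(X))  # eigh sorts ascending
    order = np.argsort(vals)[::-1][:k]
    return vals[order], vecs[:, order]

# Usage: curves built from two orthogonal basis functions with heavy-tailed
# (Student t, df = 2) scores, a setting where the empirical covariance is unstable.
rng = np.random.default_rng(0)
m = 50
t = np.arange(m) / m
phi1, phi2 = np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)
scores = rng.standard_t(df=2, size=(200, 2)) * np.array([3.0, 1.0])
X = scores[:, :1] * phi1 + scores[:, 1:] * phi2
vals, vecs = robust_fpca(X, 2)
```

Since the first basis function carries scores three times larger, the leading eigenvector should line up closely with `phi1` (up to sign). Each spatial sign has unit length, so the eigenvalues of this matrix sum to one when no two curves coincide; on a uniform grid with spacing h, dividing an eigenvector by sqrt(h) rescales it to unit L2 norm as a function.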


"Microbiome Data Science - Phylogenetic Tree, Bacterial Growth and Biosynthetic Gene Clusters"

Hongzhe Li, PhD

ABSTRACT: The gut microbiome plays an important role in the maintenance of human health. High-throughput shotgun metagenomic sequencing of a large set of samples provides an important tool for interrogating the gut microbiome. Besides providing footprints of taxonomic community composition and genes, these data can be further explored to study bacterial growth dynamics and metabolic potential through the generation of small molecules and secondary metabolites. Everything from microbiome diagnosis to microbiome-based therapy will rely on extensive data analysis. In this talk, I will present several computational and statistical methods for analyzing data measured on a phylogenetic tree, estimating bacterial growth rates for metagenome-assembled genomes (MAGs), and predicting all biosynthetic gene clusters (BGCs) in bacterial genomes. The key statistical and computational tools include optimal permutation recovery based on low-rank matrix projection and improved LSTM deep learning methods for BGC prediction. I will demonstrate these methods using several ongoing microbiome studies of inflammatory bowel disease at the University of Pennsylvania.


"Synthesizing external aggregated information in the presence of population heterogeneity: A penalized empirical likelihood approach"

Mi-Ok Kim, PhD


ABSTRACT: With the increasing availability of data in the public domain, there has been growing interest in exploiting information from external sources to improve the analysis of smaller-scale studies. An emerging challenge in the era of big data is that subject-level data are high dimensional, whereas the external information is at the aggregate level and of lower dimension. Moreover, heterogeneity and uncertainty in the auxiliary information are often not accounted for in information synthesis. In this paper, we propose a unified framework that summarizes various forms of aggregated information via estimating equations, and we develop a penalized empirical likelihood approach to incorporate such information in logistic regression. When the homogeneity assumption is violated, we extend the method to account for population heterogeneity among different sources of information. When the uncertainty in the external information is not negligible, we propose a variance estimator that adjusts for this uncertainty. The proposed estimators are asymptotically more efficient than the conventional penalized maximum likelihood estimator and enjoy the oracle property even with a diverging number of predictors. Simulation studies show that the proposed approaches yield higher accuracy in variable selection than competing methods. We illustrate the proposed methodologies with a pediatric kidney transplant study.


Past Seminars