Seminars

Biostatistics & Bioinformatics offers a vibrant seminar series featuring researchers and statisticians from UC San Diego and other academic institutions. Seminars are typically held from 1 - 2 p.m. PST on the first Wednesday of each month in MTF 168. Check out the upcoming and past presentations below. 

Upcoming Seminars

01/11/2023
Chito Hernandez, PhD
1 - 2 pm PST

02/01/2023
Annie Qu, PhD
1 - 2 pm PST

04/05/2023
Joseph Hogan, ScD
1 - 2 pm PST

06/07/2023
Damla Senturk, PhD
1 - 2 pm PST

Recent Seminars

11/02/2022
Bin Nan, PhD
1 - 2 pm PST

 

10/05/2022
"Accurate assignment of disease liability to genetic variants using only population data: Cystic Fibrosis Application"

Thomas A. Louis, PhD

ABSTRACT: The growing size of public variant repositories motivates testing the accuracy of pathogenicity prediction of DNA variants using population data alone. Under the a priori assumption that the ratios of the prevalence of variants in healthy populations vs. that in affected populations form two distinct distributions (pathogenic and benign), we used a Bayesian model with a mixture prior to compute the probability that a variant belongs to either distribution. I will outline the statistical approach, provide a subset of findings, and discuss alternative analytic and decision threshold approaches.

 

10/06/2022
"The Generalized Linear Stochastic Block Model for Multi-Subject Networks"

Thomas E. Nichols, PhD

ABSTRACT: Community estimation methods play an important role in the study of functional and structural brain networks. However, most of the existing work is based on per-subject or group-averaged networks. Aggregating per-subject results is challenging as each subject will have distinct community membership, and simple methods cannot account for systematic between-subject effects like age or gender. I will review work from my group on embedding a generalized linear regression into a stochastic block model, where block membership is unknown and the influence of covariates is accounted for at the block level. Starting from a basic model, essentially logistic regression for edge occurrence, I will describe an extension to allow random subject effects, properly accounting for intrasubject dependence. I will close with a side discussion on the essential but neglected issue of temporal autocorrelation in resting-state fMRI networks, and on how naive computation of z-scores has a dramatic impact on quantities like local efficiency and betweenness.

 

09/06/2022
Special Seminar: The Role of Network Science in Research on Infectious Disease Control
Victor DeGruttola, PhD

 

05/04/2022
"Investigation of sexual contact networks by integrating multiple data sources"

Ravi Goyal, PhD

ABSTRACT: To effectively mitigate the spread of communicable diseases, it is necessary to understand the interactions that enable disease transmission among individuals in a population; we refer to the set of these interactions as the community contact network. The structure of the network can have profound effects on both the spread of infectious disease and the effectiveness of control programs. Therefore, understanding the contact network permits more efficient use of resources. Measuring the structure of the network, however, is a challenging problem. We present a Bayesian approach to integrate multiple data sources associated with transmission of infectious disease to more precisely and accurately estimate important properties of the contact network. In this manuscript, we show that integrating routinely collected data associated with infectious diseases can lead to large increases in the precision and accuracy of our contact network estimates.

 

04/06/2022
"Biostatistical methods for wearable devices with applications to NHANES and UK Biobank"

Ciprian Crainiceanu, PhD

ABSTRACT: Wearable devices, such as accelerometers and heart monitors, are used in health research because they provide objective, continuous, unbiased, and detailed information about human activity either in the laboratory or the free-living environment. In this talk I will explore the different resolutions of the data, ways to summarize it, and inferential methods for exploring the associations with health outcomes. We will illustrate these methods using large, publicly available datasets, including the NHANES and UK Biobank. 


03/16/2022

"Robust functional principal components analysis with application to accelerometry data"

Chongzhi Di, PhD

ABSTRACT: Accelerometers are widely used to objectively measure physical activity in biomedical studies. They collect high resolution functional data, which are often highly skewed and have outliers. Standard functional principal component analysis (FPCA) is based on empirical covariance operators and might not work well in these settings. To address these challenges, we propose a new robust approach for FPCA, based on a functional pairwise spatial sign operator (PASS). Theoretical properties of the proposed method are established. In particular, it is shown that the PASS has the same set of eigenfunctions as the standard covariance operator and that their corresponding eigenvalues are in the same order. Through extensive simulation studies, the proposed robust FPCA is shown to perform well under various types of functional data. We applied the method to an ancillary study of the Women’s Health Initiative that recorded 7-day accelerometry data on 6500 women.

02/25/2022
"Microbiome Data Science - Phylogenetic Tree, Bacterial Growth and Biosynthetic Gene Clusters"

Hongzhe Li, PhD

ABSTRACT: The gut microbiome plays an important role in the maintenance of human health. High-throughput shotgun metagenomic sequencing of a large set of samples provides an important tool to interrogate the gut microbiome. Besides providing footprints of taxonomic community composition and genes, these data can be further explored to study the bacterial growth dynamics and metabolic potentials via generation of small molecules and secondary metabolites. Everything from microbiome diagnosis to microbiome-based therapy will rely on vast amounts of data analysis. In this talk, I will present several computational and statistical methods for analysis of data measured on a phylogenetic tree and methods for estimating the bacterial growth rate for metagenome-assembled genomes (MAGs) and for predicting all biosynthetic gene clusters (BGCs) in bacterial genomes. The key statistical and computational tools used include optimal permutation recovery based on low-rank matrix projection and improved LSTM deep learning methods to improve prediction of BGCs. I will demonstrate the application of these methods using several ongoing microbiome studies of inflammatory bowel disease at the University of Pennsylvania.


02/02/2022

"Synthesizing external aggregated information in the presence of population heterogeneity: A penalized empirical likelihood approach"

Mi-Ok Kim, PhD

ABSTRACT: With the increasing availability of data in the public domain, there has been a growing interest in exploiting information from external sources to improve the analysis of smaller scale studies. An emerging challenge in the era of big data is that the subject-level data are high dimensional, but the external information is at an aggregate level and of a lower dimension. Moreover, heterogeneity and uncertainty in the auxiliary information are often not accounted for in information synthesis. In this paper, we propose a unified framework to summarize various forms of aggregated information via estimating equations and develop a penalized empirical likelihood approach to incorporate such information in logistic regression. When the homogeneity assumption is violated, we extend the method to account for population heterogeneity among different sources of information. When the uncertainty in the external information is not negligible, we propose a variance estimator adjusting for the uncertainty. The proposed estimators are asymptotically more efficient than the conventional penalized maximum likelihood estimator and enjoy the oracle property even with a diverging number of predictors. Simulation studies show that the proposed approaches yield higher accuracy in variable selection compared with competitors. We illustrate the proposed methodologies with a pediatric kidney transplant study.

 

Past Seminars