The Spike-and-Slab Lasso Generalized Linear Models for Prediction and Associated Genes Detection.

Academic Article


  • Large-scale "omics" data are increasingly used for prognostic prediction of diseases and for detecting disease-associated genes. However, analyzing high-dimensional molecular data poses considerable challenges: the number of potential molecular predictors is large, the number of samples is limited, and each predictor typically has a small effect. We propose new Bayesian hierarchical generalized linear models (GLMs), called spike-and-slab lasso GLMs, for prognostic prediction and detection of associated genes using large-scale molecular data. The proposed model places a spike-and-slab mixture double-exponential prior on the coefficients, which induces weak shrinkage on large coefficients and strong shrinkage on irrelevant ones. We have developed a fast and stable algorithm for fitting large-scale hierarchical GLMs by incorporating expectation-maximization (EM) steps into the fast cyclic coordinate descent algorithm. The proposed approach combines attractive features of two popular methods: the penalized lasso and Bayesian spike-and-slab variable selection. We assess its performance through extensive simulation studies, which show that the proposed approach provides both more accurate parameter estimates and better prediction. We demonstrate the procedure on two cancer data sets: a well-known breast cancer data set consisting of 295 tumors with expression data for 4919 genes, and an ovarian cancer data set from TCGA with 362 tumors and expression data for 5336 genes. Our analyses show that the proposed procedure can generate powerful models for predicting outcomes and detecting associated genes. The methods have been implemented in a freely available R package BhGLM (
  • Keywords: cancer, double-exponential distribution, generalized linear model, outcome prediction, spike-and-slab lasso, Algorithms, Bayes Theorem, Computer Simulation, Gene Expression Profiling, Linear Models, Models, Genetic, Predictive Value of Tests, Quantitative Trait Loci
  • Digital Object Identifier (doi)

  • Author List: Tang Z; Shen Y; Zhang X; Yi N
  • Start Page: 77
  • End Page: 88
  • Volume: 205
  • Issue: 1
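The abstract describes a spike-and-slab mixture double-exponential prior fitted by embedding EM steps in cyclic coordinate descent. The following is a minimal sketch of that idea for the simplest GLM, a Gaussian linear model. It is not the BhGLM implementation. Under the prior, each coefficient is beta_j | gamma_j ~ DE(0, S_j) with S_j = (1 - gamma_j) s0 + gamma_j s1, where s0 is a small spike scale and s1 is a large slab scale. The E-step computes p_j = Pr(gamma_j = 1 | beta_j, theta). The expected inverse scale, E[1/S_j] = (1 - p_j)/s0 + p_j/s1, then serves as a per-coefficient lasso penalty in the coordinate-descent M-step. Several choices here are assumptions made for illustration: the function names `ssl_linear_em` and `soft_threshold`, unit residual variance, centered data with no intercept, a uniform prior on theta, the default scales s0 = 0.04 and s1 = 1, and a single coordinate-descent sweep per EM iteration.

```python
import numpy as np


def soft_threshold(z, t):
    """Lasso soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def ssl_linear_em(X, y, s0=0.04, s1=1.0, n_iter=200, tol=1e-6):
    """Hypothetical EM + coordinate-descent sketch of a spike-and-slab lasso
    linear model. Assumes unit residual variance, centered y and X columns
    (no intercept), and a Uniform(0, 1) prior on the mixing proportion theta."""
    n, p = X.shape
    beta = np.zeros(p)
    theta = 0.5
    col_sq = (X ** 2).sum(axis=0)
    r = np.asarray(y, dtype=float).copy()  # residual y - X @ beta (beta starts at 0)
    for _ in range(n_iter):
        # E-step: log-odds that beta_j comes from the slab DE(0, s1) rather than
        # the spike DE(0, s0); the logistic transform is written via tanh for stability.
        log_odds = (np.log(theta) - np.log1p(-theta) + np.log(s0 / s1)
                    + np.abs(beta) * (1.0 / s0 - 1.0 / s1))
        p_slab = 0.5 * (1.0 + np.tanh(0.5 * log_odds))
        # Expected inverse prior scale E[1/S_j] acts as a per-coefficient L1 penalty.
        penalty = (1.0 - p_slab) / s0 + p_slab / s1
        # M-step for beta: one cyclic coordinate-descent sweep of the weighted lasso
        # minimising 0.5 * ||y - X beta||^2 + sum_j penalty_j * |beta_j|.
        beta_old = beta.copy()
        for j in range(p):
            r += X[:, j] * beta[j]  # partial residual excluding predictor j
            beta[j] = soft_threshold(X[:, j] @ r, penalty[j]) / col_sq[j]
            r -= X[:, j] * beta[j]
        # M-step for theta: the posterior mode under a uniform prior is mean(p_slab);
        # clipped away from 0 and 1 so the log-odds stay finite.
        theta = float(np.clip(p_slab.mean(), 1e-6, 1.0 - 1e-6))
        if np.max(np.abs(beta - beta_old)) < tol:
            break
    return beta, p_slab, theta


# Illustrative usage on simulated data: two true signals among 20 predictors.
rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.standard_normal((n, p))
X -= X.mean(axis=0)
true_beta = np.zeros(p)
true_beta[:2] = [3.0, -2.0]
y = X @ true_beta + rng.standard_normal(n)
y -= y.mean()
beta_hat, p_hat, theta_hat = ssl_linear_em(X, y)
print(np.round(beta_hat, 2))
```

Once a coefficient is likely to belong to the slab, its penalty E[1/S_j] falls toward 1/s1, so it is barely shrunk. Coefficients near zero keep a penalty close to 1/s0. This is the weak-versus-strong shrinkage the abstract describes.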