Splice variants have been shown to play an important role in tumor initiation and progression and can serve as novel cancer biomarkers. However, the clinical importance of individual splice variants and the mechanisms by which they can perturb cellular functions are still poorly understood. To address these issues, we developed an efficient and robust computational method to: (1) identify splice variants that are associated with patient survival in a statistically significant manner; and (2) predict rewired protein-protein interactions that may result from altered patterns of expression of such variants. We applied our method to the lung adenocarcinoma dataset from TCGA and identified splice variants that are significantly associated with patient survival and can alter protein-protein interactions. Among these variants, several are implicated in DNA repair through homologous recombination. To computationally validate our findings, we characterized the mutational signatures in patients, grouped by low and high expression of a splice variant associated with patient survival and involved in DNA repair. The results of the mutational signature analysis are in agreement with the molecular mechanism suggested by our method. To the best of our knowledge, this is the first attempt to build a computational approach to systematically identify splice variants associated with patient survival that can also generate experimentally testable, mechanistic hypotheses. Code for identifying survival-significant splice variants using the Null Empirically Estimated P-value method can be found at https://github.com/thecodingdoc/neep. Code for construction of Multi-Granularity Graphs to discover potential rewired protein interactions can be found at https://github.com/scwest/SINBAD.
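The idea of estimating a p-value from an empirically generated null distribution can be illustrated with a short, generic permutation sketch. This is not the NEEP implementation from the repository above. It only assumes that patients are split into high- and low-expression groups for one splice variant and that a test statistic is compared against values obtained after shuffling the group labels. The function names (`mean_diff`, `empirical_p_value`), the toy statistic (absolute difference in mean survival time, a stand-in for a proper survival statistic such as a log-rank test) and the parameters `n_perm` and `seed` are all hypothetical choices made for illustration.

```python
import random


def mean_diff(times, high_flags):
    """Toy statistic (an assumption): absolute difference in mean survival
    time between the high- and low-expression groups."""
    high = [t for t, h in zip(times, high_flags) if h]
    low = [t for t, h in zip(times, high_flags) if not h]
    return abs(sum(high) / len(high) - sum(low) / len(low))


def empirical_p_value(times, high_flags, n_perm=1000, seed=0):
    """Estimate a p-value by comparing the observed statistic with a null
    distribution built from shuffled group labels."""
    rng = random.Random(seed)
    observed = mean_diff(times, high_flags)
    flags = list(high_flags)  # copy so the caller's labels are not modified
    exceed = 0
    for _ in range(n_perm):
        # Shuffling keeps the group sizes fixed but breaks any real
        # association between expression group and survival time.
        rng.shuffle(flags)
        if mean_diff(times, flags) >= observed:
            exceed += 1
    # Add-one correction so the estimate is never exactly zero.
    return (exceed + 1) / (n_perm + 1)


# Hypothetical toy data: survival times for ten patients.
survival_times = [1, 2, 3, 4, 5, 20, 21, 22, 23, 24]
separated = [False] * 5 + [True] * 5   # expression group tracks survival
interleaved = [True, False] * 5        # expression group unrelated to survival

p_separated = empirical_p_value(survival_times, separated)
p_interleaved = empirical_p_value(survival_times, interleaved)
```

In this toy setting the cleanly separated grouping yields a small empirical p-value, while the interleaved grouping does not. A real analysis would replace the toy statistic with a survival test that accounts for censoring and would correct for testing many variants.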
ASJC Scopus subject areas
- Ecology, Evolution, Behavior and Systematics
- Modeling and Simulation
- Molecular Biology
- Cellular and Molecular Neuroscience
- Computational Theory and Mathematics