Independent Validation of Early-Stage Non-Small Cell Lung Cancer Prognostic Scores Incorporating Epigenetic and Transcriptional Biomarkers With Gene-Gene Interactions and Main Effects
Background: DNA methylation and gene expression are promising biomarkers of various cancers, including non-small cell lung cancer (NSCLC). Besides the main effects of biomarkers, the progression of complex diseases is also influenced by gene-gene (G×G) interactions.
Research Question: Would screening the functional capacity of biomarkers on the basis of main effects or interactions, using multiomics data, improve the accuracy of cancer prognosis?
Study Design and Methods: Biomarker screening and model validation were used to construct and validate a prognostic prediction model. NSCLC prognosis-associated biomarkers were identified on the basis of either their main effects or their interactions, using two types of omics data. A prognostic score incorporating epigenetic and transcriptional biomarkers, as well as clinical information, was independently validated.
Results: Twenty-six pairs of biomarkers with G×G interactions and two biomarkers with main effects were significantly associated with NSCLC survival. Compared with a model using clinical information only, the accuracy of the epigenetic and transcriptional biomarker-based prognostic model, measured by area under the receiver operating characteristic curve (AUC), increased by 35.38% (95% CI, 27.09%-42.17%; P = 5.10 × 10^-17) and 34.85% (95% CI, 26.33%-41.87%; P = 2.52 × 10^-18) for 3- and 5-year survival, respectively. The model exhibited superior predictive ability for NSCLC survival (3-year AUC, 0.88 [95% CI, 0.83-0.93]; 5-year AUC, 0.89 [95% CI, 0.83-0.93]) in an independent Cancer Genome Atlas population. G×G interactions contributed a 65.2% and 91.3% increase in prediction accuracy for 3- and 5-year survival, respectively.
Interpretation: The integration of epigenetic and transcriptional biomarkers with main effects and G×G interactions significantly improves the accuracy of prognostic prediction of early-stage NSCLC survival.
Lung cancer is a leading cause of cancer-related death worldwide and was estimated to cause 1.76 million deaths in 2018.[1] Bray F. Ferlay J. Soerjomataram I. 
et al. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2018; 68: 394-424. The 5-year survival rate among patients with lung cancer remains relatively low, ranging from 4% to 17% depending on clinical characteristics.[2] Hirsch F.R. Scagliotti G.V. Mulshine J.L. et al. Lung cancer: current therapies and new targeted treatments. Lancet. 2017; 389(10066): 299-311. Compared with patients diagnosed with late-stage disease, early-stage patients often have a considerably more favorable prognosis. However, significant heterogeneity in clinical prognosis is observed for patients with early-stage non-small cell lung cancer (NSCLC) with similar clinical characteristics, which indicates the importance of understanding molecular mechanisms.[3] Tang S. Pan Y. Wang Y. et al. Genome-wide association study of survival in early-stage non-small cell lung cancer. Ann Surg Oncol. 2015; 22(2): 630-635. Identifying molecular changes in oncogenes and/or tumor suppressor genes that are associated with NSCLC survival is helpful for developing targeted therapies to prolong patients' survival time.

DNA methylation is a heritable, reversible, epigenetic modification that affects the spatial conformation of DNA and regulates gene expression.[4] Egger G. Liang G. Aparicio A. et al. Epigenetics in human disease and prospects for epigenetic therapy. Nature. 2004; 429: 457-463. [5] Feinberg A.P. Tycko B. The history of cancer epigenetics. Nat Rev Cancer. 2004; 4(2): 143-153. DNA methylation is a molecular biomarker and may be a therapeutic target for the treatment of cancer.[6] Shen S. Zhang R. Guo Y. 
et al. A multi-omic study reveals BTG2 as a reliable prognostic marker for early-stage non-small cell lung cancer. Mol Oncol. 2018; 12: 913-924. [7] Wei Y. Liang J. Zhang R. et al. Epigenetic modifications in KDM lysine demethylases associate with survival of early-stage NSCLC. Clin Epigenetics. 2018; 10(1): 41. In addition, gene-gene (G×G) interactions have long been recognized to regulate the progression of complex diseases, including NSCLC.[8] Zhang R. Lai L. He J. et al. EGLN2 DNA methylation and expression interact with HIF1A to affect survival of early-stage NSCLC. Epigenetics. 2019; 14: 118-129. The development of cancer may be related to interactions between several key genes.[9] Lin Z. Hui L. Yufei H. et al. Cancer progression prediction using Gene Interaction Regularized Elastic Net. IEEE/ACM Trans Comput Biol Bioinform. 2017; 14: 145-154.

Lung cancer prognosis-associated biomarkers have been proposed on the basis of omics data, including DNA methylation,[10] Sandoval J. Mendez-Gonzalez J. Nadal E. et al. A prognostic DNA methylation signature for stage I non-small-cell lung cancer. J Clin Oncol. 2013; 31(32): 4140-4147; gene expression,[11] Shedden K. Taylor J.M. Enkemann S.A. et al. Gene expression-based survival prediction in lung adenocarcinoma: a multi-site, blinded validation study. Nat Med. 2008; 14(8): 822-827; microRNA,[12] Tan X. Qin W. Zhang L. et al. A 5-microRNA signature for lung squamous cell carcinoma diagnosis and hsa-miR-31 for prognosis. Clin Cancer Res. 2011; 17(21): 6802-6811; and long noncoding RNA.[13] Zhou M. Guo M. He D. et al. A potential signature of eight long non-coding RNAs predicts survival in patients with non-small cell lung cancer. J Transl Med. 
2015; 13(1): 231. However, most studies are limited to a single type of omics data, which results in less accurate prognostic models.[14] Zhao Q. Shi X. Xie Y. et al. Combining multidimensional genomic measurements for predicting cancer prognosis: observations from TCGA. Brief Bioinform. 2015; 16(2): 291-303. For example, our previous integrative omics study of the BTG2 gene showed that this gene could slightly improve the prediction accuracy of early-stage NSCLC survival.[6] Shen S. Zhang R. Guo Y. et al. A multi-omic study reveals BTG2 as a reliable prognostic marker for early-stage non-small cell lung cancer. Mol Oncol. 2018; 12: 913-924. A large-scale integrative analysis of multiomics data, however, could identify genes with either important main effects or G×G interactions, on which a more accurate prognostic prediction model of NSCLC can be built. Specifically, we used a two-stage study design and performed an integrative analysis of pan-cancer-related genes to identify prognostic biomarkers with either a main effect or G×G interactions, using epigenome and transcriptome data from multiple study centers. We then built a prognostic prediction model for early-stage NSCLC by incorporating both the selected epigenetic and transcriptional biomarkers.

Only patients with early-stage (stage I or II) lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) were included in our study. DNA methylation data were harmonized from five international study centers, including Harvard, Spain, Norway, Sweden, and the Cancer Genome Atlas (TCGA). Gene expression data were composed of four datasets from the Gene Expression Omnibus (GEO) and TCGA.

The Harvard cohort consisted of patients seen at Massachusetts General Hospital (MGH), histologically confirmed as having primary NSCLC, and recruited since 1992.[15] Guo Y. Zhang R. Shen S. 
et al. DNA Methylation of LRRC3B: a biomarker for survival of early-stage non-small cell lung cancer patients. Cancer Epidemiol Biomarkers Prev. 2018; 27: 1527-1535. We profiled 151 early-stage patients from this cohort. A lung pathologist at MGH evaluated each specimen for the amount (tumor cellularity, > 70%) and quality of tumor cells. The specimens were classified histologically according to World Health Organization criteria. The institutional review boards at the Harvard T. H. Chan School of Public Health and MGH approved the study. All patients provided written informed consent.

The Spanish cohort included 226 patients with early-stage NSCLC recruited from eight subcenters between 1991 and 2009.[10] Sandoval J. Mendez-Gonzalez J. Nadal E. et al. A prognostic DNA methylation signature for stage I non-small-cell lung cancer. J Clin Oncol. 2013; 31(32): 4140-4147. Patients provided written consent and tumors were surgically collected. This study was approved by the Bellvitge Biomedical Research Institute institutional review boards.

The Norwegian cohort consisted of 133 patients with early-stage NSCLC from Oslo University Hospital, recruited between 2006 and 2011.[16] Bjaanæs M.M. Fleischer T. Halvorsen A.R. et al. Genome-wide DNA methylation analyses in lung adenocarcinomas: association with EGFR, KRAS and TP53 mutation status, gene expression and prognosis. Mol Oncol. 2016; 10: 330-343. The project was developed with the approval of the Oslo University Institutional Review Board and regional ethics committee (S-05307). All patients provided informed consent.

Tumor tissues were snap frozen in liquid nitrogen and stored at –80°C until DNA isolation. Tumor DNA was collected from 103 patients with early-stage NSCLC, including 80 patients with LUAD and 23 patients with LUSC, at the Skåne University Hospital in Lund, Sweden.[17] Karlsson A. Jonsson M. Lauss M. 
et al. Genome-wide DNA methylation analysis of lung carcinoma reveals one neuroendocrine and four adenocarcinoma epitypes associated with patient outcome. Clin Cancer Res. 2014; 20(23): 6127-6140. The study was developed under the approval of the Regional Ethical Review Board in Lund, Sweden (Registration nos. 2004/762 and 2008/702).

For TCGA, a total of 332 patients with LUAD and 285 patients with LUSC who had full DNA methylation, survival time, and covariate data were included. Level 1 HumanMethylation450 DNA methylation data from patients with early-stage NSCLC were downloaded on October 1, 2015.

Transcriptome information from 425 patients with early-stage NSCLC was profiled using the Affymetrix Human Genome U133A Plus 2.0 Array (e-Table 1). Only data from patients with available survival time, clinical stage, and tumor tissue expression values were analyzed.

DNA methylation was assessed with Illumina Infinium HumanMethylation450 BeadChips (Illumina Inc.). Raw image data were imported into GenomeStudio Methylation Module V1.8 (Illumina Inc.) to calculate methylation signals and to perform normalization, background subtraction, and quality control (QC). Probes were excluded if they met any one of the following criteria: (1) failed detection (P > .05) in ≥ 5% of samples; (2) coefficient of variation < 5%; (3) all samples were methylated or all were unmethylated; (4) common single-nucleotide polymorphisms located in the probe sequence or in 10-bp flanking regions; (5) cross-reactive probes[18] Chen Y-a Lemire M. Choufani S. et al. Discovery of cross-reactive probes and polymorphic CpGs in the Illumina Infinium HumanMethylation450 microarray. Epigenetics. 2013; 8(2): 203-209; or (6) data did not pass QC in all centers. Samples with > 5% undetectable probes were excluded. 
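Two of the probe-level criteria above, and the sample-level rule, are simple threshold filters and can be sketched in a few lines. The following is a minimal illustration only, assuming beta values and detection P values are held in NumPy arrays of shape (probes × samples); the function names `filter_probes` and `filter_samples` are hypothetical and are not part of GenomeStudio or any package used in the study.

```python
import numpy as np

def filter_probes(beta, det_p, p_cut=0.05, max_fail_frac=0.05, min_cv=0.05):
    """Return a boolean mask of probes to keep (hypothetical helper).

    beta  : (n_probes, n_samples) methylation beta values
    det_p : (n_probes, n_samples) detection P values
    """
    # Criterion (1): detection failed (P > p_cut) in >= 5% of samples.
    fail_frac = (det_p > p_cut).mean(axis=1)
    # Criterion (2): coefficient of variation (SD / mean) below 5%.
    mean = beta.mean(axis=1)
    sd = beta.std(axis=1, ddof=1)
    cv = np.divide(sd, mean, out=np.zeros_like(sd), where=mean > 0)
    return (fail_frac < max_fail_frac) & (cv >= min_cv)

def filter_samples(det_p, p_cut=0.05, max_fail_frac=0.05):
    """Return a boolean mask of samples with <= 5% undetectable probes."""
    return (det_p > p_cut).mean(axis=0) <= max_fail_frac
```

The remaining criteria (polymorphic, cross-reactive, and center-specific exclusions) depend on annotation lists and are not covered by this sketch.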
Methylation signals were further processed for quantile normalization (betaqn function in R package minfi) as well as type I and II probe correction (BMIQ function in R package lumi). They were adjusted for batch effects (ComBat function in R package sva), following the best-performing pipeline identified in a comparative study.[19] Marabita F. Almgren M. Lindholm M.E. et al. An evaluation of analysis pipelines for DNA methylation profiling using the Illumina HumanMethylation450 BeadChip platform. Epigenetics. 2013; 8: 333-346. Details of the QC process are described in e-Figure 1.

The TCGA workgroup completed the mRNA sequencing data processing and QC. Raw counts were normalized using RNA-sequencing by expectation maximization. Level 3 gene quantification data were downloaded from the TCGA data portal and were further checked for quality. Gene probes were excluded if the missing rate was > 80%, and the batch effect was corrected with ComBat. The expression value of each gene was transformed on a log2 scale and standardized before association analysis.

DNA methylation and gene expression of 719 pan-cancer-related genes were then used for subsequent association analysis. Gene symbols for the 719 pan-cancer-related genes were obtained from the Catalogue of Somatic Mutations in Cancer (COSMIC). After QC, 12,806 CpG probes were available for association analysis. CpG probes from five genes (BTG2,[6] Shen S. Zhang R. Guo Y. et al. A multi-omic study reveals BTG2 as a reliable prognostic marker for early-stage non-small cell lung cancer. Mol Oncol. 2018; 12: 913-924; KDM,[7] Wei Y. Liang J. Zhang R. et al. Epigenetic modifications in KDM lysine demethylases associate with survival of early-stage NSCLC. Clin Epigenetics. 2018; 10(1): 41; EGLN2,[8] Zhang R. Lai L. He J. et al. EGLN2 DNA methylation and expression interact with HIF1A to affect survival of early-stage NSCLC. Epigenetics. 
2019; 14: 118-129; LRRC3B,[15] Guo Y. Zhang R. Shen S. et al. DNA Methylation of LRRC3B: a biomarker for survival of early-stage non-small cell lung cancer patients. Cancer Epidemiol Biomarkers Prev. 2018; 27: 1527-1535; and SIPA1L3[20] Zhang R. Lai L. Dong X. et al. SIPA1L3 methylation modifies the benefit of smoking cessation on lung adenocarcinoma survival: an epigenomic-smoking interaction analysis. Mol Oncol. 2019; 13(5): 1235-1248) reported in our previous studies were also included.

The flow of analysis is depicted in Figure 1. Epigenetic and transcriptional analyses were performed simultaneously, and a discovery phase and a validation phase were used to identify NSCLC prognostic biomarkers. In each procedure, we analyzed both the main effects of biomarkers and the G×G interactions among them. Patients having DNA methylation data from Harvard, Spain, Norway, and Sweden, as well as patients having gene expression data from GEO, were assigned to the discovery phase for epigenetic analysis and transcriptional analysis, respectively. Patients having both types of omics data from TCGA were assigned to the validation phase.

For the main effect analysis, we used sure independence screening (SIS) and LASSO Cox penalized regression to screen biomarkers with main effects that were relevant to survival, using the R package SIS. SIS LASSO is a two-stage procedure. At the first stage, SIS selects the biomarkers with the strongest marginal associations with survival. At the second stage, LASSO performs variable selection and parameter estimation simultaneously among the biomarkers selected at the first stage. During the LASSO procedure, tuning parameter selection was based on the Bayesian information criterion. 
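The screening stage described above can be illustrated with a minimal sketch. It shows sure independence screening in its simplest linear-model form: standardized features are ranked by their absolute marginal correlation with a continuous outcome, and the top d are kept. This is an illustrative assumption and not the study's method as run: the study applied SIS to Cox regression through the R package SIS, which this Python sketch does not reproduce. The function name `sis_screen` and the default d = n/log(n) are choices made here for illustration.

```python
import numpy as np

def sis_screen(X, y, d=None):
    """Rank features by absolute marginal correlation with y; keep the top d.

    Linear-model analogue of SIS (illustrative only, not the Cox version).
    X : (n_samples, n_features) feature matrix
    y : (n_samples,) continuous outcome
    """
    n, p = X.shape
    if d is None:
        d = int(n / np.log(n))          # a commonly used default size
    d = min(d, p)
    # Standardize so that X^T y / n equals the marginal correlation.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    score = np.abs(Xs.T @ ys) / n
    # Indices of the d features with the largest marginal association.
    return np.argsort(score)[::-1][:d]
```

In a two-stage procedure of this kind, a penalized regression would then be fitted only on the columns returned by the screening step.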
To capture biomarkers that might be missed at the first stage, we repeatedly applied the SIS LASSO algorithm to the remaining unselected biomarkers until no new biomarkers could be recruited.[21] Saldana D.F. Feng Y. SIS: an R package for sure independence screening in ultrahigh dimensional statistical models. J Stat Software. 2018; 83(2): 1-25. This iterative procedure is termed iterative SIS (ISIS) LASSO. To account for the biologic heterogeneity between LUAD and LUSC, we used a histology-stratified multivariate Cox proportional hazards model to test these biomarkers, using the R package survival. The stratified model adjusted for the differences between LUAD and LUSC in baseline hazards. The other covariates adjusted in the model were age, sex, study center, clinical stage, and smoking status.

For the G×G interaction analysis, a histology-stratified multivariate Cox proportional hazards model adjusted for the aforementioned covariates was applied to identify biomarkers with G×G interactions. The P value thresholds for multiple testing were established by the Bonferroni method, which sets the significance level to .05 divided by the number of tests, so that the overall type I error is controlled at the .05 level. In our study, the significance levels for the G×G interaction analysis of epigenetic and transcriptional biomarkers were 6.10 × 10^-10 = 0.05/(12,806 × 12,805/2) and 1.94 × 10^-7 = 0.05/(719 × 718/2), respectively.

Significant biomarkers observed in the discovery phase were further confirmed in the validation phase and were retained if the P value was ≤ .05 and the direction of the effect was consistent across the two phases. We also tested the proportional hazards assumption for each significant biomarker. The hazard ratio (HR) and 95% CI were expressed per 1% increment in DNA methylation or gene expression level. Sensitivity analysis was performed to confirm these robustly significant biomarkers. 
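The two Bonferroni thresholds quoted above follow directly from the number of unordered biomarker pairs. As a small worked check (the helper name `bonferroni_pairwise` is hypothetical), the arithmetic can be reproduced as follows:

```python
from math import comb

def bonferroni_pairwise(n_markers, alpha=0.05):
    """Bonferroni threshold when every unordered pair of markers is tested once."""
    n_tests = comb(n_markers, 2)        # n * (n - 1) / 2 pairwise tests
    return alpha / n_tests

# 12,806 CpG probes and 719 gene probes, as stated in the text.
print(f"{bonferroni_pairwise(12806):.2e}")   # epigenetic G×G threshold
print(f"{bonferroni_pairwise(719):.2e}")     # transcriptional G×G threshold
```

With 12,806 CpG probes there are 81,990,415 pairs, and with 719 gene probes there are 258,121 pairs, which gives the thresholds reported in the text after rounding.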
Patients were excluded if their DNA methylation (logit2 transformed) or expression (log2 transformed) values were out of range, based on mean ± 3 × SD.

For the identified biomarkers, we applied a forward stepwise regression strategy to build a multibiomarker Cox proportional hazards model in the discovery phase, which was then validated in TCGA samples. In the forward stepwise regression, a likelihood ratio test was applied to each main effect or G×G interaction, with an entry threshold of P ≤ .05 and an elimination threshold of P > .05. Sensitivity analysis was also performed using two different thresholds: .10 and .15. Epigenetic and transcriptional scores were calculated as weighted linear combinations of the individual DNA methylation and gene expression values, with weights derived from the Cox model. Integrative scores were synthesized from the epigenetic and transcriptional scores. Finally, the prognostic score was defined as the linear combination of clinical information and the integrative score (see e-Appendix 1). Kaplan-Meier survival curves adjusted for the covariates were drawn to represent the survival difference among patients with different scores.

We predicted 3- and 5-year overall survival of patients, using the nearest neighbor method for time-to-event data.[22] Heagerty P.J. Lumley T. Pepe M.S. Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics. 2000; 56(2): 337-344. The accuracy of the prediction is presented using a receiver operating characteristic (ROC) curve and was measured by area under the ROC curve (AUC), computed with the R package survivalROC. The prediction accuracy was confirmed in an independent TCGA population in the validation phase. The 95% CI and P value of the AUC improvement were calculated on the basis of 1,000 bootstrap resamples. 
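The bootstrap step for the AUC improvement can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's procedure: it treats survival status at a fixed horizon as a fully observed binary label and uses the ordinary (Mann-Whitney) AUC, whereas the study used a time-dependent AUC for censored data. The function names `auc` and `bootstrap_auc_gain`, the percentile interval, and the relative-gain definition (new minus baseline, divided by baseline) are assumptions made here.

```python
import numpy as np

def auc(score, label):
    """Mann-Whitney AUC for a binary label (1 = event), ties counted as 0.5."""
    pos, neg = score[label == 1], score[label == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

def bootstrap_auc_gain(base, full, label, n_boot=1000, seed=0):
    """Percentile bootstrap for the relative AUC gain of `full` over `base`.

    Assumes the baseline AUC is nonzero in every resample.
    """
    rng = np.random.default_rng(seed)
    n = len(label)
    gains = []
    while len(gains) < n_boot:
        idx = rng.integers(0, n, n)           # resample patients with replacement
        lab = label[idx]
        if lab.min() == lab.max():            # need both classes to compute an AUC
            continue
        a0, a1 = auc(base[idx], lab), auc(full[idx], lab)
        gains.append((a1 - a0) / a0)
    gains = np.array(gains)
    lo, hi = np.percentile(gains, [2.5, 97.5])
    return float(gains.mean()), float(lo), float(hi)
```

Here `base` would hold the risk scores from the clinical-only model and `full` those from the model with biomarkers added, both evaluated on the same patients.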
Stratification analysis of prognostic scores was carried out within subgroups stratified by age, sex, smoking status, clinical stage, and histology. The concordance index (C-index), an average accuracy of predictive survival across follow-up years that ranges from 0.5 to 1.0, was calculated together with its 95% CI to estimate predictive performance.[23] Brentnall A.R. Cuzick J. Use of the concordance index for predictors of censored survival data. Stat Methods Med Res. 2018; 27(8): 2359-2373. A nomogram was generated with the R package rms to facilitate application of our model.

We assessed the potential functions of the identified genes at the protein level, using the limited public resources available. First, we evaluated the association between protein expression and gene expression, using the reverse-phase protein array from the TCGA database. Second, we performed differential expression analysis between tumor and normal tissues, and further investigated the main effects of genes and G×G interactions between genes on LUAD survival, using the Clinical Proteomic Tumor Analysis Consortium (CPTAC) database. Differential protein expression analysis was performed with the R package limma, which fits a linear model to estimate fold changes and SEs prior to empirical Bayes smoothing.[24] Ritchie M.E. Phipson B. Wu D. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015; 43(7): e47. Finally, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis was carried out with Metascape. Gene network analysis was conducted with GeneMANIA,[25] Warde-Farley D. Donaldson S.L. Comes O. et al. The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function. Nucleic Acids Res. 
2010; 38(Web Server issue): W214-W220. GeneMANIA is a plugin of the Cytoscape application. Critical hubs, which are highly connected to other nodes in a module, were defined as the nodes with the highest connectivity degrees.

P values were two-sided. All statistical analyses were performed with R version 3.5.1 (R Foundation), unless otherwise specified.

After QC, 1,230 patients (discovery, n = 613; validation, n = 617) with 12,806 CpG probes and 719 gene probes were included in the association analysis. The demographic and clinical information is described in e-Tables 2, 3.

For the main effect analysis of DNA methylation and gene expression, 23 CpG probes (e-Tables 4-6) and 13 gene probes (e-Tables 7, 8) were selected by ISIS LASSO, respectively. However, only cg19286631 (TRIM27) was significantly associated with survival in both phases (discovery HR = 1.03 [95% CI, 1.01-1.05], P = 1.43 × 10^-2; validation HR = 1.03 [95% CI, 1.01-1.06], P = 1.13 × 10^-3) and remained significant in sensitivity analysis. Also, only one gene probe, located in the NDRG1 gene, remained significant in the validation phase (discovery HR = 1.41 [95% CI, 1.05-1.89], P = 2.16 × 10^-2; validation HR = 1.12 [95% CI, 1.01-1.42], P = 4.33 × 10^-2) and in sensitivity analysis. For the G×G interaction analysis, we observed 2,495 and 40 G×G interactions from epigenetic and transcriptional analysis, respectively, in the discovery phase. Finally, 149 and 2 G×G interactions were retained in the validation phase that were also significant in the sensitivity analysis (e-Tables 9-13).

By forward stepwise regression analysis in the discovery phase, we observed one CpG probe with a main effect and 25 pairs of CpG probes with G×G interactions in the multibiomarker model (e-Table 14), which was used to calculate the epigenetic score (e-Table 15) (discovery HR = 2.71 [95% CI, 2.41-3.05]; P = 1.15 × 10^-61). 
One gene probe with a main effect and one pair of gene probes with a G×G interaction were retained in the multibiomarker model and used to calculate the transcriptional score (discovery HR = 2.44 [95% CI, 1.78-3.35]; P = 2.79 × 10^-8). The associations between survival and each of these scores were independently confirmed in the validation phase when adjusted for covariates (epigenetic score: validation HR = 2.72 [95% CI, 2.31-3.20], P = 6.06 × 10^-33; transcriptional score: validation HR = 2.64 [95% CI, 1.73-4.04], P = 7.51 × 10^-6; integrative score: validation HR = 2.72 [95% CI, 2.32-3.18], P = 5.68 × 10^-35; prognostic score: validation HR = 2.72 [95% CI, 2.34-3.17], P = 5.04 × 10^-38).

To evaluate the discriminative ability of these scores, samples in the validation phase were categorized into low-, medium-, and high-score groups based on the tertiles of the epigenetic, transcriptional, integrative, and prognostic scores, respectively. Compared with the epigenetic low-score group, the medium- and high-score groups had 4.39- and 21.24-fold mortality risk, respectively (medium vs low: HR = 4.39 [95% CI, 2.42-7.99], P = 1.22 × 10^-6; high vs low: HR = 21.24 [95% CI, 11.23-40.17], P = 5.67 × 10^-21) (Fig 2A). Patients with a high transcriptional score had significantly worse survival (medium vs low: HR = 1.46 [95% CI, 0.92-2.33], P = 1.04 × 10^-1; high vs low: HR = 2.26 [95% CI, 1.41-3.60], P = 6.52 × 10^-4) (Fig 2B). The significant survival difference was enhanced among patients with different integrative scores (medium vs low: HR = 4.32 [95% CI, 2.39-7.83], P = 1.33 × 10^-6; high vs low: HR = 24.32 [95% CI, 12.71-46.56], P = 5.76 × 10^-22) (Fig 2C). Moreover, when combined with clinical information, including age, sex, study center, clinical stage, and smoking status, the prognostic score significantly discriminated NSCLC survival (medium vs low: HR = 7.32 [95% CI, 3.50-15.33], P = 1.29 × 10^-7; high vs low: HR = 28.85 [95% CI, 13.13-63.43], P = 5.83 × 10^-17) (Fig 2D). 
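The tertile grouping used above is a plain quantile cut on the score. A minimal sketch, assuming scores are held in a NumPy array and using a hypothetical helper name `tertile_groups`, could look like this; the exact handling of ties and cut points in the study is not stated, so the choices below are assumptions.

```python
import numpy as np

def tertile_groups(score):
    """Assign 0 (low), 1 (medium), or 2 (high) by tertiles of the score."""
    cuts = np.quantile(score, [1 / 3, 2 / 3])   # two cut points, three groups
    # right=True puts a value equal to a cut point into the lower group.
    return np.digitize(score, cuts, right=True)

groups = tertile_groups(np.arange(1, 10))
print(groups)
```

The resulting group labels can then be entered as a categorical covariate in a Cox model, with the low-score group as the reference level.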
The discriminative ability of the prognostic score is further illustrated by categorizing patients on the basis of the quintile level of the score. Figure 2E shows a clear ordering: patients in higher-quintile groups had lower 3- and 5-year survival rates, as well as shorter median survival time. This indicates that patients with higher mortality risks can be detected by using our score system (level 5 vs 1: HR = 66.09 [95% CI, 25.13-173.80], P = 1.98 × 10^-17; level 4 vs 1: HR = 21.02 [95% CI, 8.13-54.31], P = 3.24 × 10^-10; level 3 vs 1: HR = 9.13 [95% CI, 3.51-23.78], P = 5.93 × 10^-6; level 2 vs 1: HR = 4.40 [95% CI, 1.68-11.53], P = 2.53 × 10^-3) (Fig 2F). The performance of the prognostic score was further confirmed in the analysis stratified by covariates (Fig 3).

Figure 3. Forest plots of results from stratification analysis of prognostic score. HR with 95% CI of the prognostic score on non-small cell lung cancer survival in various subgroups is stratified by clinical characteristics. LUAD = lung adenocarcinoma; LUSC = lung squamous cell carcinoma. See Figure 2 legend for expansion of other abbreviation.

We then independently validated the predictive ability of these biomarkers. The model with only clinical information, as aforementioned, had very limited prediction ability (3-year AUC = 0.65, 5-year AUC = 0.66). However, by adding biomarkers with either main effects or G×G interactions, the AUCs significantly increased by 35.38% (95% CI, 27.09%-42.17%; P = 5.10 × 10^-17) and 34.85% (95% CI, 26.33%-41.87%; P = 2.52 × 10^-18) for 3- and 5-year survival, respectively, and exhibited a superior predictive ability for NSCLC survival (3-year AUC = 0.88 [95% CI, 0.83-0.93]; 5-year AUC = 0.89 [95% CI, 0.83-0.93]) (Fig 4). G×G interactions contributed an additional 65.2% for 3-year and 91.3% for the 5-year prediction accuracy increase. 
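The reported percentage increases are consistent with a relative change in AUC, that is, the difference between the full-model and clinical-only AUCs divided by the clinical-only AUC. The short check below reproduces the point estimates from the rounded AUC values given in the text; the helper name `relative_gain` is hypothetical, and reading the percentages as relative gains is an inference from the numbers rather than a definition stated in the text.

```python
def relative_gain(auc_base, auc_new):
    """Relative AUC increase, in percent, of a new model over a baseline model."""
    return (auc_new - auc_base) / auc_base * 100

print(round(relative_gain(0.65, 0.88), 2))   # 3-year survival
print(round(relative_gain(0.66, 0.89), 2))   # 5-year survival
```

On this reading, an absolute gain of 0.23 in AUC corresponds to a relative gain of roughly 35% because the baseline AUCs are near 0.65.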
In the sensitivity analysis, we reanalyzed the stepwise regression using two different thresholds (P = .10 and .15) and found that the majority of the selected biomarkers were the same as those in the original regression model (e-Table 16). We then recalculated these scores, retested their associations with NSCLC survival, and obtained similar results (e-Table 17). Meanwhile, the AUCs of our prognostic model using different thresholds were comparable: 0.88 (P = .05) vs 0.85 (P = .10) vs 0.86 (P = .15) for 3-year survival; 0.89 (P = .05) vs 0.83 (P = .10) vs 0.86 (P = .15) for 5-year survival (e-Figs 2 and 3). Moreover, we found that the effects of these four scores did not differ significantly between patients with LUAD and patients with LUSC (epigenetic score, P = .6572; transcriptional score, P = .1823; integrative score, P = .5532; prognostic score, P = .9653) (e-Table 18). Our prognostic model retained similar prediction ability in both the LUAD (3-year AUC = 0.91, 5-year AUC = 0.89, and Cind
Ruyang Zhang, Chao Chen, Xuesi Dong et al. 2020. Article