
SCIENTIA SINICA Informationis, Volume 47, Issue 11: 1510-1522(2017) https://doi.org/10.1360/N112017-00108

A method for mining core modules of cancer based on multi-omics biological network

More info
  • Received May 16, 2017
  • Accepted Jun 12, 2017
  • Published Nov 3, 2017

Abstract

Cancer is closely linked to factors such as the living environment and individual genetic background. Because cancer poses a serious threat to human health, numerous research institutions around the world are studying its pathogenesis. The advent of next-generation sequencing technology has made it easier to obtain important information about cancer and about relationships within the genome. This paper treats multi-omics data from a data-integration perspective: it incorporates lncRNA omics data and constructs a biological network model from which cancer-associated core gene modules are mined by a clustering method. We present a systematic approach to identifying core gene modules that can lead to the occurrence of cancer. Applying this approach to lung squamous cell carcinoma, we find core gene modules containing 15 genes; analysis of their functions and pathways shows that they are strongly related to cancer. We also distinguish high-risk and low-risk groups by survival analysis. The results show that, by integrating multi-omics biological data, our approach can identify core gene modules and their dysregulated genes, which is useful for cancer research.


Funded by

National Natural Science Foundation of China (61532014, 61571163, 61402132, 61671189)



  • Figure 1

    (Color online) Process of the core module mining method

  • Figure 2

    (Color online) lncRNA experimental result. (a) Results while including lncRNA and random data; (b) results while excluding lncRNA data

  • Figure 3

    Core gene module with regulating factors

  • Figure 4

    Core gene module

  • Figure 5

    Result of distinguishing normal and tumor samples according to core genes

  • Figure 6

    Result of function and pathway analysis

  • Figure 7

    (Color online) TCGA survival analysis result. (a) 15-genes; (b) 12-genes

  • Figure 8

    (Color online) GEO survival analysis result. (a) 15-genes GSE8894; (b) 15-genes GSE17710; (c) 12-genes GSE8894; (d) 12-genes GSE17710

  • Table 1   Chromosome information of core genes
    Gene Chromosome Start point End point
    TP53 hs17 7668402 7687550
    CDKN2A hs9 21967752 21995043
    DROSHA hs5 31400494 31532175
    DAB2 hs5 39371674 39425233
    SMC4 hs3 160399304 160434962
    PRKCI hs3 170222432 170305982
    SKIL hs3 170357678 170396849
    ECT2 hs3 172750682 172829273
    DVL3 hs3 184155311 184173614
    AP2M1 hs3 184174846 184184091
    DNAJB11 hs3 186570676 186585800
    AHSG hs3 186612928 186621318
    CCDC50 hs3 191329082 191398670
    TNK2 hs3 195863364 195909009
    DLG1 hs3 197042560 197299272
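  • Algorithm 1   Density-based clustering algorithm for core gene modules

    Input: Dataset: a data set containing $n$ objects
           $\varepsilon$: scan radius parameter
           MinPts: neighborhood density threshold
    Output: the set of genes with cluster labels
    Method:
    1. Create an $n \times n$ two-dimensional matrix dis to store the distances between core genes;
    2. Compute the distances between the core genes in dataset and store them in the dis matrix;
    3. For the objects in dataset, use dis to mark every object whose density within radius $\varepsilon$ is greater than MinPts as a core point;
    4. Mark every non-core-point object within radius $\varepsilon$ of a core point as a border point;
    5. Mark every object in dataset that is neither a core point nor a border point as a noise point;
    6. Mark all objects in dataset as unvisited;
    7. for each object $p$ among the core points // depth-first connection of all core points
    8.     create a stack Stack;
    9.     if $p$ is visited
    10.        continue;
    11.    mark $p$ as visited and push it onto Stack
    12.    while (Stack is not empty)
    13.        $v$ $\leftarrow$ pop the top of Stack;
    14.        for each object $q$ in the neighborhood of $v$
    15.            if $q$ is visited
    16.                continue;
    17.            add $q$ to the cluster containing $p$;
    18.            mark $q$ as visited;
    19.            push $q$ onto Stack;
    20.        end for
    21.    end while
    22. end for
    23. for each object $b$ among the border points
    24.     add $b$ to the cluster of any core point within radius $\varepsilon$;
    25. end for
    26. Output the core point and border point objects in dataset together with their clusters.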
  • Table 2   Function ID and function name
    ID Term
    GO:0035556 Intracellular signal transduction
    hsa04144 Endocytosis
    GO:0032147 Activation of protein kinase activity
    hsa05203 Viral carcinogenesis
    GO:0043065 Positive regulation of apoptotic process
    GO:0046677 Response to antibiotic
    GO:0071479 Cellular response to ionizing radiation
    GO:0090004 Positive regulation of establishment of protein localization to plasma membrane
    GO:0071158 Positive regulation of cell cycle arrest
    GO:0042326 Negative regulation of phosphorylation
    hsa04390 Hippo signaling pathway
    GO:0045197 Establishment or maintenance of epithelial cell apical/basal polarity
    GO:0090399 Replicative senescence
    GO:0045893 Positive regulation of transcription, DNA-templated
    GO:0007050 Cell cycle arrest
    GO:1903077 Negative regulation of protein localization to plasma membrane
    hsa05166 HTLV-I infection
    GO:0070830 Bicellular tight junction assembly
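
The steps of Algorithm 1 above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It makes several assumptions that the page does not state: objects are numeric vectors compared by Euclidean distance, an object's density is the number of other objects within $\varepsilon$ (the object itself is not counted), and the depth-first expansion in steps 14 to 19 follows only core points, as the comment on step 7 suggests, with border points attached afterwards in steps 23 to 25. The function name `cluster_core_modules` is hypothetical.

```python
from math import dist  # Euclidean distance; the distance measure is an assumption


def cluster_core_modules(points, eps, min_pts):
    """Return {index: cluster_id} for core and border points; noise is omitted."""
    n = len(points)
    # Steps 1-2: pairwise distance matrix.
    dis = [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    # eps-neighborhood of each object (assumption: the object itself is excluded).
    nbrs = [[j for j in range(n) if j != i and dis[i][j] <= eps] for i in range(n)]
    # Step 3: core points have more than min_pts neighbors within eps.
    core = {i for i in range(n) if len(nbrs[i]) > min_pts}

    labels = {}
    visited = set()  # Step 6: everything starts unvisited.
    cluster_id = 0
    # Steps 7-22: depth-first search linking density-connected core points.
    for p in sorted(core):
        if p in visited:
            continue
        visited.add(p)
        labels[p] = cluster_id
        stack = [p]
        while stack:
            v = stack.pop()
            for q in nbrs[v]:
                # Assumption: only core points are expanded in this phase.
                if q in visited or q not in core:
                    continue
                labels[q] = cluster_id
                visited.add(q)
                stack.append(q)
        cluster_id += 1

    # Steps 23-25: attach each border point to the cluster of a core neighbor.
    for b in range(n):
        if b in core:
            continue
        core_nbrs = [c for c in nbrs[b] if c in core]
        if core_nbrs:
            labels[b] = labels[core_nbrs[0]]
    # Step 26: objects absent from labels are noise points.
    return labels


if __name__ == "__main__":
    # Toy coordinates, not data from the paper.
    toy = [(0, 0), (0, 1), (1, 0), (1, 1), (2.2, 0),
           (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
    print(cluster_core_modules(toy, eps=1.5, min_pts=2))
```

Restricting the expansion to core points keeps two dense groups from being merged through a shared border point. If every neighbor were pushed onto the stack regardless of type, as a literal reading of steps 14 to 19 allows, clusters could chain through border points and absorb nearby noise.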

Copyright 2019 Science China Press Co., Ltd. All rights reserved.
