More info
  • Received: Apr 24, 2018
  • Accepted: May 22, 2018
  • Published: Jul 18, 2018


Thanks to rapid growth in computing power and advances in computational chemistry and biology, computer-aided drug design techniques have been successfully applied at almost every stage of the drug discovery and development pipeline to speed up research and reduce the cost and risk of preclinical and clinical trials. Owing to the development of machine learning theory and the accumulation of pharmacological data, artificial intelligence (AI), as a powerful data mining tool, has made its mark in many areas of drug design, such as virtual screening, activity scoring, quantitative structure-activity relationship (QSAR) analysis, de novo drug design, and in silico evaluation of absorption, distribution, metabolism, excretion and toxicity (ADME/T) properties. Although it remains challenging to provide a physical explanation of AI-based models, AI has become a powerful aid to drug discovery through its versatile frameworks. Recently, owing to their strong generalization ability and powerful feature extraction capability, deep learning methods have been employed both to predict molecular properties and to generate desired molecules, which will further promote the application of AI technologies in drug design.


This work was supported by the National Natural Science Foundation of China (21210003 and 81230076 to H.J., 81773634 to M.Z. and 81430084 to K.C.), the “Personalized Medicines—Molecular Signature-based Drug Discovery and Development” Strategic Priority Research Program of the Chinese Academy of Sciences (XDA12050201 to M.Z.), the National Key Research & Development Plan (2016YFC1201003 to M.Z.), and the National Basic Research Program (2015CB910304 to X.L.).

Interest statement

The author(s) declare that they have no conflict of interest.


[1] Abagyan, R., Totrov, M., and Kuznetsov, D. (1994). ICM—A new method for protein modeling and design: Applications to docking and structure prediction from the distorted native conformation. J Comput Chem 15, 488-506.

[2] Ain, Q.U., Aleksandrova, A., Roessler, F.D., and Ballester, P.J. (2015). Machine-learning scoring functions to improve structure-based binding affinity prediction and virtual screening. WIREs Comput Mol Sci 5, 405-424.

[3] Altae-Tran, H., Ramsundar, B., Pappu, A.S., and Pande, V. (2017). Low data drug discovery with one-shot learning. ACS Cent Sci 3, 283-293.

[4] Andras, P. (2017). High-dimensional function approximation with neural networks for large volumes of data. IEEE Trans Neural Netw Learn Syst 99, 1-9.

[5] Angermueller, C., Pärnamaa, T., Parts, L., and Stegle, O. (2016). Deep learning for computational biology. Mol Syst Biol 12, 878.

[6] Artursson, P., and Karlsson, J. (1991). Correlation between oral drug absorption in humans and apparent drug permeability coefficients in human intestinal epithelial (Caco-2) cells. Biochem Biophys Res Commun 175, 880-885.

[7] Ash, S., Cline, M.A., Homer, R.W., Hurst, T., and Smith, G.B. (1997). ChemInform abstract: SYBYL line notation (SLN): a versatile language for chemical structure representation. ChemInform 28, no.

[8] Ashburn, T.T., and Thor, K.B. (2004). Drug repositioning: identifying and developing new uses for existing drugs. Nat Rev Drug Discov 3, 673-683.

[9] Bai, F., Morcos, F., Cheng, R.R., Jiang, H., and Onuchic, J.N. (2016). Elucidating the druggable interface of protein-protein interactions using fragment docking and coevolutionary analysis. Proc Natl Acad Sci USA 113, E8051-E8058.

[10] Bender, A., Mussa, H.Y., Glen, R.C., and Reiling, S. (2004). Similarity searching of chemical databases using atom environment descriptors (MOLPRINT 2D): evaluation of performance. J Chem Inf Comput Sci 44, 1708-1718.

[11] Cabreiro, F., Au, C., Leung, K.Y., Vergara-Irigaray, N., Cocheme, H.M., Noori, T., Weinkove, D., Schuster, E., Greene, N.D., and Gems, D. (2013). Metformin retards aging in C. elegans by altering microbial folate and methionine metabolism. Cell 153, 228-239.

[12] Cao, D.S., Xu, Q.S., Hu, Q.N., and Liang, Y.Z. (2013). ChemoPy: freely available python package for computational biology and chemoinformatics. Bioinformatics 29, 1092-1094.

[13] Chen, B., Sheridan, R.P., Hornak, V., and Voigt, J.H. (2012). Comparison of random forest and pipeline pilot naïve bayes in prospective QSAR predictions. J Chem Inf Model 52, 792-803.

[14] Chen, R., Li, L., and Weng, Z. (2003). ZDOCK: an initial-stage protein-docking algorithm. Proteins 52, 80-87.

[15] Chen, Y.C. (2015). Beware of docking! Trends Pharmacol Sci 36, 78-95.

[16] Coley, C.W., Barzilay, R., Green, W.H., Jaakkola, T.S., and Jensen, K.F. (2017). Convolutional embedding of attributed molecular graphs for physical property prediction. J Chem Inf Model 57, 1757-1772.

[17] Coley, C.W., Rogers, L., Green, W.H., and Jensen, K.F. (2018). SCScore: synthetic complexity learned from a reaction corpus. J Chem Inf Model 58, 252-261.

[18] Copeland, R.A. (2010). The dynamics of drug-target interactions: drug-target residence time and its impact on efficacy and safety. Expert Opin Drug Discov 5, 305-310.

[19] Cortes, C., Kuznetsov, V., and Mohri, M. (2014). Ensemble methods for structured prediction. Proceedings of the 31st International Conference on Machine Learning 2014, 1134-1142.

[20] Cukuroglu, E., Engin, H.B., Gursoy, A., and Keskin, O. (2014). Hot spots in protein–protein interfaces: Towards drug discovery. Prog Biophys Mol Biol 116, 165-173.

[21] Dahl, G.E., Jaitly, N., and Salakhutdinov, R. (2014). Multi-task neural networks for QSAR predictions. arXiv:1406.1231v1.

[22] Dang, N.L., Hughes, T.B., Krishnamurthy, V., and Swamidass, S.J. (2016). A simple model predicts UGT-mediated metabolism. Bioinformatics 32, 3183-3189.

[23] Danishuddin, and Khan, A.U. (2016). Descriptors and their selection methods in QSAR analysis: paradigm for drug design. Drug Discov Today 21, 1291-1302.

[24] De Haes, W., Frooninckx, L., Van Assche, R., Smolders, A., Depuydt, G., Billen, J., Braeckman, B.P., Schoofs, L., and Temmerman, L. (2014). Metformin promotes lifespan through mitohormesis via the peroxiredoxin PRDX-2. Proc Natl Acad Sci USA 111, E2501-E2509.

[25] DiMasi, J.A., Grabowski, H.G., and Hansen, R.W. (2016). Innovation in the pharmaceutical industry: New estimates of R&D costs. J Health Econ 47, 20-33.

[26] Dobchev, D., Pillai, G., and Karelson, M. (2014). In silico machine learning methods in drug development. Curr Top Med Chem 14, 1913-1922.

[27] Du, T., Liao, L., Wu, C.H., and Sun, B. (2016). Prediction of residue-residue contact matrix for protein-protein interaction with Fisher score features and deep learning. Methods 110, 97-105.

[28] Duch, W., Swaminathan, K., and Meller, J. (2007). Artificial intelligence approaches for rational drug design and discovery. Curr Pharm Des 13, 1497-1508.

[29] Dudek, A., Arodz, T., and Galvez, J. (2006). Computational methods in developing quantitative structure-activity relationships (QSAR): a review. Comb Chem High Throughput Screen 9, 213-228.

[30] Durant, J.L., Leland, B.A., Henry, D.R., and Nourse, J.G. (2002). Reoptimization of MDL keys for use in drug discovery. J Chem Inf Comput Sci 42, 1273-1280.

[31] Duvenaud, D., Maclaurin, D., Aguilera-Iparraguirre, J., Hirzel, T., and Adams, R.P. (2015). Convolutional networks on graphs for learning molecular fingerprints. In Proceedings of the 28th International Conference on Neural Information Processing Systems, pp. 2224-2232.

[32] Esposito, E.X., Hopfinger, A.J., and Madura, J.D. (2004). Methods for applying the quantitative structure-activity relationship paradigm. Methods Mol Biol 275, 131-214.

[33] Falchi, F., Caporuscio, F., and Recanatini, M. (2014). Structure-based design of small-molecule protein-protein interaction modulators: the story so far. Future Med Chem 6, 343-357.

[34] Free, S.M., and Wilson, J.W. (1964). A mathematical contribution to structure-activity studies. J Med Chem 7, 395-399.

[35] Friesner, R.A., Banks, J.L., Murphy, R.B., Halgren, T.A., Klicic, J.J., Mainz, D.T., Repasky, M.P., Knoll, E.H., Shelley, M., and Perry, J.K. (2004). Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J Med Chem 47, 1739-1749.

[36] Ghasemi, F., Mehridehnavi, A.R., Fassihi, A., and Pérez-Sánchez, H. (2017). Deep neural network in biological activity prediction using deep belief network. Appl Soft Comput 62, doi: 10.1016/j.asoc.2017.09.040.

[37] Goh, G.B., Hodas, N.O., Siegel, C., and Vishnu, A. (2017). SMILES2Vec: An Interpretable General-Purpose Deep Neural Network for Predicting Chemical Properties. arXiv:1712.02034v2.

[38] Gómez-Bombarelli, R., Wei, J.N., Duvenaud, D., Hernández-Lobato, J.M., Sánchez-Lengeling, B., Sheberla, D., Aguilera-Iparraguirre, J., Hirzel, T.D., Adams, R.P., and Aspuru-Guzik, A. (2018). Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent Sci 4, 268-276.

[39] Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning (Cambridge: The MIT Press).

[40] Guengerich, F.P. (2011). Mechanisms of drug toxicity and relevance to pharmaceutical development. Drug Metab Pharmacokinet 26, 3-14.

[41] Hansch, C., and Fujita, T. (1964). Additions and corrections -ρ-σ-π analysis. A method for the correlation of biological activity and chemical structure. J Am Chem Soc 86, 5710.

[42] Hartenfeller, M., and Schneider, G. (2011). De novo drug design. Methods Mol Biol 672, 299-323.

[43] Hassan Baig, M., Ahmad, K., Roy, S., Mohammad Ashraf, J., Adil, M., Haris Siddiqui, M., Khan, S., Amjad Kamal, M., Provazník, I., and Choi, I. (2015). Computer aided drug design: success and limitations. Curr Pharm Des 22, 572-581.

[44] Heller, S.R., McNaught, A., Pletnev, I., Stein, S., and Tchekhovskoi, D. (2015). InChI, the IUPAC international chemical identifier. J Cheminform 7, 23.

[45] Higueruelo, A.P., Jubb, H., and Blundell, T.L. (2013). Protein-protein interactions as druggable targets: recent technological advances. Curr Opin Pharmacol 13, 791-796.

[46] Huang, S.Y., and Zou, X. (2010). Inclusion of solvation and entropy in the knowledge-based scoring function for protein-ligand interactions. J Chem Inf Model 50, 262-273.

[47] Huang, S.Y., Grinter, S.Z., and Zou, X. (2010). Scoring functions and their evaluation methods for protein-ligand docking: recent advances and future directions. Phys Chem Chem Phys 12, 12899-12908.

[48] Hubatsch, I., Ragnarsson, E.G.E., and Artursson, P. (2007). Determination of drug permeability and prediction of drug absorption in Caco-2 monolayers. Nat Protoc 2, 2111-2119.

[49] Jimenez, J., Skalic, M., Martinez-Rosell, G., and De Fabritiis, G. (2018). KDEEP: protein-ligand absolute binding affinity prediction via 3D-convolutional neural networks. J Chem Inf Model 58, 287-296.

[50] Jin, W., Barzilay, R., and Jaakkola, T. (2018). Junction tree variational autoencoder for molecular graph generation. arXiv:1802.04364v2.

[51] Kadurin, A., Aliper, A., Kazennov, A., Mamoshina, P., Vanhaelen, Q., Khrabrov, K., and Zhavoronkov, A. (2017a). The cornucopia of meaningful leads: Applying deep adversarial autoencoders for new molecule development in oncology. Oncotarget 8, 10883.

[52] Kadurin, A., Nikolenko, S., Khrabrov, K., Aliper, A., and Zhavoronkov, A. (2017b). druGAN: An advanced generative adversarial autoencoder model for de novo generation of new molecules with desired molecular properties in silico. Mol Pharm 14, 3098-3104.

[53] Kearnes, S., Goldman, B., and Pande, V. (2016a). Modeling industrial ADMET data with multitask networks. arXiv:1606.08793v3.

[54] Kearnes, S., McCloskey, K., Berndl, M., Pande, V., and Riley, P. (2016b). Molecular graph convolutions: moving beyond fingerprints. J Comput Aided Mol Des 30, 1-14.

[55] Khamis, M.A., Gomaa, W., and Ahmed, W.F. (2015). Machine learning in computational docking. Artif Intell Med 63, 135-152.

[56] Kim, K.H., Kim, N.D., and Seong, B.L. (2010). Pharmacophore-based virtual screening: a review of recent applications. Expert Opin Drug Discov 5, 205-222.

[57] Kinnings, S.L., Liu, N., Tonge, P.J., Jackson, R.M., Xie, L., and Bourne, P.E. (2011). A machine learning-based method to improve docking scoring functions and its application to drug repurposing. J Chem Inf Model 51, 408-419.

[58] Klaeger, S., Heinzlmeir, S., Wilhelm, M., Polzer, H., Vick, B., Koenig, P.A., Reinecke, M., Ruprecht, B., Petzoldt, S., Meng, C., et al. (2017). The target landscape of clinical kinase drugs. Science 358, eaan4368.

[59] Labbé, C.M., Kuenemann, M.A., Zarzycka, B., Vriend, G., Nicolaes, G.A.F., Lagorce, D., Miteva, M.A., Villoutreix, B.O., and Sperandio, O. (2016). iPPI-DB: an online database of modulators of protein-protein interactions. Nucleic Acids Res 44, D542-D547.

[60] Lavecchia, A., and Giovanni, C. (2013). Virtual screening strategies in drug discovery: a critical review. Curr Med Chem 20, 2839-2860.

[61] LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature 521, 436-444.

[62] Leelananda, S.P., and Lindert, S. (2016). Computational methods in drug discovery. Beilstein J Org Chem 12, 2694-2718.

[63] Li, H., Hou, J., Adhikari, B., Lyu, Q., and Cheng, J. (2017). Deep learning methods for protein torsion angle prediction. BMC Bioinformatics 18, 417.

[64] Liew, C.Y., Ma, X.H., Liu, X., and Yap, C.W. (2009). SVM model for virtual screening of Lck inhibitors. J Chem Inf Model 49, 877.

[65] Lombardo, F., and Jing, Y. (2016). In silico prediction of volume of distribution in humans. Extensive data set and the exploration of linear and nonlinear methods coupled with molecular interaction fields descriptors. J Chem Inf Model 56, 2042-2052.

[66] Lombardo, F., Obach, R.S., Varma, M.V., Stringer, R., and Berellini, G. (2014). Clearance mechanism assignment and total clearance prediction in human based upon in silico models. J Med Chem 57, 4397-4405.

[67] Lotfi Shahreza, M., Ghadiri, N., Mousavi, S.R., Varshosaz, J., and Green, J.R. (2017). A review of network-based approaches to drug repositioning. Brief Bioinform, doi: 10.1093/bib/bbx017.

[68] Luo, Y., Zhao, X., Zhou, J., Yang, J., Zhang, Y., Kuang, W., Peng, J., Chen, L., and Zeng, J. (2017). A network integration approach for drug-target interaction prediction and computational drug repositioning from heterogeneous information. Nat Commun 8, 573.

[69] Lusci, A., Pollastri, G., and Baldi, P. (2013). Deep architectures and deep learning in chemoinformatics: the prediction of aqueous solubility for drug-like molecules. J Chem Inf Model 53, 1563-1575.

[70] Ma, J., Sheridan, R.P., Liaw, A., Dahl, G.E., and Svetnik, V. (2015). Deep neural nets as a method for quantitative structure-activity relationships. J Chem Inf Model 55, 263-274.

[71] Ma, X., Jia, J., Zhu, F., Xue, Y., Li, Z., and Chen, Y. (2009). Comparative analysis of machine learning methods in ligand-based virtual screening of large compound libraries. Comb Chem High Throughput Screen 12, 344-357.

[72] Maheshwari, S., and Brylinski, M. (2016). Template-based identification of protein-protein interfaces using eFindSitePPI. Methods 93, 64-71.

[73] Martin-Montalvo, A., Mercken, E.M., Mitchell, S.J., Palacios, H.H., Mote, P.L., Scheibye-Knudsen, M., Gomes, A.P., Ward, T.M., Minor, R.K., Blouin, M.J., et al. (2013). Metformin improves healthspan and lifespan in mice. Nat Commun 4, 2192.

[74] Mason, J.S. (2007). Introduction to the volume and overview of computer-assisted drug design in the drug discovery process. In Taylor, J.B., and Triggle, D.J., ed. Comprehensive Medicinal Chemistry II (Elsevier), pp. 1-11.

[75] Matlock, M.K., Hughes, T.B., and Swamidass, S.J. (2015). XenoSite server: a web-available site of metabolism prediction tool. Bioinformatics 31, 1136-1137.

[76] Mauri, A., Consonni, V., Pavan, M., and Todeschini, R. (2006). DRAGON software: An easy approach to molecular descriptor calculations. Match Commun Math Comput Chem 56, 237-248.

[77] Mayr, A., Klambauer, G., Unterthiner, T., and Hochreiter, S. (2016). DeepTox: toxicity prediction using deep learning. Front Environ Sci, https://doi.org/10.3389/fenvs.2015.00080.

[78] Melville, J., Burke, E., and Hirst, J. (2009). Machine learning in virtual screening. Comb Chem High Throughput Screen 12, 332-343.

[79] Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., et al. (2015). Human-level control through deep reinforcement learning. Nature 518, 529-533.

[80] Mullard, A. (2017). The drug-maker’s guide to the galaxy. Nature 549, 445-447.

[81] Myint, K.Z., and Xie, X.Q. (2010). Recent advances in fragment-based QSAR and multi-dimensional QSAR methods. Int J Mol Sci 11, 3846-3866.

[82] Ning, X., and Karypis, G. (2011). In silico structure-activity-relationship (SAR) models from machine learning: a review. Drug Dev Res 72, 138-146.

[83] O’Boyle, N.M., and Hutchison, G.R. (2008). Cinfony—combining Open Source cheminformatics toolkits behind a common interface. Chem Cent J 2, 1-10.

[84] OECD. (2014). Guidance document on the validation of (quantitative) structure-activity relationship [(Q)SAR] models. 69, 1-154.

[85] Olivecrona, M., Blaschke, T., Engkvist, O., and Chen, H. (2017). Molecular de-novo design through deep reinforcement learning. J Cheminform 9, 48.

[86] Pan, S.J., and Yang, Q. (2010). A survey on transfer learning. IEEE Trans Knowl Data Eng 22, 1345-1359.

[87] Pereira, J.C., Caffarena, E.R., and Dos Santos, C.N. (2016). Boosting docking-based virtual screening with deep learning. J Chem Inf Model 56, 2495.

[88] Pu, Y., Wang, W., Henao, R., Chen, L., Gan, Z., Li, C., and Carin, L. (2017). Adversarial symmetric variational autoencoder. arXiv:1711.04915v2.

[89] Ramsundar, B., Liu, B., Wu, Z., Verras, A., Tudor, M., Sheridan, R.P., and Pande, V. (2017). Is multitask deep learning practical for pharma? J Chem Inf Model 57, 2068-2076.

[90] Repasky, M.P., Shelley, M., and Friesner, R.A. (2007). Flexible Ligand Docking with Glide (John Wiley & Sons, Inc.).

[91] Rogers, D., and Hahn, M. (2010). Extended-connectivity fingerprints. J Chem Inf Model 50, 742-754.

[92] Sahoo, S., Adhikari, C., Kuanar, M., and Mishra, B. (2016). A short review of the generation of molecular descriptors and their applications in quantitative structure property/activity relationships. Curr Comput Aided Drug Des 12, 181-205.

[93] Sanchez-Lengeling, B., Outeiral, C., Guimaraes, G.L., and Aspuru-Guzik, A. (2017). Optimizing distributions over molecular space. An Objective-Reinforced Generative Adversarial Network for Inverse-design Chemistry (ORGANIC). ChemRxiv Preprint.

[94] Santos, R., Ursu, O., Gaulton, A., Bento, A.P., Donadi, R.S., Bologa, C.G., Karlsson, A., Al-Lazikani, B., Hersey, A., Oprea, T.I., et al. (2017). A comprehensive map of molecular drug targets. Nat Rev Drug Discov 16, 19-34.

[95] Schaarschmidt, J., Monastyrskyy, B., Kryshtafovych, A., and Bonvin, A. (2017). Assessment of contact predictions in CASP12: co-evolution and deep learning coming of age. Proteins 86, https://doi.org/10.1002/prot.25407.

[96] Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Netw 61, 85-117.

[97] Schneider, G., Funatsu, K., Okuno, Y., and Winkler, D. (2017). De novo drug design—Ye olde scoring problem revisited. Mol Inform 36, https://doi.org/10.1002/minf.201681031.

[98] Schneidman-Duhovny, D., Inbar, Y., Nussinov, R., and Wolfson, H.J. (2005). PatchDock and SymmDock: servers for rigid and symmetric docking. Nucleic Acids Res 33, W363-W367.

[99] Scott, D.E., Bayly, A.R., Abell, C., and Skidmore, J. (2016). Small molecules, big targets: drug discovery faces the protein-protein interaction challenge. Nat Rev Drug Discov 15, 533-550.

[100] Segler, M.H.S., Kogej, T., Tyrchan, C., and Waller, M.P. (2018). Generating focussed molecule libraries for drug discovery with recurrent neural networks. ACS Cent Sci 4, 120-131.

[101] Sheridan, R.P. (2013). Time-split cross-validation as a method for estimating the goodness of prospective prediction. J Chem Inf Model 53, 783-790.

[102] Shin, W.H., Christoffer, C.W., and Kihara, D. (2017). In silico structure-based approaches to discover protein-protein interaction-targeting drugs. Methods 131, 22-32.

[103] Shoemaker, R.H. (2006). The NCI60 human tumour cell line anticancer drug screen. Nat Rev Cancer 6, 813-823.

[104] Sim, D.S.M. (2015a). Drug Distribution (Springer International Publishing).

[105] Sim, D.S.M. (2015b). Drug elimination. In Chan, Y., Ng, K., and Sim, D., ed. Pharmacological Basis of Acute Care (Springer, Cham), pp. 37-47.

[106] Smith, E.G., and Wiswesser, W.J. (1975). The Wiswesser Line-Formula Chemical Notation (New York: McGraw-Hill).

[107] Spencer, M., Eickholt, J., and Cheng, J. (2015). A deep learning network approach to ab initio protein secondary structure prediction. IEEE/ACM Trans Comput Biol Bioinform 12, 103-112.

[108] Subramanian, G., Ramsundar, B., Pande, V., and Denny, R.A. (2016). Computational modeling of β-secretase 1 (BACE-1) inhibitors using ligand based approaches. J Chem Inf Model 56, 1936-1949.

[109] Sushko, I., Salmina, E., Potemkin, V.A., Poda, G., and Tetko, I.V. (2012). ToxAlerts: a web server of structural alerts for toxic chemicals and compounds with potential adverse reactions. J Chem Inf Model 52, 2310-2316.

[110] Szklarczyk, D., Franceschini, A., Wyder, S., Forslund, K., Heller, D., Huerta-Cepas, J., Simonovic, M., Roth, A., Santos, A., Tsafou, K.P., et al. (2015). STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43, D447-D452.

[111] Talele, T., Khedkar, S., and Rigby, A. (2010). Successful applications of computer aided drug discovery: moving drugs from concept to the clinic. Curr Top Med Chem 10, 127-141.

[112] Tian, S., Li, Y., Wang, J., Zhang, J., and Hou, T. (2011). ADME evaluation in drug discovery. 9. Prediction of oral bioavailability in humans based on molecular properties and structural fingerprints. Mol Pharm 8, 841-851.

[113] Tishby, N., and Zaslavsky, N. (2015). Deep learning and the information bottleneck principle. Paper presented at: Information Theory Workshop, arXiv:1503.02406v1.

[114] Todeschini, R., and Consonni, V. (2009). Molecular Descriptors for Chemoinformatics (Wiley-VCH).

[115] Turner, J.R. (2010). New Drug Development (Springer New York).

[116] Unterthiner, T., Mayr, A., Klambauer, G., Steijaert, M., Ceulemans, H., Wegner, J.K., and Hochreiter, S. (2014). Deep learning as an opportunity in virtual screening. Paper presented at: The Workshop on Deep Learning & Representation Learning.

[117] Urban, G., Subrahmanya, N., and Baldi, P. (2018). Inner and outer recursive neural networks for chemoinformatics applications. J Chem Inf Model 58, 207-211.

[118] Vakser, I.A. (2014). Protein-protein docking: from interaction to interactome. Biophys J 107, 1785-1793.

[119] Valkov, E., Sharpe, T., Marsh, M., Greive, S., and Hyvonen, M. (2012). Targeting protein-protein interactions and fragment-based drug discovery. Top Curr Chem 317, 145-179.

[120] Vinyals, O., Blundell, C., Lillicrap, T., Kavukcuoglu, K., and Wierstra, D. (2016). Matching networks for one shot learning. Papers published at the Neural Information Processing Systems Conference.

[121] Vohora, D., and Singh, G. (2017). Pharmaceutical Medicine and Translational Clinical Research (Academic Press).

[122] Voosen, P. (2017). The AI detectives. Science 357, 22-27.

[123] Wallach, I., Dzamba, M., and Heifets, A. (2015). AtomNet: A deep convolutional neural network for bioactivity prediction in structure-based drug discovery. arXiv:1510.02855.

[124] Wang, C., and Zhang, Y. (2017). Improving scoring-docking-screening powers of protein-ligand scoring functions using random forest. J Comput Chem 38, 169-177.

[125] Wang, J., Luo, C., Shan, C., You, Q., Lu, J., Elf, S., Zhou, Y., Wen, Y., Vinkenborg, J.L., Fan, J., et al. (2015). Inhibition of human copper trafficking by a small molecule significantly attenuates cancer cell proliferation. Nat Chem 7, 968-979.

[126] Wang, N.N., Dong, J., Deng, Y.H., Zhu, M.F., Wen, M., Yao, Z.J., Lu, A.P., Wang, J.B., and Cao, D.S. (2016). ADME properties evaluation in drug discovery: prediction of Caco-2 cell permeability using a combination of NSGA-II and boosting. J Chem Inf Model 56, 763-773.

[127] Wang, S., Sun, S., Li, Z., Zhang, R., and Xu, J. (2017). Accurate de novo prediction of protein contact map by ultra-deep learning model. PLoS Comput Biol 13, e1005324.

[128] Wang, W., Yang, S., Zhang, X., and Li, J. (2014). Drug repositioning by integrating target information through a heterogeneous network model. Bioinformatics 30, 2923-2930.

[129] Weininger, D. (2011). Simplified Molecular Input Line Entry Specification.

[130] Willett, P. (2006). Similarity-based virtual screening using 2D fingerprints. Drug Discov Today 11, 1046-1053.

[131] Wójcikowski, M., Zielenkiewicz, P., and Siedlecki, P. (2015). Open Drug Discovery Toolkit (ODDT): a new open-source player in the drug discovery field. J Cheminform 7, 26.

[132] Wu, Z., Ramsundar, B., Feinberg, E.N., Gomes, J., Geniesse, C., Pappu, A.S., Leswing, K., and Pande, V. (2017). MoleculeNet: A benchmark for molecular machine learning. arXiv:1703.00564v2.

[133] Xing, J., Lu, W., Liu, R., Wang, Y., Xie, Y., Zhang, H., Shi, Z., Jiang, H., Liu, Y.C., Chen, K., et al. (2017). Machine-learning-assisted approach for discovering novel inhibitors targeting bromodomain-containing protein 4. J Chem Inf Model 57, 1677-1690.

[134] Xu, Y., Pei, J., and Lai, L. (2017). Deep learning based regression and multiclass models for acute oral toxicity prediction with automatic chemical feature extraction. J Chem Inf Model 57, 2672-2685.

[135] Xue, L.C., Dobbs, D., Bonvin, A.M.J.J., and Honavar, V. (2015). Computational prediction of protein interfaces: A review of data driven methods. FEBS Lett 589, 3516-3526.

[136] Yamanishi, Y., Araki, M., Gutteridge, A., Honda, W., and Kanehisa, M. (2008). Prediction of drug-target interaction networks from the integration of chemical and genomic spaces. Bioinformatics 24, i232-i240.

[137] Yap, C.W. (2011). PaDEL-descriptor: an open source software to calculate molecular descriptors and fingerprints. J Comput Chem 32, 1466-1474.

[138] Zaretzki, J., Matlock, M., and Swamidass, S.J. (2013). XenoSite: accurately predicting CYP-mediated sites of metabolism with neural networks. J Chem Inf Model 53, 3373-3383.

[139] Zhang, Q.C., Petrey, D., Norel, R., and Honig, B.H. (2010). Protein interface conservation across structure space. Proc Natl Acad Sci USA 107, 10896-10901.

[140] Zsoldos, Z., Reid, D., Simon, A., Sadjad, S.B., and Johnson, A.P. (2007). eHiTS: a new fast, exhaustive flexible ligand docking system. J Mol Graph Model 26, 198-212.

  • Figure 1

    The drug discovery, drug design topics and AI models.

  • Figure 2

    The encoding of the chemical reaction. A, B and C represent the reactants. P represents the main product.

  • Table 1   Summary of molecular representations

    Representation methods

    Molecular fingerprints: MACCS, ECFP, FCFP, Molprint2D, etc.

      MACCS was employed as the input and output of the AAE to search for anti-cancer molecules (Kadurin et al., 2017a).

    Graphs: the molecular graph

      CNN graph convolutional representation methods: Duvenaud graph convolution fingerprints (Duvenaud et al., 2015), Kearnes graph convolution fingerprints (Kearnes et al., 2016b), and Coley’s graph convolution fingerprints (Coley et al., 2017).

      Urban et al. developed inner and outer recursive neural networks for graph representation of molecules (Urban et al., 2018).

    ASCII strings: SMILES, InChI, SLN, WLN, etc.

      Olivecrona et al. developed a deep reinforcement learning method that tunes an RNN to generate molecules with predicted biological activity (Olivecrona et al., 2017).

      SMILES can be used directly as the input feature of an RNN to predict molecular properties (Goh et al., 2017).

    Numbers: molecular descriptors

      Ma et al. used a DNN to predict molecular bioactivity from the union of the atom pair descriptor and the donor-acceptor pair descriptor (Ma et al., 2015).

      Mayr et al. developed a multi-task DNN model to predict toxicity from chemical descriptors (Mayr et al., 2016).
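
    The "ASCII strings" row of Table 1 notes that SMILES can be fed directly to an RNN. As a rough illustration of what that input could look like, the following minimal Python sketch tokenizes SMILES strings and one-hot encodes them. It is not the encoding used by Goh et al. or any other cited work; the function names (tokenize, build_vocab, one_hot) and the example molecules are hypothetical, and the tokenizer is simplified (it keeps Cl, Br and bracket atoms together but does not handle two-digit ring closures such as %10).

```python
from typing import Dict, List

# Two-letter atom symbols that may appear outside brackets in SMILES.
TWO_CHAR_TOKENS = ("Cl", "Br")


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into tokens (simplified, illustrative rules)."""
    tokens = []
    i = 0
    while i < len(smiles):
        if smiles[i:i + 2] in TWO_CHAR_TOKENS:
            tokens.append(smiles[i:i + 2])
            i += 2
        elif smiles[i] == "[":
            # Keep a bracket atom such as [NH3+] as a single token.
            j = smiles.index("]", i)
            tokens.append(smiles[i:j + 1])
            i = j + 1
        else:
            tokens.append(smiles[i])
            i += 1
    return tokens


def build_vocab(smiles_list: List[str]) -> Dict[str, int]:
    """Map every token seen in the data set to an integer index."""
    unique = sorted({tok for s in smiles_list for tok in tokenize(s)})
    return {tok: idx for idx, tok in enumerate(unique)}


def one_hot(smiles: str, vocab: Dict[str, int]) -> List[List[int]]:
    """Encode a SMILES string as a (sequence length x vocabulary size) matrix."""
    rows = []
    for tok in tokenize(smiles):
        row = [0] * len(vocab)
        row[vocab[tok]] = 1
        rows.append(row)
    return rows


# Hypothetical example molecules: ethanol, chlorobenzene, methylammonium.
molecules = ["CCO", "c1ccccc1Cl", "C[NH3+]"]
vocab = build_vocab(molecules)
ethanol_matrix = one_hot("CCO", vocab)
print(tokenize("c1ccccc1Cl"))
print(len(vocab), len(ethanol_matrix))
```

    Each row of the resulting matrix corresponds to one time step of the sequence model. In practice sequences would also be padded to a common length, and a learned embedding layer is a common alternative to one-hot rows.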

  • Table 2   Summary of the AI implementation programs in drug design

    A free Python library that incorporates many high-quality AI algorithms for drug discovery

    Neural Graph Fingerprints: a CNN is used to generate molecular fingerprints to predict molecular properties.

    A tensor-based CNN is used to predict molecular properties.

    A multi-task DNN is used to predict molecular activity.

    A rescoring approach combining the RF with the AutoDock scoring function

    Chemical VAE: an implementation of the VAE generation model proposed by Gómez-Bombarelli et al.

    ORGANIC (Sanchez-Lengeling, 2017): a generative model for de novo molecule design with desired properties

    A generative model for de novo molecule design using an RNN and reinforcement learning

    Open Drug Discovery Toolkit (ODDT) (Wójcikowski et al., 2015): a modular and comprehensive toolkit for use in cheminformatics and molecular modeling

    JunctionTree VAE (Jin et al., 2018): a generative model for de novo molecular design based on the junction tree VAE

    A score evaluating the synthetic complexity of a molecule

    Two kinds of recursive neural networks used to predict molecular properties

Copyright 2020 Science China Press Co., Ltd. All rights reserved.