Molecule mining

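Molecule mining is data mining applied to molecules. Because a molecule can be represented as a molecular graph, the field is closely related to graph mining and structured data mining. The main problem is how to represent molecules so that data instances can be discriminated from one another. One approach is to use chemical similarity metrics, which have a long tradition in cheminformatics.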

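Typical approaches to calculating chemical similarity use chemical fingerprints, but these discard the underlying information about molecular topology. Mining the molecular graphs directly avoids this problem. The inverse QSAR problem, which is preferable for vectorial mappings, avoids it as well.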
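
The loss of topological information can be illustrated with a deliberately crude fingerprint. The sketch below is not a real cheminformatics fingerprint: `bond_fingerprint`, `tanimoto`, and the hand-written heavy-atom graphs are assumptions made for illustration, and the Tanimoto coefficient is used here only as one common similarity measure on fingerprints.

```python
def bond_fingerprint(atoms, bonds):
    """Toy fingerprint (an assumption): the set of element pairs joined by a bond."""
    return frozenset("-".join(sorted((atoms[a], atoms[b]))) for a, b in bonds)


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two feature sets: |A & B| / |A | B|."""
    union = fp_a | fp_b
    if not union:
        return 1.0  # two empty fingerprints are treated as identical
    return len(fp_a & fp_b) / len(union)


# Heavy-atom molecular graphs: (element per atom, list of bonded atom index pairs).
propan_1_ol = (["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])  # C-C-C-O chain
propan_2_ol = (["C", "C", "C", "O"], [(0, 1), (1, 2), (1, 3)])  # O on the middle C
dimethyl_ether = (["C", "O", "C"], [(0, 1), (1, 2)])            # C-O-C

fp_1 = bond_fingerprint(*propan_1_ol)
fp_2 = bond_fingerprint(*propan_2_ol)
fp_3 = bond_fingerprint(*dimethyl_ether)

# The two propanol isomers have different topologies but identical fingerprints.
print(tanimoto(fp_1, fp_2))  # 1.0
print(tanimoto(fp_1, fp_3))  # 0.5
```

Both isomers reduce to the feature set {C-C, C-O}, so the fingerprint cannot tell them apart, whereas a method that works on the graphs themselves still sees the different branching.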

Coding (molecule i, molecule j ≠ i)

Kernel methods

Maximum common graph methods

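  • MCS-HSCS[9] (Highest Scoring Common Substructure (HSCS) ranking strategy for a single MCS)
  • Small Molecule Subgraph Detector (SMSD)[10]: a Java-based software library for calculating the maximum common subgraph (MCS) between small molecules. The MCS can be used to derive a similarity or distance between two molecules. MCS is also used to screen drug-like compounds by finding hit molecules that share a common subgraph (substructure).[11]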
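
As a rough illustration of how a maximum common subgraph can yield a similarity score, the following brute-force sketch searches for the largest atom mapping that preserves element labels and bonding. It is not the SMSD algorithm or its API: the function names, the induced-subgraph definition without a connectivity requirement, and the Tanimoto-style score are all assumptions, and the exhaustive search is only practical for very small molecules.

```python
def max_common_subgraph(atoms_a, bonds_a, atoms_b, bonds_b):
    """Return the largest mapping {atom index in A: atom index in B} such that
    mapped atoms carry the same element and are bonded in A exactly when
    their images are bonded in B (common induced subgraph)."""
    adj_a = {frozenset(bond) for bond in bonds_a}
    adj_b = {frozenset(bond) for bond in bonds_b}
    best = {}

    def extend(i, mapping, used_b):
        nonlocal best
        # Prune: even mapping every remaining atom of A could not beat `best`.
        if len(mapping) + (len(atoms_a) - i) <= len(best):
            return
        if i == len(atoms_a):
            best = dict(mapping)  # strictly larger than the previous best
            return
        for j in range(len(atoms_b)):
            if j in used_b or atoms_a[i] != atoms_b[j]:
                continue
            # Bonds to already mapped atoms must agree in both molecules.
            consistent = all(
                (frozenset((i, k)) in adj_a) == (frozenset((j, m)) in adj_b)
                for k, m in mapping.items()
            )
            if consistent:
                mapping[i] = j
                used_b.add(j)
                extend(i + 1, mapping, used_b)
                del mapping[i]
                used_b.remove(j)
        extend(i + 1, mapping, used_b)  # alternative: leave atom i unmapped

    extend(0, {}, set())
    return best


def mcs_similarity(mol_a, mol_b):
    """Tanimoto-style score (an assumption): |MCS| / (|A| + |B| - |MCS|)."""
    size = len(max_common_subgraph(*mol_a, *mol_b))
    total = len(mol_a[0]) + len(mol_b[0]) - size
    return size / total if total else 1.0


# Heavy-atom molecular graphs: (element per atom, list of bonded atom index pairs).
propan_1_ol = (["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])
propan_2_ol = (["C", "C", "C", "O"], [(0, 1), (1, 2), (1, 3)])

print(len(max_common_subgraph(*propan_1_ol, *propan_2_ol)))  # 3
print(mcs_similarity(propan_1_ol, propan_2_ol))              # 0.6
```

The two isomers share at most three atoms with matching bonding, so the score is 3 / (4 + 4 - 3) = 0.6, which separates them in a way a purely feature-set comparison may not.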

Coding (molecule i)

Molecular query methods

Methods based on special architectures of neural networks

See also

References

  1. H. Kashima, K. Tsuda, A. Inokuchi, Marginalized Kernels Between Labeled Graphs, The 20th International Conference on Machine Learning (ICML2003), 2003.
  2. H. Fröhlich, J. K. Wegner, A. Zell, Optimal Assignment Kernels For Attributed Molecular Graphs, The 22nd International Conference on Machine Learning (ICML 2005), Omnipress, Madison, WI, USA, 2005, 225-232.
  3. H. Fröhlich, J. K. Wegner, A. Zell, Kernel Functions for Attributed Molecular Graphs - A New Similarity Based Approach To ADME Prediction in Classification and Regression, QSAR Comb. Sci., 2006, 25, 317-326. doi:10.1002/qsar.200510135
  4. H. Fröhlich, J. K. Wegner, A. Zell, Assignment Kernels For Chemical Compounds, International Joint Conference on Neural Networks 2005 (IJCNN'05), 2005, 913-918.
  5. P. Mahé, L. Ralaivola, V. Stoven, J.-P. Vert, The pharmacophore kernel for virtual screening with support vector machines, J Chem Inf Model, 2006, 46, 2003-2014. doi:10.1021/ci060138m
  6. P. Mahé, N. Ueda, T. Akutsu, J.-L. Perret and J.-P. Vert. Proceedings of the 21st ICML. 2004: 552–559.
  7. L. Ralaivola, S. J. Swamidass, S. Hiroto and P. Baldi. Neural Networks. 2005, 18: 1093–1110 [Retrieved 2017-07-02]. doi:10.1016/j.neunet.2005.07.009. (Archived from the original on 2015-09-24).
  8. P. Mahé and J.-P. Vert. Machine Learning. 2009, 75 (1): 3–35. ISSN 0885-6125. doi:10.1007/s10994-008-5086-2.
  9. J. K. Wegner, H. Fröhlich, H. Mielenz, A. Zell, Data and Graph Mining in Chemical Space for ADME and Activity Data Sets, QSAR Comb. Sci., 2006, 25, 205-220. doi:10.1002/qsar.200510009
  10. S. A. Rahman, M. Bashton, G. L. Holliday, R. Schrader and J. M. Thornton, Small Molecule Subgraph Detector (SMSD) toolkit, Journal of Cheminformatics 2009, 1:12. doi:10.1186/1758-2946-1-12
  11. [Retrieved 2017-07-02]. (Archived from the original on 2020-01-28).
  12. R. D. King, A. Srinivasan, L. Dehaspe, Warmr: a data mining tool for chemical data, J. Comput.-Aid. Mol. Des., 2001, 15, 173-181. doi:10.1023/A:1008171016861
  13. L. Dehaspe, H. Toivonen, R. D. King, Finding frequent substructures in chemical compounds, 4th International Conference on Knowledge Discovery and Data Mining, AAAI Press, 1998, 30-36.
  14. A. Inokuchi, T. Washio, T. Okada, H. Motoda, Applying the Apriori-based Graph Mining Method to Mutagenesis Data Analysis, Journal of Computer Aided Chemistry, 2001, 2, 87-92.
  15. A. Inokuchi, T. Washio, K. Nishimura, H. Motoda, A Fast Algorithm for Mining Frequent Connected Subgraphs, IBM Research, Tokyo Research Laboratory, 2002.
  16. A. Clare, R. D. King, Data mining the yeast genome in a lazy functional language, Practical Aspects of Declarative Languages (PADL2003), 2003.
  17. M. Kuramochi, G. Karypis, An Efficient Algorithm for Discovering Frequent Subgraphs, IEEE Transactions on Knowledge and Data Engineering, 2004, 16(9), 1038-1051.
  18. M. Deshpande, M. Kuramochi, N. Wale, G. Karypis, Frequent Substructure-Based Approaches for Classifying Chemical Compounds, IEEE Transactions on Knowledge and Data Engineering, 2005, 17(8), 1036-1050.
  19. C. Helma, T. Cramer, S. Kramer, L. de Raedt, Data Mining and Machine Learning Techniques for the Identification of Mutagenicity Inducing Substructures and Structure Activity Relationships of Noncongeneric Compounds, J. Chem. Inf. Comput. Sci., 2004, 44, 1402-1411. doi:10.1021/ci034254q
  20. T. Meinl, C. Borgelt, M. R. Berthold, Discriminative Closed Fragment Mining and Perfect Extensions in MoFa, Proceedings of the Second Starting AI Researchers Symposium (STAIRS 2004), 2004.
  21. T. Meinl, C. Borgelt, M. R. Berthold, M. Philippsen, Mining Fragments with Fuzzy Chains in Molecular Databases, Second International Workshop on Mining Graphs, Trees and Sequences (MGTS2004), 2004.
  22. T. Meinl, M. R. Berthold, Hybrid Fragment Mining with MoFa and FSG, Proceedings of the 2004 IEEE Conference on Systems, Man & Cybernetics (SMC2004), 2004.
  23. S. Nijssen, J. N. Kok. Frequent Graph Mining and its Application to Molecular Databases, Proceedings of the 2004 IEEE Conference on Systems, Man & Cybernetics (SMC2004), 2004.
  24. C. Helma, Predictive Toxicology, CRC Press, 2005.
  25. M. Wörlein, Extension and parallelization of a graph-mining-algorithm, Friedrich-Alexander-Universität, 2006.
  26. K. Jahn, S. Kramer, Optimizing gSpan for Molecular Datasets, Proceedings of the Third International Workshop on Mining Graphs, Trees and Sequences (MGTS-2005), 2005.
  27. X. Yan, J. Han, gSpan: Graph-Based Substructure Pattern Mining, Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM 2002), IEEE Computer Society, 2002, 721-724.
  28. A. Karwath, L. D. Raedt, SMIREP: predicting chemical activity from SMILES, J Chem Inf Model, 2006, 46, 2432-2444. doi:10.1021/ci060159g
  29. H. Ando, L. Dehaspe, W. Luyten, E. Craenenbroeck, H. Vandecasteele, L. Meervelt, Discovering H-Bonding Rules in Crystals with Inductive Logic Programming, Mol Pharm, 2006, 3, 665-674. doi:10.1021/mp060034z
  30. P. Mazzatorta, L. Tran, B. Schilter, M. Grigorov, Integration of Structure-Activity Relationship and Artificial Intelligence Systems To Improve in Silico Prediction of Ames Test Mutagenicity, J. Chem. Inf. Model., 2006, ASAP alert. doi:10.1021/ci600411v
  31. N. Wale, G. Karypis, Comparison of Descriptor Spaces for Chemical Compound Retrieval and Classification, ICDM, 2006, 678-689.
  32. A. Gago Alonso, J. E. Medina Pagola, J. A. Carrasco-Ochoa, J. F. Martínez-Trinidad, Mining Frequent Connected Subgraphs Reducing the Number of Candidates, Proc. of ECML-PKDD, 2008, 365-376.
  33. Xiaohong Wang, Jun Huan, Aaron Smalter, Gerald Lushington, Application of Kernel Functions for Accurate Similarity Search in Large Chemical Databases, BMC Bioinformatics, 2010, 11 (Suppl 3): S8.
  34. I. I. Baskin, V. A. Palyulin, N. S. Zefirov. Doklady Akademii Nauk SSSR. 1993, 333 (2): 176–179.
  35. I. I. Baskin, V. A. Palyulin, N. S. Zefirov. J. Chem. Inf. Comput. Sci. 1997, 37 (4): 715–721. doi:10.1021/ci940128y.
  36. D. B. Kireev. J. Chem. Inf. Comput. Sci. 1995, 35 (2): 175–180. doi:10.1021/ci00024a001.
  37. A. M. Bianucci, A. Micheli, A. Sperduti, A. Starita. Applied Intelligence. 2000, 12 (1-2): 117–146. doi:10.1023/A:1008368105614.
  38. A. Micheli, A. Sperduti, A. Starita, A. M. Bianucci. J. Chem. Inf. Comput. Sci. 2001, 41 (1): 202–218. PMID 11206375. doi:10.1021/ci9903399.
  39. O. Ivanciuc. Roumanian Chemical Quarterly Reviews. 2001, 8: 197–220.
  40. A. Goulon, T. Picot, A. Duprat, G. Dreyfus. SAR and QSAR in Environmental Research. 2007, 18 (1-2): 141–153. PMID 17365965. doi:10.1080/10629360601054313.

Further reading

  • Schölkopf, B., K. Tsuda and J. P. Vert: Kernel Methods in Computational Biology, MIT Press, Cambridge, MA, 2004.
  • R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, John Wiley & Sons, 2001. ISBN 0-471-05669-3
  • Gusfield, D., Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Cambridge University Press, 1997. ISBN 0-521-58519-8
  • R. Todeschini, V. Consonni, Handbook of Molecular Descriptors, Wiley-VCH, 2000. ISBN 3-527-29913-0
