AlphaFold

AlphaFold is a protein structure prediction program developed by DeepMind, a subsidiary of Google under Alphabet [1]. The program is designed as a deep learning system [2].

Figure: Amino acids fold to form a protein (three individual polypeptide chains at different levels of folding, and a cluster of chains).

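The AlphaFold AI system has two major versions: AlphaFold 1 (2018) and AlphaFold 2 (2020). A team using AlphaFold 1 placed first in the overall rankings of the 13th Critical Assessment of protein Structure Prediction (CASP) in December 2018. The program was particularly successful at producing the most accurate structures for the targets that the competition organizers rated as most difficult, for which no existing template structures were available from proteins with partially similar sequences.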

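Proteins coil and fold into three-dimensional structures, and a protein's function is determined by its structure. Understanding protein structures helps in developing drugs to treat disease [3]. According to DeepMind, AlphaFold can identify the shape of a protein within days, whereas determining a protein's shape previously often took researchers years [4]. In November 2020, at the 14th CASP competition [5], AlphaFold 2 performed well, with a median score of 92.4 out of 100 [6]. Its accuracy was far higher than that of any other entrant [7].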

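On 15 July 2021, the AlphaFold 2 paper was published in Nature as an advance access publication, together with open source software and a searchable database of species proteomes [8][9][10].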

The protein folding problem

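A protein consists of a chain of amino acids (its primary structure), which folds spontaneously, in the process called protein folding, into the protein's tertiary structure. A protein's structure is essential to its biological function. However, understanding how the amino acid sequence determines the tertiary structure is highly challenging; this is known as the "protein folding problem". [11] The problem involves the thermodynamics of the interatomic forces that stabilize the folded structure, the mechanism and pathway by which a protein reaches its final folded state extremely quickly, and how to predict a protein's native structure from its amino acid sequence. [12]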

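Protein structures have historically been determined experimentally using techniques such as X-ray crystallography, cryo-electron microscopy and nuclear magnetic resonance, all of which are expensive and time-consuming. [11]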

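Over the past 60 years, such efforts have determined the structures of only about 170,000 proteins, while more than 200 million proteins are known across all forms of life. [13]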

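If protein structures could be predicted from the amino acid sequence alone, it would greatly advance scientific research. However, Levinthal's paradox shows that although a protein can fold within milliseconds, the time needed to enumerate all possible structures at random in order to find the true native structure is longer than the age of the known universe. This has made protein structure prediction a grand challenge in biology. [11]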

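Over the years, researchers have applied many computational methods to the protein structure prediction problem, but except for small, simple proteins, their accuracy remained far below that of experimental techniques, which limited their value for scientific research.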

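CASP was launched in 1994 to challenge the scientific community to produce its best protein structure predictions. Even by 2016, GDT scores for the most difficult proteins reached only about 40 out of 100. [13]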
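To make the GDT scale concrete, here is a minimal sketch of a GDT_TS-style score: the average, over the cutoffs 1, 2, 4 and 8 Å, of the fraction of residues whose predicted position lies within that cutoff of the experimental position, scaled to 0-100. It assumes the per-residue distances have already been computed from a single superposition; the full CASP procedure also searches over superpositions, which is omitted here. The function name `gdt_ts` and the example distances are illustrative.

```python
from typing import Sequence

CUTOFFS_ANGSTROM = (1.0, 2.0, 4.0, 8.0)

def gdt_ts(distances: Sequence[float]) -> float:
    """GDT_TS-style score (0-100) from per-residue distances in angstroms.

    Assumes the distances come from one fixed superposition of the
    predicted and experimental structures.
    """
    if not distances:
        raise ValueError("need at least one distance")
    n = len(distances)
    # Fraction of residues within each cutoff, then the mean over cutoffs.
    fractions = [sum(d <= c for d in distances) / n for c in CUTOFFS_ANGSTROM]
    return 100.0 * sum(fractions) / len(CUTOFFS_ANGSTROM)

if __name__ == "__main__":
    # Hypothetical distances for a five-residue toy example.
    print(round(gdt_ts([0.5, 1.5, 3.0, 6.0, 10.0]), 1))
```

Equivalently, each residue contributes a quarter of a point for every cutoff it satisfies, so a residue within 1 Å scores a whole point and one beyond 8 Å scores nothing.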

In 2018, AlphaFold entered CASP using deep learning, an artificial intelligence technique. [11]

Algorithm

AlphaFold Protein Structure Database

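The AlphaFold Protein Structure Database was launched on 22 July 2021 as a joint effort between AlphaFold and EMBL-EBI (the European Bioinformatics Institute of the European Molecular Biology Laboratory). AlphaFold provides open access to more than 200 million protein structure predictions in order to accelerate scientific research. At launch, the database contained AlphaFold-predicted structure models for the nearly complete UniProt proteomes of humans and 20 model organisms, totalling more than 365,000 proteins. The database does not include proteins with fewer than 16 or more than 2,700 amino acid residues [69], although for humans these proteins are available as a file. [70]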
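The length limits stated above can be expressed as a simple check. This is only an illustrative sketch: the function name `in_launch_length_range` is hypothetical and is not part of any AlphaFold or EMBL-EBI tool, and it assumes one character per residue in the sequence string.

```python
MIN_RESIDUES = 16    # proteins shorter than this were not included
MAX_RESIDUES = 2700  # proteins longer than this were not included

def in_launch_length_range(sequence: str) -> bool:
    """Return True if the sequence length falls within the stated limits."""
    return MIN_RESIDUES <= len(sequence) <= MAX_RESIDUES

if __name__ == "__main__":
    print(in_launch_length_range("M" * 15))    # too short
    print(in_launch_length_range("M" * 300))   # within range
```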

AlphaFold aims to cover most of the set of 100 million proteins in UniRef90. As of 15 May 2022, 992,316 predictions were available. [71]

Applications

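AlphaFold has been used to predict the structures of proteins of SARS-CoV-2, the causative agent of COVID-19. In early 2020, the structures of these proteins were still awaiting experimental determination [72]. Before the results were released to the wider research community, they were examined by scientists at the Francis Crick Institute in the United Kingdom. The team also confirmed that its prediction was accurate for the experimentally determined SARS-CoV-2 spike protein, which had been shared in the Protein Data Bank, an international open-access database, and then released the computationally determined structures of the under-studied protein molecules [73].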

See also

References

  1. DeepMind. [2020-11-30]. (Archived from the original on 2021-01-19).
  2. MIT Technology Review. [2020-11-30]. (Archived from the original on 2021-08-28) (in English).
  3. Yicai (第一财经).
  4. FT Chinese (金融时报中文网). [2020-12-03]. (Archived from the original on 2020-12-22).
  5. Shead, Sam. CNBC. 2020-11-30 [2020-11-30]. (Archived from the original on 2021-01-28) (in English).
  6. Science and Technology Daily (科技日报). [2020-12-03]. (Archived from the original on 2020-12-05).
  7. MIT Technology Review. [2020-11-30]. (Archived from the original on 2021-08-28) (in English).
  8. Jumper, John; Evans, Richard; Pritzel, Alexander; Green, Tim; Figurnov, Michael; Ronneberger, Olaf; Tunyasuvunakool, Kathryn; Bates, Russ; Žídek, Augustin; Potapenko, Anna; Bridgland, Alex; Meyer, Clemens; Kohl, Simon A A; Ballard, Andrew J; Cowie, Andrew; Romera-Paredes, Bernardino; Nikolov, Stanislav; Jain, Rishub; Adler, Jonas; Back, Trevor; Petersen, Stig; Reiman, David; Clancy, Ellen; Zielinski, Michal; Steinegger, Martin; Pacholska, Michalina; Berghammer, Tamas; Bodenstein, Sebastian; Silver, David; Vinyals, Oriol; Senior, Andrew W; Kavukcuoglu, Koray; Kohli, Pushmeet; Hassabis, Demis. Nature. 2021-07-15, 596 (7873): 583–589. PMC 8371605 (free to access). PMID 34265844. doi:10.1038/s41586-021-03819-2 (free to access) (in English).
  9. GitHub. [2021-07-24]. (Archived from the original on 2021-07-23) (in English).
  10. alphafold.ebi.ac.uk. [2021-07-24]. (Archived from the original on 2021-07-24).
  11. DeepMind. [2020-11-30]. (Archived from the original on 2022-03-07).
  12. Ken A. Dill, S. Banu Ozkan, M. Scott Shell, and Thomas R. Weikl. Annual Review of Biophysics. 2008, 37: 289–316. PMC 2443096 (free to access). PMID 18573083. doi:10.1146/annurev.biophys.37.092707.153558.
  13. Robert F. Service, 'The game has changed.' AI triumphs at solving protein structures, Science, 30 November 2020
  14. DeepMind. [2020-11-30]. (Archived from the original on 2020-11-30).
  15. Mohammed AlQuraishi (May 2019), AlphaFold at CASP13, Bioinformatics, 35(22), 4862–4865 doi:10.1093/bioinformatics/btz422. See also Mohammed AlQuraishi (December 9, 2018), AlphaFold @ CASP13: "What just happened?" (blog post).
    Mohammed AlQuraishi (15 January 2020), A watershed moment for protein structure prediction, Nature 577, 627–628 doi:10.1038/d41586-019-03951-0
  16. AlphaFold: Machine learning for protein structure prediction, Foldit, 31 January 2020
  17. Torrisi, Mirko et al. (22 Jan. 2020), Deep learning methods in protein structure prediction. Computational and Structural Biotechnology Journal vol. 18 1301–1310. doi:10.1016/j.csbj.2019.12.011 (CC-BY-4.0)
  18. The Economist. 2020-11-30 [2020-11-30]. ISSN 0013-0613. (Archived from the original on 2020-12-03).
  19. Jeremy Kahn, Lessons from DeepMind's breakthrough in protein-folding A.I., Fortune, 1 December 2020
  20. John Jumper et al., conference abstract (December 2020)
  21. See block diagram. Also John Jumper et al. (1 December 2020), AlphaFold 2 presentation, slide 10
  22. The structure module is stated to use a "3-d equivariant transformer architecture" (John Jumper et al. (1 December 2020), AlphaFold 2 presentation, slide 12).
    One design for a transformer network with SE(3)-equivariance was proposed in Fabian Fuchs et al., SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks, NeurIPS 2020; also website. It is not known how similar this may or may not be to what was used in AlphaFold.
    See also the blog post by AlQuraishi on this, or the more detailed post by Fabian Fuchs
  23. John Jumper et al. (1 December 2020), AlphaFold 2 presentation, slides 12 to 20
  24. Callaway, Ewen. Nature. 2022-04-13, 604 (7905): 234–238 [2022-04-15]. doi:10.1038/d41586-022-00997-5. (Archived from the original on 2022-07-26) (in English).
  25. Group performance based on combined z-scores, CASP 13, December 2018. (AlphaFold = Team 043: A7D)
  26. Sample, Ian. The Guardian. 2018-12-02 [2020-11-30]. (Archived from the original on 2019-07-18).
  27. DeepMind. [2020-11-30].
  28. Singh, Arunima. Nature Methods. 2020, 17 (3): 249. ISSN 1548-7105. PMID 32132733. S2CID 212403708. doi:10.1038/s41592-020-0779-y (free to access) (in English).
  29. See CASP 13 data tables for 043 A7D, 322 Zhang, and 089 MULTICOM
  30. Wei Zheng et al., Deep-learning contact-map guided protein structure prediction in CASP13, Proteins: Structure, Function, and Bioinformatics, 87(12) 1149–1164 doi:10.1002/prot.25792; and slides
  31. Hou, Jie; Wu, Tianqi; Cao, Renzhi; Cheng, Jianlin. Proteins: Structure, Function, and Bioinformatics (Wiley). 2019-04-25, 87 (12): 1165–1178. ISSN 0887-3585. PMC 6800999 (free to access). PMID 30985027. bioRxiv 10.1101/552422 (free to access). doi:10.1002/prot.25697.
  32. Bloomberg.com. 2020-11-30 [2020-11-30]. (Archived from the original on 2022-04-05) (in English).
  33. GitHub. [2020-11-30]. (Archived from the original on 2022-02-01) (in English).
  34. MIT Technology Review. [2020-11-30]. (Archived from the original on 2021-08-28) (in English).
  35. Mohammed AlQuraishi, CASP14 scores just came out and they’re astounding, Twitter, 30 November 2020.
  36. For the GDT_TS measure used, each atom in the prediction scores a quarter of a point if it is within 8 Å (0.80 nm) of the experimental position; half a point if it is within 4 Å, three-quarters of a point if it is within 2 Å, and a whole point if it is within 1 Å.
  37. To achieve a GDT_TS score of 92.5, mathematically at least 70% of the structure must be accurate to within 1 Å, and at least 85% must be accurate to within 2 Å.
  38. Callaway, Ewen. Nature. 2020-11-30, 588 (7837): 203–204. Bibcode:2020Natur.588..203C. PMID 33257889. doi:10.1038/d41586-020-03348-4 (free to access) (in English).
  39. Artificial intelligence solution to a 50-year-old science challenge could ‘revolutionise’ medical research (press release), CASP organising committee, 30 November 2020
  40. Brigitte Nerlich, Protein folding and science communication: Between hype and humility, University of Nottingham blog, 4 December 2020
  41. Michael Le Page, DeepMind's AI biologist can decipher secrets of the machinery of life, New Scientist, 30 November 2020
  42. The predictions of DeepMind’s latest AI could revolutionise medicine, New Scientist, 2 December 2020
  43. Cade Metz, London A.I. Lab Claims Breakthrough That Could Accelerate Drug Discovery, New York Times, 30 November 2020
  44. Ian Sample, DeepMind AI cracks 50-year-old problem of protein folding, The Guardian, 30 November 2020
  45. Lizzie Roberts, 'Once in a generation advance' as Google AI researchers crack 50-year-old biological challenge. Daily Telegraph, 30 November 2020
  46. Nuño Dominguez, La inteligencia artificial arrasa en uno de los problemas más importantes de la biología (Artificial intelligence takes out one of the most important problems in biology), El País, 2 December 2020
  47. Jeremy Kahn, In a major scientific breakthrough, A.I. predicts the exact shape of proteins, Fortune, 30 November 2020
  48. Julia Merlot, Forscher hoffen auf Durchbruch für die Medikamentenforschung (Researchers hope for a breakthrough for drug research), Der Spiegel, 2 December 2020
  49. Bissan Al-Lazikani, The solving of a biological mystery, The Spectator, 1 December 2020
  50. Tom Whipple, "Deepmind computer solves new puzzle: life", The Times, 1 December 2020. Front page image, via Twitter.
  51. Tom Whipple, Deepmind finds biology’s ‘holy grail’ with answer to protein problem, The Times (online), 30 November 2020.
    In all, science editor Tom Whipple wrote six articles on the subject for The Times on the day the news broke (thread).
  52. Tim Hubbard, The secret of life, part 2: the solution of the protein folding problem, medium.com, 30 November 2020
  53. Christian Stöcker, Google greift nach dem Leben selbst (Google is reaching for life itself), Der Spiegel, 6 December 2020
  54. John Jumper et al. (1 December 2020), AlphaFold 2. Presentation given at CASP 14.
  55. Carlos Outeiral, CASP14: what Google DeepMind’s AlphaFold 2 really achieved, and what it means for protein folding, biology and bioinformatics, Oxford Protein Informatics Group. (3 December)
  56. Aled Edwards, The AlphaFold2 success: It took a village, via medium.com, 5 December 2020
  57. David Briggs, If Google’s Alphafold2 really has solved the protein folding problem, they need to show their working, The Skeptic, 4 December 2020
  58. The Guardian view on DeepMind’s brain: the shape of things to come, The Guardian, 6 December 2020
  59. Demis Hassabis, "Brief update on some exciting progress on #AlphaFold!" (tweet), via Twitter, 18 June 2021
  60. Tom Ireland, How will AlphaFold change bioscience research?, The Biologist, 4 December 2020
  61. Stephen Curry, No, DeepMind has not solved protein folding, Reciprocal Space (blog), 2 December 2020
  62. Derek Lowe, In the Pipeline: What’s Crucial And What Isn’t, Science Translational Medicine, 25 September 2019
  63. Philip Ball, Behind the Screens of AlphaFold, Chemistry World, 9 December 2020. See also tweets, 1 December
  64. Derek Lowe, In the Pipeline: The Big Problems, Science Translational Medicine, 1 December 2020
  65. Bagdonas, Haroldas; Fogarty, Carl A.; Fadda, Elisa; Agirre, Jon. Nature Structural & Molecular Biology. 2021-10-29, 28 (11): 869–870 [2022-07-29]. ISSN 1545-9985. PMID 34716446. S2CID 240228913. doi:10.1038/s41594-021-00680-9. (Archived from the original on 2022-06-23) (in English).
  66. e.g. Greg Bowman, Protein folding and related problems remain unsolved despite AlphaFold's advance, Folding@home blog, 8 December 2020
  67. Cristina Sáez, El último avance fundamental de la biología se basa en la investigación de un científico español, La Vanguardia, 2 December 2020. (Alfonso Valencia overall view)
  68. Zero Gravitas and Jacky Liang, DeepMind’s AlphaFold 2—An Impressive Advance With Hyperbolic Coverage, Skynet today (blog), Stanford, 9 December 2020
  69. alphafold.ebi.ac.uk. [2021-07-29]. (Archived from the original on 2022-07-29).
  70. alphafold.ebi.ac.uk. [2021-07-27]. (Archived from the original on 2022-07-29).
  71. www.alphafold.ebi.ac.uk. [2022-07-29]. (Archived from the original on 2022-08-02).
  72. Wired. [2020-12-01]. ISSN 1059-1028. (Archived from the original on 2022-04-23) (in American English).
  73. DeepMind. [2020-12-01]. (Archived from the original on 2022-03-25).

External links

AlphaFold (2018)

AlphaFold 2 (2020)

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.