Vanishing gradient problem

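The vanishing gradient problem is a difficulty in machine learning that arises when artificial neural networks are trained with gradient descent and backpropagation. In each training iteration, every weight of the network receives an update proportional to the partial derivative of the error function with respect to that weight. In some cases the gradient becomes vanishingly small, so the weights are not updated effectively; in the worst case the network may stop training altogether. As an example of the cause, a traditional activation function such as the hyperbolic tangent has a derivative in the range $(0, 1]$, and backpropagation computes gradients by the chain rule.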

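The effect is that, in an $n$-layer network, the gradients of the "front" (early) layers are computed by multiplying $n$ of these small numbers together. The gradient (error signal) therefore decreases exponentially with $n$, and the front layers train very slowly.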
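The shrinking product can be observed on a toy model. The following is a minimal sketch, assuming a chain of scalar "layers" $h_k = \tanh(w_k h_{k-1})$; the function name `tanh_chain_gradient`, the weight value 0.5 and the input 1.0 are illustrative choices, not part of the original article.

```python
import math

def tanh_chain_gradient(x, weights):
    """Return d(h_n)/d(x) for the chain h_k = tanh(w_k * h_{k-1}), h_0 = x."""
    h = x
    grad = 1.0
    for w in weights:
        h = math.tanh(w * h)
        # Chain rule: d tanh(w * h_prev) / d h_prev = w * (1 - tanh(w * h_prev)^2)
        grad *= w * (1.0 - h * h)
    return grad

# Hypothetical setup: every layer uses the weight 0.5 and the input is 1.0.
for n in (1, 5, 10, 20, 40):
    g = tanh_chain_gradient(1.0, [0.5] * n)
    print(f"layers={n:2d}  gradient={g:.3e}")
```

Because each factor is at most 0.5 in this setup, the gradient that reaches the input is bounded by $0.5^n$, which is the exponential decay described above.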

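Backpropagation allowed researchers to train supervised deep artificial neural networks from scratch, initially with little success. Sepp Hochreiter's 1991 diploma thesis [1][2] formally identified the "vanishing gradient problem" as the reason for this failure. The problem affects not only many-layered feedforward networks [3] but also recurrent networks. [4] Recurrent networks are trained by unfolding them into very deep feedforward networks, in which a new layer is created for each time step of the input sequence processed by the network.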
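To illustrate the unfolding, the sketch below runs backpropagation through time on a scalar recurrent unit $h_t = \tanh(w_{rec} h_{t-1} + w_{in} x_t)$ and reports how strongly the final state depends on each input. The model, the helper name `input_sensitivities` and all parameter values are assumptions made for this example only.

```python
import math

def input_sensitivities(w_rec, w_in, inputs):
    """For h_t = tanh(w_rec * h_{t-1} + w_in * x_t) with h_0 = 0,
    return the list [d h_T / d x_t for t = 1..T]."""
    # Forward pass: unfolding creates one "layer" per time step.
    h = 0.0
    states = []
    for x in inputs:
        h = math.tanh(w_rec * h + w_in * x)
        states.append(h)

    # Backward pass through time, from the last step to the first.
    sens = [0.0] * len(inputs)
    grad_h = 1.0  # d h_T / d h_T
    for t in reversed(range(len(inputs))):
        local = 1.0 - states[t] ** 2      # tanh' at step t
        sens[t] = grad_h * local * w_in   # d h_T / d x_t
        grad_h = grad_h * local * w_rec   # d h_T / d h_{t-1}
    return sens

# Hypothetical sequence of 30 identical inputs.
sens = input_sensitivities(w_rec=0.9, w_in=1.0, inputs=[0.5] * 30)
print(f"sensitivity to the last input : {sens[-1]:.3e}")
print(f"sensitivity to the first input: {sens[0]:.3e}")
```

In this setup the sensitivity to the first input is many orders of magnitude smaller than the sensitivity to the last one, so an error signal at the end of the sequence barely reaches the early time steps.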

When the activation functions used have derivatives that can take on larger values, the related exploding gradient problem may occur instead.
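Both regimes can be read off a single product. As a rough sketch, assume a scalar chain $h_k = f(w_k h_{k-1})$ with activation $f$; the symbols $w_k$, $\gamma$ and $\rho$ are introduced here for illustration and do not appear in the cited sources.

```latex
\frac{\partial h_n}{\partial h_0}
  = \prod_{k=1}^{n} w_k \, f'(w_k h_{k-1})

% Vanishing: every factor is bounded by some \gamma < 1
\left| w_k \, f'(w_k h_{k-1}) \right| \le \gamma < 1 \ \text{for all } k
  \quad\Longrightarrow\quad
  \left| \frac{\partial h_n}{\partial h_0} \right| \le \gamma^{n} \to 0

% Exploding: every factor is at least some \rho > 1
\left| w_k \, f'(w_k h_{k-1}) \right| \ge \rho > 1 \ \text{for all } k
  \quad\Longrightarrow\quad
  \left| \frac{\partial h_n}{\partial h_0} \right| \ge \rho^{n} \to \infty
```

Whether the gradient shrinks or grows thus depends on the size of the per-layer factors, which combine the weights with the derivative of the activation function.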

Solutions

References

[1] S. Hochreiter. Untersuchungen zu dynamischen neuronalen Netzen. Diploma thesis, Institut f. Informatik, Technische Univ. Munich, 1991.
[2] S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber. Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. In S. C. Kremer and J. F. Kolen, editors, A Field Guide to Dynamical Recurrent Neural Networks. IEEE Press, 2001.
[3] Goh, Garrett B.; Hodas, Nathan O.; Vishnu, Abhinav. Journal of Computational Chemistry. 2017-06-15, 38 (16): 1291–1307. PMID 28272810. arXiv:1701.04503. doi:10.1002/jcc.24764.
[4] Pascanu, Razvan; Mikolov, Tomas; Bengio, Yoshua. 2012-11-21. arXiv:1211.5063 [cs.LG].

