Poster: arcam (arcam), Board: Go
Subject: Few people in the discussion have brought this up: reinforcement learning
Posting site: BBS 未名空间站 (Sun Mar 13 19:20:23 2016, US Eastern)
We trained the neural networks on 30 million moves from games played by
human experts, until they could predict the human move 57 percent of the time
(the previous record before AlphaGo was 44 percent).
But our goal is to beat the best human players, not just mimic them. To do
this, AlphaGo learned to discover new strategies for itself, by playing
thousands of games between its neural networks, and adjusting the
connections using a trial-and-error process known as reinforcement learning.
Of course, all of this requires a huge amount of computing power, so we
made extensive use of Google Cloud Platform.
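
To make the self-play idea above concrete, here is a toy sketch. It is not AlphaGo's actual method or code: it assumes a one-pile Nim game (take 1 to 3 stones, whoever takes the last stone wins) in place of Go, a small table of softmax logits in place of a neural network, and a plain REINFORCE-style update as the "trial-and-error" rule. All names (train, policy_probs, and so on) are made up for illustration.

```python
import math
import random

MAX_TAKE = 3  # assumed toy rule: a player removes 1..3 stones per turn


def legal_moves(pile):
    """Moves available when `pile` stones remain."""
    return list(range(1, min(MAX_TAKE, pile) + 1))


def policy_probs(logits, pile):
    """Softmax over the legal moves for this pile size."""
    moves = legal_moves(pile)
    weights = [math.exp(logits[pile][m]) for m in moves]
    total = sum(weights)
    return {m: w / total for m, w in zip(moves, weights)}


def play_self_play_game(logits, start_pile, rng):
    """Both sides sample moves from the same policy (self-play)."""
    pile, player, history = start_pile, 0, []
    while pile > 0:
        probs = policy_probs(logits, pile)
        moves = list(probs)
        move = rng.choices(moves, weights=[probs[m] for m in moves])[0]
        history.append((player, pile, move))
        pile -= move
        player = 1 - player
    winner = history[-1][0]  # the player who took the last stone
    return history, winner


def train(num_games=5000, max_pile=10, lr=0.1, seed=0):
    """Adjust the logits by trial and error over many self-play games."""
    rng = random.Random(seed)
    logits = {p: {m: 0.0 for m in legal_moves(p)}
              for p in range(1, max_pile + 1)}
    for _ in range(num_games):
        start = rng.randint(1, max_pile)
        history, winner = play_self_play_game(logits, start, rng)
        for player, pile, move in history:
            # +1 for every move made by the winner, -1 for the loser's moves.
            reward = 1.0 if player == winner else -1.0
            probs = policy_probs(logits, pile)
            for m in probs:
                # Gradient of log softmax: indicator(m == move) - prob(m).
                grad = (1.0 if m == move else 0.0) - probs[m]
                logits[pile][m] += lr * reward * grad
    return logits


if __name__ == "__main__":
    learned = train()
    for pile in range(1, 8):
        print(pile, policy_probs(learned, pile))
```

Moves that ended in a win get their probability pushed up and moves that ended in a loss get pushed down, with no human game records involved. With two stones left, for instance, taking both always wins, so that move's probability can only rise during training.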
※ Source: WWW 未名空间站, URL: mitbbs.com, Mobile: search for 未名空间 in the app store [FROM: 2.]