网易首页
21. Reinforcement learning with policy gradients (Schulman) - 3
2023年9月23日 860观看
加州大学伯克利分校 2017 深度增强学习课程
大学课程 / 社会学
https://www.youtube.com/playlist?list=PLkFD6_40KJIwTmSbCv9OVJB3YaO4sFwkX CS294-112 Deep Reinforcement Learning Sp17 课程主页:http://rll.berkeley.edu/deeprlcourse/
共57集
7.3万人观看
1
Introduction and course overview (Levine, Finn, Schulman) - 1
26:11
2
Introduction and course overview (Levine, Finn, Schulman) - 2
26:14
3
Introduction and course overview (Levine, Finn, Schulman) - 3
26:08
4
Supervised learning and decision making (Levine) - 1
24:06
5
Supervised learning and decision making (Levine) - 2
24:07
6
Supervised learning and decision making (Levine) - 3
24:03
7
Optimal control and planning (Levine) - 1
21:06
8
Optimal control and planning (Levine) - 2
21:13
9
Optimal control and planning (Levine) - 3
21:03
10
Learning dynamical system models from data (Levine) - 1
27:27
11
Learning dynamical system models from data (Levine) - 2
27:35
12
Learning dynamical system models from data (Levine) - 3
27:22
13
Learning policies by imitating optimal controllers (Levine) - 1
23:05
14
Learning policies by imitating optimal controllers (Levine) - 2
23:08
15
Learning policies by imitating optimal controllers (Levine) - 3
22:58
16
RL definitions, value iteration, policy iteration (Schulman) - 1
17:19
17
RL definitions, value iteration, policy iteration (Schulman) - 2
17:22
18
RL definitions, value iteration, policy iteration (Schulman) - 3
17:18
19
Reinforcement learning with policy gradients (Schulman) - 1
21:48
20
Reinforcement learning with policy gradients (Schulman) - 2
21:54
21
Reinforcement learning with policy gradients (Schulman) - 3
21:42
22
Learning Q-functions: Q-learning, SARSA, and others (Schulman) - 1
25:50
23
Learning Q-functions: Q-learning, SARSA, and others (Schulman) - 2
25:53
24
Learning Q-functions: Q-learning, SARSA, and others (Schulman) - 3
25:42
25
Advanced Q-learning: replay buffers, target networks, double Q-learning (Sc - 1
26:47
26
Advanced Q-learning: replay buffers, target networks, double Q-learning (Sc - 2
26:55
27
Advanced Q-learning: replay buffers, target networks, double Q-learning (Sc - 3
26:41
28
Advanced topics in imitation and safety (Finn) - 1
27:53
29
Advanced topics in imitation and safety (Finn) - 2
27:56
30
Advanced topics in imitation and safety (Finn) - 3
27:47
31
Inverse RL: acquiring objectives from demonstration (Finn) - 1
24:47
32
Inverse RL: acquiring objectives from demonstration (Finn) - 2
24:48
33
Inverse RL: acquiring objectives from demonstration (Finn) - 3
24:47
34
Advanced policy gradients: natural gradient and TRPO (Schulman) - 1
28:05
35
Advanced policy gradients: natural gradient and TRPO (Schulman) - 2
28:08
36
Advanced policy gradients: natural gradient and TRPO (Schulman) - 3
28:02
37
Policy gradient variance reduction and actor-critic algorithms (Schulman) - 1
26:55
38
Policy gradient variance reduction and actor-critic algorithms (Schulman) - 2
27:00
39
Policy gradient variance reduction and actor-critic algorithms (Schulman) - 3
26:51
40
Summary of policy gradients and temporal difference methods (Schulman) - 1
24:06
41
Summary of policy gradients and temporal difference methods (Schulman) - 2
24:10
42
Summary of policy gradients and temporal difference methods (Schulman) - 3
23:59
43
The exploration problem (Schulman) - 1
27:18
44
The exploration problem (Schulman) - 2
27:18
45
The exploration problem (Schulman) - 3
27:17
46
Parallel RL algorithms, open problems and challenges in deep reinforcement - 1
26:14
47
Parallel RL algorithms, open problems and challenges in deep reinforcement - 2
26:22
48
Parallel RL algorithms, open problems and challenges in deep reinforcement - 3
26:11
49
Transfer in Reinforcement Learning (Finn) - 1
28:18
50
Transfer in Reinforcement Learning (Finn) - 2
28:18
51
Transfer in Reinforcement Learning (Finn) - 3
28:16
52
Neural Architecture Search with Reinforcement Learning: Quoc Le and Barret Z - 1
25:24
53
Neural Architecture Search with Reinforcement Learning: Quoc Le and Barret Z - 2
25:29
54
Neural Architecture Search with Reinforcement Learning: Quoc Le and Barret Z - 3
25:17
55
Generalization and Safety in Reinforcement Learning and Control: Aviv Tamar - 1
25:39
56
Generalization and Safety in Reinforcement Learning and Control: Aviv Tamar - 2
25:40
57
Generalization and Safety in Reinforcement Learning and Control: Aviv Tamar - 3
25:33
相关视频
第24/24集 · 11:39
民间的三国:从司马仲相断狱的故事到“柴堆三国” - 3
大学课程
2022年8月18日
4444观看
02:24
个历史典故:管鲍之交是如何成就千古传奇?
轻知识
11月前
1362观看
01:28
个历史典故:秦晋联姻策略对现代人的启示
轻知识
11月前
991观看
06:14
古人是如何取名字的,不同的朝代都有何区别?
轻知识
11月前
1005观看
03:12
奇书《马前课》:诸葛亮三大预言竟然全部说中,真实解密
轻知识
2023年7月21日
1166观看
03:25
中国乃至世界上现存最早的兵书,也是历史上排名第一的军事著作
轻知识
1年前
1001观看
03:01
子弹库帛书归国,请记住它背后这些伟大的名字
轻知识
6月前
631观看
01:54
条历史典故:22年卧薪尝胆的春秋传奇
轻知识
11月前
828观看
01:55
条历史典故:从禅让到世袭,权力传承的古今变奏
轻知识
11月前
1019观看
19:26
改变三国历史的十大小人物
轻知识
1月前
1500观看
01:40
毛主席的《三国演义》首字别具一格,风格迥异,可称为“神品”
轻知识
11月前
1071观看
06:11
杨家将真的存在吗?不但存在,真实的杨家将比演义更加忠烈千秋
轻知识
1月前
1350观看
05:39
第6集 《封神》的历史原型人物:姜子牙
轻知识
10月前
1194观看
02:34
霸王别姬只是演义,历史上真正的虞姬可能被杀了,探秘灵璧虞姬墓
轻知识
1年前
936观看
02:32
关羽形象难道是被虚构的?历史上有些事他真没做过!
轻知识
1年前
1340观看
02:01
傅佩荣:清朝灭亡后,儒家变成一缕游魂?这件事要原原本本讲清楚
轻知识
1年前
1859观看