16. RL definitions, value iteration, policy iteration (Schulman) - 1
September 23, 2023 · 1,155 views
UC Berkeley 2017 Deep Reinforcement Learning course
CS294-112 Deep Reinforcement Learning Sp17. Playlist: https://www.youtube.com/playlist?list=PLkFD6_40KJIwTmSbCv9OVJB3YaO4sFwkX Course homepage: http://rll.berkeley.edu/deeprlcourse/
57 episodes in total
73,000 views
1
Introduction and course overview (Levine, Finn, Schulman) - 1
26:11
2
Introduction and course overview (Levine, Finn, Schulman) - 2
26:14
3
Introduction and course overview (Levine, Finn, Schulman) - 3
26:08
4
Supervised learning and decision making (Levine) - 1
24:06
5
Supervised learning and decision making (Levine) - 2
24:07
6
Supervised learning and decision making (Levine) - 3
24:03
7
Optimal control and planning (Levine) - 1
21:06
8
Optimal control and planning (Levine) - 2
21:13
9
Optimal control and planning (Levine) - 3
21:03
10
Learning dynamical system models from data (Levine) - 1
27:27
11
Learning dynamical system models from data (Levine) - 2
27:35
12
Learning dynamical system models from data (Levine) - 3
27:22
13
Learning policies by imitating optimal controllers (Levine) - 1
23:05
14
Learning policies by imitating optimal controllers (Levine) - 2
23:08
15
Learning policies by imitating optimal controllers (Levine) - 3
22:58
16
RL definitions, value iteration, policy iteration (Schulman) - 1
17:19
17
RL definitions, value iteration, policy iteration (Schulman) - 2
17:22
18
RL definitions, value iteration, policy iteration (Schulman) - 3
17:18
19
Reinforcement learning with policy gradients (Schulman) - 1
21:48
20
Reinforcement learning with policy gradients (Schulman) - 2
21:54
21
Reinforcement learning with policy gradients (Schulman) - 3
21:42
22
Learning Q-functions: Q-learning, SARSA, and others (Schulman) - 1
25:50
23
Learning Q-functions: Q-learning, SARSA, and others (Schulman) - 2
25:53
24
Learning Q-functions: Q-learning, SARSA, and others (Schulman) - 3
25:42
25
Advanced Q-learning: replay buffers, target networks, double Q-learning (Sc - 1
26:47
26
Advanced Q-learning: replay buffers, target networks, double Q-learning (Sc - 2
26:55
27
Advanced Q-learning: replay buffers, target networks, double Q-learning (Sc - 3
26:41
28
Advanced topics in imitation and safety (Finn) - 1
27:53
29
Advanced topics in imitation and safety (Finn) - 2
27:56
30
Advanced topics in imitation and safety (Finn) - 3
27:47
31
Inverse RL: acquiring objectives from demonstration (Finn) - 1
24:47
32
Inverse RL: acquiring objectives from demonstration (Finn) - 2
24:48
33
Inverse RL: acquiring objectives from demonstration (Finn) - 3
24:47
34
Advanced policy gradients: natural gradient and TRPO (Schulman) - 1
28:05
35
Advanced policy gradients: natural gradient and TRPO (Schulman) - 2
28:08
36
Advanced policy gradients: natural gradient and TRPO (Schulman) - 3
28:02
37
Policy gradient variance reduction and actor-critic algorithms (Schulman) - 1
26:55
38
Policy gradient variance reduction and actor-critic algorithms (Schulman) - 2
27:00
39
Policy gradient variance reduction and actor-critic algorithms (Schulman) - 3
26:51
40
Summary of policy gradients and temporal difference methods (Schulman) - 1
24:06
41
Summary of policy gradients and temporal difference methods (Schulman) - 2
24:10
42
Summary of policy gradients and temporal difference methods (Schulman) - 3
23:59
43
The exploration problem (Schulman) - 1
27:18
44
The exploration problem (Schulman) - 2
27:18
45
The exploration problem (Schulman) - 3
27:17
46
Parallel RL algorithms, open problems and challenges in deep reinforcement - 1
26:14
47
Parallel RL algorithms, open problems and challenges in deep reinforcement - 2
26:22
48
Parallel RL algorithms, open problems and challenges in deep reinforcement - 3
26:11
49
Transfer in Reinforcement Learning (Finn) - 1
28:18
50
Transfer in Reinforcement Learning (Finn) - 2
28:18
51
Transfer in Reinforcement Learning (Finn) - 3
28:16
52
Neural Architecture Search with Reinforcement Learning: Quoc Le and Barret Z - 1
25:24
53
Neural Architecture Search with Reinforcement Learning: Quoc Le and Barret Z - 2
25:29
54
Neural Architecture Search with Reinforcement Learning: Quoc Le and Barret Z - 3
25:17
55
Generalization and Safety in Reinforcement Learning and Control: Aviv Tamar - 1
25:39
56
Generalization and Safety in Reinforcement Learning and Control: Aviv Tamar - 2
25:40
57
Generalization and Safety in Reinforcement Learning and Control: Aviv Tamar - 3
25:33