Course Catalog: Sample-Based Learning Methods
Course Outline:

Sample-Based Learning Methods

Welcome to the Course!
Welcome to the second course in the Reinforcement Learning Specialization:
Sample-Based Learning Methods, brought to you by the University of Alberta,
Onlea, and Coursera.
In this pre-course module, you'll be introduced to your instructors,
and get a flavour of what the course has in store for you.
Make sure to introduce yourself to your classmates in the "Meet and Greet" section!
Monte Carlo Methods for Prediction & Control
This week you will learn how to estimate value functions and optimal policies,
using only sampled experience from the environment.
This module represents our first step toward incremental learning methods
that learn from the agent’s own interaction with the world,
rather than a model of the world.
You will learn about on-policy and off-policy methods for prediction
and control, using Monte Carlo methods---methods that use sampled returns.
You will also be reintroduced to the exploration problem,
this time in the general RL setting rather than only in bandits.
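To make the idea concrete, here is a minimal sketch of first-visit Monte Carlo prediction, which averages sampled returns for each state. The `generate_episode` helper and the `(state, reward)` episode format are illustrative assumptions, not the course's assignment code.

```python
from collections import defaultdict

def mc_prediction(generate_episode, num_episodes, gamma=1.0):
    """First-visit Monte Carlo prediction: estimate v_pi by averaging sampled
    returns. `generate_episode()` is assumed to return one episode under the
    policy being evaluated, as a list of (state, reward) pairs where the
    reward follows the state."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    V = defaultdict(float)

    for _ in range(num_episodes):
        episode = generate_episode()          # [(s_0, r_1), (s_1, r_2), ...]
        G = 0.0
        # Work backwards so G accumulates the discounted return from each step
        for t in reversed(range(len(episode))):
            state, reward = episode[t]
            G = gamma * G + reward
            # First-visit check: only count the first occurrence of each state
            if state not in [s for s, _ in episode[:t]]:
                returns_sum[state] += G
                returns_count[state] += 1
                V[state] = returns_sum[state] / returns_count[state]
    return V
```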
Temporal Difference Learning Methods for Prediction
This week, you will learn about one of the most fundamental concepts in reinforcement learning:
temporal difference (TD) learning.
TD learning combines some of the features of both Monte Carlo and Dynamic Programming (DP) methods.
TD methods are similar to Monte Carlo methods in that they can learn from the agent’s interaction with the world,
and do not require knowledge of the model.
TD methods are similar to DP methods in that they bootstrap,
and thus can learn online---no waiting until the end of an episode.
You will see how TD can learn more efficiently than Monte Carlo, due to bootstrapping.
In this module we focus on TD for prediction; TD for control is discussed in the next module.
This week, you will implement TD to estimate the value function for a fixed policy, in a simulated domain.
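As a rough sketch of what such an implementation looks like, the tabular TD(0) update below bootstraps from the current value estimate after every step instead of waiting for the episode to end. The environment interface (`env.reset()`, `env.step(action)` returning `(next_state, reward, done)`) and the `policy(state)` function are assumptions for illustration, not the course's starter code.

```python
from collections import defaultdict

def td0_prediction(env, policy, num_episodes, alpha=0.1, gamma=1.0):
    """Tabular TD(0): move V(s) toward the bootstrapped target
    r + gamma * V(s') after every step of experience."""
    V = defaultdict(float)
    for _ in range(num_episodes):
        state = env.reset()                               # assumed interface
        done = False
        while not done:
            action = policy(state)
            next_state, reward, done = env.step(action)   # assumed interface
            target = reward + (0.0 if done else gamma * V[next_state])
            V[state] += alpha * (target - V[state])       # TD(0) update
            state = next_state
    return V
```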
Temporal Difference Learning Methods for Control
This week,
you will learn about using temporal difference learning for control,
as a generalized policy iteration strategy.
You will see three different algorithms based on bootstrapping and Bellman equations for control: Sarsa,
Q-learning and Expected Sarsa. You will see some of the differences between
the methods for on-policy and off-policy control, and that Expected Sarsa is a unified algorithm for both.
You will implement Expected Sarsa and Q-learning on Cliff World.
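The three algorithms share the same tabular update and differ mainly in the bootstrap target used for the action-value estimate. A minimal sketch of those targets follows; the epsilon-greedy assumption for Expected Sarsa and the tabular `Q[state]` array layout are illustrative, not the assignment's exact code.

```python
import numpy as np

def q_learning_target(Q, next_state, reward, gamma, done):
    """Q-learning bootstraps with the greedy (max) action value: off-policy."""
    return reward + (0.0 if done else gamma * np.max(Q[next_state]))

def sarsa_target(Q, next_state, next_action, reward, gamma, done):
    """Sarsa bootstraps with the value of the action actually taken next: on-policy."""
    return reward + (0.0 if done else gamma * Q[next_state][next_action])

def expected_sarsa_target(Q, next_state, reward, gamma, epsilon, done):
    """Expected Sarsa bootstraps with the expectation of Q over the
    (epsilon-greedy) policy's next-action probabilities."""
    if done:
        return reward
    q_next = Q[next_state]
    num_actions = len(q_next)
    probs = np.full(num_actions, epsilon / num_actions)
    probs[np.argmax(q_next)] += 1.0 - epsilon
    return reward + gamma * np.dot(probs, q_next)

# In all three cases the tabular update itself is the same:
#   Q[state][action] += alpha * (target - Q[state][action])
```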
Planning, Learning & Acting
Up until now,
you might think that learning with and without a model are two distinct,
and in some ways, competing strategies: planning with
Dynamic Programming versus sample-based learning via TD methods.
This week we unify these two strategies with the Dyna architecture.
You will learn how to estimate the model from data and then use this model
to generate hypothetical experience (a bit like dreaming)
to dramatically improve sample efficiency compared to sample-based methods like Q-learning.
In addition, you will learn how to design learning systems that are robust to inaccurate models.
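The sketch below shows how direct learning, model learning, and planning fit together in tabular Dyna-Q, assuming a deterministic learned model and an epsilon-greedy behaviour policy; the environment interface and helper names are illustrative assumptions.

```python
import random
import numpy as np

def dyna_q(env, num_actions, num_episodes, planning_steps=10,
           alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Dyna-Q: learn Q from real experience, store that experience in a
    deterministic model, and replay simulated transitions to improve sample efficiency."""
    Q = {}       # state -> np.array of action values
    model = {}   # (state, action) -> (reward, next_state, done)

    def q_values(s):
        return Q.setdefault(s, np.zeros(num_actions))

    def epsilon_greedy(s):
        if random.random() < epsilon:
            return random.randrange(num_actions)
        return int(np.argmax(q_values(s)))

    for _ in range(num_episodes):
        state = env.reset()                              # assumed interface
        done = False
        while not done:
            action = epsilon_greedy(state)
            next_state, reward, done = env.step(action)  # assumed interface

            # Direct RL: ordinary Q-learning update from real experience
            target = reward + (0.0 if done else gamma * np.max(q_values(next_state)))
            q_values(state)[action] += alpha * (target - q_values(state)[action])

            # Model learning: remember the last observed outcome of (s, a)
            model[(state, action)] = (reward, next_state, done)

            # Planning: replay hypothetical experience sampled from the model
            for _ in range(planning_steps):
                (s, a), (r, s_next, d) = random.choice(list(model.items()))
                t = r + (0.0 if d else gamma * np.max(q_values(s_next)))
                q_values(s)[a] += alpha * (t - q_values(s)[a])

            state = next_state
    return Q
```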
