Course Catalog: Prediction and Control with Function Approximation
Course Syllabus:

    Prediction and Control with Function Approximation


Welcome to the Course!

Welcome to the third course in the Reinforcement Learning Specialization:

Prediction and Control with Function Approximation, brought to you by the University of Alberta,

Onlea, and Coursera.

In this pre-course module, you'll be introduced to your instructors,

and get a flavour of what the course has in store for you.

Make sure to introduce yourself to your classmates in the "Meet and Greet" section!

On-policy Prediction with Approximation

This week you will learn how to estimate a value function for a given policy,

when the number of states is much larger than the memory available to the agent.

You will learn how to specify a parametric form of the value function,

how to specify an objective function, and how gradient descent can be used to estimate values from interaction with the world.
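
As a concrete illustration of these ideas (my own sketch, not material from the course), here is a minimal semi-gradient TD(0) prediction loop with a linear value function; the environment interface `env`, the behaviour `policy`, and the feature map `phi` are all hypothetical stand-ins:

```python
import numpy as np

def semi_gradient_td0(env, policy, phi, num_features,
                      episodes=100, alpha=0.01, gamma=0.99):
    """Estimate v_pi(s) ~= w . phi(s) for the given policy.

    Assumes a simplified env with reset() -> state and
    step(action) -> (next_state, reward, done); phi(state) returns a
    length-num_features NumPy vector. Both are illustrative stand-ins.
    """
    w = np.zeros(num_features)
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            next_state, reward, done = env.step(policy(state))
            # Semi-gradient: the bootstrap target uses the current weights
            # but is treated as a constant when taking the gradient.
            v_next = 0.0 if done else w @ phi(next_state)
            td_error = reward + gamma * v_next - w @ phi(state)
            # For a linear approximator, grad_w v(s, w) = phi(s).
            w += alpha * td_error * phi(state)
            state = next_state
    return w
```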

Constructing Features for Prediction

The features used to construct the agent’s value estimates are perhaps the most crucial part of a successful learning system.

In this module we discuss two basic strategies for constructing features: (1) fixed basis functions that form an exhaustive partition of the input,

and (2) adapting the features while the agent interacts with the world via Neural Networks and Backpropagation.
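
To make strategy (1) concrete, here is a small illustrative sketch (not the course's code) of a state-aggregation feature vector, the simplest kind of fixed, exhaustive partition of a one-dimensional input:

```python
import numpy as np

def state_aggregation_features(state, low, high, num_bins):
    """One-hot features: partition [low, high) into num_bins equal
    intervals and activate the indicator of the interval containing
    `state` (hypothetical helper, for illustration only)."""
    phi = np.zeros(num_bins)
    idx = int((state - low) / (high - low) * num_bins)
    idx = min(max(idx, 0), num_bins - 1)   # clip boundary states into range
    phi[idx] = 1.0
    return phi

# Example: 1000 states grouped into 10 aggregates of 100 states each.
print(state_aggregation_features(237, low=0, high=1000, num_bins=10))
```

Each group of states shares one weight, so the value estimate generalizes within a group and the number of learned parameters stays small.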

In this week’s graded assessment you will solve a simple but infinite state prediction task with a Neural Network and

TD learning.

Control with Approximation

This week,

you will see that the concepts and tools introduced in modules two and three allow straightforward extension of classic

TD control methods to the function approximation setting. In particular,

you will learn how to find the optimal policy in infinite-state MDPs by simply combining semi-gradient

TD methods with generalized policy iteration, yielding classic control methods like Q-learning and Sarsa.

We conclude with a discussion of a new problem formulation for RL---average reward---which will undoubtedly

be used in many applications of RL in the future.
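
For illustration only, here is a minimal sketch of episodic semi-gradient Sarsa with a linear action-value function and an epsilon-greedy policy; `env` and `phi` are the same hypothetical stand-ins as in the earlier sketch, and this is one plausible rendering of the method rather than the course's reference implementation:

```python
import numpy as np

def semi_gradient_sarsa(env, phi, num_features, num_actions,
                        episodes=200, alpha=0.01, gamma=0.99, epsilon=0.1):
    """q(s, a, w) = w[a] . phi(s), with one weight vector per action."""
    w = np.zeros((num_actions, num_features))

    def q(state):
        return w @ phi(state)              # vector of action values

    def epsilon_greedy(state):
        if np.random.rand() < epsilon:
            return np.random.randint(num_actions)
        return int(np.argmax(q(state)))

    for _ in range(episodes):
        state = env.reset()
        action = epsilon_greedy(state)
        done = False
        while not done:
            next_state, reward, done = env.step(action)
            if done:
                target = reward
            else:
                next_action = epsilon_greedy(next_state)
                target = reward + gamma * q(next_state)[next_action]
            td_error = target - q(state)[action]
            w[action] += alpha * td_error * phi(state)
            if not done:
                state, action = next_state, next_action
    return w
```

Keeping one weight vector per action is the simplest way to represent action values when the features depend only on the state.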

Policy Gradient

Every algorithm you have learned about so far estimates

a value function as an intermediate step towards the goal of finding an optimal policy.

An alternative strategy is to directly learn the parameters of the policy.

This week you will learn about these policy gradient methods, and their advantages over value-function based methods.

You will also learn how policy gradient methods can be used

to find the optimal policy in tasks with both continuous state and action spaces.
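
As a rough, discrete-action illustration of the policy-gradient idea (the course also covers continuous actions, for example with Gaussian policies), here is a minimal REINFORCE-style sketch with a softmax policy that is linear in the features; again `env` and `phi` are hypothetical stand-ins:

```python
import numpy as np

def reinforce(env, phi, num_features, num_actions,
              episodes=500, alpha=0.001, gamma=0.99):
    """Monte Carlo policy gradient with a linear softmax policy."""
    theta = np.zeros((num_actions, num_features))

    def action_probs(state):
        prefs = theta @ phi(state)
        prefs -= prefs.max()               # numerical stability
        e = np.exp(prefs)
        return e / e.sum()

    for _ in range(episodes):
        # Generate one episode under the current policy.
        trajectory, state, done = [], env.reset(), False
        while not done:
            probs = action_probs(state)
            action = np.random.choice(num_actions, p=probs)
            next_state, reward, done = env.step(action)
            trajectory.append((state, action, reward))
            state = next_state

        # Update theta along the sampled policy gradient.
        # (The gamma**t factor on each update is omitted here,
        #  a common simplification.)
        G = 0.0
        for state, action, reward in reversed(trajectory):
            G = reward + gamma * G
            probs = action_probs(state)
            # grad log pi(a|s) for a linear softmax policy:
            # phi(s) on the taken action's row, minus pi(b|s) * phi(s) on every row.
            grad = -np.outer(probs, phi(state))
            grad[action] += phi(state)
            theta += alpha * G * grad
    return theta
```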
