Figure 2 - uploaded by Yat Long Lo
Response functions with raw inputs (top) and tile coding preprocessing (bottom) for Mountain Car control.

Response functions with raw inputs (top) and tile coding preprocessing (bottom) for Mountain Car control.

Source publication
Preprint
Reinforcement learning systems require good representations to work well. For decades, practical success in reinforcement learning was limited to small domains. Deep reinforcement learning systems, on the other hand, are scalable, are not dependent on domain-specific prior knowledge, and have been successfully used to play Atari, in 3D navigation from pi...

Contexts in source publication

Context 1
... vertices are all extreme points of a convex set, and thus the ReLU activations will have the ability to respond to each of these sub-areas separately. Figure 2 shows heat-maps for the cases where the neural network (NN) used raw inputs (top) and tile-coding preprocessing (bottom). The feature maps were created using a neural network trained on the Mountain Car problem for 500 episodes. ...
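The excerpt's argument can be made concrete: tile coding turns a state into a 0/1 vector (one active tile per tiling), every such vector is a vertex of a hypercube, and a single ReLU can be hand-wired to fire at exactly one vertex. Setting the weight to +1 on the target's active tiles and -1 elsewhere makes the weighted sum equal to the number of active tiles at the target and at least one less for any other 0/1 vector, so a bias of -(active tiles - 1) silences the unit everywhere else. The following pure-Python sketch assumes a simple grid tile coder (4 offset tilings of 8×8 tiles over the standard Mountain Car bounds). The tiling sizes and the names `tile_code`, `vertex_detector` and `relu_unit` are hypothetical, and the weights are set by hand to show what a ReLU *can* represent, not what the paper's network learned.

```python
# Hypothetical grid tile coder for Mountain Car plus a hand-set ReLU unit.
# Tiling sizes and helper names are illustrative assumptions, not the paper's setup.

POS_MIN, POS_MAX = -1.2, 0.6    # standard Mountain Car position bounds
VEL_MIN, VEL_MAX = -0.07, 0.07  # standard Mountain Car velocity bounds


def tile_code(position, velocity, n_tilings=4, tiles_per_dim=8):
    """Return a 0/1 feature vector with exactly one active tile per tiling."""
    side = tiles_per_dim + 1  # one extra tile per dimension absorbs the offsets
    features = [0] * (n_tilings * side * side)
    # Clamp, then scale each state variable to [0, tiles_per_dim].
    p = (min(max(position, POS_MIN), POS_MAX) - POS_MIN) / (POS_MAX - POS_MIN) * tiles_per_dim
    v = (min(max(velocity, VEL_MIN), VEL_MAX) - VEL_MIN) / (VEL_MAX - VEL_MIN) * tiles_per_dim
    for t in range(n_tilings):
        offset = t / n_tilings  # shift each tiling by a fraction of a tile width
        i = min(int(p + offset), side - 1)
        j = min(int(v + offset), side - 1)
        features[t * side * side + i * side + j] = 1
    return features


def vertex_detector(target):
    """Weights and bias for a ReLU that outputs 1 at `target`, 0 at any other 0/1 vector."""
    weights = [1.0 if bit else -1.0 for bit in target]
    bias = -(sum(target) - 1.0)
    return weights, bias


def relu_unit(weights, bias, x):
    """Output of a single ReLU node: max(0, w . x + b)."""
    return max(0.0, sum(w * xi for w, xi in zip(weights, x)) + bias)


target = tile_code(-0.5, 0.0)
w, b = vertex_detector(target)
print(relu_unit(w, b, target))                # 1.0: same tiles in every tiling
print(relu_unit(w, b, tile_code(0.5, 0.06)))  # 0.0: a distant state
```

The unit responds on every state that shares all four active tiles with the target, i.e. one small sub-area of the state space, which is the kind of local response the bottom row of the figure illustrates.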
Context 2
... heat-map represents the magnitude of the output of a node from the first hidden layer. Heat-maps on the bottom row of Figure 2 show two rather global and two rather local node responses from the hidden layer. As shown in the figure, responses from the neural net that uses raw inputs are global. ...
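One way to see why raw-input responses come out global: a first-layer ReLU fed raw (position, velocity) computes max(0, w_pos·position + w_vel·velocity + b). Its non-zero region is one side of a single straight line, and because the pre-activation is affine, its largest value over the rectangular state space always sits at a corner, so the heat-map is a broad ramp rather than a bump. The following sketch computes such a heat-map on a grid. The weights, grid resolution and helper names (`raw_node_heatmap`, `argmax_cell`) are hypothetical and are not taken from the trained network in the paper.

```python
# Heat-map of one first-hidden-layer ReLU node fed raw (position, velocity).
# The weights below are hypothetical, not taken from the trained network.

POS_MIN, POS_MAX = -1.2, 0.6    # standard Mountain Car position bounds
VEL_MIN, VEL_MAX = -0.07, 0.07  # standard Mountain Car velocity bounds


def linspace(lo, hi, n):
    """n evenly spaced values from lo to hi inclusive."""
    return [lo + (hi - lo) * k / (n - 1) for k in range(n)]


def raw_node_heatmap(w_pos, w_vel, bias, n_pos=19, n_vel=15):
    """Rows are velocities, columns positions; each cell is max(0, w . s + b)."""
    positions = linspace(POS_MIN, POS_MAX, n_pos)
    velocities = linspace(VEL_MIN, VEL_MAX, n_vel)
    return [[max(0.0, w_pos * p + w_vel * v + bias) for p in positions]
            for v in velocities]


def argmax_cell(grid):
    """(row, col) of the largest response in the grid."""
    _, row, col = max((val, r, c) for r, line in enumerate(grid)
                      for c, val in enumerate(line))
    return row, col


grid = raw_node_heatmap(w_pos=1.0, w_vel=20.0, bias=0.2)
print(argmax_cell(grid))  # (14, 18): the high-position, high-velocity corner
```

By contrast, a node fed tile-coded features sees inputs that are one-hot within each tiling, so its response can be confined to a small interior region of the state space.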