Reinforcement learning process [5]

Context in source publication

Context 1
... chooses actions to interact with the environment by observing its states, and then it receives a reward or a punishment (r t ). Because of the actions performed by the agent, the environment may change, so the agent must choose actions again according to the new state of the environment (s t+1 ), receiving a new reward or punishment (r t+1 ). This process is shown in Fig. 2 [5]. Many reinforcement learning situations can be described as a Markov Decision Process, a quintuple consisting of S, A, P, R, γ. S is the set of states extracted from the environment. For example, the positions of the pieces on the board can be the state when playing the game of Go. A is the set of actions that an agent can select. P is the ...
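To make the loop described above concrete, the following minimal sketch (not from the cited paper) shows how s_t, a_t, r_t and s_{t+1} flow through one episode of a toy MDP. The `ToyEnvironment` class, its integer states, and the random agent are illustrative assumptions chosen only to demonstrate the interaction cycle.

```python
import random

class ToyEnvironment:
    """A tiny MDP with integer states 0..4; reaching state 4 ends the episode with reward +1."""
    def __init__(self):
        self.state = 0

    def step(self, action):
        # P: transition to a new state depending on the chosen action (-1 or +1).
        self.state = max(0, min(4, self.state + action))
        # R: reward of +1 only in the goal state, 0 elsewhere.
        reward = 1.0 if self.state == 4 else 0.0
        done = self.state == 4
        return self.state, reward, done

def random_agent(state):
    # A: the agent observes s_t and selects an action from {-1, +1}.
    return random.choice([-1, 1])

env = ToyEnvironment()
state, done = env.state, False
while not done:
    action = random_agent(state)                  # agent acts on s_t
    next_state, reward, done = env.step(action)   # environment returns r_t and s_{t+1}
    print(f"s_t={state}, a_t={action}, r_t={reward}, s_t+1={next_state}")
    state = next_state                            # the loop continues from the new state
```

In a full MDP formulation the discount factor γ would weight future rewards when estimating returns; it is omitted here because the sketch only illustrates the observe-act-reward cycle.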

Similar publications

Preprint
Full-text available
This paper studies how a domain-independent planner and combinatorial search can be employed to play Angry Birds, a well-established AI challenge problem. To model the game, we use PDDL+, a planning language for mixed discrete/continuous domains that supports durative processes and exogenous events. The paper describes the model and identifies key...
Article
Full-text available
In this paper, reinforcement learning is applied to the game Flappy Bird with two methods, DQN and Q-learning. Then, we compare their performance through data visualization. Furthermore, results from other games are summarized to analyze the corresponding advantages and disadvantages. Finally, we discuss and compare these two reinfor...

Citations

Article
Full-text available
A transformer neural network is employed in the present study to predict Q-values in a simulated environment using reinforcement learning techniques. The goal is to teach an agent to navigate and excel in the Flappy Bird game, which has become a popular benchmark for control in machine learning approaches. Unlike most top existing approaches that use the game’s rendered image as input, our main contribution lies in using sensory input from LIDAR, represented by the ray casting method. Specifically, we focus on understanding the temporal context of measurements from a ray casting perspective and on optimizing potentially risky behavior by considering the degree of approach to objects identified as obstacles. The agent learned to use the ray casting measurements to avoid collisions with obstacles. Our model substantially outperforms related approaches. Going forward, we aim to apply this approach in real-world scenarios.