Fig. 3 - uploaded by Kazuhiko Kawamoto
Average reward for the plain (blue) and broken (green) quadruped tasks. Error bars indicate the standard error.

Source publication
Preprint
This study addresses the fault tolerance of quadruped robots to actuator failure, which is critical for robots operating in remote or extreme environments. In particular, an adaptive curriculum reinforcement learning algorithm with dynamics randomization (ACDR) is established. The ACDR algorithm can adaptively train a quadru...

Contexts in source publication

Context 1
... average reward and average walking distance for all algorithms are shown in Fig. 3 and Fig. 4, respectively. In both the plain and broken conditions, UDR is inferior to the Baseline, which does not consider robot failures. This result indicates that the simple UDR cannot adapt to robot failures. The hard2easy curriculum of LCDR, denoted by LCDR h2e, outperforms the Baseline in terms of the average reward, as shown in Fig. 3. ...
Context 2
... in Fig. 3 and Fig. 4, respectively. In both the plain and broken conditions, UDR is inferior to the Baseline, which does not consider robot failures. This result indicates that the simple UDR cannot adapt to robot failures. The hard2easy curriculum of LCDR, denoted by LCDR h2e, outperforms the Baseline in terms of the average reward, as shown in Fig. 3. However, LCDR h2e exhibits inferior walking ability compared to the Baseline for both the plain and broken conditions, as shown in Fig. 4. This result indicates that LCDR h2e learns a conservative policy that does not promote walking as actively as the Baseline does. For both LCDR and ACDR, the hard2easy ...
Context 3
... contrast, the easy2hard curriculum gradually increases the degree of difficulty, and eventually, the leg stops moving. Figures 3 and 4 show that the hard2easy curriculum is more effective than the easy2hard curriculum. We discuss the reason below. ...
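A minimal sketch of how the two curriculum directions could schedule the actuator-failure coefficient k is given below, assuming a linear schedule in which k = 1 corresponds to a healthy actuator and k = 0 to a leg that has stopped moving; the function name, signature, and schedule are illustrative assumptions, not the paper's LCDR/ACDR implementation (ACDR in particular adapts the schedule during training rather than fixing it).

```python
def curriculum_k(step, total_steps, direction="hard2easy",
                 k_min=0.0, k_max=1.0):
    """Illustrative failure-coefficient schedule (hypothetical, linear).

    k = 1.0 : fully functional actuator; k = 0.0 : the leg stops moving.
    hard2easy starts training with severe failures (small k) and anneals
    toward normal operation; easy2hard does the reverse.
    """
    progress = min(step / total_steps, 1.0)  # training progress in [0, 1]
    if direction == "hard2easy":
        return k_min + progress * (k_max - k_min)
    if direction == "easy2hard":
        return k_max - progress * (k_max - k_min)
    raise ValueError(f"unknown curriculum direction: {direction}")


# Late in training (90% of steps), hard2easy already operates near normal
# conditions, while easy2hard is entering the severe-failure regime.
print(curriculum_k(9_000, 10_000, "hard2easy"))  # ~0.9
print(curriculum_k(9_000, 10_000, "easy2hard"))  # ~0.1
```

Under this reading, the easy2hard schedule finishes training at its most severe failure setting, which matches the observation above that this curriculum eventually brings the leg to a stop.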
Context 4
... in Section IV-B for robots trained on the interval k ∈ [0.5, 1.5]; in particular, in this framework, the robots are never trained at k = 0. Figures 7 and 8 show the average reward and average walking distance, respectively, and Fig. 9 shows the average reward for each k over the interval [0, 1]. Comparing these results with those presented in Figs. 3, 4, and 5 demonstrates that avoiding training in the proximity of k = 0 increases the average rewards of LCDR e2h and ACDR e2h. In summary, the easy2hard curriculum deteriorates the learned policy owing to the severe robot failures introduced at the end of training, whereas the hard2easy curriculum does ...
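As a complementary sketch, per-episode randomization of k around the current curriculum value can be restricted to the training interval discussed above; the sampling scheme and names below are assumptions for illustration, not the paper's exact dynamics randomization.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_failure_coefficient(k_center, half_width=0.1,
                               k_low=0.5, k_high=1.5):
    """Draw a per-episode failure coefficient near the curriculum value.

    Clipping to [k_low, k_high] = [0.5, 1.5] keeps training away from
    k = 0 (a fully stopped leg); setting k_low = 0.0 would reproduce the
    original interval under which the easy2hard variants deteriorated.
    """
    k = rng.uniform(k_center - half_width, k_center + half_width)
    return float(np.clip(k, k_low, k_high))


# Near the end of an easy2hard schedule: with k_low = 0.5 the sampled
# coefficient never approaches zero, unlike with k_low = 0.0.
print(sample_failure_coefficient(0.55, k_low=0.5))  # stays >= 0.5
print(sample_failure_coefficient(0.05, k_low=0.0))  # may land near 0.0
```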
