Article

2P1-G12 Efficiency Improvement of Reinforcement Learning Using Parallel Processing for Combination Value Function

January 2010
The Proceedings of JSME annual Conference on Robotics and Mechatronics (Robomec) 2010:_2P1-G12_1-_2P1-G12_4

DOI:10.1299/jsmermd.2010._2P1-G12_1

Authors:

University of the Ryukyus

This paper addresses improving the efficiency of reinforcement learning by using parallel processing to combine value functions. We propose a method that periodically composes the Q tables of local learning clusters into a global Q table. We apply this method to two applications: a maze problem and a behavior rule detection problem for a modular robot. Q-learning and the Monte Carlo method are compared with the profit sharing method for learning robot behaviors. We present computer experiments with 40 PC clusters. The convergence time and learning times are evaluated and discussed.
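The idea of periodically composing local Q tables into a global Q table can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the composition rule, so element-wise averaging is an assumption, and the five-state corridor task, the function names, and all parameter values are hypothetical. The workers here run one after another in a single process, whereas the paper distributes them over PCs.

```python
import random


def q_learning_step(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update for one observed transition."""
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])


def merge_q_tables(local_tables):
    """Compose local Q tables into a global one.

    Assumption: element-wise averaging. The source does not specify the rule.
    """
    n = len(local_tables)
    n_states = len(local_tables[0])
    n_actions = len(local_tables[0][0])
    return [
        [sum(t[s][a] for t in local_tables) / n for a in range(n_actions)]
        for s in range(n_states)
    ]


def run_worker(q, episodes, rng):
    """Hypothetical corridor task: start at state 0, reward 1 on reaching the last state.

    Actions: 0 = left (blocked at state 0), 1 = right. Exploration is purely random.
    """
    goal = len(q) - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            action = rng.randrange(2)
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            q_learning_step(q, state, action, reward, next_state)
            state = next_state


def parallel_q_learning(n_workers=4, rounds=20, episodes_per_round=5, seed=0):
    """Alternate local learning with a periodic merge into the global Q table."""
    n_states, n_actions = 5, 2
    global_q = [[0.0] * n_actions for _ in range(n_states)]
    rngs = [random.Random(seed + i) for i in range(n_workers)]
    for _ in range(rounds):
        # Each worker starts the round from a copy of the current global table.
        local_tables = [[row[:] for row in global_q] for _ in range(n_workers)]
        for q, rng in zip(local_tables, rngs):
            run_worker(q, episodes_per_round, rng)  # on a cluster: one node per worker
        global_q = merge_q_tables(local_tables)
    return global_q


if __name__ == "__main__":
    for state, row in enumerate(parallel_q_learning()):
        print(state, [round(v, 3) for v in row])
```

Averaging keeps the merge order-independent and simple to reason about; other rules, such as taking the element-wise maximum or weighting by visit counts, would slot into `merge_q_tables` without changing the rest of the loop.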

Article

Using negotiable features for prescription problems

February 2011 · Computing

Data mining is usually concerned with the construction of accurate models from data, which are usually applied to well-defined problems that can be clearly isolated and formulated independently from other problems. Although much computational effort is devoted to their training and statistical evaluation, model deployment can also represent a scientific problem when several data mining models have to be used together, constraints appear on their application, or they have to be included in decision processes based on different rules, equations and constraints. In this paper we address the problem of combining several data mining models for objects and individuals in a common scenario, where not only can we affect decisions as the result of a change in one or more data mining models, but we also have to solve several optimisation problems, such as choosing one or more inputs to get the best overall result, or readjusting probabilities after a failure. We illustrate the point in the area of customer relationship management (CRM), where we deal with the general problem of prescription between products and customers. We introduce the concept of negotiable feature, which leads to an extended taxonomy of CRM problems of greater complexity, since each new negotiable feature implies a new degree of freedom. In this context, we introduce several new problems and techniques, such as data mining model inversion (by ranging on the inputs or by changing classification problems into regression problems by function inversion), expected profit estimation and curves, global optimisation through a Monte Carlo method, and several negotiation strategies in order to solve this maximisation problem. Keywords: Data mining, Profit maximisation, Function inversion problem, Global optimisation, Negotiation, CRM, Ranking, Probability estimation, Negotiable features, Monte Carlo method

Article

Excited States with Selected Configuration Interaction-Quantum Monte Carlo: Chemically Accurate Exci...

July 2019 · Journal of Chemical Theory and Computation

We employ quantum Monte Carlo to obtain chemically accurate vertical and adiabatic excitation energies, and equilibrium excited-state structures for the small, yet challenging, formaldehyde and thioformaldehyde molecules. A key ingredient is a robust protocol to obtain balanced ground- and excited-state Jastrow-Slater wave functions at a given geometry, and to maintain such a balanced description as we relax the structure in the excited state. We use determinantal components generated via a selected configuration interaction scheme which targets the same second-order perturbation energy correction for all states of interest at different geometries, and we fully optimize all variational parameters in the resultant Jastrow-Slater wave functions. Importantly, the excitation energies as well as the structural parameters in the ground and excited states are converged with very compact wave functions comprising few thousand determinants in a minimally augmented double-ζ basis set. These results are obtained already at the variational Monte Carlo level, the more accurate diffusion Monte Carlo method yielding only a small improvement in the adiabatic excitation energies. We find that matching Jastrow-Slater wave functions with similar variances can yield excitations compatible with our best estimates; however, the variance-matching procedure requires somewhat larger determinantal expansions to achieve the same accuracy, and it is less straightforward to adapt during structural optimization in the excited state.

Conference Paper

Automatic Development of Robot Behaviour Using Monte Carlo Methods

August 2000

James Brusey

Control systems for autonomous robots often use an architecture known as behaviour-based [1], which means that the problem of defining what the robot does is broken down into a number of competing or cooperating modules, or behaviours. Although a single behaviour might have access to all sensory information and might be able to control all effectors, it doesn’t necessarily do so all of the time, or may have its control outputs adjusted by other behaviours. The behaviour-based approach has been remarkably successful because the resulting control systems are fast and robust, in comparison with deliberative approaches used in the past, which tended to yield robots that were slow and sensitive to changes in the environment. Our experience has been that, although it is often easy to develop behaviours that work, they tend to be inefficient. They are inefficient in the sense that the robot takes more sense-decide-act cycles than necessary. We address this problem by developing a general method for generating near optimal behaviours based on a reward function. The approach is based on using a Monte Carlo algorithm [2] for solving Markov Decision Processes to learn the behaviour. Monte Carlo algorithms are a subclass of Reinforcement Learning algorithms and bear similarities to Q-learning or TD(λ). This algorithm is slow to converge and so we found it necessary to train using a simulator. The level of realism in the simulator is therefore quite important. We found that we were able to improve over hand-coded behaviours and that the improvement carried over to tests on the physical robot.

Article

Monte Carlo Matrix Inversion and Reinforcement Learning

February 1995 · Advances in Neural Information Processing Systems

We describe the relationship between certain reinforcement learning (RL) methods based on dynamic programming (DP) and a class of unorthodox Monte Carlo methods for solving systems of linear equations proposed in the 1950's. These methods recast the solution of the linear system as the expected value of a statistic suitably defined over sample paths of a Markov chain. The significance of our observations lies in arguments (Curtiss, 1954) that these Monte Carlo methods scale better with respect to state-space size than do standard, iterative techniques for solving systems of linear equations. This analysis also establishes convergence rate estimates. Because methods used in RL systems for approximating the evaluation function of a fixed control policy also approximate solutions to systems of linear equations, the connection to these Monte Carlo methods establishes that algorithms very similar to TD algorithms (Sutton, 1988) are asymptotically more efficient in a precise sense than other...

Article

Monte Carlo Q-learning for General Game Playing

February 2018

Recently, the interest in reinforcement learning in game playing has been renewed. This is evidenced by the groundbreaking results achieved by AlphaGo. General Game Playing (GGP) provides a good testbed for reinforcement learning, currently one of the hottest fields of AI. In GGP, a specification of game rules is given. The description specifies a reinforcement learning problem, leaving programs to find strategies for playing well. Q-learning is one of the canonical reinforcement learning methods, which is used as a baseline in some previous work (Banerjee & Stone, IJCAI 2007). We implement Q-learning in GGP for three small board games (Tic-Tac-Toe, Connect-Four, Hex). We find that Q-learning converges, and thus that this general reinforcement learning method is indeed applicable to General Game Playing. However, convergence is slow, in comparison to MCTS (a reinforcement learning method reported to achieve good results). We enhance Q-learning with Monte Carlo Search. This enhancement improves performance of pure Q-learning, although it does not yet out-perform MCTS. Future work is needed into the relation between MCTS and Q-learning, and on larger problem instances.

Article

Excitation Variance Matching with Limited Configuration Interaction Expansions in Variational Monte...

May 2017 · The Journal of Chemical Physics

We investigate the potential of variational Monte Carlo, through its joint ability to optimize excited states directly and evaluate a wave function's energy variance, to improve excitation energy estimates in the context of selective configuration interaction. As a direct measure of an approximate wave function's accuracy, the energy variance offers a means to balance the descriptions of ground and excited states in a system-specific way that should counteract biases due to differing degrees of multi-reference character between states. In the regime where selective configuration interaction cannot afford to exhaustively converge wave function energies, achieving this type of balance between states is essential for accurate predictions of energy differences. In a number of small molecule tests, we demonstrate that Monte Carlo based variance matching does indeed reduce ground state favoritism. However, we also find that, without further development, the proposed variance-based approach cannot be expected to remove biases that arise due to the choice of molecular orbital basis.

Article

Efficient multiple control variate method with applications to exotic option pricing

August 2019 · Communications in Statistics - Theory and Methods

The Monte Carlo simulation method is still the only feasible approach to handle high dimensional problems encountered in many areas so far. The main drawback of this method is its slow convergence. A variance reduction technique is one of the main methods to speed up Monte Carlo simulations. In this paper, we reconsider the multiple control variate method and provide sufficient conditions to ensure that the variance of an m-variate control variate estimator is smaller than that of a k-variate control variate estimator for any k where 1≤k<m. The results can be applied to a wide range of high dimensional complex problems where exact solutions do not exist. As nontrivial examples, the results are applied to problems of options pricing under the Black-Scholes-Merton’s model. For arithmetic Asian and basket options, more efficient new control variate estimators are constructed. Numerical results show that the constructed multiple control variate estimators are more efficient than estimators with fewer control variates in reducing variances.

Article

Monte Carlo least-squares fitting of experimental signal waveforms

January 2006 · Journal of Information and Computational Science

This paper focuses on why the regular least-squares fitting technique is unstable when used to fit exponential functions to signal waveforms, since such functions are highly correlated. It talks about alternative approaches, such as the search method, which has a slow convergence rate of 1/N^(1/M), for M parameters, where N is the number of computations performed. We have used the Monte Carlo method, utilizing both search and random walk, to devise a stable least-squares fitting algorithm that converges rapidly at a rate 1/N^(1/2), regardless of the number of parameters used in fitting the waveforms. The Monte Carlo approach has been tested for computed data, with and without noise, and by fitting actual experimental signal waveforms associated with optogalvanic transitions recorded with a hollow cathode discharge tube containing a mixture of neon (Ne) and carbon monoxide (CO) gases, and has yielded excellent results, making the developed algorithm both stable and fast for today's personal computers.

Article

Steady-State Properties of Single-File Systems with Conversion

July 2002 · Physical Review E

We have used Monte Carlo methods and analytical techniques to investigate the influence of the characteristic parameters, such as pipe length, diffusion, adsorption, desorption, and reaction rate constants on the steady-state properties of single-file systems with a reaction. We looked at cases when all the sites are reactive and when only some of them are reactive. Comparisons between mean-field predictions and Monte Carlo simulations for the occupancy profiles and reactivity are made. Substantial differences between mean-field and the simulations are found when rates of diffusion are high. Mean-field results only include single-file behavior by changing the diffusion rate constant, but it effectively allows passing of particles. Reactivity converges to a limit value if more reactive sites are added: sites in the middle of the system have little or no effect on the kinetics. Occupancy profiles show approximately exponential behavior from the ends to the middle of the system.

Article

Wavelet Monte Carlo Methods for the Global Solution of Integral Equations

June 2013

Stefan Heinrich

We study the global solution of Fredholm integral equations of the second kind with the help of Monte Carlo methods. Global solution means that we seek to approximate the full solution function. This is opposed to the usual applications of Monte Carlo, where one only wants to approximate a functional of the solution. In recent years several researchers developed Monte Carlo methods also for the global problem. In this paper we present a new Monte Carlo algorithm for the global solution of integral equations. We use multiwavelet expansions to approximate the solution. We study the behaviour of variance on increasing levels, and based on this, develop a new variance reduction technique. For classes of smooth kernels and right hand sides we determine the convergence rate of this algorithm and show that it is higher than those of previously developed algorithms for the global problem. Moreover, an information-based complexity analysis shows that our algorithm is optimal among all stochastic algorithms of the same computational cost and that no deterministic algorithm of the same cost can reach its convergence rate.

Article

Efficiencies of Dynamic Monte Carlo Algorithms for Off-Lattice Particle Systems with a Single Impuri...

February 2010 · Physics Procedia

The efficiency of dynamic Monte Carlo algorithms for off-lattice systems composed of particles is studied for the case of a single impurity particle. The theoretical efficiencies of the rejection-free method and of the Monte Carlo with Absorbing Markov Chains method are given. Simulation results are presented to confirm the theoretical efficiencies.

Article

Convergence Rates of a Dynamic Monte Carlo Rejection-Free Method for Interacting particles

March 2008

We calculated the efficiency of a rejection-free Monte Carlo method (H. Watanabe, S. Yukawa, M.A. Novotny and N. Ito, Efficiency of rejection-free dynamic Monte Carlo methods for homogeneous spin models, hard disk systems, and hard sphere systems, Phys. Rev. E, 74, 026707 (2006)) in the limit of low temperatures and/or high densities for d-dimensional particles interacting through a repulsive power-law r^p as well as Lennard-Jones interactions. Theoretically, we find that the algorithmic efficiency is proportional to rho^((p+2)/2) T^(-d/2), where rho is the particle density and T the temperature. For different powers (p) in 1, 2 and 3 dimensions as a function of T and rho, we report results in agreement with our theoretical predictions.