Figure (available from the Journal of Ambient Intelligence and Humanized Computing): 32-bit genotype representing two activation functions

Source publication
Article
Full-text available
In the family of recurrent neural networks, the long short-term memory (LSTM) network provides promising solutions for many complex applications such as speech and voice recognition, machine translation, and time series analysis. When building these networks, many tunable hyperparameters need to be set early. Among these hyperparameters, the activation func...
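As a rough illustration of the 32-bit genotype shown in the figure (the excerpt does not specify the actual bit layout), the sketch below assumes two 16-bit segments, each indexing an activation function from an assumed candidate pool; the pool contents and the modulo mapping are purely illustrative.

```python
# Hypothetical decoding of a 32-bit genotype into two activation-function
# choices. The 16-bit split and the candidate pool are assumptions made for
# illustration; the excerpt does not state the actual encoding.
CANDIDATES = ["sigmoid", "tanh", "relu", "elu"]  # assumed candidate pool

def decode_genotype(genotype):
    """Map a 32-bit integer to two activation-function names (16 bits each)."""
    hi = (genotype >> 16) & 0xFFFF   # gene for the first activation function
    lo = genotype & 0xFFFF           # gene for the second activation function
    return CANDIDATES[hi % len(CANDIDATES)], CANDIDATES[lo % len(CANDIDATES)]

print(decode_genotype(0x00020001))  # -> ('relu', 'tanh')
```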

Similar publications

Article
Full-text available
Low-resource languages (LRLs) with complex morphology are known to be more difficult to translate automatically. Some LRLs are particularly difficult to translate due to a lack of research interest or collaboration. In this article, we experiment with a specific LRL, Quechua, that is spoken by millions of people in South Ame...

Citations

... However, for tasks such as classification, the number of publications lags behind that for prediction, leaving an opportunity niche for researchers. Most surrogate-model research applied to time series classification focuses on hyperparameter optimization [30], deep learning [31], and neuroevolution [32]. Nevertheless, as far as the state of the art has been reviewed, surrogate models have scarcely been applied to time series discretization. ...
Preprint
Full-text available
The enhanced multi-objective symbolic discretization for time series (eMODiTS) method employs a flexible discretization scheme with different value cuts for each non-equal time interval, which incurs a considerable computational cost when evaluating each objective function. Therefore, surrogate models were implemented to mitigate this disadvantage. However, each solution found by eMODiTS is a vector of different size, so the surrogate model must be able to handle data sets with this characteristic. Consequently, this work's contribution lies in analyzing the implementation of surrogate models for time series discretization, where each candidate scheme is a real-valued vector of different size. For this reason, the proposed surrogate model is k-nearest neighbors regression with Dynamic Time Warping as the distance measure. Results suggest our proposal finds a suitable approximation to the final eMODiTS solutions, with a function-evaluation reduction rate between 15% and 95%. Moreover, according to Pareto front performance measures, the proposal's Pareto front is competitive with the eMODiTS Pareto front, reaching an average Generational Distance (GD) between 0.0447 and 0.0536 and an average Hypervolume Ratio (HVR) between 0.334 and 0.3891. Finally, compared against SAX-based methods, our proposal exhibits similar behavior in classification tasks and statistical tests.
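A minimal sketch of the surrogate idea described in this abstract, k-nearest-neighbours regression with Dynamic Time Warping (DTW) so that candidate solutions of different lengths can be compared, follows below; the archive format `(vector, objective values)`, the function names, and the toy data are illustrative assumptions, not taken from eMODiTS itself.

```python
# k-NN regression with DTW distance over variable-length candidate vectors.
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) DTW between two 1-D sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def knn_dtw_predict(query, archive, k=3):
    """Predict objective values for `query` as the mean of its k DTW-nearest
    previously evaluated solutions in `archive` = [(vector, objectives), ...]."""
    nearest = sorted(archive, key=lambda item: dtw_distance(query, item[0]))[:k]
    return np.mean([obj for _, obj in nearest], axis=0)

# Toy archive: variable-length schemes, each with two objective values.
archive = [([0.1, 0.4, 0.9], [0.30, 0.70]),
           ([0.2, 0.5, 0.6, 0.8], [0.25, 0.65]),
           ([0.05, 0.95], [0.40, 0.80])]
print(knn_dtw_predict([0.1, 0.5, 0.85], archive, k=2))
```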
... Many applications train stacks of LSTM RNNs by connectionist temporal classification to find the optimal RNN weight matrix (Greff et al., 2017). In addition, an LSTM can be trained by policy gradient methods for hyperparameter optimization, or by neuroevolution to find an optimal activation function (Vijayaprabakaran and Sathiyamurthy, 2021). ...
... Research on improving LSTM with metaheuristic algorithms is currently expanding; these algorithms are used for weight optimization, parameter optimization, and deep learning network thresholds. Examples of research conducted in this field include: LSTM with Lion Swarm Optimization (LSTM-LSO) for prediction problems [16,17]; improved LSTM based on Ant Colony Optimization (ACO-LSTM) [18]; the Fireworks Algorithm (FWA) with LSTM to solve optimization problems [19]; the Artificial Fish Swarm Optimization (AFSO) algorithm with LSTM for disease diagnosis [20]; the Grasshopper Optimization Algorithm (GOA) with an LSTM network for wind speed prediction [21]; GOA with an LSTM network for detection of defective gears [22]; PSO with LSTM for wind energy prediction [23]; PSO with RNN-LSTM for detection of objects in medical images [24]; Differential Evolution (DE) with LSTM for prediction [25]; face recognition based on deep learning and Cat Swarm Optimization (CSO) [26]; disease diagnosis (cancer and heart) based on CNN-PSO [27]; and use of Improved Crossover-Based Monarch Butterfly Optimization (ICRMBO) to improve CNN [28]. ...
Article
Full-text available
An essential task in natural language processing is Multi-Label Text Classification (MLTC), whose purpose is to assign multiple labels to each document. Traditional text classification methods, such as classical machine learning, usually suffer from data scattering and fail to discover relationships between data. With the development of deep learning algorithms, many authors have used deep learning for MLTC. In this paper, a novel model for MLTC called Spotted Hyena Optimizer-Long Short-Term Memory (SHO-LSTM), based on an LSTM network and the SHO algorithm, is proposed. In the LSTM network, the Skip-gram method is used to embed words into the vector space. The new model uses the SHO algorithm to optimize the initial weights of the LSTM network. Adjusting the weight matrix in LSTM is a major challenge: the more accurate the neuron weights, the higher the accuracy of the output. The SHO algorithm is a population-based meta-heuristic that mimics the mass hunting behavior of spotted hyenas. In this algorithm, each candidate solution is encoded as a hyena; the hyenas then approach the optimal answer by following the lead hyena. Four datasets (RCV1-v2, EUR-Lex, Reuters-21578, and Bookmarks) are used to evaluate the proposed model. The assessments demonstrate that the proposed model achieves a higher accuracy rate than LSTM, Genetic Algorithm-LSTM (GA-LSTM), Particle Swarm Optimization-LSTM (PSO-LSTM), Artificial Bee Colony-LSTM (ABC-LSTM), Harmony Search Algorithm-LSTM (HAS-LSTM), and Differential Evolution-LSTM (DE-LSTM). The accuracy improvement of the SHO-LSTM model over LSTM on the four datasets is 7.52%, 7.12%, 1.92%, and 4.90%, respectively.
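The optimization loop described here can be pictured with the simplified sketch below: a population-based "follow-the-leader" update (standing in for the full SHO equations, which this summary does not reproduce) tunes a flat vector of initial LSTM weights against a fitness function. `evaluate_lstm` is a hypothetical placeholder for training and scoring the network with those weights; the update rule and constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def evaluate_lstm(weights):
    # Placeholder fitness: substitute the real validation loss of an LSTM
    # initialised with `weights`. Here: a dummy quadratic bowl.
    return float(np.sum(weights ** 2))

def follow_the_leader_search(dim, pop_size=20, iters=50, step=0.5):
    """Simplified population-based search: candidates drift toward the best
    solution found so far ("the lead hyena") with random perturbations."""
    pop = rng.normal(size=(pop_size, dim))
    fitness = np.array([evaluate_lstm(h) for h in pop])
    best_idx = int(np.argmin(fitness))
    best_w, best_f = pop[best_idx].copy(), fitness[best_idx]
    for _ in range(iters):
        # Each candidate ("hyena") moves toward the current leader, plus noise.
        pop = pop + step * rng.random((pop_size, 1)) * (best_w - pop) \
                  + 0.1 * rng.normal(size=pop.shape)
        fitness = np.array([evaluate_lstm(h) for h in pop])
        idx = int(np.argmin(fitness))
        if fitness[idx] < best_f:          # keep the best solution seen so far
            best_w, best_f = pop[idx].copy(), fitness[idx]
    return best_w, best_f

best_weights, best_fitness = follow_the_leader_search(dim=8)
print(best_fitness)
```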
... Through their experimental analysis, the authors demonstrate that their new activation functions outperform several standard functions on multivariate classification problems. Differential evolution is applied in [51] to evolve new activation functions for long short-term memory networks. The proposed method builds a hierarchical activation function with a predefined structure by searching for the most appropriate function elements to appear in it, which together represent a complete activation function. ...
Article
Full-text available
The choice of activation functions can significantly impact the performance of neural networks. Due to an ever-increasing number of new activation functions being proposed in the literature, selecting the appropriate activation function becomes even more difficult. Consequently, many researchers approach this problem from a different angle, in which instead of selecting an existing activation function, an appropriate activation function is evolved for the problem at hand. In this paper, we demonstrate that evolutionary algorithms can evolve new activation functions for side-channel analysis (SCA), outperforming ReLU and other activation functions commonly applied to that problem. More specifically, we use Genetic Programming to define and explore candidate activation functions (neuroevolution) in the form of mathematical expressions that are gradually improved. Experiments with the ASCAD database show that this approach is highly effective compared to results obtained with standard activation functions and that it can match the state-of-the-art results from the literature. More precisely, the obtained results for the ASCAD fixed key dataset demonstrate that the evolved activation functions can improve the current state-of-the-art by achieving a guessing entropy of 287 for the Hamming weight model and 115 for the Identity leakage model, compared to 447 and 120 obtained in the literature.
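A hedged sketch of the kind of representation Genetic Programming uses here: a candidate activation function is an expression tree over a small primitive set, evaluated element-wise on the pre-activation input. The primitive set, growth probabilities, and depth below are illustrative assumptions, not the paper's configuration.

```python
import random
import numpy as np

# Assumed primitive set for candidate activation functions.
UNARY = {"tanh": np.tanh, "relu": lambda x: np.maximum(x, 0.0), "neg": lambda x: -x}
BINARY = {"add": np.add, "mul": np.multiply}

def random_tree(depth=2):
    """Grow a random expression tree; leaves are the input symbol 'x'."""
    if depth == 0 or random.random() < 0.3:
        return "x"
    if random.random() < 0.5:
        return (random.choice(list(UNARY)), random_tree(depth - 1))
    return (random.choice(list(BINARY)), random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    """Apply the candidate activation function encoded by `tree` to array x."""
    if tree == "x":
        return x
    if len(tree) == 2:
        return UNARY[tree[0]](evaluate(tree[1], x))
    return BINARY[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))

candidate = random_tree()
print(candidate, evaluate(candidate, np.linspace(-2, 2, 5)))
```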
Preprint
Power system management and operation rely heavily on short-term power load forecasting. Accurate forecasting results can help reduce power waste and economic losses. Existing power forecasting methods only forecast the future load from historical data; they do not sufficiently consider which factors have the greatest influence on the power load, and there are no effective methods for simultaneously mining the temporal and correlation characteristics of multidimensional time series. Therefore, we propose a new hybrid approach that combines LSTM with an attention mechanism and a genetic algorithm (GA). The GA optimizes the number of LSTM layers, the number of dense layers, the number of hidden-layer neurons, and the number of dense-layer neurons, so as to determine the optimal parameters. The proposed method is validated on a load data set containing five characteristics (dry bulb temperature, dew point temperature, wet bulb temperature, humidity, and electricity price) and is compared with RNN, LSTM, GRU, LSTM-Attention, and GRU-Attention. According to the experimental results, the proposed method noticeably reduces the prediction error and improves the goodness of fit of the model.
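The GA search over LSTM architecture hyperparameters implied above can be sketched as follows; the chromosome layout, value ranges, and operators are assumptions for illustration rather than the authors' exact setup.

```python
import random

# Assumed discrete search space for the architecture hyperparameters.
SEARCH_SPACE = {
    "lstm_layers":  [1, 2, 3],
    "dense_layers": [1, 2],
    "lstm_units":   [32, 64, 128, 256],
    "dense_units":  [16, 32, 64],
}
KEYS = list(SEARCH_SPACE)

def random_chromosome():
    return [random.choice(SEARCH_SPACE[k]) for k in KEYS]

def crossover(a, b):
    """One-point crossover between two parent chromosomes."""
    point = random.randrange(1, len(KEYS))
    return a[:point] + b[point:]

def mutate(chrom, rate=0.25):
    """Resample each gene with probability `rate`."""
    return [random.choice(SEARCH_SPACE[k]) if random.random() < rate else g
            for k, g in zip(KEYS, chrom)]

def decode(chrom):
    """Turn a chromosome into a model-configuration dict for the trainer."""
    return dict(zip(KEYS, chrom))

parent_a, parent_b = random_chromosome(), random_chromosome()
child = mutate(crossover(parent_a, parent_b))
print(decode(child))  # e.g. {'lstm_layers': 2, 'dense_layers': 1, ...}
```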
Thesis
Full-text available
Produced water on offshore platforms is one of the effluents recovered from wells together with oil and natural gas, and it is the main residue generated in that process. The Oil and Grease Content (TOG, from the Portuguese "Teor de Óleos e Graxas") is considered one of the main control parameters for the discharge of produced water into the sea, with daily and monthly limits defined by current legislation. The TOG measurement used as a reference by IBAMA is obtained by the gravimetric method, with water samples collected daily and sent to an accredited laboratory, which returns results a few days after the sampling date. The need for corrective actions when values exceed the limit has motivated the use of alternative methods that produce estimates more frequently. In this work, data-driven models are built to estimate TOG. Process variables from produced-water treatment, information on chemical products, and daily production data from an offshore platform were collected, preprocessed, and used to train, validate, and test these models. In addition, hyperparameter optimization and feature selection techniques were applied. The results show that the models based on recurrent neural networks (LSTM and CNN+LSTM) achieved superior performance compared with the existing online monitoring systems.
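The following is a minimal sketch (assuming a Keras/TensorFlow setup, which the summary does not specify) of a CNN+LSTM regressor over sliding windows of process variables, predicting a single TOG value per window; the window length, layer sizes, and dummy data are illustrative assumptions.

```python
import numpy as np
import tensorflow as tf

WINDOW, N_FEATURES = 24, 12  # assumed: 24 time steps, 12 process variables

model = tf.keras.Sequential([
    tf.keras.Input(shape=(WINDOW, N_FEATURES)),
    tf.keras.layers.Conv1D(32, kernel_size=3, activation="relu"),  # local patterns
    tf.keras.layers.MaxPooling1D(pool_size=2),
    tf.keras.layers.LSTM(64),                                      # temporal dynamics
    tf.keras.layers.Dense(1),                                      # TOG estimate
])
model.compile(optimizer="adam", loss="mse")

# Dummy data with the assumed shapes; replace with real windows and TOG labels.
X = np.random.rand(128, WINDOW, N_FEATURES).astype("float32")
y = np.random.rand(128, 1).astype("float32")
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
print(model.predict(X[:1], verbose=0))
```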