| Schematic diagram showing the data analytics process. Connections are shown between the individual analysis steps that constitute the whole project (Lee et al., 2019).

Source publication
Article
Full-text available
The Cross-Industry Standard Process for Data Mining (CRISP-DM) is a widely accepted framework in production and manufacturing. This data-driven knowledge discovery framework provides an orderly partition of the often complex data mining processes to ensure a practical implementation of data analytics and machine learning models. However, the practi...

Context in source publication

Context 1
... must be aware of such a situation in an industrial framework and should select the right approach for the analysis. He further divides data analysis into six categories (see Figure 6). This chart is useful for the basic understanding of data analysis for business experts and data experts, which helps prevent knowledge discovery from diverging from the real objective. ...

Citations

... This is accomplished by adding bias to the model, which reduces the variance component of the prediction error (i.e. in the context of the bias-variance trade-off). Another significant and underappreciated advantage of ensemble methods is improved robustness or reliability in a model's average performance (Tripathi et al. 2021; Zhao et al. 2021; Heidemann et al. 2021). ...
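The variance-reduction effect described above can be seen in a toy simulation. The base learner and its noise model below are purely illustrative, not part of any cited method:

```python
import random
import statistics

random.seed(0)

def noisy_predictor(x):
    """A hypothetical base learner: the correct trend plus random noise."""
    return 2.0 * x + random.gauss(0, 1.0)

def ensemble_predict(x, n_members=25):
    """Average the outputs of n_members independent base learners."""
    return statistics.mean(noisy_predictor(x) for _ in range(n_members))

# Repeated predictions at the same input: the ensemble's spread is smaller,
# because averaging shrinks the variance component of the error.
single = [noisy_predictor(1.0) for _ in range(500)]
ensemble = [ensemble_predict(1.0) for _ in range(500)]
print(statistics.stdev(single) > statistics.stdev(ensemble))
```

With independent noise, averaging n members divides the prediction variance by roughly n, which is exactly the trade the snippet above describes: a little bias for a lot less variance.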
Article
Full-text available
Precise Electroencephalography (EEG)-based emotion identification is one of the most challenging tasks in pattern recognition. In this paper, an innovative EEG signal processing method is devised for automated emotion identification. The Symlets-4 filter-based “Multiscale Principal Component Analysis” (MSPCA) is used to denoise and reduce the raw signal’s dimension. Next, the “Modified Covariance” (MCOV) is used as a feature extractor. In the classification step, ensemble classifiers are used. The proposed method achieved 99.6% classification accuracy by using an ensemble of Bagging and Random Forest (RF), confirming the effectiveness of the devised method in EEG-based emotion recognition.
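The bagging-and-majority-vote idea used in the classification step can be sketched with a toy threshold learner. This is an illustrative sketch only, not the authors' MSPCA/MCOV pipeline; the learner, data, and seed are all invented:

```python
import random
from collections import Counter

random.seed(42)

def bootstrap_sample(data):
    """Draw a sample of the same size with replacement (the bagging step)."""
    return [random.choice(data) for _ in data]

def train_threshold_stump(data):
    """A toy base learner: threshold at the mean feature of its bootstrap sample."""
    features = [x for x, _ in data]
    threshold = sum(features) / len(features)
    return lambda x: 1 if x > threshold else 0

# Toy labelled data: (feature, label), where label = 1 iff feature > 5.
data = [(x, 1 if x > 5 else 0) for x in range(11)]

# Each ensemble member sees a different bootstrap resample of the data.
members = [train_threshold_stump(bootstrap_sample(data)) for _ in range(15)]

def predict(x):
    """Majority vote over all ensemble members."""
    votes = Counter(m(x) for m in members)
    return votes.most_common(1)[0][0]

print(predict(9), predict(1))  # -> 1 0
```

Each member overfits its own resample slightly differently; the vote washes those idiosyncrasies out, which is why bagged ensembles tend to be more reliable than any single member.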
... Data models specify the data format in the context required for relevant business processes [16]. Models enable business and technical resources to collaboratively decide how data will be stored, accessed, shared, updated, and leveraged across an organization [18,20]. Data models play a key role in bringing together all segments of an enterprise, business analysts, management, and others to cooperatively design information systems (and the databases they rely on). ...
... Most of the data models are concerned with data storage, and their development follows the same stages outlined in [20]. That is, its development starts with developing a conceptual model (domain model), which explores and provides an abstraction of the system. ...
... This gives way to the second stage of logical data models or specification models, which clarify the various logical entities (types or classes of data), the data attributes that define those entities, and the relationships between them. The third stage is the physical data modeling stage or computational model [20]. These models are usually directly translated into production database design, which supports further development of information systems. ...
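The logical-to-physical progression described in these snippets can be sketched concretely. The Customer/Order entities and the DDL below are hypothetical examples for illustration, not taken from the cited work; the conceptual (domain) model would typically be a diagram rather than code:

```python
import sqlite3
from dataclasses import dataclass

# Logical model: entities, their attributes, and the relationships between
# them, still independent of any particular storage technology.
@dataclass
class Customer:
    customer_id: int
    name: str

@dataclass
class Order:
    order_id: int
    customer_id: int  # relationship: each order belongs to one customer

# Physical model: the same entities translated into a concrete database
# design, ready to back a production information system.
PHYSICAL_DDL = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE "order" (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(PHYSICAL_DDL)
tables = sorted(r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'"))
print(tables)  # -> ['customer', 'order']
```

The dataclasses and the DDL deliberately mirror each other: the logical stage fixes names, attributes, and relationships, and the physical stage adds storage decisions such as keys and constraints.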
... The performance evaluation of machine learning models is a multifaceted process aimed at assessing the model's predictive prowess on unseen data. By analyzing various statistical measures, deeper insights can be obtained into the model's generalization capability, robustness, and reliability for real-world deployment (Tripathi et al., 2021). This data analysis section will expand on the key model evaluation metrics and their implications. ...
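A minimal sketch of the kind of metrics such an evaluation computes, built from hand-counted confusion-matrix terms. The labels below are invented test data for illustration:

```python
def evaluation_metrics(y_true, y_pred):
    """Compute accuracy, precision, and recall for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Toy held-out labels versus model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(evaluation_metrics(y_true, y_pred))  # -> (0.75, 0.75, 0.75)
```

Reporting several metrics together, rather than accuracy alone, is what gives the "deeper insights" into generalization the snippet refers to: precision and recall expose error types that a single aggregate score hides.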
Preprint
This pioneering study employs machine learning to predict startup success, addressing the long-standing challenge of deciphering entrepreneurial outcomes amidst uncertainty. Integrating the multidimensional SECURE framework for holistic opportunity evaluation with AI's pattern recognition prowess, the research puts forth a novel analytics-enabled approach to illuminate success determinants. Rigorously constructed predictive models demonstrate remarkable accuracy in forecasting success likelihood, validated through comprehensive statistical analysis. The findings reveal AI's immense potential in bringing evidence-based objectivity to the complex process of opportunity assessment. On the theoretical front, the research enriches entrepreneurship literature by bridging the knowledge gap at the intersection of structured evaluation tools and data science. On the practical front, it empowers entrepreneurs with an analytical compass for decision-making and helps investors make prudent funding choices. The study also informs policymakers to optimize conditions for entrepreneurship. Overall, it lays the foundation for a new frontier of AI-enabled, data-driven entrepreneurship research and practice. However, acknowledging AI's limitations, the synthesis underscores the persistent relevance of human creativity alongside data-backed insights. With high predictive performance and multifaceted implications, the SECURE-AI model represents a significant stride toward an analytics-empowered paradigm in entrepreneurship management.

Introduction: The dynamic landscape of entrepreneurship is characterized by high uncertainty, complexity, and risk, posing an intricate challenge of predicting startup success. While passion and persistence play pivotal roles, data-driven insights can provide the clarity and objectivity needed to determine the likelihood of an entrepreneurial venture's triumph.
This study presents a pioneering approach that harnesses the prowess of artificial intelligence (AI) to bring unparalleled predictive power to decipher entrepreneurial outcomes.
... To support manufacturing firms in managing these complementarities, we believe that one fruitful strategy is to embed hands-on guidelines in existing and widely used engineering methodologies. In particular, there exists a wide range of systematic methodologies that prescribe a sequence of interrelated phases for developing AI solutions to specific practical problems such as CRISP-DM, SEMMA, and KDD (Tripathi et al. 2021). CRISP-DM (Wirth and Hipp 2000) has been extensively used in both industry and academia for successfully developing AI solutions (Schröer, Kruse, and Gómez 2021; Huber et al. 2019) owing to its practical orientation (i.e. ...
... Such extensions aim to overcome limitations of the original version, tailor it to specific situations, and/or better accommodate recent technological advancements. Specifically, Tripathi et al. (2021) outline two types of extensions: (1) general extensions and (2) industry-specific extensions. The first type focuses on extensions according to general theoretical concepts. ...
... Tackling this particular problem, Tripathi et al. (2021) place a special focus on the long-term use of Machine Learning (ML) models in manufacturing and emphasise a wide variety of robustness issues that need to be considered in the deployment phase and beyond. For example, they argue the need for ensuring the models' utility and robustness over time. ...
Article
Full-text available
To support manufacturing firms in realising the value of Artificial Intelligence (AI), we embarked on a six-year process of research and practice to enhance the popular and widely used CRISP-DM methodology. We extend CRISP-DM into a continuous, active, and iterative life-cycle of AI solutions by adding the phase of ‘Operation and Maintenance’ as well as embedding a task-based framework for linking tasks to skills. Our key findings relate to the difficult trade-offs and hidden costs of operating and maintaining AI solutions and managing AI drift, as well as ensuring the presence of domain, data science, and data engineering competence throughout the CRISP-DM phases. Further, we show how data engineering is an essential but often neglected part of the AI workflow, provide novel insights into the trajectory of involvement of the three competences, and illustrate how the enhanced CRISP-DM methodology can be used as a management tool in AI projects.
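One way to picture the added ‘Operation and Maintenance’ phase is a monitor on a deployed model's rolling accuracy that flags AI drift. The class below is an illustrative sketch; the window size and threshold are arbitrary choices, not values from the paper:

```python
from collections import deque

class DriftMonitor:
    """Track rolling accuracy of a deployed model; flag drift when it sags."""

    def __init__(self, window=50, threshold=0.8):
        self.outcomes = deque(maxlen=window)  # most recent hit/miss results
        self.threshold = threshold

    def record(self, prediction, actual):
        """Log whether the deployed model got one live case right."""
        self.outcomes.append(prediction == actual)

    def rolling_accuracy(self):
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 1.0

    def needs_retraining(self):
        """Signal drift once a full window of recent accuracy falls below par."""
        return (len(self.outcomes) == self.outcomes.maxlen
                and self.rolling_accuracy() < self.threshold)

monitor = DriftMonitor(window=10, threshold=0.8)
for _ in range(10):
    monitor.record(1, 1)           # model performing well in operation
print(monitor.needs_retraining())  # -> False
for _ in range(5):
    monitor.record(1, 0)           # performance degrades: drift sets in
print(monitor.needs_retraining())  # -> True
```

The point of the sketch is the life-cycle shift the abstract describes: deployment is not the end of the process, and a standing feedback signal like this is what turns CRISP-DM into a continuous loop.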
... The authors of the papers [26]-[28] have developed a framework to introduce big data analytics into manufacturing systems. Tripathi et al. [29] provide a detailed review of CRISP-DM in industries. However, very few studies use this approach for the purpose of predicting failures in manufacturing systems [30]. ...
... Therefore, different features and prior knowledge need to be considered in the local models of different operational modes. To date, several researchers have attempted to introduce prior knowledge into data-driven modelling [23][24][25]. However, although many researchers have investigated the qualitative influences of boiler adjustment variables on the NOx formation reaction from the aspects of experiment and numerical modelling, few studies have treated these rules as prior knowledge and integrated them into data-driven modelling. ...
... Step 4. Given a set of training samples T = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, a priori information is used to initialize the learning machine according to Eq. (24), where L is the loss function and n is the total number of samples. ...
Article
This study proposed a data-driven NOx modelling framework that could capture the multimodal operational characteristics of a utility boiler, improve the training sample quality and strengthen the model interpretability. With this framework, to obtain a high-quality steady-state dataset from operational data, a multivariate F-test algorithm, which enhanced robustness by introducing the minimal number of stable units (MNSU) and the variational mode decomposition (VMD) outlier detection method, was first proposed. Based on the obtained sample, a Gaussian mixture model (GMM) was established to classify the data by their operational modes. In addition, the least absolute shrinkage and selection operator (LASSO) algorithm was employed to select specific features for each mode. Finally, the NOx prediction model was constructed using the light gradient boosting machine (LightGBM) model integrated with prior knowledge. The mechanistic relationships between the selected features (i.e., the oxygen content and separated overfire air damper opening) and target variable (i.e., NOx) were considered as the constraint conditions. By taking a 660 MW utility boiler as the research object, the proposed modelling framework obtained R², RMSE and MAE values of 0.979, 3.573 mg/m³, and 2.586 mg/m³, respectively, performing better than five comparison models.
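The paper's multivariate F-test (with MNSU and VMD outlier detection) is considerably more involved, but the underlying idea of screening for steady-state data can be conveyed with a minimal univariate variance-ratio check. The window length and acceptance limit below are chosen purely for illustration:

```python
import statistics

def f_ratio(window_a, window_b):
    """Variance ratio between two data windows (larger variance on top)."""
    va, vb = statistics.variance(window_a), statistics.variance(window_b)
    return max(va, vb) / min(va, vb)

def is_steady(signal, window=5, limit=3.0):
    """Flag a signal as steady if adjacent windows have similar variance."""
    a, b = signal[:window], signal[window:2 * window]
    return f_ratio(a, b) < limit

# Toy operational data: a flat stretch versus a load ramp.
steady = [10.0, 10.1, 9.9, 10.0, 10.1, 10.0, 9.9, 10.1, 10.0, 10.0]
ramp   = [10.0, 10.1, 9.9, 10.0, 10.1, 12.0, 14.5, 17.0, 19.5, 22.0]
print(is_steady(steady), is_steady(ramp))  # -> True False
```

Discarding transient segments like the ramp is what yields the "high-quality steady-state dataset" the abstract refers to; models trained on mixed transient and steady data otherwise learn dynamics they cannot attribute.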
... Knowledge discovery methods have been applied to extract patterns from manufacturing systems. Studies have shown that decision rules extracted by applying data mining and knowledge discovery methods to historical datasets can boost the effectiveness of production development, and this constitutes a challenging area for the future of manufacturing systems [57,58]. Although RMSs constitute one of the critical enablers that significantly impact today's much-needed changeable manufacturing systems, the knowledge-capturing and decision-making process is very complex due to their intrinsic complexity and stochastic nature [1,59,60]. ...
Article
Full-text available
In today’s uncertain and competitive market, where manufacturing enterprises are subjected to increasingly shortened product lifecycles and frequent volume changes, reconfigurable manufacturing system (RMS) applications play significant roles in the success of the manufacturing industry. Despite the advantages offered by RMSs, achieving high efficiency constitutes a challenging task for stakeholders and decision makers when they face the trade-off decisions inherent in these complex systems. This study addresses work task and resource allocations to workstations together with buffer capacity allocation in an RMS. The aim is to simultaneously maximize throughput and to minimize total buffer capacity under fluctuating production volumes and capacity changes while considering the stochastic behavior of the system. An enhanced simulation-based multi-objective optimization (SMO) approach with customized simulation and optimization components is proposed to address the abovementioned challenges. Apart from presenting the optimal solutions subject to volume and capacity changes, the proposed approach supports decision makers with knowledge discovery to further understand RMS design. In particular, this study presents a customized SMO approach combined with a novel flexible pattern mining method for optimizing an RMS and conducts post-optimal analyses. To this extent, this study demonstrates the benefits of applying SMO and knowledge discovery methods for fast decision support and production planning of an RMS.
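The throughput-versus-buffer trade-off at the heart of the SMO approach comes down to Pareto dominance among candidate configurations. The sketch below, with hypothetical simulation outputs, shows the filtering step; it is an illustration of the general concept, not the study's customized algorithm:

```python
def dominates(a, b):
    """a dominates b when it is at least as good on both objectives
    (maximize throughput, minimize buffer) and strictly better on one."""
    return (a["throughput"] >= b["throughput"] and a["buffer"] <= b["buffer"]
            and (a["throughput"] > b["throughput"] or a["buffer"] < b["buffer"]))

def pareto_front(solutions):
    """Keep only the non-dominated trade-off solutions."""
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o is not s)]

# Hypothetical simulation outputs: parts/hour versus total buffer slots.
solutions = [
    {"throughput": 95, "buffer": 40},
    {"throughput": 90, "buffer": 25},
    {"throughput": 88, "buffer": 30},  # dominated by the 90/25 configuration
    {"throughput": 80, "buffer": 15},
]
front = pareto_front(solutions)
print(len(front))  # -> 3
```

Presenting decision makers with this front, rather than a single "best" answer, is what lets them weigh throughput against buffer investment under the volume and capacity changes the study describes.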
... Teams that were specifically instructed to adopt CRISP-DM during training subsequently outperformed teams utilizing alternative methods (Saltz et al. 2017). Briefly, CRISP-DM divides the data mining-related knowledge discovery process into six phases (Tripathi et al. 2021). The six phases are illustrated in Figure 1. ...
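The six phases and their feedback loop can be sketched as follows. The phase names are the standard CRISP-DM ones; `run_cycle` and its stopping rule are illustrative assumptions, standing in for the feedback arrows of the usual diagram:

```python
# The six CRISP-DM phases in their canonical order.
CRISP_DM_PHASES = [
    "Business Understanding",
    "Data Understanding",
    "Data Preparation",
    "Modeling",
    "Evaluation",
    "Deployment",
]

def run_cycle(execute_phase, max_iterations=3):
    """Run the phases in order; repeat the whole cycle while the
    Evaluation phase rejects the model, mimicking CRISP-DM's feedback loop."""
    for _ in range(max_iterations):
        results = {phase: execute_phase(phase) for phase in CRISP_DM_PHASES}
        if results["Evaluation"]:   # model met the business objective
            return results
    return None                     # give up after too many passes
```

The ordering matters less than the loop: in practice a failed Evaluation sends the team back to Business Understanding or Data Preparation, which is why the process is drawn as a cycle rather than a pipeline.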
... Cross-industry standard procedure for data mining [329] is a rudimentary standard that emphasizes feedback between successive analytical processes. This has recently been expanded to consider industry-specific demands and domain-specific expertise [330]. ...
Article
Full-text available
Computing is a critical driving force in the development of human civilization. In recent years, we have witnessed the emergence of intelligent computing, a new computing paradigm that is reshaping traditional computing and promoting digital revolution in the era of big data, artificial intelligence, and internet of things with new computing theories, architectures, methods, systems, and applications. Intelligent computing has greatly broadened the scope of computing, extending it from traditional computing on data to increasingly diverse computing paradigms such as perceptual intelligence, cognitive intelligence, autonomous intelligence, and human–computer fusion intelligence. Intelligence and computing have undergone paths of different evolution and development for a long time but have become increasingly intertwined in recent years: Intelligent computing is not only intelligence oriented but also intelligence driven. Such cross-fertilization has prompted the emergence and rapid advancement of intelligent computing. Intelligent computing is still in its infancy, and an abundance of innovations in the theories, systems, and applications of intelligent computing is expected to occur soon. We present the first comprehensive survey of literature on intelligent computing, covering its theory fundamentals, the technological fusion of intelligence and computing, important applications, challenges, and future perspectives. We believe that this survey is highly timely and will provide a comprehensive reference and cast valuable insights into intelligent computing for academic and industrial researchers and practitioners.