Conference Paper · PDF available

A Detailed Schematic Study on Feature Extraction Methodologies and Its Applications: A Position Paper


Abstract

Feature extraction is one of the most important aspects of machine learning, image processing, and pattern recognition. In these fields, as the dimensionality of the data grows, it becomes increasingly important to analyze it reliably. The process of feature extraction typically starts from a basic dataset and incrementally builds features derived from the data, making the subsequent learning steps informative and interpretable. To present this process of learning, we offer a position paper that discusses the criteria, information, methodologies, and existing work in the area of feature extraction across different fields and domains. A clear, descriptive analysis is presented of feature extraction methods for selecting significant features that improve the quality of the results.
References
Conference Paper
Full-text available
Word alignment is an important and challenging task that precedes machine translation from one language to another, and it is described in detail in this paper. The paper shows the translation relationships among the words in a parallel corpus of two languages, Bangla and Odia. Rarely, the meaning of a single word unit of the source language (Bangla) is rendered as a multiword unit of the target language (Odia), and vice versa. In most cases a one-to-one correspondence occurs, which is also what the word alignment task expects. Multiword units are handled through non-compositional compounds, associated idiomatic expressions, multiword names, and so on. The challenge lies in identifying a single word of the source text that is converted to multiple words of the target language; this information is extracted from a bilingual lexicon. All sentences containing the focus word of the target language are collected and compared with the corresponding word of the source language, and a probability value is calculated. Thereafter, the Expectation-Maximization (EM) algorithm is used to find the maximum likelihood among the candidate words to be aligned. The relation between the fully aligned corpus of the bilingual text and the probability dictionary is depicted clearly in this paper. Bitext word alignment finds corresponding matching words between parallel texts. A small corpus of Bangla-Odia parallel text has been used for testing, giving accuracy well above the expected values. This prototype is a first step toward good bilingual translators and needs to be tested on a larger corpus to establish its efficacy.
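
To make the estimation step concrete, the following is a minimal sketch of EM-based word alignment in the spirit of IBM Model 1, which matches the procedure described above; the toy sentence pairs, variable names, and iteration count are illustrative assumptions, not the authors' data or implementation.

```python
# Minimal IBM-Model-1-style EM for word alignment (illustrative sketch).
# The toy Bangla-Odia pairs below are placeholders, not the paper's corpus.
from collections import defaultdict

corpus = [
    (["ami", "bhat", "khai"], ["mu", "bhata", "khae"]),  # hypothetical pair
    (["ami", "jai"], ["mu", "jae"]),
]

# Uniform initialization of t(target | source).
t = defaultdict(lambda: 1.0)

for _ in range(10):                      # EM iterations
    count = defaultdict(float)           # expected counts c(f, e)
    total = defaultdict(float)           # marginal counts per source word e
    for src, tgt in corpus:
        for f in tgt:                    # E-step: distribute each target word
            z = sum(t[(f, e)] for e in src)
            for e in src:
                p = t[(f, e)] / z
                count[(f, e)] += p
                total[e] += p
    for (f, e), c in count.items():      # M-step: re-normalize
        t[(f, e)] = c / total[e]

# Align each target word to its most probable source word.
for src, tgt in corpus:
    print([(f, max(src, key=lambda e: t[(f, e)])) for f in tgt])
```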
Conference Paper
Full-text available
Word alignment is a crucial and important part of text alignment, used in machine-oriented translation from one language (the source) to another (the target). For years it has been a tough challenge because the process of lexical alignment for translation involves several machine learning algorithms and mathematical modelling. Keeping these issues in mind, we have attempted to describe the nature of the lexical problems that arise when analyzing bilingual translated texts between Bangla (as source language) and Odia (as target language). The parallel translation corpus used in our study includes Bangla-Odia texts from the agriculture domain, a collection of parallel texts from the Indian Language Corpora Initiative (ILCI), Government of India. The analysis of the Bangla-Odia bilingual translation corpus shows that it is difficult to map and establish one-to-one, one-to-many, and many-to-one lexical relationships between the words of the two languages, although the languages are known as sister languages. We observe that on many occasions a single word unit of the source language is rendered as a multiword unit in the target language, and vice versa, which we identify as 'word divergence'. Problems of word divergence are normally addressed at the phrase level using bilingual dictionaries and lexical databases. We apply the recursive algorithm known as Expectation-Maximization (EM) to find an 'approximation relationship' between the words that show this kind of divergence. The output gives the maximum likelihood estimation (MLE) between the candidate words to be aligned. A probability value is also estimated between the corresponding aligned matches in the parallel corpus. The process of word alignment is further tested at the sentence level, and the estimation of the approximate relations in the probability dictionary is described in this paper. The basic challenge lies in identifying the single word units of the source text that are converted to multiword units in the target text. This information is extracted from a bilingual lexicon produced from the bilingual translation corpus. For our study, we have used more than ten thousand sentences for training and testing purposes. The prototypical model developed so far works at a satisfactory level toward our goal of bilingual translation. We also plan to test our method on large unstructured texts to enhance its efficacy and accuracy.
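
The 'word divergence' cases described above could be surfaced from a learned translation table along the following lines; the probability threshold and the table layout are assumptions for illustration, not the paper's method.

```python
# Given a translation table t[(f, e)] = P(f | e) learned by EM (as above),
# flag source words whose probability mass spreads over several target
# words -- candidate cases of 'word divergence'. The 0.2 threshold is an
# illustrative assumption.
from collections import defaultdict

def divergence_candidates(t, threshold=0.2):
    by_source = defaultdict(list)
    for (f, e), p in t.items():
        if p >= threshold:
            by_source[e].append((f, p))
    # Keep only source words aligned to more than one likely target word.
    return {e: sorted(fs, key=lambda x: -x[1])
            for e, fs in by_source.items() if len(fs) > 1}
```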
Article
Full-text available
Transportation problem (TP) is a popular branch of the Linear Programming Problem in the field of transportation engineering. Over the years, attempts have been made at finding improved approaches to solve TPs. Recently, in Quddoos et al. (Int J Comput Sci Eng (IJCSE) 4(7): 1271–1274, 2012), an efficient approach, namely ASM, was proposed for solving crisp TPs. However, it is found that ASM fails to reach the optimal solution in some cases. Therefore, a new and efficient ASM approach is proposed in this paper to enhance the inherent mechanism of the existing ASM method to solve both crisp TPs and Triangular Intuitionistic Fuzzy Transportation Problems (TIFTPs). A least-looping stepping-stone method has been employed as one of the key factors to improve solution quality; it is an improved version of the existing stepping-stone method (Roy and Hossain, Operation Research, Titus Publication, 2015). Unlike the stepping-stone method, the least-looping stepping-stone method only deals with a few selected non-basic cells under some prescribed conditions and hence minimizes the computational burden. The framework of the proposed method (namely LS-ASM) is therefore a combination of ASM (Quddoos et al. 2012) and the least-looping stepping-stone approach. To validate the performance of LS-ASM, a set of six case studies and a real-world problem (which include both crisp TPs and TIFTPs) have been solved. The statistical results obtained by LS-ASM have been compared with the existing popular modified distribution (MODI) method and the original ASM method. The statistical results confirm the superiority of LS-ASM over the other compared algorithms, with less computational effort.
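
For orientation, the sketch below solves a small crisp TP as a linear program with SciPy; it illustrates the problem class LS-ASM targets, not the LS-ASM procedure itself, and the cost, supply, and demand figures are made up.

```python
# A crisp transportation problem solved as a linear program with SciPy --
# a baseline illustration of the problem LS-ASM addresses, not LS-ASM.
import numpy as np
from scipy.optimize import linprog

cost = np.array([[4, 6, 8],      # unit shipping costs (illustrative data)
                 [5, 7, 3]])
supply = [120, 80]               # capacity of each source
demand = [60, 70, 70]            # requirement of each destination

m, n = cost.shape
# Equality constraints: row sums = supply, column sums = demand.
A_eq, b_eq = [], []
for i in range(m):               # each source ships exactly its supply
    row = np.zeros(m * n)
    row[i * n:(i + 1) * n] = 1
    A_eq.append(row); b_eq.append(supply[i])
for j in range(n):               # each destination receives its demand
    col = np.zeros(m * n)
    col[j::n] = 1
    A_eq.append(col); b_eq.append(demand[j])

res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
print(res.x.reshape(m, n), res.fun)   # optimal shipment plan and total cost
```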
Article
Full-text available
Aims: To study whether machine learning methodology can be used to detect persons with increased type 2 diabetes or prediabetes risk among people without known abnormal glucose regulation.
Methods: Machine learning and interpretable machine learning models were applied to research data from the Stockholm Diabetes Preventive Program, including more than 8000 people initially with normal glucose tolerance or prediabetes, to determine high- and low-risk features for further impairment in glucose tolerance at follow-up 10 and 20 years later.
Results: The features with the highest importance for the outcome were body mass index, waist-hip ratio, age, systolic and diastolic blood pressure, and diabetes heredity. High values of these features, as well as diabetes heredity, conferred increased risk of type 2 diabetes. The machine learning model was used to generate individual, comprehensible risk profiles, where the diabetes risk was obtained for each person in the data set. Features with the largest increasing or decreasing effects on the risk were determined.
Conclusions: The primary application of this machine learning model is to predict individual type 2 diabetes risk in people without diagnosed diabetes, and to identify which features the risk relates to. However, since most features affecting diabetes risk also play a role in metabolic control in diabetes, e.g. body mass index, diet composition, tobacco use, and stress, the tool could possibly also be used in diabetes care to develop more individualized, easily accessible health care plans to be used in patient encounters.
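
As a rough illustration of this kind of interpretable risk model, the sketch below trains a scikit-learn classifier on simulated data and ranks features by permutation importance; the feature names follow the abstract, but the data, model choice, and sample sizes are assumptions rather than the study's pipeline.

```python
# Sketch of an interpretable risk model on simulated data (not the study's).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
features = ["bmi", "waist_hip_ratio", "age", "systolic_bp",
            "diastolic_bp", "diabetes_heredity"]
X = rng.normal(size=(8000, len(features)))
# Simulated outcome, loosely driven by a few of the features.
y = (X[:, 0] + 0.5 * X[:, 1] + 0.3 * X[:, 5]
     + rng.normal(size=8000)) > 1

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier().fit(X_tr, y_tr)

# Per-person risk plus a global ranking of feature importances.
risk = model.predict_proba(X_te)[:, 1]
imp = permutation_importance(model, X_te, y_te, n_repeats=5, random_state=0)
for name, score in sorted(zip(features, imp.importances_mean),
                          key=lambda p: -p[1]):
    print(f"{name:18s} {score:.3f}")
```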
Article
Recent research evidence indicates that powerful testing tools, even though they generate test inputs automatically for coverage measures, do not do so to satisfaction. These tools sometimes achieve high structural coverage, which does not guarantee high fault detection ability. These findings lead us to the decisive point that code coverage is merely one factor in effective test data generation. Thus, we discuss our findings and proposed work on Modified Condition/Decision Coverage (MC/DC) test case generation and prioritization techniques. This work aims to generate, minimize, and prioritize MC/DC test cases obtained through the concolic testing process. It presents three technical contributions. The first is a greedy algorithm that increases the number of effective test cases to improve MC/DC scores. The second is minimizing the updated test suite so that it holds only the optimal number of test cases contributing to MC/DC pairs. The third is prioritizing these test cases by considering both their Contribution Index (CI) values and Fault Exposing Potential (FEP) values. The proposed approach is validated by experimenting on eighteen Java programs, achieving on average a 1.67× increase in the number of effective test cases, which leads to an average increase in MC/DC score of 41.08%. We also achieved on average a 49.00% reduction in test suite size, and finally prioritized the test cases based on their prioritization index values.
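
One plausible reading of the prioritization step is a ranking by a weighted combination of CI and FEP values, sketched below; the equal weighting and the sample scores are assumptions, since the paper's exact prioritization index formula is not reproduced here.

```python
# Illustrative prioritization of MC/DC test cases by a combined score of
# Contribution Index (CI) and Fault Exposing Potential (FEP). The equal
# weighting and the sample values are assumptions, not the paper's formula.
from dataclasses import dataclass

@dataclass
class TestCase:
    name: str
    ci: float    # contribution toward forming MC/DC pairs, in [0, 1]
    fep: float   # fault exposing potential, in [0, 1]

def prioritize(tests, w_ci=0.5, w_fep=0.5):
    return sorted(tests, key=lambda t: w_ci * t.ci + w_fep * t.fep,
                  reverse=True)

suite = [TestCase("t1", 0.9, 0.4), TestCase("t2", 0.5, 0.8),
         TestCase("t3", 0.7, 0.7)]
for t in prioritize(suite):
    print(t.name)   # t3, then t1, then t2 under equal weights
```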
Article
Data mining for healthcare is an interdisciplinary field of study that originated in database statistics, machine learning, and data visualization, and is useful in examining the effectiveness of medical therapies. Diabetes-related heart disease is a kind of heart disease that affects diabetics. Diabetes is a chronic condition that occurs when the pancreas fails to produce enough insulin or when the body fails to properly use the insulin that is produced. Heart disease, often known as cardiovascular disease, refers to a set of conditions that affect the heart or blood vessels. Although various data mining classification algorithms exist for predicting heart disease, there is inadequate data for predicting heart disease in diabetic individuals. Because the decision tree model consistently beat the naive Bayes and support vector machine models, we fine-tuned it for best performance in forecasting the likelihood of heart disease in diabetic individuals.
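
A minimal sketch of fine-tuning a decision tree in this spirit, using scikit-learn on synthetic data; the dataset and the parameter grid are assumptions for illustration only.

```python
# Fine-tuning a decision tree classifier via cross-validated grid search.
# Synthetic data stands in for the diabetic-patient records described above.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    {"max_depth": [3, 5, 7, None], "min_samples_leaf": [1, 5, 20]},
    cv=5, scoring="roc_auc",
)
grid.fit(X_tr, y_tr)
print(grid.best_params_, grid.score(X_te, y_te))  # tuned tree and held-out AUC
```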
Article
Software reliability prediction is the foremost challenge in software quality assurance. Several models have been developed that effectively assess software reliability, but no single model produces accurate prediction results in all situations. This paper proposes a recurrent chemical functional link artificial neural network model to predict software reliability, where the parameters of the model are estimated by chemical reaction optimization. The proposed model inherits the best attributes of functional link artificial neural networks and recurrent neural networks, dynamically modeling a nonlinear system for software reliability prediction. The model is analyzed using ten real-world software failure datasets. A time-series approach with logarithmic scaling has been adopted for the proper distribution of input data. Statistical analysis reveals that the proposed model exhibits superior performance.
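
The input representation described above can be sketched as follows: a trigonometric functional-link expansion over log-scaled failure data. The weights here are fitted by ordinary least squares purely for illustration, whereas the paper estimates its model's parameters by chemical reaction optimization; the failure counts are made up.

```python
# Functional-link expansion on log-scaled failure data (illustrative only).
import numpy as np

failures = np.array([3, 7, 12, 20, 31, 45, 60, 78, 99, 123], float)
t = np.log1p(np.arange(1, len(failures) + 1))   # logarithmic time scaling

def expand(x):
    # Trigonometric functional link, a common FLANN basis choice:
    # [x, sin(pi x), cos(pi x), sin(2 pi x), cos(2 pi x)]
    return np.column_stack([x, np.sin(np.pi * x), np.cos(np.pi * x),
                            np.sin(2 * np.pi * x), np.cos(2 * np.pi * x)])

Phi = expand(t)
# Least-squares fit stands in for the paper's chemical reaction optimization.
w, *_ = np.linalg.lstsq(Phi, np.log1p(failures), rcond=None)
pred = np.expm1(Phi @ w)                        # back-transform predictions
print(np.round(pred, 1))
```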
Article
During COVID-19, schools around the world rapidly went online. Examining youth technology use reveals sharp inequities within the United States’ education system and incongruities between the technologies used in virtual schooling and those in students’ lives outside of school. In affluent communities, virtual schooling is supported by a distributed schooling infrastructure that coordinates students’ knowledge work. This home and school technology infrastructure features material, human, and structural capital that facilitates youth development as nascent knowledge workers. Technology use during virtual schooling keeps youth activity grounded within the “walls” of school; during virtual schooling, students have little voice in setting learning goals or contributing “content.” Technology use at home for learning or entertainment stems from students’ own goals and casts them as active inquisitors who seek out information, extend their social networks, and, crucially, use participatory learning technologies such as Discord for communication. An extended period of virtual schooling could enable a rethinking of the role of technology in schools, including an embrace of play, emotional design, participatory communications, place-based learning, embodied understandings, and creative construction.
Article
This empirical study was conducted in a blended learning setting of a technology-focused private university in Bangladesh to offer a model that could help attain the comprehensive goals of blended learning. The main objectives of this study are to examine course design in a blended learning setting, the strategies adopted by course teachers to maximize students’ online interactions in a collaborative manner, and how well these strategies affected the quality of blended teaching and learning in tertiary education. Drawing upon a quasi-experimental approach, qualitative data were collected by observing the teaching and learning activities of a course named ‘Bangladesh Studies’ over a four-month semester. The findings of this study suggest a model that allows better student–teacher interaction in both synchronous and asynchronous modes of teaching and learning, based on three sequential stages: referring to and discussing online peer-group comments in regular face-to-face classes in asynchronous mode (stage 1), off-campus synchronous interactions to utilize students’ personal study hours (stage 2), and off-campus asynchronous interactions to offer flexibility for collaborative learning (stage 3). It is argued that the proposed model could be useful in promoting innovative and contextual pedagogy that involves students in sharing, interacting, and collaborating in discussions for knowledge construction, thereby enabling the overall cognitive development of students in a blended learning environment.
Article
As the COVID-19 pandemic began, universities swiftly moved to remote teaching, posing challenges to students and instructors alike. This case study discusses how a distributed team approach was used to support instructors teaching four sections of a technology class for preservice teachers. Students struggled with stress, technology, and in some instances, meeting basic needs as the pandemic began. Starting with a common syllabus and assignments, the instructors and their mentor swiftly redesigned the course to be flexible and engaging without compromising academics. Collaborative course development and teaching helped minimize pedagogical and technological tasks so instructors could focus on meeting the unique needs of students in their course sections. This approach may be useful for promoting professional development and maximizing student engagement in non-pandemic times, too.