Table 1. Structure of the HBase table for image feature storage

Source publication
Conference Paper
Full-text available
Most medical images are now digitized and stored in large image databases, and retrieving the desired images becomes a challenge. In this paper, we address the challenge of building a content-based image retrieval system by applying the MapReduce distributed computing model and the HDFS storage model. Two methods are used to characterize the content of images: th...

Context in source publication

Context 1
... based image retrieval (CBIR) is composed of two phases: 1) an offline phase and 2) an online phase. In the offline phase, a signature vector is computed for each image in the database and stored. In the online phase, the query is constructed by computing the signature vector of the input image; the query signature is then compared with the signatures of the images in the database. MapReduce is known for its ability to handle large amounts of data. In this work, we use the open-source distributed cloud computing framework Hadoop and its implementation of the MapReduce model to extract the feature vectors of images. The implementation of distributed feature extraction and image storage is given in figure 4.

Storage is the base of a CBIR system. Given the amount of image data produced daily by medical services, retrieving and processing these images requires significant computation time, so parallel processing is necessary. For this reason, we adopt the MapReduce computing model to extract the visual features of images and then write the features and image files into HBase. HBase partitions the key space; each partition is called a table, and each table declares one or more column families, which define the storage properties for an arbitrary set of columns [6]. The table given in figure 5 shows the structure of our HBase table: the row key is assigned the ID of the image, and the column families are "file" and "features". The labels "source" and "class" are added under the family "file", representing the source image and the class of the image, respectively (the DDSM database is classified into three diagnosis levels: 'normal', 'benign', and 'cancer'). Under the family "features", the labels "feature BEMD-GGD Alpha", "feature BEMD-GGD Beta", "feature BEMD-HHT mean", "feature BEMD-HHT standard deviation", "feature BEMD-HHT phase", and "feature BEMD-residue histogram" are added, representing the features extracted using the BEMD-GGD and BEMD-HHT methods.

In the figure given below, we describe the online retrieval phase, which is divided into 7 steps:
1) The user sends a query image to SCL; the image is stored temporarily in HDFS.
2) A MapReduce job is run to extract the features of the query image.
3) The image features are stored in HDFS.
4) The similarity/distance between the feature vector of the query image in HDFS and those of the target images in HBase is computed.
5) A reduce step collects and combines the results from all the map functions.
6) The reducer stores the result in HDFS.
7) The result is sent to the user.

IV. RESULT
The method is tested on the DDSM database (see II-A). We conducted experiments on mean precision at 20, which is the ratio between the number of relevant images retrieved and the total number of images retrieved. We give below the principle of our retrieval ...
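To make the offline phase described in the excerpt concrete, below is a minimal sketch (not the authors' code) of a Hadoop map task that computes a feature vector for an image and writes it, together with the image file, into the HBase table described above. It assumes the input arrives as (image ID, image bytes) pairs, e.g. from a SequenceFile; the feature extractor is a hypothetical placeholder for the BEMD-GGD/BEMD-HHT computation, and the job would be wired to HBase with the standard TableOutputFormat.

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Map task of the offline phase: one HBase Put per image.
    public class FeatureExtractionMapper
            extends Mapper<Text, BytesWritable, ImmutableBytesWritable, Put> {

        @Override
        protected void map(Text imageId, BytesWritable image, Context context)
                throws IOException, InterruptedException {
            byte[] rowKey = Bytes.toBytes(imageId.toString()); // row key = image ID
            Put put = new Put(rowKey);

            // Family "file": the source image (the "class" qualifier would be
            // filled from the DDSM ground truth: 'normal', 'benign' or 'cancer').
            put.addColumn(Bytes.toBytes("file"), Bytes.toBytes("source"),
                    image.copyBytes());

            // Family "features": one qualifier per feature vector; only the
            // BEMD-GGD Alpha vector is shown, the other five are analogous.
            put.addColumn(Bytes.toBytes("features"),
                    Bytes.toBytes("feature BEMD-GGD Alpha"),
                    toBytes(extractBemdGgdAlpha(image.copyBytes())));

            context.write(new ImmutableBytesWritable(rowKey), put);
        }

        // Hypothetical placeholder for the BEMD-GGD feature computation.
        private double[] extractBemdGgdAlpha(byte[] imageBytes) {
            throw new UnsupportedOperationException("feature extraction goes here");
        }

        // Serialize a feature vector into an HBase cell value.
        private static byte[] toBytes(double[] v) {
            byte[] out = new byte[8 * v.length];
            for (int i = 0; i < v.length; i++) {
                Bytes.putDouble(out, 8 * i, v[i]);
            }
            return out;
        }
    }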

Similar publications

Article
Full-text available
Due to the rapid growth in multimedia data and Cloud Computing (CC), Secure Image Archival and Retrieval System (SIARS) on the cloud has gained more interest in recent times. Content-based image retrieval (CBIR) systems generally retrieve the images relevant to the query image (QI) from massive databases. However, the secure image retrieval process...
Article
Full-text available
Our country's economic prospects lie mainly in the agricultural sector. Although there has been much advancement in technology, the chances of predicting diseases in plants remain vague. In this paper, a technical solution for farmers to detect and diagnose the right disease affecting their plants is discussed. The Content Based Image Retrieval (CBIR) te...
Article
Full-text available
With the recent evolution of technology, the number of image archives has increased exponentially. In Content-Based Image Retrieval (CBIR), high-level visual information is represented in the form of low-level features. The semantic gap between the low-level features and the high-level image concepts is an open research problem. In this paper, we p...
Article
Full-text available
Content-based image retrieval (CBIR) systems have become a hot topic in recent years. A CBIR system retrieves images based on visual features, and a CBIR system based on a single feature has low performance. Therefore, in this paper a new content-based image retrieval method using color and texture features is proposed to improve performance. In thi...
Article
Full-text available
In modern times, there has been a great advance in technology, and with this comes the problem of the enormous data it generates. Smartphones, laptops, and even televisions are connected to the internet, constantly generating data. With this said, there has been a huge push towards the digitization of information. With the data processed...

Citations

... A whole system consisting of all the nodes acts as a single computer to its users. Every node in the distributed system has its own processor and its own memory, and harmony is achieved through synchronization and coordination [8]. Processes are autonomous and execute tasks concurrently. ...
Article
Full-text available
With the exponential growth of data, it is difficult to efficiently store and retrieve data using traditional methods. There is a need to optimize storage and to efficiently retrieve relevant data matching the user query; traditional methods provide neither. To overcome these limitations, in this project we propose a distributed architecture framework to optimize memory usage and to effectively retrieve relevant data using Content-Based Image Retrieval (CBIR). The experimental results show that the proposed model enhances storage performance and retrieval time by 20%.
... When compared to the findings without homomorphic filtering, the evaluation results showed a lower error rate. Similarly, Jai-Andaloussi et al. (2013) addressed the issues of content-based image retrieval systems using the MapReduce processing architecture and HDFS storage model. They performed testing on mammography datasets and achieved good results, demonstrating that the MapReduce approach may be utilized efficiently for content-based medical image retrieval. ...
Article
Full-text available
The healthcare industry is different from other industries: patient data are sensitive, their storage needs to be handled with care and in compliance with regulations, and prediction accuracy needs to be high. The fast expansion in medical image modalities and data collection leads to the generation of so-called "Big Data", which is time-consuming for medical experts to analyze. This paper provides an insight into Big Data from the aspect of its role in multiscale modelling. Special attention is paid to the workflow, starting from medical image processing all the way to the creation of personalized models and their analysis. A review of the literature regarding Big Data in healthcare is provided, and two proposed solutions are described: carotid artery ultrasound image processing and 3D reconstruction, and drug testing on personalized heart models. For the carotid artery ultrasound image processing, the starting point is ultrasound images, which are segmented using the convolutional neural network U-net; the segmented masks are further used in 3D reconstruction of the geometry. For drug testing on a personalized heart model, a similar approach is proposed: images are used to create a personalized 3D geometrical model that is used in computational modelling to determine the pressure in the left ventricle before and after drug testing. All the aforementioned methodologies are complex, include Big Data analysis, and should be performed using servers or high-performance computing. Future development of Big Data applications in healthcare domains offers a lot of potential due to new data standards, the rapid development of research and technology, as well as strong government incentives.
... Jai-Andaloussi et al. [81] employed MapReduce for computation and HDFS for storage in content-based image retrieval systems. They used a mammography image database and applied the Bi-dimensional Empirical Mode Decomposition with Generalized Gaussian Density functions (BEMD-GGD) method and the Bi-dimensional Empirical Mode Decomposition with Huang-Hilbert Transform (BEMD-HHT) method. ...
Article
Full-text available
Clinical decisions are more promising and evidence-based; hence, big data analytics to assist clinical decision-making has been explored for a variety of clinical fields. Due to the sheer size and availability of healthcare data, big data analytics has revolutionized this industry and promises us a world of opportunities. It promises us the power of early detection, prediction, and prevention, and helps us to improve the quality of life. Researchers and clinicians are working to enable big data to have a positive impact on health in the future. Different tools and techniques are being used to analyze, process, accumulate, assimilate, and manage large amounts of healthcare data in either structured or unstructured form. In this review, we address the need for big data analytics in healthcare: why and how can it help to improve life? We present the emerging landscape of big data and analytical techniques in the five sub-disciplines of healthcare, i.e., medical image analysis and imaging informatics, bioinformatics, clinical informatics, public health informatics, and medical signal analytics. We present different architectures, advantages, and repositories of each discipline, drawing an integrated depiction of how distinct healthcare activities are accomplished in the pipeline to facilitate individual patients from multiple perspectives. Finally, the paper ends with the notable applications and challenges in the adoption of big data analytics in healthcare.
... Jai-Andaloussi et al. [133] employed MapReduce for computation and HDFS for storage in content-based image retrieval systems. They used a mammography image database and applied the Bi-dimensional Empirical Mode Decomposition with Generalized Gaussian Density functions (BEMD-GGD) method and the Bi-dimensional Empirical Mode Decomposition with Huang-Hilbert Transform (BEMD-HHT) method. ...
... Fraud Detection: 'Suspect, detect and protect'. Fraud, waste, and abuse have caused significant cost, ranging from honest mistakes that result in erroneous billings and inefficiencies that may result in wasteful diagnostic tests to over-payments due to false claims. Personal data is extremely sensitive due to its profitable value on black markets; thus, the healthcare industry is 200% more likely to experience data breaches than any other. ...
Preprint
Clinicians' decisions are becoming more and more evidence-based, meaning that in no other field is big data analytics as promising as in healthcare. Due to the sheer size and availability of healthcare data, big data analytics has revolutionized this industry and promises us a world of opportunities. It promises us the power of early detection, prediction, and prevention, and helps us to improve the quality of life. Researchers and clinicians are working to enable big data to have a positive impact on health in the future. Different tools and techniques are being used to analyze, process, accumulate, assimilate, and manage large amounts of healthcare data in either structured or unstructured form. In this paper, we would like to address the need for big data analytics in healthcare: why and how can it help to improve life? We present the emerging landscape of big data and analytical techniques in the five sub-disciplines of healthcare, i.e., medical image analysis and imaging informatics, bioinformatics, clinical informatics, public health informatics, and medical signal analytics. We present different architectures, advantages, and repositories of each discipline, drawing an integrated depiction of how distinct healthcare activities are accomplished in the pipeline to facilitate individual patients from multiple perspectives. Finally, the paper ends with the notable applications and challenges in the adoption of big data analytics in healthcare.
... They used the DDSM mammography database. [14] Wang et al. (2011) proposed an efficient and cost-effective parallel system for analyzing digital pathology imaging data called Hadoop-GIS. It helps in querying and analyzing spatially oriented scientific data, which is becoming increasingly important for many applications. ...
Conference Paper
Full-text available
Deep Learning and Big Data Analytics are the two high-focus areas in medical image analysis in recent times. Owing to the great volume of imaging data in databases, a lot of research has focused on medical image analysis involving big data tools and techniques. Also, due to the saturation of considerable advances in shallow reasoning-based machine learning algorithms, complex reasoning-based algorithms like deep learning are employed to address the issues of image data in the biomedical field. This paper discusses the challenges of traditional medical image analysis and reviews some of the latest research in the areas of medical image analysis involving deep learning and employing big data platforms.
... At that point, the HDFS (Hadoop Distributed File System) stores the image information, followed by the execution of MapReduce. The evaluation results showed a reduced error rate in images compared with the outcome without homomorphic filtering [26,33,34]. These were used to evaluate the comprehensive evaluation value for the synthesized image [35,36]. ...
Chapter
Full-text available
The data related to human health and medicine can be stored, searched, shared, analysed, and presented in ingenious ways and the scale of this medical big data is continuously growing with advancements in medical technology and hospital information. However, there are predicaments and problems that remain to be overcome in its current stage of inception especially on how to analyze this data in a reliable manner. In this chapter, how data mining technology is more convenient for integrating this medical data for a variety of applications such as disease diagnosis, prevention, hospital administration has been discussed. In this chapter, the practicality of big data analytics, methodological and technical issues such as data quality, inconsistency and instability, analytical and legal issues and lastly, the issue of integration of big data analytics with clinical practice and clinical utility have been analysed. It is important to overcome these challenges to secure the application of big data technology in medical field and to thus improve patient outcome and more essentially to reduce resource wastage in medical field, which should be the real aim of big data studies. This chapter also aims at exploring methods to overcome these obstacles using big data tools and understanding the potential of Hadoop, which is an open-source distributed data storage and analysis application, in managing healthcare data. An analysis and examination of possible future work for these areas is also done with a translational approach of using data from all levels of human existence.
... Jones and Shao [5] tried to combine several techniques, such as vocabulary-guided and spatio-temporal pyramid matching, Bag-of-Words for action representation, and SVMs/ABRS-SVMs for relevance feedback, using realistic action datasets such as UCF Sports, UCF YouTube, and HOHA2. Jai-Andaloussi et al. [6] had already suggested Content-Based Image Retrieval (CBIR) using a distributed computing system to reduce computation time. ...
Chapter
Full-text available
The paper studies the influence on similarity of extracting and using m out of n frames of a video; the purpose is to evaluate the proportion of similarity between them and to propose a new Content-Based Video Retrieval (CBVR) system. The proposed system uses a Bounded Coordinate of Motion Histogram (BCMH) [1] to characterize videos, which are represented by spatio-temporal features (e.g., motion vectors), and the Fast and Adaptive Bidimensional Empirical Mode Decomposition (FABEMD). A global representation of a video is compared pairwise with those of all the videos in the Hollywood2 dataset using the k-nearest neighbors (KNN) algorithm. Moreover, this approach is adaptive: a training procedure is presented, and an accuracy of 58.1% is accomplished in comparison with the state-of-the-art approaches on the dataset of 1707 movie clips.
... This metadata includes a list of "stored filenames, corresponding blocks of each file, and Datanodes containing these blocks". For this reason, when a client reads a file, it first communicates with the Namenode to get the locations of the data blocks that make up the file, and the Namenode directs the client to the Datanodes hosting the requested file. The client then communicates directly with the Datanodes to perform file operations [12][13]. ...
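As a small, hedged illustration of this read path (the file path is illustrative), the standard Hadoop FileSystem API hides the two-step protocol just described: open() asks the Namenode for the block locations, and the returned stream reads the blocks directly from the Datanodes.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReadExample {
        public static void main(String[] args) throws IOException {
            // fs.defaultFS in the configuration points the client at the Namenode.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // open() contacts the Namenode for the block locations of the file;
            // the stream then fetches the bytes directly from the Datanodes.
            try (FSDataInputStream in = fs.open(new Path("/data/images/sample.png"))) {
                byte[] buffer = new byte[4096];
                int read;
                while ((read = in.read(buffer)) > 0) {
                    // process `read` bytes of the file here
                }
            }
        }
    }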
Article
Full-text available
Large amounts of data are produced daily in various fields such as science, economics, engineering, and health. The main challenge of pervasive computing is to store and analyze this data, which has led to the need for usable and scalable data applications and storage clusters. In this article, we examine the Hadoop architecture developed to deal with these problems. The Hadoop architecture consists of the Hadoop Distributed File System (HDFS) and the MapReduce programming model, which enable storage and computation on a set of commodity computers. In this study, a Hadoop cluster consisting of four nodes was created. Pi and Grep MapReduce applications were run to show the effect of different data sizes and numbers of nodes in the cluster, and their results were examined.
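For readers unfamiliar with the Grep benchmark mentioned in this abstract, the following is a minimal, hedged sketch of a Grep-style MapReduce job (not the code from the article): the map phase emits a count for each input line matching a regular expression, and the reduce phase sums the counts. All class, path, and configuration names are illustrative.

    import java.io.IOException;
    import java.util.regex.Pattern;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class GrepCount {
        // Map phase: emit (pattern, 1) for every input line that matches.
        public static class MatchMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private Pattern pattern;

            @Override
            protected void setup(Context context) {
                pattern = Pattern.compile(context.getConfiguration().get("grep.pattern"));
            }

            @Override
            protected void map(LongWritable offset, Text line, Context context)
                    throws IOException, InterruptedException {
                if (pattern.matcher(line.toString()).find()) {
                    context.write(new Text(pattern.pattern()), ONE);
                }
            }
        }

        // Reduce phase (also used as combiner): sum the match counts.
        public static class SumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) sum += v.get();
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            // args: <input dir> <output dir> <regex>
            Configuration conf = new Configuration();
            conf.set("grep.pattern", args[2]);
            Job job = Job.getInstance(conf, "grep-count");
            job.setJarByClass(GrepCount.class);
            job.setMapperClass(MatchMapper.class);
            job.setCombinerClass(SumReducer.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }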
... ImageTerrier (Hare et al. 2012) used the largest collection of these, indexing 10.9 million images using BoW features based on about 10 billion SIFT feature vectors (Lowe 2004). Such systems have also seen some use in the medical image retrieval domain, again with relatively small collections (Grace et al. 2014; Jai-Andaloussi et al. 2013; Yao et al. 2014). While only k-means is described in detail, the library contains multiple algorithms for descriptor creation, image retrieval, and result processing. ...
Article
The world has experienced phenomenal growth in data production and storage in recent years, much of which has taken the form of media files. At the same time, computing power has become abundant with multi-core machines, grids, and clouds. Yet it remains a challenge to harness the available power and move toward gracefully searching and retrieving from web-scale media collections. Several researchers have experimented with using automatically distributed computing frameworks, notably Hadoop and Spark, for processing multimedia material, but mostly using small collections on small computing clusters. In this article, we describe a prototype of a (near) web-scale throughput-oriented MM retrieval service using the Spark framework running on the AWS cloud service. We present retrieval results using up to 43 billion SIFT feature vectors from the public YFCC 100M collection, making this the largest high-dimensional feature vector collection reported in the literature. We also present a publicly available demonstration retrieval system, running on our own servers, where the implementation of the Spark pipelines can be observed in practice using standard image benchmarks, and downloaded for research purposes. Finally, we describe a method to evaluate retrieval quality of the ever-growing high-dimensional index of the prototype, without actually indexing a web-scale media collection.
... Moreover, in terms of sensors, it can be expressed in different time units such as seconds, milliseconds, or microseconds [2]. Today, existing big data technologies such as MapReduce [3], Hadoop [4], STORM [5], and NoSQL (Not Only SQL) [6], together with cloud computing technology, are used to address the scalability problem of healthcare data and to increase the performance of healthcare informatics systems [7,5]. ...