MapReduce processing framework.

Source publication

Parallel Cleaning Algorithm for Similar Duplicate Chinese Data Based on BERT

Article

Dec 2021

Data is an important source for knowledge discovery, but similar duplicate data not only increases database redundancy but also hampers subsequent data mining. Cleaning similar duplicate data helps improve work efficiency. Based on the complexity of the Chinese language and the bottleneck of the single ma...

POVERTY PREDICTION USING MACHINE LEARNING APPROACH

Article

Feb 2022

The incidence of poverty is not a taboo topic. In fact, it happens in every country worldwide where the policymakers and governments struggle to reduce their country’s poverty rate. However, the existing ways of finding the right targeted impoverished group to provide economic aid are often flawed because of multiple issues such as data transparenc...

EQFF: An Efficient Query Method Using Feature Fingerprints

Chapter

Feb 2024

The number of data features is growing rapidly in the era of big data, posing challenges to both the security and the efficiency of feature queries. Most existing encryption-based retrieval approaches are limited by significant computational overhead and support only precise queries, which may fail to handle incomplete keywords and misspellings in the query. To achieve both query efficiency and privacy preservation for large-scale data, this paper presents EQFF, an Efficient Query Method Using Feature Fingerprints. It converts varying-length features into fingerprints in the form of fixed-length vectors, which hides the semantic information and so ensures query security. Based on the feature fingerprints, we further present the corresponding precise and fuzzy query approaches, design an inverted index library, and propose a compressed storage mechanism to improve query efficiency. Extensive experiments are conducted on real-world datasets. Experimental results show that EQFF uses only 6.4% of the memory required by the raw data, reduces the time cost from minutes to tens of milliseconds, and achieves an accuracy above 98%.
