Fig. 6. Hierarchical structure in a 16-bit word of MPCBF [50]. Initially, the number of hash functions is k = 3, and level 1 has 8 bits, all initialized to 0.

Source publication
Article
Full-text available
Bloom filter (BF) has been widely used to support membership queries, i.e., to judge whether a given element x is a member of a given set S. Recent years have seen an explosion of BF designs owing to its space efficiency and constant-time membership queries. The existing reviews or surveys mainly focus...

Contexts in source publication

Context 1
... depicted in Fig. 6, MPCBF allocates the bits in each word across multiple levels. The basic principle for constructing the hierarchical structure is as follows. Whenever an element is inserted into the word, k bits must be set from 0 to 1, and whenever a bit is set from 0 to 1, an empty bit is added to the next level and initialized to 0. This is ...
Context 2
... an element is inserted into the word, k bits must be set from 0 to 1, and whenever a bit is set from 0 to 1, an empty bit is added to the next level and initialized to 0. This is realized with a function popcount(i), which computes the number of ones before position i at the hierarchy level to which bit i belongs. For example, in Fig. 6, when element x is inserted into the word, three bits in level 1 are set from 0 to 1. Following the construction principle, three empty bits, i.e., bits 8, 9, and 10, are added to level 2. Thereafter, when another element y is inserted into the word, it is hashed to bits 7, 4, and 2 in level 1. Bit 7 is set from 0 ...
Context 3
... depicted in Fig. 6, MPCBF allocates the bits in each word across multiple levels. The basic principle for constructing the hierarchical structure is as follows. Whenever an element is inserted into the word, k bits must be set from 0 to 1, and whenever a bit is set from 0 to 1, an empty bit is added to the next level and initialized to 0. This is realized with a function popcount(i), which computes the number of ones before position i at the hierarchy level to which bit i belongs. For example, in Fig. 6, when element x is inserted into the word, three bits in level 1 are set from 0 to 1. Following the construction principle, three empty bits, i.e., bits 8, 9, and 10, are added to level 2. Thereafter, when another element y is inserted into the word, it is hashed to bits 7, 4, and 2 in level 1. Bit 7 is set from 0 to 1, so a new bit (bit 11) is added to level 2. For bit 2, popcount(2) = 1, so the bit at position 8 + 1 = 9 is checked and set from 0 to 1, and bit 12 is added to level 3. Doing the same for bit 4 sets bit 10 to 1 and adds bit 13 to level 3. To delete an element, the inverse operations are ...
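To make this walk-through concrete, the following Python sketch models a single MPCBF word as a list of levels. It is our own illustrative reconstruction of the mechanism described above, not the authors' implementation: a real MPCBF packs all levels into one machine word and shifts bits to make room, whereas this sketch uses growable lists, and the hash positions in the demo are assumptions chosen to mirror Fig. 6.

```python
class MPCBFWord:
    """Illustrative model of one hierarchical MPCBF word (cf. Fig. 6).

    levels[0] is level 1 with a fixed number of bits; every 0->1
    transition at level L inserts a fresh 0 bit into level L+1, so the
    structure effectively counts how often each level-1 bit was hit.
    """

    def __init__(self, level1_bits=8):
        self.levels = [[0] * level1_bits]

    def _ones_before(self, level, pos):
        # popcount(i): number of ones before position i within its level
        return sum(self.levels[level][:pos])

    def _set(self, level, pos):
        # Walk down through already-set bits: the child of a set bit i
        # sits at offset popcount(i) in the next level.
        while self.levels[level][pos] == 1:
            level, pos = level + 1, self._ones_before(level, pos)
        self.levels[level][pos] = 1
        if level + 1 == len(self.levels):
            self.levels.append([])            # open a new level on demand
        # every 0->1 flip adds an empty child bit to the next level
        self.levels[level + 1].insert(self._ones_before(level, pos), 0)

    def insert(self, level1_positions):
        # the k positions are the element's hash values in level 1
        for pos in level1_positions:
            self._set(0, pos)


w = MPCBFWord()
w.insert([1, 2, 4])   # element x (hash positions assumed for illustration)
w.insert([7, 4, 2])   # element y, as in the Fig. 6 walk-through
print(w.levels)       # [[0, 1, 1, 0, 1, 0, 0, 1], [0, 1, 1, 0], [0, 0]]
```

Running the demo reproduces the state described above: level 2 ends up as [0, 1, 1, 0] (bits 8 to 11) and level 3 as [0, 0] (bits 12 and 13).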

Similar publications

Article
Full-text available
[Spa] In Third Sector socio-educational associations, logics, policies, and practices have gradually been established to accredit the quality of services. These organizational orientations are similar to procedures carried out in recent years in other educational contexts, which have considerably diminished...
Article
Full-text available
Objectives: Development of optimal recipes of concrete mixtures using local natural raw materials in the form of gravel-sand mixtures from deposits of the Chechen Republic. Method: The research methods adopted in the work are based on the theoretical principles and laws of designing and optimizing polydisperse multicomponent systems, the phase and st...
Chapter
Full-text available
Integrity constraints (ICs) are meant for many data management tasks. However, some types of ICs can express semantic rules that other ICs cannot, or vice versa. Denial constraints (DCs) are known to be a response to this expressiveness issue because they generalize important types of ICs, such as functional dependencies (FDs), conditional FDs, an...
Article
Full-text available
Video materials belong to the most powerful tools in the educational process because they provide learners with auditory and visual information simultaneously. Thus, according to Edgar Dale's cone of learning, they are more effective than classical classroom lectures, reading textbooks, or listening to podcasts. Moreover, video as a part of a learnin...
Conference Paper
Full-text available
The customer-provider collaboration that was diminished during the industrial revolution is being revived to achieve higher customer satisfaction and a competitive edge. Manufacturers are now interested in co-creating value with their customers to design a customized and sustainable solution. Value co-creation is being implemented by various busine...

Citations

... Due to space, energy, and bandwidth constraints, systems often compromise some accuracy for efficiency by periodically advertising approximate indicators. Indicators are data structures that trade accuracy for space efficiency (e.g., Bloom filters [9], [12]-[14]). The compromised accuracy introduces a risk of false indications, which result in unnecessary misses. ...
... Another approach is to accurately advertise important information while allowing less critical data to be stale, or less accurate [19], [20]. The work [14] surveys many optimizations to indicators, such as the support for removals and dynamic scaling. ...
... The positive exclusion probability captures the probability that the requested item is not in the cache, given a positive indication. The main reason for such false-positive indications is the inherent inaccuracy of the indicator, which sacrifices some accuracy for space efficiency [14]. An additional cause of false-positive indications is staleness of the indicator. ...
Preprint
Full-text available
Caching is extensively used in various networking environments to optimize performance by reducing latency, bandwidth, and energy consumption. To optimize performance, caches often advertise their content using indicators, which are data structures that trade accuracy for space efficiency. However, this tradeoff introduces the risk of false indications. Existing solutions for cache content advertisement and cache selection often lead to inefficiencies, failing to adapt to dynamic network conditions. This paper introduces SALSA2, a Scalable Adaptive and Learning-based Selection and Advertisement Algorithm, which addresses these limitations through a dynamic and adaptive approach. SALSA2 accurately estimates mis-indication probabilities by considering inter-cache dependencies and dynamically adjusts the size and frequency of indicator advertisements to minimize transmission overhead while maintaining high accuracy. Our extensive simulation study, conducted using a variety of real-world cache traces, demonstrates that SALSA2 achieves up to 84% bandwidth savings compared to the state-of-the-art solution and close-to-optimal service cost in most scenarios. These results highlight SALSA2's effectiveness in enhancing cache management, making it a robust and versatile solution for modern networking challenges.
... A space-efficient probabilistic data structure that supports membership queries is called a BF [142]. A BF is utilized to detect whether a specific element belongs to a set [143]. Inherent benefits of BF include controlled false positives, constant-time queries, space efficiency, and others. ...
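For readers new to BFs, here is a minimal textbook-style sketch of the structure these citations refer to. The SHA-256-based position derivation and the sizes are our illustrative assumptions, not taken from the cited works.

```python
import hashlib


class BloomFilter:
    """Textbook Bloom filter: k bit positions per element.

    A negative answer is always correct; a positive answer may be a
    false positive with a probability controlled by m and k.
    """

    def __init__(self, m, k):
        assert k <= 8, "the toy hashing below yields at most 8 positions"
        self.m, self.k = m, k
        self.bits = bytearray(m)   # one byte per bit, for clarity

    def _positions(self, item):
        # carve k 4-byte chunks out of one SHA-256 digest (illustrative)
        digest = hashlib.sha256(item.encode()).digest()
        return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.m
                for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=1024, k=3)
bf.add("alice")
print("alice" in bf)   # True (no false negatives)
print("bob" in bf)     # False with high probability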
Article
Full-text available
Authentication systems are pivotal in fortifying security measures against unauthorized access. Yet, they often fall short of effectively combating impersonation attacks, leaving systems susceptible to exploitation. Continuous Authentication Systems (CAS) have emerged as a promising solution, offering dynamic adaptability to evolving threats. However, the existing literature lacks a thorough critical evaluation of CAS progress, hindering practical advancements in the field. This comprehensive review addresses this gap by analyzing recent advancements, emerging trends, and critical challenges in CAS design and implementation. The review reveals that while supervised learning methods, particularly score-level fusion, dominate CAS classification techniques, there remains a dearth of comparative analysis regarding the efficacy of different biometric pairings (e.g., physiological, behavioral, or multimodal). While studies predominantly assess CAS accuracy using metrics like False Rejection Rate (FRR), False Acceptance Rate (FAR), and Equal Error Rate (EER), aspects crucial to practical success, such as usability, security, and scalability, often receive inadequate attention. Moreover, the practical viability of CAS demands comprehensive implementation and evaluation using real-world data. This survey paper explores various facets of CAS, including physiological and behavioral biometrics, multimodal biometrics, context-aware techniques, and other emerging methodologies. Additionally, open issues, challenges, and proposed future directions aim to inspire further research and development in secure biometric-based continuous authentication and user profiling.
... we encode the mined features as membership relations using Bloom filters (described in Section F and in detail below), hereafter simply called filters. We use Bloom filters for their superior performance for rule validation and the security and privacy properties they provide [23], [24]; additionally, we use them to develop a privacy-preserving encoding and membership evaluation protocol in the federated setting (presented shortly). ...
Article
Full-text available
Privacy Enhancing Technologies (PETs) have the potential to enable collaborative analytics without compromising privacy. This is extremely important because collaborative analytics can allow us to extract real value from the large amounts of data that are collected in domains such as healthcare, finance, and national security, among others. In order to foster innovation and move PETs from the research labs to actual deployment, the U.S. and U.K. governments partnered together in 2021 to propose the PETs prize challenge asking for privacy-enhancing solutions for two of the biggest problems facing us today: financial crime prevention and pandemic response. This article presents the Rutgers ScarletPets privacy-preserving federated learning approach to identify anomalous financial transactions in a payment network system (PNS). This approach utilizes a two-step anomaly detection methodology to solve the problem. In the first step, features are mined based on account-level data and labels, and then a privacy-preserving encoding scheme is used to augment these features to the data held by the PNS. In the second step, the PNS learns a highly accurate classifier from the augmented data. Our proposed approach has two major advantages: 1) there is no noteworthy drop in accuracy between the federated and the centralized setting, and 2) our approach is extremely flexible since the PNS can keep improving its model and features to build a better classifier without imposing any additional computational or privacy burden on the banks. Notably, our solution won the first prize in the US for its privacy, utility, efficiency, and flexibility.
... Approximate membership checking filters are widely used in different applications to speed up membership testing [1]. Typically, these filters, such as Bloom filters, do not suffer from false-negatives, but false-positives can occur with low probability [2]. ...
... Let us consider U to be the universe of IPv4 addresses, formed by 2^32 elements, and an ACF with d = 4, c = 1, s = 2, f_b = 8, b = 2^16. Then, for element x to be a persistent false positive, there must be another element y in one of the four buckets to which it maps such that fp_1 ...
Article
Full-text available
As probabilistic data structures are widely adopted in computing systems, their privacy is a major issue. Recent works have shown that even though the values stored in these structures look random, information can be extracted from them in some settings. In this paper, we consider the privacy of adaptive cuckoo filters, a probabilistic data structure that implements approximate membership checking. The main novelty and benefit of these filters are that they can adapt to removing false-positives. Unfortunately, our analysis shows that adaptation can dramatically reduce the privacy of the filters, allowing an attacker to extract the set of elements stored in the filter. Indeed, in some settings, the attacker can identify 100% of the elements stored in the filter. This means that the protection of the privacy of adaptive cuckoo filters should be considered. To that end, we propose preprocessing reduction (PR), a scheme that prevents an attacker from extracting the set of elements stored in the filter at the cost of increasing the false-positive probability of the filter. In many settings, the impact on false-positives will be negligible. For example, in a case study with 32-bit universes, the increase in the false-positive probability was smaller than 8% in all the configurations tested. Interestingly, PR is applicable not only to adaptive filters but also to approximate membership check filters in general and thus can be used to protect, for example, Bloom filters.
... Another function commonly implemented with probabilistic data structures is checking if an element belongs to a set. In this case, the structures are commonly referred to as filters and return an approximate answer, in the sense that false positives occur with a given probability [5]. The Bloom filter [6] is the most widely known filter, but many other approximate membership check filters have been proposed over the years to improve performance and cost, for example by reducing the number of memory accesses and the memory needed to achieve a given false positive probability [7]. ...
Article
Full-text available
The security of probabilistic data structures is increasingly important due to their wide adoption in many computing systems and applications. In particular, the security of approximate membership check filters such as Bloom or cuckoo filters has been recently studied showing how an attacker can degrade the filter performance in some settings. In this paper, we consider for the first time the security of another popular approximate membership check filter, the Quotient Filter (QF). Our analysis and simulations show that quotient filters are vulnerable to both white and black box attackers that can cause insertion failures and degrade the filter performance very significantly. An interesting finding is that quotient filters are vulnerable to a new type of attack, not applicable to Bloom or cuckoo filters, that can degrade the speed of queries dramatically. The paper also briefly discusses and evaluates potential countermeasures to detect and protect against those attacks.
... They have been investigated to support element deletion, capacity resizing, and reverse decoding. More Bloom filter variants have been proposed to improve the Bloom filter from a performance or generalization perspective in diverse circumstances [2], [3], [4], [10]. ...
Article
Full-text available
Sketches are widely deployed to represent network flows to support complex flow analysis. Typical sketches usually employ hash functions to map elements into a hash table or bit array. Such sketches still suffer from potential weaknesses upon throughput, flexibility, and functionality. To this end, we propose Ark filter, a novel sketch that stores the element information with either of two candidate buckets indexed by the quotient or remainder between the fingerprint and filter length. In this way, no further hash calculations are required for future queries or reallocations. We further extend the Ark filter to enable capacity elasticity and more functionalities (such as frequency estimation and top-k query). Comprehensive experiments demonstrate that, compared with the Cuckoo filter, the Ark filter has 2.08×, 1.34×, and 1.68× the throughput of deletion, insertion, and hybrid query, respectively; compared with the Quotient filter, it has 4.55×, 1.74×, and 22.12× the throughput of deletion, insertion, and hybrid query, respectively; compared with the Bloom filter, it has 2.55× and 2.11× the throughput of insertion and hybrid query, respectively.
... Instead, our third approach is to use a Bloom filter on Set_HA and consider multiple independent uniform hash functions that map from Set_HA to {0, ..., n − 1}. For a given false positive rate ε and a given number of sequences, we choose the optimal number of hash functions such that the size n of the Bloom filter is minimized [34]. We call this method BF_HA. ...
... Moreover, using a Bloom filter instead of Set_HA results in a higher rejection rate. For both Bloom filters, BF_HA and BF_HA(1), we use the common settings for the number of bits under the assumption that ε, the probability of false positives, is given [34]. However, the Bloom filter BF_HA(1), which makes use of one hash function only, shows a substantially higher rejection rate than BF_HA, the Bloom filter with the optimal number of hash functions. ...
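In the standard analysis, the "common settings" referred to here are the closed-form optima: a filter of about -(#elements) · ln ε / (ln 2)² bits with (size / #elements) · ln 2 hash functions. A small sketch of these textbook formulas (the function name and example numbers are our own):

```python
import math


def bloom_settings(num_elements, eps):
    """Minimal bit count and optimal hash count for a target
    false-positive rate eps (standard textbook approximation)."""
    n_bits = math.ceil(-num_elements * math.log(eps) / (math.log(2) ** 2))
    k = max(1, round((n_bits / num_elements) * math.log(2)))
    return n_bits, k


# e.g., one million sequences at eps = 1%: about 9.6 Mbit and k = 7
print(bloom_settings(10**6, 0.01))
```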
Article
Full-text available
Over the past decade, DNA has emerged as a new storage medium with intriguing data volume and durability capabilities. Despite its advantages, DNA storage also has crucial limitations, such as intricate data access interfaces and restricted random accessibility. To overcome these limitations, DNAContainer has been introduced with a novel storage interface for DNA that spans a very large virtual address space on objects and allows random access to DNA at scale. In this paper, we substantially improve the first version of DNAContainer, focusing on the update capabilities of its data structures and optimizing its memory footprint. In addition, we extend the previous set of experiments on DNAContainer with new ones whose results reveal the impact of essential parameters on the performance and memory footprint.
... For example, for two clusters ϕ_1 and ϕ_2 with |ϕ_1| and |ϕ_2| chunks, it takes O(|ϕ_1| × |ϕ_2|) time to determine the number of shared chunks. To further decrease the computation complexity, we adopt the Bloom Filter (BF) [45], [46] to sketch the fingerprints of chunks in each cluster. The basic BF is a hash-based mapping method that has been widely utilized in various networking and distributed systems. ...
... The penalty of such an approach is the false positive, i.e., for any chunk c ∉ C, all of its k_BF hash positions in the bit vector may be set to 1 when representing other chunks in set C. This is caused by unavoidable hash conflicts, as illustrated by the 8th bit in Fig. 4. The false-positive probability, denoted as p, can be derived as p = (1 − (1 − 1/l_BF)^(n·k_BF))^(k_BF) [45], where n represents the number of represented chunks in set C. ...
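For a quick sanity check of this expression, a one-line evaluation (the example parameters are our assumptions, not values from the cited paper):

```python
def false_positive_rate(l_bf, n, k_bf):
    """p = (1 - (1 - 1/l_BF)^(n * k_BF))^k_BF, as given above."""
    return (1.0 - (1.0 - 1.0 / l_bf) ** (n * k_bf)) ** k_bf


# e.g., a 2^20-bit vector holding n = 100,000 chunks with k_BF = 7
print(false_positive_rate(2**20, 100_000, 7))   # ~0.0065
```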
Article
Full-text available
Placing popular data at the network edge helps reduce the retrieval latency, but it also brings challenges to the limited edge storage space. Currently, using available yet not necessarily reliable edge resources is common sense for edge space expansion, while deploying deduplication storage strategies is a general method for better space utilization. However, a contradiction arises when jointly implementing data deduplication with unreliable edge resources. On the one hand, the deduplication policy stipulates that any data chunk can be stored exactly once; on the other hand, the use of unreliable resources imposes that data should be backed up for the sake of file availability. To resolve such contradiction, we propose MEAN, a deduplication-enabled storage system using unreliable resources at the network edge. The core idea of MEAN is to place similar files together for better deduplication and maintain replicas of popular files for higher reliability. We first formulate this problem and prove its NP-hardness, then provide efficient heuristics based on similarity-aware hierarchical clustering. Three different reliability scenarios are comprehensively considered to develop our algorithms. We also implement a prototype system and evaluate the performance of MEAN with a real-world dataset. The results show that MEAN can fortify the file hit ratio under unreliable environments by 77% while reducing the file retrieval delay up to 71%, compared with the state-of-the-art approach.
... For example, for a file with n blocks, it takes O(n × |B_t|) time to determine whether the server contains such blocks or not, where |B_t| is the total number of blocks in a candidate server. In order to decrease the computation complexity, we adopt the Bloom Filter (BF) [33], [34], a hash-based mapping method that has been widely utilized in networking and distributed systems, to represent the blocks on each candidate server. This captures the data characteristics and shifts the similarity detection from pair-wise fingerprint checking to membership queries on the data sketches. ...
... In this way, the deletion of an element will not affect the existence of other elements. It has been proved that 4 bits per counter are enough to achieve a negligible overflow probability [33]. CBF also supports constant-time membership queries. ...
... Then, for block b, all of its k_BF hash positions are projected by other data with a probability p = [1 − (1 − 1/d)^(ε·k_BF)]^(k_BF). This probability is also called the false-positive rate [33]. ...
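A counting Bloom filter (CBF) along the lines of these excerpts can be sketched as follows. The 4-bit saturating counters reflect the analysis cited above, while the SHA-256-based hashing and the class interface are our illustrative assumptions:

```python
import hashlib


class CountingBloomFilter:
    """CBF sketch: 4-bit counters (saturating at 15) replace single bits,
    so elements can be deleted without erasing others."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.counters = [0] * m

    def _positions(self, item):
        # derive k positions from one SHA-256 digest (illustrative)
        digest = hashlib.sha256(item.encode()).digest()
        return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.m
                for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.counters[pos] = min(self.counters[pos] + 1, 15)  # 4-bit cap

    def remove(self, item):
        # safe only for items that were actually added (and whose
        # counters never saturated)
        for pos in self._positions(item):
            self.counters[pos] = max(self.counters[pos] - 1, 0)

    def __contains__(self, item):
        return all(self.counters[pos] > 0 for pos in self._positions(item))


cbf = CountingBloomFilter(m=1024, k=4)
cbf.add("chunk-42")
cbf.remove("chunk-42")
print("chunk-42" in cbf)   # False: deletion did not disturb other entries
```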
Article
Full-text available
The traditional migration methods are confronted with formidable challenges when data deduplication technologies are incorporated. Firstly, the deduplication creates data-sharing dependencies in the stored files; breaking such dependencies in migration may attach extra space overhead. Secondly, the redundancy elimination makes the storage system reserves only one copy for each storage file, and heightens the risk of data unavailability. The existing methods fail to tackle them in one shot. To this end, we propose Jingwei, an efficient and adaptive data migration strategy for deduplicated storage systems. To be specific, Jingwei tries to minimize the extra space cost in migration for space efficiency. Meanwhile, Jingwei realizes the service adaptability by encouraging replicas of hot files to spread out their data access requirements. We first model such a problem as an integer linear programming (ILP) and solve it with a commercial solver when only one empty migration target server is allowed. We then extend this problem to a scenario wherein multiple non-empty target servers are available for migration. We solve it by effective heuristic algorithms based on the Bloom Filter-based data sketches. The Jingwei strategy can suffer from performance degradation when the heat degree varies significantly. Therefore, we further present incremental adjustment strategies for the two scenarios, which adjust the number of block replicas and their locations in an incremental manner. The mathematical analyses and trace-driven experiments show the effectiveness of our Jingwei strategy. To be specific, Jingwei fortifies the file replicas by 25% with only 5.7% of the extra storage space, compared with the latest “Goseed” method. With the small extra space cost, the file retrieval throughput of Jingwei can reach up to 333.5Mbps, which is 12.3% higher than that of the Random method.
... As aforementioned, set queries address two types of information for each element e: 1) existence (membership) information, i.e., whether e is in a set; 2) auxiliary information, i.e., some additional information such as the frequency of e (multiplicity information) or which set e is in (association information). Set queries rely on sketches to record and track the data information above and have been widely used in computer networks, such as in packet routing and forwarding, web caching, network monitoring, security enhancement, content delivery, etc. [25]. Therefore, it is of great significance to improve set query performance with an elegant probabilistic data structure. ...
Article
Full-text available
Set query is a fundamental problem in computer systems. Plenty of applications rely on the query results of membership, association, and multiplicity. A traditional method that addresses such a fundamental problem is derived from the Bloom filter. However, such methods may fail to support element deletion or may require additional filters or a priori knowledge, making them unamenable to a high-performance implementation for dynamic set representation and query. In this paper, we envision a novel sketch framework that is multi-functional, non-parametric, space-efficient, and deletable. As far as we know, none of the existing designs can guarantee such features simultaneously. To this end, we present a general shifting framework to represent auxiliary information (such as multiplicity and association) with the offset. Thereafter, we specify such design philosophy for a hash table horizontally at the slot level, as well as vertically at the bucket level. Theoretical and experimental results jointly demonstrate that our design works exceptionally well with three types of set queries under small memory.