Figure 2 - uploaded by Antal Járai
Speed comparison chart  

Source publication
Article
Sieving is essential in various number-theoretic algorithms. Sieving with large primes violates locality of memory access, thus degrading performance. Our suggestion on how to tackle this problem is to use cyclic data structures in combination with in-place bucket-sort. We present our results on the implementation of the sieve of Eratosthenes,...
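
The bucket idea mentioned in the abstract can be illustrated with a small sketch. This is not the paper's implementation: it uses plain Python lists rather than a cyclic, in-place structure, and the names `simple_sieve`, `bucket_sieve` and `segment_size` are hypothetical. It only shows the principle that each sieving prime is filed under the segment containing its next multiple, so a segment touches only the primes that actually hit it.

```python
from math import isqrt


def simple_sieve(limit):
    """Return all primes <= limit using the plain sieve of Eratosthenes."""
    if limit < 2:
        return []
    flags = bytearray([1]) * (limit + 1)
    flags[0] = flags[1] = 0
    for p in range(2, isqrt(limit) + 1):
        if flags[p]:
            # Clear p*p, p*p + p, ... in one slice assignment.
            flags[p * p :: p] = bytes(len(range(p * p, limit + 1, p)))
    return [i for i, f in enumerate(flags) if f]


def bucket_sieve(lo, hi, segment_size=1 << 10):
    """Return the primes in [lo, hi), sieving one segment at a time.

    Assumption (illustrative only): one bucket per segment, held in
    ordinary lists; the cyclic in-place layout of the paper is not modelled.
    """
    if hi <= lo or hi < 3:
        return []
    lo = max(lo, 0)
    n_segments = (hi - lo + segment_size - 1) // segment_size
    buckets = [[] for _ in range(n_segments)]

    # File each sieving prime under the segment of its first multiple.
    for p in simple_sieve(isqrt(hi - 1)):
        first = max(p * p, (lo + p - 1) // p * p)
        if first < hi:
            buckets[(first - lo) // segment_size].append((p, first))

    primes = []
    for s in range(n_segments):
        base = lo + s * segment_size
        end = min(base + segment_size, hi)
        segment = bytearray([1]) * (end - base)
        for p, m in buckets[s]:
            while m < end:
                segment[m - base] = 0
                m += p
            # m now lies beyond this segment: re-file the prime for later.
            if m < hi:
                buckets[(m - lo) // segment_size].append((p, m))
        buckets[s] = []  # this bucket is finished; release it
        primes.extend(base + i for i, f in enumerate(segment)
                      if f and base + i >= 2)
    return primes
```

A prime larger than the segment size skips many segments between consecutive multiples, so with buckets it costs nothing in the segments it does not hit. That is the locality benefit the abstract refers to.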

Contexts in source publication

Context 1
... the real improvement can be seen when running on older hardware, like complab07. With its slow 200 MHz memory, the plot in Figure 2 is much flatter and closer to the theoretical speed of n log log n than, for example, that of the similar i0662 with faster 333 MHz RAM. It should also be noted that the major part of the execution time is spent on sieving with medium primes, and that part of the algorithm would benefit from further optimization. ...
Context 2
... lime, complab07 and some computers from [4], Table 1 shows the time needed to sieve out an interval (represented by 2^30 ≈ 10^9 bits, with its midpoint at 10^e). This data is plotted out in Figure 2: values of e are represented on the horizontal, execution times in seconds on the vertical axis. ...
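
To make the benchmark setup concrete, the following sketch computes the interval described above and the bound on the sieving primes it requires. It assumes one bit per integer, which the excerpt does not state; the function names are hypothetical.

```python
from math import isqrt


def interval_bounds(e, bits=1 << 30):
    """Return (lo, hi) for the half-open interval of `bits` integers
    centred on 10**e. Assumption: one bit represents one integer."""
    mid = 10 ** e
    lo = mid - bits // 2
    return lo, lo + bits


def largest_sieving_prime_bound(e, bits=1 << 30):
    """Sieving [lo, hi) needs every prime up to floor(sqrt(hi - 1))."""
    _, hi = interval_bounds(e, bits)
    return isqrt(hi - 1)


if __name__ == "__main__":
    for e in (12, 15, 18, 21):
        bound = largest_sieving_prime_bound(e)
        # A ratio above 1 means some sieving primes exceed the interval
        # length and have at most one multiple inside the interval.
        print(e, bound, bound / (1 << 30))
```

As e grows, the bound on the sieving primes grows like 10^(e/2) while the interval length stays fixed, so an increasing share of the work is done by primes that strike the interval only rarely.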

Similar publications

Conference Paper
Prime numbers play a pivotal role in current encryption algorithms and given the rise of cloud computing, the need for larger primes has never been so high. This increase in available computation power can be used to either try to break the encryption or to strengthen it by finding larger prime numbers. With this in mind, this paper provides an analy...

Citations

... Of all the publicly available implementations analyzed, special interest was devoted to the Prime Sieve [10] developed by Kim Walisch, which is considered to be the fastest multi-core implementation publicly available at the present date. Other papers that influenced the strategies developed included [12], which details a very efficient use of the cache memory system for very large prime numbers. Additionally, [14] [15] explain how to use wheel factorization to considerably speed up the sieving process, and [16] [17] provide insights on how to implement the simple MPI [22] version. ...
... For maximum efficiency, an OpenMP version was developed for common multi-core processors, along with a hybrid OpenMP and MPI version with dynamic scheduling for heterogeneous computer clusters, which may have computation nodes with different hardware capabilities and variable workloads. To improve the current implementation, in the future, we intend to use a bucket sort algorithm [12] to increase the cache hit rate for very large ranges and port the best algorithms to use GPUs. ...
Conference Paper
Prime numbers play a pivotal role in current encryption algorithms and given the rise of cloud computing, the need for larger primes has never been so high. This increase in available computation power can be used to either try to break the encryption or to strengthen it by finding larger prime numbers. With this in mind, this paper provides an analysis of different sieve implementations that can be used to generate primes to near 2^64. It starts by analyzing cache-friendly sequential sieves with wheel factorization, then expands to multi-core architectures and ends with a cache-friendly segmented hybrid implementation of a distributed prime sieve, designed to efficiently use all the available computation resources of heterogeneous computer clusters with variable workload and to scale very well in both the shared and distributed memory versions.
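
Wheel factorization, as mentioned in this abstract, can be sketched as follows. This is a minimal illustration using a modulo-30 wheel, not the implementation analyzed in the paper; the names `WHEEL_GAPS`, `wheel_candidates` and `wheel_sieve` are hypothetical. The wheel skips every multiple of 2, 3 and 5, so only 8 of every 30 integers are ever considered.

```python
from itertools import cycle

# Gaps between consecutive integers coprime to 30, starting from 7:
# 7, 11, 13, 17, 19, 23, 29, 31, 37, ...
WHEEL_GAPS = (4, 2, 4, 2, 4, 6, 2, 6)


def wheel_candidates(limit):
    """Yield 2, 3, 5 and then every integer in [7, limit] coprime to 30."""
    for p in (2, 3, 5):
        if p <= limit:
            yield p
    n = 7
    for gap in cycle(WHEEL_GAPS):
        if n > limit:
            return
        yield n
        n += gap


def wheel_sieve(limit):
    """Return all primes <= limit, storing and marking only wheel candidates."""
    cands = list(wheel_candidates(limit))
    composite = set()
    primes = []
    for i, n in enumerate(cands):
        if n in composite:
            continue
        primes.append(n)
        if n < 7:
            continue  # multiples of 2, 3 and 5 are never generated
        # Every composite coprime to 30 equals p * m, where p is its
        # smallest prime factor and m >= p is itself coprime to 30.
        for j in range(i, len(cands)):
            product = n * cands[j]
            if product > limit:
                break
            composite.add(product)
    return primes
```

A production sieve would pack the eight residues of each block of 30 into one byte instead of using a set; the set is used here only to keep the sketch short.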
... Of all the publicly available implementations analyzed, special interest was devoted to the Prime Sieve [7] developed by Kim Walisch, since it is considered to be the fastest multicore implementation publicly available at the present date. Other papers that influenced the strategies developed included [8], which details a very efficient use of the cache memory system for very large prime numbers; [9] [10], which explain how to use wheel factorization to considerably speed up the sieving process; and [11] [12], which provide insights on how to implement the simple MPI version. From this search it was determined that, at the present date, a distributed implementation optimized for heterogeneous clusters could be of public interest, and as such it was the implementation to which most of the development work was devoted. ...
... For maximum efficiency, an OpenMP version was developed for traditional multicore computers, along with a hybrid OpenMP and MPI version with dynamic scheduling for heterogeneous computer clusters, which may have computation nodes with different hardware capabilities and will most probably have variable workloads. To improve the current implementation, an OpenACC variant is being implemented to take full advantage of the massive parallelism that current GPUs can provide, and a bucket sort algorithm [8] may be used to increase the cache hit rate for very large ranges. ...
Working Paper
Please check the improved and peer reviewed version of this paper at: https://www.researchgate.net/publication/262142509_Distributed_Prime_Sieve_in_Heterogeneous_Computer_Clusters
... This is done using the usual sieve of Eratosthenes, which produces the small primes (p < 2^24) and the large primes (2^24 < p < 2^48). There are numerous ways of optimizing this sieve (see [12]), but it is not as important as the second phase. ...
Article
In this paper we study the details of sieving for Cunningham chains of the first kind of length 3. To find such prime triplets larger than the ones already known, we have to investigate the primality of 2^37 numbers, each in the magnitude of 2^34944 (more than 10 500 decimal digits). This would not be feasible if it weren’t for the sieving process which reduces the estimated time of completion to only a few weeks on a grid or a supercomputer with multiple cores.
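
As an illustration of the two quantities in this abstract, the sketch below checks the digit count of 2^34944 and shows how a sieve can discard candidates for a Cunningham chain of the first kind of length 3, that is, a triple n, 2n + 1, 4n + 3 of primes. It is a generic sketch, not the paper's method: the function names are hypothetical, and it assumes every candidate is larger than every sieving prime.

```python
from math import floor, log10


def decimal_digits_of_power_of_two(k):
    """Number of decimal digits of 2**k, computed without building the integer."""
    return floor(k * log10(2)) + 1


def sieve_chain_candidates(start, count, small_primes):
    """Return the n in [start, start + count) for which none of
    n, 2n + 1, 4n + 3 is divisible by a prime in `small_primes`.

    Assumption: start > max(small_primes), so no chain member can be
    equal to a sieving prime.
    """
    alive = bytearray([1]) * count
    for q in small_primes:
        for a, b in ((1, 0), (2, 1), (4, 3)):
            if a % q == 0:
                continue  # only q = 2: 2n + 1 and 4n + 3 are always odd
            # a*n + b is divisible by q exactly when n = root (mod q).
            root = (-b * pow(a, -1, q)) % q
            first = (root - start) % q
            for i in range(first, count, q):
                alive[i] = 0
    return [start + i for i, f in enumerate(alive) if f]


if __name__ == "__main__":
    print(decimal_digits_of_power_of_two(34944))
    print(sieve_chain_candidates(30, 70, [2, 3, 5, 7]))
```

Each sieving prime q removes up to three residue classes modulo q at once, which is why sieving for a triple thins the candidate set faster than sieving for single primes. The modular inverse `pow(a, -1, q)` requires Python 3.8 or later.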
... To speed up the process we can use different sieving methods, e.g. cache optimized linear sieve [12], and the inverse sieve [13]. We keep on sieving as long as it is faster than the probabilistic primality test. ...
Article
We study the details of a triple sieving method for Cunningham chains of length 2 of the second kind and twin primes. The theoretical background is described, and also concrete computational results are published. The magnitude of the investigated numbers is 2^253824 (more than 76 000 decimal digits).
Article
Antal Járai established three research groups that dealt with computational number theory. Due to his brilliant ideas these teams were very successful. Moreover, he implemented the world’s fastest arithmetic routines. He focused mainly on large prime combinations and his teams reached 19 world records from 1992 to 2014, namely they set the record for the largest known twin primes 9 times and Sophie Germain primes 7 times and a Cunningham chain of length 3 of the first kind. Furthermore they proved the primality of the largest known number of the form n^4 + 1 and a number which is simultaneously twin and Sophie Germain prime. In this paper, we report on a new project proving that Járai’s methods and routines are cutting edge tools for effective manipulation of large numbers even in 2020. We are celebrating Prof. Járai’s 70th birthday with his 20th world record.