algorithm - C#: How to make Sieve of Atkin incremental

Question


asked Oct 17, 2021 in Technique [Technology] by 深蓝 (71.8m points)

I don't know if this is possible or not, but I just gotta ask. My mathematical and algorithmic skills are kind of failing me here :P

The thing is I now have this class that generates prime numbers up to a certain limit:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public class Atkin : IEnumerable<ulong>
{
    private readonly List<ulong> primes;
    private readonly ulong limit;

    public Atkin(ulong limit)
    {
        this.limit = limit;
        primes = new List<ulong>();
    }

    private void FindPrimes()
    {
        var isPrime = new bool[limit + 1];
        var sqrt = Math.Sqrt(limit);

        for (ulong x = 1; x <= sqrt; x++)
            for (ulong y = 1; y <= sqrt; y++)
            {
                var n = 4*x*x + y*y;
                if (n <= limit && (n % 12 == 1 || n % 12 == 5))
                    isPrime[n] ^= true;

                n = 3*x*x + y*y;
                if (n <= limit && n % 12 == 7)
                    isPrime[n] ^= true;

                n = 3*x*x - y*y;
                if (x > y && n <= limit && n % 12 == 11)
                    isPrime[n] ^= true;
            }

        for (ulong n = 5; n <= sqrt; n++)
            if (isPrime[n])
            {
                var s = n * n;
                for (ulong k = s; k <= limit; k += s)
                    isPrime[k] = false;
            }

        primes.Add(2);
        primes.Add(3);
        for (ulong n = 5; n <= limit; n++)
            if (isPrime[n])
                primes.Add(n);
    }


    public IEnumerator<ulong> GetEnumerator()
    {
        if (!primes.Any())
            FindPrimes();

        foreach (var p in primes)
            yield return p;
    }


    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

Now, what I would like is to get rid of the limit so that the sequence would never stop (theoretically).

I am thinking it could go something like this:

1. Start with some trivial limit
2. Find all the primes up to the limit
3. Yield all the newfound primes
4. Increase the limit (by doubling or squaring the old limit or something like that)
5. Goto step 2

And optimally that step two should only have to work between the old limit and the new one. In other words it shouldn't have to find the lowest primes again and again.
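The five steps above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the names `RestartingPrimes` and `SieveUpTo` are hypothetical, a plain Sieve of Eratosthenes stands in for any bounded sieve, the starting limit of 16 is arbitrary, and the sketch re-sieves from scratch on every round (exactly the repeated work the paragraph above hopes to avoid). It is also capped by the `int` range and by available memory.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RestartingPrimes
{
    // Plain bounded Sieve of Eratosthenes; a stand-in for any bounded sieve (SoE or SoA).
    private static List<int> SieveUpTo(int limit)
    {
        var composite = new bool[limit + 1];
        var found = new List<int>();
        for (int n = 2; n <= limit; n++)
        {
            if (composite[n]) continue;
            found.Add(n);
            for (long k = (long)n * n; k <= limit; k += n)
                composite[k] = true;
        }
        return found;
    }

    // Hypothetical helper: sieve, yield only the new primes, double the limit, repeat.
    public static IEnumerable<int> Primes()
    {
        int yieldedUpTo = 1;                       // everything <= this has already been yielded
        for (int limit = 16; ; limit = checked(limit * 2))
        {
            foreach (int p in SieveUpTo(limit))    // re-sieves the whole range each round
                if (p > yieldedUpTo) yield return p;
            yieldedUpTo = limit;
        }
    }
}
```

With this in place, `RestartingPrimes.Primes().Take(10)` can be used without choosing a limit up front; the `checked` multiplication throws once the limit would overflow `int`.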

Is there a way this can be done? My main problem is that I don't quite understand what, for example, x and y are in this algorithm. Like, could I just use the same algorithm, more or less, but set x and y to oldLimit (initially 1) and run it up to newLimit? Or how would that work? Any bright minds with some light to shed on this?

The point of this is so that I won't have to set that limit. So that I can for example use Linq and just Take() however many primes I need, not worrying about if the limit is high enough, et cetera.

1 Answer

深蓝 · Answer 1 · 2021-10-17T00:48:41+0000

Here is a solution to unbounded prime sieving in C#. It can be implemented using either the Sieve of Eratosthenes (SoE) or the Sieve of Atkin (SoA); however, I maintain that an optimized SoA is hardly worth its extreme complexity, given that a true SoE gives about the same performance with much less complexity. Thus this is perhaps only a partial answer: it shows how to implement a better SoA and how to implement an unbounded sequence using the SoE, but it only hints at how to combine the two into a reasonably efficient unbounded SoA.

Note that if only discussion on the very fastest implementations of these ideas is desired, jump to the bottom of this answer.

First, a comment on the point of this exercise: an unbounded sequence of primes permits the use of the IEnumerable extension methods such as Take(), TakeWhile(), Where(), Count(), et cetera. These methods give away some performance because of the added levels of method calls, adding at least 28 machine cycles for every call and return to enumerate one value, with several levels of method calls per function. That said, an (effectively) infinite sequence is still useful even if one uses more imperative filtering techniques on the enumerated results for better speed.
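To make the two styles concrete, here is a small sketch that sums the primes below a limit once through the extension methods and once with a plain loop. The class `EnumerationStyles` and its trial-division `Primes()` source are hypothetical stand-ins chosen only to keep the example self-contained; any of the sieves in this answer could be substituted as the source.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerationStyles
{
    // Stand-in source (trial division over odd numbers), NOT one of the sieves discussed here.
    public static IEnumerable<uint> Primes()
    {
        yield return 2u;
        for (ulong n = 3UL; n <= uint.MaxValue; n += 2UL)
        {
            bool isPrime = true;
            for (ulong d = 3UL; d * d <= n; d += 2UL)
                if (n % d == 0UL) { isPrime = false; break; }
            if (isPrime) yield return (uint)n;
        }
    }

    // Extension-method style: concise, but every element passes through extra delegate calls.
    public static ulong SumBelowLinq(uint limit) =>
        Primes().TakeWhile(p => p < limit).Aggregate(0UL, (acc, p) => acc + p);

    // Imperative style: the same enumeration, filtered and accumulated in one loop.
    public static ulong SumBelowLoop(uint limit)
    {
        ulong sum = 0UL;
        foreach (uint p in Primes())
        {
            if (p >= limit) break;
            sum += p;
        }
        return sum;
    }
}
```

Both methods consume the same unbounded sequence and return the same value; only the per-element overhead differs.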

Note that in the simpler examples that follow, I have limited the range to that of unsigned 32-bit numbers (uint), since much past that range the basic SoE or SoA implementations aren't really efficient and one needs to switch over to a "bucket" or other form of incremental sieve for part of the sieving. For a fully optimized page-segmented sieve, as in the fastest implementation, the range is increased to 64 bits, although as written one likely would not want to sieve past about a hundred trillion (ten to the fourteenth power), as the run time for the full range would be up to hundreds of years.

As the question chooses the SoA for probably the wrong reasons in first mistaking a Trial Division (TD) prime sieve for a true SoE and then using a naive SoA over the TD sieve, let's establish the true bounded SoE which can be implemented in a few lines as a method (which could be converted to a class as per the question's implementation coding style) as follows:

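// requires: using System; using System.Collections; using System.Collections.Generic;
static IEnumerable<uint> primesSoE(uint top_number) {
  if (top_number < 2u) yield break;
  yield return 2u; if (top_number < 3u) yield break;
  // odds-only sieve: bit index i represents the odd number 3 + 2 * i
  var BFLMT = (top_number - 3u) / 2u; // index of the largest odd number <= top_number
  var SQRTLMT = ((uint)(Math.Sqrt((double)top_number)) - 3u) / 2u; // last index whose prime needs culling
  var buf = new BitArray((int)BFLMT + 1, true);
  for (var i = 0u; i <= BFLMT; ++i) {
    if (!buf[(int)i]) continue;
    var p = 3u + i + i;
    if (i <= SQRTLMT) {
      // cull odd multiples of p from p * p; an index step of p is a value step of 2 * p
      for (var j = (p * p - 3u) / 2u; j <= BFLMT; j += p)
        buf[(int)j] = false;
    }
    yield return p;
  }
}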

This calculates the primes to 2 million in about 16 milliseconds on an i7-2700K (3.5 GHz) and the 203,280,221 primes up to 4,294,967,291 in the 32-bit number range in about 67 seconds on the same machine (given a spare 256 megabytes of RAM).

Now, using the version above to compare to the SoA is hardly fair as the true SoA automatically ignores the multiples of 2, 3, and 5 so introducing wheel factorization to do the same for the SoE yields the following code:

static IEnumerable<uint> primesSoE(uint top_number) {
  if (top_number < 2u) yield break;
  yield return 2u; if (top_number < 3u) yield break;
  yield return 3u; if (top_number < 5u) yield break;
  yield return 5u; if (top_number < 7u) yield break;
  var BFLMT = (top_number - 7u) / 2u;
  var SQRTLMT = ((uint)(Math.Sqrt((double)top_number)) - 7u) / 2u;
  var buf = new BitArray((int)BFLMT + 1,true);
  byte[] WHLPTRN = { 2, 1, 2, 1, 2, 3, 1, 3 };
  for (uint i = 0u, w = 0u; i <= BFLMT; i += WHLPTRN[w], w = (w < 7u) ? ++w : 0u)
    if (buf[(int)i]) { var p = 7u + i + i; if (i <= SQRTLMT) {
        var pX2 = p + p; uint[] pa = { p, pX2, pX2 + p };
        for (uint j = (p * p - 7u) / 2u, m = w; j <= BFLMT;
                               j += pa[WHLPTRN[m] - 1u], m = (m < 7u) ? ++m : 0u)
          buf[(int)j] = false; } yield return p; } }

The above code calculates the primes to 2 million in about 10 milliseconds and the primes to the 32-bit number range in about 40 seconds on the same machine as above.
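For readers wondering where the WHLPTRN table { 2, 1, 2, 1, 2, 3, 1, 3 } in the code above comes from, the following sketch derives it. It lists the numbers from 7 onward that are not divisible by 2, 3 or 5 over one full turn of the 30-wide wheel, takes the gaps between them, and halves the gaps because the sieve buffer indexes odd numbers only. The class and method names (`WheelPattern`, `HalfGaps`) are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class WheelPattern
{
    // Gaps between consecutive numbers coprime to 2, 3 and 5, starting from 7,
    // halved to match an odds-only bit index.
    public static byte[] HalfGaps()
    {
        var spokes = new List<int>();
        for (int n = 7; n <= 7 + 30; n += 2)            // odd numbers over one wheel turn
            if (n % 3 != 0 && n % 5 != 0) spokes.Add(n);
        // spokes is now 7, 11, 13, 17, 19, 23, 29, 31, 37
        var gaps = new byte[spokes.Count - 1];
        for (int i = 0; i < gaps.Length; i++)
            gaps[i] = (byte)((spokes[i + 1] - spokes[i]) / 2);
        return gaps;
    }
}
```

The eight half-gaps sum to 15, which is one wheel turn of 30 measured in odd-number steps.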

Next, let's establish whether a version of the SoA that we are likely to write here actually has any benefit in execution speed compared to the SoE code above. The code below follows the model of the SoE above and optimizes the SoA pseudo-code from the Wikipedia article in several ways: it tunes the ranges of the 'x' and 'y' variables by using separate loops for the individual quadratic solutions, as suggested in that article; it solves the quadratic equations (and the square-free eliminations) only for odd solutions; it combines the "3*x^2" quadratics so that both the positive and the negative second terms are solved in the same pass; and it eliminates the computationally expensive modulo operations. The result is over three times faster than the naive version of the pseudo-code posted there and used in the question here. The code for the bounded SoA is then as follows:

static IEnumerable<uint> primesSoA(uint top_number) {
  if (top_number < 2u) yield break;
  yield return 2u; if (top_number < 3u) yield break;
  yield return 3u; if (top_number < 5u) yield break;
  var BFLMT = (top_number - 3u) / 2u; var buf = new BitArray((int)BFLMT + 1, false);
  var SQRT = (uint)(Math.Sqrt((double)top_number)); var SQRTLMT = (SQRT - 3u) / 2u;
  for (uint x = 1u, s = 1u, mdx12 = 5u, dmdx12 = 0u; s <= BFLMT; ++x, s += ((x << 1) - 1u) << 1) {
    for (uint y = 1u, n = s, md12 = mdx12, dmd12 = 8u; n <= BFLMT; y += 2, n += (y - 1u) << 1) {
      if ((md12 == 1) || (md12 == 5)) buf[(int)n] = buf[(int)n] ^ true;
      md12 += dmd12; if (md12 >= 12) md12 -= 12; dmd12 += 8u; if (dmd12 >= 12u) dmd12 -= 12u; }
    mdx12 += dmdx12; if (mdx12 >= 12u) mdx12 -= 12u; dmdx12 += 8u; if (dmdx12 >= 12u) dmdx12 -= 12u; }
  for (uint x = 1u, s = 0u, mdx12 = 3u, dmdx12 = 8u; s <= BFLMT; ++x, s += x << 1) {
    int y = 1 - (int)x, n = (int)s, md12 = (int)mdx12, dmd12 = ((-y - 1) << 2) % 12;
    for (; (y < 0) && (uint)n <= BFLMT; y += 2, n += (-y + 1) << 1) {
      if (md12 == 11) buf[(int)n] = buf[(int)n] ^ true;
      md12 += dmd12; if (md12 >= 12) md12 -= 12; dmd12 += 4; if (dmd12 >= 12) dmd12 -= 12; }
    if (y < 1) { y = 2; n += 2; md12 += 4; dmd12 = 0; } else { n += 1; md12 += 2; dmd12 = 8; }
    if (md12 >= 12) md12 -= 12; for (; (uint)n <= BFLMT; y += 2, n += (y - 1) << 1) {
      if (md12 == 7) buf[(int)n] = buf[(int)n] ^ true;
      md12 += dmd12; if (md12 >= 12) md12 -= 12; dmd12 += 8; if (dmd12 >= 12) dmd12 -= 12; }
    mdx12 += dmdx12; if (mdx12 >= 12) mdx12 -= 12; dmdx12 += 4; if (dmdx12 >= 12) dmdx12 -= 12; }
  for (var i = 0u; i<=BFLMT; ++i) if (buf[(int)i]) { var p = 3u+i+i; if (i<=SQRTLMT) { var sqr = p*p;
        for (var j = (sqr - 3ul) / 2ul; j <= BFLMT; j += sqr) buf[(int)j] = false; } yield return p; } }

This is still over twice as slow as the wheel factorization SoE algorithm posted, because the number of operations is not fully optimized. Further optimizations can be made to the SoA code, such as using modulo 60 operations as in the original (non-pseudo-code) algorithm and using bit packing to deal only with numbers that are not multiples of 3 and 5. These get the code fairly close in performance to the SoE, or even slightly ahead of it, but they bring us closer and closer to the complexity of the Bernstein implementation. My point is: why use the SoA when one has to work very hard to get about the same performance as the SoE?
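As an illustration of how the modulo operations are avoided in the SoA code above, here is a minimal sketch for the first quadratic only, n = 4*x*x + y*y with odd y. Stepping y by 2 changes n by (y + 2)^2 - y^2 = 4*y + 4, and that step itself grows by 8 each time, so both the remainder and the step's remainder can be carried along with additions and conditional subtractions. The names `RunningRemainder` and `FirstQuadratic` are hypothetical; the variable names md12 and dmd12 mirror those in the code above.

```csharp
using System;
using System.Collections.Generic;

public static class RunningRemainder
{
    // Yields (n, n mod 12) for n = 4*x*x + y*y over y = 1, 3, 5, ... while n <= limit,
    // using a single % per x and none inside the loop.
    public static IEnumerable<(uint n, uint mod12)> FirstQuadratic(uint x, uint limit)
    {
        uint y = 1u;
        uint n = 4u * x * x + 1u;
        uint md12 = n % 12u;      // the only modulo operation
        uint dmd12 = 8u;          // (4*y + 4) mod 12 at y = 1
        while (n <= limit)
        {
            yield return (n, md12);
            n += 4u * y + 4u;     // advance to the next odd y
            y += 2u;
            md12 += dmd12; if (md12 >= 12u) md12 -= 12u;
            dmd12 += 8u; if (dmd12 >= 12u) dmd12 -= 12u;
        }
    }
}
```

In the sieve itself, the yielded remainder is what gets compared against 1 and 5 to decide whether to toggle the bit for n.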

Now for the unbounded primes sequence. The very easiest way to do this is to define a const top_number of UInt32.MaxValue and eliminate the argument in the primesSoE or primesSoA methods. This is somewhat inefficient in that the culling is still done over the full number range no matter the actual prime value being processed, which makes the determination for small ranges of primes take much longer than it should. Besides this and the extreme memory use, there are other reasons to go to a segmented version of the primes sieve. First, algorithms whose data stays primarily within the CPU L1 or L2 data caches run faster because of more efficient memory access. Second, segmentation allows the job to be split easily into pages that can be culled in the background on multi-core processors, for a speed-up that can be proportional to the number of cores used.
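To show the idea of segmentation in its simplest form, here is a minimal sketch of a page-segmented, odds-only SoE without wheel factorization. It is not the optimized code of this answer: the names (`PagedPrimes`, `PageBits`) are hypothetical, the 16 KiB page size is an assumed cache-friendly value, and the sketch stops at the end of the uint range, for which the odd primes below 65536 (all found in the first page) are sufficient base primes.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class PagedPrimes
{
    // 16 KiB of bit flags (assumed L1-friendly); each bit stands for one odd number.
    private const int PageBits = 16384 * 8;

    public static IEnumerable<uint> Primes()
    {
        yield return 2u;
        var basePrimes = new List<uint>();          // odd primes below 2^16
        var composite = new BitArray(PageBits);
        // 'low' is the odd number represented by bit 0 of the current page.
        for (ulong low = 3UL; low <= uint.MaxValue; low += 2UL * PageBits)
        {
            composite.SetAll(false);
            ulong high = low + 2UL * (PageBits - 1); // odd number at the last bit
            if (low == 3UL)
            {
                // First page: ordinary odds-only sieve that also collects the base primes.
                for (int i = 0; i < PageBits; i++)
                {
                    if (composite[i]) continue;
                    ulong p = 3UL + 2UL * (ulong)i;
                    if (p >= 65536UL) break;
                    basePrimes.Add((uint)p);
                    for (ulong j = (p * p - 3UL) / 2UL; j < PageBits; j += p)
                        composite[(int)j] = true;
                }
            }
            else
            {
                foreach (uint p in basePrimes)
                {
                    ulong start = (ulong)p * p;
                    if (start > high) break;         // larger base primes cannot cull in this page
                    if (start < low)
                    {
                        start = (low + p - 1) / p * p;          // first multiple of p >= low
                        if ((start & 1UL) == 0UL) start += p;   // make it an odd multiple
                    }
                    for (ulong j = (start - low) / 2UL; j < PageBits; j += p)
                        composite[(int)j] = true;
                }
            }
            for (int i = 0; i < PageBits; i++)
            {
                ulong n = low + 2UL * (ulong)i;
                if (n > uint.MaxValue) yield break;
                if (!composite[i]) yield return (uint)n;
            }
        }
    }
}
```

Because only one page of flags is alive at a time, asking for a handful of primes with Take() costs one page of culling rather than a sieve over the whole 32-bit range.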

For efficiency, we should choose an array size that fits the CPU L1 or L2 cache, which is at least 16 kilobytes for most modern CPUs, in order to avoid cache thrashing as we cull composites of primes from the array. This means that the BitArray can hold eight times that many bits (8 bits per byte), or 128 kilobits. Since we only need to sieve odd numbers, this represents a range of over a quarter million numbers, meaning that a segmented version will use only eight segments to sieve to 2 million as required by Euler Problem 10. One could save the results from the last segment and continue on from that point, but that would preclude adapting this code to the multi-core case, where culling is relegated to the background on multiple threads to take full advantage of multi-core processors. The C# code for a (single-threaded) unbounded SoE is as follows:

static IEnumerable<uint> primesSoE() { yield return 2u; yield return 3u; yield return 5u;
  const uint L1CACHEPOW = 14u + 3u, L1CACHESZ = (1u << (int)L1CACHEPOW); //for 16K in bits...
  const uint BUFSZ = L1CACHESZ / 15u * 15u; //an even number of wheel rotations
  var buf = new BitArray((int)BUFSZ);
  const uint MAXNDX = (uint.MaxValue - 7u) / 2u; //need maximum for number range
  var SQRTNDX = ((uint)Math.Sqrt(uint.MaxValue) - 7u) / 2u;
  byte[] WHLPTRN = { 2, 1, 2, 1, 2, 3, 1, 3 }; //the 2,3,5 factorial wheel, (sum) 15 elements long
  byte[] WHLPOS = { 0, 2, 3, 5, 6, 8, 11, 12 }; //get wheel position from index
  byte[] WHLNDX = { 0, 0, 1, 2, 2, 3, 4, 4, 5, 5, 5, 6, 7, 7, 7, //get index from position
                    0, 0, 1, 2, 2, 3, 4, 4, 5, 5, 5, 6, 7 }; //allow for overflow
  byte[] WHLRNDUP = { 0, 2, 2, 3, 5, 5, 6, 8, 8, 11, 11, 11, 12, 15, //allow for overflow...
                      15, 15, 17, 17, 18, 20, 20, 21, 23, 23, 26, 26, 26, 27 };
  uint BPLMT = (ushort.MaxValue - 7u) / 2u; var bpbuf = new BitArray((int)BPLMT + 1, true);
  for (var i = 0; i <= 124; ++i) if (bpbuf[i]) { var p = 7 + i + i; //initialize baseprimes array
      for (var j = (p * p - 7) / 2; j <= BPLMT; j += p) bpbuf[j] = false; } var pa = new uint[3];
  for (uint i = 0u, w = 0, si = 0; i <= MAXNDX;
        i += WHLPTRN[w], si += WHLPTRN[w], si = (si >= BUFSZ) ? 0u : si, w = (w < 7u) ? ++w : 0u) {
    if (si == 0) { buf.SetAll(true);
      for (uint j = 0u, bw = 0u; j <= BPLMT; j += WHLPTRN[bw], bw = (bw < 7u) ? ++bw : 0u)
        if (bpbuf[(int)j]) { var p = 7u+j+j; var pX2=p+p; var k = p * (j + 3u) + j;
          if (k >= i + BUFSZ) break; pa[0] = p; pa[1] = pX2; pa[2] = pX2 + p; var sw = bw; if (k < i) {
            k = (i - k) % (15u * p); if (k != 0) { var os = WHLPOS[bw]; sw = os + ((k + p - 1u) / p);
              sw = WHLRNDUP[sw]; k = (sw - os) * p - k; sw = WHLNDX[sw]; } } else k -= i;
          for (; k<BU
