Title: Probabilistically determine whether a number is prime in C#
This is a cool little algorithm that uses some clever mathematics. This algorithm and several related algorithms are described in my book Essential Algorithms: A Practical Approach to Computer Algorithms. I think it's a really good book (and it's gotten very good reviews) so if you're interested in this sort of thing (or other algorithms), check it out!
The example also demonstrates a crucial technique in C# programming when you're performing integer arithmetic.
Before I describe the program, you should know a bit about why it's important to decide whether a number is prime.
Some modern cryptographic algorithms (in particular RSA) rely on large prime numbers. Without going into all of the details, in RSA you pick two large prime numbers p and q and publish their product n = p × q for everyone to see.
Now suppose your friend Alice wants to meet you for lunch at her favorite restaurant but doesn't want to invite Bill, who's a really annoying conversationalist, always orders something really expensive, and then demands to split the bill equally. Alice uses the public value n to encrypt the invitation and sends it to you.
Unfortunately Bill works in your company's IT department and he intercepts the message. If he can figure out what p and q are, he can decipher the invitation and show up unexpectedly. Basically the security of RSA depends on the fact that it's hard to factor products of large primes p and q.
If p and q are small, say only a few billion, Bill can easily write a program to try all possible prime factors up to the square root of n and eventually figure out what p and q are. To prevent that, you need to use really big prime numbers. Instead of using 32-bit ints or 64-bit longs, you might use 100-bit integers. The product n will have 200 bits and its square root will have 100 bits. (It will be around 1.3×1030.) Even if Bill has a computer that can test 1 trillion (109) factors per second, it would take him roughly 1.3×1030 / 109 = 1.3 × 1021 seconds or about 4 × 1013 years to find p and q. By then you and Alice will be safely done with lunch.
Unfortunately if it takes Bill a long time to factor n, then it will also take you a long time to use factoring to determine whether p and q are prime. Fortunately there are tests that you can use to determine whether a number is prime without actually factoring it. That's where this example begins.
"Fermat's little theorem" says that, if P is prime and 1 ≤ n < P, then nP-1 = 1 mod P. In other words, if you raise n to the P-1 power and take the result modulo P, the result is 1.
For example, suppose P = 7. Then:
- 16 mod 7 = 1 mod 7 = 1
- 26 mod 7 = 64 mod 7 = 1
- 36 mod 7 = 729 mod 7 = 1
- 46 mod 7 = 4,096 mod 7 = 1
- 56 mod 7 = 15,625 mod 7 = 1
- 66 mod 7 = 46,656 mod 7 = 1
Now suppose you want to know if p is prime. Pick a random number n with 1 ≤ n < P and apply Fermat's little theorem. If p really is prime, then the result will always be 1 no matter what n you pick. It can be shown (but not by me) that if p is composite (non-prime), then there is at least a 50% chance that the n you pick will make np-1 ≠ 1 mod p. An n that makes Fermat's little theorem fail in this way is called a "Fermat witness" because it proves that p is not prime.
Of course there's also a chance that you'll be unlucky and the random n you pick satisfies Fermat's little theorem even though p is composite. Such a value is called a "Fermat liar" because implies that p is prime when it really isn't.
After you know these facts, the algorithm for primality testing is relatively simple. Pick a bunch of random values n and see if they satisfy Fermat's little theorem. If any of them are Fermat witnesses, then you know that p is composite. (Although you still don't know p's factors.) If you pick K values for n and they all satisfy Fermat's little theorem, then there is only a 1/2K chance that p is composite and every value n you tried was a Fermat liar.
Now you can pick as many n as you like to get any desired degree of certainty. For example, if you use 10 values for n and they all claim p is prime, there is only a 1/210 ≈ 0.00098 chance that p is really composite. If those odds aren't good enough, try 20 values for n. If they all claim p is prime, there's only a 1/220 ≈ 0.00000095 chance that p is composite. If that's still not good enough, try 100 values for n and there will be only a 1/2100 ≈ 7.9 × 10-31 chance that p is actually composite. (In contrast the odds of you being struck by lightning is a whopping 9 × 10-7 per year, roughly a million million million million times greater.)
This is an example of a probabilistic algorithm. You can never be completely sure the result is correct, but you can use as many test values as you like to achieve whatever degree of certainty you want.
The following IsProbablyPrime method is the key to this example. (I've removed some progress reporting code to keep things simple.)
// Perform tests to see if a number is (probably) prime.
private Random Rand = new Random();
private bool IsProbablyPrime(int p, int num_tests)
{
checked
{
// Perform the tests.
for (int i = 0; i < num_tests; i++)
{
// Pick a number n in the range (1, p).
long n = Rand.Next(2, p);
// Calculate n ^ (p - 1).
long result = n;
for (int power = 1; power < p - 1; power++)
{
result = (result * n) % p;
}
// If the final result is not 1, p is not prime.
if (result != 1) return false;
}
}
// If we survived all the tests, p is probably prime.
return true;
}
The code performs the desired number of tests. For each test, the code picks a random n and then raises it to the p-1 power modulus p. If the result is 1 not for any value of n, the code returns false to indicate that p is definitely not prime.
If the method finishes all of the tests, it concludes that p is probably prime.
After it decides whether the number is prime, the program factors the number to make sure it was correct. To see how that code works, see the example Find a number's prime factors in C#.
There are a couple of C# issues here. First, the method uses a checked block. If a C# program encounters an integer error while performing an arithmetic operation, it normally does not signal an error. Instead it gets a nonsensical result and continues running as if nothing has gone wrong. In this example, if result * n is too big, the program causes an integer overflow, the new result is some sort of garbage (probably a large negative number), and the test is invalid.
The checked block makes the program throw an exception if it encounters an integer arithmetic error.
This program uses long integers to perform its calculations and only handles values up to 100 million so it can use larger values without causing an integer overflow.
Another issue to consider is that these values are pretty small by cryptographic standards, so the attacker Bill can easily factor them. (With a computer, not by hand.) In a real application you would need to use much larger numbers. The System.Numerics namespace defines a BigInteger structure that can do this in Framework version 4.0 and later. The version of Visual Studio I used for this example doesn't support that version, so it uses the long data type. You can rewrite the application if you like. This example is really just to show how it all works.
A final issue is that it can take a while to raise a test value n to the p-1 power if p is really large. Fortunately there's a clever algorithm called "fast exponentiation" that lets you do this more quickly. See my book for a description of that algorithm.
In my next post, I'll explain how you can use the algorithm described here to find large prime numbers. It's fairly simple, so you may want to try to figure it out yourself before you read that post.
Again, my book contains more information about primality testing, prime number finding, and RSA. To see a description and table of contents, go here.
Download the example to experiment with it and to see additional details.
|