April 18, 2018

# Code review: Goldbach conjecture

I cannot spell Goldbach to save my life. My fingers refuse to type the ch in that position. I dunno why. Sue me. Anyway... I cannot spell Goldbach to save my life. My fingers refuse to type the ch in that position. I dunno why. Sue me. Anyway...

Thanks to the original author from /r/dailyprogrammer for the following code, used with permission.

``````static void Main(string[] args)
{
while (input % 2 == 0 || input <= 5) { input = Convert.ToInt32(Console.ReadLine()); }

List<int> primes = new List<int>();
for(int i = 0; i <= input; i++)
{
}

List<string> knownCombinations = new List<string>();

for(int i = 0; i < primes.Count; i++)
{
for(int j = 0; j < primes.Count; j++)
{
for(int k = 0; k < primes.Count; k++)
{
if (input == (primes[i] + primes[j] + primes[k]))
{
string combination = primes[i].ToString() + " " + primes[j].ToString() + " " + primes[k].ToString();

if (checkRepeat(knownCombinations, combination) == false)
{
Console.WriteLine(\$"{input} = {primes[i]} + {primes[j]} + {primes[k]}");
}
}
}
}
}
}

public static bool checkRepeat(List<string> a, string b)
{
for (int i = a.Count - 1; i >= 0; i--)
{
if (a[i].Distinct().OrderBy(c => c).SequenceEqual(b.Distinct().OrderBy(c => c))) return true;
}
return false;
}

public static bool isPrime(long n)
{
if (n <= 1) return false;
for (int i = 2; i < n; i++)
{
if (n % i == 0) return false;
}
return true;
}
``````

The code above is entirely unmodified. Let's start out with just a little bit of cleanup to address the following issues. I'll also add a stopwatch just to ensure we don't do anything egregious to the runtime, which starts off around 0.05 seconds. Here are some of the things I want to change:

1. Replace type names with var.
2. Replace for loops with foreach loops.
3. Use Pascal case for method names.
4. Invert boolean test values with `!`.
5. Ensure names for predicate functions are explanatory.

Also, the input thing is confusing and not necessary for what we're working on here, so I'm just going to replace it with a number.

Here's what we end up with:

``````static void Main(string[] args)
{
var time = Stopwatch.StartNew();

var input = 131;
var primes = new List<int>();

for (int i = 0; i <= input; i++)
{
}

var knownCombinations = new List<string>();

foreach (var i in primes)
{
foreach (var j in primes)
{
foreach (var k in primes)
{
if (input == (i + j + k))
{
string combination = i.ToString() + " " + j.ToString() + " " + k.ToString();
if (!IsRepeat(knownCombinations, combination))
{
Console.WriteLine(\$"{input} = {i} + {j} + {k}");
}
}
}
}
}

time.Stop();
Console.WriteLine(\$"{time.Elapsed}");
}

public static bool IsRepeat(List<string> a, string b)
{
for (int i = a.Count - 1; i >= 0; i--)
{
if (a[i].Distinct().OrderBy(c => c).SequenceEqual(b.Distinct().OrderBy(c => c))) return true;
}
return false;
}

public static bool IsPrime(long n)
{
if (n <= 1) return false;
for (int i = 2; i < n; i++)
{
if (n % i == 0) return false;
}
return true;
}
``````

Runtime after all this fiddling around: 0.045 seconds, down from 0.054. (Note that this runtime does not include the time required to provide input for the previous version.)

# Initial thoughts

I like the fact that the logic is pretty straightforward. I like the use of the nested loops; they make it clear exactly what strategy is being applied. I like that bits of more complex logic have been pulled out into functions, because this allows us to focus on the meaning of the program rather than how it is built. Of course, I do have some criticisms.

Our prime generator could be more efficient. I already wrote an article on that, so, you know. Never mind that here. :)

The `IsRepeat` function is really dense. I think we could do without that kind of density. It's also strange that it combines both functional and imperative styles. My guess is that the author considers this to be a more efficient mode of iteration because he believes that matches are more likely to appear near the end of the collection, and I'm willing to take that on faith for the most part.

Wait.

Hold on.

Ah... You remember how I said that function was really dense and looked strange? Well, I just realized something: it also doesn't work. That's the fun thing about clever code; it's usually fucking you and you just don't know it yet.

# First, let's unfuck this

Anything I do at this point is going to be... Well, it's all going to mean taking certain creative liberties with the original code, because I'm no longer critiquing what's here; I'm instead talking about how to make what's here work--which means exchanging some of it for something that's not here. Let's start with the assumption that our original author likes strings, because the original code creates a string.

I guess that means we'll rewrite using a string split.

``````public static bool IsRepeat(List<string> set, string b)
{
var candidate = b.Split(' ').Distinct().OrderBy(x => x).ToList();

bool IsMatch(string left)
{
return left.Split(' ')
.Distinct()
.OrderBy(x => x)
.SequenceEqual(candidate);
}

return set.Any(IsMatch);
}
``````

All right, that's a little better. But now hopefully you can now start to see why the efficiency (or lack thereof) of string manipulation in C# is a concern. Can we find some more efficent way to do this? For a start, how about just... not using a string?

# Alternative techniques

Let's start with the idea that we can add a little more meaning of the code by using data structures that express intent.

``````public class Triplet
{
public int A;
public int B;
public int C;

public Triplet(int a, int b, int c)
{
A = a;
B = b;
C = c;
}

public IEnumerable<int> Values()
{
yield return A;
yield return B;
yield return C;
}
}
``````

For now, focus on the first part. See? We just want three numbers. That's all we actually care about. The other stuff comes in handy for constructing this class and for this code here:

``````public static bool IsRepeat(List<Triplet> set, Triplet b)
{
bool IsMatch(Triplet left)
{
return left.Values().OrderBy(x => x).SequenceEqual(b.Values().OrderBy(x => x));
}

return set.Any(IsMatch);
}
``````

Basically, what we've done is we've gotten rid of the whole string manipulation thing and garnered a slight increase in efficiency as a result. Runtime for this is something like a hundredth of a second faster than runtime for the stringy version. However, it is possible for us to expand on this technique.

# Time to get weird?

As I mentioned, we are effectively rolling our own set based on a list and a prayer—or, you know, a predicate function. Basically the same thing, right? Thing is, all we need in order to have our very own hashset is to override two little methods.

Ok, I admit, I'm going to do one other thing, and those two little methods are... Complicated. Here.

``````public override int GetHashCode()
{
return A ^ B ^ C;
}

public override bool Equals(object obj)
{
var other = obj as Triplet;
if (other == null) return false;

return Values().SequenceEqual(other.Values());
}
``````

In order to make our type compatible (in the way we want) with HashSet(T), we need to override GetHashCode and Equals. You can read about this implementation of GetHashCode on StackOverflow, but the idea is that the same three values will result in the same hashed value regardless of their order.

There's a wrinkle with the Equals method because, obviously, those numbers could have been passed to the constructor in any order: 1, 2, 3; 1, 3, 2; 3, 2, 1; etc... So to account for that, I've also made a change to the constructor:

``````public Triplet(int a, int b, int c)
{
int tmp;
if (a > c)
{
tmp = a;
a = c;
c = tmp;
}

if (a > b)
{
tmp = a;
a = b;
b = tmp;
}

if (b > c)
{
tmp = b;
b = c;
c = tmp;
}

A = a;
B = b;
C = c;
}
``````

...And, honestly, by this point, I'm making questionable improvements that really don't need to be made, so, you know. Sorry. :) You could have stuck with my iterator-based version.

Anyway, this produces the same output as the string-splitty version but in a little more than half the time. I would also say it's clearer to a point—that point being where you have to actually go look at the implementations of GetHashCode and the constructor and so forth to understand why the HashSet(T) does the "right" thing with this class, because I wouldn't usually expect that with a custom type.

To me, the primary benefit of this last iteration is that it does away with the IsRepeat method entirely, which could be worthwhile if you can hide the necessary complexity of the Triplets class from any vulnerable contributors. Your mileage may vary.