Off the top of my head, I can think of three ways to associate a set of values with a set of keys such that each key may be associated with multiple values. The end result of each of these three methods is effectively the same. Surprisingly, their performance characteristics differ quite a bit.

What are we talking about?

Say you have a table of tires. This table stores the name of the tire and the name of the tire's manufacturer. You want to be able to see all of the tires for a given manufacturer. Let's also say that, for whatever reason, you want to be able to perform this task repeatedly. Maybe you already have a list of manufacturers somewhere and you want to hydrate the manufacturer.Tires property of each of them.

One way you might do that is by simply dumping the tires table into memory and creating a lookup table of some kind. That is what we're talking about: the various methods by which one could, conceivably, create exactly that lookup.

For the purposes of this article, we are talking about the following:

  • GroupBy
  • ToDictionary
  • ToLookup

These are all LINQ extensions, and they all produce something we can use for this purpose, but they don't really produce the same thing.

Also for the purposes of this article, we're going to refine the problem: we have a table of tires and we wish to calculate the total inventory on hand of a given manufacturer's tires.

Before you scroll down too far, place your bet: which of these three operations will be fastest in my simple test?

GroupBy

We could call GroupBy the most naive option. Here's what it looks like:

var groups = pairings.GroupBy(pairing => pairing.ManufacturerId);
return groups
    .Single(group => group.Key == 7)
    .Sum(pairing => pairing.Inventory);

As you can see, we group by the ManufacturerId and return the sum of the Inventory. This will throw an exception if the requested id (7 in this case) is not found, but that has more to do with using Single instead of SingleOrDefault than anything else.
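For what it's worth, here is a minimal sketch of a non-throwing variant built on SingleOrDefault. The pairings array is hypothetical sample data (tuples standing in for rows of the tires table), not the data used in the test.

```csharp
using System;
using System.Linq;

// Hypothetical sample data: (ManufacturerId, Inventory) pairs standing in for the tires table.
var pairings = new[]
{
    (ManufacturerId: 7, Inventory: 10),
    (ManufacturerId: 3, Inventory: 4),
    (ManufacturerId: 7, Inventory: 5),
};

var groups = pairings.GroupBy(pairing => pairing.ManufacturerId);

// SingleOrDefault returns null when no group has the requested key;
// the ?. and ?? operators turn that null into a sum of 0.
int present = groups.SingleOrDefault(group => group.Key == 7)?.Sum(pairing => pairing.Inventory) ?? 0;
int missing = groups.SingleOrDefault(group => group.Key == 99)?.Sum(pairing => pairing.Inventory) ?? 0;

Console.WriteLine($"{present} {missing}");
```

Note that GroupBy is lazily evaluated, so each of the two queries above regroups the source from scratch.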

ToDictionary

This version is somewhat more complicated. Rather than calling ToDictionary directly, it builds a dictionary of lists by hand with Aggregate:

var lookup = pairings.Aggregate(new Dictionary<int, List<int>>(), (map, pairing) =>
{
    if (map.TryGetValue(pairing.ManufacturerId, out var x))
    {
        x.Add(pairing.Inventory);
    }
    else
    {
        map.Add(pairing.ManufacturerId, new List<int> { pairing.Inventory });
    }
    return map;
});

return lookup.TryGetValue(7, out var result) ? result.Sum() : 0;

In addition to requiring a good deal more code, this version does not explode in your face if you request a key that isn't present. I wouldn't say I went out of my way to achieve that; it's just that the error checking was kind of a natural fit here.
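For comparison, here is a sketch of the same lookup built with an actual call to ToDictionary, layered on top of GroupBy. This is not the code that was timed for this article, and the pairings array is hypothetical sample data, so treat it as an illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: (ManufacturerId, Inventory) pairs standing in for the tires table.
var pairings = new[]
{
    (ManufacturerId: 7, Inventory: 10),
    (ManufacturerId: 3, Inventory: 4),
    (ManufacturerId: 7, Inventory: 5),
};

// Group the inventory values by manufacturer, then materialize each group as a list.
Dictionary<int, List<int>> lookup = pairings
    .GroupBy(pairing => pairing.ManufacturerId, pairing => pairing.Inventory)
    .ToDictionary(group => group.Key, group => group.ToList());

// Same non-throwing read as the Aggregate version.
int total = lookup.TryGetValue(7, out var result) ? result.Sum() : 0;
int none = lookup.TryGetValue(99, out var absent) ? absent.Sum() : 0;

Console.WriteLine($"{total} {none}");
```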

ToLookup

This version may be the simplest:

var lookup = pairings.ToLookup(
    pairing => pairing.ManufacturerId,
    pairing => pairing.Inventory
);
return lookup[7].Sum();

Interestingly, even though we use the [7] index operator to get our result here, this does not explode if the requested key isn't found. Instead, we get an empty enumeration, which gives us the appropriate-ish sum of 0. This seems weird, given that index operations are kind of known for being exothermic.
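A small sketch of that missing-key behavior, again with hypothetical sample data. If you need to tell "no such manufacturer" apart from "manufacturer with zero inventory," the lookup's Contains method does that.

```csharp
using System;
using System.Linq;

// Hypothetical sample data: (ManufacturerId, Inventory) pairs standing in for the tires table.
var pairings = new[]
{
    (ManufacturerId: 7, Inventory: 10),
    (ManufacturerId: 3, Inventory: 4),
    (ManufacturerId: 7, Inventory: 5),
};

var lookup = pairings.ToLookup(
    pairing => pairing.ManufacturerId,
    pairing => pairing.Inventory
);

// Indexing a missing key yields an empty sequence rather than throwing.
int missingCount = lookup[99].Count();
int missingSum = lookup[99].Sum();

// Contains distinguishes an absent key from a key whose values sum to 0.
bool hasSeven = lookup.Contains(7);
bool hasNinetyNine = lookup.Contains(99);

Console.WriteLine($"{lookup[7].Sum()} {missingCount} {missingSum} {hasSeven} {hasNinetyNine}");
```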

Performance ruminations

First off, this test is pretty simple. It involves getting back only one key out of a comparatively small number of keys, which probably exaggerates the performance of GroupBy if nothing else, so keep that in mind. Moving on, here are our results.

"Short" table

byDictionary:   00:00:00.0016506 :: 72
byLookup:       00:00:00.0057605 :: 72
byGroupBy:      00:00:00.0018367 :: 72

"Long" table

byDictionary:   00:00:00.0059935 :: 4567
byLookup:       00:00:00.0082432 :: 4567
byGroupBy:      00:00:00.0060029 :: 4567

I ran two tests. In the first, we had 1,000 different "tires" from 100 possible "manufacturers." In the second, we had 100,000 different tires from the same 100 manufacturers. The data in the first set is a subset of the data in the second set, since both were generated with the same seeded generator, new Random(13).
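The test harness itself isn't shown here, so the following is only a rough sketch of how such a setup might look. The id and inventory ranges are assumptions, as is taking the short table as the first 1,000 rows of the long one; only the seed and the row counts come from the description above.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Assumed ranges: manufacturer ids in [0, 100), inventory in [0, 10).
var random = new Random(13);
var longTable = Enumerable.Range(0, 100_000)
    .Select(_ => (ManufacturerId: random.Next(100), Inventory: random.Next(10)))
    .ToArray();

// The short table is a prefix of the long one, so it is a subset by construction.
var shortTable = longTable.Take(1_000).ToArray();

// Time one of the three approaches with a Stopwatch.
var stopwatch = Stopwatch.StartNew();
int total = longTable
    .ToLookup(pairing => pairing.ManufacturerId, pairing => pairing.Inventory)[7]
    .Sum();
stopwatch.Stop();

Console.WriteLine($"byLookup: {stopwatch.Elapsed} :: {total}");
```

A single cold run like this also times JIT compilation of whichever method goes first, so a warm-up pass or repeated runs would likely give steadier numbers.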

First place goes to ToDictionary, and GroupBy comes in a close second. Bringing up the rear we have ToLookup, presumably because it incurs a far greater fixed overhead: its runtime increases the least from the short list to the long list (roughly 1.4x, versus more than 3x for the other two).

I found these results surprising. I expected ToLookup to be implemented as, effectively, some kind of sugar on top of my own code for ToDictionary. That appears not to be the case.