Friday, April 30, 2010

Unit Testing Methods That Use Delayed Execution In C#

I've found the yield break/return construct and LINQ to Objects in C# to be really handy for shorter, more concise and more readable code in many cases. However, their lazy evaluation can also trip you up if you don't pay attention to when they are evaluated. I'll illustrate below.

The Same Method - Three Ways
Just as a reminder, if you don't use these constructs very often, here are three versions of a simple method that converts a collection of strings into integers.
public IEnumerable<int> ConvertStringToInt1(IEnumerable<string> strings)
{
    var returnCollection = new List<int>();

    foreach(var str in strings)
        returnCollection.Add(int.Parse(str));

    return returnCollection;
}

public IEnumerable<int> ConvertStringToInt2(IEnumerable<string> strings)
{
    foreach(var str in strings)
        yield return int.Parse(str);
}

public IEnumerable<int> ConvertStringToInt3(IEnumerable<string> strings)
{
    return strings.Select(str => int.Parse(str));
}
Late Evaluation
There is a significant difference between the first example and the others. The latter two, using yield return and LINQ to Objects respectively, aren't executed until the resulting collection is evaluated. This can happen much later in the program flow, or may not even happen at all, and can lead to some confusing results when the code is run, whether in tests or otherwise.

This generally isn't an issue if you're testing the result itself, since it will require you to evaluate the contents of the result, which then will trigger the method execution.

For example, the following test should pass without issues:
[TestMethod]
public void ConvertStringToInt2_PassIntsAsString_ValuesAsInt()
{
    var converter = new Converter();
    var strings = new List<string> { "1", "2", "3", "4", "5" };
    var collection = converter.ConvertStringToInt2(strings);
    Assert.IsTrue(collection.Any());
}
When Any() is called on the collection, evaluation starts and continues until it encounters an element, checking that the collection is not empty. If you step through the code, you should see that the ConvertStringToInt2 method is called once.

Side Effects
So far so good.

However, often we want to do more than just a simple call when iterating. Often we want other things to happen in addition, and this is where it becomes easy to trip up.

Consider the following variant of the string to integer converter, which has received the additional responsibility to update a flag if it encounters the value "4" in the conversion set.
public bool FourFound { get; private set; }

public IEnumerable<int> ConvertStringToInt4(IEnumerable<string> strings)
{
    foreach(var str in strings)
    {
        if (str == "4")
            FourFound = true;

        yield return int.Parse(str);
    }
}
Obviously, we'd want to test this behavior as well, and at first glance, the following test should do that.
[TestMethod]
public void GetCollection_Call_CollectionReturned()
{
    var converter = new Converter();
    var strings = new List<string> { "1", "2", "3", "4", "5" };
    converter.ConvertStringToInt4(strings);
    Assert.IsTrue(converter.FourFound);
}
However, this test fails.

The reason is that the result of the ConvertStringsToInt4 method is never evaluated since the result is never used anywhere. As such, the method is never actually called, and FourFound is never set to true.

The simple, short-term solution to this is to force an evaluation. A simple way is to tack on a .ToList() conversion to the method call as below.
[TestMethod]
public void GetCollection_Call_CollectionReturned()
{
    var converter = new Converter();
    var strings = new List<string> { "1", "2", "3", "4", "5" };
    converter.ConvertStringToInt4(strings).ToList();
    Assert.IsTrue(converter.FourFound);
}
As soon as .ToList() is added, the program must iterate through the conversion to create the list, and FourFound is set as expected. This can be an OK way to handle these situations in tests, in my opinion, but it is a warning sign.

Also note that it is necessary to iterate through the whole collection to ensure that all side effects are triggered. Using .Any(), for example, would not make this test pass, since the fourth element would never be evaluated. Likewise, assigning the method call to a variable will not iterate, since all that is assigned is a reference.

The Bigger Picture
While the .ToList() trick can make your tests pass or even make code function as expected, having something that volatile in your code can be dangerous. If I had production code checking the value of FourFound before it would be natural to evaluate the method result, there would be a bug there that could be pretty tricky to hunt down. That kind of code would force the programmer to remember to force an evaluation first, which defeats the point of having automatically delayed evaluation in the first place.

The Single Responsibility Principle (SRP) becomes key here. If your method has no other responsibilities than to generate the new collection, there is no way that other code will rely on anything in the collection before it's evaluated. Ideally, all code that doesn't directly lead to building the new collection is refactored out.

Sometimes, this can be difficult, and can result in having to iterate through the same collection multiple times. I might be processing a list that has come from the URL query string and I want to validate each element before I convert it. For collections known to be small, this may not be a big deal.

If the collections are large, however, or a lot of processing is necessary, this can have direct performance implications. If it is necessary to process everything with one iteration through the collection, my recommendation is to save yourself some trouble and use a traditional loop, like in ConvertStringToInt1, where there is no delayed evaluation.

1 comment:

  1. Excellent post! I completely agree: SRP is key. And, while I like the new language constructs Microsoft is adding to C#, I appreciate your final thought: that sometimes the "old-fashioned" way is the right way. That implies one should not use language constructs because they are new/hip/trendy, but because they are the right tool for the job.

    ReplyDelete