TechEd 2008 notes: How LINQ Works

How LINQ Works: A Deep Dive into the Implementation of LINQ
Alex Turner
C# Compiler Program Manager

This is a 400-level (advanced) talk about the implementation of LINQ.

  • What’s the compiler doing behind the scenes? Layers it translates down to
  • Differences between the translation in the object world vs. remote store

Example of LINQ syntax: GetLondoners()

var query = from c in LoadCustomers()
            where c.City == "London"
            select c;
  • They didn’t want to bake any knowledge of how to do queries into the compiler; instead they use libraries, so you could even use your own implementation of Where() if you really wanted to
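To make the "your own Where()" point concrete, here is a minimal sketch. TracedSource<T> is a hypothetical type invented for this example, not something from the talk. It deliberately doesn't implement IEnumerable<T>, and the file doesn't import System.Linq, so the only Where() the compiler can bind the where clause to is the one declared on the type itself.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type for illustration only. It does not implement
// IEnumerable<T>, so the standard LINQ extension methods cannot apply to it.
class TracedSource<T>
{
    private readonly List<T> items;

    // Counts how often the query syntax ended up calling our Where().
    public int WhereCalls { get; private set; }

    public TracedSource(IEnumerable<T> items)
    {
        this.items = new List<T>(items);
    }

    // The compiler only needs *some* accessible Where method that accepts the lambda.
    public List<T> Where(Func<T, bool> predicate)
    {
        WhereCalls++;
        return items.FindAll(item => predicate(item));
    }
}

static class Program
{
    public static List<string> Londoners(TracedSource<string> cities)
    {
        // The compiler rewrites this as cities.Where(city => city == "London").
        return (from city in cities
                where city == "London"
                select city);
    }

    static void Main()
    {
        var cities = new TracedSource<string>(new[] { "London", "Paris", "London" });
        var result = Londoners(cities);
        Console.WriteLine(result.Count);      // 2
        Console.WriteLine(cities.WhereCalls); // 1
    }
}
```

The query's static type is List<string> here, simply because that is what this particular Where() returns; the compiler attaches no meaning of its own to the where keyword.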

Where() as it would look with .NET 1.x delegates:

bool LondonFilter(Customer c)
{
    return c.City == "London";
}
...
var query = LoadCustomers().Where(LondonFilter);
  • You don’t really want to make a new method for each filter
  • Solved in .NET 2.0 with anonymous delegates, but they were too wordy to encourage use of functional libraries
  • Rewritten with C# 3.0 lambdas:
var query = LoadCustomers().Where(c => c.City == "London");
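For comparison, here are the three styles side by side, including the C# 2.0 anonymous-delegate form that the notes mention but don't show. This is a sketch: the Customer class and the data returned by LoadCustomers() are stand-ins made up so the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in type and data, assumed for this example only.
class Customer
{
    public string ContactName { get; set; }
    public string City { get; set; }
}

static class Program
{
    public static IEnumerable<Customer> LoadCustomers()
    {
        return new List<Customer>
        {
            new Customer { ContactName = "Ann", City = "London" },
            new Customer { ContactName = "Bob", City = "Paris" },
            new Customer { ContactName = "Cy", City = "London" },
        };
    }

    static bool LondonFilter(Customer c)
    {
        return c.City == "London";
    }

    // .NET 1.x style: a separately declared method, converted to a delegate.
    public static IEnumerable<Customer> ViaNamedMethod()
    {
        return LoadCustomers().Where(LondonFilter);
    }

    // C# 2.0 style: an anonymous delegate written inline.
    public static IEnumerable<Customer> ViaAnonymousDelegate()
    {
        return LoadCustomers().Where(delegate(Customer c) { return c.City == "London"; });
    }

    // C# 3.0 style: a lambda expression.
    public static IEnumerable<Customer> ViaLambda()
    {
        return LoadCustomers().Where(c => c.City == "London");
    }

    static void Main()
    {
        Console.WriteLine(ViaNamedMethod().Count());       // 2
        Console.WriteLine(ViaAnonymousDelegate().Count()); // 2
        Console.WriteLine(ViaLambda().Count());            // 2
    }
}
```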

Proving what it’s actually compiled to:

  • Use Reflector
  • In Reflector Options, set Optimization to “.NET 1.0”, so it doesn’t try to re-create the LINQ syntax for us
    • Interestingly, it does still show extension-method syntax and anonymous-type instantiations. You have to turn optimizations off entirely to see what those compile down to, but then you’ll go crazy trying to read the code.
  • For each anonymous delegate (or lambda), the compiler generates:
    • A cache field with a wacky name and a [CompilerGenerated] attribute
    • A method with a wacky name and a [CompilerGenerated] attribute
    • Generated names contain characters that aren’t valid in C# identifiers but are valid for the CLR. That guarantees the compiler’s generated names can’t clash with anything we could possibly write.
  • Implementing Where: you don’t really want to build a whole state machine. Use iterators instead:
static class MyEnumerable
{
    public static IEnumerable<TSource> Where<TSource>(
        this IEnumerable<TSource> source, Func<TSource, bool> filter)
    {
        foreach (var item in source)
            if (filter(item))
                yield return item;
    }
}
  • I didn’t realize .NET 2 iterators were lazily evaluated: none of the iterator body runs until the caller starts enumerating. Cool.
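The laziness is easy to see by logging from inside the filter. The sketch below reuses the MyEnumerable.Where iterator from above; the Trace() helper and the log messages are made up for the demonstration. System.Linq is not imported, so numbers.Where(...) binds to MyEnumerable.Where.

```csharp
using System;
using System.Collections.Generic;

static class MyEnumerable
{
    public static IEnumerable<TSource> Where<TSource>(
        this IEnumerable<TSource> source, Func<TSource, bool> filter)
    {
        foreach (var item in source)
            if (filter(item))
                yield return item;
    }
}

static class Program
{
    // Records the order in which the filter and the consumer actually run.
    public static List<string> Trace()
    {
        var log = new List<string>();
        var numbers = new[] { 1, 2, 3, 4 };

        // Calling Where() only creates the iterator object; the filter has not run yet.
        var evens = numbers.Where(n => { log.Add("test " + n); return n % 2 == 0; });
        log.Add("query built");

        // Each step of the foreach pulls just enough items to produce the next match.
        foreach (var n in evens)
            log.Add("got " + n);

        return log;
    }

    static void Main()
    {
        // Prints: query built, test 1, test 2, got 2, test 3, test 4, got 4
        Console.WriteLine(string.Join(", ", Trace().ToArray()));
    }
}
```

Note that "query built" is logged before any "test" line, and that filtering and consuming are interleaved rather than done in two passes.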

Side note: You can set a breakpoint inside an anonymous delegate, or on a lambda expression, even if it’s formatted on the same line of source code as the outer call. Put the cursor inside the lambda and press F9; I don’t think you can click on the gutter to set a breakpoint on anything other than the beginning of the line.

Side note: When you step into the c.City == "London" in the LINQ where clause, the call stack shows it as “Main.Anonymous method”.
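That anonymous-method frame is the compiler-generated method described earlier. As a rough, hand-written approximation of what Reflector shows for a lambda that captures no local variables, the cache field and the generated method look something like the sketch below. The names cachedLondonFilter and LondonFilterBody are invented (the real ones are not legal C#), Customer is a stand-in type, and the exact shape the compiler emits may differ between compiler versions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in type, assumed for this example only.
class Customer
{
    public string City { get; set; }
}

static class Program
{
    // Stand-in for the generated cache field. The compiler gives the real field
    // an unspeakable name and marks it [CompilerGenerated].
    private static Func<Customer, bool> cachedLondonFilter;

    // Stand-in for the generated method that holds the lambda body. This is the
    // frame the debugger labels as an anonymous method.
    private static bool LondonFilterBody(Customer c)
    {
        return c.City == "London";
    }

    // Roughly what a call like customers.Where(c => c.City == "London") turns into.
    public static IEnumerable<Customer> GetLondoners(IEnumerable<Customer> customers)
    {
        // Create the delegate on first use, then reuse it on every later call.
        if (cachedLondonFilter == null)
            cachedLondonFilter = new Func<Customer, bool>(LondonFilterBody);

        // Extension-method syntax is just a static call underneath.
        return Enumerable.Where(customers, cachedLondonFilter);
    }

    static void Main()
    {
        var customers = new[]
        {
            new Customer { City = "London" },
            new Customer { City = "Paris" },
        };
        Console.WriteLine(GetLondoners(customers).Count()); // 1
    }
}
```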

var query = from c in LoadCustomers()
            where c.City == "London"
            select new { c.ContactName, c.Phone };
  • Anonymous type:
    • Generated with another C#-impossible name, and it’s generic.
    • Immutable.
    • Default implementations for Equals, GetHashCode, ToString.
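These properties of anonymous types can be checked with a few lines of code. This is a small sketch; the property values are made up, and MakeContact() is a helper added only so the behavior can be exercised from outside Main().

```csharp
using System;

static class Program
{
    // Returned as object because an anonymous type has no name we can write down.
    public static object MakeContact(string name, string phone)
    {
        return new { ContactName = name, Phone = phone };
    }

    static void Main()
    {
        var a = new { ContactName = "Ann", Phone = "555-0100" };
        var b = new { ContactName = "Ann", Phone = "555-0100" };

        Console.WriteLine(a.Equals(b));                        // True: compared property by property
        Console.WriteLine(object.ReferenceEquals(a, b));       // False: two separate objects
        Console.WriteLine(a.GetHashCode() == b.GetHashCode()); // True
        Console.WriteLine(a);                                  // { ContactName = Ann, Phone = 555-0100 }

        // The generated class is generic over its property types.
        Console.WriteLine(a.GetType().IsGenericType);          // True

        // The compiler-chosen name is not a legal C# identifier.
        Console.WriteLine(a.GetType().Name);

        // a.Phone = "555-0199";  // would not compile: the properties are read-only
    }
}
```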

LINQ to SQL: We don’t want to do any of this anonymous-delegate generation. Instead, we want to translate the intent of the query into T-SQL, so the set logic runs on the server.

Side note: Generated NorthwindDataContext has a Log property. Set it to Console.Out and you’ll get info about the query that was generated for us.

Func<Customer, bool> myDelegate = (c => c.City == "London");
Expression<Func<Customer, bool>> myExpr = (c => c.City == "London");
  • The first is just a delegate.
  • The second is a parse tree.
    • C# samples have an Expression Tree Visualizer that you can download.
  • The LINQ to SQL query runs a different Where method (Queryable.Where rather than Enumerable.Where). That’s because here the source is an IQueryable<T>, rather than just an IEnumerable<T>.
  • This Where() takes an Expression<Func<TSource, bool>> predicate, so the compiler generates an expression tree instead of a delegate.
  • Where() just returns source.Provider.CreateQuery(...new expression...), where the new expression represents a call to Where() itself, with arguments that, when evaluated, become the arguments it was actually called with. (Is your head spinning yet?) It basically just builds the expression-tree version of the call to itself, which LINQ to SQL later parses and turns into a SQL query.
  • LINQ to Objects: code that directly implements your intent
  • LINQ to SQL: data that represents your intent
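The "code vs. data" distinction shows up as soon as you poke at the two variables. In the sketch below, Customer is a stand-in type and Describe() is a helper written for this example; it assumes the lambda has the exact shape "member == constant" and would throw on anything else.

```csharp
using System;
using System.Linq.Expressions;

// Stand-in type, assumed for this example only.
class Customer
{
    public string City { get; set; }
}

static class Program
{
    // Walks the tree for a lambda shaped like: c => c.SomeMember == someConstant
    public static string Describe(Expression<Func<Customer, bool>> expr)
    {
        var body = (BinaryExpression)expr.Body;     // the == node
        var left = (MemberExpression)body.Left;     // c.City
        var right = (ConstantExpression)body.Right; // "London"
        return body.NodeType + ": " + left.Member.Name + " vs " + right.Value;
    }

    static void Main()
    {
        Func<Customer, bool> myDelegate = (c => c.City == "London");
        Expression<Func<Customer, bool>> myExpr = (c => c.City == "London");

        var london = new Customer { City = "London" };

        // A delegate can only be invoked.
        Console.WriteLine(myDelegate(london)); // True

        // An expression tree can be inspected as data...
        Console.WriteLine(Describe(myExpr));   // Equal: City vs London

        // ...and can still be turned into runnable code on demand.
        Func<Customer, bool> compiled = myExpr.Compile();
        Console.WriteLine(compiled(london));   // True
    }
}
```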

The difference is all in the extension methods.
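You can watch Queryable.Where record "a call to itself" without a database. The sketch below uses AsQueryable() over an in-memory array as a stand-in for the LINQ to SQL provider (an assumption made so the example is self-contained); Customer and BuildCall() are likewise invented for the demonstration.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Stand-in type, assumed for this example only.
class Customer
{
    public string City { get; set; }
}

static class Program
{
    public static MethodCallExpression BuildCall()
    {
        // Because the static type is IQueryable<Customer>, overload resolution
        // picks Queryable.Where; on IEnumerable<Customer> the same lambda
        // would bind to Enumerable.Where instead.
        IQueryable<Customer> source =
            new[] { new Customer { City = "London" } }.AsQueryable();
        IQueryable<Customer> query = source.Where(c => c.City == "London");

        // Nothing has been filtered yet; the query just carries this tree around.
        return (MethodCallExpression)query.Expression;
    }

    static void Main()
    {
        var call = BuildCall();

        // The root of the tree is a call to the very method that built it.
        Console.WriteLine(call.Method.DeclaringType.Name + "." + call.Method.Name); // Queryable.Where

        // Two arguments: the source's own expression and the quoted predicate lambda.
        Console.WriteLine(call.Arguments.Count);       // 2
        Console.WriteLine(call.Arguments[1].NodeType); // Quote
    }
}
```

A provider such as LINQ to SQL receives this tree and decides what to do with it; the in-memory provider used here simply compiles it and runs it over the array.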
