Just another way to analyse code behaviour – In my humble opinion

Last week, a good friend of mine posted a question on his blog, which traces back to Eric Lippert’s blog post Simple names are not so simple.

Side Note: I am a big fan of that blog; it's awesome and enlightening

Well, here is an excerpt of that post:

Check out the code below:

using System.Linq;
class Program
{
  static void Main()
  { 
      int[] data = { 1, 2, 3, 1, 2, 1 };
      foreach (var m in from m in data orderby m select m)
            System.Console.Write(m);
  }
}


Now, the question is:

  Is this code valid or not??

  If valid, how?

  If not valid, why?

(Source: Eric Lippert’s Blog) 

So, I worked on that problem, posted some comments there, and am presenting it again here in the form of a post.

The moment I read the problem, my mind started circling around this fragment:

var m in from m in data orderby m select m

As you can see, m is being used in two different contexts: first as the range variable in the LINQ query " from m in data orderby m select m ", and then as the loop variable in the "var m " part of the foreach. The question here: Can that possibly work?

Luckily, I was honing my skills in Reflection (System.Reflection) at that time, and I had come across a passage in a book about the foreach block. It says that the C# compiler adds some temporary variables to smooth out operations like

a += 2;

or

foreach(var m in M){}

Along with that, I also had a preconceived notion that something "magical" happens when you encounter such a situation, and that the whole code behaves as if it were working in scopes (the blocks of { }).

So, the possible explanation that came to my mind about this problem was:

the fragment "var m in from m in data orderby m select m"

breaks up into two scopes: { var m } in { from m in data orderby m select m }

or, in other terms, var m in { from m in data orderby m select m }

We can proceed to a further abstraction

var m in { from m’ in data orderby m’ select m’ }

which would eventually look like

var m in m’ // where m’ now stands for the enumerable query result

[ 😮 Oops, m’ is not a valid name for a variable, but it is good enough for human beings]
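To make the two-scope idea concrete, here is a minimal sketch of my own (not taken from Eric's post). The class name ScopeSketch, the helper Sorted and the renamed range variable x are hypothetical, introduced only for illustration. It runs the original loop next to a version where the query lives in its own method, so the two names can no longer be confused:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ScopeSketch
{
    // Hypothetical helper: the same query, with its range variable renamed to x
    // (playing the role of m' above) and moved into a scope of its own.
    public static IEnumerable<int> Sorted(int[] data)
    {
        return from x in data orderby x select x;
    }

    static void Main()
    {
        int[] data = { 1, 2, 3, 1, 2, 1 };

        // Original form: the range variable and the loop variable are both called m.
        foreach (var m in from m in data orderby m select m)
            Console.Write(m);
        Console.WriteLine();

        // Renamed form: the loop variable m never meets the query's range variable.
        foreach (var m in Sorted(data))
            Console.Write(m);
        Console.WriteLine();
    }
}
```

Both loops print 111223, which is at least consistent with the guess that the query's m and the loop's m live in separate scopes. It is not a proof, only a sanity check.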

 

OK! I have a possible explanation; now, how do I confirm that it's right and not just a conjecture?

I fired up RSS Bandit and rushed to Eric Lippert’s blog…

<digression>

x-(

Although I like reading blogs, the Internet, blogs and Twitter (a.k.a. the Knowledge Supernova) provide so much information to absorb that I occasionally put off reading my RSS feeds and let them pile up to be read on weekends, after I am exhausted from celebrating TGIF and TGIS. So I missed the Eric Lippert blog post that mentioned this problem 🙁

</digression>

I fired up RSS Bandit and rushed to Eric Lippert’s blog to find out where this problem was being discussed, but I failed to spot it, since it was posed at the end of a blog post and I was only skimming the tops.

This made me more curious to find the real answer, so I fired up VS, hit Ctrl+C, Ctrl+V and a few tabs, and ran it. Yes, it ran fine, just like you would expect.

But that doesn’t actually answer what concerns me: is my explanation completely right?

Suddenly, an idea struck me: Utilize Reflection

The advantage was that it would give me some much-needed extra practice to get comfortable with reflection.

So, I started writing code:

First, I wrote

using System.Linq;
using System.Reflection;
using System.Reflection.Emit;
using System.Collections.Generic;
    class Program 
    { 
      static void Main() 
      {  
         int[] data = { 1, 2, 3, 1, 2, 1 }; 

          foreach (var m in from m in data orderby m select m)
             System.Console.Write(m); 

          new Analyse().Run(); 

          System.Console.ReadKey();
      } 
    } 

    class Analyse
    {
        public void Run()
        {
            Assembly asm = Assembly.GetAssembly(typeof(Program));
            MethodBody mb = asm.EntryPoint.GetMethodBody();
            System.Console.WriteLine("\nMethod Name: "+asm.EntryPoint.Name);
            foreach (var locals in mb.LocalVariables)
            {
                System.Console.WriteLine("\n {0}", locals.LocalType.FullName);
            }
            System.Console.ReadKey();
        }
    }

The Run method lists the types of all the local variables that are present in Main after compilation. This includes the variables we have declared explicitly or implicitly, as well as other variables introduced by the compiler to store temporary results and perform calculations. Note that only the types are listed, not the names.

If you run it, you will get this output:
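Since the listing only shows types, it can help to print each local's slot index as well, so that two locals of the same type can be told apart. Here is a small sketch of that idea. The class LocalsSketch and the methods Describe and Sample are hypothetical names of my own; LocalIndex, LocalType and IsPinned are properties of System.Reflection.LocalVariableInfo. What exactly gets printed depends on the compiler and on debug versus release builds, so I am not showing any expected output.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

static class LocalsSketch
{
    // Hypothetical helper: one line per local variable slot of the given method.
    public static List<string> Describe(MethodBase method)
    {
        var lines = new List<string>();
        MethodBody body = method.GetMethodBody();
        if (body == null)
            return lines; // e.g. abstract or extern methods have no IL body

        foreach (LocalVariableInfo local in body.LocalVariables)
        {
            lines.Add(string.Format("[{0}] {1}{2}",
                local.LocalIndex,
                local.LocalType.FullName,
                local.IsPinned ? " (pinned)" : ""));
        }
        return lines;
    }

    // A tiny method to inspect; an optimizing build may drop some of its locals.
    static int Sample()
    {
        int a = 1;
        bool b = a > 0;
        return b ? a : 0;
    }

    static void Main()
    {
        MethodInfo sample = typeof(LocalsSketch).GetMethod(
            "Sample", BindingFlags.NonPublic | BindingFlags.Static);
        foreach (string line in Describe(sample))
            Console.WriteLine(line);
    }
}
```

The names of the locals are not part of the method body metadata (they live in the debug symbols), which is why reflection can only show us types and slots.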

 

111223

Method Name: Main

System.Int32[]

System.Int32

System.Collections.Generic.IEnumerator`1[[System.Int32, mscorlib, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]

System.Boolean

OK! Let me set some context before I go on to explain things here.

Change the body of the Main() method inside the Program class to

using System.Linq;
using System.Reflection;
using System.Reflection.Emit;
using System.Collections.Generic;
    
    class Program 
    { 
      static void Main() 
      {  
         int[] data = { 1, 2, 3, 1, 2, 1 };
        
          foreach (var num in data)
          {
              System.Console.Write(num);
          }

          new Analyse().Run();

          System.Console.ReadKey();
      } 
    }
... contd.

Here is how the output changes:

123121

Method Name: Main

System.Int32[]

System.Int32

System.Int32[]

System.Int32

System.Boolean

A little explanation

 

the first System.Int32[] is this array: int[ ] data = { 1, 2, 3, 1, 2, 1 };

the next System.Int32 receives the value at each iteration and is used in Write() [this is our var num]

the second System.Int32[] is the foreach's own temporary for the array. It holds a reference to the same array (the elements are not copied), so the collection expression is evaluated only once.

the second System.Int32 is an index that tracks the current position of the iteration

the last one is a Boolean which stores the result of the loop condition check

 

Note: this is again guesswork, but still quite predictable :P
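To show what I think those locals are doing, here is a hand-written loop that should behave like the foreach over the array. It is my own sketch under that assumption, not the compiler's actual output; the names ArrayForeachSketch, Expand, temp and index are hypothetical.

```csharp
using System;
using System.Text;

static class ArrayForeachSketch
{
    // Hand-written equivalent of: foreach (var num in data) { ...Write(num)... }
    public static string Expand(int[] data)
    {
        var output = new StringBuilder();

        int[] temp = data;          // second Int32[] local: same array object, not a copy of the elements
        int index = 0;              // second Int32 local: current position
        while (index < temp.Length) // the comparison result is what the Boolean local would hold
        {
            int num = temp[index];  // first Int32 local: the loop variable
            output.Append(num);
            index++;
        }
        return output.ToString();
    }

    static void Main()
    {
        int[] data = { 1, 2, 3, 1, 2, 1 };
        Console.WriteLine(Expand(data)); // prints 123121
    }
}
```

The local types used here (Int32[], Int32, Int32[], Int32 and a Boolean for the condition) line up with the list printed by Run, which supports the reading above.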

Note that there are two occurrences of System.Int32. So how can I figure out which one is the loop variable and which one is the index?

So, we again change the Main() body to

 

static void Main() 
      {  

         char[] ch = { 'a', 'b', 'c' };

         foreach (var num in ch)
         {
             System.Console.Write(num);
         }

          new Analyse().Run();

          System.Console.ReadKey();
      } 
...
..

the output changes to :

abc

Method Name: Main

System.Char[]

System.Char

System.Char[]

System.Int32

System.Boolean

So, you can see that System.Char (replacing the first System.Int32) is the variable that receives the value that Write() outputs. Comparing the two outputs position by position, it is fair to conclude that the first System.Int32 was receiving the current element of data, while the System.Int32 that stayed unchanged is the index that the iteration is pointing at.

Back to our actual code, analysing the output

111223

Method Name: Main

System.Int32[]

System.Int32

System.Collections.Generic.IEnumerator`1[[System.Int32, mscorlib, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]

System.Boolean

line by line, we can see that

System.Int32[ ]

(the actual data[] array)

System.Int32

(the variable that receives the value at each iteration and is written to the output)

System.Collections.Generic.IEnumerator`1[[System.Int32, mscorlib blah blah blah… ]]

(the generic IEnumerator obtained from the LINQ expression. Every class implementing IEnumerator has a method called MoveNext( ), so no additional System.Int32 index is required, which is why it is absent from the output)

System.Boolean

(condition check result as previously stated)
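Again as a hand-written sketch (my assumption of the shape, not decompiled output), the foreach over the query should behave roughly like the loop below. EnumeratorForeachSketch and Expand are hypothetical names; GetEnumerator, MoveNext, Current and Dispose are the standard IEnumerable<T> and IEnumerator<T> members.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

static class EnumeratorForeachSketch
{
    // Hand-written equivalent of: foreach (var m in source) { ...Write(m)... }
    public static string Expand(IEnumerable<int> source)
    {
        var output = new StringBuilder();

        IEnumerator<int> e = source.GetEnumerator(); // the IEnumerator`1 local
        try
        {
            while (e.MoveNext())   // the Boolean local would hold this result
            {
                int m = e.Current; // the Int32 local: the loop variable
                output.Append(m);
            }
        }
        finally
        {
            if (e != null)
                e.Dispose();       // foreach disposes the enumerator when done
        }
        return output.ToString();
    }

    static void Main()
    {
        int[] data = { 1, 2, 3, 1, 2, 1 };
        Console.WriteLine(Expand(from m in data orderby m select m)); // prints 111223
    }
}
```

There is no index anywhere in this version, because the enumerator keeps track of its own position. That matches the missing second System.Int32 in the output.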

From all this, we see that the compiler handles duplicate names smartly, as long as we play by its rules and work according to the C# language spec 🙂

This technique (no, I didn’t discover it) is another tool to analyse code behaviour.

As a side note, other such techniques you may be familiar with are:

  • looking at preprocessor output (those *.i files in C)
  • looking at post-compile code
  • looking at generated assembly code
  • running ILDASM x-(
  • debugging??
  • blah blah blah (ask a real Expert, he will provide you an exhaustive list of such techniques)

THE HAPPY ENDING

Finally, the other day, when I was reading the posts aggregated in RSS Bandit from Coding Horror and many other blogs, with Eric’s blog somewhere in the middle, I found this problem hiding at the bottom of a blog post. By reading that post you can get the explanation, and it seems that I stand correct :) Still, if you think I am wrong somewhere, please comment here. I will be happy to get the right facts.

Edit: I just found out that Eric also wrote a follow-up post on this problem, where he excellently explains the behaviour of this code in his own way.