Last week, a good friend of mine posted a question on his blog, which traces back to Eric Lippert’s blog post Simple names are not so simple.
Side note: I am a big fan of that blog; it’s awesome and enlightening.
Well, here is an excerpt of that post:
Check out the code below:
using System.Linq;
class Program
{
    static void Main()
    {
        int[] data = { 1, 2, 3, 1, 2, 1 };
        foreach (var m in from m in data orderby m select m)
            System.Console.Write(m);
    }
}
Now, the question is:
Is this code valid or not??
If valid, how?
If not valid, why?
(Source: Eric Lippert’s Blog)
So, I worked on that problem and posted some comments there, and I am presenting it again here in the form of a post.
The moment I read the problem, my mind started circling around
var m in from m in data orderby m select m
As you can see, m is used in two different contexts: first as the range variable in the LINQ query " from m in data orderby m select m ",
and second as the "var m" part of the foreach. The question here: can that possibly work?
Luckily, I was honing my skills in Reflection (System.Reflection) at that time, and I had come across a passage in this book about the foreach block. It says the C# compiler adds some temporary variables to smooth out operations like
a += 2;
or
foreach(var m in M){}
Along with that, I had a notion that something "magical" happens when you encounter such a situation, and the whole code behaves as if it were working in scopes (the blocks of { }).
So, the possible explanation that came to my mind for this problem was:
the expression "var m in from m in data orderby m select m"
breaks up into two scopes: { var m } in { from m in data orderby m select m }
or, in other terms, var m in { from m in data orderby m select m }
We can abstract further:
var m in { from m’ in data orderby m’ select m’ }
that would eventually look like
var m in m’ // where m’ is enumerable
[ 😮 Oops, m’ is not a valid name for a variable, but good enough for human beings ]
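The same idea can be sketched in plain C#. This is only an illustration, assuming the usual translation of query syntax into method calls, where "from m in data orderby m select m" becomes roughly data.OrderBy(m => m). The class ScopeSketch and the helper SortedDigits are hypothetical names that are not part of the original puzzle.

```csharp
using System;
using System.Linq;

class ScopeSketch
{
    // Hypothetical helper: the puzzle's loop with the query written in method syntax.
    public static string SortedDigits(int[] data)
    {
        string result = "";
        // The m inside OrderBy is a lambda parameter; it only exists inside the lambda.
        // The m declared by foreach only exists inside the loop body.
        foreach (var m in data.OrderBy(m => m))
            result += m;   // this m is the foreach variable
        return result;
    }

    static void Main()
    {
        int[] data = { 1, 2, 3, 1, 2, 1 };
        Console.Write(SortedDigits(data));   // the sorted digits: 111223
    }
}
```

Written this way, the two m’s visibly live in two separate scopes, which is the { var m } in { ... } picture from above.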
OK! I have a possible explanation; now how do I confirm that it’s right and not just a conjecture?
I fired up RSS Bandit and rushed to Eric Lippert’s blog…
<digression>
x-(
Though I like reading blogs, the Internet, blogs and Twitter (a.k.a. the Knowledge Supernova) provide so much information to absorb that I occasionally procrastinate on reading my RSS feeds and let them pile up to be read on weekends, after I am exhausted from celebrating TGIF and TGIS. So I missed the Eric Lippert blog post that mentioned this problem 🙁
</digression>
I fired up RSS Bandit, rushed to Eric Lippert’s blog and tried to find where this problem was being discussed, but I failed to spot it, since it was posed at the end of a blog post and I was only skimming the top.
This made me more curious to find the real answer, so I fired up VS, hit Ctrl+C, Ctrl+V and some tabs, ran it, and yes, it ran fine, just like you would expect.
But that doesn’t actually answer what concerns me: is my explanation completely right?
Suddenly, an idea struck me: use Reflection.
The advantage was that it would give me some much-needed extra practice to get comfortable with reflection.
So, I started writing code.
First, I wrote:
using System.Linq;
using System.Reflection;
using System.Reflection.Emit;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        int[] data = { 1, 2, 3, 1, 2, 1 };
        foreach (var m in from m in data orderby m select m)
            System.Console.Write(m);
        new Analyse().Run();
        System.Console.ReadKey();
    }
}

class Analyse
{
    public void Run()
    {
        Assembly asm = Assembly.GetAssembly(typeof(Program));
        MethodBody mb = asm.EntryPoint.GetMethodBody();
        System.Console.WriteLine("\nMethod Name: " + asm.EntryPoint.Name);
        // Print the type of every local variable in the compiled Main
        foreach (var locals in mb.LocalVariables)
        {
            System.Console.WriteLine("\n {0}", locals.LocalType.FullName);
        }
        System.Console.ReadKey();
    }
}
The Run method lists the types of all local variables that exist in Main after compilation. This includes the variables we declared, explicitly or implicitly, as well as the variables the compiler introduced to store temporary results and perform calculations.
If you run it, you will get this output:
111223
Method Name: Main
System.Int32[]
System.Int32
System.Collections.Generic.IEnumerator`1[[System.Int32, mscorlib, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]
System.Boolean
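When two locals share a type, it helps to print each local’s slot index next to its type. The following is a minimal sketch of that variation, not the code used in this post: LocalsSketch, DescribeLocals and Sample are hypothetical names, while GetMethodBody(), LocalVariables, LocalIndex and LocalType are the System.Reflection members. The exact list of locals depends on the compiler version and build settings, so treat any output as indicative only.

```csharp
using System;
using System.Reflection;
using System.Text;

class LocalsSketch
{
    // Hypothetical helper: returns one "index: type" line per local variable of the method.
    public static string DescribeLocals(MethodBase method)
    {
        MethodBody body = method.GetMethodBody();
        if (body == null)
            return "(no IL body available)";

        var text = new StringBuilder();
        foreach (LocalVariableInfo local in body.LocalVariables)
            text.AppendLine(local.LocalIndex + ": " + local.LocalType.FullName);
        return text.ToString();
    }

    // Sample method to inspect; which locals survive compilation depends on build settings.
    public static int Sample()
    {
        int[] data = { 1, 2, 3 };
        int sum = 0;
        foreach (var n in data)
            sum += n;
        return sum;
    }

    static void Main()
    {
        MethodInfo sample = typeof(LocalsSketch).GetMethod("Sample");
        Console.Write(DescribeLocals(sample));
    }
}
```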
OK! Let me set some context before I explain things here.
Change the body of the Main() method inside the Program class to
using System.Linq;
using System.Reflection;
using System.Reflection.Emit;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        int[] data = { 1, 2, 3, 1, 2, 1 };
        foreach (var num in data)
        {
            System.Console.Write(num);
        }
        new Analyse().Run();
        System.Console.ReadKey();
    }
}
... contd.
See how the output changes:
123121
Method Name: Main
System.Int32[]
System.Int32
System.Int32[]
System.Int32
System.Boolean
A little explanation:
the first System.Int32[] is this array: int[] data = { 1, 2, 3, 1, 2, 1 };
the next System.Int32 receives the value at each iteration and is used in Write() [this is our var num, the counterpart of var m]
the second System.Int32[] is foreach’s own reference to the array. As far as I can tell, the compiler stores the array reference in a temporary so that the collection expression is evaluated only once; the elements themselves are not copied.
the second System.Int32 is an index that tracks the current position in the array
the last one is a Boolean, which stores the result of the loop condition check
Note: this is again guesswork, but still quite predictable
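To make the guess concrete, here is a hand-written loop that uses one variable per local in the list above. It is only a sketch of what the compiler’s expansion of foreach over an array roughly looks like, not the actual generated code; ArrayLoopSketch, Expand, temp and index are hypothetical names.

```csharp
using System;

class ArrayLoopSketch
{
    // Hypothetical helper: foreach (var num in data) written out by hand.
    public static string Expand(int[] data)
    {
        string output = "";
        int[] temp = data;                  // second System.Int32[]: a reference to the same array
        for (int index = 0;                 // second System.Int32: the position in the array
             index < temp.Length;           // System.Boolean: result of this comparison
             index++)
        {
            int num = temp[index];          // first System.Int32: the value for this iteration
            output += num;
        }
        return output;
    }

    static void Main()
    {
        Console.Write(Expand(new[] { 1, 2, 3, 1, 2, 1 }));   // same digits, in array order: 123121
    }
}
```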
Note that there are two occurrences of System.Int32, so how can I figure out which one holds the value and which one is the index?
So, we again change the Main() body to
static void Main()
{
    char[] ch = { 'a', 'b', 'c' };
    foreach (var num in ch)
    {
        System.Console.Write(num);
    }
    new Analyse().Run();
    System.Console.ReadKey();
}
... ..
The output changes to:
abc
Method Name: Main
System.Char[]
System.Char
System.Char[]
System.Int32
System.Boolean
So, you can see that System.Char (replacing the first System.Int32) is the one receiving the value that gets written by Write(). Since the locals appear in the same order in both outputs, it is reasonable to conclude that the first System.Int32 receives the current element of data, while the second System.Int32, which stayed an Int32, is the index.
Back to our actual code, analysing the output:
111223
Method Name: Main
System.Int32[]
System.Int32
System.Collections.Generic.IEnumerator`1[[System.Int32, mscorlib, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]
System.Boolean
Line by line, we can see:
System.Int32[]
(the actual data[] array)
System.Int32
(the variable that receives the value at each iteration and is passed to Write(); this is our var m)
System.Collections.Generic.IEnumerator`1[[System.Int32, mscorlib blah blah blah… ]]
(the generic IEnumerator obtained from the result of the LINQ expression. All classes implementing IEnumerator have a method called MoveNext(), so no additional System.Int32 index is needed, and that is why it is absent from the output)
System.Boolean
(condition check result as previously stated)
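The enumerator-based loop can be written out by hand in the same spirit. This is a sketch of the general pattern foreach follows for an IEnumerable<T> (GetEnumerator, MoveNext, Current, Dispose), not the exact code the compiler generates; EnumeratorLoopSketch, Expand and e are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class EnumeratorLoopSketch
{
    // Hypothetical helper: foreach (var m in source) written out by hand.
    public static string Expand(IEnumerable<int> source)
    {
        string output = "";
        IEnumerator<int> e = source.GetEnumerator();   // the IEnumerator`1 local
        try
        {
            while (e.MoveNext())                       // System.Boolean: result of MoveNext()
            {
                int m = e.Current;                     // System.Int32: the value for this iteration
                output += m;
            }
        }
        finally
        {
            e.Dispose();                               // foreach also disposes the enumerator
        }
        return output;
    }

    static void Main()
    {
        int[] data = { 1, 2, 3, 1, 2, 1 };
        Console.Write(Expand(from m in data orderby m select m));   // 111223
    }
}
```

No index variable appears anywhere: the enumerator keeps track of its own position, which matches the missing second System.Int32.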
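From all this, we get that the compiler handles the duplicate name smartly, as long as we play by its rules and work according to the C# language specs 🙂 Notice that only one System.Int32 shows up for the two m’s: the m of the query is not a local variable of Main at all, because the query is translated into method calls in which the range variable becomes a lambda parameter.
This technique (no, I didn’t discover it) is another tool for analysing code behaviour.
As a side note, other such techniques you may be familiar with are: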
- looking at preprocessor output (those *.i files in C)
- looking at post-compile code
- looking at generated assembly code
- running ILDASM x-(
- debugging??
- blah blah blah (ask a real expert, he will provide you an exhaustive list of such techniques)
THE HAPPY ENDING
Finally, the other day, when I was reading the posts aggregated in RSS Bandit from Coding Horror and many other blogs, somewhere in the middle I reached Eric’s blog and found this problem hiding at the bottom of a blog post. By reading that post you can get the explanation, and it seems that I stand correct :)
Still, if you think I am wrong somewhere, please comment here. I will be happy to get the right facts.
Edit: I just found out that Eric also published a follow-up post to that problem, where he excellently explains the behaviour of this code in his own way.