Increment differences in C++ and C#

Wednesday, March 29, 2006

A friend was a bit surprised today to find that the postfix increment operator (i++) doesn’t always work exactly the same in both C++ and C#. I’m not a language lawyer, and this is the kind of thing I usually file under “you shouldn’t be doing that in the first place”, but I thought this might serve as a good example.

Here is the offending (offensive) code:

int i = 1;
i = i++;

What is the value of ‘i’ after this code has run? Don’t ask your compiler. Instead, try to figure it out based on your understanding of the grammar.

The reason I put this into the category of “you shouldn’t be doing that in the first place” is that it’s dangerous to write expressions that include operands with side effects. The problem is that, in C and C++, the order in which the operands of individual operators are evaluated is unspecified, and modifying the same variable more than once between sequence points is undefined behavior. As with many of the things that are left “undefined” in C and C++, this is to allow compilers to optimize the code without unnecessary constraints. I would then argue that this isn’t really a difference between the C++ and C# languages. It just happens to be a difference in the undefined behavior from different compiler implementations.

So as a general rule, you should consider the result of expressions where a value is modified more than once to be undefined. Can you think of a case where this is not true? Why, the comma operator, of course! The lesser-known comma operator is always evaluated left-to-right.

i = 1, ++i, i++;

In this example ‘i’ becomes 3. Of course, C# didn’t inherit the comma operator, unlike most of the other C++ operators.

Now let’s go back to the original example:

int i = 1;
i = i++;

Although the results are undefined, the results can be interesting (in a useless sort of way) as you examine the different compilers. The reason I’m still talking about this is because some people like to think the results of one compiler are somehow “better” than the results of another compiler. Let’s take a quick look at the difference between the Visual C++ and Visual C# compilers and you will realize that the results, although different, are equally meaningless. Here I use MSIL as a common medium for discussion.

The Visual C# compiler roughly follows this logic:

Instructions    Stack      Variable

                           1
ldloc i         1          1
dup             1, 1       1
ldc.i4.1        1, 1, 1    1
add             1, 2       1
stloc i         1          2
stloc i                    1

The compiler pushes the value of ‘i’ onto the stack and duplicates it so that the previous value can be retrieved. It then continues to increment the variable by pushing the constant 1 onto the stack and adding the values at the top of the stack. It now pops the result off the stack and writes it back to ‘i’ as the result of the increment. Finally, it returns the previous value as the result of the assignment operator, not realizing that this also refers to the same variable, and writes the previous value of ‘i’ back to ‘i’. Boy, that was a lot of instructions for nothing.

The Visual C++ compiler, on the other hand, goes about things a little differently:

Instructions    Stack   Variable

                        1
ldloc i         1       1
stloc i                 1
ldloc i         1       1
ldc.i4.1        1, 1    1
add             2       1
stloc i                 2

The compiler pushes the value of ‘i’ onto the stack and then assigns it to ‘i’ by popping it off the stack not realizing that it’s the same variable. It then continues to increment the variable by pushing the value of ‘i’ onto the stack again followed by the constant 1 and adds the values. Finally it pops the value off the stack and writes it back to ‘i’ as the result of the increment.

At the end of the day the C# compiler results in a value of 1 and the C++ compiler results in a value of 2. Neither is right, neither is wrong and both are undefined.

We need to give the Visual C++ compiler credit. It is so focused on optimizing the code it cuts to the chase and produces the following for optimized (release) builds:

int i = 2;

What’s the moral of the story? Don’t rely on undefined behavior.

11 Comments

> What is the value of ‘i’ after this code has run?

'i' doesn't have to have any value. The code doesn't have to finish running. The code is allowed to format your hard disk.

> it’s dangerous to write expressions that include operands with side effects.

That's an overreaction. Without side effects you wouldn't get anything done. Consider two examples of correct code, maybe this:

i = i;

i++;

or this:

int j = i;

i++;

i = j;

In the first example, there are two expressions with side effects. The first one doesn't accomplish much but the second one does, and you wouldn't get i incremented without it.

In the second example, officially there are two expressions with side effects, but the initializer is obviously a moral equivalent to a side effect. The third side effect undoes the result of the first side effect so the net result is null, but still the meaning of each side effect by itself is pretty clear. You wouldn't get much programming done without these kinds of expressions.

Of course the original code isn't required to be equivalent to either of these two examples. These are two ways in which humans are most likely to think but the standard simply prohibits the original code. The standard doesn't require an implementation to do either of them, nor to do anything else in particular, nor to refrain from anything else in particular.

Norman Diamond - Thursday, March 30, 2006 3:03:00 AM

Dean: thanks for the link. For those of you who want to walk the line between well defined and undefined behavior that may help to avoid falling off unexpectedly. :)

Norman: I never know when you’re serious and when you’re joking. Regarding the comment about side effects in expressions you should probably read it as I intended it. More clearly: it’s dangerous to write expressions with more than trivial side effects. Naturally there’s nothing wrong with side effects and increment operators are invaluable but you can easily write expressions that are misleading even if they do not produce undefined behavior. As I’ve said before, you really need to get your own blog. :)

Thanks for the comments!

Kenny Kerr - Thursday, March 30, 2006 3:14:00 AM

In C# (and Java) it's explicitly defined what a statement like "i = i++" should do, and that's what you see in the C# code.

The C and C++ standards were largely written by compiler implementors, who like their compilers to look good in benchmarks, so the C/C++ standards go out of their way to make certain things "ambiguous" to allow the compiler to optimize. But C# and Java were designed with different goals, namely to ensure that the same code will run exactly the same way on many different architectures and implementations. That means you really *do* need to be explicit with such constructs.

And that thing about formatting the hard disk is probably overreacting as well :-) From the standard's point of view, it doesn't care what the result is, but practically speaking it's only ever going to be "1" or "2".

Dean Harding - Thursday, March 30, 2006 3:33:00 AM

> Norman: I never know when you're serious and when you're joking.

Aha, so now I know who my wife has been taking lessons from.

Anyway, there is no joke about the fact that the C and C++ standards defined side effects the way they did, the fact that undefined behaviour means undefined behaviour, and the fact that the first two facts here are completely independent but the original code invoked both of them to do its damage.

> And that thing about formatting the hard disk is probably overreacting as well :-)

Yeah, who ever found their partition formatted and ready to use when they were expecting it not to be? Far more common is the opposite. Who knows how many partitions have had their formatting destroyed by bugs like this vs. how many have been destroyed by bugs different from this. The difference between the way Windows 95 and Windows 98 laid out partitions does look like Windows 98 fixed a bug very much like this one.

Norman Diamond - Thursday, March 30, 2006 5:10:00 AM

Kenny, I'm glad I'm not alone in feeling a shiver down my spine when I see code like this.

As far as I'm concerned, it is far more important to write easy to understand code than it is to exploit every clever shortcut/trick a programming language provides.

But then in my case it could be a case of "simple is as simple does" (to paraphrase Forrest Gump) ;-)

Ashley Visagie - Thursday, March 30, 2006 9:51:00 AM

The C++ behavior (though undefined in the standard and thus compiler dependent) is the more intuitive one in my opinion.

The C# behavior is counter-intuitive!

Nish - Thursday, March 30, 2006 1:24:00 PM

A prefix operator should return the value of the variable AFTER the operation, while a postfix operator should return the value of the variable BEFORE the operation. This should have been the case in C++ from my understanding of the language.

int i = 1;

i = i++; // should end up with i = 1 since using postfix

int i = 1;

i = ++i; // should end up with i = 2 since using prefix

I must have had this same discussion 20 times over the past few months after having discovered the exact same code while reviewing another developer's bug. Regardless of the outcome, code like this can be confusing and should be avoided.

Ron Buckton - Thursday, March 30, 2006 1:38:00 PM

I agree with Ron in what he said. However, when assigning i = i++, that will yield 2, because i++ modifies i. if you did, say,

int i = 1;

int j = i++;

then j would be 1, because it's a post increment, but now i is 2.

int i = 1;

i = i++;

and

int i = 1;

i = ++i;

should both result in 2. that's what i read it as when i read it, but i also seem to always miss the easy questions. :)

-darren

Darren Kopp - Thursday, March 30, 2006 8:31:00 PM

Darren,

In release mode, the VC++ 2005 compiler optimizes away the assignment.

Nish - Friday, March 31, 2006 2:21:00 AM

Actually, I think I can see what the difference could be.

I think it all depends on what you actually think i++ returns. If i++ returns "the variable i, which will be incremented by 1 afterward", then it should be 2, as with c++. Logically, i is set to itself, and then i is incremented.

However, if you define i++ to return 'the value of i, before it is incremented (now)', then i++ will evaluate to 1, and as part of that evaluation, i will be incremented to 2. Then, the result of that evaluation (1) will be assigned to i, and i becomes 1 again.

Personally, because incrementing is a value-type operation, I prefer the latter: I want i to be incremented as a part of the expression i++, before the greater expression i = i++ is evaluated.

Sadly, I came upon this page when googling for "comma operator in c#". Woe is me.

Brian - Sunday, September 3, 2006 1:12:24 AM

It doesn't matter what it produces. What most people fail to realize about undefined behavior is that the compiler can glean additional information from it.

In the case of i = i++, the compiler (for example) can assume that i and i cannot alias each other so that all writes to i don't affect reads from i. Then it can go off and use that information to reorder code in strange ways (including subtle ones that manifest themselves only under rare conditions), like loops that never terminate because it hoisted the read from i inside a loop to above the loop.

asdf - Friday, February 23, 2007 11:01:05 PM
