Sunday 4 October 2015

Speed problems caused by code that never ran

Let me start out by telling you that I am currently implementing a (simple) BigInteger implementation for Delphi. This implementation tries to be compatible with the BigIntegers as used in .NET, so I can test my results to be the same as the results of my tests in C#.

For this, RightShift must use two's complement semantics, and shifting a negative integer must shift in 1-bits from the top, so -100 ($FFFFFF9C) shifts to -50 (hex $FFFFFFCE). For normal Integer values, the result in Delphi would be 2147483598 (hex $7FFFFFCE), because Delphi does a simple shift and does not preserve the sign bit. But my BigIntegers should, because most other Biginteger implementations (including the .NET one, my reference) do that too.

To achieve this, in my routine, I did something like this:

class operator BigInteger.RightShift(const Value: BigInteger; 
  Shift: Integer): BigInteger;
  LSize: Integer;
  ShiftOffset: Integer;
  RSize: Integer;
  P: PLimb;
  if Value.IsZero then

  if Value.IsPositive then
    // Call internal methods that allocate the result, shift the value right, etc.
    // Works fine, no problem.
    // Recursive call
    Result := MinusOne - ((MinusOne - Value) shr Shift);

That gave me the right results, so I was happy until I started benchmarking the new routine. I noticed that the right shift was more than twice as slow as the corresponding left shift. I did not understand this, because a right shift actually has to move fewer bits and the result is smaller. Even if I only passed in positive values, it would still be slow. The code was almost the same as the code for left shift, though. Well, except for the else clause.

People cleverer than I am probably already see the problem: the "negative" branch (the else clause) of the if clause. When I removed it, code was indeed faster than that for left shifts. But when I put it back in, even though it would never run, it slowed down everything. Only after some hard thinking and some debugging I noticed, in the CPU view, that there were calls to InitializeArray and FinalizeArray in the compiled routine, and that everything was surrounded by a try-finally block. The expression in the "negative" branch had some intermediate results, and records for these were allocated and managed by the runtime. That made the entire code slow.

First solution

My first solution was to put the code for the "negative" branch into a nested procedure of its own. The hidden local BigIntegers were now confined in that procedure and would not cause the entire routine to be slow. The "positive" part was indeed up to speed now.

I finally deconstructed Result := MinusOne - ((MinusOne - Value) shr Shift); into internal calls entirely, so no hidden BigIntegers were allocated anymore. Now I could put the "negative" code back from the nested procedure to the negative branch, and it was quite a bit faster as well.


I learned two things from this:

  • Beware of hidden code. Here, it was caused by the expression with intermediate results.
  • Inline with caution. If I had inlined the nested routine, the hidden BigIntegers would have been put back in the outer scope, and things would have been slow again.