Let me start by saying that I am currently writing a (simple) BigInteger implementation for Delphi. It tries to be compatible with the BigInteger type in .NET, so I can check that my results match the results of the same tests in C#.
RightShift must use two's complement semantics, and shifting a negative integer must shift in 1-bits from the top, so -100 (hex $FFFFFF9C) shifted right by one bit becomes -50 (hex $FFFFFFCE). For normal Integer values, the result in Delphi would be 2147483598 (hex $7FFFFFCE), because Delphi does a simple shift and does not preserve the sign bit. But my BigIntegers should, because most other BigInteger implementations (including the .NET one, my reference) do that too.
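To see the plain Integer behaviour described above, a tiny console program is enough. This is only a sketch; the program and variable names are made up for the illustration. A variable is used rather than a constant expression, so that the shift really happens on a 32-bit Integer at run time.

```delphi
program LogicalShiftDemo;

{$APPTYPE CONSOLE}

var
  Value, Shifted: Integer;
begin
  Value := -100;            // bit pattern $FFFFFF9C
  Shifted := Value shr 1;   // simple shift: a 0-bit enters at the top
  Writeln(Shifted);         // 2147483598, i.e. $7FFFFFCE, not -50
end.
```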
To achieve this, in my routine, I did something like this:
    class operator BigInteger.RightShift(const Value: BigInteger; Shift: Integer): BigInteger;
    var
      LSize: Integer;
      ShiftOffset: Integer;
      RSize: Integer;
      P: PLimb;
    begin
      if Value.IsZero then
        Exit(BigInteger.Zero);

      if Value.IsPositive then
      begin
        // Call internal methods that allocate the result, shift the value right, etc.
        // Works fine, no problem.
      end
      else
      begin
        // Recursive call
        Result := MinusOne - ((MinusOne - Value) shr Shift);
      end;
    end;
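Why the expression in the negative branch works is perhaps easier to see on a plain Integer. In two's complement, -1 - X is the same as not X, and for a negative X it is non-negative, so a simple shift is safe; the outer subtraction then flips the bits back. The sketch below assumes a shift count from 0 to 31, and ArithmeticShiftRight is a hypothetical helper for this illustration, not a routine from my BigInteger unit.

```delphi
program ArithmeticShiftDemo;

{$APPTYPE CONSOLE}

// Hypothetical helper: sign-preserving right shift for a 32-bit Integer.
// Assumes 0 <= Shift <= 31.
function ArithmeticShiftRight(Value: Integer; Shift: Integer): Integer;
begin
  if Value >= 0 then
    // Non-negative values: the simple shift is already correct.
    Result := Value shr Shift
  else
    // Same identity as MinusOne - ((MinusOne - Value) shr Shift):
    // (-1 - Value) is non-negative here, so shr shifts in harmless 0-bits,
    // and the outer "-1 - ..." turns them into the required 1-bits.
    Result := -1 - ((-1 - Value) shr Shift);
end;

begin
  Writeln(ArithmeticShiftRight(-100, 1)); // -50
  Writeln(ArithmeticShiftRight(-1, 4));   // -1
  Writeln(ArithmeticShiftRight(100, 1));  // 50
end.
```

For -100 the steps are: -1 - (-100) = 99, 99 shr 1 = 49, and -1 - 49 = -50.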
That gave me the right results, so I was happy until I started benchmarking the new routine. I noticed that the right shift was more than twice as slow as the corresponding left shift. I did not understand this, because a right shift actually has to move fewer bits and the result is smaller. Even if I only passed in positive values, it was still slow. The code was almost the same as the code for left shift, though. Well, except for the special handling of negative values.
People cleverer than I am probably already see the problem: the "negative" branch (the else clause) of the if statement. When I removed it, the code was indeed faster than that for left shifts. But when I put it back in, even though it would never run, it slowed everything down. Only after some hard thinking and some debugging did I notice, in the CPU view, that there were calls to FinalizeArray in the compiled routine, and that everything was surrounded by a try-finally block. The expression in the "negative" branch had some intermediate results, and records for these were allocated and managed by the runtime. That made the entire routine slow, whichever branch was taken.
My first solution was to put the code for the "negative" branch into a nested procedure of its own. The hidden local BigIntegers were now confined in that procedure and would not cause the entire routine to be slow. The "positive" part was indeed up to speed now.
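The shape of that first solution can be sketched with a small stand-in type. Everything here is hypothetical and only mimics the structure: TManagedInt, MakeManaged, SubManaged and ShrManaged are invented for the illustration, and whether the compiler really emits the hidden try-finally only in the nested routine depends on the compiler version, so it is worth checking in the CPU view.

```delphi
program NestedSlowPathDemo;

{$APPTYPE CONSOLE}

type
  // Hypothetical stand-in for BigInteger: the dynamic array field makes the
  // record a managed type, so temporaries of this type must be finalized.
  TManagedInt = record
    Digits: array of Integer;
  end;

function MakeManaged(Value: Integer): TManagedInt;
begin
  SetLength(Result.Digits, 1);
  Result.Digits[0] := Value;
end;

function SubManaged(const A, B: TManagedInt): TManagedInt;
begin
  Result := MakeManaged(A.Digits[0] - B.Digits[0]);
end;

function ShrManaged(const A: TManagedInt; Shift: Integer): TManagedInt;
begin
  Result := MakeManaged(A.Digits[0] shr Shift);
end;

function ShiftRight(const Value: TManagedInt; Shift: Integer): TManagedInt;

  // The intermediate results of the expression are hidden managed locals.
  // Confined to this nested routine, their cleanup code belongs to its
  // frame and not to the frame of the outer function.
  procedure ShiftNegative(var Res: TManagedInt);
  begin
    Res := SubManaged(MakeManaged(-1),
      ShrManaged(SubManaged(MakeManaged(-1), Value), Shift));
  end;

begin
  if Value.Digits[0] >= 0 then
    Result := ShrManaged(Value, Shift)   // fast path, no expression temporaries
  else
    ShiftNegative(Result);               // slow path, kept out of line
end;

var
  R: TManagedInt;
begin
  R := ShiftRight(MakeManaged(-100), 1);
  Writeln(R.Digits[0]); // -50
  R := ShiftRight(MakeManaged(100), 1);
  Writeln(R.Digits[0]); // 50
end.
```

The nested routine must not be marked inline, or the temporaries would end up in the outer function again.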
Finally, I broke the expression Result := MinusOne - ((MinusOne - Value) shr Shift); down entirely into internal calls, so no hidden BigIntegers were allocated anymore. Now I could move the "negative" code from the nested procedure back into the negative branch, and it was quite a bit faster as well.
I learned two things from this:
- Beware of hidden code. Here, it was caused by the expression with intermediate results.
- Inline with caution. If I had inlined the nested routine, the hidden BigIntegers would have been put back in the outer scope, and things would have been slow again.