Let me start out by telling you that I am currently implementing a (simple) BigInteger implementation for Delphi. This implementation tries to be compatible with the BigIntegers as used in .NET, so I can test my results to be the same as the results of my tests in C#.
For this, RightShift
must use two's complement semantics, and shifting a negative integer must shift in 1-bits from the top, so -100
($FFFFFF9C
) shifts to -50
(hex $FFFFFFCE
). For normal Integer
values, the result in Delphi would be 2147483598
(hex $7FFFFFCE
), because Delphi does a simple shift and does not preserve the sign bit. But my BigIntegers should, because most other Biginteger implementations (including the .NET one, my reference) do that too.
To achieve this, in my routine, I did something like this:
class operator BigInteger.RightShift(const Value: BigInteger;
Shift: Integer): BigInteger;
var
LSize: Integer;
ShiftOffset: Integer;
RSize: Integer;
P: PLimb;
begin
if Value.IsZero then
Exit(BigInteger.Zero);
if Value.IsPositive then
begin
// Call internal methods that allocate the result, shift the value right, etc.
// Works fine, no problem.
end
else
begin
// Recursive call
Result := MinusOne - ((MinusOne - Value) shr Shift);
end;
end;
That gave me the right results, so I was happy until I started benchmarking the new routine. I noticed that the right shift was more than twice as slow as the corresponding left shift. I did not understand this, because a right shift actually has to move fewer bits and the result is smaller. Even if I only passed in positive values, it would still be slow. The code was almost the same as the code for left shift, though. Well, except for the else
clause.
People cleverer than I am probably already see the problem: the "negative" branch (the else
clause) of the if
clause. When I removed it, code was indeed faster than that for left shifts. But when I put it back in, even though it would never run, it slowed down everything. Only after some hard thinking and some debugging I noticed, in the CPU view, that there were calls to InitializeArray
and FinalizeArray
in the compiled routine, and that everything was surrounded by a try-finally
block. The expression in the "negative" branch had some intermediate results, and records for these were allocated and managed by the runtime. That made the entire code slow.
First solution
My first solution was to put the code for the "negative" branch into a nested procedure of its own. The hidden local BigIntegers were now confined in that procedure and would not cause the entire routine to be slow. The "positive" part was indeed up to speed now.
I finally deconstructed Result := MinusOne - ((MinusOne - Value) shr Shift);
into internal calls entirely, so no hidden BigIntegers were allocated anymore. Now I could put the "negative" code back from the nested procedure to the negative branch, and it was quite a bit faster as well.
Conclusion
I learned two things from this:
- Beware of hidden code. Here, it was caused by the expression with intermediate results.
- Inline with caution. If I had inlined the nested routine, the hidden BigIntegers would have been put back in the outer scope, and things would have been slow again.