Monday, 22 January 2018

Accessing private methods of another class

In versions of Delphi before 10.1 Berlin, it was possible to access private members and private methods of a class simply by defining a class helper for it and accessing these private items in its methods.

But that it was so easy to access private members undermined the basic OO principles of information hiding and encapsulation. That is why this loophole (which could be considered a bug) was closed in Delphi 10.1 Berlin. But there was a lot of howling about this, because it was so easy and so convenient to have access to everything, even to other classes' "private parts".

There has been a lot of discussing going on about this. I will not repeat the arguments made there, but will state that I agree with the closing of the loophole. Accessing private members or methods is a hack, and a hack should never be easy. This hack was far too easy. Instead of using the hack, people should rather have tried to find other ways to achieve their goals.

But there is one argument of the pro-helper people that actually made sense: sometimes a class is not designed well, and you must resort to hacks.

Three hacks

They have all been mentioned before, in several blogs and StackOverflow answers, but I want to recap them here, in one post. All of these still require helper classes, as only there, you can still access private methods or members.

Accessing private methods

Say we have a class, let's call it THasPrivateMethods, that has at least one private method, let's say procedure PrivateMethod, that you want to call. The following three ways to call the private method have all been verified in Delphi 10.2.2 Tokyo. They should also work in previous versions.

Using assembler

The first way to get at private methods is to define a helper class with a new method. Here, I'll call the helper THelper and the first method CallingPrivateMethodUsingAsm. This method must be an assembler method:

type
  THelper = class helper for THasPrivateMethods
  public
    procedure CallingPrivateMethodUsingAsm;
  end;

The implementation is extremely simple:

{$STACKFRAMES OFF}
procedure THelper.CallingPrivateMethodUsingAsm;
asm
       JMP     THasPrivateMethods.PrivateMethod
end;

In other words, in assembler, helper methods can still access private methods.

Using a method pointer

For those who are not very good with assembler, there is another, more tedious way, using a method pointer:

    procedure CallingPrivateMethodUsingTMethod; inline;

The implementation takes the address of the private method (which can apparently still be queried), puts it in a method pointer and then calls the method pointer:

type
  TPrivateMethod = procedure of object;

...
  
procedure THelper.CallingPrivateMethodUsingTMethod;
var
  M: TMethod;
begin
  M.Code := @THasPrivateMethods.PrivateMethod;
  M.Data := Self;
  TPrivateMethod(M);
end;

Using with

Inside a method of a class helper, you can't call a private method directly anymore, but you can use a simple trick (probably forgotten when they closed the loophole), using "with Self do". I am no fan of "with", but in this special case, it works as expected:

procedure THelper.CallingPrivateMethodUsingWith;
begin
  with Self do PrivateMethod;
end;

Yes, it is that easy. But you must use "with Self do" to call the private method. Now, if you inline this method, there is absolutely no overhead. It will work as if you had called the private method in your own routine:

procedure NeedsAccessToPrivateMethod;
var
  H: THasPrivateMethods;
begin
  H := THasPrivateMethods.Create;
  try
    // ...
    H.CallingPrivateMethodUsingWith;
    // ...
  finally
    H.Free;
  end;
 end;
 

Again, if CallingPrivateMethodUsingWith is inlined, the resulting code is the same as if you had written:

  try
    // ...
    H.PrivateMethod;
    // ...
  finally

If you can read assembler, you can verify this in the CPU view of the IDE.

I will discuss accessing private fields in a later post.

Monday, 15 January 2018

Strings too slow outside WIN32?

In a recent debate I had it was said that strings in the Win64 runtime are too slow to be useful. That is, in my opinion, a gross exaggeration. It is true that the Win32 runtime library (RTL) has benefited a lot from the work of the FastCode project, usually with routines in extremely clever assembler. For all other platforms, often the routines are in plain Object Pascal, so no assembler is being used. Also, far fewer routines have been replaced by clever implementations.

One very obvious example of this is the Pos function, which searches if a certain string (I call that the Needle) can be found in a larger one (the Haystack). The Win32 implementation is in highly optimized assembler, written by Aleksandr Sharahov from the FastCode project, and licensed by CodeGear. The Win64 implementation is in plain Pascal (PUREPASCAL). But the implementation for UnicodeString is not the same, or even similar, to the implementation for AnsiString!

The implementation for UnicodeString is slower than the same routine for Win32. On my system a search in Win64 takes approx. 1.8 × the time it needs in Win32. On Win32, Pos for AnsiString is about as fast (or sometimes even slightly faster than) Pos for UnicodeString. But on Win64, Pos for AnsiString takes 2 × the time Pos for UnicodeString needs!

If you look at the sources in System.pas, you'll see that the Unicode version is slightly better optimized (searching for the first Char in the Needle first, and only checking the rest if a match was found).

For fun, I took the code for the UnicodeString implementation and converted it to work for AnsiString. It was slightly faster than System.Pos for UnicodeString, instead of 2 times as slow. I wonder why, in System.pas, the AnsiString implementation does not simply use the same code as that for UnicodeString, like I did. If I were a suspicious person, I would think it was done on purpose, to deprecate AnsiString by making it less usable.

But even that can be improved upon. I wrote three implementations of my own routine, one for AnsiString, one for UnicodeString and one for TBytes (many people have complained that TBytes lacks something like Pos and that was the reason they maintained the incredibly bad habit of using strings to store binary data — <shudder> — I wanted to take away that silly argument).

Code

Here is the code for my RVPosExA function (for what it's worth: these days, there is no difference between PosEx and Pos anymore: both have the exact same functionality and signature):

function RVPosExA(const Needle, Haystack: AnsiString; 
  Offset: Integer = 1): Integer;
type
  PUInt32 = ^UInt32;
  PUInt16 = ^UInt16;
{$IFNDEF CPU32BITS}
var
  LNeedleTip: UInt32;
  PNeedle: PAnsiChar;
  PHaystack, PEnd: PAnsiChar;
  LLenNeedle: Integer;
  LCmpMemOffset: Integer;
{$ENDIF}
begin
{$IFDEF CPU32BITS}
  // FastCode (asm) implementation.
  Result := System.Pos(Needle, Haystack, Offset); 
{$ELSE}
  if Offset - 1 + Length(Needle) > Length(Haystack) then
    Exit(0);
  Result := 0;
  PHaystack := PAnsiChar(Haystack) + Offset - 1;
  PEnd := PHaystack + Length(Haystack) - Length(Needle) + 1;
  case Length(Needle) of
    0: Exit(0);
    1:
      begin
        LNeedleTip := PByte(Needle)^;
        while PHaystack < PEnd do
          if PByte(PHaystack)^ = LNeedleTip then
            Exit(PHaystack - PAnsiChar(Haystack) + 1)
          else
            Inc(PHaystack);
        Exit(0);
      end;
    2:
      begin
        LNeedleTip := PUInt16(Needle)^;
        while PHaystack < PEnd do
          if PUInt16(Haystack)^ = LNeedleTip then
            Exit(PHayStack - PAnsiChar(Haystack) + 1)
          else
            Inc(PHaystack);
        Exit(0);
      end;
    3:
      begin
        LNeedleTip := PUInt32(Needle)^; // if Needle is length 3, then top byte
                                        // is the #0 terminator
        while PHaystack < PEnd do
          if ((PUInt32(Haystack)^ xor LNeedleTip) and $FFFFFF) = 0 then
            Exit(PHaystack - PAnsiChar(Haystack) + 1)
          else
            Inc(PHaystack);
        Exit(0);
      end;
    4:
      begin
        LNeedleTip := PUInt32(Needle)^;
        while PHaystack < PEnd do
          if PUInt32(Haystack)^ = LNeedleTip then
            Exit(PHaystack - PAnsiChar(Haystack) + 1)
          else
            Inc(PHaystack);
        Exit(0);
      end;
    else
      begin
        LCmpMemOffset := SizeOf(UInt32) div SizeOf(AnsiChar);
        PNeedle := PAnsiChar(Needle) + LCmpMemOffset;
        LLenNeedle := Length(Needle) - LCmpMemOffset;
        LNeedleTip := PUInt32(Needle)^;
        while PHaystack < PEnd do
          if (PUInt32(PHaystack)^ = LNeedleTip) and 
              CompareMem(PHaystack + LCmpMemOffset, PNeedle, LLenNeedle) then
            Exit(PHaystack - PAnsiChar(Haystack) + 1)
          else
            Inc(PHaystack);
      end;
  end;
{$ENDIF}
end;

As you can see, under Win32, it simply jumps to System.Pos, as that is the fastest anyway. But on all other platforms, it searches the Haystack 4-byte-wise (if the Needle is larger than 4 elements), and if it found something, then it searches the rest using CompareMem.

Timing

Here is a slightly reformatted output of a test program (I put the WIN32 and the WIN64 columns beside each other, to save space):

Different versions of Pos(Needle, Haystack: <sometype>; Offset: Integer): Integer
where <sometype> is UnicodeString, AnsiString or TBytes

Testing with Haystack lengths of 50, 200, 3000, 4000 and 300000
and Needle lengths of 1, 3, 8 and 20
5 * 4 * 2000 = 40000 loops

WIN64                                   WIN32

UnicodeString                           UnicodeString
-------------                           -------------
System.Pos:           2428 ms           System.Pos:           1051 ms
StrUtils.PosEx:       2258 ms           StrUtils.PosEx:       1070 ms
RVPosExU:             1071 ms           RVPosExU:             1050 ms

AnsiString                              AnsiString
----------                              ----------
System.Pos:           4956 ms           System.Pos:           1046 ms
AnsiStrings.PosEx:    4959 ms           AnsiStrings.PosEx:    1051 ms
OrgPosA:              5129 ms           OrgPosA:              5712 ms
PosUModForA:          1958 ms           PosUModForA:          3744 ms
RVPosExA:             1322 ms           RVPosExA:             1086 ms

TBytes                                  TBytes
------                                  ------
RVPosEXB:              998 ms           RVPosEXB:             2754 ms


Haystack: random string of 500000000 ASCII characters or bytes
Needle: last 10 characters of Haystack = 'WRDURJVDFA'

WIN64                                   WIN32

UnicodeString                           UnicodeString
-------------                           -------------
System.Pos:            847 ms           System.Pos:            421 ms
Strutils.PosEx:        827 ms           Strutils.PosEx:        414 ms
RVPosExU:              421 ms           RVPosExU:              438 ms

AnsiString                              AnsiString
----------                              ----------
System.Pos:           1735 ms           System.Pos:            428 ms
AnsiStrings.PosEx:    1831 ms           AnsiStrings.PosEx:     428 ms
OrgPosA:              1749 ms           OrgPosA:              2687 ms
PosUModForA:           708 ms           PosUModForA:          1525 ms
RVPosExA:              368 ms           RVPosExA:              423 ms
RvPosExA(,,Offset):    200 ms           RvPosExA(,,Offset):    220 ms

TBytes                                  TBytes
------                                  ------
RVPosExB(TBytes):      385 ms           RVPosExB(TBytes):     1095 ms

The routines RVPosExA, RVPosExU and RVPosExB are my implementations for AnsiString, UnicodeString and TBytes respectively. OrgPosA is the original code for Pos for AnsiString, while PosUModForA is the original PUREPASCAL code for Pos for UnicodeString, modified for AnsiString.

As you can see, the PosUModForA routine is almost twice as fast as the rather braindead OrgPosA, and in WIN32, the RVPosEx<A/U/B> implementations are faster than the others.

I didn't check, but it is well possible that one of the plain Pascal versions of the FastCode project is faster. But for me, this implementation is a start and proof, that with a few simple optimizations string routines could be made faster. Perhaps, one day, Embarcadero will adopt more of the plain Pascal code from the FastCode project.

The code for the routines and the program that produces the output above can be downloaded from my website.