To avoid duplication of generated code, Embarcadero's compiler engineers have done a nice job. They introduced new intrinsics like IsManagedType, GetTypeKind and IsConstValue (see this Stack Overflow answer), so that a function like the following generates a direct call to the exact helper function for the parametric type. This means that the code below "runs" completely inside the compiler, i.e. it is evaluated at compile time, even the code in the called InternalAddMRef and similar dispatching routines.
function TList<T>.Add(const Value: T): Integer;
begin
  if IsManagedType(T) then
  begin
    if (SizeOf(T) = SizeOf(Pointer)) and (GetTypeKind(T) <> tkRecord) then
      Result := FListHelper.InternalAddMRef(Value, GetTypeKind(T))
    else if GetTypeKind(T) = TTypeKind.tkVariant then
      Result := FListHelper.InternalAddVariant(Value)
    else
      Result := FListHelper.InternalAddManaged(Value);
  end
  else
  case SizeOf(T) of
    1: Result := FListHelper.InternalAdd1(Value);
    2: Result := FListHelper.InternalAdd2(Value);
    4: Result := FListHelper.InternalAdd4(Value);
    8: Result := FListHelper.InternalAdd8(Value);
  else
    Result := FListHelper.InternalAddN(Value);
  end;
end;
That, in turn, means that if you write something like:
var
  List: TList<string>;
begin
  List := TList<string>.Create;
  List.Add('Hello');
then
List.Add('Hello');
is in fact directly compiled as
List.FListHelper.DoAddString('Hello');
In other words, TList<T>.Add has no runtime code of its own: "calling" it (see above) merely generates a direct call to the DoAddString function. Like all the other methods of TListHelper, that is a simple method: not generic, not virtual or dynamic, so unused ones can be eliminated by the linker. This also means there is hardly any duplication of generated code anymore: if another part of the same executable also uses TList<string>.Add, it will call the same function.
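To see the compile-time dispatch in isolation, here is a minimal sketch that copies the branch structure of TList<T>.Add into a standalone console program. It assumes Delphi XE7 or later, where IsManagedType and GetTypeKind are available; TDispatchDemo, Describe<T>, TPoint3 and TNamed are made-up names. Instead of calling a TListHelper routine, Describe<T> returns a string naming the branch that applies to T, which makes it easy to check which one the compiler keeps for a given type.

```pascal
program IntrinsicDispatchDemo;

{$APPTYPE CONSOLE}

// Sketch only: assumes Delphi XE7 or later (IsManagedType and GetTypeKind
// intrinsics). TDispatchDemo and Describe<T> are hypothetical names.

uses
  System.SysUtils, System.TypInfo;

type
  TDispatchDemo = class
    // Same branch structure as TList<T>.Add, but returns a description
    // of the branch instead of calling a TListHelper routine.
    class function Describe<T>: string;
  end;

class function TDispatchDemo.Describe<T>: string;
begin
  // For a given T, every condition below is a compile-time constant, so
  // each instantiation of Describe<T> should keep only one branch.
  if IsManagedType(T) then
  begin
    if (SizeOf(T) = SizeOf(Pointer)) and (GetTypeKind(T) <> tkRecord) then
      Result := 'managed, pointer-sized (string, interface, dynamic array)'
    else if GetTypeKind(T) = tkVariant then
      Result := 'Variant'
    else
      Result := 'managed, other (e.g. record with managed fields)';
  end
  else
    case SizeOf(T) of
      1, 2, 4, 8: Result := Format('unmanaged, SizeOf = %d', [SizeOf(T)]);
    else
      Result := 'unmanaged, other size';
    end;
end;

type
  TPoint3 = record X, Y, Z: Integer; end; // unmanaged, 12 bytes
  TNamed = record Name: string; end;      // managed record, pointer-sized

begin
  Writeln('string:  ', TDispatchDemo.Describe<string>());
  Writeln('Variant: ', TDispatchDemo.Describe<Variant>());
  Writeln('TNamed:  ', TDispatchDemo.Describe<TNamed>());
  Writeln('Byte:    ', TDispatchDemo.Describe<Byte>());
  Writeln('Double:  ', TDispatchDemo.Describe<Double>());
  Writeln('TPoint3: ', TDispatchDemo.Describe<TPoint3>());
end.
```

TNamed is managed and pointer-sized, but as a record it is excluded from the first branch by the tkRecord test, just as in TList<T>.Add, so it ends up in the generic managed case.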
No code duplication?
Well, this means that if the class is coded like above, there is no unnecessary duplication of generated code anymore. That is cool and can probably reduce the expected bloat caused by using generics.
But it also greatly reduces one of the advantages associated with generics: no unnecessary duplication of source code. You write a routine once, and only the type it works on is parametrized; the compiler takes care of generating code for each routine and parametric type actually used.
But relying on that alone brings back the dreaded bloat. So you either duplicate a lot of source code (take a look at the code of TListHelper in System.Generics.Collections to see what I mean; for example, the functions DoAddWideString, DoAddDynArray, DoAddInterface and DoAddString contain exactly the same code, except for the lines where FItems^ is cast to a different pointer type in each), which is almost as much work as writing a separate TList for each type, or you use the naïve approach and write the code only once, as generics are meant to be used. But then you can get bloat again: it is quite possible that the same routine gets generated multiple times, in different units of the same executable.
What to do?
There is not much we can do, except emphatically ask Embarcadero to move the kind of logic that TList<T> and other generic classes implement to avoid duplicated generated code into the compiler and linker, so we neither have to write huge helper classes nor get the bloat.
In the meantime, you can do what Embarcadero did: if your class is used a lot, for many different types, do a lot of "copy and paste generics". Otherwise, if you only have to deal with a few types in a few places, simply use the naïve approach. Or use a hybrid approach: use the intrinsics mentioned above and implement helper functions only for the types you need most.
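A rough sketch of such a hybrid follows, assuming Delphi XE7 or later (for the intrinsics and for dynamic array literals). TSearch, IndexOf<T>, IndexOfGeneric<T>, IndexOfInt32 and IndexOfString are hypothetical names, not RTL routines. Arrays of 4-byte integers and of strings are routed to non-generic helpers, which exist only once in the executable; every other type falls back to a naïve generic loop.

```pascal
program HybridGenericsSketch;

{$APPTYPE CONSOLE}

// Sketch only; assumes Delphi XE7 or later (IsManagedType/GetTypeKind
// intrinsics, dynamic array literals). All names below are hypothetical.

uses
  System.SysUtils, System.TypInfo, System.Generics.Defaults;

// Non-generic helpers: each exists only once in the executable, no matter
// how many units call TSearch.IndexOf<T> with Integer or string.
function IndexOfInt32(Items: PInteger; Count: Integer; Value: Integer): Integer;
var
  I: Integer;
begin
  for I := 0 to Count - 1 do
  begin
    if Items^ = Value then
      Exit(I);
    Inc(Items); // advances by SizeOf(Integer)
  end;
  Result := -1;
end;

function IndexOfString(Items: PString; Count: Integer; const Value: string): Integer;
var
  I: Integer;
begin
  for I := 0 to Count - 1 do
  begin
    if Items^ = Value then
      Exit(I);
    Inc(Items); // advances by SizeOf(string), i.e. one pointer
  end;
  Result := -1;
end;

type
  TSearch = class
  private
    // Naive generic fallback, instantiated separately for every other T.
    class function IndexOfGeneric<T>(const Items: TArray<T>; const Value: T): Integer;
  public
    class function IndexOf<T>(const Items: TArray<T>; const Value: T): Integer;
  end;

class function TSearch.IndexOfGeneric<T>(const Items: TArray<T>; const Value: T): Integer;
var
  I: Integer;
  Comparer: IEqualityComparer<T>;
begin
  Comparer := TEqualityComparer<T>.Default;
  for I := 0 to High(Items) do
    if Comparer.Equals(Items[I], Value) then
      Exit(I);
  Result := -1;
end;

class function TSearch.IndexOf<T>(const Items: TArray<T>; const Value: T): Integer;
begin
  // The conditions are compile-time constants for a given T, so each
  // instantiation should only contain the one call that applies.
  if (GetTypeKind(T) = tkInteger) and (SizeOf(T) = SizeOf(Integer)) then
    Result := IndexOfInt32(PInteger(Pointer(Items)), Length(Items), PInteger(@Value)^)
  else if GetTypeKind(T) = tkUString then
    Result := IndexOfString(PString(Pointer(Items)), Length(Items), PString(@Value)^)
  else
    Result := IndexOfGeneric<T>(Items, Value);
end;

var
  Nums: TArray<Integer>;
  Names: TArray<string>;
  Reals: TArray<Double>;
begin
  Nums := [3, 1, 4, 1, 5];
  Names := ['Ada', 'Niklaus', 'Anders'];
  Reals := [2.5, 0.5];
  Writeln(TSearch.IndexOf<Integer>(Nums, 4));        // 2, via IndexOfInt32
  Writeln(TSearch.IndexOf<string>(Names, 'Anders')); // 2, via IndexOfString
  Writeln(TSearch.IndexOf<Double>(Reals, 0.5));      // 1, via IndexOfGeneric<Double>
end.
```

The two helpers differ only in the pointer type they walk, which is the same kind of "copy and paste generics" found in TListHelper. Floating-point types are deliberately left to the generic fallback: a bitwise comparison would, for instance, treat 0.0 and -0.0 as different, although they compare equal as Double values.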
Let's hope they get a lot of requests and implement these things in the compiler and linker, as other languages with generics, such as C# and C++, do.
Rudy Velthuis
 