To avoid duplication of generated code, the compiler engineers at Embarcadero have done a nice job. They introduced new intrinsics like IsManagedType, GetTypeKind and IsConstValue (see this Stackoverflow answer), so they could make a function like the following generate a call to the exact function for the parametric type directly. This means that the code below "runs" completely inside the compiler, even the code in the called InternalAddMRef and similar dispatching routines.
    function TList<T>.Add(const Value: T): Integer;
    begin
      if IsManagedType(T) then
      begin
        if (SizeOf(T) = SizeOf(Pointer)) and (GetTypeKind(T) <> tkRecord) then
          Result := FListHelper.InternalAddMRef(Value, GetTypeKind(T))
        else if GetTypeKind(T) = TTypeKind.tkVariant then
          Result := FListHelper.InternalAddVariant(Value)
        else
          Result := FListHelper.InternalAddManaged(Value);
      end
      else
        case SizeOf(T) of
          1: Result := FListHelper.InternalAdd1(Value);
          2: Result := FListHelper.InternalAdd2(Value);
          4: Result := FListHelper.InternalAdd4(Value);
          8: Result := FListHelper.InternalAdd8(Value);
        else
          Result := FListHelper.InternalAddN(Value);
        end;
    end;
That, in turn, means that if you write something like:

    var
      List: TList<string>;
    begin
      List := TList<string>.Create;
      List.Add('Hello');

then

    List.Add('Hello');

is in fact directly compiled as

    List.FListHelper.DoAddString('Hello');

i.e. TList<T>.Add does not have any runtime code at all; the result of "calling" it (see above) is the generation of a direct call to the DoAddString function. That is a simple method, not generic, not virtual or dynamic, like all the other methods of TListHelper, so the unused functions can be eliminated by the linker. This also means there is hardly any duplication of generated code anymore, i.e. if another part of the same executable also uses TList<string>.Add, it will use the same function.
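To get a feel for how these intrinsics behave, here is a minimal sketch of my own. TKindOf and Describe are hypothetical names, not part of the RTL, and the sketch assumes a compiler version that has the intrinsics. For any given T, every condition in Describe is a constant known to the compiler, so each instantiation should be left with a single assignment.

```pascal
program IntrinsicsSketch;

{$APPTYPE CONSOLE}

uses
  System.TypInfo;

type
  // Hypothetical helper for illustration only; not part of the RTL.
  TKindOf = class
  public
    class function Describe<T>: string; static;
  end;

class function TKindOf.Describe<T>: string;
begin
  // IsManagedType, GetTypeKind and SizeOf are all evaluated at compile
  // time for a concrete T, so the branches that do not apply can be dropped.
  if IsManagedType(T) then
  begin
    if GetTypeKind(T) = tkUString then
      Result := 'managed, UnicodeString'
    else
      Result := 'managed, other kind';
  end
  else
    case SizeOf(T) of
      1: Result := 'unmanaged, 1 byte';
      2: Result := 'unmanaged, 2 bytes';
      4: Result := 'unmanaged, 4 bytes';
      8: Result := 'unmanaged, 8 bytes';
    else
      Result := 'unmanaged, other size';
    end;
end;

begin
  Writeln(TKindOf.Describe<string>);   // takes the managed branch
  Writeln(TKindOf.Describe<Integer>);  // takes the 4-byte branch
  Writeln(TKindOf.Describe<Byte>);     // takes the 1-byte branch
end.
```

Whether the dead branches really disappear is best checked in the CPU view of your own compiler version; the sketch only shows the shape of the dispatch.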
No code duplication?
Well, this means that if the class is coded like above, there is no unnecessary duplication of generated code anymore. That is cool and can probably reduce the expected bloat caused by using generics.
But it also means that it greatly reduces one of the advantages associated with generics: no unnecessary duplication of source code. You write a routine once, and only the type it works on is parametrized. The compiler takes care of generating code for each used routine and parametric type.
But then you still get the dreaded bloat. So you either duplicate a lot of source code (take a look at the code of TListHelper in System.Generics.Collections to see what I mean; for example, the functions DoAddWideString, DoAddDynArray, DoAddInterface and DoAddString contain exactly the same code, except for the lines where FItems^ is cast to a different pointer type in each), which is almost as much work as writing a separate TList for each type, or you use the naïve approach, writing code only once, as generics should actually be used. But then you could get bloat again, i.e. it is quite possible that the same routine gets generated multiple times, in different units of the same executable.
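For comparison, the naïve approach could look like the following sketch. TNaiveList is a hypothetical class of my own, reduced to the bare minimum. The source is written once, but the compiler generates a separate Add for every type the list is instantiated with.

```pascal
program NaiveSketch;

{$APPTYPE CONSOLE}

type
  // Hypothetical minimal list: one piece of source code for all types.
  TNaiveList<T> = class
  private
    FItems: TArray<T>;
    FCount: Integer;
  public
    function Add(const Value: T): Integer;
    property Count: Integer read FCount;
  end;

function TNaiveList<T>.Add(const Value: T): Integer;
begin
  // Grow the backing array when it is full.
  if FCount = Length(FItems) then
    SetLength(FItems, FCount * 2 + 4);
  FItems[FCount] := Value;
  Result := FCount;
  Inc(FCount);
end;

var
  Strings: TNaiveList<string>;
  Ints: TNaiveList<Integer>;
begin
  Strings := TNaiveList<string>.Create;
  Ints := TNaiveList<Integer>.Create;
  try
    // Two instantiations, so two generated versions of Add.
    Strings.Add('Hello');
    Ints.Add(42);
    Writeln(Strings.Count, ' ', Ints.Count);
  finally
    Ints.Free;
    Strings.Free;
  end;
end.
```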
What to do?
There is not much we can do, except to emphatically ask Embarcadero to move the kind of logic that was implemented in TList<T> and other generic classes to avoid duplication of generated code into the compiler and linker, so we neither have to write huge helper classes nor get the bloat.
In the meantime, you can do what Embarcadero did: if your class is used a lot for many different types, do a lot of "copy and paste generics". Otherwise, if you only have to deal with a few types and in a few places, simply use the naïve approach. Or you use a hybrid approach, using the intrinsics mentioned above and implementing the most used helper functions for the types you need most.
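A hybrid could be sketched roughly as follows. All names here (TScanHelper, TArrayScan, IndexOf4, IndexOfGeneric) are hypothetical and only illustrate the idea: a non-generic helper serves the types you use most, here 4-byte integer types, and everything else falls back to plain generic code. The front end is marked inline on the assumption that the call site then collapses into a direct call to one of the two routines.

```pascal
program HybridSketch;

{$APPTYPE CONSOLE}

uses
  System.TypInfo, System.Generics.Defaults;

type
  // Hypothetical non-generic helper: its code exists once in the
  // executable, however many 4-byte integer types use it.
  TScanHelper = class
  public
    class function IndexOf4(Data: Pointer; Count: Integer;
      const Value): Integer; static;
  end;

  // Hypothetical generic front end.
  TArrayScan = class
  public
    class function IndexOfGeneric<T>(const Items: TArray<T>;
      const Value: T): Integer; static;
    class function IndexOf<T>(const Items: TArray<T>;
      const Value: T): Integer; static; inline;
  end;

class function TScanHelper.IndexOf4(Data: Pointer; Count: Integer;
  const Value): Integer;
var
  P: PCardinal;
  I: Integer;
begin
  P := Data;
  for I := 0 to Count - 1 do
  begin
    // Bitwise comparison of 4 bytes, which is fine for integer types.
    if P^ = Cardinal(Value) then
      Exit(I);
    Inc(P);
  end;
  Result := -1;
end;

class function TArrayScan.IndexOfGeneric<T>(const Items: TArray<T>;
  const Value: T): Integer;
var
  Comparer: IEqualityComparer<T>;
  I: Integer;
begin
  // Naive path: generated again for every T that ends up here.
  Comparer := TEqualityComparer<T>.Default;
  for I := 0 to High(Items) do
    if Comparer.Equals(Items[I], Value) then
      Exit(I);
  Result := -1;
end;

class function TArrayScan.IndexOf<T>(const Items: TArray<T>;
  const Value: T): Integer;
begin
  // The condition is constant for a given T, so only one call remains.
  if (GetTypeKind(T) = tkInteger) and (SizeOf(T) = 4) then
    Result := TScanHelper.IndexOf4(Pointer(Items), Length(Items), Value)
  else
    Result := IndexOfGeneric<T>(Items, Value);
end;

var
  Ints: TArray<Integer>;
  Names: TArray<string>;
begin
  Ints := TArray<Integer>.Create(10, 20, 30);
  Names := TArray<string>.Create('a', 'b', 'c');
  Writeln(TArrayScan.IndexOf<Integer>(Ints, 30));  // shared helper, prints 2
  Writeln(TArrayScan.IndexOf<string>(Names, 'b')); // generic fallback, prints 1
end.
```

The more types you route through helpers like IndexOf4, the closer you get to what TListHelper does, and the more "copy and paste" you take on, so where to draw the line depends on how many types and call sites you really have.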
Let's hope they get a lot of requests and implement these things in the compiler and linker, like other languages with generics, e.g. C# or C++, do.
Rudy Velthuis