Tuesday, 30 October 2018

Inline declaration of variables

The upcoming Delphi 10.3 Rio will have an interesting new feature: inline variable declarations. I have permission to blog a little about it.

Marco Cantù already wrote about this new feature. I am not going to explain it, as I think he already did a good job doing that.

I just want to demo one of their main advantages: they are block-local. They are initialized at the point of declaration and finalized at the end of the begin-end block in which they were declared. This means that an interface or other managed variable declared that way is finalized at the end of that block, not at the end of the routine.

To demo this, I made a little talkative interface, which tells us what it is doing, and when it is being created and being destroyed. It also has a name. Here is the code:

type
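  ITalking = interface
    ['{6B0C8A4E-3F1D-4C52-9E7A-2D5B8F0A1C34}'] // arbitrary GUID; the "as ITalking" cast in the demo needs one
    procedure Talk;
  end;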

  TTalking = class(TInterfacedObject, ITalking)
  private
    FName: string;
  public
    constructor Create(const Name: string);
    procedure Talk;
    destructor Destroy; override;
  end;

implementation

{ TTalking }

constructor TTalking.Create(const Name: string);
begin
  FName := Name;
  Writeln(Format('Creating %s', [Name]));
end;

destructor TTalking.Destroy;
begin
  Writeln(Format('Destroying %s', [FName]));
  inherited;
end;

procedure TTalking.Talk;
begin
  Writeln(Format('Hi, %s talking here!', [FName]));
end;

Demo

And here is the demo:

procedure Test;
begin
  var I1: IInterface := TTalking.Create('One'); // one way to declare an interface
  Writeln('// Before block');
  Writeln;
  begin
    Writeln('// Before first inline var...');
    var I2 := TTalking.Create('Two') as ITalking; // the other way: type inference
    I2.Talk;
    var I3: ITalking := TTalking.Create('Three');
    I3.Talk;
    Writeln('// Do something useful here');
  end;
  Writeln;
  Writeln('// After block');
  var I4: ITalking := TTalking.Create('Four');
  Writeln('// After declaration of Four');
end;

And the output is:

Creating One
// Before block

// Before first inline var...
Creating Two
Hi, Two talking here!
Creating Three
Hi, Three talking here!
// Do something useful here
Destroying Three
Destroying Two

// After block
Creating Four
// After declaration of Four
Destroying Four
Destroying One

As you can see, the two interfaces declared and initialized inside the inner block are destroyed at the end of that same block, in reverse order of declaration. The interfaces declared and initialized in the outer block are destroyed at the end of the procedure.

This creates some interesting opportunities, e.g. for RAII using interfaces or other auto-finalized structures. RAII is not only useful for smart pointers. It can also be used for many other things, like locks, mutexes, temporary changes of cursors, etc., i.e. anything that is changed temporarily and must be restored at the end of a certain scope. Just declare the RAII object inside the block and the change will be undone at the end of it. One way to do this is to pass the object an anonymous method that tells it what it must do at the end of the block (or in case of an exception).
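To make the RAII idea a bit more concrete, here is a minimal sketch of such a guard. TScopeGuard, Depth and Test are names I made up for this illustration. They are not part of the RTL, and the sketch assumes a compiler that supports inline variables (10.3 Rio or later).

```pascal
program ScopeGuardDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  // Hypothetical RAII helper: runs the given anonymous method when the
  // last interface reference to the guard is released.
  TScopeGuard = class(TInterfacedObject)
  private
    FOnExit: TProc;
  public
    constructor Create(const OnExit: TProc);
    destructor Destroy; override;
  end;

constructor TScopeGuard.Create(const OnExit: TProc);
begin
  inherited Create;
  FOnExit := OnExit;
end;

destructor TScopeGuard.Destroy;
begin
  if Assigned(FOnExit) then
    FOnExit();
  inherited;
end;

var
  Depth: Integer = 0; // stands in for any state that is changed temporarily

procedure Test;
begin
  begin
    Inc(Depth); // the temporary change
    // The guard is block-local, so it is released at the end of this block.
    var Guard: IInterface := TScopeGuard.Create(
      procedure
      begin
        Dec(Depth); // undo the change
      end);
    Writeln('Inside block, Depth = ', Depth);
  end;
  Writeln('After block, Depth = ', Depth);
end;

begin
  Test;
end.
```

If the guard is finalized at the end of the inner block, as in the demo output above, Depth should be back at its old value by the time the second Writeln runs.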

But more about this later. FWIW, I also like the type inference, and that inline variables can be used in for-loops.
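A tiny sketch of those two extras, type inference and an inline loop variable. SumTo and Demo are made-up names, and again this assumes 10.3 Rio or later.

```pascal
program InlineForDemo;

{$APPTYPE CONSOLE}

// Sums 1..N, using an inline loop variable with an inferred type.
function SumTo(N: Integer): Integer;
begin
  Result := 0;
  for var I := 1 to N do // I only exists inside this loop
    Inc(Result, I);
end;

procedure Demo;
begin
  var Total := SumTo(4); // type inferred from the function result: Integer
  Writeln('Sum of 1..4 = ', Total);
end;

begin
  Demo;
end.
```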

Friday, 5 October 2018

Why I like Stack Exchange

Because you can get extremely interesting answers, like the excellent answer to the question "What is Chirped Pulse Amplification, and why is it important enough to warrant a Nobel Prize?" on Physics Stack Exchange:

The 2018 Nobel Prize in Physics has just been announced, with half going to Arthur Ashkin for his work on optical tweezers and half going to Gérard Mourou and Donna Strickland for developing a technique called "Chirped Pulse Amplification". While normally the Wikipedia page is a reasonable place to turn to, in this case it's pretty flat and not particularly informative. So:

  • What is Chirped Pulse Amplification? What is the core of the method that really makes it tick?
  • What pre-existing problems did its introduction solve?
  • What technologies does it enable, and what research fields have become possible because of it?

I am not a physicist and just know the usual school physics, but I had no problem understanding the answer. Excellent!

Monday, 1 October 2018

The current state of generics in Delphi

To avoid duplication of generated code, the compiler builders at Embarcadero have done a nice job. They introduced new intrinsics like IsManagedType, GetTypeKind and IsConstValue (see this Stack Overflow answer), so that a function like the following generates a call to the exact function for the parametric type directly. This means that the code below "runs" completely inside the compiler, at compile time, even the code in the called InternalAddMRef and similar dispatching routines.

function TList<T>.Add(const Value: T): Integer;
begin
  if IsManagedType(T) then
  begin
    if (SizeOf(T) = SizeOf(Pointer)) and (GetTypeKind(T) <> tkRecord) then
      Result := FListHelper.InternalAddMRef(Value, GetTypeKind(T))
    else if GetTypeKind(T) = TTypeKind.tkVariant then
      Result := FListHelper.InternalAddVariant(Value)
    else
      Result := FListHelper.InternalAddManaged(Value);
  end else
  case SizeOf(T) of
    1: Result := FListHelper.InternalAdd1(Value);
    2: Result := FListHelper.InternalAdd2(Value);
    4: Result := FListHelper.InternalAdd4(Value);
    8: Result := FListHelper.InternalAdd8(Value);
  else
    Result := FListHelper.InternalAddN(Value);
  end;
end;

That, in its turn, means that if you code something like:

var
  List: TList<string>;
begin
  List := TList<string>.Create;
  List.Add('Hello');

then

List.Add('Hello');

is in fact directly compiled as

List.FListHelper.DoAddString('Hello');

i.e. TList<T>.Add does not have any runtime code at all. The result of "calling" it (see above) is the generation of a direct call to the DoAddString function. That is a simple method, not generic, not virtual or dynamic, like all the other methods of TListHelper, so the unused functions can be eliminated by the linker. This also means there is hardly any duplication of generated code anymore: if another part of the same executable also uses TList<string>.Add, it will use the same function.

No code duplication?

Well, this means that if the class is coded like above, there is no unnecessary duplication of generated code anymore. That is cool and can probably reduce the expected bloat caused by using generics.

But it also means that it greatly reduces one of the advantages associated with generics: no unnecessary duplication of source code. You write a routine once, and only the type it works on is parametrized. The compiler takes care of generating code for each used routine and parametric type.

But then you still get the dreaded bloat. So you have two choices. Either you duplicate a lot of source code, which is almost as much work as writing a separate TList for each type. Take a look at the code of TListHelper in System.Generics.Collections to see what I mean: for example, the functions DoAddWideString, DoAddDynArray, DoAddInterface and DoAddString contain exactly the same code, except for the lines where FItems^ is cast to a different pointer type in each of them. Or you use the naïve approach and write the code only once, as generics should actually be used. But then you could get bloat again, i.e. it is quite possible that the same routine gets generated multiple times, in different units of the same executable.

What to do?

There is not much we can do, except emphatically ask Embarcadero to move the kind of logic that was implemented in TList<T> and other generic classes to avoid duplication of generated code into the compiler and linker, so we neither have to write huge helper classes nor get the bloat.

In the meantime, you can do what Embarcadero did: if your class is used a lot for many different types, do a lot of "copy and paste generics". Otherwise, if you only have to deal with a few types and in a few places, simply use the naïve approach. Or you use a hybrid approach, using the intrinsics mentioned above and implementing the most used helper functions for the types you need most.
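Here is a rough sketch of what such a hybrid approach could look like. TSwap, Exchange, Swap4 and Swap8 are names I invented for this example, and I assume a compiler that has the IsManagedType intrinsic (XE7 or later, as far as I know). The generic method only dispatches: unmanaged 4 and 8 byte types share one non-generic helper each, and everything else falls back to the naïve generic code.

```pascal
program HybridGenericsDemo;

{$APPTYPE CONSOLE}

type
  TSwap = class
  private
    // Non-generic helpers: one copy each, shared by all types of that size.
    class procedure Swap4(var A, B); static;
    class procedure Swap8(var A, B); static;
  public
    class procedure Exchange<T>(var A, B: T); static;
  end;

class procedure TSwap.Swap4(var A, B);
var
  Tmp: UInt32;
begin
  Tmp := UInt32(A);
  UInt32(A) := UInt32(B);
  UInt32(B) := Tmp;
end;

class procedure TSwap.Swap8(var A, B);
var
  Tmp: UInt64;
begin
  Tmp := UInt64(A);
  UInt64(A) := UInt64(B);
  UInt64(B) := Tmp;
end;

class procedure TSwap.Exchange<T>(var A, B: T);
var
  Tmp: T;
begin
  // Both tests are on compile-time constants for a given T.
  if not IsManagedType(T) then
    case SizeOf(T) of
      4: begin Swap4(A, B); Exit; end;
      8: begin Swap8(A, B); Exit; end;
    end;
  // Naive fallback: managed types (strings, interfaces, ...) and other sizes.
  Tmp := A;
  A := B;
  B := Tmp;
end;

var
  I1, I2: Integer;
  S1, S2: string;

begin
  I1 := 1;
  I2 := 2;
  TSwap.Exchange<Integer>(I1, I2); // dispatches to Swap4
  Writeln(I1, ' ', I2);

  S1 := 'left';
  S2 := 'right';
  TSwap.Exchange<string>(S1, S2);  // managed type: uses the fallback
  Writeln(S1, ' ', S2);
end.
```

Whether this pays off depends on how many different types the routine is used with. For just a few types, the plain generic version is probably good enough.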

Let's hope they get a lot of requests and implement these things in the compiler and linker, like other languages with generics, e.g. C# or C++, do.

Rudy Velthuis

Friday, 14 September 2018

Making Delphi operator overloads compatible with C++Builder

Delphi operator overloads

A few days ago, I made a simple test program to test my BigIntegers in C++Builder. The important part looked like this:

#include "Velthuis.BigIntegers.hpp"

int _tmain(int argc, _TCHAR* argv[])
{
    BigInteger a = 17;
    BigInteger b = "123";
    BigInteger c = a + b;

But that did not compile. I got the following error message for the line that did the addition:

[bcc32c Error] File13.cpp(16): invalid operands to binary expression ('Velthuis::Bigintegers::BigInteger' and 'Velthuis::Bigintegers::BigInteger')

Now, BigIntegers, as I defined them, have overloads for many arithmetic and relational operators. A sample:

    class operator Add(const Left, Right: BigInteger): BigInteger;
    class operator Subtract(const Left, Right: BigInteger): BigInteger;
    class operator Multiply(const Left, Right: BigInteger): BigInteger;

But taking a look in the (Delphi-)generated Velthuis.BigIntegers.hpp file, I saw that these are declared as simple static functions with an _op_ prefix:

static BigInteger __fastcall _op_Addition(const BigInteger &Left, 
    const BigInteger &Right);
static BigInteger __fastcall _op_Subtraction(const BigInteger &Left, 
    const BigInteger &Right);
static BigInteger __fastcall _op_Multiply(const BigInteger &Left, 
    const BigInteger &Right);

Lines wrapped to make them more readable in this blog

So, to use them, you have to call them explicitly:

    BigInteger c = BigInteger::_op_Addition(a, b);

That is ugly, and unusual for someone used to C++'s operator overloading. It makes using BigIntegers pretty inconvenient. OK, I defined static backup functions for the arithmetic overloads, so in the case of BigIntegers, you can just as well use a more or less Java-style method call:

    BigInteger c = BigInteger::Add(a, b);

C++ operator overloads

I wanted my operator overloads to be usable as in the first code sample. I read up a bit on C++ operator overloading, and came up with the following solution:

inline BigInteger operator +(const BigInteger& left, const BigInteger& right) 
{ return BigInteger::Add(left, right); }

I added that line to the C++ file before the main function, and all of a sudden it compiled and even produced the right result. So I wrote a file called Velthuis.BigIntegers.operators.hpp, with lines similar to the above for all supported operator overloads and included that too:

#include "Velthuis.BigIntegers.hpp"
#include "Velthuis.BigIntegers.operators.hpp"

Ok, that worked. But I didn't find that solution very satisfactory. To use BigIntegers with overloaded operators, you had to include an additional header file. And I could not just include that file in the Velthuis.BigIntegers.hpp file, because that is generated by the Delphi compiler, and any re-generation would overwrite any edits made to it. So what to do?

Including new header in generated header

But then I remembered that there is something like {$HPPEMIT} in Delphi. That allows you to add extra lines of C++ code to a generated .hpp file. So I added:

{$HPPEMIT '#include "Velthuis.BigIntegers.operators.hpp"'}

But that did not compile at all! I looked at the generated .hpp file and saw that the line was added somewhere near the top, well before the declaration of BigInteger. But it had to be included near the bottom of the generated hpp. Fortunately, the online help shows that nowadays (according to Remy Lebeau, even since Delphi 2009 or 2010), you can add an END attribute to the directive:

{$HPPEMIT END '#include "Velthuis.BigIntegers.operators.hpp"'}

And now it was added somewhere near the bottom of the header, well after the declaration of BigInteger. Success! Now the code sample at the top of this post compiled and worked as expected.

Conclusion

To recap: to make Delphi-defined operator overloads usable in C++, you will have to

  • write C++ operator overload wrappers for them,
  • pack those wrappers in a header of their own, and
  • include that header in the generated header for your Delphi unit, using {$HPPEMIT END '#include "NameOfYourHeader"'}.
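Seen from the Delphi side, the recipe could look like the sketch below. The unit Demo.Money, the record TMoney, the function Sum and the header name are all made up for this illustration. They are not taken from my BigIntegers code.

```pascal
unit Demo.Money;

interface

type
  // Hypothetical record with an operator overload and a plain static
  // "backup" function that a C++ wrapper can call by name.
  TMoney = record
    Cents: Int64;
    class operator Add(const Left, Right: TMoney): TMoney;
    class function Sum(const Left, Right: TMoney): TMoney; static;
  end;

// Emitted near the end of the generated Demo.Money.hpp, i.e. after the
// declaration of TMoney. The included header is hand-written and could
// contain a wrapper like:
//   inline TMoney operator +(const TMoney& left, const TMoney& right)
//   { return TMoney::Sum(left, right); }
{$HPPEMIT END '#include "Demo.Money.operators.hpp"'}

implementation

class operator TMoney.Add(const Left, Right: TMoney): TMoney;
begin
  Result.Cents := Left.Cents + Right.Cents;
end;

class function TMoney.Sum(const Left, Right: TMoney): TMoney;
begin
  Result := Left + Right; // same code path as the operator
end;

end.
```

The directive has no effect on the Delphi code itself. It only matters when the compiler generates the .hpp file for C++Builder.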

I added these changes to my GitHub repository DelphiBigNumbers.

Rudy Velthuis

Friday, 9 March 2018

Converting line endings

eolconv

An up-to-date version of this and other files can be found in my CommandLineTools project on GitHub.

On Stack Overflow and in the Embarcadero forums I regularly see people ask how to convert a file with Unix line endings to Windows line endings, or vice versa. The usual advice is to load the file into Notepad++ and to save it with the desired line ending format.

But Delphi can do this too, either using AdjustLineBreaks or using a TStringList. That is why I wrote this simple command line tool. The actual conversion is extremely simple:

  • Load the file into a TStringList.
  • Set the LineBreak property of the string list, depending on the desired output format (Windows or Unix).
  • Convert the file depending on the encoding: 8 bit, 16 bit little endian or 16 bit big endian.
  • Rename the old file to a name with a unique extension.
  • Rename the new file to the name of the old file.
  • Save the contents of the TStringList back to the file.
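The core of the TStringList approach can be sketched as follows. This is not the actual EOLConv code: ConvertText and ConvertFile are hypothetical helpers, the backup extension is hard-coded to .~1, and error handling is minimal.

```pascal
program EolListSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

// Converts the line endings of a string by letting TStringList split
// the text into lines and join them again with a new line break.
function ConvertText(const S, NewLineBreak: string): string;
var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    Lines.Text := S;                 // splits on CR, LF and CR+LF
    Lines.LineBreak := NewLineBreak; // #13#10 for Windows, #10 for Unix
    Result := Lines.Text;            // every line ends with the new line break
  finally
    Lines.Free;
  end;
end;

// Same idea for a file: load, set LineBreak, keep the original under a
// backup name and save the converted contents under the original name.
procedure ConvertFile(const FileName, NewLineBreak: string);
var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    Lines.LoadFromFile(FileName);    // detects the encoding from a BOM, if present
    Lines.LineBreak := NewLineBreak;
    if not RenameFile(FileName, FileName + '.~1') then
      raise Exception.CreateFmt('Cannot rename %s', [FileName]);
    Lines.SaveToFile(FileName, Lines.Encoding);
  finally
    Lines.Free;
  end;
end;

begin
  if ParamCount = 1 then
    ConvertFile(ParamStr(1), #10)    // convert to Unix line endings
  else
    Writeln('Usage: EolListSketch FileName');
end.
```

Note that a TStringList always writes a line break after the last line too, so a file that did not end with one will get one.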

Most of the code is there to handle command line parameters.

Update

The TStringList way is simple to write, but it is pretty slow and, if the file is large, can also require a lot of memory. The new way uses a buffer of up to 2 MB to load blocks of the file and converts them on the fly. This turned out to be a lot faster for large files.
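A minimal sketch of such a block-wise conversion, for 8 bit encodings only, could look like this. It is not the code EOLConv uses. ConvertStream is a made-up name, and the 16 bit cases are left out. The only tricky part is a CR+LF pair that is split over two blocks, which is why PrevCR is kept across reads.

```pascal
program EolStreamSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

// Copies Source to Dest, replacing every CR, LF or CR+LF by NewLine.
procedure ConvertStream(Source, Dest: TStream; const NewLine: TBytes);
const
  BufSize = 2 * 1024 * 1024; // 2 MB input buffer
var
  InBuf, OutBuf: TBytes;
  Count, I, J, OutLen, OutSize: Integer;
  PrevCR: Boolean;
begin
  // Worst case: every input byte is a line break of its own.
  OutSize := BufSize * Length(NewLine);
  if OutSize < BufSize then
    OutSize := BufSize;
  SetLength(InBuf, BufSize);
  SetLength(OutBuf, OutSize);
  PrevCR := False;
  repeat
    Count := Source.Read(InBuf[0], BufSize);
    OutLen := 0;
    for I := 0 to Count - 1 do
    begin
      if (InBuf[I] = 10) and PrevCR then
      begin
        // LF of a CR+LF pair: the line break was already written for the CR.
        PrevCR := False;
        Continue;
      end;
      PrevCR := InBuf[I] = 13;
      if (InBuf[I] = 13) or (InBuf[I] = 10) then
        for J := 0 to High(NewLine) do
        begin
          OutBuf[OutLen] := NewLine[J];
          Inc(OutLen);
        end
      else
      begin
        OutBuf[OutLen] := InBuf[I];
        Inc(OutLen);
      end;
    end;
    if OutLen > 0 then
      Dest.WriteBuffer(OutBuf[0], OutLen);
  until Count = 0;
end;

var
  Src, Dst: TFileStream;

begin
  if ParamCount <> 2 then
  begin
    Writeln('Usage: EolStreamSketch InputFile OutputFile');
    Exit;
  end;
  Src := TFileStream.Create(ParamStr(1), fmOpenRead or fmShareDenyWrite);
  try
    Dst := TFileStream.Create(ParamStr(2), fmCreate);
    try
      ConvertStream(Src, Dst, TBytes.Create(10)); // convert to Unix (LF)
    finally
      Dst.Free;
    end;
  finally
    Src.Free;
  end;
end.
```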

Synopsis

EOLCONV [options ...] Name[s]

Multiple file names can be specified. EOLConv does not understand wildcards (yet), but for instance on the Windows command line, you can do something like:

for %f in (*.txt) do eolconv %f -U

That will convert all files with the extension .txt in the current directory to the Unix format.

Each original file will be preserved by renaming it, adding a unique .~1, .~2, etc. extension to the original file name. EOLConv will try to add extensions up to .~200, and if the file name with that extension exists too, EOLConv will give up and the original file will remain as it was.
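The search for a free backup name could be sketched like this. FindBackupName is a hypothetical helper and not necessarily how EOLConv does it. An empty result means that all 200 names are taken.

```pascal
program BackupNameSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Returns FileName plus the first unused extension .~1 .. .~200,
// or an empty string if all of those files already exist.
function FindBackupName(const FileName: string): string;
var
  I: Integer;
begin
  for I := 1 to 200 do
  begin
    Result := Format('%s.~%d', [FileName, I]);
    if not FileExists(Result) then
      Exit;
  end;
  Result := ''; // give up, leave the original file alone
end;

begin
  if ParamCount = 1 then
    Writeln(FindBackupName(ParamStr(1)))
  else
    Writeln('Usage: BackupNameSketch FileName');
end.
```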

Options

On Windows, all options start with either a / or a -. On other platforms, they start with a -.

Options are case insensitive.

If no options are specified, the files will be converted to the default line ending format for the platform. To set the line ending format to a specific style, use:

  • -W     to set the line endings to Windows format (CR+LF)*
  • -U     to set the line endings to Unix format (LF)*

* CR stands for Carriage Return (#13 in Delphi), LF stands for Line Feed (#10 in Delphi).

To get help, either start eolconv without any parameters, or specify -H or -?.

Finally

EOLConv is freeware. All rights are reserved. Its code is provided as is, expressly without a warranty of any kind. You use it at your own risk.

I hope this code is useful to you. If you use some of it, please credit me. If you modify or improve EOLConv, please send me the modifications.

I may improve or enhance EOLConv myself, and I will try to post changes here. But this is not a promise. Please don’t request features.

Rudy Velthuis

Sunday, 4 March 2018

AutoSuffix

I guess everyone who has had to maintain one package for several different versions of Delphi has wished this were possible with a single package source. The alternative is to have a different set of .dpk and .dproj files, each with a different name or suffix, for each version you want to cover.

Some, including me, devised a "clever" include file that sets a different {$LIBSUFFIX} for each version. But every simple change to the package would remove the {$INCLUDE 'LibSuffixes.inc'} I put in the .dpk file and replace it with something like {$LIBSUFFIX '230'}, if 230 was the suffix found in the .dproj file.

If you are very careful not to modify the package, and the include file is not removed, this compiles as intended, i.e. a file MyPackage.dpk compiles to MyPackage250.bpl. But if you try to install it that way, the IDE is not aware of the changed suffix, so it complains that it could not install MyPackage.bpl.
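For illustration, such an include file could look roughly like the sketch below. It is a reconstruction, not my original LibSuffixes.inc. The suffixes 170, 240 and 250 are the ones mentioned in this post. The other suffixes and the CompilerVersion values are filled in from memory, so check them against your own installations.

```pascal
// LibSuffixes.inc (sketch): selects the package suffix for the current compiler.
{$IF CompilerVersion >= 32.0}
  {$LIBSUFFIX '250'} // Delphi 10.2 Tokyo
{$ELSEIF CompilerVersion >= 31.0}
  {$LIBSUFFIX '240'} // Delphi 10.1 Berlin
{$ELSEIF CompilerVersion >= 30.0}
  {$LIBSUFFIX '230'} // Delphi 10 Seattle
{$ELSEIF CompilerVersion >= 29.0}
  {$LIBSUFFIX '220'} // Delphi XE8
{$ELSEIF CompilerVersion >= 28.0}
  {$LIBSUFFIX '210'} // Delphi XE7
{$ELSEIF CompilerVersion >= 27.0}
  {$LIBSUFFIX '200'} // Delphi XE6
{$ELSEIF CompilerVersion >= 26.0}
  {$LIBSUFFIX '190'} // Delphi XE5
{$ELSEIF CompilerVersion >= 25.0}
  {$LIBSUFFIX '180'} // Delphi XE4
{$ELSEIF CompilerVersion >= 24.0}
  {$LIBSUFFIX '170'} // Delphi XE3
{$IFEND}
```

I used {$IFEND} here instead of {$ENDIF} to close the {$IF}, because the older versions in this list require it.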

Note: The {$LIBSUFFIX} directive works for DLLs too, but there is no entry in the Project → Options... → Description page (there is not even a Description page). The previously mentioned "clever" includes work very well for DLLs (and so for DLL-based experts too). The IDE does not meddle with the contents of a .dpr file for a library.

This problem has been discussed many times on forums and in communities. On Stack Overflow, someone called LaKraven set out to create an IDE expert (AutoSuffix) to solve this problem. But I can't find this expert anymore. All the links I found to it are dead.

The New AutoSuffix Expert

So I wrote a simple expert myself and called it AutoSuffix too.

If the current project is a package, it adds a menu item to the context menu of the Project Manager with the title Add Version Suffix, under the Rename menu item. If you click it, it sets the suffix for the package to a number that matches the suffixes used for the version of the compiler, i.e. for Delphi 10.2 Tokyo, it sets it to 250, for 10.1 Berlin to 240, for XE3 to 170, etc.

I could have made this an expert that sets the suffix automatically, but I wanted to leave the user (myself too) a choice, so it must be done manually, using the context menu.

Tested

I only tested this in Delphi XE3, Delphi 10 Seattle, 10.1 Berlin and 10.2 Tokyo, but it probably works in the versions in between as well. I will update this as soon as I have tested it more.

Installation

Get all the files from GitHub (either clone the repository or download the files as zip) and put them in a single directory. Open the package AutoSuffix.dproj in your IDE, select the project in the Project Manager and click the Install menu item from the context menu. If all is well, this installs the expert and adds one menu item to the context menu of the Project Manager: Add Version Suffix, under the Rename item. This menu item only appears if the project is a package.

Rudy Velthuis

Thursday, 1 February 2018

Optimizing ARC the hard way?

The problem

In a recent blog post, Dalija Prasnikar talks about how hard it would be to optimize ARC code, because most ARC object references will be passed by value, but not as const. Passing as const would eliminate the need for an implicit __ObjAddRef() call each time the object is passed (and an __ObjRelease() call when the routine with the parameter ends).

OK, ARC (automatic reference counting) has been used for many types already, most of all strings, dynamic arrays and interfaces. Code passing strings and dynarrays is generally optimized already, by specifying such parameters as const. There are probably only a few exceptions. It is also customary to do it with interfaces, and the few exceptions probably don't make a big difference in performance. But it is far from customary to do this for objects. Until ARC, it didn't matter if you passed an object as const or not. But these days, in ARC, it can make a big difference. And objects are passed around a lot, e.g. the Sender parameter of events, etc.

Note that most runtime and FMX code doesn't use const to pass objects either, nor do any of the third parties. Dalija notes that it is not hard to add const to each object parameter, but that it would break huge amounts of code, not only Embarcadero's, but everyone's.

Switch?

I read that and have been thinking about it a little. I first thought of introducing a switch (sound familiar?) in the new ARC compilers, that would make const optional and reduce compiler complaints about interface changes, to make this transition easier. But that would still require a huge amount of code changes, even if it could perhaps be done at a slower pace.

Const by default?

But then I thought of something else. If an object is passed as call-by-value, to a method or function, inside that method or function, you very seldom change the reference by assigning a new object to it. In other words, inside such a method or function, you hardly ever do something like:

    ObjectParam.Free; // necessary in non-ARC, optional in ARC
    ObjectParam := TSomeObject.Create;

Yes, you often change the state of the object by setting properties or by calling methods, but that can be done on a const reference as well. Const means you can't change the reference, not that you can't change the state of the object it refers to.

For almost all practical purposes, such objects passed can be treated as if they were passed as const already. So it would perhaps make sense, in a new version of the ARC compilers, to treat every pass-by-value as const. This would make the code compatible with the non-ARC compilers, while still avoiding a truckload of reference counting calls. This would probably optimize ARC code quite a lot. Of course code already using const would be compatible too. And it would not apply to code using var or out, only to plain pass-by-value object reference parameters.

And the very few methods that do actually re-use a parameter like above should simply be rewritten to use a (new) local variable for the new object. I doubt there is a lot of code that does this anyway, so this change would not break a lot of code at all.
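A small sketch of both cases, written for the current non-ARC compilers. TCounter, Bump and BumpedCopy are made-up names. Bump only changes the state of the object, which is fine with a const reference. BumpedCopy needs a new object and therefore uses a local variable instead of re-using the parameter.

```pascal
program ConstObjectParamDemo;

{$APPTYPE CONSOLE}

type
  TCounter = class
  public
    Value: Integer;
  end;

// Changing the state of the object works through a const reference.
procedure Bump(const Counter: TCounter);
begin
  Counter.Value := Counter.Value + 1;
  // Counter := TCounter.Create; // not allowed: the reference itself is const
end;

// A routine that needs a new object uses a local variable for it,
// instead of assigning to the parameter.
function BumpedCopy(const Counter: TCounter): TCounter;
var
  Fresh: TCounter;
begin
  Fresh := TCounter.Create;
  Fresh.Value := Counter.Value + 1;
  Result := Fresh;
end;

var
  C, D: TCounter;

begin
  C := TCounter.Create;
  try
    Bump(C);
    D := BumpedCopy(C);
    try
      Writeln('C.Value = ', C.Value, ', D.Value = ', D.Value);
    finally
      D.Free; // necessary in non-ARC
    end;
  finally
    C.Free;
  end;
end.
```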

So making all passed-by-value objects const by default would probably break very little code, and optimize a lot of ARC code. It would not affect non-ARC code.

As for the old compilers: they would remain non-optimized, but their code would not have to be changed. A simple recompile (and the occasional change) would suffice.

I'd love to hear about your thoughts.