Channel: Planet Object Pascal

DelphiTools.info: Delphi array constructors performance (or lack of)

In Delphi you can initialize a dynamic array in two ways, either manually or via the Create magic constructor:

type
   TIntegerArray = array of Integer;

procedure Test;
var 
   a : TIntegerArray;
begin
   // magic constructor
   a := TIntegerArray.Create(1, 2);

   // manual creation
   SetLength(a, 2);
   a[0] := 1;
   a[1] := 2;
end;

The outcome is the same in both cases, but are all things equal?

Some array initializations are more equal than others

The first method is less verbose in code, but quite a bit less efficient. If you check the CPU view, that becomes obvious:

TestUnit.pas.32: a := TIntegerArray.Create(1, 2);
00511335 8D45F8           lea eax,[ebp-$08]
00511338 8B15F0125100     mov edx,[$005112f0]
0051133E E89576EFFF       call @DynArrayClear   // anybody knows why?
00511343 6A02             push $02
00511345 8D45F8           lea eax,[ebp-$08]
00511348 B901000000       mov ecx,$00000001
0051134D 8B15F0125100     mov edx,[$005112f0]
00511353 E87476EFFF       call @DynArraySetLength
00511358 83C404           add esp,$04
0051135B 8B45F8           mov eax,[ebp-$08]
0051135E C70001000000     mov [eax],$00000001
00511364 8B45F8           mov eax,[ebp-$08]
00511367 C7400402000000   mov [eax+$04],$00000002
0051136E 8B55F8           mov edx,[ebp-$08]
00511371 8D45FC           lea eax,[ebp-$04]
00511374 8B0DF0125100     mov ecx,[$005112f0]
0051137A E89576EFFF       call @DynArrayAsg

// Manual initialization
TestUnit.pas.35: SetLength(a, 2);
0051137F 6A02             push $02
00511381 8D45FC           lea eax,[ebp-$04]
00511384 B901000000       mov ecx,$00000001
00511389 8B15F0125100     mov edx,[$005112f0]
0051138F E83876EFFF       call @DynArraySetLength
00511394 83C404           add esp,$04
TestUnit.pas.36: a[0] := 1;
00511397 8B45FC           mov eax,[ebp-$04]
0051139A C70001000000     mov [eax],$00000001
TestUnit.pas.37: a[1] := 2;
005113A0 8B45FC           mov eax,[ebp-$04]
005113A3 C7400402000000   mov [eax+$04],$00000002

Now, before you complain about the compiler’s capabilities, you’ve got to realize that the two ways of initializing a dynamic array are not equivalent:

  • the magic constructor creates an array, then assigns it, so the array variable is always in a well-defined state
  • the manual initialization mutates the array in several steps, so the array is in an unfinished state during the intermediate steps

Of course, in the limited Test procedure, the compiler could figure out that the array isn’t visible from the outside, and thus use the shorter form, but that’s an optimization that would apply only to local variables.

A more generic optimization would be to have the compiler waive the temporary array when the array being initialized isn’t referenced anywhere else (so intermediate states don’t matter). That’s possible given that dynamic arrays are reference-counted.

Overhead in detail

The final outcome is that using the Create magic constructor can incur quite a bit of overhead:

  • a DynArrayClear call (not sure why it’s there), that will release the previously assigned block of memory for the temporary array
  • a DynArraySetLength, that will allocate a new block of memory and zero it
  • a DynArrayAsg, that will trigger the release of the memory for the existing array (if it wasn’t empty), along with a bus lock for the reference count update
  • extra initialization and finalization for the temporary array

In multi-threaded applications, all that extra memory management and bus locking is going to have a disproportionate impact on performance. If you test the above snippets in a multi-threaded environment, you’ll notice that when using the array constructor, execution quickly becomes single-threaded, bottlenecking on the memory manager and bus locks.

The manual initialization only has a single DynArraySetLength call, and if the array is not empty, this may not result in a new block being allocated (as the existing memory block could just be resized in place). So if you initialize the same array variable more than once, the manual form can be quite cheap.
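
If you want to check the difference on your own machine, a minimal single-threaded timing sketch could look like the following. It assumes a Delphi version that ships TStopwatch in System.Diagnostics; the program name, the TimeBoth procedure and the cIterations count are made up for illustration, and the numbers you get will depend on your compiler version and memory manager.

program ArrayInitTiming;
{$APPTYPE CONSOLE}

uses
   System.SysUtils, System.Diagnostics;

type
   TIntegerArray = array of Integer;

const
   cIterations = 10000000; // arbitrary count, tune it for your machine

procedure TimeBoth;
var
   a : TIntegerArray;
   i : Integer;
   sw : TStopwatch;
begin
   // magic constructor: a temporary array plus an assignment on every pass
   sw := TStopwatch.StartNew;
   for i := 1 to cIterations do
      a := TIntegerArray.Create(1, 2);
   Writeln('constructor: ', sw.ElapsedMilliseconds, ' ms');

   // manual creation: the same array variable is reused on every pass
   sw := TStopwatch.StartNew;
   for i := 1 to cIterations do begin
      SetLength(a, 2);
      a[0] := 1;
      a[1] := 2;
   end;
   Writeln('manual: ', sw.ElapsedMilliseconds, ' ms');
end;

begin
   TimeBoth;
end.

This only measures the single-threaded cost. To see the bottleneck described above, you would run the same loops from several threads at once.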

A better array initializer?

Now that I’ve shown you the magic array Create constructor is no good, what if you still want something convenient? Well, open arrays can come to the rescue:

procedure InitializeArray(var a : TIntegerArray; const values : array of Integer);
begin
   SetLength(a, Length(values));
   if Length(values)>0 then
      Move(values[0], a[0], Length(values)*SizeOf(Integer));
end;
...
InitializeArray(a, [1, 2]);

The above function won’t be as efficient as manual initialization: there is an extra function call and the values will be copied twice. However, it eliminates all the extra memory management and bus locking, so it will scale much better in multi-threaded code, while staying compact both syntax-wise and code-wise.

Note that for a managed type (String, Interface…) System.Move can’t be used. You’ll need either asm hackery or a for-to-do loop with item-by-item assignment, which will incur a performance hit and often make it non-competitive with the manual version.
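
As a sketch of the for-to-do variant for a managed type, here is what it could look like for strings. TStringArray and InitializeStringArray are names I made up for this example, they are not part of the RTL.

program StringArrayInit;
{$APPTYPE CONSOLE}

type
   TStringArray = array of String;

procedure InitializeStringArray(var a : TStringArray; const values : array of String);
var
   i : Integer;
begin
   SetLength(a, Length(values));
   // item-by-item assignment keeps the string reference counts correct,
   // which a raw Move would bypass
   for i := 0 to High(values) do
      a[i] := values[i];
end;

var
   s : TStringArray;
begin
   InitializeStringArray(s, ['alpha', 'beta']);
   Writeln(Length(s), ' ', s[0], ' ', s[1]);
end.

When values is empty, High(values) is -1, so the loop body never runs and no extra length check is needed.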

Need even more speed?

In the grand scheme of things, however, all the above approaches suffer from the SetLength call, which is quite complex (have a look at DynArraySetLength in the System.pas unit… and weep). So if there is a chance the dynamic array already has the right length, you can gain in the manual version by doing:

if Length(a)<>Length(values) then
   SetLength(a, Length(values));

When the SetLength call is waived, this can net you a mind-boggling speedup of more than 10x (ten times).
Ouch! Why doesn’t the RTL do that?

Well, it doesn’t do that because it can’t: Delphi’s dynamic arrays are a kind of hybrid, halfway between a value type and a reference type, and SetLength is the keystone where all the hybridization happens, since it also makes sure the array is uniquely referenced, copying it if it is shared (for more on the subject, see Dynamic Arrays as Reference or Value Type).
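
The hybrid behavior is easy to see in a few lines. This is only an illustrative sketch (the program name is made up): two variables share one array until SetLength is called on one of them, even though the length passed is unchanged.

program SetLengthUnique;
{$APPTYPE CONSOLE}

type
   TIntegerArray = array of Integer;

var
   a, b : TIntegerArray;
begin
   a := TIntegerArray.Create(1, 2);
   b := a;                     // b references the same memory block as a
   b[0] := 99;                 // no copy-on-write: a[0] is now 99 as well
   Writeln(a[0]);              // 99

   SetLength(b, Length(b));    // same length, but b now gets its own copy
   b[0] := 5;                  // a is left untouched this time
   Writeln(a[0], ' ', b[0]);   // 99 5
end.

Guarding that SetLength with a length test would skip the copy, and the last assignment would then modify a as well. So the length test is only safe when you know the array isn’t shared.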

And FWIW, in DWScript, arrays are first-class reference types, which means they can have more capabilities, and their initialization syntax is also more compact. The above initialization is just:

a := [1, 2];

And if you’re using Smart Pascal and running it in Chrome V8 or node.js, well, let’s just say you’ll need to use all the above tricks for Delphi to come ahead performance-wise.

