Channel: Planet Object Pascal

From Zero To One: Improved Sliced Array implementation

If you don’t know what a sliced array is, I suggest that you first read the previous post, where I introduced it. There I presented a data structure that is similar to a classical dynamic array but more efficient for deletes and inserts. Adding elements is also a bit faster, because memory allocation is more efficient. The downside was that accessing an item by index was slower than with a classical dynamic array. And because access by index is the most important feature of such a structure, I was not satisfied with the results. So, as usual, I looked for a way to do it better while still being efficient for deletes and inserts. And I found one. First, let’s see what changed from the previous implementation.

[Image: SlicedArray2]

As you can see, a lot changed. I threw out most of the pointers for the doubly linked lists, which were not actually needed. I also changed how each slice is implemented. The biggest problem with the old implementation was that the slices were of variable size. That made it more difficult to find the slice for a random index. I could find it very fast thanks to the tricks I wrote about, but not fast enough. Access by index that is 2-3 times slower is not something I was happy about. Furthermore, using the helper pointer to remember the last accessed slice made even reading thread-unsafe. That was unacceptable. So I thought of another solution. The slices need to be of equal size; that is a must, and there is no way around it if I want to be super efficient. But how do we then handle deletes and inserts? If I have to maintain equal slice size, I have to move items in all slices whenever I delete or insert one item. Well, at least in the slices above the targeted one. That way I gain nothing. Then I thought of a neat solution: I remembered how a typewriter works :)

What if we add some buffer space below and above the slice limits? This way, if we insert or delete an item, we only have to move one item per slice. To demonstrate: if we insert an item into the N-th slice, we have to move the items above the insert position in that slice one place upwards. Then we have to move the last item of the N-th slice to the next slice to maintain equal item count. But because we have some buffer space in each slice, we can simply put that item in front of the next slice’s items and we don’t have to move the other items at all! Then we just repeat that for each slice to the end. So let’s say we have 100 slices. If we insert an item into the 10th slice, we only have to move items in that slice and then move just one item in each of the 90 slices above it. There is no need to move all the items in all 90 slices. If we run out of buffer space in one of the slices, we move all its items back to the middle to reposition them. The bigger the buffer, the more operations we can do before we must reposition again. Just like the typewriter: it shifts to the right, and when we come to the end of the paper we reposition it again. We don’t reposition for each letter.
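The typewriter idea can be sketched in a few lines. This is only a minimal illustration under my own assumptions, not the code from the AnyValue units: the items are plain Integers instead of TAnyValue, the slices are tiny (4 items with 2 free cells on each side), and all the names (TSlice, Count, Slack, Recenter, InsertItem, RefInsert) are made up for the example. The result is checked against a naive dynamic array.

```pascal
program TypewriterInsertSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

const
  SliceSize = 4;                       // items per slice (the post uses 10.000)
  Slack = 2;                           // free cells on each side (the post uses 500)
  Cells = SliceSize + 2 * Slack;       // physical cells per slice

type
  TSlice = record
    Start: Integer;                    // physical position of the first item
    Count: Integer;                    // SliceSize, except in the last slice
    Data: array[0..Cells - 1] of Integer;
  end;

var
  Slices: array of TSlice;             // hypothetical stand-in for the slice table
  Ref: array of Integer;               // naive reference array used for checking

// The "carriage return": slide the items back to the middle of the slice
procedure Recenter(var S: TSlice);
var
  I: Integer;
begin
  if S.Start > Slack then
    for I := 0 to S.Count - 1 do       // moving down: copy in ascending order
      S.Data[Slack + I] := S.Data[S.Start + I]
  else
    for I := S.Count - 1 downto 0 do   // moving up: copy in descending order
      S.Data[Slack + I] := S.Data[S.Start + I];
  S.Start := Slack;
end;

procedure AddSlice;
begin
  SetLength(Slices, Length(Slices) + 1);
  Slices[High(Slices)].Start := Slack;
  Slices[High(Slices)].Count := 0;
end;

procedure InsertItem(Index, Value: Integer);
var
  S, Off, I, Carry: Integer;
begin
  S := Index div SliceSize;
  Off := Index mod SliceSize;
  if S = Length(Slices) then AddSlice;
  // 1. Open a gap inside the target slice only
  if Slices[S].Start + Slices[S].Count = Cells then
    Recenter(Slices[S]);
  for I := Slices[S].Count - 1 downto Off do
    Slices[S].Data[Slices[S].Start + I + 1] := Slices[S].Data[Slices[S].Start + I];
  Slices[S].Data[Slices[S].Start + Off] := Value;
  Inc(Slices[S].Count);
  // 2. Hand exactly one overflow item to each following slice
  while Slices[S].Count > SliceSize do
  begin
    Dec(Slices[S].Count);
    Carry := Slices[S].Data[Slices[S].Start + Slices[S].Count];
    Inc(S);
    if S = Length(Slices) then AddSlice;
    if Slices[S].Start = 0 then        // lower buffer used up: reposition once
      Recenter(Slices[S]);
    Dec(Slices[S].Start);              // prepend into the lower buffer
    Slices[S].Data[Slices[S].Start] := Carry;
    Inc(Slices[S].Count);
  end;
end;

function GetItem(Index: Integer): Integer;
var
  S: Integer;
begin
  S := Index div SliceSize;
  Result := Slices[S].Data[Slices[S].Start + Index mod SliceSize];
end;

procedure RefInsert(Index, Value: Integer);
var
  I: Integer;
begin
  SetLength(Ref, Length(Ref) + 1);
  for I := High(Ref) downto Index + 1 do
    Ref[I] := Ref[I - 1];
  Ref[Index] := Value;
end;

procedure RunDemo;
var
  I: Integer;
begin
  for I := 0 to 11 do                  // fill three slices by appending
  begin
    InsertItem(I, I * 10);
    RefInsert(I, I * 10);
  end;
  for I := 1 to 8 do                   // repeated inserts near the front
  begin
    InsertItem(1, -I);
    RefInsert(1, -I);
  end;
  for I := 0 to High(Ref) do
    if GetItem(I) <> Ref[I] then
    begin
      WriteLn('mismatch at ', I);
      Halt(1);
    end;
  WriteLn('ok, ', Length(Ref), ' items in ', Length(Slices), ' slices');
end;

begin
  RunDemo;
end.
```

With these toy sizes the lower buffer of a slice is used up after two inserts, so the eight inserts at index 1 force several repositions. With a real buffer of a few hundred cells that happens correspondingly less often. Deletes would mirror this: close the gap in the target slice, then pull one item from the front of each following slice.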

It turns out this is very efficient. We don’t have many slices. For instance, the typical setup I use in testing is for one million items. Each slice holds 10.000 items, so I have 100 slices. Each slice has buffer space for 1000 items, that is 10%, which means I have a buffer of 500 items below and above the slice limits. I also tested with a 500-item buffer and the speed was still mostly the same. We don’t need a lot of buffer space. It’s simple math really. If you have only 10 items of buffer on each side, you are already 10 times more efficient on deletes and inserts. The record structure for the slice now comes down to this:

  TArraySlice = record
    Last: Integer;
    Start: Integer;
    Index: Integer;
    Data: TAnyValues;
  end;

And the lookup control is now simply:

  PSliceData = ^TSliceData;
  TSliceData = array of PArraySlice;
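To illustrate why equal-sized slices make indexed access cheap, here is a minimal sketch of the lookup arithmetic. It is not the actual IAnyArray code: it assumes Integer items, a plain array of records instead of the pointer table above, and hypothetical names (TSlice, GetItem, SliceSize, Slack).

```pascal
program SliceLookupSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

const
  SliceSize = 4;                 // logical items per slice (the post uses 10.000)
  Slack = 2;                     // free cells on each side (the post uses 500)

type
  TSlice = record
    Start: Integer;              // physical position of the first item in Data
    Data: array of Integer;      // SliceSize + 2 * Slack cells
  end;

var
  Slices: array of TSlice;

// One div finds the slice, one mod finds the item; nothing is written on read
function GetItem(Index: Integer): Integer;
var
  S: Integer;
begin
  S := Index div SliceSize;
  Result := Slices[S].Data[Slices[S].Start + Index mod SliceSize];
end;

procedure RunDemo;
var
  S, I: Integer;
begin
  SetLength(Slices, 3);
  for S := 0 to High(Slices) do
  begin
    SetLength(Slices[S].Data, SliceSize + 2 * Slack);
    Slices[S].Start := S;        // deliberately different offsets: 0, 1, 2
    for I := 0 to SliceSize - 1 do
      Slices[S].Data[Slices[S].Start + I] := (S * SliceSize + I) * 10;
  end;
  for I := 0 to 3 * SliceSize - 1 do
    if GetItem(I) <> I * 10 then
      Halt(1);
  WriteLn(GetItem(7));           // prints 70
end;

begin
  RunDemo;
end.
```

Each slice may sit at a different Start inside its buffer, but the lookup never needs to search for the slice or remember the last one used, so reads do not touch any shared state.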

So quite simplified. The tests show a great improvement. The next picture shows the same test as last time, for one million items (10.000 items per slice, 1.000 items of buffer per slice). I only added a sort test to see some real-world performance and replaced the Integer list with a TValue list, because TValue is what we compete against.

[Image: SpeedTest2]

Nice, isn’t it? We retained around 10 times faster deletes and inserts, and we are now also very, very fast on indexed access. We got down from approximately 270 ms to around 110 ms. For a look-up of 1.000.000 items we are now only around 20 ms slower than a pure dynamic array (e.g. TList). You have to take into account that those 113 ms are mostly spent calling the Random function for random indexed access. The true difference is seen in the iteration by index test. There it is 38 vs 29. I say this is more than acceptable. If that difference bothers you then well… :)

We can also note how slow variants and especially TValue are on deletes and inserts: 65 seconds with TValue against 0.1 with my implementation. A staggering difference. That is because TAnyValue is only 9 bytes in size on both 32 and 64 bit, while TValue is just huge. That is why memory footprint is also very important. Few recognize it, but when you move data around it shows. It’s not just the pure memory consumption that hits you, it’s also every operation: you need to work with so many more bytes in memory.

I have also done a stress and regression test for IAnyArray. I ran all basic operations in a loop for 5 hours, checking the content of the array against a TList after each cycle. If there was an inconsistency, I was informed. I ironed out all the major bugs, I hope. The download section now has a separate AnyValue entry, so you can download only the AnyValue units. The updated code is already there, with all the tests and demos. Just don’t expect to know how to use the StressTest. It is only meant for me to test for bugs.

I truly think that this structure is better than TList in almost every way. The only downside is that it consumes more memory, but even that is not a given. I allocate memory in chunks of 10.000 items, or whatever your slice size is, while TList just doubles the size of the array each time it runs out of space. So at large item counts TList will quickly be less efficient even in that respect. If you need an array with a powerful interface and efficiency for a large number of items, then this is it.
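To put rough numbers on the memory argument, here is a small back-of-the-envelope sketch. It counts allocated item slots only and rests on assumptions: the list starts at a capacity of 4 and doubles when full, as described above, and the sliced array uses 10.000 items plus 1.000 buffer cells per slice. The function names are made up, and the per-slice record and the slice table are ignored.

```pascal
program GrowthSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

const
  SliceSize = 10000;     // items per slice, as in the test setup
  SliceBuffer = 1000;    // total buffer cells per slice (500 below, 500 above)

// Slots held by a list that doubles whenever it is full.
// Assumption: initial capacity of 4; real lists may grow differently.
function DoublingCapacity(Count: Integer): Integer;
begin
  Result := 4;
  while Result < Count do
    Result := Result * 2;
end;

// Slots held by equal-sized slices, each carrying its own buffer
function SlicedCapacity(Count: Integer): Integer;
begin
  Result := ((Count + SliceSize - 1) div SliceSize) * (SliceSize + SliceBuffer);
end;

begin
  WriteLn(DoublingCapacity(1000000));  // 1048576
  WriteLn(SlicedCapacity(1000000));    // 1100000
  WriteLn(DoublingCapacity(1050000));  // 2097152
  WriteLn(SlicedCapacity(1050000));    // 1155000
end.
```

Under these assumptions the sliced array holds slightly more slots at exactly one million items, but just past a doubling boundary the doubling list holds almost twice as many.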

If you find any bugs or have any thoughts on what I wrote, don’t be shy, drop a comment. I also dare all of you out there to make it even faster if you can. What about a little competition? Take my code and let’s see if you can squeeze something more out of it. :)

