From the last post about array handling in TAnyValue (and in general), you know I was looking to make IAnyArray as powerful and useful as I could. As I told you, my goal is to build a strong infrastructure around and on top of TAnyValue. So I thought about it. A simple TList clone with a few more features was not something I was thrilled about, so I searched the web for ways to improve upon the dynamic array structure. To my surprise, I found very little, and even that was not what I wanted. I presume most of you know what a dynamic array is and what its strengths and weaknesses are, but let's summarize anyway.
1. Dynamic array
A very simple structure. It uses a contiguous chunk of memory holding elements of the same type (integers, strings, records, objects…). Its strengths:
- Adding element to the end: O(1)
- Accessing n-th element by index: O(1)
- Removing element from the end: O(1)
- Good caching; indexed walk-through is fast because the elements are contiguous in memory
And its weaknesses:
- Removing element from the middle: O(n)
- Adding element to the middle: O(n)
- Can have problems with resizing at large sizes because of memory fragmentation
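To make the O(n) weakness concrete, here is a minimal C sketch (names and the fixed capacity are mine, purely for illustration): appending writes to the end in O(1), while inserting into the middle has to shift every element after the insertion point.

```c
#include <assert.h>
#include <string.h>

#define CAP 8

/* Hypothetical minimal dynamic array of ints, illustrating the cost
   profile above: append is O(1), insert in the middle is O(n). */
typedef struct {
    int data[CAP];
    int count;
} DynArray;

void da_append(DynArray *a, int value) {            /* O(1) */
    a->data[a->count++] = value;
}

void da_insert(DynArray *a, int index, int value) { /* O(n): shifts the tail */
    memmove(&a->data[index + 1], &a->data[index],
            (size_t)(a->count - index) * sizeof(int));
    a->data[index] = value;
    a->count++;
}
```

The `memmove` is exactly the cost a real implementation pays: the larger the array, the more bytes move on every interior insert or delete.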
2. Singly linked list
Also a very simple structure. It uses pointers to link each node to the next (doubly linked lists link both ways). Its strengths:
- Adding element to the end: O(1)
- Removing element from the middle: O(1), once you have a pointer to the node
- Adding element to the middle: O(1), once you have a pointer to the position
- Removing element from the end: O(1) with a tail pointer in a doubly linked list (a plain singly linked list needs an O(n) walk to find the predecessor)
- Easily resized, because we only need to allocate memory for one new element at a time.
And its weaknesses:
- Accessing n-th element by index: O(n)
- Bad caching; elements can be scattered all over memory
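The same trade-off in a C sketch (again, names are mine): inserting after a node you already hold is O(1), but reaching the n-th element means walking the chain.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical singly linked list node. */
typedef struct Node {
    int value;
    struct Node *next;
} Node;

/* O(1): splice a new node in after `node` (or start a list if NULL).
   Nodes are never freed here; this is a throwaway illustration. */
Node *insert_after(Node *node, int value) {
    Node *n = malloc(sizeof(Node));
    n->value = value;
    n->next = node ? node->next : NULL;
    if (node) node->next = n;
    return n;
}

/* O(n): indexed access must walk from the head. */
int nth(Node *head, int index) {
    while (index-- > 0) head = head->next;
    return head->value;
}
```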
I also found this nice table comparing the two:
Operation                  | Array | Singly Linked List
Read (anywhere)            | O(1)  | O(n)
Add/Remove at end          | O(1)  | O(1)
Add/Remove in the interior | O(n)  | O(1)
Resize                     | O(n)  | N/A
Find by position           | O(1)  | O(n)
Find by value              | O(n)  | O(n)
So as you can see, each has its strengths and weaknesses. If we could just take the best of each and make an efficient hybrid, all would be fine. Well, that is not so easy to do: if you push one way, you suffer on the other end, and vice versa. But after a lot of thinking and testing, I arrived at a good implementation of something I now call a sliced array. It is a hybrid that tries to take the best of both worlds, and I think it mostly succeeds: it has a lot of strengths and very few weaknesses. It looks like this:
I “sliced” the large array into several small arrays and then linked everything into a doubly linked list. I also added a look-up control array, which is very important, as you will see later. So how does it all work? Let's look at how a single slice is implemented:
  TArraySlice = record
    Count: Int64;
    Lookup: PSliceLookup;
    ArrayData: TAnyValues;
    NextSlice: PArraySlice;
    PrevSlice: PArraySlice;
  end;
Simple, actually. It holds an array of TAnyValue, pointers to the previous and next slices, a pointer to the look-up record, and a count of the elements it holds. The look-up record looks like this:
  TSliceLookup = record
    Index: Integer;
    SumCount: Int64;
    Slice: PArraySlice;
  end;
This is even simpler. It holds the record's index in the look-up array, a pointer to the slice, and SumCount: the total number of elements in all slices up to and including the one this record points to. Now for the good and bad sides of the sliced array.
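For readers who don't speak Delphi, here is a rough C equivalent of the two records above, plus the SumCount invariant as code (TAnyValue is reduced to `int` and the rebuild helper is my own illustration, not the post's actual code):

```c
#include <assert.h>

typedef struct ArraySlice ArraySlice;
typedef struct SliceLookup SliceLookup;

/* Mirrors TArraySlice. */
struct ArraySlice {
    long long count;        /* elements currently stored in this slice */
    SliceLookup *lookup;    /* back-pointer to the look-up entry */
    int *array_data;        /* the slice's contiguous element storage */
    ArraySlice *next_slice;
    ArraySlice *prev_slice;
};

/* Mirrors TSliceLookup. */
struct SliceLookup {
    int index;              /* position of this entry in the look-up array */
    long long sum_count;    /* total elements in slices [0..index] */
    ArraySlice *slice;
};

/* SumCount is a running prefix sum of slice counts; after any slice's
   count changes, entries from that point on must be refreshed. */
void recalc_sum_counts(SliceLookup *lookup, int slice_count) {
    long long running = 0;
    for (int i = 0; i < slice_count; i++) {
        running += lookup[i].slice->count;
        lookup[i].sum_count = running;
    }
}
```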
Its strengths:
- Adding element to the end: O(1)
- Removing element from the middle: O(SliceSize)
- Adding element to the middle: O(SliceSize)
- Removing element from the end: O(1)
- Good caching. Each slice is contiguous in memory, so locality of access is preserved, which is very important
- Easily resized, because we only need to allocate memory for one new slice. There is no need to re-size the whole memory space used by the structure.
And its weaknesses:
- Accessing n-th element by index: hard to state without mathematical analysis, but it's still fast
So we only have one weakness, but it could be a huge one, because random access by index is very, very important. If we can't do it fast, the structure is not worth the trouble. After all, we want a general-purpose structure, a workhorse like TList.
To make it fast, I implemented a few tricks using the look-up structure. The naive way would be to walk the look-up array and check whether the index we are searching for falls between the lower and upper index bounds of each slice. Once the slice is found, we proceed exactly as with a classic dynamic array. That would be O(SliceCount + 1): not too shabby if we don't have too many slices, but nowhere near good enough, especially because the more slices we have, the cheaper internal inserts and deletes become. I used two tricks to improve the search for the correct slice. The first is a helper pointer that always points to the last slice accessed. So if we iterate linearly from 0 to Count - 1, we never search for the correct slice at all: we just use the pointer to the last accessed slice and offset to the correct element. When we reach the end of the slice, we simply jump to the next one. Simple but very efficient. This also preserves locality of access if the slices are big enough.
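The last-accessed-slice trick can be sketched like this in C (a simplified model where every slice is full and the walk only moves forward; all names are mine):

```c
#include <assert.h>

#define SLICE_SIZE 4

/* Cursor remembering which slice the previous access hit and where
   that slice starts in global index terms. */
typedef struct {
    int last_slice;         /* slice hit by the previous access */
    long long slice_start;  /* global index of that slice's first element */
} Cursor;

/* Sequential access: instead of searching the look-up array for every
   index, advance the cached cursor; within a slice the element is found
   by a plain offset, and at a slice boundary we hop to the next slice. */
int get_sequential(int slices[][SLICE_SIZE], Cursor *c, long long index) {
    while (index >= c->slice_start + SLICE_SIZE) {
        c->last_slice++;
        c->slice_start += SLICE_SIZE;
    }
    return slices[c->last_slice][(int)(index - c->slice_start)];
}
```

During a linear walk the `while` loop fires only once per slice boundary, so the per-element cost is the same as indexing a plain array.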
The second trick, for truly random access, is to calculate where the index should be (which does not mean it is there). If we have 100 elements in each slice and we want to access the 350th element, we can assume it is in the 4th slice, so we jump directly there. If we miss (and we can miss, as slices degrade from deletions and insertions), we did not miss by much: we just check whether we need to go left or right and move until we find the correct slice. It turns out this is also very fast, as the predictions are good enough.
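A sketch of the prediction trick, using only the SumCount values from the look-up array (names and nominal size are illustrative): guess the slice by dividing the index by the nominal slice size, then correct left or right against the real slice boundaries.

```c
#include <assert.h>

#define NOMINAL 100  /* nominal slice size */

/* sum_count[i] = total elements in slices 0..i, as in the look-up array.
   Slice i therefore covers global indices [sum_count[i-1], sum_count[i]).
   `index` must be a valid global index (less than the grand total). */
int find_slice(const long long *sum_count, int slice_count, long long index) {
    int guess = (int)(index / NOMINAL);       /* predicted slice */
    if (guess >= slice_count) guess = slice_count - 1;
    while (index >= sum_count[guess]) guess++;                 /* walk right */
    while (guess > 0 && index < sum_count[guess - 1]) guess--; /* walk left */
    return guess;
}
```

When slices are in perfect condition (all exactly NOMINAL elements), the guess is always a direct hit and neither loop runs; after degradation the correction walk is only a step or two.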
Furthermore, something helps us here: if there are no deletions or insertions, the slices are in perfect condition and every prediction is a hit, so access is effectively constant time; the tests will demonstrate that. With degradation it is slightly worse, but not by much, and we gained a lot on insertions and deletions. The key is that we keep degradation in check. I do this by merging a slice with its neighbour if its count falls under half the nominal slice size, and splitting a slice if its count rises to twice the nominal slice size. This way the structure never degrades too far.
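The degradation control reduces to a simple decision after each insert or delete; a sketch of that logic (names are mine):

```c
#include <assert.h>

typedef enum { KEEP, MERGE, SPLIT } SliceAction;

/* After a delete, a slice that drops below half the nominal size should
   be merged with a neighbour; after an insert, a slice that reaches twice
   the nominal size should be split in two. Otherwise leave it alone. */
SliceAction check_slice(long long count, long long nominal_size) {
    if (count < nominal_size / 2)  return MERGE;
    if (count >= 2 * nominal_size) return SPLIT;
    return KEEP;
}
```

Keeping every slice between half and twice the nominal size is what bounds how far the index prediction can miss.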
OK, you may say, but all this only comes into play with a lot of elements; for small arrays it does not matter. True, and that is why the default slice size is 10.000 elements: below that you essentially have a single dynamic array with one look-up pointer. I also check before doing indexed access, and if only one slice is present I jump directly to it, with no searching at all. The only real downside of the structure is its considerable complexity, and the complexity of the algorithm that manages it. It will also never be quite as fast for random indexed access, if nothing else because more code is executed than for simply returning the n-th element of an array.
You probably won't take all of this on my word, so here are tests made with Delphi 2010. The first is for a small number of elements (5.000), so there is only one slice; the other is for 1.000.000 elements, which gives 100 slices (as the default slice size is 10.000). I tested my IAnyArray, which now uses this structure for data storage, against TList<TAnyValue>, a fair reference point for the dynamic array comparison. I also tested TList<Variant> and TList<Integer> for the fun of it.
Small array test (5.000 elements) – times are in ms
Large array test (1.000.000 elements) - times are in ms
Just a few clarifications:
- Deletes and inserts are 100 times faster. This makes sense: I have to move 100 times fewer elements in memory because there are 100 slices
- Adding elements is also faster, simply because I just create a new slice of 10.000 elements while the dynamic array (TList) has to re-size its whole memory space
- Linear iteration by index, or local access, is as fast as with a dynamic array because of the last-accessed-slice pointer trick
- Truly random access by index is slower by a factor of 2 to 3, but this depends on the number of slices
- Enumeration is very fast, especially by pointers
- I also did a “raw” access comparison, working with PAnyValue to avoid record copies.
This is a very long post, sorry for that, but I really want to hear your honest opinion on this solution. Is it viable? What do you think of it? I find it very promising and flexible in all usage scenarios; the only downside I see is complexity. Do you find truly random access too slow? Remember, that was 1.000.000 access operations over 1.000.000 elements.
There is also a big update of code on my download section:
- Latest TAnyValue with array support, name-value pairs, etc.
- Hashing classes have been improved and are 2x faster now
- TTaskPool has been improved and is faster with less locking, as is TThreadQueue. TLockFreeStack has been rewritten
- IPC is now faster because of all the changes. It went under 0.1 ms for a Client->Server->Client cycle with full payload