I see no advantage to using a tree.
Note the key (timestamps) is repeating, so only multiset is valid.
Space — vector is more compact leading to better cache affinity. There’s a price to pay — Vector should use reserve() to prevent reallocation, if possible. Tree doesn’t need it.
Query by lower_bound/upper_bound — I guess both are similar O(log N)
Initial insertions — Remember our data are ticks from an exchange loaded from a large file.
- 1. For one insert (assuming no re-balancing no reallocation), vector is O(1) but tree is O(log N) 😦
- 2. If data comes in randomly, then tree re-balancing is less frequent but more than once. For vector, it’s a single quicksort after all insertions. I would say vector is faster mostly due to the previous factor
- 3. if data comes in (almost) by ascending timestamp, then vector requires no or minimal sorting, but the tree would require frequent re-balancing — presorted data is probably the worst input sequence for one-by-one-insertion.
- 4. If no re-balancing is done, then the tree would have depth = N, and query would be linear search 😦