sorted vector^multiset for tick query

I see no advantage to using a tree.

Note the key (timestamps) is repeating, so only multiset is valid.

Space — vector is more compact leading to better cache affinity. There’s a price to pay — Vector should use reserve() to prevent reallocation, if possible. Tree doesn’t need it.

Query by lower_bound/upper_bound — I guess both are similar O(log N)

Initial insertions — Remember our data are ticks from an exchange loaded from a large file.

  1. 1. For one insert (assuming no re-balancing no reallocation), vector is O(1) but tree is O(log N) 😦
  2. 2. If data comes in randomly, then tree re-balancing is less frequent but more than once. For vector, it’s a single quicksort after all insertions. I would say vector is faster mostly due to the previous factor
  3. 3. if data comes in (almost) by ascending timestamp, then vector requires no or minimal sorting, but the tree would require frequent re-balancing — presorted data is probably the worst input sequence for one-by-one-insertion.
  4. 4. If no re-balancing is done, then the tree would have depth = N, and query would be linear search 😦
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s