This book asserts that bucket sort is the fastest sort when the data is uniformly distributed (not normally distributed!), i.e. when it can be uniformly partitioned by a fast hashing function. The number of buckets created equals the input size, just as in a standard hash table. The book shows benchmarks on large samples of random floating-point numbers. However, I disagree:
- for special data types like strings and small integers, radix sort is probably faster than bucket sort
- bucket sort is #1 for large data volumes, beating quicksort, heapsort, etc., but for nearly-sorted data, insertion sort is faster
- Linked list is used inside each bucket. Arrays are a possible alternative, but about 50% slower.
- Relation to other sorts:
- Can be seen as a generalization of counting sort
- A cousin of radix sort in its most-significant-digit (MSD) flavor, i.e. top-down radix sort
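The scheme above can be sketched as follows. This is a minimal illustration, not the book's code, assuming floats uniformly distributed in [0, 1) and using Python lists in place of per-bucket linked lists:

```python
import random

def bucket_sort(data):
    """Bucket sort for floats assumed uniformly distributed in [0, 1).
    One bucket per input element, as the book describes."""
    n = len(data)
    buckets = [[] for _ in range(n)]
    for x in data:
        # Order-preserving "hash": scales [0, 1) onto bucket indices 0..n-1,
        # so bucket 0 holds the smallest values.
        buckets[int(x * n)].append(x)
    out = []
    for b in buckets:
        b.sort()  # each bucket holds ~1 element on average, so this is cheap
        out.extend(b)
    return out

data = [random.random() for _ in range(1000)]
assert bucket_sort(data) == sorted(data)
```

Note the expected linear running time depends entirely on the uniformity assumption: if all elements hash into one bucket, this degenerates to the per-bucket sort.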
The book shows benchmarks on random strings: hash sort is #1 for large data volumes, beating quicksort, heapsort, etc.
I guess that if the string distribution is unconstrained/unknown (no guarantee of uniform randomness), hash sort loses its advantage.
In both cases, the hash function must be order-preserving ("strictly ordered"), so bucket #1 must hold the lowest data items.
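To make the order-preserving requirement concrete, here is one illustrative way to bucket ASCII strings so that bucket order matches lexicographic order. The function name and the choice of keying on the first byte are my own assumptions, not from the book:

```python
def string_bucket_index(s):
    """Illustrative order-preserving bucket index for ASCII strings:
    bucket by the first byte (0..255), so lexicographically smaller
    strings never land in a higher-numbered bucket."""
    return ord(s[0]) if s else 0

# Smaller strings map to lower (or equal) bucket numbers:
assert string_bucket_index("apple") < string_bucket_index("zebra")
assert string_bucket_index("") == 0
```

A plain hash-table hash (e.g. Python's built-in `hash`) would scatter keys deliberately and destroy this ordering, which is why an ordinary hash function cannot be reused here.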
== insertion sort
P100. For nearly-sorted data, insertion sort is faster than all other sorts (including bucket sort and radix sorts), often by an order of magnitude.
https://stackoverflow.com/questions/736920/is-there-ever-a-good-reason-to-use-insertion-sort shows insertion sort is better than divide-and-conquer sorts for 1) nearly-sorted data or 2) small collections
https://www.cs.princeton.edu/~rs/AlgsDS07/18RadixSort.pdf shows that MSD radix sort also needs to switch to insertion sort when the subarray size is small.
== counting sort
P315. The "fastest" sort for the right data set; I think the characters of a single English word are a suitable example, since the key range is tiny. However, for nearly-sorted data, I guess insertion sort is still faster.
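A minimal sketch of that single-word example, assuming lowercase ASCII input (my assumption, not the book's setup): with only 26 possible keys, one counting pass plus one output pass sorts the characters.

```python
def counting_sort_word(word):
    """Counting sort over the 26 lowercase letters: count each character,
    then emit the letters in order. Suitable when the key range is tiny,
    e.g. the characters of a single English word."""
    counts = [0] * 26
    for ch in word:
        counts[ord(ch) - ord('a')] += 1
    return ''.join(chr(ord('a') + i) * c for i, c in enumerate(counts))

assert counting_sort_word("banana") == "aaabnn"
```

The running time is O(n + k) with k = 26, so it is effectively linear here; the approach breaks down when the key range k dwarfs n.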
== Binary search tree vs hash table
P130 discusses when to use which