Q: What’s the time complexity of sorting J integer scores which are all [1, N], possibly non-unique?

This is classic counting sort. My book [[algo in a nutshell]] incorrectly says O(J). Incorrect when N is large.

My analysis below uses a total of three dimensions N, J and K, where N and J can be vastly different, and K <=min(N, J), but K could be much smaller.

====analysis

Textbook counting sort is listed at the bottom. If N is smaller than **J**, then O(N+**J**) is dominated by O(**J**). Now suppose N is huge (like the revenue figure of a company).

Suppose there are K distinct scores among the J scores. I would want a constant-time translation from each distinct score to a distinct counter. I wish there’s a perfect hashing from the K scores to K buckets but I think it’s impossible since the K distinct values are not known in advance. Even with imperfect hashing, I can get O(1) translation from each score value to a distinct counter. I will iterate over the J scores. For each,

- look up (on hash table) the corresponding counter.
- If the score is new i.e. missing from the hash table,
- then create a counter
- add the “pair” {score -> counter} into the hash table.
- Also insert the score into a min-heap

- increment the counter

O(K logK) : Now pop the min-heap K times, each time to get a distinct score in ascending order. Lookup the score in hash table to get the counter. If counter for score 55 the count is 3, then output 55 three times. This output would be a sorted sequence of the original J scores.

— comparing with alternatives: Mine is O(J + K logK). If K*logK exceeds N, then I would fall back to standard counting sort.

The comparison-based sort would be O(J logJ), inferior to mine if K is much smaller than J.

The textbook counting sort would be O(J + N) , using N independent counters.