case/group-by/self-join: #2 ] G9

(Simplest solution is at the end, which also returns #2 alone ….)
I have seen many tricks in SELECT queries (Most using joins – Fine.), but now if I must name one keyword to be the most powerful yet unappreciated keyword, it has to be
1) CASE.

2) The combined power of case and group-by is even more impressive.
3) Yet more unthinkable is the combination of case/group/self-join.
4) correlated subquery (slow?) in SELECT, FROM, WHERE and HAVING.

http://www.informit.com/articles/article.aspx?p=26856 explains “In addition to subqueries in the WHERE and HAVING clauses, the ANSI standard allows a subquery in the FROM clause, and some relational database management systems (RDBMSs) permit a subquery in the SELECT clause.”

Key to understanding case_group_self-join is the intermediate work table. Mind you, intermediate table is the key to _every_ tricky self-join. P30 [[ transact-sql cookbook]] among many chapters, has a wonderful solution to find the top 5 values in a table without a 5-way self-join. Based on that, here’s a simplified but full example, showing several distinct solutions. These solutions can be adapted to find 5th largest value in a table, too.

— setting up the data —-
drop table public.students
create table public.students(
student varchar(44),
score decimal(4,1),
primary key (student)
)
insert into students values( ‘Andrew’, 15.6)
insert into students values( ‘Becky’, 13)
insert into students values( ‘Chuck’, 12.2)
insert into students values( ‘Dan’, 25.6)
insert into students values( ‘Eric’, 15.6)
insert into students values( ‘Fred’, 5.6)
insert into students values( ‘Greg’, 5.6)
select * from students

Solution 1
—– top 3 scores
— intermediate table
select * from students h right join students L
on h.score > L.score

select L.student, ‘is lower than’, count(h.student), ‘competitors’
from students h right join students L
on h.score > L.score
group by L.student
having count(L.student) < 3

—– lowest 4 scores
— intermediate table
select * from students h left join students L
on h.score > L.score

select h.student, ‘defeats’, count(L.student), ‘competitors’
from students h left join students L
on h.score > L.score
group by h.student
having count(L.student) < 4

—- Solution 2, using case
— intermediate table
select *, (case when h.score > L.score then 1 else 0 end)
from students h left join — inner join ok since cartesian
students L on 1=1

select h.student, ‘defeats’, sum(case when h.score > L.score then 1 else 0 end), ‘competitors’
from students h left join — inner join ok since cartesian
students L on 1=1
group by h.student
having sum(case when h.score > L.score then 1 else 0 end) < 4

–same solution tested on http://sqlzoo.net/howto/source/z.dir/tip915069/sqlserver
select h.name, h.area, ‘is smaller than’, sum(case when h.area < L.area then 1 else 0 end), ‘countries’
from cia h, cia L
group by h.name, h.area
having sum(case when h.area < L.area then 1 else 0 end) < 4

—– (concise) Solution 3, using correlated sub-select, P295 [[sql hacks]] — without intermediate table
select * from students o where  (select count(1) from students where score < o.score) < 4

–Same solution tested in http://sqlzoo.net/howto/source/z.dir/tip915069/sqlserver:
select * from cia o where  (select count(1) from cia where area > o.area) < 4 — shows top 4

…and (select count(1) from cia where area > o.area) > =3 — returns #4 alone
…and (select count(1) from cia where area > o.area) > =2 — returns #4 #3 exactly

–This technique shows its power when you want top 2 in each continent, without group-by
select * from cia o where  (select count(1) from cia where area > o.area and region=o.region) < 2
order by o.region, o.area

I feel for a student/practitioner, it pays to think in terms of CASE. This strikingly simple solution can be rewritten using (messy) CASE.