outer table predicate in ON-clause #my take

(INNER table predicate in where-clause vs ON-clause is well covered online and also in my own http://bigblog.tanbin.com/2011/05/whats-different-between-these-2-sqls.html)

Note intermediate table is unaffected by where-clause. Query processor always uses on-clause exclusively to build an intermediate joined table before applying where-clause. This is the logical view we can safely assume. Physically, Query processor could optimize away the where/on distinction, but output is always consistent with the logical view.

Q1: can we _always_ assume that LL-outerjoin-RR intermediate table _always_ includes all LL rows if no outer table predicate in ON-clause?
A: yes with the big “if”
%%A: If you need to filter on outer table, do it in where-clause — better understood. Avoid ON-clause.

Q2: Boldly dropping the big “if”, can we _always_ assume that LL-outerjoin-RR intermediate table _always_ includes all LL rows _regardless_ of outer table predicate in ON-clause?
%%A: probably yes.

If you really really want outer table predicate in ON-clause, I assume you have no good reason and just feels adventurous.
http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc32300.1502/html/sqlug/sqlug169.htm shows that even with outer table predicate in ON-clause, we still get “… all LL rows”

select title, titles.title_id, price, au_id
from titles left join titleauthor
on titles.title_id = titleauthor.title_id
and titles.price > $20.00
title                  title_id   price   au_id       
--------------------   ---------  ------  ---------------
The Busy Executive’s   BU1032     19.99   NULL
Cooking with Compute   BU1111     11.95   NULL
You Can Combat Compu   BU2075      2.99   NULL
Straight Talk About    BU7832     19.99   NULL
Silicon Valley Gastro  MC2222     19.99   NULL
The Gourmet Microwave  MC3021      2.99   NULL
The Psychology of Com  MC3026      NULL   NULL
But Is It User Friend  PC1035     22.95   238-95-7766

join/case as alternative to union

Assuming all animals with/without names are wanted, I'd write a unioned outer join — animal left outer join cats union animal left outer join dogs. This would give me a complete intermediate table for further processing.

I don't use union often but I usually find no good alternatives when I need a union. However,

http://stackoverflow.com/questions/9187243/sql-union-optimized-into-a-left-join-faster-but-query-plan-says-cost-more-i-o shows a join/Case as an alternative to union. I hope it's usable here. The intermediate table probably has the right number of rows (as in animals) but more columns than wanted. Case expression acts as a multiplexer to pick either cats column or dogs column.

———-

Sent: Thursday, May 17, 2012 9:11 PM

To: Kit Cheng; Bin TAN; Song Ding; Xin Zhuang; Sister; Hai Yi

a table of animals (id, name);

a table of cats (id, color);

a table of dogs (id, color);

write a sql to list all the animals whose color is red.

case/group-by/self-join — #2 or top 9

(Simplest solution is at the end, which also returns #2 alone ….)
I have seen many tricks in SELECT queries (Most using joins – Fine.), but now if I must name one keyword to be the most powerful yet unappreciated keyword, it has to be
1) CASE.

2) The combined power of case and group-by is even more impressive.
3) Yet more unthinkable is the combination of case/group/self-join.
4) correlated subquery (slow?) in SELECT, FROM, WHERE and HAVING.

http://www.informit.com/articles/article.aspx?p=26856 explains “In addition to subqueries in the WHERE and HAVING clauses, the ANSI standard allows a subquery in the FROM clause, and some relational database management systems (RDBMSs) permit a subquery in the SELECT clause.”

Key to understanding case_group_self-join is the intermediate work table. Mind you, intermediate table is the key to _every_ tricky self-join. P30 [[ transact-sql cookbook]] among many chapters, has a wonderful solution to find the top 5 values in a table without a 5-way self-join. Based on that, here’s a simplified but full example, showing several distinct solutions. These solutions can be adapted to find 5th largest value in a table, too.

— setting up the data —-
drop table public.students
create table public.students(
student varchar(44),
score decimal(4,1),
primary key (student)
)
insert into students values( ‘Andrew’, 15.6)
insert into students values( ‘Becky’, 13)
insert into students values( ‘Chuck’, 12.2)
insert into students values( ‘Dan’, 25.6)
insert into students values( ‘Eric’, 15.6)
insert into students values( ‘Fred’, 5.6)
insert into students values( ‘Greg’, 5.6)
select * from students


Solution 1
—– top 3 scores
— intermediate table
select * from students h right join students L
on h.score > L.score

select L.student, ‘is lower than’, count(h.student), ‘competitors’
from students h right join students L
on h.score > L.score
group by L.student
having count(L.student) < 3

—– lowest 4 scores
— intermediate table
select * from students h left join students L
on h.score > L.score

select h.student, ‘defeats’, count(L.student), ‘competitors’
from students h left join students L
on h.score > L.score
group by h.student
having count(L.student) < 4


—- Solution 2, using case
— intermediate table
select *, (case when h.score > L.score then 1 else 0 end)
from students h left join — inner join ok since cartesian
students L on 1=1

select h.student, ‘defeats’, sum(case when h.score > L.score then 1 else 0 end), ‘competitors’
from students h left join — inner join ok since cartesian
students L on 1=1
group by h.student
having sum(case when h.score > L.score then 1 else 0 end) < 4

–same solution tested on http://sqlzoo.net/howto/source/z.dir/tip915069/sqlserver
select h.name, h.area, ‘is smaller than’, sum(case when h.area < L.area then 1 else 0 end), ‘countries’
from cia h, cia L
group by h.name, h.area
having sum(case when h.area < L.area then 1 else 0 end) < 4


—– (concise) Solution 3, using correlated sub-select, P295 [[sql hacks]] — without intermediate table
select * from students o where  (select count(1) from students where score < o.score) < 4

–Same solution tested in http://sqlzoo.net/howto/source/z.dir/tip915069/sqlserver:
select * from cia o where  (select count(1) from cia where area > o.area) < 4 — shows top 4

…and (select count(1) from cia where area > o.area) > =3 — returns #4 alone
…and (select count(1) from cia where area > o.area) > =2 — returns #4 #3 exactly

–This technique shows its power when you want top 2 in each continent, without group-by
select * from cia o where  (select count(1) from cia where area > o.area and region=o.region) < 2
order by o.region, o.area

I feel for a student/practitioner, it pays to think in terms of CASE. This strikingly simple solution can be rewritten using (messy) CASE.

What’s the different between these 2 sql’s — a condition to put into WHERE or ON

Q: What’s the different between these 2 queries?

left outer join CodeLegend c on spv.StatusUid=c.CodeLegendUid
where c.FullName=’Active’
———————————————————————— ———-
left outer join CodeLegend c on spv.StatusUid=c.CodeLegendUid
and c.FullName=’Active’

%%A: all rows from outer table will show up in intermediate table. The “Active” condition becomes “and (c.FullName=’Active’ or c.FullName is null)” if put into ON clause

If in doubt, I usually put conditions in WHERE (not ON) because WHERE is less implicit and more explicit. Intermediate table tends to show more meaningful data this way. Not sure about this case though.

See http://searchoracle.techtarget.com/answer/LEFT-OUTER-JOIN-with-ON-condition-or-WHERE-condition

http://blog.sqlauthority.com/2009/03/15/sql-server-interesting-observation-of-on-clause-on-left-join-how-on-clause-effects-resultset-in-left-join/ explains right-table predicate in ON-clause

The more obscure issue is Outer table predicate in ON-clause, explained in other blog post…

seeing no nulls from self outer join@@

–drop table students
–go
–create table students(
—    student varchar(44),
—    score decimal(4,1),
—    ident int identity
–)
–insert students select ‘Andrew’, 15.6
–insert students select ‘Becky’, 13
–insert students select ‘Chuck’, 12.2
–insert students select ‘Dan’, 25.6
–insert students select ‘Eric’, 15.6
–insert students select ‘Fred’, 5.6
–select * from students
select * from students h left join students l
  on h.score >= l.score
order by h.score, l.score
–- won’t show the nulls expected of an outer join, so I remove the “=”. Now Fred (and Greg) the poor student doesn’t score higher than anyone and therefore shows nulls.
Another way to get the nulls is adding the not-equal join condition
select * from students h left join students l
  on h.score >= l.score
and h.student != l.student –- this condition is essential to self outer joins
Now, this topic is relevant to me because whenever I construct an outer join, I expect to see nulls. An outer join without nulls is a emergency exit seat without legroom.
Q: if an outer join returns no nulls, is it equivalent to an inner join?
A: Usually yes. I’m yet to find an exception.