EPI300 skyline problem #SQL

Sound byte: use SQL

PROBLEM STATEMENT
We have to design a program which helps drawing the skyline of a two-dimensional city given the locations of the rectangular buildings in the city. Each building B_i is represented by a triplet of (L_i, R_i, h_i) where L_i and R_i are the left and right coordinates of the ith building, and h_i is the height. In the diagram below there are 8 buildings, represented from left to right by the triplets (1, 5, 11), (2, 7, 6), (3, 9, 13), (12, 16, 7), (14, 25, 3), (19, 22, 18), (23, 29, 13) and (24, 28, 4).

Input
The input of the program is a sequence of building triplets. The triplets are sorted by L_i (the left coordinate of the building) in ascending order.

Q1 —– For any given x, determine the height in the skyline.
Note If x == R_i, then the ith building doesn’t count. In other words, If you look at the first building, the 1-to-5 range it covers is a half-open interval, sometimes written as [1,5) as this range includes 1 but excludes 5. You can think of [1,5) as [1, 4.99999999] approximately.

A1(brute force): look at each building and decide if it “covers” the point X. Given the pre-sort, most buildings aren’t relevant. A complete solution would be

Select max(h) from myTable t where t.l =< x < t.r

To support this solution, the objects could be stored in two sorted data structures, one sorted by L_i and one sorted by R_i.

Q2 —- draw the skyline.
A2: evaluate Q1 (i.e. get the height) at every L_i and R_i value. This solution is probably suboptimal.

Q2b —- draw the skyline by outputting a sequence of {x,h} pairs showing the change in height (represented by a new height h) at each vertical edge (marked by x).

Is it possible to do this in one scan after preprocessing the triplets with some clever data structures? [[EPI300]] may have a solution. Here’s my proposal —

Sort the N buildings into a list by L_i, and sort the same buildings into another list by R_i, and merge the 2 lists into a big sorted list of 2N objects. Each building shows up twice. Each of the 2N entries consists of {building_object_id, x, boolean flag left_or_right }. We will go through this big list one-pass.

As we hit the left edge of a building, we include this building in the Alive system. In Alive, we keep the buildings sorted by height. We also maintain a lookup table keyed by building_id. As we hit the right edge of a building, we remove it from the lookup table and from Alive.

As we hit any edge, we need to determine the impact on the skyline if any. This is when we make use of the Alive system. For any edge, there’s impact iff the target building is tallest in Alive.

Advertisements

mv-semantics: the players

Q: resource means?
A: Don’t talk about resources. Talk about heapy thingy.
A: Some expensive data item to acquire. Requires allocating from some resource manager such memory allocator. Each class instance holds a resource, not sharable.

Q: What’s the thing that’s moved? not the class instance, but part of the class instance. A “resource” used by the instance, via a pointer field.

Q: Who moves it? the compiler, not run time. Compiler selects the mv-ctor or mv-assignment, only if all conditions are satisfied. See post on use-case.

Q: What’s the mv-ctor (or mv-assignment) for? The enabler. It alone isn’t enough to make the move happen.

Q: What does the RVR point to? It points to an object that the programmer has marked for imminent destruction.

bi-temporal database – first lesson

Milestone in PWM — similar. I was using only the validFrom/validTill columns.

I think this is all about accountably modifying historical record.

snapshot — For a given observation date like 31/12/1999, “where TxStart < observDate and observDate < TxEnd” would return a snapshot as of 31/12/1999.

A “Tx” or “Transaction” — means a data entry. TxStart/TxEnd marks the observation window or period of belief. If we observe the price on a day within this window, i.e. take a snapshot on that day, then all the data rows outside this window is filtered out.

Indexing for performance — an index on (TxStart, TxEnd) would speed up such “snapshot” queries.

Data error 1 — If a wrong price was entered but quickly corrected, and never affected any customer, then I doubt bi-temporal is designed for this. Therefore the “never-delete” is not so strict.

Data error 2 — However, if a customer was charged the wrong price, then “never-delete” principle holds. The wrong price was in force at that time, so it is valid history (though not valid by company policy). Valid history is never deleted and must be milestoned.

Example — http://sqlmag.com/business-intelligence/bitemporal-design-time-after-time

[[automate the boring stuff with python]]

This book teaches just enough python features for the purpose. All

non-essentials are left out.

–sub chapter on selenium

the easiest way to use selenium and drive a browser, IMO.

–Chapter on Excel

text file spreadsheet converter

merge/unmerge

setting font in a cell

–Chapter on PDF:

combine select pages from many files

—-

Chapter on CSV + json

Chapter on task scheduler

Chapter on keyboard/mouse control — powerful