algo IV: binary matrix # raiserchu

Q: Given a 2D binary matrix filled with 0’s and 1’s, find the largest rectangle containing all ones and return its area. See raiserchu’s mail on 12 Sep 13.

Analysis: Each rectangle is identified by 2 vertices, i.e 4 integers. Without loss of generality, We require the “high” corner to have higher x-coordinate and higher y-coordinate than the “low” corner. (We can assume y-axis run upward.) With this nested loop we can iterate over all possible rectangles in a given matrix:

Lock low corner
Move high corner in typewriter (zigzag) steps i.e.
  hold highY and move highX step by step
  process the (series of) resulting rectangles
  increment highY and repeat
Move the lower corner in typewriter steps and repeat

Key observation: any “bad pixel” disqualifies every rectangle containing it.

— My solution 2:
1) Save all bad pixels in SQL table Bad, 
indexed by x-coordinate and 
indexed by y-ordinate

Table can be in-memory like Gemfire. Many sorted maps (skiplist or RB tree) support range selection. Gelber interviewer showed me how to use a SQL table to solve algo problems.

2) Follow the nested loop to iterate over all possible rectangles, either disqualify it or save/update its area in maxFound. Here’s how to disqualify efficiently:

For each rectangle under evaluation, we have 4 numbers (lowX, lowY) and (highX, highY).

select ANY from Bad where lowX < Bad.x < highX and lowY < Bad.y < highY

If any hit, then rectangle disqualified. In fact all high corners at the same horizontal level disqualify, so in the nested loop we skip ahead to increment highY

3) At end of nested loop, maxFound is the final answer.
— my earlier solution 1:
1) Iterate over all possible rectangles and save them in a SQL table Rec, indexed by the 4 integers. No need to validate each (time-consuming). Next we start elimination
2) Iterate over all bad pixels. For each bad pixel found, delete from Rec where Rec.lowX < X < Rec.highX and Rec.lowY < Y < Rec.highY

Now all remaining rows are valid candidates
3) max ( (x2 – x1)*(y2 – y1) )
— Here’s my partial solution:
We can effectively ignore all the “good pixels”.

1) Look at the x coordinates of all bad pixels. Sort them into an array. Find the largest gap. Suppose it’s between x=22 and x=33. Our candidate rectangle extends horizontally from 23 to 32, exactly. Notice there’s no bad pixel within this vertical band [1].
2) Look at the y coordinates of all bad pixels. Sort them into an array. Find the largest gap. Suppose it’s between y=15 and y=18. Our candidate rectangle extends vertically from 16 to 17, exactly.
[1] This candidate rectangle can expand All the way vertically, though it may give a bigger rectangle
Ditto horizontally.

Advertisements

EPI300 skyline problem

Sound byte: use SQL

PROBLEM STATEMENT
We have to design a program which helps drawing the skyline of a two-dimensional city given the locations of the rectangular buildings in the city. Each building B_i is represented by a triplet of (L_i, R_i, h_i) where L_i and R_i are the left and right coordinates of the ith building, and h_i is the height. In the diagram below there are 8 buildings, represented from left to right by the triplets (1, 5, 11), (2, 7, 6), (3, 9, 13), (12, 16, 7), (14, 25, 3), (19, 22, 18), (23, 29, 13) and (24, 28, 4).

Input
The input of the program is a sequence of building triplets. The triplets are sorted by L_i (the left coordinate of the building) in ascending order.

Q1 —– For any given x, determine the height in the skyline.
Note If x == R_i, then the ith building doesn’t count. In other words, If you look at the first building, the 1-to-5 range it covers is a half-open interval, sometimes written as [1,5) as this range includes 1 but excludes 5. You can think of [1,5) as [1, 4.99999999] approximately.

A1(brute force): look at each building and decide if it “covers” the point X. Given the pre-sort, most buildings aren’t relevant. A complete solution would be

Select max(h) from myTable t where t.l =< x < t.r

To support this solution, the objects could be stored in two sorted data structures, one sorted by L_i and one sorted by R_i.

Q2 —- draw the skyline.
A2: evaluate Q1 (i.e. get the height) at every L_i and R_i value. This solution is probably suboptimal.

Q2b —- draw the skyline by outputting a sequence of {x,h} pairs showing the change in height (represented by a new height h) at each vertical edge (marked by x).

Is it possible to do this in one scan after preprocessing the triplets with some clever data structures? [[EPI300]] may have a solution. Here’s my proposal —

Sort the N buildings into a list by L_i, and sort the same buildings into another list by R_i, and merge the 2 lists into a big sorted list of 2N objects. Each building shows up twice. Each of the 2N entries consists of {building_object_id, x, boolean flag left_or_right }. We will go through this big list one-pass.

As we hit the left edge of a building, we include this building in the Alive system. In Alive, we keep the buildings sorted by height. We also maintain a lookup table keyed by building_id. As we hit the right edge of a building, we remove it from the lookup table and from Alive.

As we hit any edge, we need to determine the impact on the skyline if any. This is when we make use of the Alive system. For any edge, there’s impact iff the target building is tallest in Alive.

mv-semantics | the players

Q: resource means?
A: Don’t talk about resources. Talk about heapy thingy.
A: Some expensive data item to acquire. Requires allocating from some resource manager such memory allocator. Each class instance holds a resource, not sharable.

Q: What’s the thing that’s moved? not the class instance, but part of the class instance. A “resource” used by the instance, via a pointer field.

Q: Who moves it? the compiler, not run time. Compiler selects the mv-ctor or mv-assignment, only if all conditions are satisfied. See post on use-case.

Q: What’s the mv-ctor (or mv-assignment) for? The enabler. It alone isn’t enough to make the move happen.

Q: What does the RVR point to? It points to an object that the programmer has marked for imminent destruction.

bi-temporal database – first lesson

Milestone in PWM — similar. I was using only the validFrom/validTill columns.

I think this is all about accountably modifying historical record.

snapshot — For a given observation date like 31/12/1999, “where TxStart < observDate and observDate < TxEnd” would return a snapshot as of 31/12/1999.

A “Tx” or “Transaction” — means a data entry. TxStart/TxEnd marks the observation window or period of belief. If we observe the price on a day within this window, i.e. take a snapshot on that day, then all the data rows outside this window is filtered out.

Indexing for performance — an index on (TxStart, TxEnd) would speed up such “snapshot” queries.

Data error 1 — If a wrong price was entered but quickly corrected, and never affected any customer, then I doubt bi-temporal is designed for this. Therefore the “never-delete” is not so strict.

Data error 2 — However, if a customer was charged the wrong price, then “never-delete” principle holds. The wrong price was in force at that time, so it is valid history (though not valid by company policy). Valid history is never deleted and must be milestoned.

Example — http://sqlmag.com/business-intelligence/bitemporal-design-time-after-time

[[automate the boring stuff with python]]

This book teaches just enough python features for the purpose. All

non-essentials are left out.

–sub chapter on selenium

the easiest way to use selenium and drive a browser, IMO.

–Chapter on Excel

text file spreadsheet converter

merge/unmerge

setting font in a cell

–Chapter on PDF:

combine select pages from many files

—-

Chapter on CSV + json

Chapter on task scheduler

Chapter on keyboard/mouse control — powerful