https://leetcode.com/problems/symmetric-tree/solution/ — the iterative solution is an elegant adaptation of BFT.

# Tag: t_elegant

# TreeNode lock/unlocking #Rahul,DeepakCM

Q: given a static binary tree, provide an O(H) implementation of lock/unlock subject to one rule — before committing a lock or unlock on any node AA, we must check that all of AA’s subtree nodes are currently in the “free” state, i.e. already unlocked. If the check fails, we must abort the operation.

H:= tree height

====analysis

— my O(H) solution, applicable to any k-ary tree.

Initial tree must be fully unlocked. If not, we need pre-processing.

Each node will hold a private hashset of “locked descendants”. Every time after I lock a node AA, I will add AA into the hashset of AA’s parent, AA’s grandparent, AA’s great-grandparent etc. Every time after I unlock AA, I will do the reverse, i.e. removal.

I said “AFTER locking/unlocking” because there’s a validation routine

bool canChange(node* AA){ return AA->lockedDescendants.empty(); } # to be called before locking/unlocking

Note this design requires an uplink. If no uplink is available, then we can run a pre-processing routine to populate an uplink lookup hash table { node -> parent }

— simplified solution: instead of a hashset, we may make do with a count of locked descendants, but the hashset provides more useful info.
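A minimal Python sketch of this design — `lockedDesc` and the function names are my own, and I assume the uplink (parent pointer) is available:

```python
class Node:
    def __init__(self, parent=None):
        self.parent = parent          # the uplink
        self.locked = False
        self.lockedDesc = set()       # locked nodes anywhere in my subtree

def can_change(node):
    # validation routine: the subtree must be fully free
    return not node.lockedDesc

def lock(node):
    if node.locked or not can_change(node):
        return False                  # abort
    node.locked = True
    p = node.parent                   # O(H) walk up the ancestor chain
    while p:
        p.lockedDesc.add(node)
        p = p.parent
    return True

def unlock(node):
    if not node.locked or not can_change(node):
        return False
    node.locked = False
    p = node.parent
    while p:
        p.lockedDesc.discard(node)
        p = p.parent
    return True
```

Each operation does a constant amount of work per ancestor, hence O(H).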

# lowest missing+ve int#Codility #80%

Q: Write a function int solution(int[] A); that, given an array A of N integers, returns the smallest natural number that does not occur in A. For example, given A = [1, 3, 6, 4, 1, 2], the function should return 5.

Given A = [1, 2, 3], the function should return 4.

Given A = [−1, −3], the function should return 1.

• each element of array A is a 32-bit signed int;

• expected worst-case time complexity is O(N);

• expected worst-case space complexity is O(N).

* Elements of input arrays can be modified.

https://leetcode.com/problems/first-missing-positive/description/ is similar but O(1) space and average O(N) time!

—— my analysis —–

The mutability and the O(1)-space constraint hint at — saving array indices as payload!

—- Idea A:

first scan to swap non-positives to the end and remember the new boundary (say #111).

In the same scan also determine min and max. Suppose min=55.

Another scan to reduce all payloads by 55 so they fall in the range of 0 to max-55.

— idea A1

Now use CSY technique .. check array@0-N in-situ #Nsdq#contrived
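A sketch of the in-situ index-as-payload idea — my reading of the technique, not necessarily the exact version referenced above. Each positive value v gets parked in slot v-1 by swapping:

```python
def first_missing_positive(a):
    """O(N) time, O(1) extra space; mutates the input array."""
    n = len(a)
    for i in range(n):
        # keep swapping until slot i holds an out-of-range value
        # or the value that belongs there
        while 1 <= a[i] <= n and a[a[i] - 1] != a[i]:
            j = a[i] - 1
            a[i], a[j] = a[j], a[i]
    for i in range(n):
        if a[i] != i + 1:
            return i + 1
    return n + 1
```

Despite the nested loop, each swap puts at least one value into its final slot, so total swaps are O(N).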

— idea A2:

Use quicksort partition to anchor a random pivot. If the pivot lands as a[8] with value > 7, then discard the higher section and focus on the lower section. During that scan also keep track of the frequency of each int. If the lower section has no dupe numbers, then we can discard it and focus on the higher section.

However, if there are some dupes then this algo can’t discard any section.

—-solutionB: make_heap O(1) space but O(N logN) time. Build a min-heap based on the array in O(1) space, then keep calling min().

- make_heap’s documentation shows a random-access container (vector used in the demo) being “rearranged”, implying O(1) space
- make_heap’s documentation shows O(N) to construct the heap

min() is O(1) but delete_min() is O(log N), so overall we probably get O(N logN)

—-solutionC: radix sort. https://en.wikipedia.org/wiki/Radix_sort#In-place_MSD_radix_sort_implementations shows an in-place binary radix sort.

First pass to transform all negative numbers to 0. Then iterate the sorted array and check for the earliest “gap”. Worst case — you get 1,2,3… without a gap, so the answer is 1 + the largest array element.

O(W*N) where W is width of the largest integer. If we assume 64-bit then W is a constant:)

http://www.drdobbs.com/parallel/parallel-in-place-radix-sort-simplified/229000734 is another in-place radix sort.

# find offending section ] originally-sorted array

I rephrase Leetcode 581. Q: There was an ascending int array containing duplicates. Someone reshuffled some elements so it is no longer sorted. Write a function to determine the shortest subarray that needs re-sorting. Before and after this subarray, no change is needed.

O(N) time and O(1) space.

==== analysis

Not “easy” as labelled.

I want to find the lowest integer that’s out of place. (Do the same on the other end and the problem is complete.)

first scan to find global min and max values, and ensure they are correctly placed.

Next scan from beginning on the rising slope. At first down-turn, treat the array as two partitions. In the right partition after the first peak, we determine the lowest value, say, -55. Then we will find the correct position for -55, within the Left partition! We will look for the last item equal-or-lower-than -55…. This is the key constraint of this entire problem.

Insight — We know -55 must move left and we don’t care about any other item in the right partition.

Insight — As a concrete illustration, if within left partition, -55 should stand after #7, then we know all Earlier items up to #7 are already correctly placed. Why? All Subsequent items in the left section are strictly higher than -55; and all other items in right section are all (equal or)higher than -55.

— my idea 1 in O(N) space: radix sort a clone of the array, then compare it to the input array.
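For comparison, the standard O(N)-time O(1)-space two-sweep solution (a different route from the two-partition analysis above):

```python
def shortest_unsorted(a):
    """Left-to-right: any element below the running max is out of place.
    Right-to-left: any element above the running min is out of place."""
    n, end, start = len(a), -1, len(a)
    running_max = float('-inf')
    for i in range(n):
        if a[i] < running_max:
            end = i                    # rightmost offender
        else:
            running_max = a[i]
    running_min = float('inf')
    for i in range(n - 1, -1, -1):
        if a[i] > running_min:
            start = i                  # leftmost offender
        else:
            running_min = a[i]
    return 0 if end == -1 else end - start + 1
```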

# find All hid`places@ needle]haystack #suffixArray

Now I think a suffix array is probably the best solution. A suffix array can be built in O(N) and can serve any needle1, needle2, needle3 etc. This O(N) pre-processing cost is small if we search repeatedly.

The Aha — this array is sorted by the string content. Therefore, all the “similar” suffixes club together.

Using the example in https://en.wikipedia.org/wiki/Suffix_array#Applications, if I want all matches for needle “AN” in haystack BANANA, using the suffix array, even a simple (inefficient) linear scan over the array would find AN, ANANA in a “cluster”. In general, binary search — O(M logN) for a needle of length M — is good.

This usage requires only suffix array, but many other usage scenarios require the helper LCP array.

Note the preprocessing is on the haystack not the needle.

Note building the suffix array is a **one-time investment**, for repeated usage. If used only once, the total cost is O(N) to build plus O(logN) to use this array — comparable to, not worse than, brute force.
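A toy sketch — the naive O(N² logN) construction is enough to illustrate the clustering (a production build would use a linear-time algorithm such as SA-IS):

```python
def build_suffix_array(haystack):
    # sort suffix start positions by suffix content
    return sorted(range(len(haystack)), key=lambda i: haystack[i:])

def find_all(haystack, needle, sa):
    """All start positions of needle, via two binary searches over the
    sorted suffixes."""
    m = len(needle)
    lo, hi = 0, len(sa)
    while lo < hi:                     # lower bound of the cluster
        mid = (lo + hi) // 2
        if haystack[sa[mid]:sa[mid] + m] < needle:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    hi = len(sa)
    while lo < hi:                     # upper bound of the cluster
        mid = (lo + hi) // 2
        if haystack[sa[mid]:sa[mid] + m] <= needle:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])
```

With haystack BANANA and needle AN, the cluster covers the suffixes ANA and ANANA, i.e. positions 1 and 3.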

# sorted_Array|_List ⇒ balanced_BST

Q1 (easy): Given an array where elements are sorted in ascending order, convert it to a height balanced BST, where the depth of the two subtrees of *every* node never differ by more than 1.

Q2 (medium): How about a sorted slist?

==== analysis

I feel this “easy” problem should be medium. Perhaps there are elegant solutions.

I need not care about the payloads in the array. I can assume each payload equals the subscript.

There are many balanced BSTs. We just need to find one.
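For Q1, the textbook recursive construction is a convenient baseline (a sketch; the ideas below explore alternatives):

```python
class TreeNode:
    def __init__(self, val):
        self.val = val
        self.left = self.right = None

def sorted_array_to_bst(arr, lo=0, hi=None):
    """Middle element becomes root; the two halves differ in size
    by at most 1, so the result is height-balanced."""
    if hi is None:
        hi = len(arr)
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    root = TreeNode(arr[mid])
    root.left = sorted_array_to_bst(arr, lo, mid)
    root.right = sorted_array_to_bst(arr, mid + 1, hi)
    return root
```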

— idea 1: construct a sequence of positions to probe the array. Each probed value is added to the tree. The tree grows level by level, starting from the root.

The sequence is similar to a BFT. This way of tree building ensures

- only the lowest branch nodes can be incomplete.
- at every full branch level L, the node count is 2^L

I think it’s challenging to ensure we don’t miss any position.

Observation — a segment of size 7 or shorter is easy to convert into a balanced subtree, to be attached to the final BST.

When a subarray (between two probed positions) is recognized as such a segment we can pass it to a simple routine that returns the “root” of the subtree, and attach it to the final BST. This design is visual and a clean solution to the “end-of-iteration” challenge.

The alternative solution would iterate until the segment sizes become 0 or 1. I think the coding interviewer probably prefers this solution as it is shorter but I prefer mine.

— idea 2

Note a complete binary tree can be efficiently represented as an array indexed from one (not zero).

— Idea 3 (elegant) dummy payload — For the slist without using extra storage, I can construct a balanced tree with dummy payload, and fill in the payload via in-order walk

All tree levels are full except the leaf level — this guarantees height-balance. We can leave the right-most leaf nodes missing — a so-called “complete” binary tree. This makes tree construction simple — build level by level, left to right.

— idea 4 STL insert() — For the slist, we can achieve O(N) by inserting each element in constant time using STL insert() with hint

— For the slist without using an array, I can get the size SZ. (then divide it by 2 until it becomes 7,6,5 or 4.) Find the highest 3 bits (of SZ) which represent an int t, where 4 <= t <= 7. Suppose that value is 6.

We are lucky if every leaf-subtree can be size 6. Then we just construct eight (or 4, or 16 …) such leaf-subtrees

If SZ doesn’t give us such a nice scenario, then some leaf-subtrees would be size 5. Then my solution is to construct eight (or 4, or 16 …) leaf-subtrees — a run of size-6 subtrees followed by a run of size-5 subtrees.

Lucky or unlucky, the remaining nodes must number 2^K-1 like 3,7,15,31 etc. We then construct the next level.

— For the slist without using an array, I can propose an O(N logN) recursive solution:

f(slist, sz){
    locate the middle node of slist; construct a tree whose root node is populated with this value
    cut the left segment into a separate slist1 and compute its length sz1
    node1 = f(slist1, sz1); set node1 as the left child of root
    repeat for the right segment to set the right child
    return the root
}

# merge 2 binTrees by node position

Q (leetcode 617): https://leetcode.com/problems/merge-two-binary-trees/submissions/

==== Analysis

https://github.com/tiger40490/repo1/blob/py1/py/algo_tree/merge2Tree.py is a short, simple solution fully tested on leetcode.com but hard to run offline. Elegant in terms of implementation.

Insight — the challenge is implementation. The input and return types of the DFT function are tricky but serve as useful implementation techniques.

Labelled as easy, but pretty hard for me.

— Idea 1: BFT. When a node has any null child, put null into queue. Not so simple to pair up the two iterations

— Solution 2: DFT on both trees. Always move down in lockstep. When I notice a node in Tree A is missing a child link that Tree B has, then I need to suspend the Tree A DFT?

My DFT function would have two parameters nodeInA and nodeInB. One of them could be null , but the null handling is messy.

Aha — I set the input parameters to dummy objects, to avoid the excessive null checks. In this problem, this technique is not absolutely necessary, but it is very useful in general.
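A sketch of the lockstep DFT for leetcode 617. Instead of dummy objects, returning the non-null side short-circuits most of the null handling (`TreeNode` is a minimal stand-in):

```python
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def merge_trees(a, b):
    """DFT on both trees in lockstep; reuses tree A's nodes."""
    if a is None:
        return b                      # graft B's subtree wholesale
    if b is None:
        return a
    a.val += b.val
    a.left = merge_trees(a.left, b.left)
    a.right = merge_trees(a.right, b.right)
    return a
```

Note this version mutates tree A rather than building a fresh tree — acceptable on leetcode, worth mentioning to an interviewer.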

# intervalTree: classic RBTree to find overlapp`event

Notable detail — a non-self-balancing BST won’t give logN performance, since the tree depth may degenerate

Q1: given N event objects {start, end, id}, use a RBTree to support a O(logN) query(interval INT) that returns any one event that overlaps with INT, or NULL if none.

If the end time == start time in the input INT, then it becomes the focus today :

Q2: given N event objects {start, end, id}, use a RBTree to support a O(logN) query(timestamp ii) that returns any one event that’s ongoing at time ii, or NULL if none.

P312 [[intro to algorithms]] describes an elegant solution.

The text didn’t highlight — I think the end time of the input interval INT is … ignored. (Therefore, in my Q2, the input ii is a single timestamp, not an event.) Precisely because of this conscious decision, the tree is sorted by event **start** time, and the additional payload in each node N is the subtree end time, i.e. the latest end time across all events in N’s subtree (including N itself). By construction, N’s payload is equal to or higher than N’s start time. (Note our input ii can fall into gaps between the event intervals.)

Eg: suppose N starts at 2:22 and the left-child payload says 3:33; then we know for sure there is at least one ongoing event during the interval 2:22 to 3:33. Not sure if this is a useful insight.

The text illustrates and proves why this design enables a clean and simple binary search, i.e. after each decision, we can safely discard one of “my” two subtrees. Here’s my (possibly imperfect) recall:

Let’s suppose each time the current node/event is not “ongoing”. (If it is then we can return it 🙂 ) So here’s the gist of the algorithm:

Suppose i’m the “current node”, which is root node initially. Compare ii to my left child L’s payload (not my payload).

As defined, L’s payload is the highest end time among L and all its sub-nodes. In other words, this payload is the highest end time among the events (in this subtree) that start before me.

Note the algo won’t compare L’s start time with ii

Note the algo won’t compare my start time with ii, though the scenario analysis below considers it.

- case 1: If the payload is lower than ii, then we know all events on my left have ended before ii, so we can discard the left subtree completely. The only way to go is down the right subtree.
- the other case (case 2): if the payload is higher than or equal to ii, then we know my left subtree contains some events (known as “candd” aka candidates) that end at or after ii.
- case 2a: suppose my start time is before ii; then by the BST ordering every candidate has also started before ii, so one of the candidates must be ongoing at ii.
- case 2b: suppose my start time is after ii; then by the BST ordering, no event in the right subtree has started yet. The right subtree can be discarded.
- In both cases 2a and 2b, we can discard the right subtree.
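The descent can be sketched as below. Balancing is omitted — a real implementation would sit inside an RBTree — and `maxEnd` is my name for the subtree-end-time payload:

```python
class Node:
    """BST node keyed on event start; maxEnd = latest end time in my subtree."""
    def __init__(self, start, end, left=None, right=None):
        self.start, self.end, self.left, self.right = start, end, left, right
        self.maxEnd = max(end,
                          left.maxEnd if left else end,
                          right.maxEnd if right else end)

def query(node, ii):
    """Return any event ongoing at timestamp ii, or None. O(tree height)."""
    while node:
        if node.start <= ii <= node.end:
            return node                       # current node is ongoing
        if node.left and node.left.maxEnd >= ii:
            node = node.left                  # case 2: candidates on the left
        else:
            node = node.right                 # case 1: left subtree all ended
    return None
```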

# Alien dictionary

https://leetcode.com/problems/verifying-an-alien-dictionary/

Suppose Total C characters, and N words

====analysis

Mostly implementation challenge.

insight — the published solution has mediocre performance, as it scans each word exactly TWICE, but luckily “twice” doesn’t affect the big-O — O(total char count across all words)

— idea 1: maintain a linked list of “clusters”. Each cluster is {pos, startWordID, optional lastWordID}. Each cluster holds words sharing the same prefix up to pos.

Copy the first letter of the N words into an N-array and verify this array is sorted. Now separate the words into up to 26 clusters. Suppose we get a cluster of 55 words. This cluster is the payload of a link node. When we look at the 2nd char within this cluster, we see up to 26 sub-clusters, so we replace the big cluster with these sub-clusters.

Invariant — the original order among the N words is never changed.

Even if this idea is overkill, it can be useful in other problems.

the verify(arrayOf1stChar) is a util function.

— Idea 4: convert each word to an English word, in O(C).

Then sort the collection. What’s the big-O? Each word comparison costs O(C/N) on average, so O(N logN × C/N) = O(C logN)

— idea 5: compute a score for each word and check the words are sorted in O(N)
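Idea 4/5 as a sketch — translate each word into a tuple of character ranks in one O(C) pass, then verify the translated list is sorted:

```python
def is_alien_sorted(words, order):
    rank = {ch: i for i, ch in enumerate(order)}     # char -> rank
    translated = [tuple(rank[ch] for ch in w) for w in words]
    # tuple comparison mirrors lexicographic word comparison,
    # including the prefix rule ("app" < "apple")
    return all(a <= b for a, b in zip(translated, translated[1:]))
```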

# celebrity search in O(N)

Q (MS 2018): write an algorithm to search for a celebrity in a collection of people, with the help of a blackbox predicate that can tell whether one person knows the other (bool a.knows(b)) — an asymmetric relation. A celebrity is someone who knows no one but is known by everyone else.

Aha — there can be 0 or 1 celebrity, never two: each of two celebrities would have to be known by the other, yet a celebrity knows no one.

There are exactly N*(N-1) relationships in the system, so brute force is O(N*N) but I propose a 2-pass O(N) solution:

- First pass: shrink the collection of Nodes [A, B .. N]
- check a random pair. If A knows B, then remove A from the list (A can’t be a celebrity); else, remove B (B isn’t known by everyone). So the first step always shrinks the list by one
- Now check any random pair of persons in remaining list. Check either direction. Will remove exactly one person.
- list will shrink to 1 person (Dan) after N-1 steps

- Second pass: check this Dan thoroughly. Dan must be known to everyone and must not know anyone. We end up with either no celebrity OR one celebrity in Dan.

Aha — better use A/B/C or Alice/Bob/Cody instead of #1, #2, #3.
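A sketch of the two-pass solution; `knows` stands in for the blackbox predicate:

```python
def find_celebrity(people, knows):
    # pass 1: every probe eliminates exactly one person
    candidates = list(people)
    while len(candidates) > 1:
        a, b = candidates[-2], candidates[-1]
        if knows(a, b):
            candidates.pop(-2)    # a knows someone => not a celebrity
        else:
            candidates.pop()      # b is unknown to a => not a celebrity
    dan = candidates[0]
    # pass 2: check Dan thoroughly
    for p in people:
        if p != dan and (knows(dan, p) or not knows(p, dan)):
            return None
    return dan
```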

# elegant use@priorityQ #indeed

Q: A given system receives a bunch of tasks at beginning of day.

task {
    string id # assigned by this system as an auto-incrementing string value
    int execTime
    list<task> children # all children become runnable as soon as this task completes. This is a classic edgeList
}

implement int getTotalTime(task) #including all descendant tasks

Q1: during system initialization we need to calculate how long each task (and its subtree) will execute. I solved this “warm-up” problem using a top-down tree-walk with memoization.

Q2: Now suppose this system expands to a farm of N identical cpu nodes. When a given cpu completes a task, it can only receive one specific runnable task, i.e. the one with smallest task ID. (This is an artificial constraint that hints at priority queue. I feel in a realistic context some other constraints may exist. Hopefully those constraints can be modeled and supported in the same priority queue.)

Overall, this is a realistic problem, not **blatantly contrived**.

Solution — I use two priorityQueues to hold two **mutually exclusive** subsets of tasks. Any task not in these PQs is either completed (dead) or blocked (unborn). The mutual exclusion is a *hallmark of clean* design.

PQ1 “the eligibles” holds all runnable tasks. This queue receives a group of tasks when group’s parent task completes — a (completion) event. This PQ is sorted by task id.

The event time is predictable as soon as this parent task starts running on a cpu.

PQ2: “the running” contains up to N tasks that are in the driver seats, sorted by expected end times.

My main() function is nothing but a while loop. In each iteration, pop the lowest item from the “running” queue, to simulate the earliest task completion event. At the completion, add all its children to the eligible queue. Then pop the leading eligible (possibly a new entry) and append it to the “running” queue.

The main loop is therefore an event loop like Win32 event loop. Therefore, it’s driven by the “running” queue, not the eligibles queue.

Elegant solution based on two priorityQueues. This usage of the data structure is not really intuitive, so I think such intuition can and should be learned. This was my first enlightenment on the usage of priority queues: PQ1 yields the most eligible task, based on id or other criteria; PQ2 yields the earliest finisher, according to each task’s duration and start time.

- “Running” tasks is a natural usage of priorityQueue;
- “Eligibles” is based on a stylized scheduling policy for illustration, but I can imagine real scheduling policies to use criteria other than FIFO. Note FIFO queue is less versatile than priorityQueue for scheduling.
- Kernel scheduler uses priority queue in a similar way

Exit condition — while-loop ends when the running queue becomes empty. The eligibles queue must have emptied out earlier. The last event time is the total cost.
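The event loop can be sketched with two heaps. `Task` is a minimal stand-in for the struct above, and tie-breaking of simultaneous completions is simplified:

```python
import heapq

class Task:
    def __init__(self, id, execTime, children=()):
        self.id, self.execTime, self.children = id, execTime, list(children)

def total_time(root_task, n_cpus):
    eligibles = [(root_task.id, root_task)]   # PQ1: runnable, keyed by id
    running = []                              # PQ2: (end_time, id, task)
    now = 0
    while eligibles or running:
        # hand free cpus the smallest-id eligible tasks
        while eligibles and len(running) < n_cpus:
            tid, t = heapq.heappop(eligibles)
            heapq.heappush(running, (now + t.execTime, tid, t))
        # earliest completion event drives the loop
        now, tid, t = heapq.heappop(running)
        for child in t.children:              # children become runnable
            heapq.heappush(eligibles, (child.id, child))
    return now                                # last event time = total cost
```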

Minor note on initial data parsing: when Task 1 is received, it may reference zero or more child tasks by id, but child Task-55’s details are only available when Task-55’s description is seen later on. So it’s best to store a list of child IDs rather than pointers. The IDs can be translated to Task pointers via a lookup table, either a map or an array

Minor Note: we don’t enqueue tasks unrelated to the initial task (specified by user input). We only care about the initial task and its subtree.

# count paths between 2 binTree nodes #PimcoQ9 Ashish,Pinsky

The DP idea — compare matrix-path-counter and EditDistance

- showcasing efficient queue in python.
- showcasing using (x,y) coordinates as dictionary key
- showcasing find max value in a dictionary

— Requirement: (https://leetcode.com/problems/unique-paths-ii/description/ is similar)

See Q9.pdf in the email to Ashish. Here are some key points:

Consider a maze mapped to a 2d-grid with an upper left corner at coordinates (row, column) = (0, 0). Any movement must be in increasing row or column direction. You must determine the number of distinct paths through the maze. You will always start at position (0, 0), the top left, and end up at (max(row), max(column)), the bottom right.

1 1 0 1

1 1 1 1

As an example, consider the above matrix where 1 indicates an open cell and 0 indicates blocked. You can only travel through open cells, so no path can go through the cell at (0, 2). There are two distinct paths to the goal. As a 2nd example, matrix below has 10 paths:

1 1 1 1

1 1 1 1

1 1 1 1

https://github.com/tiger40490/repo1/blob/py1/py/2d/classic_pathCountQ9.py is my solution.

==== Analysis ====

I see a binary tree, where each node is a cell. Cell (0,0) has a down node (1,0) + a right node (0,1). I feel this is similar to a more generic problem — “count paths between 2 nodes in a binary tree”

— DFT:

😦 I implemented something like a DFT but it was too slow with some test cases

😦 can overflow stack

🙂 DFT can print each unique path. I think BFT can’t.

🙂 DFT is easier to implement than BFT

— DynamicProgramming BFT solution from origin, as I described to Bill Pinsky:

This solution is more **versatile** than the one from Ashish. It can handle any directed graph.

Mark each node as “computed” once we have computed a score denoting how many paths-from-root there are. Save the score in a shadow matrix.

Origin node has score 1. Blocked cell has score 0.

Start BFT from origin. The two immediate neighbors are set to 1 (or 0 if blocked). Every node can be computed by adding up above-score + left-score. (This algo is simpler than the EditDistance algo.)

Memoization Performance note — My implementation was extremely slow (albeit correct) until I added an “if already computed, then continue” early in the loop

This Layered DP algo is also described in **max-path-sum problems**

Q: why is score[1,1] accessed 4 times?

A: node[1,1] is added to the queue twice. Each dequeue would need one check.

A: Also score[1,2] need to access score[1,1] as a parent. Ditto score[2,1]

— Ashish Singh gave me a much simpler solution, as shown in my github code.
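A sketch of the shadow-matrix DP, sweeping row by row instead of an explicit BFT — each cell still adds up above-score + left-score:

```python
def count_paths(grid):
    """grid: 1 = open cell, 0 = blocked. Count paths from top-left
    to bottom-right, moving only right or down."""
    rows, cols = len(grid), len(grid[0])
    score = [[0] * cols for _ in range(rows)]   # the shadow matrix
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 0:
                continue                        # blocked cell keeps score 0
            if r == 0 and c == 0:
                score[r][c] = 1                 # origin
            else:
                score[r][c] = (score[r - 1][c] if r else 0) + \
                              (score[r][c - 1] if c else 0)
    return score[-1][-1]
```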

# staircase problem #CSY@Bbg

Q (bbg question posed to CSY): given a staircase of height N (eg 3), you can reach the top by three steps 1,1,1 or two steps 1,2 or 2,1, or a single step of 3. Generate all paths for a given N.

### I think the solution(s) need more review and internalization.

This problem has many classic solutions

- bottom-up
- top-down recursion with memoization
- yield-based generator function

- BFT without recursion without duplicates

I feel this is simpler than AQR factorization and the CombinationSum problems.

Is this same as N-boy-split? No… split is harder. With the split, ABC can have a “step” of AC.

easy to test — https://github.com/tiger40490/repo1/blob/cpp1/cpp/algo_comboPermu/staircase_CSY.cpp was my non-recursive BFT solution.

https://github.com/tiger40490/repo1/blob/py1/py/algo_combo_perm/staircase_CSY.py is my yield-based recursive solution. Only 5 lines in the implementation. I added lots of tests.

Q: why path count == 2^{n-1}?

Answer from CSY: f(1)=1, f(2)=2, f(n) = f(n-1)+f(n-2)+…+f(1)+1 = 2^{n-1}, where the final +1 counts the single step of size n.

A: For a staircase of n, you can take a first step of 1 and have f(n-1) paths, or a first step of 2 and have f(n-2) paths … or a single step of n.

—jargon file:

a STEP from LEVEL 0 to 2 has LENGTH=2, and skips level 1; a PATH consists of 1 or more STEPS; a FORMULA is a complete PATH, and always starts at level 0 and may hit intermediate levels #2, #5, #6 …

— key observations:

* Observation 1: every formula starts with firstStepOf1, or firstStepOf2, (or firstStepOf3…), with no duplicates between these groups!

* Observation 2: After taking firstStepOf1, the remaining steps are really a formula for staircase of N-1, a smaller problem. This invites a bottom-up DP solution.

—BFT solution. suppose N = 5. We model each path as a directory path from root. Each queue item is a path represented by a linked list or vector of capacity N.

I hope this solution avoids duplicates… Yes it does:)

- enqueue all “first levels” /1; /2; /3; /4; /5
- then dequeue and visit the first path, i.e. “/1”. In this case, there are 4 levels remaining in the staircase, so enqueue /1/1; /1/2; /1/3; /1/4
- then dequeue and visit 2nd path in the queue i.e. “/2”. enqueue /2/1; /2/2; /2/3

Every time (after dequeue) we see a path has no remaining level, we have a formula.

—DP (bottom-up) iterative solution to be implemented

- first, build the formulas for a length-2 staircase. They are /1/1 + /2 i.e. two formulas saved in FormulaArrayForStaircase2 i.e. “fa2”
- then build the formulas for a length-3 staircase — fa3 = firstStepOf2 -> fa1 + firstStepOf1 -> fa2
- then build the formulas for a length-4 — fa4 = firstStepOf3 -> fa1 + firstStepOf2 -> fa2 + firstStepOf1 -> fa3

— recursive: CSY gave a recursive solution and interviewer asked “Can you find a non-recursive solution”?

I think the recursive solution repeats the same sub-problem (such as a 3-level staircase) over and over again, but we can make the recursive function more efficient with memoization — it should short-circuit when the input already has a solution saved in a global hashtable.

—- python yield/generator recursive solution

My github code shows the simple generator function without memoization runs at about half the speed of bottom-up. With memoization, it’s 20% faster than bottom-up.
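For reference, a yield-based generator can be this small (a sketch without memoization; not necessarily identical to my github version):

```python
def paths(n):
    """Yield every formula as a tuple of step sizes summing to n."""
    if n == 0:
        yield ()                 # the empty formula completes a path
        return
    for first in range(1, n + 1):
        for rest in paths(n - first):
            yield (first,) + rest
```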

# max-sum subMatrix O(N^3)

Q: given a square matrix (height N) of signed ints, find a submatrix with max sum

====analysis

Computing each submatrix sum by hashmap lookup gives an O(N^4) solution.

https://www.geeksforgeeks.org/dynamic-programming-set-27-max-sum-rectangle-in-a-2d-matrix/ has a O(N^3) solution for square matrix. I think brute force is O(N^6). If you know the trick it is doable.

Clearly the max-profit algo is inapplicable. The Kadane algo may help.

Need auxDS for sure.

— simple, elegant DP idea based on hint:

For every left-right column pair, find the best submatrix bounded by those two columns. For example, given column pair 3/7, the best such submatrix must extend exactly from column 3 to column 7. I think this constraint is extremely liberating, and leads to an elegant DP solution:

For pair 1/2, construct a new column array temp[] where, e.g., temp[6] := the row-6 sum across columns 1/2. Once we have temp[], apply Kadane’s algo on temp[] to find the max submatrix sum. The entire process is O(N)

Q: Is each pair evaluated as an independent problem? I doubt it. I think after we have done pair 1/2, adding column 3 into the mix is O(N), so the pairs **1**/2, **1**/3, **1**/4 .. **1**/N can be completed in O(NN). All pairs would take O(NNN)

So how is adding column 3 into the mix O(N)? We don’t reuse the previous Kadane result, **but we reuse the previous temp[]** to gain efficiency! We can update the existing temp[] in O(N), then apply Kadane’s algo on it in O(N)

What if I have a 99×88 matrix? Do we use a O(NNM) or O(NMM) algo? Since there are fewer columns, I will stick to the column-pair algo presented. (If there are fewer rows i.e. flat matrix then I would use row pairs.)
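The column-pair + Kadane idea as a sketch; note how temp[] is reused as the right column grows:

```python
def kadane(arr):
    """Max sum over non-empty contiguous subarrays."""
    best = cur = arr[0]
    for x in arr[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best

def max_sum_submatrix(mat):
    rows, cols = len(mat), len(mat[0])
    best = float('-inf')
    for left in range(cols):
        temp = [0] * rows
        for right in range(left, cols):
            for r in range(rows):
                temp[r] += mat[r][right]   # O(N) update of previous temp[]
            best = max(best, kadane(temp)) # O(N) Kadane per column pair
    return best
```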

# min-stack #bbg

My friend Abhinav (not his real name, to protect his privacy) got this question at Bloomberg internship interview. I added some details to make it clearer:

Q: Design a stack that supports push, pop, top, and retrieving the minimum element all with O(1) time complexity in the worst case.

There exists a function compare(Item A, Item B) that returns 1 if A is greater, 0 if equal, and -1 if A is smaller.

- getMin() — Retrieve the minimum element in the stack.
- push(x) — Push element x onto stack.
- pop() — Removes the element on top of the stack.
- top() — Get the top element.

==== analysis =====

The most efficient getMin() data structure is the binary heap, but its insert/delete runs in O(logN). Therefore I felt the requirements here were computationally impossible. But in this context, we only need to support deleting the last-added item 🙂

Key insight — popping is a constrained form of deletion. It’s hard to hit O(1) while supporting unconstrained deletions, BUT with a constraint on deletions, all operations can be O(1).

I need a regular stack + a helper data structure. A linked list or vector can support the stack operations

— The helper — a naturally sorted stack (or vector or deque) to hold record-breakers and record-matchers.

IFF the new item is a new minimum (record-breaker) or matches the existing minimum (record-matcher), we push it onto the sorted stack.

After every pop(), we check the popped item. If it equals the top of the sorted stack, then we also pop the sorted stack.

At any time, top of the sorted stack is the current minimum.

Vector and deque are actually fine. Interviewer may feel they are inferior to a stack, but with a modified requirement, they may become very useful.
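The stack + sorted-helper design as a sketch; the helper keeps record-breakers and record-matchers only:

```python
class MinStack:
    def __init__(self):
        self.main = []
        self.helper = []        # naturally sorted: each entry <= the one below

    def push(self, x):
        self.main.append(x)
        if not self.helper or x <= self.helper[-1]:
            self.helper.append(x)    # record-breaker or record-matcher

    def pop(self):
        v = self.main.pop()
        if v == self.helper[-1]:
            self.helper.pop()        # retire the matching record
        return v

    def top(self):
        return self.main[-1]

    def get_min(self):
        return self.helper[-1]       # current minimum, O(1)
```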

— Here’s a design to beat binary_heap, based on finite-sized (32 or 64-bit int) keys

Assume the comparison is based on a 32-bit integer key (strings or floats can also use radix ordering). I will use a radix array structure — perhaps 4 arrays of 256 elements, or a 4×256 matrix. Ultimately the payloads stored in the data structure are pointers to stack nodes. This saves memory since a stack node may be 99MB.

Every time we push a node, we derive the new key value in O(1), use the key value to locate its ‘home’ in the radix structure and store the new node’s address in a linked list therein. After the push(), we can compare and update (in constant time) a global variable pointing to the minimum node.

Each stack node also holds a back-pointer, i.e. an iterator into that linked list, so before pop() we can use the iterator to locate the object in the radix structure and delete it from its host linked list. We will also update the global variable.