c++get stacktrace@particular call site #core dump

Not needed in any interview…

It’s easier to trigger core dump in order to get a stack trace. I was successful with raise(SIGABRT)

Here’s a simple demo generating a back trace in a c++ source code. Not sure if it would work in a big code base, but it requires -rdynamic!

#ifdef usage_demo

g++ -g -rdynamic backtrace.cpp # -rdynamic needed
./a.out | perl -pe 's/.*\((.*)\+.*\[(.*)\]/ `c++filt $1` . `addr2line $2` /e;'


#include <execinfo.h>
#include <stdexcept>
using namespace std;
int f2(){
                throw 1;
                void *array[10];
                size_t size = backtrace(array, 10);
                backtrace_symbols_fd(array, size, STDOUT_FILENO);
int f1(){
        return f2();
int main(){

show free slots between meetings #bbg c++11

Q: You are given a list of teleconference booking like {9,10} meaning from 9am to 10am. We can multi-task, so overlap is fine. Can you print out all the free slots within the day? You can safely assume all the input data are sensible and between 9 to 18, i.e. our workday.

I solved this problem on the spot. My weakness is still the ECT cycle esp. the compiler errors.

I was familiar with the inverse problem of listing all busy sessions. So I decided to *focus* on that. I took a calculated risk that once I get that to work, I will have the cool confidence to solve the original problem.

corner case: two adjacent meetings. I added comment to specify exactly how to check for that
corner case: what if the entire day is one busy session?

int const SOD = 0, EOD=24;

struct Intv{ //can be a single meeting, a joined session or a free window
  int bgn, end; //end = 10 means 10 o'clock sharp. Will become float
  Intv(int a, int b): bgn(a), end(b){}
  bool operator<(Intv const & other) const { return this->bgn < other.bgn;}

  //can be a free function if no private fields
  friend ostream & operator<<(ostream & os, Intv const & a){
        return os;
template<typename T> ostream & operator<<(ostream & os, vector<T> const & v){
        for(int i = 0; i<v.size(); ++i) os<<v[i]<<"  ";
void printBusy(vector<Intv> const & arg){
  if (arg.empty()) return;
  vector<Intv> & v = const_cast<vector<Intv>&> (arg);
  sort(v.begin(), v.end());
  size_t const sz = v.size(); //cache
  Intv growable = v[0];
  for(size_t i = 1; i<sz; ++i){
        Intv & req = v[i];
        if (growable.end < req.st){
          cout<<growable<<endl; //close current session
          growable = req;//start new session
        growable.end = max(growable.end, req.end);
  cout<<"last session : "<<growable<<endl;
void printWindow(vector<Intv> const & arg){ //subsequent exercise
        if (arg.empty()){
                cout<<"free slot "<<Intv(SOD, EOD)<<endl;
        vector<Intv> &v = const_cast<vector<Intv> &>(arg); //avoid vector deep cloning
        sort(v.begin(), v.end());
        if (v[0].bgn > SOD){
          cout<<"SOD free slot "<<Intv(SOD, v[0].bgn)<<endl;
        Intv growing(v[0]); //current busy period
        for(int i=1; i<v.size(); ++i){
          //Heart of the algo. each new meeting either extends the current
          //busy period or prints a free window before it
          Intv & meeting = v[i];
          //cout<<i<<" a meeting: "<<meeting<<endl;
          //cout<<"growing = "<<growing<<endl;
          if (growing.end < meeting.bgn){
                cout<<"free slot "<<Intv(growing.end, meeting.bgn)<<endl;
                growing = meeting;
                growing.end = max(growing.end, meeting.end);
        if (growing.end < EOD)
                cout<<"EOD free slot "<<Intv(growing.end, EOD)<<endl;
int main() {
        //pass in initializer list to initliaze a const vector
        printBusy({Intv(15,17), Intv(7,9), Intv(5,7)});

Breadth-first-traversal, level-aware

BST visits root level (one node only), and 2nd generation, and 3rd generation etc. The BST below is aware of the height or “generation” currently visited.

Latest: https://github.com/tiger40490/repo1/blob/cpp1/cpp1/binTree/BST_show_level.cpp

struct Node {
    int data;
    Node *left, *right, *next;
    Node(int x, Node * le = NULL, Node * ri = NULL) : data(x), left(le), right(ri), next(NULL) {}
    Node _15(15);
    Node _14(14);
    Node _13(13);
    Node _12(12);
    Node _11(11);
    Node _10(10);
    Node _9(9);
    Node _8(8);
    Node _7(7, &_14, &_15);
    Node _6(6, NULL, &_13);
    Node _5(5, &_10, NULL);
    Node _4(4, NULL, &_9);
    Node _3(3, &_6,  &_7);
    Node _2(2, &_4,  &_5);
    Node root(1, &_2, &_3);
    Node marker(-1); //to be added to queue to mark new level

Node * dequeue(queue<Node*> & q){
    Node * n = q.front();
    return n;
void BFT(){
  queue<Node *> q;
  size_t height=0;

  for( q.push(&marker), q.push(&root); !q.empty();){
    Node * n = dequeue(q);
    if (n == &marker){
        if (q.empty()){
                cout<<"  ^^^^^^^^^^"<<endl;
        n = dequeue(q);
        q.push(&marker); //move the marker to end of queue
        cout<<"starting height # "<<++height<<endl;
    int dataL = 0, dataR=0;
    if (n->left){
        cout<<"pushing "<<n->left->data<<endl; q.push(n->left);
        dataL = n->left->data;
    if (n->right){
        cout<<"pushing "<<n->right->data<<endl; q.push(n->right);
        dataR = n->right->data;
    cout<<dataL<<" <- "<<n->data<<" -> "<<dataR<<endl;
// "  #  next is "<<dataN<<endl;
int main() {

memory leak demo: memset() std::string

valgrind proves the leak.

using namespace std;
size_t const len=15;
struct A{
        string s4;
        A(string s="default std::string"): s4(s){ cout<<"ctor"<<endl; }
size_t const  cnt=2;
size_t siz=cnt * sizeof(A);
A arr[cnt], ar2[cnt];

char fname[] = "/tmp/,.dat";
void leakDemo1() {
                arr[1]=A("frown"); //somehow skipping this causes core dump
        cout<<"before write()"<<endl;

        int fd = open(fname, O_CREAT | O_WRONLY, S_IRUSR | S_IWUSR);
        write(fd, arr, siz);

                int fd2 = open(fname, O_RDONLY);
                read(fd2, ar2, siz);
        for (int idx = 0; idx < cnt; ++idx){
                A * tmp = ar2 + idx;
void leakDemo2(){
        int * intp = new int(11); //not deleted. valgrind detected the leak
        memset(&intp, 0, 8); //overwrite the 8-byte pointer object
        delete intp;  //deleting on the fake address from memset
void leakDemo3(){
        string s="hello";
        cout<<"sie of string == size of pointer = " << sizeof(string)<<endl; //inside the string object, there's nothing but a poitner!
        memset(&s, 0, 1); // overwite pointer object itself, so it now points to somewhere else ... leak

        //somehow, memset(....,2) causes seg fault?
int main() {

20 C++territories for QnA IV

NO coding interviews — The scope here is the QQ topics i.e. tough IV topics not needed in GTD.

[1] “+” means Interviewers ask tons of question in this domain, more than it deserves. Its weight in the total score is lower than the share of questions would indicate.
[2] “70%/core” means this IV topic is 70% theoretical; core c++. The more theoretical, the more you need to read and summarize, rather than “GTD by any means”
[O] means HFT style optimization-heavy interviews. Not needed in regular jobs

swap-related tricks? advanced and seldom quizzed

  unsorted topics theoretical Q weight  share of Q [1]  
T1 big 4 70% / core  9%  +
T1 vptr-related: dynamic cast,
interface classes, pure virtual
100% / core  9%  +
T1 virtual dtor 100% / core  2%  +
T1 inheritance: MI, PI, hiding,
overload^override, casting, protection
60% / core  5%
T2 ptr as field; return ptr 20% / core  N.A.
no tier OO low level design #swap tricks,
class hierarchy..
60% / add-on 5%  – hard to ask
T1 stack,heap,static …variables 50% / core  3%
T1 pbclone, pbref, ref params 50% / core  4%
T1 object allocation, life time
vs variable declaration
0 /core N.A.
tier 2 STL: containers, 20% / add-on  14%  – asked more in coding IV
tier 2 array 0 / core N.A.
T2 malloc; heap prohibit; leak prevention/detection 40% / core 7%  –
T1 const 90% / core  2%
T3 [O] DAM allocation efficiency 100% / core 1%
T2 threads 80% / add-on 8%  –
T1 smart ptr and variations 30%  6%
no tier c++11: move, rvr 90% / core  4%
T2[O] sockets: a lot of c++jobs I applied
happen to need it
0 / add-on 10%
T3[O] cpu cache optimization 100% / core 1%
T2[O] inline and code-size bloat 100% / core 1%
T2[O] field layout 90% 1%
T2 c^c++ 90% / core 2%
T2 static/shared library  50% / core 2%
—  minor topics  —  4%
boost besides smart ptr 10% / add-on
container of polymorphic type #interface 50%
templates 80%
singleton, factory 100%
pimpl 90%
other design patterns 100%
tier 2 exception safety 90% / core
T1 RAII 50% / core  –

##[17] 4 tech career growth directions #aggressive

In spite of the threats and pains, I remain hopeful about my technical growth. Ideally, I hope to

  1. maintain competitiveness in java. Consider body-building
  2. selectively build up my c++ IV competitiveness, in terms of QQ/BP/ECT, from 6 to 7, over 2 years
    1. sockets, concurrency, sharedMem, templates ..
  3. try to recover the /sunk cost/ in quant domain
  4. slowly branch out to data analytics or data science

[1 java] feels achievable, based on my recent interviews. [4 data] is secondary and nice to have, not a must at age 43, so [2 cpp] is my main drive.

The interviews are crucial. Without IV, I can improve my GTD or zbs but painfully slow improvement of IV competitiveness.

Q: Suppose we don’t know java. Based on our c++ IV performance, can we survive as a c++ developer?
A: I will survive. My family needs me.

TradeWeb c++IV

100% QnA interview. No coding no white-board no ECT no BP no algo no data structure.

System is Multi-threaded VC++/MSSQL.

One technical interviewer only. (The other 2 are behavior interviewers.) He is a director and looks very technical. I have met perhaps 3 to 5 interviewers focused on the fundamentals of a language/compiler. They pick one of 10 important parts of a language (like java, c++ or c#) and try to determine how well a candidate understands the key details and the rationales.

Below, Q’s are questions from this interviewer. A’s are my answers/guesses.

Q1: let’s talk about non-virtual dtor. You said there are problems in deleting a subclass instance via a base class pointer. Show an example.
(Now I think giving a non-trivial example requires non-trivial knowledge.)
A: Eg: subclass holds some resources to be explicitly released (via close() or disconnect()) in its dtor, but in our scenario only the base dtor is called. Subclass dtor is not invoked.

Q1b: if subclass dtor is invoked, does it always run the base dtor?
%%A: Yes guaranteed. See my blogpost.

Q1c: if a subclass dtor is virtual and you delete an instance via a subclass ptr, you said this is good and subclass dtor will run, so is the “virtual” unnecessary? In this case, what’s the difference between virtual dtor  vs non-virtual dtor?
%%A: yes in both cases the subclass dtor runs.  If your system is written such that you delete a subclass instance only via a subclass ptr and never superclass ptr, then drop the “virtual” to improve runtime performance.

Q3: Let’s talk about a static member function sf2() in class Base. If you have a reference to a Base instance, Can you call myInstance.sf2()?
A: yes

Q3b: if Base and Der both have a static method sf2()?
A: subclass sf2 tends to hide Base sf2. Static type of myInstance variable is used to determine which version is used.

Q3c: how about myPtr->sf2()? Obscure details that I don’t want to spend too much time on.
%%A: looks very odd, but if it compiles, then it’s static, never dynamic, binding, since there’s no ‘virtual’. See my code in https://github.com/tiger40490/repo1/tree/cpp1/cpp/lang_misc

Q4: when must you use the pointer “this”? Obscure details that I don’t want to spend too much time on.
%%A: ctor chaining? Actually chaining is supported in c++11 but not using “this”.
A: if there’s a field and a local variable of the same name, I use “this->” to disambiguate. CORRECT:) See
A: in my ctor (or method), i may want to print my address. Yes you can usually do that in the caller afterwards but not always
A (hindsight): Similarly, in my ctor (or method) I may want to conditionally save my address in some container. The conditional logic is not doable after the ctor.
A (hindsight): if I know the host object is in an array, I could do array arithmetic using “this” with precaution.
A: delete this. I have seen people doing it.
A (hindsight): CRTP
A (hindsight): if parent class has a const method m1() and non-const overload m1(), you can cast “this” to call either of them.
A (hindsight): what if inside a method you need to access a parent object field hidden by a host object field? Can you cast “this”? Yes tested but not a “must” use case. Alternative solution is B::nonStaticField
AA: sometimes you must return *this where the return type is HostClass&

Q4b: can you think of an example of using “this” to avoid access violation?

Q4h: why would anyone use ctor chaining? Why not call a common (non-static) setter method s2() from multiple ctors of the same class?
A: that setter has to be non-static!

Q4i: You think ctor should not call a non-static method?
%%A: it can, but I don’t do it. The host object is half-initialized, so the setter method must strictly perform field initialization and nothing else.

Q5: why do people use class templates? Note — Interviewer didn’t ask function templates.
%%A: avoid the cost of virtual functions

Q5b: Let’s look at using both class hierarchy and class templates.
%%A: not a good idea to combine them. STL has probably no virtual functions.

Q5c: Let’s see. You said you have a bunch of “payload” classes PL1/PL2/… and you want to use a container of PL1 and another container of PL2. Ok You said that’s a common use case of class templates unrelated to virtual functions. Suppose the container template requires each payload class to implement a method f2(). If f2 is provided by a base class, wouldn’t it be easier (can be virtual or non-virtual)?
A: If there’s a family of payload classes, like Shape, Rectangle, Square… then I feel it’s best practice to use a container of smart pointers. I think it reduces the risk of slicing, among other benefits.
A: if the payload classes don’t form a family hierarchy, then there is probably some template meta-programming technique to provide a default implementation of f2(). I’m no expert on template meta-programming.

## stateful OMS class design: observations

More details are in email…

Here’s a well-established and large-scale order manager class design. It handles millions of orders a day.

  • The entire process is restarted on every trading day. Before the restart, all pending orders are cancelled! The OM is probably a per-thread singleton in the process.
  • The OM stores all the orders for the day, including each closed order in case it needs cancellation.
  • The OM keeps all the partial executions (aka partial fills) for a given order, because each execution could be busted.
  • Each action on an order (such as validation, partial execution ..) is performed by a dedicated object. For 1000 orders, if there are 5 actions, then there would be 5000 distinct “action objects”. The OM has pointers to all of these action objects.
  • Most action objects are stateful. ALL action objects are persisted somewhere so as to support busting/cancellation.


leetcode 40% accepted … means nothing

Hi XR,

1) Some leetcoders (me too) would spent 5 seconds reading a discussion thread, get a basic idea and implement and submit her own solution and get accepted.

Therefore, the 40% acceptance rate doesn’t mean the problem was solved independently by 40% of the readers.

2) Also, some questions are not interesting so leetcoders don’t even try. Therefore, some relatively easy questions may have low acceptance rates.

I told myself to ignore the acceptance rate, but it’s too visible on the web page, so instead of complaining, I do look at the acceptance number. Mentally, I follow a simple heuristic rule — if the problem has high acceptance percentage then it is probably easy.

This heuristic rule (like all heuristics) can still be incorrect in the 25-horse (or linkedlist-loop-detection) problem where a very clever solution has become well-known.

remove subsequent duplicates in slist #SocGen

Q (SocGen hackerrank): given the head node of a linked list that might have duplicate values,  return the head node of a sanitized linked list without duplicates.

eg: Node@0x7089 (value=3) ->Node@0x6481 (value=2) ->Node@0x7921 (value=3) ->Node@0x6308 (value=7) ->null

would become Node@0x7089 (value=3) ->Node@0x6481 (value=2) ->Node@0x6308 (value=7) ->null

Note the first node (value=3) is still in the list, but all subsequent nodes with (value=3) are discarded.

https://github.com/tiger40490/repo1/blob/cpp1/cpp/linkedList/removeDupNodes_SocGen.cpp is my tested solution

  • The two-pointer solution sounds simple but is over-complicated and error-prone during implementation.
  • fake-link-head technique is again useful

std::vector-of-containers: initializer list

Typical example: If you heavily use a vector of map, it’s tempting to use a vector of pointers to maps. The java way.

If you drop the “pointers to”, then when you retrieve the map from the vector, you often get a copy, unless you save the return value in a reference variable

By the way, here’s an initializer for std::map:

vec.push_back(map<int, int>{{32,1}} );

MSOffice one-note for note-taking

Confession — I’m a laggard on adopting new applications. Ascii text files are my default choice.

  1. I use spreadsheets when I must.
  2. I use MSWord only for equations.
  3. I use Outlook tasks/drafts to capture screenshots for record keeping.
  4. 😦 OneNote is too bulky to back up
  5. 😦 OneNote is hard to share with computers without MSOffice

I think OneNote is suitable for office use —

🙂 save screenshots, but I can do the same in outlook
🙂 add notes anywhere on the doc, to make more visual presentation/documentations. However, for personal note taking I won’t need this feature.

don’t feel so bad about recursive solutions #CSY

If I only have a recursive solution and interviewer points out it could overflow the stack, then here’s my answer

"Indeed I wish there’s an iterative solution. It would be more efficient. However, if there are multiple threads available, then iterative solution might be less adaptable to use multiple threads.

As to stack overflow, I understand the run time stack space is much smaller than the heap space, so if I am given a large data set and I hit a million nested recursive calls, then I will follow standard procedure to convert my recursive solution to an iterative solution, and maintain my own stack in heap. This will relieve the pressure on stack space"

This answer applies to all your recursive solutions.

[17]deque with uneven segments #GS

Update — Now I think unrolled linked list for small payloads is very similar.

Q: (2017 GS interview question) We know std::deque offers efficient insert at both ends. But now we want efficient insert mid-stream, and willing to relax O(1) lookup.

I proposed a solution using uneven segments. Each segment has a max capacity. When reached, a new segment would be allocated and linked in. Lookup using a staircase function.

Segment sizes (not capacity) is a mutable int array. From this array, we lazily derive another int array, the “threshold array” or staircase array. Every lookup requires a binary search in this threshold array to locate the target segment. If we keep a hard limit (like 1024) on segment count, then the threshold array would be at most 1024 long, so within 10 hits we always locate the target segment — better than O(log N)

The two int arrays should be L1-cached.


This is related to q[cannot open shared object file] abc.so

See https://amir.rachum.com/blog/2016/09/17/shared-libraries/#rpath-and-runpath for the RUNPATH

q(objdump) can inspect the binary file better than q(ldd) does.

q(ldd) shows the final, resolved path of each .so file, but (AFAIK) doesn’t show how it’s resolved. The full steps of resolution is described in http://man7.org/linux/man-pages/man8/ld.so.8.html

q(objdump) can shed some light … in terms of DT_RUNPATH section of the binary file.

cvs rollback #basics

cvs -q update -D “2017-07-29” # will remove any file that’s “not in the universe” at that moment in history

The date means beginning of the day in GMT.  Here’s the more precise form:

cvs -q up -D “2017-07-29 00:00:01 GMT”

cvs -q up -D ‘2017/09/15 19:54:25 GMT’ # GMT make a difference. Default is … surprise.

–To verify the rollback

  • cvs log # is confusing and shows commits after the snapshot!
  • cvs stat path/to/file # shows the sticky tag

–To check the commits between two dates:

cvs diff -N -b -D “2017-10-20 22:00:00 GMT” -D “2017-10-20 23:00:00 GMT”

10front-stage(+backstage)data structures@cod`IV

Based on my experience of algorithm interviewing at investment banks, high or low frequency hedge funds, 3rd-party financial firms like data providers, liquidity venues, a few deep-pocketed startups and a few WestCoast tech giants.. here’s the breakdown in terms of the data structure used in the original problem description.

  • 5%of them are presented in nothing but a single integer or two integers or all integers under 10000. Calculation required.
  • 5% of them are presented in huge stacks or queues
  • — arrays in 1D or 2D
  • 10% of them are presented in huge integer arrays, that require comparison, summation, subtraction or other simple calculations
  • 15%+ of them are presented in a huge sequence of value-objects, not pre-sorted
    • This is the default way to receive input data
  • 10% of them are presented in huge 2D arrays including matrices, mazes and primitive images
    • my #1 weakness
  • — strings
  • 15% of them are presented in nothing but a single char-array or two strings
  • 10% of them are presented in huge collection of strings like words
  • — linked node data structures:
  • 10% of them are presented in long linked lists
  • 10% of them are presented in huge unsorted trees
  • 5% of them are presented in huge binary search trees
  • 4% of them are presented in other huge directed graphs but not binary trees or linked lists
  • 1% of them are presented as huge collection of points on an x/y coordinate plane. Always some math required, therefore rare.
  • =====
  • 100% in total

Any of these problems could require a solution using DP/Greedy, swapping, and auxiliary data structures like binary trees, hash tables, linked lists, stacks, queues or priority queues..

  1. 70% need auxiliary arrays or hash table
  2. 50% can be solved with recursion excluding DFT
  3. 25% needs a breadth/depth-first tree traversal
  4. 25% can use DP or greedy algo, usually in one iteration — tough
  5. 20% can use an auxiliary BST
  6. 10% can use an auxiliary tree but not a BST — tough and non-intuitive. Eg: many matrix problems
  7. 10% can use an auxiliary linked list
  8. 10% require recursion in a loop — tough
  9. 10% can use an auxiliary stack or queue excluding BFT
  10. 10% can use an auxiliary sorted vector
  11. 5% can use an auxiliary matrix — tough and non-intuitive. Eg: EditDistance

LRU cache #Part 1

I believe my friend got this question in a bbg phone interview.

Design and implement a data structure for Least Recently Used (LRU) cache. It should support the following operations:

get(key) – Get the value (will always be positive) of the key if the key exists in the cache, otherwise return -1.
put(key, value) – Set or insert the value if the key is not already present. When the cache reached its capacity, it should invalidate the least recently used item before inserting a new item.

Follow up:
Could you do both operations in O(1) time complexity?


Hash table to support lookup. The “value” is a pointer to a link node. Link node also has key/value — 2-way linkage.

slist is a FIFO and grows at tail for every new key/value pair, so head is the earliest pair. Every time a key/value is accessed via hash table, we move the node to the tail.

When capacity is reached, we would remove the tail node. Using the key in that node, we also remove from hash table.

Q: When do U choose c++over python4personal coding #FB

A facebook interviewer asked me “I agree that python is great for coding interviews. When do you choose c++?” I said

  1. when I need to use template meta programming
  2. when I need a balanced tree

Now I think there are other reasons

  1. when I practice socket programming
  2. when I practice memory mgmt techniques — even java won’t give me the low-level controls
  3. when I parse binary data using low-level constructs like reinterpret_cast, endianness conversion
  4. when I practice pthreads — java is easier

I didn’t showcase c++ but I felt very confident and strong about my c++, which probably shined through in front of the last interviewer (from the architect team)

I think my c++/python combo was good combo for the FB onsite, even though only one interviewer noticed my c++ strength.

reverse words in a string containing a sentence

Q: A sentence can contain 2 or more words, spaces and punctuation marks but no hyphen. You are given a sentence as a c-string. Reverse the words in O(1) space O(N) time. “a1 b2 c3” becomes “c3 b2 a1”.

%%A: first pass to reverse the c-string character-wise by swapping. 2nd pass — scan left to right and keep the current-word-start and current-word-end positions. Once we have a word, reverse the chars in it by swapping.

shortest path: linked-in distance #Flextrade

I haven’t searched online for this common question.

Q1: given a snapshot of the entire linked-in network and two personIDs A1 and B1, return the shortest distance between them. If they are directly connected, then return 1.
%%A: breadth-first search

Q2: what if you are given another pair of personID’s like A2 to B2, then A3 to B3 … Will your answer change?
%%A: I will pre-construct a jagged 2D int-array (more efficient than a matrix [1]). The outer array is indexed by personID. The element at position 2503 is for person #2503(say, Ed) and is an int array holding all of Ed’s first-degree friends’ personIDs.

This jagged array offers cache efficiency since I can load entire #2503 child-array in one memory access, even if Ed has 500 friends. Without this jagged array, I need to visit 500 pointers.

The jagged array alone can answer my question, so I don’t need the graph.

O() complexity is identical to graph traversal.

Q3: what if the snapshot is replaced by a live network? I feel we have to assume at the time of query we treat the network as momentarily frozen.
%%A: I would update my jagged array in real time

[1] jagged array can achieve 99% memory footprint saving. If the most connected person has 9999 friends then the matrix needs 9999 columns, but a typical person only has a few hundred friends.

[19]hash table expansion: implementation note #GS

(I don’t use the word “rehash” as it has another meaning in java hashmap. See separate blogpost.)

Note this blogpost applies to separate chaining as well as linear probing.

As illustrated in my re_hash_table_LinearProbing.py, the sequence of actions in an expansion is tricky. Here is what worked:

  1. compute bucket id
  2. insert
  3. check new size. If exceeding load factor, then
    1. create new bucket array
    2. insert all entries, including the last one
    3. reseat the pointer at the new bucket array

If you try to reduce the double-insertion of the last entry, you would try moving Step 2 to later. This is tricky and likely buggy.

Say the computed bucket id was 25, so after the expansion, you insert at (or around Bucket25), but this 25 was based on the old “capacity”. When we lookup this key, we would use the current capacity to get a bucket id of 9, so we won’t find this key.

%% Quesiton to interviewer

As I get older, the soft interview is becoming more important more tricky. Part of the soft interview is my list of questions to interviewer.

Tip: If possible, ask a “salesy” question so you can highlight your memorable strengths. More importantly, ask smart, serious questions.

How about “truthful” question that show your real concern? I feel there’s very limited value in truthfulness here.

Some people (like me) don’t like [e]asy questions, but interviewer may find it too sensitive and blunt, though they can always dodge it.

  • [e] Q: between quality and speed, what’s more important in everyday situations?
  • Q: bottlenecks and what was/is done about each
  • [e] Q: what are your challenges?
    • resource?
    • people not communicating enough?
    • requirement changing too fast?
  • [e] Q: what design trade-offs have you made?
  • [e] Q: what user groups, and what challenges do each present?
  • [e] Q: what types of team player are needed here?
  • [e] Q: how is performance *measured*?
  • [e] Q: your management style?
  • [e] Q: median age among the developers?
  • Q: Are expectations any different on an older developer (say, my age)
  • Q: who is the most important (not necessarily biggest) competitor?
  • Q (only for the hiring manager): Based on what you have learned about me in this interview, do you have any questions or concerns about my ability (not suitability) to do this job?
    • Note This is a pretty aggressive question, but if there is a good rapport built during the interview, it can be useful. More assertive hiring managers like this as it shows the candidate is not shy about seeking feedback. This can also clear the air of any lingering concerns or possible miscommunications/misconceptions that are a subtext to the interview.


min-stack #bbg

My friend Abhinav (not his real name, to protect his privacy) got this question at Bloomberg internship interview. I added some details to make it clearer:

Q: Design a stack that supports push, pop, top, and retrieving the minimum element all with O(1) time complexity in the worst case.

There exists a function compare(Item A, Item B) that returns 1 if A is greater, 0 if equal, and -1 if A is smaller.

  • getMin() — Retrieve the minimum element in the stack.
  • push(x) — Push element x onto stack.
  • pop() — Removes the element on top of the stack.
  • top() — Get the top element.

==== analysis =====

The most efficient getMin() data structure is the binary heap, but insert/delete runs in O(logN). Therefore I felt the requirements here are computationally impossible. But in this context, we only need to support deleting the last added item 🙂

Key insight — popping is a constrained form of deletion. It’s hard to hit O(1) while supporting unconstrained deletions, BUT with a constraint on deletions, all operations can be O(1).

I need a regular stack + a helper data structure. A linked list or vector can support the stack operations

— The helper — a naturally sorted stack (or vector or deque) to hold record-breakers and record-matchers.

IFF a new minimum (record breaker) or another item matching the existing minimum (record-matcher) is added, we push it to the sorted stack.

After every pop(), we check the popped item. If equal to top of sorted stack, then pop the sorted stack.

At any time, top of the sorted stack is the current minimum.

Vector and deque are actually fine. Interviewer may feel they are inferior to a stack, but with a modified requirement, they may become very useful.

— Here’s a design to beat binary_heap, based on finite-sized (32 or 64-bit int) keys

Assuming the comparison is based on 32-bit integer key (string or floats can also use radix sort). I will use a radix array structure. Perhaps 4 arrays of 256-elements.  Or perhaps a 4×256 matrix, Ultimately the payload stored in the data structure are pointers to stack nodes. This saves memory since a stack node may be 99MB.

Every time we push a node, we derive the new key value in O(1), use the key value to locate its ‘home’ in the radix structure and store the new node’s address in a linked list therein. After the push(), we can compare and update (in constant time) a global variable pointing to the minimum node.

Each stack node also has a back-pointer to the iterator into the list, so before pop() we can use the iterator to locate the object in the radix structure and delete it from the host linked list. We will also update the global variable.