cgroups #basics

  • A group of processes, not a group of resources.
    • Initially named “Process Containers”
  • Started at Google in 2006
  • “Mainlined” into the Linux kernel in 2007
  • Not available beyond Linux, as far as I know

Optional.empty()=immutable !!singleton

https://docs.oracle.com/javase/8/docs/api/java/util/Optional.html#empty points out that Optional.empty() may well return a global singleton instance, but there is no guarantee! No one should assume it is a singleton. Use of q[ == ] assumes it, and is wrong.

Unlike enums, the JVM doesn’t guarantee a “singleton”. I think this no-guarantee decision was made to give the compiler/JVM maximum freedom.

[[effJava]] suggests that immutable instances need no copying. They probably can be singletons, but don’t need to be singletons.
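To make the q[ == ] pitfall concrete, a minimal sketch (my own illustration, not from the javadoc):

import java.util.Optional;

public class EmptyOptionalDemo {
  public static void main(String[] args) {
    Optional<String> a = Optional.empty();

    // WRONG: relies on the undocumented singleton behavior.
    // Happens to print true on current JDKs, but unspecified.
    System.out.println(a == Optional.<String>empty());

    // RIGHT: test emptiness via the API
    System.out.println(a.isPresent());              // false
    System.out.println(a.equals(Optional.empty())); // true, guaranteed by equals()
  }
}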

 

Optional.java notes

Q: if an optional is empty, will it remain forever empty?
A: yes. The Optional instance is immutable, so an empty instance stays empty; only the variable can be reseated to a different Optional.

— An Optional variable could be null but should never be, as the instance needs state to hold at least the boolean isPresent.

If a method is declared to return Optional<C>, then the author needs to ensure she doesn’t return a null Optional! This is not guaranteed by the language.

https://dzone.com/articles/considerations-when-returning-java-8s-optional-from-a-method illustrates a simple rule — use a local var retVal throughout; then, at the very last moment, return Optional.ofNullable(retVal). This way, retVal can be null but the returned reference is never null.
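A minimal sketch of that rule (method name and lookup logic invented for illustration):

import java.util.Optional;

public class LookupDemo {
  // hypothetical lookup; the point is the single exit via ofNullable()
  static Optional<String> findNickname(String userId) {
    String retVal = null; // may stay null on some code paths
    if ("tiger".equals(userId)) retVal = "T";
    // the one and only return: never hands out a null Optional
    return Optional.ofNullable(retVal);
  }
}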

If needed, an Optional variable should be initialized to Optional.empty() rather than null.

–immutability is tricky

  1. the referent object is mutable
  2. the Optional reference can be reseated, i.e. not q[ final ]
  3. the Optional instance itself is immutable.
  4. Therefore, I think an Optional == a mutable ptr to a const wrapper object enclosing a regular ptr to a mutable java object.
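A tiny demo of the three layers (my own illustration):

import java.util.Optional;

public class OptionalLayersDemo {
  static class Account { int balance; } // mutable referent

  public static void main(String[] args) {
    Optional<Account> opt = Optional.of(new Account());
    opt.get().balance = 100; // layer 1: the referent is mutable
    opt = Optional.empty();  // layer 2: the variable can be reseated
    // layer 3: Optional itself has no setter; the wrapper is immutable
  }
}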

Similarity to String.java — points 2 and 3 above also hold for String.

Compare to a shared_ptr instance — point 1 holds there too.

  • On point 3) In contrast, a shared_ptr instance has mutable state, in terms of refcount etc
  • On point 2) I say not-applicable, as I seldom use a pointer to a shared_ptr

— get() throws NoSuchElementException if not present

— not serializable

— My motivation for learning Optional is 1) QQ 2) simplify my design in a common yet simple scenario

https://www.mkyong.com/java8/java-8-optional-in-depth/ is a demo, featuring … flatMap() !!

dev jobs ] Citi SG+NY #Yifei

My friend Yifei spent 6+ years in ICG (i.e. the investment banking arm) of Citi Singapore.

  • Over 6Y no layoff. Stability is Yifei’s #1 remark
  • Some old timers stay for 10+ years and have no portable skill. This is common in many ibanks.
  • Commute? Mostly in Changi Biz Park, not in Asia Square
  • Low bonus, mostly below 1M
  • VP within 6 years is unheard-of for a fresh grad

I feel Citi is rather profitable and not extremely inefficient, just less efficient than other ibanks.

Overall, I have a warm feeling towards Citi and I wish it would survive and thrive. It offers good work-life balance, much better than GS, ML, LB etc

fear@large codebase #web/script coders

One conclusion — my c++ /mileage/ made me a slightly more confident and slightly more competent programmer, having “been there, done that”, but see the big Question 1 below.

— Historical view

For half my career I avoided enterprise technologies like java/c++/c#/SQL/storedProc/Corba/sockets… and favored light-weight technologies like web apps and scripting languages. I suspect that many young programmers also feel the same way — no need to struggle with the older, harder technologies.

Until GS, I was scared of the technical jargon, complexities, low-level APIs, debuggers/linkers/IDEs, compiler errors and opaque failures in java/SQL … (even more scared of C and Windows). Scared of the larger, more verbose codebases in these languages (cf the small php/perl/javascript programs) … so scared that I had no appetite to study these languages.

— many guys are unused to large codebases

Look around your office. Many developers have at most a single (rarely two) project involving a large codebase. Large like 50k to 100k lines of code excluding comments.

I feel the devops/RTB/DBA or BA/PM roles within dev teams don’t require the individual to take on those large codebases. Since it’s no fun, time-consuming and possibly impenetrable, few of them would take it on. In other words, most people who try would give up sooner or later. Searching in a large codebase is perhaps their first challenge. Even figuring out a variable’s actual type can be a challenge in a compiled language.

Compiling can be a challenge esp. with C/c++, given the more complex tool chain, as Stroustrup told me.

Tracing code flow is a common complexity across languages but worse in compiled languages.

In my experience, perl/php/python/javascript codebases are usually small, like pets. When they grow into big creatures they are daunting and formidable, just like compiled-language projects. Some personal experiences —
* Qz? Not a python codebase at all
* pwm comm perl codebase? I would STILL say the codebase would be bigger in a compiled language

Many young male/female coders are not committed to large scale dev as a long-term career, so they probably don’t like this kinda tough, boring task.

— on a new level

  • Analogy — if you have not run marathons you would be afraid of it.
  • Analogy — if you have not coached a child on big exams you would be afraid of it.

I feel web (or batch) app developers often lack the “hardcore” experience described above. They operate at a higher level, cleaner and simpler. Note Java is cleaner than c++. In fact, I feel weaker as a java programmer than as a c++ programmer.

Q1: I have successfully mastered a few sizable codebases in C++, java, c#. So how many more successful experiences do I need to feel competent?
A: ….?

Virtually every codebase feels too big at some point during the first 1-2 years, often when I am in a low mood, even though in my experience I became competent with many of these large codebases.
I think Ashish, Andrew Yap etc were able to operate well with limited understanding.
I now see the whole experience as a grueling marathon. Tough for every runner, but I tend to start the race assuming I’m the weakest — impostor syndrome.
Everyone has to rely on logs and primitive code-browsing tools. Any special tools are usually of marginal value. With java, a live debugger is the most promising tool, but it offers limited pain relief. Virtually all of my fellow developers face exactly the same challenges, so we all have to guess. I mean Yang, Piroz, Sundip, Shubin, … virtually all of them, even the original authors of the codebase. Even after spending 10Y with a codebase, we could face opaque issues. However, these peers are more confident in the face of ambiguity.

[19] 3Deeper mtv2work4more$ after basic ffree

Note sometimes I feel my current ffree is so basic it’s not real ffree at all. At other times I feel it is real, albeit basic, ffree. After achieving my basic ffree, here are 3 deeper motivations for working hard for even more money:

  • am still seeking a suitable job for Phase B. Something like a light-duty, semi-retirement job providing plenty of free time (mostly for self-learning, blogging, helping kids). This goal qualifies as a $-motivation because … with more financial resources, I can afford to take some desirable Phase-B jobs at lower pay. In fact, I did try this route in my 2019 SG job search.
  • I wish to spend more days with grandparents — need more unpaid leaves, or work in BJ home
  • more respect, from colleagues and from myself
  • stay relevant for 25 years. For the next 10 years, I still want more upstream yet churn-resistant tech skills like c++.

–Below are some motivations not so “deep”

  • better home location (not size) — clean streets; shorter commute; reasonable school.. Eg Bayonne, JC

— Below are some secondary $-motivations

  • * more time with kids? Outside top 10 motivations.
  • * better (healthy) food? usually I can find cheaper alternatives
  • * workout classes? Usually not expensive

G5 valuable features@2019job

== G3 attitudes that make this hibernation better than qSG phase

  1. not fixated on inferiority. At age 38 I was still feeling at-peak and hence … struggling against (and missing) the expectations repeatedly (boxer taking a brutal beating). Now at age 45 my self-expectations match reality way better
  2. mlp job — not aiming to tackle tough tech domains like c++ or quant. Boss expectation is easier than Macq and Qz. so far no PIP risk
  3. not aiming to “upgrade” my son’s benchmarks

In hindsight, during qSG I grossly underestimated the impact of PIP, damaged goods, stigma, …

== G5 valuable features@MLP

  1. Mgr in NY.. no micro-managing; Colleagues frequently MIA, showing ..
  2. Commute – able to camp out over weekends, or weekday evenings for that matter
  3. Instrumentation … slightly higher self-sufficiency. Independent xx (big gun) will happen quicker.
  4. respectable salary, not one of those 120k jobs. Remember peer-comparison is one of two reasons for first-aid.

Note some of the most valuable features are the absence of (or freedom from) “smells”

— beyond G5

  1. mkt depth — as a key factor to job satisfaction, as explained in the job_satisfaction spreadsheet. Mainstream, churn-resistant domain, similar to mkt data
  2. Extra leaves — not needed really but valuable
  3. not a job primarily using Qz, java1.4, or c#, or plain old C, or javascript/go-lang
  4. flexible or remote working
  5. Med-bx — a purely monetary benefit, more valuable to my peers + family
  6. not a system seldom used, or used by very few people
  7. not a code base that’s never put into production
  8. —— trivialities :
  9. Low-cost fruit shop nearby
  10. No prod emails so far

profilers for low-latency java

Most (simple) java profilers are based on JVM safepoints. At a safepoint, they can use the JVM API to query the JVM. Safepoint-based profiling is relatively easy to implement.

s-sync (Martin) is not based on safepoint.

Async-profiler is written for openJDK, but some features are usable on the Zing JVM. Async-profiler is based on process-level counters, so it can’t really measure micro-latency with any precision.

Perf is an OS-level profiler, probably based on kernel counters.

“Strategic” needs a re-definition #fitness

“Strategic” i.e. long-term planning/t-budgeting needs a re-definition. quant and c# were two wake-up calls that I tragically missed.

For a long time, the No.1 strategic t-expense was quant, then c#/c++QQ, then codingDrill (the current yellowJersey).

Throughout 2019, I considered workout time inferior to coding drill … Over weekends or evenings I often feel nothing-done, even though I push myself to do “a bit of” yoga, workout, math-with-boy, or expense tracking.

Now I feel yoga and other fitness t-spend is arguably more strategic than tech muscle building. I say this even though fitness improvement may not last.

Fitness has arguably the biggest impact on brain health and career longevity

git | reword historical commit msg

Warning — may not work if there’s a recent merge-in on your branch

Find the target commit and its immediate parent commit.

git rebase -i the.parent.commit

First commit in the list would be your target commit. Use ‘r’ for the target commit and don’t change other commits. You will land in vim to edit the original bad commit msg. Once you save and quit vim, the rebase will complete, usually without error.

Now you can reword subsequent commit messages.

c++low-^high-end job market prospect

As of 2019, c++ low-end jobs are becoming scarce but high-end jobs continue to show robust demand. I think you can see those jobs across many web2.0 companies.

Therefore, it appears that only high-end developers are needed. The way they select candidates is … QQ. I have just accumulated the minimum critical mass for self-sustained renewal.

In contrast, I continue to hold my position in high-end coreJava QQ interviews.

conclusions on mvea xp

  • Not much positive feedback besides “providing new, different viewpoints”, but Josh doesn’t give positive feedback anyway
  • should be able to come back to MS unless the requirement is very stringent
  • Josh might remember Victor as more suitable for greenfield projects.
  • I think Josh likes me as a person and understands my priorities. I did give him 4W notice and he appreciated it.
  • I didn’t get the so-called “big picture” that Josh probably valued. Therefore I was unable to “support the floor” when the team was out. The last time I achieved that was in GS.
  • A few times I worked hard and made personal sacrifices. Josh noticed.
  • In the final month, I saw myself as fairly efficient in wrapping up my final projects

Q: was the mvea c++ codebase too big for me?
A: No, given my projects are always localized.

I had a few proud deliveries where I had some impetus to capture the momentum (camp out). Listed below. I think colleagues were impressed to some extent even though other people probably achieved more. Well, I don’t need to compare with those and feel belittled.

This analysis revealed that Josh is not easily impressed. Perhaps he has high standards, as he never praised anyone openly.

  • * I identified two stateless calc engines in pspc. Seeing the elegant simplicity in the design, I quickly zoomed in, stepped over and documented the internal logic and replicated it in spreadsheet.
  • * my pspc avg price sheet successfully replicated a prod “issue”, shedding light into a hitherto murky part of the codebase
  • * I quickly figured out the serialization root cause of the outage
  • * I had two brave attempts to introduce my QOT innovation
  • * My 5.1 Brazil pspc project was the biggest config project to date. I single-handedly overcame many compilation (gm-install) and startup errors.

##With time2kill..Come2 jobjob blog

  • for coding drill : go over
    • [o] t_algoClassicProb
    • [o] t_commonCodingQ22
    • t_algoQQ11
    • open questions
  • go over and possibly de-list
    1. [o] zoo category — need to clear them sooner or later
    2. [o] t_oq tags
    3. t_nonSticky tags
    4. [o] t_fuxi tags
    5. Draft blogposts
    6. [o] *tmp categories? Sometimes low-value
    7. remove obsolete tags and categories
  • Hygiene scan for blogposts with too many categories/tags to speed up future searches? Low value?
  • [o=good for open house]

c++screening Questions for GregR

Similar to https://bintanvictor.wordpress.com/2017/09/15/more-mcq-questions-on-core-java-skills/, hopefully these questions can be given over the phone.

Other topics easy to quiz over the phone: smart pointers, std::thread, rvr/move, big4

  • — Q3: Suppose you have a simple class “Account” with only simple data fields like integers and strings. We can get an Account object constructed 1) on heap or 2) in data segment. Where else can it be constructed? For clarity, data segment holds things like global variables. [on stack]
  • Q3b: Suppose Account class has only a default constructor. In which of the cases above is this constructor used to instantiate the Account object? [all three cases]
  • Q3c: in which one of the three cases must we use pointer to access the constructed object? [the Heap case ]
  • Q3d: in which one of the three cases above do we have a risk of memory leak? [the Heap case]
  • Q3e: in which of the three cases can the Account object construction happen before main() function starts? Hint: dynamic vs static allocation [ data segment case ]
  • Q3e2 (advanced): for a static Account object allocated in data segment, is the construction always before main()? [Not always. Consider local static variables]
  • Q3f (advanced): RAII relies on which of the 3 types of allocation? Hint: RAII invokes the destructor automatically and reliably. [stack allocation]
  • Q3g: can you have an array of Account objects constructed on heap? [yes]
  • Q3h: YES we can get such an array of Account objects, using array-new, but do we get a pointer or an array or do we get the first array element in an Account variable? [pointer]
  • Q3i: what happens when one of these Account objects is destructed? [destructor would run]
  • Q3j: OK let’s assume Account destructor runs. In which cases above is the destructor Guaranteed to run? Note we have four destruction scenarios so far — on stack, in data-segment, on heap and after array-new. [all cases, including static objects]
  • Q3k: what if we don’t define any destructor in Account class, in which cases above is destructor skipped? [Never skipped in any  case]
  • Q3L: In the array-new case, suppose 100 Account objects are constructed on heap, when we destruct the array via array-delete, how many times would the Account destructor run? [ 100 times]
  • — Q4: I have a variable var1 as an integer pointer, and I pass var1 directly into a function. What types of functions below can accept this argument?
    • A: funcA takes a parameter of non-constant reference to integer
    • B: funcB takes a parameter of constant reference to integer
    • * C: funcC takes a parameter of pointer to integer
    • D: funcD takes a parameter of bare integer (not pointer or reference)
  • Q4b: I want to call funcX(100+200+300), what types of function can accept this argument? [B/D]
  • Q4c: I have a variable var2 as an integer variable, and I want to pass it in like funcX(var2). What types of functions below can accept this argument? [A/B/D]
  • — Q5: which casts below are platform-specific? [correct answer is marked with *]
    • A: static_cast
    • B: down_cast
    • C: const_cast
    • D: dynamic_cast
    • * E: reinterpret_cast
  • Q5b: which casts are Not part of the c++ standard?
  • Q5c: which casts are closest to the old-school cast in C language? [A]
  • Q5d: which casts are likely to require the vtable? [D]
  • Q5e: which casts usually work with pointers and references Only? [D/E]
  • Q5f (advanced): when applied on pointers, which casts could produce an address different from the input pointer? [D]
  • — Q6: which standard data types below usually require heap allocation [see asterisks]
    • A: integer
    • * B: the string in standard library
    • C: plain vanilla array of 20 timestamps
    • * D: the list in standard library
    • * E: the vector in standard library
    • * F: the map in standard library
    • * G: unordered_map
    • H: C-style string
  • Q6b: which data types are available in C? [A C H]
  • Q6c: which data types by design won’t grow to accommodate more elements? [C H] A is a trivial case.
  • Q6d (advanced): which implementation classes for these data types must include code to allocate an array on heap? [B E G]
  • Q6d2 (more advanced, if last question was correctly answered): which one among these three classes may skip array allocation in a common optimization? [B in the form of small-string optimization]
  • Q6e: which data types offer random-access iterators capable of jumping by an arbitrary offset? [B C E H]
  • Q6f: which data types offer amortized constant-time lookup? [BCEH and G]
  • Q6g (advanced): which data type(s) offer unconditional guaranteed performance for insertion/deletion at any position and in all scenarios ? [D F]
  • Q6h (advanced): which data structures are allocated on heap but don’t require reallocation of existing elements? [list and map]

##command line c++dev tools: Never phased out

Consider the C++ build chain + dev tools on the command line. They never get phased out, never lose relevance, never become useless, at least till my 70s. New tools always, always keep the old features. In contrast, java and newer languages don’t need so many dev tools, and their tools are more likely to use GUIs.
  • — Top examples of similarly durable things (I don’t have good adjectives)
  • unix command line power tools
  • unix shell scripting for automation
  • C API: socket API, not the concepts
  • — secondary examples
  • C API: pthreads
  • C API: shared memory
  • concepts: TCP+UDP, http+cookies
Insight — unix/linux tradition is more stable and consistent. Windows tradition is more disruptive.

Note this post is more about churn (phased-out) and less about accumulation (growing depth)

opaque challenge:(intermediate)data browser #push the limit

Opacity — is a top-3 (possibly the biggest) fear and burden, in terms of figure-things-out speed relative to coworkers on a localSys. The most effective and direct solution is some form of instrumentation tool for intermediate data. If you develop or master an effective browsing tool, however primitive, it would likely become a competitive advantage in terms of figure-out speed, and consequently earn you some respect.

LocalSys — I feel most of the effective data browser knowledge is localSys knowledge.

If you are serious about your figure-out speed weakness, if you are seriously affected by the opaque issues, then consider investing more time in these browsing tools.

Hard work, but worthwhile.

  • eg: Piroz built a Gemfire data browser and it became crucial in the Neoreo project
  • #1 eg: in my GS projects, the intermediate data was often written into RDBMS. Also important — input and output data are also written into RDBMS tables. Crucial in everyday trouble-shooting. I rank this as #1 in terms of complexity and value. Also this is my personal first-hand experience
  • #2 eg: RTS rebus — during development, I captured lots of output CTF messages as intermediate data… extremely valuable
    • Once deployed, QA team relied on some web tools. I didn’t need to debug production issues.
    • I remember the static data team saved their static data to RDBMS, so they relied on the RDBMS query tool on a daily basis.

Now some negative experiences

  • eg: I’m not too sure, but during the Stirt QZ dev projects I didn’t develop enough investigation skill esp. in terms of checking intermediate data.
  • eg: in Mvea, we rely on net-admin commands to query order state, flow-element state and specific fills… Not as convenient as a data store. I never became proficient.
    • I would say the FIX messages are logged consistently and serve as input and output data browser.

Many projects in my recent past have no such data store. I don’t know if there’s an effective solution to the opacity, but everyone else faces the same issue.

finding1st is easier than optimizing

  • Problem Type: iterate all valid choices without duplicates — sounds harder than other types, but usually the search space is quite constrained and tractable
    • eg: regex
    • eg: multiple-word search in matrix
  • Problem Type: find best route/path/combo, possibly pruning large subtrees in the search space — often the hardest type
  • Problem Type: find first — usually easier

In each case, there are recurring patterns.

automation scripts for short-term GTD

Background — automation scripts have higher value in some areas than others

  1. portable GTD
    • Automation scripts often use bash, git, SQL, gradle, python/perl… similar to other portable GTD skills like instrumentation know-how
  2. long-term local (non-portable) GTD
  3. short-term local (non-portable) GTD

However, for now let’s focus on short-term local GTD. Automation scripts are controversial in this respect. They take up lots of time but offer some measurable payoffs:

  • they could consolidate localSys knowledge .. thick -> thin
  • They can serve as “executable-documentation”, verified on every execution.
  • They reduce errors and codify operational best practices.
  • They speed up repeated tasks. This last benefit is often overrated. In rare contexts, a task is so repetitive that we get tired and have to slow down.

— strength — Automation scripts are a competitive strength of mine, even though I’m not the strongest.

— respect – Automation scripts often earn respect and start a virtuous cycle

— engagement honeymoon — I often experience rare engagement

— Absorbency — sometimes I get absorbed, though the actual value-add is questionable .. we need to keep in mind the 3 levels of value-add listed above.

jGC heap: 2 unrelated advantages over malloc

Advantage 1: faster allocation, as explained in other blogposts

Advantage 2: the programmer can "carelessly" create a "local" object in any method (call it method1), pass the object (by reference) into other methods, and happily forget about freeing the memory.

In this extremely common set-up, the reference itself is a stack variable in method1, but the heapy thingy is "owned" by the GC.

In contrast, c/c++ requires some "owner" to free the heap memory, otherwise memory would leak. There’s also the risk of double-free. Therefore, we absolutely need clearly documented ownership.
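A sketch of that common set-up (names invented for illustration):

import java.util.ArrayList;
import java.util.List;

public class GcOwnershipDemo {
  static List<String> method1() {
    List<String> local = new ArrayList<>(); // "local" object, but on heap
    local.add("hello");
    method2(local); // pass the reference around freely
    return local;   // nobody ever frees it; the GC owns it
  }
  static void method2(List<String> list) { list.add("world"); }
}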

max-sum non-adjacent subSequence: signedIntArr#70%

Q: given a signed int array a[], find the best subsequence sum, where no two adjacent elements of a[] may both show up. O(N) time and O(1) space.

====analysis

— O(1) idea 2:

Two simple variables needed: defining m[i] as the best sum over a[0..i], m_1 holds m[cur-1] and m_2 holds m[cur-2]. At each step, compare a[cur] + m_2 vs m_1 and keep the larger as the new m_1.

https://github.com/tiger40490/repo1/blob/py1/py/algo_arr/nonAdjSeqSum.py is self-tested DP solution.
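The python link above is the tested solution; below is my untested Java re-sketch of the same O(1)-space recurrence. Note this sketch allows the empty subsequence, so an all-negative input yields 0.

public class NonAdjSeqSum {
  // m_1 = best sum over a[0..cur-1]; m_2 = best sum over a[0..cur-2]
  static int maxSum(int[] a) {
    int m_2 = 0, m_1 = 0; // empty subsequence sums to 0
    for (int x : a) {
      int cur = Math.max(x + m_2, m_1); // take a[cur] or skip it
      m_2 = m_1;
      m_1 = cur;
    }
    return m_1;
  }
  public static void main(String[] args) {
    System.out.println(maxSum(new int[]{3, 2, 7, 10})); // 13 = 3 + 10
  }
}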

FB 2019 ibt Indeed onsite

  • coding rounds — not as hard as the 2011 FB interview — regex problem
    • Eric gave positive feedback to confirm my success but perhaps other candidates did even better
    • No miscommunication as happened in the VersionedDict.
    • The 2nd Indeed interviewer failed me even though I “completed” the problem. The pregnant interviewer may follow suit.
  • data structure is fundamental to all the problems today.
  • SDI — was still my weakness but I think I did slightly better this time
  • Career round — played to my advantage as I have multiple real war stories. I didn’t prepare much. The stories just came out raw and hopefully authentic
  • the mini coding rounds — played to my advantage as I reacted fast, thanks to python, my leetcode practice …

So overall I feel I’m getting much closer to passing. Now I feel one interview is all it takes to enter a new world:

* higher salary than HFT
* possibly more green field
* possibly more time to research, not as pressed as in ibanks

sorting collection@chars|smallIntegers

Whenever an algo problem requires sorting a bunch of English letters, it’s always, always better to use counting sort: O(N) rather than O(N logN).

Therefore, given a single (possibly long) string, sorting its characters should never be O(N logN)!

Similarly, in more than one problem, we are given a bunch of integers bounded within a limited range. For example, the array index values could be presented to us as a collection and we may want to sort them. Sorting such integers should always, always be counting sort.
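A minimal Java sketch, assuming the input is restricted to lowercase letters:

public class CountingSortDemo {
  // O(N) sort of a string of lowercase letters; no comparisons needed
  static String sortLetters(String s) {
    int[] freq = new int[26];
    for (char c : s.toCharArray()) freq[c - 'a']++;
    StringBuilder sb = new StringBuilder(s.length());
    for (char c = 'a'; c <= 'z'; c++)
      for (int i = 0; i < freq[c - 'a']; i++) sb.append(c);
    return sb.toString();
  }
  public static void main(String[] args) {
    System.out.println(sortLetters("banana")); // aaabnn
  }
}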

GTD fear-graph: delays^opacity^benchmark..

  • — manifestations first; fundamental, hidden factors last
  • PIP, stigma, damagedGood. Note I’m not worried about cashflow
  • [f] project delays
  • [f] long hours
  • [f] long commute
  • large codebase in brown field. Am slower than others at reading code.
  • [f] opacity — is worse than complexity or codebase size
  • figure-out speed benchmark — including impostor’s syndrome
  • [f = faced by every team member]
The connections among these “nodes” look complex but may not be really complex.
PIP is usually the most acute, harmful item among them. I usually freak out, though in hindsight I always feel I tried my best and should not feel ashamed. Josh is one manager who refuses to use PIP, so under Josh I had no real fear.

dev career]U.S.: fish out of water in SG

Nowadays I feel in demand 1) on Wall St and 2) with the web2.0 shops. I also feel welcome at 3) the U.S. startups. In Singapore, this feeling of being in demand was sadly missing. Even the bank hiring managers considered me a bit too old.

Singapore banks only have perm jobs for me, which feel unsuitable, unattractive and stressful.

In every Singapore bank I worked at, I felt visibly old and left behind. Guys my age were all managers… painful.

expertise: Y I trust lecture notes more than forums

Q: A lot of times we get technical information from forums, online lecture notes, research papers. Why do I trust some more than others?

  1. printed books and magazines — receive the most scrutiny, fundamentally because once printed, mistakes are harder to correct.
  2. research papers — receive a lot of expert peer review and most stringent editorial scrutiny, esp. in a top journal and top conference
  3. college lecture notes — are more “serious” than forums,
    • mostly due to the consequence of misinforming students.
    • When deciding what to include in lecture notes, many professors are conservative and prudent when adding less established, less proven research findings. The professor may mention those findings verbally but is more prudent in her posted lecture notes.
    • research students ask lots of curious questions and serve as a semi-professional scrutiny of the lecture notes
    • lecture notes are usually written by PhD holders
  4. Stackoverflow and Quora — have quality control via voting by accredited members.
  5.  the average forum — Least reliable.

[11]concurrent^serial allocation in JVM^c++

–adapted from online article (Covalent)

Problem: multithreaded apps create new objects at the same time. During object creation, memory is locked. On a multi-CPU machine (threads running concurrently) there can be contention.

Solution: allow each thread to have a private piece of the EDEN space, the Thread-Local Allocation Buffer (TLAB):
-XX:+UseTLAB
-XX:TLABSize=
-XX:+ResizeTLAB

You can also analyse TLAB usage with -XX:+PrintTLAB.

Low-latency c++ apps use a similar technique. http://www.facebook.com/notes/facebook-engineering/scalable-memory-allocation-using-jemalloc/480222803919 reveals insights into lock contention in malloc().

Q: When do U choose c++over python4personal coding #FB

A facebook interviewer asked me “I agree that python is great for coding interviews. When do you choose c++?” I said

  1. when I need to use template meta programming
  2. when I need a balanced tree

Now I think there are other reasons

  1. when I practice socket programming
  2. when I practice memory mgmt techniques — even java won’t give me the low-level controls
  3. when I parse binary data using low-level constructs like reinterpret_cast, endianness conversion
  4. when I practice pthreads — java is easier

I didn’t showcase c++, but I felt very confident and strong about my c++, which probably shone through in front of the last interviewer (from the architect team).

I think my c++/python combo was a good combo for the FB onsite, even though only one interviewer noticed my c++ strength.

Other FB onsite coding/SDI questions

— Q: design type-ahead i.e. search suggestion. Scalability is key.

— Q: innerProduct2SparseArray (Table aa, Table bb). First you need to design the undefined “Table” class to represent a sparse array.
Then you need to write real code for innerProduct2SparseArray(), assuming the two Tables are already populated according to your definition.

— Q: add2DecimalsSameLength(string decimal1, string decimal2) to return a string version of the sum. Can’t use the python integer to hold the sum, since in java/c++ integers have limited size. You must use strings instead.

Aha — carry can only be 0 or 1.
I forgot to add the last carry as an additional digit beyond the length of the input strings 😦
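A Java sketch of the same-length digit-string addition, including that final carry:

public class AddDecimalStrings {
  // d1 and d2 are equal-length digit strings, e.g. "905" and "417"
  static String add(String d1, String d2) {
    StringBuilder sb = new StringBuilder();
    int carry = 0; // carry can only be 0 or 1
    for (int i = d1.length() - 1; i >= 0; i--) {
      int sum = (d1.charAt(i) - '0') + (d2.charAt(i) - '0') + carry;
      sb.append((char) ('0' + sum % 10));
      carry = sum / 10;
    }
    if (carry == 1) sb.append('1'); // the extra digit beyond the input length
    return sb.reverse().toString();
  }
  public static void main(String[] args) {
    System.out.println(add("905", "417")); // 1322
  }
}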

— Q: add2longBinary(string binaryNum1, string binaryNum2). I told the interviewer that my add2BigDecimal solution should work, so we skipped this problem.

— Q: checkRisingFalling(listOfInts) to return 1 for rising, -1 for falling and 0 for neither. Rising means every new num is no lower than previous

— Q: checkDiameter(rootOfBinTree)

SDI: 3 ways to expire cached items

server-push update ^ TTL ^ conditional-GET # write-through is not cache expiration

Few online articles list these solutions explicitly. Some of these are simple concepts but fundamental to DB tuning and app tuning. https://docs.oracle.com/cd/E15357_01/coh.360/e15723/cache_rtwtwbra.htm#COHDG198 compares write-through ^ write-behind ^ refresh-ahead. I think refresh-ahead is similar to TTL.

B) cache-invalidation — some “events” would trigger an invalidation. Without invalidation, a cache item would live forever with an infinite TTL, like the list of China provinces.

After cache proxies get the invalidation message in a small payload (bandwidth-friendly), the proxies discard the outdated item, and can decide when to request an update. The request may be skipped completely if the item is no longer needed.

B2) cache-update by server push — IFF bandwidth is available, the server can send not only a tiny invalidation message but also the new cache content.

If combined with TTL, or with reliability added, multicast can be used to deliver cache updates, as explained in my other blogposts.

T) TTL — more common. Each “cache item” embeds a time-to-live data field a.k.a expiry timestamp. Http cookie is the prime example.

In Coherence, it’s possible for the cache proxy to pre-emptively request an update on an expired item. This would reduce latency but requires a multi-threaded cache proxy.

G) conditional-GET in HTTP is a proven industrial-strength solution described in my 2005 book [[computer networking]]. The cache proxy always sends a GET to the database but with an If-modified-since header. This reduces unnecessary database load and network load.

W) write-behind (asynchronous) or write-through — in some contexts, the cache proxy handles not only Reads but also Writes. Read requests will read or add to the cache, and Write requests will update both the cache proxy and the master data store. Drawback — in a distributed topology, updates from other sources are not visible to “me” the cache proxy, so I still rely on one of the other 3 means.

| scenario | TTL | eager server-push | conditional-GET |
|---|---|---|---|
| frequent query, infrequent updates | efficient | efficient | frequent but tiny requests between DB and cache proxy |
| latency important | OK | lowest latency | slower lazy fetch, though efficient |
| infrequent query | good | wastes DB/proxy/NW resources, as “push” is unnecessary | efficient on DB/proxy/NW |
| frequent update | unsuitable | high load on DB/proxy/NW | efficient conflation |
| frequent update+query | unsuitable | can be wasteful | perhaps most efficient |

 

@outage: localSys know-how beats generic expertise

When a production system fails to work, do you contact

  • XX) the original developer or the current maintainer over last 3Y (or longer) with a no-name college degree + working knowledge of the programming language, or
  • YY) the recently hired (1Y history) expert in the programming language with PhD and published papers

Clearly we trust XX more. She knows the localSys and likely has seen something similar.

Exception — what if YY has a power tool like a remote debugger? I think YY may gain fresh insight that XX is lacking.

XX may be poor at explaining the system design. YY may be a great presenter without low-level and hands-on know-how.

If you discuss the language and underlying technologies with XX he may show very limited knowledge… Remember Andrew Yap and Viswa of RTS?

Q: Who would earn the respect of teammates, mgr and external teams?

XX may have a hard time getting a job elsewhere .. I have met many people like XX.

lockfree stack with ABA fix #AtomicStampedReference

http://15418.courses.cs.cmu.edu/spring2013/article/46 explains ABA in the specific context of a lockfree stack. It uses CAS2 to counter the ABA problem.

Java CAS2? I believe AtomicStampedReference  is designed for it.

http://tutorials.jenkov.com/java-util-concurrent/atomicstampedreference.html#atomicstampedreference-and-the-a-b-a-problem explains the AtomicStampedReference solving the ABA problem
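A minimal sketch (not production code) of a Treiber stack whose top pointer carries a version stamp, so an A-B-A’d “A” fails the CAS:

import java.util.concurrent.atomic.AtomicStampedReference;

public class StampedStack<T> {
  private static class Node<T> {
    final T value;
    Node<T> next;
    Node(T value) { this.value = value; }
  }

  // stamp starts at 0 and advances on every successful update
  private final AtomicStampedReference<Node<T>> top =
      new AtomicStampedReference<>(null, 0);

  public void push(T value) {
    Node<T> newHead = new Node<>(value);
    int[] stamp = new int[1];
    while (true) {
      Node<T> oldHead = top.get(stamp);
      newHead.next = oldHead;
      // succeeds only if both the reference AND the stamp still match
      if (top.compareAndSet(oldHead, newHead, stamp[0], stamp[0] + 1)) return;
    }
  }

  public T pop() {
    int[] stamp = new int[1];
    while (true) {
      Node<T> oldHead = top.get(stamp);
      if (oldHead == null) return null;
      // even if another thread popped "A" and pushed it back,
      // the stamp has advanced, so our CAS fails and we retry
      if (top.compareAndSet(oldHead, oldHead.next, stamp[0], stamp[0] + 1))
        return oldHead.value;
    }
  }
}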

Actually, many ABA illustrations are simplistic. Consider this illustration:

  1. Thread 0 begins a POP and sees “A” as the top, followed by “B”. Thread 0 saves “A” and “B” before committing.
  2. Thread 1 begins and completes a POP , returning “A”.
  3. Thread 1 begins and completes a push of “D”.
  4. Thread 1 pushes “A” back onto the stack and completes. So now the actual stack top has A above D above B.
  5. Thread 0’s CAS still sees “A” on top, so it succeeds, returning “A” and setting the new top to “B”.
  6. Node D is lost.

— With a vector-of-pointer implementation, Thread 0 needs to save integer position within the stack. At Step 5, it should then notice the A sitting at stack top is now at a higher position than before, and avoid ABA.

The rest is simple. Thread 0 should then query the new item (D) below A. Lastly, the CAS would compare the current stack-top position with the saved position before committing.

However, in reality I think a vector-of-ptr is extremely complex to implement if we have two shared mutable things to update via CAS: 1) the stackTopPosition integer and 2) the (null) ptr in the next slot of the vector.

— with a linked list implementation, I think we only have the node addresses, so at Step 5, Thread 0 can’t tell that D has been inserted between A and B.

You may consider using the stack size as a second check, but that would be of similar complexity to CAS2, yet less reliable.

array-based order book: phrasebook

For HFT + mkt data + .. this is a favorite interview topic, kind of similar to retransmission.

Perhaps look at … https://web.archive.org/web/20141222151051/https://dl.dropboxusercontent.com/u/3001534/engine.c has a brief design doc, referenced by https://quant.stackexchange.com/questions/3783/what-is-an-efficient-data-structure-to-model-order-book

  • TLB?
  • cache efficiency

impostor’s syndrome: IV^on-the-job

I feel like an impostor more on the job than in interviews. I have strong QQ (+ some zbs) knowledge during interviews. I feel it more and more often in my c++ interviews, in addition to java interviews.

Impostor’s syndrome is all about benchmarking. In job interviews, I sometimes come up stronger than the interviewers, esp. with QQ topics, so I sometimes feel the interviewer is the impostor !

In my technical discussions with colleagues, I also feel like an expert. So I probably make them feel like impostors.

So far, all of the above “expert exchanges” are devoid of any localSys. When the context is a localSys, I have basically zero advantage. I often feel the impostor’s syndrome because I probably oversold during the interview and set up sky-high expectations.

any zbs in algo questions@@

I think Rahul may have a different view, but I feel some of the QQ questions qualify as zbs, while very few of the algo tricks do.

In contrast, data structure techniques are more likely to qualify as zbs.

Obviously, the algorithm/data-structure research innovations qualify as zbs, but they are usually too complex for a 45-min coding interview.

single-writer lockfree data structure@@

All the lockfree data structures I have seen have 2 writer threads or more.

Q: what if in your design, there can be at most one writer thread but some reader threads. Any need for lockfree?
A: I don’t think so. I think both scenarios below are very common.

  • Scenario: If notification required, then mutex + condVar
  • Scenario: In many scenarios, notification is not needed — Readers simply drop by and read the latest record. In java, a volatile field suffices.

 

multi-core as j4 multi-threading #STM

Q: why choose multi-threaded design instead of single-threaded processes?

Most publications mention multi-core hardware as an answer. Questionable.

With multi-threading, you can run 30 threads in one process. Or you can run 30 single-threaded processes, as in the RTS parser and Rebus, an industrial-strength, proven solution. Both designs make use of all CPU cores.

Between these two designs, heap memory efficiency can be different, as the 30 threads are able to share 99GB of objects in the same address space, but the 30 processes would need shared memory.

I feel the lesser-known middle-ground design is … 30 threads running in single-threaded mode. By definition, these 30 threads share immutable data only; anything mutable is thread-local.

In any multi-threaded design, those 30 threads can share the text segment i.e. memory occupied by code. The text segment tends to be smaller than the heap footprint.

In a boss-worker design, the worker threads may need to share very few mutable objects, so these worker threads are not strictly STM. They are quasi-STM.

funcPtr as argument, with default value

Suppose you write a dumpTree() which takes a “callback” parameter, so you can invoke callback(aTreeNodeOfTheTree).

Sounds like a common requirement?

Additionally, Suppose you want “callback” to have a default value of no-op, basically a nullptr.

Sounds like a common requirement?

I think c/c++ doesn’t have easy support for this. In my tested code https://github.com/tiger40490/repo1/blob/cpp1/cpp/algo_binTree/binTreeUtil.h, I had to define a wrapper function without the callback parameter. The wrapper calls the real function, passing in a nullptr as the callback. Note the nullptr needs casting.

clean language==abstracted away{hardware

Nowadays I use the phrase “clean language” more and more.

Classic example — java is cleaner than c# and c++. I told a TowerResearch interviewer that java is easier to reason with. Fewer surprises and inconsistencies.

The hardware is not as “consistent” as we would wish. Those languages closer to the hardware (and programmers of those languages) must deal with the “dirty” realities. To achieve consistency, many modern languages make heavy use of heap and smart pointers.

For consistency, everything is treated as a heapy thingy, including functions, methods, metadata about types …

For consistency, member functions are sometimes treated as data members.

job satisfaction^salary #CSDoctor

I feel CSDoctor has reasonable job satisfaction. If I were in his role my job satisfaction would be lower.

His financial domain is rather niche. Equity volatility data analysis, not a big infrastructure that needs a team to support. I guess it could be considered a glorified spreadsheet. I think the users are front office traders looking at his data to make strategic business decisions.

His technical domain is very niche. The team (4 people) made a conscious decision to use c# early on, and has not really changed to java. C# is shunned by West Coast and all internet companies including mobile, cloud and big-data companies. Even Wall St is increasing investment in javascript based GUI, instead of c# GUI.

I see very high career risk in such a role. What if I get kicked out? (Over 7 years the team did kick out at least 2 guys.. Last-in-first-out.) What if I don’t get promoted beyond VP and want to move out?

I won’t enjoy such a niche system. It limits career mobility. 路越走越窄 (the road gets narrower and narrower).

This is one of the major job dis-satisfactions in my view. Other dis-satisfactions:

* long hours — I think he is not forced to work long hours. He decides to put in extra hours, perhaps to get promotion.

* stress — CSDoctor is motivated by the stress. I would not feel such high motivation, if put in his shoes.

* commute — I believe he has 1H+ commute. I feel he has very few choices in terms of jobs closer to home, because his domain is extremely niche.

STL container=#1 common resource owner #RAII

Stroustrup said every resource (usually a heapy thingy) needs to have an owner, who will eventually return the resource.

By his definition, every resource has an acquire/release protocol. These resources include locks, DB connections and file handles. The owner is the Object responsible for the release.

  • The most common resource owner is the STL container. When a std::vector or std::unordered_multimap … is destructed, it releases/returns the resource to the heap memory manager.
  • The best-known resource-owner is the family of smart pointers.
  • You can also combine them as a container of smart pointers.
  • All of these resource owners rely on RAII

central data-store updater in write-heavy system

I don’t know how often we encounter this stringent requirement —

Soccer world cup final, or big news about Amazon … millions of users post comments on a web page, and all comments need to be persisted and shown on some screen.

Rahul and I discussed a simple design. At the center is a single central data store.

  • In this logical view, all the comments are available at one place to support queries by region, rating, keyword etc.
  • In the physical implementation, could use multiple files or shared-memory or distributed cache.

Since the comments come in a burst, this data store becomes the bottleneck. Rahul said there are two unrelated responsibilities on the data store updaters. (A cluster of updaters might be possible.)

  1. immediately broadcast each comment to multiple front-end read-servers
  2. send an async request to some other machine that can store the data records. Alternatively, wait to collect enough records and write to the data store in a batch

Each read-server has a huge cache holding all the comments. The server receives the broadcast and updates its cache, and uses this cache to service client requests.

val@pattern_recognition imt reusable_technique

Reusable coding techniques include my classic generators, DP, augmented trees, array-element swap, and bigO tricks like hash tables and median-of-3 pivot partitioning.

  • One of the best examples of reusable technique — generate “paths/splits” without duplicates. Usually tough.

More useful than “reusable techniques” are pattern-recognition insight into the structure and constraints in a given coding problem. Without these insights, the problem is /impenetrable/intractable/.

Often we need a worst input to highlight the hidden constraint. The worst input can sometimes quickly eliminate many wrong pathways through the forest and therefore help us see the right pathway.

However, in some context, a bigO trick could wipe out the competition, so much so that pattern recognition, however smart, doesn’t matter.

 

##high-level factors for dev-till-70

  • Jiang Ling said even at an old age, “easy to find paid, meaningful coding projects, perhaps at home”. Xia Rong agreed. I kinda prefer an office environment, but if the work location is far away, at least there exists an option to take it on remotely.
  • see j4 dev-till-70
  • — #1 factor is brain health. See my blog tag. I believe I’m in control of my destiny.
  • — #2 factor is demand
  • Wall St contract market is age-friendly — [19] y WallSt_contract=my best Arena #Grandpa
  • c++ is good skillset for older guys. Core java is also good.
  • churn
  • — #3 factor is my competitiveness among candidates. See Contract: unattractive to young developers

##strongER trec taking up large codebases

I have grown from a sysAdmin into a dev.
I have grown from a web/scripting pro into a java pro, then a c++ pro!
Next, I hope to grow my competence with large codebases.

I feel large codebase is the #1 differentiator separating the wheat from the chaff — “casual” vs hardcore coders.

With a large codebase, I tend to focus on the parts I don’t understand, regardless of whether that’s 20% or 80% of the code I need to read. I can learn to live with that ambiguity. I guess Rahul was good at that.

In a few cases, within 3M I was able to “take up” a sizable brown field codebase and become somewhat productive. As I told Kyle, I don’t need bottom-up in-depth knowledge to be competent.

In terms of count of successes taking up sizable brown-field codebase, I am a seasoned contractor, so I would score more points than a typical VP or an average old timer in a given system.

  • eg: Guardian — don’t belittle the challenge and complexity
  • eg: mvea c++ codebase — My changes were localized, but the ets ecosystem codebase was huge
  • eg: mvea pspc apportionment/allocation logic + price matching logic
  • eg: mtg comm (small green field), integrated into a huge brown field codebase
  • eg: StirtRisk personalization — I made it work and manager was impressed
  • eg: StirtRisk bug fixes on GUI
  • eg: Macq build system — many GTD challenges
  • eg: Moodles at NIE
  • eg: Panwest website contract job at end of 2016
  • ~~~ the obvious success stories
  • eg: AICE — huge stored proc + perl
  • eg: Quest app owner
  • eg: error memos
  • eg: RTS

Contract: unattractive to young developers

GregR convinced me that most young developers aren’t interested in Wall St contract jobs. I can’t remember the reasons, but something like …

In my experience the contractors I see are mostly above 40 or at least 35+. The younger guys (Nikhil..) tend to be part of a contract agency like Infosys.

The upshot — I have fewer strong competitors. Most of the older guys are not so competitive in either IV or GTD on the job. In particular, their figure-things-out speed is slower.

toy^surgeon #GS-HK@lockfree

H:= the interviewer, an OMS guy in GS-Hongkong, bent on breaking the candidate.

Q3: any real application of CAS in your project?

Q3b: why did you choose CAS instead of traditional lock-based? Justify your decision.
%%A: very simple usage scenario .. therefore, we were very sure it was correct and not slower, probably faster [1]. In fact, it was based on published solutions, customized slightly and reviewed within my team.
%%A: In this context, I would make my decisions. Actually my manager liked this kinda new ideas. Users didn’t need to know this level of details.

[1] uncontended lock acquisition is still slower than CAS

Q3d: how much faster than lock-based?
%%A: I didn’t benchmark. I know it wasn’t slower.

H: but any change is risky
%%A: no. There was no existing codebase to change.

H: but writing new code is a change
%%A: in that case, lock-based solution is also risky.

H: “Not slower” is not a good answer.

%%A: I don’t think I can convince you, but let me try one last time. Suppose my son wants to try a new toy. We know it doesn’t cost more than a familiar toy. We aren’t sure if he would actually keep it, but there’s no harm trying it out.

I said this as a last-ditch effort, since I had all but lost the chance to convince him. So I took a risky sharp turn.

H: but this is production not a toy !

So he was on the hook! What I should have said:

A: No we didn’t roll it out to production without testing. Look at what google lab does.

A: well, my wife went through laser surgery. The surgeon was very experienced and tried a relatively new technique. She was not a guinea pig. Actually the procedure is new but shown to be no worse than the traditional techniques. Basically, we don’t need to do lots of benchmarks to demonstrate a new technique is worth trying. For simple, safe, well-known-yet-new techniques, it’s not always a bad idea to try it on a small sample. Demanding extensive analysis and benchmark is a way to slow down incremental innovations.

A: the CAS technique has been well-researched for decades and tried in many systems. I also used it before.

find any black-corner subMatrix #52%

https://www.geeksforgeeks.org/find-rectangle-binary-matrix-corners-1/

Q: given a black/white matrix, find any rectangle whose four corners are all black.
Q2: list all of them
Q3 (google): find the largest

— idea 2: record all black cell locations and look out for 3-corner groups

a Rec class with {northwest corner, northeast corner, southwest corner}

First pass: for each pair on the same row, create a pair object and save it in a hashmap {northwest cell -> list of x-coordinates to its right}. We will iterate the list later on.

2nd pass: scan each column. For each pair on the same col, say cells A and C, use A to look up the list of northeast corners in the hashmap. Each northeast corner B gives us a 3-corner group. For every 3-corner group, check if the fourth corner exists.

— idea 1: row by row scan.
R rows and C columns. Assuming C < R i.e. slender matrix

For the first row, record each ascending pair [A.x, B.x] (no need to save the y coordinate) in a big hashset. If S black cells on this row, then O(SS).

In the next row, for each new pair, probe the hashset. If hit, then we have a rectangle; otherwise add the pair to the hashset. If T black cells on this row, then O(TT) probes and (if no lucky break) O(TT) inserts.

Note one single hashset is enough. Any pair matching an earlier pair identifies a rectangle. The matching would never occur on the same row 🙂 Optionally, We can associate a y-coordinate to each record, to enable easy calculation of area.

After all rows are processed, if no rectangle, we have O(SS+TT+…). Worst case the hashset can hold C(C-1)/2 pairs, so we are bound by O(CC). We also need to visit every cell, in O(CR)

If C > R, then we should process column-by-column, rather than row-by-row

Therefore, we are bound by O( min(CC,RR) + CR). Now min(CC,RR) < CR, so we are bound by O(CR) .. the theoretical limit.
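A minimal Java sketch of idea 1 (hashing column-pairs row by row), assuming the matrix comes as boolean[][]:

import java.util.*;

public class BlackCornerRect {
  // returns {topRow, bottomRow, leftCol, rightCol}, or null if none
  static int[] findRect(boolean[][] m) {
    Map<Long, Integer> seen = new HashMap<>(); // column-pair -> first row seen
    for (int r = 0; r < m.length; r++) {
      List<Integer> blacks = new ArrayList<>();
      for (int c = 0; c < m[r].length; c++)
        if (m[r][c]) blacks.add(c);
      for (int i = 0; i < blacks.size(); i++)
        for (int j = i + 1; j < blacks.size(); j++) {
          long key = ((long) blacks.get(i) << 32) | blacks.get(j);
          Integer prevRow = seen.putIfAbsent(key, r);
          if (prevRow != null) // same column-pair seen on an earlier row
            return new int[]{prevRow, r, blacks.get(i), blacks.get(j)};
        }
    }
    return null;
  }
}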

— idea 4: for each diagonal pair found, probe the other corners
If there are H black cells, we have up to HH pairs to check 😦 In each check, the success rate is too low.

— Idea 5: brute force — typewriter-scan. For each black cell, treat it as a top-left and look rightward for a top-right. If found, then scan down for a pair of bottom-left/bottom-right. If none found, finish scanning the given row and discard it.

For a space/time trade-off,

  • once I find a black cell on current row, I put it in a “horizontal” vector.

standard SQL to support pagination #Indeed

Context — In the Indeed SDI, each company page shows the latest “reviews” or “articles” submitted by members. When you scroll down (or hit Next10 button) you will see the next most recent articles.

Q: What standard SQL query can support pagination? Suppose each record is an “article” and page size is 10 articles.

I will assume each article has an auto-increment id (easier than a unique timestamp) maintained in the database table. This id enables the “seek” method. The first page (usually the latest 10 articles) is sent to the browser. Then the “fetch-next” command from the browser contains the last id fetched. When this command hits the server, should we (AA) return the next 10 articles after that id, or (BB) check the latest articles again and skip the first 10? I prefer AA. BB can wait until the user refreshes the web page.

The SQL-2008 industry standard supports both (XX) the top-N feature and (YY) the offset feature, but for several reasons [1], only XX is recommended:

select * from Articles where id < lastFetchedId order by id desc fetch first 10 rows only

[1] http://www.use-the-index-luke.com clearly explains that the “seek” method is superior to the “offset” method. The BB scenario above is one confusing scenario affecting the offset method. Performance is also problematic when offset value is high. Fetching the 900,000th page is roughly 900,000 times slower than the first page.

WallSt=kind to older guys like me@@

I would say Wall St is Open to older techies.

Q: I sometimes feel WallSt hiring managers are kind to older techies like me, but really?
A: I feel WallSt hiring managers are generally a greedy species but there are some undercurrents :

  • 🙂 traditionally, they have been open to older techies who are somewhat less ambitious, less driven to move up, less energetic, less willing to sacrifice personally
  • 🙂 U.S.hir`mgr may avoid bright young candd #Alan
  • 🙂 some older guys do perform well above expectation
  • 😦 if an older guy needs to be cut, many hiring managers won’t hesitate.

Overall, I think Wall St hiring managers are open to older guys but not sympathetic or merciful. They are profit-driven, not compassionate. The fact that I am so welcome on Wall St is mostly due to my java/c++ QQ, not anyone’s kindness.

I thank God. I don’t need to thank Wall St.

j4 dev-till-70 [def]

  • j4: I’m good at coding. We want to work with what we are good at. I have worked 20 years so by now I kinda know what I’m good at.
  • j4: given my mental power, use it or lose it.
  • j4: real world problem-solving, not hypothetical problems, not personal problems.
  • j4: responsibility/ownership — over some module
    • teaching also works
    • volunteering also works
  • j4: interaction with young people
    • teaching also works
  • j4: respect — from coworkers and users. I want to be a shining example of an old but respectable hands-on techie but not an expert.
    • teaching also works
  • j4: service — provide a useful service to other people. Who are these other people? LG2
    • teaching also works
  • j4: meaningful work? vague
  • j4: be relevant to the new economy? vague

advantage@old_techie: less to lose]race !!江郎才尽

Imagine on a new job you struggle to clear the bar

  • in figure-things-out speed,
  • in ramp-up speed,
  • in independent code-reading…
  • in delivery speed
  • in absorbency,
  • in level of focus
  • in memory capacity — asking the same questions over and over.
  • in dealing with ambiguity and lack of details
  • in dealing with frequent changes

As a 30-something, you would feel terrified, broken, downcast, desperate…, since you are supposed to be at your prime in terms of capacity growth. You would worry about passing your peak way too early and facing a prolonged decline … 江郎才尽 (one’s talent running dry).

In contrast, an older techie like me starts the race in a new team, against a lower level of expectation [1] and have nothing (or less) to prove, so I can compete with less emotional baggage.

A related advantage — older techies have gone through a lot, so we have some wisdom. The other side of the coin — we could fall into the trap of 刻舟求剑 (marking the boat to find the lost sword, i.e. clinging to outdated reference points).

Managers and coworkers naturally have a lower expectation of older techies. Grandma’s wisdom — she always reminds me that I don’t have to benchmark myself against younger team members. In some teams, I can follow her advice.

Any example? Bill Pinsky, Paul and CSY of RTS?

[1] WallSt contract market

opaque c++trouble-shooting: bustFE streamIn

This is a good illustration of fairly common opaque c++ problems, the most dreadful/terrifying species of developer nightmares.

The error seems to be somewhat consistent but not quite.

Reproducing it in the dev environment was a first milestone. Adding debug prints proved helpful in this case, but sometimes it would take too long.

In the end, I needed a good hypothesis, before we could set out to verify it.

     81     bool SwapBustAction::streamInImpl(ETSFlowArchive& ar)
     82     { // non-virtual
     83       if (exchSliceId.empty())
     84       {
     85         ar >> exchSliceId;
     86       }
    104     }
    105     void SwapBustAction::streamOutImpl(ETSFlowArchive& ar) const
    106     { // non-virtual
    107       if (exchSliceId.size())
    108       {
    109         ar << exchSliceId;
    110       }

When we save the flow element to file, we write out the exchSliceId field conditionally as on Line 107, but when we restore the same flow element from file, the function looks for this exchSliceId field unconditionally as on Line 85. When the function can’t find this field in the file, it hits BufferUnderflow and aborts the restore of entire flow chain.

The serialization file uses field delimiters between the exchSliceId field and the next field which could be a map. When the exchSliceId field is missing, and the map is present, the runtime would notice an unusable data item. It throws a runtime exception in the form of assertion errors.

The “unconditional” restore of exchSliceId is the bug. We need to check the exchSliceId field is present in the file, before reading it.

In my testing, I only had a test case where exchSliceId was present. Insufficient testing.

##skillist to keep brain active+healthy

— generally I prefer low-churn (but not lethargic) domains to keep my brain reasonably loaded:
[as] classic data structure+algorithms — anti-aging
[s] C/C++,
SQL,
c++ build tool chain, and trouble-shooting using instrumentation
c++ TMP
[s] memory-mgmt
[s] socket,
[s] mkt data,
[s] bond math,
basic http?
[a=favor accu ]
[s=favor slow-changing domains]
— Avoid churn
  • jxee,
  • c#
  • scripting,
  • GUI
— Avoid white hot domains popular with young bright guys … too competitive, but if you can cope with the competition, then it could keep your brain young.
  • quant
  • cloud? but ask Zhao Bin
  • big data
  • machine learning

anagramIndexOf(): frqTable+slidingWindow

Q: Similar to java indexOf(string pattern, string haystack), determine the earliest index where a permutation of pattern starts.

====analysis

https://github.com/tiger40490/repo1/blob/py1/py/algo_str/anagramIndexOf.py is my tested solution featuring

  • O(H+P) where H and P are the lengths
  • elegant sliding window with frequency

Aha — worst input involves only 2 letters in haystack and pattern. I used to waste time on the “average” input.

On the spot I had no idea, so I told interviewer (pregnant) about brute force solution to generate all P! permutations and look for each one in the haystack

Insight into the structure of the problem — the pattern can be represented by a frequency table, therefore, this string search is easier than regex!

Then I came up with a frequency table constructed for the pattern. Then I explained checking each position in the haystack. Interviewer was fine with O(H*P) so I implemented it fully, with only 5 minutes left. Basically my implementation was presumably slower than other candidates’.

A few hours later, I realized there’s an obvious linear-time sliding window solution, but it would have taken more than the available time to implement. Interviewer didn’t hint at all there was a linear time solution.
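For the record, here is a minimal C++ sketch of that linear-time sliding window (my tested Python version is in the repo above). It assumes lowercase a-z input; the mismatch-counter bookkeeping and names are mine.

// a minimal sketch, assuming lowercase a-z; mismatches counts letters where have[] != need[]
#include <string>
#include <array>
int anagramIndexOf(const std::string& pattern, const std::string& haystack){
  const int P = pattern.size(), H = haystack.size();
  if(P > H) return -1;
  std::array<int,26> need{}, have{};
  for(char c : pattern) ++need[c-'a'];
  int mismatches = 0;
  for(int i = 0; i < 26; ++i) if(need[i]) ++mismatches;
  auto adjust = [&](int letter, int delta){
    if(have[letter] == need[letter]) ++mismatches; // leaving a matched state
    have[letter] += delta;
    if(have[letter] == need[letter]) --mismatches; // entering a matched state
  };
  for(int i = 0; i < H; ++i){
    adjust(haystack[i]-'a', +1);              // window grows at the right edge
    if(i >= P) adjust(haystack[i-P]-'a', -1); // window slides: drop the leftmost
    if(i >= P-1 && mismatches == 0) return i-P+1;
  }
  return -1;
}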

— difficulty and intensity

I remember feeling extremely intense and extreme pressure after the interview, because of the initial panic. So on the spot I did my best. I improved over the brute force solution after calming down.

Many string search problems require DP or recursion-in-loop, and would be too challenging for me. This problem was not obvious and not easy for me, so I did my best.

I didn’t know how to deep clone a dict. The Indeed.com interviewers would probably consider that a serious weakness.

dev-till-70: 7 external inputs

Context — professional (high or low end) programmer career till my 70’s. The #1 derailer is not physical health [3] but my eventual decline of “brain power” including …?

[3] CSY and Jenny Lu don’t seem to agree.

This discussion is kinda vague, and my own thoughts are likely limited in scope, not systematic. Therefore, external inputs are extremely useful. I posed the same questions to multiple friends.

Q2: what can I do now given my dev-till-70 plan defined above.
Q1: how do I keep my brain healthy, and avoid harmful stress?

— Josh felt that the harmful stress in his job was worse in his junior years when he didn’t know the “big picture”. Now he feels much better because he knows the full context. I said “You are confident you can hold your end of the log. Earlier you didn’t know if you were good enough.”

— Grandpa gave the Marx example — in between intense research and writing, Marx would solve math problems to relax the brain. I said “I switch between algo problem solving and QQ knowledge”

— Alex V of MS — Ask yourself
Q: compare to the young grads, what job function, what problems can you handle better? My mental picture of myself competing against the young guys is biased against my (valuable) battlefield experience. Such experience is discounted to almost $zero in that mental picture!

When I told Alex my plan to earn a living as a programmer till 70, Alex felt I definitely need a technical specialization. Without it, I would have very little hope competing with people 40 years younger. I said I intend to remain a generalist. Alex gave some examples of skills younger people may not have the opportunity to learn.

  • low-latency c++
  • c++ memory mgmt
  • specific product knowledge
  • — I said
  • sockets
  • .. I have a few skillist blogposts related to this

— Sudhir
Mental gymnastics is good, like board games and coding practice and Marx’s math practice, but all of these are secondary to (hold your breath) … physical workout, including aerobic and strength training!

Grandpa said repeatedly the #1 key factor is physical health, though he didn’t say physical health affects brain capacity.

I told Sudhir that I personally enjoy outdoor exercise more than anything else. This is a blessing.

Also important is sleep. I think CSDoctor and grandpa are affected.

Sudhir hinted that lack of time affects sleep, workout and personal learning.

  • Me: I see physical exercise and sleep as fundamental “protections” of my brain. You also pointed out that when we reach home we often feel exhausted. I wonder if a shorter commute would help create more time for sleep/workout and self-study. If yes, then is commute a brain-health factor?
  • Sudhir: Absolutely, shorter commutes are always better, even if that means we can only afford smaller accommodation. Or look for a position that allows working remotely more frequently.

Sudhir also felt (due to current negative experience) an encouraging team environment is crucial to brain health. He said mental stress is necessary, but fear is harmful. I responded  “Startup might be better”.

–Jenny Lu felt by far the most important factor is consistent physical exercise to maintain vitality. She felt this is more important than mental exercise.

I said it is hard to maintain consistency. She replied that it is doable and necessary.

–Junli… felt mental exercise and physical exercise are both important.

When I asked him what I can do to support dev-till-70, he identified several demand-side factors —

  • He mentioned 3 mega-trends — cloud; container; micro-service.
    • Serverless is a cloud feature.
  • He singled out Spring framework as a technology “relevant till our retirement time”

— CSY pointed out the risk of bone injury.

He said a major bone injury in old age can lead to immobility and the start of a series of declines in many body parts.

— XR’s demand-oriented answer is simple– keep interviewing. He felt this is the single most effective thing I can do for dev-till-70.

freelist in pre-allocated object pool #Deepak

Deepak CM described a pre-allocated free-list used in his telecom system.

https://github.com/tiger40490/repo1/blob/cpp1/cpp/lang_66mem/fixedSizedFreeList.cpp is my self-tested implementation. Am proud of the low-level details that I had to nail down one by one.

He said his system initialization could make 400,000 new() calls to allocate 400,000 dummy objects and put them into a linked list. My design /emulates/ it with a single malloc() call. This is at startup time.

During the day, each new msg will overwrite [1] a linked node retrieved at the Head of the slist.

[1] using operator=(). Placement-new would be needed if we use a single malloc()

Every release() will link-in the node at the Tail of the slist. Can we link it in at the Head? I think so. Benefit — it would leave a large chunk of contiguous free space near the tail. Less fragmentation.

Illustration — Initially the physical addresses in the slist are likely consecutive like addr1 -> addr2 -> addr3…. After some release() calls, it would look like a random sequence.

Using return-to-Head, I would get

  • pop, pop, pop: 4->5->..
  • rel 2: 2->4->5..
  • pop: 4->5->….
  • rel 1: 1->4->5…

— The API usage:

  • void ffree(void *)
  • void * fmalloc(size_t), possibly used by placement new
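A minimal C++ sketch of such a pool, following the API above: one malloc at startup, pop at the Head in fmalloc(), return-to-Head in ffree(). It assumes slotSize >= sizeof(void*) and glosses over alignment.

// minimal sketch: one malloc at startup; assumes slotSize >= sizeof(void*), alignment glossed over
#include <cstdlib>
#include <cstddef>
struct FreeNode { FreeNode* next; };
class Pool {
  char* arena;
  FreeNode* head;                     // fmalloc() pops at the Head
public:
  Pool(size_t slotSize, size_t slotCount){
    arena = static_cast<char*>(std::malloc(slotSize * slotCount));
    head = nullptr;
    for(size_t i = slotCount; i-- > 0; ){ // thread every slot into an slist
      FreeNode* n = reinterpret_cast<FreeNode*>(arena + i*slotSize);
      n->next = head;
      head = n;
    }
  }
  void* fmalloc(size_t){              // caller may placement-new into the slot
    if(!head) return nullptr;
    FreeNode* n = head;
    head = n->next;
    return n;
  }
  void ffree(void* p){                // return-to-Head variant discussed above
    auto n = static_cast<FreeNode*>(p);
    n->next = head;
    head = n;
  }
};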

TreeNode lock/unlocking #Rahul

Q: given a static binary tree, provide O(H) implementation of lock/unlocking subject to one rule — Before committing locking/unlocking on any node AA, we must check that all of AA’s subtree nodes are currently in the “free” state, i.e. already unlocked. Failing the check, we must abort the operation.

H:= tree height

====analysis

— my O(H) solution, applicable on any k-ary tree.

Initial tree must be fully unlocked. If not, we need pre-processing.

Each node will hold a private hashset of “locked descendants”. Every time after I lock a node AA, I will add AA into the hashset of AA’s parent, AA’s grandparent, AA’s great-grandparent etc. Every time after I unlock AA, I will do the reverse, i.e. removal.

I said “AFTER locking/unlocking” because there’s a validation routine

bool canChange(Node* AA){ return AA->lockedDesc.empty(); } // to be used before locking/unlocking

Note this design requires uplink. If no uplink available, then we can run a pre-processing routine to populate an uplink lookup hash table { node -> parent }

— simplified solution: Instead of a hashset, we may make do with a count, but the hashset provides useful info.
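A C++ sketch of this design, assuming an uplink per node; the member name lockedDesc is mine.

// sketch assuming an uplink per node; lockedDesc is my name for the per-node hashset
#include <unordered_set>
struct Node {
  Node* parent = nullptr;
  bool locked = false;
  std::unordered_set<Node*> lockedDesc;
};
bool canChange(Node* n){ return n->lockedDesc.empty(); }
bool lockNode(Node* n){
  if(n->locked || !canChange(n)) return false;
  n->locked = true;
  for(Node* p = n->parent; p; p = p->parent) // O(H) walk up the ancestors
    p->lockedDesc.insert(n);
  return true;
}
bool unlockNode(Node* n){
  if(!n->locked || !canChange(n)) return false;
  n->locked = false;
  for(Node* p = n->parent; p; p = p->parent)
    p->lockedDesc.erase(n);
  return true;
}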

 

longest run@same char,allow`K replacements #70%

https://leetcode.com/problems/longest-repeating-character-replacement/

Q: Given a string s that consists of only uppercase English letters, you can perform at most k operations on that string. In one operation, you can choose any character of the string and change it to any other character. Find the length of the longest sub-string containing all repeating letters you can get after performing the above operations.

Here’s my description of the same problem:

Q: Suppose we stand by a highway and watch cars of each color. Only 26 possible colors. Cars pass fast, so sometimes we miscount.

My son says “I saw 11 red cars in a row in the fast lane”.
My daughter says “I saw 22 blue cars in a row in the middle lane”
We allow kids to miss up to 3 cars in their answer. In other words, my son may have seen only 8, 9 or 10 red cars in a row.

When we review the traffic video footage of N cars in a single lane, determine the max X cars in a row of the same color, allowing k mistakes. K < N.
====analysis
Suppose k is 3

— solution 1: O(N) use 2 variables to maintain topFrq and w i.e. winSize

Within a sliding window of size w, maintain a frq table. initialize w to a good conservative value of 4 (i.e. k+1).

If we notice topFrq is high enough, i.e. w-k <= topFrq, then we are lucky and can be less conservative — expand the current window backward (possibly safer than fwd).

After expansion, immediately try further expansion. IFF impossible, i.e. w – topFrq > k, then slide the window.

If the correct answer is 11, i.e. there’s an 11-substring containing 8 reds, I feel my sliding window will not miss it.
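A sketch of the standard grow-or-slide window, consistent with solution 1; it assumes uppercase letters. Note topFrq is a historical max, never decremented, so the window never shrinks — it only slides once its current size is proven feasible.

// grow-or-slide sketch, assuming uppercase letters; topFrq is a historical max
#include <string>
#include <array>
#include <algorithm>
int longestRun(const std::string& s, int k){
  std::array<int,26> frq{};
  int topFrq = 0, left = 0;
  for(int right = 0; right < (int)s.size(); ++right){
    topFrq = std::max(topFrq, ++frq[s[right]-'A']);
    if(right - left + 1 - topFrq > k) // w - topFrq > k: k replacements can't fix this window
      --frq[s[left++]-'A'];           // slide (never shrink)
  }
  return (int)s.size() - left;        // final window size == best feasible w
}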

key^payload: realistic treeNode #hashset

I think only SEARCH trees have keys. “Search” means key search. In a search tree (and hashset), each node carries a (usually unique) key + 0 or more data fields as payload.

Insight — Conceptually the key is not part of the payload. Payload implies “additional information”. In contrast, key is part of the search tree (and hashset) infrastructure, similar to auto-increment record IDs in database. In many systems, the key also has natural meaning, unlike auto-incremented IDs.

Insight — If a tree has no payload field, then useful info can only exist in the key. This is fairly common.

For a treemap or hashmap, the key/value pair can be seen as “key+payload”. My re_hash_table_LinearProbing.py implementation saves a key/value pair in each bucket.

A useful graph (including tree, linked list, but not hashset) can have payloads but no search keys. The graph nodes can hold pointers to payload objects on heap [1], or the graph nodes can hold Boolean values. Graph construction can be quite flexible and arbitrary. So how do you select a graph node? Just follow some iteration/navigation rules.

Aha — similar situation: linked list has well-defined iterator but no search key . Ditto vector.

[1] With pre-allocation, a static array substitutes for the heap.

Insight — BST, priorityQ, and hash tables need search key in each item.

I think non-search-TREES are rarely useful. They mostly show up in contrived interview questions. You can run BFT, pre|post-order DFT, but not BF-Search / DF-Search, since there’s nothing to “search”.

Insight — You may say search by payload, but some trees have Boolean payloads.

std::move(): robbed object still usable !

Conventional wisdom says after std::move(obj2), this object is robbed and invalid for any operation…. Well, not quite!

https://en.cppreference.com/w/cpp/utility/move specifies exactly what operations are invalid. To my surprise, a few common operations are still valid, such as clear() and operator=().

The way I read it — if (but not iFF) an operation wipes out the object content regardless of current content, then this operation is valid on a robbed/hollowed object like our obj2.

Crucially, any robbed object should never hold a pointer to any “resource”, since that resource is now used by the robber object. Most movable data types hold such pointers. The classic implementation (and the only implementation I know) is by pointer reseat.
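A small demo of that reading — clear() and operator=() on a moved-from object are fine, exactly because they wipe out the content regardless of current content:

#include <string>
#include <vector>
#include <iostream>
int main(){
  std::string s1 = "payload";
  std::string s2 = std::move(s1); // s1 is now in a valid-but-unspecified state
  s1.clear();                     // OK: no precondition on current content
  s1 = "new content";             // OK: operator= overwrites whatever is there
  std::cout << s1 << '\n';
  std::vector<int> v1{1,2,3};
  std::vector<int> v2 = std::move(v1);
  v1.assign({4,5});               // OK for the same reason
  std::cout << v1.size() << ' ' << v2.size() << '\n';
}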

how many ways to decode #60%

Q(Leetcode 91): A message containing letters from A-Z is being encoded to numbers using the following mapping:

‘A’ -> 1, ‘B’ -> 2, … ‘Z’ -> 26
Given a non-empty string containing only digits, determine the total number of ways to decode it.

====analysis

I think this is similar to the punctuation problem.

–my botup solution

At each position in the string, keep a “score” number that represents “how many ways to decode the left-substring ending here”.

Useful — define my convenient jargon: we will say the encodings 10 to 26 are “high letters”, and the encodings 1 to 9 are “low letters”. If there are 95 ways to decode a string, I will call them 95 “formulas”.

At Position 33, I will look at score[31] (say equal to 95) and score[32] (say, equal to 97). If the two-char substring str[32:34] is between 10 and 26, then score[33] should include the 95 ways to decode str[:32]. Those 95 “formulas” can each grow one high letter.

If str[33] is not ‘0’, then score[33] should also include the 97 ways to decode str[:33], because those 97 “formulas” can grow one low letter.

Aha — The 95 and the 97 formulas are all distinct because of the ending letter

I think we only need two variables to hold the previous two scores, but it’s easier to code with a score[] array.
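A bottom-up C++ sketch of this recurrence using the two rolling variables; the early return on an undecodable ‘0’ is my addition.

// bottom-up sketch: prev/curr are the two rolling scores
#include <string>
int numDecodings(const std::string& s){
  if(s.empty() || s[0] == '0') return 0;
  int prev = 1, curr = 1;    // scores for prefixes ending two / one positions back
  for(size_t i = 1; i < s.size(); ++i){
    int next = 0;
    if(s[i] != '0') next += curr;            // each formula grows one low letter (1-9)
    int two = (s[i-1]-'0')*10 + (s[i]-'0');
    if(two >= 10 && two <= 26) next += prev; // each formula grows one high letter (10-26)
    if(next == 0) return 0;                  // e.g. "30" cannot be decoded
    prev = curr; curr = next;
  }
  return curr;
}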

y C++will live on #in infrastructure

I feel c++ will continue to dominate the “infrastructure” domains but application developer jobs will continue to shift towards modern languages.

Stroustrup was confident that the lines of source code out there basically ensure that the c++ compiler will still be needed 20 years out. I asked him “What competitors do you see in 20 years?”. He estimated there are billions of lines of c++ source code out there.

I said C would surely survive and he dismissed it. Apparently, many of the hot new domains rely on c++. My examples below all fall under the “infrastructure” category.

  • mobile OS
  • new languages’ runtimes such as the dotnet CLR, JVM
  • google cloud
  • TensorFlow
  • AlphaGo
  • Jupyter for data science
  • Most deep learning base libraries are written in c++, probably for efficiency

hash table expansion: implementation note

(I don’t use the word “rehash” as it has another meaning in java hashmap. See separate blogpost.)

Note this blogpost applies to separate chaining as well as linear probing.

As illustrated in my re_hash_table_LinearProbing.py, the sequence of actions in an expansion is tricky. Here is what worked:

  1. compute bucket id
  2. insert
  3. check new size. If exceeding load factor, then
    1. create new bucket array
    2. insert all entries, including the last one
    3. reseat the pointer at the new bucket array

If you try to reduce the double-insertion of the last entry, you would try moving Step 2 to later. This is tricky and likely buggy.

Say the computed bucket id was 25 based on the old “capacity”, and then the expansion runs before the insert. If we now insert at (or around) Bucket25, we have a bug — when we later look up this key, we would use the current capacity and get a bucket id of 9, so we won’t find this key.
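A separate-chaining sketch of the working sequence (class and names are mine) — note the expansion check runs after the insert, and the re-insert loop naturally covers the last entry:

// separate-chaining sketch; load factor 3/4
#include <vector>
#include <list>
#include <functional>
#include <cstddef>
class HashSetInt {
  std::vector<std::list<int>> buckets;
  size_t count = 0;
  static size_t slot(int key, size_t cap){ return std::hash<int>{}(key) % cap; }
public:
  HashSetInt() : buckets(8) {}
  void insert(int key){
    buckets[slot(key, buckets.size())].push_back(key); // steps 1+2: bucket id, then insert
    if(++count * 4 > buckets.size() * 3){              // step 3: load factor exceeded?
      std::vector<std::list<int>> bigger(buckets.size() * 2);
      for(auto& chain : buckets)
        for(int k : chain)                             // re-insert all entries, incl. the last one
          bigger[slot(k, bigger.size())].push_back(k);
      buckets = std::move(bigger);                     // finally reseat at the new bucket array
    }
  }
};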

zero out rows/columns +! auxDS

Q (Leetcode 73): Given an m x n matrix, if an element is 0, set its entire row and column to 0. Do it in-place in O(1) space. No time complexity requirement.

I will assume all signed ints.

====analysis
I find this problem very well-defined, but the O(1) space requirement is highly contrived. I think it only needs some clever trick, not a really reusable technique.

Reusable technique — for array of integers with stringent space complexity, we can save indices in the array

Aha — I only need to know the full set of rowIDs and columnIDs.

— My O(minimum(m,n)) space solution 1:
zeroRowCnt:=how many rows to be zeroed out
zeroColCnt  :=how many columns to be zeroed out

Compare the two. Suppose zeroRowCnt == 11 is smaller. I will save the 11 rowID’s in a collection. Then scan horizontally to zero out by column. Then use the rowIDs to zero out by row
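A sketch of solution 1, showing only the rows-fewer branch (the symmetric columns-fewer branch is omitted); rowIDs are collected before any zeroing.

// sketch of solution 1, rows-fewer branch only
#include <vector>
void setZeroes(std::vector<std::vector<int>>& a){
  int m = a.size(), n = a[0].size();
  std::vector<int> zeroRows;                  // the O(min(m,n)) collection
  for(int r = 0; r < m; ++r)
    for(int c = 0; c < n; ++c)
      if(a[r][c] == 0){ zeroRows.push_back(r); break; }
  for(int c = 0; c < n; ++c){                 // scan horizontally, zero out by column
    bool dead = false;
    for(int r = 0; r < m; ++r) if(a[r][c] == 0){ dead = true; break; }
    if(dead) for(int r = 0; r < m; ++r) a[r][c] = 0;
  }
  for(int r : zeroRows)                       // then use the saved rowIDs to zero out by row
    for(int c = 0; c < n; ++c) a[r][c] = 0;
}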

–My O(1) space idea 2 — more elaborate than the published solution.

Aha — Can we save the 11 rowID’s in a column to be zeroed out?

Compare zeroRowCnt and zeroColCnt as earlier. Get first rowID among the 11. Suppose it’s Row #3.

Now we know Row#3 has some zeros, so find the first column having a zero. It might be the last column (farthest east). Wherever it is, we pick that column as our “bookkeeper column”.

Visual insight — Suppose bookkeeper is Column #33. Then a[3,33] would be the first zero if we scan entire matrix by-row-and-internally-by-column

We scan row by row again (since we don’t remember those 11 rowIDs), starting after that first rowID. For every rowID found, we will zero out one corresponding cell in bookkeeper column.

Insight — We should end up with exactly 11 zeros in that column. Can’t exceed 11 (only 11 rows having zeros). Can’t fall below 11 (we save all 11 rowIDs)

From now on, freeze that column until further notice. Now zero out each Column to be zeroed out, but leave out our bookkeeper column.

Lastly, follow our bookkeeper column to zero out every “dirty row”.

func overloading: pro^con #c++j..

Overloading is a common design tool in java and other languages, but more controversial in c++. As interviewer, I once asked “justification for using overload as a design tool”. I kinda prefer printInt/printStr/printPtr… rather than overloading on print(). I like explicit type differentiation, similar to [[safe c++]] advice on asEnum()/asString() conversion function.

Beside the key features listed below, the most common j4 is convenience | readability….. a non-technical justification!

— ADL — is a key c++ feature to support smart overload resolution

— TMP — often relies heavily on function overloading

— optional parameter and default arguments — unsupported in java so overloading is the alternative.

— visitor pattern — uses overloading. See https://wordpress.com/post/bintanvictor.wordpress.com/2115

— ctor and operator overloading — no choice. Unable to use differentiated function names

— C language — doesn’t support overloading. In a sense, overloading is non-essential.

Name mangling is a key ABI bridge from c++ to C

##algo Q categories: %%strength/weakness

— relative strengths compared:

  • data-structure heavy
  • union find
  • generator — more experienced than average
  • geometry
  • sliding window
  • whiteboard relative to IDE

— weakness

  1. DP applied to string
  2. DP — as I usually can’t visualize and wrap my mind around it,
    • I have put in more effort than others, but not connecting the dots and developing intuition
  3. recursion in a loop — over my head
  4. 2D grid and matrix
    1. but I have put in more effort than others
  5. num-array. CSY is stronger
  6. priorityQ

 

reliable multicast for replicated-cache update

https://www.usenix.org/conference/usits-99/scalable-web-caching-frequently-updated-objects-using-reliable-multicast is a 1999 research paper. I hope by now multicast has grown more mature more proven. Not sure where this is used, perhaps within certain network boundaries such as a private network of data servers.

This paper examines reliable multicast for invalidation and delivery of popular, frequently updated objects to web cache proxies.

BST: post-order as serialization

Q: if I backscan a post-order output (unsorted sequence), is there only a single BST we can build?

Intuitively, I think this is like fwd scanning a pre-order output so yes.

–insights
If I keep a webcam at each tree node
* During a fwd scan of pre-order output, every node’s left child is born before right child.
* During a backscan of post-order output, every node’s right child is born before left child. During the actual post-order walk, my left child is always printed before my right child.

 

For a given BST, the pre-order output is unique; the post-order output is also unique. However,

Can two BSTs produce the same pre-order output? I think impossible
Can two BSTs produce the same post-order output? I think impossible

Like the pre-order, the post-order sequence is also a serialization but only for a BST.
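A recursive sketch of that back-scan, assuming distinct keys — the mirror image of building from a pre-order output:

// back-scan sketch, assuming distinct keys; right subtree consumed first
#include <vector>
#include <climits>
struct Node{ int val; Node *left = nullptr, *right = nullptr; Node(int v):val(v){} };
Node* build(const std::vector<int>& post, int& idx, int lo, int hi){
  if(idx < 0 || post[idx] < lo || post[idx] > hi) return nullptr; // value belongs to an ancestor
  Node* n = new Node(post[idx--]);
  n->right = build(post, idx, n->val, hi); // right child is "born" first in the back-scan
  n->left  = build(post, idx, lo, n->val);
  return n;
}
Node* bstFromPostorder(const std::vector<int>& post){
  int idx = (int)post.size() - 1;
  return build(post, idx, INT_MIN, INT_MAX);
}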

vector used in hash table bucket

Summary — Same worst case time/space complexity as traditional chaining, but d-cache efficient during lookup, insert and removal.

Background — hash table worst case complexity is due to a bad input (and a weak hash function) leading to heavy collision and a single hot bucket. Load factor will not help as overall hash table size is still small. We end up with linear search in a long linked list. This affects not only lookup and removal. Insert also requires a lookup to prevent duplicate.

— Based on https://en.wikipedia.org/wiki/Hash_table#Separate_chaining_with_other_structures

  • This design makes more efficient use of CPU caching and the translation lookaside buffer(TLB), because slot entries are stored in sequential memory positions.
  • It also dispenses with the next pointers that are required by linked lists, which saves space.
  • Despite frequent array resizing, space overheads incurred by the operating system such as memory fragmentation were found to be small

speed^soft qualities: CIV scoring

If you are very strong, you can achieve both

  • complete the required number of problems in time with no serious bugs
  • demonstrate requirement clarification, edge cases, dry-run, thorough error handling, detailed complexity analysis, coding style.

In reality, some speed coding tests at FB, Indeed etc really favor speed over the other things.

When we take such a test, there is always internal tension between two motivations — speedy completion vs demonstrating other qualities.

Thinking out loud is a common recommendation but it can create unnecessary conversation and slow you down. However, some of these conversations may lead you to the right path.

multicast routing: efficient data duplication

The generic  routing algorithm in multicast has optimization features that offer efficient data duplication.

— optimized message duplication at routing layer (Layer 3?)

https://www.metaswitch.com/knowledge-center/reference/what-is-multicast-ip-routing explains that

Routers duplicate data packets and forward multiple copies wherever the path to recipients diverges. Group membership information is used to calculate optimal branch points i.e. the best routers at which to duplicate the packets to optimize the use of the network.

The multicast distribution tree of receiving hosts holds the route to every recipient that has joined the multicast group, and is optimized so that

  • multicast traffic does not reach networks that do not have any such recipients (unless the network is a transit network on the way to other recipients)
  • duplicate copies of packets are kept to a minimum.

##very few theoretical compSci constructs 4 CIV

“Most comp science constructs are too advanced and too complicated for a 45-minute coding interview. So reading any comp science book is like my son reading science books not written for his exams. What a candidate needs is drills targeted at interviews,” I told friends, based on Leetcode.

A few exceptional interviews (eg: Google) go beyond Leetcode, but still use only simple features of advanced comp science constructs. Here are a few notable comp science constructs to study, mostly advanced data structures:

  • [s] trie, suffix array, suffix tree
  • geometry (is dStruct-heavy domain) —
    • nearest neighbor query;
    • query: which (x,y) points fall within this rectangle?
    • line sweep
  • segment tree
  • topological sort – 2 algos
  • [s] disjoint set
  • relationships among in|pre|post-order binTree walks — these insights are valuable for some Leetcode problems.
  • self-balancing algo in RBTree
  • [s] B-tree
  • [s] shortest path algos like BFT
  • [s=only the simple ones are needed in CIV]
  • —- techniques
  • technique: augmented tree
  • technique: back tracking
  • technique: DP and greedy
  • technique bottom-up DP with memoization

Q: How about coding interview books? Can help. They are not as deep or wide ranging as algorithm books.

Q: how about bigO on common data structures? Yes relevant because of the interviewer’s insistence.

 

minimize average distance to N cells]matrix #Kyle 60%

Q: given N red cells in a white matrix, find the best cell anywhere in the matrix, whose average distance to the N red cells is minimized. You can imagine N homes deciding to build a waterpoint.

Distance is measured in horizontal or vertical steps, i.e. Manhattan distance.

====analysis

Insight — vertical vs horizontal dimensions are independent, so this is really a one-dimension problem.

Center of gravity? Not exactly — for Manhattan distance, the median along each axis, not the mean, minimizes the total distance.

Insight — rate of change should be zero at the trough…
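Based on those insights, a sketch under the assumption that any cell (not just a red one) is allowed: pick the median independently per dimension. The sample red cells are made up; since the row and column sums are separable, the rearrangement done by nth_element is harmless.

// median per dimension; sample red cells are made up
#include <vector>
#include <algorithm>
#include <cstdlib>
#include <cstdio>
int main(){
  std::vector<int> xs = {0, 4, 2}, ys = {0, 6, 3}; // rows and columns of N red cells
  auto median = [](std::vector<int>& v){
    auto mid = v.begin() + v.size()/2;
    std::nth_element(v.begin(), mid, v.end());     // O(N) selection, no full sort
    return *mid;
  };
  int bestRow = median(xs), bestCol = median(ys);
  long total = 0;                                  // separable sums: pairing doesn't matter
  for(size_t i = 0; i < xs.size(); ++i)
    total += std::abs(xs[i]-bestRow) + std::abs(ys[i]-bestCol);
  std::printf("best cell (%d,%d), total distance %ld\n", bestRow, bestCol, total);
}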

5 constructs: c++implicit singletons

#1 most implicit singleton in c++ is the ubiquitous “file-scope variable”. Extremely common in my projects.

  • — The constructs below are less implicit as they all use some explicit keyword to highlight the programmer’s intent
  • keyword “extern” — file-scope variable with extern
    • I seldom need it and don’t feel the need to remember the details.. see other blogposts
  • keyword “static” — file-scope static variables
  • keyword “static” within function body — local static variables — have the nice feature of predictable timing of initialization
  • keyword “static” within a class declaration —  static field

~~~~~~  The above are the 5 implicit singleton constructs ~~~~~~
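A single-file sketch showing all 5 constructs side by side; all names are mine:

// singletons.cpp — the 5 constructs in one sketch
int fileScopeVar = 0;           // #1 implicit singleton: plain file-scope variable
extern int sharedCounter;       // "extern": declaration only; the single definition lives elsewhere
static int fileLocalVar = 0;    // file-scope static: one instance, internal linkage
int nextId(){
  static int seq = 0;           // local static: initialized on first call (predictable timing)
  return ++seq;
}
class Config {
  static Config* instance;      // static field: one per class
};
Config* Config::instance = nullptr; // the single out-of-class definition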

Aha — it’s useful to recognize that when a data type is instantiated many many times i.e. non-singleton usage, it is usually part of a collection, or a local (stack) variable.

Sometimes we have the ambiguous situation where we use one of the constructs above, but we instantiate multiple instances of the class. It’s best to document the purpose like “instance1 for …; instance2 for …”

multicast over public internet

Many people say that in general, multicast over public internet is unsupported. I think many retail websites (including the dominant west coast firms) are technically unable to use multicast.

https://networkengineering.stackexchange.com/questions/47994/is-multicast-on-the-public-internet-possible-and-if-yes-how

As an end-user, You can multicast across the public Internet to another site by using a tunnel that supports multicast.

As a larger organization, like a video provider or an ISP, it is certainly possible to forward multicast packets across their domain boundary (i.e. across an Internet).

To forward those multicast packets to another ISP, you would need a peering agreement with them and use the Multicast Source Discovery Protocol (MSDP), configured on both ends.

While you won’t propagate your multicast across the global Internet, crossing network boundaries with multicast packets is not impossible.

27 Jul coding drill

— presumably solved:

  1. power of 3
  2. Fizz Buzz
  3. first unique char in a string
  4. longest common prefix
  5. grouping anagram
  6. in-place-remove all dupes from sorted array
  7. getByRank() in sorted matrix #51%
  8. longest descending path through matrix #60%
  9. count lower ints to my right #50%
  10. Accounts merge
  11. https://leetcode.com/problems/find-peak-element/
— already-solved problems reviewed
  1. Roman to int
  2. generate all loop-free paths between two nodes
  3. append+maxInRangeK #Rahul
  4. O(1)getRandom+add+del on Bag#Rahul
— pending
  1. intersection of two arrays

homemade clustered sparse array

O(1) lookup and random access are hard to achieve among sparse array implementations. Here my sparse array is a special subset of sparse arrays — the populated sections come in clusters. For this special subset, I hope random access is feasible. My design is a fragment table:

  • My fragment table is a std::deque to hold raw pointers to “fragment arrays” not vectors. I choose the word “fragment” since std::deque has its own internal “segment” that is completely irrelevant.
  • elements 0 to 99 —  live in a 100-array, whose address is saved in fragmentTable[0]
  • elements 100 to 199: live in a 100-array, whose address is saved in fragmentTable[1]
  • elements 200 to 299: live in a 100-array, whose address is saved in fragmentTable[2]
  • elements 500 to 599: all missing, so a special ptr is saved in fragmentTable[5]
  • elements 603 to 605 are present. A special ptr is saved in fragmentTable[6]. This fragment consists of a small-footprint 3-array. This fragment also has a field this->offset=3
  • Note the fragmentTable is a deque of void pointers. Each void pointer can be a regular plain old array or a special pointer.

— As a bonus, this design supports unlimited append. Here’s a limited prepend() procedure

  1. to prepend a value X in front of the current sparseArr[0], we first allocate a new 100-array as a new fragment whose offset=99. This new fragment supports 99 future prepend() operations, but each time offset needs decrementing.
  2. save X as the last element in the new fragment.
  3. record the new fragment’s address in a new fragmentTable[0]. This is possible because the fragmentTable is a std::deque.

Note I can also allocate a small-footprint new fragment, iFF no further prepend() is allowed.

— Cornerstone of my design — all fragments have Identical range=100 inspired by the std::deque and the ragged 2D array. This enables simple and fast random access. The underlying arrays are the same size with few exceptions — empty fragments, and tiny-n-frozen fragments.

If one of the “tiny fragments” is expected to get populated in the future i.e. not frozen, then it should be allocated as a full array with bigger footprint. The wasted space should not be wasted for long.


Even if this design is not flexible and general-purpose, it meets my design goals so I’m proud of it.

Many implementations use linked fragments, where each fragment has a “next_fragment” pointer. I don’t see any benefit.

I don’t want to look at pure-java implementation as java array has many restrictions.
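For the record, a minimal sketch of the fragment-table core — random access only; the tiny-fragment offset optimization and prepend() are left out, and all names are mine.

// fragment-table core; FRAG=100; nullptr marks an all-missing fragment
#include <deque>
constexpr int FRAG = 100;             // cornerstone: identical range per fragment
struct ClusteredSparseArr {
  std::deque<int*> fragTable;
  long base = 0;                      // logical index of fragTable[0][0]
  int* find(long idx){                // O(1) random access
    long off = idx - base;
    if(off < 0 || off/FRAG >= (long)fragTable.size()) return nullptr;
    int* frag = fragTable[off/FRAG];
    return frag ? frag + off % FRAG : nullptr;
  }
};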

count lower ints to my right

https://leetcode.com/problems/count-of-smaller-numbers-after-self/ is labelled “hard”.

Q: given an integer array nums , return a new counts array, wherein counts[i] is the number of smaller elements to the right of nums[i]

====analysis

An order-statistics tree (i.e. an augmented RBTree) should give O(N logN). However, the actual algorithm is not clear to me.

One scan from right. Insert each node into this tree. Before inserting a node of value like 22, we will query the tree getRank(22).

Implementation wise, it’s hard to create a self-balancing BST from scratch. So I think an unbalanced BST might do.

Also, there might be some alternative solutions, like mergesort??
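One concrete alternative, since my order-statistics tree is hand-wavy — a Fenwick tree (BIT) over compressed values, also O(N logN), scanning from the right:

// Fenwick (BIT) alternative; values compressed to ranks first
#include <vector>
#include <algorithm>
std::vector<int> countSmaller(const std::vector<int>& nums){
  std::vector<int> sorted(nums);
  std::sort(sorted.begin(), sorted.end());
  sorted.erase(std::unique(sorted.begin(), sorted.end()), sorted.end());
  std::vector<int> bit(sorted.size()+1, 0), ans(nums.size());
  auto update = [&](int i){ for(; i < (int)bit.size(); i += i & -i) ++bit[i]; };
  auto query  = [&](int i){ int s = 0; for(; i > 0; i -= i & -i) s += bit[i]; return s; };
  for(int k = (int)nums.size()-1; k >= 0; --k){ // one scan from the right
    int r = int(std::lower_bound(sorted.begin(), sorted.end(), nums[k]) - sorted.begin());
    ans[k] = query(r);  // how many strictly-smaller values already seen to my right
    update(r+1);        // mark this value (1-based BIT position)
  }
  return ans;
}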

longest descending path through matrix #60%

https://leetcode.com/problems/longest-increasing-path-in-a-matrix/

Q: given an int-matrix that allows 4-way moves,  find longest path with strictly descending nodes. Lateral move disallowed.

====analysis

I have code to generate all paths… generate loopfree paths: graph node A→B

— Solution 1: O(N logN)
First pass, O(N), to construct a “waterflow” graph — visit N nodes. For each, check the four neighbors. Add an outgoing edge (in a Node.edgeList) to a neighbor node if it is strictly lower.

Now put all nodes into a min-heap, according to node height. Can be part of first pass.

Second pass we visit each item in the min-heap. For each node, compute longest descending path starting therefrom. Record this path length as Node.score. Note if a node is surrounded by higher or equal nodes, then it has score 0, as in the lowest node.

A higher node AA is always upstream to some “computed” nodes. (We don’t care about any incoming edges into AA). So pick the max of (up to four) lower neighbors’ scores. Add 1 to get AA’s score. This computation is O(1) but the heap pop() makes second pass O(N logN)

Note the lowest node may not be part of the longest path, if it is surrounded by the highest nodes.
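A sketch of this two-pass idea, substituting a fully sorted vector for the min-heap (same O(N logN) bound); here score counts nodes, so a local minimum scores 1, and it assumes a non-empty matrix.

// two-pass sketch; sorted vector stands in for the min-heap
#include <vector>
#include <algorithm>
int longestDescending(const std::vector<std::vector<int>>& m){
  int R = m.size(), C = m[0].size();
  std::vector<int> order(R*C), score(R*C, 1); // a local min scores 1
  for(int i = 0; i < R*C; ++i) order[i] = i;
  std::sort(order.begin(), order.end(),
            [&](int a, int b){ return m[a/C][a%C] < m[b/C][b%C]; }); // lowest first
  int best = 1;
  const int dr[] = {1,-1,0,0}, dc[] = {0,0,1,-1};
  for(int id : order){                        // all strictly-lower neighbors already scored
    int r = id/C, c = id%C;
    for(int k = 0; k < 4; ++k){
      int nr = r+dr[k], nc = c+dc[k];
      if(nr < 0 || nr >= R || nc < 0 || nc >= C) continue;
      if(m[nr][nc] < m[r][c])                 // strictly lower: extend that path by 1
        score[id] = std::max(score[id], score[nr*C + nc] + 1);
    }
    best = std::max(best, score[id]);
  }
  return best;
}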

## defaultdict tricks #matrix

https://github.com/tiger40490/repo1/blob/py1/py/algo_tree/commonAncestor_Indeed.py demos a few time-saving tricks with defaultdict

  • mydict[key] # without printing or using the return value … would trigger the dictionary to construct a default value for this key IFF missing
  • from key/value pairs, build dict {key -> valueList}

https://github.com/tiger40490/repo1/blob/py1/py/algo_combo_perm/staircase_CSY.py shows

  • defaultdict(int) as a frequency counter.
    • mydict[key] += 1 #works even when key was previously missing in mydict
  • defaultdict(lambda: ‘_’).
    • mydict[(row,col)] can implement a matrix. Negative (or other invalid) indices result in the default value of string ‘_’. I believe a dict key can be a tuple but not a list.
    • In contrast, the 2D array interprets negative indices as wrap-around !
  • defaultdict(lambda: [[],[]]) creates a list of 2 small lists for any “invalid” key. In contrast defaultdict(list) will create a monolithic list which hits runtime error when you access element of the nested list i.e. dict[someKey][0][anyIndex]

average_case ^ amortized

–> bigO (and big-Theta) is about input SIZE, therefore, usable on average or worst case data quality

There’s no special notation like bigX for average case.

–> Worst vs average case … is about quality of input data.

With hash table, we (novices) are permitted to say “quality of hash function” but this is incorrect.  My book [[algorithms in a nutshell]] P93 states that IFF given a uniform hash function, even in worst case bucket sort runs in linear time O(N).

–> Amortized … is about one expensive step in a long series of steps like inserts. Note Sorting has no amortized analysis AFAIK.

worst case amortized — is a standard analysis
Eg: vector expansion. Worst-case doesn’t bother us.

eg: quicksort+quickSelect can degrade to O(N²). Amortization doesn’t apply.

average case amortized — is a standard analysis
eg: hash table collision. A few operations happen to hit long chain, while most operations hit short chain, so O(1) on average.

Amortization is relevant iFF an insert triggers a hash table expansion.

What if the input is so bad that all hash codes are identical? Then amortization won’t help. For hash tables, people don’t worry about worst case, because a decent, usable hash function is always, always possible.

average case per-operation — can be quite pessimistic
eg: vector insert

worst case per-operation —

RBTree O(1)insert quite common ] coding IV

— for single-insert
https://en.cppreference.com/w/cpp/container/map/insert is precise saying that hinted insert is O(1).
— for mass insert
Special case — if we know the input sequence is pre-sorted and going to be “appended” to existing tree nodes, then each insert will always hit the end(). We can rely on hinted insert to achieve O(N).

http://www.cplusplus.com/reference/map/map/insert/  is even more encouraging — ” If N elements are inserted, Nlog(size+N) in general, but linear in size+N if the elements are already sorted” so as long as pre-sorted, then we get O(N)

For 64-bit integer or float inputs, we can radix sort in O(N) then mass-insert in O(N).

— for construction from a pre-sorted sequence
https://en.cppreference.com/w/cpp/container/map/map confirms it’s O(N) if the input range is already sorted.
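A minimal demo of the hinted mass-insert on a pre-sorted input:

// each emplace_hint(end(), ...) is amortized O(1) because the hint is correct
#include <map>
#include <vector>
int main(){
  std::vector<int> sorted = {10, 20, 30, 40}; // pre-sorted keys
  std::map<int,int> m;
  for(int k : sorted)
    m.emplace_hint(m.end(), k, k*k); // new node goes right before end()
}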

getByRank() in sorted matrix: priorityQ^RBTree

https://leetcode.com/problems/kth-smallest-element-in-a-sorted-matrix/

====analysis

recombinant binTree pyramid, where “water” flows east or south.

  • first level has one node .. lowest value. Add it to pq (i.e. priorityQ)
  • pop the pq and insert the two downstream nodes
  • total K pops, each pop is followed by up to 2 inserts

Heap will grow to up to K items, so each pop will be up to logK

Total O(K logK). To achieve this time complexity, we can also use a RBTree. The tree nodes can come from a pre-allocated array.
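A min-heap sketch of the above; the visited set guards against double-inserting a cell reachable from both parents (the “recombinant” property).

// min-heap sketch for kth smallest in a sorted matrix
#include <queue>
#include <set>
#include <vector>
int kthSmallest(const std::vector<std::vector<int>>& m, int k){
  int n = m.size();
  using Cell = std::pair<int, std::pair<int,int>>; // {value, {row, col}}
  std::priority_queue<Cell, std::vector<Cell>, std::greater<Cell>> pq;
  std::set<std::pair<int,int>> seen{{0,0}};
  pq.push({m[0][0], {0,0}});
  while(true){
    auto [v, rc] = pq.top(); pq.pop();             // K pops, each O(logK)
    if(--k == 0) return v;
    auto [r, c] = rc;
    if(r+1 < n && seen.insert({r+1, c}).second) pq.push({m[r+1][c], {r+1, c}}); // south
    if(c+1 < n && seen.insert({r, c+1}).second) pq.push({m[r][c+1], {r, c+1}}); // east
  }
}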

pre-allocated array as backing store for graph nodes #java No

I think any graph node can use the same technique, but here I present a simple yet interesting use case — a linked list with each node allocated from an array. https://github.com/tiger40490/repo1/blob/cpp1/cpp/lang_66mem/slistFromArray.cpp shows three home-made implementations:

  1. backing array of dummy link nodes, pre-allocated at compile time
  2. backing array of dummy link nodes, pre-allocated from free store aka DMA
  3. backing array is a byte array on heap or data section. Each link node is constructed via placement-new.

Here are a few advantages that I consider minor, because linked list is seldom needed in low-latency systems:

  1. d-cache efficiency
  2. eliminates runtime load on heap allocator, since memory is pre-allocated. See malloc=long considered costly

Advantage #3: For c++ algo questions, this set-up has an interesting advantage — The node address is now an index into the backing array. This index is a natural auto-increment ID , based on creation order.

Now, the biggest advantage of linked list over vector is mid-stream insert/delete. One of the biggest disadvantages is lack of random-access. If nothing happens mid-stream (as in coding questions), then we can achieve random-access-by-id using array as backing store.

If nothing happens mid-stream, then this linked list is physically similar to an array with extra capacity.

This technique won’t work in java because java array of Node is array-of-pointers.

power-of-2 sized hash tables: implications

https://dzone.com/articles/hashmap-internal  and http://coding-geek.com/how-does-a-hashmap-work-in-java/ explain that java HashMap uses bucket count equal to power-of-two. A few implications

  • the modulo operation is a fast bit-masking i.e. select the lowest N bits: hash & (length-1)
  • an unlucky hashCode() may not offer dispersion among the lower bits. Assuming 16 buckets so only the lowest 4 bits matter, but hashCode() returns predominantly multiples of 8. Then two out of 16 buckets would be white hot. To guard against it, HashMap performs a rehash on the hashCode() output.
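A tiny demo that the mask equals the modulo for power-of-2 capacity, plus a spread step similar in spirit to java8’s h ^ (h >>> 16); the sample hash value is made up.

// mask == modulo for power-of-2 capacity; xor-shift folds high bits into low bits
#include <cstdio>
int main(){
  unsigned cap = 16, h = 0x7A2B4018u;  // 16 buckets; a sample hash value
  unsigned spread = h ^ (h >> 16);     // similar in spirit to java8's rehash
  std::printf("%u %u %u\n", h % cap, h & (cap-1), spread & (cap-1));
}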

I implemented the same feature in https://github.com/tiger40490/repo1/blob/py1/py/88miscIVQ/re_hash_table_LinearProbing.py

Note rehash is ineffective if hashCode() returns very few distinct values. In java8, there’s one more protection against it — RBTree as alternative to linked list… See RBTree used inside java8 hashmap

RBTree used inside java8 hashmap

https://en.wikipedia.org/wiki/Hash_table#Separate_chaining_with_other_structures is a language-neutral discussion.

https://digitalmars.com/d/archives//640.html#N803 shows an early implementation

they are implemented as a hashed array of binary trees. 
My experience with them is that they are very efficient.

— JEP 180 is the white paper to introduce a self-balanced tree as an alternative to linked list.

https://dzone.com/articles/hashmap-performance shows that with one million entries in a HashMap, a single lookup took 20 CPU cycles, or less than 10 nanoseconds. Another benchmark test demonstrates O(logN) get(key) in java8, but O(N) in java7, as traditionally known.

If two hashCode() values are different but ended up in the same bucket (due to rehashing and bucketing), one is considered bigger and goes to the right. If hashCodes are identical (as in a contrived hashCode() implementation), HashMap hopes that the keys are Comparable, so that it can establish some order. This is not a requirement of HashMap keys.

If hashCodes are mostly identical (rare) and keys are not comparable, don’t expect any performance improvements in case of heavy hash collisions. Here’s my analysis of this last case:

  • if your Key.equals() is based on address and hashCode() is mostly the same, then the RBTree ordering can and should use address. You won’t be able to look up using a “clone” key.
  • if you have customized Key.hashCode() then you ought to customize equals(), but suppose you don’t implement Comparable, then you are allowed to lookup using a clone key. Since there’s no real ordering among the tree nodes, the only way to look up is running equals() on every node.

http://coding-geek.com/how-does-a-hashmap-work-in-java/#JAVA_8_improvements says

A given bucket contains both Node (linked list) and TreeNode (red-black tree). Oracle decided to use both data structures with the following rules:

  • If for a given index (bucket) in the inner table there are more than 8 nodes, the linked list is transformed into a red black tree
  • If for a given index (bucket) in the inner table there are less than 6 nodes, the tree is transformed into a linked list

With the self-balanced tree replacing the linked list, worst-case lookup, insert and delete are no longer O(N) but O(logN) guaranteed.

This technique, albeit new, is one of the best simple ideas I have ever seen. Why has nobody thought of it earlier?

RBTree: technical notes #Realtime

Red–black trees offer … worst-case guarantees, valuable in real-time applications. The Completely Fair Scheduler used in current Linux kernels and epoll system call implementation[19] uses red–black trees.

This valuable guarantee is valid only on always-balanced trees, but no need to be strictly balanced. In fact, AVL is more rigidly balanced but lacks this guarantee.

In contrast, a hash table doesn’t offer a worst-case guarantee in the face of hash collision. In fact, Java 8 HashMap uses an RBTree in addition to the linked list… see my blogpost RBTree used in java hashmap.

After an insert or delete, restoring the red-black properties requires a small number (O(log n) or amortized O(1)) of color changes (which are very quick in practice) and no more than three tree rotations (two for insertion). Although insert and delete operations are complicated logically, their times remain O(log n).

The AVL tree is another structure supporting O(log n) search, insertion, and removal. AVL trees can be colored red-black, thus are a subset of RB trees. Worst-case height is better than the worst-case height of RB trees, so AVL trees are more rigidly balanced. However, Mehlhorn & Sanders (2008) point out: “AVL trees do not support constant amortized deletion costs”, but red-black trees do.[25]

doubly-linked list as AuxDS for BST

I find it a very natural auxDS. Every time the BST gets an insert/delete, this list can be updated easily.

Q: How about the self-adjustment after an insert or delete?
%%A: I think this list is unaffected

With this list, in-order walk becomes really easy.

https://leetcode.com/problems/kth-smallest-element-in-a-bst/solution/ is the first place I saw this simple technique.

CPU run-queue #java perspective

— Mostly based on [[JavaPerf]] P28

Runtime.availableProcessors() returns the count of virtual processors, or count of hardware threads. This is an important number for CPU tuning, bottleneck analysis.

When the run-queue depth exceeds 4 times the processor count, the host system will become visibly slow. For a host dedicated to java, this is a 2nd indicator of CPU saturation. The first indicator is high CPU usage.

Note run-queue depth is the first column in vmstat output

[19] q[extern] idiosyncracies

Despite my numerous summaries, I could never reach the bottom of this keyword.

http://www.goldsborough.me/c/c++/linker/2016/03/30/19-34-25-internal_and_external_linkage_in_c++/ has good details like

int x;          // is the same as
extern int x{}; // which will both likely cause linker errors if placed in header.

extern int x;   // while this only declares the integer, which is ok.

lowest missing+ve int#Codility #70%

Q: Write a function int solution(int[] A);  that, given an array A of N integers, returns the smallest natural number that does not occur in A. For example, given A = [1, 3, 6, 4, 1, 2], the function should return 5.
Given A = [1, 2, 3], the function should return 4.
Given A = [−1, −3], the function should return 1.

  • each element of array A is a 32-bit signed int
  • expected worst-case time complexity is O(N)
  • expected worst-case space complexity is O(N)
  • elements of input arrays can be modified

https://leetcode.com/problems/first-missing-positive/description/ is similar but O(1) space and average O(N) time!

—— my analysis —–

The mutable and O(1) space hints at — saving array indices as payload !

—- Idea A:

first scan to swap non-positives to the end and remember the new boundary (say #111).

In the same scan also determine min and max. Suppose min=55.

Another scan to reduce all payloads by 55 so they fall in the range of 0 to max-55.

Now use CSY technique .. check array@0-N in-situ #Nsdq#contrived
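Here is my C++ sketch of that in-situ technique as I understand it — park each value v at index v-1 by swapping, then scan for the first gap; O(N) time, O(1) space.

// in-situ sketch: park value v at index v-1 via swaps, then scan for the first gap
#include <vector>
#include <utility>
int firstMissingPositive(std::vector<int>& a){
  int n = a.size();
  for(int i = 0; i < n; ++i)
    while(a[i] > 0 && a[i] <= n && a[a[i]-1] != a[i]) // v has a home and isn't home yet
      std::swap(a[i], a[a[i]-1]);
  for(int i = 0; i < n; ++i)
    if(a[i] != i+1) return i+1;                       // first slot not holding i+1
  return n+1;
}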

—-solutionB: make_heap O(1) space but O(N logN) time. Build a min-heap based on the array in O(1) space, then keep calling min().

  • make_heap shows random access container (vector used in demo), and “rearrange”, implying O(1) space
  • make_heap shows O(N) to construct the heap

min() is O(1) but delete_min() is O(log N), so overall we probably get O(N logN)

—-solutionC: radix sort. https://en.wikipedia.org/wiki/Radix_sort#In-place_MSD_radix_sort_implementations shows an in-place binary radix sort.

First pass to transform all negative numbers to 0. Then iterate the sorted array and check for the earliest “gap”. Worst case — you get 1,2,3… without gap, so answer is the 1+ largest array element.

O(W*N) where W is width of the largest integer. If we assume 64-bit then W is a constant:)

http://www.drdobbs.com/parallel/parallel-in-place-radix-sort-simplified/229000734 is another in-place radix sort.

java array footprint: hypothetical calc

In low-latency or high-volume java apps (like exchange-facing), I think array of primitives is quite popular due to its efficiency advantage. Here I review a few potential footprint issues. Suppose we have an array AA of 12 ints, each 32-bit.

  • AA carries housekeeping data such as the array length (i.e. 12). AA also has the “standard” object header, of twelve (or eight) bytes… See size of Object.java, and how does it add up@@
  • The 12 ints take up 48 bytes, allocated in contiguous memory… lucky. For 12 Trade objects, we would get 12 scattered allocations.. expensive for runtime allocation and cache-unfriendly.
  • So we have 12 + 48 = 60 bytes so far. There might be padding for alignment. 64-bit CPU probably uses 8-byte alignment as shown on https://stackoverflow.com/questions/44468639/memory-alignment-of-java-classes
  • Finally, footprint is 64 bytes, but I have not verified it.

ask`lower base2reduce mgr expectation

This plan didn’t work out at Macq. Expectation was still too high.

The logic is, if my coworkers get total comp 200k and I ask only 160k, then I’m more likely to get some bonus. Even if I underperform them, I would still hit somewhere below 200k.

Now I think if I qualify to stay, then there will be some bonus even if my base is, say, 190k. Hiring managers would not agree to a 200k base and run the risk of paying a doughnut bonus to a qualified employee.

## Y revisit old algo Qn #Kyle

I find it hard to be interested in previously solved problems. I think many people (not only those in a hurry) simply skip such problems. However, we are not so “good at” previously solved problems actually.

  • Reason – there are often insights and repeating patterns in an old problem, that require slow and repeated digestion. Solving it quickly is usually not enough.
  • reason — our solutions may not be optimal.
  • reason — there are very smart solutions out there we don’t know.
  • reason — we forget.
  • reason — with a simple tweak the problem can become a new problem. How well and how fast we can solve the modified problem depends on our insight into the original problem.

Therefore, I feel folks tend to trivialize some well-known problems and fail to learn enough from them.

Well-known doesn’t mean well-understood.
Well-known doesn’t mean simple.

Revisiting old problems is boring to many programmers, but it can often feel easier than tackling new problems. I don’t get tired easily when revisiting old problems. If you are like me, then it could be more beneficial than tackling new problems.

However, my visible progress is much slower in terms of #problems solved per week. I think Kyle also pointed out something related to this.

oldtimers trapped ] pressure-cooker #Davis

See also G3 survival capabilities #health;burn rate

Q: Why are so many old timers unable to get out of a shitty job in a pressure-cooker company?

  • A small number of old timers change jobs, but some of them regret it as the new job turns out worse. Other old timers hear those war-stories and prefer the certainty of known pain in the current job. They don’t feel confident they can find a better job.
  • bench time — Many old timers can’t afford bench time due to low reserve and high burn rate.
    • Old timers tend to be very unused to unstable income.
    • Damien (Macq) relied on his bonus payout..
    • Some had no savings at all.
  • Most old timers won’t even consider a 6M contract.
  • Most old timers will never consider a voluntary pay cut to get a more comfortable job
  • golden handshake — Some old timers (UBS?) can’t let go of the promised compensation package. They would not resign and give up on a 100k promised windfall
  • some old timers worry about job loss so much that they work their ass off to keep the job, even if the workload is unreasonable/unfair. I think in GS I was like them. I have a friend in a similar situation. I think the boss wants to let him go but may feel merciful, so they keep him but give him shitty tasks and demand fast turnaround at good quality…. unreasonable/unfair workload.
  • Some old timers may have a wife who completely opposes job change due to potential impact on family.

Q: why XR and I can afford to quit a job, stay home, and find a lower job?
A: Supportive wife, burn rate, passive income and savings.

##[15]types@Work2slow brain aging #campout

This is an important topic for the researchers. As laymen, I will just reflect on my personal experience — looki.

What kind of activity is considered “work”? I feel it’s the responsibility or commitment, obligation, consequences on users -> fear …

Most but not all the “work” is tiring. I suppose creative work can help keep the brain young.

  • IT — Learning a new tech in body-building mode doesn’t feel tiring to me, but how about to XR?
  • IT — A big chunk of everyday IT work (including troubleshooting) is partly creative and investigative. Can be brain-boosting.
  • Kids doing homework – can be tiring but at their age it won’t speed brain aging. How about adults?
— Avoid stressful, demanding jobs that require burning my candle on both ends.

Need to listen to the inside signals when working the long hours. When I work day and night on coding assignments, I was like my Dad and perhaps CSDoctor. I was driven by joy and burning pleasure, not pressure like in Stirt and GS. I feel my body knows if the pressure is positive or negative, pleasure or pain, so better listen to the signals and avoid aggravating the aging process.

Sudhir told me about encouraging environment vs fear… See blogpost on 5 external inputs

##deeply felt Priorities b4 U.S.→SG@45 #big-picture

  1. — priorities over the next 2-10Y horizon
  2. Career[a] longevity till 70, probably on wall st, not in Singapore or West Coast. A related keyword is “relevance” to the tech economy.
    1. On Wall St, I continue to keep a keen focus on robust technologies like core Java, cpp, SQL, sockets, core threading, common data structures. Outside Wall st, jxee (and possibly web stacks) offers the best market depth and demand.
    2. Compared to Wall St, West coast is possibly low priority for now as I don’t see long term visibility.
  3. my wellness — Need to guard against PIP-hell, trapped… A “stability factor” arguably more impactful than GC, housing, schooling… One of the key signs of wellness is weight and calorie count.
  4. boy’s education — I don’t know if U.S. system is better for him
  5. GC — a G5 priority in my current plan, primarily on the back of the longevity factor.
  6. — 2nd tier
  7. increase precious time with grand parents — fly business class to NY.
  8. preparing for war at new job — short-term, immediate but actionable.
  9. wife’s and daughter’s life-chances in U.S. — important but not much I can do now
  10. more passive income to reduce the cash flow stress
  11. Saving up for U.S. housing — not much I can do now.
  12. I now have a deep desire but limited hope to keep up my 细水长流 (slow, steady trickle) motivation for coding drill and QQ learning. Burning pleasure; self-esteem; satisfaction; absorbency.
  13. leaving a good impression with MS manager

[a] I didn’t say “income”. I think more important to me is my marketability (+ relevance, in-demand ..). Career longevity is the basis of entire family’s well-being for 15Y until kids start working.

Salary — is kinda secondary because the room for improvement is negligible in my biased mental picture.

GS-sg core IV with rare feedback

A Tech win by my own assessment.

–rare email feedback, with highlighting by me. It is clear to me the Hongkong OMS interviewer gave the most direct (negative) feedback. I feel grateful for that.

I feel this was a technical win. I think only the Hongkong OMS interviewer found some weakness in me.

I feel perm role selection/screening was more stringent than contractors because of leadership, communication, personality match, ownership… The new permanent hire is often seen as a potential manager to join the “race”.

In contrast, contractors are mostly assessed for technical + basic communication/attitude.

In SG, more than half the time I was assessed as a dev lead (or architect), but I don’t want that role. The U.S. contract market is better.

Since he interviewed till the end, I’d like to share a bit of feedback.

The interviewers believed he had good industry experience (low latency, low-level coding, memory usage, CPU usage, MSA) and can implement/maintain systems, and appreciated his honesty e.g. not hiding the fact that he didn’t perform performance testing.

However, the interviewers wished to see more leadership and communication skills befitting 15+ years of experience, and wanted him to conduct his own tests rather than take others’ assumptions at face value.

— Tokyo interviewer
Given a BST, how do you insert and look up by key?

Lastly, how do you remove an arbitrary branch node? We need a deterministic algorithm to reshuffle the affected subtree and preserve the BST property (see the sketch after the bullets below). Considering the python coding rounds, this was about the hardest pure-algo question throughout the interview process.

  • This level of difficulty is very different from tech shops
  • This level of difficulty is much lower than the coreJava QQ questions
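
Here is a minimal Java sketch (class and method names are mine) of the classic removal algorithm: a node with two children is replaced by its in-order successor, which deterministically preserves the BST property.

```java
// Minimal sketch of BST removal, assuming an int-keyed node class.
class BstNode {
    int key;
    BstNode left, right;
    BstNode(int k) { key = k; }
}

class Bst {
    // returns the new subtree root after removing 'key'
    static BstNode remove(BstNode root, int key) {
        if (root == null) return null;
        if (key < root.key)      root.left  = remove(root.left, key);
        else if (key > root.key) root.right = remove(root.right, key);
        else {                                          // found the doomed node
            if (root.left == null)  return root.right;  // 0 or 1 child
            if (root.right == null) return root.left;
            BstNode succ = root.right;                  // 2 children:
            while (succ.left != null) succ = succ.left; // in-order successor
            root.key = succ.key;                        // copy payload up
            root.right = remove(root.right, succ.key);  // delete successor
        }
        return root;
    }
}
```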

— QQ interviewer Ming

Q: suppose you find your prod log file suddenly stops growing, what do you do?
%%A: check disk space; check whether a DB query hung due to a pending transaction; check whether socket input/output is blocked
%%A: take a thread dump — a nice JVM feature not available in c++ or most other applications

Q: how did you tune your jvm?

Q2: how did you size the heap to ensure no jGC for an entire day?
%%A: We estimated the high watermark (like 2GB) and configured my jvm with a heap bigger than that.

This technique is simple and crude but it usually works. I think the interviewer may not have been impressed, but I was confident because … it works!

Q2b: but will that prevent GC? Do you know every GC must stop all application threads first?
%%A: at least the app thread can run after the snapshot, during the actual collection, right? I assume the snapshot is quick.

Q: in java, even without any protection, reading a 32-bit int will give you either the before-value or the after-value, right?
%%A: Not in c++. See c++11 atomic{int}^AtomicInteger.java #Bool can be half-written

Q: beside speed, what are other benefits/advantages of lock-free compared to lock-based designs?
%%A: the unsuccessful thread can do something else before retrying, rather than block indefinitely
%%A: a blocked thread (due to lock contention) is likely context-switched off the CPU. All the CPU data caches would soon be flushed.

Q: single-producer, single-consumer … can you design an efficient data structure to share data between them
Now I think we only have one shared mutable variable — the producer’s moving pointer. Interviewer hinted that all we need is for the producer to notify the consumer of the moving pointer’s new position in the data structure. Perhaps we don’t need lockfree. Volatile is probably sufficient.  Alternatively, mutex+condVar is a proven solution (sketched below).

Aha — If the consumer thread needs to wait (not busy-wait), then notification is the key, so condVar is more suitable than lockfree.

I tend to think of that moving pointer as an integer.
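
Below is a minimal mutex+condVar sketch of an SPSC buffer, assuming a fixed-size ring; all names are mine and this is not the interviewer’s design, just the “proven solution” mentioned above.

```java
import java.util.concurrent.locks.*;

// Single-producer/single-consumer ring buffer using mutex+condVar.
class SpscBuffer<T> {
    private final Object[] ring;
    private int head = 0, tail = 0, count = 0; // shared mutable state
    private final Lock lock = new ReentrantLock();
    private final Condition notEmpty = lock.newCondition();
    private final Condition notFull  = lock.newCondition();

    SpscBuffer(int capacity) { ring = new Object[capacity]; }

    void put(T item) throws InterruptedException {
        lock.lock();
        try {
            while (count == ring.length) notFull.await();
            ring[tail] = item;
            tail = (tail + 1) % ring.length;
            count++;
            notEmpty.signal(); // wake the (single) consumer
        } finally { lock.unlock(); }
    }

    @SuppressWarnings("unchecked")
    T take() throws InterruptedException {
        lock.lock();
        try {
            while (count == 0) notEmpty.await();
            T item = (T) ring[head];
            head = (head + 1) % ring.length;
            count--;
            notFull.signal(); // wake the (single) producer
            return item;
        } finally { lock.unlock(); }
    }
}
```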

I asked interviewer: are these concurrency knowledge and skills needed in your projects?
Interviewer: “Your CV mentioned these skills”. I believe the interviewer implicitly answered NO. Verifying/scrutinizing claims on the CV is standard practice!

–This OMS interviewer was trying to break the candidate…

Q: describe CAS. I recalled the details in https://bintanvictor.wordpress.com/2018/04/05/cas-cpu-instruction-takes-3-inputs/..
Q: how many clock cycles for a CAS instruction?

See further details in toy^surgeon

Q: what’s your actual CAS usage in your project, not a textbook use case?
%%A: At citi-muni, I had two threads putting quotes on a lockfree queue or stack. A queue is natural. In hindsight, the stack can also be useful — when readers want to process the latest quote first, LIFO.

Q: how would you design an order book, if you ignore your current system?
Q: why did you say array-based is faster than AVL?
%%A: most likely cache efficiency

Q: what’s the performance level of your order book? “I (the interviewer) would use array-based order book .. can easily achieve 1M msg/sec”
Q: what’s the disadvantage of an order book based on AVL tree?

— coding^QQ^non-tech questions

I think I did well on non-tech (A-) and coding (A), OK on QQ (B). With the non-technical areas, again, I feel interviewers can’t reach firm conclusions.

I believe the tough questions listed above are classic QQ i.e. theoretical knowledge not needed in real projects. Interviewers may consider it zbs but I would call it QQ.

Consistency and stability in high-end coreJava QQ topics — No real change since 2007

Do I look like Deepak? I think with algos I seldom feel fake and broken. As to the QQ topics, if I am armed with a tidbit of c++ low-level knowledge then I can often defend myself, because on those questions c++ is invariably more low-level than java. There are still some “weakness topics” where I may come across as … superficial and academic, like Deepak.

I generally concede to the interviewer, but not today during the “CAS justification” discussion.

j8 MethodRef #cheatsheet

Need to develop more low-level insights as QQ…

  • Out of the four kinds of method refs, only AA) static-method and BB) specific-instance kinds are popular. The other two types are obscure.
  • q[ :: ] is used in all four kinds
  • I think the static-method kind is most readable and most intuitive. The javadoc tutorial features a BB example that should be implemented as a static method IMHO.
  • a lambda expression has parameter types; a method ref has none and must be converted to a lambda expression. Where does the compiler infer the parameter types from? I think they are inferred from the calling context, i.e. the target functional interface (see the sketch below).
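
A tiny illustration of the two popular kinds; class and variable names are mine.

```java
import java.util.function.*;

public class MethodRefDemo {
    public static void main(String[] args) {
        // AA) reference to a static method
        Function<String, Integer> parse = Integer::parseInt;

        // BB) reference to an instance method of a specific object
        String greeting = "hello";
        Supplier<Integer> len = greeting::length;

        // equivalent lambdas -- parameter types inferred from the
        // target functional interface, as noted above
        Function<String, Integer> parse2 = s -> Integer.parseInt(s);
        Supplier<Integer> len2 = () -> greeting.length();

        System.out.println(parse.apply("42") + " " + len.get());
    }
}
```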

QuickSelect: select nth key#O(1)space

Quickselect has average O(N) runtime but worst-case O(N^2), if the partitioning goes bad repeatedly. A randomized pivot can reduce the chance but I don’t think we can eliminate it.

Space complexity is O(1) despite the apparent recursive set-up. Tail recursion optimization is usually available to reuse the same stack frame. Otherwise, recursion can be replaced by a loop.

std::nth_element() is linear on average but quadratic in worst case — https://stackoverflow.com/questions/11068429/nth-element-implementations-complexities explains QuickSelect algo

Quickselect is by the same inventor as quicksort (Tony Hoare).

https://en.wikipedia.org/wiki/Quickselect
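
A minimal iterative sketch (my own names), assuming 0 <= n < a.length. Writing the loop explicitly shows why extra space is O(1): the tail recursion is just a loop.

```java
import java.util.Random;

// Returns the n-th smallest element of a[], 0-based. Average O(N) time.
class QuickSelect {
    private static final Random rnd = new Random();

    static int nthSmallest(int[] a, int n) {
        int lo = 0, hi = a.length - 1;
        while (true) {
            int p = partition(a, lo, hi, lo + rnd.nextInt(hi - lo + 1));
            if (p == n) return a[p];
            if (p < n) lo = p + 1; else hi = p - 1; // discard one side
        }
    }

    // Lomuto partition around a randomly chosen pivot
    private static int partition(int[] a, int lo, int hi, int pivotIdx) {
        int pivot = a[pivotIdx];
        swap(a, pivotIdx, hi);
        int store = lo;
        for (int i = lo; i < hi; i++)
            if (a[i] < pivot) swap(a, i, store++);
        swap(a, store, hi);
        return store;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```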

get_majority_elem]unsorted array,O(1)space #90%

Q: Given an array of size n, find the majority element. The majority element is the element that appears more than ⌊ n/2 ⌋ times. You may assume that the array is non-empty and the majority element always exist in the array.

====analysis
worst input: odd length, only X and Y. X occurs once more than Y.

hash table solution needs O(N) space since there can be N/2 distinct values. To improve space complexity, how about quick select? Discard the smaller side and pick another random pivot.

Median-finder algorithm can solve this problem, using std::nth_element() which uses QuickSelect… O(1) space despite recursive set-up.

— idea 3 O(1) space: random pick then verify
Random pick and count the occurrence of this pick. Is it more than N/2? Within a few trials we should find a good pick.
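
A sketch of idea 3 (my own names). Each random pick hits the majority element with probability above 1/2, so the expected number of trials is below 2.

```java
import java.util.Random;

// "Random pick then verify", O(1) space, Las-Vegas style loop.
class MajorityElement {
    static int find(int[] a) {
        Random rnd = new Random();
        while (true) {
            int candidate = a[rnd.nextInt(a.length)];
            int count = 0;
            for (int x : a) if (x == candidate) count++;
            if (count > a.length / 2) return candidate; // verified
        }
    }
}
```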

K-palindrome #careercup 20%

Q (Facebook India, careercup): A k-palindrome is a string which transforms into a palindrome on removing at most k characters. Implement check(str, k). str has at most 20,000 characters. 0<=k<=30

Sample Test Case: Input : abdxa 1 -> false
====analysis
It’s crucial to zoom into the worst input early on — binary string with the correct “kills” well-dispersed.

Insight — once we group the chars by ascii code, I can see a successful palindrome is really an A-palindrome interleaved with a B-palindrome and C-palindrome…

This problem is related to the (tougher) problem of “max palindrome sub-sequence” (see the sketch below)
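
To make the relation concrete, here is a hedged DP sketch: check(str, k) succeeds iff length minus the longest-palindromic-subsequence (LPS) length is at most k. Note the O(n^2) memory is prohibitive at n = 20,000, which is exactly why the cheaper ideas below are worth exploring.

```java
// str is a k-palindrome iff (n - LPS) <= k. Interval DP for LPS.
// O(n^2) time AND space: infeasible at n = 20,000, shown for insight only.
class KPalindrome {
    static boolean check(String s, int k) {
        int n = s.length();
        int[][] dp = new int[n][n]; // dp[i][j] = LPS length of s[i..j]
        for (int i = 0; i < n; i++) dp[i][i] = 1;
        for (int len = 2; len <= n; len++)
            for (int i = 0; i + len - 1 < n; i++) {
                int j = i + len - 1;
                if (s.charAt(i) == s.charAt(j))
                    dp[i][j] = (len == 2) ? 2 : dp[i + 1][j - 1] + 2;
                else
                    dp[i][j] = Math.max(dp[i + 1][j], dp[i][j - 1]);
            }
        return n - dp[0][n - 1] <= k; // minimum removals = n - LPS
    }
}
```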

Eg k=33.
–idea 3: per-club distribution table
each char gets a dedicated club/distro, where we record all positions occupied by this char.
Since K is small, every distro should be almost “balanced” around the midpoint.
— idea 7
There can be only up to 33 possible midpoints, including those in-between. (If all kills hit the upper end, then midpoint drops by 33/2.) Let’s try each possible midpoint.

Insight — once we pick a midpoint, we know how many we can kill above/below. For example, if we pick the highest midpoint, then all kills must hit lower half. The extreme midpoints are easier due to stringent constraints.

If we pick a central midpoint, then we can kill about 33/2 in each half, but the kills must match — nice constraint.

For each midpoint picked, we run Algo B:
At init-time, we compute the defects in each club/distro using Algo B1:
given the midpoint and the positions of all J’s, the defect count is the number of kills (killing a J or non-J) to balance J’s distro…

We will compile all 26 defect counts.  Now we start growing our seed at the chosen midpoint. We expand both ways. If the next 2 positions belong to different clubs J vs L, we evaluate the two kill options using Algo B2
One of the two choices will reduce overall defects. We first look at J’s distro. The J kill would improve or worsen J club’s defect.

If there’s an efficient way to compare/evaluate the two kills or update the defect counts, then we have a decent solution.

–idea 9: freq table. If there are 35 odd clubs, then 33 of them can become even, but 2 odd clubs remain –> immediate failure. However, this idea fails with the binary string.

## coding drill 19 Jul

  1. https://leetcode.com/problems/factorial-trailing-zeroes/
  2. https://bintanvictor.wordpress.com/wp-admin/post.php?post=31671&action=edit — merge sort in-place

— presumably solved

  1. https://leetcode.com/problems/group-anagrams/
  2. https://leetcode.com/problems/merge-two-sorted-lists/
  3. https://wordpress.com/post/bintanvictor.wordpress.com/31650 — pacific/atlantic
  4. https://wordpress.com/post/bintanvictor.wordpress.com/31657 — max-product subarray
  5. https://leetcode.com/problems/contains-duplicate/
  6. https://leetcode.com/problems/majority-element/
  7. https://wordpress.com/post/bintanvictor.wordpress.com/21764 find min in rotated array
  8. https://bintanvictor.wordpress.com/wp-admin/post.php?post=31592&action=edit — T.isSubtree(S)
  9. https://bintanvictor.wordpress.com/wp-admin/post.php?post=31792&action=edit — longest run@same char
  10. 3-sum

— previously solved problem forgotten and refreshed

  1. Leetcode #139 word-break. .. not really “medium”
  2. https://bintanvictor.wordpress.com/wp-admin/post.php?post=31745&action=edit — max sub difference
  3. https://bintanvictor.wordpress.com/wp-admin/post.php?post=31749&action=edit — min range to include all clubs
  4. https://bintanvictor.wordpress.com/wp-admin/post.php?post=31736&action=edit — find lone-wolf among AAABBBCCC
  5. https://bintanvictor.wordpress.com/wp-admin/post.php?post=31755&action=edit — flip binTree
  6. https://bintanvictor.wordpress.com/wp-admin/post.php?post=31758&action=edit — serialize linked list: each node points to a random node

— unsolved:

  1. https://leetcode.com/problems/valid-parenthesis-string/
  2. https://bintanvictor.wordpress.com/wp-admin/post.php?post=31825&action=edit — K-palindrome


T.isLikeSubtree(S) #60%

Q (Leetcode 572): Given two non-empty binary trees s and t, check whether tree t has exactly the same structure and node values as a subtree of s. A subtree of s is a tree consists of a node in s and all of this node’s descendants. The tree s could also be considered as a subtree of itself.

====analysis
https://leetcode.com/problems/subtree-of-another-tree/solution/ relies (as “primary check”) on the payloads of each tree node, but in some trees all payloads are empty, or all payloads are either True or False. In these cases, the comparison of payloads is only usable as a secondary check. The primary check must be structural. See key in a tree node

The claimed O(N+K) complexity is plain wrong.

I guess the 2nd solution (and possibly 1st solution) would compare every node in S to the root of T. I think there are more efficient solutions using subtree size and subtree height as secondary checks – more reliable than payload check.

My solution below uses BFT + pre/post/in-order walk !

— Preliminary step: post-order walk to get subtree-size, subtree-height at each S node + T root. (I will skip other T nodes.). Suppose T is size 22 height 4. We will look for any node Y of size 22 and height 4 and a matching payload. This would eliminate lots of S nodes:

If T height is more than 2, then lots of low-level S nodes are eliminated.
If T height is 2 or 1, then T size would be at most 3. Most high-level S nodes are eliminated.

— Solution 1: For both T and S, we take an in-order walk to assign incremental IDs, then take a pre-order walk to produce an array of IDs that represents the tree structure.

Can we run a level-aware BFT? Only one level needs to be examined … wrong!

I think the in-order walk itself is what I need. Find any node Y in S that matches the size+height+payload of T’s root. Suppose ID(Y)=44 but ID(T root)=4; then simply shift down by 40 and do a linear scan. Must check payload, but not height/size. A sketch of the preliminary filtering follows.
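
This sketch (my own class/method names) implements the preliminary step above: one post-order walk fills subtree size and height, then a full structural comparison runs only on the few S nodes passing the size+height+payload filter.

```java
class SubtreeMatch {
    static class Node { int val, size, height; Node left, right; }

    static boolean isSubtree(Node s, Node t) {
        stats(s); stats(t);
        return search(s, t);
    }

    // post-order walk: children finalized before the parent
    static void stats(Node n) {
        if (n == null) return;
        stats(n.left); stats(n.right);
        n.size = 1 + (n.left == null ? 0 : n.left.size)
                   + (n.right == null ? 0 : n.right.size);
        n.height = 1 + Math.max(n.left == null ? 0 : n.left.height,
                                n.right == null ? 0 : n.right.height);
    }

    static boolean search(Node s, Node t) {
        if (s == null) return false;
        if (s.size == t.size && s.height == t.height && same(s, t)) return true;
        return search(s.left, t) || search(s.right, t);
    }

    static boolean same(Node a, Node b) {
        if (a == null || b == null) return a == b;
        return a.val == b.val && same(a.left, b.left) && same(a.right, b.right);
    }
}
```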


min int_range2include all clubs#presorted 60% careercup

Q (careercup): You have k uneven lists of pre-sorted integers, N ints in total. Find the smallest range that includes at least one number from each of the k lists. For example,
List 1: [4, 10, 15, 24, 26]
List 2: [0, 9, 12, 20]
List 3: [5, 18, 22, 30]
The smallest range here would be [20, 24] as it contains 24 from list 1, 20 from list 2, and 22 from list 3
==== Analysis
Looks rather contrived but very popular.
— solution 1 O(N): moving window algo:
Merge all K clubs’ items into a big array (O(N logK) via K-way merge, effectively O(N) when K is a small constant), where each item’s payload includes its original clubId. We end up with
[0/2nd 4/1st 5/3rd 9/2nd 10/1st …]

Once I come up with this big array, I feel confident to conceive a O(N) moving-window algo.

A ‘basic window’ is any subarray having representatives from each club.
A ‘tight window’ is a basic window that can’t shrink further, i.e. the 1st and last items are the only reps of their clubs.

Now findInitialWindow(), similar to findNextWin().
Simple procedure —
As we build the window, we update a hashtable like {club1: position 1,3,6; club2: position 0,5; club3: position 2,4,..}
Basically an array of queues.
We update this table each time we increment the front pointer in search of the next basic window.
When we find a basic window, this table should have all K queues non-empty and at least one of them singular (single-element), probably the last-updated queue.

Now shrink this window via shrink(). Here’s a faster but trickier algo, with no bigO impact —
For each queue, find the last added i.e. right-most position.
Put these K positions in a container(actually no container needed).
Now out of these K positions, find the minimum. Say 22. I claim 22 is the left edge of a smaller window.
Keep shrinking if we can but I doubt we can.

Here’s a slow shrink algo —
For the current window, look at the clubId of the left edge (back ptr).
Use the clubId to locate a queue in the table.
Head item in the queue should match the position of back ptr.
If not empty, pop this queue and increment the back ptr by one; else exit shrink()

Now we have a tight window, we remember its size (and update current winner).

After shrink(), we call findNextWin() —
Suppose the basic window’s earliest item is a clubX.
Now we move back ptr (clubX queue is now empty).
Now we move front pointer i.e. right edge of the window, looking for the next clubX item.

* findNextWin() is O(1) per front ptr increment. Only touches the tail of queues
* shrink() is O(1) per back ptr increment. Only touches the head of queues
* therefore, at most N front ptr moves and N back ptr moves.. O(N)

A priorityQ can speed up some steps, but won’t improve the bigO. A simplified sketch of the moving-window idea follows.
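
In this sketch, per-club counters replace the queues, but the two-pointer O(N) principle over the merged array is the same; it is not the exact queue-based shrink described above. The merged array of {value, clubId} pairs (clubIds 0..k-1) is assumed pre-built.

```java
class MinRange {
    // merged[i] = {value, clubId}, pre-sorted by value
    static int[] smallestRange(int[][] merged, int k) {
        int[] perClub = new int[k];          // reps of each club in window
        int covered = 0, bestLo = 0, bestHi = Integer.MAX_VALUE;
        int back = 0;
        for (int front = 0; front < merged.length; front++) {
            if (perClub[merged[front][1]]++ == 0) covered++;
            while (covered == k) {           // shrink while still "basic"
                if (merged[front][0] - merged[back][0] < bestHi - bestLo) {
                    bestLo = merged[back][0];
                    bestHi = merged[front][0];
                }
                if (--perClub[merged[back][1]] == 0) covered--;
                back++;
            }
        }
        return new int[]{bestLo, bestHi};
    }
}
```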

## reasons4salary_diff ] ibank #Davis,Jay

Primarily supply-demand driven

* Reason (demand-side): age discrimination — Davis pointed out that many ibanks and other employers accept the reality of paying a higher salary to a fresh grad than to an old-timer, because these employers long for fresh blood

* Reason (demand-side): core dev teams — are more central than peripheral teams like devops (eg QAPM), QA, DBA, reporting,.. Consider Ling, the QAPM devops scripting guru.

* Reason (demand-side): Jay Hu pointed out that hard-core dev skillset — is perceived as more valuable, harder than “tools” including reporting tools, devops tools, automation tools, vendor products that encapsulate the underlying complexities.

Similarly, c++ skillset is often perceived as harder than coreJava
Similarly, coreJava skillset is often perceived as harder than jxee
Similarly, a c++/java skillset is often perceived as harder than scripting

On the supply side, there are fewer hardcore c++ developers than coreJava developers, and a lot more jxee developers. However, there’s also the curious case of quant dev — tight supply, but dwindling demand.

If you have some brain power (like the 华中理工 compScience friend of Jay Hu), then don’t be put off by hard-core dev, even though it’s dry, boring, tough, unglamorous. See my blogpost on my geek profile.

Pacific/Atlantic #51%

https://leetcode.com/problems/pacific-atlantic-water-flow/

Q: Given an m*n matrix of non-negative integers representing the height of each unit cell in a continent, the “Pacific ocean” touches the left and top edges of the matrix and the “Atlantic ocean” touches the right and bottom edges. Water can only flow in four directions (up, down, left, or right) from a cell to another one with height equal or lower.

Print all cells where water can flow to both the Pacific and Atlantic ocean.

====Analysis
peek? not yet
— solution 1:
Create m * n nodes in a 2D grid. Two more nodes:
All north|west boundary nodes have an edge From Pacific node
All south|east boundary nodes have an edge From Atlantic node

Between two adj cells, there’s 1 or 2 directed edges, from low to high. Intuition — tracing pollutant upstream. An arrow from P to Q means Q is suspect.

How do I represent the graph? Each node can hold a list of “suspect neighbors” (higher or equal).

Run BFT or DFT from Pacific node and turn on flag in each reachable node.

Run the same from Atlantic. Every reachable node with the flag set is a double-suspect to be printed.
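
A minimal sketch of solution 1 (names are mine), using DFS “upstream” from each ocean’s border cells instead of materializing explicit edge lists: moving to an equal-or-higher neighbor is exactly following a “suspect” edge.

```java
class PacificAtlantic {
    static void printBothOceans(int[][] h) {
        int m = h.length, n = h[0].length;
        boolean[][] pac = new boolean[m][n], atl = new boolean[m][n];
        for (int r = 0; r < m; r++) { dfs(h, r, 0, pac); dfs(h, r, n - 1, atl); }
        for (int c = 0; c < n; c++) { dfs(h, 0, c, pac); dfs(h, m - 1, c, atl); }
        for (int r = 0; r < m; r++)
            for (int c = 0; c < n; c++)
                if (pac[r][c] && atl[r][c])         // double-suspect
                    System.out.println("[" + r + "," + c + "]");
    }

    // trace pollutant upstream: recurse into neighbors with height >= current
    static void dfs(int[][] h, int r, int c, boolean[][] seen) {
        if (seen[r][c]) return;
        seen[r][c] = true;
        int[][] dirs = {{1,0},{-1,0},{0,1},{0,-1}};
        for (int[] d : dirs) {
            int nr = r + d[0], nc = c + d[1];
            if (nr >= 0 && nr < h.length && nc >= 0 && nc < h[0].length
                    && !seen[nr][nc] && h[nr][nc] >= h[r][c])
                dfs(h, nr, nc, seen);
        }
    }
}
```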

clean-up: non-overlapping intervals #70%

Q (L435): Given a collection of intervals, find the minimum number of intervals you need to remove to make the rest of the intervals non-overlapping.

Note:

  1. You may assume the interval’s end point is always bigger than its start point.
  2. Intervals like [1,2] and [2,3] have borders “touching” but they don’t overlap each other.

==== analysis:

I think this is the same as the greedy room scheduler. Schedule the earliest-ending task, so as to maximize accepted meetings.

A deselected meeting is an interval removed.
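
A sketch of that greedy scheduler (my own names): sort by end point, accept greedily, and the removals are whatever got deselected. The >= respects the “touching borders don’t overlap” rule.

```java
import java.util.*;

class EraseOverlap {
    static int minRemovals(int[][] intervals) {
        Arrays.sort(intervals, (a, b) -> Integer.compare(a[1], b[1]));
        int kept = 0, lastEnd = Integer.MIN_VALUE;
        for (int[] iv : intervals)
            if (iv[0] >= lastEnd) { kept++; lastEnd = iv[1]; } // accept
        return intervals.length - kept; // deselected = removed
    }
}
```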

recent algo questions: mostly presented in intArray+binTree

binary tree + int_array represent 60% of my recent algo questions. With string and matrix included, we get 80%.

  • “int_array” also include array of int_tuples. Other arrays are rarely quizzed
  • In the real world, binTree is less common than arrays, but trees in general are more common than binTree. I think interview questions tend to get watered down to the binary species. K-ary trees are rarely quizzed
  • matrix — 95% of Matrix problems are 2D grid problems or integer array problems.
  • graph — Most graph problems are presented as matrix or binary trees, occasionally with cycle.
  • slist — shows up in 5-8% of problem descriptions. Some problems need slist as auxDS.

Leetcode is representative of a constellation of websites.

iterate K pre-sorted uneven immutable lists #FB

Interviewer (Richard) was kind enough to offer me a tip early on, so we didn’t waste time (which could easily have resulted in running out of time).

Q: given K pre-sorted immutable lists, each up to N items, return an iterator that on demand yields each of the (up to K*N) items in sorted sequence.

Estimate time and space complexities.

====analysis

— I first proposed pair-wise merge. Since there are logK passes, Time complexity == O(NK logK)

Space complexity is tricky. In the very first pass I have to create a single list of up to NK items. Then I can reuse this list in each merge, so space complexity == NK [1], but I mistakenly said NK*logK. The interviewer wasn’t familiar with this solution and didn’t correct me.

[1] See https://www.geeksforgeeks.org/sort-array-two-halves-sorted/. Suppose 8 lists to merge. I will merge A+B into first quarter of the big array (size NK), then C+D into 2nd quarter… In next pass, I will merge AB + CD in-place using the first half of my big array.

The advantage of this solution — once I create a single merged array, each iteration is O(1). This can be valuable if run-time iteration speed is crucial but initial warm-up speed is unimportant.

bigO insight — merging N pre-sorted arrays (of one element each) is O(N logN); merge sort is the special case?

— Then interviewer suggested iterating over the K lists so I implemented the solution in https://github.com/tiger40490/repo1/blob/py1/py/88miscIVQ/itr_K_presortedLists_FB.py

  • Space complexity = K
  • Time complexity:
    • next() O(logK) due to popping. I said Fib heap has O(1) insert
    • init() O(K)
    • hasNext() O(1)
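
For reference, a Java analogue (my own names) of the heap-based iterator described above. Offering the K seeds one by one costs O(K logK); a bulk heapify would match the O(K) init mentioned above.

```java
import java.util.*;

class KListIterator implements Iterator<Integer> {
    // heap entry: {value, listId, nextOffsetWithinList}
    private final PriorityQueue<int[]> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
    private final int[][] lists;

    KListIterator(int[][] lists) {
        this.lists = lists;
        for (int i = 0; i < lists.length; i++)  // one seed per list
            if (lists[i].length > 0)
                heap.offer(new int[]{lists[i][0], i, 1});
    }

    public boolean hasNext() { return !heap.isEmpty(); }   // O(1)

    public Integer next() { // O(logK): one pop plus at most one push
        int[] top = heap.poll();
        int listId = top[1], offset = top[2];
        if (offset < lists[listId].length)
            heap.offer(new int[]{lists[listId][offset], listId, offset + 1});
        return top[0];
    }
}
```
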

How about a RBTree instead of heap? Space complexity is unchanged.  Time complexity:

  • next() O(logK) for both remove and insert
  • init()  O(K logK), worse than priorityQ
  • hasNext() O(1)

— FB interviewer asked why I prefer to keep state in global var rather than a per-record field

%%A: Runtime heap allocation is slower if the record is bigger. In contrast, the global dictionary is tiny and likely lives in L1-cache

[19] zbs cf to QQ+GTD #compiler+syntax expertise

Why bother — I spend a lot of time accumulating zbs, in addition to QQ halos and localSys GTD

I have t_zbs99 and other categories/tags on my blogposts showcasing zbs (真本事/real expertise) across languages. Important to recognize the relative insignificance of zbs

  • #1 QQ — goal is mobility. See the halo* tags. However, I often feel fake about these QQ halos.
  • #2 GTD — localSys or external tools … goal is PIP, stigma, helping colleagues. Basic skill to Make the damn thing work. LG2 : quality, code smell, maintainability etc
  • #3 zbs — goal is self-esteem, respect and “expert” status. By definition, zbs knowledge pearls are often not needed for GTD. In other words zbs is “Deeper expertise than basic GTD”. Scope is inherently vague but..
    • Sometimes I can convert zbs knowledge pearls to QQ halos, but the chance is lower than I wished, so I often find myself overspending on zbs. Therefore I consider the zbs topics a distant Number 3.
    • Zbs (beyond GTD) is required as architect, lead developer, decision makers.

I also have blog categories on (mostly c++) builderQuirks + syntax tricks. These knowledge pearls fall under GTD or zbs.

post-order walk: create valuable subtree statistics

https://github.com/tiger40490/repo1/blob/cpp1/cpp/algo_binTree/subTreeStats.cpp is rather simple and short, showing stats like

  • max path sum from current node to any subtree node. Note a path must be non-empty, so this max can be negative
    • We can even save in each node this actual path
  • subtree height,
  • subtree size,
  • d2root — tested but doesn’t use subtree statistics
  • — can be easily implemented:
  • subtree payload sum
  • subtree max payload… These stats are powerful AuxDS

Applicable to k-ary trees too. A Java sketch of the binary case follows.
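
This is a minimal Java analogue (my own names) of the linked cpp code: one post-order walk fills subtree size and height at every node.

```java
class Node {
    int payload, subtreeSize, subtreeHeight;
    Node left, right;
}

class SubtreeStats {
    static void postOrder(Node n) {
        if (n == null) return;
        postOrder(n.left);
        postOrder(n.right);  // children finalized before the parent
        n.subtreeSize = 1 + size(n.left) + size(n.right);
        n.subtreeHeight = 1 + Math.max(height(n.left), height(n.right));
    }
    private static int size(Node n)   { return n == null ? 0 : n.subtreeSize; }
    private static int height(Node n) { return n == null ? 0 : n.subtreeHeight; }
}
```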

reverse post-order mirrors pre-order #diagram+Dot

For a given binTree the pre-order sequence is mirrored by the reverse-post-order sequence.

“Reverse” := “right_child then left_child”

I had an “Aha” moment when reading P 389 [[discrete mathematics]]. Insightful illustrated path-tracer diagram showing that pre-order = ABCDEFGHIJ with root as first visited, and J=right most leaf node. The reverse-post-order = JIHGFEDCBA with root as the last visited, due to post-order. Note the DOT is on the Left for both reverse-post and pre.

(https://en.wikipedia.org/wiki/Tree_traversal#Depth-first_search has standard diagrams with a DOT on the left for pre-order; post-order has dot on the right of each node)

Note any post-order curve is less intuitive and less visual (than pre-order) because the curve hits the root of a subtree only upon leaving it! The earlier encounters are touch-n-go. The dot can reinforce this key insight and help bridge the conceptual gap.

Similarly post-order sequence is mirrored by reverse-pre-order. For the same binTree above, reverse-pre-order = AFGHJIBDEC (not mentioned in the book). Post-order = CEDBIJHGFA, as printed in the middle of P 389. Note root is last visited due to post-order.

max proceeds selling 2 homes #Kyle

Q: you own a residential and an investment property. You have the historical price time series on both. You want to see the max aggregate amount you could have sold both for, provided you must sell the investment property before the residential property. You can pick a time to sell the investment and a later time to sell the residential.

I created this question for …
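
One O(N) approach I would sketch (my own, not part of the question text): precompute the suffix max of the residential series, then pair each investment-sale day with the best strictly-later residential price.

```java
class MaxProceeds {
    // inv[i], res[i]: prices on day i; assumes both series share n >= 2 dates
    static int maxProceeds(int[] inv, int[] res) {
        int n = inv.length;
        int[] sufMaxRes = new int[n + 1];       // best res price on day >= i
        sufMaxRes[n] = Integer.MIN_VALUE;
        for (int i = n - 1; i >= 0; i--)
            sufMaxRes[i] = Math.max(res[i], sufMaxRes[i + 1]);
        int best = Integer.MIN_VALUE;
        for (int i = 0; i + 1 < n; i++)         // residential sells later
            best = Math.max(best, inv[i] + sufMaxRes[i + 1]);
        return best;
    }
}
```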

find offending section ] originally-sorted array

I rephrase Leetcode 581 Q: There was an ascending int array containing duplicates. Someone reshuffled some elements so it is no longer sorted. Write a function to determine the shortest subarray that needs reshuffling. Before and after this subarray, no change is needed.

O(N) time and O(1) space.

==== analysis
Not “easy” as labelled.

I want to find the lowest integer that’s out of place. (Do the same on the other end and the problem is complete.)

First scan to find the global min and max values, and ensure they are correctly placed.

Next, scan from the beginning along the rising slope. At the first down-turn, treat the array as two partitions. In the right partition after the first peak, we determine the lowest value, say -55. Then we find the correct position for -55 within the Left partition! We look for the last item equal-or-lower-than -55…. This is the key constraint of this entire problem.

Insight — We know -55 must move left and we don’t care about any other item in the right partition.

Insight — As a concrete illustration, if within the left partition -55 should stand after #7, then we know all earlier items up to #7 are already correctly placed. Why? All subsequent items in the left section are strictly higher than -55, and all items in the right section are (equal or) higher than -55.

— my idea 1 in O(N) space: radix sort a clone of the array, then compare it to the input array. A standard O(1)-space variant is sketched below.
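
For comparison, here is the standard two-scan O(N)-time / O(1)-space variant (related to, but not identical to, the partition idea above): the right edge is the last index dipping below the running max; the left edge is the first index rising above the running min.

```java
class ShortestUnsorted {
    // returns inclusive bounds {lo, hi} of the reshuffled span;
    // {-1, -2} (zero length) if the array is already sorted
    static int[] offendingSection(int[] a) {
        int n = a.length, lo = -1, hi = -2;
        int runMax = a[0], runMin = a[n - 1];
        for (int i = 1; i < n; i++) {
            runMax = Math.max(runMax, a[i]);
            if (a[i] < runMax) hi = i;                 // out of place on the right
            runMin = Math.min(runMin, a[n - 1 - i]);
            if (a[n - 1 - i] > runMin) lo = n - 1 - i; // out of place on the left
        }
        return new int[]{lo, hi};
    }
}
```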

count unique squares ] matrix #70%

Q1: Count unique squares within a R*C matrix. A unique square is identified by two opposite corners. For example, [0,0] and [1,1] identifies a 4-cell square; [2,3], [2,3] identifies a 1-cell square.

Q2: print out all of them.

====analysis

Q1 feels like a math problem, but printing is an algo question

— solution 1: Let’s focus on height-2 for now. I will denote the subset of squares having height-2 as “2-subset”. (The 1-subset is trivial.)

Insight — 2-subset is naturally non-overlapping with the 3-subset, 4-subset etc. These non-overlapping subsets constitute the whole solution. This fact is extremely valuable to the count algorithm.

For each height-2 square, I will focus on the north-west corner, i.e. the corner (among the four) having the lowest r and c IDs. We can divide the 2-subset by rowID. For a 55-row/44-column matrix, there are 43 height-2 squares with rowID=0 (one per NW colID 0..42). Same count with rowID=1, etc., through rowID=53.

Aha — each square can be identified by the north-west corner {rowID, colID} + height. So at the higher level I divide the whole solution set by height. At the lower level, I sub-divide the set by the “low rowID”. The counting sketch below follows this division.
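
A counting sketch for Q1 (my own derivation): a height-h square needs its NW corner at rowID <= R-h and colID <= C-h, giving (R-h+1)*(C-h+1) squares per height, and the per-height subsets are non-overlapping as noted above.

```java
class CountSquares {
    static long count(int R, int C) {
        long total = 0;
        for (int h = 1; h <= Math.min(R, C); h++)   // divide by height
            total += (long) (R - h + 1) * (C - h + 1); // NW-corner choices
        return total;
    }
}
```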

bigO CIV: clarify_requirement means..worst_data

———- Forwarded message ———
Date: Tue, 9 Jul 2019 at 23:10

Big-O means “worst” case complexity. We need to be doubly sure what “worst” case data look like. (Note it doesn’t mean ridiculously rare data like integer overflow.)

On high-end coding interviews, these “worst data set” frequently shed light on the structure of the original problem, and often point to the correct direction for an efficient solution. (Note a nice-looking solution is not efficient if it exhibits poor time complexity in the face of a worst yet realistic data set. Such a system may not scale well in practice, given its Achilles’ heel.)

If the interviewer wants a solution optimized for bigO time complexity, you can waste precious time tackling the wrong problem: the problem defined by your idea of the “worst data set”, when the real worst data set is very different.

Sometimes, the worst data is _subtly_ different but it points to a different algorithm direction. It may point to an iterative solution rather than recursion, for example.

The original interview question may not be that hard if we identify the worst data set early on; sadly, we often don’t, and run out of time.

For some tough problems, the first (sometimes the only) challenge is quickly identifying worst data set. Interviewer always gives you the simple data set, NEVER the worst data set. It’s your job to identify the worst data set… often the key insight into the problem.

It’s no exaggeration to say — identifying the worst data set early or too late can make-or-break your chance at this employer. You may kiss good-bye to the job opportunity exactly because you were too slow to identify the worst data set. I know what this means. Other candidates probably got shot by this arrow in the heel, without knowing what happened.

Time is ticking as soon as the problem is presented to you. If the interviewer says “time is not a factor… take your time… we want quality and thought process”, I just ignore it. Our speed is always part of the assessment. Therefore, a very common failure scenario is —

.. tractable problem but candidate runs out of time after wasting time on preliminaries like using incorrect worst data set to reason about the (wrong) problem.

print height@subtree at every node #post-order

For these problems, elegant implementation is expected.

Note in-order is for binary tree, but post-order is for any tree.

== Q: given Any tree without cycle, print the size of subtree at every node

I think post-order walk is usable.

== Q: given Any tree without cycle, print the height of subtree at every node. Also print the node address just for identification

Intuition — dft post-order walk. Need to become more familiar with the detailed steps. https://www.techiedelight.com/calculate-height-binary-tree-iterative-recursive/ has details.

== Q (Leetcode): Given a binary tree, determine if it is height-balanced — a binary tree in which the depth of the two subtrees of every node never differ by more than 1.

I can try it on Leetcode .. no need to set up test harness myself but I might be frustrated

Intuition — post-order walk. A simple recursive algo — for any node, find its subtree height (including itself) and return it. Before returning, also compare the left vs right subtree heights. A sketch follows.
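
A minimal sketch of that recursive algo (my own names): one post-order pass returns each subtree’s height, prints it with the node address for identification, and flips a flag on any imbalance.

```java
class TreeNode {
    int payload;
    TreeNode left, right;
}

class BalancedCheck {
    static boolean balanced = true; // flipped by heightAndPrint()

    static int heightAndPrint(TreeNode n) {
        if (n == null) return 0;
        int lh = heightAndPrint(n.left);   // post-order: children first
        int rh = heightAndPrint(n.right);
        if (Math.abs(lh - rh) > 1) balanced = false; // compare before returning
        int h = 1 + Math.max(lh, rh);
        System.out.println(n + " height=" + h); // default toString shows address
        return h;
    }
}
```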