generate all abbr starting from longest.. +! recursion

I won’t insist on relative ordering among the shortest.

Idea 1 — Start with the longest abbreviation, i.e. the original string S, assuming 5 characters.

  1. populate the smallHM with the original word
  2. copy every char except the first; save into bigHM, then print/process this abbreviation.
  3. copy every char except the 2nd and print
  4. ..
  5. copy every char except the last. Now we have 5 strings in bigHM (a Level-4 hashmap), each of length 5-1=4
  6. make smallHM point to the bigHM object; point bigHM to an empty hm
  7. now take a string from smallHM (the Level-4 collection) and generate 4 shorter strings, saving them in bigHM (a Level-3 collection), each of length 5-2=3
  8. now take the 2nd string from Level-4 …
  9. After we finish Level-4, we have generated 20 strings in Level-3, but there are only 10 distinct items! That's why we need a Level-3 hashmap, for dedup.
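Below is a minimal c++ sketch of the above, under my assumptions (abbreviation = dropping chars; smallSet/bigSet mirror smallHM/bigHM, and a hashset suffices since we only need dedup, not key-value):

#include <iostream>
#include <string>
#include <unordered_set>
using namespace std;

int main() {
  string S = "abcde";
  unordered_set<string> smallSet{S}, bigSet; // current level, next (shorter) level
  cout << S << '\n';                         // process the original word first
  while (!smallSet.empty() && smallSet.begin()->size() > 1) {
    bigSet.clear();
    for (const string& word : smallSet)
      for (size_t i = 0; i < word.size(); ++i) {
        string shorter = word.substr(0, i) + word.substr(i + 1); // drop char i
        if (bigSet.insert(shorter).second)   // Step 9: hashmap dedup
          cout << shorter << '\n';
      }
    smallSet.swap(bigSet);                   // Step 6: re-point the two containers
  }
}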

inserting interval #merging

Q (Leetcode): Given a set of non-overlapping intervals, insert a new interval into the existing intervals (merging if necessary) and print the updated list of intervals. The intervals are given as a vector sorted by start time.

–analysis–

Now I feel the #1 main data structure is a doubly linked list (dlist) of Segment objects:

  • { segment_left_mark,
  • ptr to next node, ptr to prev node,
  • optionally a (bool or) enum with values A/B, where A means the current segment is AboveWater (an interval) and B means BelowWater (a gap) }.

Every time this dlist is modified, we would update a “helper container” — a tree of node pointers, sorted by the segment_left_mark value. The tree helps successive inserts. However, if each insert(vector intervals) receives a sorted vector, then we can binary-search the vector and don’t need the tree.

First, binary search to locate the left mark among all existing marks. Ditto right mark. Based on these 2 results, there are many cases.

  1. done — case (simple): both fall into the same existing interval. No-op
  2. done — case (simple): both fall into the same gap segment. Create 2 new segments and insert them into the dlist
  3. done — case (simple): one boundary falls into a gap, the other into an adjacent interval — just adjust the segment_left_mark without inserting a new segment
  4. done — case — bridge: both boundaries fall into different intervals. Adjust the segment_left_mark of the 2 affected segments, then link up the two to skip the intermediate segments
  5. done — case — wipeout: both boundaries fall into different gaps, wiping out at least 1 interval.
  6. done — case (most complex) — one falls into an interval, the other into a non-adjacent gap.
  7. case — incoming interval’s left boundary is lower than all existing marks, but its right boundary falls into some segment
  8. case — incoming interval is entirely below (or above) all existing marks
  9. case (special) — if an interval becomes adjacent to another, then merge the two.

Need a sorted tree of all marks + array of segments. Redundant but helpful.

Each segment (interval or gap) is represented by {left mark, right mark} where left <= right. I will save the segment objects into (a linked list and) an array. Even elements are interval objects and odd elements are gap objects. Now superseded by the dlist.

I think this problem is all about corner cases. Perhaps start with the complex cases which will take care of the simpler cases. No need to pass Leetcode tests. Due to the pointer complexity, I prefer python.

https://github.com/tiger40490/repo1/blob/py1/py/linklist/insertInterval.py is my solution but I dare not test on Leetcode
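For contrast, here is a minimal sketch of the standard vector-scan approach (not my dlist design above, and untested on Leetcode): copy intervals left of the newcomer, absorb all overlapping ones into it, then copy the rest.

#include <algorithm>
#include <utility>
#include <vector>
using namespace std;

vector<pair<int,int>> insertInterval(const vector<pair<int,int>>& ivs,
                                     pair<int,int> nu) {
  vector<pair<int,int>> out;
  size_t i = 0, n = ivs.size();
  while (i < n && ivs[i].second < nu.first) out.push_back(ivs[i++]); // left of nu
  while (i < n && ivs[i].first <= nu.second) {  // overlapping: absorb into nu
    nu.first  = min(nu.first,  ivs[i].first);
    nu.second = max(nu.second, ivs[i].second);
    ++i;
  }
  out.push_back(nu);
  while (i < n) out.push_back(ivs[i++]);        // right of nu
  return out;
}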

max rectangle ] histogram

Q: https://leetcode.com/problems/largest-rectangle-in-histogram/description/. Given N possibly recurring non-negative integers representing the histogram’s bar heights, and given the width of each bar is 1, find the area of largest rectangle in the histogram.

Visually well-defined problem. Kind of naturally-occurring. Very simple data structure. No O() requirement, so I will just try my own solution.

https://github.com/tiger40490/repo1/blob/py1/py/array/maxHistoBox.py is my solution. 100% passed on Leetcode.

==== analysis — heavy on data structure design.

Key insight — one scan to update a clever data structure.

key insight — data structure is not per bar, but per height!

For every bar J, there exists an enclosing max-rectangle of J’s height. We can just compare all of these rectangles.

We might start with two extreme candidates
1) the peak — whose enclosing rectangle is likely slender — O(N) one scan to find all the peaks
2) the lowest bar — whose enclosing rectangle has width N — O(N)

If we paint the histogram as a binary matrix, then this is equivalent to another problem — max all-black submatrix #DP #zhurong — but I think there exist better solutions like O(N logN) or O(N*S) …

–homegrown algo with O(N*S) where S := #unique heights. The binary search doesn’t show up as logS.

A pre-scan to get all distinct heights. For each distinct height, we maintain a RunRecord object {bestRun, currentRunStart, height}, in a sorted map {height -> record}. In py, I can use a pre-sorted vector of Records, sorted on height

In main scan, As we encounter a new bar of height J, we update these records.

  • if neither falling nor rising
    • record-J and each record-H below J must have a current run … extend that run (no-op)
  • if rising from height H
    • each record up to H must have a current run … extend that run (no-op)
      • iterate the treemap up to H
    • iterate the treemap from H+1 to J. Start a new run for each record
  • if falling from height P to J
    • record-J and each record-H (where H < J) must have a current run … extend that run
    • iterate the treemap from J+1 to P … each record-K must have a current run, indicated by a valid currentRunStart; that record’s current run has just ended. We update bestRun and put an invalid value into currentRunStart.

At end of the main scan, every record has a bestRun i.e. the duration. I can then calc the area under each bestRun and return the max.
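For reference, here is a minimal sketch of the well-known stack-based O(N) solution (a different algo from my RunRecord design above). The stack keeps indices of bars with ascending heights; each pop computes the max rectangle whose limiting height is the popped bar.

#include <algorithm>
#include <stack>
#include <vector>
using namespace std;

int largestRectangle(const vector<int>& h) {
  stack<int> st;                       // indices; heights ascend bottom-to-top
  int best = 0, n = h.size();
  for (int i = 0; i <= n; ++i) {
    int cur = (i == n) ? 0 : h[i];     // height-0 sentinel flushes the stack
    while (!st.empty() && h[st.top()] >= cur) {
      int height = h[st.top()]; st.pop();
      int width = st.empty() ? i : i - st.top() - 1; // span where height fits
      best = max(best, height * width);
    }
    st.push(i);
  }
  return best;
}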

personal projects: any ROI ] salary@@

See also

  1. most(new/old)specializations turn out non-strategic
  2. gzTsn category
  3. t_gzSpecialize11 tag

Q: given the limited spare time we working fathers have, what specific (learning?) pet projects can enhance our salary?
A: invariably poor ROI, mostly a mirage. Moving up is the most common success story.

  • xp: MSFM?
  • xp: c++
  • xp: low latency engineering knowledge, including sockets, kernel, pthread …
  • prepare for west coast high-end interviews
  • coding practice
  • data science

find min substring containing(but!!limited to)all my chars

Update — a similar sliding window is used in longest substring without repeating chars

Q (leetcode): Given a string Haystack and a string T, find the minimum window in Haystack which contains (at least) all the characters in T according to their frequencies. Time complexity O(n). Eg: minWindow(ccbabccbabcb, bbc)==bcb

If there is such a window, you are guaranteed that there will always be only one unique minimum window in Haystack. <– I thought this guarantee means something but it doesn’t.

Without loss of generality, I will assume the chars are a-z. I believe those Leetcode corner cases will use only 3 chars

—analysis—

For single-string problem, use array indexed by ascii code. I can convert T to such an array to store the required frequencies (reqFrq)

I can construct a shadow array, same length as Haystack with these payloads:

  • if the hay is not in reqFrq, then payload is a special value like nullptr
  • if the hay is in reqFrq, then….?

–SolSW: sliding-window based

  1. Scan Haystack from the left and keep counts of actual frequencies (checking against reqFrq each time). I will inevitably find the earliest good window. By construction, both ends of this window are chars in reqFrq.
    • Note the entire haystack is itself a good window (assuming one exists), just not minimal.
  2. Now I slide the fixed-size window. If I find another good window with redundant chars on the left, then I have found a shorter window, so I truncate my window on the left
  3. continue Step 2
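A minimal sketch of the classic expand-right/shrink-left sliding window (a cleaner variant of SolSW above; here the window size is not fixed):

#include <climits>
#include <string>
using namespace std;

string minWindow(const string& hay, const string& T) {
  int reqFrq[128] = {};                     // ascii-indexed array, as suggested above
  for (char c : T) ++reqFrq[(unsigned char)c];
  int missing = T.size(), bestLen = INT_MAX, bestStart = 0, left = 0;
  for (int right = 0; right < (int)hay.size(); ++right) {
    if (--reqFrq[(unsigned char)hay[right]] >= 0) --missing; // this char was needed
    while (missing == 0) {                  // good window: shrink from the left
      if (right - left + 1 < bestLen) { bestLen = right - left + 1; bestStart = left; }
      if (++reqFrq[(unsigned char)hay[left++]] > 0) ++missing; // left char now missing
    }
  }
  return bestLen == INT_MAX ? "" : hay.substr(bestStart, bestLen);
}
// minWindow("ccbabccbabcb", "bbc") returns "bcb"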

##OK,%%spare time had low ROTI..how about Theirs@@

I often feel bad that all of my efforts in my spare time had no tangible ROTI, but look around, who fared better?

Note this post is more about peer comparison (recrec blog), less about my spare time usage (open blog)

For the record my spare time effort did produce some tangible outcomes

  • coding drill in github and wordpress. I think most of my friends didn’t bother to record. Their practice is short-term.
  • yoga, jogging
  • blogging on housing (and school districts) — our real needs. The time spent convinced me to live in Bayonne
  • blogging on personal investment — complex. The time spent directly affected my investment decisions
  • blogging, discussion on boy. The time spent directly influenced my views, decisions and communications with family members
  • helping friends (Deepak,Ashish,YH) with job hunting
  • helping my kids with sports, piano, renzi (Chinese characters)
  • –less tangible:
  • studying risk systems, data science, crypto-currency? Helps way-finding and navigating the job market

[18] confidence-build`:another reason y I don’t look@answer

Update — my view was overridden by recent IV experiences. In math problem solving, I studied solutions then built my intuition. My peers actually do the same with algo challenges. They become very confident this way.

–confidence-build`:another reason y I don’t look@answer

Hi XR,

Just to share some minor thoughts.

I solved many “hard” leetcode problems and also the 25-horse puzzle without peeking at answers. These problems are non-trivial, and my solutions are usually efficient but not easy to understand. They are part of a tiny but growing wellspring of technical confidence. They reinforced my self-image as an analytical mind. Similarly, CSDoctor Wang often shows a healthy self-image of technical competence, based on many successful experiences. It’s hard to feel confident without experience, simply by reading other people’s answers. (Experience is not that effective at building confidence — we could get a string of unsuccessful experiences … but personal experience is usually the most important factor.)

Also, these problems gave me confidence during non-technical interviews, so I could say with conviction that

I am confident with algorithms + data structures even though I’m not the best. I solved more than 20 of the hard leetcode problems on my own, without reading any discussion.

I won’t have the conviction to say the same if I mostly read other people’s solutions.

I sometimes give this long answer to the behavior question “Q: tell me your strength and why”

A: The best comp science students might have good answer to, say, 30% of a question pool including familiar and novel problems. I’m slightly below their standard. However, if the question pool consists of only familiar problems like Leetcode problems, then they would be much stronger than me since I don’t have very high mileage on Leetcode and similar sites. If you want to benchmark me then please pick highly non-standard questions. I am fairly good at creating original algorithms.

Some naysayers would cast doubt — having solved those problems years ago doesn’t mean he can solve them now. Well, you can say the same about any skill, but we know that once the “circuitry” is wired up in our “system”, the problem-solving skills are enhanced. A programmer who has never solved these problems independently is clearly weaker.

##observations@high-volume,latency sensitive eq OMS #CSY

This is probably the biggest sell-side equity order-management-system (OMS) on Wall St, written in c++11. Daily order volume is probably the highest among all investment banks, presumably 7 figures based on my speculation, though a lot of orders get canceled, rejected or unfilled. I can’t reveal too many internal details due to compliance.

In contrast, GS used to get about a million individual trades (perhaps the partial fills of an order?) a day, probably not counting the high-frequency small trades.

  • synchronization — I haven’t noticed any locking or condition variable so far. I think single-threaded mode is faster than synchronized multi-threading. Multiple instances of the same software run in parallel across machines. I think this is in many ways better than one big monolithic process hosting many threads. We have 4 threads per instance in some cases.
  • ticking market data — is available, though I don’t know if my OM system needs them beside the restriction indicators
  • For crash recovery, every order and every fill is persisted in non-volatile memory, and often swapped out to disk to free up memory. These records are never cleared until EOD. Consequently, any fill can be busted any time before EOD; a recovery would reinstate them. So which “order objects” are persisted?
    • All pending orders and (for busting support) closed orders. Basically all orders.
    • Each logical order requires a chain of stateful FlowElement objects created on the fly to support the order.
  • data persistence — the OMS enriches every order and also generates new orders. These orders are persisted automatically in case of a server crash and restart. The persistence files are binary and cleared at EOD
  • RDBMS — is loaded into cache at Start-of-Day and seldom accessed intra-day. I confirmed it with an ex-DBA colleague.
    • However, some product DB system sends intra-day real time updates via messaging (not FIX)
  • MOM — I have not seen a message queue so far but they could be hidden somewhere. Earlier I heard other ibanks’ employees telling me Tibco (and similar messaging middlewares) were popular in fixed income but now I doubt it. Queues add latency.
    • We do use some pub-sub MOM (CPS) but not for order messages therefore not part of order flow.
  • socket — is not needed in any module. I believe the applications communicate via FIX, SOAP etc, on top of well-encapsulated TCP library modules.
  • garbage collection — no GC like in java and dotnet
  • CRTP — heavy use of CRTP. I don’t remember seeing many virtual functions. (See the sketch after this list.)
  • The most important message is the order object, represented by a FIX message. The order object gets enriched and modified by multiple functions in a chain. Then it is sent out via a FIX session to the next machine. As in any OMS, the order object is stateful. All order objects are persisted somewhere so a crash won’t wipe out pending orders.
    • (Elsewhere, I have seen very lean and mean buy-side OMS systems that don’t persist any order! After crash, it would query the exchange for order states.)
  • The 2nd most important message is probably the response object, represented by a FIX msg. If there are 100,000 order objects then there are roughly 300,000 response objects. Each order generates multiple responses such as Rejection, PendingNew, New, PartialFill, PendingCancel, Cancelled… Response objects probably don’t need to be persisted in my view.
  • The 3rd most common message is the report message object, again in FIX format. Each order object probably generates at least one report, even if rejected. Report objects sound simple but they carry essential responsibilities, not only regulatory reporting and client confirmations, but also trade booking, trade capture… If we miss an execution report, the essential books and records (inventory, positions..) would be messed up. However, these reports are not so latency-sensitive.
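Since CRTP came up above, here is a minimal sketch of the idiom (class names here are hypothetical, not from the actual system): the base class is templated on the derived type, so the call resolves at compile time with no vtable.

#include <iostream>

template <typename Derived>
struct FlowElement {                  // hypothetical name
  void process() { static_cast<Derived*>(this)->onOrder(); } // static dispatch
};

struct Enricher : FlowElement<Enricher> {
  void onOrder() { std::cout << "enrich the order\n"; }      // no virtual call
};

int main() { Enricher e; e.process(); }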

git | merge conflict #personal tips

I prefer cherry-pick.

Often the merge/rebase/cherry-pick/stash-pop operation would silently put some of the changes in _staging_, therefore git-diff would fail to show them. I have to use git-diff-HEAD instead:(

–If you have no choice then here’s the standard procedure

  1. git rebase master
  2. you get a merge conflict and file1 now contains <<<< ===
  3. vi file1 # to remove the bad stuff
  4. git add file1 # on the unnamed branch
  5. # no commit needed
  6. git rebase --continue # will bring you back to your feature branch

airport gate #maximum people alive

https://careercup.com/question?id=5153263227764736 defines the problem

Q (Amazon): In a city, the birth and death years of people who were born and died between 1900 and 2000 are given. Write an algorithm to find the year in which the maximum number of people were alive. Note the years are not unique and not sorted

Similarly,

Q (FlexTrade): For an airport gate system, flight arrival/departure times are given for yesterday. What’s the maximum number of gates required at the busiest time?


Solution1: O(N logN). Merge-sort all timestamps, then scan them in one pass: on an arrival increment a counter, on a departure decrement it, tracking the running max.

??Solution2 (assuming arrival times are pre-sorted) Using hashtable, keyed by arrival time. Value is a count of flights arriving at that time. Every arrival creates or updates in the hashtable. Every departure deletes or decrements. Maintain a separate total count.

I think we still need sorting.

Solution3: O(N). Use an array if all the years are small integers. (A regular flight timestamp is also a small integer — 0 to 2355 in steps of 5.) Record all arrival/departure events as +1/-1 in an array indexed by year, then a single prefix-sum scan finds the max.
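A minimal sketch of Solution3 for the birth/death version (my assumption: a person who dies in year Y still counts as alive during Y):

#include <utility>
#include <vector>
using namespace std;

int maxAliveYear(const vector<pair<int,int>>& people) { // {birthYr, deathYr}
  int delta[102] = {};                    // years 1900..2001, relative to 1900
  for (const auto& p : people) {
    ++delta[p.first - 1900];              // arrival event
    --delta[p.second - 1900 + 1];         // departure: stop counting after death year
  }
  int alive = 0, best = 0, bestYear = 1900;
  for (int y = 0; y <= 100; ++y) {        // prefix-sum scan
    alive += delta[y];
    if (alive > best) { best = alive; bestYear = 1900 + y; }
  }
  return bestYear;
}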

Longest Parentheses run with multiple hierarchies

Q (Leetcode): Given a string containing nothing but the characters ‘(‘ and ‘)’, find the length of the longest valid (well-formed) parentheses substring.

https://github.com/tiger40490/repo1/blob/cpp1/cpp/str/maxParensRun.cpp is my solution 100% tested on Leetcode

–My Single-iteration solution:

Challenge is data structure. I ended up with 2 data structures to be updated during the iteration

  1. A stack (holding openers’ index values) to locate the matching openers
  2. an array to save “scores”

For each closer, I will record the position of the matching opener, then compute the distance (minimum two).
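For comparison, a minimal sketch of the common one-pass stack solution, close in spirit to my two-structure design; a sentinel index replaces the score array:

#include <algorithm>
#include <stack>
#include <string>
using namespace std;

int longestValidParens(const string& s) {
  stack<int> st;
  st.push(-1);                       // sentinel: index just before the current run
  int best = 0;
  for (int i = 0; i < (int)s.size(); ++i) {
    if (s[i] == '(') st.push(i);     // opener's index
    else {
      st.pop();                      // match an opener (or consume the sentinel)
      if (st.empty()) st.push(i);    // unmatched ')': new base for future runs
      else best = max(best, i - st.top()); // distance to the unmatched boundary
    }
  }
  return best;
}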

simple Windows c++(python)set-up #Eclipse is better4java

See also easiest windows GCC installer #c++17

GitBash + StrawberryPerl is the best combo for me on Windows. I put g++ commands in bash scripts to automate my GCC build and test. Note my typical project has at most 20 source files.

GitBash + StrawberryPerl + Notepad++ is better for me than any c++ IDE like Eclipse CDT (4 installations), Bloodshed (4), CodeBlocks (1), NetBeans…

  • I don’t need code completion..
  • I don’t need to jump into a type definition.
  • I don’t need my debugger to be visual. StrawberryPerl includes gdb
  • I use Notepad++ for text search across hundreds of files
  • I use Notepad++ for search/replace
  • I use diff and git-diff to see code changes
  • I use github for version history

I’m such a die-hard fan of command line that the only GUI development tool I use is notepad++ and it’s optional. Since 2015, my Linux c++ code editor has been VIM, which I used on large codebases consisting of 1000+ source files.

For about 2 years EclipseCDT was my default choice; then Bloodshed became my default as it is simpler than EclipseCDT. However, both are still over-complicated. I didn’t have a favorite until 2017, when I discovered Notepad++/GitBash/StrawberryPerl

Q: why do I use Eclipse for Java but not for c++?
A: Eclipse is more problematic, more messy for c++ than for java. The benefits (convenience, automation..) are offset by the high TCO (total cost of ownership). For example, the java debugger, java code navigation, and java renaming/refactoring all work better than their c++ counterparts.

Q: how about MS VisualStudio?
A: I managed a large c++ codebase on MSVS 2015 in 2015-2016, about a year+. Too complicated, worse than MSVS for c#. I would never prefer it over my command line set-up, if given a choice.

By the way, GitBash works well with a standard python installation, but I do recommend the tweak at https://stackoverflow.com/questions/32597209/python-not-working-in-the-command-line-of-git-bash

easiest MSWindows G++ installer #c++17

http://strawberryperl.com is an easy installer for Perl and it bundles GCC. I was able to compile in c++17 mode:

g++ -std=c++17 my.cpp

My experiences with cygwin and mingw were both horrible.

  1. cygwin — too bulky and unnecessary, and it affected my existing Windows features. I tried cygwin once many years ago and quickly concluded that it was designed for people unlike me. Those people would have only positive experiences with cygwin, experiences I don’t share.
  2. mingw — supposed to be minimalist, but I tried installing it at least 3 (up to 6) times and always painful.

In contrast, I installed strawberryPerl about 3 times and once on a friend’s laptop … always quick and easy. No tweaking required. GCC + gdb works out of the box.

gdb to show c++thread wait`for mutex+condVar shows gdb works out of the box.

Through almost 10 installations of GCC on various windows laptops, I have come to the obvious conclusion: StrawberryPerl is the easiest installer.

git-stash survival guide

Two simple steps:

git stash #no args

# Now git diff will no longer show the uncommitted changes:) …later

git stash pop

# Now git diff shows the original uncommitted changes

If git-stash-pop hits a conflict on 1 file out of 3:

  • the one file content will show >>>>. I will leave it alone for now
  • the other 2 files would show the original uncommitted changes. git-diff-HEAD would show these changes.
  • now I can use git-reset-HEAD and git-diff

reference(instead of ptr) to smart ptr instance

I usually pass smart pointers by value (copy-constructor or move-constructor), just like copying a raw ptr.  Therefore the code below looks unnatural:

unique_ptr<Trade> & ref2smartPtr

Actually it is rather common because

  • As Herb Sutter suggested, when we need to put pointers into containers, we should avoid raw ptr. Unique_ptr is the default and first choice, followed by shared_ptr
  • I often use unique_ptr as a map value. The operator[] return type is a reference to the value type, i.e. a reference to unique_ptr (see the sketch after this list)
  • I may also put unique_ptr into a vector…. ditto for vector operator[]
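A minimal self-contained illustration (the Trade fields are my invention):

#include <map>
#include <memory>
#include <string>
using namespace std;

struct Trade { double px = 0; };

int main() {
  map<string, unique_ptr<Trade>> book;
  book["IBM"] = make_unique<Trade>();
  unique_ptr<Trade>& ref2smartPtr = book["IBM"]; // must bind a reference --
  ref2smartPtr->px = 101.5;                      // unique_ptr is move-only, no copy
}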

git | fork auto-sync

https://confluence.atlassian.com/bitbucketserver/keeping-forks-synchronized-776639961.html says

Fork syncing helps you to keep your fork in Bitbucket Server up-to-date with changes in the upstream repository. Bitbucket Server can do this automatically for all branches (and tags) you haven’t modified in the fork.

I guess forking (and pull requests) are not features of GIT per se but features of the Bitbucket or GitHub server. I think the server gets notified upon a new commit and runs a fast-forward git pull.

git | rebasing: basics

For me, rebase is the procedure to re-sequence commits… A simple form of history rewrite.

git pull --rebase # moves your local commits to the end of the linked list
How do I merge multiple Work-In-Progress commits into a single commit?

The following will take the last four commits and give you an interactive session that allows you to (optionally) squash them.

git rebase -i HEAD~4

More details are here http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html

git-diff tips #investigating merges

git diff --help # manpage

git diff --color-words # good

git diff -U22 # 22 lines of context

3 forms of diff — See 3trees@git

— diff of a merge-conflict file is different.

I think it is a 3-way diff. Output shows current “merged” file vs orig1 and orig2. https://git-scm.com/docs/git-diff#_combined_diff_format explains “++”

— if you see a line showing up due to nothing but white space change, you can usually use “-w” to suppress it but sometimes you need to pinpoint the white space change:

git diff |cat -vet

git | reverting a file: various ways

I didn’t like git-revert as it complicates history. See https://www.atlassian.com/git/tutorials/undoing-changes/git-revert

How do I revert a file to remove bad commits?

git checkout HEAD -- <file or directory>

This use of git-checkout is unrelated (as far as I know) to the commit-level checkout. This solution is officially /sanctioned/ — some git commands suggest it:

(use "git checkout -- <file>..." to discard changes in working directory)

How to Revert non-committed changes

git reset --hard HEAD  # reverts tracked files back to the head
git clean -fxd .   # deletes untracked files in current dir. You may want to cd to the repository base first

[Victor] My vote for best explanation of git-reset goes to https://git-scm.com/blog/2011/07/11/reset.html. Detailed, but more readable than the manual. Many examples are useful to me.

Warning — if you want to temporarily play with an older version of file1.java, then be careful. Make sure you save the current tip of your branch. If you git-reset --hard then you may lose that tip!

How to Revert branches back to origin

git fetch origin
git reset --hard origin/yourBranch

git-cherry-pick commits to push to main branch

How to Cherry-Pick Commits to Push to main ‘develop’ branch

Say you are working on a feature branch with other people, with you all making commits and pushing to the remote replica of this branch. At some point you will want your commits merged with develop/production. Git does provide a method of choosing which commits you would like to push. The easiest way to explain this is with an example. We assume that all the following commands are run in a command line tool.

git checkout feature/commodity
git log -3 --pretty=oneline #look at the 3 most recent commits to the repository (includes all commits to all branches - in a nicely displayed format)

Looking at this log, I want to push only the latest change to the ‘develop’ branch (via a pull request), the commit associated with the e5a67d7088622dd8821dfdd8e2fafa2dc938b75c hash. Keeping a record of this hash, I then create a new feature branch from the ‘develop’ branch in Stash. Assume this new branch is called feature/cherry_pick

git fetch #update local repo to get this new branch info
git checkout feature/cherry_pick #switch to this new branch, at the moment is exactly the same as origin/develop
git cherry-pick e5a67d7088622dd8821dfdd8e2fafa2dc938b75c #where the hash number is the commit I actually want to merge into 'develop'
git commit #only needed if the cherry-pick hit a conflict; otherwise cherry-pick commits automatically
git push #push this change to the remote feature/cherry_pick branch

You can now create a pull request in Stash to merge this cherry-picked commit into ‘develop’

git | diff tools #BeyondCompare,Stash..

–Method: git diff

This is a basic but versatile tool. It looks at two commits you provide. Note each tag represents a commit. Each branch name also represents the commit at the tip of the branch.

–Method: Stash/bitbucket comparison between branches/tags

Stash supports a directional PR diff, distinct from the git-diff command. Based on the pull-request concept, Stash treats your two commits as a FROM and a TO and shows the sequence of commits between them, as if in a pull-request.
If you click the arrow to reverse the direction, then Stash will always show a different diff result!
  • If from A to B there’s a fast forward merge, then Stash PR diff from B to A would show zero commit, since such a pull request would be empty.
  • If between A and B there’s no fast-forward merge, this PR-diff would show fewer commits than git-diff.
In summary, PR-diff is directional; git-diff is directionless. This directional diff feature is kind of unique to Stash and therefore valuable.

–Method: have two local repos (clones) to check out different branches

Figure out how to clone two physical repos. After that you could use many conventional diff tools.

–Method: git diff using Beyond Compare 3

http://www.scootersoftware.com/support.php?zz=kb_vcs#gitwindows worked for me, including a 3-way merge GUI —

git mergetool feature/my_br6 origin/feature/my_br6 -- buildChoops.sh

–Method: git diff using Winmerge

see http://stackoverflow.com/questions/1881594/use-winmerge-inside-of-git-to-file-diff, updated June 2015

I hope there are similar solutions for other diff GUI tools.

–Method: download remote version as complete file, then diff that against local file

Here’s my favorite choice to diff a remote branch version vs a local branch version:

mv buildChoops.sh buildChoops.local
git checkout origin/feature/my_br6 buildChoops.sh # download the file from the remote branch into the current directory locally
... now run any diff tool between the two files

tidy up messy commits, for pull request

A common situation: you make many incremental, quick and dirty commits on your branch, undoing and redoing changes. You don’t need to push the entire history to a shared branch like Develop.

Many solutions exist. Git offers many commands to rewrite history on a branch. Many git users use them regularly. We could use them to clean up the commits on a feature branch before pulling them into a pull request. Here are some of the methods.

This will rewrite the history of your branch; you should only do this if no one is sharing your branch (or you don’t care about them).

1st approach – rebase, if your history is clean

This example will let you interactively select from the last four commits.

git rebase -i HEAD~4
# Alternatively, specify the last-good commit:
git rebase -i e8a9cf093   # Rebase all subsequent commits

2nd approach – If your history is too complicated (many merges etc.)

git tag backup                      # we do this so that we don't lose our history
git checkout -B temp origin/develop # checkout a clean temporary branch from develop
git merge --squash <branch>         # merge our old branch in, squashing all the commits
git commit
git branch -f <branch> temp         # force the branch to point at the new commit

3rd approach – git reset and commit again

Suppose you have 5 commits to squash,

git reset --soft <a previous commit hash, or something like HEAD~5> # rollback to the named commit.
# reset --soft modifies only commit history, not work tree or index.
# Now you can look at the files involved in those 5 commits, staged for the next commit
git diff --cached --name-status
git commit -m 'your msg'
# git push ... and create pull request

All methods so far assume the complicated commit history is in local repo not on a Stash branch. If your Stash branch has stuff you want to pull into Develop, but your Stash branch has complicated history, then …

4th approach – create a pull-request candidate branch

# create the candidate_branch from Develop, on stash
git tag backup                      # we do this so that we don't lose our history
git fetch                           # will show the new branch
git checkout candidate_branch
git merge --squash old_branch       # merge our old branch in, squashing all the commits
git commit                          # the default msg shows list of files. you might want to uncomment that in the msg.
git push origin candidate_branch    # now the candidate branch is up-to-date on Stash. Ready for pull request.

5th approach – rewrite public history

Dangerous – Know what you are doing
# clean up your local branch as illustrated above
git push --force origin   your_Stash_branch    # overwrite the Stash branch and overwrite public history - Dangerous!

6th approach – edit commit message after many merges, using git-filter-branch

The git-rebase and git-cherry-pick commands often fail in the presence of a complicated history of merges, file deletes etc. Try filter-branch.

Dangerous – can modify many commit messages by accident
# clean up the branch xyz
git checkout -b xyzRewrite # Always do the rewrite on a cloned branch please
git filter-branch --force --msg-filter 'sed "s/addYENRates/[CFMQUANT-262] addYENRates/" ' e635d034ea8..HEAD

At the end of the command, the 1st hash is the parent of the flawed commit. The 2nd endpoint must be HEAD (or perhaps the name of a branch).

Command should tell you the branch is rewritten. If no commit message matches the pattern, then the command would tell you the branch is left unchanged, so it’s safe if you have a typo that matches nothing.

In either case, you can then run git diff against the original branch and verify they are identical content-wise.

In this real example we were lucky the message “addYENRates” was rather unique. If it shows up on 2 commits, then both would be rewritten.

sequence@{max within sliding window} O(N) no deque #40%

Q (leetcode hard problem 239): Given an array nums, there is a sliding window of size k which is moving from the very left of the array to the very right. You can only see the k numbers in the window. Each time the sliding window moves right by one position, return the max item in it. Total run time should be O(N) but I think within each window we may not always get O(1).

https://www.interviewbit.com/problems/sliding-window-maximum/ is the same problem stated more clearly.

====analysis====

Since there exists an O(N) solution, I’m confident I can figure out an O(N) solution.

–idea — first scan to identify all the troughs. Items around trough to be removed, but always keep original subscripts of remaining nodes

–latest idea, not using a deque, therefore likely original —

Update — consider a rising sequence before going any further. Virtually all items are showmen…

in my showman-stack, every element {idx,height} will show up on stage at least one episode. I would say each showman goes on stage either when it first enters the sliding window, or before it leaves the sliding window. If that’s the case at each slide i only need to consider the head and tail.

(Now i think this is in the right direction but is messier than it needs to be.)

first forward scan populates this stack and ensures only showmen are included. Let’s work out some examples. First part of first scan would simply find the max among first K items. 2nd scan might be combined with first scan, but for now, my goal is clear and focused — produce a clean stack of showmen-only

  • degenerate case — monotonic sequence requires virtually all elements be included
  • rules on the forward scan —
    • every new item needs to be pushed onto the stack as it could be higher than all followers. The tricky logic is what happens before the push
    • Before we examine a new item, top of the stack is always the immediate predecessor. Need to add assert.

pseudo code:

O(N) pre-scan to find all troughs. For each of N items, record the preceding trough. No stack yet.

First scan: check if the most recent trough is within range. If NO then nothing to pop from stack, so simply push new item and advance. Now assuming YES.

while new item >= top of stack __&&__ the 2nd top is within range (distance <=K):

keep popping

while new item >= top of stack && distance between is below K && there would still exist an earlier higher item within range:
//at end of the while loop, we could actually produce some output, but I would postpone it to 2nd scan

(For the last K items in array, we need some O(K) special handling.)

I think both conditions are necessary for popping the stack. Are these two conditions alone sufficient justification to remove the top?

—- segment-based solution #elegant
https://stackoverflow.com/questions/8031939/finding-maximum-for-every-window-of-size-k-in-an-array shows an elegant segment-based algo.

Suppose window size := 10 and subscripts start at 0.
A) Conceptually we split the array into fixed segments of length 10. (The stub at the end is LG2.) Note unlike the ring algorithm, no physical data structure is implemented for these purely conceptual segments, but this conceptual data structure is essential to this clever solution.
B) Pre-scan – pre-scan the array twice .. rightward and leftward to populate two physical data structures, the LR-lookup and RL-lookup i.e. segment-wise max-till-here. For example,

RL[22] := max(#29 to #22) := leftward max-till-22 within the enclosing segment@20, which covers #20 to #29.

With the two auxiliary data structures, the RL lookup and the LR lookup, the solution is mind-bogglingly simple and elegant. For example,

C) window@22 will overlap segment@20 and segment@30. The 29-to-22 section is covered by RL[22] and the 30-to-31 section is covered by LR[31]. So simply compare RL[22] vs LR[31].

I find this solution more intuitive, more visual than the ring solution. After the pre-scans, there’s no data structure to maintain so the iterations are stateless and extremely straightforward.
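A minimal sketch of this segment-based algo, under my reading of the webpage (LR = rightward per-segment prefix max, RL = leftward per-segment suffix max):

#include <algorithm>
#include <vector>
using namespace std;

vector<int> slidingMax(const vector<int>& a, int k) {
  int n = a.size();
  vector<int> LR(n), RL(n), out;
  for (int i = 0; i < n; ++i)                    // rightward pre-scan
    LR[i] = (i % k == 0) ? a[i] : max(LR[i - 1], a[i]);
  for (int i = n - 1; i >= 0; --i)               // leftward pre-scan
    RL[i] = (i % k == k - 1 || i == n - 1) ? a[i] : max(RL[i + 1], a[i]);
  for (int i = 0; i + k <= n; ++i)               // stateless main scan
    out.push_back(max(RL[i], LR[i + k - 1]));    // the window's two sections
  return out;
}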

Q: what if increasing sequence?
Q: what if decreasing sequence?

—- The same webpage also has a concise explanation of the deque-based solution, kind of similar to but cleaner than my showman idea

• I will use a ring buffer of size=W as a ring is allocated at start-up time (at compile time if W is compile-time constant) and requires no dynamic allocation.
• Ring is FIFO. Terminology — I will say the earlier elements are removed from the Head like a ticketing queue; new elements are appended at the Tail.
• Invariant — I think within this ring, payloads are descending from head to tail.
• Ring holds subscripts, not payloads. Payloads could be strings or floats.
• At each iteration, 0 or 1 head item is removed if it is out of reach.
• At each iteration, incoming item kicks out all smaller tail items, as they are too close to the incoming giant. This operation enforces the invariant.

As a result, head of the ring (as a subscript) always points to the current max payload.
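A minimal sketch of the deque solution (std::deque standing in for the fixed ring, for brevity):

#include <deque>
#include <vector>
using namespace std;

vector<int> slidingMaxDeque(const vector<int>& a, int k) {
  deque<int> ring;                    // subscripts; payloads descend head-to-tail
  vector<int> out;
  for (int i = 0; i < (int)a.size(); ++i) {
    if (!ring.empty() && ring.front() <= i - k)
      ring.pop_front();                             // head is out of reach
    while (!ring.empty() && a[ring.back()] <= a[i])
      ring.pop_back();                              // kicked out by the incoming giant
    ring.push_back(i);
    if (i >= k - 1) out.push_back(a[ring.front()]); // head = current max
  }
  return out;
}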

overcoming exchange-FIX-session throughput limit

Some exchanges (CME?) limit each client to 30 orders per second. If we have a burst of orders to send, I can see two common solutions: A) upstream queuing (see the throttle sketch at the end of this section) B) multiple sessions

  1. upstream queuing is a must in many contexts. I think this is similar to TCP flow control.
    • queuing in MOM? Possible but not the only choice
  2. an exchange can allow 100+ FIX sessions for one big client like a big ibank.
    • Note a big exchange operator like nsdq can have dozens of individual exchanges.

Q: is there any (sender self-discipline) flow control in intranet FIX?
A: not needed.
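To illustrate option A, here is a hypothetical token-bucket throttle the sender could put upstream of the exchange-facing FIX session (the 30/sec figure comes from the question; everything else is my assumption):

#include <algorithm>
#include <chrono>

class OrderThrottle {
  double tokens = 30, rate = 30;      // 30 orders per second
  std::chrono::steady_clock::time_point last = std::chrono::steady_clock::now();
public:
  bool tryAcquire() {                 // call before each order send
    auto now = std::chrono::steady_clock::now();
    tokens = std::min(rate, tokens +
             rate * std::chrono::duration<double>(now - last).count());
    last = now;
    if (tokens < 1) return false;     // caller should queue the order
    tokens -= 1;
    return true;
  }
};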

rotate square matrix 90-degree #clever swap

https://leetcode.com/problems/rotate-image/discuss/18872/A-common-method-to-rotate-the-image shows

/*
 * clockwise rotate
 * first reverse up to down, then swap the x/y subscripts
 * 1 2 3     7 8 9     7 4 1
 * 4 5 6  => 4 5 6  => 8 5 2
 * 7 8 9     1 2 3     9 6 3
*/

For anticlockwise rotation, personally I would reverse horizontally, then do the same swap

This technique relies on swap(). What if the payload objects are blackboxes (or immutable) so you can’t even see contents? Just swap the 2 pointers.
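A minimal sketch of the clockwise reverse-then-transpose trick:

#include <algorithm>
#include <vector>
using namespace std;

void rotateClockwise(vector<vector<int>>& m) {
  reverse(m.begin(), m.end());              // first reverse up to down
  for (size_t r = 0; r < m.size(); ++r)     // then swap the x/y subscripts
    for (size_t c = r + 1; c < m.size(); ++c)
      swap(m[r][c], m[c][r]);
}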

— My own solution is more legwork but possibly more versatile. Suppose the matrix is N by N

m=N-1. Start from [0 0]; find its new home, move in and kick out the old tenant at [0 m] … Each group has 4 nodes: a -> b -> c -> d -> a

See also spiral … If N is odd, then the kernel node stays unchanged. If N is even, then the innermost shell is a 2×2 matrix.

aggregation unit

I now believe there are at least two purposes, not necessarily reflected in any systems I worked on.

  • Purpose: FINRA regulatory reporting on aggregate short positions on a given stock like AAPL. Probably under Regulation SHO
  • Purpose: Self-trades (also wash trades) create a false impression of activity. I believe trading volume for AAPL would be artificially inflated by these trades. “Bona fide” trade reporting is expected. To deal with self-trades, a firm needs to exclude them in trade reporting. But what if a self-trade involves two trading accounts or two algorithms? Are the two systems completely unrelated (therefore not a self-trade), or do both come under a single umbrella (therefore a self-trade)? That’s why we assign an “aggregation unit” to each account. If the two accounts share an AggUnit then yes, it’s a self-trade.

O(1)space,O(1)search,O(N)sort: tricks

Every time I see O(1) space required on an array problem, I think of …. swapping.

Every time I see O(1) space required on a list problem, I ask: is it a ….. linked list?

Every time I see O(N) time required on an array problem, I think of … radix sort, applicable to 64-bit integers, 64-bit floats and strings.

Every time I see O(1) search, I think of … hash table and radix array

overvalued analytics applications #CSDoctor

Is GPS navigator necessary? Are side signal lights necessary? Some things are more necessary than others.

Trading system, risk systems, market data systems are mainstream. In contrast, I have found out that pricing analytics system is not really mainstream. Many buy-side and most smaller sell-side firms don’t use any pricing analytics beyond rudimentary derivations from raw market data.

There are also a number of sophisticated derivative and fixed-income analytics vendors including Bloomberg, Murex, Numerix.. These vendors focus on analytics so their customers don’t need deep expertise. OCBC’s quant team’s main job was validating the analytics offered by Murex.

Pricing analytics tools are “advisory” and not mandatory. The creators of those tools (like CSDoctor) tend to over-value their creations as if they are going to make the traders faster, safer, more profitable. In reality, traders can always choose not to use them.

As a contrast, take market data as an example – 80% of trading shops need to build or buy a market data system as they can’t operate without it.

Don’t use null/None to indicate empty

Update — [[c++codingStandard]] says we should use (smart) ptr parameter if it can be null.

Many developers return None (python) / null (java), or put such a value into containers, to positively indicate an empty value.

This practice creates unnecessary ambiguity (rather than reducing ambiguity) because many built-in module authors use None/null when they have no choice. If you positively return this value, it’s harder to differentiate the various scenarios. This becomes an unnecessary ambiguity when troubleshooting in production.

In Java, I often create a dummy instance of a MyType and return it to mean “empty”. I think this is applicable to c++ too (see the sketch at the end of this section).

In python, I tend to return a negative number from a function that ordinarily returns  a MyType, since python functions can return type1 in ContextA but type2 in ContextB! Python list also allows unrelated types.
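A minimal c++ sketch of the dummy-instance idea (the MyType fields are my invention):

#include <string>

struct MyType {
  std::string id;                        // empty id marks the dummy
  bool isEmpty() const { return id.empty(); }
  static const MyType& empty() {         // one shared sentinel instance
    static const MyType e{};
    return e;
  }
};

const MyType& lookup(bool found, const MyType& real) {
  return found ? real : MyType::empty(); // never returns null
}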

## 9 short but deep c++IV Q

  • [0x] Q: exactly .. differences and similarities between auto_ptr and unique_ptr?
  • [0x] Q: exactly .. what’s the usage of weak_ptr?
  • [0x] Q: exactly when do you need to call std::move()
  • [0x] Q: what kind of things can be assigned to a rvr variable?
  • [0x] Q: give some examples of famous move-only types and explain why and how they are used
  • [0x] Q: give an exact example of exception safety enhancement provided by make_shared
  • [0x] Q: beside the big6, what kind of functions would accept a rvr param? push_back().. rather few outside the STL
  • — also-ran:
  • Q: why catch by reference?
  • Q: explain some consequences when a dtor throws?
  • [0x=c++0x]

isSymmetric(root of a binary tree)

Leetcode published solution has a clever BFT solution, easier to implement.

–idea 1 (untested): BFT to list all nodes at a given level
For each node’s enqueue() (including the null nodes), record the path-from-root as a list. As a lighter alternative to this “list”, the path can degenerate to the last step, as a le/ri flag.

Now scan each level from both ends. The left item’s path should mirror the right item’s path.

(Before the scan, confirm the node count is even.)

–in-order dft
first scan records the paths to each leaf node. (Optionally, each path can include east/west directions.)
2nd scan does the same but always visits right child first.

The two outputs should match
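A minimal sketch of the common pair-wise BFT check (my guess at the spirit of the published solution): enqueue mirror-image node pairs and compare them.

#include <queue>
#include <utility>
using namespace std;

struct Node { int val; Node *left, *right; };

bool isSymmetric(Node* root) {
  if (!root) return true;
  queue<pair<Node*, Node*>> q;
  q.push({root->left, root->right});      // a mirror-image pair
  while (!q.empty()) {
    auto [a, b] = q.front(); q.pop();
    if (!a && !b) continue;               // both null: mirror holds here
    if (!a || !b || a->val != b->val) return false;
    q.push({a->left,  b->right});         // outer children
    q.push({a->right, b->left});          // inner children
  }
  return true;
}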

## UPSTREAM tech domains: defined by eg

It pays to specialize in a domain that helps you break into similar domains. (Unsuccessful with my quant adventure so far.)

Note “wrapper” APIs are never upstream. Very common in big banks including Qz.

  • #1 eg: [L] socket —- C++ is the alpha wolf, leader of the wolf pack
  • [D] market data —- the Nyse/Nasdaq (binary) feed parser seems to be the leader, over the slower FIX feeds
  • [D] execution algo —- sell side equities desk seems to be the leader in terms of volume and variety of algos
  • data analytics language —- python looks like the leader
  • eg: [D] alpha —- equities trading is the leader
  • eg: threading —- java is the leader of the pack. In c++ threading is too hard, so once I master some important details in java threading, it helps me build zbs in c/c#, but only to some extent
  • eg: regex/string —- Perl is the leader
  • eg: [D] consolidated firm-wide risk —- SecDB is the leader. Applicable to buy-side? I think so.
  • lambda —- C# is the leader. Scripting languages follow a different pattern!
  • list/stream —- Linq is the leader
  • drv pricing theory —- Black-Scholes —- feels like a leader or upstream, but not /convincing/
  • [L] heap/stack —- C++ is the leader. Java and c# are cleaner, simpler and hide the complexities.
  • defensive coding —- (and crash management) C++ is the leader, because c++ hits more errors, like Japan on earthquake management.
  • eg: collections —- C++ is the leader of the pack. There’s a lot of low level details you gain in STL that help you understand the java/c#/python collections
  • eg: [L] latency —- C++ is the leader of the pack in low-level optimizations .. In-line, cache-efficiency, footprint..
  • eg: [L] pbref^val —- C++ is the leader. C# has limited complexity and java has no complexity here
  • [L = lowLevel]
  • [D= well-defined application domain]
  • ? automated pricing —- no upstream
  • ? Fixed income trading systems —- no upstream. IRS^Treasury^credit bonds (trading platforms) have limited features in common.
  • [D] VaR/risk analytics —- FI derivatives in big banks seem to be the leader, but I feel one trading desk in one firm may not respect another system

##controversial decisions #home,imm,retire..#YJL

Hi Junli,

You don’t need to reply. This is my periodic review of “everything in my life”.

I have recently implemented a few controversial decisions about my career, investment, family..

(As an example, the biggest is moving back to U.S. alone and starting the green card process.)

I make major decisions carefully and slowly (unless decisiveness is needed), but an observer may say I’m not a good decision maker and point out my track record. Actually I don’t remember anyone pointing them out, not even my family members. The person who points a finger at my “unwise” decisions is the “judge” in my head…

Here are some of those controversial decisions

  • I will not give up Singapore citizenship, and I will retire in Singapore, relying on the Singapore government for my retirement. Singapore system is much more caring and efficient than China or U.S. systems.
  • I plan to work till 70 or older, perhaps for a token salary. I will keep up my interview skills.
  • I have stayed away from most of the new technologies — javascript, mobile apps, big data, social media, noSQL, block-chain … Instead, I have embraced the shrinking domain of c++
  • I feel my relationship and communication skills are not my strengths so through a series of trials-and-errors I have decided to stick to a technical career.
  • I’m staying in Bayonne, planning to buy my first home here. The schools are just above average.
  • I have always preferred home locations that don’t require a car.
  • At age 44 I decided to leave my family in Singapore and come to the U.S. to start the GC process

tried 3″hard”leetcode Q’s #tests !! 100%

I tried Q4, Q10, Q23.

Observation — they are not really harder in terms of pure algo. I found some “medium” questions actually harder than Q4/Q23 in terms of pure algo.

Beside the algorithm, there are other factors that make a problem hard. For me and my peers, coding speed and syntax are a real problem. So the longer my program, the harder it becomes. Some of the “medium” questions require longer solutions than these “hard” problems.

Logistics of instrumentation is another factor. Some problems are easy to set up and easy to debug, whereas 3D, graph or recursive problems are tedious to set up and often confusing when you try to debug with print’s.

There’s another factor that can make any “medium” problem really hard

pick java if you aspire 2be arch #py,c#

If you want to be architect, you need to pick some domains.

Compared to python.. c#.. cpp, Java appears to be the #1 best language overall for most enterprise applications.

  • Python performance limitations seem to require proprietary extensions. I rarely see a pure python server that’s heavy-duty.
  • c# is less proven, less mature. More importantly it doesn’t work well with the #1 platform — linux.
  • cpp is my 2nd pick. Some concerns:
    • much harder to find talents
    • Fewer open-source packages
    • java is one of the cleanest languages. cpp is a blue-collar language, rough around the edges and far more complex.

reverse slist in K-groups

https://leetcode.com/problems/reverse-nodes-in-k-group/description/ is the problem I tried today, not a classic problem. Challenge is not the algorithm per-se but the Edit-Compile-Test-Debug cycle. I think some of us can come up with a conceptual algorithm quickly, but to implement it correctly took me hours.

Similarly, the problems below are not tough due to algorithm, but the ECTD cycle can take hours, sometimes due to c++ iterator pitfalls, sometimes because we can’t easily visualize the data structure. I wrestled with all of these problems, so please feel free to try them and discuss with me.

* print any tree (you can start with a binary) by level, in zigzag sequence
* given a linked list, write a function to remove all nodes greater than 55 (or any user input). Return the head of the modified list.
* https://www.geeksforgeeks.org/zigzag-or-diagonal-traversal-of-matrix/
* https://www.geeksforgeeks.org/create-a-matrix-with-alternating-rectangles-of-0-and-x/
* https://bintanvictor.wordpress.com/2018/02/06/spiral-number-printer/

As decided last week, I didn’t bother to run the Leetcode test suit. They make me feel frustrated, worthless, defeated, inferior, weakling, quitter…. Without these tests I ran my own tests and I feel like a joyful hacker.

Even though I may not pass all Leetcode tests, I feel my code is reasonable quality and I’m proud of it.

—-Problem is well-defined but not very common.

Given a linked list, reverse the nodes of a linked list k at a time and return its modified list.

k is a positive integer and is less than or equal to the length of the linked list. If the number of nodes is not a multiple of k then the left-out nodes at the end should remain as they are. O(1) space. Hopefully O(N) time.

—-My sol1: use my existing O(1) solution but now keep a count.

https://github.com/tiger40490/repo1/blob/cpp1/cpp/linkedList/linkedListReverseInK_group.cpp

The first group and the last group are both tricky and can take up hours.
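For contrast, a minimal sketch of one common approach (not necessarily the linked solution): count k nodes, recurse on the remainder, then reverse the current group in place. Note the recursion depth is N/k, so unlike my solution it is not strictly O(1) space.

struct ListNode { int val; ListNode* next; };

ListNode* reverseKGroup(ListNode* head, int k) {
  ListNode* probe = head;
  for (int i = 0; i < k; ++i) {         // do k nodes remain?
    if (!probe) return head;            // fewer than k: leave them as they are
    probe = probe->next;
  }
  ListNode* newHead = reverseKGroup(probe, k); // process the rest first
  ListNode* cur = head;
  for (int i = 0; i < k; ++i) {         // reverse this group onto newHead
    ListNode* nxt = cur->next;
    cur->next = newHead;
    newHead = cur;
    cur = nxt;
  }
  return newHead;                       // old head is now this group's tail
}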

xx: weekendCIV^codingDrill^work

The amount of tech learning and zbs growth over weekend coding assignments is different from coding drills or work projects:

  • intensity — highest
  • QQ halos — comparable to my language feature experiments — improves QQ
  • scale — larger than coding drill but not too large like work projects
  • BP — is a major learning focus and evaluation criteria
  • absorbency — highest
  • sustained focus — lasting over a few days, pretty rare for me

NAT: linux server can become a simple router

See https://www.karlrupp.net/en/computer/nat_tutorial.

What limitations are there? I only know a few

  • need multiple network interfaces

Here’s how we use NAT to translate destination port number in every incoming IP packet:

[sa@rtppeslo2 ~]$  sudo iptables -t nat -nL
Chain PREROUTING (policy ACCEPT)

target     prot opt source               destination         
DNAT       udp  --  0.0.0.0/0            224.0.121.4         udp dpt:18020 to::48020 
....

Math+intelligence ] trading ≠> high value-add

  • — Examples of math applied outside traditional proven quant domains like VaR
  • Trade analytics, execution analytics systems — analyzing past executions, using statistical tools to derive some empirical or parametric distribution.
  • Sell-side pre-trade analytics to evaluate a proposed trade….
  • Real-time risk analytics

Q: How much math in these systems? Not that much.

  • Fundamentally, trading domain is math-lite…
  • Risk management is slightly more mathematical due to large data set, relaxed latency requirement, many scenarios
  • The fancier and more advanced math, the more dubious

Q: Value-add? Questionable. That’s one reason why most financial institutions don’t spend billions building such systems. They do spend billions on traditional automation systems.

Q: Who would want to pay to use these systems? Rather Few.

Q: Python? Possibly.

case study: CSDoctor’s — value-add@analytics #CSDoctor

Are equities simpler than FICC@@

I agree that FICC products are more complex, even if we exclude derivatives

  • FI product valuations are sensitive to multiple factors such as yield curve, credit spread
  • FI products all have an expiry date
  • We often calculate a theoretical price since market price is often unavailable or illiquid.
  • I will omit other reasons, because I want to talk more (but not too much) about …

I see some complexities (mostly) specific to equities. Disclaimer — I have only a short few years of experience in this space. Some of the complexities here may not be complex in many systems but may be artificially, unnecessarily complex in one specific system. Your mileage may vary.

  • Many regulatory requirements, not all straightforward
  • Restrictions – Bloomberg publishes many types of restrictions for each stock
  • Short sale — Many rules and processes around short sale
  • Benchmarks, Execution algorithms and alphas. HFT is mostly on equities (+ some FX pairs)
  • Market impact – is a non-trivial topic for quants
  • Closing auctions and opening auctions
  • Market microstructure
  • Order books – are valuable, not easy to replicate, and change by the second
  • Many orders in a published order book get cancelled quickly. I think some highly liquid government bonds may have similar features
  • Many small rules about commission and exchange fees
  • Aggregate exposure — to a single stock… aggregation across accounts is a challenge mostly in equities since there are so many trades. You often lose track of your aggregate exposure.
  • Exchange connectivity
  • Order routing
  • Order management

malloc()performance #tips #CSY

See https://stackoverflow.com/questions/161053/which-is-faster-stack-allocation-or-heap-allocation

  • stack allocation is much faster than heap allocation but you may not notice the difference
  • custom heap allocator to replace malloc() can be as fast as stack allocation
  • Many shops like Facebook create custom allocators because standard malloc is too slow

Why is malloc so slow? Many online commentators point their fingers at the complexity of heap memory (and free list) management.

I don’t think system call is the issue. System calls are very infrequent with malloc.

2 reasons: BM is poor model for bond price

Reason 1 — terminal value is known. It’s more than deterministic. It’s exactly $100 at maturity. Brownian Motion doesn’t match that.

Reason 2 — the drift estimate is too hard, too sensitive. A BM process has a drift value. You can be very careful, very thorough in estimating it, but any minor change in the drift estimate would produce very large differences in the price evolution, if the bond’s lifespan is longer than 10Y.
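In symbols (my formulation, assuming arithmetic BM with drift \mu and vol \sigma):

P_T \sim \mathcal{N}(P_0 + \mu T,\ \sigma^2 T) \quad\text{under BM}, \qquad\text{but a bond needs}\quad P_T \equiv 100,\ \operatorname{Var}(P_T)=0.

Both reasons show up here: the variance \sigma^2 T grows with T instead of shrinking to zero at maturity, and the terminal mean P_0 + \mu T is hypersensitive to the \mu estimate for large T.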

q[struct]^q[class] in c++

–difference: by default,

  • deriving from a q[struct] is public derivation
  • deriving from a q[class] is private derivation
  • P 243 [[ARM]] explains why.
  • Best to specify explicitly private or public

–difference: members (including static) default to public in a struct

–difference: keyword q[class] is usable in template<class X>

–difference: q[struct] is a keyword in C and C has a few other syntax rules about q[struct]
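A minimal illustration of the default-access differences:

#include <iostream>

struct B { int x = 1; };           // struct members default to public

struct S : B {};                   // public derivation by default
class  C : B {};                   // private derivation by default

int main() {
  S s; std::cout << s.x << '\n';   // OK: x accessible via public derivation
  C c;                             // compiles, but c.x would be an error --
  // std::cout << c.x;             // x is inaccessible through private derivation
}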

I separate pure-algo from real-coding questions

Many job seekers simply say “algorithm question” or “coding interview” as if they mean the same. I see significant differences between two types of coding test — compilable code vs pure algo.

Here’s one difference to start with — real-coding questions require us to be very familiar with syntax and very fast edit-compile-test cycles. I often struggle even when my idea was correct. I often run out of time. No such requirement in a white-board algo test.

I try to visualize a spectrum image like

<=== harder on language+implementation (blue side) … (red side) increasingly harder on pure algorithm ===>

  • (On the far left “Ultraviolet” side of the spectrum) multi-threading coding questions are light on pure computer-science algorithms.
  • (Also on the far left side) some weekend coding assignments are very serious about design trade-offs, reasonable assumptions, thorough automated testing … No such requirement in a pure-algo test.
  • (On the mid-left side) real coding questions can have a code-smell issue, esp. in the weekend take-home assignment. I failed many times there even though my code worked fine.
  • (In the middle) many Hackerrank and Codility questions involve difficult algorithms and fast edit-compile-test. Very hard for me.
  • (On the right side) pure algorithm questions are sometimes on a white board, and even over the phone if very short. Light on syntax; no edit-compile-test; no code smells.
  • (On the far right side) some west coast white-board interviews require candidates to ask clarifying questions, and later eyeball-test our code without a compiler. No such requirement in a real-coding interview.
  • (On the extreme right end) very tough algorithm questions care only about O(), not about language efficiency. Pick your language. The toughest algorithm questions don’t even care about O() so long as you can find any solution within 15 minutes.

I often perform better with pure algo questions than real-coding questions. My worst game is codility/hackerrank.

I find it imprecise to call all of them “algorithm questions” and disregard the key distinctions between the two types. Both types have algorithm elements, but real-coding questions often require additional skills such as quick completion, language efficiency, and code-smell awareness. A bug can take us 10 minutes to locate — something rare in pure-algo questions.

Effective strategies to prepare for real-coding vs pure-algo interviews are rather different. Here I will only mention a few.
1) for speed, we need to practice writing real code and solving short problems, ideally timing ourselves.
2) for pure algo, we can use python; we can study other people’s solutions; we still need to practice implementing the ideas.
3) for white-board, we need to practice talking through a problem solution … and practice asking clarifying questions and eyeball-testing.
4) for weekend take-home tests, practice is not effective for me, so I won’t prepare.
5) for multi-threading questions, we need to practice the basic multi-threading problems.

factory patterns explained graphically

https://vivekcek.wordpress.com/2013/03/17/simple-factory-vs-factory-method-vs-abstract-factory-by-example/ has nice block diagrams and examples.

  • CC) “Simple factory” is the most widely used of the three, yet not a widely known pattern — not even an officially named one. “SimpleFactory” is just an informal name.

I have mostly used the simple factory, not the other factory patterns.

Any time I have a method that creates and returns polymorphic objects from a family (like different Trades), I name the entire class SomeFactory. I often add some state and additional features, but as a class or struct it is still a simple manufacturer — see the sketch after the list below.

  • BB) factory method — a family of manufacturer classes. Each supports a virtual create() method. Each is responsible for instantiating just one specific type of object. NokiaFactory.create() only creates NokiaPhone objects.
  • AA) AbstractFactory — too complex and needed in very special situations only.
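
Below is a minimal C++ sketch of CC) and BB), reusing the Trade and Nokia examples above. The concrete types (BondTrade, NokiaPhone, …) are my own hypothetical stand-ins, not from any production system:

```cpp
#include <memory>
#include <string>
#include <stdexcept>

struct Trade { virtual ~Trade() = default; };
struct BondTrade   : Trade {};
struct EquityTrade : Trade {};

// CC) simple factory: one method creating polymorphic objects from a family
class TradeFactory {
public:
    std::unique_ptr<Trade> create(const std::string& type) const {
        if (type == "bond")   return std::make_unique<BondTrade>();
        if (type == "equity") return std::make_unique<EquityTrade>();
        throw std::invalid_argument("unknown trade type: " + type);
    }
};

// BB) factory method: a family of manufacturers, each with a virtual create()
// responsible for exactly one product type
struct Phone { virtual ~Phone() = default; };
struct NokiaPhone : Phone {};

struct PhoneFactory {
    virtual ~PhoneFactory() = default;
    virtual std::unique_ptr<Phone> create() const = 0;
};
struct NokiaFactory : PhoneFactory {
    std::unique_ptr<Phone> create() const override {  // only NokiaPhone objects
        return std::make_unique<NokiaPhone>();
    }
};

int main() {
    TradeFactory tf;
    auto t = tf.create("bond");   // caller never names the concrete type

    NokiaFactory nf;
    auto p = nf.create();
}
```

AA) AbstractFactory would add one more level: an interface whose implementations each create a whole family of related products.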

swap usage when RAM already adequate

Adapted from a blog post by Hayden James.

Even when our average memory usage is smaller than RAM capacity, the system still benefits from swap!

Most server processes are daemons. Any daemon can create lots of memory pages that are rarely accessed until shutdown. The kernel often decides to relocate rarely used memory pages to swap for performance reasons, mostly to free up RAM. The reclaimed RAM space can remain vacant for some time, so you may think the relocation is unnecessary, but …

  • the relocation is usually harmless — kswapd uses very little CPU, unless the relocation workload becomes frequent and bidirectional, a sign of insufficient RAM.
  • the relocation is preemptive — the vacant RAM is available to any process that can use it more productively. In general, the faster cache ought to hold “hotter” data; in other words, hotter data should be cheaper to access.

But what if there’s no other process and no hotter data? What if all the data fits into RAM? This is rare, but yes — you can disable swap.

Note, as explained in my [[linux kernel]] notes, kswapd shows up as a process but is really a kernel thread that wakes up from time to time.

merge K presorted lists #O(what)

Q: Merge k sorted linked lists and return it as one sorted list. Analyze and describe its complexity.

Note K could be much larger than N (the typical length of one list).

https://github.com/tiger40490/repo1/blob/py1/py/linklist/merge4lists.py is my solution.

I feel this is mostly an optimization challenge. I can think of a few solutions

–Sol1: merge 2nd list into first. Then merge 3rd list into first …

https://leetcode.com/problems/merge-k-sorted-lists/solution/ shows that this has higher runtime cost than the brackets solution.

The reason: each 2-way merge must visit every node in both lists, so the first list’s nodes get visited K−1 times!

–Sol1b: brackets — tournament-style pairwise merging.

There are only O(log K) levels in the bracket, so any list gets visited only that many times.

–Sol3: in-place (inefficient)

We maintain K node-pointers for the K lists (K teams)

We also maintain a pointer to the last-added node in the merged list.

The first node of each of the K lists is put into a min-heap. The winner (smallest) team becomes the “current list”: its node is appended to the merged list, and the winning team then offers its next node into the heap. Repeat until the heap is empty. Each of the N·K total nodes passes through the heap once, so the overall cost is O(total nodes × log K) — answering the #O(what) in the title.

What if N=1 and K is 1 billion?
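
A minimal C++ re-sketch of Sol3 (my linked repo solution is python; the names here are my own). Each node enters and leaves the heap exactly once:

```cpp
#include <queue>
#include <vector>

struct Node { int val; Node* next; };

// Sol3 sketch: a min-heap of the current head of each list ("team").
// Every node is pushed and popped once => O(totalNodes * log K).
Node* mergeK(std::vector<Node*>& lists) {
    auto cmp = [](Node* a, Node* b) { return a->val > b->val; }; // min-heap
    std::priority_queue<Node*, std::vector<Node*>, decltype(cmp)> heap(cmp);

    for (Node* head : lists)
        if (head) heap.push(head);          // first node of each team

    Node dummy{0, nullptr};
    Node* tail = &dummy;                    // pointer to last-added merged node
    while (!heap.empty()) {
        Node* winner = heap.top();          // smallest current head
        heap.pop();
        tail->next = winner;
        tail = winner;
        if (winner->next) heap.push(winner->next); // winning team offers next node
    }
    return dummy.next;
}

int main() {
    Node c{3, nullptr}, b{2, nullptr}, a{1, &c};   // list1: 1->3, list2: 2
    std::vector<Node*> lists{&a, &b};
    Node* merged = mergeK(lists);                  // 1->2->3
    (void)merged;
}
```

On the last question — with N=1 per list and K = 1 billion, the heap itself dominates: O(K) space plus O(K log K) time just to seed and drain it.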

Name some Original contributions as a wall St techie@@

Background — expert test-takers and Math Olympians don’t create value. Grandpa is more than an expert in his domain — he created ground-breaking original research.

See also realistic+significant legacy of me as a developer@@

Generally, developers make more lasting impact than other roles. Developers are seen as value-creators. Here are my original contributions as a Wall St techie:

  1. 95G — I created the stored proc as the basis of concurrency control
  2. 95G — I created the wait/notify framework in B2B
  3. Barc — emulating the functional programming spreadsheet in java
  4. —-Below are less “original”
  5. RTS — NYSE integrated feed XDP parser. Huge impact on countless downstream users.

python slicing q( [] ) ALWAYS generates a clone

This is crucial knowledge for ECT coding tests.

See mystr[:-2] copy_truncate last 2 chars #mystr[2:]

See copy_reverse list/string/tuple by [::-1]

See python initialize list/dict : perf

Among the various list cloning solutions on https://stackoverflow.com/questions/2612802/how-to-clone-or-copy-a-list:

  • slicing is consistent with string/tuple
  • list(oldList) is similar to java API
  • copy.deepcopy is useful across many containers

custom-basket ^ portfolio trading

A client can ask a broker to buy “two IBM, one MSFT” either as AA) a custom basket or BB) a portfolio. The broker handles the two differently.

Only the basket (not the portfolio) is “listed” on Bloomberg (though not on any exchange). The client can see the pricing details in the Bloomberg terminal, under a unique basket identifier.

Booking — a basket trade is recorded as a single indivisible position, whereas a portfolio trade gets booked as individual positions. The basket client can only sell the entire basket, whereas the portfolio client can sell individual component stocks.

Fees — There is only one brokerage fee for the basket, but 5 for a portfolio of 5 stocks.

The broker or investment advisor often has a “view” and advice on a given basket.

Corporate actions should be handled in the basket automatically.

I feel a portfolio is more flexible and more informal than a custom basket, which in turn is less formalized and less regulated than an index-tracking ETF.

TowerResearch IV: c++,algo,past prj..

Q1: how could you use enum to improve performance
AA: use it to replace strings
AA: enum is used in template programming to move computation from run time to compile time. See Q4 and SFINAE in github.
%%A: see clever use of enum(char?) in demanding c++ app

Q1b: you put in your change but it turns out to hurt performance. Any idea?

Q4: how do you compute Fib(N) like Fib(99999) at compile time using template programming?
A: See my Fibonacci code in github
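
A sketch of Q4’s technique — the classic template-metaprogramming pattern, not necessarily the exact github code. The enum members force evaluation at compile time, tying back to Q1’s enum discussion:

```cpp
// The compiler unrolls the recursion; 'value' is a compile-time constant.
template <unsigned N>
struct Fib {
    enum : unsigned long long { value = Fib<N-1>::value + Fib<N-2>::value };
};
template <> struct Fib<0> { enum : unsigned long long { value = 0 }; };
template <> struct Fib<1> { enum : unsigned long long { value = 1 }; };

static_assert(Fib<10>::value == 55, "computed entirely at compile time");

// Caveat: Fib<99999> would overflow 64 bits and exceed the default
// template-instantiation depth, so an interview answer should also
// discuss constexpr loops, big integers and compiler limits.
int main() { return 0; }
```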

Q: how would you use multicast to send a message to a group of registered users?
%%A: less data copying than TCP

Q: In O(1) space, given an unsorted array of natural numbers, how do you find a pair that produces a target sum? What’s your best time complexity?
%%A: if I assume the integer size is 32-bit then a fixed-size O(1) space radix structure can sort the array in O(N). See also my blog post on [[locate a pair with targetSum=55 #bbg IV #Morris]]
%%A: if integer size is unlimited, then I think the best is O(NN)

Q: How is your muni bond pricing engine designed internally?
Q2b: what kind of pricing rules live in the cache?
Q2c: how many threads to react to a given event?

Q: how did you calculate FX fwd points?
%%A: use the interest rates in the two currencies and the spot FX rate.
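
In symbols — the textbook covered-interest-parity relation under a simple-interest convention (my notation; the pip scaling is pair-specific, 10^4 for most pairs, 10^2 for JPY pairs):

```latex
F = S \cdot \frac{1 + r_{\text{quote}}\,\tau}{1 + r_{\text{base}}\,\tau}
\qquad\qquad
\text{fwd points} = (F - S) \times 10^{4}
```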

Q: how is your implied volatility computed from an option price?
%%A: if there’s no closed-form formula, then Newton’s method or something similar should converge quickly.
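
The iteration in symbols — Newton’s method on the Black-Scholes price as a function of vol, using vega as the derivative (my notation):

```latex
\sigma_{n+1} = \sigma_n - \frac{C_{\text{BS}}(\sigma_n) - C_{\text{mkt}}}{\nu(\sigma_n)},
\qquad
\nu(\sigma) = \frac{\partial C_{\text{BS}}}{\partial \sigma}
```

Since vega is positive and the BS price is monotone increasing in vol, the iteration typically converges in a handful of steps.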

Q3 (OO design): How would you design a game software with Animal objects, various birds, and fish?
Q3a: Say the system is running fine, but now you need to add ostrich class?

[18] Integer (like String) objects always immutable: java+python #XR

Integer objects (like String objects) are always immutable in java. My google search confirmed that.

Besides serialization, there is no practical reason to deep-copy them.

Python integer objects are also immutable. Every time we modify an int, the new value has a different id(). See also my blog post on python immutable types.

See also https://medium.com/@meghamohan/mutable-and-immutable-side-of-python-c2145cf72747

linux desktop set-up: don’t over-xx

Based on some prior experience, I feel I would not use linux as a desktop.

  • Avoid overspending time learning the various “interesting” details
  • Avoid overspending time understanding the installation issues
  • Avoid overspending $ to buy stuff

Just do the minimum to set up a sandbox and get the coding interview done. The GTD/zbs knowledge about the linux desktop has zero market value — it doesn’t enhance my skills with “professional linux”, as proven over the years.

I had many problems installing and using linux desktops each time:

  • linspire
  • ubuntu from XR — the installation was successful, but I had many problems using it.
  • oracle virtualbox

coding drill:LASTING value4family-well-being]U.S.10Y

This blog post is poorly structured, but let’s not spend too much time cleaning it up. Not worth it — most of the content is repeated elsewhere.

See also the benefits of coding drill in [[4 benefits of coding practice #beatFront]] and [[Coding drill is anti-aging]].

Background: for many years until late 2017, I focused on QQ and neglected coding tests.

  • If you bring your pure-algo skill from 4/10 to 6/10, it will remain around that level (or slightly lower), and will help you with high-end job interviews for 5-10 years.
  • A lot of “algo skill” consists of the key ideas behind the top 100 common coding questions. (I have tried about 1/3 of them.) I feel it’s crucial to review the high-value problems, to refresh the 1 or 2 key ideas in each.
    • Note the various max-profit problems look similar, but their key ideas are vastly different.
  • If you rely on your work projects, you will remain around 3/10 😦 Your projects won’t train you for those coding challenges in terms of algo, BestPractice, ECT…
  • Only a small part of the required syntax gets exercised on the job; a lot more deliberate practice is still needed 😦
  • You may think the STL syntax will become familiar naturally, but in reality, over a year our projects used no std::map, no lower_bound(), no sort(), no getline(), so most of the basic syntax was never developed!
  • One builds ECT skills (including syntax) with focused practice outside the day job. No shortcut. Programmers without this “mileage” won’t have the skill — just like driving, or any athletic or artistic skill.

I’m reluctant to break down by “skill” but ..

  1. quick-n-dirty syntax used for coding tests — yes, we can remember most of it for 10Y, based on my experience with perl, java, php
  2. ECT — easy to lose the speed, though I feel some of the ECT improvement does remain.
  3. pure algo ideas — already discussed above

Low “churn” — most coding questions have remained unchanged for 20 years.

I also realized that my c++ ECT/BP proficiency is different from my java ECT/BP proficiency.

 

fixation@ROTI@tech-xx : too result-oriented; disengaged

fixation@ROTI/payoff/success/result/accu … dampens job satisfaction+joy@learning.

This affects my “engagement”. Granted, we should not ignore these ROTI factors or those “smells” — instead we should evaluate our direction and take stock — but let’s not overdo it.

  • +ve Eg: Barcap option math
  • +ve Eg: Barcap swing learning
  • +ve Eg: RTS socket programming
  • -ve Eg: git
  • -ve Eg: curve building
  • -ve Eg: WCF

Consider a tour guide aiming for the tip at the end.
Consider Grandpa in his research career.
Consider a singer like 王杰, or the last few years of 邓丽君.
Consider Einstein’s violin.

Q: has that increased your income or benchmark score? # more time in office, shorter commute, MSFM, c# ….

  1. This question can be posed to grandpa.
  2. This question can be posed to any educational institute, including the “top schools” (名校). Ironically, the same questioners seem to be /fixated/ on these top schools for their kids — so for these people, the question is self-contradictory.
  3. This question can be posed to my friends engaged in quantitative investment analysis.

This question is harmful, misleading, derogatory, discriminatory, browbeating, pessimistic/fatalistic, myopic … It tosses aside many things important to our lives, our joys, and our satisfaction —

  • career safety net
  • exploration of personal talents and personal interests
  • “in-demand” satisfaction
  • market depth
  • mobility between firms
  • freedom — I don’t want to feel “trapped”
  • observation (even conviction) on various topics, based on in-depth personal research

dummy param in c++ func

I used to name an unused parameter of my function “dummy”.

  • now I also use ‘_’ — I think this also works in python
  • now I also use no name at all, as in void fn(T*)

Not always better than ‘dummy’, but this kind of compiler knowledge is at least useful in troubleshooting.

I don’t name the parameter “unused” as it is a confusing name.
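
A quick sketch of the styles above, plus C++17’s [[maybe_unused]] (my addition, not in the original list); the function and parameter names are hypothetical:

```cpp
// Three ways to mark a deliberately-unused parameter:
void f1(int dummy) { (void)dummy; }  // named, silenced explicitly
void f2(int _)     { (void)_; }      // '_' as a name (also common in python)
void f3(int)       { }               // no name at all -- cannot be referenced

// C++17 attribute: keeps the name for documentation, suppresses warnings
void f4([[maybe_unused]] int retryCount) { }

int main() { f1(0); f2(0); f3(0); f4(0); }
```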

spend more$ to prolong engagement+absorbency

  • increase spending on good mobile data and mobile devices, to capture the creativity and absorbency …
  • increase spending on the printer
  • increase spending on hotel stays near the office (or taxis home), to capture the engagement on localSys
  • spend on flights to gain engaged family time
  • spend unpaid leave to attend interviews … to gain joy, motivation, engagement, and precious insight into their selection priorities

Not so sure about …

  • Regrettable — spending unpaid leave before starting the Macq job … to gain peaceful family time? Low ROI.
  • questionable spending (giving up a higher pay rate) to gain … c++ skills like QQ, GTD, zbs