next(mylist.begin())
++(myslist.begin())sr
Both return the an iterator pointed at 2nd element
Note you can’t use q(+1) because list has no random access iterator
next(mylist.begin())
++(myslist.begin())sr
Both return the an iterator pointed at 2nd element
Note you can’t use q(+1) because list has no random access iterator
I won’t insist on relative ordering among the shortest.
Idea 1() — Start with longest abbreviation i.e. the original string S, assuming 5 characters.
fb >> google > Uber.. according a 27-year old Indian rockstar who received 7 offers
Rahul felt google > fb > uber
A Columbia intern said At intern level, fb = goog (are the two hardest) > twitter, Uber, Amazon and the rest
Q (Leetcode): Given a set of non-overlapping intervals, insert a new interval into existing intervals (merge if necessary) and print updated list of intervals. Intervals were a vector sorted according to their start times.
–analysis–
Now I feel the #1 main data structure is a doubly linked list (dlist) of Segment objects:
Every time this dlist is modified, we would update a “helper container” — a tree of node pointers, sorted by the segment_left_mark value. Tree to help successive inserts. However, if each insert(vector intervals) has a sorted vector then we can binary search the vector and don’t need to tree.
First, binary search to locate the left mark among all existing marks. Ditto right mark. Based on these 2 results, there are many cases.
Need a sorted tree of all marks + array of segments. Redundant but helpful.
Each segment (interval or gap) is represented by {left mark, right mark} where left <= right. I will save the segment objects into (a linked list and) an array. Even elements are interval objects and odd elements are gap objects. Now superceded by dlist.
I think this problem is all about corner cases. Perhaps start with the complex cases which will take care of the simpler cases. No need to pass Leetcode tests. Due to the pointer complexity, I prefer python.
https://github.com/tiger40490/repo1/blob/py1/py/linklist/insertInterval.py is my solution but I dare not test on Leetcode
Q: https://leetcode.com/problems/largest-rectangle-in-histogram/description/. Given N possibly recurring non-negative integers representing the histogram’s bar heights, and given the width of each bar is 1, find the area of largest rectangle in the histogram.
Visually well-defined problem. Kind of naturally-occurring. Very simple data structure. No O() requirement, so I will just try my own solution.
https://github.com/tiger40490/repo1/blob/py1/py/array/maxHistoBox.py is my solution. 100% passed on Leetcode.
==== analysis — heavy on data structure design.
Key insight — one scan to update a clever data structure.
key insight — data structure is not per bar, but per height!
For every bar J, there exists an enclosing max-rectangle of J’s height. We can just compare all of these rectangles.
We might start with two extreme candidates
1) the peak — whose enclosing rectangle is likely slender — O(N) one scan to find all the peaks
2) the lowest bar — whose enclosing rectangle has width N — O(N)
If we paint the histogram as a binary matrix, then this is equivalent to anther problem max all-black submatrix #DP #zhurong. but I think there exists better solutions like O(N logN) or O(N*S) …
–homegrown algo with O[N*S] where S:= #unique heights. The binary search doesn’t show up as logS.
A pre-scan to get all distinct heights. For each distinct height, we maintain a RunRecord object {bestRun, currentRunStart, height}, in a sorted map {height -> record}. In py, I can use a pre-sorted vector of Records, sorted on height
In main scan, As we encounter a new bar of height J, we update these records.
At end of the main scan, every record has a bestRun i.e. the duration. I can then calc the area under each bestRun and return the max.
See also
Q: given the limited spare time we working fathers have, what specific (learning?) pet projects can enhance our salary?
A: invariably poor ROI, mostly a mirage. Moving up is the most common success story.
One coding convention was introduced in [[Alexandrescu]]
https://www.geeksforgeeks.org/variable-length-arrays-in-c-and-c/ has pointers to
[[c++primer]] P113 says that as of c++11, array size has to be known at compile time.
— java arrays are VLA i.e. run-time allocated.
Java array is allocated on heap.
Are arrays on heap always run-time allocated, such as vector and hash table? I think so.
Update — a similar sliding window is used in longest substring without repeating chars
Q (leetcode): Given a string Haystack and a string T, find the minimum window in Haystack which contains (at least) all the characters in T according to the frequencies. Time complexity O(n). Eg: minWindow(“ccbabccbabcb“, “bbc“)==“bcb“
If there is such a window, you are guaranteed that there will always be only one unique minimum window in Haystack. <– I thought this guarantee means something but it doesn’t.
Without loss of generality, I will assume the chars are a-z. I believe those Leetcode corner cases will use only 3 chars
—analysis—
For single-string problem, use array indexed by ascii code. I can convert T to such an array to store the required frequencies (reqFrq)
I can construct a shadow array, same length as Haystack with these payloads:
–SolSW: sliding-window based
I often feel bad that all of my efforts in my spare time had no tangible ROTI, but look around, who fared better?
Note this post is more about peer comparison (recrec blog), less about my spare time usage (open blog)
For the record my spare time effort did produce some tangible outcomes
Update — my view was overridden by recent IV experiences. In math problem solving, I studied solutions then built my intuition. My peers actually do the same with algo challenges. They become very confident this way.
–confidence-build`:another reason y I don’t look@answer
Hi XR,
Just to share some minor thoughts.
I solved many “hard” leetcode problems and also the 25-horse puzzle without peeking at answers. These problems are non-trivial, and my solutions are usually efficient but not easy to understand. They are part of a tiny but growing wellspring of technical confidence. They reinforced my self-image of analytical mind. Similarly, CSDoctor Wang often shows a healthy self-image of technical competence, based on many successful experiences. It’s hard to feel confident without experience, simply by reading other people’s answers. (Experience is not that effective at building confidence — we could get a string of unsuccessful experiences … but personal experience is usually the most important factor.)
Also, these problems gave me confidence during non-technical interview, so I could say with conviction that
“I am confident with algorithms + data structures even though I’m not the best. I solved more than 20 of the hard leetcode problems on my own, without reading any discussion.”
I won’t have the conviction to say the same if I mostly read other people’s solutions.
I sometimes give this long answer to the behavior question “Q: tell me your strength and why”
A: The best comp science students might have good answer to, say, 30% of a question pool including familiar and novel problems. I’m slightly below their standard. However, if the question pool consists of only familiar problems like Leetcode problems, then they would be much stronger than me since I don’t have very high mileage on Leetcode and similar sites. If you want to benchmark me then please pick highly non-standard questions. I am fairly good at creating original algorithms.
Some naysayers would cast doubt — Having solved those problems years ago doesn’t mean he can solve them now. Well, you can say the same about any skill, but we know that once the “circuitry” is wired up in our “system”, the problem solving skills are enhanced. A Programmer who has never solved these problems independently is clearly weaker.
This is a probably the biggest sell-side equity order-management-system (OMS) on Wall St, written in c++11. Daily order volume is probably highest among all investment banks, presumably 7 figures based on my speculation, though a lot of them get canceled, rejected or unfilled. I am disallowed to reveal too many internal details due to compliance.
In contrast, GS used to get about a million individual trades (perhaps the partial fills of an order?) a day, probably not counting the high-frequency small trades.
I prefer cherry-pick.
Often the merge/rebase/cherry-pick/stash-pop operation would silently put some of the changes in _staging_, therefore git-diff would fail to show them. I have to use git-diff-HEAD instead:(
–If you have no choice then here’s the standard procedure
https://careercup.com/question?id=5153263227764736 defines the problem
Q (Amazon): In a city, year of birth/death of people who where born and died between year 1900 to 2000 are given. Write an algorithm to find the year in which max people were alive. Note the years are not unique and not sorted
Similarly,
Q (FlexTrade): For an airport gate system, flight arrival/departure times are given for yesterday. What’s the maximum number of gates required at the busiest time?
Solution1: O(N logN) merge-sort all timestamps, then scan it in one pass. If an arrival, then increment counter; if a departure then decrement it.
??Solution2 (assuming arrival times are pre-sorted) Using hashtable, keyed by arrival time. Value is a count of flights arriving at that time. Every arrival creates or updates in the hashtable. Every departure deletes or decrements. Maintain a separate total count.
I think we still need sorting.
Solution3: O(N). Use array if all the years are small integers. (Regular timestamp is also small integers — 0 to 2355 in steps of 5.) Fill all arrival/departure events as +1/-1 in an array indexed by year.
https://github.com/tiger40490/repo1/blob/py1/py/linklist/insertInterval.py shows
A homemade implementation is more flexible. Node1.next -> Node2 but Node2.prev may not point to Node1
Q (Leetcode): Given a string containing nothing but the characters ‘(‘ and ‘)’, find the length of the longest valid (well-formed) parentheses substring.
https://github.com/tiger40490/repo1/blob/cpp1/cpp/str/maxParensRun.cpp is my solution 100% tested on Leetcode
–My Single-iteration solution:
Challenge is data structure. I ended up with 2 data structures to be updated during the iteration
For each closer, I will record the position of the matching opener, then compute the distance (minimum two).
See also easiest windows GCC installer #c++17
Git-Bash + StrawberryPerl is the best combo for me on Windows. I put g++ commands in bash scripts to automate my GCC build and test. Note my typical project has at most 20 source files.
Git-Bash + StrawberryPerl + Notepad++ is better for me than any c++ IDE like Eclipse CDT (4 installations), Bloodshed (4), CodeBlocks (1), NetBeans…
I’m such a die-hard fan of command line that the only GUI development tool I use is notepad++ and it’s optional. Since 2015, my Linux c++ code editor has been VIM, which I used on large codebases consisting of 1000+ source files.
For about 2 years EclipseCDT was my default choice, then the simpler Bloodshed became my default choice as it is simpler than EclipseCDT. However they are still over-complicated. I didn’t have a favorite until 2017, when I discovered Notepad++/Git-Bash/StrawberryPerl
Q: why do I use Eclipse for Java but not for c++?
A: Eclipse is more problematic, more messy for c++ than for java. The benefits (convenience, automation..) is offset by the high TCO (total cost of ownership). For example, java debugger, java code navigation, java renaming/refactor all work better than c++.
Q: how about MS VisualStudio?
A: I managed a large c++ codebase on MSVS 2015 in 2015-2016, about a year+. Too complicated, worse than MSVS for c#. I would never prefer it over my command line set-up, if given a choice.
By the way, Git-Bash works well with a standard python installation, but I do recommend the tweak at https://stackoverflow.com/questions/32597209/python-not-working-in-the-command-line-of-git-bash
http://strawberryperl.com is an easy installer for Perl and it bundles GCC. I was able to compile in c++17 mode:
g++ -std=c++17 my.cpp
My experiences with cygwin and mingw were both horrible.
In contrast, I installed strawberryPerl about 3 times and once on a friend’s laptop … always quick and easy. No tweaking required. GCC + gdb works out of the box.
gdb to show c++thread wait`for mutex+condVar shows gdb works out of the box.
Through almost 10 installations of GCC on various windows laptops, I have come to the obvious conclusion.
char symbol[ 55 ]={‘\0‘};
I used it in https://github.com/tiger40490/repo1/tree/cppProj/cppProj/Houston
for(char c: stock_symbol)
See https://github.com/tiger40490/repo1/blob/cppProj/cppProj/Houston/AbstractEngine.h
Q: Given an N-array of immutable ints, write an efficient sumMatrixSum(le,ri,top,bot). Assume top-most and left-most indices are zeros.
My solution: pre-populate in O(NN) a shadow-matrix s, where at each location s(row, col) is the sum of submatrix(0,col, 0, bot).
Query will be O(1) :
s(0,ri, bot) + s(le-1, top-1) – s(le-1, bot) – s(r, top-1)
Two simple steps:
git stash #no args
# Now git diff will no longer show the uncommitted changes:) …later
git stash pop
# Now git diff shows the original uncommitted changes
If git-stash-pop hits conflict on 1 file out of 3
I usually pass smart pointers by value (copy-constructor or move-constructor), just like copying a raw ptr. Therefore the code below looks unnatural:
unique_ptr<Trade> & ref2smartPtr
Well, my “pbclone” intuition was incorrect. Actually pbref is rather common because
https://confluence.atlassian.com/bitbucketserver/keeping-forks-synchronized-776639961.html says
Fork syncing helps you to keep your fork in Bitbucket Server up-to-date with changes in the upstream repository. Bitbucket Server can do this automatically for all branches (and tags) you haven’t modified in the fork.
I guess forking (and pull request) is not a feature of GIT per se but a feature of Bitbucket and GitHUB server. I think the server get notified upon a new commit and runs fast-forward git pull.
For me, rebase is the procedure to re-sequence commits… A simple form of history rewrite.
git pull --rebase # moves your local commits to the end of the linked list
The following will take the last five commits and give you an interactive session that allows you to (optionally) merge them.
git rebase -i HEAD~4 |
More details are here http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html
git diff –help # manpage
git diff –color-words # good
git diff -U22 # 22 lines of context
3 forms of diff — See 3trees@git
— diff of a merge-conflict file is different.
I think it is a 3-way diff. Output shows current “merged” file vs orig1 and orig2. https://git-scm.com/docs/git-diff#_combined_diff_format explains “++”
— if you see a line showing up due to nothing but white space change, you can usually use “-w” to suppress it but sometimes you need to pinpoint the white space change:
git diff |cat -vet
— on linux, git-diff output seems to truncate long lines
https://stackoverflow.com/questions/136178/git-diff-handling-long-lines shows
git config core.pager 'less -r'
I didn’t like git-revert as it complicates history. See https://www.atlassian.com/git/tutorials/undoing-changes/git-revert
git checkout HEAD -- <file or directory> |
This use of git-checkout is unrelated (as far as I know) to the commit-level checkout. This solution is officially /sanctioned/ … Some git commands would suggest …
(use "git checkout -- <file>..." to discard changes in working directory)
git reset --hard HEAD # reverts tracked files back to the head git clean -fxd . # deletes untracked files in current dir. You may want to cd to the repository base first |
[Victor] My vote for best explanation of git-reset goes to https://git-scm.com/blog/2011/07/11/reset.html. Detailed, but more readable than the manual. Many examples are useful to me.
Warning — if you want to temporarily play with an older version of file1.java, then be careful. Make sure you save the current tip of your branch. If you git-reset–hard then you may lose that tip!
git fetch origin git reset --hard origin/yourBranch |
Say you are working on a feature branch with other people, with you all making commits and pushing to the remote replica of this branch. At some point you will want your commits merged with develop/production. Git does provide a method of choosing which commits you would like to push. The easiest way to explain this is with an example. We assume that all the following commands are run in a command line tool.
git checkout feature/commodity git log - 3 --pretty=oneline #look at the 3 most recent commits to the repository (includes all commits to all branches - in a nicely displayed format) $ git log - 3
|
Looking at this log, I want to push only the latest change to ‘develop’ branch (via a pull request), the commit associated with the e5a67d7088622dd8821dfdd8e2fafa2dc938b75c hash number. Keeping a record of this hash number I then create a new feature branch from ‘develop’ branch in Stash. Assume this new branch is called feature/cherry_pick
git fetch #update local repo to get this new branch info git checkout feature/cherry_pick # switch to this new branch, at the moment is exactly the same as origin/develop git cherry-pick e5a67d7088622dd8821dfdd8e2fafa2dc938b75c #where the hash number is the commit I actually want to merge into 'develop' git commit #commit this change to the local feature/cherry_pick git push #push this change to the remote feature/cherry_pick branch |
You can now create a pull request in Stash to merge this cherry-picked commit into ‘develop’
–Method: git-diff command
This is a basic but versatile tool. It looks at two commits you provide. Note each tag represents a commit. Each branch name also represents the commit at the tip of the branch.
–Method: have two local repos (clones) to check out different branches
–Method: download remote version as complete file, then diff that against local file
Here’s my earlier favorite choice to diff a remote branch version vs a local branch version:
mv buildChoops.sh buildChoops.local git checkout origin/feature/my_br6 buildChoops.sh # download the file from the remote branch into the current directory locally ... now run any diff tool between the two files |
–Method: git diff using Beyond Compare 3
See more details in separate blogpost git+beyondCompare
–Method: git diff using Winmerge
see http://stackoverflow.com/questions/1881594/use-winmerge-inside-of-git-to-file-diff, updated June 2015
I hope there are similar solutions for other diff GUI tools.
A common situation: you make many incremental, quick and dirty commits on your branch, undoing and redoing changes. You don’t need to push the entire history to a shared branch like Develop.
Many solutions exist. Git offers many commands to rewrite history on a branch. Many git users use them regularly. We could use them to clean up the commits on a feature branch before pulling them into a pull request. Here are some of the methods.
This will rewrite the history of your branch, you should only do this if no one is sharing your branch (or you don’t care about them).
This example will let you interactively select from the last four commits.
git rebase -i HEAD~ 4 # Alternatively, specify the last-good commit: git rebase -i e8a9cf093 # Rebase all subsequent commits |
git tag backup # we do this so that we don't lose our history git checkout -B temp origin/develop # checkout a clean temporary branch from develop git merge --squash <branch> # merge our old branch in, squishing all the commits git commit git branch -f <branch> temp # force the branch to point at the new commit |
Suppose you have 5 commits to squash,
git reset --soft <a previous commit hash, or something like HEAD^ 5 > # rollback to the named commit. # reset --soft modifies only commit history, not work tree or index. # Now you can look at the files involved in those 5 commits, staged for the next commit git diff --cached --name-status git commit -m 'your msg' # git push ... and create pull request |
All methods so far assume the complicated commit history is in local repo not on a Stash branch. If your Stash branch has stuff you want to pull into Develop, but your Stash branch has complicated history, then …
# create the candidate_branch from Develop, on stash git tag backup # we do this so that we don't lose our history git fetch # will show the new branch git checkout candidate_branch git merge --squash old_branch # merge our old branch in, squashing all the commits git commit # the default msg shows list of files. you might want to uncomment that in the msg. git push origin candidate_branch # now the candidate branch is up-to-date on Stash. Ready for pull request. |
# clean up your local branch as illustrated above git push --force origin your_Stash_branch # overwrite the Stash branch and overwrite public history - Dangerous! |
The git-rebase and git-cherry-pick commands often fail in the presence of a complicated history of merges, file deletes etc. Try filter-branch.
# clean up the branch xyz git checkout -b xyzRewrite # Always do the rewrite on a cloned branch please git filter-branch --force --msg-filter 'sed "s/addYENRates/[CFMQUANT-262] addYENRates/" ' e635d034ea8..HEAD |
At end of the command, 1st hash is the parent of the flawed commit. 2nd hash must be HEAD (or perhaps the name of a branch).
Command should tell you the branch is rewritten. If no commit message matches the pattern, then the command would tell you the branch is left unchanged, so it’s safe if you have a typo that matches nothing.
In either case, you can then run git diff against the original branch and verify they are identical content-wise.
In this real example we were lucky the message “addYENRates” was rather unique. If it shows up on 2 commits, then both would be rewritten.
git remote -v # will show the URL
A typical hacky solution — uses xml comment or database comment column for data
But
std::replace(line.begin(), line.end(), ‘,’, ‘ ‘);
istringstream(line)>> symbol >> price;
I used this solution in https://github.com/tiger40490/repo1/tree/cppProj/cppProj/Houston
Q (leetcode hard problem 239): Given an array nums, there is a sliding window of size k which is moving from the very left of the array to the very right. You can only see the k numbers in the window. Each time the sliding window moves right by one position, return the max item in it. Total run time should be O(N) but I think within each window we may not always get O(1).
https://www.interviewbit.com/problems/sliding-window-maximum/ is the same problem stated more clearly.
====analysis====
Since there exists a O(N) solution i’m confident I can figure out a O(N) solution
–idea — first scan to identify all the troughs. Items around trough to be removed, but always keep original subscripts of remaining nodes
–latest idea, not using dequeue therefore likely original —
Update — consider a rising sequence before going any further. Virtually all items are showmen…
in my showman-stack, every element {idx,height} will show up on stage at least one episode. I would say each showman goes on stage either when it first enters the sliding window, or before it leaves the sliding window. If that’s the case at each slide i only need to consider the head and tail.
(Now i think this is in the right direction but is messier than it needs to be.)
first forward scan populates this stack and ensures only showmen are included. Let’s work out some examples. First part of first scan would simply find the max among first K items. 2nd scan might be combined with first scan, but for now, my goal is clear and focused — produce a clean stack of showmen-only
pseudo code:
O(N) pre-scan to find all troughs. For each of N items, record the preceding trough. No stack yet.
First scan: check if the most recent trough is within range. If NO then nothing to pop from stack, so simply push new item and advance. Now assuming YES.
while new item >= top of stack __&&__ the 2nd top is within range (distance <=K):
keep pop
while new item >= top of stack && distance between is below K && there would still exist an earlier higher item within range:
//at end of the while loop, we could actually produce some output, but I would postpone it to 2nd scan
(For the last K items in array, we need some O(K) special handling.)
I think both conditions are necessary for popping the stack. Are these two conditions alone sufficient justification to remove the top?
—- segment-based solution #elegant
https://stackoverflow.com/questions/8031939/finding-maximum-for-every-window-of-size-k-in-an-array shows an elegant segment-based algo.
Suppose windows size := 10 and subscripts start at 0.
A) Conceptually we split the array into fixed segments of length 10. (The stub at the end is LG2.) Note unlike the ring algorithm, no physical data structure is implemented for this purely conceptual segments, but this conceptual data structure is essential to this clever solution.
B) Pre-scan – pre-scan the array twice .. rightward and leftward to populate two physical data structures, the LR-lookup and RL-lookup i.e. segment-wise max-till-here. For example,
RL[22] := max(#29 to #22) := Leftward max-till-22 within the enclosing egment@20 enclosing #20 to #29.
With the two auxiliary data structures RL lookup and LR lookup, the solution is mind-boggling simple and elegant. For example,
C) window@22 will overlap segment@20 and segment@30. The 29-to-22 section is covered by RL[22] and the 30-to-31 section is covered by LR[31]. So simply compare RL[22] vs LR[31].
I find this solution more intuitive, more visual than the ring solution. After the pre-scans, there’s no data structure to maintain so the iterations are stateless and extremely straightforward.
Q: what if increasing sequence?
Q: what if decreasing sequence?
—- The same webpage also has a concise explanation of the deque-based solution, kind of similar to but cleaner than my showman idea
• I will use a ring buffer of size=W as a ring is allocated at start-up time (at compile time if W is compile-time constant) and requires no dynamic allocation.
• Ring is FIFO. Terminology — I will say the earlier elements are removed from the Head like a ticketing queue; new elements are appended at the Tail.
• Invariant — I think within this ring, payloads are descending from head to tail.
• Ring holds subscripts, not payloads. Payloads could be strings or floats.
• At each iteration, 0 or 1 head item is removed if it is out of reach.
• At each iteration, incoming item kicks out all smaller tail items, as they are too close to the incoming giant. This operation enforces the invariant.
As a result, head of the ring (as a subscript) always points to the current max payload.
Some exchanges (CME?) limits each client to 30 orders per second. If we have a burst of order to send , I can see two common solutions A) upstream queuing B) multiple sessions
Q: is there any (sender self-discipline) flow control in intranet FIX?
A: not needed.
Scenario: If you left-click (even if to the right side) to select file1.txt, then right click, you would get the per-file context menu showing Copy/Cut/Delete etc.
Scenario: If you directly right click in an empty space (even if to the right side of some file1.txt), you get the per-dir context menu.
Q: Why no file selected? Suppose you do a right-click to the right of file1.txt. Before you release the right button, file1.txt may be highlighted temporarily but not selected. After you release the right button, you will see no file selected and you get the per-dir context menu.
Scenario: What if you have already selected file1.txt and now you want the per-dir context menu? Confusing… Solution — Just right-click in another empty space, perhaps to the right of file2.txt. Again, you will see no file selected.
— FIX Tag 35:
35=8 Execution Report
35=D Order – Single
35=9 Order Cancel Reject
35=F Order Cancel Request
35=G Order Replace Request
35=3 means rejection response due to session level errors, or “protocol-level” reject
35=9 means rejection response to a cancellation/replacement request. ‘UnableToHonorYourCancel’
— FIX tag 150:
150=0 new
150=2 fill
150=5 replace
150=6 pending cancel
150=8 Rejected. There should be a reason. Note 35 is also 8 in this case
150=A pending new — To my surprise, this one also come with 35=8 !
Q: (classic bottom-up DP) LCS i.e. longest common subsequence (not sub-string) between two strings X and Y
github has my python code with instrumentation and tests. Code shows many relatively advanced features both at language level and algo level.
https://leetcode.com/problems/rotate-image/discuss/18872/A-common-method-to-rotate-the-image shows
/*
* clockwise rotate
* first reverse up to down, then swap the x/y subscripts
* 1 2 3 7 8 9 7 4 1
* 4 5 6 => 4 5 6 => 8 5 2
* 7 8 9 1 2 3 9 6 3
*/
For anticlockwise, personally I would reverse horizontally, then same swap
This technique relies on swap(). What if the payload objects are blackboxes (or immutable) so you can’t even see contents? Just swap the 2 pointers.
— My own solution is more legwork but possibly more versatile. Suppose the matrix is N by N
m=N-1. Start from [0 0]. find new home, move in and kick out old tenant [0 m] …Each group has 4 nodes. a-> b -> c -> d -> a
See also spiral … If N is odd, then the kernel node stays changed. If N is even, then the innermost shell has a 2×2 matrix.
I now believe there are at least two purposes, not necessarily reflected in any systems I worked on.
Is GPS navigator necessary? Are side signal lights necessary? Some things are more necessary than others.
Trading system, risk systems, market data systems are mainstream. In contrast, I have found out that pricing analytics system is not really mainstream. Many buy-side and most smaller sell-side firms don’t use any pricing analytics beyond rudimentary derivations from raw market data.
There are also a number of sophisticated derivative and fixed-income analytics vendors including Bloomberg, Murex, Numerix.. These vendors focus on analytics so their customers don’t need deep expertise. OCBC’s quant team’s main job was validating the analytics offered by Murex.
Pricing analytics tools are “advisory” and not mandatory. The creators of those tools (like CSDoctor) tend to over-value their creations as if they are going to make the traders faster, safer, more profitable. In reality, traders can always choose not to use them.
As a contrast, take market data as example – 80% of trading shops need to build or buy market data systems as they can’t operate without it.
Update — [[c++codingStandard]] says we should use (smart) ptr parameter if it can be null.
Many developers return (python) None/null (java) and put such a value in a containers to positively indicate an empty value.
This practice creates unnecessary ambiguity (rather than reduce ambiguity) because many built-in modules authors use None/null when they have no choice. If you positively return this value, it’s harder to differentiate the various scenarios. This becomes an unnecessary ambiguity when troubleshooting in production.
In Java, I often create a dummy instance of a MyType and return it to mean “empty”. I think applicable to c++ too.
In python, I tend to return a negative number from a function that ordinarily returns a MyType, since python functions can return type1 in ContextA but type2 in ContextB! Python list also allows unrelated types.
in matrix or 2D-grid traversal,
See also modified BFT to check binTree is symmetric
Leetcode published solution has a clever BFT solution, easier to implement.
–idea 1 (untested): BFT to list all nodes at a given level
For each node’s enqueue() (including the null nodes), record the path-from-root as a list. As a lighter alternative to this “list”, the path can degenerate to the last step, as a le/ri flag.
Now scan each level from both ends. The left item’s path should mirror the right item’s path.
(Before the scan. confirm the node count is even.)
–in-order dft
first scan records the paths to each leaf node. (Optionally, each path can includes east/west directions).
2nd scan does the same but always visits right child first.
The two outputs should match
It pays to specialize in a domain that helps you break into similar domains. (Unsuccessful with my quant adventure so far.)
Note “wrappers” API are never upstream. Very common in big banks including Qz.
Hi Junli,
You don’t need to reply. This is my periodic review of “everything in my life”.
I have recently implemented a few controversial decisions about my career, investment, family..
(As an example, the biggest is moving back to U.S. alone and starting the green card process.)
I make major decisions carefully and slowly (unless decisiveness needed), but an observer may say I’m not a good decision maker and point out my track record. Actually I don’t remember anyone pointed them out, not even my family members. The person who point a finger at my “unwise” decisions is the “judge” in my head…
Here are some of those controversial decisions
I tried Q4, Q10, Q23.
Observation — they are not really harder in terms of pure algo. I found some “medium” questions actually harder than Q4/Q23 in terms of pure algo.
Beside the algorithm, there are other factor to make a problem hard. For me and my peers, coding speed and syntax are a real problem. So the longer my program, the harder it becomes. Some of the “medium” questions require longer solutions than these “hard” problems.
Logistics of instrumentation is another factor. Some problems are easy to set up and easy to debug, whereas 3D, graph or recursive problems are tedious to set up and often confusing when you try to debug with print’s.
There’s another factor that can make any “medium” problem really hard
In terms of innovation, simplicity, elegance, testability, instrumentation…. my design is more worthwhile than most other jobs
Still, it was not enough to impress boss. I still got PIP.
If you want to be architect, you need to pick some domains.
Compared to python.. c#.. cpp, Java appears to be the #1 best language overall for most enterprise applications.
Stroustrup was the first one to tell me c++ std::sort() can beat C qsort() easily.
— https://travisdowns.github.io/blog/2019/05/22/sorting.html says:
Since the qsort() code is compiled ahead of time and is found inside the shared libc binary, there is no chance that the comparator funciton, passed as a function pointer, can be inlined.
— https://martin-ueding.de/articles/qsort-vs-std-sort/index.html says
For qsort(), since the function is passed as a pointer, and the elements are passed as void pointers as well, it means that each comparison costs three indirections and a function call.
In C++, the std::sort
is a template algorithm, so that it can be compiled once for each type. The operator<
of the type is usually baked into the sort algorithm as well (inlining), reducing the cost of the comparison significantly.
Amount of tech learning and zbs growth over weekend coding assignments are different from coding drill or work projects
See https://www.karlrupp.net/en/computer/nat_tutorial.
What limitations are there? I only know a few
Here’s how we use NAT to translate destination port number in every incoming IP packet:
[sa@rtppeslo2 ~]$ sudo iptables -t nat -nL Chain PREROUTING (policy ACCEPT) target prot opt source destination DNAT udp -- 0.0.0.0/0 224.0.121.4 udp dpt:18020 to::48020 ....
Q: How much math in these systems? Not that much.
Q: Value-add? Questionable. That’s one reason why most financial institutions don’t spend billions building such systems. They do spend billions on traditional automation systems.
Q: Who would want to pay to use these systems? Rather Few.
Q: Python? Possibly.
case study: CSDoctor’s — value-add@analytics #CSDoctor
I agree that FICC products are more complex, even if we exclude derivatives
I see some complexities (mostly) specific to equities. Disclaimer — I have only a short few years of experience in this space. Some of the complexities here may not be complex in many systems but may be artificially, unnecessarily complex in one specific system. Your mileage may vary.
In derivatives, FICC etc.. risk usually refers to sensitivity to some observable factor like 10Y yield.
In delta-one context, risk often means unhedged exposure to a particular stock.
Q: TCP connection close .. handshakes?
Q3: why is new() so slow?
Q3b: what if I used array-new but then regular delete?
Q: implement Fib calculation at compile time
Q: write code to extract names and floating point numbers from a free-form string (no constraints)
Reason 1 — terminal value is known. It’s more than deterministic. It’s exactly $100 at maturity. Brownian Motion doesn’t match that.
Reason 2 — drift estimate is too hard too sensitive. A BM process has a drift value. U can be very careful very thorough to estimate it, but any minor change in the drift estimate would result in very large differences in the price evolution, if the bond’s lifespan is longer than 10Y.
–difference: by default,
–difference: members (including static) default to public in a struct
–difference: keyword q[class] is usable in template<class X>
–difference: q[struct] is a keyword in C and C has a few other syntax rules about q[struct]
STL is faster compared to
* alternative designs using virtual functions
* alternative designs using runtime logic instead of compile-time TMP
However, STL containers are slower than containers without on-the-fly allocation. To match those containers, you can pre-allocate enough capacity at start-up.
shared_ptr and STL containers, java autoboxing …
… are avoided in busy mkt data systems due to on-the-fly DAM allocations.
shared_ptr has extra DAM in terms of the ref counter!
consider pre-allocation to pre-empty “on-the-fly”
Many job seekers simply say “algorithm question” or “coding interview” as if they mean the same. I see significant differences between two types of coding test — compilable code vs pure algo.
Here’s one difference to start with — real-coding questions require us to be very familiar with syntax and very fast edit-compile-test cycles. I often struggle even when my idea was correct. I often run out of time. No such requirement in a white-board algo test.
I try to visualize a spectrum image like
<=== harder on language+implementation (blue side) … (red side) increasingly harder on pure algorithm ===>
I often perform better with pure algo questions than real-coding questions. My worst game is codility/hackerrank.
I find it imprecise to call all of them “algorithm questions” and disregard the key distinctions between the two types. Both types have some algorithm elements, but the real-coding questions often require additional skills such as quick completion, language efficiency, and code smells knowledge. A bug can often take us 10 minutes to locate — something rare in pure-algo questions.
Effective strategies to prepare for real-coding vs pure algo interviews are rather different. In this email I will only mention a few.
1) for speed, we need to practice and practice writing real code and solving short problems; you can try timing yourself.
2) for pure algo, we can use python; we can look at other people’s solutions; we still need to practice implementing the ideas
3) for white-board, we need to practice talking through a problem solution… need to practice asking clarifying questions and eyeball-testing
4) for weekend take-home tests, practice is not effective for me. I won’t prepare.
5) for multi-threading problem, need to practice with basic multi-threading problems.
https://vivekcek.wordpress.com/2013/03/17/simple-factory-vs-factory-method-vs-abstract-factory-by-example/ has nice block diagrams and examples.
I mostly used simple factory and not the other factory patterns.
Any time I have a method that creates and returns polymorphic objects from a family (like different Trades), I would name the entire class SomeFactory. I often add some states and additional features but as a class or struct, it is still a simple manufacturer.
Adapted from blog by Hayden James
Even when our average memory usage is smaller than RAM capacity, system still benefits from swap!
Most server processes are daemons. Any daemon can create lots of memory pages rarely accessed till shutdown. Kernel often decides to relocate rarely used memory pages to swap for performance reasons, mostly to free up RAM. The reclaimed RAM space can remain vacant for some time, so you may think the relocation is unnecessary but ..
But what if there’s no other process or hotter data? What if all the data can fit into RAM? This is rare but yes you can disable swap. Rarely needed tweaks are sometimes under-documented, as kernel is a very efficient “government” like Singapore.
Note, as explained in my [[linux kernel]], kswapd is both a process and a kernel thread that wakes up from time to time.
Q: Merge k sorted linked lists and return it as one sorted list. Analyze and describe its complexity.
Note K could be much larger than N.
https://github.com/tiger40490/repo1/blob/py1/py/linklist/merge4lists.py is my solution.
I feel this is mostly an optimization challenge. I can think of a few solutions
–Sol1: merge 2nd list into first. Then merge 3rd list into first …
https://leetcode.com/problems/merge-k-sorted-lists/solution/ shows that this has higher runtime cost than the brackets solution.
Reason is, each 2-merge-to-1 must visit every node in both lists. So the first list nodes get visited K times!
–Sol1b: brackets.
There are only (log K) levels in the bracket so any list gets visited that many times.
–Sol3: in-place (inefficient)
We maintain K node-pointers for the K lists (K teams)
We also maintain a pointer to the last-added node in the merged list.
first node in K lists are put into a min-heap. Winner (smallest) team would be the “current list”. Now the winner team offers next node and add it into the heap. Winning team ..
What if N=1 and K is 1 billion?
Background — expert test-takers and Math Olympians don’t create value. Grandpa is more than an expert in his domain — he created ground-breaking original research.
See also realistic+significant legacy of me as a developer@@
Generally, developers make more lasting impact than other roles. Developers are seen as value-creators. Here are my original contributions as a Wall St techie:
This is crucial knowledge for ECT coding test
See mystr[:-2] copy_truncate last 2 chars #mystr[2:]
See copy_reverse list/string/tuple by [::-1]
See python initialize list/dict : perf
Among the various list cloning solutions on https://stackoverflow.com/questions/2612802/how-to-clone-or-copy-a-list:
A client can ask a broker to buy “two IBM, one MSFT” either as a AA) custom basket or a BB) portfolio. The broker handles the two differently.
Only the Basket (not the portfolio) is “listed” on Bloomberg (but not on any exchanges). Client can see the pricing details in Bloomberg terminal, with a unique basket identifier.
Booking — the basket trade is recorded as a single indivisible position; whereas the portfolio trade gets booked as individual positions. Client can only sell the entire basket; whereas the portfolio client can sell individual component stocks.
Fees — There is only one brokerage fee for the basket, but 5 for a portfolio of 5 stocks.
The broker or investment advisor often has a “view” and advice on a given basket.
Corporate actions should be handled in the basket automatically.
I feel portfolio is more flexible, more informal than custom basket which is less formalized, less regulated than an index-tracking ETF.
Q1: how could you use enum to improve performance
AA: use it to replace strings
AA: enum is used in template programming to move computation from run time to compile time. See Q4 and SFINAE in github.
%%A: see clever use of enum(char?) in demanding c++ app
Q1b: you put in your change but it turns out to hurt performance. Any idea?
Q4: how do you compute Fib(N) like Fib(99999) at compile time using template programming?
A: See my Fibonacci code in github
Q: how would you use multicast to send a message to a group of registered users?
%%A: less data copying than TCP
Q: In O(1) space, Given an unsorted array of natural numbers, how do you find any pair to produce a target sum? What’s your best time complexity?
%%A: if I assume the integer size is 32-bit then a fixed-size O(1) space radix structure can sort the array in O(N). See also my blog post on [[locate a pair with targetSum=55 #bbg IV #Morris]]
%%A: if integer size is unlimited, then I think the best is O(NN)
Q: How is your muni bond pricing engine designed internally?
Q2b: what kind of pricing rules live in the cache?
Q2c: how many threads to react to a given event?
Q: how did you calculate FX fwd points?
%%A: use the interest rates in the two currencies and the spot FX rate.
Q: how is your implied volatility computed from an option price?
%%A: if there’s not a closed-form formula, then Newton’s method or something similar should be able to converge quickly.
Q3 (OO design): How would you design a game software with Animal objects, various birds, and fish?
Q3a: Say the system is running fine, but now you need to add ostrich class?
Integer(like String)objects always immutable in java. My google search confirmed that.
Beside serialization, there is no practical reason to deep-copy them.
Python integer objects are also immutable. Every time we modify an int, the new value has a different id(). See also my blog post on python immutable types.
See also https://medium.com/@meghamohan/mutable-and-immutable-side-of-python-c2145cf72747
Based on some prior experience, I feel I would not use linux as a desktop.
Just do the minimum to set up a sandbox and get the coding interview done. The GTD/zbs about linux desktop has zero market value. It doesn’t enhance my skills with “professional linux”, as proven over the years.
I had many problems installing and using linux desktops each time
This blog post is poorly structured, but let’s not spend too much cleaning it up. Not worth it. Most of the content is repeated elsewhere.
See also the benefits of coding drill in 4 benefits of coding practice #beatFront and Coding drill is anti-aging
Background: for many years until late 2017, I have focused on QQ, and neglected coding tests.
I’m reluctant to break down by “skill” but ..
Low “churn” — most coding questions have remained unchanged for 20 years.
I also realized that my c++ ECT/BP proficiency is different from my java ECT/BP proficiency.
I now feel the quant domain knowledge doesn’t change so fast, but ..
1) Quant domain is an elitist, exclusive sector with low market depth (highly specialized).
I think many intelligent/powerful developers can succeed as a quant dev, even without formal quant training, if motivated or interested enough. Abilities + effort (due to keen interest) is all I needed when I succeeded at Barcap
Not a semi-retirement domain. When you lose the mental power (Re A.Brooks) you better get out from this hot domain.
2) The high salary + analytical complexity + limelight attracts bright young people. Bright young people tend to be innovative, even when there’s no such necessity
3) see my blogpost ruthless march of technology
fixation@ROTI/payoff/success/result/accu … dampens job satisfaction+joy@learning.
This affects my “engagement”. Granted, we should not Ignore these ROTI factors, or those “smells” … instead we should evaluate our direction and take stock, but let’s not overdo it.
Consider a tour guide aiming for the tip at the end.
Consider Grandpa in his research career.
Consider a singer like 王杰 or the last few years of 邓丽君。
Consider Einstein’s violin
Q: has that increased your income or benchmark score? # more time in office, shorter commute, MSFM, c# ….
This question is harmful, misleading, derogatory, discriminatory, browbeating, pessimistic/fatalistic, myopic, … This question tosses aside many important things to our lives, our joys, and satisfaction —
I used to name unused parameters of myfunction as “dummy”.
Not always better than ‘dummy’ but at least this kind of compiler knowledge is useful in troubleshooting.
I don’t name the parameter “unused” as it is a confusing name.
https://stackoverflow.com/questions/2682745/how-do-i-create-a-constant-in-python basically says no way. I simply use
WORD_RECORD_OFFSET=100
Only routine python code is “transparent”.
Many python features make your code impenetrable/opaque as soon as you toss in a dose of it.
Not so sure about …
questioning the value of coding drill is extremely harmful. I need 1st-aid
If for a given key a lookup array (or map) returns a non-zero value, then decrement the value.
Here I use a single reference variable in both lookup-by-key and update-by-key.
if (auto & cnt = frq[ evicted ]){ --cnt; }
Not sure about java, but in c++, if you have a data member named stockTicker, you can’t have a member function named stockTicker().
In my HRT project, I have to name the function stockTicker_()