#define PACKED __attribute__ ((packed))
This is used in our parsers. https://stackoverflow.com/questions/11770451/what-is-the-meaning-of-attribute-packed-aligned4 explains that this is a compiler feature — a GCC/Clang attribute that removes the padding normally inserted between struct fields.
I tend to learn something (very indirectly) about competition, churn, focus, …
I feel my choices of java and c++ have been good, but I regret my investments in dotnet and pthreads.
I feel any dominant technology can get displaced. A few examples (not intended to be complete) — C/C++ (displaced by java), RDBMS, Solaris (displaced by Linux)
My tech bets for 2019-2020:
Still no definitive answers…
On one machine, q(ulimit -c) returned 0, meaning core files were suppressed. I had to run q(ulimit -c unlimited) to get my core files generated.
Suppose we have a class MoveOnlyStr which has only a move-ctor, no copy-ctor. Suppose we pass an unnamed temporary instance of this class into a function by value, like void func1(MoveOnlyStr arg_mos).
Q: Will the move-ctor be used to create the argument object arg_mos? We discussed this in your car last time we met up.
A: If the temp is produced by a function, then No. My test shows I was right to predict that the compiler optimizes away the temporary, due to RVO. So the move-ctor is NOT used. This RVO optimization has existed since long before c++11.
A: If the temp is not produced by a function, then RVO is irrelevant (nothing is “Returned”), but I don’t know if there’s still some copy-elision.
Context — real time high volume market data feed from exchanges.
[D=design question]
Many details I tend to forget:
myStr = "%(var1)d" % locals() # locals() returns a dict including var1
"#%03d" % (id(node)%1000) # returns the last 3 digits of the object id, with 0-padding
In the late 2010’s, Wall street java jobs were informally categorized into core-java vs J2EE. Nowadays “J2EE” is replaced by “full-stack” and “big-data”.
The typical core-java interview requirements have remained unchanged — collections, threading, JVM tuning, language details (including keywords, generics, overriding, reflection, serialization), … but relatively few add-on packages.
(With the notable exception of java collections) Those add-on packages are, by definition, not part of the “core” java language. The full-stack and big-data java jobs use plenty of add-on packages. It’s no surprise that these jobs pay on par with core-java jobs. More than 5 years ago J2EE jobs, too, used to pay on par with core-java jobs, and sometimes higher.
My long-standing preference for core-java rests on one observation — churn. The add-on packages tend to have a relatively short shelf-life. They become outdated and lose relevance. I remember some of the add-on packages that were popular in their day but have since faded.
None of them is absolutely necessary. I have seen many enterprise java systems using only one or two of these add-on packages.
https://stackoverflow.com/questions/6146963/when-is-del-useful-in-python
q(+=) is well supported for numbers, timedelta and … even STRING
q(++) is unsupported, so you would need
myInt += 1
1) Some hiring teams have an official policy — focus on coding skills and let candidate pick any language they like. I can see some interviewers are actually language-agnostic.
2) 99% of the hiring teams also have a timing expectation. Many candidates are deemed too slow. This is obvious in online coding tests. We all have failed those due to timing. (However, I guess on white-board python is not so much faster to code.)
If these two factors are important to a particular position, then python is likely better than c++ or java or c#.
If the real challenge lies in the algorithm, then solving it in any language is equally hard, but I feel timed coding tests are never that hard. A Facebook seminar presenter emphasized that, across tech companies, every single coding problem is solvable within the time limit.
Every char in a given long string is one of the 6 unique characters {}[](). Write a bool isStrictlyBalanced(string)
Note {[}] is not strictly balanced.
I wrote and tested a c++ stack/map solution within 30 minutes
/* requirement: given 99 sentences and 22 queries, check each query against
   all 99 sentences. All the words in the query must show up in the sentence
   to qualify. */
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
using namespace std;

// note: sen.find() is a substring match, so query word "ab" would
// also match inside "abc" — acceptable for this exercise
bool check(string& q, string& sen) {
  istringstream lineStream(q);
  string qword;
  while (getline(lineStream, qword, ' '))
    if (string::npos == sen.find(qword)) return false;
  return true;
}
void textQueries(vector<string> sentences, vector<string> queries) {
  for (string& qr : queries) {
    string output1query;
    for (size_t i = 0; i < sentences.size(); ++i) {
      if (check(qr, sentences[i])) output1query.append(to_string(i) + " ");
    }
    if (output1query.empty()) output1query = "-1";
    cout << output1query << endl;
  }
}
int main() {
  vector<string> sen = {"a b c d", "a b c", "b c d e"};
  vector<string> que = {"a b", "c d", "a e"};
  textQueries(sen, que);
}
git clean -n #dry-run
git ls-files -o #also works but harder to remember
I still don’t know how to list all files each with a status like uncommitted/untracked/committed
Many interviewers ask about MI.
Python uses “protocol” to mean something else.
[[EffC++]] P 201 has a succinct definition for a protocol class
See also the marketable_xp spreadsheet… I have consistently demonstrated strength in
1) TT: theoretical complexity: %%strength
2) LL: lowLevel IV topics (seldom needed in GTD) — threads, dStruct, vptr, language rules…
So how could TT/LL influence my 10Y career direction?
https://stackoverflow.com/questions/27728142/c11-what-is-its-gc-interface-and-how-to-implement
GC interface is partly designed to enable
The probe program listed in the URL shows that, as of 2019, all major compilers provide only trivial (essentially no-op) support for GC.
Q: why does c++ need GC, given RAII and smart pointers?
A: to provide system-managed automatic GC instead of manual deallocation, without relying on smart pointers
Let’s set the stage — you can pass in either a function name (like myFunc), a ptr-to-function (&myFunc), or a functor object. The functor is recommended even though it involves more typing. Justification — inlining. I remember a top c++ expert saying so.
I believe “myFunc” is implicitly converted to & myFunc by compiler, so these two forms are equivalent.
Reality warning — every time I try the SG job market, recruiters tell me there are so many “new” employers or new markets, but invariably I need to focus on the old guards .. mostly ibanks, as the new market is not open to me. This is similar to Deepak and Shanyou trying the Wall St c++ job market.
I must stop /romanticizing/ about the “improvement” in SG job market.
Basically no change in the landscape since 2015. The jobs available to me are mostly ibanks. Cherish the MLP job but beware attachment. If this job goes sour, I would have to consider WallSt, rather than another perm job in SG.
From ptr-to-const-char, you need an explicit cast to get a ptr-to-char, removing constness.
The reverse needs no cast — the language/compiler automatically converts a regular ptr-to-char to a ptr-to-const-char. Why? I think it’s relevant to consider the rationale behind
int i=5;
int const ci = i; // no cast needed.
How about smart pointers?
why do we have to define static field myStaticInt in a cpp file?
For a non-static field myInt, allocation happens when the class instance is allocated — on the stack, on the heap (via new), or in the global area.
However, myStaticInt isn’t taken care of then — it’s not part of the real estate of the new instance. That’s why we declare it in the class header, and then define it exactly once (ODR) in a cpp file. It gets static allocation — storage reserved once for the whole program, independent of any instance.
Let’s set the stage. A function returns a local Trade object “myTrade” by value. Will RVO kick in, or will move-semantics kick in? Not both!
I had lots of confusion about these 2 features. [[effModernC++]] P176 has a long discussion and a piece of advice — do not write std::move() hoping to “help” the compiler on a local object being returned from a function.
P23 [[c++stdLib]] gives a 2-line answer:
So if the conditions for RVO are present, then most likely your move-ctor will NOT run.
I hit a similar question in NY, possibly LiquidNet or CVA
Q: Make a system (perhaps a function?) that returns the average number of hits per minute over the past 5 minutes.
I will keep things simple by computing the total hits over the last 300 seconds. (Same complexity if you want the average order amount at Amazon over the last 5 minutes.)
Let’s first build a simple system before expanding it for capacity.
Let’s first design a ticking system that publishes an update every time there’s a new hit. The update can be displayed or broadcast like a “notice board”, or we can update a shared atomic&lt;int&gt;.
Whenever we get a new record (a hit), we save it in a data structure, stamped with an expiry datetime. At any time, we want to quickly find the earliest unexpired record — call it the blue record. There’s only one blue at any time.
What data structure? RingBuffer with enough capacity to hold the last 5 minutes worth of record.
I will keep the address of the current blue record, defined as the earliest unexpired record as of the last update. When a new record comes in, I check “has the blue expired?” If NO, then easy — this new record is too close to the last one, and I simply update my “notice board” in O(1). If YES, then we run a binary search for the new blue. Once we find it, we compute a new update in O(W), where W is the smaller of two counts: A) recently expired records, B) still-unexpired records. After the update, we remove the expired items from our data structure.
–That concludes my first design. Now what if we also need to update the notice board even when there is no new record?
I would need an alarm set to the expiry time of the current blue.
–Now what if the updates are too frequent? I can run a scheduled update job instead. I need to keep the address of a yellow record, defined as the newest record as of the last update.
When triggered, routine is familiar. I check “Is the blue expired?” If NO then easy… If YES then binary-search for the new blue.
This is a __key__ part of understanding move-semantics, seldom quizzed. Let’s set the stage:
Q1: When would the compiler select the rvr version?
P22 [[c++stdLib]] has a limited outline. Here’s my illustration
After we are clear on Q1, we can look at Q2
Q2: how would std::move help?
A: insert(std::move(originalAmount)); // if we know the object behind originalAmount is no longer needed.
https://github.com/tiger40490/repo1/blob/cpp1/cpp1/rvrDemo.cpp shows when we need to use std::move() and when we don’t need.
I feel move-ctor (and move-assignment) is extremely implicit and “in-the-fabric”. I don’t know of any user-written function with an rvr parameter — such functions are usually in some library. Consequently, in my projects I have not seen any user-level code that calls “std::move(…)”.
Let’s look at move ctor. “In the fabric” means it’s mostly rather implicit i.e. invisible. Most of the time move ctor is picked by compiler based on some rules, and I have basically no influence over it.
https://github.com/tiger40490/repo1/blob/cpp1/cpp1/rvrDemo.cpp shows when I need to call move() but it’s a contrived example — I have some object (holding a resource via heap pointer), I use it once then I don’t need it any more, so I “move” its resource into a container and abandon the crippled object.
Conclusion — as app developers I seldom write code using std::move.
I bet that most of the time when an app developer writes “move(…)”, she doesn’t know if the move ctor will actually get picked by compiler. Verification needed.
— P544 [[c++primer]] offers a “best practice” — Outside of class implementations (like big4++), use std::move only when you are certain that you need to do a move and it is guaranteed safe.
Basically, the author believes user code seldom needs std::move.
— Here’s one contrived example of app developer writing std::move:
string myStr=input;
vectorOfString.push_back(std::move(myStr)); //we promise to compiler we won’t use myStr any more.
Without std::move, a copy of myStr is constructed in the vector. I call this a contrived example because myStr exists only to be handed to the vector — real code would rarely create a named string just to move it.
The #1 usage of atomic<int> is load() and store(). I will use short form “load/store” or “l/s”.
The #2 usage is CAS. Interviewers are mostly interested in this usage, though I won’t bother to remember the function names —
* compare_exchange_strong()
* compare_exchange_weak()
The CAS usage is same as AtomicInteger.java, but the load/store usage is more like the thread-safety feature of Vector.java. To see the need for load/store, we need to realize the simple “int” type assignment is not atomic [1]:
To solve this problem, atomic&lt;int&gt; guarantees load() and store() are always atomic — falling back to an internal lock (conceptually like Vector.java) only where the hardware lacks native atomic instructions for the type.
[1] different from java. https://stackoverflow.com/questions/11459543/should-getters-and-setters-be-synchronized points out that 32-bit int in java is never “half-written”. If you read a shared mutable int in java, you can hit a stale value but never a c++ style half-written value. Therefore, java doesn’t need guarded load()/store() functions on an integer.
Q: are these c++ atomic types lock-free?
A: for load/store — not necessarily lock-free; query with is_lock_free(). See P 1013
A: for CAS — lock-free CPU instructions are used, if available.
As stated repeatedly, c++ is the biggest and most complicated language in industry use, at least in terms of syntax (tooManyVariations) and QQ topics. Well, I have impressed many expert interviewers with my core-c++ language insight.
That means I must have some expertise in c++ QQ topics. For my c++ zbs growth, see separate blog posts.
Note sockets, shared mem … are part of the c++ ecosystem, like OS libraries.
Deepak, Shanyou, Dilip .. are not necessarily stronger. They know some c++ sub-domains better, and I know some c++ sub-domains better, in both QQ and zbs.
–Now some of the topics to motivate myself to study
There’s a high risk of under-performing. In a perm job, that invites warning, perf improvement, bonus fear — all forms of stigma-phobias.
With contract jobs, I can operate without the fear of stigma!
Here, “under-performing” mostly refers to “figuring things out slower than team peers”, which usually, but not always, attracts those stigmas. Ultimately it’s the manager’s assessment.
Stirt/Quartz – For example, my figure-things-out speed was not slower than my peers and not slower than Barcap 2nd half, but still i got the stigma.
Citi — for an opposite example, my figure-things-out speed was rather slow but I didn’t get the stigma. I got renewed once.
Let’s set the stage — we have a stream of bytes in little-endian format, to be understood according to the spec. The struct is packed tight, without padding.
Spec says the left-most field is a char. It is always left-most, regardless of endianness. Looking at its 8 bits, they are normal — 0x41 is ‘A’. Within a single byte there is no reordering due to endianness.
Spec says the next four bytes are an integer. The most significant byte (suppose it holds the 2^31 bit) is at the right end, since the stream is little-endian. What’s the integer value? To work it out by hand, we pick the four bytes as-is — Byte1 Byte2 Byte3 Byte4 — then reverse them into Byte4 Byte3 Byte2 Byte1. Now the 32-bit integer is human-readable: an ordinary binary number as taught in classrooms.
Note the software program still uses the original “Byte1 Byte2 Byte3 Byte4” order and can print out the correct integer value.
Spec says the next four bytes are a float. There’s nothing I can do to work out its value without a computer, so I don’t bother to rearrange the bytes.
The next 2 bytes are a string like “XY”. The first byte is ‘X’. Endianness doesn’t bother us here.
vim by default would mess up the pasting, inserting // before every line after the first comment (comment auto-continuation combined with auto-indent).
:set paste
was able to disable this default behavior for me, but the setting has side effects, so I usually turn it off immediately afterwards:
:set nopaste
https://vim.fandom.com/wiki/Toggle_auto-indenting_for_code_paste has some details also has link to official documentation.
(shmget, shmat, shmdt, shmctl, …). (I want to keep this blog in recrec, not tanbinvest. I want to be brief yet incisive.)
See 3 ffree scenarios: cashflow figures. What capabilities enabled me to achieve my current Comfortable Economic profile?
When I say “Comfortable” I don’t mean “above-peers”, and not complete financial freedom, but rather … easily affordable lifestyle without the modern-day pressure to work hard and make a living. In my life there are still too many pressures to cope with, but I don’t need to work so damn hard trying to earn enough to make ends meet.
A higher salary or promotion is “extremely desirable” but not needed. I’m satisfied with what I have now.
I can basically retire comfortably.
Just like my early learning curves in sockets, Dynamic Programming and swing, I have yet to achieve a breakthrough in these topics. There are too many topics and I don’t know what to focus on.
It’s important not to exaggerate your expertise in these areas. Once interviewers find out the exaggeration, they would subconsciously discount other parts of your resume.
Q: create an exchange with messaging for NewOrderSingle, ExecutionReport etc. (I think interviewer means the matching server.)
https://www.ibm.com/developerworks/library/l-semaphore/index.html — i have not read it.
My [[beginning linux programming]] book also touches on the differences.
I feel this is less important than the sharedMem topic.
The counting semaphore is best known and easy to understand.
Linux manpage pointed out — System V semaphores (semget(2), semop(2), etc.) are an older semaphore API. POSIX semaphores provide a simpler, and better designed interface than System V semaphores; on the other hand POSIX semaphores are less widely available (especially on older systems) than System V semaphores.
The same manpage implies both APIs use a _counting_ semaphore semantic, without notification semantics
Personal best practice. Git history rewrite is not always a no-brainer, and not riskless — once in 100 times it turns nasty.
It should never affect file content, so at the end of the rewrite we diff against a before-image to confirm no change.
The dumbest (and most foolproof) before-image is a zip of entire directory but here’s a lighter alternative:
Note the branch name can be long but always explicit so I can delete it later without doubt.
http://www.boost.org/doc/libs/1_65_0/doc/html/interprocess/sharedmemorybetweenprocesses.html#interprocess.sharedmemorybetweenprocesses.sharedmemory.shared_memory_steps is excellent summary
* We (the app developer) need to pick a unique name for the shared memory region, managed by the kernel.
* we can use create_only, open_only or open_or_create
* When we link (or “attach” in sysV lingo) App1’s memory space to the shared memory region, the operating system looks for a big enough address range in App1’s address space and marks that range as special. Changes in that address range are automatically seen by App2, which has mapped the same shared memory object.
* As shared memory has kernel or filesystem persistence, we must explicitly destroy it.
Above is the posix mode. The sysV mode is somewhat different.
Hi Kam,
Thanks for your valuable tip. I found this 3-point summary on https://stackoverflow.com/questions/15176104/c11-range-based-loop-get-item-by-value-or-reference-to-const
1) Choose for(auto x : myVector) when you want to work with copies.
2) Choose for(auto &x : myVector) when you want to work with original items and may modify them.
3) Choose for(auto const &x : myVector) when you want to work with original items and will not modify them.
In the case of myMap, the first form for(auto myPair: myMap) still clones the pairs from myMap. To use a reference instead of a clone, we need the 2nd or 3rd forms.
In a timed coding test, I think it saves precious time to do
if some_condition: print(some_result); sys.exit() # can use in any function; needs import sys :)
If the pre-exit routine is needed at several exit-points in the program flow, then extract the routine as a function that ends in sys.exit()
Note: the built-in exit() (an interactive-session helper) is completely irrelevant here.
https://en.wikipedia.org/wiki/Percentile#The_nearest-rank_method has a few concise pointers
c++ interviews value deep insight more than interviews in any other language. Java and c# interviews also value it highly, but python interviews do not.
Reminder — zoom in and dig deep in c++, java and c# only. Don’t do that in python too much.
Instead of deep insight, accumulate ECT syntax … highly valued in TIMED coding tests.
Use brief blog posts with catchy titles
Latest: https://github.com/tiger40490/repo1/blob/cpp1/cpp1/binTree/DFT_show_level.cpp

#include <iostream>
using namespace std;

struct Node {
  int data;
  Node *left, *right, *next;
  Node(int x, Node *le = NULL, Node *ri = NULL)
      : data(x), left(le), right(ri), next(NULL) {}
};
Node _15(15); Node _14(14); Node _13(13); Node _12(12);
Node _11(11); Node _10(10); Node _9(9); Node _8(8);
Node _7(7, &_14, &_15); Node _6(6, NULL, &_13);
Node _5(5, &_10, NULL); Node _4(4, NULL, &_9);
Node _3(3, &_6, &_7); Node _2(2, &_4, &_5);
Node root(1, &_2, &_3);

int maxD = 0;
void recur(Node *n) {
  static int lvl = 0;
  ++lvl;
  if (lvl > maxD) maxD = lvl;
  if (n->left) recur(n->left);
  cout << n->data << " processed at level = " << lvl << endl;
  if (n->right) recur(n->right);
  --lvl;
}
int maxDepth() {
  recur(&root);
  cout << maxD;
  return maxD; // return was missing in the original
}
int main() { maxDepth(); }
example — RTS exchange feed dissemination infrastructure uses raw TCP and UDP sockets and no MOM
example — the biggest sell-side equity OMS network uses MOM only for minor things (eg?). No MOM for market data. No MOM carrying FIX order messages. Between OMS nodes on the network, FIX over TCP is used
I read and recorded the same technique in 2009… in this blog
Q: why is this technique not used on west coast or main street ?
%%A: I feel on west coast throughput outweighs latency. MOM enhances throughput.
I have read about fork() many times without knowing these details, until a Trex interviewer asked!
–based on http://man7.org/linux/man-pages/man2/fork.2.html
The child process is created with a single thread — the one that called fork(). The entire virtual address space of the parent is replicated in the new process, including the states of pthread mutexes, pthread condition variables, and other pthreads objects. In particular, if in the parent a lock was held by some other thread t2, then the child has only the main thread (the one that called fork()) and no t2, yet the lock is still unavailable. This is a common problem, addressed in http://poincare.matf.bg.ac.rs/~ivana/courses/ps/sistemi_knjige/pomocno/apue/APUE/0201433079/ch12lev1sec9.html.
The very 1st instruction executed in Child is the instruction after fork() — as proven in https://github.com/tiger40490/repo1/blob/cpp1/cpp1/fork3times.cpp
The child inherits copies of the parent’s set of open file descriptors, including stdin/stdout/stderr. The child process should usually close the descriptors it doesn’t need.
Special case — socket file descriptor inherited. See https://bintanvictor.wordpress.com/2017/04/29/socket-shared-between-2-processes/
https://github.com/tiger40490/repo1/blob/cpp1/cpp/thr/pthreadCondVar.cpp shows my experiment using gdb supplied by StrawberryPerl.
On this g++/gdb set-up, “info threads” shows thread id number 1 for main thread, “2” for the thread whose pthread_self() == 2 … matching 🙂
The same “info-threads” output also shows
stringstream ss; ss << "before processMessage(), at " << __FILE__ << ":" << __LINE__;
Hi friends,
I recently used multicast for a while and I see it as yet another example of the same pattern — technical interviewers care about deep theoretical knowledge not practical skills.
Many new developers don’t know multicast protocol uses special IP addresses. This is practical knowledge required on my job, but not asked by interviewers.
Unlike TCP, there’s not a “server” or a “client” in a multicast set-up. This is practical knowledge in my project but not asked by interviewers.
When I receive no data from a multicast channel, it’s not obvious whether nobody is sending or I have no connectivity. (In contrast, with TCP, you get connection error if there’s no connectivity. See tcp: detect wire unplugged.) This is practical knowledge, but never asked by interviewers.
I never receive a partial message by multicast, but I always receive partial message by TCP when the message is a huge file. This is reality in my project, but never asked by any interviewer.
So what do interviewers focus on?
So what can we do? Study beyond what’s needed in the project. (The practical skills used are only 10% of the interview requirements.) Otherwise, even after 2 years using multicast in a project, I would still look like a novice to an interviewer.
Without the job interviews, it’s hard to know what theoretical details are required. I feel a multicast project is a valuable starting point to get me started. I can truthfully mention multicast in my resume. Then I need to attend interviews and study the theoretical topics.
>>> li=[[-2]] # simplest
>>> tu=((-2,),) # more keyboard work
>>> len(li[0])
1
https://github.com/tiger40490/repo1/blob/py1/py/2d/printDiagonally.py uses similar list syntax to populate a bigger matrix
After reading http://valgrind.org/docs/manual/ms-manual.html#ms-manual.not-measured, I was able to get massif to capture non-heap memory:
valgrind --tool=massif --pages-as-heap=yes --massif-out-file=$massifOut .../xtap -c ....
ms_print $massifOut
Heap allocation functions such as malloc are built on top of system calls such as mmap, mremap, and brk. For example, when needed, an allocator will typically call mmap to allocate a large chunk of memory, and then hand over pieces of that memory chunk to the client program in response to calls to malloc et al. Massif directly measures only these higher-level malloc et al calls, not the lower-level system calls.
Furthermore, a client program may use these lower-level system calls directly to allocate memory. By default, Massif does not measure these. Nor does it measure the size of code, data and BSS segments. Therefore, the numbers reported by Massif may be significantly smaller than those reported by tools such as top that measure a program’s total size in memory.
This classification helps me organize my java learning, but let’s not spend too much time on this imprecise concept —
So-called “java ecosystem” is anything outside the “core java” stack and include jxee plus ..
Q: Three players A/B/C flip a fair coin one after another until the first head is thrown. What’s the probability that Alice (player A, who flips first) wins?
I think the problem is the same if coin is biased P(H)=0.6
Denote Pr(Alice eventually wins) as x.
Pr(first 3 are TTT AND Alice eventually wins) = 1/8 * x
x = 1/2 + 1/8 * x —> x=4/7
Background: Suppose in a big python application your main script imports a few packages and modules. One of them is mod2.py, which in turn imports mod2a.py.
Now you need to add investigative logging/instrumentation to mod2a.py, but this file is loaded from a read-only firm-wide repository — common practice in big teams. Here’s my tested technique:
1. clone mod2a.py to your home dir and add the logging. Now we need to import this modified version.
2. clone mod2.py to your home dir and open it to locate the importation of mod2a
3. edit mod2.py to change the importation of mod2a:
sys.path.insert(0, '/home/dir')
import mod2a # via /home/dir
sys.path.remove('/home/dir')
All other imports should be unaffected.
4. edit main.py and update the importation of mod2.py to load it too from /home/dir
As an additional hack, some people may rename the modified mod2a.py file. This is doable IFF the import line is
from mod2a import someSymbol
Otherwise, every mention of “mod2a” in mod2.py needs a change
This is about shell interpreting the backslash sequence inside single-quote or double-quote.
Once bash does its parsing, it can pass the result to a command like perl or grep.
----Most escape sequences don't care about single-quote vs double-quote
$ echo "msgType\t"
msgType\t
$ echo "msgType\b"
msgType\b # \b is meaningful in perl regex 🙂
----double backslash -- single-quote is simpler than double-quote
$ echo 'msgType\\'
msgType\\
$ echo "msgType\\"
msgType\
----single quote within single-quoted string is very tricky:
$ echo 'msgType\'\'
msgType\' # in the above, the last \' is a second token, a single-char string.
$ echo $'msgType\'' # dollar sign is crucial
msgType'
$ echo 'msgType\'' # somehow doesn't work without $
>
I feel MSA is more of an architect-interview topic than a developer-interview topic. Dev complexity is low by design.
eg: an error-account lookup service, receiving a productId + possibly a clientId, and returning an error account
Now the phrasebook:
show your best practice in coding tests!
1) Opening example — Suppose a constant SSN=123456789 is used in a1.cpp only. It is therefore a “local constant” and should be kept in a1.cpp not some .H file. Reason?
The .H file may get included in some new .cpp file in the future. We would then end up with multiple .cpp files dependent (at compile-time) on this .H file. Any change to the value or name of the SSN constant would require recompilation of not only a1.cpp but, unnecessarily, the other .cpp files 😦
2) #define and #include directives — should be kept in a1.cpp as much as possible, not .H files. This way, any change to the directives would only require recompiling a1.cpp.
The pimpl idiom and forward-declaration use similar techniques to speed up recompile.
3) documentation comments — some of this documentation is subject to frequent change. If put in the .H file, any comment change would trigger recompilation of multiple .cpp files
I would say “avoid” or “eliminate” rather than “minimize” byte copying. Market data volume is gigabytes, so we want, and can, design solutions that completely eliminate byte copying.
Survival tip — Alt-t gets to Settings, if you need to unhide …
Motivation — most monitors are too “thin”, so the bars take up vertical space.
http://www.stroustrup.com/C++11FAQ.html#std-forward_list highlights the extreme space-efficiency
I’m not sure of the use cases, but I suspect it is the choice in extremely space-efficient designs where arrays won’t work. Geek4geek mentions two use cases:
Java9/10 default GC is G1. CMS is officially deprecated in Java 9.
Java 7/8 default GC is ParallelGC (CMS is available but not the default). See https://stackoverflow.com/questions/33206313/default-garbage-collector-for-java-8
Note parallelGC parallelizes collection of the young generation only, whereas parallelOldGC uses parallel collection in all generations.
Q: why is CMS deprecated?
A: one blogger seems to know the news well. He said the JVM engineering team needs to focus on new GC engines and let go of the most high-maintenance but outdated codebase — CMS. As a result, new development will cease on CMS, but the CMS engine is likely to remain available for a long time.
I feel ICE worked, because I quickly became confident with C++ GTD.
jxee (esp. web java) is fashionable … high growth, big job pool
Q: Is there some jxee component with stable demand and accu? Spring? Servlets were very relevant from 1999 to 2019 but are not quizzed in IV!
Some developers are afraid of the unique challenges [1] in core java, but I’m more afraid of complexities in jxee packages esp. when combined in non-standard combinations. See my blogpost on python routine tasks and my blogpost on spring.
[1] threading, latency, collections .. but I don’t want to elaborate here.
Without enough evidence, I feel jxee skills are less elite, easy to self-study, and shallow until you hit project issues.
core java is mostly limited to ibanks + buy-side. jxee presumably offers better market depth and breadth. Without enough evidence, I feel job pool is growing for jxee not for core java or cpp. Similarly, job pool is growing for javascript, mobile, big data, cloud..
No need to experiment at home or read books like I did on JMS, EJB, Spring. It takes too much time but doesn’t really give me …
Latency knowledge
In terms of sorting performance, Arrays.sort(primitiveArray) is a few times faster than Collections.sort() even though both are O(N logN). My learning notes:
The RandomAccess marker interface (which ArrayList implements) is completely irrelevant here. That’s because any List.java subtype providing random access can simply override (at source-code level) the default sort method, as demonstrated in ArrayList.java. This is cleaner than checking RandomAccess at runtime. Either design could potentially be JIT-compiled to remove the runtime check.
One of the most devastating damaged-goods experiences was the layoff at baml.
However, if I had done better at codility I would have transferred, and I would not have felt like damaged goods!
Of course there are multiple contributing factors to "damaged goods", but in this case, codility is one.
[1] https://stackoverflow.com/questions/8111677/what-is-argument-dependent-lookup-aka-adl-or-koenig-lookup has the best quickguide with simple examples.
https://softwareengineering.stackexchange.com/questions/274306/free-standing-functions-in-global-namespace is a short, readable discussion.
My annotations on the formal, long-winded definition — ADL governs look-up of unqualified function names in function-call expressions (including operator calls). These function names are looked up in the namespaces of their arguments, in addition to the usual scope-based lookup.
https://github.com/tiger40490/repo1/blob/py1/py/rehash_table.py
Showcase: simple linked-list Node class with a __str__()
Showcase : populate a python list with None’s
git config --global credential.helper wincred
$ git config credential.helper store
$ git push
Username for 'https://github.com': <USERNAME>
Password for 'https://USERNAME@github.com': <PASSWORD>
git config --global user.email {ID}+{username}@users.noreply.github.com
You can find it in https://github.com/settings/emails
First learn the keyboard accelerator to select current line.
Remember shift-HOME selects back till beginning; shift-END selects till the end.
HOME then shift-END
END then shift-HOME
See also post on extern…
These rules are mostly based on [[c++primer]], about static Field, not local statics or file-scope static variables.
Rule 1 (the “Once” rule) — init must appear AND execute exactly once for each static field.
In my Ticker Plant xtap experience, the static field definition crucially sets aside storage for the static field. The initial value is often a dummy value.
Corollary: avoid doing init in header files, which is often included multiple times. See exception below.
Rule 2 (the “Twice” rule) — static field Must (See exception below) be DECLARED in the class definition block, and also DEFINED outside. Therefore, the same variable is “specified” exactly twice [1]. However, the compiler would “see” the declaration multiple times if the header is included in multiple places.
Corollary: always illegal to init a static field at both declaration and definition.
[1] Note ‘static’ keyword should be at declaration not definition. Ditto for static methods. See P117 [[essential c++]]
The Exception — static integer constant Fields are special, and can be initialized in 2 ways
* at declaration. You don’t define it again.
* at definition, outside the class. In this case, declaration would NOT initialize — Rule 1
The exception is specifically for static integer constant field:
Rule 3: For all other static fields, init MUST be at-definition, outside the class body.
Therefore, it’s simpler to follow Rule 3 for all static fields including integer constants, though other people’s code is beyond my control.
——Here’s an email I sent about the Exception —–
It turned out these are namespace variables, not member variables.
Re: removing “const” from static member variables like EXCHANGE_ID_L1
Hi Dilip,
I believe you need to define such a variable in the *.C file as soon as you remove the “const” keyword.
I just read online that “integer const” static member variables are special — they can be initialized at declaration, in the header file. All other static member variables must be declared in header and then defined in the *.C file.
Since you will overwrite those EXCHANGE_ID_* member variables, they are no longer const, and they need to be defined in Parser.C.
Background — template function std::swap(T&, T&) works for int, float etc, but the same implementation will not work efficiently for vector, list, map or set. Therefore I suspected there might be specializations of swap() template function.
As it turns out, vector (and the other containers) provides a swap() member function. So the implementation of vector swap is indeed different from std::swap().
Very few JDK containers implement the RandomAccess marker interface. I only know ArrayList.java, Vector.java and its subclass Stack.java. A raw array isn’t a collection, so it doesn’t.
Only List.java subtypes can implement RandomAccess. Javadoc says
“The primary purpose of this interface is to allow generic algorithms to alter their behavior when applied to either random or sequential access lists.”
Q: which “generic algos” actually check RandomAccess?
AA: Collections.binarySearch() in https://docs.oracle.com/javase/7/docs/api/java/util/Collections.html
AA: to my surprise, Collections.sort() does NOT care about RandomAccess, so ArrayList sorting is no different from LinkedList sorting! See my blogpost Arrays.sort(primitiveArray) beat List.sort()
http://etutorials.org/Programming/Java+performance+tuning/Chapter+11.+Appropriate+Data+Structures+and+Algorithms/11.6+The+RandomAccess+Interface/ has more details
https://stackoverflow.com/questions/1950878/c-for-loop-indexing-is-forward-indexing-faster-in-new-cpus top answer is concise. I think the observations may not be relevant in x years but the principles are.
Note L1/L2/L3 caches are considered part of the CPU even if some of them are physically outside the microprocessor.
In our discussions on ODR, global variables, file-scope static variables, global functions … the concept of “shared header” is often misunderstood. A header included by exactly one implementation file is a “private header”, effectively part of that file, not a shared header.
Therefore, you may experiment by putting “wrong” things in such a private header and the set-up may work or fail, but it’s an invalid test. Your test is basically putting those “wrong” things in an implementation file!
“Most C library calls (such as I/O and memory allocation functions) perform thread synchronization underneath.” according to [[Java Native Interface]].
I guess memory allocation by default uses a process-wide shared heap, rather than thread-specific heap.
Outside Morgan, I only know (from reliable sources) that some machine learning teams use scala in addition to python and java.
I now think the simplest is xrange() followed by random.shuffle. Don’t bother with random.sample(). See my github code.
https://stackoverflow.com/questions/22842289/generate-n-unique-random-numbers-within-a-range shows
https://github.com/tiger40490/repo1/blob/py1/py/array/qsort.py uses
random.sample(xrange(-99, 100), 19)
In general, the construct below is useful because random.sample() requires that sampleSize never exceed the size of the range i.e. the population:
random.sample(xrange(-99, -99+sampleSize), sampleSize)
I think the generated list has no duplicates, so I had to manually create some duplicates. I guess random.shuffle can work on a list containing duplicates…
–to generate an array of 8 integers between 0 and 100
>>> import random
>>> random.sample(xrange(100), 8)
[39, 53, 1, 80, 54, 61, 4, 26]
Ideally, I want to get a job role slightly lower than the highest salary, where my 80% effort can start to exceed the expectation of THE appraiser.
Grandpa said “At 100% if you don’t hit their requirement, then you don’t need to put in 120% and sacrifice family. It’s their hiring mistake. Their problem. if they don’t pay a compensation package then just leave.” Am I afraid of job change? See separate blogpost.
Looking at past jobs, the numbers below are very imprecise and subjective.
[G/g=Greenfield]
[B/b=brownfield]
There are 52 x 2 = 104 weekend days plus 9 public holidays, a total of 113 non-billable days.
If there are 2 furlough days then the total is 115 non-billable days, leaving 250 billable days i.e. 2000 billable hours at 8 hours/day.
In my past experience, furlough was rare, but I usually take 15 vacation days each year.