##some of the memorable technology churns #tanko

Tanko

Here’s my expanded list of “worst” tech domains in terms of technology churn. Nothing but personal bias. For every IT professional, it’s his or her personal responsibility to identify these domains, and perhaps avoid investing (too much) into them.

  • —- not ranked
  • java generics QQ knowledge is a fad compared to c++ TMP
  • Object-oriented perl
  • javascript toolkits
  • GWT
  • silverlight
  • Gemfire, Coherence …
  • ADO.net
  • EJB, Weblogic
  • struts, spring integration
  • functional programming
  • Windowing GUI technologies – X-windows, PowerBuilder, Delphi, Borland c++, …
  • perl — is slowly being displaced by python, though bash scripting is robust
  • Javascript libraries like node.js, angular, jquery, GWT
  • ORM — product proliferation
  • MOM products — products proliferation like tibco, solace, 29west, Tervela, zeroc …
  • datagrid and noSQL — products proliferation
  • high-level integration
    • SOA, microservices?
    • web services, REST
    • EJB, RMI, RPC
    • JMS
  • anything to do with big data —
    • Map reduce? I hope Hadoop remains the standard
    • cloud?
    • machine learning — product proliferation
  • Web app development in general —
    • java web development including struts
    • Microsoft web development
    • PHP? I hope this is a bit more stable, but there are definitely new packages gaining popularity
  • anything on Windows
    • Powershell seems to challenge vbscript.
    • Windows administration – there seem to be many new utilities added every 5 years, replacing the old
  • anything on mobile
    • WAP
    • SMS based apps — used to be so popular in zed’s heydays
    • WindowsPhone, Symbian

–churn-resistant, robust technologies

  • C/C++
  • C++ key libraries — STL, boost
  • socket, tcp/udp
  • Unix admin (relative to Windows admin) and scripting
  • core java i.e. at the core layer
  • SQL complex queries
  • DBA
  • Messaging architecture?
  • FIX
  • async architecture
  • http

unique_ptr and move()

Looking at my experiment moveOnlyType_pbvalue.cpp, I now believe we probably need to call std::move() frequently when passing around named instances of unique_ptr.

A unique_ptr is never really copied; what looks like copying is actually moving. There are different ways to code it.

  • sometimes you need to use someFunc(move(myUniquePtr));
  • sometimes you can omit move() and the semantics remain the same.

http://stackoverflow.com/questions/9827183/why-am-i-allowed-to-copy-unique-ptr has some examples. Note none of the functions have a parameter/return type showing “&&”. That’s because pass-by-value (pbclone) is in play: the by-value parameter or return object is built by the move-constructor, which does have a && parameter.
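A minimal sketch of the two situations above. This is my own illustration, not the code from the StackOverflow thread; `consume()`, `makeBox()` and `demoUniquePtrMoves()` are made-up names.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Takes ownership by value (pbclone): the parameter is built by the
// move-constructor of std::unique_ptr, never by a copy.
int consume(std::unique_ptr<int> p) {
    return p ? *p : -1;
}

// Returning a local unique_ptr by value needs no std::move():
// the returned local is treated as an rvalue (or the move is elided).
std::unique_ptr<int> makeBox(int v) {
    std::unique_ptr<int> local(new int(v));
    return local;
}

// Hypothetical demo: returns true if every expectation holds.
bool demoUniquePtrMoves() {
    std::unique_ptr<int> named = makeBox(42);  // temporary: move() can be omitted
    // consume(named);                         // would not compile: copy-ctor is deleted
    int a = consume(std::move(named));         // named instance: move() is required
    assert(named == nullptr);                  // a moved-from unique_ptr is empty
    int b = consume(makeBox(7));               // temporary again: no move()
    return a == 42 && b == 7;
}
```

The rule of thumb I take from this: a named unique_ptr needs move(); a temporary or a returned local does not.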

I think some developers simply copy sample working code, without understanding why, like an ape. Some use threading constructs the same way. Nothing shameful about that. I feel interviewers are interested in your understanding.

##10 c++coding habits to optimize perf

Many of these suggestions are based on [[optimized c++]]

· #1 Habit – in c++ at least, ++counter performance is strictly “better or equal to” counter++. If there’s no compelling reason, I would prefer the former.

· #2 Habit – in a for loop, one of the beginning and ending values is more expensive to evaluate. Choose the more expensive one as the beginning value, so you don’t evaluate it over and over. Some people object that compiler can cache the more expensive end value, but 2016 tests show otherwise.

· If a method can be static, then make it static. Good for performance and semantics.

· For a small if-else-if block, put the most likely scenario first. May affect readability. Worthwhile only in a hot spot.

· For a long if-elif-elif-elif-elif block, a switch statement performs at least as well (“greater or equal”).

· For-loop starts by checking the condition (2nd component in header). If this initial check is redundant (as often is), then use a do-while loop

· Call a loop in a function, rather than call a function in a loop. Another micro-optimization.
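A few of the habits above, sketched in one place. This is only an illustration: `expensiveCount()` is a hypothetical stand-in for a costly bound, and any real gain should be measured in the actual hot spot.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a loop bound that is costly to compute.
std::size_t expensiveCount(const std::vector<int>& v) { return v.size(); }

// Habit #2: the expensive bound is the start value, evaluated exactly once;
// the loop counts down to the cheap constant 0.
long sumCountingDown(const std::vector<int>& v) {
    long total = 0;
    for (std::size_t i = expensiveCount(v); i-- > 0; ) {
        total += v[i];
    }
    return total;
}

// "Loop in a function": one call handles the whole batch, instead of the
// caller invoking a per-element function inside its own loop.
void scaleAll(std::vector<int>& v, int factor) {
    // end() is captured once; ++it is the pre-increment of habit #1.
    for (auto it = v.begin(), end = v.end(); it != end; ++it) {
        *it *= factor;
    }
}
```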

SO_REUSEPORT TCP server socket option – hungry chicks

With SO_REUSEPORT option, multiple TCP server processes could bind() to the same server endpoint. Designed for the busiest multithreaded servers.

http://i.dailymail.co.uk/i/pix/2011/03/03/article-1362552-0D7319F3000005DC-882_634x357.jpg – a bunch of hungry chicks competing to get the next worm the mother delivers. The mother can only give the worm to one chick at a time. SO_REUSEPORT option sets up a chick family. When an incoming connection arrives, the kernel picks one of the listening sockets (one chick) and hands the connection to it alone.

See https://lwn.net/Articles/542629/  + my socket book P102.
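A minimal sketch of a two-chick family, assuming Linux 3.9+ (or another platform that defines SO_REUSEPORT). The helper names are mine, not from the LWN article or the book. For simplicity both sockets live in one process; in real use each process or thread would create its own.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Creates a TCP listening socket on the wildcard address with SO_REUSEPORT
// set before bind(). port == 0 lets the kernel choose. Returns the fd or -1.
int listenWithReusePort(unsigned short port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    int on = 1;
    sockaddr_in addr = {};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &on, sizeof on) != 0 ||
        bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) != 0 ||
        listen(fd, 16) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// Reports the local port a socket is bound to, or 0 on error.
unsigned short boundPort(int fd) {
    sockaddr_in addr = {};
    socklen_t len = sizeof addr;
    if (getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) != 0) return 0;
    return ntohs(addr.sin_port);
}

// Two chicks on one endpoint: true if both sockets listen on the same port.
bool twoListenersShareOnePort() {
    int first = listenWithReusePort(0);
    if (first < 0) return false;
    int second = listenWithReusePort(boundPort(first));
    bool shared = second >= 0 && boundPort(second) == boundPort(first);
    if (second >= 0) close(second);
    close(first);
    return shared;
}
```

Without the option on both sockets, I would expect the second bind() to fail with EADDRINUSE.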

TCP server socket lingering briefly af host process exits

[[tcp/ip sockets in C]] P159 points out that after a host process exits, the socket enters the TIME_WAIT state for some time, visible in netstat.

Problem is, the socket still binds to some address:port, so if a new socket were to attempt bind() to the same address:port, it might fail. The exact rule is possibly more complicated but it does happen.

The book mentions 2 solutions:

  1. wait for the dying socket to exit TIME_WAIT. After I kill the process, I have seen this lingering for about a minute then disappearing.
  2. new socket to specify SO_REUSEADDR.

SO_REUSEADDR comes with some simple rules: the new socket must be distinct from the existing socket in at least one of the 4 fields. Otherwise the selection rule in this post would have been buggy.
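Solution 2 can be sketched as below. `listenWithReuseAddr()` is a hypothetical helper name; the only essential point is that setsockopt() is called before bind().

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Creates a TCP listening socket that sets SO_REUSEADDR before bind(), so a
// restarted server is not blocked by an old socket lingering in TIME_WAIT.
// port == 0 lets the kernel choose. Returns the fd, or -1 on failure.
int listenWithReuseAddr(unsigned short port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on) != 0) {
        close(fd);
        return -1;
    }
    sockaddr_in addr = {};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) != 0 ||
        listen(fd, 16) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```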

(server)promiscuous socket^connected socket

[[tcp/ip sockets in C]] P100 has a diagram showing that an incoming packet will be matched against multiple candidate listening sockets:

  • format: {local address:local port / remote address:remote port}
  • Socket 0: { *:99/*:*}
  • Socket 1: {10.1.2.3:99/*:*}
  • Socket 2: {192.168.3.2:99/ 172.1.2.3:30001} (this one has the remote address:port populated because it’s an Established connection)

An incoming packet needs to match all fields of a socket (a wildcard matches anything); otherwise it’s rejected by that socket.

However it could find multiple candidate sockets. Socket 0 is very “promiscuous”. The rule (described in the book) is — the more wild cards, the less likely selected.

(Each packet must be delivered to at most 1 socket as far as I know.)
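The selection rule can be sketched as a toy model. This is only my reading of the rule (fewest wildcards wins), not how any kernel implements the lookup; `SocketKey` and `selectSocket()` are made-up names, and "*" marks a wildcard field.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One socket's address tuple; "*" stands for a wildcard field.
// For a packet, local = destination and remote = source.
struct SocketKey {
    std::string localAddr;
    std::string localPort;
    std::string remoteAddr;
    std::string remotePort;
};

// A field matches either exactly or through a wildcard.
static bool fieldMatches(const std::string& pattern, const std::string& actual) {
    return pattern == "*" || pattern == actual;
}

static int wildcardCount(const SocketKey& s) {
    return (s.localAddr == "*") + (s.localPort == "*") +
           (s.remoteAddr == "*") + (s.remotePort == "*");
}

// Returns the index of the matching socket with the fewest wildcards,
// or -1 if no socket matches all four fields.
int selectSocket(const std::vector<SocketKey>& sockets, const SocketKey& packet) {
    int best = -1;
    int bestWildcards = 5;  // more than any socket can have
    for (std::size_t i = 0; i < sockets.size(); ++i) {
        const SocketKey& s = sockets[i];
        bool matches = fieldMatches(s.localAddr, packet.localAddr) &&
                       fieldMatches(s.localPort, packet.localPort) &&
                       fieldMatches(s.remoteAddr, packet.remoteAddr) &&
                       fieldMatches(s.remotePort, packet.remotePort);
        if (matches && wildcardCount(s) < bestWildcards) {
            best = static_cast<int>(i);
            bestWildcards = wildcardCount(s);
        }
    }
    return best;
}
```

With the three sockets listed above, a packet from 172.1.2.3:30001 to 192.168.3.2:99 goes to Socket 2, while a packet from any other peer to 10.1.2.3:99 goes to Socket 1 rather than the promiscuous Socket 0.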

IV^CV is real battle

(Adapted from a Mar 2017 letter to Lisa Wang… )Let me share my observations and reflections on this tough job hunt. Another stock-taking. Focus here is non-finance jobs in the U.S.

For months I used a slightly tweaked CV for non-banking (“main street”) tech positions, but it’s not working. Out of the 30 to 40 non-finance positions I applied to, precious few (15%??) recruiters were interested. Suppose 5 recruiters showed interest; I guess not all of them submitted my resume. Suppose 4 did submit. So far, no hiring manager was impressed with my non-finance CV. (Responses from financial firms are better but not my focus today.)

So different from my prime time (from 2010 to 2012) when my finance-oriented resume was selling like hot cakes. I would estimate more than 50% of the recruiters were impressed and many hiring managers showed interest.

Of course, I’m comparing my “main street” resume against my Wall-St resume. Not a fair comparison but it does highlight these key issues:

Recruiter engagement is the #1 issue and hiring manager engagement is #2 issue. Interview competence is a distant #3 and not a key issue. Many people disagree — “you need no more than one successful interview.” They believe a 50-80% interview success rate is the silver bullet needed. Well, how long must you wait before you fire your silver bullet?

I would feel much better if my interview pass rate were only 20% (or 10%) but I got 5 times more interviews! I learned from experience that my interview performance improvement is limited without sufficient interviews. So it’s far more effective and strategic to work on getting more interviews. I don’t want to be one of those guys who need 6 months to find a job. I see them starved of oxygen. A steady flow of interviews keeps me motivated and focused, too.

In conclusion the key issue is crafting a compelling resume to engage recruiters and hiring managers. (A more pressing issue on main-street front than on the Wall-st front.)

Therefore, I count each interview scheduled as a success. In contrast, an offer is less significant an achievement. Analogies:
* as a singer, each TV appearance is a success; Winning a singing contest is less significant.
* as a growing basketballer, each time I get to play on court is a success; winning a game is less significant.

I have always told my peers that 90% of the job candidate competition is on the resume, and 10% on interviews. (Now I feel 95%/5%) Many candidates can pass interviews if given the chance. The chance is given to winning resumes. I say this to my friends because I learned from experience to invest much more effort improving the resume, until it can impress a large percentage of recruiters and hiring managers.

For the “main street” positions, I hope to engage 33% of the recruiters and 10% of the hiring managers. With that, if I were to try 30 opportunities, I could expect to get 3 interviews!

wait()needs while-loop #spurious

Across all languages, I have never seen any exception. You must enclose wait() in a while-loop.

C++11 has a syntactic sugar — wait(myLock, myPredicate), hiding the while loop.
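A minimal sketch showing the explicit while-loop next to the C++11 sugar. `ReadyFlag` and `runWaiters()` are made-up names for illustration.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>

// A one-shot flag guarded by a mutex and a condition variable.
struct ReadyFlag {
    std::mutex mtx;
    std::condition_variable cv;
    bool ready = false;

    // Explicit form: the while-loop re-checks the condition after every
    // wake-up, so a spurious wake-up simply goes back to waiting.
    void waitExplicit() {
        std::unique_lock<std::mutex> lock(mtx);
        while (!ready) {
            cv.wait(lock);
        }
    }

    // Sugar form: wait(lock, predicate) runs the same loop internally.
    void waitWithPredicate() {
        std::unique_lock<std::mutex> lock(mtx);
        cv.wait(lock, [this] { return ready; });
    }

    void set() {
        {
            std::lock_guard<std::mutex> lock(mtx);
            ready = true;
        }
        cv.notify_all();  // notifying after releasing the lock is allowed in C++
    }
};

// Hypothetical demo: one waiter per style; returns how many were released.
int runWaiters() {
    ReadyFlag flag;
    std::atomic<int> woken(0);
    std::thread t1([&] { flag.waitExplicit(); ++woken; });
    std::thread t2([&] { flag.waitWithPredicate(); ++woken; });
    flag.set();
    t1.join();
    t2.join();
    return woken.load();
}
```

Because both waiters check the condition before sleeping, it does not matter whether set() runs before or after they start waiting.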

Q: why is spurious wake unavoidable? [[Josuttis]] P1004 has a concise answer:
A: the thread library (not the operating system) can’t reliably guarantee delivery of the notification, so to play safe, it wakes the waiting thread.

— I wrote about java:

Wait() is always awakened by a notification message sent to the monitor object.

Between the notification thread releasing the lock and the waking thread acquiring, a third thread can modify some shared data[1], and the condition you are waiting for disappears as a result.

Therefore the waking thread needs to recheck the condition. if() won’t do[2]. while() is required.

If you know there is no 3rd thread messing with the condition, then the waking thread can assume the condition is still intact. However, this design unnecessarily weakens code extensibility and re-usability. It’s fairly easy to put a while() loop around a wait(). No good reason not to do it.

[2] This kind of bug is intermittent.
[1] in a database for example (a crude example)

condition.signalAll Usually requires locking

Across languages,

  • notify() doesn’t strictly need the lock;
  • wait() always requires the lock.
    • c++ wait() takes the lock as argument
    • old jdk uses the same object as lock and conditionVar
    • jdk5 makes the lock beget the conditionVar

[[DougLea]] P233 points out that pthreads signal() doesn’t require the lock being held

—- original jdk
100% — notify() must be called within synchronized block, otherwise exception. See https://docs.oracle.com/javase/7/docs/api/java/lang/Object.html#notify()

—- jdk 5 onwards
99% — https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/Condition.html#signal() says that An implementation may (and typically does) require the lock acquired by the notifying thread.

Not 100% strict.

— compare await():
100% — On the same page, the await() javadoc is 100% strict on this requirement!

====c++
0% — notify_all() method does NOT require holding the lock.  See http://en.cppreference.com/w/cpp/thread/condition_variable/notify_all

–compare wait()
100% — wait(myLock) requires a lock as argument!

====c#
100% — Similar to old JDK, the notifying thread must hold the lock. See https://msdn.microsoft.com/en-us/library/system.threading.monitor.pulseall(v=vs.110).aspx

 

serialize access to shared mutable: mutex^CAS

[[optimized c++]] P290 points out that, in addition to mutex, CAS construct also serializes access to shared mutable objects.

I feel it’s nothing but a restatement of the definition of “shared mutable”.  More relevant question is

Q: what constructs support unimpeded concurrent access to shared mutable?
A: read-write lock lets multiple readers proceed, in the absence of writers.
A: RCU lets all readers proceed, but writers are restricted.

atomic≠lockFree≈CAS

See also blog on c++ atomic<int> — primary usage is load/store, not CAS

lock free isn’t equal to CAS — lock-free constructs also include Read-Copy-Update, widely used in linux and other kernels.

atomic isn’t equal to lock-free — For example, c++ atomic classes (actually templates) can use locks if the underlying processor lacks CAS instruction.
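A small sketch of the CAS construct next to plain atomic usage. `atomicMultiply()` is a made-up helper (std::atomic has no built-in multiply), and whether a given atomic type is lock-free depends on the platform, which is why the sketch asks via is_lock_free() rather than assuming.

```cpp
#include <atomic>

// Multiplies the shared value by `factor` using a CAS retry loop and
// returns the new value.
int atomicMultiply(std::atomic<int>& shared, int factor) {
    int observed = shared.load();
    // On failure, compare_exchange_weak refreshes `observed` with the current
    // value, so every retry recomputes the product from fresh data.
    while (!shared.compare_exchange_weak(observed, observed * factor)) {
    }
    return observed * factor;
}

// Asks the library whether atomic<int> is lock-free on this platform.
// The answer is allowed to be false: atomic does not imply lock-free.
bool atomicIntIsLockFree() {
    std::atomic<int> probe(0);
    return probe.is_lock_free();
}
```

The retry loop is what makes CAS serialize access: only one of the competing threads succeeds per round, and the losers start over.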

 

sorted circular array max() in O(log N)

Q: A sorted array A[] with distinct elements is rotated at some unknown point, the task is to find the max element in it.

Expected Time Complexity : O(Log n)

–Analysis —
It takes constant time to determine if the list is ascending or descending.

(Take 1st 3 elements. If not sorted, then entire task is simple — answer is the max among the three because we hit the turning point)

Suppose we know it’s ascending, use binary search forward to locate the turn.
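A sketch of the binary search for the ascending case only (the descending case is not handled here). It assumes a non-empty array with distinct elements; `maxOfRotatedAscending()` is a made-up name. Rather than searching for the turn directly, it locates the minimum and takes the element just before it, cyclically.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the max of an ascending array of distinct elements that has been
// rotated at an unknown point (possibly not rotated at all). O(log N).
int maxOfRotatedAscending(const std::vector<int>& a) {
    assert(!a.empty());
    std::size_t lo = 0, hi = a.size() - 1;
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (a[mid] > a[hi]) {
            lo = mid + 1;  // the turn, and hence the minimum, lies right of mid
        } else {
            hi = mid;      // the minimum is at mid or left of it
        }
    }
    // lo now indexes the minimum; the maximum sits just before it, cyclically.
    return a[(lo + a.size() - 1) % a.size()];
}
```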

c++trick: pbref void max() on a list

I feel this technique may be needed somewhere else. In this example, we can move the amax() body into the for-loop!

#include <iostream>

// x is passed by reference so the bigger value can be written back into it
template<class T, class U> static inline void amax(T &x, U y){
  if (x < y)
    x = y; // put the bigger value into x
}

int main(){
  int array[] = { 4, -5, 6, -9, 2, 11 };
  int max_val = array[0];

  for (auto const &val : array)
    amax(max_val, val);

  std::cout << "Max value = " << max_val << "\n";
  return 0;
}

 

read-copy-update lockfree +! retry #RCU

RCU is an advanced concurrency construct, not implemented in a regular application, but valuable for lock-free interviews.

http://concurrencyfreaks.blogspot.sg/2016/09/a-simple-userspace-rcu-in-java.html is a “simple Userspace” java implementation. I didn’t read it in detail and assume it’s rather different from the linux kernel RCU

http://www.modernescpp.com/index.php/aba-a-is-not-the-same-as-a (by a published author) has a brief mention of Userspace RCU solution to ABA problem. Also touches on Garbage Collection.

https://lwn.net/Articles/262464/ — by RCU inventor. I can see RCU is non-trivial !

The Wikipedia article is accessible until the paragraph introducing read-side-critical-section and grace-period. This is the first paragraph on the implementation. I found it hard. Therefore a few background pointers:

  • There must be multiple threads, typically many readers and few writers.
  • There must be a shared mutable data structure, probably on heap.
  • Not all data structures are suitable for RCU. So which subset is? I would say pointer graphs, including hash tables.

In the simplified model, we are talking about

  • A reader thread R1 executing a block of code that has started reading the shared mutable data structure. This code block is the critical section
  • A writer thread W1 executing a block of code attempting to update the same data.
  • After the so-called grace period, a GC thread would reclaim the obsolete data that R1 has finished reading.
  • A reader R2 enters the critical section after the update is done. R2 would see the new data

The GC needs to know when it’s safe to reclaim. It needs the wait-for-readers operation and the grace period.

Q: What if R1 gets a reference to the “obsolete node”, performs some slow task, reads the node, then exits the critical section? This node is reclaimable only after R1 exits critical section. The grace period would be long?

Q: Among 9999 nodes, how does the system keep track of which nodes are reclaimable?
%%A: I feel kernel access to CPU registers might be needed.

Q: How does it compare with copy-on-write?
A: COW probably copies the entire data structure. Also, the modified copy is not visible to subsequent readers (like R2)

Q: How does it compare with read/write lock?
A: readlock can block

Q: How does it relate to lock-free concepts?
A: reader threads are lock-free and even better — not required to retry.
A: GC threads need to wait.
A: Writer thread (like W1) ? not sure. Might be lock-free in some situations.
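A rough userspace sketch of the read, copy, update idea in the simplified model above, with big simplifications that I want to state loudly. It is not the linux kernel RCU. It copies the whole table on each update (so it is closer to copy-on-write), it assumes a single writer, and shared_ptr reference counting stands in for the grace period and the GC thread. `SnapshotTable` is a made-up name. The free functions std::atomic_load and std::atomic_store for shared_ptr are standard since C++11 (deprecated in C++20) and are not guaranteed to be lock-free.

```cpp
#include <map>
#include <memory>
#include <string>

typedef std::map<std::string, int> Table;

class SnapshotTable {
public:
    SnapshotTable() : current_(std::make_shared<Table>()) {}

    // Reader (R1, R2): grab the current snapshot. No retry loop is needed,
    // and the snapshot stays alive for as long as the reader holds it.
    std::shared_ptr<const Table> read() const {
        return std::atomic_load(&current_);
    }

    // Writer (W1, assumed to be the only writer): copy, update the copy,
    // then publish it. Readers that started earlier keep the old snapshot.
    void update(const std::string& key, int value) {
        std::shared_ptr<Table> copy = std::make_shared<Table>(*read());
        (*copy)[key] = value;
        std::atomic_store(&current_, std::shared_ptr<const Table>(copy));
    }

private:
    // The old snapshot is reclaimed when its last reader drops its
    // shared_ptr; that reference count plays the role of the grace period.
    std::shared_ptr<const Table> current_;
};
```

In this sketch a slow R1 only delays reclamation of its own snapshot; it never blocks W1 or R2.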

angular.js jQuery node.js — phrase book

server-side — Node.js is the only one of the three designed for server-side use. (Strictly speaking it’s a runtime rather than a library.)

web UI — most of the javascript libraries are meant for web UI

cross-browser support — jQuery, Angular

I feel jQuery is more popular than the other javascript libraries.

php-integration? I have seen books dedicated to jQuery+php

DOM — well supported in jQuery

Ajax — well supported in jQuery

data binding — a major feature of Angular.js, not jQuery.

##google-searchable dev technologies(!!Qz):lower stress

This topic is rather important to my stress level, my available time for learning, my available time for family….

For example, with Quartz, I must ask around and experiment (millions of times). Slow to build a coherent understanding. Slow ramp-up. (In contrast, with Python I could do that by reading good articles online.) So my productivity lag/gap remains even after a few years.

Other Negative examples – Tibrv, Autosys, Accurev, Less-known boost libraries,..

MSVS? Most of the search results are about c#, so it’s somewhat harder to solve problems.

Eclipse CDT? Most of the search results are about Eclipse java.

Positive examples – vbscript, DOS batch,

Yet, this stressor is mild compared to “performance warnings”.

noSQL 2+2 categories: more notes

Q: is Sandra a document DB? Hierarchical for sure. I think it could be a graph DB with a hierarchical interface
Q: which category most resembles RDBMS? DocStore

https://www.linkedin.com/pulse/real-comparison-nosql-databases-hbase-cassandra-mongodb-sahu/ compares 2 columnar vs a DocStore product and shows “not good for“!

–category: graph DB? least used, most specialized. Not worth learning
–category: columnar DB? less used in the finance projects I know.
eg: Cassandra/HBase, all based on Google BigTable

Not good at data query across rows.

–category: document store, like Mongo

  • hierarchy — JSON and XML
  • query into a document is supported (In contrast, key-value store is opaque.) Index into a document?
  • index is absolutely needed to avoid full table scan
  • search by a common attribute
  • hierarchical document often contains maps or lists in an enterprise application. I think it’s semi-structured. More flexible than a RDBMS schema

–category: distributed hashmap, like redis/memcached

  • usage — pub/sub
  • Key must support hashing. Value can be anything
  • Value can be a hierarchical document !
  • Opaque — What if your value objects have fields? To select all value objects having a certain field value, we may need to use the field value as key. Otherwise, full table scan is inevitable. I think document store supports query on a field in a document. However, I think Gemfire and friends do support query into those fields.
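The “opaque value” point can be sketched with an in-memory toy. This is not the API of redis, memcached or Gemfire; `Trade` and `TradeStore` are made-up names, and the sketch assumes each id is stored only once (re-putting an id with a different symbol would leave a stale index entry).

```cpp
#include <string>
#include <unordered_map>
#include <vector>

struct Trade {
    std::string id;
    std::string symbol;
    int qty;
};

class TradeStore {
public:
    void put(const Trade& t) {
        byId_[t.id] = t;                    // primary key lookup
        bySymbol_.emplace(t.symbol, t.id);  // field value used as a second key
    }

    // What an opaque store forces on the client: inspect every value.
    std::vector<std::string> scanBySymbol(const std::string& symbol) const {
        std::vector<std::string> ids;
        for (const auto& kv : byId_) {
            if (kv.second.symbol == symbol) ids.push_back(kv.first);
        }
        return ids;
    }

    // With the field value as key, only the matching entries are visited.
    std::vector<std::string> indexBySymbol(const std::string& symbol) const {
        std::vector<std::string> ids;
        auto range = bySymbol_.equal_range(symbol);
        for (auto it = range.first; it != range.second; ++it) {
            ids.push_back(it->second);
        }
        return ids;
    }

private:
    std::unordered_map<std::string, Trade> byId_;
    std::unordered_multimap<std::string, std::string> bySymbol_;
};
```

Both queries return the same ids (in unspecified order); the difference is how many entries each has to visit.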

##challenges across noSQL categories

I see the traditional rdbms is unchallenged in terms of rock-solid transactional guarantees. Every change is saved and never lost. Many financial applications require that.

Therefore, the owners buy expensive hardware and pay for expensive software licenses to maintain the reliability.

–common requirement/challenges for all noSQL categories. Some of these are probably unimportant to your project.

  • node failure, replication
  • huge data size – partitioning
  • concurrent read/write
  • durability, possible loss of data
  • write performance? not a key requirement
  • query performance? much more important than write. Besides key lookup, there are many everyday query types such as range query, multiple-condition query, or joins.

##resilient WS tech: FIX,sh-script…

Background: the constant need to economize on my learning hours. Have to attempt to pick the “survivors”.

  • FIX as a tech skill is an example of unexpected resilience. (However, this is in a specialized domain, so competition is arguably lower.) FIX isn’t dominant. Exchange native API is faster. Many HFT shops don’t use FIX, but still FIX has good ROTI.
  • SQL is no longer dominant but still standard
  • Tibco isn’t dominant. Many teams use competing products. Still it’s resilient
  • XML, in the face of light-weight serialization protocols – json, protobuf
  • Bourne shell, in the face of python, perl…
  • STL, developed in the early 1990s with many limitations, challenged repeatedly
  • tibrv, challenged by 29west, solace,

I have a bias towards older technologies. They have stood the test of time.

where are jvm locks stored@@ #cf Windows

A 2017 Morgan Stanley on-site interviewer asked…

  • %%A: can’t be on stack, since multiple threads need access to it
  • %%A: probably kernel objects. On Windows, most of the synchronization objects are wrappers over kernel objects. See [[Optimized c++]] P288.
    • If java threads map to kernel threads then I would think the same.
    • If java threads are implemented in userland, then kernel scheduler is unaware and can’t use kernel objects to control them. I would think JVM must have its own mutex objects without kernel support.

http://stackoverflow.com/questions/5713142/green-threads-vs-non-green-threads explains that only embedded or low-power devices use “green threads” i.e. userland threads. Linux and Windows JVM probably don’t support green threads, since the kernel provides “better” thread support.

git | %%tips { Macq

–to recreate current feature branch br2 as a clone of br3
The slower method is "git checkout br3; git branch -d br2; git checkout -b br2"
The slightly faster method is "git reset --hard br3" (run while br2 is checked out)

–Most git commands accept --help

  • Short Questions
    • How to rename a branch on Stash directly?
    • How to add a tag on Stash directly?
    • How to view the commit and annotation on one or multiple git tags?
    • How to add or edit the annotation of a tag on Stash?
    • How do I clean my external files?
    • How do I revert a file?
    • How do I get my local branch back to a remote?
    • Why is `git status` slow in Linux?
    • How do I list all branches that change a file?
    • How do I delete a local branch?
    • How do I merge multiple WIP commits into a single commit?
    • How do I get rid of obsolete remote branches
    • How to resolve pull request merge conflicts
    • A few ways to create new feature branch
      • Using command line without Stash
    • How to Revert non-commited changes
    • How to Revert branches back to origin
    • How to List names of modified files only
    • How to Cherry-Pick Commits to Push to develop
  • Using TortoiseGit
    • Overview
    • Some of the many convenience features
      • Feature: list of all uncommitted files (like git diff --name-status)
    • Installation
    • Using TortoiseGit
      • Setting up the Key
      • Clone the Stash repository (Workspace)
      • Interface 
      •  Commands
      • Enable icons in Win Explorer

Short Questions

How to rename a branch on Stash directly?

On your Stash page change base branch to the target branch (say BB) -> click on the “…” dropdown -> choose “Create a branch from here” -> specify new name (say NN) -> delete branch BB.

How to add a tag on Stash directly?

Navigate to the page of your chosen commit (such as this sample page), then find the link on the top right “Tag this commit”

How to view the commit and annotation on one or multiple git tags?

git show your_tag # look into one tag

The Stash page below also helps, provided the tag is published on Stash.

How to add or edit the annotation of a tag on Stash?

…  shows all tags, including the commit and the annotation. Choose one you want to edit and go to Actions column click the 3 dots and choose Edit.

Why is `git status` slow in Linux?

It might be due to the networked file system. We’ve found that doing the following might speed it up:

git config --global core.preloadindex True

See stackoverflow for details.

How do I list all branches that change a file?

FILENAME=<myfile>
git log --all --format=%H $FILENAME | while read f; do git branch --contains $f; done | sort -u

How do I get rid of obsolete remote branches

First remove it from Stash. Easy and safe to do on Stash web site. Go to the page listing all mod branches. Locate your branch you don’t want. Click on the “…” on the far right, to show the dropdown and find “Delete”. This would complete the deletion on the central repo. Next in your local repo, run

git fetch --prune

This command would sync up the local repo to match central repo on Stash. The deleted branch will now be deleted locally. Without this step, git branch -a would forever show the obsolete branch.

Remember each local repo or central repo holds a copy of all the branches. Deleting branch123 from one repo doesn’t automatically delete branch123 from another repo. Similarly, adding branch321 to one repo doesn’t automatically add branch321 to another repo.

How to resolve pull request merge conflicts

If possible, we use the Stash interface to merge. The resulting commit message on Develop branch looks like

Merge pull request #151 in CFMQUANT/mod from bugfix/tar to develop 

Merge pull request #147 in CFMQUANT/mod from feature/CFMQUANT-253 to develop 

Merge pull request #135 in CFMQUANT/mod from feature/CFMQUANT-243 to develop

Sometimes we get a conflict when merging the PR. For example, PR #153 has one file modified in the incoming branch release/0142c. Same file was very recently modified on Develop. Therefore this file had 2 concurrent changes, resulting in a merge conflict. This is what Victor did to resolve it, not necessarily optimal or recommended.

 

git fetch origin release/0142c
git branch -d develop
git checkout develop                    # check out latest from Stash
git merge FETCH_HEAD                    # one file in conflict, to be resolved
cd models/PyQXLModelsLib/ddl/
vi PyQXLModels_ddl_pycxx.cfg
git diff origin/develop PyQXLModels_ddl_pycxx.cfg    # ensure the diff is exactly what we expect
git add PyQXLModels_ddl_pycxx.cfg
git commit # see the commit message below
git log --simplify-merges               # verify
git push origin HEAD                    # may need special permission to push Develop to Stash

Here’s the commit message I used in the merge commit:

Merge pull request #153 manually from release/0142c to develop
[CFMQUANT-235] resolve minor conflict on one file

 

 

Using TortoiseGit

Overview

Another option for a visual interface for git under Windows is TortoiseGit.  This isn’t a GUI, as such, but a shell that sits on top of Windows Explorer.  Some people like it for this reason; some people dislike it for this reason.

It gives a convenient way to see the status of your files, and a convenient way (right-clicking) to select most commonly-used git commands (you can of course still use the command line whenever that is more convenient).  There is some pain involved in setting up the key for connecting to the server – scroll down to ‘Interface’ and ‘Commands’ below to see if you think the gain is worth the pain.

Git command line is rather rich and powerful, featuring hundreds (possibly thousands:) of commands + sub-commands + combinations of switches to those commands. (https://www.kernel.org/pub/software/scm/git/docs/ lists more than 100 top-level commands). It’s impractical for any GUI tool to emulate the command line. Tortoise provides a good information-radiator tool, that saves you lots of repetitive typing.

Some of the many convenience features

Convenience is where Tortoise distinguishes itself from other software tools.

In WinExplorer, Tortoise lets you right click any modified file (or multiple files) to commit to your local branch. After the commit, the dialog screen lets you push to the central branch on Stash.

Similarly, if you have a new file to commit, you can right click and add, then commit.

You can also right click and diff with previous version.

Feature: list of all uncommitted files (like git diff --name-status)

You can show in a window a list of all files modified locally but uncommitted. You can then double-click each file to pop up a diff screen, such as BeyondCompare. You can click to select one or more of these files to commit. For me this is the most convenient way to keep track of a large number of code changes. Same feature exists in TortoiseSVN.

You can limit the scope to one directory (AgModels for eg). Within the same screen, you get a checkbox to widen the scope to entire repo.

There’s another checkbox to include unversioned (i.e. newly created) files. You can then add them to git with very few clicks.

Installation

TortoiseGit is free, and can be downloaded from http://tortoisegit.org/download/  You’ll need to download the 64 bit version, and then run it (it will prompt for an x account password during the installation).  It does try to restart a lot of programs (so make sure you’ve saved everything), but doesn’t require a restart.  The default options in the install wizard worked fine for me.

Using TortoiseGit

Setting up the Key

For some reason, TortoiseGit only accepts keys in putty key format, so the key that works with Git needs to be converted to this different form (this should be a once-off).  An explanation of how is given here: http://develop-for-fun.blogspot.com.au/2013/12/configure-tortoise-git.html (basically you need to download puttygen.exe to convert the format of the key).

Clone the Stash repository (Workspace)

To create a new repository based on the stash repository, you can use Windows Explorer to create a new folder, move into that folder and then right click, and choose “Git Clone…” to bring up a dialog box.

Set the URL to ssh://git@….. and the directory to where you want it installed.  You’ll need to click on “Load Putty Key” and put the address of the putty key saved down from puttygen earlier, which could be any stable folder in your C: drive. Avoid moving this folder.

Interface

Once successfully installed, Windows Explorer will now show overlay icons on the files and folders associated with a workspace, such as:

  • A green circle with a tick: up-to-date. (Victor: based on my earlier TortoiseSVN knowledge, a folder marked green means everything in it is in-sync with remote, a valuable knowledge.)
  • A red circle with an exclamation mark: file on disk has been changed and not committed (or for a folder, a file somewhere in that folder – possibly several folders deep – has been changed).
  • A blue circle with a question mark: “external” ie have not been added to the list of committed files
  • A yellow triangle with an exclamation mark: “conflicted” ie the equivalent of “overlap” in AccuRev

A full list of icons is given at https://tortoisegit.org/docs/tortoisegit/tgit-dug.html#tgit-dug-general-dia-icons

 Commands

Right clicking on a file or folder brings up a menu of git commands.  Right clicking at the top of the directory tree means you can commit all changes, or click on merge to bring up a GUI interface to merge everything.

The full manual can be found at https://tortoisegit.org/docs/tortoisegit/

If you are using a workspace that you set up outside of TortoiseGit (e.g. via the command line), then before you push to, or pull from, the server, you will need to point TortoiseGit to the putty key, by selecting TortoiseGit->Settings, then under “Git”, select “Remote”, then click on “origin” in the “Remote” box, and then set Putty Key.

Enable icons in Win Explorer

Victor had to fix his registry to get Windows Explorer to show the (nice little) icons. In addition to http://martinbuberl.com/blog/tortoisegit-icons-not-showing-workaround/ , he had to kill all Explorer.exe processes in taskmgr before relaunching Explorer.

Explorer “overlay” (technical jargon) is not a mature and stable technology. Victor estimates that a few times a year the icons disappear on (a subset of) the version-controlled files, often for no obvious reason, and then reappear at a random time. In spite of frequent disappointments, many users love these little icons.

jargon: mutex^lock^monitor

All these terms appear in function names and class names. They must have well-defined technical meanings. Problem is, each platform has its own definitions. [[Optimized c++]] P288 singles out “semaphore” as a term that means rather different things across operating systems.

–In Java

  • Monitor refers to the “combo” object serving as both lock and condition variable. Java 5 introduced an alternative family of objects, where a lock object gives birth to condition objects. Most java developers still use the simpler combo object.
  • I feel lock and mutex mean the same thing.

–in c++

  • Lock is an RAII object, based on a mutex.
  • C++ doesn’t use “monitor” jargon as far as I know.
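To make the c++ usage concrete, here is a minimal sketch, assuming C++11 and the standard <mutex> header. Counter and its member names are hypothetical, made up for illustration. std::lock_guard plays the role of the RAII "lock" object and std::mutex is the mutex underneath it.

```cpp
#include <mutex>

// Hypothetical class for illustration only.
class Counter {
public:
    void increment() {
        std::lock_guard<std::mutex> lock(mtx_); // constructor acquires the mutex
        ++count_;
    }                                           // destructor releases it, even if an exception is thrown
    int get() const {
        std::lock_guard<std::mutex> lock(mtx_);
        return count_;
    }
private:
    mutable std::mutex mtx_; // mutable so that get() can lock while being const
    int count_ = 0;
};
```

The lock object has no useful life beyond its scope, which is why the c++ jargon treats "lock" and "mutex" as two different things.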

–On Win32

See https://bintanvictor.wordpress.com/2010/06/01/c-mutex-vs-semaphore/

create positive-stressful env n keep fit #UChicago;c++job

Hi XR,

We talked briefly about this aspect of human nature: no one has “perfect” will-power, perseverance and persistent motivation, so the environment we put ourselves in can help induce a significant personal effort.

Here’s the situation we discussed: when you are in a perm job, you won’t feel motivated to keep up interviewing skills. I guess our workload + family responsibilities will “crowd out” any personal study plan to keep learning and stay fit for interviews. (This looks like one example of the boiling frog: a frog won’t jump out of the water if it is heated slowly.)

I see many examples of this human nature:

  • I mention my blog and my github in my resume, so I sometimes worry that prospective interviewers may spot mistakes. That fear keeps me motivated to improve my “published” work. When I review my work, I often learn something, and I often feel excited that I’m slowly building up a positive “public” profile.
  • Without taking my current c++ job, how many (spare time) hours would I be able to put into c++ self-study? I think average 2hr/week is achievable but not easy. Nowadays I actually put in about 5 to 10 hours outside 9-6 work hours, on my c++ projects.
  • In Singapore, my wife can put in some effort to improve English but she doesn’t have time or energy. She finds it difficult to put in the effort. In the U.S. she WOULD indeed put in more effort learning English. On the other hand, many students in China could build a formidable English vocab simply by reading — they have a persistent motivation.
  • 30 minutes moderate exercise 5 times a week — is a basic guideline, but I know very few adults actually doing that consistently. I bet when one’s health condition requires that much exercise, there will be more motivation and more effort.
  • Once I paid course fees, I routinely put in 30 hours a week on my financial math studies. In theory I could self-study the same, but how many hours would I put in on average over a year? Perhaps 30 minutes/week!

(Most of my examples are about self-improvement and self-motivation. )

readLock +! writeLock – same as no lock@@

Q9: With regard to shared-mutable access semantics, using a readLock and discarding the writeLock (i.e. unable to use it) is similar to using no lock, but are they semantically the same?

This is a theoretical discussion. The implementations are not familiar to me and not relevant. Before we answer  Q9, let’s look at

Q1: in the absence of shared-mutable, do we need any lock?
A1: no

Q1b: what if we use readLock without writeLock?
A1b: should be fine.

Q3: what if we do use a lock in a shared function?
A3: system behavior is modified — serialized access, which is unnecessary

A9: should be semantically identical. Everyone gets a free ticket at entrance.

lower pressure to move up ] U.S.^sg

In the U.S., the overall income difference between a hands-on developer and a leadership role is smaller.

UE: U.S. engineers;
UM: U.S. managers;
SE: Sgp engineers;
SM: Sgp managers;
  • salary — UE much better than SE. The few high salaries in SE are too rare and unreachable
  • career longevity — UE clearly better than SE; UE probably better than SM too.
  • job security — UE much better than SE due to abundance of similar jobs; UE probably better than SM
  • fungible — UE can move into technical UM and back, more easily, thanks to abundance of jobs
  • tech lead, architect roles  — UE can move up in that direction more easily than SE, thanks to abundance of jobs. SM and UM may not have enough technical capabilities.

Economy — I feel hands-on specialists are more central to the U.S. economy and U.S. companies than in other countries. In Singapore, manager is by far the most instrumental and dominant role.

For a Chinese techie in the U.S. the prospect of managerial path is limited. Most of these managers won’t rise beyond the entry-level. And then consider your own background relative to the average Chinese here.

My tentative conclusion is

c++CollabEdit/Broadway IV: implement hash table#python

Q: implement a hash table class in any language. You could use existing implementations of linked list, array, hash function…

Q: talk about how you would implement rehash?
%%A: hash code won’t change for the key objects. But I would rerun the modulus against the new bucketCount. Based on the new index values, I would create the linked lists in each new bucket. Every pair needs to be relocated. Lastly I need to get rid of the old bucket array.
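A minimal sketch of the rehash described in my answer, assuming separate chaining with std::list and std::hash. ChainedHashTable, its method names and the "load factor above 1" trigger are all hypothetical choices of mine, not what was written in the interview.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical separate-chaining table; bucketCount must be at least 1.
class ChainedHashTable {
    using Pair = std::pair<std::string, int>;
    std::vector<std::list<Pair>> buckets_;
    std::size_t size_ = 0;

    std::size_t indexOf(const std::string& key, std::size_t bucketCount) const {
        return std::hash<std::string>{}(key) % bucketCount; // hash code is stable; only the modulus changes
    }
    void rehash(std::size_t newBucketCount) {
        std::vector<std::list<Pair>> fresh(newBucketCount);
        for (auto& chain : buckets_)
            for (auto& kv : chain)  // every pair is relocated
                fresh[indexOf(kv.first, newBucketCount)].push_back(std::move(kv));
        buckets_.swap(fresh);       // the old bucket array dies when 'fresh' goes out of scope
    }
public:
    explicit ChainedHashTable(std::size_t bucketCount = 4) : buckets_(bucketCount) {}

    void put(const std::string& key, int val) {
        for (auto& kv : buckets_[indexOf(key, buckets_.size())])
            if (kv.first == key) { kv.second = val; return; } // overwrite, never a duplicate entry
        if (size_ + 1 > buckets_.size()) rehash(buckets_.size() * 2);
        buckets_[indexOf(key, buckets_.size())].emplace_back(key, val);
        ++size_;
    }
    const int* get(const std::string& key) const { // nullptr when the key is absent
        for (const auto& kv : buckets_[indexOf(key, buckets_.size())])
            if (kv.first == key) return &kv.second;
        return nullptr;
    }
    std::size_t size() const { return size_; }
    std::size_t bucketCount() const { return buckets_.size(); }
};
```

Starting with 2 buckets and inserting a third distinct key is an easy way to trigger the rehash in a test.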

Q: how would you test your hash table?
%%A: try inserting (key1, val1), then (key1, val2), then look up key1
%%A: if I know any common weakness of the hash function, then test those.
%%A: trigger rehash

Q: what could go wrong in a multi-threaded context?
%%A: things like lost update or duplicate entries

Q: What concurrency solution would you choose for best performance?
%%A: could use lockfree algo at each point of writing to the bucket array or writing to a linked list.

unique^shared^auto_ptr #container

unique_ptr shared_ptr auto_ptr #less important scoped_ptr #least important
container restricted usage. See [2] .
more popular than boost::ptr_container!
can put into container forbidden illegal
instance field yes [1] but rare yes good replacement
for raw ptr
? Yes!
stack obj YES yes popular OK? Yes
copyable no. move-only yes releases ownership illegal
as return type YES but rare yes
as factory return type default choice. See other posts. rare never no clue

[1] https://katyscode.wordpress.com/2012/10/04/c11-using-stdunique_ptr-as-a-class-member-initialization-move-semantics-and-custom-deleters/ and http://stackoverflow.com/questions/15648844/using-smart-pointers-for-class-members

[2] https://stackoverflow.com/questions/2876641/so-can-unique-ptr-be-used-safely-in-stl-collections
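A minimal sketch of the unique_ptr column, assuming C++11. Widget, makeWidget and buildWidgets are hypothetical names. It shows the factory return type, the move-only restriction and the (restricted) container usage in one place.

```cpp
#include <memory>
#include <utility>
#include <vector>

struct Widget {            // hypothetical payload type
    explicit Widget(int i) : id(i) {}
    int id;
};

// Factory returning unique_ptr: the caller receives sole ownership.
std::unique_ptr<Widget> makeWidget(int id) {
    return std::unique_ptr<Widget>(new Widget(id)); // C++11 style; std::make_unique needs C++14
}

std::vector<std::unique_ptr<Widget>> buildWidgets() {
    std::vector<std::unique_ptr<Widget>> vec;
    vec.push_back(makeWidget(1));          // a temporary is moved in
    std::unique_ptr<Widget> w2 = makeWidget(2);
    // vec.push_back(w2);                  // would not compile: unique_ptr is not copyable
    vec.push_back(std::move(w2));          // explicit move; w2 is now null
    return vec;                            // the vector itself is moved out
}
```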

linker dislikes [non-generic]function definition in shared header

I used to feel header files are optional, so we can make do without them if they get in our way. This post shows they aren’t optional in any non-trivial c++ project. There is often only one (or few) correct way to structure the header vs implementation files.

Suppose MyHeader.h is included in 2 cpp files and they are linked to create an executable.

A class definition is permitted in MyHeader.h:

class Test89{
void test123(){}
};

However, if test123() were a free function defined in the header, then the linker would fail with “multiple definition” of this function when linking the two object files.

http://stackoverflow.com/questions/29526585/why-defining-classes-in-header-files-works-but-not-functions explains the rules

  • a free function defined in a shared header (and therefore repeated in every file including that header) must be declared inline
  • a repeated class definition (in a shared header) is permitted for a valid reason (sizing…). Since programmers can not only declare but also define a member function inside such a class, in a header, the compiler silently treats such member functions as inline
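The two rules can be sketched as the contents of a hypothetical MyHeader.h. freeFunc123 and badFunc are names I made up, and I made test123() public and non-void so it can be called.

```cpp
// ---- contents of a hypothetical MyHeader.h ----
#ifndef MYHEADER_H
#define MYHEADER_H

class Test89 {
public:
    int test123() { return 123; }   // defined inside the class: implicitly inline
};

// 'inline' allows the same definition to appear in every translation unit.
inline int freeFunc123() { return 123; }

// Without 'inline' the linker reports "multiple definition" once two .cpp files include this header:
// int badFunc() { return 0; }

#endif
```

The include guard only protects against double inclusion within one cpp file. It does nothing for the linker problem, which spans two object files.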

DB=%% favorite data store due to instrumentation

The noSQL products all provide some GUI/query, but none is very good. Piroz had to write a web GUI to show the content of gemfire. Without the GUI it’s very hard to manage anything that’s built on gemfire.

As data stores, even binary files are valuable.

Note snoop/capture is not a data store; it falls in the same category as logging. Both are easily suppressed, including critical error messages.

Why is RDBMS my #1 pick? ACID requires every datum to be persistent/durable, therefore viewable from any 3rd-party app, so we aren’t dependent on the writer application.

deque::erase is fairly efficient

Time complexity deleting one item from a deque:

Depending on the particular STL library implementation, there’s up to an additional linear time on the number of elements between position and one of the ends of the deque.

A deque maintains multiple segments, typically tracked through an index array rather than linked to each other. Each segment is contiguous and designed to be small. Because a deque can grow or shrink cheaply at either end, erase() only needs to shift the elements between the deletion position and the nearer end. For example,

. If a vector has size 100,000 and your deletion (or insertion) position is 5, then you must shift roughly 99,995 elements.
. If a deque has size 100,000 and your deletion (or insertion) position is 5, then only the 5 elements in front of that position need to shift. The worst case is a position in the middle, where about half of the elements move.

The segment size is implementation-defined; 32 would be my random guess. It doesn’t change how many elements get shifted, but if segments are too small, then there will be too many small segments and poor cache efficiency.
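A minimal sketch of an erase near the front of a large deque. eraseFromDeque is a hypothetical helper; the comment about which side gets shifted describes the common library implementations as I understand them, not a guarantee of the exact number of moves.

```cpp
#include <cstddef>
#include <deque>
#include <numeric>

// Hypothetical helper: fill a deque with 0..n-1, erase index 'pos', return the result.
std::deque<int> eraseFromDeque(std::size_t n, std::size_t pos) {
    std::deque<int> dq(n);
    std::iota(dq.begin(), dq.end(), 0);   // 0, 1, 2, ... n-1
    // When pos is near the front, the library can shift the 'pos' leading elements
    // by one slot and drop the first slot, leaving the long tail untouched.
    dq.erase(dq.begin() + static_cast<std::ptrdiff_t>(pos));
    return dq;
}
```

The same call on a std::vector has no choice but to shift everything after the erased position.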

shared_ptr [+unique_ptr] pbclone^pbref#Sutter

Practical question. In practice, the safe and lazy choice is pbclone. In IV, I think pbclone is “acceptable”. I feel pbref (const ref) is a risky micro-optimization, but Herb Sutter advised differently:

  • Express that a function will store and share ownership of a heap object using a by-value shared_ptr parameter
  • Use a const shared_ptr& as a parameter only if you’re not sure whether or not you’ll take a copy and share ownership. Perhaps the default choice IMO.

In my apps, the  receiving function often saves the smart_ptr in some container. Both options above actually work fine.

If you just need to access the underlying raw pointer, then Sutter said just pass in the raw pointer. I feel a pbref is also acceptable.

Smart pointer objects are designed to mimic raw pointers, which are usually passed by clone.

see http://stackoverflow.com/questions/8385457/should-i-pass-a-shared-ptr-by-reference and http://stackoverflow.com/questions/3310737/shared-ptr-by-reference-or-by-value

--With unique_ptr, rules are simpler. Pass by clone whenever you want to hand over ownership. The parameter is move-constructed, so a caller holding a named unique_ptr must wrap it in std::move(). I don’t think pbref is needed.

See other posts about unique_ptr, such as https://bintanvictor.wordpress.com/2017/03/31/unique_ptr-and-move/
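The options discussed in this post can be sketched side by side. Trade, registry and the four function names are hypothetical; the sketch assumes C++11.

```cpp
#include <memory>
#include <utility>
#include <vector>

struct Trade { int qty; };                       // hypothetical payload

std::vector<std::shared_ptr<Trade>> registry;    // hypothetical container that shares ownership

// pbclone: the by-value parameter announces "I will keep a share of ownership".
void storeByValue(std::shared_ptr<Trade> sp) {
    registry.push_back(std::move(sp));           // move avoids a second ref-count increment
}

// pbref: no copy on entry; a copy is made only if we decide to store.
void maybeStore(const std::shared_ptr<Trade>& sp, bool keep) {
    if (keep) registry.push_back(sp);
}

// Access only: per Sutter, take the raw pointer.
int readQty(const Trade* t) { return t->qty; }

// unique_ptr by value: a sink that takes over ownership.
int consume(std::unique_ptr<Trade> up) { return up->qty; }
```

With a named unique_ptr the caller writes consume(std::move(up)), after which up is null.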

it takes effort to remain in financial IT#Ashish

NY vs NJ: cost is basically the same, my friend, in terms of rent, transportation, food … Most of my friends probably live in New York city suburbs. (I know no one who chooses to live in NYC, paying 4% city tax.) My cost estimate is always based on my experience living in NYC suburbs.

The “relief” factor you identified is psychological, subtle but fundamental. It resonates in my bones. It’s a movie playing in my head every day.

In the present-day reality, at my age I probably can still find jobs at this salary in Singapore or Hongkong, with growing difficulties. How about in 10 years? Huge uncertainty. If you were me, you have to take a long, hard look into yourself and benchmark yourself against the competing (younger) job seekers. You would find negative evidence regarding your competitiveness on the Asia financial IT job market. That’s the get-the-job aspect. The 2nd aspect is keep-the-job and even tougher for me. Therefore in the U.S. I make myself a free-wheeling contractor!

As I age, I take the growing job market competition as a fact of life. I accept I’m past my prime. I keep working on my fitness.

On both [get|keep ]-the-job, U.S. offers huge psychological relief. Why? Ultimately, it boils down to long term income (and family cash flow). Since I believe U.S. employers can give me a well-paying job more easily, for longer periods, I feel financially more secure. I will hold meaningful jobs till 65 (not teaching in polytechnic, or selling insurance etc). I can plan for the family more confidently. I could plan for a new home.

There are many more pros and cons to consider. Will stick to the bare essentials.

Singapore offers security in terms of the social “safety net”, thanks to government. Even if my Singapore salary drops by half, we still enjoy subsidized healthcare, decent education, frequent reunions with grandparents, among many other benefits (I listed 20 in my blog..).

Besides, I always remind my wife and grandparents that I have overseas properties, some paying reliable rental yield. Under some assumptions, passive income can amount to $5k/mon so the “worry” my dad noticed is effectively addressed. Another huge relief.

As a sort of summary, in my mind these unrelated concerns are interlinked –

· My long-term strength/weakness on job markets

o coding practice, lifelong learning

o green card – a major weakness in me

· family cash flow

o passive incomes

o housing

· Singapore as base camp

Do you notice I didn’t include “job security” in the above list? I basically take job-Insecurity as a foregone conclusion. Most of my friends are the opposite – taking job security as an assumption. This difference defines me as a professional, shapes my outlook, and drives me to keep working on “fitness”.

If I had no kids, I would have already achieved financial freedom and be completely free of cash flow concerns. As a couple, our combined burn rate is S$3k-4k/mon but our (inflation-proof) passive income has/will reach that level. Also medical and housing needs are taken care of in Singapore. In reality, kids add S$1k-2k to our monthly burn rate. We are still on our way to financial freedom. So why the “worry”?!

From: ASHISH SINGH

Sent: Thursday, 16 March 2017 8:07 AM
To: Victor Tan
Cc: ‘Bin TAN (Victor)’
Subject: Re: it takes effort to remain in financial IT

That jewish guy and avichal are exceptions . Most of developers struggle and get success by hard work, by spending time with code for atleast 1-2years in any new company. I have not met other fast learners like them. Those are rare to find.

Regarding the worry that your dad always notices , i think this is quite obvious “more money more stress” but yeah this is the right time for you to move to USA where there are plenty of tech jobs . This move will surely give you a shy of relief.

Regarding the cost if living, why would you want to stay in NY , it is indeed costly. How about NJ?

text parsing with stringstream

 

#include <iostream>
#include <sstream>
#include <string>
using namespace std;

string _input =
"1 reverse this line\n"
"2 reverse 2nd line\n"
"3 sort gamma alpha";
istringstream entireInputStreamWithTabs(_input); //pretend to be raw input

/* With fixed column types, this parser is more strict. More professional in an interview.
Also supports DateTime parsing. See https://bintanvictor.wordpress.com/2017/03/16/cpp-parse-datetime-string-without-boost/
*/
void extractionTokenParser(string const & fullLine) {
 istringstream ss1line(fullLine);
 int lineNum;
 if (!(ss1line >> lineNum)) return; // first column must be an int
 cout << "line num = " << lineNum << endl;

 for (string tmp; ss1line >> tmp;) { // loop ends at EOL; a failed read is never printed
   cout << tmp << endl; //whitespace removed:)
 }
}
// --------- simpler alternative ---------
void DelimTokenParser(string const & fullLine) {
 istringstream ss1line(fullLine);
 // splits on tab only; the sample input above has no tabs, so each line comes back as one token
 for (string tmp; getline(ss1line, tmp, '\t');) {
   cout << tmp << endl;
 }
}
void parseUsingStringStream() {
 //cout << _input << endl;
 for (string fullLine; getline(entireInputStreamWithTabs, fullLine);) { // loop ends at EOF
   //cout << fullLine << endl;
   // you can now search or modify the string
   extractionTokenParser(fullLine);
   DelimTokenParser(fullLine);
 }
}

int main(){
 parseUsingStringStream();
 return 0;
}

eclipse to exclude a source folder

There is a straightforward way to do it:

  1. Right-click a project folder in Project Explorer tree and go to “Properties”.
  2. Resource -> Resource Filters.
  3. Add as many exclusion filters for files/folders as you like.

P.S. If your project tree is not updated automatically you may have to press F5 while having input focus in Project Explorer window.

c++parse DateTime using stringstream #no boost

This is the simplest way I have found.

#include <ctime>
#include <iomanip>
#include <iostream>
#include <sstream>
using namespace std;

//without Boost, parsing string to DateTime and back
// from http://arsenmk.blogspot.sg/2014/07/converting-string-to-datetime-and-vice.html
int main(){
 stringstream ss{ "1970-01-01 8:00:01" };
 tm simpleStruct = {}; //zero-initialized placeholder on stack
 //parse and output to the placeholder
 ss >> get_time(&simpleStruct, "%Y-%m-%d %H:%M:%S");
 if (ss.fail()) {
  cout << "parsing failed. (Very strict.)" << endl;
  return -1;
 }
 simpleStruct.tm_isdst = -1; //let mktime() work out whether DST applies

 time_t secSinceEpoch = mktime(&simpleStruct); //interprets the fields as local time
 if (secSinceEpoch == (time_t)-1) {
  cout << "mktime() failed " << secSinceEpoch << endl;
  return -1;
 }
 cout << secSinceEpoch <<" seconds since Epoch (1970/1/1 midnight GMT) is -> ";
 cout << asctime(localtime(&secSinceEpoch));
}

std::string is usable in MSVS but can’t cout +! #include {string}

Many STL headers in Visual C++ (including iostream header) pull in a definition of the std::basic_string class (because they indirectly include the implementation-defined <xstring> header (never include that directly)). While that allows you to use the string class, the relevant operator<< is defined in the <string> header itself, so you must include that manually.
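A minimal sketch of the portable habit: include <string> yourself whenever a std::string is streamed. greet is a hypothetical function; an ostringstream is used so the result can be checked, but the same operator<< serves cout.

```cpp
#include <sstream>
#include <string>   // declares operator<< for std::string; don't rely on another header pulling it in

// Hypothetical helper that streams a std::string.
std::string greet(const std::string& name) {
    std::ostringstream oss;
    oss << "hello " << name;   // this line needs the operator<< declared in <string>
    return oss.str();
}
```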

java8 lambda under the hood #phrasebook

Q: how are java8 lambda expressions translated?

* Special bytecode instruction – invokedynamic. You can see it in the bytecode
* Static methods – a non-capturing (stateless) lambda expression is simply converted to a static method
* a capturing lambda expression can also become a static method with the captures as additional method args. This may not be the actual compiler action, but it is a proven model. (Compare : separate chaining is one proven implementation of hash tables.)

However, static methods obscure an essential rule — the lambda expression’s type must “look like” a subtype of a SAM interface. Remember you often pass a lambda around as if it’s a SAM implementation instance.

So even if the actual work (like number crunching) is done in a static method, there must be some non-static wrapper method in a SAM subtype instance.

https://blog.codefx.org/java/dev/lambdas-java-peek-hood/ has some details on the evolution and design.

minimum-cost array shrinking #Citadel

Input is a vector of positive integers. You are to shrink it progressively. Each step you remove 2 selected elements, and replace with their sum. Therefore vector size drops by 1 at each step, until there’s one element left.

At each step there’s a cost, which is defined as that sum.

Eg: {4,2,1} becomes {4,3} after you combine 2/1. The cost at this step is the sum 2+1=3.

Q1: For a given vector, find a sequence of steps with the lowest total cost. Test your code in c++
Q2: how would you optimize your code, assuming high data volume.

#include <vector>
#include <queue>
#include <algorithm>
#include <iostream>
#include <string>
#include <functional> // std::greater
using namespace std;

vector<int> vec = { 3,2,1 }; // a std::set will fail when duplicate values show up like {3,3}
priority_queue<int, vector<int>, std::greater<int> > pq(vec.begin(), vec.end());

void dumpVector(string msg) {
 cout << msg << ", size = " << vec.size() << endl;
 for (auto it = vec.begin(); it != vec.end(); ++it) cout << *it << ' ';
 cout << endl;
}

int operateVector(int sum = 0) {
 auto lowestItem = min_element(vec.begin(), vec.end());
 sum += *lowestItem;
 vec.erase(lowestItem); // now lowestItem is an invalidated iterator and unusable

 //remove() is bad as it removes all items having the target value, not just one!
 //vec.erase(std::remove(vec.begin(), vec.end(), targetValue), vec.end());

 dumpVector("after erase");
 return sum;
}

void dumpHeap(string msg) {
 auto clone = pq;
 cout << msg << ", size = " << clone.size() << endl;
 for (; !clone.empty();clone.pop()) {
 std::cout << clone.top() << ' ';
 }
 cout << endl;
}
int operateHeap(int sum = 0) {
 sum += pq.top();
 pq.pop();
 //dumpHeap("after pop");
 return sum;
}

int f1(int sum = 0) {
 return operateHeap(sum);
}
int main() {
 int totalCost = 0;
 for (; pq.size()>1;) {
 int sum = f1(f1()); //nested call: pop the two smallest items and add them up
 pq.push(sum);
 dumpHeap("after push");
 totalCost += sum;
 }
 cout << "total cost = " << totalCost << endl;
 return 0;
}

java override: strict on declared parameter types

Best practice – use @Override on the overriding method to request “approval” by the compiler. You will realize that

https://briangordon.github.io/2014/09/covariance-and-contravariance.html is concise –

Rule 1: “return type of the overriding method can (but not c++ [1]) be a subclass of the return type of the overridden method, but the argument types must match exactly”

So almost all discrepancies between parent/child parameter types (like int vs long) will be compiled as overloads. The only exception I know is that the overriding method can remove “<T>” from List<T> as the parameter type.

There could be other subtle rules when we consider generics, but in the world without parameterized method signatures, the above Rule 1 is clean and simple.

[[ARM]] P212 explains that multiple inheritance would be problematic if this were allowed.

some low-level QQ questions will beat me, but remain confident during IV

See https://bintanvictor.wordpress.com/2017/02/02/c-and-java-iv-tend-to-beat-us-in-2-ways-high-end/. Inevitably, some questions will beat us, perhaps in terms of algo challenge, but many recent questions beat us in terms of in-depth knowledge. So how do you react in an interview after you are beaten on some questions?

Look at this linked-in endorsement — … has a solid attention to details. He relentlessly pursues the most efficient and sensible means to developing software applications. He can lead a team of developers locally and globally, be a team advocate, a problem-solver, and a top-notch software designer. I often relied on him to explain and help detail some of the most complex logical components.

Such a person may very well fail some of those interview questions. He needs to be self-confident about his GTD skills like design, delivery…

git-bash diff on win-word

beyond-compare… I haven’t configured it successfully.

–Pandoc

  • Pandoc shows word differences, whereas default shows entire paragraph even for a one-word change
  • minor imperfections in the display of words near table boundaries

I added this chunk into my .gitconfig file:

 [diff "pandoc"]
   textconv=pandoc --to=markdown
   prompt = false
 [alias]
   wdiff = diff --word-diff=color --unified=1

cloud4java developers – brief notes

Am an enterprise java developer, not a web developer. I feel PaaS is designed more for the web developer.

I agree with the general observation that IaaS doesn’t impact us significantly.

I feel SaaS doesn’t either. SaaS could offer devops (build/delivery) services for java developer teams.

PaaS has the biggest impact. We have to use the API /SDK provided by the PaaS vendor. Often no SQL DB. Can’t access a particular host’s file system. MOM is rarely provided.

mutex^condition top2constructs across platforms#except win32

https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSPR/About_NSPR is one more system programming manual that asserts the fundamental importance of mutex and condition variable.

NSPR started with java in mind but primary application is supporting clients written entirely in C or C++.

I have said many times in this blog that most thread synchronization features are built on top of this duo.

However, on Windows the fundamental constructs are locks and waitHandles…
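As one illustration of a higher-level feature built on the mutex + condition variable duo, here is a minimal sketch of a counting semaphore, assuming C++11. CountingSemaphore and its method names are hypothetical. This is not how NSPR or any particular library implements it.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical counting semaphore built only from a mutex and a condition variable.
class CountingSemaphore {
public:
    explicit CountingSemaphore(int initial) : permits_(initial) {}

    void acquire() {
        std::unique_lock<std::mutex> lock(mtx_);
        cv_.wait(lock, [this] { return permits_ > 0; }); // blocks until a permit is free
        --permits_;
    }
    bool tryAcquire() {                                  // never blocks
        std::lock_guard<std::mutex> lock(mtx_);
        if (permits_ <= 0) return false;
        --permits_;
        return true;
    }
    void release() {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            ++permits_;
        }
        cv_.notify_one();                                // wake one waiter, if any
    }
private:
    std::mutex mtx_;
    std::condition_variable cv_;
    int permits_;
};
```

The mutex protects the permit count and the condition variable is the "waiting room" for threads that find the count at zero.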

java ReentrantLock^synchronized keyword

I told Morgan Stanley interviewers that ReentrantLock is basically the same thing as the synchronized keyword, but with additional features:

  • Feature: lockInterruptibly() is very useful if you need to cancel a “grabbing” thread.
  • Feature: tryLock() is very useful. It can take an optional timeout argument.

Above features all help us deal with deadlock:)

  • Feature: Multiple condition variables on the same lock.
  • Feature: lock fairness is configurable. A fair lock favors longest-waiting thread. Synchronized keyword is always unfair.
  • — query operations —
  • Feature: boolean hasQueuedThread(targetThread) gives a best-effort answer whether targetThread is waiting for this lock
  • Feature: Collection getQueuedThreads() gives a best-effort list of “grabbing” threads on this lock
  • Feature: Collection getWaitingThreads (aConditionVar) gives a best-effort view of the given “waiting room”.
  • Feature: int getHoldCount() basically gives the “net” re-entrancy count
  • Feature: boolean isHeldByCurrentThread()

earliest java IV@2017 #MS+HSBC

  • Q: When is JIT compiled code performance higher than c++? See separate blog
  • Q: difference between JVM stack vs native stack?
  • Q: ThreadLocal internal implementation?
  • Q: data structures with concurrent modification notifications — how is it implemented?
  • — IPC between processes (language-neutral) —
  • Q: how is shared memory managed?
  • Q: messaging uses sockets and has high overhead. What other solutions can maintain FIFO?
  • %%A: nothing new. The earliest MOM has dealt with this problem long ago. Perhaps multiple files with single producer and single consumer would be ideal. The 2 processes need to operate on both ends of the file. (There could be some kernel support for this.) See https://coderanch.com/t/278842/java/Reading-writing-concurrent-threads-file
  • Q: what kind of jdk locks have you used?
  • %%A: readwrite lock, reentrant lock
  • Q: How would you size your thread pool, based on processor count?
  • Q: For a market data gateway, when would additional threads help (and when would they be useless or counterproductive)?
  • %%A: I/O bound, the processors could be 99% idle. More threads would increase the utilization rate. Ideal is simultaneous saturation.
  • Q: thread cancellation without using Futures?
  • Q: default methods in interface — is it a breaking change? http://stackoverflow.com/questions/22618493/does-introducing-a-default-method-to-an-interface-really-preserve-back-compatibi has a concise answer
  • Q: how is lambda implemented in java 8? See my separate blog post.
  • Q: Singleton pattern — what issues do you know?

[17]FASTEST muscle-growth=b4/af job changes]U.S.

I now recall that my muscle-building and, to a lesser extent, zbs growth are clearly fastest in the 3 months around each job change. I get frequent interviews and positive feedback. This is a key (subconscious) reason why I prefer contracting even at a lower salary. I get the kick each time I change job.

My blogging activity shows the growth…

  • #1 factor … positive feedback from real offers from good companies.
  • #2 factor … I actually feel real zbs growth, though it tends to be less strategic in hindsight.
  • factor — on a new job, I am curious to learn things I have wanted to learn like Xaml, FIX, Tibco, kdb, SecDB, multicast, orderbook, curve building

Beside the months immediately b4/af job change, I also experienced significant growth in

No such environment in Singapore:(

Y more threads !! help`throughput if I/O bound

To keep things concrete, you can think of the output interface as the I/O.

The paradox: given an I/O bound busy server, the conventional wisdom says more threads could increase CPU utilization [1]. However, the work queue for CPU gets quickly /drained/, whereas the I/O queue is constantly full, as the I/O subsystem is working at full capacity.

[1] In a CPU bound server, adding 20 threads will likely create 20 idle, starved new threads!

Holy Grail is simultaneous saturation. Suggestion: “steal” a cpu core from this engine and use it for unrelated tasks. Additional threads or processes basically achieve that purpose. In other words, the cpu cores aren’t dedicated to this purpose.

Assumption — adding more I/O hardware is not possible. (Instead, scaling out to more nodes could help.)

If the CPU cores are dedicated, then there’s no way to improve throughput without adding more I/O capacity. At a high level, I clearly see too much CPU /overcapacity/.

c++SCB eFX IV#Dmitry

100% QQ type, as defined in https://bintanvictor.wordpress.com/2017/02/15/qqzz-mutual-exclusion-cjava/. I feel many are micro optimizations with questionable improvement. I wonder how much value such obscure knowledge adds to the team.

Q: Scanning a vector of int (like finding the average or max). Forward iteration vs backward iteration, which one could be faster, considering all possible compiler optimizations.

%%A: forward. Memory read into cpu cache will be in chunks, not one element at a time. Easy for forward iteration. Not sure about backward.

Q: Which one could be fastest:

void f(double arg){…..}
void f(double & arg){….}

%%A: inlining for first but not 2nd?
A: See http://stackoverflow.com/questions/722257/should-i-take-arguments-to-inline-functions-by-reference-or-value esp. the long answer.

Q: Thr1 and Thr2 on 2 CPU’s both update an object s, having 2 fields. Thr1 only updates s.field1. Thr2 only updates s.field2. No interference. No synchronization required. We observe the performance is slower than using one thread to update both fields. Any explanation?
%%A: caching in cpu
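The usual name for this effect is false sharing: the two fields sit on the same cache line, so each write by one CPU invalidates the line in the other CPU’s cache. A minimal sketch of the common remedy, assuming a 64-byte cache line (typical on x86-64 but not guaranteed). Packed, Padded and kCacheLine are hypothetical names.

```cpp
#include <cstddef>

// Layout as in the question: both fields very likely share one cache line.
struct Packed {
    long field1;   // written only by Thr1
    long field2;   // written only by Thr2
};

// Assumed cache line size; check your own hardware.
constexpr std::size_t kCacheLine = 64;

// Each field is aligned to its own cache line, so the two writers stop interfering.
struct Padded {
    alignas(kCacheLine) long field1;
    alignas(kCacheLine) long field2;
};

static_assert(offsetof(Packed, field2) < kCacheLine, "fields are adjacent");
static_assert(offsetof(Padded, field2) >= kCacheLine, "fields sit on separate cache lines");
```

The price is memory: Padded is many times larger than Packed, so this is worth doing only for fields that are truly hot.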

Q: weak_ptr justification, when we have shared_ptr already? I feel [[effModernC++]] has a good chapter on it.

Ashish pointed out in some apps, you could identify a clear risk of circular dependency. Replace with weak_ptr.

Q: given an 2D array arr[10][5], how do you use pointer arithmetic to hit arr[1][5]

A: Contiguous. see http://stackoverflow.com/questions/7784758/c-c-multidimensional-array-internals. Note this is different from an array of pointers.
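A minimal sketch of the pointer arithmetic, assuming the row-major contiguous layout described in that link. viaPointerArithmetic and flatOffset are hypothetical helpers. Note arr[1][5] is one past the end of row 1; by the offset arithmetic it lands on the same int as arr[2][0], though I’m not sure a strict reading of the standard blesses indexing past a row bound, so the sketch only dereferences in-bounds positions.

```cpp
#include <cstddef>

constexpr std::size_t ROWS = 10, COLS = 5;

// arr[r][c] is by definition *(*(arr + r) + c): step over r whole rows, then c ints.
int viaPointerArithmetic(int (*arr)[COLS], std::size_t r, std::size_t c) {
    return *(*(arr + r) + c);
}

// Offset (counted in ints) from &arr[0][0] in the contiguous row-major layout.
constexpr std::size_t flatOffset(std::size_t r, std::size_t c) {
    return r * COLS + c;
}
```

flatOffset(1, 5) and flatOffset(2, 0) both come out as 10, which is the point of the interview question.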

Q: what would you see if a TCP socket server has a full queue
%%A: TCP requires handshake, so if server is unable to accept a request the client would know it.
%%A: connection refused?

Q: what STL algorithms did you use?
%%A: for_each(), find(), copy_if(), transform(), reverse(), sort(), replace_if(), remove_if()

reinterpret_cast{int}( somePtr): practical use

http://stackoverflow.com/questions/17880960/what-does-it-mean-to-reinterpret-cast-a-pointer-as-long shows a real use case of reinterpret_cast from pointer to integer, and reinterpret_cast back to pointer.

What if I serialize an object to xml and the object contains a pointer field (reference field is less common but possible)? Boost::serialization would unwrap the pointer and serialize the pointee object.

Most of the time (like parsing unsigned char array), we reinterpret_cast from an (array) address into address of another data type
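A minimal sketch of the pointer-to-integer round trip. I use std::uintptr_t rather than int or long, since it is the integer type meant to hold a pointer value where the platform provides it. Session, toHandle and fromHandle are hypothetical names.

```cpp
#include <cstdint>

struct Session { int id; };   // hypothetical object whose address travels as an integer

// Pointer -> integer, e.g. to pass through an API that only accepts an integer handle.
std::uintptr_t toHandle(Session* s) {
    return reinterpret_cast<std::uintptr_t>(s);
}

// Integer -> pointer. Only meaningful if the handle came from toHandle()
// and the object is still alive.
Session* fromHandle(std::uintptr_t h) {
    return reinterpret_cast<Session*>(h);
}
```

Unlike serialization, the integer here is only an address, so it is meaningless outside the process that produced it.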

async – 2 threads 2 different execution contexts

See also https://bintanvictor.wordpress.com/2013/01/06/every-async-operation-involves-a-sync-call/

Sync call is simple — the output of the actual operation is processed on the same thread, so all the objects on the stack frame are available.

In an async call, the RR fires and forgets. Firing means registration.

The output data is processed on some other thread, usually not the RR thread. Therefore the host object of the callback and other objects on the RR stack frame are not available.

If one of them is made available to the callback, then the object must be protected from concurrent access.
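A minimal sketch of the two execution contexts, assuming C++11 threads. ResultSink and fireAsync are hypothetical names; a real async library would use its own thread pool rather than a fresh std::thread per request. The sink is the object "made available to the callback", so it carries its own mutex.

```cpp
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical result sink shared between the RR thread and the callback thread.
class ResultSink {
public:
    void onData(int value) {                      // runs on the worker thread
        std::lock_guard<std::mutex> lock(mtx_);
        results_.push_back(value);
    }
    std::vector<int> snapshot() const {           // may run on the RR thread
        std::lock_guard<std::mutex> lock(mtx_);
        return results_;
    }
private:
    mutable std::mutex mtx_;
    std::vector<int> results_;
};

// "Fire" = register the callback and return at once. The callback and the input are
// copied into the worker, because the RR stack frame may be gone when the callback runs.
std::thread fireAsync(int input, std::function<void(int)> callback) {
    return std::thread([input, callback] { callback(input * 2); });
}
```

The caller keeps the returned std::thread only so that it can join() before the sink is destroyed.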

file/console input output ] python

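Output typically uses “print”. The statement form below is Python 2 syntax; in Python 3, print is a function and takes the target as print(arguments, file=myFileObject).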

print >> myFileObject , arguments
print >> sys.stderr , arguments…

Quiz: so, in that case, how do you print to stdout?

Input from file is very common, so that's another quiz

Input from std input —

aLine = sys.stdin.readline()
all_the_Lines = myFileObject.readlines()

Some prefer the object-oriented way to output

myFileObject.write(arguments….)