mutex: allocated in heap or global memory

In Java, the mutex (monitor) always lives on the heap, since every monitor is attached to an object. Primitives can’t host a mutex.

In pthreads, the mutex object can be allocated anywhere, but I have only seen it allocated on the heap or in the global area.

  • in the global area, a one-liner both allocates and initializes the variable (see the sketch right after this list)
  • on the heap, you first malloc, then call init() — conceptually a constructor call
    • As a special case, you can also allocate the mutex in shared memory, creating a cross-process mutex! Not a common practice in java.
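
A minimal C++/pthreads sketch of the first two placements (error handling omitted; the cross-process case additionally needs the process-shared mutex attribute, which is not shown):

#include <pthread.h>

// global area: allocated and initialized in one line
pthread_mutex_t g_mutex = PTHREAD_MUTEX_INITIALIZER;

int main(){
  // heap: allocate first, then "construct" with init()
  pthread_mutex_t * h_mutex = new pthread_mutex_t;
  pthread_mutex_init(h_mutex, NULL);

  pthread_mutex_lock(&g_mutex);
  pthread_mutex_unlock(&g_mutex);

  pthread_mutex_destroy(h_mutex); // "destructor" before deallocation
  delete h_mutex;
}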

Q: what if I allocate a mutex on the stack?
A: in java, the actual object is still on the heap, though the variable is on the stack and invisible to other threads
A: in C, the entire mutex object can be on the stack, but such a mutex is useless. Imagine a lock on a door with a single key to be shared — but everyone else has their own door, so there’s no synchronization and no access control at all :(

What if I hadn’t worked this hard # kids

Over my 20Y career, I showed _some_ talent professionally.

In contrast, I showed significantly more talent in school. My massive study effort increased my abilities [1] to the extent that people don’t notice the difference between my talent and my acquired abilities. Even my IQ score improved, thanks to my intellectual curiosity and absorbency. If such sustained effort counts as a talent, then yes, I have the talent of diligence.

[1] eg — Chinese composition, English grammar/vocabulary, many knowledge-intensive subjects

Q1: what if I had put in just an average amount of effort in school and at work? Some people (mostly guys I don’t know well) seem to put in sporadic efforts that average out to “just average”, yet they still reached a professional level similar to mine, or higher.
A: For my academic excellence, persistent effort was necessary.

A: I guess sporadic effort could be enough to reach my level of professional “success” for very bright and lucky people. I doubt any professional programmer reached a level higher than mine without consistent effort over many years.

Professional success also depends on opportunity, on positioning and timing, on inter-personal skills and relationship skills, on judgment 判断力. None of these depends directly on consistent effort.

An example of positioning — my break into Wall St java market vs my peers who didn’t.

To analyze this question, I need to single out an under-appreciated talent — absorbency capacity, i.e. 耐得住寂寞 (the capacity to endure solitary, tedious work), where I score in the 97th percentile.

Q2: if my son only puts in sporadic and average efforts, could he end up losing the competitions?

make_shared syntax #Gotcha’s

Needed for take-home coding test.

#include <iostream>
#include <memory> // make_shared

struct C{
  float f;
  C(float arg): f(arg){}
};
int main() {
  std::shared_ptr<int> a(new int(13));
  std::shared_ptr<int> b = std::make_shared<int>(12);
  std::cout<<*b<<std::endl;

  std::shared_ptr<C> c = std::make_shared<C>(1.1); //must pass in ctor args
  //shared_ptr<C> b= make_shared<C>(new C(1.1)); //can't use "new" with make_shared.

  std::cout<<c->f<<std::endl;
}

defining+using your swap() #gotcha

Gotcha! If you define your own swap(), it may not get picked up, depending on subtle overload-resolution factors. In the demo below, when the arguments are “int” variables (lvalues), the hidden std::swap() is an exact match and wins over your custom swap(size_t, size_t); but when the arguments are integer literals (rvalues), they cannot bind to std::swap’s non-const reference parameters, so your custom swap() wins!

Note there’s no specific #include required for std::swap to become visible — it gets pulled in indirectly by other standard headers.

  • Solution: if practical, avoid “using namespace std”, even in cpp files
  • Solution: enclose everything except the outermost main() in an anonymous namespace, to force the unqualified swap() to resolve to your custom version.
#include <iostream>
#include <assert.h>

//namespace{

using namespace std;

int value1 = 5;
int calls=0;
template <typename T> void swap(size_t a, size_t b){ //T is unused; it only enables the swap<int>(..) call syntax
        ++calls;
        std::cout<<"in my swap()"<<std::endl;
}

int mymain(){
  int a=0, b=1;
  int oldCount = calls;
  swap<int>(a,b); //int arguments won't invoke my swap()
  assert (calls == oldCount);

  std::cout<<"after 1st call to swap()"<<std::endl;
  swap<int>(0,1); //calls my swap()
  assert (calls == 1+oldCount);

  std::swap<int>(a,b);  //compiles even without any specific #include
  return 0;
}

//}

int main(){
  mymain();
}

##elegant/legit simplifications ] cod`IV

  • eg: reverse linked list in K-groups — (CS algo challenge) assume there’s no stub, solve the real problem, then deal with the stub
  • eg: consumer thread dequeue() method. When empty, it “should” be waiting for a notification, according to Jun of Wells Fargo, but a simpler design returns a special value to indicate empty. The thread can then do other things or return to the thread pool or just die. I think the wait model is not ideal. It can waste a thread resource. We could end up with lots of waiting threads.
  • Eg: https://bintanvictor.wordpress.com/2017/09/10/lru-cache-concise-c-implementation/ requirement is daunting, so it’s important and completely legitimate to simplify lookup() so it doesn’t insert any data. API is simpler, not incomplete
  • Eg: find every combination adding up to a given target #Ashish permutation within each combination is a separate issue and not the main issue
  • recursive solution is often a quicker route to working code. Once we cross that milestone, we could optimize away the recursion.
  • eg: extract isPrime() as an unimplemented function and simply assume it is easy to implement when I get around to it.
  • Eg: show free slots between meetings #bbg I solved a similar and more familiar problem.
  • eg: violation check for Sudoku is a tedious but simple utility function. We could assume it’s available
  • eg: violation check for n-queens is likewise slightly harder

 

c++ salaries: high-end^regular

I spoke with 5 to 10 fellow c++ developers in the financial domain across a few countries. All of them seem to suggest that c++ skills command higher market value than other languages such as java, c# or python. One common reason they all gave is the high-frequency trading jobs. My own observation also shows that most HFT tech jobs use c++, and very few use java or c#.

However, how many of us can ever pass the HFT interviews?

I have interviewed for a few high-frequency or medium-frequency c++ algo-trading jobs. I didn’t pass any. Among my friends, only one person passed. He passed more than one such interview and he’s clearly very strong.

So I feel the reality is, most of us won’t be able to get those jobs, or keep the jobs.

Q: what percentage of the c++ jobs in finance are HFT?

I would guess 5 to 15%. So the vast majority of c++ salaries are ordinary. I think many of us can get those jobs:)

What’s your opinion?

detect cycle in slist #Part 1

Q: A singly-linked list (slist) contains a loop. You are given nothing but the head node. With O(1) space complexity, how do you locate the join node? For example,

0(head)->1->2->…101->102->103->4(again), so #4 is the merge point

Here’s Deepak’s ingenious trick

  1. first use the 2-pointer trick to find any node inside the loop.
  2. find the length (say, 55, denoted LL) of the loop using a single moving pointer, starting from that node
  3. now we can discard that node
  4. Now start a pointer B from head and move it LL steps.
  5. Now put pointer A at head
  6. Now move A and B in locksteps. They will meet for sure, at the merge point. Here’s the proof:

Suppose the merge point is MM steps from head. When A and B start moving in lockstep,

  • after MM steps, A reaches MM for the first time
  • after the same MM steps, B has travelled MM+LL steps from head, i.e. exactly one full lap past MM — so B is at MM too.

Note LL value could be small like 1.

#include <iostream>
#include <cassert>
using namespace std;

struct Node{
  Node const * p;
  int const data;
  Node(int _data, Node const * _p): p(_p), data(_data){}
  friend ostream & operator<<(ostream &os, Node const & node){
        os<<node.data<<" / "<<&node<<" -> "<<node.p<<endl;
        return os;
  }
};

Node _9(9, NULL);
Node _8(8, &_9);
Node _7(7, &_8);
Node _6(6, &_7);
Node _5(5, &_6);
Node _4(4, &_5);
Node _3(3, &_4);
Node _2(2, &_3);
Node _1(1, &_2);
Node _0(0, &_1);
Node & root = _0;
Node const * mergePoint = &_1;

//how many distinct nodes in the loop
size_t getLoopLen(Node const & root){
  Node const * brunner = &root;
  Node const * frunner = &root;
  for(;;){
        frunner = frunner->p->p;
        brunner = brunner->p;
        if (frunner == brunner) break;
  }
  cout<<"now the two runners have met somewhere in the loop: "<<*frunner ;
  for(int ret = 1; ;++ret){
        frunner = frunner->p ;
        if (frunner == brunner) return ret;
  }
}

Node const * getMergePoint(Node const & root){
  size_t LL = getLoopLen(root);
  cout<<"# of nodes in loop = "<<LL<<endl;
  Node const * frunner = &root;
  for(int i = 0; i<LL; ++i){ frunner = frunner->p; }
  //cout<<"front runer is "<<*frunner;
  Node const * brunner = &root;
  for(;frunner != brunner;){
        brunner = brunner->p;
        frunner = frunner->p;
  }
  return frunner;
}

int main(){
  _9.p = mergePoint;
  Node const * ret = getMergePoint(root);
  cout<<"Merge point is "<<*ret;
  assert(ret == mergePoint);
}

declare variables ] loop header: c^j #IF/for/while

Small trick to show off in your coding test…

Background — In short code snippet, I want to minimize variable declarations. The loop control variable declaration is something I always want to avoid.

https://stackoverflow.com/questions/38766891/is-it-possible-to-declare-a-variable-within-a-java-while-conditional shows java WHILE-loop header allows assignment:

List<Object> processables;
while ((processables = retrieveProcessableItems(..)).size() > 0) { /*...*/ }

But only the c++ WHILE-loop header (I’m 99% sure) allows a variable declaration.

The solution — both java/c++ FOR-loop headers allow variable declaration. Note the condition is checked Before first iteration, in both for/while loops.

update — c++ also allows a variable declaration in the IF-statement condition (this predates c++0x), designed to limit the variable’s scope.

if (int a=string().size()+3) cout<<a << " = a \n"; // shows 3
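
For completeness, a minimal c++ sketch of declarations in WHILE and FOR headers (the names and values are made up):

#include <iostream>
#include <string>
using namespace std;

int main(){
  string s = "abc";
  size_t i = 0;
  while (char c = (i < s.size() ? s[i] : '\0')){ //declaration in while-header: legal in c++, not java
    cout<<c;
    ++i;
  }
  cout<<endl;
  for (int k = 0; k < 3; ++k) cout<<k; //declaration in for-header: legal in both java and c++
  cout<<endl;
}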

symlink/hardlink: Win7 or later

https://www.howtogeek.com/howto/16226/complete-guide-to-symbolic-links-symlinks-on-windows-or-linux/ is a 2017 article.

The mklink command can create both hard links (known as “hard links” in Windows) and soft links (known as “symbolic links” in Windows).

On Windows XP, I used “Junction.exe” for years, because mklink was not available.
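
A minimal sketch of the mklink variants (run in an elevated cmd prompt; the paths are hypothetical):

  • file symbolic link — mklink C:\linkToFile.txt C:\real\file.txt
  • directory symbolic link — mklink /D C:\linkToDir C:\real\dir
  • hard link (same volume only) — mklink /H C:\hardLink.txt C:\real\file.txt
  • junction (what Junction.exe creates) — mklink /J C:\junction C:\real\dir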

ibank dev projects/timelines #XR

I have heard several people complaining about the unhealthy work culture in NY investment bank IT teams.

  • Timeline is artificially tight.
  • A typical senior manager is motivated by trophy systems that win him promotions + bonuses, rather than by really useful systems making a real impact on the business
  • Lots of systems are developed but not used, or used only for superficial purposes. (This is common in many systems. Only 1% of the features are actually put to productive use.)
  • Low quality code; low quality output data; poor quality control
  • Constant and widespread pressure to retire old systems and rewrite, even when the old system is still usable and sometimes rather stable.
  • Senseless requirements, questionable value, thrown-away systems

I heard that West coast shops, Chicago buy-side firms and other companies don’t have exactly the same problems. I feel my current company is also different from NY banks — older technology; slightly higher quality standards; more stable codebase; users are mostly external paying customers…

I guess the Wall St culture is “good” for some, while other developers hate it and exit. I’m planning to stick to finance but not banks.

share buy-back #basics

  • shares outstanding — reduced, since the repurchased shares (say 100M out of 500M total outstanding) are no longer available for trading.
  • Who pays cash to whom? The company pays existing public shareholders (buying on the open market), so the company pays out hard cash, reducing its cash position.
  • EPS — benefits, often leading to immediate price appreciation (see the quick arithmetic after this list)
  • Total assets — reduced, improving ROA/ROE
  • Demonstrates a comfortable cash position
  • Initiated by — management, when they think the stock is undervalued
  • Perhaps requested by — existing shareholders hoping to make a profit
  • company has excess capital and
  • A.k.a “share repurchase”
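
Quick arithmetic on the EPS point, using the 100M-out-of-500M example above and a hypothetical net income of $1B: EPS rises from $1B / 500M = $2.00 to $1B / 400M = $2.50, even though the business itself is unchanged.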

discover cpu count ] my machine #lscpu

NIC — There’s another blog post https://bintanvictor.wordpress.com/2018/03/19/discover-nic-my-machine-lspci/

$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 1
Core(s) per socket: 8
CPU socket(s): 2
NUMA node(s): 2
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 20480K


$ less /proc/cpuinfo|egrep 'core id|processor'  # on the same machine shows

16 “processors”, but #0 and #1 have the same core-id! There are 8 distinct core-id values in total. I think this is because each of the two sockets has 8 cores whose core-id values are unique within that socket (the same 8 values repeat in the other socket).
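
A few more one-liners to cross-check the counts (field names assume a typical x86 Linux /proc/cpuinfo):

$ grep -c '^processor' /proc/cpuinfo                    # logical processors (16 here)
$ grep '^physical id' /proc/cpuinfo | sort -u | wc -l   # sockets (2 here)
$ lscpu | egrep 'Socket|Core|Thread'                    # sockets x cores-per-socket x threads-per-core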

dmesg | grep rocessor # doesn’t work any more

dxdiag # windows

dxdiag /t c:\deldel.txt
find “CPU” c:\deldel.txt # copy-paste may not work

check if a word can reshuffle to palindrome

requirement: https://codecon.bloomberg.com/challenger-series/29. With minimal time and space complexity, the corner cases are tricky. I decided to add a null terminator:)

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

int main() {
  string S;
  cin>>S;
  vector<char> v(S.begin(), S.end());
  sort(v.begin(), v.end());
  v.push_back(0); //null char
  char last=v[0];
  size_t occur=0;
  bool isOddFound = false;

  for(int i=0; i<v.size(); ++i) {
    bool isNew = v[i] != last;
    //cout<<"--> "<<v[i]<<" isNew = "<<isNew<<endl;
    if (!isNew){
        ++occur;
        //cout<<last<<" occured "<<occur<<" times"<<endl;
        continue;
    }
    //cout<<last<<" occured total "<<occur<<" times"<<endl;
    if (occur % 2){
      if(isOddFound){
        cout<<"no"<<endl;
        return 0 ;
      }
      isOddFound = true;
    }
    last = v[i];
    occur = 1;
  }
  cout<<"yes"<<endl;
}

coin problem #all large-enough amounts are decomposable

This is not really a algo IV question, but more like brain teaser problem.

Based on https://en.wikipedia.org/wiki/Coin_problem — For example, the largest amount that cannot be obtained using only coins of 3 and 5 units is 7 units. The solution to this problem for a given set of coin denominations is called the Frobenius number of the set. The Frobenius number exists as long as the coin denominations have no common divisor greater than 1.

Note if a common divisor exists as in {2,4} then all the odd amounts will be non-decomposable.

Q: why a very large amount is always decomposable ? Give an intuitive explanation for 2 coin values like 3 and 5.

Here’s an incomplete answer — 15 (=3*5), 16, 17 are all decomposable. Any larger number can be solved by adding 3’s .

In fact, it was proven that any amount greater than (not equal to) [xy-x-y] is decomposable. So if we are given 2 coin values (like 4,5, where x is the smaller value) we can easily figure out that every amount in the range

xy-x-y+1  to xy-y

is decomposable. Note this range has x distinct values, so any higher amount is easily handled by adding x’s.

Also note xy-y is obviously decomposable as (x-1)y.
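
A minimal brute-force sketch (my own check, not from the wiki article) confirming the Frobenius number of {3,5}; the search limit of 40 is arbitrary:

#include <iostream>
#include <vector>
using namespace std;
int main(){
  int const x=3, y=5, limit=40;
  vector<bool> ok(limit+1, false);
  ok[0] = true; //zero coins
  for(int amt=1; amt<=limit; ++amt)
    ok[amt] = (amt>=x && ok[amt-x]) || (amt>=y && ok[amt-y]);
  int largestBad = -1;
  for(int amt=1; amt<=limit; ++amt) if (!ok[amt]) largestBad = amt;
  cout<<"largest non-decomposable amount = "<<largestBad<<endl; // prints 7 == 3*5-3-5
}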

 

##G5 std::map tasks 4cod`IV

custom hash func? See short example on P 364 [[c++standard library]]. [[optimized c++]] has many examples too.

initialize? There’s a simple constructor taking a (possibly long) initializer list, but the insert() methods support the same and are more versatile.

insert? single pair; range (anotherMap.begin(), end());

** insert single value — won’t overwrite pre-existing
** map1.emplace(…)
** map1[key2] = val3 // overwrites pre-existing
** insert list of values — http://en.cppreference.com/w/cpp/container/unordered_map/insert

(returning the value) lookup? at() better than operator[]

a pointer type as key? useful technique.

erase? by a specific key. No need to call another function to really erase the node.

Q: create only if absent; no update please
A: insert()

Q2: create or update
Q2b: look up or create
A: operator []

Q1: update only; no create please
Q1b: look up only. No create please
A: find() method

Q: check for existence
A: c.find() is slightly better than c.count() esp. for multi_* containers
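
A minimal sketch (std::map<string,int>, hypothetical keys) exercising the operations above:

#include <iostream>
#include <map>
#include <string>
using namespace std;
int main(){
  map<string,int> m;
  m.insert(make_pair("a", 1)); //create only if absent; never overwrites
  m.emplace("b", 2);           //same no-overwrite semantics, constructs in place
  m["a"] = 11;                 //create-or-update; overwrites the earlier 1
  cout<<m.at("a")<<endl;       //lookup returning the value; throws if the key is absent
  if (m.find("c") == m.end()) cout<<"c not found"<<endl; //lookup only, never creates
  m.erase("b");                //erase by key; nothing else to call
}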

 

## G20 operations(4cod`IV)on containers+str

“Operation” means logical operations any programmer (java, Perl, javascript, whatever) frequently performs on a common data structure. STL offers about 30 to 60 everyday operations. A beginner can benefit from a good short list.

List below leans towards sequential containers (shows up in 80% coding questions), but also includes essential operations on associative containers.

  1. ) print — using copy and ostream_iterator. See post. See stl-tut
  2. ) find, find_all
  3. slicing i.e. subsequence
  4. [iv] sort — using custom “myLess” as a type param. See post on functor-type. See effSTL
  5. [iv] construct a sorted container using a custom “myLess” as type param. See blog
  6. filter
  7. [iv] nested container
  8. truncate at a point
  9. append, in bulk
  10. pop front
  11. binary search in pre-sorted
  12. [iv] deep copy
  13. mass-populate with one default value
  14. remove/erase — see effSTL. Plug back_inserter into copy, remove_copy, replace_copy …
  15. [iv] plug bind2nd into replace_if/replace_copy_if, count_if, remove_if/remove_copy_if …. See blog
  16. initialize with hardcoded literals — relatively easy
  17. conversion like between string and vector of char. See blog. See [[ stl tut ]]
  18. —-2nd tier
  19. insert mid-stream, in bulk
  20. [iv] = possibly picked by interviewers
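
A minimal sketch (values made up) covering items 1, 4 and 14 above — print via ostream_iterator, sort with a custom comparison functor, copy through back_inserter:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>
using namespace std;

struct MyGreater{ bool operator()(int a, int b) const { return a > b; } }; //descending order

int main(){
  vector<int> v{3, 1, 4, 1, 5};
  sort(v.begin(), v.end(), MyGreater());                        //item 4: custom comparison functor
  vector<int> v2;
  copy(v.begin(), v.end(), back_inserter(v2));                  //item 14: back_inserter
  copy(v2.begin(), v2.end(), ostream_iterator<int>(cout, " ")); //item 1: print
  cout<<endl;
}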

q[-c] flag in g++

$ cat main.cpp:

void aa();
int main(){aa();}

A g++ command without “-c” would try to create an executable, but aa() is declared, not defined, so the link step fails.

You can either use “-c” or link in a definition:

g++ lib1.o main.cpp # to link in the aa() definition from lib1.o
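
A minimal sketch of the “-c” route (lib1.cpp is a hypothetical file containing the definition void aa(){}):

$ g++ -c main.cpp          # compile only; produces main.o, the linker never runs
$ g++ -c lib1.cpp          # produces lib1.o with the aa() definition
$ g++ lib1.o main.o        # final link step; produces a.out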

## a few devops technical tips : %%xp

  • continuous bamboo build triggered by commits
  • automated test as part of continuous build
  • automated packaging (jar, c++ lib, python installable distro…) and upload into central repository (such as Maven/Nexus)
  • automated build stream (with history) created in continuous build engine, when a branch is created in git
  • require each commit to start with a jira id
  • generate release notes from commit messages

##9dataStruct(+..)for c++speed cod`

  1. linked node manipulation in a graph context
  2. vector (more useful than array), std::string (more useful than cStr). Many string operations are still unfamiliar
    1. Array-based data structures are required in 80% of my coding tests.
    2. More than 50% of all my coding tests require nothing but arrays.
    3. Most of my toughest coding tests are presented in arrays but may need maps as helpers
  3. string in std::string or c-str
  4. iterator manipulation like erase, lower_bound, find,
  5. sorting, operator<(), upper_bound, binary search … on containers
  6. sorted data structure like std::map
  7. [w] stringstream — ECT to improve

Very few Wall St interviewers would test you on graph or DP. Here are other less important constructs:

  1. [w] binary tree is common and simple, but they can ask very tough questions on it!
  2. [w] double pointer is trickier and my advantage
  3. [w] Node class in a linked data structure.
  4. [w] stack, queue.
  5. [w] grid or matrix
  6. file I/O? only for IDE tests, not pure algo or take-home tests
  7. [w] basic syntax for pointer arithmetic.
  8. [w] dtor, copier, op=? only for design questions, not algo questions.
  9. [w] shared_ptr? Only for design questions, never needed in algo questions!
  10. [w] ref variable only as function I/O.
  11. stl algo? Only Citadel array-shrink
  12. never exception
  13. never template
  14. no (very, very seldom) threading in coding Q
  15. adv: pointer to function
  16. adv: circular buffer
  17. [w = no weakness]

 

milePerGallon #DeepakCM SIG

Similar to GS Tick Query question but without optimization requirements.

–Requirement: Your Choice of Editor, OS, Compiler & Programming Language. Total time allotted for this test is 75 mins. The further rounds of Interview will be based on this programming test.

A group of N people go driving. Whenever they refill the fuel in their vehicles, they update a text file with the following fields

<Person>, <Car>, <No of Miles driven on that day, before the fill>, <NO of Gallons filled i.e. gallons used for those miles>,<Date of fill>

All the fields are comma separated. A sample such file is given below

John, Ford, 350, 20, 20160921
John, Ford, 220, 13, 20160920
John, Ford, 230, 35, 20160918
John, Ford, 300, 22.5, 20161112
Jonathan, Mercedes GLA, 220, 13, 20160920
Mishal, Ford Escort, 230, 35, 20160919

Write a function with the following parameters
GetMPG(name_of_person, Start_date, End_date)
such that it should return the following information for each User/driver
1) Name of Car
2) Miles per Gallon for the mentioned period for each of the Car

A person can have more than one car but never shares his cars. If there is no record for the mentioned date range, the function should not return anything for that specific car.

–analysis

https://github.com/tiger40490/repo1/blob/py1/py/milePerGallon.py is my python solution.

lockfree usage in high speed trading system@@

Hi Deepak,

Further to our conversation, do high frequency shops use lock-free? I would guess that the most demanding high-speed data processing systems (such as the matching engine in an exchange) are single-threaded, rather than lock-free multithreaded.

I hope to hear a typical high-speed data processing system that has lots of shared mutable data. I have not found one.

  • Order matching
  • Airline booking
  • Vote counting in real time?

If 99% of the data is not shared mutable, then single-threaded design is probably better.

  • I feel one reason is complexity vs simplicity. Lock-free designs can be very tricky, according to experts.
  • Another reason is performance. Lock-free is, in my view, definitely slower than single-threaded. The memory fences required on those atomic objects impede compiler optimizations.
  • Most important reason, in my view — it’s always possible to achieve the same functionality without using locks or lock-free designs. Single-threaded designs are possible, simpler and faster.

If we have 64 cores, just run 64 processes, each single-threaded. However, in reality these systems often do have some shared mutable data between two threads. There are various solutions I’m not qualified to compare and illustrate. These solutions could use lock-free or locks.

volume alone doesn’t qualify a system as big-data

The Oracle nosql book has these four “V”s to qualify any system as big data system. I added my annotations:

  1. Volume
  2. Velocity
  3. Variety of data format — If any two data formats account for more than 99.9% of your data in your system, then it doesn’t meet this definition. For example, FIX is one format.
  4. Variability in value — Does the system treat each datum equally?

Most of the so-called big data systems I have seen don’t have these four V’s. All of them have some volume but none has the Variety or the Variability.

I would venture to say that

  • 1% of the big-data systems today have all four V’s
  • 50%+ of the big-data systems have no Variety no Variability
    • 90% of financial big-data systems are probably in this category
  • 10% of the big-data systems have 3 of the 4 V’s

My friend JunLi said most of the data stores he has seen are strictly structured data, and cited credit bureau report as an example.

The reason that these systems are considered “big data” is the big-data technologies applied. You may call it “big data technologies applied on traditional data”

See #top 5 big-data technologies

Does my exchange market data qualify? Definitely high volume and velocity, but no Variety or Variability. So not big-data.

data science^big data Tech

The value-add of big-data (as an industry or skillset) == tools + models + data

  1. If we look at 100 big-data projects in practice, each one has all 3 elements, but 90-99% of them would have limited value-add, mostly due to .. model — exploratory research
    1. data mining probably uses similar models IMHO but we know its value-add is not so impressive
  2. tools —- are mostly software but also include cloud.
  3. models —- are the essence of the tools. Tools are invented, designed mostly for models. Models are often theoretical. Some statistical tools are tightly coupled with the models…

Fundamentally, the relationship between tools and models is similar to Quant library technology vs quant research.

  • Big data technologies (acquisition, parsing, cleansing, indexing, tagging, classifying..) is not exploratory. It’s more similar to database technology than scientific research.
  • Data science is an experimental/exploratory discovery task, like other scientific research. I feel it’s somewhat academic and theoretical. As a result, salary is not comparable to commercial sectors. My friend Jingsong worked with data scientists in Nokia/Microsoft.

The biggest improvement in recent years is in … tools

The biggest “growth” over the last 20 years is in data. I feel user-generated data is dwarfed by machine-generated data.

%%c++keep crash` I keep grow`as hacker #zbs#AshS

Note these are fairly portable zbs, more than local GTD know-how !

My current c++ project has high data volume, some business logic, some socket programming challenges, … and frequent crashes.

The truly enriching part are the crashes. Three months ago I was afraid of c++, largely because I was afraid of any crash.

Going back to 2015, I was also afraid of c++ build errors in VisualStudio and Makefiles, esp. those related to linkers and the One-Definition-Rule, but I overcame most of that fear in 2015-2016. In contrast, crashes are harder to fix because 70% of the crashes come with no usable clue. If there’s a core file I may not be able to locate it. If I locate it, it may not have symbols. If it has symbols, the crash site is usually in some class unrelated to any class I wrote. I have since learned many lessons on how to handle these crashes:

  • I have a mental list like “10 common crash patterns” in my log
  • I have learned to focus on the 20% of my codebase that are most convoluted, most important, most tricky and contribute most to debugging difficulties. I then invest my time strategically to rewrite (parts of) that 20% and dramatically simplify them. I managed to get familiar and confident with that 20%.
    • If the code belongs to someone else including 3rd party, I try to rewrite it locally for my dev
  • I have learned to pick the most useful things to log, so they show a *pattern*. The crashes usually deviate from the patterns and are now easier to spot.
  • I have developed my binary data dumper to show me the raw market data received, which often “cause” crashes.
  • I have learned to use more assertions and a hell lot of other validations to confirm my program is not in some unexpected *state*. I might even overdo this and /leave no stone unturned/.
  • I figured out memset(), memcpy(), raw arrays are the most crash-prone constructs so I try to avoid them or at least build assertions around them.
  • I also figured out that signed integers can become negative and don’t make sense in my case, so I now use unsigned int exclusively. In hindsight I’m not sure this is best practice, but it removed some surprises and confusions.
  • I also gained quite a bit of debugger (gdb) hands-on experience

Most of these lessons I picked up in debugging program crashes, so these crashes are the most enriching experience. I believe other c++ programs (including my previous jobs) don’t crash so often. I used to (and still do) curse the fragile framework I’m using, but now I also recognize these crashes are accelerating my growth as a c++ developer.

data mining^big-data

Data mining has been around for 20 years (before 1995). The most visible and /compelling/ value-add in big-data always involves some form of data mining, often using AI including machine-learning.

Data mining is The valuable thing that customers pay for, whereas Big-data technologies enhance the infrastructure supporting the mining

https://www.quora.com/What-is-the-difference-between-the-concepts-of-Data-Mining-and-Big-Data has a /critical/ and concise comment. I modified it slightly for emphasis.

Data mining involves finding patterns in datasets. Big data involves large-scale storage and processing of datasets. So combining both, data mining done on big data (e.g. finding buying patterns in large purchase logs) is getting a lot of attention currently.

NOT all big data tasks are data mining ones (e.g. large-scale indexing).

NOT all data mining tasks are on big data (e.g. data mining on a small file, which can be performed on a single node). However, note that wikipedia (as of 10 Sept. 2012) defines data mining as “the process that attempts to discover patterns in large data sets”.

(Revisit) senior manager IV: what they watch out for

Common, non-trivial, perhaps hidden issues in a candidate, ranked:

  • twist and turn and not /candid/ about his past
  • not listening
    • jumping to conclusions
    • assuming too much
    • not waiting for a long-winded interviewer to finish, perhaps interrupting
    • jumping to the defensive, when deliberately provoked
  • desperate for the job and losing his perspective
  • opinionated
  • choosy on projects
  • conceited beyond “cool confidence”
  • —–secondary
  • impulsive answers ^ elusive, guarded answers
    • elusive answers? common
  • bad-mouth past managers
  • lack of humor? common
  • tensed up and not cool and calm, perhaps prone to workplace stress
  • exaggeration about his past? common
  • not articulate enough? common
  • poor eye contact? common

next_perm@N color socks #complex #100%

A common IDE coding challenge — given x pairs of socks of various colors, generate all the permutations, in ascending order. Each color has a value.

–Solution 1: std::next_permutation() and prev_permutation()

–solution 2: I can probably write my own next_perm(). Using this function we can generate an ascending sequence of permutations starting from the current content of a vector.

https://github.com/tiger40490/repo1/blob/cpp1/cpp/combo_permu/nextPerm.cpp is my iterative solution, but should use lower_bound()

https://github.com/tiger40490/repo1/blob/py1/py/combo_perm/nextPermOfNSocks.py is my python solution, overcoming the lower_bound problem.
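
A minimal sketch of Solution 1 (three colors, values made up). std::next_permutation already skips duplicate arrangements, so repeated sock values are handled for free:

#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;
int main(){
  vector<int> socks{1,1,2,2,3,3};   //x=3 pairs; each value is a color
  sort(socks.begin(), socks.end()); //must start from the lowest permutation
  do {
    for (int c: socks) cout<<c<<' ';
    cout<<endl;
  } while (next_permutation(socks.begin(), socks.end())); //returns false once we wrap around
}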

relating my perm/comb algos

–next_perm: I have an iterative solution

–next perm 5C3: iterative algo, growing global collection

–next_combo 5C3: I have a (approx) 10-line recursive algo. (Iterative algo is non-ideal). Passing in two collections.

–next_abbreviation of any length: I have a (approx) 10-line recursive algo. Using global collection, this algo is simpler than next_combo

–So far, these algos are decent but have nothing in common.

 

next abbreviation@word with repeat char #2^N #100%

This is a real coding question at 10gen/mongoDB

https://github.com/tiger40490/repo1/blob/py1/py/combo_perm/abbrIterative.py is a simple iterative py solution. Not ascending.

Q1: Given a word with unique letters (like “abc”), can you write a c++ program to generate all abbreviations, in any order?
Q2: same question, but don’t use any external storage beside character arrays.
Q3: same question, but output in ascending order? Expected output:

  1. a
  2. ab
  3. abc
  4. ac
  5. b
  6. bc
  7. c

It’s daunting to generate this sequence without recursion. https://github.com/tiger40490/repo1/blob/cpp1/cpp1/combo_permu/abbr_ascendRecursive.cpp is my recursive solution to Q3.
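
For reference, a minimal recursive sketch for Q3 (my own, not the linked solution; assumes unique letters). Printing a prefix before recursing into its extensions is what keeps the output ascending:

#include <iostream>
#include <string>
using namespace std;

// print all abbreviations (non-empty subsequences) of w, in ascending order
void gen(const string & w, size_t start, const string & prefix){
  for (size_t i = start; i < w.size(); ++i){
    string cur = prefix + w[i];
    cout<<cur<<endl;    // "a" comes out before "ab", "ab" before "abc" ...
    gen(w, i + 1, cur); // extend only with letters to the right
  }
}
int main(){ gen("abc", 0, ""); }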

Q4: what if the word has non-unique letters, like “mongo”? The only solution I know relies on an external lookup device.

find anagram groups]file #1-pass, FB

Requirement: if any 2 words are anagrams, that’s a group. Within a file, identify all such groups.

Normalize each word using counting sort on the characters of the word. There’s no competing alternative in this isolated part of the solution. Main part of the solution is sorting all words based on the normalized version. There are many alternative solutions to this sort.

desirable — memory efficiency. create no or few objects besides the strings read in from the file.

desirable — avoid the need to create/allocate linkNode or wrapper objects around original words. In c++, such a wrapper is smaller than in java, probably a pointer to the original word + a link pointer

Solution 1 — insert into a sorted set, using a customized comparison (subclass binary_function) based on the normalized string and something unique, such as line number or object address. After all the inserts, when you iterate the set, an anagram group would show up in a row. We can also keep a separate array or iterators (pointers) that point to the first word in each group — emulating a multimap.

Without the something unique, the set would drop words.

I think insertion is inefficient since a normalized string is regenerated for the same word over and over. To overcome this, we could create a wrapper object with 2 fields — normalized string + a pointer to the original word. However, if “SAT” is a very common normalized string then we create multiple copies of it.

Solution 2 — separate chaining. Maintain a hashed map (faster) or sorted map keyed by the normalized strings, just a single copy for each normalized string. Value of each map entry is a linked list of the original words.
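
A minimal sketch of Solution 2 (hash map keyed by the normalized string; words hard-coded instead of read from a file):

#include <algorithm>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

int main(){
  vector<string> words{"listen", "silent", "enlist", "google", "banana"};
  unordered_map<string, vector<string> > groups; //normalized string -> original words
  for (string const & w: words){
    string key = w;
    sort(key.begin(), key.end()); //normalization; counting sort would also work
    groups[key].push_back(w);
  }
  for (auto const & kv: groups){
    if (kv.second.size() < 2) continue; //an anagram group needs 2+ words
    for (string const & w: kv.second) cout<<w<<' ';
    cout<<endl;
  }
}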

#include <iostream>
#include <iomanip>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <map>
#include <algorithm>
using namespace std;

map<vector<char>, size_t> m;
template<typename T> ostream & operator<<(ostream & out , vector<T> const & v){
        for(int i=0; i<v.size(); ++i) out<<setw(1)<<v[i];
        out<<" ";
        return out; //must return the stream
}
int main(){
  ifstream fs("words.txt");
  string line, word;
  while(getline(fs, line)){
     stringstream ss(line);
     while(getline(ss, word, ' ')){
        //trim
        int endpos = word.find_last_not_of(" ");
        if (endpos < string::npos) word = word.substr(0, endpos+1);
        if (word.empty()) continue;
        //lower case
        transform(word.begin(), word.end(), word.begin(), ::tolower);
        //sort
        vector<char> sorted(word.begin(), word.end()); sort(sorted.begin(), sorted.end());

        int count = ++m[sorted]; //1st time we get 1 🙂
        cout<<sorted<<"\t<- "<<word<<" :" <<count<<endl;
     }
  }
}

 

generate random line-up@Any combo@N boys #70%

A standard permutation/combination problem in some coding tests. You are often required to iterate all of them.

Given N cities, how many permutations of any combinations are there in total.

My iterative sum formula: answer(N)= N_choose_0 + N_choose_1 * 1! + N_choose_2 * 2! + … + N_choose_N * N!

N_choose_0 = 1 (the empty line-up)
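
A quick numeric sketch of the sum formula (my own check; the values for N=1..5 come out as 2, 5, 16, 65, 326):

#include <iostream>
using namespace std;
// answer(N) = sum over k=0..N of N_choose_k * k!
int main(){
  for (int N = 1; N <= 5; ++N){
    unsigned long long total = 0, nCk = 1, kFact = 1;
    for (int k = 0; k <= N; ++k){
      if (k > 0){ nCk = nCk * (N - k + 1) / k; kFact *= k; }
      total += nCk * kFact;
    }
    cout<<"answer("<<N<<") = "<<total<<endl;
  }
}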

–recursive algo: use answer(N-1) to answer(N)

–iterative algo:

  • Suppose we have an iterative_next_perm(list) function already written.
  • suppose we have an iterative_next_combo(N, 3) that generates all combinations of 3 out of N distinct chars.

Then, here’s a solution — call

iterative_next_combo(N,1)
iterative_next_combo(N,2)
iterative_next_combo(N,3)…
iterative_next_combo(N,N), in a loop (of N iterations).

Suppose one of the N calls generated 221 combinations. Run a loop (of 221 iterations), each time passing one combination into iterative_next_perm(combo). So our main program has only 2 nested loops. Most of the complexity is encapsulated in iterative_next_perm() and iterative_next_combo()

next Combo@3,using5distinct chars,permitting redraw

Modified from Leetcode problem 17. Suppose there is nothing but one red button, and there are L (like 5) letters on it.

Q: With 4 (=C) draws from 3 (=L) letters (same phone pad button), how many permutations? L^C.
Q: With 4 (=C) draws from 3 (=L) letters, how many combinations?

For C=1, Ans=3
For C=2, Ans=6
For C=3, Ans=10= 6 + 3 + 1?
For C=4, Ans=15=10+(10-6)+1
For C=5, Ans=21=15+(15-10)+1
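
(These counts match the multiset coefficient C(L+C−1, C): for L=3, C=4 that is C(6,4) = 15, and for C=5 it is C(7,5) = 21.)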

–My explanation of the count:

Key: Each combo is represented as a sorted vector (ascending). There’s one-to-one mapping between such a vector and a combo.

let’s say L=3 and C grows from 1 to 3 .. For C=2, I put all combos on 3 rows (3 vectors or lists)

  • aa ab ac #all that starts with a
  • bb bc #all that start with b
  • cc # lone wolf

For C=3, Ans

  • first container is populated by prepending ‘a’ to all 3 “old” containers of items
  • 2nd container is populated by prepending ‘b’ to 2nd-and-lower old containers
  • 3rd container is always a lone wolf

Both solutions below are simple, reusable and implemented in https://github.com/tiger40490/repo1/blob/cpp1/cpp/combo_permu/nextComboOf3redrawFrom5.cpp

–sol-1 efficient incremental solution — for C=3, given one combo as a string, find the “next” combo. For example after bbc, we should get bcc. Bonus feature — output is naturally sorted 🙂

–sol 2: append the next char ..

next split@N boys #51%

N boys are all unique “entities”, each identified by student id. Later we can look into “next split of N colored balls, having J unique colors”.

In a split, every boy belongs to exactly one group (minimum 1 member). We could rely on python’s default sort order, but for now let’s define a sort order between two groups AA and BB:

  1. sort by group size
  2. If same size, sort by the lowest student id in AA vs in BB.

Every “split” is recorded as a sorted list of groups. I will define a sort order between 2 splits:

  1. sort by list size
  2. if same size, sort by first group, then by 2nd group..

Q1: how many ways to split N boys?
%%A: First we line up N boys — there are N! line-ups. In each line-up, we have 2^(N-1) splits because there are N-1 slots to place a group-boundary. This is a way to iterate over ALL splits, but without uniqueness.

Q2: Write a program to generate all splits in some order. For example, with 3 boys h,i,j:

  1. h/i/j in a giant group…
  2. –3 split-of-2-groups: h, i/j
  3. i, h/j
  4. j, h/i…
  5. –1 split of 3 singular groups: h, i, j
  6. ==== total 5 distinct splits showing 10 group objects but not all distinct

With 3+1 boys h,i,j,k: First add k to each of 10 group objects:

  1. h/i/j/k
  2. h/k, i/j
  3. h, i/j/k
  4. i/k, h/j
  5. i, h/j/k
  6. j/k, h/i
  7. j, h/i/k
  8. h,i, j/k
  9. h,j, i/k
  10. i,j, h/… — so far, 10 splits. Now 5 more splits featuring k in its own singular group
  11. k, h/i/j
  12. h, k, i/j
  13. i, k, h/j
  14. j, k, h/i
  15. h,i,j,k

—- analysis: I don’t know how many splits there will be for N boys. An iterative function will probably work:

After I listed out exactly 5 splits of 3 boys, I added another boy. So here’s the algo I used — for each of the 5 (existing) splits, add this boy to each of its group objects (10 group objects in total), or put him in a new singular group of his own.
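
Side note — these counts (5 splits of 3 boys, 15 splits of 4) are the Bell numbers; a minimal Bell-triangle sketch generates them:

#include <iostream>
#include <vector>
using namespace std;
// Bell triangle: each row starts with the last entry of the previous row;
// every other entry = its left neighbour + the entry above that neighbour.
// The last entry of row n = number of splits of n boys.
int main(){
  vector<unsigned long long> row(1, 1); //row for n=1
  for (int n = 1; n <= 6; ++n){
    cout<<"splits of "<<n<<" boys = "<<row.back()<<endl; //1 2 5 15 52 203
    vector<unsigned long long> next(1, row.back());
    for (size_t i = 0; i < row.size(); ++i) next.push_back(next.back() + row[i]);
    row = next;
  }
}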

next Perm@3boys out@5, non-recursive #complex

Latest -- https://github.com/tiger40490/repo1/blob/cpp1/cpp1/combo_permu/iterative_nextPerm3boysOf5.cpp


// Requirement: generate all permutations of C(like 3) chars
//from N(like 5). The count should be N!/(N-C)!
// Bonus: generate in ascending order, where 'ascending' is
//defined on the basis that within the original word, a char
//has lower value than any char on its right. This is more clear
//when the word itself is a sorted string, but actually it's
//not needed.
//
//global collection is simpler than stack variables. Easier to visualize
//and uses less memory
#include <iostream>
#include <sstream>
#include <vector>
#include <deque>
#include <algorithm>
#include <iomanip>
#include <assert.h>
using namespace std;

string _tmp="abcd"; vector<char> const pool(_tmp.begin(), _tmp.end());
vector<size_t> poolIndex;
size_t const C = 3, N = pool.size(); //wanted: all permutations of C, out of the pool of items

//global collection, to be updated in each recursive layer.
vector<vector<size_t> > coll;
// Note each item (like a char or a color or a studentId) is
// represented in the global collection by an unsigned integer.
// This is the positional index in the original "pool" of items.
// This scheme ensures the permutations produced is ascending

string show(vector<size_t> const & v, string headline="", bool isPrint=true){
  stringstream ss;
  for (int i=0; i<v.size(); ++i){
    ss<<setw(5)<<pool[v[i]];
  }
  if (isPrint) cout<<ss.str()<<" <-- "<<headline<<v.size()<<endl;
  return ss.str();
}

void dump(string headline="", bool isAssert=true){
  size_t const cnt = coll.size();
  assert(cnt);
  size_t const len = coll[0].size();
  size_t exp = 1;
  for (int t=N; t>N-len; --t) exp *= t; //loop len times
  assert(cnt == exp);

  stringstream ss;
  string last = "";
  for(int i=0; i<cnt; ++i){
    string item=show(coll[i], "", false);
    ss<<item<<endl;
    if (!isAssert) continue;
    assert(last<item && "should be strictly asending");
    last = item;
  }
  cout<<headline<<"\t size "<<cnt<<endl<<ss.str()<<endl;
}

//RVO should kick in to skip the copying upon return
vector<size_t> const find_unused(vector<size_t> const & perm, size_t const len){
      vector<size_t> tmp4set_diff(perm); //permutations are by defnition unsorted.
      sort(tmp4set_diff.begin(), tmp4set_diff.end());
      vector<size_t> unused(N-len);
      set_difference(poolIndex.begin(), poolIndex.end(), tmp4set_diff.begin(),tmp4set_diff.end(),unused.begin());
      //show(tmp4set_diff, "tmp4set_diff"); show(poolIndex, "poolIndex"); show(unused, "unused");
      return unused;
}

//RVO should kick in to skip the copying upon return
vector<size_t> const new_perm(vector<size_t> const & perm, size_t unused){
        vector<size_t> newPerm(perm);
        newPerm.push_back(unused);
        //show(newPerm, "newPerm");
        return newPerm;
}
//This algo is considerably more complex than many recursive algos
//we have written recently, largely due to the set_difference()
void next_perm_from_pool_iterative(){
  for(size_t loop=0;loop<9 /*useful during dev to control when to exit*/;++loop){
    if (coll.empty()){
      for(size_t j=0; j<pool.size(); ++j){
        coll.push_back(vector<size_t>(1, j));
        poolIndex.push_back(j);
      }
      // dump("^^^^ after initilization of coll ^^^^");
    }
    assert(!coll.empty());
    size_t const len=coll[0].size();
    assert(loop+1 == len);
    if (len == C) {
      cout<<"Done:)"<<endl;
      return;
    }
    vector<vector<size_t> > newcoll;
    for(int kk=0; kk<coll.size(); ++kk){
      vector<size_t> const & perm = coll[kk];
      vector<size_t> unused(find_unused (perm, len));

      //now unused contains the pool items not in the current perm.
      //Suppose there are 3 unused items, we will create 3 new permutations
      //by appending each one to the current perm
      for(vector<size_t>::iterator it=unused.begin(); it != unused.end(); ++it){
        newcoll.push_back(new_perm(perm, *it));
      }
    }
    coll = newcoll;
    dump("^^^^ end of iteration ^^^^");
  }
}

int main(){
  next_perm_from_pool_iterative();
}

next_Perm@3boys out@5 #80%

algo-practice: generate permutations of 3, using 5 distinct chars

Such a skill might be needed in some coding IV sooner or later. Let’s write this in py or c++. Formally,

Q1: generate all permutations of 3, from 5 distinct chars, in any order.
Q2: generate all permutations of 3, from 5 distinct chars, in ascending order. You can sort the 5 chars first.
Q2b: once you have generated one permutation, how do you identify The next?

Note the same solution is a generalization of std::next_permutation(), so once you have this solution you have that solution.

How many? 5-choose-3 * 3! = 5!/(5-3)! = 5*4*3 = 60, denoted Answer(3). Paradoxically, Answer(4) == Answer(5) = 120

–algorithm 1 for Q1

  • 1st generate 5 single-char strings;
  • then for each generate 4 two-char strings. We get 20 strings.

–algorithm 1b for Q1: rec(5,3) will call rec(5,2). rec(5,2) has 20 items, each of which can generate 3 items for rec(5,3), because each item has 2 characters and a void to be filled by the 3 unused chars.

The 20 items returned should be a pair{vector of 2, vector of 3}

This produces a sorted collection:)

See my code  https://github.com/tiger40490/repo1/blob/py1/py/combo_perm/nextPerm%403boysFrom5.py

 

 

 

python^c++for coding drill

For some coding questions, c++ is not too bad and you get to practice and improve your c++ ECT:

  • array questions
  • number questions

However, I feel overall py is significantly simpler, faster for many tasks like

  • file parsing, without using getline, cin…
  • string split
  • converting numbers to/from strings
  • regex

next_Combo@3 boys out@5 # 100%

Given abcde, how many combinations of 3? 5-choose-3 = 10 by the standard formula, but let’s implement it in py or c++.

It’s useful to define a score for a combination — sort the combination into a string. The string is the score. Now,

Q1: generate all combinations@3 from 5 distinct chars, in ascending score.
Q2 (more importantly) given any combination of 3, generate the next higher combination of 3.  Output each combination as a sorted string to keep things clean. Extremely effective constraint and simplification.

1) https://github.com/tiger40490/repo1/blob/py1/py/combo_perm/nextComboOf3from6boys.py is a decent recursive solution.

2) fixed-length abbreviation generator also generates exactly the same  sequence!

3) Perhaps we can modify the iterative generate random perm@3boys out@5 algorithm:

Key: Scan from the end of the string to identify the position-to-upgrade (p2u). Compare the current (i.e. last) position with the unused characters. If it can’t be upgraded [eg abe], then move left, which is guaranteed to be a lower character. Now compare the new “me” with only the unused characters (in this algo we don’t allow swaps within the string). If it still can’t be upgraded, move left again.

If current position is upgradeable, then set it as p2u. Upgrade to the lowest unused character. Then reset all characters on my right and return.

  1. abc
  2. abd
  3. abe
  4. acd
  5. ace
  6. ade //last 2 letters are the max. p2u = 0
  7. bcd
  8. bce
  9. bde
  10. cde

 

next_Combo@3 boys out@5 #recursive descending

//This recursive version suffers from stack overflow
//but it's able to print out combinations in Descending order and
//maintains the relative positions between any 2 items
//
//This version reduces vector cloning by growing/shrinking the prefix vector
#include <iostream>
#include <sstream>
#include <vector>
#include <iomanip> //setw
#include <algorithm>  //sort
#include <assert.h>
using namespace std;
size_t calls=0, combos=0;
size_t const C=3; //how many in each combination
vector<char> emptyV;

template<typename T> void dump(vector<T> const & p, string const & s){
  cout<<"------------ "<<s<<" ------------ size = "<<p.size()<<endl;
  for(int i=0; i<p.size(); ++i) cout<<setw(5)<<p[i];
  if (p.size()) cout<<endl;
}
template<typename T> int show(vector<T> const * p, vector<T> const * v = NULL){
  ++ combos;
  stringstream ss;
  for(int i=0; i<p->size(); ++i) ss<<setw(5)<<(*p)[i];
  if (v){
    //cout<<",";
    for(int i=0; i<v->size(); ++i) ss<<setw(5)<<(*v)[i];
  }
  static string last("ZZZZZ");
  string combo=ss.str();
  cout<<"combo: "<<combo<<endl; assert(last >= combo);
  last = combo;
  return 0;
}

template<typename T> int recurs(vector<T> & prefix, vector<T> const & pool){// size_t const suffix ){
  ++calls;
  dump(prefix, "entering ... prefix"); dump(pool, "pool");
  if (prefix.size()            == C) return show(&prefix);
  if (prefix.size()+pool.size()== C) return show(&prefix, &pool);
  assert (!pool.empty());
  vector<T> const newPool(pool.begin()+1, pool.end());
  recurs(prefix, newPool);
  prefix.push_back(pool[0]);
  recurs(prefix, newPool);//this function returns only after all the layers are done
  prefix.pop_back();
  return 0;
}

int main() {
  string tmp = "abcde";
  vector<char> v(tmp.begin(), tmp.end());
  assert(C <= v.size());
  recurs(emptyV, v);
  cout<<calls<<"  calls to the recursive funcion to generate "<<combos<<endl;
}

(latency) DataGrid^^noSQL (throughput)

  • Coherence/Gemfire/gigaspace are traditional data grids, probably distributed hashmaps.
  • One of the four categories of noSQL systems is also a distributed key/value hashmaps, such as redis
  • …. so what’s the diff?

https://blog.octo.com/en/data-grid-or-nosql-same-same-but-different/ has an insightful answer — DataGrids were designed for latency; noSQL were designed for throughput.

I can see the same trade-off —

  • FaceBook’s main challenge/priority is fanout (throughput)
  • RTS main challenge is TPS measured in messages per second throughput
  • HFT main challenge is nanosec latency.

For a busy exchange, latency and throughput are both important but if they must pick one? .. throughput. I think most systems designers would lean towards throughput if data volume becomes unbearable.

I guess a system optimized for latency may be unable to cope with such volume. The request queue would probably overflow … loss of data.

In contrast, a throughput-oriented design would still stay alive under unusual data volume. I feel such designs are likely more HA and more fault-tolerant.

next_Combo@3 boys out@5 #recursive

//This recursive version suffers from stack overflow but
//it's able to print out combinations in ascending order and
//maintains the relative positions between any 2 items
//
//This version reduces vector cloning by growing/shrinking
//global objects but with higher complexity
//
//However, global variables actually simplify the logic!
#include <iostream>
#include <sstream>
#include <deque>
#include <iomanip> //setw
#include <algorithm>  //sort
#include <assert.h>
//#define DEBUG
using namespace std;
size_t calls=0, combos=0;
size_t const C=3; //how many in each combination
string tmp = "abcde";
deque<char> pool(tmp.begin(), tmp.end());
deque<char> prefix;

template<typename T> void dumpDeque(deque<T> const & p, string const & headline){
  cout<<"-- "<<headline<<" -- size = "<<p.size()<<endl;
  for(int i=0; i<p.size(); ++i) cout<<setw(5)<<p[i];
  cout<<endl;
}
template<typename T> int showCombo(deque<T> const * p){
  ++ combos;
  stringstream ss;
  for(int i=0; i<p->size(); ++i) ss<<setw(5)<<(*p)[i];
  static string last;
  string combo=ss.str();
  cout<<"combo: "<<combo<<endl;
  assert(last <= combo && "should be ascending");
  last = combo;
  return 0;
}

template<typename T> int recurs(){
  ++calls;
#ifdef DEBUG
  cout<<"-------------\nentering "; dumpDeque(prefix, "prefix"); dumpDeque(pool, "pool");
#endif
  if (prefix.size() == C) return showCombo(&prefix);
  if (pool.empty()) return 0;
  T poolhead = pool.front(); pool.pop_front();

  prefix.push_back(poolhead); //add poolhead to prefix

  //this 1st recursive function call starts a rather deep call stack and prints
  //all combinations with the given (new longer) prefix
  recurs<T>();//use the longer prefix and the shorter pool
  prefix.pop_back();//restore prefix
  recurs<T>();
  pool.push_front(poolhead); //restore pool, needed by the 2nd call in the parent stack
#ifdef DEBUG
  cout<<"^^^^^^ restored before returning "; dumpDeque(prefix, "prefix"); dumpDeque(pool, "pool");
#endif
  return 0;
}

int main() {
  assert(C <= pool.size());
  recurs<char>();
  cout<<calls<<"  calls to the recursive function to generate "<<combos<<endl;
}

next_Combo@3 boys out@5 #iterative complex

//Without loss of generality, each combination is internally represented
//as a sorted vector (ascending).
//There's one-to-one mapping between such a vector and a combination
#include <iostream>
#include <vector>
#include <iomanip> //setw
#include <algorithm>  //sort
#include <assert.h>
using namespace std;
size_t changes=0, calls=0;
size_t const C=3; //how many in each combination

template<typename ITR> bool isAscending (ITR const b, ITR const end){
  for (ITR last = b, i = b; i!=end; ++i){
    if (*last > *i) {
      cout<<*last<<" should be lower (or equal) but is higher than "<<*i<<endl;
      return false;
    }
    last = i; //advance the trailing iterator so each adjacent pair is compared
  }
  return true;
}
template<typename T> void dump(vector<T> & v,  bool isAssert = true){
  for(int i=0; i<v.size(); ++i) {
    cout<<setw(5)<<v[i];
    if (i == C-1) cout<<"  unused:";
  }
  cout<<endl;
  for(int i=0; i<v.size(); ++i){
    cout<<setw(5)<<i;
    if (i == C-1) cout<<"  unused:";
  }
  cout<<endl<<"---------------------------------"<<endl;
  if(isAssert){
    assert(isAscending(v.begin(), v.begin()+C) && "1st C elements should be ascending after next_combo (not next_perm)");
    assert(isAscending(v.begin()+C, v.end()) && "unused section should be ascending");
  }
}

template<typename T> bool reshuffle(vector<T> & v, int p2u){
//      cout<<"after swap"<<endl; dump(v);
        sort(v.begin()+p2u+1, v.end());
//      cout<<"after sorting everyting to the right of p2u"<<endl; dump(v);

        if (p2u == C-1){
                sort(v.begin()+C, v.end());
                ++changes;
                return true;
        }
        assert(p2u<C-1);
        //now reset everything to my right
        //now locate the best_man (next item after the new p2u) .. can use binary search
        for(int i=p2u+1; i<v.size() ; ++i){
          if (i==v.size()){ //p2u is highest possible!
                //cout<<"p2u is already highest"<<endl;
                sort(v.begin()+C, v.end());
                ++changes;
                return true;
          }
          if (v[p2u]<v[i]){
                //cout<<"best man = "<<i<<endl;
                for(int j=0; p2u+1+j<=C-1; ++j){
                  swap(v[p2u+1+j], v[i+j]);
                }
                sort(v.begin()+C, v.end());
                ++changes;
                return true;
          }
        }//for
        // now must return!

        assert(1==0 && "should never reach here");
        cout<<"after best_man search"<<endl; dump(v);
}

// reshuffles vector to the next higher combo
//Assuming 5-choose-3, the 1st 3 chars represent the combination,
//and the remaining characters at the end of the vector are
//unused in the current combination.
template<typename T> bool next_combo(vector<T> & v){
  ++calls;
  dump(v );
  if (v.size() == C) return false; // C-choose-C == 1 !

  for(int p2u=C-1; /*last position*/ p2u >=0 ;--p2u){
    for (int unusedItem=C; unusedItem<v.size(); ++unusedItem){ //scan the unused section of the array
        if (v[p2u] < v[unusedItem]) {
          assert(p2u<unusedItem);
          swap(v[p2u], v[unusedItem]);  //p2u should not change further
        //cout<<"identified "<<p2u<<" as position to upgrade... Will reset subsequent positions, and return"<<endl;
          return reshuffle(v, p2u);
        }
    }
    // no p2u identified yet. move p2u marker to the left
  }//for
  cout<<"no more higher combo. This is the end"<<endl;
  return false;
}
int main() {
//  vector<float> v{111,222,333,444,555,666};
  string tmp = "abcdefg";
  vector<char> v(tmp.begin(), tmp.end());
  assert(C <= v.size());
  for(; calls<9992; ){
        if (!next_combo(v)){
         cout<<changes<<" changes performed till the highest combo; next_combo() call count = "<<calls<<endl;
         return 0;
        }
  }
}

complexities: replicate exchange order book #Level1+2

–For a generic orderbook, the Level 1 complexity is mostly about trade cancel/correct.
All trades must be databased (like our TickCache). In the database, each trade has a trade Id but also an arrival time.

When a trade is canceled by TradeId, we need to regenerate LastPrice/OpenPrice, so we need the ArrivalTime attribute.

VWAP/High/Low are all generated from TickCache.

The other complexity is BBO generation from Level 2.

–For a Level-based Level 2 order book, the complexity is higher than Level 1.

OrderId is usually required so as to cancel/modify.

python run complex external commands #subprocess

I prefer one single full-feature solution that’s enough for all my needs. The os.system() solution is limited. The subprocess module is clearly superior. One of the simplest features is

>>> subprocess.call(["ls", "-l"])

If you need redirection and background, then try the single-string version

>>> subprocess.call('ls /tmp > /tmp/a.log &', shell=True) # output goes to STDOUT, hard to capture

c++4beatFronts: which2LetGo if I must pick1

Q: Which one to let go, given I have limited O2/laser/bandwidth and mental capacity?

  1. give up BP — biggest room for improvement but least hope
  2. give up codility style — Just get other friends to help me. See codility: ignore overflow, bigO

How about pure algo?

  • already decent? Can improve.
  • diminishing return? Smaller room for improvement? but I can learn a few key ideas about the G100 questions

##failed cod`IV

Here I only list the failed ones. There are many successful ones in ##some@the significant coding IV experiences

 

QQ BP ECT speed syntax algo where
 ? 😦 NA NA ? home c GS tick-engine
NA 😦 NA NA ? home c Tetris
 ? 😦 NA NA ? home c Macq
😦 😦 NA NA NA home j MS-commodity
? ? NA NA 😦 home j Amazon
😦 ? NA NA NA home j MS itr
😦 ? NA NA NA home j MS FIX
? NA good good 😦 codility c/j Mako
NA NA 😦 ? ? codility j FXoption
NA NA 😦 ? ? codility c Jump #3
NA NA 😦 slow good IDE c Bbg 2017
 NA ? 😦 slow good webex c Citadel array-shrink
none ? 😦 good simple IDE j UBS Suntec
😦 NA NA NA ? paper c Nsdq array 0-N
NA NA NA NA 😦 paper c Nsdq day-trading
😦 NA NA NA 😦 paper c FB regex
😦 NA NA NA 😦 paper c Goog
😦 NA NA NA ? paper j Barc SmartOrderRouter
😦 ? ? NA NA NA paper j UBS YACC
😦 NA NA NA NA paper c Bbg London
😦 ? NA NA NA paper c Imagine Software

See C++(n java)IV tend to beat us]4fronts.

Q: where is my biggest coding IV weakness? crack the algo? complete in pseudo code? or get the damn thing to compile and work?

  1. take-home tests usually beat me in terms of Best Practice
    • possibly give up if I must give up something 😦
  2. speed coding tests usually beat me in terms of ECT speed
  3. white-board tests usually beat me in algoQQ

c++cod`IV: no threading/boost/template/socket..

See C++(n java) high-end IV tend2beat us]4ways

This post is about the c++ topics. The BP/ECT/algo fronts primarily use basic constructs in ##10 basic program`constructs for c++ cod`IV

QQ     | BP       | ECT    | algo#pure | topic
heavy  | no       | never  | never     | thread
heavy  | yes      | seldom | never     | smart ptr
seldom | never    | never  | never     | other boost
yes    | never    | never  | never     | rvr+move
seldom | optional | never  | other     | c++11
heavy  | seldom   | never  | never     | memory tricks
yes    | never    | never  | never     | templates meta..
heavy  | yes      | never  | never     | c^c++
heavy  | never    | never  | never     | sockets
never  | never    | never  | once      | object graph serialization
heavy  | never    | never  | never     | inline
heavy  | never    | never  | never     | cpu cache
yes    | never    | never  | never     | exception safety
yes    | never    | never  | never     | pimpl

FTE^contractor: mgr choosing 1 guy to let go

There are individual differences, but I now feel the majority of managers/decision-makers across the U.S. and Singapore do feel higher resistance to letting go of an FTE than a contractor. Deterrence:

  • expectation of affected employee that the job is long-term. Most managers won’t ignore your feeling. They are worried about your complaints
  • reflection on her own record as a manager
    • they often prefer an internal transfer
  • formal performance improvement process is not required for non-performing contractor
  • severance
  • official complaints
  • litigation (in the U.S.)

The deterrences are a form of protection for the FTE, but paradoxically, I hate these deterrences, as they lengthen the slow death.

 

#1 career SAFETY enhancer@past5Y #muscleBuild`IV

See also I can professionally qualify as …

From 2010 to 2015, “IV muscle building” was my #1 career safety enhancer. That’s why I had so much “joy”. That’s where I invested so much time and money.

  • MSFM to open “new market” or at least cement my position on the quant-dev job market
    • SQL is, intriguingly, similar
  • c++, c#, py
  • swing– as a distinct job market segment

Now in 2017 I don’t see it as my #1 career safety enhancer. In retrospect, I find it not very simple to reach a conclusion.

One factor — I think core java skill demand turned out to be extremely robust (whereas c# and c++11 were underwhelming)

One factor — traditional pricing quant is shrinking as a job category

One factor — For both high end c++ and quant domains, bar is much higher than anticipated.

c# static class emulated in java/c++ #all-static

–c++:

use a (possibly nested) namespace to group related free functions. See google style guide.

c# has static classes. C++ offers something similar — P120 effC++. It’s a struct containing static fields. You are free to create multiple instances of this struct, but there’s just one copy of each static field. Kind of an alternative design for a singleton.

This simulates a namespace.
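
A minimal c++ sketch of the two alternatives above (names are made up):

#include <iostream>

namespace calc { //free functions grouped in a (possibly nested) namespace
  int add(int a, int b){ return a + b; }
}

struct Calc { //struct of static members, emulating a c# static class
  static int counter; //single copy, shared by all (unnecessary) instances
  static int add(int a, int b){ ++counter; return a + b; }
};
int Calc::counter = 0;

int main(){
  std::cout<<calc::add(1,2)<<" "<<Calc::add(3,4)<<std::endl;
}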

–java:

In [[DougLea]] P86, this foremost OO expert briefly noted that it can be best practice to replace a java singleton with an all-static class

–c# is the most avant-garde on this front

  • C# static class can be stateful but rarely are
  • it can have a private ctor

##(IncInc)most effective 10Y direction to INCrease INCome

Reality! We better embrace reality, not ignore it.

  • new skills like py, hadoop, data science? I see low chance of increasing my income, but I could be wrong
  • some specialist role, perhaps
    • risk/pricing analytics? questionable premium
    • low-latency algo trading in c++/java? unlikely … Get Real
  • hands-on architect (different from specialist role)
  • (perhaps maintenance mode) app owner, and grow in a big company?
    • must be a high value application
  • high-end contractor (probably@ibanks)? most practical
  • coding drill? practical 🙂
  • portable instrumentation zbs
  • [p] portable GTD skills (eg instrumentation)
  • [p] improve localSys learning process to speed up get-over-the-hump
  • [m] algo practice
  • socket QQ/GTD? not directly increasing my income but prevent income decline
  • [m] low level QQ topics in ARM, linux, STL, concurrency
  • [m=muscle building for iv]
  • [p=can help me take on a high-paying lead dev role. If too tough, then we can still count on these items to help improve our stress profile; self-esteem; job security; bonus… ]