mutex: allocated in heap or global memory

In Java, the mutex (monitor) always lives on the heap, since every monitor is attached to an object. Primitives can’t host a mutex.

In pthreads, the mutex object can be allocated anywhere, but I have only seen it allocated on the heap or in the global area.

  • in the global area, a one-liner both allocates and initializes the variable (see the sketch right after this list)
  • on the heap, you first malloc, then call init() — conceptually a constructor call
    • As a special case, you can also allocate the mutex in shared memory, creating a cross-process mutex! Not a common practice in java.
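
A minimal C++/pthreads sketch of the first two placements (error handling omitted; the cross-process case additionally needs the process-shared mutex attribute, which is not shown):

#include <pthread.h>

// global area: allocated and initialized in one line
pthread_mutex_t g_mutex = PTHREAD_MUTEX_INITIALIZER;

int main(){
  // heap: allocate first, then "construct" with init()
  pthread_mutex_t * h_mutex = new pthread_mutex_t;
  pthread_mutex_init(h_mutex, NULL);

  pthread_mutex_lock(&g_mutex);
  pthread_mutex_unlock(&g_mutex);

  pthread_mutex_destroy(h_mutex); // "destructor" before deallocation
  delete h_mutex;
}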

Q: what if I allocate a mutex on the stack?
A: in java, the actual object is still on the heap, though the variable is on the stack and invisible to other threads
A: in C, the entire mutex object can be on the stack, but such a mutex is useless. Imagine a lock on a door with a single key to be shared — but everyone else has their own door, so there’s no synchronization and no access control at all :(

What if I hadn’t worked this hard # kids

Over my 20Y career, I showed _some_ talent professionally.

In contrast, I showed significantly more talent in school. My massive study effort increased my abilities [1] to the extent that people don’t notice the difference between my talent and my acquired abilities. Even my IQ score improved, thanks to my intellectual curiosity and absorbency. If such sustained effort counts as a talent, then yes, I have the talent of diligence.

[1] eg — Chinese composition, English grammar/vocabulary, many knowledge-intensive subjects

Q1: what if I had put in just an average amount of effort in school and at work? Some people (mostly guys I don’t know well) seem to put in sporadic efforts that average out to “just average”, yet they still reached a professional level similar to mine, or higher.
A: For my academic excellence, persistent effort was necessary.

A: I guess sporadic effort could be enough to reach my level of professional “success” for very bright and lucky people. I doubt any professional programmer reached a level higher than mine without consistent effort over many years.

Professional success also depends on opportunity, on positioning and timing, on inter-personal skills and relationship skills, on judgment 判断力. None of these depends directly on consistent effort.

An example of positioning — my break into Wall St java market vs my peers who didn’t.

To analyze this question, I need to single out an under-appreciated talent — absorbency capacity, i.e. 耐得住寂寞 (the capacity to endure solitary, tedious work), where I score in the 97th percentile.

Q2: if my son only puts in sporadic and average efforts, could he end up losing the competitions?

make_shared syntax #Gotcha’s

Needed for take-home coding test.

#include <iostream>
#include <memory> // make_shared

struct C{
  float f;
  C(float arg): f(arg){}
};
int main() {
  std::shared_ptr<int> a(new int(13));
  std::shared_ptr<int> b = std::make_shared<int>(12);
  std::cout<<*b<<std::endl;

  std::shared_ptr<C> c = std::make_shared<C>(1.1); //must pass in ctor args
  //shared_ptr<C> b= make_shared<C>(new C(1.1)); //can't use "new" with make_shared.

  std::cout<<c->f<<std::endl;
}

defining+using your swap() #gotcha

Gotcha! If you define your own swap(), it may not get picked up, depending on subtle overload-resolution factors. In the demo below, when the arguments are “int” variables (lvalues), the hidden std::swap() is an exact match and wins over your custom swap(size_t, size_t); but when the arguments are integer literals (rvalues), they cannot bind to std::swap’s non-const reference parameters, so your custom swap() wins!

Note there’s no specific #include required for std::swap to become visible — it gets pulled in indirectly by other standard headers.

  • Solution: if practical, avoid “using namespace std”, even in cpp files
  • Solution: enclose everything except the outermost main() in an anonymous namespace, to force the unqualified swap() to resolve to your custom version.
#include <iostream>
#include <assert.h>

//namespace{

using namespace std;

int value1 = 5;
int calls=0;
template <typename T> void swap(size_t a, size_t b){ //T is unused; it only enables the swap<int>(..) call syntax
        ++calls;
        std::cout<<"in my swap()"<<std::endl;
}

int mymain(){
  int a=0, b=1;
  int oldCount = calls;
  swap<int>(a,b); //int arguments won't invoke my swap()
  assert (calls == oldCount);

  std::cout<<"after 1st call to swap()"<<std::endl;
  swap<int>(0,1); //calls my swap()
  assert (calls == 1+oldCount);

  std::swap<int>(a,b);  //compiles even without any specific #include
  return 0;
}

//}

int main(){
  mymain();
}

##elegant/legit simplifications ] cod`IV

  • eg: reverse linked list in K-groups — (CS algo challenge) assume there’s no stub, solve the real problem, then deal with the stub
  • eg: consumer thread dequeue() method. When empty, it “should” be waiting for a notification, according to Jun of Wells Fargo, but a simpler design returns a special value to indicate empty. The thread can then do other things or return to the thread pool or just die. I think the wait model is not ideal. It can waste a thread resource. We could end up with lots of waiting threads.
  • Eg: https://bintanvictor.wordpress.com/2017/09/10/lru-cache-concise-c-implementation/ requirement is daunting, so it’s important and completely legitimate to simplify lookup() so it doesn’t insert any data. API is simpler, not incomplete
  • Eg: find every combination adding up to a given target #Ashish permutation within each combination is a separate issue and not the main issue
  • recursive solution is often a quicker route to working code. Once we cross that milestone, we could optimize away the recursion.
  • eg: extract isPrime() as an unimplemented function and simply assume it is easy to implement when I get around to it.
  • Eg: show free slots between meetings #bbg I solved a similar and more familiar problem.
  • eg: violation check for Sudoku is a tedious but simple utility function. We could assume it’s available
  • eg: violation check for n-queens is likewise slightly harder

 

c++ salaries: high-end^regular

I spoke with 5 to 10 fellow c++ developers in the financial domain across a few countries. All of them seem to suggest that c++ skills command higher market value than other languages such as java, c# or python. One common reason they all gave is the high-frequency trading jobs. My own observation also shows that most HFT tech jobs use c++, and very few use java or c#.

However, how many of us can ever pass the HFT interviews?

I have interviewed for a few high-frequency or medium-frequency c++ algo-trading jobs. I didn’t pass any. Among my friends, only one person passed. He passed more than one such interview and he’s clearly very strong.

So I feel the reality is, most of us won’t be able to get those jobs, or keep the jobs.

Q: what percentage of the c++ jobs in finance are HFT?

I would guess 5 to 15%. So the vast majority of c++ salaries are ordinary. I think many of us can get those jobs:)

What’s your opinion?

detect cycle in slist #Part 1

Q: A singly-linked list (slist) contains a loop. You are given nothing but the head node. With O(1) space complexity, how do you locate the join node? For example,

0(head)->1->2->…101->102->103->4(again), so #4 is the merge point

Here’s Deepak’s ingenious trick

  1. first use the 2-pointer trick to find any node inside the loop.
  2. find the length (say, 55, denoted LL) of the loop using a single moving pointer, starting from that node
  3. now we can discard that node
  4. Now start a pointer B from head and move it LL steps.
  5. Now put pointer A at head
  6. Now move A and B in locksteps. They will meet for sure, at the merge point. Here’s the proof:

Suppose the merge point is MM steps from head. When A and B start moving in lockstep,

  • after MM steps, A reaches MM for the first time
  • after the same MM steps, B has travelled MM+LL steps from head, i.e. exactly one full lap past MM — so B is at MM too.

Note LL value could be small like 1.

#include <iostream>
#include <cassert>
using namespace std;

struct Node{
  Node const * p;
  int const data;
  Node(int _data, Node const * _p): p(_p), data(_data){}
  friend ostream & operator<<(ostream &os, Node const & node){
        os<<node.data<<" / "<<&node<<" -> "<<node.p<<endl;
        return os;
  }
};

Node _9(9, NULL);
Node _8(8, &_9);
Node _7(7, &_8);
Node _6(6, &_7);
Node _5(5, &_6);
Node _4(4, &_5);
Node _3(3, &_4);
Node _2(2, &_3);
Node _1(1, &_2);
Node _0(0, &_1);
Node & root = _0;
Node const * mergePoint = &_1;

//how many distinct nodes in the loop
size_t getLoopLen(Node const & root){
  Node const * brunner = &root;
  Node const * frunner = &root;
  for(;;){
        frunner = frunner->p->p;
        brunner = brunner->p;
        if (frunner == brunner) break;
  }
  cout<<"now the two runners have met somewhere in the loop: "<<*frunner ;
  for(int ret = 1; ;++ret){
        frunner = frunner->p ;
        if (frunner == brunner) return ret;
  }
}

Node const * getMergePoint(Node const & root){
  size_t LL = getLoopLen(root);
  cout<<"# of nodes in loop = "<<LL<<endl;
  Node const * frunner = &root;
  for(int i = 0; i<LL; ++i){ frunner = frunner->p; }
  //cout<<"front runer is "<<*frunner;
  Node const * brunner = &root;
  for(;frunner != brunner;){
        brunner = brunner->p;
        frunner = frunner->p;
  }
  return frunner;
}

int main(){
  _9.p = mergePoint;
  Node const * ret = getMergePoint(root);
  cout<<"Merge point is "<<*ret;
  assert(ret == mergePoint);
}

declare variables ] loop header: c^j #IF/for/while

Small trick to show off in your coding test…

Background — In short code snippet, I want to minimize variable declarations. The loop control variable declaration is something I always want to avoid.

https://stackoverflow.com/questions/38766891/is-it-possible-to-declare-a-variable-within-a-java-while-conditional shows java WHILE-loop header allows assignment:

List<Object> processables;
while ((processables = retrieveProcessableItems(..)).size() > 0) { /*...*/ }

But only the c++ WHILE-loop header (I’m 99% sure) allows a variable declaration.

The solution — both java/c++ FOR-loop headers allow variable declaration. Note the condition is checked Before first iteration, in both for/while loops.

update — c++ also allows a variable declaration in the IF-statement condition (this predates c++0x), designed to limit the variable’s scope.

if (int a=string().size()+3) cout<<a << " = a \n"; // shows 3
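
For completeness, a minimal c++ sketch of declarations in WHILE and FOR headers (the names and values are made up):

#include <iostream>
#include <string>
using namespace std;

int main(){
  string s = "abc";
  size_t i = 0;
  while (char c = (i < s.size() ? s[i] : '\0')){ //declaration in while-header: legal in c++, not java
    cout<<c;
    ++i;
  }
  cout<<endl;
  for (int k = 0; k < 3; ++k) cout<<k; //declaration in for-header: legal in both java and c++
  cout<<endl;
}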

symlink/hardlink: Win7 or later

https://www.howtogeek.com/howto/16226/complete-guide-to-symbolic-links-symlinks-on-windows-or-linux/ is a 2017 article.

The mklink command can create both hard links (known as “hard links” in Windows) and soft links (known as “symbolic links” in Windows).

On Windows XP, I used “Junction.exe” for years, because mklink was not available.
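
A minimal sketch of the mklink variants (run in an elevated cmd prompt; the paths are hypothetical):

  • file symbolic link — mklink C:\linkToFile.txt C:\real\file.txt
  • directory symbolic link — mklink /D C:\linkToDir C:\real\dir
  • hard link (same volume only) — mklink /H C:\hardLink.txt C:\real\file.txt
  • junction (what Junction.exe creates) — mklink /J C:\junction C:\real\dir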

ibank dev projects/timelines #XR

I have heard several people complaining about the unhealthy work culture in NY investment bank IT teams.

  • Timeline is artificially tight.
  • A typical senior manager is motivated by trophy systems that win him promotions + bonuses, rather than by really useful systems making a real impact on the business
  • Lots of systems are developed but not used, or used only for superficial purposes. (This is common in many systems. Only 1% of the features are actually put to productive use.)
  • Low quality code; low quality output data; poor quality control
  • Constant and widespread pressure to retire old systems and rewrite, even when the old system is still usable and sometimes rather stable.
  • Senseless requirements, questionable value, thrown-away systems

I heard that West coast shops, Chicago buy-side firms and other companies don’t have exactly the same problems. I feel my current company is also different from NY banks — older technology; slightly higher quality standards; more stable codebase; users are mostly external paying customers…

I guess the Wall St culture is “good” for some, while other developers hate it and exit. I’m planning to stick to finance but not banks.

share buy-back #basics

  • shares outstanding — reduced, since the repurchased shares (say 100M out of 500M total outstanding) are no longer available for trading.
  • Who pays cash to whom? The company pays existing public shareholders (buying on the open market), so the company pays out hard cash, reducing its cash position.
  • EPS — benefits, often leading to immediate price appreciation (see the quick arithmetic after this list)
  • Total assets — reduced, improving ROA/ROE
  • Demonstrates a comfortable cash position
  • Initiated by — management, when they think the stock is undervalued
  • Perhaps requested by — existing shareholders hoping to make a profit
  • company has excess capital and
  • A.k.a “share repurchase”
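
Quick arithmetic on the EPS point, using the 100M-out-of-500M example above and a hypothetical net income of $1B: EPS rises from $1B / 500M = $2.00 to $1B / 400M = $2.50, even though the business itself is unchanged.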

discover cpu count ] my machine #lscpu

NIC — There’s another blog post https://bintanvictor.wordpress.com/2018/03/19/discover-nic-my-machine-lspci/

$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 1
Core(s) per socket: 8
CPU socket(s): 2
NUMA node(s): 2
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 20480K


$ less /proc/cpuinfo|egrep 'core id|processor'  # on the same machine shows

16 “processors”, but #0 and #1 have the same core-id! There are 8 distinct core-id values in total. I think this is because each of the two sockets has 8 cores whose core-id values are unique within that socket (the same 8 values repeat in the other socket).
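
A few more one-liners to cross-check the counts (field names assume a typical x86 Linux /proc/cpuinfo):

$ grep -c '^processor' /proc/cpuinfo                    # logical processors (16 here)
$ grep '^physical id' /proc/cpuinfo | sort -u | wc -l   # sockets (2 here)
$ lscpu | egrep 'Socket|Core|Thread'                    # sockets x cores-per-socket x threads-per-core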

dmesg | grep rocessor # doesn’t work any more

dxdiag # windows

dxdiag /t c:\deldel.txt
find “CPU” c:\deldel.txt # copy-paste may not work

check if a word can reshuffle to palindrome

requirement: https://codecon.bloomberg.com/challenger-series/29. With minimal time and space complexity, the corner cases are tricky. I decided to add a null terminator:)

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

int main() {
  string S;
  cin>>S;
  vector<char> v(S.begin(), S.end());
  sort(v.begin(), v.end());
  v.push_back(0); //null char
  char last=v[0];
  size_t occur=0;
  bool isOddFound = false;

  for(int i=0; i<v.size(); ++i) {
    bool isNew = v[i] != last;
    //cout<<"--> "<<v[i]<<" isNew = "<<isNew<<endl;
    if (!isNew){
        ++occur;
        //cout<<last<<" occured "<<occur<<" times"<<endl;
        continue;
    }
    //cout<<last<<" occured total "<<occur<<" times"<<endl;
    if (occur % 2){
      if(isOddFound){
        cout<<"no"<<endl;
        return 0 ;
      }
      isOddFound = true;
    }
    last = v[i];
    occur = 1;
  }
  cout<<"yes"<<endl;
}

coin problem #all large-enough amounts are decomposable

This is not really a algo IV question, but more like brain teaser problem.

Based on https://en.wikipedia.org/wiki/Coin_problem — For example, the largest amount that cannot be obtained using only coins of 3 and 5 units is 7 units. The solution to this problem for a given set of coin denominations is called the Frobenius number of the set. The Frobenius number exists as long as the coin denominations have no common divisor greater than 1.

Note if a common divisor exists as in {2,4} then all the odd amounts will be non-decomposable.

Q: why a very large amount is always decomposable ? Give an intuitive explanation for 2 coin values like 3 and 5.

Here’s an incomplete answer — 15 (=3*5), 16, 17 are all decomposable. Any larger number can be solved by adding 3’s .

In fact, it was proven that any amount greater than (not equal to) [xy-x-y] is decomposable. So if we are given 2 coin values (like 4,5, where x is the smaller value) we can easily figure out that every amount in the range

xy-x-y+1  to xy-y

is decomposable. Note this range has x distinct values, so any higher amount is easily handled by adding x’s.

Also note xy-y is obviously decomposable as (x-1)y.
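
A minimal brute-force sketch (my own check, not from the wiki article) confirming the Frobenius number of {3,5}; the search limit of 40 is arbitrary:

#include <iostream>
#include <vector>
using namespace std;
int main(){
  int const x=3, y=5, limit=40;
  vector<bool> ok(limit+1, false);
  ok[0] = true; //zero coins
  for(int amt=1; amt<=limit; ++amt)
    ok[amt] = (amt>=x && ok[amt-x]) || (amt>=y && ok[amt-y]);
  int largestBad = -1;
  for(int amt=1; amt<=limit; ++amt) if (!ok[amt]) largestBad = amt;
  cout<<"largest non-decomposable amount = "<<largestBad<<endl; // prints 7 == 3*5-3-5
}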

 

##G5 std::map tasks 4cod`IV

custom hash func? See short example on P 364 [[c++standard library]]. [[optimized c++]] has many examples too.

initialize? There’s a simple constructor taking a (possibly long) initializer list, but the insert() methods support the same and are more versatile.

insert? single pair; range (anotherMap.begin(), end());

** insert single value — won’t overwrite pre-existing
** map1.emplace(…)
** map1[key2] = val3 // overwrites pre-existing
** insert list of values — http://en.cppreference.com/w/cpp/container/unordered_map/insert

(returning the value) lookup? at() better than operator[]

a pointer type as key? useful technique.

erase? by a specific key. No need to call another function to really erase the node.

Q: create only if absent; no update please
A: insert()

Q2: create or update
Q2b: look up or create
A: operator []

Q1: update only; no create please
Q1b: look up only. No create please
A: find() method

Q: check for existence
A: c.find() is slightly better than c.count() esp. for multi_* containers
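
A minimal sketch (std::map<string,int>, hypothetical keys) exercising the operations above:

#include <iostream>
#include <map>
#include <string>
using namespace std;
int main(){
  map<string,int> m;
  m.insert(make_pair("a", 1)); //create only if absent; never overwrites
  m.emplace("b", 2);           //same no-overwrite semantics, constructs in place
  m["a"] = 11;                 //create-or-update; overwrites the earlier 1
  cout<<m.at("a")<<endl;       //lookup returning the value; throws if the key is absent
  if (m.find("c") == m.end()) cout<<"c not found"<<endl; //lookup only, never creates
  m.erase("b");                //erase by key; nothing else to call
}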

 

## G20 operations(4cod`IV)on containers+str

“Operation” means logical operations any programmer (java, Perl, javascript, whatever) frequently performs on a common data structure. STL offers about 30 to 60 everyday operations. A beginner can benefit from a good short list.

List below leans towards sequential containers (shows up in 80% coding questions), but also includes essential operations on associative containers.

  1. ) print — using copy and ostream_iterator. See post. See stl-tut
  2. ) find, find_all
  3. slicing i.e. subsequence
  4. [iv] sort — using custom “myLess” as a type param. See post on functor-type. See effSTL
  5. [iv] construct a sorted container using a custom “myLess” as type param. See blog
  6. filter
  7. [iv] nested container
  8. truncate at a point
  9. append, in bulk
  10. pop front
  11. binary search in pre-sorted
  12. [iv] deep copy
  13. mass-populate with one default value
  14. remove/erase — see effSTL. Plug back_inserter into copy, remove_copy, replace_copy …
  15. [iv] plug bind2nd into replace_if/replace_copy_if, count_if, remove_if/remove_copy_if …. See blog
  16. initialize with hardcoded literals — relatively easy
  17. conversion like between string and vector of char. See blog. See [[ stl tut ]]
  18. —-2nd tier
  19. insert mid-stream, in bulk
  20. [iv] = possibly picked by interviewers
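
A minimal sketch (values made up) covering items 1, 4 and 14 above — print via ostream_iterator, sort with a custom comparison functor, copy through back_inserter:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>
using namespace std;

struct MyGreater{ bool operator()(int a, int b) const { return a > b; } }; //descending order

int main(){
  vector<int> v{3, 1, 4, 1, 5};
  sort(v.begin(), v.end(), MyGreater());                        //item 4: custom comparison functor
  vector<int> v2;
  copy(v.begin(), v.end(), back_inserter(v2));                  //item 14: back_inserter
  copy(v2.begin(), v2.end(), ostream_iterator<int>(cout, " ")); //item 1: print
  cout<<endl;
}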

q[-c] flag in g++

$ cat main.cpp:

void aa();
int main(){aa();}

A g++ command without “-c” would try to create an executable, but aa() is declared, not defined, so the link step fails.

You can either use “-c” or link in a definition:

g++ lib1.o main.cpp # to link in the aa() definition from lib1.o
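
A minimal sketch of the “-c” route (lib1.cpp is a hypothetical file containing the definition void aa(){}):

$ g++ -c main.cpp          # compile only; produces main.o, the linker never runs
$ g++ -c lib1.cpp          # produces lib1.o with the aa() definition
$ g++ lib1.o main.o        # final link step; produces a.out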

## a few devops technical tips : %%xp

  • continuous bamboo build triggered by commits
  • automated test as part of continuous build
  • automated packaging (jar, c++ lib, python installable distro…) and upload into central repository (such as Maven/Nexus)
  • automated build stream (with history) created in continuous build engine, when a branch is created in git
  • require each commit to start with a jira id
  • generate release notes from commit messages

##9dataStruct(+..)for c++speed cod`

  1. linked node manipulation in a graph context
  2. vector (more useful than array), std::string (more useful than cStr). Many string operations are still unfamiliar
    1. Array-based data structures are required in 80% of my coding tests.
    2. More than 50% of all my coding tests require nothing but arrays.
    3. Most of my toughest coding tests are presented in arrays but may need maps as helpers
  3. string in std::string or c-str
  4. iterator manipulation like erase, lower_bound, find,
  5. sorting, operator<(), upper_bound, binary search … on containers
  6. sorted data structure like std::map
  7. [w] stringstream — ECT to improve

Very few Wall St interviewers would test you on graph or DP. Here are other less important constructs:

  1. [w] binary tree is common and simple, but they can ask very tough questions on it!
  2. [w] double pointer is trickier and my advantage
  3. [w] Node class in a linked data structure.
  4. [w] stack, queue.
  5. [w] grid or matrix
  6. file I/O? only for IDE tests, not pure algo or take-home tests
  7. [w] basic syntax for pointer arithmetic.
  8. [w] dtor, copier, op=? only for design questions, not algo questions.
  9. [w] shared_ptr? Only for design questions, never needed in algo questions!
  10. [w] ref variable only as function I/O.
  11. stl algo? Only Citadel array-shrink
  12. never exception
  13. never template
  14. no (very, very seldom) threading in coding Q
  15. adv: pointer to function
  16. adv: circular buffer
  17. [w = no weakness]

 

milePerGallon #DeepakCM SIG

Similar to GS Tick Query question but without optimization requirements.

–Requirement: Your Choice of Editor, OS, Compiler & Programming Language. Total time allotted for this test is 75 mins. The further rounds of Interview will be based on this programming test.

A group of N people go driving. Whenever they refill the fuel in their vehicles, they update a text file with the following fields

<Person>, <Car>, <No of Miles driven on that day, before the fill>, <NO of Gallons filled i.e. gallons used for those miles>,<Date of fill>

All the fields are comma separated. A sample such file is given below

John, Ford, 350, 20, 20160921
John, Ford, 220, 13, 20160920
John, Ford, 230, 35, 20160918
John, Ford, 300, 22.5, 20161112
Jonathan, Mercedes GLA, 220, 13, 20160920
Mishal, Ford Escort, 230, 35, 20160919

Write a function with the following parameters
GetMPG(name_of_person, Start_date, End_date)
such that it should return the following information for each User/driver
1) Name of Car
2) Miles per Gallon for the mentioned period for each of the Car

A person can have more than one car but never shares his cars. If there is no record for the mentioned date range, the function should not return anything for that specific car.

–analysis

https://github.com/tiger40490/repo1/blob/py1/py/milePerGallon.py is my python solution.

lockfree usage in high speed trading system@@

Hi Deepak,

Further to our conversation, do high frequency shops use lock-free? I would guess that the most demanding high-speed data processing systems (such as the matching engine in an exchange) are single-threaded, rather than lock-free multithreaded.

I hope to hear a typical high-speed data processing system that has lots of shared mutable data. I have not found one.

  • Order matching
  • Airline booking
  • Vote counting in real time?

If 99% of the data is not shared mutable, then single-threaded design is probably better.

  • I feel one reason is complexity vs simplicity. Lock-free designs can be very tricky, according to experts.
  • Another reason is performance. Lock-free is, in my view, definitely slower than single-threaded. The memory fences required on those atomic objects impede compiler optimizations.
  • Most important reason, in my view — it’s always possible to achieve the same functionality without using locks or lock-free designs. Single-threaded designs are possible, simpler and faster.

If we have 64 cores, just run 64 processes, each single-threaded. However, in reality these systems often do have some shared mutable data between two threads. There are various solutions I’m not qualified to compare and illustrate. These solutions could use lock-free or locks.

volume alone doesn’t qualify a system as big-data

The Oracle nosql book has these four “V”s to qualify any system as big data system. I added my annotations:

  1. Volume
  2. Velocity
  3. Variety of data format — If any two data formats account for more than 99.9% of your data in your system, then it doesn’t meet this definition. For example, FIX is one format.
  4. Variability in value — Does the system treat each datum equally?

Most of the so-called big data systems I have seen don’t have these four V’s. All of them have some volume but none has the Variety or the Variability.

I would venture to say that

  • 1% of the big-data systems today have all four V’s
  • 50%+ of the big-data systems have no Variety no Variability
    • 90% of financial big-data systems are probably in this category
  • 10% of the big-data systems have 3 of the 4 V’s

My friend JunLi said most of the data stores he has seen are strictly structured data, and cited credit bureau report as an example.

The reason that these systems are considered “big data” is the big-data technologies applied. You may call it “big data technologies applied on traditional data”

See #top 5 big-data technologies

Does my exchange market data qualify? Definitely high volume and velocity, but no Variety or Variability. So not big-data.

data science^big data Tech

The value-add of big-data (as an industry or skillset) == tools + models + data

  1. If we look at 100 big-data projects in practice, each one has all 3 elements, but 90-99% of them would have limited value-add, mostly due to .. model — exploratory research
    1. data mining probably uses similar models IMHO but we know its value-add is not so impressive
  2. tools —- are mostly software but also include cloud.
  3. models —- are the essence of the tools. Tools are invented, designed mostly for models. Models are often theoretical. Some statistical tools are tightly coupled with the models…

Fundamentally, the relationship between tools and models is similar to Quant library technology vs quant research.

  • Big data technologies (acquisition, parsing, cleansing, indexing, tagging, classifying..) is not exploratory. It’s more similar to database technology than scientific research.
  • Data science is an experimental/exploratory discovery task, like other scientific research. I feel it’s somewhat academic and theoretical. As a result, salary is not comparable to commercial sectors. My friend Jingsong worked with data scientists in Nokia/Microsoft.

The biggest improvement in recent years is in … tools

The biggest “growth” over the last 20 years is in data. I feel user-generated data is dwarfed by machine-generated data.

%%c++keep crash` I keep grow`as hacker #zbs#AshS

Note these are fairly portable zbs, more than local GTD know-how !

My current c++ project has high data volume, some business logic, some socket programming challenges, … and frequent crashes.

The truly enriching part are the crashes. Three months ago I was afraid of c++, largely because I was afraid of any crash.

Going back to 2015, I was also afraid of c++ build errors in VisualStudio and Makefiles, esp. those related to linkers and the One-Definition-Rule, but I overcame most of that fear in 2015-2016. In contrast, crashes are harder to fix because 70% of the crashes come with no usable clue. If there’s a core file I may not be able to locate it. If I locate it, it may not have symbols. If it has symbols, the crash site is usually in some class unrelated to any class I wrote. I have since learned many lessons on how to handle these crashes:

  • I have a mental list like “10 common crash patterns” in my log
  • I have learned to focus on the 20% of my codebase that are most convoluted, most important, most tricky and contribute most to debugging difficulties. I then invest my time strategically to rewrite (parts of) that 20% and dramatically simplify them. I managed to get familiar and confident with that 20%.
    • If the code belongs to someone else including 3rd party, I try to rewrite it locally for my dev
  • I have learned to pick the most useful things to log, so they show a *pattern*. The crashes usually deviate from the patterns and are now easier to spot.
  • I have developed my binary data dumper to show me the raw market data received, which often “cause” crashes.
  • I have learned to use more assertions and a hell lot of other validations to confirm my program is not in some unexpected *state*. I might even overdo this and /leave no stone unturned/.
  • I figured out memset(), memcpy(), raw arrays are the most crash-prone constructs so I try to avoid them or at least build assertions around them.
  • I also figured out that signed integers can become negative and don’t make sense in my case, so I now use unsigned int exclusively. In hindsight I’m not sure this is best practice, but it removed some surprises and confusions.
  • I also gained quite a bit of debugger (gdb) hands-on experience

Most of these lessons I picked up in debugging program crashes, so these crashes are the most enriching experience. I believe other c++ programs (including my previous jobs) don’t crash so often. I used to (and still do) curse the fragile framework I’m using, but now I also recognize these crashes are accelerating my growth as a c++ developer.

data mining^big-data

Data mining has been around for 20 years (before 1995). The most visible and /compelling/ value-add in big-data always involves some form of data mining, often using AI including machine-learning.

Data mining is The valuable thing that customers pay for, whereas Big-data technologies enhance the infrastructure supporting the mining

https://www.quora.com/What-is-the-difference-between-the-concepts-of-Data-Mining-and-Big-Data has a /critical/ and concise comment. I modified it slightly for emphasis.

Data mining involves finding patterns in datasets. Big data involves large-scale storage and processing of datasets. So combining both, data mining done on big data (e.g. finding buying patterns in large purchase logs) is getting a lot of attention currently.

NOT all big data tasks are data mining ones (e.g. large-scale indexing).

NOT all data mining tasks are on big data (e.g. data mining on a small file, which can be performed on a single node). However, note that wikipedia (as of 10 Sept. 2012) defines data mining as “the process that attempts to discover patterns in large data sets”.

(Revisit) senior manager IV: what they watch out for

Common, non-trivial, perhaps hidden issues in a candidate, ranked:

  • twist and turn and not /candid/ about his past
  • not listening
    • jumping to conclusions
    • assuming too much
    • not waiting for a long-winded interviewer to finish, perhaps interrupting
    • jumping to the defensive, when deliberately provoked
  • desperate for the job and losing his perspective
  • opinionated
  • choosy on projects
  • conceited beyond “cool confidence”
  • —–secondary
  • impulsive answers ^ elusive, guarded answers
    • elusive answers? common
  • bad-mouth past managers
  • lack of humor? common
  • tensed up and not cool and calm, perhaps prone to workplace stress
  • exaggeration about his past? common
  • not articulate enough? common
  • poor eye contact? common

next_perm@N color socks #complex #100%

A common IDE coding challenge — given x pairs of socks of various colors, generate all the permutations, in ascending order. Each color has a value.

–Solution 1: std::next_permutation() and prev_permutation()

–solution 2: I can probably write my own next_perm(). Using this function we can generate an ascending sequence of permutations starting from the current content of a vector.

https://github.com/tiger40490/repo1/blob/cpp1/cpp/combo_permu/nextPerm.cpp is my iterative solution, but should use lower_bound()

https://github.com/tiger40490/repo1/blob/py1/py/combo_perm/nextPermOfNSocks.py is my python solution, overcoming the lower_bound problem.
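
A minimal sketch of Solution 1 (three colors, values made up). std::next_permutation already skips duplicate arrangements, so repeated sock values are handled for free:

#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;
int main(){
  vector<int> socks{1,1,2,2,3,3};   //x=3 pairs; each value is a color
  sort(socks.begin(), socks.end()); //must start from the lowest permutation
  do {
    for (int c: socks) cout<<c<<' ';
    cout<<endl;
  } while (next_permutation(socks.begin(), socks.end())); //returns false once we wrap around
}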

relating my perm/comb algos

–next_perm: I have an iterative solution

–next perm 5C3: iterative algo, growing global collection

–next_combo 5C3: I have a (approx) 10-line recursive algo. (Iterative algo is non-ideal). Passing in two collections.

–next_abbreviation of any length: I have a (approx) 10-line recursive algo. Using global collection, this algo is simpler than next_combo

–So far, these algos are decent but have nothing in common.

 

next abbreviation@word with repeat char #2^N #100%

This is a real coding question at 10gen/mongoDB

https://github.com/tiger40490/repo1/blob/py1/py/combo_perm/abbrIterative.py is a simple iterative py solution. Not ascending.

Q1: Given a word with unique letters (like “abc”), can you write a c++ program to generate all abbreviations, in any order?
Q2: same question, but don’t use any external storage beside character arrays.
Q3: same question, but output in ascending order? Expected output:

  1. a
  2. ab
  3. abc
  4. ac
  5. b
  6. bc
  7. c

It’s daunting to generate this sequence without recursion. https://github.com/tiger40490/repo1/blob/cpp1/cpp1/combo_permu/abbr_ascendRecursive.cpp is my recursive solution to Q3.
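
For reference, a minimal recursive sketch for Q3 (my own, not the linked solution; assumes unique letters). Printing a prefix before recursing into its extensions is what keeps the output ascending:

#include <iostream>
#include <string>
using namespace std;

// print all abbreviations (non-empty subsequences) of w, in ascending order
void gen(const string & w, size_t start, const string & prefix){
  for (size_t i = start; i < w.size(); ++i){
    string cur = prefix + w[i];
    cout<<cur<<endl;    // "a" comes out before "ab", "ab" before "abc" ...
    gen(w, i + 1, cur); // extend only with letters to the right
  }
}
int main(){ gen("abc", 0, ""); }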

Q4: what if the word has non-unique letters, like “mongo”? The only solution I know relies on an external lookup device.

find anagram groups]file #1-pass, FB

Requirement: if any 2 words are anagrams, that’s a group. Within a file, identify all such groups.

Normalize each word using counting sort on the characters of the word. There’s no competing alternative in this isolated part of the solution. Main part of the solution is sorting all words based on the normalized version. There are many alternative solutions to this sort.

desirable — memory efficiency. create no or few objects besides the strings read in from the file.

desirable — avoid the need to create/allocate linkNode or wrapper objects around original words. In c++, such a wrapper is smaller than in java, probably a pointer to the original word + a link pointer

Solution 1 — insert into a sorted set, using a customized comparison (subclass binary_function) based on the normalized string and something unique, such as line number or object address. After all the inserts, when you iterate the set, an anagram group would show up in a row. We can also keep a separate array or iterators (pointers) that point to the first word in each group — emulating a multimap.

Without the something unique, the set would drop words.

I think insertion is inefficient since a normalized string is regenerated for the same word over and over. To overcome this, we could create a wrapper object with 2 fields — normalized string + a pointer to the original word. However, if “SAT” is a very common normalized string then we create multiple copies of it.

Solution 2 — separate chaining. Maintain a hashed map (faster) or sorted map keyed by the normalized strings, just a single copy for each normalized string. Value of each map entry is a linked list of the original words.
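
A minimal sketch of Solution 2 (hash map keyed by the normalized string; words hard-coded instead of read from a file):

#include <algorithm>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

int main(){
  vector<string> words{"listen", "silent", "enlist", "google", "banana"};
  unordered_map<string, vector<string> > groups; //normalized string -> original words
  for (string const & w: words){
    string key = w;
    sort(key.begin(), key.end()); //normalization; counting sort would also work
    groups[key].push_back(w);
  }
  for (auto const & kv: groups){
    if (kv.second.size() < 2) continue; //an anagram group needs 2+ words
    for (string const & w: kv.second) cout<<w<<' ';
    cout<<endl;
  }
}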

#include <iostream>
#include <iomanip>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <map>
#include <algorithm>
using namespace std;

map<vector<char>, size_t> m;
template<typename T> ostream & operator<<(ostream & out , vector<T> const & v){
        for(int i=0; i<v.size(); ++i) out<<setw(1)<<v[i];
        out<<" ";
        return out; //must return the stream
}
int main(){
  ifstream fs("words.txt");
  string line, word;
  while(getline(fs, line)){
     stringstream ss(line);
     while(getline(ss, word, ' ')){
        //trim
        int endpos = word.find_last_not_of(" ");
        if (endpos < string::npos) word = word.substr(0, endpos+1);
        if (word.empty()) continue;
        //lower case
        transform(word.begin(), word.end(), word.begin(), ::tolower);
        //sort
        vector<char> sorted(word.begin(), word.end()); sort(sorted.begin(), sorted.end());

        int count = ++m[sorted]; //1st time we get 1 🙂
        cout<<sorted<<"\t<- "<<word<<" :" <<count<<endl;
     }
  }
}

 

generate random line-up@Any combo@N boys #70%

A standard permutation/combination problem in some coding tests. You are often required to iterate all of them.

Given N cities, how many permutations of any combinations are there in total.

My iterative sum formula: answer(N)= N_choose_0 + N_choose_1 * 1! + N_choose_2 * 2! + … + N_choose_N * N!

N_choose_0 = 1 (the empty line-up)
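
A quick numeric sketch of the sum formula (my own check; the values for N=1..5 come out as 2, 5, 16, 65, 326):

#include <iostream>
using namespace std;
// answer(N) = sum over k=0..N of N_choose_k * k!
int main(){
  for (int N = 1; N <= 5; ++N){
    unsigned long long total = 0, nCk = 1, kFact = 1;
    for (int k = 0; k <= N; ++k){
      if (k > 0){ nCk = nCk * (N - k + 1) / k; kFact *= k; }
      total += nCk * kFact;
    }
    cout<<"answer("<<N<<") = "<<total<<endl;
  }
}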

–recursive algo: use answer(N-1) to answer(N)

–iterative algo:

  • Suppose we have an iterative_next_perm(list) function already written.
  • suppose we have an iterative_next_combo(N, 3) that generates all combinations of 3 out of N distinct chars.

Then, here’s a solution — call

iterative_next_combo(N,1)
iterative_next_combo(N,2)
iterative_next_combo(N,3)…
iterative_next_combo(N,N), in a loop (of N iterations).

Suppose one of the N calls generated 221 combinations. Run a loop (of 221 iterations), each time passing one combination into iterative_next_perm(combo). So our main program has only 2 nested loops. Most of the complexity is encapsulated in iterative_next_perm() and iterative_next_combo()

next Combo@3,using5distinct chars,permitting redraw

Modified from Leetcode problem 17. Suppose there is nothing but one red button, and there are L (like 5) letters on it.

Q: With 4 (=C) draws from 3 (=L) letters (same phone pad button), how many permutations? L^C.
Q: With 4 (=C) draws from 3 (=L) letters, how many combinations?

For C=1, Ans=3
For C=2, Ans=6
For C=3, Ans=10= 6 + 3 + 1?
For C=4, Ans=15=10+(10-6)+1
For C=5, Ans=21=15+(15-10)+1
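
(These counts match the multiset coefficient C(L+C−1, C): for L=3, C=4 that is C(6,4) = 15, and for C=5 it is C(7,5) = 21.)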

–My explanation of the count:

Key: Each combo is represented as a sorted vector (ascending). There’s one-to-one mapping between such a vector and a combo.

let’s say L=3 and C grows from 1 to 3 .. For C=2, I put all combos on 3 rows (3 vectors or lists)

  • aa ab ac #all that starts with a
  • bb bc #all that start with b
  • cc # lone wolf

For C=3, Ans

  • first container is populated by prepending ‘a’ to all 3 “old” containers of items
  • 2nd container is populated by prepending ‘b’ to 2nd-and-lower old containers
  • 3rd container is always a lone wolf

Both solutions below are simple, reusable and implemented in https://github.com/tiger40490/repo1/blob/cpp1/cpp/combo_permu/nextComboOf3redrawFrom5.cpp

–sol-1 efficient incremental solution — for C=3, given one combo as a string, find the “next” combo. For example after bbc, we should get bcc. Bonus feature — output is naturally sorted 🙂

–sol 2: append the next char ..

next split@N boys #51%

N boys are all unique “entities”, each identified by student id. Later we can look into “next split of N colored balls, having J unique colors”.

In a split, every boy belongs to exactly one group (minimum 1 member). We could rely on python’s default sort order, but for now let’s define a sort order between two groups AA and BB:

  1. sort by group size
  2. If same size, sort by the lowest student id in AA vs in BB.

Every “split” is recorded as a sorted list of groups. I will define a sort order between 2 splits:

  1. sort by list size
  2. if same size, sort by first group, then by 2nd group..

Q1: how many ways to split N boys?
%%A: First we line up N boys — there are N! line-ups. In each line-up, we have 2^(N-1) splits because there are N-1 slots to place a group-boundary. This is a way to iterate over ALL splits, but without uniqueness.

Q2: Write a program to generate all splits in some order. For example, with 3 boys h,i,j:

  1. h/i/j in a giant group…
  2. –3 split-of-2-groups: h, i/j
  3. i, h/j
  4. j, h/i…
  5. –1 split of 3 singular groups: h, i, j
  6. ==== total 5 distinct splits showing 10 group objects but not all distinct

With 3+1 boys h,i,j,k: First add k to each of 10 group objects:

  1. h/i/j/k
  2. h/k, i/j
  3. h, i/j/k
  4. i/k, h/j
  5. i, h/j/k
  6. j/k, h/i
  7. j, h/i/k
  8. h,i, j/k
  9. h,j, i/k
  10. i,j, h/… — so far, 10 splits. Now 5 more splits featuring k in its own singular group
  11. k, h/i/j
  12. h, k, i/j
  13. i, k, h/j
  14. j, k, h/i
  15. h,i,j,k

—- analysis: I don’t know how many splits there will be for N boys. An iterative function will probably work:

After I listed out exactly 5 splits of 3 boys, I added another boy. So here’s the algo I used — for each of the 5 (existing) splits, add this boy to each of its group objects (10 group objects in total), or put him in a new singular group of his own.
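
Side note — these counts (5 splits of 3 boys, 15 splits of 4) are the Bell numbers; a minimal Bell-triangle sketch generates them:

#include <iostream>
#include <vector>
using namespace std;
// Bell triangle: each row starts with the last entry of the previous row;
// every other entry = its left neighbour + the entry above that neighbour.
// The last entry of row n = number of splits of n boys.
int main(){
  vector<unsigned long long> row(1, 1); //row for n=1
  for (int n = 1; n <= 6; ++n){
    cout<<"splits of "<<n<<" boys = "<<row.back()<<endl; //1 2 5 15 52 203
    vector<unsigned long long> next(1, row.back());
    for (size_t i = 0; i < row.size(); ++i) next.push_back(next.back() + row[i]);
    row = next;
  }
}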

next Perm@3boys out@5, non-recursive #complex

Latest -- https://github.com/tiger40490/repo1/blob/cpp1/cpp1/combo_permu/iterative_nextPerm3boysOf5.cpp


// Requirement: generate all permutations of C(like 3) chars
//from N(like 5). The count should be N!/(N-C)!
// Bonus: generate in ascending order, where 'ascending' is
//defined on the basis that within the original word, a char
//has lower value than any char on its right. This is more clear
//when the word itself is a sorted string, but actually it's
//not needed.
//
//global collection is simpler than stack variables. Easier to visualize
//and uses less memory
#include <iostream>
#include <sstream>
#include <vector>
#include <deque>
#include <algorithm>
#include <iomanip>
#include <assert.h>
using namespace std;

string _tmp="abcd"; vector<char> const pool(_tmp.begin(), _tmp.end());
vector<size_t> poolIndex;
size_t const C = 3, N = pool.size(); //wanted: all permutations of C, out of the pool of items

//global collection, to be updated in each recursive layer.
vector<vector<size_t> > coll;
// Note each item (like a char or a color or a studentId) is
// represented in the global collection by an unsigned integer.
// This is the positional index in the original "pool" of items.
// This scheme ensures the permutations produced is ascending

string show(vector<size_t> const & v, string headline="", bool isPrint=true){
  stringstream ss;
  for (int i=0; i<v.size(); ++i){
    ss<<setw(5)<<pool[v[i]];
  }
  if (isPrint) cout<<ss.str()<<" <-- "<<headline<<v.size()<<endl;
  return ss.str();
}

void dump(string headline="", bool isAssert=true){
  size_t const cnt = coll.size();
  assert(cnt);
  size_t const len = coll[0].size();
  size_t exp = 1;
  for (int t=N; t>N-len; --t) exp *= t; //loop len times
  assert(cnt == exp);

  stringstream ss;
  string last = "";
  for(int i=0; i<cnt; ++i){
    string item=show(coll[i], "", false);
    ss<<item<<endl;
    if (!isAssert) continue;
    assert(last<item && "should be strictly asending");
    last = item;
  }
  cout<<headline<<"\t size "<<cnt<<endl<<ss.str()<<endl;
}

//RVO should kick in to skip the copying upon return
vector<size_t> const find_unused(vector<size_t> const & perm, size_t const len){
      vector<size_t> tmp4set_diff(perm); //permutations are by defnition unsorted.
      sort(tmp4set_diff.begin(), tmp4set_diff.end());
      vector<size_t> unused(N-len);
      set_difference(poolIndex.begin(), poolIndex.end(), tmp4set_diff.begin(),tmp4set_diff.end(),unused.begin());
      //show(tmp4set_diff, "tmp4set_diff"); show(poolIndex, "poolIndex"); show(unused, "unused");
      return unused;
}

//RVO should kick in to skip the copying upon return
vector<size_t> const new_perm(vector<size_t> const & perm, size_t unused){
        vector<size_t> newPerm(perm);
        newPerm.push_back(unused);
        //show(newPerm, "newPerm");
        return newPerm;
}
//This algo is considerably more complex than many recursive algos
//we have written recently, largely due to the set_difference()
void next_perm_from_pool_iterative(){
  for(size_t loop=0;loop<9 /*useful during dev to control when to exit*/;++loop){
    if (coll.empty()){
      for(size_t j=0; j<pool.size(); ++j){
        coll.push_back(vector<size_t>(1, j));
        poolIndex.push_back(j);
      }
      // dump("^^^^ after initilization of coll ^^^^");
    }
    assert(!coll.empty());
    size_t const len=coll[0].size();
    assert(loop+1 == len);
    if (len == C) {
      cout<<"Done:)"<<endl;
      return;
    }
    vector<vector<size_t> > newcoll;
    for(int kk=0; kk<coll.size(); ++kk){
      vector<size_t> const & perm = coll[kk];
      vector<size_t> unused(find_unused (perm, len));

      //now unused contains the pool items not in the current perm.
      //Suppose there are 3 unused items, we will create 3 new permutations
      //by appending each one to the current perm
      for(vector<size_t>::iterator it=unused.begin(); it != unused.end(); ++it){
        newcoll.push_back(new_perm(perm, *it));
      }
    }
    coll = newcoll;
    dump("^^^^ end of iteration ^^^^");
  }
}

int main(){
  next_perm_from_pool_iterative();
}

next_Perm@3boys out@5 #80%

algo-practice: generate permutations of 3, using 5 distinct chars

Such a skill might be needed in some coding IV sooner or later. Let’s write this in py or c++. Formally,

Q1: generate all permutations of 3, from 5 distinct chars, in any order.
Q2: generate all permutations of 3, from 5 distinct chars, in ascending order. You can sort the 5 chars first.
Q2b: once you have generated one permutation, how do you identify The next?

Note the same solution is a generalization of std::next_permutation(), so once you have this solution you have that solution.

How many? 5-choose-3 * 3! = 5!/(5-3)! = 5*4*3 = 60, denoted Answer(3). Paradoxically, Answer(4) == Answer(5) = 120

–algorithm 1 for Q1

  • 1st generate 5 single-char strings;
  • then for each generate 4 two-char strings. We get 20 strings.

–algorithm 1b for Q1: rec(5,3) will call rec(5,2). rec(5,2) has 20 items, each of which can generate 3 items for rec(5,3), because each item has 2 characters and a void to be filled by the 3 unused chars.

The 20 items returned should be a pair{vector of 2, vector of 3}

This produces a sorted collection:)

See my code  https://github.com/tiger40490/repo1/blob/py1/py/combo_perm/nextPerm%403boysFrom5.py

 

 

 

python^c++for coding drill

For some coding questions, c++ is not too bad and you get to practice and improve your c++ ECT:

  • array questions
  • number questions

However, I feel overall py is significantly simpler, faster for many tasks like

  • file parsing, without using getline, cin…
  • string split
  • converting numbers to/from strings
  • regex

next_Combo@3 boys out@5 # 100%

Given abcde, how many combinations of 3? 5-choose-3 = 10 by the standard formula, but let’s implement it in py or c++.

It’s useful to define a score for a combination — sort the combination into a string. The string is the score. Now,

Q1: generate all combinations@3 from 5 distinct chars, in ascending score.
Q2 (more importantly) given any combination of 3, generate the next higher combination of 3.  Output each combination as a sorted string to keep things clean. Extremely effective constraint and simplification.

1) https://github.com/tiger40490/repo1/blob/py1/py/combo_perm/nextComboOf3from6boys.py is a decent recursive solution.

2) fixed-length abbreviation generator also generates exactly the same  sequence!

3) Perhaps we can modify the iterative generate random perm@3boys out@5 algorithm:

Key: Scan from the end of the string to identify the position-to-upgrade (p2u). Compare the current (i.e. last) position with the unused characters. If it can’t be upgraded [eg abe], then move left, which is guaranteed to be a lower character. Now compare the new “me” with only the unused characters (in this algo we don’t allow swaps within the string). If it still can’t be upgraded, move left again.

If current position is upgradeable, then set it as p2u. Upgrade to the lowest unused character. Then reset all characters on my right and return.

  1. abc
  2. abd
  3. abe
  4. acd
  5. ace
  6. ade //last 2 letters are the max. p2u = 0
  7. bcd
  8. bce
  9. bde
  10. cde

 

next_Combo@3 boys out@5 #recursive descending

//This recursive version suffers from stack overflow
//but it's able to print out combinations in Descending order and
//maintains the relative positions between any 2 items
//
//This version reduces vector cloning by growing/shrinking the prefix vector
#include <iostream>
#include <sstream>
#include <vector>
#include <iomanip> //setw
#include <algorithm>  //sort
#include <assert.h>
using namespace std;
size_t calls=0, combos=0;
size_t const C=3; //how many in each combination
vector<char> emptyV;

template<typename T> void dump(vector<T> const & p, string const & s){
  cout<<"------------ "<<s<<" ------------ size = "<<p.size()<<endl;
  for(int i=0; i<p.size(); ++i) cout<<setw(5)<<p[i];
  if (p.size()) cout<<endl;
}
template<typename T> int show(vector<T> const * p, vector<T> const * v = NULL){
  ++ combos;
  stringstream ss;
  for(int i=0; i<p->size(); ++i) ss<<setw(5)<<(*p)[i];
  if (v){
    //cout<<",";
    for(int i=0; i<v->size(); ++i) ss<<setw(5)<<(*v)[i];
  }
  static string last("ZZZZZ");
  string combo=ss.str();
  cout<<"combo: "<<combo<<endl; assert(last >= combo);
  last = combo;
  return 0;
}

template<typename T> int recurs(vector<T> & prefix, vector<T> const & pool){// size_t const suffix ){
  ++calls;
  dump(prefix, "entering ... prefix"); dump(pool, "pool");
  if (prefix.size()            == C) return show(&prefix);
  if (prefix.size()+pool.size()== C) return show(&prefix, &pool);
  assert (!pool.empty());
  vector<T> const newPool(pool.begin()+1, pool.end());
  recurs(prefix, newPool);
  prefix.push_back(pool[0]);
  recurs(prefix, newPool);//this function returns only after all the layers are done
  prefix.pop_back();
  return 0;
}

int main() {
  string tmp = "abcde";
  vector<char> v(tmp.begin(), tmp.end());
  assert(C <= v.size());
  recurs(emptyV, v);
  cout<<calls<<"  calls to the recursive funcion to generate "<<combos<<endl;
}

(latency) DataGrid^^noSQL (throughput)

  • Coherence/Gemfire/gigaspace are traditional data grids, probably distributed hashmaps.
  • One of the four categories of noSQL systems is also a distributed key/value hashmaps, such as redis
  • …. so what’s the diff?

https://blog.octo.com/en/data-grid-or-nosql-same-same-but-different/ has an insightful answer — DataGrids were designed for latency; noSQL were designed for throughput.

I can see the same trade-off —

  • FaceBook’s main challenge/priority is fanout (throughput)
  • RTS main challenge is TPS measured in messages per second throughput
  • HFT main challenge is nanosec latency.

For a busy exchange, latency and throughput are both important but if they must pick one? .. throughput. I think most systems designers would lean towards throughput if data volume becomes unbearable.

I guess a system optimized for latency may be unable to cope with such volume. The request queue would probably overflow … loss of data.

In contrast, a throughput-oriented design would still stay alive under unusual data volume. I feel such designs are likely more HA and more fault-tolerant.

next_Combo@3 boys out@5 #recursive

//This recursive version suffers from stack overflow but
//it's able to print out combinations in ascending order and
//maintains the relative positions between any 2 items
//
//This version reduces vector cloning by growing/shrinking
//global objects but with higher complexity
//
//However, global variables actually simplify the logic!
#include <iostream>
#include <sstream>
#include <deque>
#include <iomanip> //setw
#include <algorithm>  //sort
#include <assert.h>
//#define DEBUG
using namespace std;
size_t calls=0, combos=0;
size_t const C=3; //how many in each combination
string tmp = "abcde";
deque<char> pool(tmp.begin(), tmp.end());
deque<char> prefix;

template<typename T> void dumpDeque(deque<T> const & p, string const & headline){
  cout<<"-- "<<headline<<" -- size = "<<p.size()<<endl;
  for(int i=0; i<p.size(); ++i) cout<<setw(5)<<p[i];
  cout<<endl;
}
template<typename T> int showCombo(deque<T> const * p){
  ++ combos;
  stringstream ss;
  for(int i=0; i<p->size(); ++i) ss<<setw(5)<<(*p)[i];
  static string last;
  string combo=ss.str();
  cout<<"combo: "<<combo<<endl;
  assert(last <= combo && "should be ascending");
  last = combo;
  return 0;
}

template<typename T> int recurs(){
  ++calls;
#ifdef DEBUG
  cout<<"-------------\nentering "; dumpDeque(prefix, "prefix"); dumpDeque(pool, "pool");
#endif
  if (prefix.size() == C) return showCombo(&prefix);
  if (pool.empty()) return 0;
  T poolhead = pool.front(); pool.pop_front();

  prefix.push_back(poolhead); //add poolhead to prefix

  //this 1st recursive function call starts a rather deep call stack and prints
  //all combinations with the given (new longer) prefix
  recurs<T>();//use the longer prefix and the shorter pool
  prefix.pop_back();//restore prefix
  recurs<T>();
  pool.push_front(poolhead); //restore pool, needed by the 2nd call in the parent stack
#ifdef DEBUG
  cout<<"^^^^^^ restored before returning "; dumpDeque(prefix, "prefix"); dumpDeque(pool, "pool");
#endif
  return 0;
}

int main() {
  assert(C <= pool.size());
  recurs<char>();
  cout<<calls<<"  calls to the recursive function to generate "<<combos<<endl;
}

next_Combo@3 boys out@5 #iterative complex

//Without loss of generality, each combination is internally represented
//as a sorted vector (ascending).
//There's one-to-one mapping between such a vector and a combination
#include <iostream>
#include <vector>
#include <iomanip> //setw
#include <algorithm>  //sort
#include <assert.h>
using namespace std;
size_t changes=0, calls=0;
size_t const C=3; //how many in each combination

template<typename ITR> bool isAscending (ITR const b, ITR const end){
  for (ITR last = b, i = b; i!=end; ++i){
    if (*last > *i) {
      cout<<*last<<" should be lower (or equal) but is higher than "<<*i<<endl;
      return false;
    }
    last = i; //advance the trailing iterator so each adjacent pair is compared
  }
  return true;
}
template<typename T> void dump(vector<T> & v,  bool isAssert = true){
  for(int i=0; i<v.size(); ++i) {
    cout<<setw(5)<<v[i];
    if (i == C-1) cout<<"  unused:";
  }
  cout<<endl;
  for(int i=0; i<v.size(); ++i){
    cout<<setw(5)<<i;
    if (i == C-1) cout<<"  unused:";
  }
  cout<<endl<<"---------------------------------"<<endl;
  if(isAssert){
    assert(isAscending(v.begin(), v.begin()+C) && "1st C elements should be ascending after next_combo (not next_perm)");
    assert(isAscending(v.begin()+C, v.end()) && "unused section should be ascending");
  }
}

template<typename T> bool reshuffle(vector<T> & v, int p2u){
//      cout<<"after swap"<<endl; dump(v);
        sort(v.begin()+p2u+1, v.end());
//      cout<<"after sorting everyting to the right of p2u"<<endl; dump(v);

        if (p2u == C-1){
                sort(v.begin()+C, v.end());
                ++changes;
                return true;
        }
        assert(p2u<C-1);
        //now reset everything to my right
        //now locate the best_man (next item after the new p2u) .. can use binary search
        for(int i=p2u+1; i<v.size() ; ++i){
          if (i==v.size()){ //p2u is highest possible!
                //cout<<"p2u is already highest"<<endl;
                sort(v.begin()+C, v.end());
                ++changes;
                return true;
          }
          if (v[p2u]<v[i]){
                //cout<<"best man = "<<i<<endl;
                for(int j=0; p2u+1+j<=C-1; ++j){
                  swap(v[p2u+1+j], v[i+j]);
                }
                sort(v.begin()+C, v.end());
                ++changes;
                return true;
          }
        }//for
        // now must return!

        assert(1==0 && "should never reach here");
        cout<<"after best_man search"<<endl; dump(v);
}

// reshuffles vector to the next higher combo
//Assuming 5-choose-3, the 1st 3 chars represent the combination,
//and the remaining characters at the end of the vector are
//unused in the current combination.
template<typename T> bool next_combo(vector<T> & v){
  ++calls;
  dump(v );
  if (v.size() == C) return false; // C-choose-C == 1 !

  for(int p2u=C-1; /*last position*/ p2u >=0 ;--p2u){
    for (int unusedItem=C; unusedItem<v.size(); ++unusedItem){ //scan the unused section of the array
        if (v[p2u] < v[unusedItem]) {
          assert(p2u<unusedItem);
          swap(v[p2u], v[unusedItem]);  //p2u should not change further
        //cout<<"identified "<<p2u<<" as position to upgrade... Will reset subsequent positions, and return"<<endl;
          return reshuffle(v, p2u);
        }
    }
    // no p2u identified yet. move p2u marker to the left
  }//for
  cout<<"no more higher combo. This is the end"<<endl;
  return false;
}
int main() {
//  vector<float> v{111,222,333,444,555,666};
  string tmp = "abcdefg";
  vector<char> v(tmp.begin(), tmp.end());
  assert(C <= v.size());
  for(; calls<9992; ){
        if (!next_combo(v)){
         cout<<changes<<" changes performed till the highest combo; next_combo() call count = "<<calls<<endl;
         return 0;
        }
  }
}

complexities: replicate exchange order book #Level1+2

–For a generic orderbook, the Level 1 complexity is mostly about trade cancel/correct.
All trades must be databased (like our TickCache). In the database, each trade has a trade Id but also an arrival time.

When a trade is canceled by TradeId, we need to regenerate LastPrice/OpenPrice, so we need the ArrivalTime attribute.

VWAP/High/Low are all generated from TickCache.

The other complexity is BBO generation from Level 2.

–For a Level-based Level 2 order book, the complexity is higher than Level 1.

OrderId is usually required so as to cancel/modify.

python run complex external commands #subprocess

I prefer one single full-feature solution that’s enough for all my needs. The os.system() solution is limited. The subprocess module is clearly superior. One of the simplest features is

>>> subprocess.call(["ls", "-l"])

If you need redirection and background, then try the single-string version

>>> subprocess.call('ls /tmp > /tmp/a.log &', shell=True) # output goes to STDOUT, hard to capture

c++4beatFronts: which2LetGo if I must pick1

Q: Which one to let go, given I have limited O2/laser/bandwidth and mental capacity?

  1. give up BP — biggest room for improvement but least hope
  2. give up codility style — Just get other friends to help me. See codility: ignore overflow, bigO

How about pure algo?

  • already decent? Can improve.
  • diminishing return? Smaller room for improvement? but I can learn a few key ideas about the G100 questions

##failed cod`IV

Here I only list the failed ones. There are many successful ones in ##some@the significant coding IV experiences

 

QQ BP ECT speed syntax algo where
 ? 😦 NA NA ? home c GS tick-engine
NA 😦 NA NA ? home c Tetris
 ? 😦 NA NA ? home c Macq
😦 😦 NA NA NA home j MS-commodity
? ? NA NA 😦 home j Amazon
😦 ? NA NA NA home j MS itr
😦 ? NA NA NA home j MS FIX
? NA good good 😦 codility c/j Mako
NA NA 😦 ? ? codility j FXoption
NA NA 😦 ? ? codility c Jump #3
NA NA 😦 slow good IDE c Bbg 2017
 NA ? 😦 slow good webex c Citadel array-shrink
none ? 😦 good simple IDE j UBS Suntec
😦 NA NA NA ? paper c Nsdq array 0-N
NA NA NA NA 😦 paper c Nsdq day-trading
😦 NA NA NA 😦 paper c FB regex
😦 NA NA NA 😦 paper c Goog
😦 NA NA NA ? paper j Barc SmartOrderRouter
😦 ? ? NA NA NA paper j UBS YACC
😦 NA NA NA NA paper c Bbg London
😦 ? NA NA NA paper c Imagine Software

See C++(n java)IV tend to beat us]4fronts.

Q: where is my biggest coding IV weakness? crack the algo? complete in pseudo code? or get the damn thing to compile and work?

  1. take-home tests usually beat me in terms of Best Practice
    • possibly give up if I must give up something 😦
  2. speed coding tests usually beat me in terms of ECT speed
  3. white-board tests usually beat me in algoQQ

c++cod`IV: no threading/boost/template/socket..

See C++(n java) high-end IV tend2beat us]4ways

This post is about the c++ topics. The BP/ECT/algo fronts primarily use basic constructs in ##10 basic program`constructs for c++ cod`IV

QQ     | BP       | ECT    | algo#pure | topic
heavy  | no       | never  | never     | thread
heavy  | yes      | seldom | never     | smart ptr
seldom | never    | never  | never     | other boost
yes    | never    | never  | never     | rvr+move
seldom | optional | never  | other     | c++11
heavy  | seldom   | never  | never     | memory tricks
yes    | never    | never  | never     | templates meta..
heavy  | yes      | never  | never     | c^c++
heavy  | never    | never  | never     | sockets
never  | never    | never  | once      | object graph serialization
heavy  | never    | never  | never     | inline
heavy  | never    | never  | never     | cpu cache
yes    | never    | never  | never     | exception safety
yes    | never    | never  | never     | pimpl

FTE^contractor: mgr choosing 1 guy to let go

There are individual differences, but I now feel the majority of managers/decision-makers across the U.S. and Singapore do feel higher resistance to letting go of an FTE than a contractor. Deterrence:

  • expectation of affected employee that the job is long-term. Most managers won’t ignore your feeling. They are worried about your complaints
  • reflection on her own record as a manager
    • they often prefer an internal transfer
  • formal performance improvement process is not required for non-performing contractor
  • severance
  • official complaints
  • litigation (in the U.S.)

The deterrences are a form of protection for the FTE, but paradoxically, I hate these deterrences, as they lengthen the slow death.

 

#1 career SAFETY enhancer@past5Y #muscleBuild`IV

See also I can professionally qualify as …

From 2010 to 2015, “IV muscle building” was my #1 career safety enhancer. That’s why I had so much “joy”. That’s where I invested so much time and money.

  • MSFM to open “new market” or at least cement my position on the quant-dev job market
    • SQL is, intriguingly, similar
  • c++, c#, py
  • swing– as a distinct job market segment

Now in 2017 I don’t see it as my #1 career safety enhancer. In retrospect, I find it not very simple to reach a conclusion.

One factor — I think core java skill demand turned out to be extremely robust (whereas c# and c++11 were underwhelming)

One factor — traditional pricing quant is shrinking as a job category

One factor — For both high end c++ and quant domains, bar is much higher than anticipated.

c# static class emulated in java/c++ #all-static

–c++:

use a (possibly nested) namespace to group related free functions. See google style guide.

c# has static classes. C++ offers something similar — P120 effC++. It’s a struct containing static fields. You are free to create multiple instances of this struct, but there’s just one copy of each static field. Kind of an alternative design for a singleton.

This simulates a namespace.
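
A minimal c++ sketch of the two alternatives above (names are made up):

#include <iostream>

namespace calc { //free functions grouped in a (possibly nested) namespace
  int add(int a, int b){ return a + b; }
}

struct Calc { //struct of static members, emulating a c# static class
  static int counter; //single copy, shared by all (unnecessary) instances
  static int add(int a, int b){ ++counter; return a + b; }
};
int Calc::counter = 0;

int main(){
  std::cout<<calc::add(1,2)<<" "<<Calc::add(3,4)<<std::endl;
}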

–java:

In [[DougLea]] P86, this foremost OO expert briefly noted that it can be best practice to replace a java singleton with an all-static class

–c# is the most avant-garde on this front

  • C# static class can be stateful but rarely are
  • it can have a private ctor

##(IncInc)most effective 10Y direction to INCrease INCome

Reality! We better embrace reality, not ignore it.

  • new skills like py, hadoop, data science? I see low chance of increasing my income, but I could be wrong
  • some specialist role, perhaps
    • risk/pricing analytics? questionable premium
    • low-latency algo trading in c++/java? unlikely … Get Real
  • hands-on architect (different from specialist role)
  • (perhaps maintenance mode) app owner, and grow in a big company?
    • must be a high value application
  • high-end contractor (probably@ibanks)? most practical
  • coding drill? practical 🙂
  • portable instrumentation zbs
  • [p] portable GTD skills (eg instrumentation)
  • [p] improve localSys learning process to speed up get-over-the-hump
  • [m] algo practice
  • socket QQ/GTD? not directly increasing my income but prevent income decline
  • [m] low level QQ topics in ARM, linux, STL, concurrency
  • [m=muscle building for iv]
  • [p=can help me take on a high-paying lead dev role. If too tough, then we can still count on these items to help improve our stress profile; self-esteem; job security; bonus… ]