python usage + risk system ] a big buy-side #ChaoJen

GIC (Government of Singapore Investment Corporation) is a large buy-side firm. Its IT heads are keen on python as a first-class language, used for data analysis, post-trade risk systems etc. Pandas is the most commonly used module.

Python salary is not always lower than java/c++. My friend Chao Jen said python salary could be high if the team is demanding and deep-pocketed.

Is the GIC risk system mostly written in python? Chao Jen doesn’t know for sure, but he feels c++ may not be widely used in the GIC risk system. Might be a vendor product.

Hackerrank python tests can be hard to get 100% right, but Chao Jen said many get close to 100%.

[14] python (realistic) IV #misc

CV claims — “used py for sys automation”

Q: optimize – how did you optimize perf?
A: I didn’t need to for my scripts.

Q: try/catch + ….?
Q: immutable – what data types are mutable/immutable?
Q: threading – global interpreter lock?
A: never really needed threading
Q: xml – what xml parser did you use? ess477
Q: debugger? A: I didn’t need it

Q: read a config file?

Q: logging?
Q: DB access?
Q: command line arguments stored where? A: sys.argv?
Q78: exit normally? A: sys.exit()
Q78b: normally, “raise Exception” would print to stderr stream, but how do you direct that to a log file instead?
A: set sys.stderr stream. dive205 i.e. [[dive into python]]
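
A minimal sketch of the answer to Q78b, assuming a hypothetical log file name `script_errors.log` and a hypothetical failing function `risky()`. It reassigns `sys.stderr` for the duration of a call, so anything that normally goes to the stderr stream (here a traceback written by `traceback.print_exc()`) ends up in the file.

```python
import os
import sys
import tempfile
import traceback

def run_with_stderr_to_file(func, log_path):
    """Call func() while sys.stderr points at log_path; restore it afterwards."""
    saved = sys.stderr
    with open(log_path, "a") as log_file:
        sys.stderr = log_file          # the answer given above: reassign sys.stderr
        try:
            func()
        except Exception:
            # print_exc() writes to the current sys.stderr, i.e. the log file
            traceback.print_exc()
        finally:
            sys.stderr = saved         # always put the real stream back

def risky():
    raise ValueError("something went wrong")  # hypothetical failure

log_path = os.path.join(tempfile.mkdtemp(), "script_errors.log")  # hypothetical name
run_with_stderr_to_file(risky, log_path)
with open(log_path) as f:
    logged = f.read()
```

For anything beyond a quick script, the logging module is probably the more common choice, but the reassignment trick is the shortest answer in an interview.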

Q: how do you handle newline diff between OS? ess114
Q: truth table? e68 (i.e. [[essentialRef]])
Q: how do you edit on windows and deploy to linux?
A: samba, ftp. See also the post in pearl
— sys admin
Q: how do you pass input files to your py script? cook539
A: fileinput module is convenient.
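
A small sketch of the fileinput answer, using two hypothetical temp files `a.txt` and `b.txt`. The helper `count_lines()` is my own name. In a real script you would usually call `fileinput.input()` with no arguments, which reads the files named in `sys.argv[1:]` (or stdin if none are given).

```python
import fileinput
import os
import tempfile

def count_lines(paths):
    """Return (total_lines, per_file_counts), reading all paths as one stream."""
    per_file = {}
    total = 0
    with fileinput.input(files=paths) as stream:
        for line in stream:
            total += 1
            name = stream.filename()      # file currently being read
            per_file[name] = per_file.get(name, 0) + 1
    return total, per_file

# build two small input files to read back
tmp_dir = tempfile.mkdtemp()
paths = []
for file_name, text in (("a.txt", "one\ntwo\n"), ("b.txt", "three\n")):
    path = os.path.join(tmp_dir, file_name)
    with open(path, "w") as f:
        f.write(text)
    paths.append(path)

total, per_file = count_lines(paths)
```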

Q: how is PYTHONPATH used with sys.path?
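
One way to see the relationship, as a sketch: start a child interpreter with PYTHONPATH set to a hypothetical directory and print its `sys.path`. The directories listed in PYTHONPATH are added to `sys.path` at interpreter start-up, ahead of the installation defaults.

```python
import os
import subprocess
import sys
import tempfile

extra_dir = tempfile.mkdtemp()          # hypothetical directory holding modules
child_env = dict(os.environ, PYTHONPATH=extra_dir)

# ask a fresh interpreter to print its module search path, one entry per line
done = subprocess.run(
    [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
    capture_output=True, text=True, env=child_env)
child_path = done.stdout.splitlines()
```

Inside a running script, `sys.path` is an ordinary list, so `sys.path.append(some_dir)` has a similar effect for later imports.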

Q: another py — how do you execute another py script? Ess89 (i.e. [[essentialRef]])
Q: what command line options do you use often?

Q5: how do you launch an external program?
Q5b: how do you capture its output? c546 [[cookbook]] has a slightly advanced solution
Q5c: how do you call a unix shell command?
A: shutil module meets some of your needs
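
For Q5 and Q5b, a hedged sketch using the standard subprocess module (my choice, not the answer recorded above; shutil covers file copy/move jobs rather than launching programs). The helper name `run_and_capture()` is made up, and the "external program" here is just another python interpreter so the example runs anywhere.

```python
import subprocess
import sys

def run_and_capture(argv):
    """Run an external program; return (exit_code, stdout_text, stderr_text)."""
    done = subprocess.run(argv, capture_output=True, text=True)
    return done.returncode, done.stdout, done.stderr

code, out, err = run_and_capture(
    [sys.executable, "-c", "print('hello from child')"])
```

For Q5c, passing a single command string with `shell=True` runs it through the shell, at the cost of portability and quoting headaches.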

Q: exit with an error msg? cook540
A: raise SystemExit('msg')
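
To check what that answer does, here is a sketch that runs the one-liner in a child interpreter with a hypothetical message. A string argument to SystemExit is printed to stderr and the exit status becomes 1.

```python
import subprocess
import sys

child_source = "raise SystemExit('config file missing')"  # hypothetical message
done = subprocess.run([sys.executable, "-c", child_source],
                      capture_output=True, text=True)
exit_code = done.returncode
stderr_text = done.stderr.strip()
```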

— data structures
Q: diff between listcomp and genexp? How would you choose?
Q: convert between list and tuple? cvc
Q: convert a list of objects to string and concat? see post in recrec
Q: split a string on multiple delimiters? cook37 explains re.split()
Q: iterating in reverse? cook119
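
A quick sketch of possible answers to the data-structure questions above, in Python 3 with made-up sample data. These are one way to do each task, not necessarily what the cited books show.

```python
import re

numbers = [3, 1, 4, 1, 5]

squares_list = [n * n for n in numbers]      # list comprehension: built eagerly
squares_gen = (n * n for n in numbers)       # generator expression: lazy, single pass
total = sum(squares_gen)                     # consuming it exhausts the generator
leftover = list(squares_gen)                 # nothing left the second time

as_tuple = tuple(numbers)                    # list -> tuple
back_to_list = list(as_tuple)                # tuple -> list

joined = ",".join(str(n) for n in numbers)   # list of objects -> one string

fields = re.split(r"[;,\s]+", "a, b;c  d")   # split on comma, semicolon or whitespace

backwards = list(reversed(numbers))          # reversed() yields items back to front
```

Rule of thumb for the first question: use a genexp when the result is consumed once (sum, join, any), and a listcomp when you need to index or iterate again.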
==coding IV
Q: how do you print to a file rather than stdout?
A: (e115) print >> anOpenFile, 'whatever'  # Python 2, using the print KEYWORD
A: (c144) print('whatever', file=anOpenFile)  # Python 3, using the print() function
Q: concat 2 lists?
Q: initialize a dict
Q: initialize a 2D array? m52 (i.e. [[perl to python migration]])
Q: walk a directory
Q: use a dict as a “seen”
Q: iterate a dict? mig87
Q: iterate a file
Q: interpolate into a string? ess115. c61 i.e. [[cookbook]]
Q: date/time usage (datetime module is OO;  time module is procedural?)
Q: trim both ends of a string? strip()
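
Sketch answers for several of the coding questions above, in Python 3. All data and file names are hypothetical.

```python
import io
import os
import tempfile

# print to a file-like object instead of stdout
buffer = io.StringIO()
print("whatever", file=buffer)
printed = buffer.getvalue()

combined = [1, 2] + [3, 4]                       # concatenate two lists

grid = [[0] * 3 for _ in range(2)]               # 2 rows x 3 columns, rows independent
grid[0][1] = 7

seen = {}                                        # dict used as a "seen" register
duplicates = []
for word in ["a", "b", "a", "c", "b"]:
    if word in seen:
        duplicates.append(word)
    seen[word] = True

ages = {"ann": 31, "bob": 27}                    # initialize a dict
pairs = sorted((name, age) for name, age in ages.items())   # iterate a dict

greeting = "{} is {}".format("ann", ages["ann"])  # interpolate into a string
trimmed = "  padded \n".strip()                  # trim both ends

# walk a directory tree
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "sub"))
open(os.path.join(root, "sub", "f.txt"), "w").close()
found = [name for _, _, files in os.walk(root) for name in files]
```

The 2D array line is the one that trips people up: `[[0] * 3] * 2` would create two references to the same row.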

##minimum python know-how4coding IV

Hi XR,

My friend Ashish Singh (in cc) said “For any coding tests with a free choice of language, I would always choose python”. I agree that perl and python are equally convenient, but python is the easiest language to learn, perhaps even easier than javascript and php in my opinion. If you don’t already have a general-purpose scripting language as a handy tool, then consider python as your duct tape and Swiss army knife.

(Actually linux automation still requires shell scripting. Perl and python are both useful additions.)

You can easily install py on windows. Linux has it pre-installed. You can then write a script in any text editor and test-run, without compilation. On windows the bundled IDLE tool is optional but even easier. For the ECT cycle – see https://stackoverflow.com/questions/6513967/running-python-script-from-idle-on-windows-7-64-bit

I actually found some inconveniences in IDLE: it uses Alt-P to get the previous command, and copy-paste doesn’t work at all. On Windows the basic python command-line shell is better than IDLE!

For coding tests, a beginner would need to learn

  • String common operations … Regex not needed since many developers aren’t familiar with it
  • list and dict data structures and common operations. A “Set” may be useful occasionally. Tuple not needed.
  • Define simple functions (multiple return values are possible but not required). Recursion is frequently coding-tested.
  • “global” keyword used inside functions
  • if/elif/else; while loop with break and continue. Switch statement doesn’t exist.
  • for-each loop is more useful in coding test, esp. iterating list, dict, string, range(), file content
  • range() and xrange() functions (xrange is Python 2 only; Python 3's range() is already lazy) – frequently needed in coding tests
  • “in” operator on string, list, dict
  • check whether 2 objects have the same address, using the “is” operator
  • check against None
  • No need to handle exceptions
  • No need to create classes
    • I think “struct-type” classes with nothing but data fields are useful in coding tests, but not yet needed in my experience.
  • No need to learn OO features
  • No need to use list comprehension and generator expressions, though very useful features of python
  • No need to use lambda, map()/reduce()/filter()/zip(), though essential for functional programming
  • No need to use import os and sys modules or open files, which are essential for everyday automation scripts
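
To make the "need to learn" items concrete, here is a compact Python 3 sketch that touches most of them. All function names and values are made up for illustration.

```python
counter = 0

def bump():
    global counter          # without this, counter += 1 raises UnboundLocalError
    counter += 1
    return counter

def factorial(n):
    # recursion, a frequent coding-test topic
    return 1 if n <= 1 else n * factorial(n - 1)

def classify(n):
    # if/elif/else chain standing in for a switch statement
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    else:
        return "positive"

def first_even(values):
    i = 0
    found = None
    while i < len(values):
        v = values[i]
        i += 1
        if v % 2:
            continue        # odd: go to the next iteration
        found = v
        break               # first even value: leave the loop
    return found

bump()
bump()
a = [1, 2]
b = a                       # same object
c = list(a)                 # equal content, different object
same_address = a is b
equal_not_same = (a == c) and (a is not c)
membership = ("ell" in "hello", 2 in a, "k" in {"k": 1})
result = first_even([1, 3, 8, 10])
nothing_found = first_even([1, 3]) is None    # check against None with "is"
```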

3groupsOf3digits #YiHai

Q: Find three 3-digit numbers with a ratio of 1:2:3;
These numbers pick their 3 digits from a range of 1 to 9;
All digits that form these numbers must be completely unique, i.e. you must use up all distinct 9 digits. For example, a set of 123:246:369 doesn’t qualify.

def failed_to_add(myset, digit): # return True when the digit is 0 or already in myset
    return digit == 0 or len(myset) == ( myset.add(digit) or len(myset))

def other_number_failed(myset, num): # return True if any digit of num is 0 or a repeat
    for digit in (num//100, num//10%10, num%10): # integer division, safe in py2 and py3
        if failed_to_add(myset, digit):     return True
    return False

def check(number1, myset):
    number2 = 2*number1
    if other_number_failed(myset, number2): return
    number3 = 3*number1
    if other_number_failed(myset, number3): return
    print(number1, number2, number3, myset)
    winners.append([number1, number2, number3])

winners=list()
nine_digits = tuple(range(1,10))
for digit1 in (1,2,3):
    last8 = list(nine_digits)
    last8.remove(digit1)
    for digit2 in last8:
        last7 = list(last8)
        last7.remove(digit2)
        for digit3 in last7:
            number1 = digit1*100 + digit2*10 + digit3
            if number1 > 329: break # 3*number1 must remain a 3-digit number
            check(number1, {digit1, digit2, digit3})
print("-- lucky winners --", winners)
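
For comparison, a shorter brute-force sketch of the same puzzle in Python 3. This is not the approach above: it simply tries every candidate first number and checks the nine digits as a string. The function name `ratio_triples()` is my own.

```python
def ratio_triples():
    """Return every (n, 2n, 3n) of 3-digit numbers using digits 1-9 exactly once."""
    triples = []
    for n in range(123, 330):               # 3*n must stay below 1000
        digits = str(n) + str(2 * n) + str(3 * n)
        if sorted(digits) == list("123456789"):
            triples.append((n, 2 * n, 3 * n))
    return triples

winners = ratio_triples()
```

It finds four triples, starting from 192, 219, 273 and 327. The digit-by-digit version above prunes earlier, but for a search space this small the difference hardly matters.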

my quicksort in python/c++

import random
# partition the given section by fixing the anchor
def partitionUsingFarRight(A, le, ri):
    pivotValue = A[ri] # immutable value
    pivotPos = i = ri
    while True:
        if (i <= le): return pivotPos
        i -= 1
        if A[i] > pivotValue:  
            swap(A, pivotPos-1, i)
            swap(A, pivotPos-1, pivotPos)
            pivotPos -= 1
          
def partitionUsingFarLeft(A, le, ri): 
    # optional: swap a random object to the far left
    swap(A, random.randint(le, ri), le)
    benchmark=A[le]
    ret = i = le
    while True:
        if i == ri: return ret
        i +=1 #move pointer
        if A[i] < benchmark: # 3-step adjustment
            swap(A, ret+1, i)
            swap(A, ret+1, ret)
            ret +=1
    
def partition(A, le, ri):
    return partitionUsingFarLeft(A, le,ri)
def recurse(A, le, ri): 
    if le>=ri: return
    # note: A[ri] is the real pivot only when partitionUsingFarRight is selected
    print('entering partition()...', A[le:ri+1], ' pivotVal =', A[ri])
    anchor = partition(A, le, ri)
    print('...after partition()   ', A[le:ri+1])
    recurse(A, le, anchor-1)
    recurse(A, anchor+1, ri)
def swap(A, x,y):
    tmp = A[x]
    A[x] = A[y]
    A[y] = tmp
def qsort(A):
    recurse(A, 0, len(A)-1)
    print(A)
 
qsort([222,77,6,55,3,11,5,2,88,9,66,22,8,44,1,33,99,7,66])

Above is py, below is c++


#include <iostream>
#include <vector>

std::vector<int> A{77, 11, 66,22,33,99,44,88, 77, 55, 0};
int const size = A.size();

void dump(int l=0, int r=size-1) {
	for (int i = l; i <= r; ++i)
		std::cout << A[i] << ' ';
	std::cout <<std::endl;
}

template <typename T>
void swap(int pos1, int pos2) {
	if (A[pos1] == A[pos2]) return;
	T tmp = A[pos1];
	A[pos1] = A[pos2];
	A[pos2] = tmp;
}

/*partition the region [l,r] such that all elements smaller than
pivotValue are on the left of pivotPosition
*/
template <typename T>
int partitionUsing1st(int l, int r) {
	T const pivotVal = A[l];
	int pivotPos = l;
	for (int i = l+ 1; i <= r; ++i) { 
		if (A[i] >= pivotVal) continue;
		swap<int>(pivotPos + 1, i);
		swap<int>(pivotPos + 1, pivotPos);
		++pivotPos;
	}
	return pivotPos;
}
template <typename T>
int partitionUsingLast(int l, int r) {
	T const pivotVal = A[r];
	int pivotPos = r;
	for (int i = r - 1; i >= l; --i) {
		if (A[i] <= pivotVal) continue;
		swap<int>(pivotPos - 1, i);
		swap<int>(pivotPos - 1, pivotPos);
		--pivotPos;
	}
	return pivotPos;
}
/*based on [[Algorithms unlocked]], should but doesn't minimize swaps!
Lime zone -- items smaller than pivot value
Red zone -- items bigger than pivot value
Ugly zone -- items yet to be checked
*/
template <typename T>
int partitionMinimalSwap(int le, int ri) {
	T const pivotVal = A[ri];
	// two boundaries exist between zones
	int redStart = le;
	// we start with redStart == uglyStart == le, which means item at le is Unknown
	for (int uglyStart = le; uglyStart < ri; ++uglyStart) {
		if (A[uglyStart] < pivotVal) {
			swap<int>(uglyStart, redStart);
			redStart++;
		}
	}
	swap<int>(ri, redStart);
	return redStart;
}

template <typename T>
void recurse(int l, int r) {
	if (l >= r) return; // recursion exit condition
	int const anchor = partitionMinimalSwap<T>(l, r);
	recurse<T>(l, anchor-1);
	recurse<T>(anchor+1, r);
}

int main() {
	recurse<int>(0, size-1);
	dump();
	return 0;
}
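
For side-by-side reading, here is a Python 3 sketch of the same lime/red/ugly zone scheme as partitionMinimalSwap (this style is commonly called the Lomuto partition). The function names are my own, and the input is the same vector as in the c++ listing.

```python
def partition_last_pivot(a, le, ri):
    """Partition a[le..ri] around a[ri]; return the pivot's final index."""
    pivot_val = a[ri]
    red_start = le                      # first slot of the "red" (not smaller) zone
    for ugly_start in range(le, ri):    # items from ugly_start onward are unchecked
        if a[ugly_start] < pivot_val:
            # move the small item to the end of the lime zone
            a[ugly_start], a[red_start] = a[red_start], a[ugly_start]
            red_start += 1
    a[ri], a[red_start] = a[red_start], a[ri]   # drop the pivot between the zones
    return red_start

def quicksort(a, le=0, ri=None):
    if ri is None:
        ri = len(a) - 1
    if le >= ri:                        # recursion exit condition
        return
    anchor = partition_last_pivot(a, le, ri)
    quicksort(a, le, anchor - 1)
    quicksort(a, anchor + 1, ri)

data = [77, 11, 66, 22, 33, 99, 44, 88, 77, 55, 0]
quicksort(data)
```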

python: routine^complex tasks

XR,

Further to our discussion, I used perl for many years. 95% of my perl tasks are routine tasks. With py, I would say “majority” of my tasks are routine tasks i.e. solutions are easy to find on-line.

  • routine tasks include automated testing, shell-script replacement, text file processing, query XML, query various data stores, query via http post/get, small-scale code generation, simple tcp client/server.
  • For “Complex tasks” , at least some part of it is tricky and not easily solved by Googling. Routine reflection / concurrency / c++Integration / importation … are documented widely, with sample code, but these techniques can be pushed to the limit.
    •  Even if we just use these techniques as documented, but we combine them in unusual ways, then Google search will not be enough.
    • Beware — decorators , meta-programming, polymorphism, on-the-fly code-generation, serialization, remote procedure call … all rely on reflection.
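
As a small illustration of the "routine" end of reflection, the sketch below combines a decorator with lookup by name. The class `Greeter` and the decorator `traced` are hypothetical. Pushing the same building blocks further (dynamic class creation, import hooks) is where things get complex.

```python
import functools

def traced(func):
    """Decorator that records calls, using the function's own metadata."""
    @functools.wraps(func)              # copies __name__ and __doc__ onto the wrapper
    def wrapper(*args, **kwargs):
        wrapper.calls.append((func.__name__, args))
        return func(*args, **kwargs)
    wrapper.calls = []
    return wrapper

class Greeter:                          # hypothetical class for illustration
    def __init__(self, name):
        self.name = name

    @traced
    def greet(self, punctuation):
        return "hello " + self.name + punctuation

g = Greeter("XR")
method = getattr(g, "greet")            # look a method up by its string name
reply = method("!")
fields = vars(g)                        # instance attributes as a dict
public_names = [n for n in dir(Greeter) if not n.startswith("_")]
```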

When you say py is not as easy as xxx and takes several years to learn, I think you referred to complex tasks.

I can see a few reasons why managers choose py over java for certain tasks. I heard there are a few jvm-based scripting languages (scala, groovy, clojure, jython …) but I guess python beats them on several fronts including more packages (i.e. wheels) and more mature, more complete and proven solutions, familiarity, reliability + wider user base.

One common argument to prefer any scripting language over any compiled language is faster development. True for routine tasks. For complex tasks, “your mileage may vary”. As I said, if the software system requirement is inherently complex, then implementation in any language will be complex. When the task is complex, I actually prefer more verbose source code — possibly more transparent and less “opaque”.

Quartz is one example of a big opaque system for a complex task. If you want, I can describe some of the complex tasks (in py) I have come across though I don’t have the level of insight that some colleagues have.

When you said the python debugger was less useful to you than java debugger, it’s a sign of java’s transparency. My “favorite” opaque parts of py are module import and reflection.

If python (or any language) has excellent performance/stability + good on-line resources [1] + reasonable library of components comparable to the mature languages like Java/c++, then I feel sooner or later it will catch on. I feel python doesn’t quite have the performance.

[1] documentation is nice-to-have but not sufficient. Many programmers don’t have time to read documentation in-depth.

[[automate the boring stuff with python]]

This book teaches just enough python features for the purpose. All non-essentials are left out.

–sub chapter on selenium

the easiest way to use selenium and drive a browser, IMO.

–Chapter on Excel

text file spreadsheet converter

merge/unmerge

setting font in a cell

–Chapter on PDF:

combine select pages from many files

—-

Chapter on CSV + json

Chapter on task scheduler

Chapter on keyboard/mouse control — powerful

c++string tasks: IV+GTD

(Let’s be imprecise here… Don’t sweat the small stuff.)

We should be able to perform all of these using c-string, std::string (limited adoption since c++98), the standard string in java, c#, perl, python, php. This is a master list. Tolerate multiple names for each task.

See basic tasks on https://bintanvictor.wordpress.com/2018/01/26/22tasksarraystrdictq-algoiv/

–STL
use string iterator with STL algorithms

–the easy
toUpper

–advanced
convert to vector and apply vector tricks
convert to std::string and apply tricks
equalsIgnoreCase
compareIgnoreCase
lastIndexOf
count how many times a substr occurs
sort content
endsWith
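
For the python column of this master list, a rough sketch of how most of the tasks above map onto built-in str methods. The sample text is made up, and `compare_ignore_case()` is a hypothetical helper since python has no built-in three-way string compare.

```python
text = "Hello World, hello python"      # hypothetical sample string

upper = text.upper()                                        # toUpper
equals_ignore_case = "PyThOn".casefold() == "python".casefold()

def compare_ignore_case(a, b):
    """Return -1, 0 or 1: a three-way comparison that ignores case."""
    a, b = a.casefold(), b.casefold()
    return (a > b) - (a < b)

last_index = text.rfind("hello")                            # lastIndexOf, -1 if absent
occurrences = text.lower().count("hello")                   # non-overlapping matches
sorted_content = "".join(sorted("banana"))                  # sort content
ends_with = text.endswith("python")                         # endsWith
as_list = list("abc")                                       # string -> list of chars
```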

python – some relatively innovative features

I’m relatively familiar with perl, java, c++, c# and php, though some of them I didn’t use for a long time.

IMO, these python features are kind of unique, though other unknown languages may offer them.

* decorators
* list comp
* more hooks to hook into object creation. More flexible and richer controls
* methods and fields are both class attributes. I think fundamentally they are treated similarly.
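
A sketch of the last two bullets, with hypothetical classes `Singleton` and `Account`. `__new__` is one of the object-creation hooks, and `vars()` shows a method and a data field sitting side by side in the class namespace.

```python
class Singleton:                        # hypothetical example class
    _instance = None

    def __new__(cls, *args, **kwargs):
        # __new__ runs before __init__ and decides which object is returned
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

class Account:                          # hypothetical example class
    rate = 0.25                         # a data field stored on the class

    def interest(self, balance):        # a method stored on the class
        return balance * self.rate

first, second = Singleton(), Singleton()
same_object = first is second

# both the field and the method live in the same class namespace
attribute_kinds = {name: type(value).__name__
                   for name, value in vars(Account).items()
                   if not name.startswith("_")}

# because a method is just an attribute, it can be replaced like one
Account.interest = lambda self, balance: balance * self.rate * 2
doubled = Account().interest(100)
```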