## defaultdict tricks #matrix

  • defaultdict(list) — is one of the most used. I think “list” is a function pointer.

https://github.com/tiger40490/repo1/blob/py1/py/algo_tree/commonAncestor_Indeed.py demos a few time-saving tricks with defaultdict

  • mydict[key] # without printing or using the return value … would trigger the dictionary to construct a default value for this key IFF missing
  • from key/value pairs, build dict {key -> valueList}

https://github.com/tiger40490/repo1/blob/py1/py/algo_combo_perm/staircase_CSY.py shows

  • defaultdict(int) as a frequency counter.
    • mydict[key] += 1 #works even when key was previously missing in mydict
  • defaultdict(lambda: ‘_’).
    • mydict[(row,col)] can implement a matrix. negative (or other invalid) indices results in default value of string ‘_’. I believe dict key can be tuple but not list.
    • In contrast, the 2D array interprets negative indices as wrap-around !
  • defaultdict(lambda: [[],[]]) creates a list of 2 small lists for any “invalid” key. In contrast defaultdict(list) will create a monolithic list which hits runtime error when you access element of the nested list i.e. dict[someKey][0][anyIndex]

hash table of integer_ratios #slope

It’s not possible to store arbitrary ratios as floating point values in a hash table. We hit this problem when matching line slopes.

https://docs.python.org/2/library/fractions.html#fractions.gcd provides a solution

  • the gcd(int1 int2) function finds the greatest common divisor of two integers. This is the key to our problem.
  • convert a string like ‘x/y’ to a simplified Fraction object
  • convert a decimal 
  • convert a float 

I think other languages also provide the same gcd functionality.

python bisect #cod`IV

The bisect module is frequently needed in coding tests esp. codility. In this write-up, I will omit minor function parameters.

* bisect.bisect_right(x)  # returns a position “i” after the last hit, if any, such that
all values are <= x from a[lo] to a[i-1]) for the left side, and
all values are > x from a[i] to a[hi-1]) for the right side. So the “hit” items are on the left, unconditionally 🙂
* bisect.bisect_left(x) # returns an index i before or on the first hit:
all values are < x from a[lo] to a[i-1]) for the left side, and
all values are >= x from in a[i] to a[hi-1]) for the right side.

That’s too long-winded. My sound bytes:

  • bisect_left(needle) returns the first index above or matching needle.
  • bisect_right(needle) returns the first index above needle, unconditionally.

A few scenarios:

  1. If No perfect hit, then same value returned by both functions.
    • Common scenario: if needle is higher than all, then “i” would both be the last index + 1.
    • Common scenario: if the needle is lower than all, then “i” would both be 0
    • in all cases, You can always insert Before this position
  2. If you get perfect hits, bisect_left would return the first “perfect” index, so bisect_left() is more useful than bisect_right(). I feel this is similar to std::lower_bound
    • This is confusing, but bisect_right() would return surpass the last perfect index, so the returned “i” value is higher. Therefore, bisect_right() would never return the “perfect” index.
  3. If you have a lower-bound input value (like minimum sqf) that might hit, then use bisect_left(). If it returns i, then all list elements qualify from i to end of list
  4. If you have an upper-bound input value that might hit, then use bisect_left(). If it returns i, then all list values qualify from 0 to i. I never use bisect_right.
  5. Note the slicing syntax in python a[lo] to a[i-1] == a[lo:i] where the lower bound “lo” is inclusive but upper bound “i” is exclusive.

See demo code in https://github.com/tiger40490/repo1/blob/py1/py/88lang/bisectUsages.py

collections.Counter == unordered_multiset

From Kyle

from collections import *
c = Counter()              # a new, empty counter
c = Counter('gallahad')    # a new counter from an iterable

c = Counter(a=4, b=2, c=0, d=-2)
list(c.elements()) #['a', 'a', 'a', 'a', 'b', 'b']
Counter('abracadabra').most_common(3) #[('a', 5), ('r', 2), ('b', 2)]

sum(c.values())                 # total of all counts
c.clear()                       # reset all counts
list(c)                         # list unique elements
set(c)                          # convert to a set
dict(c)                         # convert to a regular dictionary
c.items()                       # convert to a list of (elem, cnt) pairs
Counter(dict(list_of_pairs))    # convert from a list of (elem, cnt) pairs
c.most_common()[:-n-1:-1]       # n least common elements
c += Counter()                  # remove zero and negative counts


void list.reverse() ^ reversed()→iterator #sorted(),items()

These three constructs do three different things, without overlap

  • mylist[::-1] returns a reversed clone
    • myStr[::-1] works too
  • mylist.reverse() is a void in-place mutation, same as mylist.sort()
  • reversed(mylist) returns an iterator, not printable but can be used as argument to some other functions
    • I feel this can be convenient in some contexts.
    • reversed(myStr) works too but you can’t print it directly


  • Unlike items(),dict.iteritems()return iterator

Different case:

  • sorted(any_iterable) returns a new vector (i.e. “list”). Returning an iterator would be technically stupid.

pass generator output to next generator

I think this technique can be extremely time-saving in coding tests.

https://github.com/tiger40490/repo1/blob/py1/py/algo_combo_perm/1fromEachSet.py my code demos:

for myset in pool:     output = list(gen(output, myset))

The gen() function uses yield. For the first call to gen(), we exhaust all of its items and save into a list named “output”.

Then we pass this list into the second gen(), this time with a different myset

python closure and global variables

One way to minimize global variables is converting a regular function into nested functions. Nested functions can automatically READ local variables (like rootNode) of enclosing function outer(). No complication. This follows the usual LGB rule.

On the other hand, if nested_func() needs to rebind outer() function’s rootNode variable to another Node object, then we need to declare “global rootNode” in both nested_func() and outer (). This is tested in https://github.com/tiger40490/repo1/tree/py1/py/88lang. This “partial-global” variable is NOT used outside outer().

Another way to avoid global variables — call mutator methods on the variable, rather than reseating/rebinding. Best example is list or dict objects. Inside nested_func() , if I were to rebind myDict to an empty dict, then it has no effect on the myDict in outer(). Hence “global” needed. The alternative is to clear the content of myDict and populating it. The id(myDict) value remains unchanged. This is the standard java idiom.

master The yield-generator

https://github.com/tiger40490/repo1/tree/py1/py/algo_combo_perm uses python q[ yield ] to implement classic generators of

  • combinations
  • permutations
  • abbreviations
  • redraw

Key features and reasons why we should try to memorize it

  1. very brief code, not too dense (cryptic), .. helps us remember.
  2. reliability — brief means fewer hiding place for a bug. I added automated tests for basic quality assurance. Published solution
  3. versatile — one single function to support multiple classic generators.
  4. yield functions’ suspend-resume feature is particular powerful and possibly a power drill. This is my first opportunity to learn it.
  5. instrumentation — relatively easy to add prints to see what’s going on.
  6. halo — yield-based generator functions are a halo and also a time-saver
  7. elegant — brief, simple, relatively easy to understand
  8. recursive — calling itself recursively in a loop therefore fairly brief but powerful
  9. useful in take-home assignments
  10. identity-aware — The second time you call myYieldFunc(44), it would usually return the same generator object as the first time you call it with that same argument (i.e. 44). If that generator object has reached end of it’s execution, then you won’t get any more output.

— How I might try to memorize it

  1. If needed, we just need to reproduce it quickly from memory a few times
  2. I added comments to help me understand and memorize it

custom sort ] python: scenarios

Note python 3 (and python 2) favors the “key” parameter which names a single-arg function that returns a single comparable value for each list item. (The list items could be otherwise unsortable.) This technique is supposed to be versatile, but I find it restrictive and unnatural.

— alternative to “key”: define a two-arg comparitor function

https://docs.python.org/2.7/howto/sorting.html has the “official” documentation on ‘cmp’ parameter, which is similar to java/c++:

>>> def numeric_compare(x, y):
...     return x - y
>>> sorted([5, 2, 4, 1, 3], cmp=numeric_compare)

https://stackoverflow.com/questions/5213033/sort-list-of-list-with-custom-compare-function-in-python has another simple example.

–list.sort() vs sorted(iterable)

https://www.programiz.com/python-programming/methods/list/sort explains that

  • list.sort() is a void mutator method
  • sorted() returns an new list

— sort instances of MyType

https://thepythonguru.com/python-builtin-functions/sorted/ shows

sorted(emp_list, key=lambda x: x.age) # sort Employee objects by age

— sort key as a lambda:

sorted(paths, key=lambda x: x[“cost”]) # sort the paths by looking up their costs.

— special sort

def reorderLogFiles(self, logs): #from kyle
  def f(log):
      id_, rest = log.split(" ", 1) # split into 2 items
      return (0, rest, id_) if rest[0].isalpha() else (1,)
return sorted(logs, key = f)

OrderedDict == LinkedHashMap

collections.OrderedDict can be useful in some coding interviews. As Rahul pointed out, interviewers are likely interested in your skill implementing it.

  • similarly preference — http endpoint skill is less valued than socket programming skill
  • Counterexample — interviewers are seldom interested in how you implement a priority queue.

In all cases, some knowledge of actual implementation is a small halo, as Venkat shows


doubly-linked circular list, as explained on https://stackoverflow.com/questions/33748340/how-does-pythons-ordereddict-remember-elements-inserted, and in the source code link

http://code.activestate.com/recipes/496761-a-more-clean-implementation-for-ordered-dictionary/ shows a non-standard implementation, but del is not O(1)

python slicing q( [] )ALWAYS generate clone

This is crucial knowledge for ECT coding test

See mystr[:-2] copy_truncate last 2 chars #mystr[2:]

See copy_reverse list/string/tuple by [::-1]

See python initialize list/dict : perf

Among the various list cloning solutions on https://stackoverflow.com/questions/2612802/how-to-clone-or-copy-a-list:

  • slicing is consistent with string/tuple
  • list(oldList) is similar to java API
  • copy.deepcopy is useful across many containers

break out of N layers of loop

–Most flexible solution — use a wrapper function to hold the inner n layers. Use return to break out of N layers.

As a bonus, you can also return a value like “break-out with a value”

If there’s another layer outside those N layers and you are half way through that iteration and want to keep going, then in that loop call the function

–In some coding test, it’s more succinct to say

if some_break_condition: print_result(); sys.exit()

For the semicolon usage, See https://stackoverflow.com/questions/8236380/why-is-semicolon-allowed-in-this-python-snippet

addicted to c++for()loop@@

I find myself thinking in c++ while writing python for loops. Inefficient.

for i in range(0,len(mystrLiTup),1): # explicit is best

What if you need to adjust the index “i” in the loop? A transitional construct (until we get rid of c++ thinking):

mystrLiTup='sentence. with. periods. end'
while True:
  i+=1 # increment explicitly
  if i >= len(mystrLiTup): break
  print mystrLiTup[i]
  if mystrLiTup[i] == '.':

mystr[:-2] COPY_truncate last 2 chars #mystr[2:]

  • Python string (tuple too) is immutable, so mystr[:-2] returns a copy with last 2 chars truncated
  • Even for a mutable list, this slicing syntax returns a copy.
  • …. This seems to be the best syntax to truncate list, string and tuple.

See https://www.dotnetperls.com/slice-python

— how about

mystr[2:] # As expected, this clones then chops First 2 chars

— If you want to truncate a list in-place, use the q(del) keyword (not a function)

Syntax is easy for single-element deletion. Tricky for slice deletion

list tuple str comment
del immutable in-place truncate?
var[:-2] tested copy_truncate LAST 2
var[2:] tested copy_truncate FIRST 2

StopB4: arg to range()/slice: simplistic rule

I believe range() and slicing operators always generate a new list (or string)

If you specify a stopB4 value of 5, then “5” will not be produced, because the “generator” stops right before this value.

In the simplest usage, START is 0 and STEP is 1 or -1.

…. If StopB4 is 5 then five integers are generated. If used in a for loop then we enter the loop body five times.

In a rare usage (avoid such confusion in coding test!), STEP is neither 1 or -1, or START is not zero, so StopB4 is a used in something like “if generated_candidate >= StopB4 then exit before entry into loop body”

Code below proves slicing operator is exactly the same. See https://github.com/tiger40490/repo1/blob/py1/py/slice%5Erange.py

word[:2]    # The first two characters, Not including position 2
word[2:]    # Everything except the first two characters
s[:i] + s[i:] equals s
length of word[1:3] is 3-1==2

python queues #deque

In a coding test, I will use a vanilla list for both.

https://docs.python.org/2/tutorial/datastructures.html confirms that list can be an efficient stack and an inefficient queue.

— Stack can use

  • list.append()
  • list.pop() with no arg.
  • Top of stack is list[-1]

— Queue:  https://github.com/tiger40490/repo1/blob/py1/py/tree/bTreeDftBftSerialize_bbg.py shows an primitive (inefficient) Queue class I wrote:

  • dequeue — list.pop()
    • A non-queue operation — pop(2) would remove and return the 3rd vector item but this is inefficient on a vector!
  • enqueue — list.insert(0, newItem) — similarly inefficient

A deque or circular array (fixed capacity) are more efficient for a queue. Linked list is slower due to frequent allocations.

[[python cookbook]] P659 compared these and other alternatives. The fastest is collections.deque.

2 uses@q[del] ] python


  • Usage 1: in-place delete from a container — dict and list (usually inefficient)
    • You can also slice-delete from a list
    • You can even del mylist[start:end:stride]
  • Usage 2: remove a local(or global) name such as MyClass, myVar
    • the name is erased from the symbol table
    • dir() will no longer show it
    • usage? metaprogramming might use runtime introspection on dir()

python random-generate/shuffle list@int

I now think the simplest is xrange() followed by random.shuffle. Don’t bother with random.sample(). See my github code.

https://stackoverflow.com/questions/22842289/generate-n-unique-random-numbers-within-a-range shows

https://github.com/tiger40490/repo1/blob/py1/py/array/qsort.py uses

random.sample(xrange(99, 100), 19)

In general, below construct is useful because sampleSize must never exceed size of the range i.e. the population:

random.sample(xrange(-99, -99+sampleSize), sampleSize)

I think the generated list has no duplicates, so I had to manually create some duplicates. I guess random.shuffle can work on a list containing duplicate…

–to generate an array of 8 integers between 0 and 100

>>> import random
>>> random.sample(xrange(100), 8)
[39, 53, 1, 80, 54, 61, 4, 26]

3groupsOf3digits #YH #li.remove()..

http://entrepidea.com/blogs/tech/index.php/2017/06/02/three-3-digit-numbers/ is the original blog post by my friend

Q: Find three 3-digit numbers with a ratio of 1:2:3;
These numbers pick their 3 digits from a range of 1 to 9;
All digits that form these numbers must be completely unique, i.e. you must use up all distinct 9 digits. For example, a set of 123:246:369 doesn’t qualify.

https://github.com/tiger40490/repo1/blob/py1/py/miscIVQ/3groupsOf3digits.py is my solution, not necessarily smart.

Shows list.remove(anyValue)
shows default return value is None
shows sys.exit()
shows no main() needed, to save screen real estate

I see two distinct constraints.
* The obvious — number2 and number3 must be 2x and 3x and they must use the remaining 6 digits.
* The other constraint — number1 must use 3 distinct digits.

Real challenge is the first step — iteration to generate the first 3-digit number without checking the 2x and the 3x numbers. My implementation of it is basically the code outside the functions.

Now I think the simpler iteration is to take each numbers 123..329 and check each the number the same way now. I would reuse the “check” routine — better code reuse. Performance would only be significantly slower when there are many many disqualified integers between 123 and 329 i.e. a sparse array.

In contrast my iteration won’t waste time on those disqualified. So I consider it a little bit clever. My iteration took 20-40 minutes to write. Once this is done, the check on the 2x and 3x is much simpler.

python initializing list^dict differently #ECT

Favor {} to dict() due to performance. Also dict() can’t take numbers as keys 😦

However, for simple dict, dict() avoid quoting

pprint({1:’value1′,   3+2:’value2′})

Empty [] is faster than list() but I prefer list() because

See https://stackoverflow.com/questions/5790860/and-vs-list-and-dict-which-is-better

##ECT tips@python triple quote #print

  • 3-single-quote or 3-double-quote are the  same thing.  Single is faster to type 🙂
  • trick: You can print a triple-quoted multi-line string
  • trick: I also use this for multi-line comments, as a standard practice — https://www.python.org/dev/peps/pep-0008/#documentation-strings
  • trick: You can construct formatted multi-line strings from a list of value, or from a dict such as locals()

python ‘global myVar’ needed where@@

Suppose you have a global variable var1 and you need to “use” it in a function f1()

Note Global basically means module-level. There’s nothing more global than that in python.

Rule 1a: to read any global variable in any function you don’t need “global”. I think the LGB rule applies.

Rule 1b: to call a mutator method on a global object, you don’t need “global”. Such an object can be a dict or a list or your custom object. Strings and integers have no mutators!

Rule 2: to rebind the variable to another object in memory (i.e. pointer reseat), you need “global” declaration to avoid compilation error.

Note you don’t need to declare the variable outside all functions. Just declare it global inside the rebinding functions

python dict bad lookup key: solutions

A common scenario. myDict[‘baaadKey’] throws exception. https://stackoverflow.com/questions/3483520/use-cases-for-the-setdefault-dict-method compared the solutions.

My use case — if any key is not found, return a const default value.

— 1) Solution: defaultdict class. To my surprise, my use case is not easily supported. Need to define a factory method.

— 2) Solution: mydict.setdefault(myKey, myDefault) of the regular dict class. Note this solution is similar to the get() solution below, and it does NOT set a single default for the entire dict.

https://www.programiz.com/python-programming/methods/dictionary/setdefault explains setdefault() with clear examples, but I can’t remember the multiple rules so I won’t use setdefault in coding tests.

— 3) simplest Solution: mydict.get(myKey, myDefault).


common string tasks: IV+GTD

(Let’s be imprecise here… Don’t sweat the small stuff.)

We should be able to perform all of these using c-string, std::string (limited adoption since c++98), the standard string in java , c#, perl, python, php. This is a master list. Tolerate multiple names on Each task.

See basic tasks on https://bintanvictor.wordpress.com/2018/01/26/22tasksarraystrdictq-algoiv/

use string iterator with STL algorithms

–the easy

convert to vector and apply vector tricks
convert to std::string and apply tricks
count how many times a substr occurs
sort content

convert sequence@obj→strings→concat #listCompr

P60 [[cookbook]] shows a neat trick

>>> data = [‘aa’, 50, 91.1]
>>> ‘, ‘ . join(str(d) for d in data)
‘aa, 50, 91.1’

Technique: generator expression,
Technique: str() conversion ctor. Without conversion, joining string with non-string leads to exceptions.
Technique: calling join() method on the delimiter string

The author points out that string concat can be very inefficient if you blindly use “+” operator. Similarly, java/dotnet offers stringBuilders, and c++ offers stringstream

python pseudo constructors4everyday tasks #[]^()

These Python builtin functions have something in common:

* pseudo [1] constructors — manufacture an object of the specified type
* conversion constructors — converting some input value into an object of the specified type

[1] Actually builtin functions rather than ctor. I guess the fine differences between builtin functions, keywords and operators are not that important at this stage.

Because these are functions, they use “()”. The real list/dict ctors use other brackets.

P64 [[essential ref]] lists these and more, as “type conversion” functions.

– str()
– dict()
– list() — see py list initialize []^list()
– tuple()
– set()
– int()

– file() — very similar to open()

##python – convert btw dict/tuple/list and string

* any dict/tuple/list ==> serialize to string? use repr() or backquote, or str()
** how about deserialize from string? use eval()

list/tuple ==> dict? dict(enumerate(li))
[key,val,key,val…] list/tuple ==> dict? dict() ctor
list/tuple ==> list? list() ctor
list/tuple ==> tuple? tuple() ctor
frozenset ==> set? set() ctor

— in real projects, half the non-trivial data conversions involves dict —
dict ==> list of keys? myDict.keys()
dict ==> list of values? myDict.values()
dict ==> list of pairs? myDict.items()

random list-of-pairs ==> various “roll-up” dictionaries? See defaultdict(list) and defaultdict(set) on http://docs.python.org/release/2.5.2/lib/defaultdict-examples.html

colon^semi-colon for python compound statement

Python does use semi-colon but not curly braces in compound statements.

if condition: do_something(); do_something_else() # is same as

if condition:

https://stackoverflow.com/questions/8236380/why-is-semicolon-allowed-in-this-python-snippet explains semicolon using official docu

python for-loop: string,file,dict,args,dir …

Following the Unix philosophy, Python’s for-in loop is a simple idea (iterator) pushed to the max. It supports
– iterating chars in a ….. string
– iterating lines in a …. file
– iterating integers in a range() or xrange()
– iterating …. keys() values() in dict —
– iterating …. key/value pairs in a dict.items()
– iterating sys.argv on command line
– list, tuple
– retrieving pairs from list-of-pairs — idiom ## this example also illustrates defaultdict(list)

>>> s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
>>> d = defaultdict(list) # construct empty defaultdict .. "list" is some mysterious default_factory
>>> for k, v in s:
...     d[k].append(v)
>>> d.items()
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]

More generally, any class supporting iteration can use for-loop. Here’s an example illustrating some of them.

import re, sys
from os import walk
for (path, dirs, files) in walk(“c:\\”) :
for filename in files :
if not re.search(“\.py$”,filename) : continue
if not printed.has_key(path):
print ” path = ” + path
printed[path] = True

for line in open (path+’\\’+filename) :
if re.search(‘^\s*class\s’, line) : print filename + ‘:\t’ + line,

clear a python list: slice-delete vs create empty list

— based on http://www.daniweb.com/software-development/python/threads/117599

data = []
temp = []
for x in range(2) :
    # temp list populated from a file
    #### now we need to empty temp
    temp = []  # reseating, may need “global”
    # del temp[:]  # bug

Look at the last line. What’s going on? Well, temp=[] would Instantiate a new list object, disconnected to the lists already saved as data[0]. Java{temp=new List(); }. (C++ would treat it as assignment but here treated as rebinding). Variable “temp” rebinds to the new object. The reference count on the original list object drops to 1 since only data[0] points to it

But del temp[:] is different. This is same as java{ temp.del(…); } or temp.empty_myself_set_length_to_zero() i.e. _edit_in_place_. Variable “temp” remains bound to the same object as before. Therefore, the list already saved in data[0] is now emptied! Both data[0] and temp are nicknames for the same list object, whose reference count remains 2. Alternatively,

temp[:] = [] # same as del but more versatile — “overwrite_entire_list
java{ temp.replace(range_start ,range_end , new_content); }