iterating chars in a string

for i in range(len(myString)): # is more versatile as I can access the index and, on a list copy of the string, also change each char

• a Node class
• a dumper function for one node, showing prev/next links, even if unset
• a dumper function for entire list

A homemade implementation is more flexible. Node1.next -> Node2 but Node2.prev may not point to Node1
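A minimal sketch of the three pieces listed above, assuming a doubly linked list. The names Node, dump_node and dump_list are hypothetical, not taken from the original code, and the sketch uses print() as a function so it also runs on Python 3.

```python
class Node(object):
    """One node of a homemade doubly linked list."""
    def __init__(self, data):
        self.data = data
        self.prev = None  # unset until someone links it
        self.next = None

def dump_node(node):
    """Describe one node, showing prev/next links even if unset."""
    prev_data = node.prev.data if node.prev is not None else None
    next_data = node.next.data if node.next is not None else None
    return "%r <- [%r] -> %r" % (prev_data, node.data, next_data)

def dump_list(head):
    """Describe every node reachable from head via .next links."""
    lines = []
    seen = set()  # guard against cycles in a hand-linked list
    node = head
    while node is not None and id(node) not in seen:
        seen.add(id(node))
        lines.append(dump_node(node))
        node = node.next
    return lines

n1, n2 = Node("A"), Node("B")
n1.next = n2  # n2.prev deliberately left unset
print("\n".join(dump_list(n1)))
```

The dump makes the asymmetry visible: the first node points forward to B, while B shows no backward link.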

I find myself thinking in c++ while writing python for loops. Inefficient.

for i in range(0,len(mystrLiTup),1): # explicit is best
    mystrLiTup[i]...

What if you need to adjust the index “i” in the loop? A transitional construct (until we get rid of c++ thinking):

mystrLiTup='sentence. with. periods. end'
i=-1
while True:
    i+=1 # increment explicitly
    if i >= len(mystrLiTup): break
    print mystrLiTup[i]
    if mystrLiTup[i] == '.':
        i+=1

mystr[:-2] copy_truncate last 2 chars #mystr[2:]

• Python string (tuple too) is immutable, so mystr[:-2] returns a copy with last 2 chars truncated
• Even for a mutable list, this slicing syntax returns a copy.
• …. This seems to be the best syntax to truncate list, string and tuple.

mystr[2:] # As expected, this clones then chops first 2 chars

— If you want to truncate a list in-place, use the q(del) keyword (not a function)

Syntax is easy for single-element deletion. Tricky for slice deletion

del: in-place truncate (list only; tuple and str are immutable)
var[:-2]: tested, remove_copy LAST 2 (list, tuple, str)
var[2:]: tested, remove_copy FIRST 2 (list, tuple, str)
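A short sketch contrasting the two techniques in this post: slicing, which returns a truncated copy, and del, which edits a list in place. The variable names and sample values are made up for illustration.

```python
mystr = "abcdef"
mytuple = (1, 2, 3, 4, 5)
mylist = [1, 2, 3, 4, 5]

# Slicing returns a truncated copy for all three types.
assert mystr[:-2] == "abcd"        # last 2 removed
assert mytuple[2:] == (3, 4, 5)    # first 2 removed
shorter = mylist[:-2]
assert shorter == [1, 2, 3] and mylist == [1, 2, 3, 4, 5]  # original untouched

# del edits a list in place: single element first, then a slice.
alias = mylist            # second name for the same list object
del mylist[0]             # single-element deletion
assert mylist == [2, 3, 4, 5]
del mylist[-2:]           # slice deletion: drop the last 2
assert mylist == [2, 3] and alias is mylist
```

Because alias and mylist name the same object, the in-place deletion is visible through both names.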

STOP arg to range()/slice: simplistic rule

In the simplest usage, START is 0 and STEP is 1 or -1.

…. If STOP is 5 then five integers are generated. If used in a for loop then we enter the loop body five times.

In a rare usage (avoid such confusion in a coding test!), STEP is neither 1 nor -1, or START is not zero, so STOP is used in something like “if generated_candidate >= STOP then exit before entry into loop body” (with <= instead of >= when STEP is negative).

I think slicing operator is exactly the same. See https://github.com/tiger40490/repo1/blob/py1/py/slice%5Erange.py

word[:2]    # The first two characters
word[2:]    # Everything except the first two characters
s[:i] + s[i:] equals s
length of word[1:3] is 3-1==2
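The rules above can be checked with a few assertions. This is a small sketch with made-up sample values; it wraps range() in list() so it behaves the same on Python 2 and Python 3.

```python
# Simplest usage: START is 0 and STEP is 1, so STOP equals the item count.
simple = list(range(5))
assert simple == [0, 1, 2, 3, 4] and len(simple) == 5

# Rare usage: STOP acts as an exit test applied before each entry into the body.
ascending = list(range(2, 11, 3))     # next candidate 11 >= STOP, so it is never produced
descending = list(range(10, 0, -4))   # next candidate -2 <= STOP, so it is never produced
assert ascending == [2, 5, 8]
assert descending == [10, 6, 2]

# Slicing follows the same half-open convention.
word = "python"
assert word[:2] == "py" and word[2:] == "thon"
assert all(word[:i] + word[i:] == word for i in range(len(word) + 1))
assert len(word[1:3]) == 3 - 1
```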

minimal queue ] python #stack=ez

In a coding test, I will use a vanilla list for both.

— Stack can use

• list.append()
• list.pop() with no arg.
• Top of stack is list[-1]

— Queue:  https://github.com/tiger40490/repo1/blob/py1/py/tree/bTreeDftBftSerialize_bbg.py shows a primitive (inefficient) Queue class I wrote:

• dequeue — list.pop()
• A non-queue operation — pop(2) would remove and return the 3rd vector item but this is inefficient on a vector!
• enqueue — list.insert(0, newItem) — similarly inefficient

A deque or a circular array (fixed capacity) is more efficient for a queue.
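A sketch of the stack and queue operations described above, plus the collections.deque alternative from the standard library. The variable names are illustrative only.

```python
from collections import deque

# Stack on a vanilla list: push and pop at the tail.
stack = []
stack.append("a")
stack.append("b")
top = stack[-1]            # peek without removing
popped = stack.pop()       # removes "b"

# Primitive queue on a vanilla list, as described above:
# enqueue at the head (every item shifts), dequeue from the tail.
slow_queue = []
slow_queue.insert(0, "first")
slow_queue.insert(0, "second")
served_slow = slow_queue.pop()       # "first" leaves first (FIFO)

# deque: both ends are cheap.
fast_queue = deque()
fast_queue.append("first")           # enqueue at the right
fast_queue.append("second")
served_fast = fast_queue.popleft()   # dequeue from the left
```

In a coding test the vanilla list is usually enough; the deque version only matters when the queue grows large.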

Many details I tend to forget:

• the actual data can be either a tuple or dict. The dict version is powerful but less known
• if there’s just one item in the tuple, you don’t need parentheses: myStr = "%d" % var1
• for a tuple, the format specifier count must match the tuple length
• for a dict, each format specifier must name a valid key.
myStr = "%(var1)d" % locals() # locals() returns a dict including var1
• There are at least two q(%) in the expression below
• extra parentheses after the 2nd % are required:

"#%03d" % (id(node)%1000)  # return last 3 digits of object id, with 0-padding
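A compact sketch of the formatting details above. The sample values are invented, and obj_id is a fixed stand-in for id(node) so the result is predictable.

```python
var1 = 7
pair = (3, "apples")
record = {"name": "node", "count": 42}   # hypothetical sample data

single = "%d" % var1                             # one value: parentheses optional
from_tuple = "%d %s" % pair                      # specifier count matches tuple length
from_dict = "%(name)s has %(count)d" % record    # each specifier names a key

obj_id = 1234005                                 # stand-in for id(node)
short_id = "#%03d" % (obj_id % 1000)             # inner % is modulo, outer % is formatting

# Without the extra parentheses the expression is evaluated left to right
# and the second % tries to format a string that has no specifier left.
try:
    "#%03d" % obj_id % 1000
    needs_parens = False
except TypeError:
    needs_parens = True
```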

choose python^c++for cod`IV

1) Some hiring teams have an official policy — focus on coding skills and let candidate pick any language they like. I can see some interviewers are actually language-agnostic.

2) 99% of the hiring teams also have a timing expectation. Many candidates are deemed too slow. This is obvious in online coding tests. We all have failed those due to timing. (However, I guess on white-board python is not so much faster to code.)

If these two factors are important to a particular position, then python is likely better than c++ or java or c#.

• Your code is shorter and easier to edit. No headers to include.
• Compiler errors are shorter
• c++ pointer or array indexing errors can crash without error message. Dynamic languages like python always give an error message.
• STL iterator can become end() or invalidated. Result would often look normal, so we can spend a long time struggling with a hidden bug.
• Edit-Compile-Test cycle is reduced to Edit-Test
• no uninitialized variables
• Python offers some shortcuts to tricky c++ tasks, such as
1. string: split,search, and many other convenient features. About 1/3 of the coding questions require non-trivial string manipulation.
2. vector (a.k.a List): slicing, conversion(to string etc), insertion, deletion…, are much faster to write. Shorter code means lower chance of mistakes
3. For debugging: Easy printing of vector, map and nested containers. No iteration required.
4. easy search in containers
5. iterating over any container or iterating over the characters of a string — very easy. Even easier than c++11 for-loop
6. Dictionary lookup failure can return a default value
7. Nested containers. Basically, the more complex the data structure , the more time-saving python is.
8. multiple simultaneous return values — very very easy in python functions.
9. a python function can return a bool half the time and a list otherwise!
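A few of the shortcuts in the numbered list can be seen in one short sketch. The sentence and the helper min_max are made up for illustration; print() is used as a function so the sketch also runs on Python 3.

```python
def min_max(values):
    """Return two values at once; the caller unpacks them."""
    return min(values), max(values)

sentence = "the quick brown fox jumps over the lazy dog"
words = sentence.split()                   # item 1: string splitting
has_fox = "fox" in words                   # item 4: search in a container

counts = {}
for w in words:                            # item 5: direct iteration
    counts[w] = counts.get(w, 0) + 1       # item 6: lookup with a default value

lo, hi = min_max([len(w) for w in words])  # item 8: multiple return values
nested = {"short": words[:2], "stats": (lo, hi)}
print(nested)                              # item 3: nested containers print directly
```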

If the real challenge lies in the algorithm, then solving it in any language is equally hard, but I feel timed coding tests are never that hard. A Facebook seminar presenter emphasized that across tech companies, every single coding problem is always, always solvable within the time limit.

next_Perm@3boys out@5 #80%

algo-practice: generate permutation@3, using5distinct chars

Such a skill might be needed in some coding IV sooner or later. Let’s write this in py or c++. Formally,

Q1: generate all permutations of 3, from 5 distinct chars, in any order.
Q2: generate all permutations of 3, from 5 distinct chars, in ascending order. You can sort the 5 chars first.
Q2b: once you have generated one permutation, how do you identify The next?

Note the same solution is a generalization of std::next_permutation(), so once you have this solution you have that solution.

–algorithm 1 for Q1

• 1st generate 5 single-char strings;
• then for each generate 4 two-char strings. We get 20 strings.
• then for each of the 20, generate 3 three-char strings. We get 60 strings.

–algorithm 1b for Q1: rec(5,3) will call rec(5,2). rec(5,2) has 20 items, each of which can generate 3 items for rec(5,3), because each item has 2 characters and a void to be filled by one of the 3 unused chars.

Each of the 20 items returned should be a pair{vector of 2, vector of 3}: the 2 chars already placed and the 3 still unused.

This produces a sorted collection:)
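One possible reading of algorithm 1b, as a sketch rather than the definitive solution. The signature rec(chars, k) and the sample chars are assumptions; each returned item is a (used, unused) pair as suggested above. itertools.permutations is imported only as a cross-check.

```python
from itertools import permutations

def rec(chars, k):
    """Return all k-permutations of sorted chars as (used, unused) pairs."""
    if k == 0:
        return [([], list(chars))]
    result = []
    for used, unused in rec(chars, k - 1):
        for i, c in enumerate(unused):
            # fill the void with each unused char, in ascending order
            result.append((used + [c], unused[:i] + unused[i + 1:]))
    return result

chars = sorted("badce")   # 5 distinct chars, sorted first
perms = ["".join(used) for used, _ in rec(chars, 3)]
expected = ["".join(p) for p in permutations(chars, 3)]
```

Because the unused chars stay sorted at every level, the output comes out in ascending order, which addresses Q2 as well as Q1.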

[] in python vs ( )

Bracket [[[[ ]]]] is used for
– fast list initialization
– access a dict/tuple/str/list including ..
– slicing — for any sequence structure

Parenthesis (((( ))) is used for
* tuple initialization
* Not needed for pair assignment

python initializing list/dict : perf

Empty {} is faster than dict()

Empty [] is faster than list() but I prefer list() because

python generate list@random integers

https://stackoverflow.com/questions/22842289/generate-n-unique-random-numbers-within-a-range has a one-liner. I implemented it in

https://github.com/tiger40490/repo1/blob/py1/py/array/qsort.py

random.sample(xrange(1, 100), 19) # 19 distinct integers from 1 to 99; the population must be at least as large as the sample

random.sample(xrange(-99, -99+sampleSize), sampleSize)

random.sample() draws without replacement, so the generated list has no duplicates, and I had to manually create some duplicates.
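A sketch of three ways to build a random test list, assuming Python 3, where range() replaces xrange(). The sample size and value range are arbitrary choices for illustration.

```python
import random

sample_size = 19   # hypothetical size

# Distinct integers: sample without replacement from a range.
unique_ints = random.sample(range(-99, 100), sample_size)

# Duplicates possible: draw each integer independently.
maybe_dupes = [random.randint(-99, 99) for _ in range(sample_size)]

# Duplicates guaranteed: repeat part of the unique sample, then shuffle.
with_dupes = unique_ints + unique_ints[:3]
random.shuffle(with_dupes)
```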

python triple quote

• trick: You can print a triple-quoted multi-line string
• I also use this for multi-line comments, as a standard practice — https://www.python.org/dev/peps/pep-0008/#documentation-strings
• trick: You can construct formatted multi-line strings from a list of values, or from a dict such as locals()
• 3-single-quote or 3-double-quote are the  same thing
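A small sketch of the tricks above: a triple-quoted docstring, a multi-line template filled from a dict, and the equivalence of the two quote styles. The function describe and its arguments are hypothetical.

```python
def describe(host, port):
    """A triple-quoted string used as a docstring."""
    return """Server report
  host: %(host)s
  port: %(port)d
""" % {"host": host, "port": port}

report = describe("localhost", 8080)
same = '''abc''' == """abc"""   # both quote styles build the same str
print(report)
```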

python list append #3ways

Note there could be some useful python tips in the pearl blog. Perhaps it’s better to move the z_py posts there.

li + [1,2,3] # can be used directly in a for loop

### all solutions below return None!

li.append(1) # one argument exactly. To append more than one, use extend:

li.extend( [1,2] )

li.extend(list2)
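The three ways side by side, as a sketch with made-up values. It also shows the common slip of passing a list to append().

```python
li = [0]
list2 = [8, 9]

combined = li + [1, 2, 3]       # builds a new list; li itself is unchanged
ret_append = li.append(1)       # in place, returns None
ret_extend = li.extend(list2)   # in place, returns None
li.append([5, 6])               # appends ONE element, which is itself a list
```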

python bisect #cod`IV

The bisect module is frequently needed in coding tests esp. codility. In this write-up, I will omit all other function parameters.

* bisect.bisect_right(x)  # less useful … returns an index i such that
all(val <= x for val in a[lo] to a[i-1]) for the left side and all(val > x for val in a[i] to a[hi-1]) for the right side.
* bisect.bisect_left(x) # returns an index i such that
all(val < x for val in a[lo] to a[i-1]) for the left side and all(val >= x for val in a[i] to a[hi-1]) for the right side.

In other words,

• bisect_left(needle) returns the index of the first element above or matching needle (or len(a) if there is none).
• bisect_right(needle) returns the index of the first element above needle (or len(a) if there is none).

A few scenarios:

1. If No perfect hit, then same value returned by both functions.
• Common scenario: if needle is higher than all, then “i” would both be the last index + 1.
• Common scenario: if the needle is lower than all, then “i” would both be 0
• in all cases, You can always insert Before this position
2. If you get a perfect hit on a list value, bisect_left would return that “perfect” index, so bisect_left() is more useful than bisect_right(). I feel this is similar to std::lower_bound
• This is confusing, but bisect_right() would return a value such that a[i-1] == x, so the returned “i” value is higher. Therefore, bisect_right() would never return the “perfect” index.
3. If you have a lower-bound input value (like minimum sqf) that might hit, then use bisect_left(). If it returns i, then all list elements qualify from i to end of list
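4. If you have an upper-bound input value that might hit, you can still use bisect_left(). If it returns i, then list values from 0 to i-1 are strictly below the bound, and a[i] qualifies only on a perfect hit (along with any duplicates after it). bisect_right() returns the exclusive end of the qualifying range directly, though I never use it.
5. Note the slicing syntax in python a[lo] to a[i-1] == a[lo:i] where the lower bound “lo” is inclusive but upper bound “i” is exclusive.

import bisect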
needle = 2
float_list = [0, 1, 2, 3, 4]
left = bisect.bisect_left(float_list, needle)
print 'left (should be lower) =', left # 2

right = bisect.bisect_right(float_list, needle)
print 'right (should be higher) =', right # 3
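The scenarios above, including duplicates, can be checked with a short sketch. The list sqf_list is invented sample data.

```python
import bisect

sqf_list = [500, 700, 700, 900, 1200]   # hypothetical sorted data

# No perfect hit: both functions agree.
miss_left = bisect.bisect_left(sqf_list, 800)
miss_right = bisect.bisect_right(sqf_list, 800)
below_all = bisect.bisect_left(sqf_list, 100)     # needle lower than all
above_all = bisect.bisect_right(sqf_list, 9999)   # needle higher than all

# Perfect hit with duplicates.
left = bisect.bisect_left(sqf_list, 700)     # first matching index
right = bisect.bisect_right(sqf_list, 700)   # one past the last match

at_least_700 = sqf_list[left:]    # lower bound: qualify from i to end of list
at_most_700 = sqf_list[:right]    # upper bound: a[0] to a[i-1] qualify
```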

python ‘global myVar’ needed where@@

Suppose you have a global variable var1 and you need to “use” it in a function f1()

Note Global basically means module-level. There’s nothing more global than that in python.

Rule 1a: to read any global variable in any function you don’t need “global”. I think the LGB rule applies.

Rule 1b: to call a mutator method on a global object, you don’t need “global”. Such an object can be a dict or a list or your custom object. Strings and integers have no mutators!

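Rule 2: to rebind the variable to another object in memory (i.e. pointer reseat), you need a “global” declaration. Without it, the assignment creates a new local variable instead, and reading the name earlier in the same function raises UnboundLocalError at run time (not a compilation error). This situation is rare in my projects.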
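The three rules in one runnable sketch, assuming the code sits at module level. All names (counter, registry and the four functions) are hypothetical.

```python
counter = 0      # module-level (global) variables
registry = []

def read_only():
    return counter + 1            # Rule 1a: reading needs no declaration

def mutate():
    registry.append("x")          # Rule 1b: a mutator call needs no declaration

def rebind():
    global counter                # Rule 2: required before rebinding
    counter = counter + 10

def rebind_without_global():
    try:
        counter = counter + 10    # counter is local here, and read before assignment
    except UnboundLocalError:
        return "UnboundLocalError"

first_read = read_only()
mutate()
rebind()
outcome = rebind_without_global()
```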

python dict bad lookup key: solutions

A common scenario. myDict[‘baaadKey’] throws exception. https://stackoverflow.com/questions/3483520/use-cases-for-the-setdefault-dict-method compared the solutions.

My use case — if any key is not found, return a const default value.

Solution: defaultdict class. To my surprise, my use case is not easily supported. Need to define a factory method.

Solution: mydict.setdefault(myKey, myDefault) of the regular dict class. Note this solution is similar to the get() solution and it does NOT set a single default for the entire dict.

Solution: mydict.get(myKey, myDefault).
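The three solutions compared in a sketch. DEFAULT and the sample dict are made up; the lambda is one possible factory for defaultdict, which expects a callable taking no arguments.

```python
from collections import defaultdict

DEFAULT = "N/A"     # hypothetical constant default
plain = {"a": 1}

via_get = plain.get("baaadKey", DEFAULT)   # no exception, dict unchanged
size_after_get = len(plain)

via_setdefault = plain.setdefault("baaadKey", DEFAULT)   # returns the default AND inserts the key
size_after_setdefault = len(plain)

dd = defaultdict(lambda: DEFAULT)   # the factory supplies the default for any missing key
dd["a"] = 1
via_defaultdict = dd["baaadKey"]    # missing key: factory called, key inserted
```

For a read-only lookup with a constant default, get() is the only one of the three that leaves the dict untouched.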

python – convert sequence of obj to strings then concat

P60 [[cookbook]] shows a neat trick

>>> data = ['aa', 50, 91.1]
>>> ', '.join(str(d) for d in data)
'aa, 50, 91.1'

Technique: generator expression,
Technique: str() conversion ctor. Without conversion, joining string with non-string leads to exceptions.
Technique: calling join() method on the delimiter string

The author points out that string concat can be very inefficient if you blindly use “+” operator. Similarly, java/dotnet offers stringBuilders, and c++ offers stringstream

These Python builtin functions have something in common:

* pseudo [1] constructors — manufacture an object of the specified type
* conversion constructors — converting some input value into an object of the specified type

[1] Actually builtin functions rather than ctor. I guess the fine differences between builtin functions, keywords and operators are not that important at this stage.

P64 [[essential ref]] lists these and more, as “type conversion” functions.

– str()
– dict()
– list() — see py list initialize []^list()
– tuple()
– set()
– int()

– file() — very similar to open()

process user input as string or as number

Perl makes it too easy …

Strong-typed languages require explicit conversions between strings and numbers.

Python also requires conversion —

int(myStr)
float(myStr)

str(myNumber), repr(myNumber), modulo (%): all convert a number to a string, and give the same result for plain integers
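A sketch of the explicit conversions in both directions. The helper parse_input is hypothetical; it tries int() first, then float(), and otherwise keeps the text as a string.

```python
def parse_input(text):
    """Return an int if possible, else a float, else the stripped string."""
    text = text.strip()
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass   # not this numeric type; try the next one
    return text

as_int = parse_input(" 42 ")
as_float = parse_input("3.5")
as_text = parse_input("hello")

# Three ways back to a string; identical for a plain integer.
back_to_str = [str(as_int), repr(as_int), "%d" % as_int]
```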

##python – convert btw dict/tuple/list and string, briefly

* any dict/tuple/list ==> serialize to string? use repr() or backquote, or str()
** how about deserialize from string? use eval()

list/tuple ==> dict? dict() ctor
list/tuple ==> list? list() ctor
list/tuple ==> tuple? tuple() ctor
frozenset ==> set? set() ctor

— in real projects, half the non-trivial data conversions involve dict —
dict ==> list of keys? myDict.keys()
dict ==> list of values? myDict.values()
dict ==> list of pairs? myDict.items()

random list-of-pairs ==> various “roll-up” dictionaries? See defaultdict(list) and defaultdict(set) on http://docs.python.org/release/2.5.2/lib/defaultdict-examples.html
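A round-trip sketch of the conversions above. Note one substitution: it uses ast.literal_eval() from the standard library in place of eval(), since it accepts only literals and is the safer choice for untrusted text. The sample data is invented.

```python
import ast

original = {"k": [1, 2, (3, 4)]}
text = repr(original)                 # serialize to a string
restored = ast.literal_eval(text)     # deserialize; like eval() but literals only

pairs = [("a", 1), ("b", 2)]
as_dict = dict(pairs)                 # list of pairs ==> dict
as_tuple = tuple(pairs)               # list ==> tuple
as_set = set(frozenset([1, 2, 2]))    # frozenset ==> set

keys = sorted(as_dict.keys())         # dict ==> keys
values = sorted(as_dict.values())     # dict ==> values
items = sorted(as_dict.items())       # dict ==> pairs
```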

python for-loop: string,file,dict,args,dir …

Following the Unix philosophy, Python’s for-in loop is a simple idea (iterator) pushed to the max. It supports
– iterating chars in a ….. string
– iterating lines in a …. file
– iterating integers in a range() or xrange()
– iterating …. KEYs in a dict — values requires explicit look-up
– iterating …. key/value pairs in a dict.items()
– iterating sys.argv on command line
– list, tuple
– retrieving pairs from list-of-pairs — idiom ## this example also illustrates defaultdict(list)

>>> from collections import defaultdict
>>> s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
>>> d = defaultdict(list)
>>> for k, v in s:
...     d[k].append(v)
...
>>> d.items()
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]

More generally, any class supporting iteration can use for-loop. Here’s an example illustrating some of them.

import re, sys
from os import walk
printed={}
for (path, dirs, files) in walk("c:\\") :
    for filename in files :
        if not re.search(r"\.py$",filename) : continue  # only *.py files
        if path not in printed:
            print " path = " + path
            printed[path] = True

        for line in open (path+'\\'+filename) :
            if re.search(r'^\s*class\s', line) : print filename + ':\t' + line,

jagged^matrix(2D-array) syntax: C/j/py/C#

There exist only 2 types of 2D arrays across these 3 languages — C, java and c#. Across these languages the 2 types can be compared but the syntax … better don’t compare.

C uses vastly different syntax between
– array of pointers — i.e. jagged.
– [a][b] matrix. Column (and row) size is fixed and permanent. Total a*b storage space allocated, used or unused. Unused space is sometimes padded?
** arr[a,b] // this code strangely compiles but means arr[b]. This confusing syntax is completely unrelated to 2D array.

A Java 2D array is jagged,  _ a l w a y s _. Implemented as an array of pointers. Syntax is…. [a][b], a departure from C. Java has no built-in support for matrix

C# 2D arrays are really 2 unrelated constructs with similar syntax. The Jagged[a][b] vs the Matrix[a, b]. See http://www.dotnetperls.com/jagged-2d-array-memory

Python can support arr[3][2]. See https://stackoverflow.com/questions/6667201/how-to-define-a-two-dimensional-array-in-python

In terms of syntax evolution, java jagged took the c matrix syntax, and C# jagged inherited java syntax.

Therefore the c# designers faced a dilemma whether to follow c or java syntax. They chose java and not c.

In summary, the more useful jagged construct has this syntax
*arrOfPointer[3] // C
arr[][] //java
arr[][] //c#

A matrix in linear algebra is a rectangular 2D array, built-in with c and c#, not java.
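For the Python case mentioned earlier, a list of lists gives the arr[i][j] syntax. Since the outer list holds references to independent inner lists, it is jagged by nature. A sketch with arbitrary sizes, including one well-known pitfall:

```python
rows, cols = 3, 2

# Rectangular "matrix": one independent inner list per row.
matrix = [[0 for _ in range(cols)] for _ in range(rows)]
matrix[2][1] = 7     # double indexing, like arr[a][b]

# Jagged: inner lists may have different lengths.
jagged = [[1], [2, 3], [4, 5, 6]]
row_lengths = [len(row) for row in jagged]

# Pitfall: multiplying the outer list copies references to ONE inner list.
aliased = [[0] * cols] * rows
aliased[0][0] = 9    # shows up in every row
```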

clear a python list: slice-delete vs create empty list

data = []
temp = []
for x in range(2) :
    # temp list populated from a file
    data.append(temp)
    #### now we need to empty temp
    temp = []  # reseating, may need “global”
    # del temp[:]  # bug

Look at the last two lines. What’s going on? Well, temp=[] would instantiate a new list object, disconnected from the lists already saved in data. Java{temp=new List(); }. (C++ would treat it as assignment, but here it is treated as rebinding.) Variable “temp” rebinds to the new object. The reference count on the original list object drops to 1 since only data[0] points to it.

But del temp[:] is different. This is same as java{ temp.del(…); } or temp.empty_myself_set_length_to_zero() i.e. _edit_in_place_. Variable “temp” remains bound to the same object as before. Therefore, the list already saved in data[0] is now emptied! Both data[0] and temp are nicknames for the same list object, whose reference count remains 2. Alternatively,

temp[:] = [] # same as del but more versatile: “overwrite_entire_list”
java{ temp.replace(range_start ,range_end , new_content); }
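The difference between rebinding and in-place clearing, as a runnable sketch. The function collect and its flag are hypothetical; appending x stands in for populating temp from a file.

```python
def collect(clear_in_place):
    """Append temp to data twice, emptying temp by one of two techniques."""
    data = []
    temp = []
    for x in range(2):
        temp.append(x)        # stand-in for "populated from a file"
        data.append(temp)
        if clear_in_place:
            del temp[:]       # same object emptied, so the saved entries are emptied too
        else:
            temp = []         # rebind to a fresh list; the saved lists survive
    return data

rebound = collect(clear_in_place=False)
emptied = collect(clear_in_place=True)
```

With rebinding the saved lists keep their contents; with del temp[:] both entries of data end up as the same empty list object.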