##advanced python topics #!! IV

I read a few books and sites listing “advanced python” topics, but they don’t agree on what features are important.

Anyone can list 20 (or 100) obscure and non-trivial python features and call it a “list of advanced python features”.

  • Mixin/AbstractBaseClass, related to Protocols
  • Protocol, as an informally defined Interface
  • coroutines?
  • Futures for async? python3.2 😦
  • Properties? Similar to c# properties; provides better encapsulation than __myField3
  • q[yield] keyword
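The “Properties” bullet can be illustrated with a minimal sketch (the Account class and its non-negative rule are hypothetical). Callers read and assign acct.balance like a plain field, while the class keeps the storage in _balance and runs a check on every assignment.

```python
class Account:
    """Hypothetical example class; _balance is the conventional 'private' storage."""
    def __init__(self, balance):
        self._balance = balance

    @property
    def balance(self):               # read access looks like a field: acct.balance
        return self._balance

    @balance.setter
    def balance(self, value):        # validation runs on every assignment
        if value < 0:
            raise ValueError("balance cannot be negative")
        self._balance = value

acct = Account(10)
acct.balance = 5                     # goes through the setter
print(acct.balance)
```

Unlike renaming the field to __myField3, the public name stays the same, so callers never have to change when validation is added later.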

python Protocols #phrasebook

  • interface — similar to java Interface
  • unenforced — unlike java Interface, python compiler doesn’t enforce anything about protocols
  • eg: ContextManager protocol defines __enter__() and __exit__() methods
  • eg: Sequence protocol defines __len__() and __getitem__() methods
  • partial — you can partially implement the required functions of a protocol
    • eg: your class can implement just __getitem__() and still work as a sequence for iteration and the “in” operator, although len() raises TypeError without __len__()
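A minimal sketch of the partial-protocol point, using a hypothetical class Squares that defines only __getitem__(). Iteration and the in operator fall back to calling __getitem__(0), __getitem__(1), … until IndexError. However, len() still fails because __len__() is missing.

```python
class Squares:
    """Implements only __getitem__: no __len__, no __iter__."""
    def __getitem__(self, index):
        if index >= 5:
            raise IndexError(index)   # tells the fallback iteration to stop
        return index * index

sq = Squares()
print(list(sq))    # iteration falls back to __getitem__(0), (1), ... until IndexError
print(9 in sq)     # membership test also falls back to that iteration
try:
    len(sq)
except TypeError as e:
    print("no len():", e)
```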

concurrent python..#my take

I’m biased against multi-threading and towards multiprocessing because …

  1. threading is for high performance, but java/c++ leave python in the dust
  2. the GIL in CPython, which is the default download version of python. The standard doc says “If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing …”

(For my academic curiosity …) Python's threading library offers the common bells and whistles:

  • join()
  • timers
  • condVar
  • lock
  • counting semaphore
  • barrier
  • concurrent queue
  • isDaemon()
  • Futures? python3.2 😦
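Several of these primitives fit in one small sketch (the summing task and the worker count of 4 are arbitrary choices). The sketch uses queue.Queue as the concurrent queue and threading.Lock around a shared counter, then calls join() to wait for the workers. Because of the GIL this shows coordination, not a CPU speed-up.

```python
import queue
import threading

work = queue.Queue()             # thread-safe FIFO: the "concurrent queue"
lock = threading.Lock()          # mutex around the shared counter
total = 0

def worker():
    global total
    while True:
        item = work.get()        # blocks until an item is available
        if item is None:         # sentinel: no more work for this thread
            break
        with lock:               # acquire()/release() around the shared update
            total += item

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for n in range(1, 101):
    work.put(n)
for _ in threads:
    work.put(None)               # one sentinel per worker
for t in threads:
    t.join()                     # wait until every worker has exited
print(total)                     # 5050
```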


slist in python@@ #no std #Ashish

A quick google search shows

* python doesn’t offer linked list in standard library

* python’s workhorse list like [2,1,5] is an expandable (dynamic) array, i.e. vector. See https://stackoverflow.com/questions/3917574/how-is-pythons-list-implemented and https://www.quora.com/How-are-Python-lists-implemented-internally

* {5, 1, 0} braces can initialize a set. I very seldom use a set since a dict is almost always good-enough.
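If a singly linked list is really needed, a few lines suffice. This is a minimal sketch with hypothetical class names Node and SList. For O(1) appends and pops at both ends, the standard library's collections.deque is usually the better choice.

```python
class Node:
    __slots__ = ("val", "next")          # no per-node __dict__: smaller nodes
    def __init__(self, val, next=None):
        self.val = val
        self.next = next

class SList:
    """Minimal singly linked list: O(1) push_front, O(n) traversal and reverse."""
    def __init__(self, items=()):
        self.head = None
        for item in reversed(list(items)):  # push in reverse to keep the input order
            self.push_front(item)

    def push_front(self, val):
        self.head = Node(val, self.head)

    def __iter__(self):
        node = self.head
        while node is not None:
            yield node.val
            node = node.next

    def reverse(self):
        prev, cur = None, self.head
        while cur is not None:
            nxt = cur.next                # remember the rest of the list
            cur.next = prev               # flip this node's link
            prev, cur = cur, nxt
        self.head = prev

s = SList([2, 1, 5])
s.reverse()
print(list(s))   # [5, 1, 2]
```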

python to dump binary data in hex digits

Note hex() is a built-in, but I find it inconvenient: I need to print two digits per byte, with a leading 0.

Full source is hosted in https://github.com/tiger40490/repo1/blob/py1/tcpEchoServer.py

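def Hex(data):  # a generator function: one 2-digit hex token per character
  for i, code in enumerate(map(ord, data), 1):  # i counts from 1 (was never initialized)
    yield "%02x " % code
    if i % 8 == 0: yield ' '  # extra space after every 8 bytes

print(''.join(Hex("\x0a\x00"))); exit(0)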
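Under Python 3, binary data arrives as bytes rather than str. The same dump can therefore be sketched without ord(). hex_dump is a hypothetical name, and the bytes.hex() separator argument needs Python 3.8 or later. Unlike Hex() above, this version leaves no trailing space.

```python
def hex_dump(data: bytes, group: int = 8) -> str:
    """Two lowercase hex digits per byte, with an extra space after every `group` bytes."""
    tokens = []
    for i, byte in enumerate(data, 1):        # iterating bytes yields ints in Python 3
        tokens.append(format(byte, "02x"))
        if i % group == 0 and i < len(data):
            tokens.append("")                 # empty token -> double space after joining
    return " ".join(tokens)

print(hex_dump(b"\x0a\x00"))          # 0a 00
print(hex_dump(bytes(range(9))))      # 00 01 02 03 04 05 06 07  08
print(b"\x0a\x00\xff".hex(" "))       # 0a 00 ff  (Python 3.8+)
```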

edit 1 file in big python^c++ production system #XR

Q1: suppose you work in a big, complex system with 1000 source files, all in python, and you know a change to a single file will only affect one module, not a core module. You have tested it and run a 60-minute automated unit test suite. You didn’t run the prolonged integration test that’s part of the department-level full release. Would you and the approving managers have the confidence to release this single python file?
A: yes

Q2: change “python” to c++ (or java or c#). You already followed the routine to build your change into a dynamic library, tested it thoroughly and ran unit test suite but not full integration test. Do you feel safe to release this library?
A: no.

Assumption: the automated tests were reasonably well written. I have never worked in a team with measured test coverage. I would guess 50% is too high and often impractical. Even with high measured test coverage, the risk of bugs is roughly the same. I have never believed higher unit test coverage is a vaccination: diminishing returns, low marginal benefit.

Why the difference between Q1 and Q2?

One reason — the source file is compiled into a library (or a jar), along with many other source files. This library is now a big component of the system, rather than one of 1000 python files. The managers will see a library change in c++ (or java) vs a single-file change in python.

Q3: what if the change is to a single shell script, used to start/stop the system?
A: yes. Manager can see the impact is small and isolated. The unit of release is clearly a single file, not a library.

Q4: what if the change is to a stored proc? You have tested it and run the full unit test suite but not a full integration test. Will you release this single stored proc?
A: yes. One reason is transparency of the change. Managers can understand this is an isolated change, rather than a library change as in the c++ case.

How do managers (and anyone except yourself) actually visualize the amount of code change?

  • With python, it’s a single file so they can use “diff”.
  • With stored proc, it’s a single proc. In the source control, they can diff this single proc. Unit of release is traditionally a single proc.
  • with c++ or java, the unit of release is a library. What if in this new build, besides your change, there’s some other change included by accident? You can’t diff a binary 😦

So I feel transparency is the first reason. Transparency of the change gives everyone (not just yourself) confidence about the size/scope of this change.

Second reason is isolation. I feel a compiled language (esp. c++) is more “fragile” and the binary modules more “coupled” and inter-dependent. When you change one source file and release it in a new library build, it could lead to subtle, intermittent concurrency issues or memory leaks in another module, outside your library. Even if you as the author see evidence that this won’t happen, other people have seen innocent one-line changes giving rise to bugs, so they have reason to worry.

  • All 1000 files (in compiled form) run in one process for a c++ or java system.
  • A stored proc change could affect DB performance, but it’s easy to verify. A stored proc won’t introduce subtle problems in an unrelated module.
  • A top-level python script runs in its own process. A python module runs in the host process of the top-level script, but a typical top-level script will include just a few custom modules, not 1000 modules. Much better isolation at run time.

There might be python systems where the main script actually runs in a process with hundreds of custom modules (not counting the standard library modules). I have not seen it.

big guns: template4c++^reflection4(java+python)

Most complex libraries (or systems) in java require reflection to meet the inherent complexity.

Most complex libraries in c++ require template meta-programming.

But these are for different reasons… which I’m not confident to point out.

Most complex python systems require … reflection + import hacks? I feel python’s reflection (as with other scripting languages) is more powerful and less restricted. I feel reflection is at the core of some (most?) of the power features in python, such as import and polymorphism.
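A small sketch of the run-time reflection this refers to, using only standard built-ins. importlib.import_module() loads a module named by a string, and getattr()/hasattr() look attributes up by name. setattr() adds a method to a class after the class is defined; the Greeter class is hypothetical.

```python
import importlib

mod = importlib.import_module("math")   # module chosen by a run-time string
func = getattr(mod, "sqrt")             # attribute looked up by name
print(func(16.0))                       # 4.0

class Greeter:                          # hypothetical, starts with no methods
    pass

setattr(Greeter, "hello", lambda self, who: "hello " + who)  # add a method at run time
g = Greeter()
print(hasattr(g, "hello"), g.hello("world"))   # True hello world
```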

TCP listening socket shared by2processes #fork

Common IV question: In what scenarios can a listening socket (in memory) be shared between 2 listening processes?

Background: a socket is a special type of file descriptor (at least in unix). Consider an output file handle. By default, this “channel” isn’t shared between 2 processes. Similarly, when a packet (say a price) is delivered to a given network endpoint, the kernel must decide which process receives the data, usually not two processes.

To have two processes both listening on the same listening socket, one of them is usually a child of the other. The webpage in [1] and my code in https://github.com/tiger40490/repo1/blob/py1/py/sock/1sock2server.py show short python code illustrating this scenario. I tested it: q(lsof) and q(ss) both (but not netstat) show the 2 processes listening on the same endpoint, and the OS delivers the data to A B A B…
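Here is a minimal sketch of the fork scenario. It is Unix only; the function name and the loopback address with an ephemeral port are my choices, not taken from [1]. The parent creates, binds and listens. fork() then hands the child the same listening socket, and the child's accept() picks up a connection that the parent makes as a client. It does not reproduce the A B A B alternation, which needs both processes calling accept() in a loop.

```python
import os
import socket

def child_accepts_on_parent_socket():
    """Parent binds/listens; a forked child accepts on the inherited descriptor."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))          # port 0: the kernel picks a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    pid = os.fork()
    if pid == 0:                             # child: inherited the SAME listening socket
        try:
            conn, _ = listener.accept()
            conn.sendall(b"hello from pid %d" % os.getpid())
            conn.close()
        finally:
            os._exit(0)                      # never fall back into the parent's code path

    # parent: act as a client; only the child calls accept() in this sketch
    chunks = []
    with socket.create_connection(("127.0.0.1", port)) as client:
        while True:
            data = client.recv(100)
            if not data:                     # child closed the connection
                break
            chunks.append(data)
    os.waitpid(pid, 0)
    listener.close()
    return pid, b"".join(chunks)

print(child_accepts_on_parent_socket())
```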

https://bintanvictor.wordpress.com/2017/04/29/so_reuseport-socket-option/ shows an advanced kernel feature to let multiple processes bind() to the same endpoint.
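For brevity, the SO_REUSEPORT option is sketched below within a single process; in real deployments each listener usually lives in its own process. This assumes Linux 3.9+ or a BSD-derived OS where socket.SO_REUSEPORT exists. Both sockets must set the option before bind(), and the helper names are hypothetical.

```python
import socket

def reuseport_listener(port):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)  # must precede bind()
    s.bind(("127.0.0.1", port))
    s.listen(8)
    return s

def two_listeners_same_port():
    first = reuseport_listener(0)            # port 0: the kernel picks a free port
    port = first.getsockname()[1]
    second = reuseport_listener(port)        # without the option this bind fails (EADDRINUSE)
    try:
        return first.getsockname() == second.getsockname()
    finally:
        first.close()
        second.close()

print(two_listeners_same_port())
```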

For multicast (UDP only), two processes can listen to the same UDP endpoint. See [3] and [2].

A Unix domain socket can be shared between two unrelated processes.


[1] http://stackoverflow.com/questions/670891/is-there-a-way-for-multiple-processes-to-share-a-listening-socket

[2] http://stackoverflow.com/questions/1694144/can-two-applications-listen-to-the-same-port

[3] http://www.tldp.org/HOWTO/Multicast-HOWTO-2.html

%%logging decorator with optional args

The latest version is uploaded to github: https://github.com/tiger40490/repo1/blob/py1/py/loggingDecorator.py

I hope the wordpress code formatting renders the source code correctly:

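import functools
import logging

logger = logging.getLogger('log3demo')
_handler = logging.StreamHandler()
_handler.setFormatter(logging.Formatter('%(name_override)s: %(message)s'))  # reads extra['name_override']
logger.addHandler(_handler)
logger.setLevel(logging.INFO)
logger.pymodelsVer = '1.0'  # hypothetical placeholder; the real value is set elsewhere

def log3(funcOrMsg=None, named_arg=None):
    '''arg is Optional. You can use any of:
    @log3                            # no parentheses
    @log3(named_arg='specific msg')  # some prefer named argument for clarity
    @log3('msg2')
    '''
    arg1_isCallable = callable(funcOrMsg)
    arg1_isStr      = isinstance(funcOrMsg, str)  # basestring in python 2
    arg1_isNone     = funcOrMsg is None

    def decorated(func):
        @functools.wraps(func)  # keep func.__name__ and __doc__ on the wrapper
        def wrapper(*args, **kwargs):
            if named_arg: print('named_arg = ' + str(named_arg))
            tmp = funcOrMsg if arg1_isStr else ''
            ### set name_override to func.__name__ in a kwarg to info()
            logger.info(tmp + ' pym ver = ' + str(logger.pymodelsVer),
              extra={'name_override' : func.__name__})
            return func(*args, **kwargs)
        return wrapper
        ## end wrapper

    if arg1_isCallable:
        return decorated(funcOrMsg) # decorator received no-arg: arg1 is the function itself
    # decorator had kwargs   or   positional arg
    assert     arg1_isNone   or   arg1_isStr
    return decorated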
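A common alternative to the callable()/isinstance() checks in log3 makes the optional argument keyword-only. When the decorator is called with arguments, it returns functools.partial. This is a swapped-in technique, not the code in the github file; log4 and msg are hypothetical names.

```python
import functools

def log4(func=None, *, msg=''):
    """Hypothetical variant of log3: the optional argument is keyword-only."""
    if func is None:                       # used as @log4(msg=...) or @log4()
        return functools.partial(log4, msg=msg)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print('[%s] %s' % (func.__name__, msg))
        return func(*args, **kwargs)
    return wrapper

@log4                                      # bare form: func arrives directly
def add(a, b):
    return a + b

@log4(msg='scaling')                       # call form: partial(log4, msg=...) gets func next
def scale(x, k=2):
    return x * k

print(add(1, 2), scale(5))
```

Keyword-only arguments remove the ambiguity log3 resolves at run time: a positional argument can only ever be the decorated function.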

python RW global var hosted in a module

Context: a module defines a top-level global var VAR1, to be modified by my script. Reading it is relatively easy:

from mod3 import *
print(VAR1)

Writing is a bit tricky. I’m still looking for best practices.

Solution 1: mod3 to expose a setter setVAR1(value)

Solution 2:
import mod3
mod3.VAR1 = 'new_value'

Note “from mod3 import *” doesn’t propagate the new value back to the module. See example below.

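#!/usr/bin/python -u
from mod3 import *

def main():
  ''' Line below is required to propagate new value back to mod3
      Also note the semicolon -- to put two statements on one line '''
  import mod3; mod3.VAR1 = 'new value'
  mod3func()    # prints "VAR1 = new value": mod3 sees its own rebound global
  print(VAR1)   # still 'initial value': "import *" copied the old binding

main()

### mod3.py
VAR1='initial value'
def mod3func():
  print('VAR1 =', VAR1)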
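To see both solutions in one runnable file, the sketch below fakes mod3 in memory with types.ModuleType. This is a testing shortcut, not how mod3 would normally be written. It adds hypothetical setVAR1()/getVAR1() helpers. The explicit “from mod3 import VAR1” copies the binding exactly as “import *” does.

```python
import sys
import types

# build a stand-in for mod3.py in memory (normally this is a real file)
mod3 = types.ModuleType("mod3")
exec("VAR1 = 'initial value'\n"
     "def setVAR1(value):\n"
     "    global VAR1\n"
     "    VAR1 = value\n"
     "def getVAR1():\n"
     "    return VAR1\n", mod3.__dict__)
sys.modules["mod3"] = mod3

from mod3 import VAR1          # copies the current binding into THIS namespace
import mod3 as m

m.VAR1 = "via attribute"       # Solution 2: rebinds the name inside mod3
print(m.getVAR1(), "|", VAR1)  # via attribute | initial value

m.setVAR1("via setter")        # Solution 1: the module rebinds its own global
print(m.getVAR1(), "|", VAR1)  # via setter | initial value
```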

## innovative features of python

Here’s my answer to a friend’s question “what innovative features do you see in python”

  • * decorators. Very powerful. Perhaps somewhat similar to AOP. Python probably borrowed it from Haskell?
  • * dynamic method/attribute lookup. Somewhat similar to C# “dynamic” keyword. Dangerous technique similar to java reflection.
  • * richer introspection than c# (which is richer than java)
  • * richer metaprogramming support (including decorator and introspection) … Vague answer!
  • * enhanced for-loop over a file or a string
  • * listcomp and genexpr
  • * Mixin?
  • I wrote a code gen to enrich existing modules before importing them. I relied on hooks in the importation machinery.
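The “dynamic method/attribute lookup” bullet can be shown in a minimal sketch (Record is a hypothetical class). __getattr__ runs only when normal lookup fails, so attribute names can be resolved from data at run time.

```python
class Record:
    """Unknown attribute names fall through to __getattr__ at run time."""
    def __init__(self, **fields):
        self._fields = fields

    def __getattr__(self, name):          # called only when normal lookup fails
        try:
            return self._fields[name]
        except KeyError:
            raise AttributeError(name) from None

r = Record(price=101.5, qty=300)
print(r.price * r.qty)                    # attributes never declared in the class
print(hasattr(r, "missing"))              # False: AttributeError is raised as expected
```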

python q[import]directive complexity imt java/c++/c#

I would say it “does more work” not just “more complicated”…

Name conflicts, name resolution … are the main purpose of (import/using/include) in java/c++/c#. (Side question — Can c++ header files execute arbitrary statements? I think so since it’s just pasted in… Minor question)

In contrast, python modules are executed line by line the first time they are imported. P300 [[programming python]]. This can include arbitrary statements. This is the major departure from “Tradition”.

“from … import …” is no different in this respect: it also executes the whole module on first import, and only differs in which names it binds in the importing namespace.
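Here is a self-contained sketch of module execution on import. noisy_mod is a hypothetical module written to a temp directory. Its top-level statements append a line to runs.log, so counting the lines shows how many times the module body ran, first via “from … import …” and then via a plain import.

```python
import importlib
import os
import sys
import tempfile

# write a throwaway module whose top level has a visible side effect
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "noisy_mod.py"), "w") as f:
    f.write("import os\n"
            "with open(os.path.join(os.path.dirname(__file__), 'runs.log'), 'a') as log:\n"
            "    log.write('ran\\n')\n"
            "print('executing noisy_mod top level')\n"
            "GREETING = 'hi'\n")
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()        # let the import system see the new file

from noisy_mod import GREETING       # first import: the whole module body executes
import noisy_mod                     # cached in sys.modules: nothing re-executes

with open(os.path.join(tmpdir, "runs.log")) as log:
    runs = log.read().count("ran")
print(GREETING, runs)                # hi 1
```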

python template method; parent depending on child

Background — classic template method pattern basically sets up a base class dependency on (by calling) a subclass method, provided the method is abstract in base class.

Example — doHtml(), doParameters() and doProperties() methods are abstract in the base EMPanel class.

1) Python pushes the pattern further: a method can be completely _undeclared_ in the base class. See runCommand() in the example on P222 [[Programming Python]].

* When you look at the base class in isolation, you don’t know what self.runCommand() binds to. It turns out it’s defined only in the subclass.

2) Python pushes the pattern stillllll further, when _undeclared_ fields can be used in base class. The self.menu thing looks like a data field but undeclared. Well, it’s declared in a subclass!

3) I have yet to try a simple example but multiple sources [3] say python pushes the pattern yeeeet further, when a method can be invoked without declaring it in any class — if it’s declared in an Instance. That instance effectively is an instance of an anonymous subclass (Java!).

* There’s no compiler to please! At run time, python can “search” in instance and subclass scopes, using a turbo charged template-method search engine.

In conclusion, at creation time a python base class can freely reference any field or method even if base class doesn’t include them in its member-listing.
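The three cases can be checked with a small sketch; Base, Child, runCommand and menu are hypothetical names echoing the book example. Case 3 works because attribute lookup checks the instance dictionary first. Note that a function stored on an instance is not bound, so it receives no self.

```python
class Base:
    def run(self):
        # Template method: neither runCommand() nor self.menu is declared in Base.
        # Both are resolved at call time: instance first, then the class chain.
        return self.runCommand(self.menu)

class Child(Base):
    menu = ["open", "save"]              # 2) field defined only in the subclass
    def runCommand(self, menu):          # 1) method defined only in the subclass
        return "Child ran " + ", ".join(menu)

print(Child().run())                     # Child ran open, save

# 3) no subclass at all: put the field and the method on one instance
obj = Base()
obj.menu = ["quit"]
obj.runCommand = lambda menu: "instance ran " + menu[0]   # plain function, no self
print(obj.run())                         # instance ran quit
```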

[3] P96 [[ref]]

python LGB rule, %%take

http://my.safaribooksonline.com/book/programming/python/1565924649/functions/ch04-28758 clarifies many key points discussed below.

(The simple-looking “global” keyword can be tricky if we try to understand its usage too early.)

The key point, inadequately highlighted elsewhere (except in the above book), is the distinction between assignment and reference.

* AA – Assignment – When a variable appears on the LHS, it’s (re)created in some namespace [1], without name lookup. By default, that namespace is the local namespace.
* RR – Reference – any other context where you “use” a variable, it’s assumed to be a pre-existing variable. Query and method invocation are all references. Requires name lookup, potentially causing an error.

Let me test you on this distinction —

Q: what happens if I Assign to a non-existent variable?
A: I believe it will get created.

Q: what happens if I Reference a non-existent variable?
A: I believe it’s a NameError, in any context (inside a function that also assigns that name, it’s UnboundLocalError, a subclass of NameError).

The LGB search sequence is easily understood in the RR context. Failed search -> error.

In the AA context, there’s no need to look up, as the variable is simply created.

shadowing — local var often shadows a global var. This makes sense in the RR case, and I think also in the AA case.

[1] A namespace is probably implemented as an “idict”, a registry (presumably) mapping variable names to object addresses.

Now we are ready to look at the effect of keyword “global”.

RR – the LGB search sequence shrinks to “G”.
AA – the variable is simply (re)created in the G rather than L namespace.

http://my.safaribooksonline.com/book/programming/python/1565924649/functions/ch04-28758 points out that “Global names needs to be declared only if they are assigned in a function.” If not assigned, then no shadowing concern.
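A small sketch of the AA/RR distinction follows; counter and the function names are hypothetical. Python decides at compile time that a name assigned anywhere in a function body is local to the whole function. That is why broken() fails on the read that comes before its assignment.

```python
counter = 0

def read_only():
    return counter + 1              # RR: LGB lookup finds the global, no "global" needed

def assign_local():
    counter = 99                    # AA: creates a NEW local; the global is untouched
    return counter

def assign_global():
    global counter                  # AA now targets the G namespace
    counter += 1

def broken():
    try:
        print(counter)              # compiler already marked counter local (assigned below)
        counter = 5
    except UnboundLocalError as e:  # subclass of NameError
        return type(e).__name__

print(read_only(), assign_local(), counter)   # 1 99 0
assign_global()
print(counter)                                # 1
print(broken())                               # UnboundLocalError
```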

map reduce filter zip …

Importance — these handful of functions are the core FP features in Python, according to http://www.ibm.com/developerworks/linux/library/l-prog/index.html?ca=drs-

I feel it’s good to tackle the family members one by one, as a group. I believe each has some counterpart in perl, C++, java or (most likely) c#, so we can just list one counterpart to help us grasp each.

Bear in mind how common/uncommon each function is. Don’t overspend yourself on the wrong thing.

Don’t care about list vs tuple for now! Don’t care about advanced usages. Just grasp one typical usage of each.

http://www.lleess.com/2013/07/python-built-in-function-map-reduce-zip.html is concise.

–filter() = grep in perl

filter(function, sequence) -> sequence

>>> filter(lambda d: d != 'a', 'abcd')
'bcd'

–map() = map in perl

>>> map(lambda a: a+1, [1,2,3,4] )
[2, 3, 4, 5]

–reduce = Aggregate() in c#
reduce(function, sequence [, initial]) -> value

>>> reduce(lambda x, y: x+y, range(0,10), 10)
55

–zip is less common

–apply() is deprecated
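Python 3 changed this family. map() and filter() return lazy iterators, reduce() moved to functools, and apply(f, args) is written f(*args). Here is a sketch of the same examples under Python 3; mul is a hypothetical helper.

```python
from functools import reduce   # no longer a builtin in Python 3

def mul(x, y):
    return x * y

no_a = ''.join(filter(lambda d: d != 'a', 'abcd'))    # filter() yields chars lazily
plus1 = list(map(lambda a: a + 1, [1, 2, 3, 4]))      # list() forces the lazy map
total = reduce(lambda x, y: x + y, range(0, 10), 10)  # 10 + (0+1+...+9)
pairs = list(zip(['a', 'b', 'c'], [1, 2, 3]))
product = mul(*(3, 4))                                # replaces apply(mul, (3, 4))

print(no_a, plus1, total, pairs, product)
# bcd [2, 3, 4, 5] 55 [('a', 1), ('b', 2), ('c', 3)] 12
```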

python – important details tested on IKM

Today I did some IKM python quiz. A few tough questions on practical (not obscure) and important language details. You would need to invest months to learn these. I feel a regular python coding job of 2 years may not provide enough exposure to reach this level.

Given the amount of effort (aka O2, laser) invested, I feel LROTI would be much higher in c++, java and WPF (slightly less in general c# and swing). One problem with LROTI on WPF is churn.

As we grow older and have less time to invest, LROTI is a rather serious consideration. I no longer feel like superman in terms of learning new languages.

##some python constructs to understand

These are the features I feel likely to turn up in production source code or interviews, so you need to at least recognize what they mean, but need not know exactly how to use them. (A subset of these you need to master for writing code, but let’s not worry about it.)

Lists of operators, keywords and expressions are important for this purpose.

Most built-in Methods are self-explanatory.

nested list-comprehension (python): untangled#venkat

>>> [innerItr for outerItr in [['_'+base , base+'Const']  for base in 'VectorInteger EquityAssetVolatilitySurface'.split()]  for innerItr in outerItr]

['_VectorInteger', 'VectorIntegerConst', '_EquityAssetVolatilitySurface', 'EquityAssetVolatilitySurfaceConst']

—– That’s the python IDLE response. Now let’s unravel it. —–

The easy part — the split expression returns a LIST of strings.

[[…] for base in …split()] returns a List of Lists (LoL). Note you can’t inject 2 items into the output stream for each input stream item: a list comprehension injects exactly one item per iteration. See P86 [[python essential ref]].

outerItr in [..split()]: “outerItr” is the loop variable, bound to each inner list in turn. To open up an inner list, we put “for innerItr in outerItr” at the END. This resembles a double for-loop, as explained in P85 [[python essential ref]].

The first expression inside the outermost [], before the very first “for”, is the select-clause. In our case it’s the expression “innerItr”. Like the SELECT clause in SQL, it is logically evaluated last, once per iteration of the innermost loop.

Note the outermost [] pair is required: it makes the entire expression a list comprehension. Without any brackets the expression is a SyntaxError (unless it is the sole argument of a function call); with () instead of [] it becomes a generator expression.

>>> [outerItr for outerItr in [['_'+base , base+'Const']  for base in 'VectorInteger EquityAssetVolatilitySurface'.split()]]
>>> [['_'+base , base+'Const']  for base in 'VectorInteger EquityAssetVolatilitySurface'.split()]
—-> same response
[['_VectorInteger', 'VectorIntegerConst'], ['_EquityAssetVolatilitySurface', 'EquityAssetVolatilitySurfaceConst']]

>>> [outerItr[0] for outerItr in [['_'+base , base+'Const']  for base in 'VectorInteger EquityAssetVolatilitySurface'.split()]]
['_VectorInteger', '_EquityAssetVolatilitySurface']
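The same flattening can be written three ways. The sketch shows the original nested comprehension, the equivalent explicit double for-loop (the for-clauses keep their left-to-right order), and a shorter form that loops over the words directly. The variable names follow the example above; flat, flat_loop and flat_direct are new.

```python
bases = 'VectorInteger EquityAssetVolatilitySurface'.split()

# the nested comprehension from above ...
flat = [innerItr for outerItr in [['_' + base, base + 'Const'] for base in bases]
        for innerItr in outerItr]

# ... equals this double for-loop; the for-clauses appear in the same order
flat_loop = []
for outerItr in [['_' + base, base + 'Const'] for base in bases]:
    for innerItr in outerItr:
        flat_loop.append(innerItr)        # the select-clause runs innermost

# a simpler flat form: two for-clauses directly over the input words
flat_direct = [name for base in bases for name in ('_' + base, base + 'Const')]

print(flat == flat_loop == flat_direct)   # True
```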


python – some performance tips { AndyZhao

1) *.pyc = java class files, pre-compiled. Similar to jdbc prepared statement.

2) PYTHONPATH = classpath. Probably the most important performance tip, according to Andy.

3) Minimize run-time name look-up —

from time import time
from math import sqrt
import math

d = 0.0
start = time()
for i in range(10000000):   # xrange() in python 2
    d += sqrt(i)            # one name look-up for "sqrt"
print(time() - start)

d = 0.0
start = time()
for i in range(10000000):
    d += math.sqrt(i)       # 2 look-ups: global "math", then attribute "sqrt"
print(time() - start)
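A further step, sketched below as a common CPython idiom rather than Andy's tip, binds the function to a local variable. This turns the per-iteration global (and attribute) look-up into a fast local access. The function names are hypothetical, and timings vary by machine, so no numbers are claimed.

```python
import math
import timeit

def global_attr(n):
    d = 0.0
    for i in range(n):
        d += math.sqrt(i)        # global "math" + attribute "sqrt" on every iteration
    return d

def local_alias(n):
    sqrt = math.sqrt             # hoist the look-up: one local variable, fast access
    d = 0.0
    for i in range(n):
        d += sqrt(i)
    return d

for f in (global_attr, local_alias):
    print(f.__name__, timeit.timeit(lambda: f(100000), number=10))
```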

rooted vs re-bindable variables – c#, c++, java, python

Q: What kinds of variables can re-bind (reseated) to a different object at run-time and what kinds can’t? This understanding is not academic but helps programmers remember ground rules.

—-Python moves further towards rebinding. Even a simple myInt variable can rebind. I feel the fundamental distinction in python world is between immutable vs mutable “Objects” (defined as storage-locations).
* Python immutables are reference-counted and never modified in place; an operation like x += 1 builds a new object and rebinds x to it. Therefore variables bound to immutable Objects are reseat-able.
* What python variables are rooted? Well I believe the first element (other elements too) in a tuple is, though the tuple variable itself can rebind.
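A quick check of both bullets, with arbitrary variable names. += on an int rebinds the name to a new object, and a tuple slot cannot be reassigned, yet a mutable object held in that slot can still change.

```python
x = 1000
old = x                  # a second name on the original int object
x += 1                   # ints are immutable: a new object is built and x rebinds
print(x is old, old, x)  # False 1000 1001

pair = (1, [2, 3])
try:
    pair[0] = 9          # a tuple slot is "rooted": it cannot be rebound
except TypeError as e:
    print("TypeError:", e)
pair[1].append(4)        # but the list object in slot 1 can still be mutated
print(pair)              # (1, [2, 3, 4])
```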

—-In java, all primitive variables are “rooted”. All reference variables are reseat-able.
+ Assigning to a primitive variable writes into “the ultimate” memory location;
+ Assigning to a reference variable reseats the pointer, without cloning any object.
– There’s a Separate 32-bit storage for every reference variable, distinct from pointee’s storage.[1]
– There’s no separate storage for a primitive variable. Variable name is a nickname of the storage address. Compiler translates variable name into storage address. Run-time access to variable is one-hit. In contrast, Reference variables’ access is 2-hit – following the pointer.

[1] Evidence? See memory layout of any MyClass having a non-primitive field. How much memory (like sizeof(MyClass)) is allocated by new MyClass()?
—-In c#, all Value variables (including structs) are rooted. Assignment clones, including pass-by-value into a method.
All reference variables can be rebound.
—-C is simple and clean
All pointer variables can be re-seated but non-pointer variables are rooted. When a variable is on the LHS, it either rebinds or the Object is “edited”. See post on “Immutable, initialize..” to see the difference between Object vs Variable.
—-C++ feels more complicated.
In C++, all nonref and reference variables are rooted. Assignment writes directly into the object’s “stomach”. Pointers are reseat-able.

However, a C++ reference variable (like pointer variable) has a separate 32-bit storage (address hidden) distinct from pointee/referent storage. Some writers say “referent” but I find “pointee” more distinct and less ambiguous.

## 6 kinds of "del" in python

(Note in most quick and dirty scripts, we seldom need to delete stuff. In other scripts, deletion is much less common than insertion.) If you are overwhelmed, just remember the most common uses: dict-delete and list-delete-by-index.

1) del myDict[“some key”] # syntactically closer to List-delete than Attribute-delete

2) del myList[index] # [Note A]
2b) del myList[ sliceStart : sliceEnd ]

[A] myList.remove(someValue) is deletion by value, not by index

——— below are more advanced (read “obscure”)
3) del myVar # removes the name “myVar” but not necessarily the object. Note this is not directly related to 3b: __del__ runs only when the object's last reference goes away
3b) __del__(self) # like dtor or java finalizer

4) del myObj.myAttr # same as calling
4b) __delattr__(self, myAttr) # basically removes the entry from the idict (the instance __dict__)

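5) these are defined in the built-in dict and list classes
__delitem__(self,index_or_key) # implements list-delete or dict-delete (and, in python 3, slice-delete)
__delslice__ # implements slice-delete (python 2 only; python 3 passes a slice object to __delitem__)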

6) advanced Descriptor-delete
__delete__(self, instance)
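The common forms and the hooks behind them fit in one sketch. Tracked is a hypothetical class that only records which hook fired. It also shows that Python 3 routes slice-deletes through __delitem__.

```python
nums = [10, 20, 30, 40, 50]
del nums[0]          # by index  -> [20, 30, 40, 50]
del nums[1:3]        # by slice  -> [20, 50]
nums.remove(50)      # by value  -> [20]

d = {"a": 1, "b": 2}
del d["a"]           # dict-delete -> {'b': 2}

class Tracked:
    """Records which special method each form of del triggers."""
    def __init__(self):
        self.log = []
    def __delitem__(self, key):       # del obj[key] and del obj[a:b]
        self.log.append(("delitem", key))
    def __delattr__(self, name):      # del obj.name
        self.log.append(("delattr", name))
        super().__delattr__(name)     # really remove it from the instance __dict__

obj = Tracked()
obj.color = "red"
del obj["k"]
del obj[1:3]                          # python 3: arrives as slice(1, 3, None)
del obj.color
print(nums, d, obj.log)
```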


LC (list comp) builds up a physical list in memory. LC needs you, the creator, to put the brackets [….] around it. Now it looks like a list (and really IS a list)

GE (generator expression) doesn’t create a list in memory. GE “yields” one “fruit” at a time on demand. GE needs the parentheses (…).

Declarative programming — a generator Expression is a specification for producing a stream of items.

LC — range()
GE — xrange()

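What if you remove the brackets around an LC? It won’t compile: a bare “x for x in y” is a SyntaxError, except as the sole argument of a function call, where it acts as a generator expression.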

The “yield” keyword (also present in c#) serves a similar purpose to generator expressions and can express more complex logic. https://stackoverflow.com/questions/16780434/yield-vs-generator-expression-different-type-returned sheds some light but uses python3.
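The contrast can be sketched in a few lines; squares_gen is a hypothetical generator function. The LC materializes every item up front, the GE computes items only when asked, and a yield-based function produces the same kind of lazy stream with room for more complex logic.

```python
import sys

squares_lc = [n * n for n in range(100000)]   # LC: the whole list exists now
squares_ge = (n * n for n in range(100000))   # GE: nothing computed yet

def squares_gen(limit):                        # generator function using yield
    n = 0
    while n < limit:
        yield n * n                            # pauses here until the next request
        n += 1

print(sys.getsizeof(squares_lc) > sys.getsizeof(squares_ge))   # True: GE holds no items
print(next(squares_ge), next(squares_ge))                      # 0 1 (produced on demand)
print(sum(squares_gen(5)), sum(n * n for n in range(5)))       # 30 30
```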

python exec keyword ^ eval() function

Based on my experience in other scripting languages, I don’t think these are widely used in everyday scripting, but veterans must know.

– exec is a keyword[1] and not a function, so it can’t return a value (python 2; in python 3 exec() became a function that always returns None).
– eval() is a function so returns a value.

Both accept a string “argument”, or (more advanced) code object from compile() function.
[1] Other keywords include while/for/lambda and raise/try/except; note “except” is like “catch”

See P90 [[py ref]]
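Under Python 3, where exec() became a function, the distinction can be sketched as below. The namespace dict is passed explicitly so nothing leaks into the caller's globals.

```python
namespace = {}
result = exec("x = 6 * 7", namespace)        # runs statements; python 3 exec() returns None
print(result, namespace["x"])                # None 42

value = eval("x + 1", namespace)             # evaluates ONE expression and returns it
print(value)                                 # 43

code = compile("x * 2", "<string>", "eval")  # pre-compiled code object, reusable
print(eval(code, namespace))                 # 84
```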