enumerate(): iterate py list/str with idx+val

The built-in enumerate() is a nice optional feature. If you don’t want to remember this simple syntax, you can just iterate over xrange(len(the_sequence)) and index into the sequence.

https://www.afternerd.com/blog/python-enumerate/#enumerate-list is illustrated with examples.

— to enumerate backward,

Since enumerate() returns an iterator and reversed() only accepts sequences, you need to convert it to a list first.

for i, v in reversed(list(enumerate(vec))): print(i, v)
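
A minimal sketch of the forward and backward loops, written for Python 3 (so range() stands in for xrange()). The sample list vec and the result names are made up for illustration.

```python
vec = ['a', 'b', 'c']

# Forward: enumerate() yields (index, value) pairs.
forward = [(i, v) for i, v in enumerate(vec)]

# Backward: materialize the pairs first, because reversed() needs a sequence.
backward = [(i, v) for i, v in reversed(list(enumerate(vec)))]

# The same pairs via plain indexing, without enumerate().
by_index = [(i, vec[i]) for i in range(len(vec))]

print(forward)   # [(0, 'a'), (1, 'b'), (2, 'c')]
print(backward)  # [(2, 'c'), (1, 'b'), (0, 'a')]
```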

convert 512bit int to bit-vector #python=best

In coding questions, int-to-bit_vector conversion is common. I now think python offers a simple clean solution, applicable on integers of any size:)

'{0:032b} <-32|8-> {0:08b}'.format(myInt) #format as a 32-char bit string and, in the same call, as an 8-char one
  • first zero refers to argument #0 (we have only one argument).
  • The zero after colon means zero-padding
  • The last ‘b’ means binary
  • Don’t forget your (dental) braces 🙂

Note this is the str.format() method, not the global builtin function format()

Return value is a python string, but easily convertible to a vector

What if the number is too large? My {0:08b}  example shows that if the int is above 255, then python does the safe thing — extending the output beyond 8-element, rather than truncating 🙂

To convert the bit-string back to an int, use int(bitStr, 2) # 2 is the base i.e. binary. A list of bits must be joined into a string first.
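
Here is a small round trip as a sketch. The value 300 is an arbitrary sample, chosen above 255 to show the 8-wide format growing instead of truncating; all variable names are my own.

```python
my_int = 300  # hypothetical sample value, deliberately above 255

bits32 = '{0:032b}'.format(my_int)  # zero-padded to 32 characters
bits8 = '{0:08b}'.format(my_int)    # needs 9 bits, so the output grows to 9 characters

bit_vector = [int(ch) for ch in bits8]  # string -> list of 0/1 ints
round_trip = int(bits8, 2)              # string -> int, base 2

big = 2 ** 511 + 1                      # a 512-bit integer
big_bits = '{0:b}'.format(big)          # works for any size of int

print(bits8)          # 100101100
print(len(big_bits))  # 512
```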

python *args **kwargs: cheatsheet

“asterisk args” — I feel these features are optional in most cases. I think they can create additional maintenance work. So perhaps no need to use these features in my own code.

However, some codebases use these features so we had better understand the syntax rules.

— Inside the called function astFunc(),

Most common way to access the args is a for-loop.

It’s also common to forward these asterisk arguments:

def astFunc(*args, **kwargs):
    anotherFunc(*args, **kwargs)

I also tested reading the q[ *args ] via list(args) or args[:]

— how to use these features when calling a function:

  • astFunc(**myDict) # astFunc(**kwa)
  • simpleFunc(**myDict) # simpleFunc(arg1, arg2) can also accept **myDict

See my github
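
The points above can be sketched in a few lines. This is not the code in my github; the function names and the sample dict are invented for this illustration.

```python
def another_func(*args, **kwargs):
    # Return what arrived, so the forwarding is visible.
    return args, kwargs

def ast_func(*args, **kwargs):
    args_copy = list(args)                 # args is a tuple; list(args) or args[:] reads it
    assert tuple(args_copy) == args
    return another_func(*args, **kwargs)   # forward everything unchanged

def simple_func(arg1, arg2):
    return arg1 + arg2

my_dict = {'arg1': 10, 'arg2': 5}
forwarded = ast_func(1, 2, **my_dict)   # dict unpacked into keyword arguments
total = simple_func(**my_dict)          # an ordinary signature accepts **my_dict too

print(forwarded)  # ((1, 2), {'arg1': 10, 'arg2': 5})
print(total)      # 15
```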

python nested function2reseat var] enclos`scope

My maxPalindromeSubstr code in https://github.com/tiger40490/repo1/tree/py1/py/algo_str demos the general technique, based on https://stackoverflow.com/questions/7935966/python-overwriting-variables-in-nested-functions

Note — inside your nested function you can’t simply assign to such a variable. This is like assigning to a local reference variable in java.

https://jonskeet.uk/java/passing.html explains the fundamental property of java reference parameter/argument-passing. Basically same as the python situation.

In c# you probably (99% sure) need to use ref-parameters. In c++, you need to pass in a double-pointer. Equivalently, you can pass in a reference to a pre-existing 64-bit ptr object.
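
A sketch of the python side, assuming Python 3 for the nonlocal variant. The one-element-list trick also works in Python 2. This is not the maxPalindromeSubstr code; all names are hypothetical.

```python
def count_with_container():
    counter = [0]            # one-element list shared with the nested function

    def bump():
        counter[0] += 1      # mutates the list; the name `counter` is never reassigned
    bump()
    bump()
    return counter[0]

def count_with_nonlocal():
    counter = 0

    def bump():
        nonlocal counter     # Python 3 only: rebind the variable in the enclosing scope
        counter += 1
    bump()
    bump()
    return counter

def plain_assignment():
    counter = 0

    def bump():
        counter = 99         # creates a new local variable; the outer one is untouched
    bump()
    return counter

print(count_with_container(), count_with_nonlocal(), plain_assignment())  # 2 2 0
```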

##advanced python topics !! IV

I read a few books and sites listing “advanced python” topics, but they don’t agree on what features are important.

Anyone can list 20 (or 100) obscure and non-trivial python features and call it a “list of advanced python features”.

  • Mixin/AbstractBaseClass, related to Protocols
  • Protocol, as an informally defined Interface
  • coroutines?
  • Futures for async? python3.2 😦
  • Properties? Similar to c# properties; provides better encapsulation than __myField3
  • q[yield] keyword

python Protocols #phrasebook

  • interface — similar to java Interface
  • unenforced — unlike java Interface, python compiler doesn’t enforce anything about protocols
  • eg: ContextManager protocol defines __enter__() and __exit__() methods
  • eg: Sequence protocol defines __len__() and __getitem__() methods
  • partial — you can partially implement the required functions of a protocol
    • eg: your class can implement just the __getitem__() and still works as a Sequence
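
A minimal sketch of the “partial” point, with a made-up class Squares. Iteration and the in-operator fall back to __getitem__, while len() shows the missing half of the protocol.

```python
class Squares(object):
    """Implements only __getitem__, not __len__."""
    def __getitem__(self, index):
        if index >= 5:
            raise IndexError(index)  # signals the end of iteration
        return index * index

sq = Squares()
items = list(sq)        # iteration calls __getitem__ with 0, 1, 2, ...
has_nine = 9 in sq      # membership test uses the same fallback

try:
    len(sq)
    len_works = True
except TypeError:
    len_works = False   # no __len__, so this part of the protocol is missing

print(items)  # [0, 1, 4, 9, 16]
```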

concurrent python #my take

I’m biased against multi-threading, biased towards multiprocessing because …

  1. threading is for high-performance, but java/c++ leaves python in the dust
  2. GIL in CPython, which is the default download version of python. The standard doc says “If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing”

(For my academic curiosity ….) Python thread library offers common bells and whistles:

  • join()
  • timers
  • condVar
  • lock
  • counting semaphore
  • barrier
  • concurrent queue
  • isDaemon()
  • Futures? python3.2 😦


slist in python@@ #no std #AshS

A quick google search shows

* python doesn’t offer linked list in standard library

* python’s workhorse list like [2,1,5] is an expandable array, i.e. vector. See https://stackoverflow.com/questions/3917574/how-is-pythons-list-implemented and https://www.quora.com/How-are-Python-lists-implemented-internally

* {5, 1, 0} braces can initialize a set. I very seldom use a set since a dict is almost always good-enough.

edit 1 file in big python^c++ production system #XR

Q1: suppose you work in a big, complex system with 1000 source files, all in python, and you know a change to a single file will only affect one module, not a core module. You have tested it and run a 60-minute automated unit test suite. You didn’t run a prolonged integration test that’s part of the department-level full release. Would you and approving managers have the confidence to release this single python file?
A: yes

Q2: change “python” to c++ (or java or c#). You already followed the routine to build your change into a dynamic library, tested it thoroughly and ran unit test suite but not full integration test. Do you feel safe to release this library?
A: no.

Assumption: the automated tests were reasonably well written. I never worked in a team with a measured test coverage. I would guess 50% is too high and often impractical. Even with high measured test coverage, the risk of bug is roughly the same. I never believe higher unit test coverage is a vaccination. Diminishing return. Low marginal benefit.

Why the difference between Q1 and Q2?

One reason — the source file is compiled into a library (or a jar), along with many other source files. This library is now a big component of the system, rather than one of 1000 python files. The managers will see a library change in c++ (or java) vs a single-file change in python.

Q3: what if the change is to a single shell script, used to start/stop the system?
A: yes. Manager can see the impact is small and isolated. The unit of release is clearly a single file, not a library.

Q4: what if the change is to a stored proc? You have tested it and run the full unit test suite but not a full integration test. Will you release this single stored proc?
A: yes. One reason is transparency of the change. Managers can understand this is an isolated change, rather than a library change as in the c++ case.

How do managers (and anyone except yourself) actually visualize the amount of code change?

  • With python, it’s a single file so they can use “diff”.
  • With stored proc, it’s a single proc. In the source control, they can diff this single proc. Unit of release is traditionally a single proc.
  • with c++ or java, the unit of release is a library. What if in this new build, beside your change there’s some other change , included by accident? You can’t diff a binary 😦

So I feel transparency is the first reason. Transparency of the change gives everyone (not just yourself) confidence about the size/scope of this change.

Second reason is isolation. I feel a compiled language (esp. c++) is more “fragile” and the binary modules more “coupled” and inter-dependent. When you change one source file and release it in a new library build, it could lead to subtle, intermittent concurrency issues or memory leaks in another module, outside your library. Even if you as the author see evidence that this won’t happen, other people have seen innocent one-line changes giving rise to bugs, so they have reason to worry.

  • All 1000 files (in compiled form) run in one process for a c++ or java system.
  • A stored proc change could affect DB performance, but it’s easy to verify. A stored proc won’t introduce subtle problems in an unrelated module.
  • A top-level python script runs in its own process. A python module runs in the host process of the top-level script, but a typical top-level script will include just a few custom modules, not 1000 modules. Much better isolation at run time.

There might be python systems where the main script actually runs in a process with hundreds of custom modules (not counting the standard library modules). I have not seen it.

big guns: template4c++^reflection4(java+python)

Most complex libraries (or systems) in java require reflection to meet the inherent complexity;

Most complex libraries in c++ require template meta-programming.

But these are for different reasons… which I’m not confident to point out.

Most complex python systems require … reflection + import hacks? I feel python’s reflection (as with other scripting languages) is more powerful, less restricted. I feel reflection is at the core of some (most?) of the power features in python – import, polymorphism

TCP listening socket shared by2processes #mcast

Common IV question: In what scenarios can a listening socket (in memory) be shared between 2 listening processes?

Background: a socket is a special type of file descriptor (at least in unix). Consider an output file handle. By default, this “channel” isn’t shared between 2 processes. Similarly, when a packet (say a price) is delivered to a given network endpoint, the kernel must decide which process receives the data; usually it is not delivered to two processes.

To have two processes both listening on the same listening-socket, one of them is usually a child of the other. The webpage in [1] and my code in https://github.com/tiger40490/repo1/blob/py1/py/sock/1sock2server.py show a short python code illustrating this scenario. I tested. q(lsof) and q(ss) commands both (but not netstat) show the 2 processes listening on the same endpoint. OS delivers the data to A B A B…

https://bintanvictor.wordpress.com/2017/04/29/so_reuseport-socket-option/ shows an advanced kernel feature to let multiple processes bind() to the same endpoint.

For multicast (UDP only) two processes can listen to the same UDP endpoint. See [3] and [2]

A Unix domain socket can be shared between two unrelated processes.


[1] http://stackoverflow.com/questions/670891/is-there-a-way-for-multiple-processes-to-share-a-listening-socket

[2] http://stackoverflow.com/questions/1694144/can-two-applications-listen-to-the-same-port

[3] http://www.tldp.org/HOWTO/Multicast-HOWTO-2.html

%%logging decorator with optional args

Latest is Uploaded to github: https://github.com/tiger40490/repo1/blob/py1/py/loggingDecorator.py

I hope the wordpress code formatting renders the source code correctly:

def log3(funcOrMsg=None, named_arg=None):
    '''arg is Optional. You can use any of:
    @log3(named_arg='specific msg') # some prefer named argument for clarity
    @log3('msg2') or
    @log3 # bare, without parentheses
    '''
    arg1_isCallable = callable(funcOrMsg)
    arg1_isStr      = isinstance(funcOrMsg, basestring)
    arg1_isNone     = funcOrMsg is None

    def decorated(func):

        def wrapper(*args, **kwargs):
            if named_arg: print 'named_arg = ' + str(named_arg)
            tmp = funcOrMsg if arg1_isStr else ''
            logger.info(tmp + ' pym ver = ' + str(logger.pymodelsVer),
              extra={'name_override' : func.__name__})
              ### set name_override to func.__name__ in a kwarg to info()
            return func(*args, **kwargs)
        return wrapper
        ## end wrapper

    if arg1_isCallable:
        return decorated(funcOrMsg) # decorator received no-arg
    # decorator had kwargs   or   positional arg
    assert     arg1_isNone   or   arg1_isStr
    return decorated
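
For readers without the github file, here is a stripped-down sketch of the same optional-argument technique, written for Python 3. It is not the uploaded version: the call_log list replaces the real logger, functools.wraps is my addition, and f1, f2 and f3 are throwaway functions.

```python
import functools

call_log = []  # stand-in for a real logger, to keep the sketch self-contained

def log3(func_or_msg=None, named_arg=None):
    def decorated(func):
        @functools.wraps(func)           # keep func.__name__ on the wrapper
        def wrapper(*args, **kwargs):
            msg = func_or_msg if isinstance(func_or_msg, str) else ''
            call_log.append((func.__name__, msg, named_arg))
            return func(*args, **kwargs)
        return wrapper

    if callable(func_or_msg):
        return decorated(func_or_msg)    # used bare: @log3
    assert func_or_msg is None or isinstance(func_or_msg, str)
    return decorated                     # used with arguments: @log3('msg') or @log3(named_arg=...)

@log3
def f1(x):
    return x + 1

@log3('msg2')
def f2(x):
    return x * 2

@log3(named_arg='specific msg')
def f3(x):
    return -x

results = (f1(1), f2(2), f3(3))
print(results)  # (2, 4, -3)
```

The trick is that a bare @log3 passes the function itself as the first argument, while @log3(...) passes a string or nothing, so callable() tells the two cases apart.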

## innovative features of python

Here’s my answer to a friend’s question “what innovative features do you see in python”

  • decorators. Very powerful. Perhaps somewhat similar to AOP. Python probably borrowed it from Haskell?
  • dynamic method/attribute lookup. Somewhat similar to C# “dynamic” keyword. Dangerous technique similar to java reflection.
  • richer introspection than c# (which is richer than java)
  • richer metaprogramming support (including decorator and introspection) … Vague answer!
  • enhanced for-loop for a file, a string,
  • listcomp and genexpr
  • Mixin?
  • I wrote a code gen to enrich existing modules before importing them. I relied on hooks in the importation machinery.

python q[import]directive complexity imt java/c++/c#

I would say it “does more work” not just “more complicated”…

Name conflicts, name resolution … are the main purpose of (import/using/include) in java/c++/c#. (Side question — Can c++ header files execute arbitrary statements? I think so since it’s just pasted in… Minor question)

In contrast, python modules are executed line by line the first time they are imported. P300 [[programming python]]. I think this can include arbitrary statements. This is the major departure from “Tradition”.

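Note “from … import …” is no more traditional in this respect: the first import still executes the whole module, and only then are the requested names copied into the importer’s namespace.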
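
A small experiment shows what an import really executes. This sketch (Python 3) writes a throwaway module named noisy_mod_demo into a temp directory; the module name, its contents and the helper function are all invented for the test.

```python
import contextlib
import importlib
import io
import os
import sys
import tempfile

MODULE_SOURCE = (
    "print('noisy_mod_demo: top-level code is running')\n"
    "VALUE = 42\n"
)

def demo_import_side_effects():
    tmp_dir = tempfile.mkdtemp()
    with open(os.path.join(tmp_dir, "noisy_mod_demo.py"), "w") as fh:
        fh.write(MODULE_SOURCE)
    sys.modules.pop("noisy_mod_demo", None)   # make sure we start from a cold import
    sys.path.insert(0, tmp_dir)
    importlib.invalidate_caches()
    captured = io.StringIO()
    try:
        with contextlib.redirect_stdout(captured):
            from noisy_mod_demo import VALUE   # first import: the module body runs
            import noisy_mod_demo              # already in sys.modules: body does not run again
    finally:
        sys.path.remove(tmp_dir)
    return VALUE, captured.getvalue().splitlines()

value, printed = demo_import_side_effects()
print(value, printed)
```

The print statement inside the module fires exactly once, and it fires even though the first import was of the from-import form.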

python template method; parent depending on child

Background — classic template method pattern basically sets up a base class dependency on (by calling) a subclass method, provided the method is abstract in base class.

Example — doHtml(), doParameters() and doProperties() methods are abstract in the base EMPanel class.

1) Python pushes the pattern further, when method can be completely _undeclared_ in base class.  See runCommand() in example on P222 [[Programming Python]].

* When you look at the base class in isolation, you don’t know what self.runCommand() binds to. It turned out it’s declared only in subclass.

2) Python pushes the pattern stillllll further, when _undeclared_ fields can be used in base class. The self.menu thing looks like a data field but undeclared. Well, it’s declared in a subclass!

3) I have yet to try a simple example but multiple sources [3] say python pushes the pattern yeeeet further, when a method can be invoked without declaring it in any class — if it’s declared in an Instance. That instance effectively is an instance of an anonymous subclass (Java!).

* There’s no compiler to please! At run time, python can “search” in instance and subclass scopes, using a turbo charged template-method search engine.

In conclusion, at creation time a python base class can freely reference any field or method even if base class doesn’t include them in its member-listing.
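
The three steps can be sketched as follows. This is not the EMPanel or [[Programming Python]] code; BasePanel, ChildPanel and their members are made up for illustration.

```python
import types

class BasePanel(object):
    # Neither run_command() nor self.menu is declared anywhere in this class.
    def render(self):
        return '{0}: {1}'.format(self.menu, self.run_command())

class ChildPanel(BasePanel):
    menu = 'child menu'                 # field supplied by the subclass
    def run_command(self):              # method supplied by the subclass
        return 'run by subclass'

# Step 3: supply both names on one instance, with no subclass at all.
plain = BasePanel()
plain.menu = 'instance menu'
plain.run_command = types.MethodType(lambda self: 'run by instance', plain)

from_child = ChildPanel().render()
from_instance = plain.render()

try:
    BasePanel().render()
    failed = False
except AttributeError:
    failed = True   # nothing supplied the missing names, so the lookup fails at run time

print(from_child)     # child menu: run by subclass
print(from_instance)  # instance menu: run by instance
```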

[3] P96 [[ref]]

python LGB rule, %%take

http://my.safaribooksonline.com/book/programming/python/1565924649/functions/ch04-28758 clarifies many key points discussed below.

(The simple-looking “global” keyword can be tricky if we try to understand its usage too early.)

The key point highlighted inadequately (except the above book) is the distinction between assignment and reference.

* AA – Assignment – When a variable appears on the LHS, it’s (re)created in some namespace [1], without name lookup. By default, that namespace is the local namespace.
* RR – Reference – any other context where you “use” a variable, it’s assumed to be a pre-existing variable. Query and method invocation are all references. Requires name lookup, potentially causing an error.

Let me test you on this distinction —

Q: what happens if I Assign to a non-existent variable?
A: I believe it will get created.

Q: what happens if I Reference a non-existent variable?
A: I believe it’s a NameError, in any context

The LGB search sequence is easily understood in the RR context. Failed search -> error.

In the AA context, there’s no need to look up, as the variable is simply created.

shadowing — local var often shadows a global var. This makes sense in the RR case, and I think also in the AA case.

[1] A namespace is probably implemented as an “idict”, a registry (presumably) mapping variable names to object addresses.

Now we are ready to look at the effect of keyword “global”.

RR – the LGB search sequence shrinks to “GB”, skipping the local scope.
AA – the variable is simply (re)created in the G rather than L namespace.

http://my.safaribooksonline.com/book/programming/python/1565924649/functions/ch04-28758 points out that “Global names needs to be declared only if they are assigned in a function.” If not assigned, then no shadowing concern.
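
The assignment-versus-reference distinction can be sketched like this. It is meant to run as a top-level script, and the variable and function names are arbitrary.

```python
counter = 0          # module-level (global) variable

def read_only():
    return counter + 1           # reference only: the LGB lookup finds the global

def shadow():
    counter = 100                # assignment creates a new local; the global is untouched
    return counter

def rebind():
    global counter
    counter = counter + 1        # assignment now targets the global namespace
    return counter

def broken():
    try:
        counter += 1             # assignment makes `counter` local, so the read fails
    except NameError as exc:     # UnboundLocalError is a subclass of NameError
        return type(exc).__name__

results = (read_only(), shadow(), rebind(), counter, broken())
print(results)  # (1, 100, 1, 1, 'UnboundLocalError')
```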

map reduce filter zip …

Importance — these handful of functions are the core FP features in Python, according to http://www.ibm.com/developerworks/linux/library/l-prog/index.html?ca=drs-

I feel it’s good to tackle the family members one by one, as a group. I believe each has some counterpart in either perl, C++, java and most likely in c#, so we can just list one counterpart to help us grasp each.

Bear in mind how common/uncommon each function is. Don’t overspend yourself on the wrong thing.

Don’t care about list vs tuple for now! Don’t care about advanced usages. Just grasp one typical usage of each.

http://www.lleess.com/2013/07/python-built-in-function-map-reduce-zip.html is concise.

–filter() = grep in perl

filter(function, sequence) -> sequence

>>> filter(lambda d: d != 'a', 'abcd')

–map() = map in perl

>>> map(lambda a: a+1, [1,2,3,4] )
[2, 3, 4, 5]

–reduce = Aggregate() in c#
reduce(function, sequence [, initial]) -> value

>>> reduce(lambda x, y: x+y, range(0,10), 10)

–zip is less common

–apply() is deprecated
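
One typical usage of each, as a sketch for Python 3. There, map(), filter() and zip() return lazy iterators and reduce() lives in functools, hence the list() and join() wrappers. The result names are my own.

```python
from functools import reduce  # reduce moved to functools in Python 3

filtered = ''.join(filter(lambda d: d != 'a', 'abcd'))   # like grep in perl
mapped = list(map(lambda a: a + 1, [1, 2, 3, 4]))
reduced = reduce(lambda x, y: x + y, range(0, 10), 10)   # 10 is the initial value
zipped = list(zip('abc', [1, 2, 3]))                     # pairs items position by position

print(filtered)  # bcd
print(mapped)    # [2, 3, 4, 5]
print(reduced)   # 55
print(zipped)    # [('a', 1), ('b', 2), ('c', 3)]
```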

python – important details tested on IKM

Today I did some IKM python quiz. A few tough questions on practical (not obscure) and important language details. You would need to invest months to learn these. I feel a regular python coding job of 2 years may not provide enough exposure to reach this level.

Given the amount of effort (aka O2, laser) invested, I feel LROTI would be much higher in c++, java and WPF (slightly less in general c# and swing). One problem with LROTI on WPF is churn.

As we grow older and have less time to invest, LROTI is a rather serious consideration. I no longer feel like superman in terms of learning new languages.

##some python constructs to understand

These are the features I feel likely to turn up in production source code or interviews, so you need to at least recognize what they mean but need not know how to use exactly. (A subset of these you need to Master for writing code but let’s not worry about it.)

List of operators, keywords and expressions are important for this purpose

Most built-in Methods are self-explanatory.

nested list-comprehension (python): untangled#venkat

>>> [innerItr for outerItr in [['_'+base , base+'Const']  for base in 'VectorInteger EquityAssetVolatilitySurface'.split()]  for innerItr in outerItr]

['_VectorInteger', 'VectorIntegerConst', '_EquityAssetVolatilitySurface', 'EquityAssetVolatilitySurfaceConst']

—– That’s the python IDLE response. Now let’s unravel it. —–

The easy part — the split expression returns a LIST of strings.

[[…] for base in …split()] returns a List of List (LoL). Note You can’t inject into output stream 2 items for each input stream item. List comprehension only supports injecting one item. See P86 [[python essential ref]].

outerItr in [..split()] – “outerItr” is a loop variable bound to each inner List in turn. To open up an inner list, we put for innerItr in outerItr at the END. This resembles a double for-loop – as explained in P85 [[python essential ref]].

First expression inside the outermost [], before the very first “for” is the select-clause. In our case it’s the expression “innerItr”. Like in SQL, this is the last thing parsed. This expression is evaluated inside the double-for-loop

Note the outermost [] pair is required. It makes the entire expression a list-expression. If you omit the pair you get a SyntaxError; parentheses instead would turn it into a generator expression.

>>> [outerItr for outerItr in [['_'+base , base+'Const']  for base in 'VectorInteger EquityAssetVolatilitySurface'.split()]]
>>> [['_'+base , base+'Const']  for base in 'VectorInteger EquityAssetVolatilitySurface'.split()]
---> same response
[['_VectorInteger', 'VectorIntegerConst'], ['_EquityAssetVolatilitySurface', 'EquityAssetVolatilitySurfaceConst']]

>>> [outerItr[0] for outerItr in [['_'+base , base+'Const']  for base in 'VectorInteger EquityAssetVolatilitySurface'.split()]]
['_VectorInteger', '_EquityAssetVolatilitySurface']
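
To double-check the reading order, here is the same expression next to its explicit-loop equivalent. This is a sketch with my own variable names.

```python
names = 'VectorInteger EquityAssetVolatilitySurface'.split()

nested = [inner
          for outer in [['_' + base, base + 'Const'] for base in names]
          for inner in outer]

# The same thing as explicit loops, reading the for-clauses left to right.
list_of_lists = []
for base in names:
    list_of_lists.append(['_' + base, base + 'Const'])

flat = []
for outer in list_of_lists:      # first for-clause
    for inner in outer:          # second for-clause
        flat.append(inner)       # the leading "select" expression runs innermost

print(nested == flat)  # True
```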


python – some performance tips { AndyZhao

1) *.pyc = java class files, pre-compiled. Similar to jdbc prepared statement.

2) PYTHONPATH = classpath. Probably the most important performance tip, according to Andy.

3) Minimize run-time name look-up —

from time import time
from math import sqrt
import math
d = 0
start = time()
for i in xrange(10000000):
    d += sqrt(i)        # one name look-up for "sqrt"
print time() - start

d = 0
start = time()
for i in xrange(10000000):
    d += math.sqrt(i)           # 2 name look-ups for "math" and "sqrt"
print time() - start

python: very few named free functions

In python, Most operations are instance/static methods, and the *busiest* operations are operators.

Free-standing functions aren’t allowed in java/c# but are the mainstay of C, Perl and PHP. Python has them but very few.

— perl-style free functions are a much smaller population in python, therefore important. See the 10-pager P135[[py ref]] —
map(), apply()
min() max()

— advanced free functions such as introspection
repr() str() — related to __repr__() and __str__()
type(), id(), dir()
isinstance() issubclass()
eval() execfile()
getattr() setattr() delattr() hasattr()
range(), xrange()
?yield? not a free-function, but a keyword!

## 6 kinds of q[ del ] in python

(Note in most quick and dirty scripts, we seldom need to delete stuff. In other scripts, deletion is much less common than insertion.) If you are overwhelmed, just remember Most common uses are dict-delete and list-delete-by-index.

1) del myDict[“some key”] # syntactically closer to List-delete than Attribute-delete

2) del myList[index] # [Note A]
2b) del myList[ sliceStart : sliceEnd ]

[A] myList.remove(someValue) is deletion by value, not by index

——— below are more advanced (read “obscure”)
3) del myVar # removes the name “myVar” but not necessarily the object. Note this is Not related to
3b) __del__(self) # like java finalizer. I think this runs only when the ref count drops to zero

4) del myObj.myAttr # same as
4b) __delattr__(self, myAttr) # basically remove from the idict

5) these are defined in built-in dict and list classes
__delitem__(self,index_or_key) # implements list-delete or dict-delete
__delslice__ # implements slice-delete

6) advanced Descriptor-delete
__delete__(self, instance)
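
The common kinds (1 to 4) in one sketch, meant to run as a top-level script. The sample data and the class Box are made up.

```python
my_dict = {'some key': 1, 'other': 2}
del my_dict['some key']            # 1) dict-delete by key

my_list = [10, 20, 30, 40, 50]
del my_list[0]                     # 2) list-delete by index -> [20, 30, 40, 50]
del my_list[1:3]                   # 2b) slice-delete -> [20, 50]
my_list.remove(50)                 # [A] delete by value -> [20]

class Box(object):
    pass

box = Box()
box.attr = 'x'
del box.attr                       # 4) attribute-delete

alias = box
del box                            # 3) removes the name only; the object lives on via alias
name_gone = 'box' not in dir()

print(my_dict, my_list, name_gone)  # {'other': 2} [20] True
```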


  1. LC (list comp) builds up a physical list in memory. LC needs you, the creator, to put the brackets [….] around it. Now it looks like a list (and really IS a list)
  2. GE (generator expression) doesn’t create a list in memory. GE “yields” one “fruit” at a time on demand. GE needs the parentheses (…).

Declarative programming — a generator Expression is a specification for producing a stream of items.

LC analogy: range() builds the whole list up front (in python2)
GE analogy: xrange() hands out one item at a time, on demand

What if you remove the brackets around an LC? It won’t compile (SyntaxError), unless the expression is the sole argument of a function call, where it is read as a GE.
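
A quick side-by-side as a sketch; the variable names are arbitrary.

```python
squares_list = [n * n for n in range(5)]    # LC: the whole list exists in memory now
squares_gen = (n * n for n in range(5))     # GE: nothing is computed yet

first = next(squares_gen)                   # computed on demand
rest = list(squares_gen)                    # drains what is left
leftover = list(squares_gen)                # a generator can be consumed only once

total = sum(n * n for n in range(5))        # sole argument: the call's parentheses suffice

print(squares_list)           # [0, 1, 4, 9, 16]
print(first, rest, leftover)  # 0 [1, 4, 9, 16] []
print(total)                  # 30
```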

Now we focus on the “yield” keyword

  • Any function using yield is a generator-function and should not [3] use “return”.
  • The unique capability of a generator function is “stop, save state, resume [1]”. To start with, you may think of it as a stateful function with its own private memory.
  • generator functions often take arguments to initialize or customize themselves

if you happen to print the return value of myYieldFunc() , you realize it’s a generator object with an address. Basically, this is the uncommon way [2] to use myYieldFunc.  Instead, myYieldFunc() should be called, with parentheses, in “iteration context”. In iteration context, under the hood the compiled code basically instantiates (once) and queries (repeatedly) the generator object.

Q: do I always use myYieldFunc with parentheses?
%%A: I think so.

[1] [[py cookbook]] says this is such a simple yet powerful idea that it was pushed to the limit (just like the unix pipe).

https://www.programiz.com/python-programming/generator compares generator functions to regular functions.

This “yield” (also present in c#) serves a similar purpose to generator expressions and can express more complex logic. https://stackoverflow.com/questions/16780434/yield-vs-generator-expression-different-type-returned sheds some light but is on python3

— [2] using the returned generator object directly
Most of the time we use myYieldFunc() in iteration context, but we can (but why) do this:

myGenObj = myYieldFunc(); myGenObj.next()
myGenObj = myYieldFunc(); next(myGenObj) # builtin free function

— [3] combining yield and return
A generator function like myYieldFunc() should almost never use q[ return ]. However, a regular function f2() can do a return myYieldFunc() with or without args. I think this f2() should be used as a generator function, in an iteration context.

If you do call return inside myYieldFunc(), better document it. The “return” is like break. Usually you can avoid “return”.

https://stackabuse.com/python-generators/ has more details on return. See also P698 [[cookbook]]
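
To tie notes [2] and [3] together, here is a sketch with a hypothetical generator function count_up(), standing in for myYieldFunc(). It uses the builtin next(), which works in both python2.6+ and python3.

```python
def count_up(limit, stop_at=None):
    """Hypothetical generator function: yields 0, 1, 2, ... below limit."""
    n = 0
    while n < limit:
        if n == stop_at:
            return          # ends the generator early, much like break
        yield n             # pause here; local state is kept until the next request
        n += 1

in_loop = [x for x in count_up(4)]          # the usual iteration context

gen_obj = count_up(4)                       # calling only creates the generator object
first = next(gen_obj)
second = next(gen_obj)

cut_short = list(count_up(10, stop_at=3))   # the bare return stops it at 3

def f2():
    return count_up(2)                      # regular function handing back a generator object

via_f2 = list(f2())

print(in_loop, first, second, cut_short, via_f2)
# [0, 1, 2, 3] 0 1 [0, 1, 2] [0, 1]
```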

python exec keyword ^ eval() function

Based on my experience in other scripting languages, I don’t think these are widely used in everyday scripting, but veterans must know.

– exec is a keyword[1] and not a function so can’t return a value.
– eval() is a function so returns a value.

Both accept a string “argument”, or (more advanced) code object from compile() function.
[1] Other keywords include while/for/lambda and raise/try/except (note “except” is like “catch”)

See P90 [[py ref]]
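
A short sketch of the difference. One caveat: this is written for Python 3, where exec became a builtin function (returning None) instead of the python2 keyword described above. The namespace dict and the code strings are made up.

```python
namespace = {}

value = eval('1 + 2 * 3')                      # expression in, value out
exec('x = 5\ny = x * 2', namespace)            # statements in; names land in the dict

code_obj = compile('x + y', '<demo>', 'eval')  # precompiled code object
combined = eval(code_obj, namespace)

exec_result = exec('z = 1', namespace)         # Python 3: exec() returns None

print(value, namespace['y'], combined, exec_result)  # 7 10 15 None
```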

dict – at the heart of python, perl and javascript OO

Q: why is dict so central to python OO?
%%A: method object — python methods are easily modeled as code objects (just like fields), therefore more dict-friendly. Recall the regular dict stores key and value objects.
%%A: ultra-virtual — both field and method lookup is a runtime dict search. Note java/c++ does runtime search only for methods, not fields.
%%A: namespace — is a kind of dict. Namespaces are central to inheritance (and presumably overriding)

dir(), vars(),__dict__  … often returns something like a dictionary.

Note the dict in this context is a kind of “kernel-dict”. On the other hand, the user-defined dict is probably a “userland dict”. They share the same syntax. Implementation wise, they may differ. kernel-dict is probably memory efficient and is implemented in C. These various types of dict probably implement a consistent “interface”.

When people say an “import sys” creates an object named “sys”, it probably means a dict created in memory.

In Perl OO, a hash is at the heart of every class. This is the “userland dict”.

In javascript, “associative array” is closely related to (if not the basis of) user-defined objects

every python thingy is an object with a type

I like this particular insight in [[py ref]] (i.e. “python essential reference”) though I can’t easily verify it in code — Every thingy in a python program is an object with a type. (There might possibly be some exceptions but this is a rather good starting point.)

* An object is defined, same way in C++, as a storage_location, with an (obviously) immutable address.
** content could be mutable.
** Even an int is an object, unlike java

* Every storage_location (Say ObjectA) has a specific data_type. Required by python interpreter, java compiler or c++ compiler… That specific data_type is technically a Type OBJECT.
* [[peref]] made it clear that Type tells us the features supported by this storage_location. Features include methods and fields…

The thing about the python language is that Everything seems to be an object with a type. Here are some special objects and their types
$ a free function. This OBJECT has a __name__ attribute. Try qq/dir(len)/
$ a bound instance method. Such an OBJECT has an im_self attribute.
$ a bound class method. Such an OBJECT has a __name__ attribute
$ (There’s no such thing as a bound static method — nothing to bind.)
$ a regular instance method. Such an OBJECT has a __self__ attribute pointing to the host OBJECT
$ a regular class method. Such an OBJECT has a __name__ attribute
$ a regular static method. Such an OBJECT has a __name__ attribute

Warning — these special objects each have a distinct data_type but not by type()! It’s worthwhile to step back and recognize these special OBJECTS as separate data_types with their own features. The “feature set” difference between any 2 data_types is reflected in various ways (such as the attributes…)

* the built-in type() function is an almost-perfect way to see the data_type of a storage_location, but I don’t think it reveals all the details I need in a foolproof way.
* The “type” OBJECT is comparable to type_info objects and java “class” objects.
* What’s the type of the “type” OBJECTS? We don’t need to know.
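
The insight is easier to verify than I expected. Here is a sketch for Python 3, where the binding attribute is __self__ (python2 also exposes im_self). The class Host and its methods are made up.

```python
class Host(object):
    def inst(self):
        return 'inst'

    @classmethod
    def cls_m(cls):
        return 'cls'

    @staticmethod
    def static_m():
        return 'static'

h = Host()
type_names = {
    'free function': type(len).__name__,
    'bound instance method': type(h.inst).__name__,
    'bound class method': type(Host.cls_m).__name__,
    'static method': type(Host.static_m).__name__,
    'int': type(5).__name__,
    'type of a type': type(type(5)).__name__,
}
bound_to_host = h.inst.__self__ is h          # instance method is bound to the instance
bound_to_class = Host.cls_m.__self__ is Host  # class method is bound to the class

for label in sorted(type_names):
    print(label, '->', type_names[label])
```

A static method comes back as a plain function with nothing bound to it, and the type of a type object is simply type.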