edit1file]big python^c++ prod system

Q1: suppose you work in a big, complex system with 1000 source files, all in python, and you know a change to a single file will only affect one module, not a core module. You have tested it and run a 60-minute automated unit test suite. You didn’t run the prolonged integration test that’s part of the department-level full release. Would you and the approving managers have the confidence to release this single python file?
A: yes

Q2: change “python” to c++ (or java or c#). You already followed the routine to build your change into a dynamic library, tested it thoroughly and ran the unit test suite but not the full integration test. Do you feel safe to release this library?
A: no.

Assumption: the automated tests were reasonably well written. I have never worked in a team that measured test coverage. I would guess even 50% is too high and often impractical. Even with high measured test coverage, the risk of bugs is roughly the same. I never believed higher unit test coverage is a vaccination. Diminishing return. Low marginal benefit.

Why the difference between Q1 and Q2?

One reason — the source file is compiled into a library (or a jar), along with many other source files. This library is now a big component of the system, rather than one of 1000 python files. The managers will see a library change in c++ (or java) vs a single-file change in python.

Q3: what if the change is to a single shell script, used to start/stop the system?
A: yes. Manager can see the impact is small and isolated. The unit of release is clearly a single file, not a library.

Q4: what if the change is to a stored proc? You have tested it and run the full unit test suite but not a full integration test. Will you release this single stored proc?
A: yes. One reason is transparency of the change. Managers can understand this is an isolated change, rather than a library change as in the c++ case.

How do managers (and anyone except yourself) actually visualize the amount of code change?

  • With python, it’s a single file so they can use “diff”.
  • With stored proc, it’s a single proc. In the source control, they can diff this single proc.
  • With c++ or java, the unit of release is a library. What if in this new build, besides your change, there’s some other change included by accident? You can’t diff a binary 😦

So I feel transparency is the first reason. Transparency of the change gives everyone (not just yourself) confidence about the size/scope of this change.

Second reason is isolation. I feel a compiled language (esp. c++) is more “fragile” and the binary modules more “coupled” and inter-dependent. When you change one source file and release it in a new library build, it could lead to subtle, intermittent concurrency issues or memory leaks in another module, outside your library. Even if you as the author see evidence that this won’t happen, other people have seen innocent one-line changes giving rise to bugs, so they have reason to worry.

  • All 1000 files (in compiled form) run in one process for a c++ or java system.
  • A stored proc change could affect DB performance, but it’s easy to verify. A stored proc won’t introduce subtle problems in an unrelated module.
  • A top-level python script runs in its own process. A python module runs in the host process of the top-level script, but a typical top-level script will include just a few custom modules, not 1000 modules. Much better isolation at run time.

There might be python systems where the main script actually runs in a process with hundreds of custom modules (not counting the standard library modules). I have not seen it.

java GC frequency and duration

Actually, GC overhead is more important than GC frequency or duration, except in low latency systems. This blog has many posts on overhead.


–frequency: could be every 10 sec, as documented in my blog.

–stop-the-world duration: (Concurrent collection probably doesn’t worry us as much.)

100 msec duration is probably good enough for most apps but too long for latency sensitive apps, according to my blog.


c++iv: Jump#2

Mostly QQ type of questions. I feel I may have to give up on some of the very low level (perf optimization) topics. I feel java and c# interviews are not so low-level.

Q: What data type would you use for the tasks in a thread pool?
(I find this question too advanced. c++11 offers Futures…)
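One common choice, offered here as an assumption rather than the answer the interviewer expected, is a type-erased callable such as std::function<void()>. Below is a minimal single-threaded sketch of the idea. TaskQueue, submit() and runOne() are hypothetical names, and a real pool would protect the queue with a mutex and let worker threads call runOne() in a loop.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>

// A task is "anything callable with no arguments and no return value".
// Lambdas, function pointers and bound member functions all fit.
using Task = std::function<void()>;

// Hypothetical queue of pending tasks. Not thread-safe as written:
// a real pool would add a mutex and a condition variable around tasks_.
class TaskQueue {
public:
    void submit(Task t) {
        assert(t && "submitting an empty std::function would throw when run");
        tasks_.push(std::move(t));
    }

    // Runs the oldest pending task. Returns false if nothing was pending.
    bool runOne() {
        if (tasks_.empty()) return false;
        Task t = std::move(tasks_.front());
        tasks_.pop();
        t();
        return true;
    }

    std::size_t pending() const { return tasks_.size(); }

private:
    std::queue<Task> tasks_;
};
```

Usage would look like q.submit([&sum] { sum += 1; }); followed by q.runOne() on a worker. If the caller needs a result back, the c++11 futures mentioned above are one way to carry it, which this sketch leaves out.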

Q: After malloc(), how do you cast the pointer to MyClass* ? Do you call the ctor? How?
(This is asked again by Alex of DRW)
A: placement-new?
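A minimal sketch of the placement-new answer, under stated assumptions: MyClass here is a hypothetical stand-in with two fields, and constructInMalloc() and destroyAndFree() are names made up for illustration. The point is that a plain cast of the malloc() pointer runs no ctor, whereas placement new runs the ctor in the storage you already own.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <new>       // placement new, std::bad_alloc
#include <string>
#include <utility>

// Hypothetical class standing in for the MyClass of the question.
struct MyClass {
    std::string name;
    int value;
    MyClass(std::string n, int v) : name(std::move(n)), value(v) {}
};

// Step 1: malloc() raw bytes. Step 2: run the ctor in those bytes.
inline MyClass* constructInMalloc(const std::string& name, int value) {
    void* raw = std::malloc(sizeof(MyClass));
    if (raw == nullptr) throw std::bad_alloc();
    // Sanity check: malloc() storage should be aligned suitably for MyClass.
    assert(reinterpret_cast<std::uintptr_t>(raw) % alignof(MyClass) == 0);
    try {
        return new (raw) MyClass(name, value);  // constructs, allocates nothing
    } catch (...) {
        std::free(raw);  // the ctor threw, so give the bytes back
        throw;
    }
}

// Mirror image: explicit dtor call, then free(). Never call delete here,
// because the memory did not come from operator new.
inline void destroyAndFree(MyClass* p) {
    if (p == nullptr) return;
    p->~MyClass();
    std::free(p);
}
```

A caller would write MyClass* p = constructInMalloc("abc", 7); and later destroyAndFree(p);.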

Inter-thread communications in thread pool – how does it work?
Thread pool — Your resume mentioned your home-made thread pool? How?
Boost::any, boost::bind, boost::function
CPU cache – how do you use it to improve performance? Any specific techniques?
Stack size – who controls it? At compile time or run time?
Stack overflow – who can detect it and print an error msg? JVM can do it but what if there’s no VM?
Shared ptr – how is it implemented?
Scoped lock – what is it, why use it?
Your bash shell customizations as a cpp developer?
$LD_LIBRARY_PATH — what is it?

dotnet unmanaged RESOURCES, learning notes

(Many open questions below are a bit arcane and may not be relevant to IV or projects.)

I feel “managed” means “automatically-released”. In that case, most dotnet Objects qualify as managed “Resources”. Any time you need Dispose() in your CLASS, that’s a telltale sign of unmanaged resources.

“Resource” is a rather abstract term, best understood by example[1]. In my mind, a resource is some runtime object(s) needed by our App, something constructed and configured. Like a natural resource it is scarce, shared and rationed. Like a library book, there’s usually an acquisition “protocol”, and a return protocol.

[1] As a crude example, what’s a city and what’s not a city? Best learn by example.

http://stackoverflow.com/questions/13786570/determine-managed-vs-unmanaged-resources has some interesting comments.

–Unmanaged resource and … IntPtr + handle ?
What’s an IntPtr? Basically a struct that represents a native void pointer (or a native handle) —
http://stackoverflow.com/questions/1148177/just-what-is-an-intptr-exactly Many unmanaged “resources” are accessed via IntPtr.

–Unmanaged resource and unsafe code?

–Unmanaged resource and p/invoke?

–Unmanaged resource and … win32 + COM?

–some managed resources
A FileStream is a MR. It contains a native file handle, which is UR. It’s an IntPtr, not an integer-based file descriptor as in C/python. See MSDN.

A socket object is a MR. It contains a socket handle, which is UR and an IntPtr.

A DB connection is probably UR.

%% c# brain bench

Q: what modifiers are equivalent to static when defining a class? Perhaps an invalid question.
Q: build a comparison lambda using expression tree. Will we use Binary.. or Boolean….?
Q: foreach on my custom class … will call which methods — MoveNext/Current or GetEnumerator()
Q: calling ToUpper() on a dynamic variable, where ToUpper is both an instance method and an extension method?
%%A: instance method takes precedence. Ext method may not be found at all. But there are workarounds.

Q: benefit of compiling an expression tree?
Q: can I assign twice to an out parameter within Method1? I think there’s no restriction.
Q: how to p/invoke on a win32 DLL
Q: execution order between base ctor, my ctor, static ctor of a type used in my ctor
Q: GetType().ToString() on System.NotSupportedException shows “System….” ?
Q: linq group-by

Q: can Main() method take 0 parameter?
AA: yes

Q: in Main(), can I initialize a variable before passing it to a out parameter of Method1?
%%A: yes, but it’s optional. Pre-initialization is compulsory for ref-params and optional for out-params. Consequently, the method is required to populate an out-param (since pre-initialization was possibly skipped) but not a ref-param.

Q: which is a managed resource – file handle, memory stream, socket, windows handle, db conn?
A: socket

Q: throw; vs throw ex;
A: throw; is better — more complete stack trace. see stack overflow

Q: can you pass an anonymous delegate into Thread ctor?
A: quite common

Q: static readonly field – set in static ctor or declaration?
A: both

use STL Map to emulate a Set (as JDK does)

Q1: is it practical to use a STL Map to emulate the Set class-template, at comparable efficiency?

We know a Map stores PAIRs in a red-black tree. Can we just put an empty thing in the PAIR.second field?
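To make the idea concrete, here is a minimal sketch of a set emulated on top of std::map, with an empty struct in the PAIR.second slot. MapBackedSet and Empty are hypothetical names chosen for illustration. This is not how any particular STL implements its set, only what the emulation could look like.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// The "empty thing" stored in PAIR.second. It carries no data.
struct Empty {};

// Hypothetical set built on std::map<K, Empty>. Same red-black tree,
// same O(log n) operations, with a dummy value riding along in each node.
template <typename K>
class MapBackedSet {
public:
    // Returns true if the key was newly inserted, false if already present.
    bool insert(const K& key) { return impl_.emplace(key, Empty{}).second; }
    bool contains(const K& key) const { return impl_.find(key) != impl_.end(); }
    // Returns true if the key was present and has been removed.
    bool erase(const K& key) { return impl_.erase(key) > 0; }
    std::size_t size() const { return impl_.size(); }

private:
    std::map<K, Empty> impl_;
};

// Small usage check: duplicates are rejected, as with a real set.
inline void mapBackedSetDemo() {
    MapBackedSet<int> s;
    assert(s.insert(3));
    assert(!s.insert(3));
    assert(s.size() == 1);
}
```

Time complexity matches a tree-based set. The open point is the per-node space taken by the dummy value, which is what Q2 and Q3 ask about.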

Q2: how small can the “PAIR.second” footprint be? 0 byte? 1 byte?
%%A: I don’t think it can be 0 byte. sizeof(PAIR) is known when you specialize the PAIR template with T1 and T2. sizeof(PAIR) depends on sizeof(T1) + sizeof(T2), plus any padding. Compiler needs to know sizeof(PAIR) in order to allocate memory to a new PAIR object.

Q3: is there any type T2 such that sizeof(T2) == 0?
%%A: I don’t know any.
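The compiler can check Q2 and Q3 for us. A small sketch, assuming a hypothetical empty struct named Nothing: a complete object of an empty class still occupies at least one byte, so a pair holding it is strictly bigger than the key alone. The last check shows the empty base optimization, a separate technique outside std::pair, where an empty base class adds no bytes.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical empty payload type: no data members at all.
struct Nothing {};

static_assert(std::is_empty<Nothing>::value,
              "Nothing has no non-static data members");

// Q3: even an empty class has a nonzero size as a complete object.
static_assert(sizeof(Nothing) >= 1,
              "sizeof of a class type is never 0");

// Q2: PAIR.second therefore costs at least 1 byte, often more after padding.
static_assert(sizeof(std::pair<int, Nothing>) > sizeof(int),
              "the empty second member still takes space inside the pair");

// Empty base optimization: as a base class, Nothing can occupy zero bytes.
// The equality holds on mainstream compilers for this standard-layout type.
struct KeyWithEmptyBase : Nothing {
    int key;
};
static_assert(sizeof(KeyWithEmptyBase) == sizeof(int),
              "empty base adds no bytes here");
```

So with a plain std::pair the dummy member cannot shrink to 0 bytes. The exact overhead depends on the key type's alignment.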

Q4: we know (confirmed) java HashSet is implemented using HashMap physically. Is that different from C++?

Q5: java TreeSet.java source code shows a physical implementation using a map. Is that different from STL?