##compiler-enforced documentation techniques #xLang

I’m a big fan of non-comment source documentation such as

  1. assert
  2. C++ enum
  3. typedef — “typedef int FibNum” as source documentation
  4. java annotations
  5. final/override (c++11), sealed, and const keywords

keyboard shortcuts to create in .bashrc

In RTS, I developed a personal “system”:

  • q(nx+) is alias to edit the nx.sh scirpt
  • q(nx-) is alias to run nx.sh
  • q(nx) is an alias to chdir into the nx project,
  • q($nx) variable has the nx project directory path
  • The above two shortcuts are set up in my very useful shell function exa()
  • I leave q(nx=) unassigned


non-local static class-instance: pitfalls

Google style guide and this MSDN article both warn against non-local static objects with a ctor/dtor.

  • (MSDN) construction order is tricky, and not thread-safe
  • dtor order is tricky. Some code might access an object after destruction 😦
  • (MSDN) regular access is also thread-unsafe, unless immutable, for any static object.
  • I feel any static object including static fields and local statics can increase the risk of memory leak since they are destructed very very late. What if they hold a growing container?

I feel stateless global objects are safe, but perhaps they don’t need to exist.

edit 1 file in big python^c++ production system #XR

Q1: suppose you work in a big, complex system with 1000 source files, all in python, and you know a change to a single file will only affect one module, not a core module. You have tested it + ran a 60-minute automated unit test suit. You didn’t run a prolonged integration test that’s part of the department-level full release. Would you and approving managers have the confidence to release this single python file?
A: yes

Q2: change “python” to c++ (or java or c#). You already followed the routine to build your change into a dynamic library, tested it thoroughly and ran unit test suite but not full integration test. Do you feel safe to release this library?
A: no.

Assumption: the automated tests were reasonably well written. I never worked in a team with a measured test coverage. I would guess 50% is too high and often impractical. Even with high measured test coverage, the risk of bug is roughly the same. I never believe higher unit test coverage is a vaccination. Diminishing return. Low marginal benefit.

Why the difference between Q1 and Q2?

One reason — the source file is compiled into a library (or a jar), along with many other source files. This library is now a big component of the system, rather than one of 1000 python files. The managers will see a library change in c++ (or java) vs a single-file change in python.

Q3: what if the change is to a single shell script, used for start/stop the system?
A: yes. Manager can see the impact is small and isolated. The unit of release is clearly a single file, not a library.

Q4: what if the change is to a stored proc? You have tested it and run full unit test suit but not a full integration test. Will you release this single stored proc?
A: yes. One reason is transparency of the change. Managers can understand this is an isolated change, rather than a library change as in the c++ case.

How do managers (and anyone except yourself) actually visualize the amount of code change?

  • With python, it’s a single file so they can use “diff”.
  • With stored proc, it’s a single proc. In the source control, they can diff this single proc. Unit of release is traditionally a single proc.
  • with c++ or java, the unit of release is a library. What if in this new build, beside your change there’s some other change , included by accident? You can’t diff a binary 😦

So I feel transparency is the first reason. Transparency of the change gives everyone (not just yourself) confidence about the size/scope of this change.

Second reason is isolation. I feel a compiled language (esp. c++) is more “fragile” and the binary modules more “coupled” and inter-dependent. When you change one source file and release it in a new library build, it could lead to subtle, intermittent concurrency issues or memory leaks in another module, outside your library. Even if you as the author sees evidence that this won’t happen, other people have seen innocent one-line changes giving rise to bugs, so they have reason to worry.

  • All 1000 files (in compiled form) runs in one process for a c++ or java system.
  • A stored proc change could affect DB performance, but it’s easy to verify. A stored proc won’t introduce subtle problems in an unrelated module.
  • A top-level python script runs in its own process. A python module runs in the host process of the top-level script, but a typical top-level script will include just a few custom modules, not 1000 modules. Much better isolation at run time.

There might be python systems where the main script actually runs in a process with hundreds of custom modules (not counting the standard library modules). I have not seen it.

java collection in a static field/singleton can leak memory

Beware of collections in static fields or singletons. By default they are unbounded, so by default they pose a risk of unexpected growth leading to memory leak.

Solution — Either soft or weak reference could help.

Q: why is soft reference said to support memory/capacity-sensitive cache?
A: only when memory capacity becomes a problem, will this soft reference show its value.

Q: is WeakHashMap designed for this purpose?
A: not an ideal solution. See other posts about the typical usage of WHM

Q: what if I make an unmodifiable facade?
A: you would need to ensure no one has the original, read-write interface. So if everyone uses the unmodifiable facade, then it should work.

##c++good habits like java q[final]

  • Good habit — internal linkage
    • ‘static’ keyword on file-level static variables
    • anonymous namespace
  • if (container.end() ==itrFromLower_bound) {…}
  • use typedef to improve readability of nested containers
  • typedef — “typedef int FibNum” as source documentation
  • initialize local variables upon declaration.
  • use ‘const/constexpr’ whenever it makes sense. The c++11 constexpr is even better.
  • [c++11] – use ‘override’ or ‘final’ keywords when overriding inherited virtual methods. Compiler will break if there’s nothing to override.

goto: justifications

Best-known use case: break out of deeply nested for/while blocks.

  • Alternative — extract into a function and use early return instead of goto. If you need “goto: CleanUp”, then you can extract the cleanup block into a function returning the same data type and replace the goto with “return cleanup()”
  • Alternative — extract the cleanup block into a void function and replace the goto with a call to cleanup()
  • Alternative (easiest) — exit or throw exception

Sometimes none of the alternatives are easy. To refactor the code to avoid goto requires too much testing, approval and release. The code is in a critical module in production. Laser surgery is preferred — introduce goto.


retreat to raw ptr from smart ptr ASAP

Raw ptr is in the fabric of C. Raw pointers interact/integrate with countless parts of the language in complex ways. Smart pointers are advertised as drop-in replacements but that advertisement may not cover all of those “interactions”:

  • double ptr
  • new/delete/free
  • ptr/ref layering
  • ptr to function
  • ptr to field
  • 2D array
  • array of ptr
  • ptr arithmetics
  • compare to NULL
  • ptr to const — smart ptr to const should be fine
  • “this” ptr
  • factory returning ptr — can it return a smart ptr?
  • address of ptr object

Personal suggestion (unconventional) — stick to the known best practices of smart ptr (such as storing them in containers). In all other situations, do not treat them as drop-in replacements but retrieve and use the raw ptr.

what to test #some brief observations

For a whitebox coder self-test, it’s tough to come up with real possible corner cases. It takes insight about the many systems that make up the real world. I would argue this type is the #1 most valuable regression tests.

If you see a big list of short tests, then some of them are trivial and simply take up space and waste developer time. So I’m a big fan of the elaborate testing frameworks.

Some blackbox tester once challenged a friend of mine to design test cases for a elevator. Presumably inputs from buttons on each level + in-lift buttons.

%%jargon — describing c++ templates

The official terminology describing class templates is clumsy and inconsistent with java/c#. Here’s my own jargon. Most of the words (specialize, instantiate) already have a technical meaning, so I have to pick new words like “concretize” or “generic”

Rule #1) The last Word is key. Based on [[C++FAQ]]
– a class-template — is a template not a class.
– a concretized template-class — is a class not a template

“concretizing” a template… Officially terminology is “instantiating”

A “concretized class” = a “template class”. Don’t’ confuse it with — “concrete” class means non-abstract

A “unconcretized / generic class template” = a “class template”.
(Note I won’t say “unconcretized class”as it’s self-contradictory.)

A “non-template” class is a regular class without any template.

Note concretizing = instantiating, completely different from specializing!

“dummy type name” is the T in the generic vector

“type-argument”, NOT “parameter“, refers to an actual, concrete type like the int in vector