opaque c++trouble-shooting: bustFE streamIn

This is a good illustration of fairly common opaque c++ problems, the most dreadful/terrifying species of developer nightmares.

The error seems to be somewhat consistent but not quite.

Reproducing it in dev enviroment was a first milestone. Adding debug prints proved helpful in this case, but sometimes it would take too long.

In the end, I needed a good hypothesis, before we could set out to verify it.

     81     bool SwapBustAction::streamInImpl(ETSFlowArchive& ar)
     82     { // non-virtual
     83       if (exchSliceId.empty())
     84       {
     85         ar >> exchSliceId;
     86       }
    104     }
    105     void SwapBustAction::streamOutImpl(ETSFlowArchive& ar) const
    106     { // non-virtual
    107       if (exchSliceId.size())
    108       {
    109         ar << exchSliceId;
    110       }

When we save the flow element to file, we write out the exchSliceId field conditionally as on Line 107, but when we restore the same flow element from file, the function looks for this exchSliceId field unconditionally as on Line 85. When the function can’t find this field in the file, it hits BufferUnderflow and aborts the restore of entire flow chain.

The serialization file uses field delimiters between the exchSliceId field and the next field which could be a map. When the exchSliceId field is missing, and the map is present, the runtime would notice an unusable data item. It throws a runtime exception in the form of assertion errors.

The “unconditional” restore of exchSliceId is the bug. We need to check the exchSliceId field is present in the file, before reading it.

In my testing, I only had a test case where exchSliceId was present. Insufficient testing.

#include order among my own headers

Each header has an “amount” of dependencies, each being another included header.

Some header files are very short and have only “‘system” #includes. They are known to be “light” headers with light dependencies.

Majority of my headers have more than two non-system #includes. Consider class declaration headers. The more custom #includes it has, the more dependencies it has.

My #include order is from heavy to light — Last to include are C headers. Above them are the c++ system headers. Among my custom header files, I include the simpler, lighter ones later. This way, if a heavy header is missing some #include (a coding error), it would (more likely) get caught by compiler.

A common practice is to put lots of “frequently used headers” in a util.h, and include it on top of every file. I don’t like it.

template specialization based on NDTTP=true

We know it’s possible to specialize a template for a concrete type like int or std::string, but I didn’t know that It’s also possible to

… specialize a (class or function) template for a particular compile-time const value (like “true”) of a NDTTP (like “bool flag”)

  • On [[Alexandrescu]] Page xii , Scott Meyers showed an elegant example of specializing for “true”. Note “true” is a value, not a data type !
  • P 34 has a longer example.

Note on reading TMP code — the template specialization syntax is clumsy and can add noise to the signal. Better ignore the syntax rules for now to focus on the gist.

simple Windows c++(python)set-up #Intellj is better4java

See also easiest windows GCC installer #c++17

Git-Bash + StrawberryPerl is the best combo for me on Windows. I put g++ commands in bash scripts to automate my GCC build and test. Note my typical project has at most 20 source files.

Git-Bash + StrawberryPerl + Notepad++ is better for me than any c++ IDE like Eclipse CDT (4 installations), Bloodshed (4), CodeBlocks (1), NetBeans…

  • I don’t need code completion..
  • I don’t need jump into a type definition.
  • I don’t need my debugger to be visual. StrawberryPerl includes gdb
  • I use Notepad++ for text search across hundreds of files
  • I use Notepad++ for search/replace
  • I use diff and git-diff to see code changes
  • I used github for version history

I’m such a die-hard fan of command line that the only GUI development tool I use is notepad++ and it’s optional. Since 2015, my Linux c++ code editor has been VIM, which I used on large codebases consisting of 1000+ source files.

For about 2 years EclipseCDT was my default choice, then the simpler Bloodshed became my default choice as it is simpler than EclipseCDT. However they are still over-complicated. I didn’t have a favorite until 2017, when I discovered Notepad++/Git-Bash/StrawberryPerl

Q: why do I use Eclipse for Java but not for c++?
A: Eclipse is more problematic, more messy for c++ than for java. The benefits (convenience, automation..) is offset by the high TCO (total cost of ownership). For example, java debugger, java code navigation, java renaming/refactor all work better than c++.

Q: how about MS VisualStudio?
A: I managed a large c++ codebase on MSVS 2015 in 2015-2016, about a year+. Too complicated, worse than MSVS for c#. I would never prefer it over my command line set-up, if given a choice.

By the way, Git-Bash works well with a standard python installation, but I do recommend the tweak at https://stackoverflow.com/questions/32597209/python-not-working-in-the-command-line-of-git-bash

m_activity IV story@investigation skill #RTS

symptom — array filled up beyond limit of 1024 .. Undefined Behavior, often crashing entire process, but no guarantees.

This “m_activity” array is a process-wide singleton, holding ALL active tcp/udp socket descriptors (each an int id). Every time we close a socket we are supposed to remove its id from the array, and shift all “upper” elements down.

Sometimes a connection can drop unexpectedly.

What’s this array for? We iterate this array frequently for timer events. We don’t but could use select() to monitor a bunch of sockets.

I found that when we reconnect after an unexpected disruption, we were not following a proper sequence
* check if connected
* if disconnected, then remove the socket id from the array
* connect
* upon success, append the new socket id to the array

Due to bug, in a unstable period, usually at start of day, we could drop connection and reconnect many times, and fill up this array within the first hour (often within minutes).

Hard to reproduce.

non-local static composite object: pitfalls

Google style guide and this MSDN article both warn against non-local static objects with a ctor/dtor.

  • (MSDN) construction order is tricky, and not thread-safe
  • dtor order is tricky. Some code might access an object after destruction 😦
  • (MSDN) regular access is also thread-unsafe, unless immutable, for any static object.
  • I feel any static object including static fields and local statics can increase the risk of memory leak since they are destructed very very late. What if they hold a growing container?

I feel stateless global objects are safe, but perhaps they don’t need to exist.

c++11,sockets,IPC in QnA IV #AshS

[18] Update — This has become one of the more  valuable posts in my c++ blog. It cuts down the amount of info overload and learning curve steepness by half then again by half.

Hi Ashish,

I now have a little theory on the relative importance of several c++ tech skills in a job candidate. I feel all of the skills below are considered “secondary importance” to most of the (15 – 30) interviewers I have met. These skill are widely used in projects, but if we say “no experience” in them, BUT demonstrate strength in core c++ then we win respect.

[e=c++ecosystem topics]
[c=core c++topics]

—#2 [e] c++threading —— many job descriptions say they use threading but it’s probably very simple threading. Threading is such complex topic that only the well-reviewed proven designs are safe to use. If a project team decides to invent their concurrency design ( I invented once ) , and have any non-trivial deviation from the proven designs, they may unknowingly introduce bugs that may not surface for years. So the actual threading usage in any project is always minimal, simple and isolated to a very small number of files.

The fastest systems tend to have nothing shared-mutable, so multi-threading presents zero risk and requires no design. No locking no CAS. Essentially single-threaded.

However, we don’t have the luxury to say “limited experience” in threading. I have a fair amount of concurrency design experience across languages and thread libraries (pthreads, ObjectSpace, c#), using locks+conditional variables+CAS as building blocks, but the c++ thread library used in another team is probably different. I used to say I used boost::thread a lot but it back-fired.

—#99 [c] OO design patterns ——- never required in coding tests. In fact, basically no OO features show up in short coding tests. 90% of short coding tests are about GettingThingsDone, using algorithms and data structures.

Design patterns do show up in QnA interviews but usually not difficult. I usually stick to a few familiar ones — Singleton, Factory, TemplateMethod, ProducerConsumer. I’m no expert with any of these but I feel very few candidates are. I feel most interviewers have a reasonable expectation. I stay away from complex patterns like Visitor and Bridge.

—#1 [c] c++11 —— is not yet widely used. Many financial jobs I applied have old codebases they don’t want to upgrade. Most of the c++11 features we use as developers are optional convenience features, Some features are fundamental (decltype, constexpr …) yet treated as simple convenience features. I feel move semantics and r-value references are fairly deep but these are really advanced features for library writers, not application developers. Beside shared_ptr, C++11 features seldom affect system design. I have “limited project experience using c++11“.

Interviewers often drill into move semantics. If I admit ignorance I could lose. Therefore, I’m actively pursuing the thin->thick->thin learning path.

—#5 [e] Boost —— is not really widely used. 70% of the financial companies I tried don’t use anything beyond shared_ptr. Most of the boost features are considered optional and high-level, rather than fundamental. If I tell them I only used shared_ptr and no other boost library, they will usually judge me on other fronts, without deducting points. I used many boost libraries but only understand shared_ptr better.

Q: Are there job candidates who are strong with some boost feature (beside shared_ptr) but weak on core c++?
A: I have not seen any.

Q: Are there programmers strong on core c++ but unfamiliar with boost?
A: I have seen many

—#4 [c] templates —— another advanced feature primarily for library developers. Really deep and complex. App developers don’t really need this level of wizardry. A few experienced c++ guys told me their teams each has a team member using fancy template meta-programing techniques that no one could understand or maintain. None of my interviewers went very deep on this topic. I have only limited meta-programming experience, but I will focus on 2 common template techniques — CRTP and SFINAE and try to build a good understanding (thin->thick->thin)

—#6 [c] IKM —— is far less important than we thought. I know many people who score very high but fail the tech interview badly. On the other hand, I believe some candidates score mediocre but impress the interviewers when they come on-site.

—#3 [e] linux system programming like sockets —— is relevant to low-latency firms. I think half the c++ finance firms are. Call them infrastructure team, or engineering team, or back-end team. I just happened to apply for too many of this kind. To them, socket knowledge is essential, but to the “mainstream” c++ teams, socket is non-essential. I have “modest socket experience” but could impress some interviewers.

(Actually socket api is not a c/c++ language feature, but a system library. Compared to STL, socket library has narrower usage.)

Messaging is a similar skill like sockets. There are many specific products so we don’t need to be familiar with each one.

InterProcessCommunication  (Shared memory, pipes… ) is a similar “C” skill to sockets but less widely required. I usually say my system uses sockets, database or flat files for IPC, though shared memory is probably faster. I hope interviewers don’t mind that. If there’s some IPC code in their system, it’s likely isolated and encapsulated (even more encapsulated than threading code), so hopefully most developers don’t need to touch it

Memory Manager is another specialized skill just like IPC. Always isolated in some low-level module in a library, and seldom touched. A job spec may require that but actually few people have experience and never a real show stopper.

If a role requires heavy network programming (heavy IPC or messaging development is less common) but we have limited experience, then it can be a show-stopper.

— #99[c] A) instrumentation and B) advanced build tools for memory, dependency etc … are big paper tigers. Can be open-source or commercial. Often useful to GTD but some interviewers ask about these tools to check your real-world experience. 99% of c++programmers have superficial knowledge only because we don’t need to understand the internals of GPS to use it in a car.

Note basic build-chain knowledge is necessary for GTD but relevant “basic” and seldom quizzed.

These topics are never tested in short coding questions, and seldom in longer coding questions.

What c++ knowledge is valued more highly? See ##20 C++territories for QQ IV

avoid unsigned-int type if you ever test for positiveness

Beware in coding tests…

Even though unsigend types are self-documenting, Google style guide advises against unsigned int types as they are error-prone.

When I use size_t as a loop control variable, I need to avoid decrementing it below 0, which is something like undefined behavior for me.

size_t sz=myMap.size();
for(int i=0; i<sz-N; ++i) //sz-N can overflow to a very large number!

[[21st century c]] – unusually practical update on C

a sub-chapter on string processing in the new world
a sub-chapter on robust Macros in the new world
a sub-chapter on function to report errors in the new world
a full chapter on pointer in the new world
a full chapter on C api to be consumed by other languages like python
a full chapter on struct syntax improvement to support returning multiple values + status code
a sub-chapter on pthreads
a sub-chapter on [[numerical recipes in C]] and the implementation – the GNU scientific library
a sub-chapter on SQLite
briefly on valgrind
function returning 2 values + status code
many innovative macro tricks
innovative and concise explanation of auto(i.e. stack) vs static vs malloc memory

Note a sub-chapter is very short, in a concise book. A remarkably practical update on C, somewhat similar to [[safe c++]]. Content isn’t theoretical, and not so relevant to interviews, but relevant to real projects and GTD

quiz: add 0 to return value: gotcha # lval

Given calcTax(), I sometimes convert the returned value — simply add 0. I used to think this is completely harmless and has no effect on the program.

Now I don’t feel that way. calcTax() could return by reference, so we can take the address of this “expression” like


In other words, calcTax() is an lvalue expression. If you add 0, you get a rvalue expression! Basically a temporary object. You can’t take its address.

global variables in (financial) rapid development

In my financial projects, I realized people do worse things than global vars – copy/paste, super long method signatures, poor null checks, returning nulls…

My professional perl programs successfully use lots of globals. Be practical. Get Real.

C++ lets you easily use tons of globals, so I’d say “avoid it, but if no time then use it.”

In java, someone might want to create a GlobalVars.java to hold public static vars. This is actually better than scattering lots of mutable[1] non-private static variables into multiple classes. So before you dismiss someone for using GlobalVars.java, look for those mutable shared static vars.

For a bit of access control, you can use an abcSingleton.java class to hold such shared variables — Some classes will not get a reference to this singleton, so they can’t access the vars.

[1] Most final variables are mutable in java, unless the class is designed to be immutable.

pair up new/delete — usually impractical@@

Update — Best way to manage the delete is probably RAII. There’s a post on that

A simple [2] technique — make sure for every “new” operation at run time [1], there’s exactly one “delete” operation. RAII is a similar solution.

[1] One challenge is how to ensure this runtime rule using compile-time coding rules. Some runtime rules are enforceable at compile time. Consider const, final, finally, synchronized, private, abstract keywords

Q: Analyzing source code, you might see multiple deletes in a branch/loop, but is the delete operation ever implicit (ie absent in source code)?
A: Never. “new” only gives you a ptr. You can pass that ptr around or assign the pointee address to a field, but c++ never automatically, implicitly calls delete on a ptr.

[2] Many situations are too complex for this technique. As pointed out on [[nitty gritty]] P194, sometimes delete is placed into another function than the “birth place” function.

c++ in finance is mostly C code@@

My friend told me a lot of c++ apps in finance is 80% C code. How about yours?
He said most of today’s c++ systems were created in C days. When c++ became mature, people didn’t decide to create brand-new modules in c++ (contrast java). Instead, they decided to evolve slowly from c to c++. Result is, they still write mostly in C, and not c++.
Java is a different story. People did feel the justification to create brand-new apps in java, perhaps because there’s no “evolutionary” approach. They either stick to c or make a clean break using java.
I still wonder why people didn’t create new components in c++. C++ could coexist with C very, very easily. Any insight?