opaque c++ trouble-shooting: bustFE streamIn

This is a good illustration of a fairly common kind of opaque c++ problem, the most dreadful/terrifying species of developer nightmare.

The error seemed fairly consistent, but not quite.

Reproducing it in the dev environment was the first milestone. Adding debug prints proved helpful in this case, but sometimes it would take too long.

In the end, I needed a good hypothesis before we could set out to verify it.

     81     bool SwapBustAction::streamInImpl(ETSFlowArchive& ar)
     82     { // non-virtual
     83       if (exchSliceId.empty())
     84       {
     85         ar >> exchSliceId;
     86       }
    104     }
    105     void SwapBustAction::streamOutImpl(ETSFlowArchive& ar) const
    106     { // non-virtual
    107       if (exchSliceId.size())
    108       {
    109         ar << exchSliceId;
    110       }

When we save the flow element to file, we write out the exchSliceId field conditionally, as on Line 107, but when we restore the same flow element from file, the function reads this exchSliceId field unconditionally, as on Line 85. (The check on Line 83 looks at the in-memory object, not at the file, so when restoring into an object whose exchSliceId is still empty, the read always happens.) When the function can’t find this field in the file, it hits BufferUnderflow and aborts the restore of the entire flow chain.

The serialization file uses field delimiters between the exchSliceId field and the next field, which could be a map. When the exchSliceId field is missing but the map is present, the runtime notices an unusable data item and throws a runtime exception in the form of assertion errors.

The “unconditional” restore of exchSliceId is the bug. We need to check that the exchSliceId field is present in the file before reading it.

In my testing, I only had a test case where exchSliceId was present. Insufficient testing.

#include order among my own headers

Each header has an “amount” of dependencies, each being another included header.

Some header files are very short and have only “system” #includes. They are known to be “light” headers with light dependencies.

The majority of my headers have more than two non-system #includes. Consider class declaration headers: the more custom #includes a header has, the more dependencies it has.

My #include order is from heavy to light. Last to include are C headers; above them are the c++ system headers. Among my custom header files, I include the simpler, lighter ones later. This way, if a heavy header is missing some #include (a coding error), the compiler is more likely to catch it, because the light header it silently relies on hasn’t been included yet.

A common practice is to put lots of “frequently used headers” in a util.h, and include it on top of every file. I don’t like it.

template specialization based on NDTTP=true

We know it’s possible to specialize a template for a concrete type like int or std::string, but I didn’t know that it’s also possible to

… specialize a (class or function) template for a particular compile-time const value (like “true”) of a NDTTP (like “bool flag”)

  • On [[Alexandrescu]] Page xii, Scott Meyers showed an elegant example of specializing for “true”. Note “true” is a value, not a data type!
  • P 34 has a longer example.

Note on reading TMP code: the template specialization syntax is clumsy and can add noise to the signal. Better to ignore the syntax rules for now and focus on the gist.

simple Windows c++ (python) set-up #IntelliJ is better4java

See also easiest windows GCC installer #c++17

Git-Bash + StrawberryPerl is the best combo for me on Windows. I put g++ commands in bash scripts to automate my GCC build and test. Note my typical project has at most 20 source files.

Git-Bash + StrawberryPerl + Notepad++ is better for me than any c++ IDE like Eclipse CDT (4 installations), Bloodshed (4), CodeBlocks (1), NetBeans…

  • I don’t need code completion.
  • I don’t need to jump into a type definition.
  • I don’t need my debugger to be visual. StrawberryPerl includes gdb.
  • I use Notepad++ for text search across hundreds of files.
  • I use Notepad++ for search/replace.
  • I use diff and git-diff to see code changes.
  • I use github for version history.

I’m such a die-hard fan of the command line that the only GUI development tool I use is notepad++, and it’s optional. Since 2015, my Linux c++ code editor has been VIM, which I have used on large codebases consisting of 1000+ source files.

For about 2 years EclipseCDT was my default choice; then Bloodshed took over because it is simpler than EclipseCDT. However, both are still over-complicated. I didn’t have a favorite until 2017, when I discovered Notepad++/Git-Bash/StrawberryPerl.

Q: why do I use Eclipse for Java but not for c++?
A: Eclipse is more problematic and messier for c++ than for java. The benefits (convenience, automation..) are offset by the high TCO (total cost of ownership). For example, the java debugger, java code navigation and java renaming/refactoring all work better than their c++ counterparts.

Q: how about MS VisualStudio?
A: I managed a large c++ codebase on MSVS 2015 in 2015-2016, for over a year. Too complicated, worse than MSVS for c#. Given a choice, I would never prefer it over my command line set-up.

By the way, Git-Bash works well with a standard python installation, but I do recommend the tweak at https://stackoverflow.com/questions/32597209/python-not-working-in-the-command-line-of-git-bash

m_activity IV story@investigation skill #RTS

symptom: the array filled up beyond its limit of 1024 entries. That’s Undefined Behavior, often crashing the entire process, but with no guarantees.

This “m_activity” array is a process-wide singleton, holding ALL active tcp/udp socket descriptors (each an int id). Every time we close a socket we are supposed to remove its id from the array, and shift all “upper” elements down.

Sometimes a connection can drop unexpectedly.

What’s this array for? We iterate over this array frequently for timer events. We don’t use select() to monitor a bunch of sockets, though we could.

I found that when we reconnected after an unexpected disruption, we were not following the proper sequence:
* check if connected
* if disconnected, then remove the socket id from the array
* connect
* upon success, append the new socket id to the array

Due to this bug, during an unstable period, usually at the start of day, we could drop the connection and reconnect many times, and fill up this array within the first hour (often within minutes).

Hard to reproduce.

non-local static composite object: pitfalls

The Google style guide and this MSDN article both warn against non-local static objects with a ctor/dtor.

  • (MSDN) construction order is tricky, and not thread-safe
  • dtor order is tricky. Some code might access an object after destruction 😦
  • (MSDN) for any static object, regular access is also thread-unsafe unless the object is immutable.
  • I feel any static object, including static fields and local statics, can increase the risk of a memory leak since it is destructed very, very late. What if it holds a growing container?

I feel stateless global objects are safe, but perhaps they don’t need to exist.

avoid unsigned-int type if you ever test for positiveness

Beware in coding tests…

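Even though unsigned types are self-documenting, the Google style guide advises against unsigned int types as they are error-prone.

When I use size_t as a loop control variable, I need to avoid decrementing it below 0: unsigned arithmetic wraps around to a huge value. That’s well-defined behavior, but for me it’s as bad as undefined behavior.

    size_t sz = myMap.size();
    for (int i = 0; i < sz - N; ++i) { /* ... */ }    // if N > sz, sz - N wraps around to a very large number!
    for (size_t i = 0; i + N < sz; ++i) { /* ... */ } // safer: no unsigned subtraction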

[[21st century c]] – unusually practical update on C

  • a sub-chapter on string processing in the new world
  • a sub-chapter on robust macros in the new world
  • a sub-chapter on functions to report errors in the new world
  • a full chapter on pointers in the new world
  • a full chapter on C APIs to be consumed by other languages like python
  • a full chapter on struct syntax improvements to support returning multiple values + status code
  • a sub-chapter on pthreads
  • a sub-chapter on [[numerical recipes in C]] and its implementation, the GNU scientific library
  • a sub-chapter on SQLite
  • briefly on valgrind
  • function returning 2 values + status code
  • many innovative macro tricks
  • an innovative and concise explanation of auto (i.e. stack) vs static vs malloc memory

Note that a sub-chapter is very short; this is a concise book. A remarkably practical update on C, somewhat similar to [[safe c++]]. The content isn’t theoretical and not so relevant to interviews, but it is relevant to real projects and GTD.

quiz: add 0 to return value: gotcha #lval

Given calcTax(), I sometimes convert the returned value by simply adding 0. I used to think this was completely harmless, with no effect on the program.

Now I don’t feel that way. calcTax() could return by reference, so we can take the address of this “expression” like

&calcTax()

In other words, if calcTax() returns by reference, calcTax() is an lvalue expression. If you add 0, you get an rvalue expression, basically a temporary object, and you can’t take its address.

global variables in (financial) rapid development

In my financial projects, I realized people do worse things than global vars – copy/paste, super long method signatures, poor null checks, returning nulls…

My professional perl programs successfully use lots of globals. Be practical. Get Real.

C++ lets you easily use tons of globals, so I’d say “avoid them, but if there’s no time, use them.”

In java, someone might want to create a GlobalVars.java to hold public static vars. This is actually better than scattering lots of mutable[1] non-private static variables into multiple classes. So before you dismiss someone for using GlobalVars.java, look for those mutable shared static vars.

For a bit of access control, you can use an abcSingleton.java class to hold such shared variables. Classes that never get a reference to this singleton can’t access the vars.

[1] Most final variables in java are still mutable: final only fixes the reference, and the object it points to can still change unless its class is designed to be immutable.

pair up new/delete — usually impractical@@

Update: the best way to manage the delete is probably RAII. There’s a post on that.
—————————

A simple [2] technique: make sure that for every “new” operation at run time [1], there’s exactly one “delete” operation. RAII is a similar solution.

[1] One challenge is how to ensure this runtime rule using compile-time coding rules. Some runtime rules are enforceable at compile time. Consider the const, final, finally, synchronized, private and abstract keywords.

Q: Analyzing source code, you might see multiple deletes in a branch/loop, but is the delete operation ever implicit (ie absent in source code)?
A: Never. “new” only gives you a raw ptr. You can pass that ptr around or assign the pointee address to a field, but c++ never automatically, implicitly calls delete on a raw ptr.

[2] Many situations are too complex for this technique. As pointed out on [[nitty gritty]] P194, sometimes delete is placed in a different function from the “birth place” function.

c++ in finance is mostly C code@@

My friend told me a lot of c++ apps in finance are 80% C code. How about yours?
He said most of today’s c++ systems were created in the C days. When c++ became mature, people didn’t decide to create brand-new modules in c++ (in contrast to java). Instead, they decided to evolve slowly from C to c++. As a result, they still write mostly in C, not c++.
Java is a different story. People did feel justified in creating brand-new apps in java, perhaps because there’s no “evolutionary” approach. They either stick to C or make a clean break to java.
I still wonder why people didn’t create new components in c++. C++ could coexist with C very, very easily. Any insight?

client interface of ANY c++ class (effC++)

based on P79 [[effC++]]

The Scenario — Suppose another programmer can’t change your class or inherit from it. What parts of your class are accessible to her?

* public methods, both static and non-static
* public overloaded operators
* friend functions/classes
* public fields, but rarely justified

These constitute the _Client_ interface of your class.