struct^array: 2 basic dataStruct in C

Stroustrup singled out these 2 constructs in C and said they represent nice, simplified models of memory usage.

Note a struct field can be a pointer.

Now I see that in C, there’s no other basic data structure. What about C++? Same! Java, c# seem to follow suit.
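A minimal sketch of the two constructs side by side. The Node type, kCount and sum_linked are hypothetical names chosen for illustration; the point is only that a struct is a fixed layout of fields (one of which may be a pointer) and an array is the same object repeated back to back.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical record type: a struct is a fixed layout of named fields.
struct Node {
    int value;
    Node* next;   // a struct field can be a pointer to another object
};

// An array is N objects of one type laid out back to back.
constexpr std::size_t kCount = 3;
static_assert(sizeof(Node[kCount]) == kCount * sizeof(Node),
              "array elements are contiguous, with no extra bookkeeping");

// Link the array elements into a list, then sum them by chasing pointers.
inline int sum_linked(Node (&nodes)[kCount]) {
    for (std::size_t i = 0; i + 1 < kCount; ++i) nodes[i].next = &nodes[i + 1];
    nodes[kCount - 1].next = nullptr;
    int total = 0;
    for (const Node* p = &nodes[0]; p != nullptr; p = p->next) total += p->value;
    return total;
}

inline void demo_struct_array() {
    Node nodes[kCount] = {{1, nullptr}, {2, nullptr}, {3, nullptr}};
    assert(sum_linked(nodes) == 6);
}
```

Linked lists, trees and hash tables are all built from these two pieces plus pointers.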

func overloading: pro^con #c++j..

Overloading is a common design tool in java and other languages, but more controversial in c++. As interviewer, I once asked “justification for using overload as a design tool”. I kinda prefer printInt/printStr/printPtr… rather than overloading on print(). I like explicit type differentiation, similar to [[safe c++]] advice on asEnum()/asString() conversion function.

Besides the key features listed below, the most common j4 is convenience | readability….. a non-technical justification!

— ADL — is a key c++ feature to support smart overload resolution

— TMP — often relies heavily on function overloading

— optional parameter and default arguments — unsupported in java so overloading is the alternative.

— visitor pattern — uses overloading. See https://wordpress.com/post/bintanvictor.wordpress.com/2115

— ctor and operator overloading — no choice. Unable to use differentiated function names

— C language — doesn’t support overloading. In a sense, overloading is non-essential.

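Name mangling gives each overload a unique linker symbol. C knows nothing about it, so extern "C" (which turns mangling off) is the key ABI bridge from c++ to C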
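To make the pro and con concrete, here is a small sketch under assumed names (the feed namespace, Quote, print, printInt and printStr are all hypothetical). It shows an overload set next to the explicitly named alternative, plus one ADL call.

```cpp
#include <cassert>
#include <string>

namespace feed {                       // hypothetical library namespace
struct Quote { double px; };

// Overload set: one name, resolved by argument type.
inline std::string print(int v)                { return "int:" + std::to_string(v); }
inline std::string print(const std::string& s) { return "str:" + s; }
inline std::string print(const Quote&)         { return "quote"; }

// Explicit alternative: the type is spelled out at the call site.
inline std::string printInt(int v)                { return print(v); }
inline std::string printStr(const std::string& s) { return print(s); }
}  // namespace feed

// ADL: the unqualified call finds feed::print because Quote lives in feed.
inline std::string describe(const feed::Quote& q) { return print(q); }

inline void demo_overload() {
    assert(feed::print(42) == "int:42");
    assert(feed::printStr("abc") == "str:abc");
    assert(describe(feed::Quote{1.5}) == "quote");
    // The con: a char silently promotes to int and picks print(int).
    assert(feed::print('a') == "int:" + std::to_string(static_cast<int>('a')));
}
```

The last assertion is the kind of surprise that makes me lean towards printInt/printStr: with explicit names the reader sees which conversion was intended.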

[19] zbs cf to QQ+GTD #compiler+syntax expertise

Why bother — I spend a lot of time accumulating zbs, in addition to QQ halos and localSys GTD

I have t_zbs99 and other categories/tags on my blogposts showcasing zbs (真本事/real expertise) across languages. Important to recognize the relative insignificance of zbs

  • #1 QQ — goal is mobility. See the halo* tags. However, I often feel fake about these QQ halos.
  • #2 GTD — localSys or external tools … goal is PIP, stigma, helping colleagues. Basic skill to Make the damn thing work. LG2 : quality, code smell, maintainability etc
  • #3 zbs — goal is self-esteem, respect and “expert” status. By definition, zbs knowledge pearls are often not needed for GTD. In other words zbs is “Deeper expertise than basic GTD”. Scope is inherently vague but..
    • Sometimes I can convert zbs knowledge pearls to QQ halos, but the chance is lower than I wished, so I often find myself overspending on zbs. Therefore I consider the zbs topics a distant Number 3.
    • Zbs (beyond GTD) is required as architect, lead developer, decision makers.

I also have blog categories on (mostly c++) bulderQuirks + syntax tricks. These knowledge pearls fall under GTD or zbs.

c++toolchain complexity imt new languages #%%advantage

The modern languages all feature a dramatically simplified tool chain. In contrast, the c++ tool chain feels much bigger to me, including profilers, static analyzers, binary file dumpers, linkers ..

This is one of the real obstacles to new entrants, young or old. This is also my (slowly growing) competitive advantage. I feel some people (like Kevin of Macq) know more, but most developers have a cursory working knowledge in this field.

I was frustrated for years by the complex and messy build tools in c++. Same for the other new entrants — Rahul spent a month setting up Eclipse CDT…

This learning curve, entry barrier … is a direct consequence of the c++ “sweet spot” as Stroustrup described: inherently complex codebase close to hardware.

I wrote dozens of blogposts about c++ build issues. For example, on windows, my strawberryPerl + git_bash + notepad++ setup is unknown to many. These fellow developers struggle with MSVS or Eclipse !

Due to the bigger ecosystem needed to support c++, new features are added at a slower pace than languages having a central organization.

new lang2challenge c++on efficiency@@

I asked Stroustrup — efficiency is the traditional strength of C and C++,  both memory efficiency and speed.. Is that still true? He immediately said yes.

I think it was clear in his mind that c/c++ were still the most efficient languages around. He did say Fortran is optimized for scientific computing.

I later asked him — any new language he watches out for. He said none, without real hesitation, no ifs or buts.

Recalling that conversation, I feel new languages are usually more high-level and easier to use. They are more likely to use heap to provide a “consistent interface” and avoid the complexities of a low-level language.

If I’m right, then these languages can’t and won’t optimize for efficiency as a top priority. Efficiency is possibly a 2nd priority.

vtable also contains.. #class file

C++ is more complex than java. A typical vtable in c++ contains

  • offset of base type subobject. In multiple inheritance, this offset is often non-zero. This offset is needed not only for field access but also up-casting
  • pointer to the type_info object, for RTTI (typeid, dynamic_cast)

These details are part of the compiler ABI, since object files from older and newer compilers (of the same brand) could link together iff they agree on these details.

Best-known part of ABI is name-mangling algorithm. This vtable detail would be the 2nd best-known ABI feature.

I believe the class file in java is one file per class. Therefore, vtable is something like the equivalent of a java class file.
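The non-zero offset is easy to observe. A rough sketch, assuming a mainstream ABI (the standard does not fix the order of base subobjects, but gcc, clang and MSVC all place the first non-empty base at offset 0). Left, Right and Both are hypothetical types.

```cpp
#include <cassert>
#include <cstddef>

struct Left  { virtual ~Left()  = default; int l = 1; };
struct Right { virtual ~Right() = default; int r = 2; };
struct Both : Left, Right { int b = 3; };

// Byte distance from the start of the Both object to its Right subobject.
inline std::ptrdiff_t right_offset(Both& obj) {
    Right* as_right = &obj;  // implicit up-cast; the compiler adjusts the address
    return reinterpret_cast<char*>(as_right) - reinterpret_cast<char*>(&obj);
}

// dynamic_cast<void*> consults the vtable to get back to the whole object.
inline bool finds_whole_object(Both& obj) {
    Right* as_right = &obj;
    return dynamic_cast<void*>(as_right) == static_cast<void*>(&obj);
}

inline void demo_offsets() {
    Both obj;
    assert(right_offset(obj) > 0);      // ABI-dependent, true on common compilers
    assert(finds_whole_object(obj));    // guaranteed by the language
}
```

The second function is where the offset stored in the vtable earns its keep: starting from a Right*, the runtime has to find the beginning of the enclosing Both.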

 

c++ABI #best eg@zbs

Mostly based on https://www.oracle.com/technetwork/articles/servers-storage-dev/stablecplusplusabi-333927.html

Imagine a client uses libraries from Vendor AA and Vendor BB, among others. All vendors support the same c++ compiler brand, but new compiler versions keep coming up. In this context, unstable ABI means

  • Recompile-all – client needs libAA and libBB (+application) all compiled using the same compiler version, otherwise the binary files don’t agree on some key details.
  • Linker error – LibAA compiled by version 21 and LibBB compiled under version 21.3 may fail to link
  • Runtime error – if they link, they may not run correctly, if ABI has changed.

Vendor’s solutions:

  1. binary releases – for libAA. Vendor AA needs to keep many old binary versions of libAA, even if compiler version 1.2 has been retired for a long time. Some clients may still need that libAA version. Many c++ libraries are distributed this way on vendor websites.
  2. Source distribution – Vendor AA may choose to distribute libAA in source form. Maintenance issues remain, in other forms.

In a better world,

  • ABI compatible — between compiler version 5.1 and 5.2. Kevin of Macq told me this does happen to some extent.
  • interchangeable parts — the main application (or libBB) can be upgraded to newer compiler version, without upgrading everything else. Main application could be upgraded to use version 5.2 and still link against legacy libAA compiled by older compiler.

(overloaded function) name mangling algorithm is the best-known part of c++ABI. Two incompatible compiler versions would use different algorithms so LibAA and LibBB will not link correctly. However, I don’t know how c++filt demangles those names across compilers.
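Mangled names can be inspected from inside a program. This sketch is gcc/clang only, since it relies on abi::__cxa_demangle from cxxabi.h (part of the Itanium C++ ABI support library); demo::Quote and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>   // gcc/clang only: abi::__cxa_demangle
#include <string>
#include <typeinfo>

namespace demo { struct Quote {}; }   // hypothetical type

// On gcc/clang, type_info::name() returns the Itanium-mangled type name.
inline std::string mangled_name() { return typeid(demo::Quote).name(); }

// Turn a mangled symbol back into source-level form; returns the input on failure.
inline std::string demangle(const std::string& mangled) {
    int status = 0;
    char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    std::string result = (status == 0 && out != nullptr) ? out : mangled;
    std::free(out);   // __cxa_demangle allocates with malloc
    return result;
}

inline void demo_mangling() {
    assert(demangle(mangled_name()) == "demo::Quote");
    // The symbol an Itanium-ABI compiler emits for a function print(int).
    assert(demangle("_Z5printi") == "print(int)");
}
```

A compiler using a different mangling scheme would produce a different symbol for the same print(int), which is why the two libraries then fail to link.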

No ABI compatibility between Windows and Linux binaries, but how about gcc vs llvm binaries on an Intel linux machine? Possible according to https://en.wikipedia.org/wiki/Application_binary_interface#Complete_ABIs (on linux both follow the Itanium C++ ABI).

Java ABI?

 

%%strength ] c++knowhow #Mithun


Mithun asked me “So you are now completely pro in c++?” I replied

  1. On-the-job technical challenges are not very different from java
  2. On interviews the QQ topics are different. That’s the real challenge for a java guy moving into c++.

Now I feel a 3rd element is zbs beyond GTD and interviews. I have written many blogposts about “expert”. I also have many blogposts in the category “c++real”

Some may say c++ is overshadowed by java, and c++ QQ is overshadowed by coding IV. Well, we need sharper perception and judgment, and recognize the many facets of the competitive landscape. I won’t elaborate here, but c++ has withstood many waves and is more robust than many other technologies.

poll()as timer]real time C : industrial-strength #ST-mode

See also http://www.sourcexr.com/articles/2013/11/12/timer-notifications-using-file-descriptors

RTS Xtap (and earlier framework? probably) use a C timer based on the Linux syscall epoll(). It has millisecond precision [1], good enough for our real time feed parsers that consume all of NYSE, Nasdaq, OPRA etc. I won’t say how many clients we support, but some of the world’s top sites use our feeds.

There’s also a version using poll(). It’s used by default when epoll is unavailable, but linux supports epoll. For simplicity, I will say “poll” from now on.

[1] The millisec resolution is due to the tick length in Linux kernel, described on P 195 [[Linux kernel]]

I guess one advantage of poll() to implement timer is the integration with sockets. A parser is fundamentally event driven. Timer and socket are the two primary event sources.

It looks like the poll() syscalls are capable of supporting both requirements, together, which is our situation.

I call it an “industrial-strength” solution because it has powered, for decades, one of the most reliable, most widely used market data dissemination systems, the unsung hero behind most of the financial data web sites. It handles 300 “exchange feeds”.

Concurrency? xtap runs in single-threaded mode. In the poll() solution, incoming packets AND timer events are inherently processed serially. No synchronization required. When the receiver socket is busy, timer events are .. delayed. Exactly what we want.

–The timer/event loop structure

while(1){ //pseudocode
  check_timers(); //roughly this gets called every 1 ms
  epoll_wait(timeout=1ms);
}
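The pseudocode can be fleshed out into something that compiles on linux. This is only a sketch of the general technique, not the xtap code: Timer, check_timers and run_loop are hypothetical names, a pipe stands in for the receiver socket, and the loop runs a fixed number of iterations so that it terminates.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <sys/epoll.h>
#include <unistd.h>
#include <vector>

using Clock = std::chrono::steady_clock;

struct Timer {                       // hypothetical one-shot timer record
    Clock::time_point due;
    std::function<void()> on_fire;
    bool fired;
};

// Fire every timer whose deadline has passed; runs on the loop thread only.
inline void check_timers(std::vector<Timer>& timers) {
    const Clock::time_point now = Clock::now();
    for (Timer& t : timers)
        if (!t.fired && t.due <= now) { t.fired = true; t.on_fire(); }
}

// Single-threaded loop: timers and readable descriptors are handled serially.
// Returns the number of bytes drained from fd, or -1 on setup failure.
inline long run_loop(int fd, std::vector<Timer>& timers, int iterations) {
    const int ep = epoll_create1(0);
    if (ep < 0) return -1;
    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev) < 0) { close(ep); return -1; }

    long bytes = 0;
    for (int i = 0; i < iterations; ++i) {
        check_timers(timers);                       // roughly once per ms when idle
        epoll_event ready{};
        if (epoll_wait(ep, &ready, 1, /*timeout in ms*/ 1) == 1) {
            char buf[256];
            const ssize_t n = read(ready.data.fd, buf, sizeof buf);
            if (n > 0) bytes += n;
        }
    }
    close(ep);
    return bytes;
}

inline void demo_loop() {
    int fds[2];
    if (pipe(fds) != 0) return;
    if (write(fds[1], "abc", 3) != 3) return;
    int fired = 0;
    std::vector<Timer> timers;
    timers.push_back(Timer{Clock::now() + std::chrono::milliseconds(5),
                           [&fired] { ++fired; }, false});
    const long drained = run_loop(fds[0], timers, 50);
    assert(drained == 3 && fired == 1);
    close(fds[0]);
    close(fds[1]);
}
```

When data keeps arriving, epoll_wait returns immediately and check_timers simply runs more often than once per millisecond; when a read handler is slow, the timers wait their turn. No locks anywhere.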

 

[[safeC++]] discourages implicit conversion via OOC/cvctor

See other posts about OOC and cvctor. I am now convinced by [[safeC++]] that it’s better to avoid both. Instead, use an AsXXX() method if converting from YYY to XXX is needed. The reason is type safety. In an assignment (including function input/output), it is slightly hacky if LHS is NOT a base type of RHS. Implicit conversion is like subversion of the compiler’s type enforcement: given a function declared as f(XXX), it should ideally be illegal to pass in a YYY. However, the implicit converters break this clean rule, through the back door.

As explained concisely on P8 [[safeC++]], the OOC is provided specifically to support implicit conversion. In comparison, the cvctor is more likely to be a careless mistake if declared without “explicit”.

Favor explicit conversion rather than implicit conversion. Some manager in Millennium pointed out that c++ syntax has too many back doors and is too “implicit”. Reading a piece of code you don’t know what it does, unless you have lots of experience/knowledge about all the “back-doors”.
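A small sketch of the two styles, with hypothetical types Meters (explicit ctor plus asXXX methods) and LooseMeters (implicit cvctor plus OOC). The static_asserts let the compiler itself confirm which back doors are open.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

struct Meters {
    double value;
    explicit Meters(double v) : value(v) {}            // cvctor, made explicit
    double asDouble() const { return value; }          // named conversion instead of OOC
    std::string asString() const { return std::to_string(value) + "m"; }
};

struct LooseMeters {
    double value;
    LooseMeters(double v) : value(v) {}                // implicit cvctor
    operator double() const { return value; }          // OOC
};

inline double twice(Meters m) { return 2 * m.asDouble(); }

// twice(3.0) would not compile: there is no implicit double -> Meters path.
static_assert(!std::is_convertible<double, Meters>::value, "explicit ctor blocks it");
// The loose version converts silently in both directions.
static_assert(std::is_convertible<double, LooseMeters>::value, "back door in");
static_assert(std::is_convertible<LooseMeters, double>::value, "back door out");

inline void demo_explicit() {
    assert(twice(Meters(1.5)) == 3.0);
    assert(Meters(2.0).asString() == "2.000000m");
}
```

With Meters, every conversion is visible at the call site as Meters(x) or m.asDouble(), so a reader does not need to know the class to see what is happening.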

[[safeC++]]assertion technique(versatile), illustrated with NPE

Update: how to add a custom error message to assert — https://stackoverflow.com/questions/3692954/add-custom-messages-in-assert

This is a thin book. Concise and practical (rather than academic) guidelines.

#1 guideline: enlist compiler to catch as many errors as possible.

However, some errors will pass compiler and only happen at run-time. Unlike on P11, I will treat programmer errors and other run-time errors alike – we need an improvement over the “standard” outcome which is UB (undefined behavior) and we may or may not see any error message anywhere.

#2 guideline: In [[safeC++]], such an improvement is offered in the form of assertions, in other words, run-time checks. The author gives them a more accurate name “diagnostics”.

2.1) now the outcome is guaranteed termination, rather than the undefined behavior.
2.2) now there’s always an error message + a stack trace. Now 2.2) sounds like non-trivial improvement. Too good to be true? The author is a practicing programmer in a hedge fund so I hope his ideas are real-world.

Simplest yet realistic example of #2 is NPE (i.e. null pointer deref). NPE is UB and could (always? probably not) crash. I doubt there’s even an error message. Now with a truly simple “wrapper” presented on P53-54, an NPE could be diagnosed __in_time__ and a fatal exception thrown, so program is guaranteed to terminate, with an error message + stack trace.

Like a custom new/delete (to record allocations), here we replace the raw pointer with a wrapper. Here we see a pattern where we replace builtin c++ constructs with our wrappers to avoid UB and get run time diagnostics:

$ this wrapper is actually a simple smart ptr
$ traditional smart ptr templates
$ custom new, delete
$ vector
$ Int class replacing int data type

The key concepts —
% assertion
% diagnostics
% run time

Q: Can every UB condition be diagnosed this way? Not sure, but the most common ones seem to be.
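To show the flavor of such a wrapper, here is my own minimal sketch. It is not the book's code: CheckedPtr is a hypothetical, non-owning wrapper that throws std::logic_error on a null deref, and it does not capture a stack trace.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical wrapper: a non-owning pointer that checks before every deref.
template <typename T>
class CheckedPtr {
public:
    explicit CheckedPtr(T* raw = nullptr) : raw_(raw) {}
    T& operator*() const  { check(); return *raw_; }
    T* operator->() const { check(); return raw_; }
private:
    // The run-time diagnostic: defined behavior (an exception) replaces UB.
    void check() const {
        if (raw_ == nullptr) throw std::logic_error("null pointer dereferenced");
    }
    T* raw_;
};

inline void demo_checked_ptr() {
    int x = 7;
    CheckedPtr<int> good(&x);
    assert(*good == 7);

    CheckedPtr<int> bad;           // wraps a null pointer
    bool caught = false;
    try { (void)*bad; } catch (const std::logic_error&) { caught = true; }
    assert(caught);
}
```

If nobody catches the exception the program terminates, which gives the guaranteed termination of 2.1); the error message of 2.2) comes from what().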

[[safeC++]] – concise, pragmatic, unconventional wisdom

First off, this is a 120-page thin book, including about 30 pages of source code in the appendices. Light-weight, concise. Very rare.

I feel the author is bold to advocate avoidance of popular c++ features such as
– “Avoid using pointer arithmetic at all.”
– For class fields, avoid built-in types like int. Use Int type — no need to initialize.
– “Use the new operator only without bracket”. Prefer Vector to new[]
– “Whenever possible, avoid writing copy ctor and assignment operators for your class”

I feel these suggestions are similar to my NPE tactics in java. Unconventional wisdom, steeped in a realistic/pessimistic view of human fallibility, rather tedious, all about ….low-level details.

Amidst so many books on architecture and software design, I find this book so distinctive and it speaks directly to me — a low-level detailed programmer.

I feel this programmer has figured out the true cost/benefit of many c++ features, through real experience. Other veterans may object to his unconventional wisdom, but I feel there’s no point proving or convincing. A lot of best practices [1] are carefully put aside by veterans, often because they know the risks and justifications. These veterans would present robust justifications for their deviation — debatable but not groundless.

[1] like “avoid global variables and gotos”

–finance
Given the author’s role as a quant developer I believe all of the specific issues raised are relevant to financial applications. When you read about some uncommon issue (examples in [1]), you are right to question if it’s really important to embedded, or telecom, or mainframe domains, but it is certainly relevant to finance.

Incidentally, most of the observations, suggestions are tested on MSVS.

–assert, smartPtr…
I like the sample code. Boost smart ptr is too big to hack. The code here is pocket-sized, even bite-sized, and digestible and customizable. I have not seen any industrial strength smart pointer so simple.

–templates
The sample code provided qualifies as library code, and therefore uses some simple template techniques. Good illustration of template techniques used in finance.

[1] for eg the runtime cost of allocating the integer ref count on P49; or the date class.

Any negative?

Simple, clean, pure Multiple Inheritance..really@@

Update — Google style guide is strict on MI, but has a special exception on Windows.

MI can be safe and clean —

#1) avoid the diamond. Diamond is such a mess. I’d say don’t assume virtual base class is a vaccine

#2) make base classes imitate java interface … This is one proven way to use MI. Remember Barclays FI team. All pure virtual methods, no field, no big4 except empty virtual dtor.

#2a) Deviation: java8 added default methods to interfaces

#2b) Deviation: c++ private inheritance from one concrete base class , suggested in [[effC++]]

#3) simple, minimal, low-interference base classes. Say the 2 base classes are completely unrelated, and each has only 1 virtual method. Any real use case? I can’t think of any but when this situation arises I feel we should use MI with confidence and caution. Similarly “goto” could be put to good use once in a blue moon.
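Guideline #2 in code form, as a sketch with hypothetical names (Printable, Resettable, Counter). Each base has only pure virtual methods and an empty virtual dtor, so there is no state to collide and no diamond to worry about.

```cpp
#include <cassert>
#include <string>

// Interface-style bases: pure virtual methods, no fields, empty virtual dtor.
struct Printable {
    virtual ~Printable() {}
    virtual std::string print() const = 0;
};
struct Resettable {
    virtual ~Resettable() {}
    virtual void reset() = 0;
};

// Hypothetical concrete class implementing both "interfaces".
class Counter : public Printable, public Resettable {
public:
    void bump() { ++n_; }
    std::string print() const override { return std::to_string(n_); }
    void reset() override { n_ = 0; }
private:
    int n_ = 0;
};

inline void demo_interfaces() {
    Counter c;
    c.bump();
    c.bump();
    Printable& p = c;     // each base gives an independent view of the object
    Resettable& r = c;
    assert(p.print() == "2");
    r.reset();
    assert(p.print() == "0");
}
```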

g++ frontends ^ JVM/CLR bytecode

Have you ever wondered how gcc can compile so many languages? We know various dotnet languages all compile to the same IL code; JVM similar. Now I know that gcc uses a similar technique:

gcc had a frontend for java (gcj, removed in gcc 7)
gcc has a frontend for ObjectiveC
gcc has a frontend for c++
gcc has a frontend for Fortran

All of them produce the same intermediate code that feeds the true compiler, which produces assembly code.

c++ uninitialized "static" objects ^ stackVar

By “static” I mean global variables or local static variables. These are Objects with addresses, not mere symbols in source code. Note some static objects are _implicitly_ static (P221 effC++).

Rule 1: uninitialized local variables and class fields of primitive types (char, float…) aren’t automatically initialized. See [[programming]] by Stroustrup.

Rule 2: uninitialized local or file-scope statics are automatically initialized to 0 bits at a very early stage of program loading. See https://stackoverflow.com/questions/1597405/what-happens-to-a-declared-uninitialized-variable-in-c-does-it-have-a-value

Rule 3: all class instances are “initialized”, either explicitly, or implicitly via default ctor. Beware… Reconsider Rule 1 — is a field of primitive type of the class initialized? I don’t think so. Therefore, “initialized” means a ctor is called on the new-born instance, but not all fields therein are necessarily initialized. I’d say the ctor can simply ignore a primitive-typed field.

Uninitialized static Objects (stored in BSS segment) don’t take up space in object file. Also, by grouping all the symbols that are not explicitly initialized together, they can be easily zeroed out at once. See
http://stackoverflow.com/questions/9535250/why-is-the-bss-segment-required

Any static Object explicitly initialized by programmer is considered an “initialized” static object and doesn’t go into BSS.

P136 [[understanding and using c pointers]] uses a real example to confirm that a pointer field in a C struct is uninitialized. C has no ctor!

P261 [[programming]] by Stroustrup has a half-pager summary
* globals are default-initialized
* locals and fields are truly uninitialized …
** … unless the data type is a custom class having a default ctor. In that case, you can safely declare the variable without initialization, and content will be a pre-set value.
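The rules side by side, as a sketch with hypothetical names. Only the defined cases are read back; reading an uninitialized local would itself be UB, so Plain::n is written before use.

```cpp
#include <cassert>
#include <string>

int g_counter;                 // file-scope static storage: zeroed before main (Rule 2)
int* g_ptr;                    // likewise: starts out as a null pointer

struct Plain { int n; };       // no ctor: n is indeterminate in a local Plain (Rule 1)
struct Safe {
    int n;
    Safe() : n(0) {}           // default ctor gives the field a pre-set value (Rule 3)
};

inline int next_id() {
    static int calls;          // local static: zero-initialized, keeps its value
    return ++calls;
}

inline int sum_locals() {
    Safe s;                    // safe to read s.n immediately
    Plain p;                   // p.n must be written before it is read
    p.n = 5;
    std::string empty;         // class type with a default ctor: always initialized
    return s.n + p.n + static_cast<int>(empty.size());
}

inline void demo_init() {
    assert(g_counter == 0);
    assert(g_ptr == nullptr);
    assert(sum_locals() == 5);
}
```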

4 Scopes for operator-overloads #new()

  1. non-static member operators are very common, such as smart ptr operator++(), operator< () in iterators
  2. static member operator new() is sometimes needed. ARM explains why static.
  3. friend operators are fairly common, such as operator<<()
  4. class specific free standing operator is recommended by Sutter/Andrei, to be placed in the same namespace as the target class. Need to understand more. Advanced technique.
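All four scopes in one hypothetical class (px::Price and its members are made-up names). A sketch only; the allocation counter is there just to make the class-level operator new observable.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <ostream>
#include <sstream>

namespace px {                                    // hypothetical namespace
class Price {
public:
    explicit Price(int ticks) : ticks_(ticks) {}
    // 1. non-static member operator
    Price& operator++() { ++ticks_; return *this; }
    // 2. static member operator new: runs before any Price object exists
    static void* operator new(std::size_t n) { ++allocs; return ::operator new(n); }
    static void operator delete(void* p) { ::operator delete(p); }
    // 3. friend operator: left operand is the stream, not a Price
    friend std::ostream& operator<<(std::ostream& os, const Price& p) {
        return os << p.ticks_ << "t";
    }
    int ticks() const { return ticks_; }
    static int allocs;
private:
    int ticks_;
};
int Price::allocs = 0;

// 4. free-standing operator in the same namespace as the class, found via ADL
inline bool operator<(const Price& a, const Price& b) { return a.ticks() < b.ticks(); }
}  // namespace px

inline void demo_operators() {
    const int before = px::Price::allocs;
    px::Price* p = new px::Price(9);
    ++*p;
    std::ostringstream os;
    os << *p;
    assert(os.str() == "10t");
    assert(px::Price::allocs == before + 1);
    assert(px::Price(1) < *p);
    delete p;
}
```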

RTTI compiler-option enabled by default

All modern compilers have RTTI enabled by default. If you disable it via a compiler option, then typeid and dynamic_cast (both rely on type_info) may fail, typically with a compile error, but virtual functions continue to work.  Here’s the g++ option

-fno-rtti: Disable generation of information about every class with virtual functions for use by the C++ runtime type identification features ('dynamic_cast' and 'typeid'). If you don’t use those parts of the language, you can save some space by using this flag. Note that exception handling uses the same information, but it will generate it as needed. The 'dynamic_cast' operator can still be used for casts that do not require runtime type information, i.e. casts to void * or to unambiguous base classes.

See http://en.wikibooks.org/wiki/C++_Programming/RTTI
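A sketch of the two language features that depend on RTTI, using hypothetical types Instrument, Bond and Option. It builds with default compiler settings; I would expect it to be rejected under -fno-rtti.

```cpp
#include <cassert>
#include <typeinfo>

struct Instrument { virtual ~Instrument() {} };   // polymorphic base
struct Bond   : Instrument { int coupon = 5; };
struct Option : Instrument {};

// Down-cast needs RTTI; returns -1 when the object is not a Bond.
inline int coupon_of(const Instrument& i) {
    const Bond* b = dynamic_cast<const Bond*>(&i);
    return b ? b->coupon : -1;
}

// typeid on a polymorphic reference reports the dynamic type.
inline bool same_dynamic_type(const Instrument& a, const Instrument& b) {
    return typeid(a) == typeid(b);
}

inline void demo_rtti() {
    Bond bond;
    Option option;
    assert(coupon_of(bond) == 5);
    assert(coupon_of(option) == -1);
    assert(same_dynamic_type(bond, Bond()));
    assert(!same_dynamic_type(bond, option));
}
```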

divide-by-0: c++no excp;java throws..why

https://stackoverflow.com/questions/8208546/in-java-5-0-statement-doesnt-fire-sigfpe-signal-on-my-linux-machine-why explains best.

http://stackoverflow.com/questions/6121623/catching-exception-divide-by-zero — c++ standard says division-by-zero results in undefined behavior (just like deleting Derived via a Base pointer without virtual dtor). Therefore programmer must assume the responsibility to prevent it.

A compliant c++ compiler could generate object code to throw an exception (nice:) or do something else (uh :-() like core dump.

If you are like me you wonder why no exception. Short answer: c++ is a low-level language. Stroustrup said, in “The Design and Evolution of C++” (Addison Wesley, 1994), “low-level events, such as arithmetic overflows and divide by zero, are assumed to be handled by a dedicated lower-level mechanism rather than by exceptions. This enables C++ to match the behavior of other languages when it comes to arithmetic. It also avoids the problems that occur on heavily pipelined architectures where events such as divide by zero are asynchronous.”

C doesn’t have exceptions. Division-by-zero is undefined behavior there too, and in practice often shows up as some kind of run time error such as SIGFPE (http://en.wikibooks.org/wiki/C_Programming/Error_handling). C++ probably inherited that in spirit. However, [[c++primer]] shows you can create your own divideByZero subclass of a base Exception class.

java has no “undefined behavior” and generates an exception instead (ArithmeticException for integer division; floating-point division by zero yields Infinity or NaN).
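Since the language will not throw for us, the check has to be written by hand. A minimal sketch with hypothetical names (DivideByZero, checked_div); it also guards INT_MIN / -1, the other integer division that is undefined.

```cpp
#include <cassert>
#include <limits>
#include <stdexcept>

// Hypothetical exception type; the language itself never throws this.
struct DivideByZero : std::domain_error {
    DivideByZero() : std::domain_error("division by zero") {}
};

inline int checked_div(int num, int den) {
    if (den == 0) throw DivideByZero();               // check before the UB can happen
    if (den == -1 && num == std::numeric_limits<int>::min())
        throw std::overflow_error("quotient does not fit in int");
    return num / den;
}

inline void demo_checked_div() {
    assert(checked_div(7, 2) == 3);
    bool caught = false;
    try { checked_div(1, 0); } catch (const DivideByZero&) { caught = true; }
    assert(caught);
}
```

The cost is a compare-and-branch on every division, which is exactly the overhead c++ declines to impose by default.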

[09]global free func^free func]a namespace^static method #ADL

In java there are static methods and nothing else. In c there are “functions” and nothing else. In c++ there are 3 alternatives.

1) static methods — like java
** unlike the free functions, static methods can be private or protected
** can access private static fields of host class
** unlike the free functions, static methods are inherited. See Operator-new in [[eff c++]]
** usually don’t need name space prefix

2) free func/operators in a “package” like a boost library or STL, typically organized into namespaces. Better than global. See ADL technique

3) GLOBAL free func /operators — vestige of C syntax

In quick and dirty applications (not libraries), you see lots of global free functions.
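The 3 alternatives in one file, as a sketch. All names (greet_global, lib::Widget, lib::describe, Registry) are hypothetical.

```cpp
#include <cassert>
#include <string>

// 3) global free function: visible everywhere, owned by nobody
inline std::string greet_global() { return "global"; }

namespace lib {                     // hypothetical library namespace
struct Widget { int id; };
// 2) free function in a namespace, found via ADL when called with a lib::Widget
inline std::string describe(const Widget& w) { return "widget#" + std::to_string(w.id); }
}  // namespace lib

class Registry {
public:
    // 1) static method: no object needed, yet it can reach private static fields
    static int next() { return ++last_; }
private:
    static int last_;
};
int Registry::last_ = 0;

inline std::string use_all() {
    lib::Widget w{Registry::next()};
    return greet_global() + "/" + describe(w);   // describe() resolved via ADL
}

inline void demo_functions() {
    const std::string s = use_all();
    assert(s.compare(0, 14, "global/widget#") == 0);
    assert(describe(lib::Widget{42}) == "widget#42");
}
```

Note the call to describe() needs no lib:: prefix, which is the main ergonomic argument for option 2 over a static method.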