missing q[ return ] in non-void function #Josh

I told Josh about this real scenario: my non-void function has 3 paths to exit the function, one of them without a return statement.

g++ usually gives a warning under “-Wall”. In a void function, flowing off the end is equivalent to a return with no value. In a non-void function (other than main), flowing off the end is undefined behavior.

If the 3rd path executes, you get undefined behavior, whether or not you print the function's return value.

https://stackoverflow.com/questions/3402178/omitting-return-statement-in-c shows -Wreturn-type

I believe it’s a compilation error in other languages. Java, for example, rejects it with “missing return statement”.
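A minimal sketch of the fix, using a hypothetical classify() function with 3 exit paths. If the last return is deleted, g++ -Wall reports "control reaches end of non-void function [-Wreturn-type]", and any call that reaches that point is undefined behavior.

```cpp
// Hypothetical function with 3 exit paths, mirroring the scenario above.
int classify(int x) {
    if (x > 0) return 1;   // path 1
    if (x < 0) return -1;  // path 2
    return 0;              // path 3: the return that is easy to forget
}
```

Making every path end in an explicit return removes both the warning and the undefined behavior.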

Empty base-class optimization

This optimization only zeros out the size of a base-class subobject [2] as laid out inside the derived object [1]. All other objects of empty types keep the same size, typically one byte.

Suppose Der subclasses B1 and B2.

Note on [1]:
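If B1 is empty, then sizeof(B1) == 1, not zero. Yet inside a Der object, the B1 subobject occupies zero bytes thanks to EBCO: if Der adds just an int member, sizeof(Der) is typically sizeof(int). Note that sizeof is always evaluated at compile time; the “zero size” lives in Der's layout.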

Note on [2]:
If all of Der, B1 and B2 are empty, we know sizeof(Der) == 1, but how about sizeof(a_Der_instance)? Also one, not zero, because this instance is a complete object, not a base-class subobject.
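A small sketch of the two notes, assuming g++ or clang on the Itanium C++ ABI (e.g. Linux x86-64); other ABIs (notably MSVC with multiple empty bases) may lay things out differently. The type names follow the text; the int member is an assumption added for illustration.

```cpp
#include <type_traits>

struct B1 {};                     // empty base
struct B2 {};                     // a second, distinct empty base
struct Der : B1, B2 { int x; };   // EBCO: both base subobjects take 0 bytes
struct EmptyDer : B1, B2 {};      // Der, B1 and B2 all empty

static_assert(std::is_empty<B1>::value && std::is_empty<EmptyDer>::value,
              "no non-static data members");
// A complete object of an empty type is never zero-sized.
static_assert(sizeof(B1) != 0 && sizeof(EmptyDer) != 0,
              "complete objects have nonzero size");
```

On the ABI assumed above, sizeof(Der) equals sizeof(int), while sizeof(B1) and sizeof(EmptyDer) are both 1.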

ValueType default ctor needed in std::map[]

Gotcha — Suppose you define a custom Value type for your std::map and you call mymap[key] = Value(…). There’s an implicit call to the no-arg ctor of Value class!

If you don’t have a no-arg ctor, then avoid operator[](). Use insert(make_pair()) and find(key) methods.

Paradox: for both unordered_map and map, even a lookup HIT requires a no-arg ctor to be available. The reason:

  • since your code uses map operator[], the compiler instantiates the member function operator[] of the class template. This function explicitly calls the no-arg ctor within an if-block.
    • If you have any lookup miss, this ctor is called at run time
    • If you have no lookup miss, this ctor is never called at run time
    • the compiler doesn’t know whether a lookup miss may happen in the future, so both branches are compiled into a.out
  • If your source code doesn’t use operator[], then the member function operator[] is never instantiated at compile time, so it isn't included in a.out and you don’t need the default ctor.
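A minimal sketch of the workaround, using a hypothetical Value type without a no-arg ctor. The commented-out function shows the operator[] usage that would fail to compile. insert(make_pair()) and find() work fine without a default ctor.

```cpp
#include <map>
#include <type_traits>
#include <utility>

// Hypothetical value type with NO no-arg ctor.
struct Value {
    explicit Value(int v) : v(v) {}
    int v;
};
static_assert(!std::is_default_constructible<Value>::value, "no default ctor");

// This would NOT compile, even if key 1 is always present, because
// map::operator[] must be able to default-construct a Value on a miss:
//   int bad(std::map<int, Value>& m) { return m[1].v; }

std::map<int, Value> makeMap() {
    std::map<int, Value> m;
    m.insert(std::make_pair(1, Value(10)));  // no default ctor needed
    return m;
}

// Lookup via find(): also works without a default ctor.
int lookupOrMinusOne(const std::map<int, Value>& m, int key) {
    std::map<int, Value>::const_iterator it = m.find(key);
    return it == m.end() ? -1 : it->second.v;
}
```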

 

ADL #namespace

  • “Put the function in the same namespace as the classes it operates on.” is a one-liner summary
  • If you want to write a function that needs only a class’s public interface, then that function doesn’t have to be a (static/non-static) member. It can be a free function placed in the same namespace as the class. This increases encapsulation and data hiding, and reduces coupling, since only the public interface is needed. P79 [[c++codingStd]]
  • It’s also an idiom/pattern to impress interviewers. The published experts all seem to view ADL as a library feature mainly for overloaded operators. Read cppreference and [[c++codingStd]]. I’m not interested in that usage.
  • boost seems to use a lot of ADL
  • I feel namespace loves ADL more than any other c++ feature 🙂
  • ADL is an endorsed, prized compiler feature but still with criticisms [1]. To eliminate the confusion, simply fully qualify the function call.

[1] https://stackoverflow.com/questions/8111677/what-is-argument-dependent-lookup-aka-adl-or-koenig-lookup has the best quick guide with simple examples.

https://softwareengineering.stackexchange.com/questions/274306/free-standing-functions-in-global-namespace is a short, readable discussion.

My annotations on the formal, long-winded definition — ADL governs look-up of unqualified function names in function-call expressions (including operator calls). These function names are looked up in the namespaces of their arguments in addition to the usual namespace-based lookup.
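A minimal sketch of the "same namespace" advice. The namespace geo, the struct Point and the function manhattan() are hypothetical. The unqualified call works only because ADL searches the namespace of the argument's type. The fully qualified call bypasses ADL, which is the simple way to remove any confusion.

```cpp
namespace geo {
struct Point { int x, y; };

// Free function placed next to Point; it needs only Point's public members.
int manhattan(const Point& p) {
    return (p.x < 0 ? -p.x : p.x) + (p.y < 0 ? -p.y : p.y);
}
}  // namespace geo

int demoAdl() {
    geo::Point p{3, -4};
    int a = manhattan(p);       // unqualified: found by ADL via geo::Point
    int b = geo::manhattan(p);  // fully qualified: no ADL involved
    return a + b;
}
```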

defining+using your swap() #gotcha

Gotcha! If you define your own swap(), it may not get picked up, depending on some subtle factors. In the demo below, when the args are declared as “int” variables, std::swap() (made visible by “using namespace std”) is an exact match for int lvalues, so it gets picked up instead of your custom swap(size_t, size_t)!

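Note there’s no explicit #include <utility> required: on g++, <iostream> happens to pull in std::swap, though the standard doesn't guarantee it.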

  • Solution: if practical, avoid “using namespace std”, even in cpp files
  • Solution: except for the outermost main(), enclose everything in an anonymous namespace, to force the unqualified swap() to resolve to your custom version.
#include <iostream>
#include <cassert>
#include <cstddef>

//namespace{

using namespace std;

int value1 = 5;
int calls=0;
template <typename T> void swap(size_t a, size_t b){
        ++calls;
        std::cout<<"in my swap()"<<std::endl;
}

int mymain(){
  int a=0, b=1;
  int oldCount = calls;
  swap<int>(a,b); //int lvalues: std::swap<int>(int&,int&) is an exact match, so my swap() is skipped
  assert (calls == oldCount);

  std::cout<<"after 1st call to swap()"<<std::endl;
  swap<int>(0,1); //literals can't bind to int&, so my swap() is called
  assert (calls == 1+oldCount);

  std::swap<int>(a,b);  //compiles without #include <utility>: <iostream> pulls it in on g++
  return 0; // required: only main() may omit its return
}

//}

int main(){
  mymain();
}
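For comparison, a sketch of the first solution (no “using namespace std”). The names myCalls and demoNoUsing() are hypothetical. Since int has no associated namespace, ADL adds nothing, and unqualified lookup finds only the global swap template.

```cpp
#include <cstddef>
#include <utility>

int myCalls = 0;

// Same shape as the custom swap above, but WITHOUT "using namespace std".
template <typename T> void swap(std::size_t, std::size_t) { ++myCalls; }

int demoNoUsing() {
    int a = 0, b = 1;
    swap<int>(a, b);   // std::swap is not visible unqualified: custom swap runs
    std::swap(a, b);   // the real swap must now be named explicitly
    return a;          // 1, since only std::swap exchanged the values
}
```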

g++ -D_GLIBCXX_DEBUG #impractical

This is a good story for interviews.

In a simple program I wrote from scratch, this flag saved the day. My input to std::set_difference was not sorted, as detected by this flag. Without this flag, the compiler didn’t complain and I had some unexpected successful runs, but with more data I hit runtime errors.

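I had less luck using this flag with an existing codebase. After I built my program with this flag, I got random run-time crashes due to “invalid pointer at free()” whenever I used a std::stringstream. My guess at the culprit: this flag changes the layout of standard library types, so every object file and library linked together must be built with the same setting.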
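A sketch of the set_difference story, with a hypothetical diff() helper. std::set_difference requires both input ranges to be sorted. As far as I know, libstdc++ debug mode (g++ -D_GLIBCXX_DEBUG) checks this precondition and aborts with a diagnostic, while a normal build silently produces garbage. Sorting the inputs first satisfies the precondition in both builds.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical helper: elements of a that are not in b.
std::vector<int> diff(std::vector<int> a, std::vector<int> b) {
    std::sort(a.begin(), a.end());   // satisfy the sortedness precondition
    std::sort(b.begin(), b.end());
    std::vector<int> out;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::back_inserter(out));
    return out;
}
```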

 

gdb stop@simple assignments #compiler optimize

Toggle between -O2 and -O0, which is the default non-optimized compilation.

In my definition, a “simple assignment” is one without function calls. It can get its value from another variable or a literal. Simple assignments are optimized away under -O2, so gdb cannot stop on these lines. This applies to breakpoints and step-through.

In particular, if you breakpoint on a simple assignment then “info breakpoint” will show a growing hit count on this breakpoint, but under -O2 gdb would never stop there. -O0 works as expected.

As another illustration, if an if-block contains nothing but simple assignment, then gdb has nowhere to stop inside it and will only stop after the if-block. You won’t know whether you entered it. -O0 works as expected.
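A tiny test subject for trying this out. The file name demo.cpp and the function pick() are hypothetical. Build it once with g++ -O0 -g and once with g++ -O2 -g, then set "break demo.cpp:LINE" on the flag = 1 line. With -O2 the if-block may compile to no code of its own, so gdb may have nowhere to stop inside it.

```cpp
// Hypothetical demo.cpp for comparing -O0 and -O2 under gdb.
int pick(int x) {
    int flag = 0;
    if (x > 10) {
        flag = 1;   // simple assignment: no function call
    }
    return flag;
}
```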

ODR@functions # and classes

Warning – ODR is never quizzed in IV. A weekend coding test might touch on it but we can usually circumvent it.

The One Definition Rule is stricter on global variables (which have static storage duration). You can’t have 2 global variables sharing the same name. Devil is in the details:

(As explained in various posts, you declare the same global variable (with extern) in a header file that’s included in various compilation units, but you allocate storage in exactly one compilation unit. Under a temporary suspension of disbelief, let’s say there were 2 separate storage locations for the same global var: how would you update this variable?)

With free function f1(), ODR is more relaxed. http://www.drdobbs.com/cpp/blundering-into-the-one-definition-rule/240166489 (possibly buggy) explains the Lesser ODR vs the Greater ODR. The Lesser ODR is simpler and more familiar, forbidding multiple (same or different) definitions of f1() within one compilation unit.

My real focus today is the Greater ODR. While obeying the Lesser ODR, the same static or inline function is often included via a header file and compiled into multiple object files. If you want to put a non-template free function definition in a shared header but avoid violating the Greater ODR, then it must be static or inline, implicitly or explicitly. I find the Dr Dobbs article unclear on this point. In my test, when a free function was defined in a shared header without the “static” or “inline” keyword, the linker screamed “ODR!” (a “multiple definition” error).

The most common practice is to move function definitions out of shared headers, so the linker (or compiler) sees only one definition globally.

With inline, the linker actually sees multiple (hopefully identical) physical copies of f1(). Usually they are identical definitions. If they actually differ, the compiler/linker can’t easily notice and is not required to verify (no diagnostic required), so there's no build error (!) but you could hit strange run-time errors.
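A single-file sketch of what a shared header may safely contain. The header name my_header.h and the names g_counter, twice() and bump() are hypothetical, and the "one .cpp file" part is simulated in the same file. The global is declared extern in the header and defined once. The free function defined in the header is marked inline, so every including .cpp file may carry a copy without a "multiple definition" link error.

```cpp
// ---- would live in the shared header (hypothetical my_header.h) ----
extern int g_counter;                     // declaration only: no storage here
inline int twice(int x) { return 2 * x; } // inline: safe to include in many .cpp files

// ---- exactly ONE .cpp file allocates the storage ----
int g_counter = 0;

// ---- any .cpp file that includes the header may use both ----
int bump() { return g_counter += twice(1); }
```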

Java linking is simpler and has never caused me any problem, so I never looked into it.

//* if I have exactly one inline, then the non-inlined version is used.
// Linker doesn't detect the discrepancy between two implementations.
//* if I have both inline, then main.cpp compiles but won't link
// ("undefined reference"): neither object file emits the unused inline definition
//* if I remove both inline, then we hit ODR ("multiple definition" at link time)
//* objdump on the object file would show the function name
// IFF it is exposed i.e. non-inline
::::::::::::::
lib1.cpp
::::::::::::::
#include <iostream>
using namespace std;

//inline
void sameFunc(){
    cout<<"hi"<<endl;
}
::::::::::::::
lib2.cpp
::::::::::::::
#include <iostream>
using namespace std;

inline
void sameFunc(){
    cout<<"hey"<<endl;
}
::::::::::::::
main.cpp
::::::::::::::
void sameFunc(); //needed
int main(){
  sameFunc();
}

 

c++non-void function +! a return value

Strictly, it's undefined behavior, not a compiler error. https://stackoverflow.com/questions/9936011/if-a-function-returns-no-value-with-a-valid-return-type-is-it-okay-to-for-the explains the rationale.

However, in practice,

  • For an int function, the caller could see any int value.
  • For functions returning type AA, I don’t know what is returned. Could it be a default-constructed instance of AA? Nothing is guaranteed.
    • My specific case: I modified a well-behaving function to introduce an exception. I then added a catch-all block without a return value. It actually worked fine, so some instance of AA was actually returned! That was luck, not a guarantee: an optimizing compiler may assume that path never runs.
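A sketch of the safe version of that catch-all, with hypothetical names: AA stands in for the real return type, and risky()/safeWrapper() are made up. The catch block returns explicitly instead of flowing off the end, so the result is well defined.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the real return type AA.
struct AA {
    std::string label = "default";
};

AA risky(bool fail) {
    if (fail) throw std::runtime_error("boom");
    AA a;
    a.label = "ok";
    return a;
}

AA safeWrapper(bool fail) {
    try {
        return risky(fail);
    } catch (...) {
        return AA{};   // explicit fallback instead of flowing off the end
    }
}
```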

 

linker dislikes [non-generic]function definition in shared header

I used to feel header files were optional, so we could make do without them if they got in our way. This post shows they aren’t optional in any non-trivial C++ project. There is often only one (or a few) correct way to structure the header vs implementation files. You can’t make do without them.

Suppose MyHeader.h is included in 2 cpp files and they are linked to create an executable.

A class definition is permitted in MyHeader.h:

class Test89{
void test123(){}
};

However, if test123() is a free function, then the linker will fail with “multiple definition” of this function when linking the two object files.

http://stackoverflow.com/questions/29526585/why-defining-classes-in-header-files-works-but-not-functions explains the rules

  • a function definition repeated across files (multiple files including the same header) must be declared inline
  • repeated class definition (in a shared header) is permitted for a valid reason (sizing…). Since programmers can not only declare but also define a member function inside such a class, in a header, the compiler silently treats such member functions as inline
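A sketch of a header-safe class, as a corollary of the rules above. test456() is a hypothetical second member. test123() is defined in the class body, so it is implicitly inline. test456() is defined outside the class body but still in the header, so it needs an explicit inline, just like a free function.

```cpp
// Contents that could live in MyHeader.h, included by several .cpp files.
class Test89 {
public:
    int test123() { return 123; }   // defined in-class: implicitly inline
    int test456();                  // declared only
};

// Out-of-class definition in the header: inline is required here.
inline int Test89::test456() { return 456; }
```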

c++method default arg isn’t saved]vtable

  1. Google c++ style guide forbids default args in virtual functions. It’s error-prone.
  2. [[c++primer]] P607 has slightly more liberal advice: on a virtual function, subclass and base class should always use the same default arg.

Ashish said he was asked a common question in multiple c++ interviews. The default arg value is “declared” in base class and (or) derived class but never saved in the vtable. It’s basically decided statically (i.e. at compile time).

If you use a pointer to Base to invoke the virtual method, then the Base version’s declared default arg applies unconditionally, even though subclass override is chosen at run time, via the vtable. Highly confusing.
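A minimal sketch of the interview question, with hypothetical class and method names. Calling through a Base reference runs the Derived body (vtable) but with the Base default argument (static type).

```cpp
#include <string>

struct Base {
    virtual ~Base() = default;
    virtual std::string greet(std::string who = "base-default") const {
        return "Base:" + who;
    }
};

struct Derived : Base {
    // Different default arg: legal but error-prone.
    std::string greet(std::string who = "derived-default") const override {
        return "Derived:" + who;
    }
};

// Static type is Base, so Base's default arg is used even for a Derived object.
std::string callViaBase(const Base& b) { return b.greet(); }
```

Usage: with Derived d, callViaBase(d) yields "Derived:base-default", while d.greet() yields "Derived:derived-default".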

my_unsigned_int = -1

http://www.cplusplus.com/reference/string/string/npos/:

static const size_t npos = -1;

npos is a static std::string member constant value with the greatest possible value for an element of type size_t. This value, when used as the value for a len (or sublen) parameter in string's member functions, means “until the end of the string”. As a return value, it is usually used to indicate no matches. This constant is defined with a value of -1, which because size_t is an unsigned integral type, it is the largest possible representable value for this type.
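A short sketch of the same trick. my_unsigned_int mirrors the post title, and contains() is a hypothetical helper. Converting -1 to an unsigned type is well defined (arithmetic modulo 2^N) and yields that type's maximum value.

```cpp
#include <cstddef>
#include <limits>
#include <string>

constexpr std::size_t my_unsigned_int = -1;  // wraps to the largest size_t
static_assert(my_unsigned_int == std::numeric_limits<std::size_t>::max(),
              "-1 converts to the maximum unsigned value");

// Typical use of npos as the "no match" return value of find().
bool contains(const std::string& hay, const std::string& needle) {
    return hay.find(needle) != std::string::npos;
}
```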

typedef: non-optional in this case

Someone asked me to write a utility function to print any STL container, in my own loop. I suggested we follow the STL convention to use iterator inputs. Echoed on http://stackoverflow.com/questions/4657767/how-to-templatize-a-function-on-an-stl-container-that-has-an-iterator. However, what if we pass the container itself as input (assuming it’s a standard-conforming container)?

#include <iostream>
using namespace std;

template <typename CT>
void dump(const CT& cont) {
  typedef typename CT::const_iterator iterator; //no choice
  iterator i;
  //won't compile: CT::const_iterator i;
  for(i = cont.begin();   i!= cont.end();   ++i){
    cout<<*i <<" "<<endl;
  }
}

This works, and the typedef isn’t mere sugar-coating: the typename keyword inside it is what matters. Without it you get

dependent-name ` M::const_iterator’ is parsed as a non-type, but instantiation yields a type

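Very loosely, inside a template the compiler can't know whether CT::const_iterator names a type until CT is “concretized” [1], so by rule it parses a dependent qualified name as a non-type. Solution: the typename keyword says “this is a type”, and the typedef dresses up this “generic type” as if it’s an end-user type, usable in a variable declaration. (Strictly, typename CT::const_iterator i; without any typedef also compiles.)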

[1] remember the Barcap FMD eval objects?
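A testable variant of dump(). The name dumpToString() is hypothetical. It writes into a string instead of cout, and it spells typename inline to show that the keyword, not the typedef, is what the compiler needs.

```cpp
#include <list>
#include <sstream>
#include <string>
#include <vector>

template <typename CT>
std::string dumpToString(const CT& cont) {
    std::ostringstream os;
    // "typename" tells the compiler the dependent name CT::const_iterator is a type.
    for (typename CT::const_iterator i = cont.begin(); i != cont.end(); ++i) {
        os << *i << ' ';
    }
    return os.str();
}
```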

3 implicit conversions by c++ compiler – const

# a literal “any string” has type char const[N], which decays to char const *. If you must assign such a literal string to a ptr-to-char variable, you must cast away the constness (Macquarie)

# If you call a const method on a non-const class instance, then inside that method any instance fields you access are implicitly treated as const. See P215 [[eff STL]]

# as stated in other posts (like http://bigblog.tanbin.com/2012/02/c-const-methods-mean-this-is-bitwise.html), in a const method, “this” is converted to a ptr-to-const.

# http://bigblog.tanbin.com/2012/04/non-primitive-field-in-const-method.html
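A compile-time sketch of the literal-type and const-method points above, using a hypothetical Widget class and the standard type traits.

```cpp
#include <type_traits>

struct Widget {
    int count = 0;
    // Inside a const method, "this" is a pointer-to-const...
    bool thisIsPtrToConst() const {
        return std::is_same<decltype(this), const Widget*>::value;
    }
    // ...so a field accessed through it is const as well.
    bool fieldIsConst() const {
        return std::is_const<std::remove_reference<decltype((count))>::type>::value;
    }
};

// A string literal is an lvalue of type const char[N], decaying to const char*.
static_assert(std::is_same<decltype("hi"), const char (&)[3]>::value,
              "literal type is const char[3]");
```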

5 implicit conversions by c++ compiler – big 4

See also posts on the “same” topic. There are too many such implicit things to list in one blog post. This one is about the dtor, copy-ctor, assignment operator + ctor: the big 3 + 1.
 
# MyClass myObj; // calls the default ctor
# MyClass myObj5[5] // ditto
# operator new's first parameter is always a size_t, but when you write a new-expression you never pass in the size; the compiler supplies it
# assigning to an existing reference var of MyClass implicitly calls the assignment operator
          refVar2.operator=(….) //because reference variables can’t be reseated

# per-class overloaded operator new and operator delete are implicitly static, always. ARM P283
** overloaded op new is (implicitly) inherited

# Say your user-written copy ctor is invoked (often implicitly ;-). Therein you need to clone a FIELD of type T. If you don’t use the member initializer list, then the copier first calls T’s no-arg ctor to “make room” for the field, then calls T’s operator= to modify its state.

# when you new up a Derived object on heap, and Base class has an operator new, you would use that (implicitly inherited) operator-new unless Derived class redefines (“override” is wrong term) operator-new.
** when you new up a Umbrella object on heap, and the object has a non-ref (not pointer either) field of type Base, you would ignore whatever operator-new Base has. You use either “::operator new” or the Umbrella’s own operator-new to grab a sizeof(Umbrella) chunk. That’s all the memory you need. Part of it will be constructed as a Base object. See [[more effC++]]
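A sketch of the copy-ctor item and the operator-new items above. All class names are hypothetical; Tracked stands in for the field type T and counts which special members run. Base's per-class operator new is inherited by Derived but ignored when an Umbrella holding a Base by value is new'ed.

```cpp
#include <cstddef>
#include <new>

// Stand-in for field type T: counts which special members run.
struct Tracked {
    static int defaultCtors;
    static int copyCtors;
    static int assigns;
    Tracked() { ++defaultCtors; }
    Tracked(const Tracked&) { ++copyCtors; }
    Tracked& operator=(const Tracked&) { ++assigns; return *this; }
};
int Tracked::defaultCtors = 0;
int Tracked::copyCtors = 0;
int Tracked::assigns = 0;

struct NoInitList {
    Tracked field;
    NoInitList() {}
    NoInitList(const NoInitList& o) { field = o.field; }      // no-arg ctor, then operator=
};
struct WithInitList {
    Tracked field;
    WithInitList() {}
    WithInitList(const WithInitList& o) : field(o.field) {}  // copy ctor only
};

// Per-class operator new: implicitly static, and inherited by Derived.
struct Base {
    static int newCalls;
    void* operator new(std::size_t size) { ++newCalls; return ::operator new(size); }
    void operator delete(void* p) { ::operator delete(p); }
    virtual ~Base() {}
};
int Base::newCalls = 0;
struct Derived : Base { int extra[4]; };
struct Umbrella { Base member; };   // by-value Base field: Base::operator new not used
```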

10 implicit translations by c++ compiler

See also posts on the “same” topic.

Note many inefficient implicit conversions are selectively, optionally “optimized away” by various compiler options. This adds even more complexity.

# proxy classes (like in [[more effC++]]) rely heavily on implicit translations
# smart pointers are drop-in replacements for raw pointers, and always rely on implicit compiler translations.
# “–any literal string–” anywhere in source code is implicitly converted (array-to-pointer decay) to a char const* (pointer to const char, not a const pointer)
# unnamed temporary object creation by compiler. [[more eff c++]]
# RVO
# [[c++ primer]]: here are 3 identical function declarations, because all array params are implicitly converted to pointer params. This is still very relevant because arrays are far more widely used than vectors (C is widespread):
  void f(int * )
  void f(int[] )
  void f(int[5] )

#51 [[effC++]] Item 19 points out that if your Rational class overloads operator* with a Rational argument, then “aRational * 2” still works (given a non-explicit Rational(int) ctor), because the compiler first converts it into
  aRational.operator*(2); // then it needs to convert 2 to a Rational object, so it converts the call into

  const Rational temp(2);
  aRational.operator*(temp);

#50) if f() declares a param of type Animal, you can pass in a Shape variable, if there’s a converter. (See effC++ Item 26). Applies to any LHS=RHS expression like

** Type5 var2 = var3ofUnrelatedType // you get either a Type5 conversion ctor or a conversion operator in UnrelatedType

#10) overloading the arrow operator ie “->”. See P19/23 [[boost]]. IBM explains — The statement x->f() is interpreted as (x.operator->())->f()
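A sketch of two items above: the Rational conversion and the array-parameter adjustment. The Rational class and firstOf() are minimal hypothetical versions written only to show the implicit translations.

```cpp
// Minimal hypothetical Rational: non-explicit ctor allows int -> Rational.
class Rational {
public:
    Rational(int num = 0, int den = 1) : num_(num), den_(den) {}
    Rational operator*(const Rational& rhs) const {
        return Rational(num_ * rhs.num_, den_ * rhs.den_);
    }
    int num() const { return num_; }
    int den() const { return den_; }
private:
    int num_, den_;
};

// Array params are adjusted to pointers: all three declare ONE function.
int firstOf(int[]);
int firstOf(int[5]);
int firstOf(int* p) { return p[0]; }
```

Note that 2 * aRational would not compile with this member operator*, because no implicit conversion is applied to the object a member function is called on.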

5 implicit conversions by c++ compiler — STL

See also posts on the “same” topic.

#50) [STL] when you use STL copy() to print any container, you implicitly call the ostream_iterator ctor

#40) [STL] when you specify a pred func in a STL algo, there’s a ton of implicit conversions behind the scene. Here’s one example. The pred func is invoked as

  bool flag = pred(*the_iterator_in_this_func_call) // for a unary predicate

#30) [STL] when you put a functor TYPE into a template argument list, the specialized template class’s constructor instantiates a functor OBJECT. In short, you specify the functor TYPE only; functor object instantiation is implicit. See P91 [[effective STL]]. P154 [[STL tutorial]] shows

 set<char, less<char> > mySetOfChar;

#20) [STL] converting a func name to a func ptr — when you pass the func name as arg to some STL algo. P 202 [[effective STL]]
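A sketch of items #30 and #20, with hypothetical helper names. Only the functor TYPE std::greater<char> is named and std::set creates the functor object itself. The plain function name isOdd decays to a function pointer when passed to std::count_if.

```cpp
#include <algorithm>
#include <functional>
#include <set>
#include <string>
#include <vector>

// #30: name the functor TYPE only; the set default-constructs the functor object.
std::set<char, std::greater<char> > descendingChars(const std::string& s) {
    return std::set<char, std::greater<char> >(s.begin(), s.end());
}

// #20: the function name decays to a function pointer when passed to an algorithm.
bool isOdd(int x) { return x % 2 != 0; }

long countOdd(const std::vector<int>& v) {
    return std::count_if(v.begin(), v.end(), isOdd);
}
```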

const T =shadow-type of T;const T&=distinct type from T&

(background: the nitty gritty of overload/overriding rules are too complicated…)

I feel overloading and overriding have consistent rules. Take 2 same-name functions (single-param functions for simplicity) and ignore their host classes. If the 2 can overload each other, then their parameters are considered distinct. That means they can’t override each other (if they were inserted into an inheritance tree).

Conversely, if the 2 can override each other, then their parameter types are considered “identical” so they can’t overload each other (if set “free”).

Q: We know function overloading forbids 2 overloads with identical parameter sets. How about const, i.e. If 2 functions’ parameters differ only in const, can they have identical names?
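A: No. ARM P308 explains that the compiler’s (static) “resolution” (based on argument type) would fail to pick a winner between the 2. It’s ambiguous. (In today's standard the pair is rejected even earlier: top-level const on a by-value parameter is dropped from the function type, so both declare the same function.)
A: therefore, in the overloading context, const SomeType is a “shadow type” and does NOT count as a distinct type.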

However, if 2 functions’ parameters have this difference (const T& vs T&), then they are distinct types, so the 2 functions can have identical names. Explained in ARM.

Similarly, 2 functions can overload based on const T* vs T*: distinct types.

Q: We know method overriding requires exact parameter match. How about const? Can an override add or remove const?
A: whatever the answer, this is sloppy. No justification.
A: yes according to my test. Adding the const makes no difference — runtime binding unobstructed.
A: therefore, in the overriding context, const SomeType is a “shadow type” and does NOT count as a distinct type.

However, my test shows const T& vs T& do make a difference — runtime binding disabled. These are considered 2 distinct types.
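A sketch reproducing these observations, with hypothetical names. g() overloads on const int& vs int&. Derived::byValue() adds top-level const and still overrides. Derived::byRef() switches int& to const int&, so it hides Base::byRef() instead of overriding it, and calls through a Base reference stay with the Base version.

```cpp
#include <string>

// const T& vs T&: distinct parameter types, so these two overload.
std::string g(int&)       { return "non-const ref"; }
std::string g(const int&) { return "const ref"; }

struct Base {
    virtual ~Base() = default;
    virtual std::string byValue(int) const { return "Base"; }
    virtual std::string byRef(int&) const { return "Base"; }
};

struct Derived : Base {
    // Top-level const is not part of the signature: still an override.
    std::string byValue(const int) const override { return "Derived"; }
    // const int& vs int&: a NEW function that hides, not overrides, Base::byRef.
    std::string byRef(const int&) const { return "Derived"; }
};
```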

static object initialization order #lazy singletons

Java’s lazy singleton has a close relative among C++ idioms, comparable in popularity and implementation.

Basically, local statics are initialized upon first use, exactly (when control first passes through the declaration). Other static objects are initialized “sometime” before main() starts. See the Item on c vs c++ in [[More eff c++]] and P222 [[EffC++]].

In real life, this is hard to control: a known problem (the “static initialization order fiasco”) but with standard solutions, something like a lazy singleton. See P 170 C++FAQ. Also addressed in [[EffC++]] P221.
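A minimal sketch of the C++ lazy singleton built on a function-local static. The Registry class is hypothetical. The local static is constructed on the first call to instance(), and since C++11 that initialization is thread-safe.

```cpp
// Hypothetical lazy singleton using a function-local static.
class Registry {
public:
    static Registry& instance() {
        static Registry inst;   // constructed on first call, not before main()
        return inst;
    }
    int nextId() { return ++counter_; }
private:
    Registry() = default;
    int counter_ = 0;
};
```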

non-const-non-primitive field in a const method

As stated on P215 [[effSTL]], inside const methods, all fields act like const fields. This is simpler for primitive fields like an int field, but trickier for non-primitive fields … like Animal —

class C{
Animal f;
Animal const & dump() const;
};

Inside dump(),
– we can’t overwrite f like “f = anotherAnimal”
– we can’t call mutator methods on f like “f.grow()” — uncompilable
– we can’t remove the “const” from dump() return type — uncompilable in GCC. See also http://www.parashift.com/c++-faq-lite/const-correctness.html#faq-18.11 A common IV quiz.
– if Animal class overloads method age() on const, then compiler will statically bind the “f.age()” to the const version.

All these rules are compiler-enforced. No such check at run-time.
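A sketch of the rules above with a hypothetical Animal whose age() is overloaded on const. dump() returns a const reference because f is const inside a const method. Inside a const method, f.age() binds statically to the const overload.

```cpp
#include <string>

// Hypothetical Animal with age() overloaded on const.
struct Animal {
    int years = 1;
    void grow() { ++years; }                          // mutator: not callable on a const f
    std::string age()       { return "non-const age()"; }
    std::string age() const { return "const age()"; }
};

class C {
public:
    // "Animal& dump() const" would not compile: f is const in here.
    const Animal& dump() const { return f; }
    std::string whichAge() const { return f.age(); }   // const overload chosen
    std::string whichAgeMutable() { return f.age(); }  // non-const overload chosen
private:
    Animal f;
};
```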

g++ removes a method if never called@@

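I suspect the syntax checker in gcc effectively comments out a (potentially illegal) method if it's never called. More precisely, this is template behavior: a member function of a class template is instantiated only if it is used, and expressions that depend on T (including anything accessed through this) are checked only at instantiation. In the example below,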

1) modification of this->intVar is blatant violation of const “this” but this is invisible to the gcc syntax checker unless there's a call to the method.

2) More obviously, bad2() calls a non-existent method but condoned unless someone calls bad2().

using namespace std;
template <typename T> struct J{
    J(T & rhs){}
    void violateConstThis(T a) const{
       this->intVar = a; // illegal in a const method; accepted only because no one calls it
    }
    void bad2() const{
        this->nonExistentMethod();
    }
    T intVar;
};
int main()
{
    int a=22;
    const J<int> j1(a);
    //j1.violateConstThis(89);
    //j1.bad2();
    return 0;
}

c++ forward class declaration vs "implementation"

A FCD (forward class declaration) is the minimum declaration of a class before its use. Here are some FCDs in the standard <iosfwd> header:

template <class charT> struct char_traits;
template <> struct char_traits<char>;

In C++, I see 3 levels of class declarations
1) FCD
2) class definition using method prototypes, and field compositions ie a full listing of fields. Full listing required for memory allocation.
3) class fully defined with method bodies
4?) see another post for an alternative – pure abstract classes

Usually we put #2 in *.h; client programs “#include” our class definitions, which the preprocessor pastes in textually. We seldom need to put #3 in header files, though most boost header files are #3, with important consequences for linking and compiling.

effC++ item 34 has a detailed treatment of #1 vs #2. FCD, being the minimum declaration, is also known as the “interface”, whereas #2 is known as an “implementation” and “class definition”.

Puzzled by the word “implementation”? Think of a Car as an abstract concept. Different car makers “implement” it by using concrete components. A specific implementation of car is essentially a listing of non-static fields.

Put another way, implementation means composition.

The compiler needs the size of each field (possibly of user-defined types) in order to size up your Car instance; a new-expression passes sizeof(Car) to operator new. In Java, primitive fields have known sizes; all reference fields occupy 4 bytes (32-bit machine). The C++ compiler calculates the object size and every field offset at compile time, but the address of each new’ed object is determined at run time, when operator new returns.

As you write a client program, you can sometimes choose to include the API classes by FCD rather than #2. I feel if you don’t open up an API object to access its members, and you only mention the class name in method signatures, then FCD suffices.
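A single-file sketch of "FCD suffices". The classes Car and Engine are hypothetical. Car only stores an Engine pointer and mentions Engine in its signatures, so the forward declaration is enough to define Car. Only the body of describe(), which opens up the Engine object, needs the full Engine definition.

```cpp
#include <string>

class Engine;   // FCD: enough for pointers, references and signatures

class Car {
public:
    explicit Car(Engine* e) : engine_(e) {}
    std::string describe() const;   // body needs the full Engine definition
private:
    Engine* engine_;                // pointer size is known without Engine's layout
};

// Full class definition (#2): now sizeof(Engine) and its members are known.
class Engine {
public:
    int horsepower = 150;
};

std::string Car::describe() const {
    return "hp=" + std::to_string(engine_->horsepower);
}
```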

The motivation behind Item 34 is compile-time dependency and coupling. I feel it’s C++ specific. By the way, decoupling is one of my favorites, and is a practical priority compared to a lot of other design principles.

The Google C++ style guide gives a simple illustration of a bad FCD messing up compilation, where #include is safe. In a nutshell, #include informs the compiler of the inheritance relationship between B and D, which matters for static binding (overload resolution).

protected virtual dtor #tricky

[[boost]] P 24 explains the technique of protected virtual destructor. Here’s my take —

The superclass A destructor is called by B’s destructor. See http://www.cplusplus.com/forum/general/12344/. Remember you first clean up the outermost layer of the onion, and remember the B fields exist on the outer layer, outside A’s real estate.

Now, if someone gets hold of an A pointer, where the pointee is B, she can’t call delete on the pointer. Compile time (not run time) error. The special dtor is a protection.

#include <iostream>
using namespace std;

class Base{
  protected: virtual ~Base(){cout <<"base dtor" <<endl;}
};
class Derived : public Base {
  public: virtual ~Derived(){cout <<"derived dtor" <<endl;}
};
int main(){
  Base * p1 = new Derived;
  Derived * p2 = new Derived;
  delete p1; //won't compile: ~Base() is protected
  delete p2; // ok
}

If I swap the 2 words public and protected then I can delete Base ptr but not Derived ptr (compiler error)!

So it seems the declared (static) type of the pointer must have a public dtor. In our examples, even though the object is a Derived, delete is allowed only on the ptr whose declared type has a public dtor.

Now, what if I have a private dtor? Probably less common, but it’s possible to invoke this private virtual dtor:

#include <iostream>
using namespace std;

class Base{
  public:virtual ~Base(){cout <<"base dtor" <<endl;}
};
class Derived : public Base {
  private: virtual ~Derived(){cout <<"derived dtor" <<endl;}
};
int main(){
  Base * p1 = new Derived;
  delete p1; // ok private destructor called!
}

In my experiments, qq(new) is mandatory: declaring a Derived as a stack variable triggers a compiler error, because the compiler can’t call the private dtor to destroy the object automatically.
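The two experiments can also be checked at compile time with std::is_destructible, which reports whether outside code may destroy an object of the type. A sketch with hypothetical class names and the printing omitted. A protected base dtor blocks destruction through the base type. A private derived dtor blocks stack variables, yet delete through a public-dtor base still reaches it via the vtable.

```cpp
#include <type_traits>

// Protected virtual dtor in the base (first experiment).
class ProtBase {
protected:
    virtual ~ProtBase() {}
};
class PubDerived : public ProtBase {
public:
    ~PubDerived() override {}
};
static_assert(!std::is_destructible<ProtBase>::value, "delete via ProtBase* won't compile");
static_assert(std::is_destructible<PubDerived>::value, "delete via PubDerived* is fine");

// Private virtual dtor in the derived class (second experiment).
class PubBase {
public:
    virtual ~PubBase() {}
};
class PrivDerived : public PubBase {
private:
    ~PrivDerived() override {}
};
static_assert(!std::is_destructible<PrivDerived>::value, "no stack variables of PrivDerived");

// Allowed: access is checked on PubBase, and virtual dispatch runs ~PrivDerived().
bool deleteViaPublicBase() {
    PubBase* p = new PrivDerived;
    delete p;
    return true;
}
```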