writing to array beneath c-string-literal # UB

This is an online c++ test. The program compiles but hits seg fault but why? SOF poster said string literals are stored in read-only memory, and overwriting them is undefined behavior.

#include <stdio.h>
int main(){
 char  *p = "abc";
 char  *q = "abc123";
 while (*p=*q)
   printf("%c %c\n", *p, *q);

function returning c-string #avoid

This is a very common scenario but … c-string is always accessed by ptr, but returning a ptr is … always error-prone! https://stackoverflow.com/questions/1496313/returning-c-string-from-a-function gave two good solutions:

  1. statically allocate the string
  2. caller to pass in a ptr to an empty buffer

I like simplicity and prefer

  • static local char-array
  • global char-array
  • char const * myName() { return “Dabao”; } //the simplest solution, if the string is immutable

ptr = inevitable when using c-str

It is impossible to use any string without using pointers in C, according to https://stackoverflow.com/questions/1496313/returning-c-string-from-a-function

That’s one reason to call C a lowLevel language.

In most c++ string classes, there’s still a c-string inside every “string object”. I am 99% sure the char-array now lives on heap.

In java and c#, not only the char-array, but the entire string object (including the house-keeping data) live on heap.

convert non-null-terminated char-array to std::string

std::string ccy (ptr->ccy, ptr->ccy+3); //using a special string() ctor

my ptr->ccy is the address of a 3-char array, but it’s immediately followed by other chars belonging to another field, in a tightly packed struct without padding. If you simply pass ptr->ccy to string() ctor, your string will take in many extra chars until a null terminator.

strncpy^strcpy^strlcpy #padding nulls

https://stackoverflow.com/questions/6987217/strncpy-or-strlcpy-in-my-case claims that strncpy is not safe, but I am not sure. The strncpy() API is well defined, while strlcpy is a BSD extension and not even available in my linux. May not be in glibc.

One key difference between strcpy vs strncpy is the implicit \0 terminator —

strcpy() unconditionally appends a single null (\0), whereas strncpy won’t. strncpy() appends enough padding nulls IFF input array contains a null within first “n” slots.

array^pointer variables types: indistinguishable

  • int i; // a single int object
  • int arr[]; //a nickname of the starting address of an array, very similar to a pure-address const pointer
  • int * const constPtr;
  • <— above two data types are similar; below two data types are similar —->
  • int * pi; //a regular pointer variable,
  • int * heapArr = new int[9]; //data type is same as pi

[13]arrayName^ptrVar differences tabulated

See other posts why an array name ((including c-str) is a permanent name plate on a permanently allocated room in memory. Once you understand that, you know
– can’t move this name plate to another room
– can’t put this array name on the LHS of assignment

I feel overall, in most everyday contexts you can treat the array name in this example as if it’s a variable whose type is int*. However, the differences lurk in the dark like snakes, and once bitten, you realize there are many many differences. The 2 constructs are fundamentally different.

Q: how to edit the below html table structure (content is easy)?
A: edit the html
A: copy paste to ms-word

ptr to heap array ptr-var array-name (array not on heap)
declaration array-new int* p//allocate 32bit int arr[3]//allocate 3 ints
declared as a struct/class field same as ptr-var 4-bytes onsite entire array embedded onsite
declared as func param var no such thing common converted a ptr-var by compiler
initialize probably not supported char* p=”abc”;// puts the literal string in RO memory char arr[]=”xyz”; //copying the literal string from RO memory to stack/heap where the array is allocated. The new copy is editable.
object passed as func arg same as ptr-var common converted to &arr[0]
dereference same as ptr-var common gives the first int element value. P119[[primer]]
address of same as ptr-var double ptr &arr[0]==&arr==arr. See headfirstC post and also
rebind/reseat same as ptr-var common syntax error
add alias to the pointee same as ptr-var common unsupported
as LHS (not as func param) same as ptr-var common. Reseat syntax error

char-array dump in hex digits: printf/cout

C++ code is convoluted. Must cast twice!

// same output from c and c++: 57 02 ff 80
void dumpBufferPrintf(){
  static const char tag[] = {'W', 2, 0xFF, 0x80};
  cout << hex << setfill('0') ;
  for(int i = 0; i< sizeof(tag)/sizeof(char); ++i)
    printf("%02hhx ", tag[i]);
  printf ("\n");
#include <iostream>
#include <sstream> //stringstream
#include <iomanip> //setfill

//This function was also used to dump a class instance. See below
void dumpBufferCout(const char * buf, size_t const len){
                std::stringstream ss;
                ss << std::hex << std::setfill('0');
                for(size_t i=0; i<len; ++i){
                          if (i%8 == 0) ss<< "  ";
                          ss<<std::setw(2)<< (int)(unsigned char) buf[i]<<" ";
dumpBufferCout((const char*)&myStruct, sizeof(myStruct));

impressive container dumper #operator<<

Compared to the standard dump(), This is slightly harder to write but convenient to use and impressive in an interview:

  • operator template, agnostic of the payload type
  • operator overload
  • setw()

To support other containers beside list, you can copy paste this function and change the 2nd arg type. Thanks to the auto keyword, nothing else needs change.

When I attempted to consolidate to a single function template to handle all container types, then cout<<“any_string” also picks up this function inadvertently 😦


is my tested code


std::vector memory allocation/free: always heap@@

https://stackoverflow.com/questions/8036474/when-vectors-are-allocated-do-they-use-memory-on-the-heap-or-the-stack is concise.

  • The vector “shell” can be on stack or heap or static memory, as you wish.
  • The continuous array (payload objects) are always allocated on the heap, so that the vector can grow or deallocate them.

Deallocation is explained in https://stackoverflow.com/questions/13944886/is-stdvector-memory-freed-upon-a-clear

range-check in c++ vector^raw array

[[safe c++]] points out that static array or dynamic array are both (unfortunately) silent about access beyond their limits. Vector has operator[] and at() —

[[]] says c++ new array type supports .size()…

#include <iostream>
#include <vector>
using namespace std;

int main(){
    vector<int> v;
    size_t sz = v.size();
        cout<<v[0]<<" "<<v[sz]<<" <- operator[]: out-of-range treated as non-error!"<<endl;
        cout<<v.at(sz)<<" <- at(): throws:)"<<endl;
    }catch(exception & ex){
        cerr<<"Must catch by ref (what() virtual) " << ex.what()<<endl;

std::array+alternatives #RAII dtor

feature wish list — RAII; fixed size; bound check

https://stackoverflow.com/questions/16711697/is-there-any-use-for-unique-ptr-with-array compares vector ^ unique_ptr ^ std::array. Conclusion — Vector is more powerful, more versatile, more widely useful, less restrictive. std::array is

All alternatives to std::array:

  • vector and raw array
  • boost::scoped_array and boost::shared_array — least quizzed.
  • std::unique_ptr<int[]> dynArr(new int[theSize]) # can be allocated on heap at runtime, and you don’t need to call delete[]

The dtor in each data structure is a key feature if you want RAII:

  1. std::array — dtor probably destroys each element one by one. No q(delete) called since the elements could be on stack
  2. scoped_array — dtor calls delete[], which is a key difference from scoped_ptr, according to my book [[beyond the c++ standard lib]]
  3. vector — dtor uses its allocator, which very likely calls delete[] since vector uses heap for the underlying growable array.
  4. unique_ptr<T[]> — yes as expected. This valgrind experiment shows the array-specialization uses delete[], whereas the vanilla instance uses regular q(delete).

basics of 2-D array in C

Array of strings are the least confusing. Matrix-double is also widely used.

A[33][22] is a 2D matrix with 33 rows 22 columns. For element A[3][2] , 3 is the first subscript, i.e. row-number. Row number can go up to 33 – 1.

See http://c-programmingbooks.blogspot.com/2011/11/memory-map-of-2-dimensional-array-in-c.html


– The 2D array layout is contiguous — a[0][0] a[0][1]….a[1][0] a[1][1]. I think of it as 33 simple arrays connected end-to-end, each 22 cells

– The array of pointer is 33 pointers.


Char namesA[count][size]; //size = limit on long names

Char *namesB[count]; //this many strings; this many pointers

string member – char array ^ char ptr

I have no practical experienced on this topic. I seldom see people discussing it. Perhaps many didn’t know the char-array option or dare to try it.

Good use case for char-array — Suppose you know you will have a million instances of the struct, and all of them have an Address string about the same length. Then the char-array is a viable option.

Our /Cast of characters/ — lets say we have a Person struct (or class) with an Address field of a 99-char string.

[[headfirstC]] P286/220 point out (arguably) the most fundamental difference — the entire 99 bytes of char-array is allocated on-site, whereas in the char-ptr design only the 4-byte pointer [1] is stored on-site.

[1] pointer object, not pointer variable or pure-address

When you duplicate the struct, the char-array case is simpler. The char-ptr solution may require strdup(), malloc() and free().

With char-arrayh, you can get redundant copies of the same data.

The char-ptr solution allows 2 instances to share the same string in memory. Looks efficient and neat but could be tricky later on. Might hit dangling pointers.

Now suppose there’s a Resume field of maximum length 10KB. Char array option would waste memory since some Persons (kids and house wives) have an empty string.

Verdict — I feel in many real projects the char-array works fine and is easier to code and maintain.

arrayName^ptrVar – headfirstC

How important? So low-level! I feel precise grasp gives us confidence. There are quite a lot of nuances due to this fundamental difference.

void f(){
 int arr[33] = …
 int* ptr = …;

[[headfirst C]] P79 clearly explains that compiler will allocate stack memory for a variable ptr but not for arr. Basically, the “arr” variable doesn’t exist in the final executable.

(Assuming a 32bit) If you look at the size of the stack frame, you find 4 bytes for “ptr” and 33*2 bytes for “arr”. No allocation for the “arr” variable! Compiler allocates 4 bytes for the ptr VARIABLE “ptr” partly because we can later reseat this pointer. In contrast, the “arr” name is a not reseatable and a permanent alias for the address of array object. Every occurence of “arr” is replaced (by compiler) with the actual address of the array object.

arr = anotherArray;// won’t compile.
On P59, the book points out another key difference between array Name vs ptr Variable

Taking the address of “ptr” gives you the address of the 4-byte ptr VARIABLE. So suppose ptr points to address 0x0197. &ptr may be 0xb762.  In contrast,

&arr == arr // Both expressions evaluate to an address (like 0xb755) of the array OBJECT. There’s no separate allocation or separate address for the “arr” variable (non-existent). The “arr” is just a name in source code.

equivalence of array^pointer@@ # my take

Update — the spreadsheet is a newer post.
First let’s clarify what we mean by “pointer”.

Q: an array name is comparable to a pointer ….? Nickname? PointerObject? or PureAddress?
%%A: an array name like myArray is nickname for a pure address. See post on 3 meanings of “pointer”. However, in this post we are comparing myArray with a myPointer, where myPointer is … either a PointerObject or PureAddress
%%A: myArray is like a const ptr “int * const myConstPtr”

The apparent “equivalence” of array vs pointer is a constant source of confusion. Best (one-paragraph) pocket-sized summary — http://c-faq.com/aryptr/practdiff.html

http://c-faq.com/aryptr/aryptrequiv.html is a longer answer.

#1 (principle dealing with the confusion) physical layout. When in doubt always go back to fundamentals.

A simple array on the stack is a permanent name plate [2] on a permanent block of memory, physically dissimilar to a pointer Variable. However, Pointer Arithmetic and Array Indexing are equivalent. I feel AI is implemented using PA.

#2) syntax

Pointer variable is syntactically a more _flexible_ and more powerful construct than array. However, the strong syntactical restriction on array is a feature not a limitation. Look at strongly typed vs dynamic languages.

I feel most operations on an array is implemented using pointer operations. (Not conversely — Many pointer operations are syntactically unavailable on arrays). Specifically, Passing an array into a function
is implemented by pointer bitwise copy.

That’s the situation with arrays on the Stack. On the heap situation is more confusing. Consider “new int[5]” — return value Must be saved in a pointer variable, but the underlying is a nameless array. In contrast, see [2] above. Also, that pointer variable could rebind/reseat.

y differentiate Read vs Write to encapsulated/internal fields

Say my class C has an array field, and provides operator[]. MoreEffC++ shows some techniques to handle read access vs write access differently. As a result, when you call cout << myObj[2] you invoke the readonly accessor,  but you hit another accessor when you assign myObj[2] += 9999

Q1: Now, why do we ever Need to differentiate read vs write?
Q2: Why can’t I return a const reference under read, and a non-const reference under write?
A2: That’s a simple and useful design. To implement it you need to differentiate read vs write, and it’s not easy.

A1: More importantly, in some cases such as copy-on-write, we need to do something special when writing.
A1: suppose I want to provide read/write lock
A1: suppose i want to count how many times I’m accessed in RO vs RW mode

The standard mechanism to differentiate — overload operator= (and operator+= etc).

The simpler mechanism to differentiate — java style getter/setter, instead of op overloading. I feel the entire challenge in the Scott Meyers chapter is c++ specific and motivated by read/write Operator. Since java doesn’t support op overloading, the techniques don’t apply. Uncommon in C# even though c# supports op overloading.

c-str cheatsheet – remarks

cString manipulation involves too many more details than I anticipated. This is a first cheatsheet. I will put new tricks into separate posts for easy management. This cheatsheet is Sufficient for Most coding tests. Remember testers are more interested in your algo and data structure, not string operations.

Only a minority of questions require editing an existing string.

– Manufacture a std::string from a given c-string?http://stackoverflow.com/questions/5328544/returning-char-array-as-stdstring

?populate an array of strings, and save them to a vector
? convert int to a c-string? http://faq.cprogramming.com/cgi-bin/smartfaq.cgi?id=1043284385&answer=1043808026
– duplicate a string on stack
– char* strstr()
– int indexOf()
– compare
– clearContent
– concat/append
– erase_by2pos
– substr_by2pos
– replace_by2pos
– replace_a_substr
– insert_by_pos
? remove white spaces
?rtrim trailing comments

c-str cheatsheet – code

printf("%.*s", len, markets) // print c-str without \0
#include <iostream>
#include <string.h>
#include <algorithm>
#include <iterator>
#include <vector>
using namespace std;
void clone(char* orig) {
    char dup[strlen(orig) + 1];//VLA
    strcpy(dup, orig);
    cout << "-- duplicate a string on stack\n" << dup << endl;
    dup[0] = 0;
    cout << "-- now clear its content. New strlen() == \n" << strlen(dup)
            << endl;

int main() {
    char a[] = "Four score and seven years ago our fathers brought forth //comment";
    char * const pos5 = strstr(a, "//");
    cout << "-- position of sub-string == " << pos5 << endl;
    *pos5 = '\0';
    cout << "-- rtrim trailing comments -->" << a << "<---\n";
    char const b[] = "(extra)";
    char * news = new char[strlen(a) + strlen(b) + 1];
    strcpy(news, a);
    strcat(news, b);
    cout << "-- append and return a heapy\n" << news << endl;
    delete[] news;
    cout << "-- after clearing, strlen ==" << strlen(news) << endl;
    char * pattern = "and";
    int const indexOf = strstr(a, pattern) - a;
    cout << "-- indexOf == " << indexOf << endl;
    int const pos = 4;
    news = new char[strlen(a) + strlen(b) + 1];
    strncpy(news, a, pos); //the left half
    strcpy(news + pos, b); //insert
    strcpy(news + strlen(news), a + pos); //the left half
    cout << "-- insert_by_index at position " << pos << endl << news << endl;
    delete[] news;
    news = new char[strlen(a) + strlen(b) + 1];
    int pos1 = indexOf;
    int pos2 = indexOf + strlen(pattern); //1 past the region
    strncpy(news, a, pos1); //the left half
    strcpy(news + pos1, b);
    strcpy(news + pos1 + strlen(b), a + pos2);//the right half
    cout << "-- replace_by2pos or by substring\n" << news << endl;
    delete[] news;
    news = new char[strlen(a) + 1];
    strcpy(news, a);
    pos1 = 1;
    pos2 = 6; //1 past the region
    *(news + pos2) = 0;
    cout << "-- substr_by_pos\n" << news + pos1 << endl;
    strcpy(news, a);
    pattern = "ou";
    cout << "-- progressive matching (counting, lastIndexOf..)\n" << endl;
    while (1) {
        news = strstr(news, pattern);
        if (!news)
        cout << news << endl;
        news += strlen(pattern);
    delete[] news;
    const char* strArray[] = {"aa", "bb"};

std::copy-print array of pointers

Suppose you already have a friend operator<<(ostream&, YourClass &) for YourClass, but you need to print out an array of pointers —

YourClass* array[99]

Here’s a first attempt —

copy(array, array+99, ostream_iterator(cout,” “); // prints the addresses

Simple solution —

Simply overload operator<<(ostream&, YourClass* const) using the existing operator<<

array-name can’t be LHS #almost

An array name myArr (including c-str) looks like a variable but no no no. It is the name plate on a room door.

  • It represents an allocated room of fixed size
  • It is permanently tied to a fixed address
  • You can’t put this name plate on another door later on.
  • (obscure) You can’t put a 2nd c-array name plate on the same room door like q[ cstr2=existingCstr ], either as initialization or assignment on cstr2.
    • you can use q[ char * ptr2=existingCstr ], though ptr2 is not a c-str not an array.

An array-name in source code is like a const ptr (not ptr-to-const) with a value like 0xFF82, i.e. an alias for Address 0xFF82. Such an Address is not an Lvalue. Therefore array-name isn’t an Lvalue. Array-name can’t be LHS —

int aa[3];
int b[3];
aa=b; // error

That’s trivial, but look at this —

f(aa); // f was declared f(int pa[])

When you pass the array-name “aa” to a function, the array-name is effectively on RHS (not breaking the LHS rule). The LHS is the function parameter-variable “pa”, which is always treated as pointer Variable even though you declare the function parameter-variable “pa” as an array-name! Compiler converts it to f(int * pa).

In summary,
* Rule 1 — array name can’t be LHS of an assignment
* Rule 2 — function param-var is effectively the LHS of an assignment. You can declare a function param-var as array type, which seems to violates the 2 rules above, but
* Rule 2b — array param-var is converted to pointer param-var

Now let’s look at c-string, an important special case , since the “=” is initialization, not assignment.

char const * s = “literal”; // compiler turns a blind eye if you remove the “const” but DON’T
char s8[] = “literal”;

Note the above creates an char-array of length 7+1=8 including the null char, but the array below has length 7 only:

char s7[] = {‘l’, ‘i’, ‘t’, ‘e’, ‘r’, ‘a’, ‘l’};

// To modify such a literal string,
char s[] = “literal”;

container class-template having array field@@ #swap

Background — pretend to be an author of a new container template.

Most containers internally uses some raw array. Therefore the container class often (or usually?) needs a non-static field for that array. It’s tempting to simply declare an array field like

   T onsiteArray[size]; // T is the template parameter

This container is not growable. But today let’s focus on another issue — I feel this makes swap() hard to implement. Swap() achieve efficient move semantic through pointer-Variable swapping. onsiteArray is an Array-Name, not a Lvalue variable, so you can’t manipulate it as a pointer-Variable.

Note “pointer” has immensely confusing meanings (http://bigblog.tanbin.com/2012/03/3-meanings-of-pointer-tip-on-delete.html). onsiteArray is more like (alias of) pure address, but swap() requires pointer Variables.

To support swap(), I often use

      T* real_array; // to be dynamically allocated outside the class instance, and deleted in dtor

&myVec[0] is like a pointer into the underlying array

A STL container typically defines a nested typedef named “reference“, so within this container class template, “reference” can be used as a data type just like “int” or the dummy type T. Typically “reference” is a typedef for “T&”.

myVec[0] is a “function[1] call” returning such a reference. Therefore, &(myVec[0]) is a real address, nothing more nothing less — I won’t call it a pointer variable or pointer object — NOT_A_pointer_AT_ALL. This expression “&myVec[0]” evaluates to an address.

Incidentally, the vector class template also has a typedef named “pointer“. I think it’s added in c++03.

[1]if you regard overloaded operator as a special funciton

duplicate a c-string on stack

            char tmp[strlen(src)+1]; // I guess strlen(src) must be known at compile time

On stack, the array capacity must be determined before runtime.

If you want to declare an array whose capacity is determined at run time, you must use heap. http://stackoverflow.com/questions/252782/strdup-what-does-it-do-in-c explains strdup()

c++ Push all zeros in array to the end #easy

Basic programming exercise

#include <cstdlib>
#include <iostream>
#include <iterator>
#include <assert.h>

using namespace std;
int a[] = {0,0,-1,2,0,4,0,0,8,0};
Push all the zero's of a given array to the end of the array. In place only. Ex 1,2,0,4,0,0,8 becomes 1,2,4,8,0,0,0
int main(int argc, char *argv[])
   size_t size = sizeof(a)/sizeof(a[0]);
   copy(a, a+size, ostream_iterator<int>(cout, " "));
   int left = 0;
   int right = size -1;
   while(left < right){ if (0==a[left]){ while (0==a[right]) --right; if (left >= right) break;
          swap(a[left], a[right]);
   cout<<"\n new: \n";
   copy(a, a+size, ostream_iterator<int>(cout, " "));

island rainfall problem – C array/pointer algo

#include <cstdlib>
#include <cstdio>
#include <iostream>
#include <iterator>
#include <algorithm>
#include <assert.h>
using namespace std;
int const island[] = { 54, 50, 54, 54, 52, 55, 51, 59, 50, 56, 52, 50 };
///////////////   Pos # 0   1   2   3   4   5   6   7   8   9  10  11
int const size = sizeof(island) / sizeof(int);
int accu = 0;
template<class ForwardIterator>
ForwardIterator max_element_last(ForwardIterator first, ForwardIterator last) {
    ForwardIterator ret = first;
    if (first == last)
        return last;//empty range
    while (++first != last)
        if (*ret <= *first)
            ret = first;
    return ret;
void print1(int const* const a, char const * const label) {
    printf("%s=%d/%d ", label, *a, a - island);
void printAll(int const* const L, int const* const l, int const* const h,
        int const* const H) {
    if (l < h) {
        print1(L, "wallL");
        print1(l, "ptr");
        printf("  ");
        print1(h, "ptr");
        print1(H, "wallH");
    } else {
        print1(H, "wallH");
        print1(h, "ptr");
        printf("  ");
        print1(l, "ptr");
        print1(L, "wallL");
    printf("%d=accumulated\n", accu);
void onePassAlgo(){
    int*wallLo, *wallHi;
    int*l, *h; //moving pointers
    wallLo = l = const_cast<int*> (island);
    wallHi = h = const_cast<int*> (island) + size - 1;
    if (*l > *h) {
        std::swap(l, h);
        std::swap(wallLo, wallHi);
    printf("All pointers initialized\n");
    while (l != h) {
        if (*l > *wallHi) {
            wallLo = wallHi;
            wallHi = l;
            std::swap(l, h);
            //printf("new wallHi:");
        } else if (*l >= *wallLo) {
            wallLo = l;
            //printf("new wallLo:");
        } else {
            accu += *wallLo - *l;
            printf("adding %d liter of water at Pos#%d (T=%d)\n", *wallLo - *l,
                    l - island, accu);
        //now move the scanner
        if (l < h)
void twoPassAlgo() {
    int const* const peak = max_element_last(island, island + size);
    printf("highest peak (last if multiple) is %d, at Pos %d\n", *peak, peak
            - island);
    //(island, island + size, ostream_iterator<int> (cout, " "));
    //forward scan towards peak
    int* pos = const_cast<int*> (island); //left edge of island
    int* wall = pos;
    for (++pos; pos < peak; ++pos) { if (*wall > *pos) {
            accu += *wall - *pos; // accumulate water
            printf("adding %d liter of water at Pos#%d (T=%d)\n", *wall - *pos,
                    pos - island, accu);
        //ALL new walls must match or exceed previous wall.
        printf("found new wall of %d^ at Pos#%d\n", *pos, pos - island);
        wall = pos;
    cout << "^^^ end of fwd scan ; beginning backward scan vvv\n";
    //backward scan
    pos = const_cast<int*> (island) + size - 1;
    wall = pos;
    for (--pos; pos > peak; --pos) {
        if (*wall > *pos) {
            accu += *wall - *pos; // accumulate water
            printf("adding %d liter of water at Pos#%d (T=%d)\n", *wall - *pos,
                    pos - island, accu);
        //Note all new walls must match or exceed previous wall.
        printf("found new wall of %d^ at Pos#%d\n", *pos, pos - island);
        wall = pos;
int main(int argc, char *argv[]) {
 Requirement -- an island is completely covered with columns of bricks. If
 between Column
 A(height 9) and Column B(10) all columns are lower, then we get a basin to
 collect rainfall. Watermark level will be 9.  We can calculate the
 amount of water. If I give you all the columns, give me total rainfall collected.
 Code showcasing
 - stl algo over raw array
 - array/pointer manipulation
 - array initialization
 - array size detection
 - std::max_element modified
 - std::swap


My blog http://bigblog.tanbin.com/2009/04/4-basic-foundations-of-all-collection.html claims all java/c#/STL structures are based on 4 basic data structures

– array
– linked graph

Official STL documentation uses the term “sequence containers”. Now, Sequence-container can be non-array-backed. Linked list is one example.

The deque is not completely array-based. See ObjectSpace manual. A more detailed description of the HP implementation of STL deque is on P139 [[stl tutorial]]. It consists of multiple mini-arrays (segments) linked to a lookup construct.

Therefore deque combines array and linked graph.

See also http://bigblog.tanbin.com/2011/09/deque-advantage-over-vector.html.

Deque is one of the few random-access containers.

char const * — string OR ptr-to-single-char@@ #my take

see also my blog – http://bigblog.tanbin.com/2011/09/qq-char-reflex.html
see http://www.codeguru.com/forum/archive/index.php/t-227977.html and http://cboard.cprogramming.com/c-programming/106063-difference-between-const-char-*-char-const-*.html

First thing first — q(const char*) is better written as q(char const *) which reads left-to-right “non-const pointer to const char”

Q: If this is a common C-string, is the entire string const? Some say The const is on the first char only.
%%A:You can advance this (non-const) pointer through an array of chars, but you can’t edit _any_ of the chars using this (const!) pointer – like changing ‘X’ to ‘Y’.
Q: Do we need to explicitly remove constness of a q(char const *)?
A: Yes. Constness radiates left. Use const_cast to remove const
A: I saw this in the Macqurie coding test.

char const * ptr
– often used to represent a c-string, but technically a pointer to a single char. The real thing in memory could be a char-array or could be a single char. You can’t tell. If you don’t know the length of the array, then you don’t know where the array ends, even if there’s a continuous stream of 9999999999 chars.
– This pointer can’t be used to state-edit the char, but the char is possibly state-editable by other pointers.

Q: how long is this c-string?
A: you can’t tell. The original thingy
– could be a single char + null
– could be a single char without a null
– could be nothing but the null char
– could be a regular null-terminating c-string
– could be a bunch or chars without a null. This ptr may advance forever without a null character.
– or this pointer may point to a heap location already deallocated – dangling pointer
– or this pointer may be uninitialized

q[ char * ] reflexes in C++ developers

A java guy develops reflex on wait() or HashMap; a c++ guy has the same for ( char * ), a C heritage that’s pervasively inherited and embedded in C++.

– reflex — if this is a key field of a class, then very likely there’s implicit conversions to/from a “string like this”
– reflex — usually a string. Specifically, if this appears as func param or return type, then almost always means a string

– if on heap, array-new. Note array-new or regular new expressions share identical return type i.e. qq( char * ). Therefore whoever calling new is the only buy knowing to call array-delete or regular delete. No one else knows how to delete that pointer.

– if on heap, someone must delete it, or we get memory leak. Best way is RAII dtor

– if you see any variable declared as qq( someType * ), then you know this is one side of a coin. Something is wrong if the other side is not specified — the array size. However, for qq (char *), array size is optional, given the null terminator.

– It’s possible for the array not to have a null terminator

– It’s possible for the pointee to be on heap or stack

– better use all the C string functions. They operate on qq( char * ), be it stack or heap, null-terminated or not

– reflex — if you see 2 qq( char* ), they might be 2 overlapping char-arrays !
– reflex — if we have qq (char * var1; float var2)   then the object addresses are qq( var1;  &var2). Note the missing & in front of our string variable!

vector insertion — implicit copy-ctor frequently

When you insert your object into a vector, the original object (B) is passed by value. Copy-ctor called exactly once for each insert.

    vector v(3); // default ctor — 3 calls.
// update — now I see one default ctor call, and 3 copy ctor calls.

    cout<<v.capacity()<<endl; // capacity = 3. Next insert would reallocate to 6
    v.push_back(B()); // temp B instance created, copied in, then destroyed. The reallocation calls copy ctor 3 times on the existing elements. 3 original objects destroyed
    cout<<v.capacity()<<endl; // Capacity is now 6
    v.push_back(B()); // temp B instance created, copied in, then destroyed. No reallocation
    cout<<v.capacity()<<endl; // Capacity remains 6

Here are some useful instrumentation code —

struct B {
    static int count;
    int id;
    B() {
        cout << count << " B()\n";
        id = count;
    B(B const & rhs) :
        id(rhs.id + 10) {
        cout <<id << " copied\n";
        cout<<id<<" ~B\n";
int B::count = 0;
ostream & operator<<(ostream & os, B const & b){

C-str ^ std::string (+boost) – industry adoption

I believe the boost features aren’t needed for coding test. C++ coding IV doesn’t care std::string or cStr. Perhaps std::string is easier.

C++ threading, c++ trading, c++ pricing, c++ connectivity… all use C string functions like a workhorse. STL string class is still largely displaced by C string.

I think many teams in need of custom optimization create their homemade string classes.

Boost offers a bunch of standalone functions as string utilities + the tokenizer class. See http://www.boost.org/doc/libs/1_36_0/doc/html/string_algo.html, also covered in my boost book.

Cell * head[HOW_MANY_BUCKETS];

How do we master this extremely common c++ declaration?

Cell * head [HOW_MANY_BUCKETS]; // HOW_MANY_BUCKETS can be constant like 99, or a variable in the new standard — VLA

1) head is the name of the variable.
2) qq(Cell *) means “pointer to Cell”

If you use typedef you can rewrite it as

ptr2Cell head [HOW_MANY_BUCKETS];

This reads “head is an array (size 99) of pointers”. In fact, it turns out this is the array of buckets in a hash table. There are 99 linked lists, so we have 99 list heads.

The above declaration is slightly different from:

Cell **head; // no need to specify array size.

cStr cheatsheet – duplicate a c-str on stack (no strdup)

http://gcc.gnu.org/onlinedocs/gcc/Variable-Length.html sample code shows that C99 supports variable-length-array, or VLA.

Now it’s easy to duplicate a c-string on stack as a local variable.

void dup(char* orig) {
char dup[strlen(orig) + 1];
strcpy(dup, orig);
cout << “– duplicate a string on stack\n” << dup << endl;
int main() {
char a[] = “Four score and seven years ago”;

cStr bread-and-butter functions

strcmp / strncmp
strcpy / strncpy
strcat / strncat


strdup — see below


memcpy / memmove (copy with buffer)

Q: why bother with these low level stuff in financial systems, when everyone is moving to C++, java and C# ?
A: low-latency — requires low-level full control
A: these are cross-platform std library functions, in man page section 3
A: lots of production code is in C, not c++
A: I feel C string functions aren’t inferior to std::string or STL vector
A: practically, if you already have a character array, it’s easiest/fastest to manipulate it this way
A: many shops use home-grown standard-conforming string implementations with vendor extensions. I vaguely remember a Barcap colleague said they use a home-grown string class custom-optimized
%%A: it takes quite a bit of vi-style “mileage” to master string manipulation. If you “invest” in std::string you may find it less widely used in some domains like trading.

Q: why use std::string?
A: one advantage is reference counting.
%%A: a wealth of convenience string utilities

[[more effC++]] item on c vs c++ points out that strdup allocates memory (heap/stack) but different environments may internally use malloc() or new(), so I’d say use strdup() only if you don’t need portability.

cStr split – short and sharp eg

See also the concise pdf on c++ streams.

Similar to std::string getline() with delimiter, C-string offers strtok, but the delimiter must be single-character, though you can put together a family of delimiters.

Note first arg must be non-const char*. You may need to first make a copy off a char const*. This string is modified, according to documentation.

using namespace std;

int main() {
    char months[] =
    cout << months << endl;

    char const * const possibleDelimiterS = “^”; //single chars only
    for (char* token = strtok(months, possibleDelimiterS);
            token = strtok(0, possibleDelimiterS)) { //0 means “reuse”
        cout << token << "\n";

cStr checkHasText()

bool checkHasText(char const* c){
    unsigned int const len = strlen(c);
    unsigned int i = 0;
    for (; i <= len – 1; ++i) {
        if (!isspace(c[i]))
    if (i == len) {
        cout << "all space\n";
        return false;
    } else {
        cout << "has text\n";
        return true;

Alternatively, here’s a pointer-arithmetic version, more direct, less readable

bool checkHasText(char const* c){
    for (; *c; ++c)
        if (!isspace(*c))
            break; //better than return true — can check *c
    return *c; //implicitly cast to bool.
//    if (*c) {
//        cout << "has text\n";
//    } else {
//        cout << "all space\n";
//    }

initialize array-of-pointer to all nulls

Suggestion 1: LimitOrder* orders[howManyOrders] = {};

Suggestion 0: P37 [[c++ coding standards]] suggests “.. = {NULL}“, consistent with P115 [[c++primer]].

See also http://bigblog.tanbin.com/2012/07/default-initializevalue-initialize-new.html

See also http://www.informit.com/articles/article.aspx?p=1852519 on c++11 array-init:

For an array (on stack or heap), an empty pair of braces indicates default initialization. Default initialization of POD types usually means initialization to binary zeros, whereas for non-POD types default initialization means default construction

sizeof : array on stack^heap

Here are 3 int pointers — almost indistinguishable at runtime, but compile-time….?

int main(){
cout<<sizeof (char)<<endl; // always 1 by definition
cout<<sizeof (void*)<<endl; // shows 8 on my 64-bit machine, the size of any address
cout<<sizeof (int)<<endl; // shows 4 on my 64-bit machine

int * ipNew = new int;
cout<<sizeof ipNew // 8, size of an address

int * iaNew = new int[5];
cout<<sizeof(iaNew) // 8 too, since array-new returns just an address

int ia[] = {1,3,9};
cout<<sizeof(ia) // 12, size of an Entire array variable on stack

At run time, ia probably is just a simple pointer (to stack) and doesn’t remember the array length. However, when sizeof is evaluated at compile time, compiler knows the array length!

C++ array declaration and object construction

Pig * ptr = new Pig[5]; // pointer to an array of simple Pig objects
Q: Does this create 5 Pig objects? YES tested. P238[[ 24 ]]
Q: using no-arg constructor? yes tested.
Q: contiguous memory? yes

Note the return type of this or the singular-new is identical — ptr to Pig. No one knows to use array-delete except the guy initializing the ptr.

Pig arr[5];
Q: Does this create 5 empty Pig objects? … in contiguous memory? YES … on stack? ALWAYS!
Q: no-arg constructor needed? Yes.
Q: How many times does constructor get called if we use a debugger? 5 times

#define SIZE 3
using namespace std;
#define S(x) cout<<#x<<":\t"<<x<<"\n";
class CAT {
CAT() {
cout < no-arg constructor with addr = ” << this << "\n";
itsAge = 0;
itsWeight = 0;
} // default constructor
~CAT(); // destructor
int GetAge() const {
return itsAge;
int GetWeight() const {
return itsWeight;
void SetAge(int age) {
itsAge = age;
CAT operator=(const CAT&);
CAT::CAT(const CAT&);
int itsAge, itsWeight;
CAT::CAT(const CAT& rhs) {
cout < copy constructor ” << this << " <- " << &rhs << "\n";
this->itsAge = rhs.itsAge;
this->itsWeight = 8888888;
cout << "<– copy constructor\n";
// if you return CAT& then you won’t call copy constructor upon return
CAT CAT::operator=(const CAT& rhs) {
cout < assigning ” << this << " <- " << &rhs << "\n";
if (this == &rhs)
return *this;
this->itsAge = rhs.itsAge;
this->itsWeight = 11111111;
cout << "<– assignment " << this << " <- " << &rhs << "\n";
return *this;
CAT::~CAT() {
cout << this << " Destructor called!\n";
this->itsAge = this->itsWeight = 999999999;
int main() {
CAT * Family = new CAT[SIZE];
cout << "<– array declaration\n";
int i;
CAT * pCat;
for (i = 0; i < SIZE; i++) {
pCat = new CAT; // same memory location would be reused after reclaim!
pCat->SetAge(2 * i + 1);
CAT & alias2family = Family[i];
CAT * tmp = &(Family[i] = *pCat);
S(tmp);// same addr across iterations, due to the assignment pbclone
delete pCat;
for (i = 0; i < SIZE; i++)
std::cout << "Cat #" << i + 1 << ": " << Family[i].GetAge()
<< std::endl;
delete[] Family;
return 0;

deque vs vector, briefly

[[Essential C++]] points out that deque offers efficient insert/delete at the first node. For a vector this would be mighty inefficient since all existing nodes must shift. This is the real advantage of deque over vector — O(1) insert/delete at the position 0. If you don’t need it, then vector is (simpler therefore) faster.

Q: is a deque expandable?
AA: expandable at both ends

Q: Why is deque (contiguous!) good at this?
AA: Wiki says STL uses multiple arrays — Storing contents in multiple smaller arrays, allocating additional arrays at the beginning or end as needed. Indexing is implemented by keeping a dynamic array containing pointers to each of the smaller arrays. I call this arrayOfVariableSizeArrays. I guess random access can be quite fast.
%%A: a circular array? Reallocation when reaching capacity?

In terms of efficiency and implementation, I feel a deque treats both head and tail ends “fairly”, like color-blind.

http://faq.cprogramming.com/cgi-bin/smartfaq.cgi?id=1073086407&answer=1069897449 suggests deque should be used if the list of items has any chance of being very large, whether you need insertions and erasure at the beginning or not.

It also says a vector requires N consecutive blocks of memory to hold its items where N is the number of items and a block is the size of a single item [1]. This can be a problem if the vector needs 5 or 10 megabytes of memory, but the available memory is fragmented to the point where there are not 5 or 10 megabytes of consecutive memory. A deque does not have this problem, if there isn’t enough consecutive memory, the deque will use a series of smaller blocks.

A minor drawback of deque is the always-invalidate-all-iterators behaviour when you insert (not erase) anywhere.

[1] Each item is copied into the vector, unlike java vector’s pointer-copying.

c++ array variable doesn’t carry size information

Java arrays are objects and, unlike c, has extra storage to hold its size. C arrays have exactly { sizeof(element) x initialized elementCount} bytes, so about the only way to pass on the array size information is to pass another variable. Examples follow.

Q: in c++ standard, why must we pass in an arraySize param to main(int arraySize, char ** args)?

Answer is below, but let me clarify that array params are widely used in C and (consequently) c++ financial apps.

Q1: what’s the difference between f(sometype*) and f(sometype[] )?
A: no diff. Either way f() receives a pointer.
A: [[Absolute c++]] section on array parameter explains nicely that an array variable has 3 parts
– starting address
– type
– size

….but the size is not passed to f(). f() could increment the pointer forever without knowing it trespasses on the real estate of another object or de-allocated wasteland in the free list.

Rule on array parameter — Always add a size param to an array param.

To pass a std::vector into a f(MyType* data, int size), do it this way

    f(&vec[0], vec.size(); //P74 [[effSTL]]

Q2: But why is there never array size in C string functions?
A: the exception to the Rule. c-string is null-terminated, so pointer won’t trespass.

STL vector, according to Stanford lecture

In c++, technically vector can (though occasionally not recommended) replace arrays in all cases except…? That’s actually a question (Q1)

* if a func param is an array i.e. a pointer to the element type. [[effective stl]] Item 16 comes to the rescue.
* (not for vector) if you need thread safety but your string implementation uses ref counting

The Stanford lecturer Julie Zelenski suggests the answer to Q1 is “nothing”. She did point out

+ vector offers bound checking.
– Every insert and remove must copy/relocate elements “on the right”, to keep memory contiguous.

[[effective stl]] also covers this topic.

However, we still must master C arrays, as they are widely used in production c++ code, not to mention C.

is actual argument an array or singular pointer@@

Quiz: pretend to be a typical c/c++ function at run time. If one of the arguments to the function is declared an int* pointer, how do you know if real thing is an array or a single pointer?

A: if this is the only argument, then you can’t tell. There has to be another argument like “int size”
A: however, if the argument is char*, then there may not be a size argument. Assume C-string please.

As a function parameter, a simple pointer is indistinguishable from an array.

importance of arrays in c++

Q: arrays are fixed blocks of memory (can’t grow beyond their initial allocated “lands” into neighbors). why are they so popular and relevant in c++?
A: The Defensive Coding chapter in Microsoft author’s [[Solid Code]] advocates arrays (over vectors) on the ground of simplicity.

A: a lot of c++ apps are owned by C developer teams. A lot of arrays in use. Consistency is esp. important when you are not in green field.

A: most enterprise C apps, in any language, use lots of strings. The most common string in c/c++ is still a char array.
A: c++ is low level. Arrays are a fundamental low-level structure. For any collection, perhaps the only alternative is linked data structure.

A: as stated in the blog [[4 implementations]], array is the #1 most fundamental implementation of all collections and containers. Salt and vinegar don’t taste good, but you can’t cook without them.

A: pointers are fundamental to and powerful in every language. array is the most natural user of pointers. Much of c++ “most powerful language” reputation rests on pointer/array integration.

A: array is to vector what wait/notify is to jdk5 concurrency constructs

C array name vs variables, again

(This applies to C/C++, but how about c# and java? I guess this is an obscure grammar rule that many people just follow without investigating.)

An array name looks like a ptr variable but no no no. Analogue — an array object is a (permanently) allocated “room” in memory; an array name is a (permanent and dominant) name plate on the room door, which a variable never is.

Rule1: Suppose an array name is “myArray”, you can’t put this “myArray” name plate on another room’s door later on.

Rule2: You can’t put another name plate (“myArray2”) on the same door.

This array name is a permanent and exclusive “owner” of this particular room. Conversely, the room permanently belongs to this array name. An “aliasing” pointer Variable can be created for this same address, but compiler won’t allow another array-name as another name plate on this room (Rule2)

Before Variable-Length-Array, array size must be determined at compile time, therefore for a function f(int pa[]), the array-name pa is nonsense. Actually, the compiler translates that to f(int* pa). VLA didn’t change this rule.

See [[c++ primer]].

std::string field and other non-ptr fields in a c++ class

Background — I feel this is a fundamental but overlooked design consideration.

In java, any non-primitive field is a ptr. In c++, std::string is a common field type. No ptr needed. std::vector is similar… But What other non-ptr fields are common? Let’s exclude primitive types like double or “bool” (shorter spelling than java “boolean”)

I feel any class like Address can be used as the nonref type of a field myAddr in a new class Customer. Customer ctor needs to allocate this field, but is it allocated on stack or heap? It depends on the context.

– If you allocate the entire Customer object on heap, then myAddr field is also on heap
– ditto for stack. You can trace the steps of myAddr’s allocation and there’s no malloc().

Either way, upon Customer de-allocation, myAddr is automatically /bulldozed/ since myAddr is on the real estate of the Customer object. sizeof(Customer) includes sizeof(Address), which is not the case for pimpl.

Now we know nonref field like qq{ Address myAddr; } is the c++ default whereas qq{ Address * myAddrPtr; } is the pimpl/java/c# version.

raw arrays and raw ptr still abound in c++

I doubt developers really follow this

Advice #1: Treat raw pointers and arrays as obsolete; favor smart pointers, vectors and string objects 99% of the time.

Look at any production code.
Look at any library code.
Look at any func param and return types.

C++, like C, is a low-level language. Pointers and addresses are the basic tools for a low-level language. Arrays are low-level too. However, c++ is often used as a high-level application language, just like java and c#. In that case, Advice #1 might apply.

Even in java, arrays are heavily used, often more than ArrayList.

save a literal string in various simple data holders

— basic char pointer
char * ptr  = “and”;
char const * const_ptr = “and”;

— char array
char charArray[] = “and”;
char const  constCharArray[] = “and”;

— std::string
std::string std_string = “and”;

— std::string to vector of char.
std::vector<char> vc (std_string.begin(), std_string.end()); //

(2/3 of C/C++ coding tests involve strings…)

2D-array-variable vs double-ptr-variable # c++

[1] Consider a strict 2D-array-variable declared as
int myArray[9][9]; // size? exactly 81 ints + 9 pointers. Are the 81 locations contiguous? Yes for a 2D array.

For a 1D array of pointers (probably same as jagged array), Not contiguous. The 1st (contiguous 9 ints) can live far from the 2nd (contiguous 9 ints). http://stackoverflow.com/questions/7784758/c-c-multidimensional-array-internals has the syntax

Any array is fix-sized (in theory). If you want to append physically, the physical memory block past the end of your array may not be available :-(.

So What solutions can you imagine?
P101 [[c pointer & mem mgmt]] shows diagrammatically a very common solution using a double pointer. Note a 2D-array variable won’t work here.

Once you allocate every nested array in the strict 2D structure, its memory usage is fixed, as it will look like [1] above?? I don’t think so. It’s more like the jagged 2D of C#. The declaration of myArray in [1] restricts the physical capacity of all the sub-arrays. In contrast, for a variable declared as a double-pointer, now I can point it at myArray, and later I can reseat it to an int[8][8].

I can also reseat my double-ptr at a partially initialized 1D array (size 5) of pointer objects, where the last pointer object is null. I can then point that last pointer to an int[99].

Now we realize a strict 2D-array-variable and a double-pointer-variable are quite different.