std::vector capacity reduction

Q: how would vector’s backing array reduce in size? In other words, how would the capacity every reduce?
%%A: The only hope is to shrink_to_fit() which is a request to compiler. Compiler may ignore it.

If capacity reduction does happen at runtime, then reallocation would probably happen.

I believe resize() assign() clear() etc will never reduce vector capacity.

Advertisements

STL map[k]=.. Always uses assignment@runtime

The focus is on the value2, of type Trade.

— someMap[key1] = value2; //uses default-ctor if needed. Then Unconditionally invokes Trade::operator=() for assignment! See P344 [[Josuttis]]

Without default-ctor or operator=, then Trade class probably won’t compile with this code.

— someMay.emplace(key1, value2); //uses some ctor, probably the copy-ctor, or the cv-ctor if value2 type is not Trade

Can C implement basic containers@@

Templates — I think template is unavailable in C, and I don’t know how C could emulate it, except using macros.

C doesn’t support exceptions but luckily very few exceptions in STL.

Most STL container classes have relative few member functions. I think C can implement free functions with “this” as an additional parameter.

Bulk of the basic operations on container are global functions in the form of STL algorithms. I think C can implement them.

Iterators are tricky. In STL I think they are usually nested classes. I think I would just give up. C supports no operator overloading so there’s no point providing an iterator.

std::priority_queue phrasebook

Mostly based on [[Josuttis]]

  • #include <queue> // this class template is functionally closer to a regular queue
  • container adapter — based on other containers
    • defaults to vector, but deque fine.
  • random access — base container must provide random access iterator (to support fast sorting)
  • push and pop — both O(logN). https://en.cppreference.com/w/cpp/algorithm/push_heap shows individual insert and delete_max are both logN in worst case
  • make_heap() — linear in worst case, guaranteed:)

STL containers +! pointer

  1. std::array
  2. std::string SSO i.e. small-string optimization

These containers

  1. heap storare is not required
    • trivial special case — if this container is a nonref field of a Shape object, then q(new Shape) would still allocate the container on heap.
    • This is similar to “Are java primitives on heap/stack?”
  2. Therefore contains no pointer.
  3. Therefore move ctor runs in linear [1] time compared to constant-time move-ctor in other STL containers
    • std::array probably offers a move-ctor and move-assignment

[1] P205 [[effModernC++]] also addresses

Q: What happens when you move a std::array container with 77 Acct elements?
A: 77 individual Acct objects are moved to the new container, in linear time, invoking the move-ctor of Acct class

map::emplace() calls std::pair ctor

The non-map emplace() usually forwards the args to the ctor of the element type

vector<Acct>::emplace(args) would call Acct(args)

In contrast, map emplace() calls std::pair ctor!

https://en.cppreference.com/w/cpp/container/map/emplace shows emplace() using 3 versions of std::pair<> class template constructors.

    std::map<std::string, std::string> m;
 
    // uses pair's move constructor
    m.emplace(std::make_pair(std::string("a"), std::string("a")));
 
    // uses pair's converting move constructor
    m.emplace(std::make_pair("b", "abcd"));
 
    // uses pair's template constructor
    m.emplace("d", "ddd");

The simplest is most “magical” and (i guess) may not work for more complex data types. I used this feature in https://github.com/tiger40490/repo1/blob/cpp1/cpp/algo_miscIVQ/concretize.cpp

given arbitrary value X,get nearest2nodes in std::map

Basically, find the left/right neighbor nodes.

! Don’t use upper_bound since lower_bound is enough.

  • If perfect match, then lower_bound return value is all you need. No need for 2 nodes:)
  • If no perfect match, then lower_bound() and prev(lower_bound)
  • if X too low, then begin() alone is all we can get
  • if X too high then prev(end()) alone is all we can get

See https://github.com/tiger40490/repo1/blob/cpp1/cpp1/miscIVQ/curveInterpolation_CVA.cpp

NavigableMap.java interface offers four methods .. see closestMatch in sorted-collection: j^python^c++

lower_bound() means touchUpperBound #4 scenarios

  • std::upper_bound() should be named strictUpperBound()
  • std::lower_bound() should be named touchUpperBound() since it can return an element touching the target value

If no perfect hit, then both returns the same node — lowest node above the target

  1. if target is too high, both return end(), which is a fake element
  2. if target is too low, both return begin(), which is a real element
  3. if target is matching one of the nodes, then perfect hit
  4. if target is between 2 nodes, then … this is the most common.

https://github.com/tiger40490/repo1/blob/cpp1/cpp/66miscIVQ/curveInterpolation_CVA.cpp caters to all four scenarios.

favor std::begin(arrayOrContainer)

https://stackoverflow.com/questions/7593086/why-use-non-member-begin-and-end-functions-in-c11 explains some important details.

Q: So how do we choose between

  • this free global function
  • the container member function cont::begin() / end()?

%%A: Basically, I would always use std::begin() instead of cont.begin() esp. in template-enable programs.

std::sort (!!quicksort) requires random-access

list::sort() works on list — a non-random-access container, using a stable version of quicksort. See https://stackoverflow.com/questions/1717773/which-sorting-algorithm-is-used-by-stls-listsort.

There are many online resources about quicksort without random access.

In contrast, java Collections.sort() does NOT require RandomAccess.. RandomAccess #ArrayList sorting

unordered_set implementation note

https://www.gamedev.net/forums/topic/582613-c-how-is-unordered_set-implemented/ says

“unordered_set is typically implemented as a linked list that holds all the elements and a hash table stores pointers to the linked list nodes.”, echoed on https://stackoverflow.com/questions/31112852/how-stdunordered-map-is-implemented

LinkedHashMap.java?

ranged-based for-loop over a map or vector

Hi Kam,

Thanks for your valuable tip. I found this 3-point summary on https://stackoverflow.com/questions/15176104/c11-range-based-loop-get-item-by-value-or-reference-to-const

1) Choose for(auto x : myVector) when you want to work with copies.
2) Choose for(auto &x : myVector) when you want to work with original items and may modify them.
3) Choose for(auto const &x : myVector) when you want to work with original items and will not modify them.

In the case of myMap, the first form for(auto myPair: myMap) still clones the pairs from myMap. To use a reference instead of a clone, we need the 2nd or 3rd forms.

efficient swap(): two containers-of-T

Background — template function std::swap(T&, T&) works for int, float etc, but the same implementation will not work efficiently for vector, list, map or set. Therefore I suspected there might be specializations of swap() template function.

As it turns out, vector (and the other containers) provides a swap() member function. So the implementation of vector swap is indeed different from std::swap().

RBTree range count #enum,auto

// demo range counting with lower_bound. I don't know any faster algorithm
// demo auto keyword
// demo enum
// demo upper_bound different from lower_bound!
#include <iostream>
#include <climits>
#include <set>
#include <assert.h>
using namespace std;

set<int> s;
//typedef set<int>::iterator It; // ---> auto is easier
enum Mode { NoMin, NoMax, BothGiven };

size_t countWithInclusiveLimits(int limit1, int limit2, Mode m = BothGiven){
  if (s.empty()) return 0;
  auto it  = s.begin();
  auto it2 = s.end(); //it2 is the node past the last wanted node.

  if (m != NoMin) it = s.lower_bound(limit1);
  if (m != NoMax){
    it2 = s.upper_bound(limit2);
    assert(*it2 != limit2 && "it2 initial value should be consistent with end()");
  }

  size_t ret = 0;
  for (; it != it2; ++it){
    cout<<*it<<" ";
    ++ret;
  }
  cout<<" --> "<<ret<<endl;
  return ret;
}
int main(){
  for(int i=-4; i<=9; ++i) s.insert(i*10);
  countWithInclusiveLimits(11, 55);
  countWithInclusiveLimits(0, 50, NoMin);
  countWithInclusiveLimits(10, 0, NoMax);
}

##G5 std::map tasks 4cod`IV

custom hash func? See short example on P 364 [[c++standard library]]. [[optimized c++]] has many examples too.

initialize? There’s a simple constructor taking a long initializer, but the insert() methods support the same and are more versatile.

insert? single pair; range (anotherMap.being(), end());

** insert single value — won’t overwrite pre-existing
** map1.emplace(…)
** map1[key2] = val3 // overwrites pre-existing
** insert list of values — http://en.cppreference.com/w/cpp/container/unordered_map/insert

(returning the value) lookup? at() better than operator[]

a pointer type as key? useful technique.

erase? by a specific key. No need to call another function to really erase the node.

Q: create only if absent; no update please
A: insert()

Q2: create or uppate
Q2b: look up or create
A: operator []

Q1: update only; no create please
Q1b: look up only. No create please
A: find() method

Q: check for existance
A: c.find() is slightly better than c.count() esp. for multi_* containers

 

asymmetry lower_bound^upper_bound #IFF lookup miss

For a “perfect” hit in both set::lower_bound() and std::lower_bound(), the return value is equivalent to the target; whereas upper_bound is strictly higher than target. See

To achieve symmetry, we need to decrement (if legal) the iterator returned from upper_bound.
———-
If no perfect hit, then lower_bound() and upper_bound() both give the next higher node, i.e. where you would insert the target value.

#include <iostream>
#include <algorithm>
#include <vector>
using namespace std;

vector<int> v{1,3,5};
int main(){
  vector<int>::iterator it;
  it = lower_bound(v.begin(), v.end(), 2); cout<<*it<<endl;
  it = upper_bound(v.begin(), v.end(), 2); cout<<*it<<endl;
}

lower_bound() may return end() #gotcha

lower_bound() may return end().

If your target value is too high and nothing qualifies, all 6 functions return the right end of the range. If you look at the (key value in) return value i.e. end-of-range,

  • This end-of-range node is a dummy. Never read its key value.
  • After lower_bound or upper_bound, always validate before reading the return value

I spent hours puzzled by the wrong data returned after lower_bound()->first. Specifically, if the map/set is keyed by integer, then the end()->first can be normal int value even when it fails and returns map.end()!

Consistent across 6 functions:

  • std::lower_bound
  • std::upper_bound
  • set/map methods

What if the target value is too low? Easier — upper bound should return left boundary iterator, and lower_bound returns the same iterator! See https://github.com/tiger40490/repo1/blob/cpp1/cpp1/miscIVQ/curveInterpolation_CVA.cpp

 

sorted vector^multiset for tick query

I see no advantage to using a tree.

Note the key (timestamps) is repeating, so only multiset is valid.

Space — vector is more compact leading to better cache affinity. There’s a price to pay — Vector should use reserve() to prevent reallocation, if possible. Tree doesn’t need it.

Query by lower_bound/upper_bound — I guess both are similar O(log N)

Initial insertions — Remember our data are ticks from an exchange loaded from a large file.

  1. 1. For one insert (assuming no re-balancing no reallocation), vector is O(1) but tree is O(log N) 😦
  2. 2. If data comes in randomly, then tree re-balancing is less frequent but more than once. For vector, it’s a single quicksort after all insertions. I would say vector is faster mostly due to the previous factor
  3. 3. if data comes in (almost) by ascending timestamp, then vector requires no or minimal sorting, but the tree would require frequent re-balancing — presorted data is probably the worst input sequence for one-by-one-insertion.
  4. 4. If no re-balancing is done, then the tree would have depth = N, and query would be linear search 😦

LRU cache #Part 2 c++#LinkedHashMap

Java offers LinkedHashMap-based solution. Below is a simple c++ version written by https://leetcode.com/problems/lru-cache/discuss/

    • Basic idea is nice — list as primary storage; and map value points to list nodes.  Therefore, By design, the lookup operation never inserts new nodes.
      • (Even if I can think of the same idea, I would be slow writing up this code in real time.)
    • This code showcases a handful of useful methods of list and map. For example,
    • list.splice() : Transfers the element pointed to by 3rd arg (itr), from 2nd arg (another list), into *this. The element is inserted before the element pointed to by 1st arg (itr)
    • To reproduce this code in real time, you need to memorize list.splice()! I don’t have the time.
    • The map value is an iterator — essential!
class LRUCache{
    size_t m_capacity;
    unordered_map<int,  list<pair<int, int>>::iterator> m_map; //m_map_iter->first: key, m_map_iter->second: list iterator;
    list<pair<int, int>> m_list;                               //m_list_iter->first: key, m_list_iter->second: value;
public:
    LRUCache(size_t capacity):m_capacity(capacity) {
    }
    int get(int key) {
        auto found_iter = m_map.find(key);
        if (found_iter == m_map.end()) //key doesn't exist
            return -1;
        m_list.splice(m_list.begin(), m_list, found_iter->second); //move the node corresponding to key to front
        return found_iter->second->second;                         //return value of the node
    }
    void set(int key, int value) {
        auto found_iter = m_map.find(key);
        if (found_iter != m_map.end()) //key exists
        {
            m_list.splice(m_list.begin(), m_list, found_iter->second); //move the node corresponding to key to front
            found_iter->second->second = value;                        //update value of the node
            return;
        }
        if (m_map.size() == m_capacity) //reached capacity
        {
           int key_to_del = m_list.back().first; 
           m_list.pop_back();            //remove node in list;
           m_map.erase(key_to_del);      //remove key in map
        }
        m_list.emplace_front(key, value);  //create new node in list
        m_map[key] = m_list.begin();       //create correspondence between key and node
    }
};

std::vector-of-containers: initializer list

Typical example: If you heavily use a vector of map, it’s tempting to use a vector of pointers to maps. The java way.

If you drop the “pointers to”, then when you retrieve the map from the vector, you often get a copy, unless you save the return value in a reference variable

By the way, here’s an initializer for std::map:

vec.push_back(map<int, int>{{32,1}} );

deque::erase is fairly efficient

Time complexity deleting one item from a deque:

Depending on the particular STL library implementation, there’s up to an additional linear time on the number of elements between position and one of the ends of the deque.

Deque maintains multiple linked segments. Each segment is continuous and designed to be small, so that the distance between any deletion position and end-of-encolosing-segment is small. For example,

. If a vector has size 100,000 and your deletion (or insertion) position is 5, then you must shift 99,995 elements.
. If your deque has segment size 32, and your deletion (or insertion) position is 5 in the enclosing segment, then you must shift the subsequent 27 elements

The choice of 32 as a segment size is my random guess. If segment size is too large, then deletion (and insertion) is slower. If too small, then there will be too many small segments — poor cache efficiency.

STL containers: pick correct SORT for each

Needed in coding IV

  • std::sort() is good for array, vector, deque…
  • std::list has no random access iterator so it must use list::sort() method
  • a single string? You have to construct a vector<char>
  • unordered containers don’t need sorting

map (and set) remain sorted so don’t need a sort operation. They can specify a special sort comparitor either as a template arg or a ctor arg. P316 and P334 [[c++ std lib]] compare the two choices. You can also also pass no comparitor arg but define operator< in the key class

[12]back-scan any container,print`every Other item #MS

Have I overspent my time on this once-asked question?

The umbrella question — write a utility function to iterate any container and print out every Other element backwards?

Good coding practice! I think this is all about iterator syntax knowledge (my weakness) not algorithm (my strength)!

Note this is really about knowledge not  coding abilities. QQ not ZZ.

Iterator declaration is a can of worm 😦 I might need to give up on this.

#include <iostream>
#include <vector>
#include 	<list>
#include <set>
using namespace std;

template<class _InIt>  void printAlternateItem2itr(_InIt _First, _InIt _Last){
	bool flag = true;
	// if the iterator is from rbegin, then ++ would reverse it!
	for (_InIt it = _First; it != _Last; ++it, flag=!flag) {
		if (flag) cout << *it << ' ';
	}
	cout << endl;
}
template <typename CONT> void printAlternateItemBackward(CONT const & cont) {
	printAlternateItem2itr(cont.rbegin(), cont.rend());
}
int main() {
	//vector<int> cont = { 11,2,3,4,5,6,7,18 };
	//list<int> cont = { 11,2,3,4,5,6,7,18 };
	string cont = "0123456789a";
	set<int> cont2 = { 11,2,33,44,55,66,77,88,99 };
	printAlternateItemBackward(cont);
	printAlternateItemBackward(cont2);
	int arr[] = { 11,2,3,4,5,6,7,18,9 };
	int size = sizeof(arr) / sizeof(arr[0]);
	printAlternateItem2itr(arr, arr + size); //forward only
}

—-
Q: is comparison defined on all iterators?
A: now I think linked list doesn’t. Now I think only random access itr does.

%%Q: what’s the signature of STL find()? I will use those declarations of iterators in my function. (Actually the map and set containers have member functions find() outperforming std::find)

%%Q: from a const container, can u get a non-const iterator?

Q: why don’t you take a container as input? Why must you take iterators?
%%A: it’s more common to take iterator, but in this case container will do. All containers provide rbegin() or begin() including string. Raw array doesn’t but the iterator increment won’t work for raw arrays anyway.


Separate question
Q: OO design — how would you represent Order state transition graph in an OMS?

STL trees: add presorted items one-by-one

The STL standard requires insert() and find() to be O(log N), so the tree has to remain fairly balanced at all times.

Q: When inserting sorted data one-by-one, what’s the frequency of re-balancing?

%%A: I didn’t find any definitive answer. I think it’s quite frequent. System has to assume “This could the last insertion and there will be many queries.” so re-balancing has to kick in whenever the tree becomes lopsided.

Incidentally, if you have presorted data in some temp container and you insert them /in one gulp/, then c++98 [1] standard requires the tree container to provide O(N) insertion which is better than O(N log N).

[1] not c++11. See http://www.cplusplus.com/reference/set/set/insert/

test if a key exists in multimap: count() perf tolerable

map::count() is simpler and slightly slower 🙂 than map::find()

Even for a multimap, count() is slower but good enough in a quick coding test. Just add a comment to say “will upgrade to find()”

In terms of cost, count() is only slightly slower than find(). Note multimap::count() complexity is Logarithmic in map size, plus linear in the number of matches… Probably because the matching entries live together in the RB tree.

std::map key is always field@payload@@

I encounter many situations like map<string, Trade>, where the string tradeRef is also a field of the Trade instance.

Q: is it inefficient to have each string value saved twice in the data structure?

  • A1: for the foreseeable future, std::string is reference counted, so not a big inefficiency
  • A2: map keys should be small. Big key values slow down everything
  • A3: my colleague Paul pointed out a limited technique using a set<Trade> that sorts itself by tradeRef.
    • limitation — no lookup by key. Sometimes, lookup is the main usage!
  • A4: See my solution in bbg coding IV: N most active stocks

use STL Map to emulate a Set#as JDK does

Q1: is it practical to use a STL Map to emulate the Set class-template, at comparable efficiency?

We know a Map stores PAIRs in a red-black tree. Can we just put an empty thing in the PAIR.second field?

Q2: how small can the “PAIR.second” footprint be? 0 byte? 1 byte? Java is less memory efficiency … where everything is heapy thingy..
%%A: i don’t think it can be 0 byte. sizeof(PAIR) is known when you specialize the PAIR template with T1 and T2. sizeof(PAIR) depends on sizeof(T1) + sizeof(T2). Compiler needs to know sizeof(PAIR) in order to allocate memory to a new PAIR object.

Q3: is there any type T2 such that sizeof(T2) == 0?
%%A: I don’t know any.

Q4: we know (confirmed) java HashSet is implemented using HashMap physically. Is that different from C++?
A: different

Q5: java TreeSet.java source code shows a physical implementation using a map. Is that different from STL?
A: different

replace STL with more efficient containers@@

Q: Does anyone ever need to replace STL with more efficient containers?

I have not heard of any.

I feel many attempts to “optimize” or “enhance” STL will resort to RTTI and inheritance. Note inheritance is hard without RTTI — See http://bigblog.tanbin.com/2012/03/virtual-keyword-adding-complexity.html. RTTI incurs several run time costs, usually negligible, but not in the most competitive high performance systems. STL inventors made the conscious decision to completely avoid RTTI. See [[STL tutorial]]. I think a major justifications is efficiency.

Most large c++ systems have their complexity in OO object hierarchy/graph. (Remember all 23 classic design patterns involve some kind of type hierarchy.) However, STL complexity lies in templates. I feel container/algorithm templates add compile time complexity but zero (?) run time complexity.

I feel iterators and functors also add overall system complexity but not run time complexity.

Allocator is an efficiency feature, so it improves efficiency not reduce efficiency.

[12] erase while reverse iteration: map,list #noIV

Sugg: reverse iterator is too complicated when you want to erase. Just use forward iterator!

Tip: reverseItr.base() is designed for insert-by-fwd-iterator. To emulate insertion at a position of a reverse_iterator named ri, insert() at the position ri.base() instead.

Now let’s be specific. When we say “insert at ri” we mean insert to the Right of ri, or insert “inFrontOf” ri in the iteration direction. That’s your goal, but not your code, because insert() can only use fwd iterators. So you use ri.base() to insert and it exactly inserts to the Right of ri. Therefore, For purposes of insertion, ri and ri.base() are equivalent, so to speak.

For the purpose of logical insertion, ri.base() is truly the fwd iterator corresponding to ri. See http://www.drdobbs.com/three-guidelines-for-effective-iterator/184401406

Warning: For purposes of erasure, ri and ri.base() are Not equivalent, and ri.base() is Not the fwd iterator corresponding to ri.

Tip: After erase, the r-iterator value is less intuitive to predict. (I believe it points to ri + 1.) Better  continue/break the for-loop, rather than executing to end of the current iteration. It’s not always straightforward to directly “dump” the reverse-iterator.

Tip: after erasing, print all Keys to be sure.

Tip: instrument dtor to see which node is erased

Tip #1: if you need to continue looping, then don’t use for-loop. Use a while loop to gain control over exit condition and over increment on the reverse iterator.

 

for (MapType::reverse_iterator ri = bidBook.rbegin(); ri != bidBook.rend();){

ListType & li = ri->second;
li.remove_if(mem_fun(&LimitOrder::notLive));
if (li.empty()) {
bidBook.erase((++ri).base());
//showKeys(bidBook);
continue;
} else {
bestBid = ri->first;
return;
}
++ri;
}

container class-template having array field@@ #swap

Background — pretend to be an author of a new container template.

Most containers internally uses some raw array. Therefore the container class often (or usually?) needs a non-static field for that array. It’s tempting to simply declare an array field like

   T onsiteArray[size]; // T is the template parameter

This container is not growable. But today let’s focus on another issue — I feel this makes swap() hard to implement. Swap() achieve efficient move semantic through pointer-Variable swapping. onsiteArray is an Array-Name, not a Lvalue variable, so you can’t manipulate it as a pointer-Variable.

Note “pointer” has immensely confusing meanings (http://bigblog.tanbin.com/2012/03/3-meanings-of-pointer-tip-on-delete.html). onsiteArray is more like (alias of) pure address, but swap() requires pointer Variables.

To support swap(), I often use

      T* real_array; // to be dynamically allocated outside the class instance, and deleted in dtor

probably no shallow copy by STL containers

As hinted in Item 3 of [[effSTL]] and [[c++primer]], element copy is performed during insert, sort, container clone… On each element, copy-ctor and op= are used.

Q: Excluding container of pointers, is there any “shallow copy”?
%%A: I don’t think so. Container doesn’t remember the original (incoming) object’s address. Once an element is in the container, container does know its address, and often returns its address. For example, vector has many member functions returning by reference.

Q: does vector subscript accessor return address rather than a copy?
AA: yes, pbref. &vecA[0] gives the address of 1st element. See [[effSTL]] item on getting a C-style array from a given vector.

Q: does deferenced iterator return an address or a copy?
%%A: not sure. remember iterators are smart pointers, so by definition they redefine operator*()

insert() – versatile technique to populate 1 container from another

Let’s start with the most popular container in STL…

For vector, this is the most versatile conversion-technique —
void insert ( iterator position, InputIterator first, InputIterator last ); // http://www.cplusplus.com/reference/stl/vector/insert/ shows

– you can copy another container/array/string in its entirety into a new vector. Essentially convert any container to a vector, or make a copy of a vector — but you need construct an empty container first.
– you can copy part of another container/array/string
– you can copy the data into position 5, shifting 6th element
(all copies are deep copies)

list::insert() is similar — http://www.cplusplus.com/reference/stl/list/insert/
multiset::insert() — http://www.cplusplus.com/reference/stl/multiset/insert/

————————————–
map::insert() is similarly versatile —
void insert ( InputIterator first, InputIterator last ); // http://www.cplusplus.com/reference/stl/map/insert/ shows
– you can copy another map in its entirety

Multimap has the same insert method — http://www.cplusplus.com/reference/stl/multimap/insert/

//However, for interviews the most convenient is similar to python and perl:

map<int, char> m = {{1, 'a'}, {3, 'b'}, {5, 'c'}, {7, 'd'}};

insert() — more versatile than assign() and range-ctor

[[effSTL]] P31 points out that range-iterators are used consistently across containers —

– Every (yes both sequence/associative) container supports a ctor taking a couple of range iterators
– All sequence (not associative) containers support assign() method taking a couple
– Every (yes both sequence/associative) container supports erase() method taking a couple

However, I’d argue the most versatile is the insert() method in Every Sequence and Associative containers.
* insert() can emulate the range-ctor
* insert() can emulate assign()

This is also more versatile than operator=().
This member function is also simpler than the free function copy().

sequence-container≠array-based

My blog http://bigblog.tanbin.com/2009/04/4-basic-foundations-of-all-collection.html claims all java/c#/STL structures are based on 4 basic data structures

– array
– linked graph
….

Official STL documentation uses the term “sequence containers”. Now, Sequence-container can be non-array-backed. Linked list is one example.

The deque is not completely array-based. See ObjectSpace manual. A more detailed description of the HP implementation of STL deque is on P139 [[stl tutorial]]. It consists of multiple mini-arrays (segments) linked to a lookup construct.

Therefore deque combines array and linked graph.

See also http://bigblog.tanbin.com/2011/09/deque-advantage-over-vector.html.

Deque is one of the few random-access containers.

rbegin() in STL container class templates

I once told someone that not every container supports rbegin(). Now I know all standard STL container class templates define rbegin(). STL string class also supports rbegin(). By the way, If you specialize vector (or any container) class-template with your own Account class, you end up with a specialized class-template. You must re-implement rbegin() and all public methods. That’s part of the “contract”.

Now, WHO don’t support rbegin()?
– The simple array doesn’t support rbegin() or any method for that matter.
– I think ostream and istream don’t provide rbegin().
– How about hashed containers? They don’t have a rbegin()

What if someday someone creates a useful container that doesn’t support rbegin()? If your algorithm insists on a container defining rbegin(), then this algo can’t be used for that new container.

In conclusion, it’s probably more useful in practice to write utility functions accepting iterator arguments, not container arguments. Follow STL algorithm conventions.

##standard methods (incl. operators) in all STL container templates — CCC

Every STL container exposes these 3 groups of non-static methods — Copy, Compare and Count — Based on ObjectSpace manual. (Note All operators except “=” are inherited, so in this article, we treat them as methods.)

Count – empty()
Count – size()
Count – max_size()

Compare – op== deep-compares 2 containers by content. Practically used — see [[STL tutorial]]
Compare – op!=
Compare – operator-less-than, greater-than etc

Copy – swap() — swap 2 containers. seem to be a non-standard feature. See http://www.cplusplus.com/reference/algorithm/swap/
Copy – copy-ctor — copy entire container
Copy – op= — assigns entire container

populating STL containers requires copy-ctor

See also http://bigblog.tanbin.com/2012/07/insert-into-vector-copy-ctor-constantly.html for detailed experiments.

I believe STL containers generally use the payload’s copy-ctor during insertion.

vector? YES according to p21 [[effSTL]] vector example
deque at extremities? yes [1]
deque mid-stream???
list? yes
set? yes
map? yes

Let’s examine vectors. A java Vector (or arrayList) holds nothing but pointers. A 5-element vector occupies about 32bits x 5. All pointee objects live on the heap.

A stl vector, in basic configuration, holds a bunch of non-ref objects. A 5-element vector of Trade where sizeof(Trade)=20 bytes occupies exactly 100 bytes. (No additional overhead due to nodes and links. Remember a linked container consists of nodes each containing a payload object.)

P22 [[effSTL]] suggests that an empty vector can have the 100 bytes reserved but no Trade instance constructed. Item 9 in [[effSTL]] too suggest that default allocators for all STL containers grab memory from heap and return it as uncooked (i.e. uninitialized) chunk of raw memory. [[c++nutshell] says an implementation of vector must allocate an uninitialized array of objects and initialize Individual elements when needed

When you push_back a Trade that lives in Address 8123, our 5-element vector needs to somehow copy that Trade into Position #6. Assume there’s enough capacity reserved so no reallocation. Our vector can’t use assignment operator, because Position #6 is not a Trade object, but just 20 raw bytes allocated by the default vector allocator. Can it use the copy constructor? I think so, yet it doesn’t grab memory from stack or heap, but from the vector’s allocator. Item 10 of [[effC++]] has a complete allocator class. If your vector uses such an allocator, then every memory allocation “within” the vector would be intercepted by this allocator.

Down-shift after removing an element calls the assignment operator. I feel there’s no choice. Shifting Item 3 down to Position 2 — you can’t use copy-ctor or any ctor, since they use uncooked raw memory. In this case Slot 2 is already allocated and “cooked” (initialize).

I believe up-shift uses the assignment operator or copy-constructor.

[1] allocating memory from the “reserved packing lots”. See other posts on deque memory model.

print any container, using copy() and ostream_iterator ctor #noIV

Better commit to memory. See P53 of [[ stl tut ]] and P96 [[essential c++]]

  // vector vector1(10);
  // generates a series of numbers generate(vector1.begin(), vector1.end(), calc_square());

  copy (vector1.begin(), vector1.end(), ostream_iterator(cout,” “));

This is an idiom. back_inserter is another idiom. Such idioms aren’t tested in interviews.

STL multiset = sorted Bag

(Q: Have you ever wondered why set/multiset are grouped with map/multimap as Associative containers? Answer revealed at the end.)

http://en.wikipedia.org/wiki/Set_(abstract_data_type)#Multiset points out

It is possible for distinct objects to be considered “equivalent” under some equivalence relation but still distinct under another relation. Some types of multiset implementations will store distinct equivalent objects as separate items in the data structure; while others will collapse it down to one version (the first one encountered) and keep a positive integer count of the multiplicity of the element.

STL multiset is the first type. See also P299 [[STL Tutorial and Reference Guide]] written by one of the STL inventors.

STL multiset is a reliable store keeper — never “loses” items that look alike. In contrast, the counting version discards items as _unwanted_duplicate_. STL multiset associates objects to invisible keys — associative containers.

int main() {
    float floats[3] = {1.1,1.2,2};
    multiset<float, less > cont;
    cont.insert(floats, floats+3);
    copy(cont.begin(), cont.end(), ostream_iterator(cout, ”   “));
}

STL map search and so-called "equivalence"

Question: key-based or pair-based?
A: insert/erase/find/access-by-key[1] are key-based.
A: I don’t know what operations are pair-based

Question: key-matching is position-based or equality-based?
A: position-based, i.e. equivalence-based

[1] access-by-key means myMap[keyA1]. This is the standard dictionary/associative-array access, but not supported by multi-map, since this expression must return a single value.

stl pair template #briefly

In java, the map entry inner class is seldom used by developers, but stl pair is different. See [[STL tutorial and ref guide]] large and realistic example.

I believe all experienced STL users will sooner or later memorize some basic operations on pair. Here are the top 5 —
– cvctor  ie conversion ctor
– OOC ie operator-overload converters
– less-than operator — java's Comparable is more clean and simple

A concrete pair class can be subclassed.

What T needs in container of T

Update — [[c++coding standard]] points out T is usually value-like, such as a smart pointer (including an iterator) type. P95 further points out if we disable copier and op= then T can’t be go into a container.

Q1:  what does your class need, as Element in a container? In other words, if I want to put instances of my class C into an STL container, like vector, what must my class have?
A1: According to bloomberg,
– Copy constructors — C can’t be auto_ptr!
– Assignment operator — No auto_ptr!
– Default constructor
– Destructor

P16[[ObjectSpace]] covers this too. [[eff STL]] covers this too.

In addition, you almost always need operator==(). P21 [[stl tutorial]] shows this operator==() can be a free function defined for our type C.

Q: how about java collections? Similar?
A: equals() need overriding. Default equals() breaks contains() and remove().

every STL container has a no-arg no-parenthesis ctor

I believe iterators are seldom (never?) instantiated with no-arg ctor, but  every container has a no-arg ctor.
Note all containers has a no-arg ctor, and you call it without parenthesis

#include <iostream>
#include <iterator>
#include <string>
#include <vector>
#include <list>
#include <deque>
#include <stack>
#include <queue>
#include <set> //supports multiset too
#include 

<map>
#include <ext/hash_set>
#include <ext/hash_map>
using namespace std;
using namespace __gnu_cxx;
// needed for hash_set

int main() {
    //    vector<float> cont;
    //    cout<<cont.size()<<endl;
    //    list<float> cont
    //    cout<<cont.size()<<endl;
    //    deque<float> cont;
    //    cout<<cont.size()<<endl;
    //    set<float> cont;
    //    cout<<cont.size()<<endl;
    //    multiset<float> cont;
    //    cout<<cont.size()<<endl;
    //    map<float, float> cont;
    //    cout<<cont.size()<<endl;
    multimap<float, float> cont;
    cout << cont.size() << endl;

    // container adaptors
    //    stack<float> adaptor;
    //    cout<<adaptor.size()<<endl;
    //    queue<float> adaptor;
    //    cout<<adaptor.size()<<endl;
    priority_queue<float> adaptor;
    cout << adaptor.size() << endl;

    //hashed
    hash_set<float> hashed;
    cout << hashed.size() << endl;
    //    hash_map<float, float> hashed;
    //    cout << hashed() << endl;
}

T.operator==() needed in container of T

As stated elsewhere in this blog, any “element type” to go into a STL container should define operator==(). (Actulaly, operator=() too, but different story.) Unlike java, there’s no default equality test.

If no operator==() then std::find() can’t work in this case.

Now, all builtin types are fine, but most user-defined types aren’t.

Therefore, std::find() fails by default. This is yet another everyday issue in STL usage.

This is less critical — all containers define a container-level operator==(), which requires an element-level operator==().

set-by-index on a list, nCopies and STL vector(int)

javaList.set() means List.updateExisting().

If you construct an ArrayList with capacity of 999, and try to set value to position 888 you will fail.

You need to first populate 999 items into the array like this

new ArrayList(Collections.nCopies(MAX_FIX_TAG, (String)null));

See http://docs.oracle.com/javase/tutorial/collections/implementations/convenience.html

STL vector ctor vector(int) does exactly the same. See http://www.cplusplus.com/reference/stl/vector/vector/

Q: If you construct a vector with a single argument 999, what’s the size of the vector?
A: 0 in java, because 999 is interpreted as capacity, not array size.
A: 999 in STL. 999 default constructed elements inserted.

queue based on Deque

STL queue is an adapter (wrapper) class over other containers, so you can create a queue over a deque. (This is quite common — the only other queue is a queue over a list — Yes only 2 types of queues exist in STL. No q-over-vector[1])

Q: so what’s the difference between a naked deque vs a queue-over-deque? Why use a queue-over-deque
%%A: deque exposes unwanted access — indecent exposure —
* bad programmers can add/remove mid-stream (which is rather slow — linear time).
* bad programmers can add/remove on both ends. A real queue allows insert on the left only, and remove on the right only.

In short, queue hides many unwanted deque operations and offers a cleaner and safer API.

[1] q-over-vector doesn’t exist, due to efficiency — see P178 [[stl tutorial]]

STL container to hold Child objects +! pointers – slicing

http://ootips.org/yonat/4dev/smart-pointers.html — but slicing problem!

#include
#include
struct Base {
    virtual void print() const{cout<<1<<"\n";}
    virtual ~Base(){}
};
struct Derived : public Base {
    virtual void print() const{cout<<2<<"\n";}
};
int main(){
    Base b;
    Derived d;
    std::vector v;
 
    v.push_back(b); // OK
    v.push_back(d); // no error but not sure if there’s any hidden issue
    cout<<  v.size() <<endl; // prints 2
    std::vector::const_iterator pos;
    for(pos = v.begin(); pos != v.end(); ++pos){
       pos->print(); // Der object sliced!
    }
}

## factory functions used with STL containers

If we use “factory” to refer to any function that can create a new object, then here are some of the factory functions (including cvctor) covered in my blog or my books.

– back_inserter — always used with something_copy() functions like replace_if/replace_copy_if, count_if, remove_if/remove_copy_if … See blog
– bind2nd — used in something_if() functions like copy, remove_copy, replace_copy … See blog
– customized binary_function like “myLess”. see blog on functor-object

[19]nested STL containers #in practice

Executive summary —  favor container of smartPtr-to-container; Avoid container of raw pointer[1]. However, in many cases, the simpler container-of-container is viable and preferred.

You can also liberally use std::pair or std::tuple class templates. You can also create a simple struct of sub-containers, pairs, or simple (nonref) types, then instantiate a container template with this struct as the type argument. All of these are slightly more “natural” than container of pointers.


  • — container of container (without pointer complexity) is popular and practical:
  • One of the 3 creators of STL wrote [[STL tutorial]]. In it there are numerous examples of nested containers but very few examples of container of pointer.
  • The standard map container is kind of nested. Syntax (“map”) shows no nesting but internally, a map stores pair.
  • I used vector<vector<T>> many times, often as a matrix.
  • map<rcid, set<rcid>> p2d; // In the spreadsheet concretization code (RedMart)

[1] though many authors advocated container of raw pointers for large objects, perhaps before smart pointers. Container of raw pointer is controversial. Many authors show simple examples of them, but in practice it’s probably not so simple. Better use container of smart_ptr.

spreadsheet concretize #Junli explains why container of raw ptr is safe in this context — no q[delete] needed.