(unordered)map erase: prefer by-key !! itr #AshS

Relevant in coding tests like speed-coding, take-home, onsite. Practical knowledge is power !

As shown in https://github.com/tiger40490/repo1/blob/cpp1/cpp/lang_misc/mapEraseByVal.cpp

.. by-key is cleaner, not complicated by the iterator invalidation complexities.

You can save all the “bad” keys, and later erase one by one, without the invalidation concerns. You can also print the keys.

if you were to save the “bad” iterators, then once you erase one iterator, are the other iterators affected? No, but I don’t want to remember.

STL iterator invalidation rules, succinctly has a succinct summary, but I don’t prefer to deal with these complexities when I have a cleaner alternative solution.

save iterators and reuse them+!invalidation risk

For a static STL container, the iterator objects can be safely stored and reused. The dreaded iterator invalidation is a risk only under structural changes.

Many coding interview questions allow (even require) me to save those iterators and store them in a vector, a hash table … Afterwards, we retrieve an iterator value, and visit the next/previous of it.

##STL iterator is implemented as ..

  • implemented as raw ptr — if your data is held in an array
  • implemented as member class (sugarcoated as member typedef) — most common
  • implemented as friend class
  • implemented as wrapper over internal container’s iterator — (cheat) if your custom container is kind of wrapper over an STL container, then just use the internal container’s iterator as your iterator.

Remember an iterator class is a form of smart pointer by definition, since it implements operator->() and operator*()

iterator invalidated{erase: opaque

See also

I had multiple encounters with STL iterator invalidation, esp. after erase().

  • sometimes I get correct result
  • sometimes I get segfault
  • sometimes I get incorrect result

Opaque i.e. no obvious clues, so no keyword to google

Luckily my code was 20-lines

Luckily i could reproduce it once a while.

This is another example of tough bugs to hunt down.

STL iterator invalidation rules, succinctly

http://www.martinbroadhurst.com/iterator-invalidation-rules-for-c-containers.html has concise explanations. Specifically,

  • list insertions don’t invalidate any iterator. I feel iterator is a pointer to a node.
  • tree insertions don’t invalidate any iterator. Same reason as list.
  • for erasure from lists or trees, only the iterator to the erased node is invalidated.

Now for vector:

  • vector insertion invalidates any iterator positioned somewhere after the insertion point. If reallocation happens due to exceeding vector.capacity() then all invalidated

Who cares?

  1. in projects, we always check the documentation
  2. in coding interviews, we can save precious time if we remember these rules
  3. in QQ interviews, I have never received such a knowledge question. I think interviewers don’t consider this very “deep”, but we could possibly impress some interviews.

declare iterator]function template #gotcha

(Needed in some coding interviews and also in GTD!)

Update: With c++11, you can use the “auto” keyword and avoid the complexity.

If you drop the “typename” from the for-loop header, then compiler is confused

error: dependent-name ‘std::multiset::iterator’ is parsed as a non-type, but (template) instantiation yields a type
note: say ‘typename std::multiset::iterator’ if a type is meant

Basically, we need to be extra explicit to the confused compiler.

template<typename T> ostream & operator<<(ostream & os, multiset<T> const & l){
  for(typename multiset<T>::iterator it = l.begin(); 
      it != l.end(); ++it){
        os<<*it<<" ";
  }
  os<<endl;
}

iterators : always pbclone !! pbref or by pointer

Iterators – generalizations of pointers – are expected to be copied cheaply. The copy cost is never raised as an issues.

Iterators are usually passed by value. Sutter and Alexandrescu recommended (P154 in their book) putting iterators into containers rather than putting pointers into containers. Containers would copy the iterators by value.

someContainer.end() often returns a temp object, so taking its address is a bug. The returned iterator object from end() must be passed by Value.

Someone online said that If an argument is taken by value, this makes it usually easier for the compiler to optimize the code. Look at the advantage of using function objects by value instead of taking function pointers. This is a similar reason for by-value parameters reasoned on the level of functions objects.

Note java/c# arguments are predominently passed by reference.

4-layers of pointer wrapping

I was looking for a realistic scenario of multiple layers of pointer wrapping. Here’s a vector of iterators (Not a vector of vectors). Each iterator therein comes from a nested-vector.

If we get an iterator from the outer vector, we get a ptr to ptr to ptr to ptr to double.

vector<vector<smartPtr >::iterator>::iterator my_itr;

1) inner-most pointer is a 32-bit raw pointer to a double (stack/heap/global) object.
2) The smart pointer is bigger, say 55-bit object holding a pointer to the 32-bit raw pointer.
3) The inner-iterator is a 32-bit pointer to the smart pointer object, since vector iterator is typically implemented as raw pointers.
4) The outer iterator my_itr is another 32-bit raw pointer to the elements of the outer vector, where each element is an inner-iterator. (Note each element is not a vector.)

How about adding an asterisk — smartPtr… ? As explained elsewhere on this blog (http://bigblog.tanbin.com/2012/04/smartptr.html) it is not my favorite construct.

[12]call`same method@unrelated types: template outshines java

Suppose pure abstract class Animal has a die() method, and so does Project, Product, Plant and Planet, but they don’t share a base class. How would you write a reusable function that invokes this method on a generic input object, whose type could be any of them?

Java can’t do this. In C++ you create

template<typename T>  f1(T input){ input.die(); }

If you pass an int into f1(), then you get compile time error. Probably a linker error. Is this SFINAE ? I doubt it.

STL algorithms routinely take an iterator argument and then call operator>() on the iterator. Now, this operator is undefined for a lot of iterator INSTANCES. I think only RandomAccessIterator category supports it.

Q: So how does STL make sure you don’t pass an object of ForwardInterator category into such a function?
A: use the template parameter type name (dummy type name) as a hint. Instead of the customary “T”, they put a suggestive dummy type name like “BidirectionayInterator” or “InputIterator”. If you ignore the hint you get compile-time error.

Now we understand that STL iterators have no inheritance hierarchy, but “stratification” among them.

y create custom STL iterators, briefly

Q: when would you write your own STL iterator?

I think if you create your own customized container, you probably need custom iterator Objects. You probably need to return such an Object from rbegin() etc.

In python, c# and (less commonly) in java, many data structures can be made iterable. There’s an implicit iterator.

[[STL tutorial]] has a 3-page chapter showing a debuggable-iterator that reveals interesting details of the inner workings of STL containers and STL algorithms.

inputIterator category — unneeded@@

Q: why there's a category “Input Iterator” at all, where is it used? (Output iterator is a similar story, so this post will omit it.)

I think the only major use is input stream.

There are some algorithms like std::merge(…) that require a tiny *subset* of a C pointer's full capabilities.

To make merge() useful on an input stream, STL authors put the Dummy type name “InputIterator” into merge() template declaration as a *hint* — a hint that in the implementation, only that *subset* of pointer capabilities are used. This is a hint to containers

“Hey Containers, if you have an iterator capable of deference-then-read, then you can use me.”

It turned out all containers except output stream has that capability.

iterator = simple smart ptr

– Iterators are *simple* extensions of raw pointers, whereas
– smart pointers are *grand* extensions of pointers.

They serve different purposes.

If an iterator is implemented as a class (template) then it usually defines
– operator=
– operator==, >=
operator* i.e. dereference
– operator++, —
– operator+, += i.e. long jumps
….
—-However, iterator class (template) won't define any operation unsupported by raw pointers, therefore no member functions! Smart pointers do define public ctor, get() etc

iterator categories, stratification, hint..

Iterator categories are a /stratification/ of the 5 to 10 standard Capabilities of raw pointers in the C language. Stratification — like deposits in a lake.

The dereference-then-Read operation is part of Input-Iterator category, which lacks the dereference-then-Write capability.
The dereference-then-Write operation is part of output Iterator category, which lacks the dereference-then-Read.
The ++ and — operators are available in the bidirectional-iterator category
The += and -= (jumps) and greater-than etc operations belong to the topmost strata — random access iterator category

There are a total of 5 categories, arranged in layers. (Some say “hierarchy” — a bit misleading). [[STL tutorial and reference]] has detailed coverage of this technicality, which is too technical for other books.

Iterator categories are not some kind of inheritance hierarchy.

Iterator categories are not some kind of typedef.
 
Q: So how do you write code to enforce what category of iterator you require?
Answer: no-way. Compiler has absolutely no idea what ForwardIterator category means. A category name is no different from those dummy type names in a template declaration. To the compiler, template<Bidirectional_Iterator,…> is no different from template.
A: you can only “hint“, not enforce, the “kind” of iterator you require.

Q: how does an algorithm's author indicate what type of iterator he needs?
A: (This is what STL authors did) Be creative — use special names to name the dummy types in template. Remember all STL algorithms are function templates. If you look at the specification (not “declaration”) of STL algorithms, the dummy types all come with special names. Don't just rename them with the customary T or S.

You appreciate the categories After you start writing STL-style function templates.

Note constness of iterator is completely unrelated to the 5 categories. No STL algo specification mention const in the dummy type names.

y list iterator can’t jump (+=, -=)

Q: Given the STL list iterator supports increment (++), why not make it support jumps? This way, list iterators can be used with algorithms like sort(), right?

P64 of [[STL tutorial and reference]], written by one of the 3 STL inventors, made it clear —

– It’s not about feasibility — adding jump is feasible but not a great idea.
– It’s all about efficiency. Implementing jumps using increment is “faking it”. The sort() algorithm would become extremely inefficient when given a fake random-access iterator.

STL algo – stringent on iterators, !! containers

STL Algorithms are agnostic about containers. STL algorithms also support arrays, string, and input/output streams.

STL Algorithms are very particular about …. iterators! Some demand specific iterator features. As P30[[ObjectSpace manual]] puts it, some algorithms require more powerful iterators than others. For example, sort() expects random-access iterators.

Some algorithms' declarations mention the expected type of iterators.

Many expect input iterators, so an output-only iterator won't do.

Some need output iterators , so an input-only iterator won't do.

I believe some expect RW iterators.

I believe some expect const iterators, i.e. read-only

I believe some expect bidirectional iterators.

input iterator #& other iterator categories

(It’s probably enough to know when to use each. Internals may be muirky, undocumented and trivial…)

Note many of these categories are fake types and mean nothing to the compiler. Compiler knows classes, typedef, pointers … 
Note some of these categories are real types, some are characteristics of a given iterator object….

output iterator vs the outstream? See P5 [[eff STL]]

(see other post for const vs input iterators)

* input iterator is “Rw” iterator ie must let clients read container, may let clients write to container.
* const iterator is R~W iterator as it blocks write-access.
* output iterator is “rW” iterator — it may let clients read container
– many iterator classes [1] are “input-AND-output iterators” ie RW iterators ==> must let clients read and let clients write to container

When you see InputIterator in a template, remember input iterator is (not a type but) a family of types. I guess it could be fake type.

I guess you can visualize each iterator having these 2 groups of boolean flags. First group:
+ R or r
+ W or w or ~W (3 values)
+ const or mutable
++ const is actually part of declarations. const means “unwrapped iterator as lval won’t compile” [2]

And 2nd group:
+ F or ~F — forward or no-forward ie forward move unsupported
+ V or ~V — reverse or no-reverse
+ A or ~A — random or no-random

Between the 2 groups, there’s fairly good orthogonality. Within a group however, flags are not really independent
$ const ~W
$ const => R
$ A => F and V flags both set

[1] vector::iterator, but not vector::const_iterator
[2]
for(list::const_iterator myIt=myList.begin(); myIt!=myList.end(); myIt++){
*myIt += 10; // won’t compile with the const_ prefix
}

When reading function signatures, you will encounter a lot of different iterators, each having these flags. To simplify things, it’s best to focus on one group of flags at a time. Examples:

T a[n] can produce a qq(const T* ) iterator. First, this is a const R ~W. Secondly, it’s A F V

removal→iterator invalidation:STL, fail fast, ConcurrentMap

This is a blog post tying up a few discussions on this subject. It’s instructive to compare the different iterators in different contexts in the face of a tricky removal operation.

http://tech.puredanger.com/2009/02/02/java-concurrency-bugs-concurrentmodificationexception/ points out that ConcurrentModEx can occur even in single-threaded myList.remove(..). Note this is not using myIterator.remove(void).

[[Java generics]] also says single-threaded program can hit CMEx. The official javadoc https://docs.oracle.com/javase/7/docs/api/java/util/ConcurrentModificationException.html agrees.

ConcurrentHashMap never throws this CMEx. See http://bigblog.tanbin.com/2011/09/concurrent-hash-map-iterator.html. Details? not available yet.

Many jdk5 concurrent collections have thread safe iterators. [[java generics]] covers them in some detail.

As seen in http://bigblog.tanbin.com/2011/09/removeinsert-while-iterating-stl.html, all STL graph containers (include slist) can cope with removals, but contiguous containers can get iterators invalidated. Java arrayList improves on it by allowing iterator to perform thread-safe remove. I guess this is possible because the iterator thread could simplify skip the dead node. Any other iterator is invalidated by CMEx. I guess the previous nodes can shift up.

–brief history

  1. STL iterator invalidation results in undefined behavior. My test shows silent erroneous result. Your code continues to run but result can be subtly wrong.
  2. In java, before fail-fast, the outcome is also undefined behavior.
  3. Fail-fast iterator is the java solution to the iterator invalidation issue. Fail-fast iterators all throw CMEx, quickly and cleanly. I think CMEx is caused by structural changes — mostly removal and insertions.
  4. CHM came after fail-fast, and never throws CMEx

python for-loop: string,file,dict,args,dir …

Following the Unix philosophy, Python’s for-in loop is a simple idea (iterator) pushed to the max. It supports
– iterating chars in a ….. string
– iterating lines in a …. file
– iterating integers in a range() or xrange()
– iterating …. keys() values() in dict —
– iterating …. key/value pairs in a dict.items()
– iterating sys.argv on command line
– list, tuple
– retrieving pairs from list-of-pairs — idiom ## this example also illustrates defaultdict(list)

>>> s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
>>> d = defaultdict(list) # construct empty defaultdict .. "list" is some mysterious default_factory
>>> for k, v in s:
...     d[k].append(v)
...
>>> d.items()
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]

More generally, any class supporting iteration can use for-loop. Here’s an example illustrating some of them.

import re, sys
from os import walk
printed={}
for (path, dirs, files) in walk(“c:\\”) :
for filename in files :
if not re.search(“\.py$”,filename) : continue
if not printed.has_key(path):
print ” path = ” + path
printed[path] = True

for line in open (path+’\\’+filename) :
if re.search(‘^\s*class\s’, line) : print filename + ‘:\t’ + line,

another basic difference iterator^container^algo

(Why bother? Well, you need to know these when you debug STL or extend STL.)

– Containers — are class templates. 100%
– Algorithms — are function templates. 100%

– iterators? A beginner can safely assume that Most of the time iterators are defined inside each container template as a Member type. Since a container has a dummy type T, the iterator must be a class template of T (or a typedef thereof), a Nested class template of T, presumably, inferred from the syntax :

vector<int>::iterator

– The container/algo/iterator adapters are typically class templates
– functor are typically class templates

A trivial consequence — the declarations of containers and iterators are complicated by the templates. Algorithm declarations are simpler in comparison.

4 types of iterators in [[EffectiveSTL]]

(Note these are not “fake-types” aka dummy types in template declarations.)

[[effSTL]] P116 highlights 4 important types of iterators (

http://www.sgi.com/tech/stl/stl_vector.h shows

vector::iterator is a typedef.
vector::const_iterator is a typedef.
vector::reverse_iterator is a class template.
vector::const_reverse_iterator is a typedef based on both

   typedef reverse_iterator const_reverse_iterator;

5 STL components #letter to friend

Hi YH,

(A blog post, so i won’t spell out your name or email.)

You named 5 STL components. I now feel STL was designed first to provide containers, then algorithms, then iterators, as the 3 main things. In addition, functors and are an efficiency feature. Allocators are an advanced customization tool for those users with special requirement and skill.

To a developer new to c++, iterators aren’t meaningful. In contrast, container has the most visible value and usage — to hold a bunch of objects (or pointers), including nested containers. Most developers would want to use common data structures like arrays, linked lists, sets, lookup tables etc. Unfortunately core c++ provides only fixed arrays. Therefore STL provides parametrized containers to the rescue.

Immediately people would need common operations on these containers, like find, remove, copy, sort, count_if… So STL provides parametrized algorithms on containers. A key design goal is maximum flexibility and re-usability when you mix and match any container with any algorithm. This is hard, even without the stringent requirement of efficiency. It’s so hard that a complete breed of classes were invented — iterators. Numerous authors including STL original developers confirmed that iterator was the glue between containers and STL algorithms.

Among the 3, some are more necessary than others. Ranked in terms of necessity,

1) A container is usable by itself, even without algorithms/iterators. I mean the methods of the container classes, like push_back, get, operator[]
2) However, serious users of containers usually need iterators just to get all the content. This minimal usage need nothing beside the operator++ and operator* .
3) To meet more application requirements, some users write their own loops to do everything, using more features of iterators. So those iterator features are less essential than operator++ and operator* .
4) least necessary are the STL algorithms. They use the full features of iterators.

As an interesting comparison, java’s containers are likewise the most necessary part of the collections framework. Iterators are heavily used but not always needed (except in loops). The non-static methods in Collection.java and static methods in Collections.java are similar to STL algorithms, but are somewhat less used. I for one write lots of loops by myself.

iterators – what go into "<>" and "()"

motivation — i find declarations too complicated when iterators are involved.

* in a template definition’s , people put iterator categories like InputIterator. I think these are just suggestive dummy typenames. If you rename to “input_iterator” through out that class, nothing will break. In fact, when you write template , Animal cannot be a real class name.

* in function prototypes, you use those fake types declared earlier. See [[essential c++]]
* in concrete function calls, you feed in variables representing iterators. You need to follow the suggestions in the iterator category. Otherwise, things break, though Compiler can’t detect, i think.

Now let’s extend the question
Q2: what do u put into vs () for a function template [1] using iterators?
[1] such as those STL algorithms

A: i think iterator is an atypical example. See [[java interface as a replacement for c++ templates]] for background.
* in prototypes’ you can put anything silly like T, S, V
* in prototypes’ (), you can use those same type param, or a ptr/ref to it. An iterator argument can be derived from these fake types or standalone.
* in concrete function calls, you can feed a specific iterator type into , and arguments into (). Usually you can omit the stuff because compiler can infer.

iterator arg to STL find() can be anything

Suppose I get a vector::iterator from a vector begin().
Suppose I get a list::iterator from a list begin().

Now, these 2 objects don’t have a common super class[1]. But somehow, find() can use either object as its first argument. Find() can even use a qq(int *) object as its first argument. However, find() can’t accept a “float” object as first argument. I guess compiler will not complain, until you deference the iterator. I think find() does that.

Suppose I get a vector::reverse_iterator from a vector rbegin() – call it vec_r_it. Can I pass it to find? I think find() is going to increment it and deference it.

[1] array-based containers’ iterators are pointers, not wrapped into some wrapper object. Linked list, map, set … define real classes for their iterators.

In Java, the param to such a find() would be an interface, or a template param T like . In java, such an “expected-service” is always (“clean” language) expressed using interfaces. C++ is more complicated and flexible.

C# has a bunch of iteration constructs, not as “clean” as java.

iterator inheritance hierarchy (dummy type

Look inside the class definition of template class reverse_iterator : public random_access_iterator

You find a method definition —

RandomAccessIterator base() …

You wonder what’s the random_access_iterator vs the RandomAccessIterator . Here’s what I know.

* RandomAccessIterator is a dummy-type used in templates, just like the customary T in . As a google tech talk speaker puts it, this dummy type means nothing to the compiler. This is what I call a fake-type or dummy-type. A fake-type is a token in a source code presented to the compiler. It can look like a real type but it’s
** not a class or class templatedumm
** not a typedef
** not a ptr
* random_access_iterator is a hand-written template class, and the parent class of reverse_iterator. You can see the source code at http://www.hackchina.com/r/178959/stl_iterator.h__html or http://www.sgi.com/tech/stl/stl_iterator_base.h

Now i know there is a clear inheritance hierarchy among some iterator TYPES. These are not fake types!

input_iterator (and output_iterator) is a class, !! a typedef

SGI says output_iterator is an empty class, presumably a marker-interface

http://www.cppreference.com/wiki/stl/iterators shows….

void advance( input_iterator& pos, Dist n );

Q: so input_iterator is a real type or a typedef or a dummy-type (template)
A: the context around the advance() declaration will tell you.
A: Here’s another context. http://www.hackchina.com/r/178959/stl_iterator.h__html shows it’s a real class or a “class template” to be precise.

Note these things look like fake types but they are real classes, parsed by compiler.

short and sharp back_inserter tutorial #noIV

P182 [[STL tutorial and ref]] offers a tutorial on back_inserter, used in copy().

back_inserter(myVector) is a kind of factory method[1] that manufactures a back_insert_iterator object that’s a wrapper on myVector.

Whether in java or c++, this wrapper holds a pointer to the container object myVector. In the copy loop, every assignment to a myVector element triggers a call to myVector.push_back().

[1] back_inserter() is a free function. It’s hard to tell from the tutorial.

STL iterator ^algorithm, briefly

“iterator is an intermediary between data structures and algorithms.” Here algorithms refer to Comp Science algorithms, so STL algorithms are not an monopoly. You can write your own algorithms but Scott Meyers says no.

STL books showcase stl algorithms but also include DIY algorithm.

Most STL algorithms need iterators [1] as arguments to work on the containers. Algorithms are container-agnostic. Iterators are a device of abstraction. Generic algorithms are presented in terms of generic iterators, not specific containers.

[1] The exact source code of these STL algorithms often mention fake types in place of iterators. When reading these source code or APIs, It’s extremely important to distinguish between real vs fake types.

In Java, Collection.java interface and Collections.java class contain methods applicable on many containers.

const ^ input iterator – unrelated

(It’s probably enough to know when to use each… Tbudget?)

A) input iterator is a fwd iterator for input streams — P802 [[Absolute C++]] (This book has concise coverage of const and input iterators)

B) const iterator is modeled on ptr-to-const, so you can’t make *myIterator a LHS

These 2 are unrelated.

Are these real (template) classes? dummy types or typedef? I feel in general you can’t say for sure, so I assume the lowest common denominator — dummy types. Note const_iterator is a nested Type in container class declarations, but still it can be a real class, dummy type or typedef.

pair of iterator, array of iterator…

Could be a common pattern. Think of these iterators as pointers.

The anagram blog-post mentions an array of iterators.

[[stl tutorial]] mentions a pair of iterators, where we use iterator to specialize the std::pair template.
– The concrete iterator type is at the specialization-level.
– The iterator object is at the instance level.

I feel the conceptual complexity is an unwanted complexity, so it’s best to just copy this implementation and not analyze too much

iterator’s true type can be anything

[[EffectiveSTL]] P120 points out that in vector [1] class template, iterator and const_iterator are typedef (I call them aliases) of pointers; whereas in other containers these are 2 unrelated classes.

For easy discussion, let’s focus on const_iterator.

It seems to me the compiler sees nothing in common among the const_iterator of different containers. In that case, the const_iterator idea/category is probably another fake type??

Q: Is it possible to have a simple function template accepting an “const_iterator” when “const_iterator” can be any type?
A: No for a non-template function.
A: for a function template, maybe the template param (a fake type) helps? See my thoughts later.
A: for a method like erase() or insert(), the const_iterator often has a more predictable type, defined in the host class.

* When we call advance (), the function accepts “int*” as param.
* When we call advance< list::iterator > with the matching angle brackets, where the iterator is a true class, the function accepts the iterator class as type-param.

[1] and all array-based containers. Regular array is also iterable, whose iterator is the simple pointer.

STL iterators — a simple idea pushed to the max

Unix simple ideas pushed to the max —
* everything is a file
* pipes
* sockets
* fork and exec
* signals sent to a process — trap and kill
* stdin, stdout, stderr streams
* background vs foreground

But this post is not about unix but about STL simple ideas —

Idea: iterators a bold extension of the pointers in an array, which can move and jump, read and write the elemtns.

Idea: Iterators are adopted in many data structures including io streams, strings, arrays, regex

Idea: for_each(), find(), counf_if()… these work with any data structure supporting the iterator pattern

Idea: Parametrized container are adopted in many packages including io stream, strings, regex

Idea: Allocators —