iterators – pbclone, not by pointer or ref

Iterators – generalizations of pointers – are expected to be copied cheaply. The copy cost is never raised as an issues.

Iterators are usually passed by value. Sutter and Alexandrescu recommended (P154 in their book) putting iterators into containers rather than putting pointers into containers. Containers would copy the iterators by value.

someContainer.end() often returns a temp object, so taking its address is a bug. The returned iterator object from end() must be passed by Value.

Someone online said that If an argument is taken by value, this makes it usually easier for the compiler to optimize the code. Look at the advantage of using function objects by value instead of taking function pointers. This is a similar reason for by-value parameters reasoned on the level of functions objects.

Note java/c# arguments are predominently passed by reference.

why create custom STL iterators, briefly

Q: when would you write your own STL iterator?

I think if you create your own customized container, you probably need custom iterator Objects. You probably need to return such an Object from rbegin() etc.

In python, c# and (less commonly) in java, many data structures can be made iterable. There’s an implicit iterator.

[[STL tutorial]] has a 3-page chapter showing a debuggable-iterator that reveals interesting details of the inner workings of STL containers and STL algorithms.

inputIterator category — unneeded?

Q: why there's a category “Input Iterator” at all, where is it used? (Output iterator is a similar story, so this post will omit it.)

I think the only major use is input stream.

There are some algorithms like std::merge(…) that require a tiny *subset* of a C pointer's full capabilities.

To make merge() useful on an input stream, STL authors put the Dummy type name “InputIterator” into merge() template declaration as a *hint* — a hint that in the implementation, only that *subset* of pointer capabilities are used. This is a hint to containers

“Hey Containers, if you have an iterator capable of deference-then-read, then you can use me.”

It turned out all containers except output stream has that capability.

iterator = simple smart ptr

– Iterators are *simple* extensions of raw pointers, whereas
– smart pointers are *grand* extensions of pointers.

They serve different purposes.

If an iterator is implemented as a class (template) then it usually defines
– operator=
– operator==, >=
operator* i.e. dereference
– operator++, —
– operator+, += i.e. long jumps
—-However, iterator class (template) won't define any operation unsupported by raw pointers, therefore no member functions! Smart pointers do define public ctor, get() etc

iterator categories, stratification, hint..

Iterator categories are a /stratification/ of the 5 to 10 standard Capabilities of raw pointers in the C language. Stratification — like deposits in a lake.

The dereference-then-Read operation is part of Input-Iterator category, which lacks the dereference-then-Write capability.
The dereference-then-Write operation is part of output Iterator category, which lacks the dereference-then-Read.
The ++ and — operators are available in the bidirectional-iterator category
The += and -= (jumps) and greater-than etc operations belong to the topmost strata — random access iterator category

There are a total of 5 categories, arranged in layers. (Some say “hierarchy” — a bit misleading). [[STL tutorial and reference]] has detailed coverage of this technicality, which is too technical for other books.

Iterator categories are not some kind of inheritance hierarchy.

Iterator categories are not some kind of typedef.
Q: So how do you write code to enforce what category of iterator you require?
Answer: no-way. Compiler has absolutely no idea what ForwardIterator category means. A category name is no different from those dummy type names in a template declaration. To the compiler, template<Bidirectional_Iterator,…> is no different from template.
A: you can only “hint“, not enforce, the “kind” of iterator you require.

Q: how does an algorithm's author indicate what type of iterator he needs?
A: (This is what STL authors did) Be creative — use special names to name the dummy types in template. Remember all STL algorithms are function templates. If you look at the specification (not “declaration”) of STL algorithms, the dummy types all come with special names. Don't just rename them with the customary T or S.

You appreciate the categories After you start writing STL-style function templates.

Note constness of iterator is completely unrelated to the 5 categories. No STL algo specification mention const in the dummy type names.

why list iterator can’t jump (+=, -=)

Q: Given the STL list iterator supports increment (++), why not make it support jumps? This way, list iterators can be used with algorithms like sort(), right?

P64 of [[STL tutorial and reference]], written by one of the 3 STL inventors, made it clear —

– It’s not about feasibility — adding jump is feasible but not a great idea.
– It’s all about efficiency. Implementing jumps using increment is “faking it”. The sort() algorithm would become extremely inefficient when given a fake random-access iterator.

STL algo – stringent on iterators, !containers

STL Algorithms are agnostic about containers. STL algorithms also support arrays, string, and input/output streams.

STL Algorithms are very particular about …. iterators! Some demand specific iterator features. As P30[[ObjectSpace manual]] puts it, some algorithms require more powerful iterators than others. For example, sort() expects random-access iterators.

Some algorithms' declarations mention the expected type of iterators.

Many expect input iterators, so an output-only iterator won't do.

Some need output iterators , so an input-only iterator won't do.

I believe some expect RW iterators.

I believe some expect const iterators, i.e. read-only

I believe some expect bidirectional iterators.

input iterator #& other iterator categories

(It’s probably enough to know when to use each. Internals may be muirky, undocumented and trivial…)

Note many of these categories are fake types and mean nothing to the compiler. Compiler knows classes, typedef, pointers … 
Note some of these categories are real types, some are characteristics of a given iterator object….

output iterator vs the outstream? See P5 [[eff STL]]

(see other post for const vs input iterators)

* input iterator is “Rw” iterator ie must let clients read container, may let clients write to container.
* const iterator is R~W iterator as it blocks write-access.
* output iterator is “rW” iterator — it may let clients read container
– many iterator classes [1] are “input-AND-output iterators” ie RW iterators ==> must let clients read and let clients write to container

When you see InputIterator in a template, remember input iterator is (not a type but) a family of types. I guess it could be fake type.

I guess you can visualize each iterator having these 2 groups of boolean flags. First group:
+ R or r
+ W or w or ~W (3 values)
+ const or mutable
++ const is actually part of declarations. const means “unwrapped iterator as lval won’t compile” [2]

And 2nd group:
+ F or ~F — forward or no-forward ie forward move unsupported
+ V or ~V — reverse or no-reverse
+ A or ~A — random or no-random

Between the 2 groups, there’s fairly good orthogonality. Within a group however, flags are not really independent
$ const ~W
$ const => R
$ A => F and V flags both set

[1] vector::iterator, but not vector::const_iterator
for(list::const_iterator myIt=myList.begin(); myIt!=myList.end(); myIt++){
*myIt += 10; // won’t compile with the const_ prefix

When reading function signatures, you will encounter a lot of different iterators, each having these flags. To simplify things, it’s best to focus on one group of flags at a time. Examples:

T a[n] can produce a qq(const T* ) iterator. First, this is a const R ~W. Secondly, it’s A F V

removal -> iterator invalidation – STL, ConcurrentMap, fail fast …

This is a blog post tying up a few blog posts on this subject. It’s instructive to compare the different iterators in different contexts in the face of a tricky removal operation. points out that ConcurrentModEx can occur even in single-threaded myList.remove(..). Note this is not using myIterator.remove(void)

ConcurrentHashMap never throws this same CMEx. See Details? not available yet.

Many jdk5 concurrent collections have thread safe iterators. My book [[java generics]] covers them in some detail.

As seen in, all STL graph containers can cope with removals, but contiguous (non-graph) containers can get iterators invalidated. Java arrayList improves on it by allowing iterator to perform thread-safe remove. I guess this is possible because the iterator thread simplify skips the dead node. Any other iterator is invalidated by CMEx. I guess the previous nodes can shift up.

py for-loop – string,file,dict,args,dir …

Following the Unix philosophy, Python’s for-in loop is a simple idea (iterator) pushed to the max. It supports
– iterating chars in a ….. string
– iterating lines in a …. file
– iterating integers in a range() or xrange()
– iterating …. KEYs in a dict — values requires explicit look-up
– iterating …. key/value pairs in a dict
– iterating sys.argv on command line
– list, tuple
– retrieving pairs from list-of-pairs — idiom ## this example also illustrates defaultdict(list)

>>> s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
>>> d = defaultdict(list)
>>> for k, v in s:
... d[k].append(v)
>>> d.items()
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]

(read about  the “list” arg)

More generally, any class supporting iteration can use for-loop. Here’s an example illustrating some of them.

import re, sys
from os import walk
for (path, dirs, files) in walk(“c:\\”) :
 for filename in files :
  if not“\.py$”,filename) : continue
  if not printed.has_key(path):
      print ” path = ” + path
      printed[path] = True

  for line in open (path+’\\’+filename) :
      if‘^\s*class\s’, line) : print filename + ‘:\t’ + line,