[17] string(+vector) reserve()to avoid re-allocation #resize()wrong

See also std::vector capacity reduction and vector internal array: always on heap; cleaned up via RAII

string and vector use expandable arrays, which have to be allocated on the heap, not the stack.

For vector, it’s very simple —

  • resize() creates dummy (value-initialized) elements iff upsizing
  • reserve() only allocates spare “raw” capacity without creating elements to fill it. See P279 [[c++stdLib]]

[[Optimized C++]] P70 confirms that std::string can get re-allocated when it grows beyond current capacity.

http://stackoverflow.com/questions/9521629/stdstringss-capacity-reserve-resize-functions compares std::string.resize() vs reserve().

  • Backgrounder: Capacity — allocated capacity is often uninitialized memory reserved for this string.
  • Backgrounder: size — length of the string that’s fully initialized and accessible. Some of the characters (nulls) could be invisible.
  • string.resize() can increase the string’s size by filling in space (at the end) with dummy characters (or a user-supplied char). See http://www.cplusplus.com/reference/string/string/resize/
    • After resize(55) the string size is 55. All 55 characters are part of the string proper.
    • changes string’s size
  • string.reserve() doesn’t affect string size. This is a request to increase the “capacity” of the object (a request to shrink is non-binding, and is ignored since C++20). Capacity (say 77) is always at least the size (say 55). The 77 – 55 = 22 extra slots allocated are uninitialized! They only get populated after you push_back or insert into the string.
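A minimal sketch of the size vs capacity distinction above. The helper names (afterReserve, afterResize, countReallocations) are made up for illustration, and the exact capacity returned is implementation-defined; only "at least what was requested" is guaranteed.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns (size, capacity) after reserve() alone.
std::pair<std::size_t, std::size_t> afterReserve(std::size_t n) {
  std::string s;
  s.reserve(n);              // raw capacity only; no characters created
  return {s.size(), s.capacity()};
}

// Hypothetical helper: returns the string produced by resize(n, fill).
std::string afterResize(std::size_t n, char fill) {
  std::string s("ab");
  s.resize(n, fill);         // upsizing appends fill chars; downsizing truncates
  return s;
}

// Hypothetical helper: counts capacity changes while appending n ints.
int countReallocations(std::size_t n, bool reserveFirst) {
  std::vector<int> v;
  if (reserveFirst) v.reserve(n);
  int reallocs = 0;
  for (std::size_t i = 0; i < n; ++i) {
    std::size_t before = v.capacity();
    v.push_back(static_cast<int>(i));
    if (v.capacity() != before) ++reallocs; // capacity change implies a new buffer
  }
  return reallocs;
}
```

With reserve() up front, the push_back loop should never re-allocate; without it, the vector grows its buffer several times along the way.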

container{string}: j^c++

In java, any container (of string or int or anything) holds pointers only.

I think c# collections (i.e. containers) contain pointers if T is a reference type.

In cpp,

  • container of int always contains nonref, unlike java
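  • container of container holds the inner container objects by value, but each inner container is essentially a small handle holding a ptr to its heap array, so the effect is similar to java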
  • but container of string is widely used, and invariably contains nonref std::string !

Q: is there any justification for container<(smart) ptr to string>? I found rather few online discussions.
A: See boost::ptr_container

Q: what if the strings are very large?
A: many older std::string implementations use COW to speed up copying and assignment. However, the string copy ctor is O(lengthOfString) per the standard! So in a standard-compliant implementation, copy and assignment of large strings would be expensive, and I believe container<(smart) ptr to string> is justified in that case.
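A small sketch contrasting the two choices, assuming std::shared_ptr as the smart pointer (the original doesn't name one). Both helper names are hypothetical.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical helper: a by-value container holds its own copy of the string.
bool byValueCopyIsIndependent() {
  std::string big(1000000, 'x');
  std::vector<std::string> v;
  v.push_back(big);          // copies all the characters: O(length)
  v[0][0] = 'y';             // modifies the container's copy only
  return big[0] == 'x';
}

// Hypothetical helper: a container of shared_ptr shares one heap string.
long ownersAfterPushBack() {
  auto big = std::make_shared<std::string>(1000000, 'x');
  std::vector<std::shared_ptr<std::string>> v;
  v.push_back(big);          // copies the pointer only: O(1)
  v.push_back(big);
  return big.use_count();    // the local pointer plus two elements in v
}
```

The trade-off is that every element of the pointer container aliases the same string, so a change through one element is visible through all of them.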

 

convert non-null-terminated char-array to std::string

std::string ccy (ptr->ccy, ptr->ccy+3); //using a special string() ctor

My ptr->ccy is the address of a 3-char array, but it’s immediately followed by other chars belonging to another field, in a tightly packed struct without padding. If you simply pass ptr->ccy to the string() ctor, your string will take in many extra chars until a null terminator, possibly reading past the end of the struct.
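A self-contained sketch of the same idea. The struct Quote and its venue field are hypothetical stand-ins for the packed message; only the 3-char ccy field comes from the note above.

```cpp
#include <string>

// Hypothetical message layout: two fixed-width fields, no null terminators.
struct Quote {
  char ccy[3];     // e.g. "USD", not null-terminated
  char venue[4];   // immediately follows ccy in memory
};

// Copy exactly sizeof(ccy) chars using the iterator-range ctor.
std::string ccyOf(const Quote* ptr) {
  return std::string(ptr->ccy, ptr->ccy + sizeof(ptr->ccy));
}

// Same result using the (pointer, count) ctor.
std::string ccyOf2(const Quote* ptr) {
  return std::string(ptr->ccy, sizeof(ptr->ccy));
}
```

Both ctors stop after three chars regardless of what follows in memory, so no null terminator is needed.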

std::string is COW-reference-counted now but should change]c++11

http://www.drdobbs.com/cpp/c-string-performance/184405453 (circa 2003) explains that COW speeds up string copy and string destruction.

https://stackoverflow.com/questions/12520192/is-stdstring-refcounted-in-gcc-4-x-c11

This proved a small halo.

However, gcc 5.1 introduced a breaking change. https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_dual_abi.html says: “These changes were necessary to conform to the 2011 C++ standard which forbids Copy-On-Write strings”

Q: why forbidden?
A: thread-safety … See [[std c++lib]] P692

split std::string on Custom delimiter #practice every6M

See also post on csv string parse…

For a longer delimiter, you may need string.find()
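One possible sketch of splitting on a multi-character delimiter with string.find(). The function name splitOn is hypothetical, and keeping empty tokens between consecutive delimiters is a design choice, not something the note above prescribes.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split `text` on a multi-character delimiter.
// Empty tokens between consecutive delimiters are kept.
std::vector<std::string> splitOn(const std::string& text, const std::string& delim) {
  std::vector<std::string> tokens;
  if (delim.empty()) {           // avoid an infinite loop on an empty delimiter
    tokens.push_back(text);
    return tokens;
  }
  std::string::size_type start = 0;
  for (;;) {
    std::string::size_type pos = text.find(delim, start);
    if (pos == std::string::npos) {
      tokens.push_back(text.substr(start)); // last token
      break;
    }
    tokens.push_back(text.substr(start, pos - start));
    start = pos + delim.size();
  }
  return tokens;
}
```

For example, splitOn("a::b::c", "::") yields the three tokens a, b and c.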

https://github.com/tiger40490/repo1/blob/cpp1/cpp/binTree/serialize_bbg.cpp has my own tested solution parsing individual tree node details from a stringstream

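// needs <fstream>, <iostream>, <string>; fileName is a std::string defined elsewhere
ifstream f1(fileName.c_str());
string line;
while(getline(f1, line)){
  for(int i=1; ;++i){
        string::size_type pos = line.find_first_of("\t"); // npos if there's no more tab
        string token = line.substr(0,pos); // substr(0,npos) returns the whole remainder
        cerr<<i<<" : " <<token<<endl;
        if (pos == string::npos) break; //there's no more tab in the line
        line = line.substr(pos + 1);
  }
}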

///// a simpler method:
istringstream lineStream("denmark sweden   india us"); //consecutive spaces are Not treated as one
string outputToken;
int main(){
  while (
    getline(lineStream, outputToken, ' ')) // <-- the only thing to remember
        cout << outputToken << endl;
}

text parsing with stringstream

 

#include <ctime>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>
using namespace std;

string _input =
"1 reverse this line\n"
"2 reverse 2nd line\n"
"3 sort gamma alpha";
istringstream entireInputStreamWithTabs(_input); //pretend to be raw input

/* With fixed column types, this parser is more strict. More professional in an interview.
Also supports DateTime parsing. See https://bintanvictor.wordpress.com/2017/03/16/cpp-parse-datetime-string-without-boost/
*/
void extractionTokenParser(string const & fullLine) {
 istringstream ss1line(fullLine);
 int lineNum;
 if (!(ss1line >> lineNum)) return; // line doesn't start with a number
 cout << "line num = " << lineNum << endl;

 // testing the extraction itself avoids printing a stale token at EOL
 for (string tmp; ss1line >> tmp;) { //whitespace removed:)
   cout << tmp << endl;
 }
}
// --------- simpler alternative ---------
void DelimTokenParser(string const & fullLine) {
 istringstream ss1line(fullLine);
 // _input above has no tabs, so each line comes out as one token
 for (string tmp; getline(ss1line, tmp, '\t');) { // loop ends at EOL
   cout << tmp << endl;
 }
}
void parseUsingStringStream() {
 string fullLine;
 //cout << _input << endl;
 while (getline(entireInputStreamWithTabs, fullLine)) { // loop ends at EOF
   //cout << fullLine << endl;
   // you can now search or modify the string
   extractionTokenParser(fullLine);
   DelimTokenParser(fullLine);
 }
}

int main(){
 parseUsingStringStream();
 return 0;
}

c++parse DateTime using stringstream #no boost

This is the simplest way I have found.

#include <ctime>
#include <iomanip>
#include <iostream>
#include <sstream>
using namespace std;

//without Boost, parsing string to DateTime and back
// from http://arsenmk.blogspot.sg/2014/07/converting-string-to-datetime-and-vice.html
int main(){
 stringstream ss{ "1970-01-01 8:00:01" };
 tm simpleStruct = {}; //zero-initialized placeholder on stack
 //parse and output to the placeholder
 ss >> get_time(&simpleStruct, "%Y-%m-%d %H:%M:%S");
 if (ss.fail()) {
   cout << "parsing failed. (Very strict.)" << endl;
   return -1;
 }
 simpleStruct.tm_isdst = -1; //let mktime() work out daylight saving
 time_t secSinceEpoch = mktime(&simpleStruct); //interprets the struct as local time
 if (secSinceEpoch < 0) {
   cout << "mktime() failed, or the local time is before Epoch: " << secSinceEpoch << endl;
   return -1;
 }
 cout << secSinceEpoch <<" seconds since Epoch (1970/1/1 midnight GMT) is -> ";
 cout << asctime(localtime(&secSinceEpoch));
}

std::string is usable in MSVS but can’t cout +! #include {string}

Many STL headers in Visual C++ (including iostream header) pull in a definition of the std::basic_string class (because they indirectly include the implementation-defined <xstring> header (never include that directly)). While that allows you to use the string class, the relevant operator<< is defined in the <string> header itself, so you must include that manually.

##string utilities in c++std lib

Needed for coding IV and GTD

  1. stringstream
  2. cin
  3. vector<char> for sorting and lower_bound
  4. list<char> for splice? could be slow but who cares in a quick coding test? Simpler than swap
  5. iomanip for output to stringstream
  6. vector<std::string>, set<std::string>
  7. getline on stringstream or cin
  8. std::string methods. Some are powerful but unfamiliar
    1. find_last_not_of()
  9. c_str and char array? seldom needed nowadays

string,debugging+other tips:[[moving from c to c++]]

[[moving from c to c++]] is fairly practical. Not full of good-on-paper “best practice” advice.

P132 don’t (and why) put “using” in header files
P133 nested struct
P129 varargs suppressing arg checking
P162 a practical custom Stack class non-template
P167 just when we could hit “missing default ctor” error. It’s a bit complicated.

–P102 offers practical tips on c++ debugging

* macro DEBUG flag can be set in #define and also … on the compiler command line
* frequently people (me included) don’t want to recompile a large codebase just to add DEBUG flag. This book shows simple techniques to turn on/off run-time debug flags
* perl dumper receives a variable $abc and dumps the value of $abc and also ….. the VARIABLE NAME “abc”. C has a similar feature via the preprocessor stringize operator “#”
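A minimal sketch of the stringize trick in the last bullet. The macro DUMP and the function dumpImpl are hypothetical names, not taken from the book.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: formats "name = value" for any streamable type.
template <typename T>
std::string dumpImpl(const char* name, const T& value) {
  std::ostringstream oss;
  oss << name << " = " << value;
  return oss.str();
}

// #var turns the macro argument's source text into a string literal,
// so the variable NAME travels along with its value.
#define DUMP(var) dumpImpl(#var, (var))
```

Given int abc = 42, DUMP(abc) produces the string "abc = 42", which can then be sent to cerr or a log.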

— chapter on the standard string class — practical, good for coding IV

* ways to initialize

* substring

* append

* insert

StringBuilder ported to c++ @@

ostringstream. See [[essential c++]] P202

ostringstream can convert “everything” to string.
ostringstream can concat.
ostringstream can produce string or c_str.

Practical and Useful for coding tests.

As you can guess, the opposite, istringstream class, can extract and parse from string to many data types.
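A short sketch of both directions. The function names describe and parseOrder, and the order-like fields, are made up for illustration.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: concatenate mixed types into one string, StringBuilder-style.
std::string describe(const std::string& name, int qty, double price) {
  std::ostringstream oss;
  oss << name << " x" << qty << " @ " << price;
  return oss.str();   // keep the returned string alive before calling c_str() on it
}

// Hypothetical helper: the opposite direction, parsing typed fields out of a string.
// Returns false if any field fails to parse.
bool parseOrder(const std::string& text, std::string& name, int& qty, double& price) {
  std::istringstream iss(text);
  return static_cast<bool>(iss >> name >> qty >> price);
}
```

In a coding test, the bool returned by the extraction chain is usually all the error handling needed.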

homemade ref-counted string – implementation notes

([[ nitty gritty ]] P202 has simple sample code)

An option exchange interviewer asked me to outline a ref-counting string class “str”.

char* cstr; //field points to a null-terminated char array allocated on heap.
int* counter; // field points to an int allocated on heap, shared among sister instances.

Now forget about ctor and big 3, and focus on simple, common client operations AFTER instantiation. Now I realize we need to recall how a string variable is USED.

int length() const;
const char* c_str() const; // STL string offers this conversion method, so do we.
char* substr(….) const;
//operator << to print the string
str operator+(str const & rhs) const; // produce a new str object by concatenation. Probably follow the effC++ advice to return by value rather than by reference.

Now the big questions

Q: does copy ctor allocate the cstr or the counter, or share them with sister instances?
%%A: share

Q: does conversion ctor from a C string allocate this->cstr and this->counter?
%%A: allocate

Q: how do we create another str variable sharing an existing cstr object?
%%A: copy ctor or assignment
A: cvctor
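A rough sketch of how the answers above could fit together, under my own assumptions: the counter is a heap int shared via pointer, there is no thread-safety, and there is no copy-on-write mutation. refCount() is a hypothetical accessor added only to make the sharing observable.

```cpp
#include <cstddef>
#include <cstring>

// Sketch of a ref-counted string; not thread-safe, not exception-safe.
class str {
  char* cstr;    // null-terminated, on heap, shared among copies
  int* counter;  // shared reference count, on heap

  void release() {  // drop one reference; free the buffers on the last one
    if (--*counter == 0) { delete[] cstr; delete counter; }
  }
public:
  str(const char* s = "")                       // conversion ctor: allocates both
    : cstr(new char[std::strlen(s) + 1]), counter(new int(1)) {
    std::strcpy(cstr, s);                       // s is assumed non-null
  }
  str(const str& rhs)                           // copy ctor: shares both
    : cstr(rhs.cstr), counter(rhs.counter) { ++*counter; }
  str& operator=(const str& rhs) {              // assignment: share rhs, drop old
    if (cstr != rhs.cstr) {                     // also covers self-assignment
      ++*rhs.counter;
      release();
      cstr = rhs.cstr; counter = rhs.counter;
    }
    return *this;
  }
  ~str() { release(); }

  std::size_t length() const { return std::strlen(cstr); }
  const char* c_str() const { return cstr; }
  int refCount() const { return *counter; }     // for illustration only
  str operator+(const str& rhs) const {         // returns a new object by value
    std::size_t n = length();
    char* buf = new char[n + rhs.length() + 1];
    std::strcpy(buf, cstr);
    std::strcpy(buf + n, rhs.cstr);
    str result(buf);                            // allocates its own copy
    delete[] buf;
    return result;
  }
};
```

Copying or assigning only bumps the shared counter, while the conversion ctor and operator+ each start a fresh buffer with a count of 1.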

getline(): how many kinds]C++std library

See also https://bintanvictor.wordpress.com/2017/03/17/cparsing-simple-text-experience/

(Note All the getline() functions below are designed to return text from a File or cin — actually a one-way stream. Inapplicable to sockets.)

1) Most documentation mentions the older c-str-based istream::getline() http://www.cplusplus.com/reference/istream/istream/getline/. However,

2) http://www.cplusplus.com/reference/string/getline/ is a std::getline(). Unlike the c-str-based istream::getline() method, this std::string version is implemented as a free function instead of a member function in the istream class.

This version is more modern. C++11 added new features to this free function, but didn’t bother with (1)

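I feel cin can emulate it with while(cin >> str3){…}, though operator>> splits on any whitespace rather than on line breaks. (Testing !cin.eof() before the extraction is unreliable, since eofbit is only set after a read hits end of input.) Of course, cin can also parse numbers!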

Differences between these 2 getline() functions:
– One uses c-str; the other uses std::string class.
– One is a member function; the other a free function

Similarities —
– both require streams, therefore unusable in C.
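A side-by-side sketch of the two C++ versions, reading from an istringstream so it is self-contained. The helper names and the 64-byte buffer size are arbitrary choices for illustration.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: read the first line with the member function istream::getline().
// The caller-sized char buffer caps the line length (here 63 chars plus the terminator).
std::string firstLineViaMember(const std::string& text) {
  std::istringstream in(text);
  char buf[64];
  in.getline(buf, sizeof buf);   // sets failbit if the line doesn't fit
  return std::string(buf);
}

// Hypothetical helper: read the first line with the free function std::getline().
// The std::string grows as needed, so there is no length cap to manage.
std::string firstLineViaFree(const std::string& text) {
  std::istringstream in(text);
  std::string line;
  std::getline(in, line);
  return line;
}
```

The buffer cap is the practical pitfall of the member function: an over-long line is truncated and leaves the stream in a failed state until clear() is called.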

http://www.learncpp.com/cpp-tutorial/132-input-with-istream/ discusses both.

There’s an IKM question on the (older) member function (1).

3) http://www.gnu.org/software/libc/manual/html_node/Line-Input.html describes an old C library function getline(), originally a GNU extension and later standardized in POSIX.1-2008 (it is not part of ANSI C). It doesn’t use the std::string class or the std::istream class. http://crasseux.com/books/ctutorial/getline.html is a short and sharp tutorial on it.

——- philosophically ——-
We can’t assume the most “everyday” programming tasks in a major language are always well-covered online — completely covered, with the confusing details pointed out. Reading a text file is basic but supported by a confusing bunch of alternatives, all named getline().

http://www.augustcouncil.com/~tgibson/tutorial/iotips.html does describe pitfalls of istream::getline()

## std::string cheatsheet

(c-string is fairly popular and more widespread)

An experienced java developer usually memorizes 10 methods of String.java. ditto for std::string. (See the STL book or the Absolute c++ book.)

  • insert() has many overloads
  • push_back() a single char
  • append() and operator+=() can even take in a single char, or a repetition of a char
  • operator+() to concat
  • substr() returns a COPY
  • find(a char or cStr or std::string)
  • find_first_of(a collection of candidate chars, passed in as a string),
    • find_first_of(char) is same as find(char), according to my test
  • find_last_not_of() //trim
  • at() can read-WRITE a single element.
    • operator[] is faster, but without range check, as with std::vector!
  • front() back() can read-WRITE the character
  • transform(word.begin(), word.end(), word.begin(), ::tolower)
  • std::string trailing space trim
  • c++split string on custom delimiter char
  • begin(), rbegin(), end(), rend()
  • replace() has many overloads
  • to_string(int_or_float) // c++11
  • clear()
  • empty()
  • myStr.capacity() // like vector

…. see http://www.cplusplus.com/reference/string/string/
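A sketch of two cheatsheet items, the trailing-space trim and the lower-casing transform. The names rtrim and toLower are hypothetical. The lambda replaces the bare ::tolower from the list above only to avoid passing negative char values to tolower().

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: trim trailing whitespace using find_last_not_of().
std::string rtrim(std::string s) {
  std::string::size_type last = s.find_last_not_of(" \t\r\n");
  if (last == std::string::npos) s.clear();  // all whitespace, or empty
  else s.erase(last + 1);                    // drop everything after the last kept char
  return s;
}

// Hypothetical helper: lower-case a copy using std::transform().
std::string toLower(std::string word) {
  std::transform(word.begin(), word.end(), word.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return word;
}
```

Both helpers take the string by value, so the caller's original is left untouched.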