- memory fence c++
- memory barrier c++
- c++ thread memory visibility
You can search for these keywords on Google; hundreds of answers agree on the core fact: without synchronization, a write to sharedMutableObject1 by thread A at Time 1 is not guaranteed to be visible to thread B at Time 2.
Even in an aggressively multithreaded program, there are usually very few shared mutable objects. If there are none by design, then every thread can operate in single-threaded mode, as if no other thread existed.
In single-threaded mode (the default), compilers do not generate machine code that forces every write out of registers and through the register/L1/L2/L3 hierarchy to main memory. Forcing that visibility with a memory barrier/fence is extremely expensive: main memory is at least 100 times slower than register access.
I would hypothesize that by default, the most recent write (at Time 1) may live in a register only, not even in L1 cache, because at compile time the compiler cannot know whether the same thread will soon write to that same address again. If the object is updated again shortly, it is wasteful to flush the intermediate, temporary values to L1 cache, since no other thread needs to see them.
L1 cache is about 10 times slower than register.
Multi-threaded lock-free programming always assumes multiple threads access shared mutable objects.
Even a lock-free function without contention (i.e. no retry needed, since other threads rarely touch sharedMutableObject1 concurrently) requires memory barriers and is therefore slower than single-threaded mode. I would say that in a low-contention context, the main performance gain of single-threaded over lock-free code is data-cache efficiency; another gain is the compiler's freedom to reorder statements.