I guess xchg is used in some basic concurrency constructs, but I need to research more.
See my blog post "hardware mutex, based on XCHG instruction".
https://c9x.me/x86/html/file_module_x86_id_328.html points out that the CPU performs implicit locking whenever one of the two operands is a memory location, even without an explicit LOCK prefix. The other operand must be a register.
According to https://stackoverflow.com/questions/50102342/how-does-xchg-work-in-intel-assembly-language, this implicit locking is quite expensive, though presumably still cheaper than user-level locking.
This implicit locking also acts as a memory fence: loads and stores are not reordered across the locked XCHG.
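
To make the "basic concurrency construct" idea concrete, here is a minimal sketch of a spinlock built on an atomic exchange, written with C11 atomics. The names (xchg_lock, xchg_trylock, xchg_acquire, xchg_unlock) are made up for illustration and do not come from the pages linked above. On x86, GCC and Clang typically compile atomic_exchange to an XCHG with a memory operand, but that is a property of the compiler and target, not something the C standard promises.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spinlock type for illustration only. */
typedef struct {
    atomic_bool held;   /* false = free, true = held */
} xchg_lock;

static inline void xchg_lock_init(xchg_lock *l) {
    atomic_init(&l->held, false);
}

/* Atomically write true and read back the previous value in one step.
   If the previous value was false, the lock was free and is now ours. */
static inline bool xchg_trylock(xchg_lock *l) {
    return !atomic_exchange_explicit(&l->held, true, memory_order_acquire);
}

/* Spin until the exchange reports that the lock was free. */
static inline void xchg_acquire(xchg_lock *l) {
    while (!xchg_trylock(l)) {
        /* busy-wait; a real lock would back off or yield here */
    }
}

/* Releasing needs only a plain atomic store with release ordering. */
static inline void xchg_unlock(xchg_lock *l) {
    atomic_store_explicit(&l->held, false, memory_order_release);
}
```

The exchange is what makes this work: reading the old value and writing the new one happen as a single indivisible step, so two threads can never both see "free" for the same acquisition. The unlock path does not need an exchange, because only the holder writes to the flag at that point.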