cpu affinity, basics

sched_setaffinity() is a Linux syscall, probably close to the hardware. The other *affinity*() functions, such as pthread_setaffinity_np(), are built on top of it.

*_np in function name means non-portable.

Q: is there a cpu instruction for affinity?

Need to read more. Similar to socket QQ — Not deep, but even more academic than socket
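A minimal sketch of setting and reading back an affinity mask, assuming Linux and Python 3. Python's os.sched_setaffinity() and os.sched_getaffinity() are thin wrappers over the syscalls of the same name; the helper name pin_to_one_cpu is made up for illustration.

```python
import os

def pin_to_one_cpu():
    """Pin the caller to one CPU, read the mask back, then restore the old mask."""
    if not hasattr(os, "sched_setaffinity"):
        return None  # these wrappers only exist on platforms that support them (Linux)
    original = os.sched_getaffinity(0)   # pid 0 means "the caller"
    target = min(original)               # pick a CPU we are already allowed to run on
    os.sched_setaffinity(0, {target})
    pinned = os.sched_getaffinity(0)
    os.sched_setaffinity(0, original)    # restore the original mask
    return target, pinned
```

On Linux the returned mask should contain exactly the one CPU that was requested.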

POSIX semaphores do grow beyond initial value #CSY

My friend Shanyou pointed out a key difference between c++ binary semaphore and a simple mutex — namely ownership. Posix Semaphore (binary or otherwise) has no ownership, because non-owner threads can create permits from thin air, then use them to enter the protected critical section. Analogy — NFL fans creating free tickets to enter the stadium.

https://stackoverflow.com/questions/7478684/how-to-initialise-a-binary-semaphore-in-c is one of the better online discussions.

  • ! to my surprise, if you initialize a posix semaphore to 1 permit, it can be incremented to 2. So it is not a strict binary semaphore.

https://github.com/tiger40490/repo1/blob/cpp1/cpp/thr/binarySem.cpp is my experiment using linux POSIX semaphore. Not sure about system-V semaphore. I now think a posix semaphore is always a multi-value counting semaphore with the current value indicating current permits available.

  • ! Initial value is NOT “max permits” but rather “initial permits”

I feel the restriction by semaphore is just advisory, like advisory file locking. If a program is written to subvert the restriction, it’s easy. Therefore, “binary semaphore” is binary IIF all threads follow protocol.

https://stackoverflow.com/questions/62814/difference-between-binary-semaphore-and-mutex claims “Mutex can be released only by thread that had acquired it, while you can signal semaphore from any non-owner thread (or process),” This does NOT mean a non-owner thread can release a toilet key owned by another thread/process. It means the non-owner thread can mistakenly create a 2nd key, in breach of exclusive, serialized access to the toilet. https://blog.feabhas.com/2009/09/mutex-vs-semaphores-–-part-1-semaphores/ is explicit saying that a thread can release the single toilet key without ever acquiring it.


[[DougLea]] P220 confirms that in some thread libraries such as pthreads, release() can increment a 0/1 binary semaphore value to beyond 1, destroying the mutual exclusion control.

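However, java.util.concurrent.Semaphore is no different: release() adds a permit even if the calling thread never acquired one (no ownership, no error thrown), so a java Semaphore(1) works as a mutex only if every thread follows the acquire-then-release protocol.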
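The "spare key" effect is easy to reproduce outside pthreads. The sketch below uses Python's threading module as a stand-in for a POSIX semaphore (an assumption: it is an analogy, not the sem_post() API itself). A plain Semaphore(1) accepts a release() that was never matched by an acquire(), whereas BoundedSemaphore treats the initial value as a ceiling.

```python
import threading

def permits_after_extra_release():
    """Release a 'binary' semaphore without acquiring it, then count the permits."""
    sem = threading.Semaphore(1)
    sem.release()                        # a permit created from thin air
    permits = 0
    while sem.acquire(blocking=False):   # drain every available permit
        permits += 1
    return permits

def bounded_rejects_extra_release():
    """BoundedSemaphore refuses to grow beyond its initial value."""
    sem = threading.BoundedSemaphore(1)
    try:
        sem.release()
    except ValueError:
        return True
    return False
```

The first function finds two permits, so two threads could be inside the "exclusive" critical section at once.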

POSIX countingSemaphore4IPC #Solaris

Exec Summary — I feel inter-thread coordination should use mutex+condVar, whereas semaphores are more useful in IPC.

https://docs.oracle.com/cd/E19120-01/open.solaris/816-5137/sync-11157/index.html points out a lesser-known difference in the Solaris context:

Because semaphores need not be acquired and be released by the same thread, semaphores can be used for asynchronous event notification, such as in signal handlers (but presumably not interrupt handlers). And, because semaphores contain state, semaphores can be used asynchronously without acquiring a mutex lock as is required by condition variables. However, semaphores are not as efficient as mutex locks.

The same page also shows POSIX countingSemaphore can be used IPC or between threads.
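"Semaphores contain state" can be illustrated with a small contrast, again using Python's threading module as an analogy rather than the Solaris API. A permit posted before anyone waits is remembered by the semaphore, while a condition-variable notification with no waiter is simply lost. The function names are made up for this sketch.

```python
import threading

def semaphore_remembers_early_signal():
    sem = threading.Semaphore(0)
    sem.release()                      # event posted before any waiter exists
    return sem.acquire(timeout=0.2)    # the stored permit is still there

def condition_loses_early_signal():
    cond = threading.Condition()
    with cond:
        cond.notify()                  # nobody is waiting, nothing is stored
    with cond:
        return cond.wait(timeout=0.2)  # times out because the notification is gone
```

This is why condition variables are always paired with a mutex-protected predicate, and a semaphore is not.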


thread^process: lesser-known differences #IV

Popular IV question. Largely a QQ question.  Some may consider it zbs.

To the kernel, there are many similarities between the “thread” construct and the “process” construct. In fact, a (non-kernel) thread is often referred to as a LightWeightProcess in kernels such as Solaris and Linux.

  • context switching — is faster between threads than between processes because no TLB flush needed. In linux, context switching between kernel-threads is even faster.
  • creation — some thread libraries can create threads without the kernel knowing. No such thing for a process.
  • socket — 2 threads in a process can access the same socket; two processes usually can’t access the same socket, unless … parent-child. See post on fork()
  • memory — thread AA can access all heap objects, and even Thread BB’s stack objects via pointers. Two processes can’t share these, except via shared memory.
  • a non-kernel thread can never exist without an owner process. In contrast, every process always has a parent process which could be long gone.
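The memory bullet can be sketched in a few lines, assuming a unix platform where os.fork() is available. A thread appends to the very same list object as its creator, while a forked child only modifies its own copy-on-write copy. The function name is hypothetical.

```python
import os
import threading

def thread_vs_fork():
    data = []
    worker = threading.Thread(target=data.append, args=("from-thread",))
    worker.start()
    worker.join()
    after_thread = list(data)          # the thread wrote into our own heap

    if hasattr(os, "fork"):
        pid = os.fork()
        if pid == 0:                   # child process: works on a private copy
            data.append("from-child")
            os._exit(0)
        os.waitpid(pid, 0)
    after_fork = list(data)            # parent never sees the child's append
    return after_thread, after_fork
```

Both snapshots contain only the thread's entry.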


blockingMutex implementation ] kernel!!userland

Background — Linux kernel provides two types of locks — spinlock and blocking mutex, as in https://www.kernel.org/doc/htmldocs/kernel-locking/locks.html . Here I focus on the mutex. I think this is far more useful to userland applications.

https://lwn.net/Articles/575460/ has good pointers:

  • I believe context switch is expensive, since CPU cache has to be replaced. Therefore, optimistic spin is beneficial.

https://github.com/torvalds/linux/blob/master/kernel/locking/mutex.c shows

  • a blocking mutex used in kernel, perhaps not directly used by userland apps
  • implemented using spin lock + some wait_lock
  • maintains a wait_list. Not visible to any userland app.
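As a rough userland analogy of "optimistic spin, then block" (not the kernel code, which as far as I understand only spins while the lock owner is running on another CPU), the sketch below tries a bounded number of non-blocking attempts before falling back to a blocking acquire. The function name and the spin count are assumptions.

```python
import threading

def acquire_spin_then_block(lock, spins=100):
    """Spin briefly in the hope the lock frees up, otherwise sleep on it."""
    for _ in range(spins):
        if lock.acquire(blocking=False):   # cheap attempt, no context switch
            return "spun"
    lock.acquire()                         # give up spinning and block
    return "blocked"
```

With an uncontended lock the first attempt succeeds, so the blocking path is never taken.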


## notable linux system calls: QQ question

https://syscalls.kernelgrok.com can sort the functions by function id

http://asm.sourceforge.net/syscall.html is ordered by function id

  • fork()
  • waitpid()
  • open() close() read() write()
  • –socket
  • socket() connect() accept()
  • recvfrom() sendto()
  • shutdown() is for socket shutdown and is more granular than the generic close()
  • select()
  • epoll family
  • –memory
  • brk
  • mmap


http://www.boost.org/doc/libs/1_65_0/doc/html/interprocess/sharedmemorybetweenprocesses.html#interprocess.sharedmemorybetweenprocesses.sharedmemory.xsi_shared_memory points out

  • Boost.Interprocess provides portable shared memory in terms of POSIX semantics. I think this is the simplest or default mode of Boost.Interprocess. (There are at least two other modes.)
  • Unlike POSIX shared memory segments, SysV shared memory segments are not identified by names but by ‘keys’. SysV shared memory mechanism is quite popular and portable, and it’s not based in file-mapping semantics, but it uses special system functions (shmget, shmat, shmdt, shmctl…).
  • We could say that memory-mapped files offer the same interprocess communication services as shared memory, with the addition of filesystem persistence. However, as the operating system has to synchronize the file contents with the memory contents, memory-mapped files are not as fast as shared memory. Therefore, I don’t see any market value in this knowledge.

POSIX^SysV semaphores

https://www.ibm.com/developerworks/library/l-semaphore/index.html — i have not read it.

My [[beginning linux programming]] book also touches on the differences.

I feel this is less important than the sharedMem topic.

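  • The posix semaphore (sem_init, sem_wait, sem_post in semaphore.h) is usually used alongside pthreads i.e. Posix Threads, though it was specified in the POSIX realtime extension rather than in pthreads proper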
  • The sysV semaphore is part of IPC and often mentioned along with sysV sharedMem

The counting semaphore is best known and easy to understand.

  • The pthreads semaphore can be used this way or as a binary semaphore.
  • The system V semaphore can be used this way or as a binary semaphore. See http://portal.unimap.edu.my/portal/page/portal30/Lecturer%20Notes/KEJURUTERAAN_KOMPUTER/SEM10809/EKT424_REAL_TIME_SYSTEM/LINUX_FOR_YOU/12_IPC_SEMAPHORE.PDF

Linux manpage pointed out: System V semaphores (semget(2), semop(2), etc.) are an older semaphore API. POSIX semaphores provide a simpler, and better designed interface than System V semaphores; on the other hand POSIX semaphores are less widely available (especially on older systems) than System V semaphores.

The same manpage implies both APIs use a _counting_ semaphore semantic, without notification semantics

posix sharedMem: key points { boost

http://www.boost.org/doc/libs/1_65_0/doc/html/interprocess/sharedmemorybetweenprocesses.html#interprocess.sharedmemorybetweenprocesses.sharedmemory.shared_memory_steps is excellent summary

* We (the app developer) need to pick a unique name for the shared memory region, managed by the kernel.

* we can use create_only, open_only or open_or_create

* When we link (or “attach” in sysV lingo) App1’s memory space to the shared memory region, the operating system looks for a big enough memory address range in App1’s address space and marks that address range as a special range. Changes in that address range are automatically seen by App2 that also has mapped the same shared memory object.

* As shared memory has kernel or filesystem persistence, we must explicitly destroy it.

Above is the posix mode. The sysV mode is somewhat different.
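The same steps (pick a name, create or open, map, explicitly destroy) can be sketched with Python's multiprocessing.shared_memory module, assuming Python 3.8+. This is not Boost.Interprocess; it is a comparable named-region API, and the region name below is made up. For brevity both handles live in one process, where normally the second would be opened by another process.

```python
import os
from multiprocessing import shared_memory

def shared_memory_roundtrip():
    name = f"demo_region_{os.getpid()}"          # hypothetical unique name
    owner = shared_memory.SharedMemory(name=name, create=True, size=64)  # like create_only
    try:
        peer = shared_memory.SharedMemory(name=name)                     # like open_only
        try:
            owner.buf[:5] = b"hello"             # write through one mapping
            seen = bytes(peer.buf[:5])           # visible through the other
        finally:
            peer.close()                         # unmap only
    finally:
        owner.close()
        owner.unlink()                           # explicit destroy, as the region outlives handles
    return seen
```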

breakdown heap/non-heap footprint@c++app #massif

After reading http://valgrind.org/docs/manual/ms-manual.html#ms-manual.not-measured, I was able to get massif to capture non-heap memory:

valgrind --tool=massif  --pages-as-heap=yes --massif-out-file=$massifOut .../xtap -c ....
ms_print $massifOut

Heap allocation functions such as malloc are built on top of system calls such as mmap, mremap, and brk. For example, when needed, an allocator will typically call mmap to allocate a large chunk of memory, and then hand over pieces of that memory chunk to the client program in response to calls to malloc et al. Massif directly measures only these higher-level malloc et al calls, not the lower-level system calls.

Furthermore, a client program may use these lower-level system calls directly to allocate memory. By default, Massif does not measure these. Nor does it measure the size of code, data and BSS segments. Therefore, the numbers reported by Massif may be significantly smaller than those reported by tools such as top that measure a program’s total size in memory.


jmp_buf/setjmp() basics for IV #ANSI-C

Q: how is longjmp different from goto? See http://ecomputernotes.com/what-is-c/function-a-pointer/what-is-the-difference-between-goto-and-longjmp-and-setjmp

A: longjmp can 1) jump across functions, and 2) restore program state from a jmp_buf, which was saved earlier by setjmp.

dotnet IPC implemented by eventvwr, perfmon

Though this post is about dotnet, it is likely to be also applicable to non-dotnet apps.

You can create/delete custom event logs. The 3 built-in event logs (app, security and sys) can’t be deleted. All of the above are viewable on eventvwr.exe

Except security log, all other event logs are writable – including app, sys and custom event logs. I feel this is a simple way to implement basic application logging in dotnet. The EntryWritten event firing feature looks neat.

Further, it looks like multiple (Local) processes can subscribe to the same EntryWritten event, so you could perhaps coordinate 2 processes using EntryWritten — IPC

You can even access remote event logs.

Perf counter is Another simple IPC mechanism. You can create custom perf counters, visible to other processes + perfmon.exe.

Since one process can update the counter and another process can see it in real time, it’s IPC.

I have yet to implement IPC this way.

share huge static memory chunk btw2processes

Based on P244 [[linux sys programming ]]

I am not quite sure about the use cases[1], but let’s say this huge file needs to be loaded into memory, by 2 processes. If readonly, then memory savings is possible by sharing the memory pages between the 2 processes. Basically, the 2 virtual address spaces map to the same physical pages.

Now suppose one of them — Process-A — needs to write to that memory. Copy-on-write takes place so Process-B isn’t affected. The write is “intercepted” by the kernel which transparently creates a “copy”, before committing the write to the new page.

fork() system call is another user of Copy-on-write technique.

[1] Use cases?

* perhaps a large library, where the binary code must be loaded into memory
* memory-mapped file perhaps

What code is machine/arch specific@@ syscalls, std lib …

A: (obviously) binary object code — including the unix/windows kernels. It runs on the bare hardware — must be machine specific.

A: Assembly language source code — must conform to the hardware instruction set. You can't run i386 assembly code on a PowerPC. Note each line of assembly source code translates to a line in machine code.

A: C compiler — implements the ABI [1]. The object code produced by the compiler is binary machine code, so the compiler itself must be architecture-specific.

A: Syscalls – are specific to the Architecture. The linux syscalls for i386 architecture include about 300 functions. 90% of them are universally available on all architectures but the rest are architecture-specific.

A: C standard library (like glibc) – provides wrappers over syscalls. Since a small number of syscalls are architecture-specific, the std lib is necessarily architecture-specific. However, the “contamination” stops here – I believe anything linked to the std lib is portable at source level. Therefore the std lib provides a “standard” API. Java portability is even better – where the same bytecode compiled on one architecture is usable on any other, assuming 100% pure java without native code.

[1] API vs ABI – explained in [[linux system programming]]

your compiler ^ the c++ standard

C++ has many compilers offering extensions.

Many developers write code based on the c++ standard. You could decide “if this operation fails, I would get this particular error according to the standard.”

Well, unless you are aiming for portability, you had better respect your local compiler more than the language standard. Your local compiler could add a “feature” and avoid raising that error. Side effect – you don't get “reminded” and you assume everything worked as expected – dangerous. There could be a compile-time warning but then it's often drowned in the noise. Poor signal/noise ratio.

My stance — know your compiler. Whenever the standard says some unusual condition should end in a certain way (or one of several ways), check your compiler's conformance.

It's also possible that the standard says this condition should return a null pointer or silently return, but a compiler adds a much-needed diagnostic feature to catch it. Sounds too good, but then the problem is, if you tweak some debug-build settings (or when you port your code to another compiler) you could lose the “reminders” that you depend on. Your code may appear to work in UAT but fail in production.

[[linux programmer’s toolbox]]

MALLOC_CHECK_ is a glibc env var
–debugger on optimized code

P558 Sometimes without compiler optimization performance is unacceptable.

To prevent optimizer removing your variables, mark them volatile.

An inline function may not appear in call stack. Consider “-fno-inline”

–P569 double-free may not show issues until the next free() or malloc()

–P470 – 472 sar command
can show per-process performance data

can monitor network devices

—P515 printf + macros for debugging

buffering behavior differs between terminal ^ log files

every async operation involves a sync call

I now feel just about every asynchronous interaction involves a pair of (often remote) threads. (Let’s give them simple names — The requester RR vs the provider PP). An async interaction goes through 2 phases —

Phase 1 — registration — RR registers “interest” with PP. When RR reaches out to PP, the call must be synchronous, i.e. Blocking. In other words, during registration RR thread blocks until registration completes. RR thread won’t return immediately if the registration takes a while.

If PP is remote, then I was told there’s usually a local proxy object living inside the RR Process. Registration against proxy is faster, implying the proxy schedules the actual, remote registration. Without the scheduling capability, proxy must complete the (potentially slow) remote registration on the RR thread, before the local registration call returns. How slow? If remote registration goes over a network or involves a busy database, it would take many milliseconds. Even though the details are my speculation, the conclusion is fairly clear — registration call must be synchronous, at least partially.

Even in Fire-and-forget mode, the registration can’t completely “forget”. What if the fire throws an exception at the last phase after the “forget” i.e. after the local call has returned?

Phase 2 — data delivery — PP delivers the data to an RR2 thread. RR2 thread must be at an “interruption point” — Boost::thread terminology. I was told RR2 could be the same RR thread in WCF.

stack frame has a pointer to the caller stack frame

Q: what data occupy the real estate of a single stack frame? Remember a stack frame is a chunk of memory of x bytes and each byte has a purpose.

A: (obviously) any “auto” variable declared (therefore allocated) in the function

A: (obviously) any formal parameter. If a 44-byte Account parameter is passed-by-value, then the 44-bytes are allocated in the stack frame. If by-reference, then only a 4-byte pointer allocated.

A: 4-byte pointer to caller's stack frame. Note that frame also contains a pointer to its own caller. Therefore, the stack frames form a linked list. This pointer is known as a “previous stack top”.

A: 4-byte ReturnAddress. When a function f4() returns, control passes back to the caller function f3(). Now at assembly level the caller function may be a stream of 20 instructions. Our f4() may be invoked on Instruction #8 or whatever. This information is saved in f4() stack frame under “ReturnAddress”. Upon return, this information is put into the “instruction pointer” register inside the CPU.
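The linked-list shape is visible even in an interpreter. The sketch below walks Python's own frame objects via f_back, which plays the role of the "pointer to caller's stack frame". These are interpreter frames, not the machine-level frames described above, so treat it as an analogy; f3 and f4 mirror the names used in the text.

```python
import inspect

def caller_chain():
    """Follow each frame's link to its caller and collect the function names."""
    names = []
    frame = inspect.currentframe()
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back           # pointer to the caller's frame
    return names

def f4():
    return caller_chain()

def f3():
    return f4()
```

Calling f3() yields a list that starts with caller_chain, f4, f3 and continues down to the module-level frame.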

operands to assembly instructions

I feel most operands are registers. See first operand in example below. That means we must load our 32 bits into the EAX register before the operation.

However, an operand can also refer directly to a memory location.

SUB EAX, [0x10050D49]

A third type of operand is a constant. You pass that constant from source code to compiler and it is embedded in the “object file”

glibc, an archetypical standard library (vs system calls)

Background — I was looking for a concrete example of a standard library.

Best example of a standard library is glibc, produced by GNU team. If you strip the G, it’s “libc” — THE C standard library.

“Standard” means this library is a “carpet” hiding all the platform-specific differences and presents a uniform interface to high-level application programmers — so-called App-Programmer-Interface or “API”.

There are many industry standards for this same purpose, such as POSIX, ANSI-C (which standardizes the C programming language + standard lib). Glibc supports all of these standards.

To clarify a common confusion, it’s worthwhile to understand this simple example — glibc functions (like printf) are implemented in platform-specific underlying syscalls.

Q: Exactly What are the platform differences? Short answer — system calls. Remember System calls call into the “hotel service desk”. These syscalls are tied to the processor and the operating system so they are by definition platform-specific. See other posts on syscall vs standard library.

what is kernel space (vs userland)

(sound-byte: system calls — kernel space; standard library functions — userland, often wrappers over syscalls)

Executive summary — kernel is special source code written by kernel developers, to run in special kernel mode.

Q: But what distinguishes kernel source code from application source code?
A: Kernel functions (like syscall functions) are written with special access to hardware devices. Kernel functions are the Gatekeepers to hardware, just like app developers write DAO classes as gatekeepers to a DB.

Q: Real examples of syscall source code?
A: glibc source code contains only the userland wrappers around syscalls. The syscall implementations themselves live in the kernel source tree, not in glibc. See P364[[GCC]]
A: kernel32.dll ?
A: tcp/ip is implemented in kernel.
A: I feel device drivers are just like kernel source code, though RAM/CPU tend to be considered the kernel of kernel.

My 2-liner definition of kernel: a kernel can be thought of as a bunch of (perhaps hundreds of) API functions known as “syscalls”. They internally call additional (10,000 to 100,000) internal functions. Together these 2 bodies of source code constitute a kernel. On an Intel platform, kernel and userland source code both compile to Intel instructions. At the individual instruction level, they are indistinguishable, but looking at the source code, you can tell which is kernel code.

There are really 2 distinct views (2 blind men describing an elephant) of a kernel. Let’s focus on run-time actions —
X) a kernel is seen as special runtime services in the form of syscalls, similar to guest calls to hotel service desk. I think this is the view of a C developer.
Y) behind-the-scene, secret stream of CPU instructions executed on the CPU, but not invoked by any userland app. Example — scheduler [4]

I don’t think a kernel is “a kind of daemon”. Such a description is misleading. Various “regular” daemons provide services. They call kernel functions to access hardware. If a daemon never interacts with user processes, then maybe it would live in “kernel space”. I guess kernel thread scheduler might be among them.

I feel it’s unwise (but not wrong) to think of kernel as a process. Kernel services are used by processes. I guess it’s possible for a process to live exclusively in “kernel space” and never interact with user processes. http://www.thehackademy.net/madchat/sysadm/kern/kern.bsd/the_freebsd_process_scheduler.pdf describes some kernel processes.

P241 [[Pro .net performance]] describes how something like func3 in kernel32.dll is loaded into a c# application’s code area. This dll and this func3 are treated similar to regular non-kernel libraries. In a unix C++ application, glibc is linked in just like any regular library. See also http://www.win.tue.nl/~aeb/linux/lk/lk-3.html

[4] Scheduler is one example of (Y) that’s so extremely prominent that everyone feels kernel is like a daemon.

The term “kernel space” is misleading — it is not a special part of memory. Things in kspace don’t run under a privileged user.

— call stack view —
Consider a c# P/Invoke function calling into kernel32.dll (some kernel func3). If you were to take a snapshot of an average thread stack, top of the stack would be functions written by app developers; middle of the stack are (standard) library functions; bottom of the stack are — if hardware is busy — unfinished kernel syscalls. Our func3 would be in the last 2 layers.

All stack frames below a kernel API are “kernel space”. These stack frames are internal functions within the kernel_code_base. Beneath all the stack frames is possibly hardware. Hardware is the ultimate low-level.

Look at the bottom-most frame, it might be a syscall. It might be called from java, python, or some code written in assembly. At runtime, we don’t care about the flavor of the source code. The object code loaded into the “text” section of the Process is always a stream of machine instructions, perhaps in the Intel or SPARC instruction set

ANY process under any user can call kernel API to access hardware. When people say kernel has special privileges, it means kernel codebase is written like your DAO.

IPC sockets^memory mapped files

Unix domain socket is also known as IPC socket.

Requirement — Large volume data sharing across jvm and other unix processes.

  1. sockets are well supported in java, c, perl, python but still require copying lots of data. I think only the unix domain socket is relevant here, not inet sockets.
  2. memory mapped files as a RandomAccessFile and MappedByteBuffer? Pure java solution — No JNI needed. I feel not so “popular”. c# and c++ also support it.
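For option 1, a minimal unix domain socket sketch in Python, assuming a unix platform where socket.socketpair() returns two connected AF_UNIX stream sockets. Every byte is copied into the kernel by sendall() and copied out again by recv(), which is the copying cost mentioned above. The function name is hypothetical.

```python
import socket

def unix_socketpair_echo(payload=b"x" * 4096):
    left, right = socket.socketpair()      # connected pair of AF_UNIX sockets
    with left, right:
        left.sendall(payload)              # copy 1: user space to kernel
        received = bytearray()
        while len(received) < len(payload):
            chunk = right.recv(65536)      # copy 2: kernel to user space
            if not chunk:
                break
            received += chunk
    return bytes(received)
```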

[11]Cantor/eSpeed c++ #manyQ

Mostly knowledge-based QQ questions. I need to be selective what to study.

Q1: The reverse is common but when do we provide c wrapper around c++ classes?
A: internal implementation is c++ but clients need C API

Q1b: can c call c++ # <—— very common question
%%A: at compile time, c++ code may be uncompilable by a c compiler. CFRONT converts c++ code into c source code until exceptions were added to the c++ language. At run time, if I have object code precompiled from c++, can c call it? I think it’s often possible.
A: check out the C Application Binary Interface. If your C and C++ code are interoperable, it is because your C and C++ compilers also conform to the same ABI.
A: yes that’s what extern “C” is for.

Q: tool for binary code dependency? (I guess the dependencies are dynamic libraries, rather than static libs linked in.)
AA: ldd or objdump. See http://ask.xmodulo.com/check-library-dependency-program-process-linux.html

Q9: call java from c++, what’re the c++ function(s)?
A: JNI_CreateJavaVM(), CallStaticVoidMethod(), CallVoidMethod() etc

Q9b: How do I access an int field of a java class?
%%A: if I have a pointer to the java object, I should be able to directly access the int field

Q: how do u implement a singleton

Q: semaphore(1) == mutex?
%%A: no since semaphore in most implementations uses a mutex internally. Functionally they are identical
%%A: semaphore is often cross process. Mutex can also be used on shared memory.

Q: for a const method getAge(), can it return a ptr to a non-const int which is a field?
%%A: i have seen sample code, so probably compilable
A: tricky. P215 effSTL says that inside a const method, all non-static fields become const.

Q3a: what’s the return type of operator*()?
%%A: can be anything, just like operator[], but not void not empty

Q3b: How about qq[ operator T*() ] ?
A(now): is it the OOC to a ptr? Correct. See http://en.cppreference.com/w/cpp/language/cast_operator

Q: if Base class has a virtual method f(int), and Derived class has an overload virtual method f(float, char), is the Base f() accessible via a D object?
%%A: the D f() hides the base f(), but base f() may still be accessible using Base::f()

Q: 1 process, 30 threads vs 10 processes * 3 threads each? Assuming there’s data sharing among the threads
%%A: avoid ITC and serialization
%%A (Now): I feel single-process has a single point of failure without redundancy.
A: depends on the IPC cost. If non-trivial cost then favor 30-threads.

Q: how do u change the default fair-share scheduling among threads? #<— uncommon
%%A: I can make some threads sleep or wait. I can control thread-lib-level priorities, but the kernel thread priority levels could be very coarse
%%A(now): if userland thread, then scheduling is done in the thrd lib.

Q: unix performance monitoring?
%%A: jconsole; top; vmstat; perfmeter


Q: Dynamic loadable library. How do we configure the system to load a new version of the library?
%%A: just overwrite the existing *.so file??

Q: how to convert a ptr-to-non-const to a ptr-to-const?
AA: just assign it. The reverse assignment needs const_cast. Tested in g++. I feel this is an obscure technicality.

— communication layer
Q: how do I see syscalls and sockets open by a process?
%%A: truss; lsof; snoop/tcpdump; netstat

Q: practical use of signals?
%%A: kill, core dump, thread dump, suspend,..
A: I used q(trap) to install my signal handler in my bash

Q: what are named pipe? #<—– uncommon
%%A: I think they exist in the file system as fake files. “cat | more” uses an unnamed pipe I believe.
A: now I think I should just say “never used it”.

pipe, stream, pseudo file, tcp-socket#xLang

* Java has a family of IO classes;
* c++ has a family of stream classes derived from class ios;
* unix has various file-like things.

“pipe” and “stream” are 2 general terminologies for an input/output destination with sequential[1] access. A file is one subtype of stream. Therefore java and c++ access files using their family of classes above.

[1] c++ gives each sequential stream a get-ptr and a put-ptr.
A named pipe is an entity in the file system….

TCP (not UDP) socket is a stream-oriented protocol. Sockets, either unix domain or TCP (not udp), are also like a pipe/stream. In fact, Unix treats any socket as a file so you can read and write to it like a file. Perl liberally “confuses” file handles and sockets, just as C uses the same type of file-descriptor-number for files and sockets.

TCP (not UDP) is ordered — if two messages are sent over a connection in sequence, the first message will reach the receiving application first. When data segments arrive in the wrong order, TCP buffers the out-of-order data until all data can be properly re-ordered and delivered to the application.

Python wraps the socket descriptor number in a socket object, but you can still get the underlying descriptor number by mySocket.fileno()–

import socket
mySocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print(mySocket.fileno())  # the small integer file descriptor behind the socket object
mySocket.close()

named pipe simple eg #unix socket

$ mkfifo --mode=0666 /tmp/myfifo # create a named pipe, with a file name

$ cat /etc/passwd > /tmp/myfifo # writer will block until some reader comes up on the receiving end.

Now open another terminal

$ cat < /tmp/myfifo # or

$ tail -f /tmp/myfifo

http://developers.sun.com/solaris/articles/named_pipes.html shows pros and cons. http://www.linuxjournal.com/article/2156 is a simple tutorial.

Motivation? Allow totally unrelated programs to communicate with each other

A side note to be elaborated in another blog: a named pipe is a FIFO stream, whereas a unix domain socket can be datagram or stream (like TCP)
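The same writer/reader exchange can be sketched in Python with os.mkfifo(), assuming a unix platform. A thread plays the role of the second terminal; as in the shell session, the writer's open() blocks until a reader opens the other end. The function name and the temporary path are made up.

```python
import os
import tempfile
import threading

def fifo_roundtrip(message=b"hello via fifo\n"):
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "myfifo")
    os.mkfifo(path, 0o666)                 # same effect as the mkfifo command

    def writer():
        with open(path, "wb") as out:      # blocks until a reader shows up
            out.write(message)

    thread = threading.Thread(target=writer)
    thread.start()
    try:
        with open(path, "rb") as src:
            data = src.read()              # returns at EOF, when the writer closes
        thread.join()
    finally:
        os.remove(path)
        os.rmdir(workdir)
    return data
```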

syscall^standard library functions (glibc)

(See also the syscalls in glibc in [[GCC]] )

C and assembly programs both compile to objectCode — platform-specific

* both make syscalls
* both “translate” to the same platform-dependent object code
** C is normally compiled straight through to object code, so you never see human-readable assembly unless you ask for it (e.g. gcc -S).

* I guess part of the object code is the binary version of assembly code and consists of instructions defined by the processor chip, but a lot of heavy lifting is delegated to syscalls.
* object code like a.out is executable. I think object code file loads into memory as is.

I believe a standard library is in C, whereas system calls are available in C, assembly and any language chosen by the kernel developers.

In C source code you make (synchronous) syscalls like open(), write(), brk() and _exit(), usually through thin glibc wrappers (fopen() and exit() are library functions layered on top). To my surprise, in an equivalent Assembly source code, you invoke exactly the same syscalls, though invoking by syscall ID.

Think about it — the kernel is the gate keeper and guardian angel over all hardware resources, so there’s no way for assembly program to bypass the kernel when it needs to create socket or print to console.

– C standard library (glibc?) exists on each platform. “printf()” is a standard library function, and translates to platform specific syscalls like write(). Since your C source code only calls printf(), it is portable.
– In contrast, your assembly program source code makes syscalls directly and is always platform specific. No standard library.

—-Differentiate between standard library functions (manpage Section 3) like printf() vs syscalls (manpage Section 2) like write()?  Look at the names —
system call: a call to the hotel-service-desk, which is the operating system kernel. In fact, system calls are written by kernel developers and run in kernel space. See my post on kernel space
standard library calls: standardized across platforms. They run in userland.

Many standard library functions are thin wrappers over system calls, making them hard to differentiate. Someone said online “It’s often difficult to determine what is a library routine (e.g printf()), and what is a system call (e.g sleep()). They are used in the same way and the only way to tell is to remember which is which.”
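One observable difference is userland buffering. In the sketch below (Python on a unix platform; the buffered file object stands in for a stdio FILE*, which is an analogy rather than glibc itself) data written through the library layer does not reach the kernel until flush(), whereas os.read() and os.write() go straight to the syscalls.

```python
import os
import select

def buffered_vs_direct():
    r, w = os.pipe()
    out = os.fdopen(w, "wb")               # buffered writer, lives in userland
    out.write(b"hi")                       # sits in the library buffer for now
    ready_before_flush = bool(select.select([r], [], [], 0)[0])
    out.flush()                            # this is where write(2) finally happens
    ready_after_flush = bool(select.select([r], [], [], 0)[0])
    data = os.read(r, 16)                  # direct syscall wrapper, no buffering
    out.close()
    os.close(r)
    return ready_before_flush, ready_after_flush, data
```

The pipe has nothing to read before the flush and two bytes after it.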

## IPC solutions: full list

I get this interview question repeatedly. See the chapters of [[programming perl]] and [[c++ threading]]. Here is my impromptu answer, not backed by any serious research. (Perl and c are similar; java is different.)

Let’s first focus on large volume data flow IPC. As of 2012, fastest and most popular IPC is shared memory, followed by named pipe. Across machines, you would have to accept the lower throughput of UDP (or worse still TCP) sockets. However, even faster than shared memory is a single-process data flow — eliminating IPC overhead. I feel you can use either heap or global area. Therefore, the throughput ranking is

# single-process
# shared mem
# named pipe or unix domain sockets

— here’s a fuller list

  • MOM — dominant async solution in high volume, high-reliability systems
  • shared mem — extremely popular in MOM and ETL
  • named pipes and nameless “|” pipes
  • unix-domain sockets;
  • sysV msg queues. Not popular nowadays
  • shared files, ftp, DB — wide spread
  • memory mapped file
  • web service — dominant in cross platform systems
  • signals
  • RMI — java only
  • email

in/out params in std lib – cf select() syscall

Now I feel pure-output and in/out parameters are rather common in C standard library. Here’s just one of many examples.

http://beej.us/guide/bgnet/output/html/multipage/recvman.html shows the last parameter “FROMLEN” is an in/out. Function recvfrom() will fill up FROMLEN with the size of struct sockaddr. (You must also pre-initialize fromlen to be the size of FROM or struct sockaddr.)

The 2nd last parameter is a pure-output parameter.
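For contrast, higher-level languages tend to turn those output parameters into return values. A small sketch with Python's socket module over loopback UDP (the function name is hypothetical): recvfrom() returns the data together with the sender's address, so there is no FROM buffer or FROMLEN to pre-initialize.

```python
import socket

def recvfrom_demo():
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))            # port 0: let the OS choose
        receiver.settimeout(2.0)
        sender.sendto(b"ping", receiver.getsockname())
        data, from_addr = receiver.recvfrom(1024)  # the "output params" come back as a tuple
        same_port = from_addr[1] == sender.getsockname()[1]
        return data, from_addr[0], same_port
```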

assembly language programming – a few tips

C compiler compiles into binary machine code. I think pascal too, but not java, dotnet, python.

Assembly-language-source-code vs machine-code — 1-1 mapping. Two representations of the exact same program.

Assembly-language-source-code is by definition platform-specific, not portable.

A simple addition in C compiles to about 3 “Instructions” in machine code, but since machine code consists of hex numbers and unreadable, people represent those instructions by readable assembly source code.

Compared to a non-virtual, a virtual function call translates to x additional instructions. X is probably below 10.

There are many “languages” out there.
* C/c++ remain the most essential language. Each line of source code converts to a few machine instructions. Source code is portable without modification whereas the compiled machine code isn’t.
* Assembly is often called a “language” because the source code is human readable. However a unique feature is, each line of Assembly-language-source-code maps to exactly one line of machine code.
* newer languages (c# java etc) produce bytecode, not machine code.

implicitly stateful library function – malloc(), strtok

( See also beginning of [[Pthreads programming]] )
Most library functions are stateless — consider Math.

Most stateful library calls require manager object instantiation. This object is stateful. In java, some notable stateful language-level libraries include
– Calendar
– Class.forName() that automatically registers a JDBC driver

If there’s a syscall like setSystemTime(), it would be stateless because the C library doesn’t hold the state, which is held in the OS.

In C/C++, the most interesting implicitly stateful library routine is malloc(). Invisible to you, the freelist is maintained not in any application object, nor in the OS, but in the C library itself. Malloc() is like an airline booking agent or IPO underwriter. It gets a large block and then divvies it up to your program. See the diagram on P188 [[C pointers and mem mgmt]]

The freelist keeper is known as “mem mgmt routine” in the “standard C-library”.

Another common stateful stdlib function is strtok(). It’s not a pure function. It remembers the “scanner position” from last call! The thread-safe version is strtok_r()
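To see why the hidden state matters, here is a toy Python imitation of strtok() (an illustration only, not the glibc implementation). The module-level variable plays the role of the remembered scanner position, and starting a second tokenization clobbers the first one.

```python
_saved = None  # hidden state shared by every caller, like strtok's static pointer

def strtok(text, delims):
    """Return the next token; pass text=None to continue the previous scan."""
    global _saved
    if text is not None:
        _saved = text                      # a new scan overwrites the old position
    if not _saved:
        return None
    rest = _saved.lstrip(delims)           # skip leading delimiters
    if not rest:
        _saved = None
        return None
    for i, ch in enumerate(rest):
        if ch in delims:
            _saved = rest[i + 1:]          # remember where to resume
            return rest[:i]
    _saved = None
    return rest
```

After strtok("a,b", ",") returns "a", an unrelated strtok("x y", " ") returns "x", and the next strtok(None, ",") returns "y" rather than "b". That interference is what strtok_r() avoids by making the caller hold the position.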