3overhead@creating a java stackframe]jvm #DQH

  • additional assembly instruction to prevent stack overflow… https://pangin.pro/posts/stack-overflow-handling mentions 3 “bang” instructions for each java method, except some small leaf methods
  • safepoint polling, just before popping the stackframe
  • (If the function call receives more than 6 arguments) the first 6 args go in registers and the remaining args go on the stack. Writing an arg to the stack takes more ‘mov’ instructions than passing it in a register. The subsequent read from the stack is likely an L1 cache hit, which is still slower than a register read.

spin lock in kernel: unavoidable

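Suppose you have 2 cores so 2 kernel threads can run simultaneously. If they are deadlocked, what would the cpu be doing? Nothing but spin. I believe there’s no concept of “blocking” in interrupt context: code there cannot sleep, so it can only spin. (Kernel code running in process context can sleep, e.g. on a wait queue.)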

Now suppose there’s another core, but the current process is waiting for I/O. What can this core do? Nothing but spin

Q: which thread/PID drains NicBuffer→socketBuffer

Too many kernel concepts here. I will use a phrasebook format. I have also separated some independent tips into hardware interrupt handler #phrasebook

  1. Scenario 1 : A single CPU. I start my parser which creates the multicast receiver socket but no data is coming. My process (PID 111) gets preempted on timer interrupt. CPU is running unrelated PID 222 when my data washes up on the NIC.
  2. Scenario 2: pid111 is running handleInput() while additional data comes in on the NIC.

Some key points about all scenarios:

  • context switching — There’s context switch to interrupt handler (i-handler). In all scenarios, the running process gets suspended to make way for the interrupt handler function. I-handler’s instruction address gets loaded into the cpu registers and this function starts “driving” the cpu. Traditionally, the handler function would use the suspended process’s existing stack.
    • After the i-handler completes, the suspended “current” process resumes by default. However, the handler may cause another pid333 to be scheduled right away [1 Chapter 4.1].
  • no pid — interrupt handler execution has no pid, though some authors say it runs on behalf of the suspended pid. I feel the suspended pid may be unrelated to the socket (pid222 in Scenario 1), rather than the socket’s owner process pid111.
  • kernel scheduler — In Scenario 1, pid111 would not get to process the data until it gets in the “driver’s seat” again. However, the interrupt handler could trigger a rescheduling and push pid111 “to the top of the queue” so to speak. [1 Chapter 4.1]
  • top-half — drains the tiny NIC ring-buffer into main memory (presumably socket buffer) as fast as possible [2] as NIC buffer can only hold a few packets — [[linux kernel]] P 629.
  • bottom-half — (i.e. deferrable functions) includes lengthy tasks like copying packets. Deferrable functions run in interrupt context [1 Chapter 4.8], under nobody’s pid
  • sleeping — the socket owner pid 111 would be technically “sleeping” in the socket’s wait queue initially. After the data is copied into the socket receive buffer (which lives in kernel space, not user space), I think the kernel scheduler would locate pid111 in the socket’s wait queue and make pid111 the cpu-driver. This pid111 would call read() on the socket.
    • wait queue — How the scheduler does it is non-trivial. See [1 Chapter]
  • burst — What if there’s a burst of multicast packets? The i-handler would hog or steal the driver’s seat and /drain/ the NIC ring-buffer as fast as possible, and populate the socket receive buffer. When the i-handler takes a break, our handleInput() would chip away at the socket buffer.
    • priority — is given to the NIC’s interrupt handler as NIC buffer is much smaller than socket buffer.
    • UDP could overrun the socket receive buffer; TCP uses transmission control to prevent it.

Q: What if the process scheduler is triggered to run (on timer interrupt) while i-handler is busy draining the NIC?
A: Well, all interrupt handlers can be interrupted, but I would doubt the process scheduler would suspend the NIC interrupt handler.

One friend said that while the i-handler runs on our single-CPU, the executing pid is 1, the kernel process. I doubt it.

[1] [[UnderstandingLinuxKernel, 3rd Edition]]

[2] https://notes.shichao.io/lkd/ch7/#top-halves-versus-bottom-halves

[15]1st deep dive@linker + comparison with compiler

mtv: I feel linker errors are common. Linker is less understood than pre-processor or compiler. This know-how is more practical than a lot of c++ topics like MI, templates, op-new … Most real veterans (not just bookworm generals) would deal with some linker errors and develop some insight. These errors can take a toll when your project is running late. My textbook knowledge isn’t enough to give me the insight needed.

I believe compiler produces object files; whereas linkers take in object or library files and produce library or executable files.

Q: can linker take in another linker’s output?

http://www.lurklurk.org/linkers/linkers.html seems to be more detailed, but I have yet to read it through.


The object file produced by the compiler contains the compiled code (in binary form) of the symbols defined in the input. Symbols in object files are referred to by name.

Object files can refer to symbols that are not defined. This is the case when you use a declaration, and don’t provide a definition for it. The compiler doesn’t mind this, and will happily produce the object file as long as the source code is well-formed.

(I guess the essence of linking is symbol resolution i.e. translating symbols to addresses) The linker links all the object files by replacing the references to undefined symbols with the correct addresses. Each of these symbols can be defined in other object files or in libraries.

During compilation, if the compiler could not find the definition for a particular function, it would just assume that the function was defined in another file. If this isn’t the case, there’s no way the compiler would know — it doesn’t look at the contents of more than one file at a time.

So what the compiler outputs is rough machine code that is not yet fully built, but is laid out so we know the size of everything, in other words so we can start to calculate where all of the absolute addresses will be located. The compiler also outputs a symbol table of name/offset pairs. Each symbol relates a name to an offset in the module’s machine code, i.e. the distance from the start of the module to the symbol’s location. That’s where we get to the linker. The linker first slaps all of these blocks of machine code together end to end and notes down where each one starts. Then it calculates the addresses to be fixed by adding together the relative offset within a module and the absolute position of the module in the bigger layout.
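
The "slap together, then add offsets" arithmetic can be sketched as a toy model in Python. Everything here is made up for illustration (module names, sizes, symbol offsets, the load address), and a real linker works on object-file sections and relocation records, which this sketch ignores.

```python
# Toy model: each "object file" has a code size, a symbol table of
# name -> offset within the module, and a list of names it only references.
# All values are hypothetical.
modules = [
    {"name": "main.o", "size": 0x40, "symbols": {"main": 0x00}, "undefined": ["helper"]},
    {"name": "util.o", "size": 0x30, "symbols": {"helper": 0x10, "table": 0x20}, "undefined": []},
]

def link(modules, load_address=0x1000):
    """Lay modules end to end, then resolve every symbol to an absolute address."""
    base = {}              # module name -> start address in the final layout
    cursor = load_address
    for m in modules:
        base[m["name"]] = cursor
        cursor += m["size"]

    resolved = {}          # symbol name -> absolute address
    for m in modules:
        for sym, offset in m["symbols"].items():
            resolved[sym] = base[m["name"]] + offset

    # A reference that no module defines is the classic linker error.
    for m in modules:
        for sym in m["undefined"]:
            if sym not in resolved:
                raise LookupError(f"undefined reference to '{sym}' in {m['name']}")
    return resolved

addresses = link(modules)
print({name: hex(addr) for name, addr in addresses.items()})
```

Dropping util.o from the list makes link() raise, which mirrors the "undefined reference" error that the compiler alone could never report, since it only ever sees one file at a time.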

linker error in java – example

[[mastering lambda]] points out one important scenario of java linker error. Can happen in java 1.4 or earlier. Here’s my recollection.

Say someone adds a method m1() to interface Collection.java. This new compiled code can coexist with lots of existing compiled code but there’s a hidden defect. Say someone else writes a consumer class using Collection.java, and calls m1() on it. This would compile in a project having the new Collection.java but no HashSet.java. Again, this looks fine on the surface. At run time, there must be a concrete class when m1() runs. Suppose it’s a HashSet compiled long ago. This would hit a linker error (AbstractMethodError, a subclass of LinkageError), since HashSet doesn’t implement m1().


## 10 unix signal scenarios

A signal can originate from outside the process or from within.

The precise meaning of signal generation requires a clear understanding of signal handlers. See P125 [[art of debugging]] and P280 [[linux sys programming]]

— External —
# (SIGKILL/SIGTERM) q(kill) commands
# (SIGINT) ctrl-C
# (SIGHUP) we can send this signal to Apache, to trigger a configuration reload.

— internal, i.e. some kernel code module “sends” this signal to the process committing the “crime” —
# (SIGFPE) divide by zero; arithmetic overflow,
# (SIGSEGV) memory access violation
# (SIGABRT) assertion failure can cause this signal to be generated
# (SIGTRAP) target process hitting a breakpoint. A debugger intercepts this signal; without a debugger attached, the default action terminates the process.
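
A minimal sketch of signal generation and handling, written in Python on the assumption of a Unix host and a script running in the main thread. The handler name on_signal and the list received are hypothetical; the process sends SIGTERM to itself, which stands in for an external q(kill) command.

```python
import os
import signal
import time

received = []  # signal numbers seen by our handler

def on_signal(signum, frame):
    # Called in the main thread once the signal has been delivered.
    received.append(signum)

# Install a handler for SIGTERM; without one, the default action ends the process.
previous = signal.signal(signal.SIGTERM, on_signal)

# Same mechanism as the kill command, except the sender is our own process.
os.kill(os.getpid(), signal.SIGTERM)

# Give the interpreter a moment to run the Python-level handler.
deadline = time.time() + 2.0
while not received and time.time() < deadline:
    time.sleep(0.01)

# Restore the earlier disposition (fall back to the default if it was unknown).
signal.signal(signal.SIGTERM, previous if previous is not None else signal.SIG_DFL)
print(received == [signal.SIGTERM])
```

SIGKILL cannot be caught this way; trying to install a handler for it fails, which is why it is the signal of last resort.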

pbref^pbclone DEFINED : 3QnA any lang, Part1

pbref includes 2 categories – pbcref (const-ref) and pbrref (regular ref). It’s best to update all posts …

  lhs_var =: rhs_var // pseudo code

Reading this assignment in ANY language, we face the same fundamental question –

Q1: copying object address (assume 32 bit) or are we cloning the object? Note input/output for a function call is a trickier form of assignment. In most languages, this is essentially identical to regular LHS/RHS assignments. [1]
Q2: “Is LHS memory pre-allocated (to be bulldozed) or yet to be allocated (malloc)?”
Q3: does the variable occupy a different storage than the referent/pointee object? See http://bigblog.tanbin.com/2011/10/rooted-vs-reseat-able-variables-c-c.html

With these questions, we will see 2 different types of assignment. I call them pbref (pass-by-reference) vs pbclone (pass-by-cloning, including op= and copy-ctor). These questions are part of my Lesson 1 in Python and C++. Here are some answers

java primitives – pbclone
java non-primitives – pbref
c# value types – pbclone
c# reference types – pbref
c++ nonref by default – pbclone
c++ nonref passing into function ref-param — pbref
c++ pointer – usually pbclone on the 32-bit pointer
c++ literal – pbclone?
c++ literal passing into function ref-param?
php func by default – pbclone
php func ref-param – pbref
php func ref-return – pbref
perl subroutine using my $newVar – pbclone
perl subroutine using $_[0] – pbref
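python immutables (string, tuple, literals, frozenset) – pbref. Since they can’t be modified in place, any “modification” rebinds the variable to a new object, which looks like copy-on-write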
** python primitive — no such thing. “myInt=3” is implemented same as string. See post on Immutable, initialize etc
python list, dict, set …- pbref
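
The Python rows of this list can be checked with a short sketch. The variable names are hypothetical; copy.copy stands in for pbclone, since plain assignment in Python is always pbref.

```python
import copy

rhs_list = [1, 2, 3]

lhs_ref = rhs_list               # pbref: only the object's address is copied
lhs_clone = copy.copy(rhs_list)  # pbclone: a new list object is allocated

lhs_ref.append(4)                # visible through rhs_list, same object
lhs_clone.append(99)             # invisible to rhs_list, different object

# Immutable case: assignment is still pbref, but "modifying" rebinds the name.
rhs_int = 3
lhs_int = rhs_int                # both names refer to one int object
same_before = lhs_int is rhs_int
lhs_int += 1                     # rebinds lhs_int to another object; rhs_int untouched

print(rhs_list, lhs_clone, rhs_int, lhs_int)
```

Q1 is answered by the `is` test, and Q3 by the fact that a Python variable is only ever a name bound to an object stored elsewhere.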

Q: what if RHS of pbref is a literal?
A: then pbref makes no sense.

Q: what if RHS is a function returning by reference?
A: you can get pbref or pbclone

[1] C++ and Perl parameter passing is multifaceted.

every async operation involves a sync call

I now feel just about every asynchronous interaction involves a pair of (often remote) threads. (Let’s give them simple names — The requester RR vs the provider PP). An async interaction goes through 2 phases —

Phase 1 — registration — RR registers “interest” with PP. When RR reaches out to PP, the call must be synchronous, i.e. blocking. In other words, during registration RR thread blocks until registration completes. RR thread won’t return immediately if the registration takes a while.

If PP is remote, then I was told there’s usually a local proxy object living inside the RR Process. Registration against proxy is faster, implying the proxy schedules the actual, remote registration. Without the scheduling capability, proxy must complete the (potentially slow) remote registration on the RR thread, before the local registration call returns. How slow? If remote registration goes over a network or involves a busy database, it would take many milliseconds. Even though the details are my speculation, the conclusion is fairly clear — registration call must be synchronous, at least partially.

Even in Fire-and-forget mode, the registration can’t completely “forget”. What if the fire throws an exception at the last phase after the “forget” i.e. after the local call has returned?

Phase 2 — data delivery — PP delivers the data to an RR2 thread. RR2 thread must be at an “interruption point” — Boost::thread terminology. I was told RR2 could be the same RR thread in WCF.
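
The two phases can be sketched with plain threads in Python. The class Provider and the names on_data, delivered and PP-thread are hypothetical, and as a simplification the callback runs directly on the provider's thread instead of being handed to a separate RR2 thread at an interruption point.

```python
import threading

class Provider:
    """PP: accepts registrations, then delivers data on its own thread."""
    def __init__(self):
        self._callbacks = []
        self._lock = threading.Lock()

    def register(self, callback):
        # Phase 1: a plain synchronous call; the RR thread stays inside this
        # method until the callback has been recorded.
        with self._lock:
            self._callbacks.append(callback)

    def publish(self, data):
        # Phase 2: runs on whichever thread calls it (a PP thread below).
        with self._lock:
            callbacks = list(self._callbacks)
        for cb in callbacks:
            cb(data)

delivered = []               # (data, name of the delivering thread)
done = threading.Event()

def on_data(data):
    delivered.append((data, threading.current_thread().name))
    done.set()

pp = Provider()
pp.register(on_data)         # blocks the requester until registration completes

worker = threading.Thread(target=pp.publish, args=("tick",), name="PP-thread")
worker.start()
done.wait(timeout=5)
worker.join()
print(delivered)
```

Even in this tiny version the requester cannot learn about a failed registration unless register() finishes, or at least partly finishes, before returning.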

##what bad things can crash JVM

(Why bother? These are arcane details seldom discussed under the spotlight, but practically important in most java/c++ integrations.)

Most JVM exits happen with some uncaught exception or explicit System.exit(). These are soft-landings — you always know what actually killed it.

In contrast, the hard-landing exits result in a hs_err_pid.log file, which gives cryptic clues to the cause of death. For example, this message in the hs_err file points to a null pointer dereference in JNI:

siginfo: ExceptionCode=0xc0000005, reading address 0x00000000

Note this hs_err file is produced by a fatal error handler. However, if you pull the power plug, the FEH may not have a chance to run, and you get what I call an “unmanaged exit”. Unmanaged exit is rare. I have yet to see one.

People often ask what bad things could cause a hard landing? P79 [[javaPerformance]] mentions that FEH can fire due to

* fault in application JNI code
* fault in OS native code
* fault in JRE native code
* fault in the VM itself

smartPtr – remember the 3 types involved

I feel it’s easy to confuse the 3 data types involved in even the simplest smart pointer. It’s less confusing when you instantiate the class template and slightly more confusing when you have to program in templates.

Say you use smartPtr<T>, there are really 3 data types involved here
1) T
2) pointer to T
3) smartPtr<T>

There are also at least 3 objects involved
1) a single pointee object, of type T
2) 1 or more 32-bit raw-pointer objects of type T*
3) Multiple instances of smartPtr

At clean-up time, we need to
1) deallocate the pointee object, assuming it’s on heap
2) don’t worry about the raw pointer objects. Someone would take care of it. However, we assume no one calls delete() on one of them!
3) the smartPtr instances are typically low-maintenance. They are often stack-allocated or are (non-ref) fields of bigger objects — non-ref meaning a field of type smartPtr not smartPtr* or smartPtr&. When such a smartPtr instance is cleaned up, we don’t need to worry about anything.

Now we are ready to understand circular reference.
– A smartPtr instance myCar holding a raw pointer to a Car instance, which has a field which is a smartPtr instance;
– this smartPtr instance holds raw pointer to a Driver instance.
– the driver instance has a field that is myCar.

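All 6 objects are unreachable and unusable, but look at the 2 reference counts. Each pointee is still referenced by a smartPtr inside the other pointee, so neither count ever drops to zero and neither pointee gets deallocated.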
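
The same leak can be observed in Python, offered here only as an analogy: CPython also uses reference counting, and with its cycle detector switched off it behaves much like a naive counted smart pointer. The classes Car and Driver mirror the example above; the names probe, leaked and reclaimed are hypothetical.

```python
import gc
import weakref

class Car:
    def __init__(self):
        self.driver = None   # will hold a counted reference to a Driver

class Driver:
    def __init__(self):
        self.car = None      # will hold a counted reference back to the Car

was_enabled = gc.isenabled()
gc.disable()                 # leave only plain reference counting in play

my_car = Car()
my_car.driver = Driver()
my_car.driver.car = my_car   # closes the cycle: Car -> Driver -> Car

probe = weakref.ref(my_car)  # observes the Car without adding to its count
del my_car                   # drop the last outside reference

leaked = probe() is not None # the cycle keeps both counts above zero

gc.collect()                 # CPython's cycle detector reclaims the pair
reclaimed = probe() is None
if was_enabled:
    gc.enable()

print(leaked, reclaimed)
```

The weak reference plays the role a weak smart pointer would play in C++: making one of the two links weak breaks the cycle without needing a collector.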

a simple (tricky) sabotage on java debugger

If you rely heavily on a java debugger, beware of this insidious sabotage.

You could use finally blocks. You could surround everything with try/catch(Throwable). If all of these get skipped (rendering your debugger useless) and system silently terminates at inconsistent moments, as if by a divine intervention, then perhaps ….

perhaps you have a silent System.exit() in an obscure thread.

Let me make these points clear:
– System.exit() will override/ignore any finally block. What you put in finally blocks will not run in the face of System.exit()
– System.exit() will not trigger catch(Throwable) since no exception is thrown.
– System.exit() in any thread kills the entire JVM.

Q: Is JNI crash similar to System.exit()?
%%A: i think so.

Actually, in any context a silent System.exit() can be hard to track down when you look at the log.

stack frame has a pointer to the caller stack frame

Q: what data occupy the real estate of a single stack frame? Remember a stack frame is a chunk of memory of x bytes and each byte has a purpose.

A: (obviously) any “auto” variable declared (therefore allocated) in the function

A: (obviously) any formal parameter. If a 44-byte Account parameter is passed-by-value, then the 44-bytes are allocated in the stack frame. If by-reference, then only a 4-byte pointer allocated.

A: 4-byte pointer to caller's stack frame. Note the caller's frame in turn contains a pointer to its own caller. Therefore, the stack frames form a linked list. This pointer is known as a “previous stack top”.

A: 4-byte ReturnAddress. When a function f4() returns, control passes back to the caller function f3(). Now at assembly level the caller function may be a stream of 20 instructions. Our f4() may be invoked on Instruction #8 or whatever. This information is saved in f4() stack frame under “ReturnAddress”. Upon return, this information is put into the “instruction pointer” register inside the CPU.
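
The linked list of frames can be walked from Python, as an analogy only: CPython frames are heap objects, not native stack frames, and sys._getframe is a CPython-specific accessor. The functions f3 and f4 reuse the names from the answer above.

```python
import sys

def f4():
    # Walk from this frame back through each caller via f_back.
    names = []
    frame = sys._getframe()      # the current frame object
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back     # the saved link to the caller's frame
    return names

def f3():
    return f4()

chain = f3()
print(chain[:2])                 # the two innermost frames
```

Each frame object also carries f_lineno, roughly the counterpart of the saved return address: it records where in the caller execution will resume.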

operands to assembly instructions

I feel most operands are registers. See first operand in example below. That means we must load our 32 bits into the EAX register before the operation.

However, an operand can also refer directly to a memory location.

SUB EAX, [0x10050D49]

A third type of operand is a constant. You pass that constant from source code to compiler and it is embedded in the “object file”

All(@@) OO methods are "static" under the hood

C++ supports “free functions”, static methods and non-static methods.
– Static methods are very similar to free functions. (Difference? Perhaps inheritance and access modifiers.)
– Non-static methods are implemented as free functions accepting “this” as first argument.

C++ is largely based on classic-C. Basically everything translates to classic-C code, which uses free functions only.

Once we understand the c++ world, we can look at java and c# — Essentially same as c++.

Looking deeper, any method, function or subroutine consists of a sequence of instructions, a “code object” if you like. This object is essentially singleton because there’s no reason to duplicate this object or pass it by value. This object has an address in memory. You get that address in C by taking the address of a function, giving a so-called function pointer.

In a multi-threaded environment, this code object is kind of singleton and stateless (if you know what I mean!). Think of Math.abs(). The code object itself is never thread-unsafe, but invoking it as a non-static method often is. Indeed, Invoking it as a free function can also be thread-unsafe if the function uses a stateful object. Example — localtime().

Non-static method has to be implemented based on this code object.

Methods are implemented as Fields — function pointer Fields

(It would be great to go back to the earliest OO language, but let’s just start from C++.) The class concept is very similar to the C struct. Now if you add a func ptr as a struct field, then Whoa! A method is born (Today happens to be Christmas…)

Note python treats both fields and methods as “attributes”.

Suppose your struct is named MyStruct and has methods m1(), m2(). You may need to add a pointer-to-MyStruct as a hidden field (named “this”) to MyStruct. You have to pass “this” to m1() as a first argument. Consequently, m1() can reach all the fields of the host MyStruct INSTANCE. Without such an argument, m1() doesn’t know which object it belongs to. Remember all C++ methods are free functions in disguise.

Let me repeat — each method has the same first parameter of type (HostClass*)

That’s the basic idea of a c++ class, but we need some efficiency improvements. If you instantiate 999 instances of MyStruct, then are you going to allocate 999 func-pointers for m1() and 999 func-pointers for m2()?

No. I think you can just add a single pointer-to-MyStruct-class-object as another hidden field to MyStruct. In the singleton MyStruct class object, you keep the addresses of m1() and m2(). Therefore, ARM said the struct instance holds no info about m1().

The vtbl is also part of this singleton object.

Note java puts MyStruct class object in the permanent generation (replaced by Metaspace since Java 8).

Another optimization is where to store the “this” pointer. Brute-force solution is to add a ptr-to-MyStruct field to MyStruct but C++ and java compilers all avoid this. I think the compiler treats “this” as a pointer Variable not a pointer Object (like vptr). Implicit conversion:

    myStructInstance.m1(otherArgs) // becomes
    m1(&myStructInstance, otherArgs); // [aaa]

The [aaa] form isn’t really what compiler does. In MultipleInheritance “this” isn’t the same across Base objects and Derived object.
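
Python makes the hidden first argument visible, so it serves as a quick sketch of the idea (an analogy, not what a C++ compiler emits). MyStruct and m1 reuse the names above; the balance field is hypothetical.

```python
class MyStruct:
    def __init__(self, balance):
        self.balance = balance

    def m1(self, amount):
        # "self" plays the role of the hidden "this" argument.
        return self.balance + amount

inst = MyStruct(100)

via_method = inst.m1(5)              # the usual call syntax
via_function = MyStruct.m1(inst, 5)  # same code object, instance passed explicitly

# The function lives once on the class object, not in each instance.
stored_on_class = "m1" in vars(MyStruct)
stored_on_instance = "m1" in vars(inst)

print(via_method, via_function, stored_on_class, stored_on_instance)
```

Creating 999 instances adds 999 balance fields but not a single extra copy of m1, which is the efficiency point made above about the singleton class object.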

class loaders — [[ weblogic definitive]]

P382 [[ weblogic definitive ]] is a good intro to java class loading. Generally concise and detailed. Still there are a few unclear points to bear in mind when studying it:

– pass up and pass down — when a class-loading request (for class J) comes to a child classloader C, C checks “its own memory” to see if J is loaded. Failing that, it sends the class-finding job UPSTAIRS to parent class loader P (and further up). If root class loader R can’t find it then R tries to load the class J. Failing that, R returns the class-loading job downstairs to P and to C.
– – > 1st classloader to attempt loading is always root classloader.

– The classpath classloader can ONLY load from the classpath. The extensions classloader is limited by JVM to only load from /jre/lib/ext/. In general most [2] classloaders are restricted to read a specific group of class files. Fundamental to delegation.
– – > corollary: Every parent is limited in its capability. When a child delegates to her parent some job that’s out of parent’s limits, child will do it herself.

– When a class-loading “job” comes in, it comes to a particular classloader. It doesn’t “come to the system”.

– When a class-loading request comes in, it comes with only a full classname. Each classloader [3] must *search* for the class file in jars and directories. Some beginners may assume the request comes labelled with physical address of the class file.

– It’s common to put a class file in 2 places, each visible to a classloader. Usually (if not always) only one classloader reads the class file and loads it into memory. If these 2 loaders are parent and child, then the parent loads it.

[1] by jvm
[2] if not every
[3] the immediate parent of the receiving classloader will try first, followed by the grandparent…

A few summary points:
– tree. 1:m mapping. one-parent-many-children
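
The pass-up / pass-down delegation described in this post can be sketched as a toy model in Python. ToyLoader, its visible set and the class names are all hypothetical; this is not the java.lang.ClassLoader API, only the order of events.

```python
class ToyLoader:
    """Hypothetical parent-first loader; 'visible' is the set of names it may load."""
    def __init__(self, name, visible, parent=None):
        self.name = name
        self.visible = set(visible)
        self.parent = parent
        self.loaded = {}          # this loader's "own memory"

    def load(self, class_name, trace):
        if class_name in self.loaded:            # already loaded here?
            return self.loaded[class_name]
        if self.parent is not None:              # send the job upstairs first
            try:
                return self.parent.load(class_name, trace)
            except LookupError:
                pass                             # parent failed; job comes back down
        trace.append(self.name)                  # record who attempts the actual load
        if class_name not in self.visible:
            raise LookupError(class_name)
        self.loaded[class_name] = f"{class_name} (loaded by {self.name})"
        return self.loaded[class_name]

root = ToyLoader("root", {"java.lang.String"})
ext = ToyLoader("ext", {"ext.Helper"}, parent=root)
app = ToyLoader("app", {"com.example.J", "ext.Helper"}, parent=ext)

trace = []
result = app.load("com.example.J", trace)
shared = app.load("ext.Helper", [])
print(trace, result, shared)
```

The trace shows the root attempting the load first, and ext.Helper, visible to both a parent and its child, ends up loaded by the parent.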

what is kernel space (vs userland)

(sound bite: system calls — kernel space; standard library functions — userland, often wrappers over syscalls)

Executive summary — kernel is special source code written by kernel developers, to run in special kernel mode.

Q: But what distinguishes kernel source code from application source code?
A: Kernel functions (like syscall functions) are written with special access to hardware devices. Kernel functions are the Gatekeepers to hardware, just like app developers write DAO class as gatekeepers to a DB.

Q: Real examples of syscall source code?
A: I believe glibc source code includes the syscall wrappers (thin userland functions that trap into the kernel), while the syscall implementations themselves live in the kernel source tree, not in glibc. See P364[[GCC]]
A: kernel32.dll ?
A: tcp/ip is implemented in kernel.
A: I feel device drivers are just like kernel source code, though RAM/CPU tend to be considered the kernel of kernel.

My 2-liner definition of kernel — A kernel can be thought of as a bunch of (perhaps hundreds of) API functions known as “syscalls”. They internally call additional (10,000 to 100,000) internal functions. Together these 2 bodies of source code constitute a kernel. On an Intel platform, kernel and userland source code both compile to Intel instructions. At the individual instruction level, they are indistinguishable, but looking at the source code, you can tell which is kernel code.

There are really 2 distinct views (2 blind men describing an elephant) of a kernel. Let’s focus on run-time actions —
X) a kernel is seen as special runtime services in the form of syscalls, similar to guest calls to hotel service desk. I think this is the view of a C developer.
Y) behind-the-scene, secret stream of CPU instructions executed on the CPU, but not invoked by any userland app. Example — scheduler [4]

I don’t think a kernel is “a kind of daemon”. Such a description is misleading. Various “regular” daemons provide services. They call kernel functions to access hardware. If a daemon never interacts with user processes, then maybe it would live in “kernel space”. I guess kernel thread scheduler might be among them.

I feel it’s unwise (but not wrong) to think of kernel as a process. Kernel services are used by processes. I guess it’s possible for a process to live exclusively in “kernel space” and never interact with user processes. http://www.thehackademy.net/madchat/sysadm/kern/kern.bsd/the_freebsd_process_scheduler.pdf describes some kernel processes.

P241 [[Pro .net performance]] describes how something like func3 in kernel32.dll is loaded into a c# application’s code area. This dll and this func3 are treated similar to regular non-kernel libraries. In a unix C++ application, glibc is linked in just like any regular library. See also http://www.win.tue.nl/~aeb/linux/lk/lk-3.html

[4] Scheduler is one example of (Y) that’s so extremely prominent that everyone feels kernel is like a daemon.

The term “kernel space” is easy to misread. It refers to the privileged CPU mode and the protected address range that kernel code runs in, not to a privileged user account: things in kspace don’t run under a privileged user.

— call stack view —
Consider a c# P/Invoke function calling into kernel32.dll (some kernel func3). If you were to take a snapshot of an average thread stack, top of the stack would be functions written by app developers; middle of the stack are (standard) library functions; bottom of the stack are — if hardware is busy — unfinished kernel syscalls. Our func3 would be in the last 2 layers.

All stack frames below a kernel API are “kernel space”. These stack frames are internal functions within the kernel_code_base. Beneath all the stack frames is possibly hardware. Hardware is the ultimate low-level.

Look at the bottom-most frame, it might be a syscall. It might be called from java, python, or some code written in assembly. At runtime, we don’t care about the flavor of the source code. The object code loaded into the “text” section of the Process is always a stream of machine instructions, perhaps in the Intel or SPARC instruction set.

ANY process under any user can call kernel API to access hardware. When people say kernel has special privileges, it means kernel codebase is written like your DAO.

What’s so special about jvm portability cf python/perl #YJL

You have a very strong technical mind and I find it hard to convince you. Let’s try this story…

At a party, one guy mentions (quietly) “I flew over here in my helicopter …” 5 boys overhear and start talking “I too have a helicopter”. Well the truth is, either they are renting a helicopter, or their uncle used to have a helicopter, or their girlfriend is rich enough to own a helicopter, or they have an old 2nd hand helicopter, or they have a working helicopter for a university research project, or a toy helicopter.

It’s extremely hard to build a cross-platform bytecode interpreter that rivals native executable performance. Early JVM was about the same speed as perl. Current JVM easily exceeds perl and can sometimes surpass C.

In contrast, it’s much easier to build a cross-platform source code interpreter. Javascript, python, perl, php, BASIC, even C can claim that. But why do these languages pale against java in terms of portability? One of the key reasons is efficiency.

To convince yourself the value of JVM portability, ultimately you need to see the limitations of dynamic scripting languages. I used them for years. Scripting languages are convenient and quick-turnaround, but why are they still a minor tool for most large systems? Why are they not taking over the software world by storm?

Why is C still relevant? Because it’s low-level. Low-level means (the possibility of) maximum efficiency.  Why is MSOffice written in C/C++ and not VBA? Efficiency is a key reason. Why are most web servers written in C and not perl, not even java? Efficiency is a key reason.

Back to jvm portability. When I compile 2000 classes into a jar, and download 200 other jars from vendors and free packages, I zip them up and I get a complete zip of executables. If I fully tested it in windows then in many cases I don’t need to test it in unix. Compile once, run anywhere. We rely on this fact every day. Look at spring jars, hibernate jars, JDBC driver jars, xml parser jars, jms jars. Each jar in question has a single download for all platforms. I have not seen many perl downloads that are one-size-fits-all.

I doubt Python, php or other scripting languages offer that either.

(See comments below)

Sent: Sunday, June 26, 2011 8:14 PM
Subject: RE: What’s so special about jvm’s portability compared to python’s or perl’s?

If you treat JVM == the interpreter of php/python/perl/etc., then Java’s so called “binary code portability” is almost the same as those scripting languages’ “source code portability”.
[Bin ] I have to disagree. AMD engineered their instruction set to be identical to Intel’s. Any machine code produced for Intel runs on AMD too — hardware level portability.
That’s one extreme level of portability. Here’s another level — Almost any language, once proven on one platform, can be ported to other platforms, but only at the SCP (source-code-portable) level. Portability at different levels has very different values. High-level portability is cheap but less useful.

Java Bytecode is supposed to be much faster as a lot of type checking, method binding, access checking, address resolution.. were already completed at compile-time. Java bytecode looks like MOV, JMP, LOAD … and gives you some of the efficiency of machine code.

Another proof is: Java binary code (compiled using regular method) can be de-compiled into source code, which indicates that its “binary code” has almost 1-to-1 mapping to “source code”, which means its binary code is equal to source code.
[Bin ] I would probably disagree. The fastest java bytecode is JIT and probably not decompilable I guess. For a sequence of instructions, the more “machine-like”, the faster it runs.

Well, you may want to argue JVM is better than the interpreter of those scripting languages, and I tend to agree. Java must have something that earned the heart of the enterprise application developers. Only that I haven’t found what it is yet 🙂

What’s so special about jvm portability cf python/perl, briefly@@

When I compile a class in windows, the binary class file is directly usable in unix. (Of course I must avoid calling OS commands.) I don’t think python or perl reached this level.

I feel dynamic, scripting languages are easier to make portable because they offer source-code portability (SCP), not binary code portability (BCP). In other words, BCP is tougher than SCP. I believe BCP is more powerful and valuable. BCP was the holy grail of compiler design and Java conquered it.

Due to the low entry barrier, some level of SCP is present in many scripting languages, but few (if any) other compiled languages offer BCP, because it’s tough. JVM is far ahead of the pack.

Even C is source-code portable, but C is known as a poorly portable language due to lack of binary portability.

RMI skeleton/stub instantiation

In an RMI scenario, there are 2 + 1 processes. 2 on server, 1 on client
Process ps1) the application jvm
Process ps2) registry process. On unix, you often start it by hand. Note PS1 and PS2 must be on the same host.
Process pc) client JVM

There are at least 3 java objects involved. Both skeleton and stub implement the same business interface as OB.
Object OB) the real business object
Object SK) the skeleton object
Object ST) the stub object

Let's see how these are created and linked.
1) skeleton is probably instantiated by exportObject() based on the OB object, inside PS1 JVM. This is a static method.
2) After export, Skeleton's address is then registered with the registry, using rebind() or bind(), both static methods.
3) UnicastRemoteObject probably has a static collection to hold OB and SK, to fend off garbage collection
4) Stub is created on demand in PC, by deserializing the skeleton object

address of a java object (and virtual/physical memory)


(another blog post) We once discussed how to find the address of a java object. The address has to be hidden from application programs since the garbage collector often needs to relocate the object through the generational heap. Therefore any reference variable we use in java will let us read/write the “pointee” object but won't reveal address.

However, the address is visible to the garbage collector and some of the C code integrating with java via JNI or other means. It has to be visible because C uses pointers. A pointer holds a memory address. If a C function uses a pointer, then the C function can print out the address.

By the way, all along we are talking about virtual memory addresses, which on a 32-bit system could be anything from 0 to 0xFFFFFFFF, i.e. a 32-bit integer, even on a laptop with 128MB of RAM.

The virtual memory module in the kernel translates between virtual memory address and physical RAM address.

Q: Is it ever possible for a C program to see the physical RAM address of an object? Here are my tentative answers, so please correct me:

A: yes for the C program implementing the virtual memory module itself. This module probably runs in the lowest layer of the kernel. The virtual memory module probably gets loaded first so that a 32MB RAM laptop can load a 50MB operating system. Virtual memory continues to be extremely relevant since no machine has enough RAM to fill up a 64 bit address space.

A: no for any other C program running on top of virtual memory module.

low-level differences between HASA^ISA, in terms of function pointers

Some interviewer (MS or CS) asked me about the differences.

1) Subclass instance has the _same_address_ as base object, that’s why you can cast the ptr (or reference) up and down the inheritance hierarchy. See post on AOB ie address of basement.

2) Also, all inherited methods are available in the “collection” of function pointers of the derived object. In other words, derived object advertises all those inherited “features” or “services”, if you think of a family of interchangeable components. Derived object can _stand_in_ for the base TYPE.

In OO languages, a pure interface is basically a collection of function pointers. Has-a doesn’t expose/advertise the internal object’s function pointers, so the wrapper can’t “stand in” for that type.
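
To make the “stand in” idea concrete, here is a minimal Java sketch. All the names (Speaker, Base, Derived, Wrapper, use) are made up for illustration. The is-a object advertises the inherited method and can be passed wherever the base type is expected; the has-a wrapper holds the same capability internally but doesn't advertise it.

```java
class StandInDemo {
    // A pure interface: conceptually a collection of function pointers.
    interface Speaker { String speak(); }

    static class Base implements Speaker {
        public String speak() { return "base"; }
    }

    // IS-A: inherits speak(), so it advertises the Speaker "service".
    static class Derived extends Base { }

    // HAS-A: owns a Base but exposes none of its methods under the Speaker type.
    static class Wrapper {
        private final Base inner = new Base();
        String talk() { return inner.speak(); }
    }

    // Any Speaker can be plugged in here.
    static String use(Speaker s) { return s.speak(); }

    public static void main(String[] args) {
        System.out.println(use(new Derived()));    // Derived stands in for the base type
        // use(new Wrapper());                     // would not compile: Wrapper is not a Speaker
        System.out.println(new Wrapper().talk());  // must go through the wrapper's own method
    }
}
```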

stepping through class/object loading, Take 2

– – – a story/hypothesis to be verified. See P240,113 [[Practical Java]] and P28,30 [[Java Precisely]] – – –

base static initializer and static initializer BLOCK run, in the order of appearance
child static initializer and static initializer block run, in the order of appearance
(see P30 [[java precisely]])

^^ milestone: classes fully loaded.

child and base instance field half-initialized to defaults — null, 0.0, false,..

^^ milestone: dummy C object allocated, which contains a dummy B object

child constructor C() *entered*
base constructor B() *entered*, as first statement in C(…)
base constructor may call an overridden method m1(), child’s m1(), so child’s m1() runs, with child’s instance fields half initialized! Note in C++, B::m1() runs. See http://www.artima.com/cppsource/nevercall.html

^^ milestone: base constructor B() returns.

child’s instance field initializers run. All fields fully initialized as programmed.
remaining statements in C() run
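
The story above can be checked with a small experiment. This is only a sketch with made-up classes B and C and a hypothetical LOG list to record the order of events. The point to watch is that C.m1(), called from inside B(), sees the child's field still at its default (null).

```java
import java.util.ArrayList;
import java.util.List;

class InitOrderDemo {
    // Shared trace so the order of events can be inspected afterwards.
    static final List<String> LOG = new ArrayList<>();

    static class B {
        static { LOG.add("B static init"); }

        B() {
            LOG.add("B() entered");
            m1();                       // dispatches to C.m1() because the object is a C
            LOG.add("B() returns");
        }

        void m1() { LOG.add("B.m1"); }
    }

    static class C extends B {
        static { LOG.add("C static init"); }

        private String name = "set by initializer";   // runs only after B() returns

        C() {
            // the implicit super() call is the first statement
            LOG.add("C() body, name=" + name);
        }

        @Override
        void m1() { LOG.add("C.m1 sees name=" + name); }  // name is still null here
    }

    static List<String> run() {
        new C();
        return LOG;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```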

signed shift and unsigned shifts #multiply

Q1: why is there signed-right-shift and unsigned-right-shift but just a single “left-shift”?
Q2: if i multiply 2 positive int in java, do i always get a positive int?
Q2b: how about in c++? See https://stackoverflow.com/questions/34545445/positive-integers-that-multiply-to-a-negative-value

In java, usually the “thing” to be shifted is a 32-bit int, but can also be a 64-bit long. For both, the left-most leading bit controls the sign — 0 means non-negative, 1 means negative.

Now consider an example. …0110 right shift once –> leaves the left-most leading position empty. Either put a 0 (unsigned shift) or (signed) keep the original leading bit there.

By definition
* Signed shift keeps the sign.
* unsigned shift by 1 to 31 positions (on an int) always returns non-negative, since a 0 enters the leading position

Finally we can give A1 ie why there’s just a single left-shift. Left shift always shifts some bit INTO the left-most leading position. It never becomes empty. The sign of the result depends on that bit.

* new sign may be same as before
* new sign may be different

A2: no. 0x55555555 << 1 == 0x55555555 * 2 < 0 // so a positive int * 2 can be negative. See https://stackoverflow.com/questions/16889828/integer-giving-negative-values-in-java-in-multiplication-using-positive-numbers
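
A quick way to see all three shifts side by side. This is just a sketch; the class and helper names are made up. Math.multiplyExact is the standard JDK call that throws on int overflow, shown here as one possible way to detect the A2 situation.

```java
class ShiftDemo {
    static int signedRight(int x, int n)   { return x >> n; }   // copies the old leading bit
    static int unsignedRight(int x, int n) { return x >>> n; }  // always inserts 0
    static int left(int x, int n)          { return x << n; }   // leading bit comes from the right

    public static void main(String[] args) {
        int neg = -8;                                 // 0xFFFFFFF8
        System.out.println(signedRight(neg, 1));      // -4, sign preserved
        System.out.println(unsignedRight(neg, 1));    // 2147483644, i.e. 0x7FFFFFFC

        int pos = 0x55555555;                         // positive: leading bit is 0
        System.out.println(left(pos, 1));             // -1431655766, i.e. 0xAAAAAAAA
        System.out.println(pos * 2 == left(pos, 1));  // true: multiplication wraps the same way

        try {
            Math.multiplyExact(pos, 2);               // throws instead of silently wrapping
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
    }
}
```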

Another analysis of a vanilla swing event listener

See also http://bigblog.tanbin.com/2011/02/deeper-look-at-swing-event-handlers.html

Case study — http://download.oracle.com/javase/tutorial/uiswing/events/actionlistener.html

Be crystal clear about
* registration time — could happen hours before
* produce-time
* consume-time

How many threads?
– EDT is the consumer thread
– producer thread
– registration thread? less relevant. can be any thread.

Data shared among these threads at these different TIMES?
+ (singleton) jcomponent object
+ (singleton) action listener object
+ event-queue-task objects.

Before registration time, action listener is instantiated. At registration time, address of the listener object is saved in jcomponent object — like a subscriber mailing list (or spam list:)

At produce-time, system instantiates event-queue-task object (different from EventObject), injecting the jcomponent (as event source) + Listener object addresses therein, then enqueues the task object to the EDT event-queue. If 5 listeners registered, then 5 distinct task objects enqueued.

At consume-time (usually shortly after, but possibly much later [1]), EDT picks up the task object and finds the listener object, and calls listener.actionPerformed() passing in the EventObject. As a non-static method, this CALL’s host object is the listener object but it executes on EDT. At consume-time, all instance fields of the source object (typically a jcomponent), instance fields of the listener (and, if a nested class within a jcomponent, private fields of the enclosing jcomponent) are accessible.

The listener OBJECT is often of an inner class, which complicates things. You need to be very clear that registration time and listener creation time are less relevant since they could be well before the event. Enclosing class’s fields all become accessible to actionPerformed().

If inner class is local, then local variables of the enclosing METHOD are also accessible, if FINAL (or, since Java 8, effectively final). This last FINAL scenario is more tricky. The local vars are created at registration time but (because FINAL) their values are copied into the listener object and so remain accessible at consume-time.

[1] that’s why async always needs buffer — see http://bigblog.tanbin.com/2011/05/asynchronous-always-requires-buffer-and.html

writeObject() invoked although never declared in any supertype

Usually, a common behavior must be “declared” in a supertype. If base type B.java declares method m1(), then anyone having a B pointer can invoke m1() on our object, which could be a B subtype.

However, writeObject(ObjectOutputStream) [and readObject] is different. You can create a direct subtype of Object.java and put a private writeObject() in it. Say you have an object myOb and you serialize it. In the classic Hollywood tradition, Hollywood calls myOb.writeObject(), even though this method is private and never declared in any supertype. Trick is reflection — Hollywood looks up the method named writeObject —

writeObjectMethod = getPrivateMethod(cl, "writeObject", …
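
Here is a minimal sketch to confirm the behavior. The class names and the `calls` counter are hypothetical and exist only to make the invocation visible. MyOb is a direct subtype of Object, its writeObject() is private, and nothing in my code calls it, yet ObjectOutputStream finds and invokes it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class PrivateWriteObjectDemo {
    static class MyOb implements Serializable {
        private static final long serialVersionUID = 1L;
        static int calls = 0;          // counts invocations, for demonstration only
        int payload = 42;

        // Private, and not declared in Object or in Serializable.
        private void writeObject(ObjectOutputStream out) throws IOException {
            calls++;
            out.defaultWriteObject();  // write the non-transient fields as usual
        }
    }

    // Serializes one MyOb and reports how many times its private hook ran.
    static int serializeOnce() {
        MyOb.calls = 0;
        try (ObjectOutputStream oos = new ObjectOutputStream(new ByteArrayOutputStream())) {
            oos.writeObject(new MyOb());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return MyOb.calls;
    }

    public static void main(String[] args) {
        System.out.println("writeObject calls: " + serializeOnce());
    }
}
```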

assembly language programming – a few tips

C compiler compiles into binary machine code. I think pascal too, but not java, dotnet, python.

Assembly-language-source-code vs machine-code — 1-1 mapping. Two representations of the exact same program.

Assembly-language-source-code is by definition platform-specific, not portable.

A simple addition in C compiles to about 3 “Instructions” in machine code, but since machine code consists of hex numbers and is unreadable, people represent those instructions by readable assembly source code.

Compared to a non-virtual call, a virtual function call translates to X additional instructions. X is probably below 10.

There are many “languages” out there.
* C/C++ remain the most essential languages. Each line of source code converts to a few machine instructions. Source code can be written to be portable across platforms (usually after recompiling, sometimes with small changes) whereas the compiled machine code isn’t.
* Assembly is often called a “language” because the source code is human readable. However a unique feature is, each line of Assembly-language-source-code maps to exactly one line of machine code.
* newer languages (c# java etc) produce bytecode, not machine code.

implicitly stateful library function – malloc(), strtok

( See also beginning of [[Pthreads programming]] )
Most library functions are stateless — consider Math.

Most stateful library calls require manager object instantiation. This object is stateful. In java, some notable stateful language-level libraries include
– Calendar
– Class.forName() that automatically registers a JDBC driver

If there’s a syscall like setSystemTime(), it would be stateless because the C library doesn’t hold the state, which is held in the OS.

In C/C++, the most interesting implicitly stateful library routine is malloc(). Invisible to you, the freelist is maintained not in any application object, nor in the OS, but in the C library itself. Malloc() is like an airline booking agent or IPO underwriter. It gets a large block and then divvies it up to your program. See the diagram on P188 [[C pointers and mem mgmt]]

The freelist keeper is known as “mem mgmt routine” in the “standard C-library”.

Another common stateful stdlib function is strtok(). It’s not a pure function. It remembers the “scanner position” from last call! The thread-safe version is strtok_r()

memory leak detection ideas#malloc etc

http://www.flipcode.com/archives/How_To_Find_Memory_Leaks.shtml is a dated (2000) but readable and detailed treatment. A home-made new/delete overload, using malloc and free rather than qq[[ ::operator new ]] as advised by Scott Meyers.

  • Valgrind – no need to link anything… malloc/free are “replaced” automatically.
  • electric fence — link it into your code to get seg fault upon DAM errors. Won’t catch memory leaks. (My GDB book covers similar tools)
  • cmemleak traces malloc() and free() — the choke points.
  • [[c++nutshell]] says allocators can implement debugging or validity checks to detect programmer errors, such as memory leaks or double frees.
  • GlowCode — Three easy ways to use GlowCode (a windows-only profiler):
    • (1) use it to launch your application,
    • (2) attach GlowCode to a running program, or
    • (3) link GlowCode directly into your application. No source code or
      build change or post-build step required. Similar to Valgrind
  • IBM Purify — When a program is *linked* with Purify, corrected verification code is automatically inserted into the executable by parsing and adding to the object code, including libraries. Similar to Java bytecode instrumentation. That way, if a memory error occurs, the program will print out the exact location of the error, the memory address involved, and other relevant information. Purify also detects memory leaks (but I guess as a secondary feature). Leak report can be generated by calling the Purify leak-detection API from within an instrumented application. Object Code Insertion (OCI): OCI can be performed either during the link phase or after the link. Rational Purify reads object files generated by existing compilers and linkers, and adds error checking instructions without disturbing the ability to use conventional debuggers on the executable.
  • MDB is a commercial dynamic/shared library that provides replacements for malloc/free etc. You must link your software to these *.SO.
    • 🙂 Much lower overhead than Purify and Valgrind

variable can’t live on the heap; only objects can

See also post [[a heap-stack dual variable]]

An object can live on the heap or the stack; but a variable can’t live on the heap. It’s either a stackVar or a field (or occasionally a global). That raises the questions

Q: what if a stack ref/ptr seated to a heap obj gets out of scope?
A: leak. unreachable object — need garbage collector.

Q: what if a field ptr/ref seated to a heap obj gets destructed with its host object?
A: the host dtor simply frees/reclaims the pointer's own memory (4 bytes on a 32-bit build), without calling delete on the ptr. Item 7 [[eff c++]] says the “dtor” of a ptr is a no-op.
A: custom virtual dtor needed in the host object.

In java, a variable is either a field or a stackVar. An object is always on the heap.

stepping through class/object loading

Based on P28, 30 [[ Java Precisely ]], and P110 [[practical java]]. There are dozens of important details [1]. Here we cover a few interesting observations. Assume class C extends B, which extends A.

Step: static initializer blocks and field initializers run, in order of appearance. Once static fields are initialized, they are available for use by all including static methods.

Step: static methods loaded and available to be called from the call-stack

— By this /milestone/, the class is “loaded” with all static stuff ready —

Note: Before any C-specific initializations, B() always *completes its steps* and returns a complete B object, to be wrapped in the onion.

Let’s skip ahead and look at…
Step K1: A’s instance field initializers and instance initializer blocks run, in order of appearance. These always appear outside the constructor.

Step K2: A() statements.

Note: By this time, no B state-initialization[2]. However, A() statements could call a B method — see [[baseclass dependent on subclass]]

Repeat K1 and K2 for B, and then C

[1] see Example 60 [[ Java Precisely ]].
[2] i think this is obvious. loading B’s method definition doesn’t count.
[3] Obviously, Object() and A() must complete beforehand.
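
The K1/K2 sequence can be traced with a small sketch. The classes A, B, C mirror the names above; the note() helper and the LOG list are my own scaffolding. I expect K1 then K2 for A, then the same pair for B, then for C.

```java
import java.util.ArrayList;
import java.util.List;

class OnionDemo {
    static final List<String> LOG = new ArrayList<>();

    // Records a message; returns a dummy value so it can be used in a field initializer.
    static int note(String msg) { LOG.add(msg); return 0; }

    static class A {
        int a = note("A field initializer");    // step K1
        { note("A initializer block"); }         // step K1, in textual order
        A() { note("A() statements"); }          // step K2
    }

    static class B extends A {
        int b = note("B field initializer");
        B() { note("B() statements"); }
    }

    static class C extends B {
        int c = note("C field initializer");
        C() { note("C() statements"); }
    }

    // Builds one C and returns the recorded order of events.
    static List<String> run() {
        LOG.clear();
        new C();
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```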

if constructor throws ..

myInstance = new MyClass() ;

Will myInstance become null or …?

I feel the assignment should leave myInstance unchanged. The constructor (which strictly speaking is not a “method”), like a method that throws, won’t return anything to the caller. The constructor, the caller, and upstream callers may each be aborted.

See blog on exceptions in call-stack


For c++, a throwing ctor is common. If the object was being created on the heap with new, the memory is released automatically before the exception propagates.
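
For the java question, a small experiment. The MyClass here is a made-up stand-in with a flag that forces the constructor to throw. The variable keeps whatever it held before the failed assignment.

```java
class ThrowingCtorDemo {
    static class MyClass {
        final String tag;

        MyClass(String tag, boolean fail) {
            if (fail) throw new IllegalStateException("constructor failed");
            this.tag = tag;
        }
    }

    // Returns the tag of whatever myInstance refers to after a failed re-assignment.
    static String run() {
        MyClass myInstance = new MyClass("first", false);
        try {
            myInstance = new MyClass("second", true);   // throws before the assignment happens
        } catch (IllegalStateException e) {
            // nothing was returned by the constructor, so nothing was assigned
        }
        return myInstance.tag;
    }

    public static void main(String[] args) {
        System.out.println(run());   // still the first object
    }
}
```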

java always pass-by-value

A ref-type argument is passed by value — a copy of the remote-control, pointing to the same object

Once you point “critique_arg” to a new object, this method loses contact with the original critique object in the caller method.

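private boolean recordFault(Critique critique_arg, String brokenSlot) {
    String message = "Required slot " + brokenSlot.toUpperCase() + " missing.";
    // Reseats only this method's local copy of the reference; the caller's variable is untouched.
    critique_arg = new Critique(Critique.Severity.Critical, 0, slot, Critique.Type.ORDER);
    // ... (rest of the method is not shown in the original)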

By the way, the original object becomes eligible for garbage collection if the variable in the caller method also gets assigned a new object and no other reference to it remains.
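
A self-contained sketch of the same point. The Critique class below is a simplified stand-in of my own, not the real one from the snippet above. Reseating the parameter is invisible to the caller, whereas mutating the pointee through the copied remote-control is visible.

```java
class PassByValueDemo {
    // Simplified, hypothetical stand-in for the real Critique class.
    static class Critique {
        String text;
        Critique(String text) { this.text = text; }
    }

    // Reseats the local copy of the reference; the caller never notices.
    static void reseat(Critique arg) {
        arg = new Critique("replacement");
    }

    // Follows the copied reference to the shared object and changes it.
    static void mutate(Critique arg) {
        arg.text = "mutated";
    }

    public static void main(String[] args) {
        Critique c = new Critique("original");
        reseat(c);
        System.out.println(c.text);   // original
        mutate(c);
        System.out.println(c.text);   // mutated
    }
}
```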

strongly^weakly typed

Most complex software favors strong typing. I feel it’s not all due to ignorance, inertia, corporate politics or the marketing machine. Some brave and sharp technical minds ….

I think large teams need clean and well defined module-to-module interfaces. (module ~= class) A variable (mostly a pointer to a chunk of memory) should have well defined operations for it.

The precision comes with a cost — development time, inflexibility … but large teams usually need more coordination and control. At the heart of it is “identification”.

In the military, hospitals, government, and also large companies, identification is part of everyday life. It provides a foundation to security and coordination.

At the heart of OO modelling — translating real world security policies into system built-in rules. Strong typing = precise type identification.

portableremote.narrow() unable to cast between objects loaded by 2 class loaders


mscope.jar should not include com/titan/**/*.class, so
AdaptiveClassLoader [1] won’t load them.

Which classloader will load these classes? By default, classes mentioned on
the classpath are loaded by the default classpath classloader.

[1] This is a custom class loader to load from mscope.jar. It’s a descendant of the classpath classloader.