real world DB deadlock reduction — insert hotspot

One of my multi-threaded Java apps (using ExecutorService) started getting database deadlocks when I increased the thread count from 1 to 5.

* In the Dev/Test environment, I got up to 4 deadlocks out of 20,000-40,000 inserts.
* In Production, I got 3000+ deadlocks out of 20,000 inserts.

Solution: I changed the clustered index from the table’s identity column to an AccountNumber column.

Result? Down to below 5 deadlocks.

Analysis: the clustered index on the identity column probably created a hotspot for inserts. Since new rows all get similar identity values, all 5 threads need to write to the same data pages (Sybase defaults to page locks). Each thread (with its own DB connection and spid) in the deadlock probably writes to 2 pages within one transaction. If one thread has already written to page 1 and needs to write to page 2 before commit(), while another thread has already written to page 2 and needs page 1, then this scenario meets all four deadlock conditions:

* wait-for circle (circular wait)
* incremental lock acquisition (hold and wait)
* mutex (mutual exclusion)
* non-preemption
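The same four conditions can be reproduced in a few lines of Java. This sketch (my own illustration, not the original Sybase scenario) shows the standard fix — both threads acquire the two “page” locks in one consistent global order, which breaks the wait-for circle, just as moving the clustered index spread the inserts across pages:

```java
// Two threads each need two "page" locks. If they acquired them in
// opposite order, all four deadlock conditions above would be met.
// A consistent global lock order breaks the wait-for circle.
public class LockOrdering {
    private static final Object page1 = new Object();
    private static final Object page2 = new Object();
    private static int writes = 0;

    // Both threads take page1 before page2 -- same order, no cycle.
    static void writeBothPages() {
        synchronized (page1) {       // first lock
            synchronized (page2) {   // second lock, always in the same order
                writes++;
            }
        }
    }

    public static int run() {
        Thread t1 = new Thread(LockOrdering::writeBothPages);
        Thread t2 = new Thread(LockOrdering::writeBothPages);
        t1.start(); t2.start();
        try {
            t1.join(); t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return writes; // 2: both threads completed, no deadlock
    }
}
```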

financial jargon: buy-side, sell-side

http://en.wikipedia.org/wiki/Buy_side:

The split between the Buy and Sell sides should be viewed from the perspective of securities exchange services. The investing community must use those services to trade securities. The “Buy Side” are the buyers of those services; the “Sell Side” are the sellers of those services.

Sell side brokerages are registered members of a stock exchange, and required to be market makers in a given security. Buy side firms usually take speculative positions or make relative value trades. Buy side firms participate in a smaller number of overall transactions, and aim to profit from market movements and accruals rather than through risk management and the bid-offer spread.

In short, the entity paying the commission on a trade is the buy side, and the one receiving it is the sell side.

threadpool && GUI examples, showcasing anon class

Get used to this kind of coding style. In this case I don’t see a good reason for the anon class. It’s more readable to pull it out into a named static nested (or top-level) class.

However, in real life the anon class’s methods often need the variables in the enclosing class. Anon is quick and dirty.


// imports needed: java.util.Date, java.util.concurrent.ScheduledThreadPoolExecutor, java.util.concurrent.TimeUnit
public static void main1(String[] args) {
    new ScheduledThreadPoolExecutor(1).scheduleAtFixedRate(new Runnable() {
        public void run() {
            System.out.println(new Date());
        }
    }, 0, 1000, TimeUnit.MILLISECONDS);
}

// Now see the same pattern in this GUI code:

ActionListener taskPerformer = new ActionListener() {
    public void actionPerformed(ActionEvent evt) {
        // ...perform a task...
    }
};
new Timer(delay, taskPerformer).start(); // javax.swing.Timer, not java.util.Timer

sorted slist #TreeMap

In 2009 I was asked about a sorted list’s performance of
* add()
* find() by key
* return all elements in sorted order

These operations are equally frequent.

I could only think of 2 types of sorted list(???)

  • LL) sorted linked list
  • AL) sorted array list — based on an expandable array
  • How about 3) a LinkedHashMap where the value is either a count or a linked list of the identical objects? (Note LinkedHashMap preserves insertion order, not sort order, so it can’t return elements sorted by key.)
  • Now I think 4) a TreeMap keeping a count of duplicates is best. It looks like a list!

FIND will be slow for LL — no random access, so this is not suitable.

AL’s add() requires O(log n) to locate the insertion position, but also O(n) to shift existing occupants. The average shift cost is n/2, I believe. By the way, remove() would also require shifting.
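Here’s a minimal sketch of the TreeMap-with-counts idea (my own illustration):

```java
import java.util.*;

// A TreeMap<element, count> acting as a sorted "list" with duplicates.
// add() and find-by-key are O(log n); sorted traversal is O(n) --
// a good balance when all three operations are equally frequent.
public class SortedCountList {
    private final TreeMap<Integer, Integer> counts = new TreeMap<>();

    // add: O(log n) -- no shifting, unlike a sorted array list
    public void add(int key) { counts.merge(key, 1, Integer::sum); }

    // find by key: O(log n) -- unlike a linked list's O(n) scan
    public boolean find(int key) { return counts.containsKey(key); }

    // all elements in sorted order, duplicates expanded from their counts
    public List<Integer> sortedElements() {
        List<Integer> out = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : counts.entrySet())
            for (int i = 0; i < e.getValue(); i++)
                out.add(e.getKey());
        return out;
    }
}
```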

const ^ input iterator – unrelated

(It’s probably enough to know when to use each… Tbudget?)

A) input iterator is a fwd iterator for input streams — P802 [[Absolute C++]] (This book has concise coverage of const and input iterators)

B) const iterator is modeled on ptr-to-const, so you can’t make *myIterator a LHS

These 2 are unrelated.

Are these real (template) classes, dummy types, or typedefs? I feel in general you can’t say for sure, so I assume the lowest common denominator — dummy types. Note const_iterator is a nested type in container class declarations, but it can still be a real class, a dummy type, or a typedef.

every python thingy is an object with a type

I like this particular insight in [[py ref]] (i.e. “python essential reference”) though I can’t easily verify it in code — Every thingy in a python program is an object with a type. (There might possibly be some exceptions but this is a rather good starting point.)

* An object is defined, the same way as in C++, as a storage_location, with an (obviously) immutable address.
** Its content could be mutable.
** Even an int is an object, unlike in Java.

* Every storage_location (Say ObjectA) has a specific data_type. Required by python interpreter, java compiler or c++ compiler… That specific data_type is technically a Type OBJECT.
* [[peref]] made it clear that Type tells us the features supported by this storage_location. Features include methods and fields…

The thing about the python language is that Everything seems to be an object with a type. Here are some special objects and their types
$ a free function. This OBJECT has a __name__ attribute. Try qq/dir(len)/
$ a bound instance method. Such an OBJECT has an im_self attribute.
$ a bound class method. Such an OBJECT has a __name__ attribute
$ (There’s no such thing as a bound static method — nothing to bind.)
$ a regular instance method. Such an OBJECT has a __self__ attribute pointing to the host OBJECT
$ a regular class method. Such an OBJECT has a __name__ attribute
$ a regular static method. Such an OBJECT has a __name__ attribute

Warning — these special objects each have a distinct data_type, though type() may not reveal the distinction! It’s worthwhile to step back and recognize these special OBJECTS as separate data_types with their own features. The “feature set” difference between any 2 data_types is reflected in various ways (such as the attributes…)

Footnotes
* the built-in type() function is an almost-perfect way to see the data_type of a storage_location, but I don’t think it reveals all the details I need in a foolproof way.
* The “type” OBJECT is comparable to type_info objects and java “class” objects.
* What’s the type of the “type” OBJECTS? We don’t need to know.

1st learning note on option sensitivities – 3 Greeks

— delta — relation to current asset price, assuming out-of-the-money[5] —
Assuming a call with strike above current asset price, out of the money, call premium rises along with current asset price [3]. It feels like becoming more volatile/dangerous. Dangerous to the insurer.

[5] insurers make sure most of the time they don’t have to pay the claim

Assuming a put with strike below current asset price, out of the money, put (insurance) premium drops when current asset price moves higher and AWAY from strike price.

For out-of-the-money options, both call and put, when asset price moves closer to strike price, insurance premium escalates.

– A put writer (insurer) guarantees to buy our asset at a sky-high strike price. The insurer wishes asset MV to appreciate, so a depreciating MV means higher risk, higher premium.
– A call insurer wishes asset (eg oil) to depreciate, so an appreciating asset means higher risk, higher premium.

[3] Same thing if the call is deep in the money. A rising asset price further protects the buyer, because the call is almost guaranteed to be in the money at expiration.

— vvvvv vega — relation to vvvvvvol —
Vol increases option payoff; vol makes options more PROFITABLE, as the asset is more likely to cross the strike price. Stable securities have a lower “insurance premium”, as hitting the strike is a disaster for the insurer.

Q: But what if the option we hold is already in-the-money? I read that higher vol always increases our valuation, but higher vol would increase the chance of going out-of-the-money?
A: I feel in this case high vol’s positive impact outweighs that particular negative impact. Positive impact is the increased chance of sky-high payoff.

— relation to strike price —
call valuation (ie insurance premium) drops with higher strike price, because it requires more volatility to hit it.

call valuation rises with lower strike price, because the strike is dangerously close to current price. As if more volatile.
put valuation drops with lower strike price. If you exercise, you sell the asset for less money — The simplest explanation.
put valuation rises with higher strike price. If you exercise, you sell the asset for more money — The simplest explanation.

— ttttt theta — relation to tttttime-to-expiration — decay of option valuation
For both calls and puts, longer time frame increases insurance premium, as asset has more “opportunities” to hit strike price.

If we hold a call or a put option, each day’s passage decreases our market value, AS IF vol drops.
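In symbols, the three sensitivities above are just partial derivatives of the option value $V$ with respect to spot $S$, volatility $\sigma$ and time $t$:

```latex
\Delta = \frac{\partial V}{\partial S}, \qquad
\text{vega} = \frac{\partial V}{\partial \sigma}, \qquad
\Theta = \frac{\partial V}{\partial t}
```

Consistent with the notes above, a long call or put has positive vega (more vol raises the premium) and typically negative theta (each day’s passage erodes the premium).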

servlet filters — possible uses

usage: add custom headers, to both request and response. Custom headers are flexible and powerful communication devices — e.g. X-ZED headers.

usage: standard logging on selected requests. You can add this filter to any number of servlets.

usage: time logging. end-to-end roundtrip
usage: request translation. response too.
usage: response compression
usage: consistent cookie processing and logging
usage: authentication and single-sign-on
usage: consistent look and feel?
usage: spell checker?
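The mechanics behind all these uses is a chain-of-responsibility pattern. Below is a toy sketch — the interfaces are simplified stand-ins, NOT the real javax.servlet API — showing a custom-header filter wrapped around a terminal “servlet”:

```java
import java.util.*;

// Toy sketch of the filter-chain pattern behind servlet filters.
// Filter and Chain are simplified stand-ins for the real servlet API;
// requests and responses are just maps of headers/fields.
public class FilterChainDemo {
    public interface Filter {
        void doFilter(Map<String, String> req, Map<String, String> resp, Chain chain);
    }
    public interface Chain {
        void proceed(Map<String, String> req, Map<String, String> resp);
    }

    // usage: add a custom X- header to every response
    public static final Filter headerFilter = (req, resp, chain) -> {
        chain.proceed(req, resp);              // let the rest of the chain run
        resp.put("X-ZED-Trace", "demo");       // then decorate the response
    };

    public static Map<String, String> handle(Map<String, String> req, List<Filter> filters) {
        Map<String, String> resp = new HashMap<>();
        // build the chain back-to-front, ending at the "servlet"
        Chain chain = (rq, rs) -> rs.put("body", "hello " + rq.get("user"));
        for (int i = filters.size() - 1; i >= 0; i--) {
            final Filter f = filters.get(i);
            final Chain next = chain;
            chain = (rq, rs) -> f.doFilter(rq, rs, next);
        }
        chain.proceed(req, resp);
        return resp;
    }
}
```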

Callable tasks, Future results

A few learning notes based on P196 [[Java Threads]]. See also
http://bigblog.tanbin.com/2010/07/futurejava-designed-for.html
http://bigblog.tanbin.com/2009/05/callable-tasks-simple-thread-pool-set.html
http://bigblog.tanbin.com/2011/01/exception-passing-between-threads.html

Callable is an enhanced Runnable. Whenever possible, I’d design with Callable rather than Runnable. However, Callable doesn’t extend Runnable. For example, there’s no “new Thread(Callable)”.

Q: Does it make sense to implement both Runnable and Callable interfaces, so my class can be used in both contexts?
%%A: conceivable.

Q: Can this technique be used to convert existing Runnable designs to use Callable, assuming run() calls call()?
%%A: you might find it troublesome to get the task output (either result or exception) when you use the task object as a Runnable.

A Future object represents the result of a task. I feel it’s a window into that real world task in JVM. The (either real or symbolic) task is *stateful*.

I think of the “future” construct as a concept first. It represents a yet-to-complete async task. It could be ready when you read it. As such, I think it’s implemented using condVars.

FutureTask (like SwingWorker) is one of the implementations of Future interface.

Just as a Thread is a (rather imperfect) handle on a JVM thread, a Callable object together with its corresponding FutureTask object is a handle on the real-world task.
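A minimal sketch of the Callable/Future handshake (class and method names are my own):

```java
import java.util.concurrent.*;

// submit() returns a Future -- a handle on the stateful task.
// get() blocks until the result (or a wrapped exception) is ready.
public class CallableDemo {
    public static int square(int n) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<Integer> task = () -> n * n; // unlike run(), call() returns a value
            Future<Integer> future = pool.submit(task);
            return future.get();                  // blocks until the task completes
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);        // the task's exception travels via the Future
        } finally {
            pool.shutdown();
        }
    }
}
```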

lock scope vs db transaction scope

In my systems, I often call beginTran() in one method and commit() or rollback() in other methods. Now, a DB transaction often locks a page[1], which is a shared resource. My design can lead to blocking or even deadlock.

Java 5 lock scope is similar in one sense. You can call lock() in one method and unlock() in another method, so the lock scope spans methods. However, beware: a Java deadlock is harder to detect than a database deadlock, and impossible to recover from. I feel you should consider tryLock() and lockInterruptibly().

[1] or rows, but let’s keep the discussions simple
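A bare-bones Java 5 sketch of such a cross-method lock scope (the beginTran/commit names are just my analogy, not a real transaction API):

```java
import java.util.concurrent.locks.ReentrantLock;

// lock() in one method, unlock() in another -- the lock scope spans
// methods, like a DB transaction. Unlike a synchronized block, nothing
// forces the unlock; the discipline is on the caller.
public class TranLikeLock {
    private final ReentrantLock lock = new ReentrantLock();
    private int balance = 0;

    public void beginTran() { lock.lock(); }        // acquire here...
    public void write(int amount) { balance += amount; }
    public int commit() {                            // ...release over there
        int b = balance;
        lock.unlock();
        return b;
    }
}
```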

your job at age 55 if u r now an IT techie in your 30’s

To discuss percentages, first allow me to focus on a tiny group of male techies in Singapore — the only group I know.

Note: Manager roles can cover team size of 2 to 20.

— top n
40% Job: some job outside IT. These folks get out of IT before 55.
10% Job: company owner, CEO, country manager — #1 guy in an office.
5% Job: manager in “infrastructure support” (defined below [1]). These roles are similar and sometimes indistinguishable. Large systems need a support “team”.
5% Job: manager in app development AND app support. A standard combination.
2% Job: manager in app development, architect, PrjMgr, but without support responsibility. These roles are often combined. Compared to support jobs, such a dev role is rather tiring at age 55.
2% Job: pre-sales + professional service + trainer. Manager or foot soldier. These roles are often combined.

Other jobs:

Job: sales, marketing (manager) in a tech firm
Job: government officer, excluding infrastructure support
Job: full time teaching (IT) in private/public institutions + some R&D.
Job: Business Analyst
Job: senior DBA, system admin or network admin, but not a manager — at age 55?
Job: manager in product support. Only large product vendors need a support organization.
Job: full time R&D in public/private (large) labs + some teaching. R&D is always an “elite” activity.
Job: full time trainer
Job: writer + trainer
Job: recruiter. Some IT professionals are suitable for this role.

— Job descriptions above usually combine these job functions below, most of them project functions. Multiple project functions are frequently carried out by the same individual, since someone strong in one function can take up another function.
* function: pre-sales consulting
* function: business analysis
* function: development + design + architect
* function: project management, implementation/rollout management
* function: professional service consulting
* function: engagement manager, account manager, onsite or offsite
* function: testing

Functions below are not project roles, though the individuals in these functions also take part in projects.
* [1] function: infrastructure support serving internal users rather than external customers
** app support, after application development and rollout
** operations support
** web master,
** windows application support, such as email or Excel support
** DBA,
** network admin
** Unix/Windows system admin,
** email server support — Exchange, Lotus Notes…
** storage support
** ERM system admin
** CRM system admin
** mobile and remote access support — Citrix, Blackberry
** IT security admin

* function: product support serving external customers, often loosely known as tech support
* function: customer service, often loosely known as tech support
* function: sales support (not pre-sales consulting)
* function: marketing support
* function: training
* function: teaching
* function: research
* function: writer, editor, reporter, reviewer
* function: IT auditing

Most of the age 55 jobs are likely to *combine* these job functions.

y synchronized block !! method

Q: problems with synchronized methods? (It’s definitely easier than synchronized blocks, so most of us prefer it.)

A1: keep it simple.

As a rule, avoid holding more than 1 lock simultaneously. The longer you hold the lock, the tougher this rule becomes.

Somewhere in your method you might call an innocent-looking method and all hell breaks loose.

During the critical period your thread1 holds a lock (any lock), you want thread1 to avoid getting involved with any other lock. Analogy: when you are married, avoid getting involved with another person of the opposite sex.

A2: Deadlock is one of the risks, and it’s extremely tough to diagnose.

A3: performance. Over-serialization defeats concurrency.

A4: Suppose you use synchronized (lock1). The longer you hold the lock, the more likely you are to do stupid things like lock1=getAnotherObject(). See [[Practical Java]]
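A sketch tying A1, A3 and A4 together (my own illustration): keep the lock object private and final, and synchronize only the critical section instead of the whole method:

```java
// A synchronized method locks `this` for the whole body; a synchronized
// block on a private lock serializes only the critical section, and
// outsiders cannot accidentally lock our monitor.
public class Counter {
    // final: rules out the A4 mistake of reassigning the lock mid-flight
    private final Object lock = new Object();
    private int count = 0;

    public int incrementAndGet() {
        expensivePrep();          // outside the lock -- not over-serialized (A3)
        synchronized (lock) {     // hold the lock only for the shared-state update (A1)
            return ++count;
        }
    }

    private void expensivePrep() { /* work that needs no lock */ }
}
```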

wrapper object of "me" as a field in me

Interesting thread pattern.

Usually the wrapper object is a different class containing a field holding the wrapped object, but in this example… see the code comment below. What’s the motivation? In a thread pool, I think this is how a task (implementing Runnable) from the queue is assigned a thread from the pool.

In memory, the thr object and THIS object hold pointers to each other. “Wrapper” is a misleading term — a wrapper is usually a one-way HAS-A relationship.

In an object graph (at GC time), you often find 2 objects connected both ways, but in this case the 2-way linking is extremely barebones. Tight-coupling?

class MyThr implements Runnable {
    Thread thr; // ----> a field, but also a wrapper of this object

    MyThr() {
        this.thr = new Thread(this); // the wrapper holds me; I hold the wrapper
        this.thr.start(); // beware: starting a thread in a constructor leaks a partially-constructed "this"
    }

    public void run() { /* task logic */ }
}

hedging muni long positions

Short muni positions might be similarly hedged, but I feel long muni positions are more common across muni trading desks, so that’s our focus here.

The simplest hedge is a Treasury, followed by Treasury futures (a basket) as the 2nd simplest. If you are long a muni bond, you short a Treasury, matching DV01.

Q: What if there’s not an exact matching maturity date?
A:  I guess the futures bucketing (Stirt) calc could help traders decide how many futures contracts needed
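The DV01 matching itself is simple arithmetic. A sketch with made-up numbers (illustrative only, not market data):

```java
// DV01-matched hedge sizing: short enough Treasury futures that the
// hedge's dollar sensitivity offsets the long muni position's.
public class Dv01Hedge {
    // returns the number of futures contracts to short
    public static double contractsToShort(double muniPositionDv01,
                                          double dv01PerFuturesContract) {
        return muniPositionDv01 / dv01PerFuturesContract;
    }
}
```

For example, a muni position losing $9,000 per basis point, hedged with a futures contract worth $75 per basis point, calls for shorting 120 contracts.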

Q: Why do I need to hedge? What’s the risk in having the long, assuming 0 credit risk?

A: interest-rate delta risk. Interest rates can rise, hurting my long position’s market value. By shorting a Treasury, I preserve the net market value of the long and the short.

Treasury (futures) is popular because it’s the most liquid hedging instrument.

Q: Why T-futures if there’s treasury? I was told T-futures are similar to treasury in this case.
A: http://finance.zacks.com/guide-hedging-treasury-bond-futures-5786.html I guess cash outlay is smaller with T-fut

How about swaptions? Since a swaption is an option, there’s an insurance premium cost, so some traders avoid it.

Thread object as a lock object: myThr.wait() sometimes useful

Someone (HSBC?) asked me what happens if I call myThread.wait().

MyThread thrd = new MyThread();
thrd.start();
synchronized (thrd) {
    thrd.wait(); // similar to thrd.join()
}

Using a Thread.java object as a monitor object is legal, but not advisable. Unlike C++, Java has no undefined behavior, so the behavior is presumably “defined” in the Java language spec. I guess the main thread waits in the “waiting room” of the thrd object — but who will notify the main thread?

My experiment shows that main thread actually gets notified when thrd ends. Before we uncover the mysterious notifier, let’s remember that it’s unusual, confusing and never necessary to use a Thread.java object as a lock. There’s always a simpler alternative.

If you really use a Thread.java object as a lock/conditionVar, there are hidden issues. Here’s one issue I know –

As a condition variable, a Thread.java object has the undocumented behavior of calling this.notifyAll() when it dies. Someone speculated — “In fact this is how the join() method is implemented, if I remember correctly: it waits on the thread object until it dies.” Note join() is an instance method of a Thread object. When the main thread calls thrd2.join(), it treats thrd2 as a condition variable and waits on it until another thread calls thrd2.notifyAll(). The notifying thread is actually thrd2 itself.
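Here’s a runnable version of that experiment (my own sketch — again, don’t use this in real code; join() or a dedicated lock object is always simpler):

```java
// Waiting on a Thread object returns when that thread dies, because a
// dying thread calls this.notifyAll() on itself (undocumented behavior,
// relied upon by join()'s implementation in mainstream JVMs).
public class WaitOnThread {
    public static boolean waitForDeath() {
        Thread thrd = new Thread(() -> { /* short-lived task */ });
        thrd.start();
        synchronized (thrd) {             // use the Thread object as the monitor
            try {
                while (thrd.isAlive())
                    thrd.wait();          // woken by the hidden notifyAll() at thread death
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return thrd.isAlive();            // false once the thread has died
    }
}
```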

secDB – 2+2 adjectives

(Four blind men describing an elephant…)

1) spreadsheet-inspired
1b) supporting stress-test — what-if scenario analysis
1c) chain-reaction (more accurate than “ripple effect”)

2) dependency-graph optimized

3) schemaless. Unlike an RDBMS. I guess SecDB objects resemble python/javascript/perl objects — dictionary-based. Note a large number of nosql databases (including python ZODB) are schemaless.

4) in-memory. Like time-series DB but unlike RDBMS, only when loaded into memory does a graph-DB truly come to life. The spreadsheet magic can only work in-memory.

implicitly stateful library function – malloc(), strtok

( See also beginning of [[Pthreads programming]] )
Most library functions are stateless — consider Math.

Most stateful library calls require manager object instantiation. This object is stateful. In java, some notable stateful language-level libraries include
– Calendar
– Class.forName() that automatically registers a JDBC driver

If there’s a syscall like setSystemTime(), it would be stateless because the C library doesn’t hold the state, which is held in the OS.

In C/C++, the most interesting implicitly stateful library routine is malloc(). Invisible to you, the freelist is maintained not in any application object, nor in the OS, but in the C library itself. Malloc() is like an airline booking agent or IPO underwriter. It gets a large block and then divvies it up to your program. See the diagram on P188 [[C pointers and mem mgmt]].

The freelist keeper is known as “mem mgmt routine” in the “standard C-library”.

Another common stateful stdlib function is strtok(). It’s not a pure function — it remembers the “scanner position” from the last call! The thread-safe version is strtok_r().

java reentrant lock – key phrases

* seedless banana — all bananas are seedless. I believe most (All?) locks are reentrant. Lock interface has only 3 implementers.

* backward compatible — reentrant locks behave like [1] the synchronized keyword. So the reentrance feature is nothing new, just like “seedless” is nothing new for a banana.

[1] Lock.java does offer a few more features such as unlock, lockInterruptibly and tryLock

* unacceptable design — I think many people talk about non-reentrant locks — if myThr has already obtained lock1 and needs to acquire it again, the thread blocks!? But I think there’s no such non-reentrant lock in Java. It would be an unacceptable design.
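A quick demo of reentrance using ReentrantLock’s getHoldCount() (my own sketch):

```java
import java.util.concurrent.locks.ReentrantLock;

// The owning thread can re-acquire the same lock without blocking --
// true of synchronized monitors as well as ReentrantLock ("seedless banana").
public class ReentranceDemo {
    private static final ReentrantLock lock = new ReentrantLock();

    public static int nestedHoldCount() {
        lock.lock();
        try {
            lock.lock();                       // same thread, 2nd acquisition: no block
            try {
                return lock.getHoldCount();    // counts nested acquisitions
            } finally { lock.unlock(); }       // each lock() needs a matching unlock()
        } finally { lock.unlock(); }
    }
}
```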

JGC: low-pause ⇒high-overhead @@

Inevitably (IMO), a low-pause GC engine tends to increase your overall GC “overhead” — defined as a percentage of cpu cycles.

The efficient GC algorithms (usually?) need to stop the world for the entire cycle, and use compaction/copying to carve out a “frontier” of continuous free area to support fast allocation.

In theory, new algorithms might emerge with good pause characteristics but without the overhead penalty. I don’t know of any. Sounds like a perpetual motion machine.

telltale fields of JTable.java

Most people do not look at (they don’t need to) these protected fields, but they provide valuable insight into the internals of JTable.

int editingColumn — currently edited
int editingRow — currently edited
Component editorComp — If editing, the Component that is handling the editing.

Hashtable defaultEditorsByColumnClass
Hashtable defaultRenderersByColumnClass

–Now The big 3
TableModel dataModel
TableColumnModel columnModel
JTableHeader tableHeader

void ptr – thread library and other usages

There’s an ivory-tower view that “malloc, char arrays (strings) and other arrays are largely unused in C++, since we have new/delete, std::string and vector”. Now I see one more counterexample — void pointers. Despite the ivory tower, void pointers are widely used and indispensable.

* global op-new returns a pointer of type … can’t be any specific type; must be void
** even per-class op-new (implicitly static) returns void ptr. See P282 ARM
* allocators have an op-new. ditto
* other usages — See boost::any.

A lot of, if not most, thread libraries are written in C so as to be usable from both C and C++. A single void pointer is a flexible and versatile vehicle to “transport” several arguments into a functor, which is then handed to a thread. Here’s an example.

thread_create(void (*fvvp)(void *), void *vptr) // ——–creates and starts a new thread. “fvvp” is a Function ptr returning Void, accepting a Void Pointer

Above is a thread api with 2 arguments — a functor and a void ptr. The functor points to a function accepting a void ptr, so the 2nd argument feeds into the functor.

Now Suppose you realize the actual function you intend for the thread needs access to multiple objects, but you have only the 2nd argument to transport them. Thanks to void pointer, you can create a struct containing those multiple objects, and pass a single pointer as 2nd arg to the thread API. This usage of void pointer resembles
* the Object pointer in java.
* the raw HashMap argument Piroz suggested.
* the std::vector holding heterogeneous objects.

ptr_fun – use a 2-arg template function in std::for_each

// Real code in my project, cleaned up so it compiles. std::map stands in
// for the non-standard hash_map; bind2nd/ptr_fun are C++98-era adapters.
#include <algorithm>
#include <functional>
#include <map>
using namespace std;

typedef map<int, int> HM; // reduces complexity of bind2nd(ptr_fun(update_map)…

int const ar[10] = { 1, 2, 3, 4, 55, 55, 4 };

template<typename M>
void update_map(int aInt, M * mp) {
  (*mp)[aInt]++; // compiler can’t check if operator[] is defined for the type M until template instantiation
}

template<typename M>
void build_map(M & arg_map) {
  // for_each needs a single-arg function. To adapt our 2-arg function, we use bind2nd + ptr_fun.
  // We bind a pointer (not a reference), because bind2nd stores the bound argument by value
  // and ptr_fun can’t handle reference parameters.
  for_each(ar, ar + 10, bind2nd(ptr_fun(update_map<M>), &arg_map));
}

heap fragmentation and freelist

Heap memory allocators (java GC is actually among them) make a great effort to avoid (heap) fragmentation. Having many holes in the free store is wasteful and hurts allocation speed, to say the least.

The fastest allocation-time algorithm is to simply carve out a piece of real estate from THE “frontier” and ignore any usable holes. Done naively, this could lead to huge waste of usable “hole” real estate – imagine a 100MB array gets freed and becomes a hole but ignored during all subsequent allocations.

The frontier is simply marked by a forward-moving pointer (“bump-the-pointer”). In the JVM, the popular mark-and-sweep (also mark-compact?) and young-generation copying GC carefully create such a frontier.

In C, there’s no garbage collector so holes are perhaps inevitable. I think the standard solution is the free-list. This hurts allocation-time. To satisfy a 2KB allocation “request” we must search the free-list for a hole, before trying the frontier.

Both the free-list and the “frontier” solutions entail concurrency control.
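A toy Java model of the two strategies (purely illustrative — no real memory is managed; sizes are abstract “bytes”):

```java
import java.util.Iterator;
import java.util.LinkedList;

// First-fit free-list search, falling back to bumping the frontier pointer.
// Illustrates why free-list reuse hurts allocation time while bump-the-pointer
// allocation is nearly free.
public class ToyAllocator {
    static class Hole {
        int addr, size;
        Hole(int a, int s) { addr = a; size = s; }
    }

    private final LinkedList<Hole> freeList = new LinkedList<>();
    private int frontier = 0; // "bump-the-pointer" position

    public int allocate(int size) {
        Iterator<Hole> it = freeList.iterator();
        while (it.hasNext()) {            // slow path: search the holes first
            Hole h = it.next();
            if (h.size >= size) { it.remove(); return h.addr; }
        }
        int addr = frontier;              // fast path: carve from the frontier
        frontier += size;
        return addr;
    }

    public void free(int addr, int size) {
        freeList.add(new Hole(addr, size)); // freed block becomes a hole
    }
}
```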

calendar spread – basics

Key point — sell time-value before it drops (to zero)

Take a PUT-calendar-spread for example.
– Sell (write) the near-date put
– buy the far-date put

An extreme case is easier to comprehend. The near-date instrument loses value (much) faster; while the far-date instrument barely loses value.

In a calendar spread portfolio, vega roll-up is meaningless. See http://bigblog.tanbin.com/2011/11/vega-roll-up-makes-no-sense.html

Key point — treat the near-date instrument as a perishable fruit — sell it! But to reduce delta exposure, buy the far-date instrument.