spring grandiose jargons

service — is basically a CLASS (B) used by your class (A). In Spring, this service becomes an interafce.

service — is often an instance field in your class.

dependency — is basically an instance field. If class A has a B field and A uses some B functionality/service then A functionally depends on B.

^^^^ now we have seen the worst jargon overdose. A regular instance field is now designated a “service” and a “dependency”. And the 2 new terms have different usage contexts. Sometimes you call that field a “dependency”; sometimes you call that same field (or its class) a “service”; but you never call it a “field”

wiring — a serious app need to wire up hundreds of classes — HAS-A. Spring uses an xml config file, lots of interfaces, and an injector for this wiring.

trinity — for dependency injection, you need minimum 3 classes, the service, the client (your app) and the injector. Spring wants a third party to inject the dependency into the client.

In the extreme implementation,
* just about every field is declared an interface type
* every such field has a setter
* you seldom see new ..(). Bean Instantiation happens at appCx start-up time, by “Hollywood”

left join == left outer join

“The ANSI outer-join syntax begins an outer join with the LEFT JOIN, LEFT OUTER JOIN, RIGHT JOIN, or RIGHT OUTER JOIN keywords. The OUTER keyword is optional.”  ==> LEFT implies LEFT-OUTER

“ANSI join syntax also allows the dominant or subordinate part of an outer join to be the result set of another join, when you begin the join with a left parenthesis.”
right join is lg2


Hi Raymand,

You mentioned documentation. Documentation is not only important for users, but also important for you to get a complete understanding of the entire project. After writing the documentation, hopefully you will be able to better describe the system to a new person such as your next client or a new interviewer.

I think in US and Singapore, many interviewers respect job candidates who understand past projects in-depth.

How stable is your job? Are you learning some new technologies such as java, dotnet, SQL? I recently improved my understanding of SQL joins, correlated queries, java OO, java exceptions, java threads, jsp/javabeans, servlet sessions… Now I feel like a stronger survivor.

a couple of realistic sorting challenges

Update: [[algo in a nutshell]] written by professors, has practical problem-solving suggestions.
Q: what if the storage system holding the long list doesn’t support random access, such as tape?
%%A: read into memory first, by chunk, then merge sort?

Q: what if there are too many items to hold in a computer’s RAM available within our budget? merge sort, but the bucket idea below is better.

Q: what if data is still arriving when we need to start sorting?
%%A: standard merge sort
%%A: if you know data is string and distribution is fairly uniform, you may want to create 26 (or 26+10) trees for A – Z, 0-9 and put incoming data therein. If data rate is too high, you can build 26 queues/circular arrays as the buffer in a producer/consumer pattern.
%%A: even with numeric input, you can do the same provided you know the distribution. In real system, use histogram to estimate the distribution.

I feel this is one case that needs real world experience. Designing a sorter without any idea of distribution is too academic and a waste of time.

what i can do to reduce the pain of adjusting to U.S.

Hi LT,

You asked how to reduce the pain and stress.

  1. keep healthy. Lving in such a difficult place, every little effort makes me feel good about myself.
    1. Remember that many advertised fast food is less healthy than we eat in S'pore
    2. avoid too much unhealthy snacks esp. during long office hours. tough for some people. I try to bring fruits and “healthy” biscuits
    3. regular exercise. tough for most people.
    4. take breaks during long work hours
  2. help my wife who is under more pain than me
    1. find some full-time activity to occupy her time
    2. spend quality time together
    3. try to get her to make new friends
    4. try to get her to become interested in learning English
    5. try to get her to find some new hobbies
    6. live around Chinatowns, where she can find Chinese food
  3. work long hours and feel good about it. This helps my adjustment.
  4. make new friends. I didn't find any time for this, since i was so busy with job and a million other “important” things.
  5. call home regularly
  6. increase income
    1. a mobile phone that can receive Chinese SMS from our families in China. We can't afford it now.
    2. better accommodation
    3. less time wasted on commuting. Extremely important to me.
    4. my wife will afford many foods that are simply priced beyond us
    5. more sight-seeing. important for my wife. There are so many places to see in this vast country.
    6. perhaps a more suitable training course for my wife. Right now many of them are just too expensive
    7. some new clothes for her

hash search again, briefly

Q: how can hash search be O(1) when the separate chaining gets longer and longer as input size grows?
A: hash table length grows in proportional to input size. I guess expected separate-chaining length doesn’t get longer but is independent of input size.

Most hash functions assume positive integer keys. If u have some other inputs, convert them to natural numbers.

Java probably uses MAD (multiply-add-divide) compression after hashing.

lopsided subsumption in java overriding and polymorphism

Say class C.java extends class B.java and wants to override the inherited method m(Number).
* Overrider’s return type must be a subtype of parent’s. Any “client” of B must not be surprised to get C’s return values.
How about args? You might think child’s parameter could be a super-type of Number, such as Object, but no no no. Overrider’s parameters must be exactly identical to parent’s parameters.
==> Bottom line: subsumption applies to return type, not parameters.
Remember Kevin’s quiz question in Feb 2010? To get polymorphism , meet all its strict criteria. Always use @Override to verify.

binary search tree

defining property: any node to my left is smaller or equal. left child = parent. Both children is allowed to be equal to parent — binary search tree is completely unbiased.

The defining property defined visually: For each node, draw a vertical line (long shadow) through the tree.
* Left descendants’ v-lines are on or inside my v-line.
* Right descendants’ v-lines are on or outside my v-line.

When nodes move up or down the three, their shadows shift. I think this may help us visualize a BST successor, deletion and insertion.

the same group of items can build 2 different bs-trees, with different tree heights. How? The same entries could arrive out of sequence, just like IP packets.

The deeper the tree, the more iterations for a tree walk, the slower the sort.

Worst case bst has a tree height equal to input size, with one branch at each node?

a less ambiguous explanation of big O

–If you can only remember one phrase, remember “bound-from-above”
If we say “Algorithm R is O(lg n)”, then number-of-iterations are bound-from-above by lg n multiplied by a constant.

–If you can remember a 2nd phrase, remember “worst case upper bound”
Worst case upper bound is often different from average-case upper bound of a program’s running time. For quicksort, they are O (n-squared) vs. O(n lg n). If we talk of O() in the absence of *-case, it means worst-case-upper-bound. A really long translation of “Algorithm R is O(n-squared)” would be

“iterations are bound from above by n-squared even for the worst-case input array of size n.”

A shorter version: “At most n-squared iterations for the worst input”

Average-case-upper-bound calculation often (if not always) involves probability. Amortized analysis is probability-free.

trie: IS-A n-way tree


  • descendants (not only “children”) from a particular ancestor (i did not say “parent”) share a prefix. In fact, the prefix is often labelled on the ancestor node’s uplink (umbilical cord).
  • height of a trie is the length of the longest key (length=k) in the trie. Example: The key could be 55-character long
  • worst case search? O(k) ?
  • sort by what kind of tree walk (remember all keys are strings)? preorder
  • one node for every common prefix (http://www.nist.gov/dads/HTML/trie.html). one-to-one mapping



storing strings


associative arrays


space-efficient, since most (if not all) prefixes are shared and labelled on the uplinks

bitwise operators, my summary

1 XOR an unknown bit among neighboring bits => TOGGLE the bit

1 AND an unknown bit among neighboring bits => SELECT the bit to examine.

0 AND an unknown bit among neighboring bits => RESET the bit to 0 and ignore it among the neighboring bits

1 OR an unknown bit among neighboring bits => SET the bit to 1

To test equality of 2 “trains” of bits, xor them. 0 means equal.


hibernate notes, briefly

Hibernate was designed for multiple usage scenarios including an application server cluster.

Hibernate are usable in Java Swing applications, or J2EE applications using Enterprise Java Bean (EJB) session beans.

Hibernate XML mapping documents can also generate database table and constraint creation scripts. Remember the hbm2ddl.auto (spelling?) element in hibernate.cfg.xml?


  • designed for string (ie alphanumeric) keys. Simplest example could be decimal integers
  • sort order is the English dictionary sort order.
  • nearly optimal performance with low implementation effort
  • In insert time (or tree-build time), each digit (or character) of the key is used at the next branching node to decide to move left or right.
  • first digit is used to compare with the root node’s first digit? 2nd digit is used to compare with the 2nd digit of the 2nd generation branching node?
  • Unlike binary search tree, nodes on my left are not always smaller, but why???
  • comparable insert/search performance to read-black tree, but with an implementation as easy as binary search tree.
  • A trie forms the fundamental data structure of Burst-sort, which (in 2007) WAS the fastest string sorting algorithm. A faster (some radix sort) — fastest sort for:one str; arrayOfStr; 32bit ints: non-comparison 
  • Digit can also be a Bit, but still different from a binary search tree. To understand, you may need a reasonable understanding of binary search tree first.
  • BST requires key-comparison at each branching node; DST uses quicker digit-comparison?
  • DST is not a common term. Could be related to the more common radix-tree and “trie”

unmatched primary-key is normal in 2nf

In a normalized (2nf) design, given

  Warehouse {id (prikey) , address, manager, established ….}

many tables (Drivers, Customers, Suppliers…) will contain warehouse_id as a forkey. This forkey can often become a Subset of the prike

Example: When a brand new warehouse is empty and not used (=> unmatched by any forkey) , we still want to keep its address in our database.

In performance-tuning, To denormalize “down” from 2NF, you might merge the warehouse table into

  Stock {item, warehouseName, address …} where the composite key is item-warehouseName

I think this reduces data page access to nothing but the Stock data page, without hitting any Warehouse data pages.

Problem: When you delete the last item from a very very old warehouse, you lose its address.

null forkey ^ unmatched prikey

–simple illustration:
emplyee {employee_id, department_id, age, salary…}
department {department_id, size, average age…}

null-forkey: employee without department_id
unmatched prikey: department without employee

— null forkey — perhaps not a case for outer join

“null in forkey column”. referential integrity violated.


Q 2a: One of the MSP (member of parliment) has no party (null forkey, violating referential integrity). How do you include him in a member-and-party listing?

A: select msp.name, party.name from msp, party where msp.party=party.code
select msp.name, null from msp where
msp.party is null

Neither makes sense:
wrong: msp.party=party.code(+) — no effect. How can party.code (prikey) be a SUBSET of msp.party (forkey)? referential integrity violated

wrong: msp.party(+)=party.code — showing empty parties

— unmatched prikey — #1 common scenario for outer join

“forkey = subset@prikey”

Heart of every outer join is a subset relationship. If the 2 join-columns are prikey-forkey, then the smaller set has to be the forkey — referential integrity. An outer join will show ALL the rows from the prikey side, whether or not there’s a matching row.

where employee.employee_id = bonus.employee_id(+) — “+” on the forkey side

O(n) sort – radix sort

The principles are relevant to pure-algo interviews.

Q: quicksort is universally usable. How about radix sort?
A: not widely mentioned as a drawback against quicksort. (In contrast, I think Counting-sort is not universally usable…)
A: wikipedia says integers could [1] represent limited-length strings of characters (e.g., names or dates) and specially formatted floating point numbers, radix sort is not limited to integers.

Radix sort is good for parallel computing. See http://www.drdobbs.com/parallel/parallel-in-place-radix-sort-simplified/229000734

Radix-sort Expected running time is faster than quicksort? Could be but….
… but the hidden constant factor in O() is much worse than quicksort.

[1] It’s hard to prove “cannot”

KMP algorithm for string search

Q: Why is std::search() and string::find() use simple O(N*M) solution and not using kmp?
A: kmp requires extra memory to store the pre-computed data structure. Allocating the storage is likely more costly than the big-O savings,
A: in the common use cases, the strings are not super long and there are not many repetitions in the pattern, so kmp will not help a lot
A: therefore, overall, the simple implementation is faster, even though big-O is inferior. I believe big-O comparison really compares large values of N
–prefix-function in one sentence: “longest border of a partial-match needle[1..j-1] that is not followed by the character needle[j] “. In other words, we are looking for the largest integer k such that

* needle[1..k-1] is a border of needle[1..j-1] and

* needle[1..k] is not a border of needle[1..j].

needle[1..j-1] is a “partial-match” and needle[1..k-1] is “border” of the partial-match. The border occurs at both ends of the partial-match.


struts as a dispatcher

“request dispatcher” or “request router”.  Explanation in one sentence:
“A traffice-cop that directs incoming requests [note 1] to places [note 2].”
[ Note 1: perhaps instructed by another struts component ]
[note 2: to other URL’s. Sometimes we talk of “to some processing components” ]
In this model, afbeans can become irrelevant whereas afward COMPONENTS become critical. In this model, u need to know the various means to add parameters (name-value pairs) during the dispatch.
This discussion is important only for AR. iview?

php AR q&&a

query cache ] pdo?
session expiry?
persistent db connection?
db cache?
connect to a db cluster?
cluster between 2 php servers? with session failover?
php web server, static web server, proxy
multi-threading issues?

Q: Remember the NBC invisible dual-host cluster with different date? How do you duplicate the sessions? Or you don’t because each session is needed only in one host due to session-based load routing?

[07] finite automata for string matching

–First, Take the pattern as an array [Note 4]. You can move a pointer to between any neighbours [note 1], to indicate how many leading characters [note 2] match at this moment.

There’s a rule — pointer can move forward (ie rightward) only one position at a time. However, the pointer often jumps backward (leftward).

I call it the s-pointer. It’s a secret pointer. It’s also a state-pointer as it determines the state the automaton is in.

[ Note 4: array of chars, laid out left-to-right ]
[Note 1: Each position of the pointer is one of the so-called states. ]
[Note 2: of the pattern. The “leading charaters” form a so-called prefix of the pattern ]

–Once you understand the s-pointer, you can now appreciate how s-pointer creeps forward or jumps back as you “eat” input characters one by one.

As you move rightward (“eat”) through any text [Note 3] one char at a time, s-pointer creeps forward or jumps back to indicate how many leading characters of pattern match at this moment.

[ Note 3: the text can grow indefinitely. Our automaton just keeps shouting “Match, Match” ]

matching a stuck-record^nopainnogain

a stuck-record is something like “i love fish i love fish i love fish “.

Q: How do u match a stuck record?
A: /(i love fish )\1+/ seems to be the standard solution in literature

On the other hand, “no pain no gain” are not stuck-records but a …. excel column list — AB-AC-AD-…?

Q: Will \d{3} match 287 ?
A: Yes. see p 177 [[ programming perl ]]. Looks like \d behaves like a wildcard just like the dot

Q: will (\d\d){2} match 7193 ?
a: yes

american well questions

list^set difference?
how do u prevent duplicates in a set?
why we need both equals() and hashcode()? (within separate-chaining …)
php caching?
ejb advantage over spring?
why not host biz logic components in the same jvm as web components?
jpa can employ hibernate?

fail-over with pairing cluster, where one a pair of nodes share state data. hot copy
spring XA? a container is needed
ejb3: annotation-based security and tx
ejb: usually generated. easy to make mistakes

jruby — briefly

jruby implements most of the ruby features.
Support for interacting with and defining java classes from within ruby.

Q: j4 jruby
A: using Ruby for tools and tasks where Java only incurs overhead is a
very compelling for Java developers

Q: How do I use it if i already know how to program and run a ruby program?
A: you should be able to just use your Ruby scripts with JRuby

85*85 45*45 115*115

For all of them, the last 2 digits are “25”. To work out the Other digits,

1) take the input’s leading digits before “5” — 8 in this case
2) multiple it by it’s bigger brother — 8 * 9

For 115, “the other digits” are 11*12 = 132, so 115*115=13225
For 175, “the other digits” are 17*18 = 289+17 = 306

perf techniques in T J W’s project–ws,mq,tx

Q: request wait-queuing (toilet queue)? I know weblogic can configure the toilet queue
A: keep the queue entries small. we only keep object id while the objects are serialized to disk (?!)

Q: is 1kB too large?
A: no

q: most common cause of perf issue?
A: mem leak. still present after regression test

q: jvm tuning?
A: yes important, esp mem related

q: regression test?
a: important

q: perf tools?
a: no tools. primarily based on logs. eg. track a long-running
transaction and compute the duration between soap transaction start
and end.

Q: web services?
A: Many of the transactions are based on soap, axis. TCP monitor
can help with your perf investigation.

Q: tx?
A: yes we use two phase commits. Too many transactions involved.
really complex biz logic. Solution is async.

Q: multi-threaded?
A: handled by weblogic.

Q: how is the async and queue implemented?
A: weblogic-mq with persistent store, crash-proof


To impose the validation rules, the standard Struts ActionForm won’t suffice. For this to happen the ActionForm class designed specifically for the Validator framework must be used. It comes in two varieties- ValidatorForm and DynaValidatorForm. Two methods are present in both of them- reset() and validate().

struts validator.xml

It is here that the rules defined within the validator-rules.xml that the rules are applied to the data fields of ActionForm objects.

validator.xml is the “link table” linking the rules with the afbean fields. (A: the other xml file, which defines the validators outside the context oof data fields)

validator.xml is Perhaps the 2nd step in swallowing this elephant, just like struts-config.xml to struts. The first step is …..

types of php var scope#q

“private” keyword? probably not a scope modifier but an access modifier? Scope refers to visibility.

“static” keyword within functions? visibility unaffected

( Maybe it’s easiest to list all of them as a whole. )

“global” keyword?


default scope of var in function?

default scope of var outside function?


interviewers like our knowledge or our hands-on xp@@

Hi JunLi

(to be published on my blog) Friday night i said in MSN chat that the all-important thing in the US job interview is textbook knowledge. With good knowledge, we pass. Without it, we fail. Knowledge is different from hands-on experience.

I was paid tens of thousands of dollars to develop software for real users, coding in C, weblogic, spring, hibernate, java transactions, ejb, jms, java threads …, but I dare not tell interviewers I know them well. I am afraid of their tricky questions.

Ironically, in real projects I have zero (or near-zero) hands-on experience coding hash functions, quicksort, outer joins, deadlocks, explicit Locks … but i gave convincing textbook answers to such interview questions.

Struts is an interesting case. I did use struts and something very similar to struts. Real hands-on experience. But interviewers often ask about things I didn’t need to know to use struts. Paradoxically, I was able to answer some struts questions based on my reading of struts books, even though i never used those features. In a nutshell, textbook learning helped my struts interviews; hands-on experience didn’t.

US work xp vs Singapore work xp


(to be published on my blog) Thanks for your questionnaire;-) over the phone. We went over some important topics today. To re-iterate some of my opinions,

I suggest you choose between only these 2 priorities
– find a partner
– spend a few months getting a job , any job, in US, and then look for a better one
* Your investment activities should not be a priority and should not be your focus. Such a focus is short-sighted.

I understand your highs and lows in terms of enthusiasm with dating. Once you fall off the horse, get back on and keep going. Don’t stop. Time is not on your side.

I now have many family constraints on my career choices. I used to see it as a burden but now I see it as a man’s duty. Even though I don’t seem to practice what I preach, I do feel “Family is more important than work.”  Family problems are more serious and more painful than job set-backs. Job compared to family is like a backpack adventure compared to a long long train journey or a financial investment program compared to a savings account. There are inherent risks in most careers, but family is something I grow with care and feeding.

I agree that any US work experience is well-recognized in other countries whereas Singapore work experience (including leadership experience) is seldom recognized in US. Many Singapore team leads come to US and take on non-lead roles.

Thanks for your story of the ex-colleague growing his confidence.
* I will work on strengthening my foundation and my confidence.
* I will work on understanding interviewer’s perceptions and growing my confidence.

Thanks for your detailed illustration of his stance on the company’s side whenever there’s a decision involving workers vs company. Good insight. I will learn this technique.

Thank you for your valuable insight about UBS “approved softwares”. I now see this  policy can leave programmers with a poor understanding of the internals of those “customized” packages.

google phone IV — mostly algorithms

Q: bookmark system design: DB schema, insert/search

Q: implement the fastest
bool isAnagram (string1, string2)

Q: for any user-input word, quickly find all anagrams in a given dictionary (pre-loaded into memory). You own the pre-loader, so could pre-load any data structure you like.

%%A: use counting sort to derive a “key” (like a hashcode) for every word in the dict. Use a 2-column table to hold the word and its key. keep the dict sorted by the key

O(1) constant complexity means ….

1st type of time-complexity to study is the “linear growth” — O (N). As list size N tripples, running time tripples.

“constant complexity” means …. As list size N grows, running time doesn’t grow with N. It’s independent of input size. I think running time may still vary as list grows, but much slower than log N .

Example: somehow, hash search has a constant complexity for search and insert, independent of list size or hash table size(?)

hash search, briefly

hash function produces a table *address* for each input key — no comparison required. Therefor hash represents the 2nd major type of search algorithm, after comparison-based algorithms like Binary Search Tree, or red-black tree

usually the fastest choice for symbol table search

widely used.

constant time for insert and search, but not some other operations like … sort.

hash output should be evenly distributed across table addresses.

modular hashing should use large prime numbers but why? P174 [[java generics]]

modular hashing need adjustment for long alphanumeric keys cos hardware can’t handle 4049040490934 bits for a single variable. Perhaps apply mod-multiple(by 256)-add to (MSB, next MSB, next next MSB..)

Step 1 of hashing algorithm: hashcode(key) to convert key into table address
Step 2 of hashing algorithm: collision resolution — separate chaining. One separate unordered linked list per shared table-address

vmware IV questions

what’s a race condition?
how do u deal with it?
what’s a deadlock?
how do u deal with it?
what objects do u use in ajax for an async operation?
what are the 5 ready states?
what’s the difference between msie’s implementation and firefox
what’s a hash table, not a HashTable
how do u implement a hash table to maintain constant time-complexity for retrieval?

#1 error when selecting tuples failing a predicate

Q: write a SELECT to
– show those … (ie tuples) which are not …
– show students who are not ….
– show accounts which do not …
– show ships which are not ….
– show shops which are not ….

All of these are similar. Usually you need a condition like

  where col8 …
  where col8 not like “%..”

Do you remember a common mistake you tend to make, assuming no nulls?

A: You are assuming each student is represented by just 1 row. Example: an enrollment table with columns for student_id and subject_code. One student can show up in many rows.

3 steps to understand "signing a msg" with a private key

1) understand basic meanings of your private key ^ your pub key.

If you can’t remember anything, remember to keep your private key very
very private, and publish your pub key to friends.

If you can remember one more thing, then remember

“sign with private key; verify with public key”.

2) understand “digsig” — an encryption of a msg, encrypted with your
private key

You often send the digsig to your receipients to certify something.
Receipients can verify your digsig using your pub key

3) understand what it means to “sign a msg”.

In the simplest case, it means “produce a digsig, and send the (cleartext) msg + the digsig”

Q: This process offers CIAn?
A: IAn but not Confidentiality since you actually send a cleartext msg

java q[volatile] briefly: visible, stale

imprecise words? ok
problem ] 1 sentence: variable updates in one thread could be invisible in another.

volatile ^ synchronized? Yes synrhonized can replace volatile, but it’s nothing to do with atomicity. Synchonized getter/setters has a side-effect — trigger a flush from register to main mem, achieving volatile’s functionality

[[ java concurrency in practice ]] p34 has a simpler example than [[ java threads ]]

“update visibility”
“stale data”

nested loop join

for now, focus on 1-to-1 mapping between tables.
  • oracle takes 1st row from driving-table
  • oracle looks for a matching row in 1st joined-to table
  • oracle builds a combined row containing fields from both tables
  • the combined row becomes input to the 3rd loop scanning the 3rd table
  • a final combined row is returned