ajax paradigm is imperfect

based on http://en.wikipedia.org/wiki/Rich_Internet_application

Ajax has recently been used most prominently by Google for projects such as Gmail and Google Maps. However, creating a large application in this framework is very difficult (echoed by the “lunch interviewer” in Google), as many different technologies must interact to make it work, and browser compatibility requires a lot of effort. In order to make the process easier, several open source Ajax Frameworks have been developed, as well as commercial frameworks.

Ajax complicates software testing activities. These complications lengthen the software development process.

[07] questioning accumulation as sys architect

You have repeatedly focused on accu.

@55, if u accumulate 5 years of lead-architect xp, it’s likely to help tanbin’s sy1 (tanbin is suitable for it)
* even though in 2029 sg AR outside finance may earn less than many non-AR roles in US
* even though many of your peers might be in more glamorous roles in other countries earning more
* even though process differs from telco to telco — xp
* even though once-dominant technologies subside. Consider Raymond, Software AG, Novell, Notes, Informix…
* even though in many jobs and in many domains, accu is /elusive/
* even though — xp — you were frequently distracted by dotcoms, sunrise domains …
Q: how fast to become a large sys AR?
If it takes years, then let’s accumulate now. Don’t diversify into PHP, DBA or SAP
Q: if not in US, then in SG?
perhaps lower pay than a non-AR role in US. Not greed.
Q: u feel AR adds more value than other roles, but it could be an internal system or a new, unproven service?
A: Still, AR tend to add more value.

perl one-liners for xml

based on http://www.xml.com/pub/a/2002/04/17/perl-xml.html.

Find all section titles in a DocBook XML:

# cat files/mybook.xml | xpath //section/title

Retrieve just the significant text (not including nodes containing all-whitespace) from a given document:

# cat files/mybook.xml | xpath "//text()[string-length(normalize-space(.)) > 0 ]"

Save the entire data stored in the ‘users’ table as a huge file users.xml:

# sql2xml.pl -sn myserver -driver Oracle -uid user -pwd seekrit -table user -output users.xml

Pretty-print a bad xml file:

# cat overwrought.xml | xmlpretty > new.xml

Use the built-in HTML parser to convert ill-formed HTML to XML before further processing:

# xmllint --html khampton_perl_xml_17.html | xpath "//a[@href]"


parallel source tree junit #open q

See http://junit.sourceforge.net/doc/faq/faq.htm#organize_1

keyword: junit
keyword: default “package” access — such members would be testable


Q: If they all have “package com.verizon;”, would jvm see them in the same package?

Q: in a distributable jar, would they go into a merged /com/verizon directory?

%%struts quiz

brief points? ok
j4: keep for review in 2 years

[g=generic servlet/jsp issue]

  1. [i] error-handling is relatively simple but imp2iview. What kind of entities read ActionErrors and how are they used? views?
  2. ActionForward ^ ActionForwards ? P 113
  3. where does Struts use reflection and introspection? P 114
  4. do you need to manage sessions or struts does it for you?
  5. [i] what kind of data structure to hold multiple errors in ActionErrors (No need to be exact) ?
  6. how2get an afbean from a mapping object? Is the afbean “contained” (HAS-A) in the mapping obj? I think when the afbean object exists in JVM, the mapping obj provides a path to it.
  7. [gi] when does a session start and in which method? probably by ActionServlet
  8. [i] signature and usage of findForward()? defined in which class? P 741 [[ ?? ]]
  9. [gi] is ActionServlet a singleton?
  10. which object invokes the afbean’s validate()?
  11. how does execute() retrieve the afbean? from the argument mapping obj or from the request argument?
  12. how2get formbean from req? when ? when req is forwarded to a non-struts servlet or another struts servlet?
  13. [i] which method(s) return ActionErrors (see the trailing s), following which well known design pattern?
  14. let’s zoom into Struts’ usage of dataSource class in standard java. which struts components interact with a datasource?
  15. [g] is ActionServlet a single-threaded servlet, and need many sisters to service simultaneous requests?
  16. [gi] How is ActionServlet thread-safe?
  17. session in struts is same as session in other servlets?
  18. <action input=…. means?
  19. <action name=…. means?
  20. struts connection pooling? no answer
  21. beside global forward and local forward, there’s a 3rd way to DEFINE an aforward. P189
  22. which validator xml file contains the javascript function? What’s the other major content in the same xml? name of a method?
  23. controls the (non-validator) validate() method u create in your afbean, but what if u use validator?
  24. 1 basic use of var in validator.xml? p378
  25. #1 most basic validator in the validator package? how do u configure it in validator-rules.xml? p375
  26. aforward obj is returned by which method to whom? How is it used?
  27. [gi] in which method do u do jdbc connect? perhaps a M object not defined by struts
  28. why execute() need the map? local forwards?
  29. [g] how does a non-struts servlet retrieve afbean from the request (received by forward)? P221
  30. Without struts-validator, what page is returned to browser upon validation error? How do u make this take place by configuration?
  31. biggest 2 struts extensions each can be used outside struts, and each defines its taglib. What’re the 2 tag prefixes?
  32. what return type do u use with a formbean getAmount() ? P155
  33. simple yet important AR question: which obj calls execute()? P 741
  34. where 2 spell out a regex? how 2 pass it into a validator
  35. can take one of 3 { include, forward, ???? } I think it’s type
  36. ignore validator for now. afbean usually defines an public instance method validate(). what’s the signature? revealing! P740
  37. [i] simple yet tricky question: exactly which object (loaded into the servlet container) acts as the router, inspecting and dispatching requests to other objects?
  38. signature of execute(). AR-revealing
  39. [gi] how does the query result get into the view? aforward? req obj? P 741 [[servlet]]
  40. [i] which obj auto-populates a formbean? ActionServlet?
  41. where is the Model object instantiated? P 741 [[ servlet ]]
  42. is the mysterious but important ActionServlet (full class name) mentioned in any prominent place? Hint: along with struts-config.xml. P 743 [[servlet]]
  43. during request processing flow, describe when the mapping object is loaded and how it controls each module such as form population/validation, forward, custom action etc. P47.
  44. example of httpSession in struts? P79

migrate between jsp & php

Hi LS,
Ideas for the migration, if migration is on your mind.
Inside or outside the database,
* squeeze as many business rules into DB as possible. Unique constraints, referential integrity (triggers), check constraints, input validation constraints for insert/update/delete, not-null constraints, stored proc, triggers, user-defined functions …
* Without an enormous effort, you can move every single SQL out of java/php/asp source and into some (xml) “query-listing” file. Every SQL sent to the DB will be taken from this listing. Performance penalty can be managed with some kind of simple caching, which I think can be quite effective.

* Many if not most jsp projects encourage a DAO pattern, sometimes quite elaborate or sophisticated. PHP supports the DAO pattern in PEAR::DB, albeit lightly. It helps your migration. However, small PHP sites may not warrant the extra effort.

* Avoid Object-Relational-Mapping tools such as Hibernate or iBatis, since PHP isn’t comfortable with ORM yet.

With regard to so-called “presentation layer” or page templates
* smarty and JSP-EL/JSTL are highly recommended. They are simple and straightforward. With them you can create templates free of business logic. The only “programming” remnants left in the template are simple variables, and the simplest loops and if-then-else, all easy stuff to migrate between jsp and php.
* squeeze as much business logic into javascript as possible. Over the years, more and more functionality has become transferable from server-side to javascript.
* Not sure if AJAX is necessary for your site. AJAX requires you to create special server-side components. The browser-side ajax function communicates with the server-side components. Server-side components are coded in php or java or asp.
Lastly, we finally turn our attention to the jsp/php files. Even with the DB/presentation layer hacks, some, perhaps the bulk, of the functionality may remain in your java/php source code. You need to think of server-side software design ideas that are transferable between jsp/php/asp. Many things to consider. Here are a few trivial items.
* avoid servlet/jsp forward. I don’t think php supports it.
* PHP guys tend to put a few related functionalities into one php file, motivated by maintainability, readability and other reasons, whereas jsp/servlet tends to separate them into individual jsp files (even separate javabean/servlet classes). The java approach is viable in PHP too, and helps migration.
* If you need automated testing, consider black-box test tools. These are transferable and agnostic to the server-side language, be it jsp or asp or whatever.

sum(field_with_all_null) == null

http://www.oracle.com/technology/oramag/oracle/05-jul/o45sql.html shows how aggregate functions react to empty sets ie “no entry except nulls”.

COUNT returns zero whereas AVG, SUM, MAX, and MIN return null.

If you have no values to count, it’s fair to say that you have zero values, whereas you can’t really come up with, say, a maximum value without at least one value from which to choose. We can make a reasonable argument that SUM should return zero instead of null, that the sum of no values is zero, but Oracle’s implementation of the behavior we describe here is fully compliant with the SQL standard.

WallSt interviews cover 1% of java topics

Hi LS,
I said wall street asks only 1% of java topics so a java candidate needs to know close to nothing about the other 99%. This is perhaps close to the truth but potentially misleading.
* I have 2 books [[ java threads ]] and [[ concurrent programming in java ]]. Wall street multithreading interview questions require 0.1% to 0.5% of the knowledge in them.
* If you look at the published java Collections API, again wall street java collections questions require less than 2% of that body of knowledge.
* Similar for servlet/jsp, sql joins, and garbage collection.
However, 1% still requires months of focused study. I'm still weak in design patterns, GC and SQL tuning
tan bin

TreeSet won’t keep insertion order

When you see tree*, think of “sorted”. In java, Tree* includes TreeSet, TreeMap…

I believe the Tree* collections in java are always kept sorted. (Not every tree in computer science is sorted, e.g. a binary heap is only partially ordered.)

A collection (set or list or queue …) is physically stored in one order only [1]. The physical storage order can only follow one of the following:
* sorted
* insertion-order maintained
* parent-child relationship
* etc

It’s now obvious that Tree* constructs don’t keep insertion order.

[1] Same as Clustered Index
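
A minimal sketch of the point above, using only java.util. The class name OrderDemo and the sample words are made up for illustration. The same three strings go into a TreeSet and a LinkedHashSet; the first iterates in sorted order, the second in insertion order.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class OrderDemo {
    // Inserts the same three words and returns the set's iteration order.
    static List<String> fill(Set<String> set) {
        set.add("pear");
        set.add("apple");
        set.add("mango");
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        System.out.println(fill(new TreeSet<String>()));       // sorted
        System.out.println(fill(new LinkedHashSet<String>())); // insertion order
    }
}
```

If you need both "sorted" and "insertion order" views, you probably need two collections, in the same way a table has only one clustered index.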

SoftReference^WeakReference : %%intro

First understand weak ref. It’s more useful than soft ref. [[hard core java]] says many JVMs treat soft ref just as weak ref, but nowadays JVMs tend to aggressively extend softRef lifetime.

1-word intro for Mr Soft: middle. Its strength level is between regular ref and weak ref. Weak ref is the weaker of the two (a phantom ref is weaker still).

— based on http://www.axlrosen.net/stuff/softreferences.html

  • Q: What’s the difference between SoftReferences and WeakReferences?
  • A: In terms of lifetime …..  SoftReferences are extended aggressively; WeakReferences are released aggressively;
  • A: Use WeakRef if release is desirable; use SoftRef if release is undesirable and should be postponed if possible
  • A: Use WeakRef if memory is more precious than the object; use SoftRef if memory is less precious than the object so you keep the object for contingency.
  • A: Both types are suggestions to GC. Using a SoftReference tells the garbage collector that you would like it to keep the object around for a while, until memory pressure causes it to reclaim the object. By contrast, using a WeakReference tells the garbage collector that there’s no reason to keep the object around any longer than necessary, so it can reclaim aggressively.

Usage?  SoftReferences are primarily for implementing capacity-sensitive caches. WeakReferences are mostly used in WeakHashMap, primarily for associating extra information with an object (that’s what WeakHashMap does). http://stackoverflow.com/questions/154724/when-would-you-use-a-weakhashmap-or-a-weakreference has one answer showing such an example.
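
A small sketch of the WeakHashMap usage described above, "associating extra information with an object". WeakDemo, NOTES and noteFor are hypothetical names. What happens after System.gc() is not guaranteed, so the last print may or may not show 0.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

class WeakDemo {
    // Extra info keyed by object; an entry does not keep its key alive.
    static final Map<Object, String> NOTES = new WeakHashMap<>();

    static String noteFor(Object key) {
        return NOTES.get(key);
    }

    public static void main(String[] args) {
        Object key = new Object();
        NOTES.put(key, "extra info");
        WeakReference<Object> weak = new WeakReference<>(key);

        // While the strong reference "key" exists, both lookups succeed.
        System.out.println(weak.get() == key);
        System.out.println(noteFor(key));

        key = null;   // drop the only strong reference
        System.gc();  // only a hint; the entry MAY be gone afterwards
        System.out.println("entries left: " + NOTES.size());
    }
}
```

Swapping WeakReference for SoftReference in a cache would ask the GC to hold on to the object until memory gets tight instead.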


batch jobs in financial trading system

According to a friend in investment banking technology, ALL trading systems need batch jobs to complement online applications. I think MQ applications are a third type. Typical batch:

* save in DB historical volume/day-high/day-low/day-open/day-close … — “open” information open to the public

* save in DB all market players’ trades. That’s my own terminology referring to “our own” hedge funds, other firms’ hedge funds, other firms’ traders… Our own traders’ activities are probably captured during transaction — no batch required.

* We (the brokerage) may also have large institutional clients whose data need to be recorded in DB. Such data may need batch processing if it is not recorded automatically.

listing customers with 0 payment history

listing customers with 0 payment history (according to the Payment table) — A simple anti-join, well covered in literature.

Customer_table (cid…) with primary key on cid
Payment_table (cid, date, amount ..) with non-unique index on foreign-key cid

S1 (uncorrelated): select C.cid
from C
where C.cid not in
(select distinct cid from P)

S2 (correlated) — thanks to index, no FTS:
select C.cid
from C
where not exists
(select * from P
where C.cid = P.cid)

[1] One of the first key observations is the subset relationship between the prikey C.cid and forkey P.cid, and consequently the referential integrity.

[2] Second key observation is that P.cid is (perhaps heavily) repetitive, if customers pay a few times a day (for a subway pass). I think it’s imperative to use the index on P.cid.

[3] Third key observation is the absence of additional criteria => have to process every C.cid. In most real life queries, another where-criterion restricts us to a fraction of the C.cid values. In that case P.cid FTS is bad.

S1 subquery runs only once and reads every Payment_table page exactly once. S2 subquery runs once for each [4] row in Customer_table, reads every index entry but never reads the Payment_table. Due to caching, S2 loads every index data block exactly once.

[4] I think the repetitive run of the subquery is not as bad as unnecessary disk reads. Number of disk reads dominates SELECT performance. What’s the disk vs memory speed ratio? Orders of magnitude! S1 reads all P.cid, S2 loads the entire index on P.cid. Which reads less? Also see http://www.onlamp.com/pub/a/onlamp/2004/09/30/from_clauses.html.

In any case, the correlated S2 always reads every distinct P.cid that matches C.cid, which is every[1] distinct P.cid /value/, but I think S2 reads only index, not the table. In most real-world queries, an index scan precedes a table read.

It is possible that FTS on P requires almost the same number of disk reads as a full index scan, if P.cid is close to unique. Without [2], disk reads related to P are identical between S1 and S2.

S3 (using distinct), no FTS on P, probably beats the anti-join recommendation in [[ oracle sql tuning ]]:
select distinct C.cid
from C left outer join P on C.cid = P.cid
where P.cid is null

S4 (using count): select C.cid
from C left outer join P on C.cid = P.cid
group by C.cid
having count(P.date) = 0

S5 (using sum): select C.cid
from C left outer join P on C.cid = P.cid
group by C.cid
having sum(P.amount) is null -- not 0. P 93 [[ intro to sql ]]

S6 (using minus):

[07] google-style interviews: apply comp-science Constructs on realistic problems

Hi LS,

Now I think yesterday’s toughest interview questions are similar in spirit to Google interview questions. I think the key skill tested yesterday was “how does this candidate apply computer science constructs to real world problems”. That’s what I meant when I said “connecting the algorithms/data-structures with the given requirements”. Until you can identify an algo/data-structure to model the requirements, you have no clue how to support the requirements. Whenever we see a first-in-first-out system like Green Card applicants, we automatically think of a Queue data structure, ie connecting the Queue construct to a real world problem.

A 6-year veteran developer like me is supposed to be familiar with
1) associative arrays, hashmaps, sets, binary search trees, doubly linked lists, multi-dimensional arrays and (hopefully) circular linked lists, binary heaps, red-black trees, and any combination of the above like a stack of hashmaps,
2) the concepts of space efficiency, time efficiency, random access, sequential access, what can/can’t be sorted …
3) OO concepts of dependency, coupling, polymorphism, static, multiple inheritance, HAS-A vs IS-A, setters (extremely important) …
4) OO design patterns

But this veteran may not know how to APPLY these constructs to a real-world problem, according to my observation. All of Google’s and my recent problem-solving interview questions attempt to test a candidate’s analysis skill to “apply tools to problems”. In my case, I couldn’t apply the Strategy design pattern unless I spotted the similarity between the given requirement and the Strategy pattern’s requirements.

The 4 groups of tools mentioned above are simple tools when studied individually, but any design that needs tools from 3 of these groups is likely to be non-trivial. Consider algorithms to delete a node from a binary search tree.

Finally I have some idea why Google likes those questions, including the question about “1-7 random number generator using a 1-5 generator”.
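
For the record, here is one well-known way to attack the “1-7 from 1-5” question, rejection sampling. This is a sketch, not necessarily the answer the interviewer had in mind. The class name Rand7 and the IntSupplier parameter are my own choices; the supplier is assumed to return uniform integers in 1..5.

```java
import java.util.Random;
import java.util.function.IntSupplier;

class Rand7 {
    // Builds a uniform 1..7 value from a uniform 1..5 source.
    static int rand7(IntSupplier rand5) {
        while (true) {
            // Two draws combine into a uniform number in 0..24.
            int n = (rand5.getAsInt() - 1) * 5 + (rand5.getAsInt() - 1);
            if (n < 21) {           // keep 0..20, i.e. three full groups of 7
                return n % 7 + 1;   // map to 1..7
            }
            // 21..24 would bias the result, so throw them away and redraw.
        }
    }

    public static void main(String[] args) {
        Random rng = new Random();
        IntSupplier rand5 = () -> rng.nextInt(5) + 1;
        int[] counts = new int[8];
        for (int i = 0; i < 70000; i++) {
            counts[rand7(rand5)]++;
        }
        for (int v = 1; v <= 7; v++) {
            System.out.println(v + ": " + counts[v]);
        }
    }
}
```

The "apply a construct" step is spotting that 25 equally likely outcomes can be trimmed to 21, a multiple of 7.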

property^param^attribute ] servlet

property = field
attribute => 4 scopes
param … usually means GET or POST params
These are the shortest intro to the 3 frequent terms in servlet/jsp literature. Now some details.
— param
GET => query string
POST => user input
usually means “request param” ie GET/POST param, unless otherwise stated
means request param, too!
disambiguation: “parameter” and “param” sometimes have specific and different meanings.
— attribute is a servlet concept
is usually a bean

corresponds to the setAttribute() and getAttribute() methods

in EL, ${*Scope.abc} usually refers to an attribute “abc”

disambiguation: Some servlet literature talks about “attribute” when they mean “xml attribute”.
–property => HAS-A.
Can be a bean.
is a generic java concept.
HAS-A can be nested, so can properties.
“setProperty” => a field in a bean

join order ] oracle – brief notes

* n-way
* Oracle can take an hour to choose a join order
* To limit it, you can tell the Oracle optimizer how many permutations to look into. P406 [[ oracle sql tuning ]]
* “ordered” hint tells the optimizer to follow the join order specified in the FROM clause

But what join order would you, the developer, prefer? I feel we should avoid re-scanning a large table

data structure to hold spreadsheet content

Q: Assume string content in each cell. As a start let’s target 1,000,000 rows by 1,000,000 columns. Goals
– memory
– random access
– growable

%%A: Intermediate solution (good habit): a 3-column DB table with a composite key (row,col). Now how do we represent this in a data structure? First attempt would be a hashmap whose key is a customized big integer. Higher 64 bits represent row number, while lower half col number. Value is a reference with Copy-on-write. Hash access with a given (row,col) is …. considered random access with O(1).

Now I think a second try would be a hashmap where
main key -> row number
value -> sub-hashmap, where
sub-key -> col number

The other of the twin hashmap keeps col numbers as main keys. Every update to the spreadsheet requires updates to both.
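
A rough sketch of the twin-hashmap idea, under my own assumptions: the class name SparseSheet, the method names and the use of Long keys are all made up, and copy-on-write is left out. Only populated cells take memory.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SparseSheet {
    // rows.get(r).get(c) and cols.get(c).get(r) hold the same cell value.
    private final Map<Long, Map<Long, String>> rows = new HashMap<>();
    private final Map<Long, Map<Long, String>> cols = new HashMap<>();

    // Every update touches both maps so the twins stay in step.
    void set(long row, long col, String value) {
        rows.computeIfAbsent(row, k -> new HashMap<>()).put(col, value);
        cols.computeIfAbsent(col, k -> new HashMap<>()).put(row, value);
    }

    // Random access by (row, col) in expected O(1); null means empty cell.
    String get(long row, long col) {
        Map<Long, String> r = rows.get(row);
        return r == null ? null : r.get(col);
    }

    // All populated cells of one column, keyed by row number.
    Map<Long, String> column(long col) {
        return cols.getOrDefault(col, Collections.<Long, String>emptyMap());
    }
}
```

The price of the second map is double bookkeeping on every write; the payoff is that a whole column can be read without scanning every row.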

estimate cubic root #GS IV

Q: Given any double, how do you create an algorithm to estimate its /cube root/ to 3 decimal places.
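%%A: first multiply it by 8 repeatedly until we approach the limit of a 64bit long integer. Since 8 = 2^3, each multiplication exactly doubles the cube root, so the final estimate is halved the same number of times. We can safely drop the decimal part since it’s guaranteed to be negligible. Now we have a large (at least 61 bit) INTEGER to work with, which is easier than a float/double.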

Original question was designed to ask: how would you as a human estimate that, without a computer.
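
For comparison, here is a plain bisection sketch that works directly on the double. This is a different technique from the multiply-by-8 idea above, and the class and method names are made up. It assumes a modest magnitude, where “3 decimal places” is meaningful for a double.

```java
class CubeRoot {
    // Estimates the cube root of x by bisection, stopping once the
    // bracket [lo, hi] is narrower than 0.0005.
    static double estimate(double x) {
        if (x < 0) {
            return -estimate(-x);            // cube root is an odd function
        }
        double lo = 0.0;
        double hi = Math.max(1.0, x);        // the root lies in [0, max(1, x)]
        while (hi - lo > 0.0005) {
            double mid = lo + (hi - lo) / 2;
            if (mid == lo || mid == hi) {
                break;                       // bracket cannot shrink any further
            }
            if (mid * mid * mid > x) {
                hi = mid;
            } else {
                lo = mid;
            }
        }
        return lo + (hi - lo) / 2;
    }
}
```

A human doing this on paper would do much the same thing: guess, cube the guess, and narrow the range.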

blocking methods susceptible to interrupt #cancels too

interrupt() can be used to interrupt many blocking methods. See P 29. Exactly what blocking methods? If you find a simple rule on this, probably it’s not completely trust-worthy. [[ java threads ]] isn’t clear. P 169 [[ concurrent programming in java ]] is better.

— blocking methods susceptible to interrupt(), based on my understanding
wait() P29
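a socket’s read() method, but I think only on NIO interruptible channels. A classic java.net socket stream read() ignores interrupt(); closing the socket is the usual way to unblock it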
join() P27
Any method containing “synchronized”? NO. See Lock.lockInterruptibly()

pthreads also defines several “cancellation points”. See P398 [[beginning linux programming]]
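
A tiny experiment for the wait() and join() entries in the list above. InterruptDemo and waitIsInterruptible are made-up names, and the 5-second join timeout is an arbitrary safety net.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class InterruptDemo {
    // Starts a thread that blocks in wait(), interrupts it, and reports
    // whether the blocked call ended with InterruptedException.
    static boolean waitIsInterruptible() {
        final Object lock = new Object();
        final AtomicBoolean sawInterrupt = new AtomicBoolean(false);
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                try {
                    while (true) {
                        lock.wait();        // nobody notifies; loop covers spurious wakeups
                    }
                } catch (InterruptedException e) {
                    sawInterrupt.set(true); // wait() responded to interrupt()
                }
            }
        });
        waiter.start();
        waiter.interrupt();  // effective whether or not the waiter has reached wait() yet
        try {
            waiter.join(5000);              // join() is itself an interruptible blocking call
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sawInterrupt.get();
    }

    public static void main(String[] args) {
        System.out.println(waitIsInterruptible());
    }
}
```

A thread stuck waiting to ENTER a synchronized block would not react like this, which is why Lock.lockInterruptibly() exists.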

UBS iview

q1.3: why do u need to override equals()?

equity cash -> equity derivatives

For trading systems, throughput and stability are 2 priorities. Every minute of downtime means loss of revenue.

Q: what happens when a thread runs a synchronized block

q1.1: if u need a custom key class for a hashmap, what do you need for that class?
A: [[ java precisely ]] had a brief mention

q: what multithreading issues/solutions in your project?

q: how did u handle multiple inheritance in your project

q: what kind of data structures did you use in FTTP?

q: what is a deadlock?

q1.2: beside hashcode()?

q: what’s that “doubly linked list” in FTTP project

q: profiling tool?
A: jconsole, NetBeans profiler
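
My sketch of an answer to q1.1 to q1.3, under assumptions: TradeKey and its two fields are hypothetical. A custom hashmap key needs consistent equals() and hashCode(), and ideally immutable fields. Without the equals() override, two keys with the same content are different keys, so a lookup with a freshly built key misses.

```java
import java.util.Objects;

final class TradeKey {
    private final String account;   // hypothetical fields, immutable on purpose
    private final int orderId;

    TradeKey(String account, int orderId) {
        this.account = account;
        this.orderId = orderId;
    }

    // Two keys are equal when all their fields are equal.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof TradeKey)) return false;
        TradeKey other = (TradeKey) o;
        return orderId == other.orderId && Objects.equals(account, other.account);
    }

    // Must agree with equals(): equal keys produce equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(account, orderId);
    }
}
```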

duo+trio intro 2 mysql cursor

Lesson 1: “open, fetch, close” is the first thing to internalize. You have mastered the most important thing in MySQL cursor if you can recall this trio.

open my_cursor ;
fetch my_cursor ….
close my_cursor ;

Side show: Cursors are used for multi-row selects. For single-row selects, simply “select into…”

Lesson 2: “declare, open, fetch, close”

Lesson 3: “1-to-1 mapping” between a cursor and a multi-row select

Lesson 4: “declare-declare, open, fetch, close”, since we usually need to declare the cursor and the error handler. This forms the world-famous “duo + trio”

Lesson 5: Each multi-row select usually has 1 cursor and 1 error handler to handle end-of-loop.

Perl needs OO

Q: Criteria for favoring OO over procedural Perl?
A: P 320 [[ Perl best practices ]] gave a few criteria I could identify with.

* encapsulation — between class and clients. “Implementation of individual components of the system is likely to change over time”. Hide volatile internal implementation from clients

* “Large number of other programmers will be using your code modules”

Perl batch applications]a financial IT team

Team@@ A team of contractors to UBS (Wealth Management)
Who@@ 4 dedicated Perl developers, managing 10-20 mission-critical perl applications.

Symptom@@ Always firefighting — always busy with some urgent issues.

Cause@@ Probably not all due to technical problems. Sometimes an urgent business requirement pops up and requires a quick and dirty solution. Batch solutions are often the most quick-and-dirty solution.

#1 complaint@@ maintainability — not extendable, unchangeable, inflexible

Biggest complaint against Perl@@ Reading other people’s perl code can be painful. UBS does have coding standards, but somehow not enforced for these 4 Perl guys, perhaps because of constant firefighting.

#2 most common issue in the environment is related to “shared codebase”. When you modify shared code, more than 1 system can be affected. Since the existing code is almost unreadable (according to some of the 4 guys), the impact of changes is unpredictable.

Both problems contribute to the maintenance nightmare.

My diagnosis@@ Priority set by leadership. Quick-and-dirty is the chosen priority. Perl does permit unreadable coding styles. See my other post on Perl::Critic. If leadership is ambitious and wants to support twice the amount of business requirement without increasing headcount, then maintainability could suffer.

Justification for migrating to Java batch@@ One possibility is interoperability and code sharing with non-batch java apps.

singletons complicate unit tests@@

http://code.google.com/p/google-singleton-detector/wiki/WhySingletonsAreControversial gave 2+1 /indictments/ against singletons in general. The first indictment is testability.

Not easy to introduce a mock/stub for the singleton.

xp: in %% FTTP parser, I saw no real problem. The “user” is LoaderFactory.java. I could be wrong, but I don’t think it and my singleton are so tightly coupled as to complicate unit tests.
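
To make the testability indictment concrete, here is a sketch with made-up names (Clock, SystemClock, Stamper); it is not the FTTP code. If Stamper called SystemClock.INSTANCE directly, a test could not substitute a fake. Passing the collaborator in through the constructor keeps the singleton but leaves room for a stub.

```java
interface Clock {
    long now();
}

// Classic singleton. Callers that hard-code SystemClock.INSTANCE cannot be handed a fake.
final class SystemClock implements Clock {
    static final SystemClock INSTANCE = new SystemClock();

    private SystemClock() {}

    @Override
    public long now() {
        return System.currentTimeMillis();
    }
}

// The "user" receives its collaborator instead of looking it up.
class Stamper {
    private final Clock clock;

    Stamper(Clock clock) {
        this.clock = clock;
    }

    String stamp(String msg) {
        return clock.now() + " " + msg;
    }
}
```

Production code would build new Stamper(SystemClock.INSTANCE); a unit test can pass a lambda such as () -> 42L and get a predictable timestamp.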