testing threaded designs – enterprise apps(!lib)

Bottom line – (unconventional wisdom) be bold about creating new threaded designs. They don’t have to be rock solid like those in the standard library.

Experts unanimously agree that non-trivial MT designs are hard to verify or test, often exceedingly hard. There are simply too many possibilities: a million tests may pass, yet the next one reveals a bug. Therefore peer review is the way to go. I feel that’s the “library” view, the view of the language creators, and it differs from the enterprise-app view.

In enterprise apps, if a MT design passes load testing and UAT, it is good enough; there is no budget to test further. If only one case in a million fails, that case has something special about it, perhaps an unusual combination of inputs or a pure timing coincidence. Strictly speaking those are still logical errors and design defects, and a sound design ought to handle such cases gracefully, but most designs aren’t so sound. If a failure happens that rarely, just about everyone involved would agree it’s practically impossible to catch during testing. Such a bug, if ever uncovered, would be too hard to reproduce, and catching it is not in anyone’s job scope here. Missing it is tolerable and only human.

A goal keeper can’t be expected to catch 99 penalties in a row.

In the enterprise reality, such a bug is probably never uncovered (unless a log happens to provide the evidence). Investigating such a rare issue usually takes too much effort to be worthwhile.
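For concreteness, here is a minimal sketch (the class and the numbers are my own, not from any production system) of the kind of non-atomic read-modify-write that can pass a million test runs before silently losing a single update:

```java
// Sketch: a lost-update race that survives most test runs.
public class RareRace {
    static int unsafeCount = 0;              // unsynchronized shared state
    static final Object lock = new Object();
    static int safeCount = 0;

    static void run(int threads, int perThread) throws InterruptedException {
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    unsafeCount++;                        // read-modify-write, not atomic
                    synchronized (lock) { safeCount++; }  // the sound version
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
    }

    public static void main(String[] args) throws InterruptedException {
        run(4, 100_000);
        // The locked counter is always exact; the bare one may quietly fall short,
        // but only under just the right thread interleaving.
        System.out.println("safe=" + safeCount + " unsafe=" + unsafeCount);
    }
}
```

On a lightly loaded test box the two counters often agree for run after run, which is exactly why load testing plus UAT feels like “good enough”.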

quiet confidence on go-live day

I used to feel “Let’s pray no bug is found in my code on go-live day. I didn’t check all the null pointers…”

I feel it’s all about… blame, even if managers make a point of avoiding blame.

Case: I once had a timebomb bug in my code. All tests passed, but the production system failed on the “scheduled” date. The UAT team was not to blame.

Case: Suppose you used hardcoding to pass UAT. If things break on go-live, you bear the brunt of the blame.

Case: if a legitimate use case is mishandled on go-live day, then
* the UAT team is at fault, including the business users who signed off. Often the business comes up with the test cases, so the blame question is “why wasn’t this use case specified?”
* perhaps a more robust exception framework would have caught such a failure gracefully, but the developer usually doesn’t bear the brunt of the blame.
**** I now feel business reality discounts code quality in the sense of airtight error-proofing.
**** I now feel business reality discounts automated testing for Implementation Imperfections (II). See http://bigblog.tanbin.com/2011/02/financial-app-testing-biz.html

Now I feel that if you did a thorough and realistic UAT, you should have quiet confidence on go-live day. Live data should hold no “surprises” for you.

swing automated test, briefly

3) http://www.codework-solutions.com/testing-tools/qfs-test/ says …event is constructed and inserted artificially into the system’s EventQueue. To the SUT it is indistinguishable whether an event was triggered by an actual user or by QF-Test. These artificial events are more reliable than “hard” events that could be generated with the help of the AWT-Robot, for example, which could be used to actually move the mouse cursor across the screen. Such “hard” events can be intercepted by the operating system or other applications.

2) http://jemmy.java.net/Jemmy3UserTutorial.html and http://wiki.netbeans.org/Jemmy_Tutorial explain some fundamentals about component searching. Jemmy can simulate user input by mouse and keyboard operations.

1) java.awt.Robot is probably the most low-level — Using the class to generate input events differs from posting events to the AWT event queue —  the events are generated in the platform's native input queue. For example, Robot.mouseMove will actually move the mouse cursor instead of just generating mouse move events.
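To illustrate the difference, here is a minimal sketch (not taken from any of these tools) that drives a Swing control on the EDT via the AWT EventQueue instead of the native input queue; `doClick` here is a simplified stand-in for the artificial events QF-Test constructs:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import javax.swing.JButton;
import javax.swing.SwingUtilities;

// Sketch: firing a Swing control from the test thread via the EventQueue,
// instead of moving the real mouse cursor with java.awt.Robot.
public class EdtClickDemo {
    public static boolean clickOnEdt() throws Exception {
        AtomicBoolean fired = new AtomicBoolean(false);
        JButton button = new JButton("Submit");
        button.addActionListener(e -> fired.set(true));
        // invokeAndWait posts the Runnable onto the AWT EventQueue, so the
        // listener runs on the EDT just as it would for a real user click.
        // No OS-level mouse movement is involved, so nothing can intercept it.
        SwingUtilities.invokeAndWait(button::doClick);
        return fired.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("listener fired: " + clickOnEdt());
    }
}
```

A Robot-based version of the same test would have to make the button visible, compute its screen coordinates and physically move the cursor, which is exactly the fragility the QF-Test page describes.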

IV questions/answers on trading system testing

See post on biz-scenario [S] ^ implementation error [I].

More and more interviewers seem to ask increasingly serious questions about trading system testing. I feel that’s part of our professional growth towards senior positions (though many Wall St lead developers I know aren’t big fans). In some cases, the higher you are paid, the deeper testing experience is expected of you.

Testing skill is a complement to design skill, and both become increasingly important as we move up. These interview questions tend to be open-ended; the interviewer gives us a chance to sell ourselves and convince them that we have real experience and personal insight. Here are some pointers for myself. Your inputs are welcome.
–At the tools level, we can easily talk about Fitnesse, Mockito, JUnit, DbUnit etc [I]. Easy to read up about these; easy to tell real stories. Nothing unique to trading systems.
–Another level of easy topics: regression testing and integrated testing. Again, nothing unique to trading systems. Since I have personal experience [S], I usually spend a lot of time at this level.
* scenario testing — essential in larger systems. Without scenarios, if you ask a rookie to list all the combinations of the various inputs, the list is endless and useless. Instead, I use assertions to cut down the combinations.
* matrix testing — I often draw up a spreadsheet with rows and columns of input values.
* production data — essential if we want to involve users. The earlier we involve users, the better.
* I use JUnit to drive integrated tests with a real DB, real MOM, real cache servers and real web servers.
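As a sketch of the matrix idea, the cells of the “spreadsheet” can live in a 2-D array of expected values; the fee schedule below is invented purely for illustration:

```java
// Sketch of spreadsheet-style matrix testing: rows and columns of input
// values with an expected result for every cell. The fee schedule here
// is hypothetical, not from any real system.
public class FeeMatrixTest {
    // Hypothetical fee: base rate by customer tier, halved for notional >= 1,000,000.
    static double fee(double notional, int tier) {
        double rate = (tier == 1) ? 0.0010 : 0.0020;
        if (notional >= 1_000_000) rate /= 2;
        return notional * rate;
    }

    public static void main(String[] args) {
        double[] notionals = {100_000, 1_000_000};  // rows
        int[] tiers = {1, 2};                       // columns
        double[][] expected = {                     // one cell per combination
            {100.0, 200.0},                         // 100k : tier1, tier2
            {500.0, 1000.0},                        // 1m   : tier1, tier2
        };
        for (int r = 0; r < notionals.length; r++)
            for (int c = 0; c < tiers.length; c++) {
                double got = fee(notionals[r], tiers[c]);
                if (Math.abs(got - expected[r][c]) > 1e-9)
                    throw new AssertionError("cell " + r + "," + c + ": " + got);
            }
        System.out.println("all matrix cells pass");
    }
}
```

The expected-value matrix is the part the BA or business user can review directly, which is where the real value of the spreadsheet lies.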

–At another level, load testing, performance testing, thread testing and MOM testing are special to trading systems. MOM and especially thread testing use special techniques, but in reality these are seldom adopted. An interesting question I like to ask is “at what specific load level (give me a number) will my system crash?” It’s good if we can predict that magic number; if we can’t, we might get caught with our pants down, which is quite common. I feel trading system testing divides into logic testing and performance testing. In reality, most small enhancements need only logic testing.
–Perhaps the highest level is UAT [S]. I would mention Fitnesse, and I would make it clear that in GS, users or BAs are called upon to generate test cases.
–At a methodology level, a tougher question is something like “How do you ensure adequate testing before release?” or “How do you decide how much testing is enough?”. Since I never used test coverage [I] tools, I won’t mention them. Frankly, I don’t believe 80% test coverage makes the code more reliable than 0% test coverage.
–The most productive GS team members do no automated testing at all. They rely on experience to know which scenarios are worth testing, then test those manually, without automation. When bugs do reach production, in reality we almost always find our test scenarios were incomplete; automated tests wouldn’t have helped. But I can’t really say these things in interviews — the emperor’s new clothes.

financial app testing – biz scenario^implementation error

Within the domain of _business_logic_ testing, I feel there are really 2 very different targets/focuses – Serious Scenario (SS) vs Implementation Imperfections (II). This dichotomy cuts through every discussion in financial application testing.

* II is the focus of JUnit, mock objects, Fitnesse, test coverage, test-driven development, assertions and most of the techie discussions
* SS is the real meaning of quality assurance (QA) and user acceptance (UA) testing on Wall St. In contrast, II testing provides no such assurance, neither QA nor UA.

SS is about real, serious scenarios, not the meaningless fake scenarios of II.

When we find out bugs have been released to production, in reality we invariably trace the root cause to an incomplete SS list, and seldom to II. Managers, users, BAs… are concerned with SS only, never II. SS requires business knowledge. I discussed this with a developer (Mithun) on a large Deutsche Bank application. He pointed out their technique of verifying intermediate data in a workflow SS test. He said II testing is simply too much effort for too little value.

NPE (null pointer exception) illustrates the II tester’s mindset. Good scenario testing can provide good assurance that NPE doesn’t happen in any acceptable scenario. If a scenario is not within scope and not in the requirement, then system behavior in that scenario is “undefined”. Test coverage is important in II, but if some (NPE-ridden) execution path is never exercised by our SS tests, then that path isn’t important and can be safely left untested in many financial apps. I’m speaking from practical experience.
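A tiny sketch of this scope argument (class, conventions and lags invented for illustration): the agreed scenarios are asserted, while a null currency, being outside the requirement, is deliberately left as an “undefined” NPE path:

```java
// Sketch: the II mindset chases every possible NPE; the SS mindset
// verifies the agreed scenarios. Names and numbers are hypothetical.
public class SettlementDate {
    // In every agreed scenario the trade carries a currency. A null
    // currency is outside the requirement, so behavior there is
    // "undefined" -- the switch below would throw NPE, untested on purpose.
    static int settlementLag(String currency) {
        switch (currency) {
            case "USD": return 1;   // T+1
            default:    return 2;   // T+2 for everything else in scope
        }
    }

    public static void main(String[] args) {
        // Scenario tests cover only the in-scope inputs:
        System.out.println("USD lag: " + settlementLag("USD"));
        System.out.println("EUR lag: " + settlementLag("EUR"));
    }
}
```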

Regression testing should be done at both levels: II and (more importantly) SS.

SS is almost always integrated testing, so mock objects won’t help.

automated test in non-infrastructure financial app

I worked in a few (non-infrastructure) financial apps under different senior managers. There was absolutely zero management buy-in for automated testing (atest).

“automated testing is nice to have” — is the unspoken policy.
“regression test is compulsory” — but almost always done without automation.
“Must verify all known scenarios” — is the slogan, but if there are too many (5+) known scenarios and not all of them are critical, there is usually no special budget or test plan.

I feel atest comes down to a cost/risk analysis for the manager, just like a market risk system. The cost of maintaining a large atest and QA system is real. It is justified by
* risk, which ultimately must translate into costs
* speeding up future changes
* building confidence

Reality is very different on Wall Street. [1]
– Blackbox confidence [2] comes not from test coverage but from being “battle-tested”. Many minor bugs (catchable by atest, but never caught) will not show up in a fully battle-tested system; ironically, a system with 90% atest coverage may still show many kinds of problems once released. Which one enjoys confidence?

– future changes benefit only marginally from atest. Many atests become irrelevant, and even when a test scenario remains relevant, the test method may need a complete rewrite.

– in reality, system changes are budgeted through a formal corporate process. Most large banks deal in man-months, so they won’t appreciate saving a few days of effort (not easy to achieve anyway) thanks to automated tests.

– Risk is a vague term. At the whitebox level, automated tests provide visible, verifiable evidence and therefore a level of assurance, but as a test writer I know that a passing test can, paradoxically, cover up bugs. I never look at someone’s automated tests and feel that’s “enough” coverage; only the author knows how much coverage there really is. So risk reduction is questionable even at the whitebox level. Blackbox is more important from the risk perspective. For a manager, risk is real, and automated tests offer only partial protection.

– If your module has high concentration of if/else and computation, then it’s a different animal. Automated tests are worthwhile.
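For such an if/else-dense module, even a cheap invariant check swept over thousands of inputs pays off. A sketch, with invented tier breakpoints:

```java
// Sketch: an if/else-dense calculator plus a cheap invariant check over
// many inputs. The commission tiers are hypothetical.
public class TieredCommission {
    static double commission(double qty) {
        if (qty <= 0) return 0;
        if (qty < 1_000)  return qty * 0.05;
        if (qty < 10_000) return 50 + (qty - 1_000) * 0.03;
        return 320 + (qty - 10_000) * 0.01;
    }

    public static void main(String[] args) {
        // Invariant: commission never decreases as quantity grows.
        // This sweep crosses every tier boundary, where branch bugs hide.
        double prev = commission(0);
        for (double q = 1; q <= 20_000; q += 1) {
            double cur = commission(q);
            if (cur < prev) throw new AssertionError("not monotonic at qty=" + q);
            prev = cur;
        }
        System.out.println("monotonicity holds across all tiers");
    }
}
```

One invariant like this catches the classic off-by-one at a tier boundary, the kind of branch-logic bug a matrix of hand-picked cells can miss.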

[1] Presumably, IT product vendors (and infrastructure teams) are a different animal, with large install bases and stringent defect-tolerance levels.
[2] users, downstream/upstream teams, and managers always treat your code as blackbox, even if they analyze your code. People maintaining your codebase see it as a whitebox. Blackbox confidence is more important than Whitebox confidence.

dependency injection – promises && justifications

* “boilerplate code slows you down”.
* “code density”
* separating behavior from configuration. See post on “motivation of spring”
* code reuse and compile-time dependency; cutting transitive dependency — you can move the air-con without moving the house. Otherwise, “to build one class, you must build the entire system.”
* “tests are easier to write, so you end up writing more tests.”
* easy to mock

These are the promises and justifications as described by the authors and by experienced users.
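A minimal sketch of the “easy to mock” promise (interface and names invented for illustration): because the dependency arrives through the constructor, a test can inject a hand-rolled fake with no framework at all:

```java
// Sketch of constructor injection. The quote source is supplied from
// outside, so a test can substitute a fake. Names are hypothetical.
interface QuoteSource {
    double bid(String symbol);
}

public class Pricer {
    private final QuoteSource quotes;   // dependency injected, never newed up here

    public Pricer(QuoteSource quotes) { // constructor injection
        this.quotes = quotes;
    }

    public double markValue(String symbol, int position) {
        return quotes.bid(symbol) * position;
    }

    public static void main(String[] args) {
        // No real market-data connection needed: inject a one-line fake.
        QuoteSource fake = symbol -> 101.5;
        Pricer pricer = new Pricer(fake);
        System.out.println("mark = " + pricer.markValue("IBM", 10));
    }
}
```

In production a container such as Spring would wire in the real implementation; the Pricer class itself never changes, which is the behavior-vs-configuration separation the bullet list promises.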