3 ways to expire cached items

server-push update ^ TTL ^ conditional-GET # write-through is not cache expiration

Few online articles list these solutions explicitly. Some of them are simple concepts, yet fundamental to DB tuning and app tuning. https://docs.oracle.com/cd/E15357_01/coh.360/e15723/cache_rtwtwbra.htm#COHDG198 compares write-through ^ write-behind ^ refresh-ahead. I think refresh-ahead is similar to TTL.

B) cache-invalidation — some “events” would trigger an invalidation. Without invalidation, a cache item would live forever with an infinite TTL, like the list of China provinces.

After a cache proxy gets the invalidation message in a small payload (bandwidth-friendly), it discards the outdated item and can decide when to request an update. The request may be skipped completely if the item is no longer needed.
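
To make this concrete, here is a minimal sketch of my own (illustrative names, not any particular product’s API) of a cache proxy that discards an item on an invalidation message and re-fetches lazily:

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import java.util.function.Function;

    class InvalidatingCacheProxy<K, V> {
        private final ConcurrentMap<K, V> cache = new ConcurrentHashMap<>();
        private final Function<K, V> loader; // e.g. a loadFromDb callback (illustrative)

        InvalidatingCacheProxy(Function<K, V> loader) { this.loader = loader; }

        // called when the tiny invalidation message (key only) arrives
        void onInvalidate(K key) {
            cache.remove(key);               // discard the outdated item; no re-fetch yet
        }

        // lazy re-fetch: the cost is paid only if the item is still needed
        V get(K key) {
            return cache.computeIfAbsent(key, loader);
        }
    }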

B2) cache-update by server push — IFF bandwidth is available, the server can send not only a tiny invalidation message but also the new cache content.

IFF combined with TTL, or with reliability added, multicast can be used to deliver cache updates, as explained in my other blog posts.

T) TTL — more common. Each “cache item” embeds a time-to-live data field, a.k.a. an expiry timestamp. The HTTP cookie is the prime example.

In Coherence, it’s possible for the cache proxy to pre-emptively request an update on an expired item. This would reduce latency but requires a multi-threaded cache proxy.
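
A minimal TTL sketch, assuming a simple in-memory map (illustrative class names, not the Coherence API):

    import java.time.Duration;
    import java.time.Instant;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    class TtlCache<K, V> {
        private static final class Entry<V> {
            final V value;
            final Instant expiry;            // the embedded expiry timestamp
            Entry(V value, Instant expiry) { this.value = value; this.expiry = expiry; }
        }

        private final ConcurrentMap<K, Entry<V>> map = new ConcurrentHashMap<>();

        void put(K key, V value, Duration ttl) {
            map.put(key, new Entry<>(value, Instant.now().plus(ttl)));
        }

        // returns null if absent or expired; the caller (or a refresh-ahead thread) then re-fetches
        V get(K key) {
            Entry<V> e = map.get(key);
            if (e == null || Instant.now().isAfter(e.expiry)) {
                if (e != null) map.remove(key, e);   // expired, like a cookie past its Expires time
                return null;
            }
            return e.value;
        }
    }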

G) conditional-GET in HTTP is a proven, industrial-strength solution described in my 2005 book [[computer networking]]. The cache proxy always sends a GET to the database but with an If-Modified-Since header. This reduces unnecessary database load and network load.
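
A small sketch of conditional-GET using the standard JDK HttpURLConnection (the URL and timestamp handling are illustrative):

    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    class ConditionalGet {
        // Returns null if the cached copy is still valid (HTTP 304), otherwise the fresh body.
        static InputStream fetchIfModified(String url, long lastFetchMillis) throws Exception {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setIfModifiedSince(lastFetchMillis);   // adds the If-Modified-Since request header
            if (conn.getResponseCode() == HttpURLConnection.HTTP_NOT_MODIFIED) {
                return null;                            // 304: keep serving the cached copy
            }
            return conn.getInputStream();               // 200: replace the cached copy
        }
    }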

W) write-behind (asynchronous) or write-through — in some contexts, the cache proxy handles not only Reads but also Writes. So the Read requests will read or add to cache, and Write requests will update both the cache proxy and the master data store. Drawback — in a distributed topology, updates from other sources are not visible to “me” the cache proxy, so I still rely on one of the other 3 means.

| scenario | TTL | eager server-push | conditional-GET |
|---|---|---|---|
| frequent query, infrequent update | efficient | efficient | frequent but tiny requests between DB and cache proxy |
| latency important | OK | lowest latency | slower lazy fetch, though efficient |
| infrequent query | good | wastes DB/proxy/NW resources as “push” is unnecessary | efficient on DB/proxy/NW |
| frequent update | unsuitable | high load on DB/proxy/NW | efficient (conflation) |
| frequent update + query | unsuitable | can be wasteful | perhaps most efficient |

 


central data-store updater in write-heavy system

I don’t know how often we encounter this stringent requirement —

Soccer world cup final, or big news about Amazon … millions of users post comments on a web page and all comments need to be persisted and shown on some screen.

Rahul and I discussed some simple design. At the center is a single central data store.

  • In this logical view, all the comments are available at one place to support queries by region, rating, keyword etc.
  • In the physical implementation, we could use multiple files, shared memory or a distributed cache.

Since the comments come in a burst, this data store becomes the bottleneck. Rahul said there are two unrelated responsibilities on the data store updaters. (A cluster of updaters might be possible.)

  1. immediately broadcast each comment to multiple front-end read-servers
  2. send an async request to some other machine that can store the data records. Alternatively, wait to collect enough records and write to the data store in a batch

Each read-server has a huge cache holding all the comments. The server receives the broadcast and updates its cache, and uses this cache to service client requests.
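
Here is a rough sketch of the two updater responsibilities above; the broadcaster and data-store interfaces are hypothetical stand-ins, not anything we actually built:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    class CentralUpdater {
        interface Broadcaster { void broadcast(String comment); }      // fan-out to read-servers
        interface DataStore  { void writeBatch(List<String> batch); }  // the central store

        private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
        private final Broadcaster broadcaster;
        private final DataStore store;
        private final int batchSize;

        CentralUpdater(Broadcaster b, DataStore s, int batchSize) {
            this.broadcaster = b; this.store = s; this.batchSize = batchSize;
        }

        // responsibility 1: immediately broadcast each comment to the read-servers
        void onNewComment(String comment) {
            broadcaster.broadcast(comment);
            pending.offer(comment);          // responsibility 2 is deferred
        }

        // responsibility 2: persist asynchronously, in batches (run on a separate thread)
        void drainLoop() throws InterruptedException {
            List<String> batch = new ArrayList<>(batchSize);
            while (true) {
                batch.add(pending.take());
                pending.drainTo(batch, batchSize - 1);
                store.writeBatch(batch);
                batch.clear();
            }
        }
    }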

always separate write^read traffic

Rahul pointed out my “simplistic” thinking. Now I feel there’s no good reason to create a web server to handle both read and write requests.

A Read server has a sizable data cache to service client requests. This cache gets updated ….

A Write server (“writer”) has no such data cache, but it might have an incoming request queue + a downstream queue.

The incoming queue introduces delay, but users who send updates often understand that writes take longer than reads.

The downstream queue is relatively new to me, so here’s my hypothesis —

Say 100 writers all need to get their records persisted in a central data store. The infrastructure at the central data store is now a bottleneck, so the 100 writers send their records to a queue rather than waiting indefinitely. The writers can then handle other incoming requests.
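
My hypothesis, as a minimal sketch (queue size and timeout are made-up numbers):

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.TimeUnit;
    import java.util.function.Consumer;

    class WriteServer {
        private final BlockingQueue<String> downstream = new ArrayBlockingQueue<>(10_000);

        // per incoming write request: hand off and return, so the writer can serve the next request
        boolean handleWrite(String record) throws InterruptedException {
            // bounded wait: if the central store is badly backed up, fail fast rather than hang
            return downstream.offer(record, 50, TimeUnit.MILLISECONDS);
        }

        // background thread: the only place that talks to the central data store
        void drainLoop(Consumer<String> persistToCentralStore) throws InterruptedException {
            while (true) {
                persistToCentralStore.accept(downstream.take());
            }
        }
    }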

consolidate into single-process: low-latency OMS

Example #2 – In a traditional sell-side OMS, a client FIX order propagates through at least 3 machines in a chain —

  1. client order gateway
  2. main OMS engine
  3. exchange gateway such as smart order router or benchmark execution engine supporting VWAP etc

The faster version consolidates all of the above into a single process, cutting latency from 2 ms to 150 micros .. latency in eq OMS

Example #1– In 2009 I read about or heard from interviewers about single-JVM designs to replace multi-stage architecture.

Q: why is this technique not used on west coast or main street ?
%%A: I feel on west coast throughput outweighs latency. So scale-out is the hot favorite. Single-JVM is scale-up.

EnterpriseServiceBus phrasebook #ESB

Mostly based on https://www.it.ucla.edu/news/what-esb

  • enterprise — used in big enterprises, but now out of fashion
  • HTTP/MQ — HTTP is a more popular protocol than MQ
  • MOM — I think this middleware is a separate process. https://en.wikipedia.org/wiki/Enterprise_service_bus#ESB_as_software says MOM vendors call their products ESB
  • async — no synchronous http call between client and server
  • latency — perhaps not popular for real time trading, which prefers FIX
  • SOA — ESB jargon was created along with SOA
  • jxee — you may need to know this jargon in the jxee job market.
  • churn 😦

prod write access to DB^app server@@

Q: Is production write access more dangerous in DB or app server?
A: I would say app server, since a bad software update can wipe out production data in unnoticeable ways. It could be a small subset of the data and unnoticeable for a few days.

It’s not practical to log all database writes. Such logging would slow down the live system and take up too much disk space. It’s basically seen as unnecessary.

However, tape backup is “protected” from unauthorized writes. It is usually not writable by the app server. There’s a separate process and separate permission to create/delete backup tapes.

async messaging-driven #FIX

A few distinct architectures:

  • architecture based on UDP multicast. Most of the other architectures are based on TCP.
  • architecture based on FIX messaging, modeled after the exchange-bank messaging, using multiple request/response messages to manage one stateful order
  • architecture based on pub-sub topics, much more reliable than multicast
  • architecture based on one-to-one message queue

strategic value of MOM]tech evolution

What’s the long-term value of MOM technology? “Value” to my career and to the /verticals/ I’m following such as finance and internet. JMS, Tibrv (and derivatives) are the two primary MOM technologies for my study.

  • Nowadays JMS (tibrv to a lesser extent) seldom features in job interviews and job specs, but the same can be said about servlet, xml, Apache, java app servers .. I think MOM is falling out of fashion but it is not a short-lived fad technology. MOM will remain relevant for decades. I saw this longevity when deciding to invest my time.
  • Will socket technology follow the trend?
  • [r] Key obstacle to MOM adoption is perceived latency “penalty”. I feel this penalty is really tolerable in most cases.
  • — strengths
  • [r] compares favorably in terms of scalability, efficiency, reliability, platform-neutrality.
  • encourages modular design and sometimes decentralized architecture. Often leads to elegant simplification in my experience.
  • [r] flexible and versatile tool for the architect
  • [rx] There has been extensive lab research and industrial usage to iron out a host of theoretical and practical issues. What we have today in MOM is a well-tuned, time-honored, scalable, highly configurable, versatile, industrial strength solution
  • works in MSA
  • [rx] plays well with other tech
  • [rx] There are commercial and open-source implementations
  • [r] There are enterprise as well as tiny implementations
  • — specific features and capabilities
  • [r] can aid business logic implementation using content filtering (doable in rvd+JMS broker) and routing
  • can implement point-to-point request/response paradigm
  • [r] transaction support
  • can distribute workload as in 95G
  • [r] can operate in-memory or backed by disk
  • can run from firmware
  • can use centralized hub/spoke or peer-to-peer (decentralized)
  • easy to monitor in real time. Tibrv is subject-based, so you can easily run a listener on the same topic
  • [x=comparable to xml]
  • [r=comparable to RDBMS]

##simplicity@design pushed to the limit

Here I collect simple concepts that have proven rather versatile, resilient and adaptable. Note that in these designs the complexity never disappears or reduces; it shifts somewhere else more manageable.

  • [c] stateless — http
  • [!s] microservices –complexity moves out of a big service into the architecture
  • [c] pure functions — without side effects
  • use the database concept in solving algo problems such as the skyline #Gelber
  • stateless static functions in java — my favorite
  • [c !s] garbage collection — as a concept. Complexity shifts from the application into the GC codebase
  • REST
  • in c# and c++, all nested classes are static, unlike in java
  • python for-loop iteration over a dir, a file, a string … See my blog post
  • [c] immutable — objects in concurrent systems
  • [c] STM i.e. single-threaded mode, without shared mutable #Nsdq
  • [c] pipe — the pipe concept in unix is a classic
  • JSON
  • [c] hash table as basis of OO — py, javascript, perl..
  • [c] sproc (+trigger) — as a simple concept “data storage as guardian of its data, and a facade hiding the internal complexities”
  • [!s] dependency injection
  • [c !s] EDT — swing EDT and WPF
  • [c] RAII
  • smart pointers as a concept
  • singleton implemented as a static local object, #Scott Meyers
  • [c=celebrated, classic, time-honored, ..]
  • [!s = not so simple in implementation]

 

##RDBMS=architect’s favorite

A specific advantage .. stored proc can _greatly_ simplify business logic as the business logic lives with the data …

Even without stored proc, a big join can replace tons of application code implementing non-trivial business logic. Hash table lookup can be implemented in SQL (join or sub-query) with better clarity and instrumentation because

* Much fewer implicit complexities in initialization, concurrency, input validation, null pointers, state validity, invariants, immutabilities, …
* OO and concurrency design patterns are often employed to manage these complexities, but SQL can sidestep these complexities.

Modularity is another advantage. The query logic can be complex and maintained as an independent module (similarly, on the presentation layer, javascript offers modularity too). Whatever modules depend on the query logic have a dependency interface that’s well-defined, easy to test and easy to investigate.

In fact testability might be the most high-profile feature of RDBMS.

I think one inflexibility is adding a new column. There are probably some workarounds but noSQL is still more flexible.

Another drawback is concurrency though there are various solutions.

stateless (micro)services #%%1st take

In 2018, I have heard of more and more sites that push the limits of stateless designs. I think this “stateless” trend is innovative and bold. Like any architecture, these architectures have inherent “problems” and limitations, so you need to keep a lookout, deal with them and adjust your solution.

Stateless means simplicity, sometimes “extreme simplicity” (Trexquant)

stateless means easy to stop, restart, backup or recover

Stateless means lightweight. Easy to “provision”, easy to relocate.

Stateless means easy scale-out? Elastic…

Stateless means easy clustering. HTTP is an example. If a cluster of identical instances is stateless then no “conversation” needs to be maintained.

p2p messaging beats MOM ] low-latency trading

example — RTS exchange feed dissemination infrastructure uses raw TCP and UDP sockets and no MOM

example — the biggest sell-side equity OMS network uses MOM only for minor things (eg?). No MOM for market data. No MOM carrying FIX order messages. Between OMS nodes on the network, FIX over TCP is used

I read and recorded the same technique in 2009… in this blog

Q: why is this technique not used on west coast or main street ?
%%A: I feel on west coast throughput outweighs latency. MOM enhances throughput.

microservices “MSA” #phrasebook

I feel MSA is more of an architect interview topic than a developer interview topic. Dev complexity is low by design.

eg: error acct lookup, receiving productId + possibly a clientId, returning an error acct
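
A minimal sketch of that example’s contract (names are made up). Because the implementation keeps no per-client conversation state, it can be replicated freely:

    import java.util.Map;
    import java.util.Optional;

    interface ErrorAcctService {
        Optional<String> lookupErrorAcct(String productId, Optional<String> clientId);
    }

    class InMemoryErrorAcctService implements ErrorAcctService {
        private final Map<String, String> byProduct;   // reference data, refreshable

        InMemoryErrorAcctService(Map<String, String> byProduct) { this.byProduct = byProduct; }

        @Override
        public Optional<String> lookupErrorAcct(String productId, Optional<String> clientId) {
            // a clientId-specific override table could be consulted first; omitted here
            return Optional.ofNullable(byProduct.get(productId));
        }
    }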

Now the phrasebook:

  • jxee — As of 2019, I guess jxee has the best support for MSA
  • enterprise — enterprise-bias. Most of the practices used in SOA/MSA come from developers who have created software applications for large enterprise organizations.
  • SOA — is the ancestor and now out of fashion. I think MSA will also fall out of fashion.
  • stateless — stateless microservice is best. Can be highly concurrent and scaled out
  • scalability — hopefully better
  • decentralized — rather than monolithic
  • modularity
  • communication protocol — supposedly lightweight, but more costly than in-process communication
    • http — is commonly used for communication. Presumably not asynchronous
    • messaging — the metaphor is often used for communication. I doubt there’s any MOM or message queue.
  • cloud-friendly — cheaper
  • flexible — in the face of changing requirements, though I’m not sure time-to-market will improve
  • simple-facade — (of a big monolithic service) is now replaced by more complex interface, so I suspect this is not always popular.
  • complexity — (various forms) is the public enemy but I don’t know which weapon (REST,SOA,ESB,MOM,Spring) actually works
  • in-process — services can be hosted in a single process, but less common
  • devops — is a driver
    • testability — each service is easy to test, but not integration test
    • loosely coupled — decentralized, autonomous dev teams
    • deployment — is ideally independent for each service, and continuous, but overall system deployment is complicated

blocking scenario ] CPU-bound system

Q: can you describe a blocking scenario in a CPU-bound system?

Think of a few CPU bound systems like

  • database server
  • O(N!) algo
  • MC simulation engine
  • stress testing

I tend to think that a thread submitting a heavy task is usually the same thread that processes the task. (Such a thread doesn’t block!)

However, in a task-queue producer/consumer architecture, the submitter thread enqueues the task and can do other things or return to the thread pool.

A workhorse thread picks up the task from queue and spends hours to complete it.
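
A small sketch of that submitter/workhorse split, assuming a plain thread pool (nothing fancy):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    class HeavyTaskQueue {
        // one worker per core: the workhorses that burn CPU for hours
        private final ExecutorService workers =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

        // the submitter thread enqueues and returns immediately; it does not block
        Future<Double> submit(long iterations) {
            return workers.submit(() -> simulate(iterations));
        }

        private static double simulate(long iterations) {
            double acc = 0;
            for (long i = 1; i <= iterations; i++) acc += 1.0 / i;   // stand-in for MC simulation work
            return acc;
        }
    }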

Now, I present a trivial blocking scenario in a CPU bound system —

  • Any of these threads can briefly block in I/O if it has big data to send. Still, the system is CPU-bound.
  • Any of these threads can block on a mutex or condVar

[09]%%design priorities as arch/CTO

Priorities depend on industry, target users and managers’ experience/preference… Here are my Real answers:

A: instrumentation (non-opaque ) — #1 priority to an early-stage developer, not to a CTO.

Intermediate data store (even binary) is great — files; reliable[1] snoop/capture; MOM

[1] seldom reliable, due to the inherent nature — logging/capture, even error messages are easily suppressed.

A: predictability — #2 (I don’t prefer the word “reliability”.) related to instrumentation. I hate opaque surprises and intermittent errors like

  • GMDS green/red LED
  • SSL in Guardian
  • thick, opaque libraries like Spring
  1. Database is rock-solid predictable.
  2. javascript was predictable in my pre-2000 experience
  3. automation Scripts are often more predictable, but advanced python is not.

(bold answers are good interview answers.)
A: separation of concern, encapsulation.
* any team dev needs task breakdown. PWM tech department consists of teams supporting their own systems, which talk to each other on an agreed interface.
* Use proc and views to allow data source internal change without breaking data users (RW)
* ftp, mq, web service, ssh calls, emails between departments
* stable interfaces. Each module’s internals are changeable without breaking client code
* in GS, any change in any module must be done along with other modules’ checkout, otherwise that single release may impact other modules unexpectedly.

A: prod support and easy to learn?
* less support => more dev.
* easy to reproduce prod issues in QA
* easy to debug
* audit trail
* easy to recover
* fail-safe
* rerunnable

A: extensible and configurable? It often adds complexity and workload. Probably the #1 priority among managers I know on wall st. It’s all about predicting what features users might ask for.

How about time-to-market? Without testability, changes take longer to regression-test? That’s pure theory. In trading systems, there’s seldom automated regression testing.

A: testability. I think Chad also liked this a lot. Automated tests are less important to Wall St than other industries.

* each team’s system to be verifiable to help isolate production issues.
* testable interfaces between components. Each interface is relatively easy to test.

A: performance — always one of the most important factors if our system is ever benchmarked in a competition. Benchmark statistics are circulated to everyone.

A: scalability — often needs to be an early design goal.

A: self-service by users? reduce support workload.
* data accessible (R/W) online to authorized users.

A: show strategic improvement to higher management and users. This is how to gain visibility and promotion.

How about data volume? important to eq/fx market data feed, low latency, Google, facebook … but not to my systems so far.

DB=%% favorite data store due to instrumentation

The noSQL products all provide some GUI/query, but not very good. Piroz had to write a web GUI to show the content of gemfire. Without the GUI it’s very hard to manage anything that’s built on gemfire.

As data stores, even binary files are valuable.

Note snoop/capture is not a data store, but falls in the same category as logging. They are easily suppressed, including critical error messages.

Why is RDBMS my #1 pick? ACID requires every datum to be persistent/durable, therefore viewable from any 3rd-party app, so we aren’t dependent on the writer application.

Y more threads !! help`throughput if I/O bound

To keep things more concrete, you can think of the output interface in the I/O.

The paradox — given an I/O bound busy server, the conventional wisdom says more threads could increase CPU utilization [1]. However, the work queue for the CPU gets quickly /drained/, whereas the I/O queue is constantly full, as the I/O subsystem is working at full capacity.

[1] In a CPU bound server, adding 20 threads will likely create 20 idle, starved new threads!

Holy Grail is simultaneous saturation. Suggestion: “steal” a cpu core from this engine and use it for unrelated tasks. Additional threads or processes basically achieve that purpose. In other words, the cpu cores aren’t dedicated to this purpose.

Assumption — adding more I/O hardware is not possible. (Instead, scaling out to more nodes could help.)

If the CPU cores are dedicated, then there’s no way to improve throughput without adding more I/O capacity. At a high level, I clearly see too much CPU /overcapacity/.

dotnet remoting and related jargon

P4 [[.net 1.1 remoting, reflection and threading]] shows an insightful history leading to dotnet remoting —
#1) RPC (pre-OO).
OO movement brought about the Next generation in the form of distributed objects (aka distributed components) —
#2) CORBA, RMI (later ejb) and dcom, which emerged around the same time.
COM is mostly for in-process and dcom is distributed
#3) soap and web services , which are OO-agnostic
I feel soap is more like RPC… The 2 distinct features of soap — xml/http. All predecessors are based on binary protocols (efficient), and the “service component” is often not hosted in any server.
#4) dotnet remoting feels more like RMI to me…According to the book above, remoting can use either
1) http channel with the soap formatter, or
2) tcp channel  with the binary formatter

Therefore, I feel remoting is an umbrella technology with different implementations for different usage scenarios.

#5) WCF
Remoting vs wcf? See other post.

##[12] bottlenecks in a high performance data "flow" #abinitio

Bottlenecks:

#1 probably most common — database, both read and write operations. Therefore, ETL solutions achieve superior throughput by taking data processing out of database. ETL uses DB mostly as dumb storage.

  • write – if the database data-sink is too slow, then the entire pipe is limited by its throughput, just like sewage.
    • relevant in mkt data and high frequency trading, where every execution must be recorded
  • read – if you must query a DB to enrich or lookup something, this read can be much slower than other parts of the pipe.

#2 (similarly) flat files. Write tends to be faster than database write. (Read is a completely different story.)
* used in high frequency trading
* used in high volume market data storage — Sigma2 for example. So flat file writing is important in industry.
* IDS uses in-memory database + some kind of flat file write-behind for persistence.

#? Web service

#? The above are IO-bound. In contrast, CPU-bound compute-intensive transforms can (and do) also become bottlenecks.

async (almost)always requires buffer and additional complexity

Any time I see asynchronous (swing, MOM etc), I see additional complexity. Synchronous is simpler. Synchronous means blocking, and requires no object beside the caller actor and service actor. The call is confined to a single call stack.

In contrast, async almost always involves 2 call stacks, requires a 3rd object in the form of a buffer [1]. Async means caller/sender can return before responder/callback even gets the message. In that /limbo/, the message must be kept in the buffer. If responder were a doctor then she might be “not accepting new patients“.

Producer/consumer pattern … (details omitted)
Buffer has capacity and can overflow.
Buffer is usually shared by different producer threads.
Buffer can resend.
Buffer can send the messages out of order.

[1] I guess the swing event object must be kept not just on the 2 call stacks, but on the event queue — the buffer

Q: single-threaded can be async?
A: yes the task producer can enqueue to a buffer. The same thread periodically dequeues. I believe swing EDT thread can be producer and consumer of tasks i.e. events. Requirement — each task is short and the thread is not overloaded.
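
A tiny sketch of single-threaded async, EDT-style (my own illustration, not Swing’s actual EventQueue). Both methods run on the same thread, so no locking is needed; the only requirement is that each task is short:

    import java.util.ArrayDeque;
    import java.util.Deque;

    class SingleThreadedEventLoop {
        private final Deque<Runnable> buffer = new ArrayDeque<>();

        void post(Runnable task) {        // "producer": enqueue and return immediately
            buffer.addLast(task);
        }

        void drainOnce() {                // "consumer": the same thread calls this periodically
            Runnable task;
            while ((task = buffer.pollFirst()) != null) {
                task.run();               // each task must be short, or the loop falls behind
            }
        }
    }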

Q: timer callback in single-threaded?
A: yes. Xtap is single-threaded and uses epoll timeout to handle both sockets and timer callbacks (sketched below). If the thread is busy processing socket buffers it has to ignore the timer, otherwise the socket buffer will get full. Beware of the two “buffers”:

  • NIC hardware buffer is very small, perhaps a few bytes only, processed by hardware interrupt handler, without pid.
  • kernel socket buffer is typically 64-256MB, processed under my parser pid.
    • some of the functions are kernel tcp/udp functions, but running under my parser pid

See which thread/pid drains NIC_buffer}socket_buffer
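
Xtap itself is not Java, but the same single-threaded poll-with-timeout idea can be sketched with Java NIO’s Selector (illustrative only):

    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.util.Iterator;

    class SingleThreadedPoller {
        void run(Selector selector, Runnable timerCallback, long timerIntervalMillis) throws Exception {
            long nextTimer = System.currentTimeMillis() + timerIntervalMillis;
            while (true) {
                long wait = Math.max(1, nextTimer - System.currentTimeMillis());
                selector.select(wait);                        // wakes on socket readiness OR timeout
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    it.next();
                    it.remove();
                    // drain this socket's buffer here (omitted); if this takes too long,
                    // the timer below is effectively skipped, as described above
                }
                if (System.currentTimeMillis() >= nextTimer) {
                    timerCallback.run();
                    nextTimer = System.currentTimeMillis() + timerIntervalMillis;
                }
            }
        }
    }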

4 infrastructure features@Millennium

 swing trader station + OMS on the server-side + smart order router over low-latency connectivity layer

* gemfire distributed cache. why not DB? latency too high.
* tibrv is the primary MOM
* between internal systems — FIX based protocol over tibrv, just like Lehman equities. Compare to protobuf object serialization
* there’s more advanced math in the risk system; but the most stringent latency requirements are on the eq front office systems.

y java is dominant in enterprise app

What’s so good about OO? Why do the 3 most “relevant” enterprise app dev languages all happen to be OO – java, c# and c++?

Why is google choosing java, c++ and python?

(Though this is not really a typical “enterprise app”) Why is apple choosing to promote a compiled OO language — objective C?

Why is microsoft choosing to promote a compiled OO language more vigorously than VB.net?

But why is facebook (and yahoo?) choosing php?

Before c++ came along, most enterprise apps were developed in c, cobol, fortran…. Experience in the field shows that c++ and java require more learning but do offer real benefits. I guess it all boils down to the 3 basic OO features of encapsulation, inheritance and polymorphism.

enterprise reporting with^without cache #%%xp

YH,

(A personal blog) We discussed enterprise reporting on a database with millions of new records added each day. Some reflections…

One of my tables had about 10G data and more than 50 million rows. (100 million is kind of the minimum to qualify as a large table.) This is the base table for most of our important online reports. Every user hits this table (or its derivative summary tables) one way or another. We used more than 10 special summary tables and Business Objects and the performance was good enough.

With the aid of a summary table, users can modify specific rows in main table. You can easily join. You can update main table using complex SELECT. The most complex reporting logic can often be implemented by pure SQL (joins, case, grouping…) without java. None of these is available to gigaspace or hibernate users. These tools simply get in the way of my queries, esp. when I do something fancy.

In all the production support (RTB) teams I have seen on wall street, investigating and updating DB is the most useful technique at firefighting time. If the reporting system is based on tables without cache, prod support will feel more comfortable. Better control, better visibility. The fastest cars never use automatic gear.

Really need to limit disk I/O? Then throw enough memory in the DB.

10 (random) arch features of HFT

When talking to low-latency shops, i realize the focus shifts from pricing, trade booking, position mgmt … to market data, message formatting and sockets – rather low-level stuff. A high-frequency trading engine has many special features at architectural and impl levels, but here i will focus on some important architectural features that make a difference. By the way, my current system happens to show many of these features.

1) message-driven, often using RV or derivatives. Most trading signals come in as market data, tick data, benchmark shifts, position adjustments (by other traders of our own bank). Among these, I feel market data poses the biggest challenge from the latency perspective.
2) huge (reluctantly distributed – see other post) cache to minimize database access
3) judicious use of async and sync IPC, if one-big-machine is undesirable.
4) optimized socket layer, often in C rather than c++. No object-orientation needed here:)
5) server co-location
6) large number of small orders to enable fine-grained timing/cancel and avoid disrupting the market
7) market data gateway instantiates a large number of small objects
8) smart order router, since an order can often execute on multiple liquidity venues

Beyond the key features, I guess there’s often a requirement to immediately change a parameter in the runtime rather than updating a database and waiting for the change to be noticed by the runtime. I feel messaging is one option, and RMI/JMX is another.

DB as audit trail for distributed cache and MOM

MOM and distributed cache are popular in trading apps. Developers tend to shy away from DB due to latency. However, for rapid development, relational DB offers excellent debugging, tracing, and correlation capabilities in a context of event-driven, callback-driven, concurrent processing. When things fail mysteriously and intermittently, logging is the key, but you often have multiple log files. You can query the cache, but much less easily than the DB.

Important events can be logged in DB tables and

* joined (#1 most powerful)
* sorted,
* searched in complex ways
* indexed
* log data-mining. We can discover baselines, trends and anti-trends.
* Log files are usually archived (less accessible) and then removed, but DB data are usually more permanent. Don't ask me why:)
* selectively delete log events, easily, quickly.

* Data can be transformed.
* accessible by web service
* concurrent access
* extracted into another, more usable table.
* More powerful than XML.

Perhaps the biggest logistical advantage of DB is easy availability. Most applications can access the DB.

Adding db-logging requires careful design. When time to market is a priority, I feel the debug capability of DB can be a justification for the effort.

A GS senior manager preferred logging in DB. Pershing developers generally prefer searching the same data in DB rather than file.

gemfire write-behind and gateway queue #conflation, batched update

http://community.gemstone.com/display/gemfire60/Database+write-behind+and+read-through says (simplified by me) —
In the Write-Behind mode, updates are asynchronously written to DB. GemFire uses Gateway Queue. Batched DB writes. A bit like a buffered file writer.

With the asynch gateway, low-latency apps can run unimpeded. See blog on offloading non-essentials asynchronously.

GemFire’s best known use of Gateway Queue technology is for the distribution/propagation of cache update events between clusters separated by a WAN (thus they are referred to as ‘WAN Gateways’).

However, Gateways are designed to solve a more fundamental integration problem shared by both disk and network IO — 1) disk-based databases and 2) remote clusters across a WAN. This problem is the impedance mismatch when update rates exceed absorption capability of downstream. For remote WAN clusters the impedance mismatch is network latency–a 1 millisecond synchronously replicated update on the LAN can’t possibly be replicated over a WAN in the same way. Similarly, an in-memory replicated datastore such as GemFire with sustained high-volume update rates provides a far greater transaction throughput than a disk-based database. However, the DB actually has enough absorption capacity if we batch the updates.

Application is insulated from DB failures as the gateway queues are highly available by default and can be configured to allow zero data loss.

Reduce database load by enabling conflation — Multiple updates of the same key can be conflated and only the final entry (containing all updates combined) written to the database.
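
A back-of-envelope sketch of conflation (my own illustration, not GemFire’s actual gateway-queue API):

    import java.util.LinkedHashMap;
    import java.util.Map;

    class ConflatingWriteBehindQueue<K, V> {
        private final Map<K, V> pending = new LinkedHashMap<>();   // keeps first-seen key order

        synchronized void enqueue(K key, V latestValue) {
            pending.put(key, latestValue);       // later updates of the same key overwrite earlier ones
        }

        synchronized Map<K, V> drainBatch() {
            Map<K, V> batch = new LinkedHashMap<>(pending);
            pending.clear();
            return batch;                        // one DB write per key, not one per update
        }
    }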

Each Gateway queue is maintained on at least 2 nodes, internally arranged in a primary + (one or multiple) secondary configuration.

Spring can add unwanted (unnecessary) complexity

[5] T org.springframework.jms.core.JmsTemplate.execute(SessionCallback action, boolean startConnection) throws JmsException
Execute the action specified by the given action object within a JMS Session. Generalized version of execute(SessionCallback), allowing the JMS Connection to be __started__ on the fly, magically.
——–
Recently I had some difficulties understanding how jms works in my project. ActiveMQ hides some sophisticated stuff behind a simplified “facade”. Spring tries to simplify things further by providing a supposedly elegant and even simpler facade (JmsTemplate etc), so developers don’t need to deal with the JMS api[4]. As usual, spring hides some really sophisticated stuff behind that facade.

Now I have come to the view that such a setup adds to the learning curve rather than shortening it. The quickest learning curve is found in a JMS project using nothing but the standard JMS api. This is seldom a good idea overall, but it surely reduces the learning curve.

[4] I don’t really know how complicated or dirty it is to use standard JMS api directly!
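
For comparison, a bare-bones send using nothing but the standard javax.jms API (obtaining the ConnectionFactory, e.g. via JNDI or ActiveMQConnectionFactory, is omitted):

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;

    class PlainJmsSender {
        void send(ConnectionFactory factory, String queueName, String text) throws Exception {
            Connection conn = factory.createConnection();
            try {
                conn.start();   // strictly only needed before consuming; harmless for a pure sender
                Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
                Queue queue = session.createQueue(queueName);
                MessageProducer producer = session.createProducer(queue);
                producer.send(session.createTextMessage(text));
            } finally {
                conn.close();   // closes the session and producer too
            }
        }
    }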

In order to be proficient and become a problem solver, a new guy joining my team probably needs to learn both the spring stuff and the JMS stuff [1]. When things don’t behave as expected[2], perhaps showing unexpected delays and slightly out-of-sync threads, you don’t know if it’s some logic in spring’s implementation, or our spring config, or incorrect usage of JMS or a poor understanding of ActiveMQ. As an analogy, when an alcoholic-myopic-diabetic-cancer patient complains of dizziness, you don’t know the cause.

If you are like me, you would investigate _both_ ActiveMQ and Spring. Then it becomes clear that Spring adds complexity, not reduces complexity. This is perhaps one reason some architects decide to create their own frameworks, so they have full control and don’t need to understand a complex framework created by others.

Here’s another analogy. If a grandpa (like my dad) wants to rely on email everyday, then he must be prepared to “own” a computer with all the complexities. I told my dad a computer is nothing comparable to a cell phone, television, or camera as a fool-proof machine.

[1] for example, how does the broker thread start, at what time, and triggered by what[5]? Which thread runs onMessage(), and at what point during the start-up? When and how are listeners registered? What objects are involved?

[2] even though basic functionality is there and system is usable

web services j4, features — briefly

web service is an old technology (RPC) given a new lease of life 10 years ago.

* [#1 selling point] cross platform for eg between dotnet frontend and java backend
* loosely coupled
* good for external partner integration. Must be up all the time.
* beats MOM when immediate response is required.
* web service (soap) over MOM? should be feasible. One listener thread for the entire client system — efficiency

database access control solutions for enterprise

solution: deny access to command line tools like sql+ or sqsh, deny access via standard windows clients like toad, aqua studio. Require every interactive user to access via a browser.

solution: custom api for java clients. block access via jdbc.

Solution: views to limit access to subset of rows, subset of columns or derived columns. deny access to underlying tables.

Solution: sproc and function to limit access. deny direct access to underlying tables. I think this is the most flexible.

rule compilation in java #basic learning notes

a rule-compiler compiles rule-source-code into executable-rules. You deploy executable-rules just like any other java class, to the JVM.

rule-compiler contains a rule-parser.

executable-rule files have an encoding format to hold the condition/action.

rule-compilation is an initialization overhead to minimize, perhaps by caching.

Q: How does this compilation affect the edit-cycle?

rule-engine vs rules

If you remember one knowledge pearl about business rule engines, I think you may want to remember the relationship between the rule engine and the rules. I think these are the 2 main entities to *deploy* to an ent app.

Verizon’s circuit fault-isolator is a typical enterprise application using JRules. Think of your ent app as a host-app (or user or caller) of the rule stuff. There are quite a lot of rule stuff to *deploy* and you will soon realize the 2 main thingies are the (A) generic rule-engine and (B) your rules.

– The rule-engine is written by ILOG (or JBoss or whoever) but the rules are written by you.
– Rule Engine is a standard, generic component but the rules are specific to your business.
– The rule-engine is first and foremost the interpreter of your rules

* a good analogy is found in the XSL transformer vs the xsl stylesheet. Your host application needs to load both of them into memory
* A similar relationship exists between spring the framework and the spring-beans you create.

tiers ] ports^1-jvm

Refer to the overview post on ports^1-jvm.

Suffering from the same abuse-of-terminology as “server”, the word “tier” can now refer to not only separate-jvm but also to 1-jvm AR. Consider

“data tier” —
“dao tier” — basically the same as “dao modules”, as a layer of abstraction
“orm tier”
“object tier”
“presentation tier” — jsp, struts views, usually within 1-jvm

server/client mean — ports^1-jvm

a “server” is traditionally a separate jvm or unix process but nowadays occasionally can refer to a method (or set of methods) your “client objects” can call within a container.

In the same vein, a “client” used to mean a number — a process id (or thread) — often on a different host, but now in Java literature it often means a “caller of a method”. A caller of a method may or may not be an object in memory, but always refers to some “calling context” to be altered by the method.

Every OO student would eventually come to realize when to say “client” and what that implies.

The j2ee community seems to pay little attention to the question “same or separate jvm?”

A “service” is often provided by the container.

The “server” usually provides some utility service like water and electricity.

Examples?

Perhaps not a good example. jndi is obviously a container service. It can be filesystem-based.

ports^1-jvm to decouple web tier

2 common architectures to decouple any object-oriented web (or non-web) system. Master them. Don’t try to add a 3rd architecture to overload your memory.

— A) ports ie tcp ports. You separate a chunk of java code into another unix-process with a port. You end up with a “tier” in a multi-tier AR. Beware “tier” can now refer to single-jvm modules too, in an abuse of terminology.

connection pools @@

Examples: EJB, ActiveMQ, CPF Single-sign-on-server, crystal-report-server, web services

— B) single-jvm solution. Instantiate intelligent components from an off-the-shelf or /3rd-party/ (3p) jar
thread issues
In a web tier, the 3p objects could be too big to re-create ==> put in session

Examples: struts, spring, hibernate, nanoXML, log4j, shopping cart

Example: the M in MVC could be a 3rd-party module (shopp`cart) or even a legacy ERP, but almost always there are some M-classes within the MVC jvm.

capacity management, a Unix /perspective/

aim for simultaneous saturation and eliminate bottleneck? More for perf tuning than cap management

“Capacity” is largely (don’t /sweat/ it) about “resources”.

— can’t add resource?
identify critical resources (bandwidth, simultaneous oracle connections, disk throughput..)
collect usage patterns, esp. peak usage, for each resource
increase efficiency for each resource, i.e. reduce wastage
Identify — most of the time, perf is cpu-bound, mem-bound, disk io-bound, network io-bound … Same for a Weblogic server

— can add resources?
follow the same suggestions above
do capacity planning
do load forecast

perf techniques in T J W’s project–ws,mq,tx

Q: request wait-queuing (toilet queue)? I know weblogic can configure the toilet queue
A: keep the queue entries small. we only keep object id while the objects are serialized to disk (?!)

Q: is 1kB too large?
A: no

Q: most common cause of perf issues?
A: mem leak. still present after regression test

Q: jvm tuning?
A: yes important, esp mem related

Q: regression test?
A: important

Q: perf tools?
A: no tools; primarily based on logs, e.g. track a long-running transaction and compute the duration between soap transaction start and end.

Q: web services?
A: Many of the transactions are based on soap, axis. TCP monitor (http://ws.apache.org/axis/java/user-guide.html#AppendixUsingTheAxisTCPMonitorTcpmon) can help with your perf investigation.

Q: tx?
A: yes, we use two-phase commits. Too many transactions involved; really complex biz logic. Solution is async.

Q: multi-threaded?
A: handled by weblogic.

Q: how is the async and queue implemented?
A: weblogic-mq with persistent store, crash-proof

runtime change to object behaviour

[[ head first design patterns ]] repeatedly favors *runtime* change to program functionality, rather than compile-time ie source code change. I assume they have a *practical* reason instead of a doctrine.

Related concepts: Strategy pattern, Decorator pattern,

When we need to change from an old functionality to a new functionality, a good approach (sketched after this list) is
* we try to create a new functionality class, if at all possible,
* at runtime, use existing setters to assign the new functionality, replacing the old, when needed.
* minimize edits to existing, tested classes
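
A bare-bones Strategy sketch of the idea (hypothetical class names):

    // existing, tested classes stay untouched
    interface PricingStrategy { double price(double raw); }

    class OldPricing implements PricingStrategy {
        public double price(double raw) { return raw; }
    }

    // the only new code: a new functionality class
    class DiscountPricing implements PricingStrategy {
        public double price(double raw) { return raw * 0.9; }
    }

    class OrderProcessor {
        private PricingStrategy strategy = new OldPricing();

        // existing setter: the runtime switch point
        void setPricingStrategy(PricingStrategy s) { this.strategy = s; }

        double quote(double raw) { return strategy.price(raw); }
    }

At runtime, processor.setPricingStrategy(new DiscountPricing()) replaces the old behaviour without touching OrderProcessor or OldPricing.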

See also post on [[ create functionality without jvm restart]strategy ]]

I think this probably incurs the least impact on existing, tested functionalities.
=> regression test@@ no need
=> Low stress for fellow developers, managers, clients, internal users and any non-technies.
=> no need to worry “Did we miss any other existing classes that need edit?”
( documentation on interdependencies is crucial but often neglected by developers. )

batch feature wishlist

[x = lesser-known but fairly regular requirement in my experience]
A “record” means one of a (potentially large) number of input data to be processed

* [x] step-by-step manual confirmation, each with a single keystroke. Just like rm -i
* skip certain steps
* reshuffle some steps — arguably tapping on one of the strengths of interpreted languages.
* [x] re-run a certain step only
* share codebase with other on-going projects, to avoid forking and ease maintenance
* persistent xml config + command-line config
* be nice (Unix terminology) to other processes. Batch jobs can quickly eat up shared resources.

— infrastructure support needed, because standard batch languages can’t
* self-profiling and benchmarking on the batch application, to record time/mem/DB/bandwidth… usage for performance analysis
* scheduled retry or manual retry
* “easy” multi-threading (with data sharing) to exploit multi-threaded processors like our T2000’s 32 kernel threads. Multi-threading is non-trivial, esp. with data sharing. Many batch developers won’t have the time/expertise to create it or test it. Infrastructure support could lower the barrier and bring multi-threading to the “masses”

Re: NextGen server mean time to failure@@

(A draft email) Hi,

Thanks for your quick reply. Sorry I’m unable to give any suggestion. Just some nagging worries. I’m trying to be critical yet objective.

My experience suggests that many java-based daemons are fairly susceptible to degradation under a sufficiently high concurrent load. Similar to denial-of-service attacks.

I’m not easily convinced that any piece of software (including my favorite — apache httpd) can keep up performance without restart for a few months under heavy load. For example, over 20 years solaris went through continuous improvements in terms of self-healing, daemon/service availability — a clear sign that the system can sustain “injuries” and lose performance. If it can happen to OS, what is immune?

I remember Siva told me the FTTP workload could be quite high and it’s not easy to handle that load. I think he said a few thousand cases a day. Will keep us busy:)

tan bin

transparency ] j2ee AR

warning: “transparent” has 2 unrelated meanings in java.

[[better, lighter faster java]]

Key concept: coupling. The tighter, the less transparent
Key concept: put “peripherals SERVICES” out of the DOMAIN MODEL
– persistence service
– messaging service
– tx service
– sec service,
– serialization service
– printing service
– email service

For example, tight coupling between a serialization service and the domain model means “the service is CUSTOMIZED for this biz”. Changes to the domain model require changes to the service.

For example, look at persistence. A transparent persistence SERVICE persists any, yes any, object.

For example, look at the serialization SERVICE, which serializes any, yes any, object

Most imp technique –> reflection <–

Q: what u already understand (LJ) transparency@@
A: see-through. readable logic. an extra layer or functionality should not impede overall AR readability

declarative control — enterprise design pattern

justification: reduce source change, which can introduce bugs

justification: reduce test effort

justification: Maintain users’, bosses’ and colleagues’ confidence. Confidence that the source didn’t change, so existing functionalities aren’t affected.

justification: slightly Better man-day estimate compared to hacking source code

justification: Adaptable, Flexible

justification: more readable than source code

justification: something of a high-level documentation, well-structured

Examples: DD, spring config file, struts config file, hibernate config file.

I feel this is a habit (unit test is another habit). Initially it’s not easy to apply this idea. A lot of times you feel “not applicable here”, only to witness others applying it here. Easier to justify for component-based, inter-dependent modules. Other projects may find declarative control an overkill, and may opt for a properties file.

Q: Alternative to declarative?
A: The information must move into some place, usually source code. How about a properties file?

Q: is this a design pattern?
A: Purists tend to avoid the term. It applies to OO and non-OO

biz rules ] DB

What are Business rules? They are set by the business. These guys have written rules. Don’t ask me exactly what qualify and what don’t qualify as business rules. Business rules can be implemented in java, javascript or batch.

Many business rules are best implemented inside the DB. Reason? The concept of biz rule is popularized and heavily influenced by DB industry, vendors and practitioners. Most things that /pass as/ biz rules are defined in terms of DB records (real world objects represented by records). As a result, these biz rules can be and often are best described, saved, encoded in a DB format.

Below are just a few buzzwords, not meant to be an orthogonal, mutually exclusive list of things.

– unique constraint — eg: member id must be unique
– not-null — eg: “We can’t leave this field blank”
– RI — May not be a rule set by business, but closely related to other business rules. eg: “When this salesperson resigns, all her customers must be assigned a replacement salesperson.”
– check constraint — Can be complex. I think (??? confirmed) they should be applied at modification time.
– triggers — can implement RI, check constraints,
– – > input-validation trigger is an important, well-defined type of trigger
– derived data — insert or update “derived data” via triggers, to let java classes select them without “deriving”. The derivation formula contain business rules.
– authorization and access control via views and stored-procs. May not qualify as business rules.
– stored proc — most flexible. Can implement the most complex rules set by business, involving multiple objects.
– – > multi-table correlated modification via stored programs
– cascade delete
– views — can contain business rules in the view’s definition query. eg: “These class of users can only read/modify this subset of data — not those protected columns or irrelevant rows. They should always see the details of each purchase — by a table join.”

An architect should learn this list of techniques. Move business rules from java classes into DB whenever possible, to reduce the complexity of java classes. A large system usually has 60-90% of the business logic implemented in application source code (like java). That’s too much to manage. It’s good to move some to javascript or DB.

[[ pl/sql for dummies ]] advocates putting most “business logic” in DB rather than java. The most complex business logic would need big guns like
* procedures
* functions
* triggers
* complex views, perhaps containing functions in their definitions and having instead-of triggers defined on them.