c#delegate instance HAS-A inv list@@

Everything important to a delegate instance is in its inv list, but technically, is there something else in a delegate instance?

The compiler provides no way to modify an inv list (invocation list); immutability is guaranteed by the compiler. I feel immutability simplifies the physical layout. A simple implementation would be a thin wrapper over the inv list, and the simplest implementation could be nothing but the inv list. A delegate instance is an object on the heap. The inv list is also an object on the heap.

Note Delegate.Remove() is a global static method which automatically returns null when it would otherwise produce an empty inv list. Nature abhors a vacuum; c# abhors an empty inv list.
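
To make the "thin wrapper over an immutable inv list" idea concrete, here is a toy model in Python. It is only a sketch of the concept, not how the CLR lays out a delegate. The names ToyDelegate, combine and remove are made up for illustration. remove() mimics the behaviour noted above: it returns None rather than an object with an empty list.

```python
class ToyDelegate:
    """Toy model of a delegate: a thin wrapper over an immutable tuple."""

    def __init__(self, *targets):
        if not targets:
            # Mirrors "c# abhors an empty inv list": never build an empty one.
            raise ValueError("an empty invocation list is never constructed")
        self._inv_list = tuple(targets)  # a tuple cannot be modified in place

    def invocation_list(self):
        return self._inv_list

    def __call__(self, *args):
        result = None
        for target in self._inv_list:
            result = target(*args)
        return result  # the last target's return value wins


def combine(a, b):
    """Return a NEW delegate holding a's targets followed by b's."""
    if a is None:
        return b
    if b is None:
        return a
    return ToyDelegate(*(a.invocation_list() + b.invocation_list()))


def remove(source, target):
    """Return a NEW delegate without the last occurrence of target.

    Returns source unchanged if target is absent, and None if the
    resulting list would be empty.
    """
    items = list(source.invocation_list())
    for i in range(len(items) - 1, -1, -1):
        if items[i] == target:
            del items[i]
            break
    else:
        return source
    return ToyDelegate(*items) if items else None


calls = []

def first(x):
    calls.append(("first", x))
    return 1

def second(x):
    calls.append(("second", x))
    return 2

single = ToyDelegate(first)
both = combine(single, ToyDelegate(second))
last_result = both(7)
emptied = remove(single, first)
```

Note that combine() and remove() never touch the original objects, so single still holds exactly one target after both is built.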

More about the immutability – see other posts.

Is a pure-play investment bank really a bank@@

Many people wonder why the word “bank” appears in “investment bank”. I mean, do those Goldmans and Morgans have anything in common with the main street banks, such as the savings banks or commercial banks? Well, I see some non-trivial similarities.

Similarity – service provider and facilitator. Like a commercial bank, an investment bank is supposed to facilitate clients’ financial strategies. This is obvious in the IB business model (below). It’s less clear in the SS model (below). In an ideal world, an investment bank in its SS role should not do the buy-side type of business (prop-trading) and compete with clients. This ideal world doesn’t exist, but most investment banks do look like sell-side service providers.

Similarity – lending. Like all banks, an investment bank lends money all the time, and it also borrows money from investors (“depositors”) and central banks.

The IB business model described below has so much in common with commercial banking that most of the major investment banks today are part of commercial banks. This model is known as universal banking, adopted by Citi, Barclays, JPMC, UBS/CS, HSBC/SCB/RBS, DB, BNP/SG, RBC etc. If we focus on the investment banking business on its own, there are basically 2 main business models:

IB — The traditional meaning of IB is related to “funding” and “financing” for a big client’s big project, such as a merger or privatization but more commonly bond/equity issue, including public issues a.k.a. IPOs. These are often dressed up, packaged as “advisory business”, but what clients need most is financing, as illustrated in the Napoleonic wars. In such a funding project, the IB does something similar to a regular bank – collect funds from a large number of investors and lend to that particular client. However, the risks, expertise, techniques, operations, competitive strategies … tend to be different from regular commercial banking.

SS — The other major IB business model is playing the sell-side on security markets. This is not just passive order-taking. Many players are also in the business to create structured products. They have an advisory team to actively engage prospective clients and customize their products for each client. See other posts in this blog.

All other IB business models are lesser known but could sometimes generate more profit than the 2 main models
– asset management — buy-side business model
– prime brokerage
– security lending
– clearance
– investment research

managed ^ unmanaged (dotnet)- what to Manage@@

A popular IV question — exactly what runtime Services are provided by the dotnet Hosting runtime/VM to the Hosted/managed code? In other words, how is the “managed code” managed?

Re http://bigblog.tanbin.com/2011/09/what-is-kernel-space-vs-userland.html, I believe an executing thread often has its lowest level stack frames in kernel mode, middle frames in the VM, and top frames running end-user application code. The managed code is like a lawyer working out of a hotel room. The hotel provides her many business services. Now, host environments have always provided essential runtime services to hosted applications, since the very early days of computers. The ubiquitous runtime host environment is the OS. In fact, the standard name of an OS instance is a “hostname”. If you have 3 operating systems running on the same box sharing the CPU/RAM/disk/network then there are 3 hosts, i.e. 3 distinct Hosting-Environments. In the same tradition, the dotnet/java VM also provides some runtime services to hosted applications. The hotel needs “metadata” about the data types, so a dotnet assembly always includes type metadata in addition to IL code.

Below is a dotnet-centric answer to the IV question. (JVM? probably similar.) For each, we could drill down if needed (usually unneeded in enterprise apps).

– (uncaught) exception handling. See [[Illustrated c#]] (but how different from unmanaged c++?)
– class loading. See [[Illustrated c#]]
– security
– thread management? But I believe unmanaged code can also get userland threads manufactured by the (unmanaged) thread library.
– reflection. See [[Illustrated c#]]
– instrumentation? Remember jconsole
– easier debugging – no PhD required. Unmanaged code offers “limited” debugging??
– cross-language integration?
– memory management?
** garbage collection. This service is so prominent/important it’s often listed on its own, but I consider this part of mem  mgmt.
** memory request/allocation? An ANSI-C app uses a memory management library to grab memory wholesale from the kernel (see other posts), and a VM probably takes over that task.
** translate a C# heapy object reference to a “real virtual” heap address. Both JVM and .Net collectors have to move (non-pinned[1]) objects from one real-virtual address to another real-virtual address. Note the OS (the so-called paging supervisor) translates this real-virtual address into a physical RAM address.
** appDomain. VM isolates each appdomain from other appdomains within a single Process and prevents memory cross-access. See http://msdn.microsoft.com/en-us/library/ms173138(v=vs.80).aspx

[1] pinned objects are not relocatable.

jar ^ c# DLL, briefly

In java, namespace tree (not the inheritance “family” tree), physical directory tree and accessibility are all based on the same tree.

C# decouples them. The namespace tree has no physical manifestation.

The physical organization of files is based on assembly, which is unrelated to namespace.

For a third party library, java would use a jar. C# would use a DLL, which is an assembly. Inside the jar there’s a namespace tree known as a package. An assembly isn’t required to use a unique namespace.

c# regex-match backslashes in strings

My suggestion — First find a “safe” character that’s guaranteed not to show up in the original string, like “_”. Replace all back slashes. Then proceed.

The problem with backslashes is the unnecessary complication. Here I want to match “one-or-more backslashes”. In the end I need to put 4 backslashes in the pattern to represent that “one”.

var ret = Regex.Replace(@"any number of\\\backslashes", "(.+\\\\+)?(.+)", "$1 - $2");

Alternatively, I could use @ to reduce the complexity: @"(.+\\+)?(.+)"

Disappointingly the @ does only a partial job. We still need 2 strokes, because the regex engine itself treats the backslash as an escape character. Confusing! I’d rather just remember one simple rule and avoid the @ altogether.
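
The same stroke-counting shows up in Python, where a raw string r"..." plays the role of the C# @"..." literal. The sketch below checks that the 4-stroke plain literal and the 2-stroke raw literal spell the same regex, then tries the "safe character" route suggested at the top. It assumes "_" never appears in the input.

```python
import re

# Raw literal, like a C# @"..." string: holds 3 literal backslashes.
text = r"any number of\\\backslashes"

# A plain literal needs 4 strokes to say "a backslash, one or more times";
# a raw literal needs 2. Both spell the same 3-character regex: \\+
plain_pattern = "\\\\+"
raw_pattern = r"\\+"

# Each match is one run of consecutive backslashes.
runs = re.findall(raw_pattern, text)

# The "safe character" route: swap backslashes for "_" first, then match "_+".
# Assumption: "_" is guaranteed not to occur in the original string.
safe_text = text.replace("\\", "_")
safe_runs = re.findall("_+", safe_text)
```

After the replacement the pattern contains no backslash at all, which is the whole point of the suggestion.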

%%jargon – Consumer coder, Consumer class

When we write a utility, an API, or a data class to be used by other programmers or as components (or “services” or “dependencies”) in other modules, we often strain to find an unambiguous and distinct term that refers to “the other side” whom we are working to serve. The common choices of words are all ambiguous due to overload —
“Client” can mean client-server.
“User” can mean business user.
“App developer”? Both me and “the other side” are app developers.

My Suggestions —

How about “downstream coder”, or “downstream classes”, or “downstream app” ?

How about “upper-layer coder”, “upper-layer classes”, “upper-layer app”, “upper-layer modules”
How about “upper-level coder”, “upper-level classes”, “upper-level app”, “upper-level modules”
How about “Consumer coder”, “Consumer class”, or “Consumer app”?

##some python constructs to understand

These are the features I feel likely to turn up in production source code or interviews, so you need to at least recognize what they mean but need not know how to use exactly. (A subset of these you need to Master for writing code but let’s not worry about it.)

List of operators, keywords and expressions are important for this purpose

Most built-in Methods are self-explanatory.

##coding guru tricks (tools) learnt across WallSt teams

(Blogging. No need to reply.)

Each time I join a dev team, I tend to meet some “gurus” who show me a trick. If I am in a team for 6 months without learning something cool, that would be a low-calibre team. After Goldman Sachs, I don’t remember a Sybase developer who showed me a cool Sybase SQL trick (or any generic SQL trick). That’s because my GS colleagues were too strong in SQL.

After I learn something important about an IDE, in the next team again I become a newbie to the IDE since this team uses other (supposedly “common”) features.

eg: remote debugging
eg: hot swap
eg: generate proxy from a web service
eg: attach debugger to running process
eg: visual studio property sheets
eg: MSBuild

I feel this happens to a lesser extent with a programming language. My last team uses some c++ features and next c++ team uses a new set of features? Yes but not so bad.

Confucius said “Among any 3 people walking by, one of them could be teacher for me“. That’s what I mean by guru.

Eg: a Barcap colleague showed me how to make a simple fixed-size cache with FIFO eviction-policy, based on a java LinkedHashMap.
Eg: a guy showed me a basic C# closure in action. Very cool.
Eg: a Taiwanese colleague showed me how to make a simple home-grown thread pool.
Eg: at Citi, I was lucky enough to have a lot of spring veterans in my project. They seem to know 5 times more spring than I do.
Eg: a sister team in GS had a big, powerful and feature-rich OO design. I didn’t know the details but one thing I learnt was — the entire OO thing has a single base class
Eg: GS guys taught me remote debugging and hot replacement of a single class
Eg: a guy showed me how to configure windows debugger to kick-in whenever any OS process dies an abnormal death.
Eg: GS/Citi guys showed me how to use spring to connect jconsole to the JVM management interface and change object state through this backdoor.
Eg: a lot of tricks to investigate something that’s supposed to work
Eg: a c# guy showed me how to consolidate a service host project and a console host project into a single project.
Eg: a c# guy showed me new() in generic type parameter constraints
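
As a taste of the first trick in the list, here is a rough Python analogue of the fixed-size FIFO cache. In Java the usual route is to subclass LinkedHashMap and override removeEldestEntry(). The sketch below uses collections.OrderedDict instead. The class name FifoCache and its methods are made up for illustration and are not the Barcap colleague's code.

```python
from collections import OrderedDict


class FifoCache:
    """Fixed-size cache that evicts the oldest inserted key first."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._data = OrderedDict()

    def put(self, key, value):
        # Evict only when a brand-new key would overflow the cache.
        if key not in self._data and len(self._data) >= self._capacity:
            self._data.popitem(last=False)  # drop the oldest insertion
        # Updating an existing key keeps its original position (FIFO, not LRU).
        self._data[key] = value

    def get(self, key, default=None):
        # Reads never reorder entries, unlike an LRU cache.
        return self._data.get(key, default)

    def keys(self):
        return list(self._data)


cache = FifoCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.put("c", 3)      # evicts "a", the oldest entry
cache.put("b", 20)     # update in place, "b" stays the oldest
cache.put("d", 4)      # evicts "b"
```

The design choice worth noticing is that neither get() nor an update moves an entry, which is what separates FIFO eviction from LRU.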

These tricks can sometimes fundamentally change a design (of a class, a module or sub-module)

Length of experience doesn’t always bring a bag of tricks. It’s ironic that some team could be using, say, java for 10 years without knowing hot code replacement, so these guys had to restart a java daemon after a tiny code change.

Q: do you know anyone who knows how to safely use Thread.java stop(), resume(), suspend()?
Q: do you know anyone who knows how to read query plan and predict physical/logical io statistic spit out by a database optimizer?

So how do people discover these tricks? Either learn from another guru or by reading. Then try it out, iron out everything and make the damn code work.

back testing a VaR process, a few points

–Based on http://www.jpmorgan.com/tss/General/Back_Testing_Value-at-Risk/1159398587967

Let me first define my terminology. If your VaR “window” is 1 week, that means you run it on Day 1 to forecast the potential loss from Day1 to Day7. You can run such a test once a day, or once in 2 days etc — up to you.

The VaR as a big, complicated process is supposed to be a watchdog over the traders and their portfolios, but how reliable is this watchdog? VaR is a big system and big Process involving multiple departments, hundreds of software modules, virtually the entire universe of derivatives and other securities + pricing models for each asset class. Most of these have inherent inaccuracies and unreliability. The most visible inaccuracy is in the models (including realized volatilities).

VaR is a “policeman”, but who will police the policeman? Regular back testing is needed to keep the policeman honest, i.e. to keep VaR realistic and consistent with market data. Otherwise VaR can become a white elephant and an emperor’s new clothes.
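
One simple way to police the policeman is to count how often the realized loss exceeded the VaR forecast and compare that with the rate implied by the confidence level. The sketch below is a minimal illustration under my own assumptions, not the method in the article linked above. The function names, the 95% confidence level and the sample numbers are all invented.

```python
def count_breaches(var_forecasts, realized_pnl):
    """Count windows where the realized loss exceeded the forecast VaR.

    var_forecasts: positive loss thresholds, one per window
    realized_pnl:  signed P&L over the same windows (negative = loss)
    """
    if len(var_forecasts) != len(realized_pnl):
        raise ValueError("need one realized P&L per forecast")
    return sum(1 for var, pnl in zip(var_forecasts, realized_pnl) if -pnl > var)


def breach_rate(var_forecasts, realized_pnl):
    """Fraction of windows in which the loss exceeded the forecast."""
    return count_breaches(var_forecasts, realized_pnl) / float(len(var_forecasts))


# Invented sample: a constant VaR of 100 over 10 windows.
forecasts = [100] * 10
pnl = [5, -20, -150, 30, -99, -101, 0, 10, -100, 40]

breaches = count_breaches(forecasts, pnl)   # the losses of 150 and 101 count
observed = breach_rate(forecasts, pnl)
expected = 1 - 0.95                         # assumed 95% confidence level
looks_too_optimistic = observed > expected
```

With so few windows the comparison is only suggestive; a real back test would use a much longer history.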

[13]some modules of algo trading system #YH

A few modules I have heard of

* MM — mkt datafeed reader
* OMS — order book …
* order submission gateways and smart order routers.
* PP — pretrade pricer, not needed in stock or FX
* BB — trade booking engine, when an execution comes back successful. BB is the main module affecting the position master DB
* PNL — real time pnl engine
* RR — real time risk?

I guess BB, PP or PNL might be absorbed into the OMS module. In some places, OMS might be the name for any real time component of the entire system. There’s a narrower definition of OMS though, so I will just refer to that definition as the EMS (execution management system). For a micro hedge fund, PP EMS RR may not exist, and the rest (MM BB PNL) can be manual. In contrast algo shops must automate all steps.

–Where does MOM fit in? It’s an enabling technology, usable in many of the functional modules above.

–How does dynamic data fabric fit in? Also known as data grid or in-memory DB. There are 2 types
* generic technology, often integrating SQL, MOM technologies
* functional module with complex business logic, such as CEP engines, often tightly integrated with MM OMS PP

These two descriptions are not mutually exclusive. Many data grid systems include features of both. However, for simplicity, in this write-up I treat data grid as the generic technology just like a DB or MOM.

–How does reference data fit in? I guess it’s just another separate service that interfaces with the database and in turn provides reference data to other modules when needed?

Ref data tends to be fairly static: update frequency would be once every few hours at most, right? It is essentially a readonly component to the algo engine modules. Readonly means those modules don’t update ref data. It’s a one-way dependency. MM is also readonly, but much more dynamic. MM is the driver in an event-driven algo engine.

I feel ref data read frequency can be high, but update frequency is low. Actually, I feel the occasional update can be a performance issue. Those dependent modules can’t naively cache ref data if ref data can change mid-day. There are techniques to address this issue. A fast engine must minimize DB and network access, so if ref data is provided on the network, then every read would be costly.
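
One such technique, offered only as a sketch and not necessarily what any real shop does, is a read-through cache that is invalidated by an update notification. Reads stay local and the slow lookup is repeated only after a mid-day change. The class RefDataCache, the loader callback and the on_update hook are all hypothetical names. In practice the notification might arrive over MOM.

```python
class RefDataCache:
    """Read-through cache invalidated by update notifications."""

    def __init__(self, loader):
        self._loader = loader   # hypothetical slow call, e.g. a DB or network fetch
        self._cache = {}
        self.loads = 0          # how many slow lookups were actually made

    def get(self, key):
        if key not in self._cache:
            self._cache[key] = self._loader(key)
            self.loads += 1
        return self._cache[key]

    def on_update(self, key):
        """Callback for a (hypothetical) ref-data update message."""
        self._cache.pop(key, None)  # the next read will reload this key


# Stand-in for the ref data service: a plain dict.
store = {"IBM": 100}
ref = RefDataCache(store.__getitem__)

first_read = ref.get("IBM")
second_read = ref.get("IBM")     # served from the local cache
store["IBM"] = 50                # mid-day change at the source
ref.on_update("IBM")             # notification arrives
third_read = ref.get("IBM")      # reloaded once, then cached again
```

The trade-off is a short window of staleness between the change at the source and the arrival of the notification.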

Some modules are simpler, like PNL and BB, so no big deal. The pricing lib is used for PP and perhaps RR, which are quantitatively complex. MM and EMS are technically challenging.

The “algo” tends to be in the MM PP EMS and RR modules

A High frequency shop also needs to assess market impact. Not sure where it fits in, perhaps the EMS

Where does STP fit in? I feel STP is largely non-realtime.

gold standard#3:devalue,print-money,export..

Every functioning economy needs to print money, basically creating more paks out of thin air, assuming your currency is the “pak”. It becomes a problem only when we print too much. But too much compared to what? Compared to the amount of export earnings and your “gold” reserve.

Initially, each pak is backed by a microgram of gold. Assume our country has no gold mine, so when you print 5% more paks, you need to “earn” 5% more gold by exporting. If you earn less, and print more, then it’s no longer possible for every pak in circulation to be backed by a microgram of gold. The pak would devalue against gold.
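
The arithmetic behind that devaluation is a single ratio. The sketch below starts from one microgram of gold per pak and shows what happens to the backing when money supply and gold reserve grow at different rates. The function name gold_backing_per_pak and the 2% figure are my own illustrative choices.

```python
def gold_backing_per_pak(money_growth, gold_growth, initial_backing=1.0):
    """Micrograms of gold behind each pak after one round of printing.

    money_growth: fractional increase in paks in circulation (0.05 = 5%)
    gold_growth:  fractional increase in the gold reserve
    """
    return initial_backing * (1.0 + gold_growth) / (1.0 + money_growth)


# Print 5% more paks and earn 5% more gold: backing is unchanged.
balanced = gold_backing_per_pak(0.05, 0.05)

# Print 5% more paks but earn only 2% more gold: each pak is backed by less.
short = gold_backing_per_pak(0.05, 0.02)
devaluation = 1.0 - short   # roughly 2.9% against gold
```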

According to P359 CFA Econ textbook, Each pak paper note or coin is officially and legally a claim on the “gold” (i.e. the foreign reserve of gold and hard currencies) of the pak issuer i.e. the central bank. No one else can print paper pak notes. Initially, a pak holder has the right to convert his pak into gold. Such paper money is called convertible paper money.

I think SGD is backed by a basket of hard currencies, the modern-day equivalent of gold. Every SGD ever “printed” is (not strictly) backed by some amount of “gold” earned from export. The Singapore government doesn’t print SGD to solve its own debt problems. It probably doesn’t need to, thanks to good revenue. Tax revenue is a major component but tax is relatively low in Singapore. I guess revenue also comes from (partially) state-owned companies[1], land sales and sovereign fund investment returns.

[1] I don’t totally agree, but many Singapore residents say that hospitals, residential parking lots, telephone, water, power, gas supplies are all run by state-owned but privatized companies that are profit driven. Some of the largest property developers and largest banks are partially state-owned too.

In the case of Fed reserve bank, the “gold” asset you can claim on consists of
1) real gold
2) basket of hard currencies
3) US gov bonds — biggest component

The Fed prints paper money according to the total quantity of these “gold” assets. If too much paper money is printed, then the dollar devalues against 1 and 2.

Bear in mind anyone can hold US gov bonds, but the Fed holds so much of them that they are the basis of most of the USD paper money in circulation. Suppose the total USD in circulation is 900 tn. When the US government issues 100 tn of bonds, the Fed’s reserve would increase by that amount and the Fed can print that much USD. However, is the additional quantity of USD ultimately backed by gold? No. I think that’s why USD would weaken against gold.

Once again, I feel the gold standard is a simplifying assumption. We just need to know when it stops being simplifying and becomes simplistic.

size of Object.java, and how does it add up@@

Typically, the per-object overhead is 8 bytes on a 32-bit machine and 12 bytes on a 64-bit machine, as shown on https://stackoverflow.com/questions/44468639/memory-alignment-of-java-classes, but sometimes rounded up to 16 bytes.

(Hypotheses and Personal opinion only. I’m no authority.)

Minimum info to be stored in an Object.java instance —

– (4-byte) wait-set as a hidden field — to hold the waiting threads. Expandable set, so perhaps a WS_pointer to a linked list
** It’s probably inefficient to try to “save” memory by building a static VM-wide lookup table of {instance-address, wait-set}. As this table grows, such hashtable lookup is going to impact the most time-critical operations in JVM. Not having this lookup means minimum overhead locating the wait-set.
** this particular WS_pointer starts out as null pointer
** Note c# value types don’t have a wait-set. c# reference types do. The sync-block takes 4 bytes
** why linked list? Well, Arrays can’t grow. Vector involves re-allocation.

– How about the data structure holding the set of threads blocked in lock() or synchronized keyword? A mutex “contender set” associated with each Object.java instance? Apparently there’s no such thing mentioned in popular literature. Is it possible to put these contenders in the wait-set but use a flag to distinguish?

– (4-byte) vptr as a hidden field — If you try to be clever and put this ptr as the first element of the wait-set, then every Object instance still must occupy 64bits == WS_pointer + 1st element in the wait set. Therefore it’s faster to store the vptr directly in the Object instance. Note the vtable also holds the runtime type info, just as in c++ RTTI. Vtable can even hold a pointer to the “class” object.

– ? pointer to the class object as a hidden (static) field — representing the Runtime type of the object.
** This can be stored in the per-class vtbl, as C++ does. This hidden field occupies 32 bits per Class, not per instance.
** I believe myObject.getClass() would use this pointer.
** See my blog post on type info stored inside instances (http://bigblog.tanbin.com/2012/02/type-info-stored-inside-instances-c-c.html).

– hashcode: once generated, the hashcode must not mutate, even when garbage collection relocates the object, so it cannot simply be recomputed from the current address. It stays fixed until object state changes.
** I feel this can live on the wait-set, since most Object instances don’t get a hashcode generated.
** Why not use another hidden field? Well, that would require 32 bits dedicated real estate per object, even if an object needs no hashcode in its lifetime.
** Note python hashcode never mutates. See separate blog post.

?? this-pointer as a hidden field — to hold “my” own address. Remember this.m(…) is compiled to m(this, …) to let m() access fields of the host object.
** However, I believe this-pointer is possibly added (as a hidden field) only when you subclass Object.java and introduce a field, and then a method referencing that field. There’s no need to add this-pointer to classes whose instance methods never access instance fields. Such an object doesn’t (need to) know its own address, like a lost child.
** even in those “other” cases, I feel it’s possible for the compiler to remove this-pointer as a field (reducing memory footprint) because compilers implicitly translate myInstance.m1() to m1(&myInstance)

In conclusion, I’d guess minimum size = 8 bytes but will grow when hashcode generated.

Q1a: min size of a c# struct or any value type instance?
%%A: C# structs need zero overhead, so an Int32 instance needs just 32 bits.
When we call myStruct.m2(), there’s no vptr involved — just a compile-time resolved function call as in C and non-OO languages. Compiler translates it to some_Free_Function(ptr_to_myStruct). Note even for this value-type there’s no pass-by-value — no passing at all. Just translation by compiler.

Q1b: min size of c# System.Object instance.
A: Supposed to be 8 bytes minimum (A value type instance has no such overhead)
* 4-byte vptr
* 4-byte sync block
A real instance seems to be 12 bytes long.

Q2: why is the minimum size of an empty c++ object no more than 1 byte?
%%A: obvious from the analysis above.

Q2b: why not 0 byte?
A: an object is defined as a storage location. Even if there’s no field in it, a=new MyObject() and b=new MyObject() (twice in a row, single-threaded program) must produce 2 objects at 2 locations.

Note the size of an empty c++ string class instance is 12 bytes, according to [[c++ primer]]

According to P89 [[C# in depth]] a C# byte takes 1 byte, but 8+1 bytes of heap usage when “boxed”. These 9 bytes are rounded up to 12 bytes (memory alignment). Since heap objects are nameless and accessed only via a pointer, the pointer becomes a field (of the boxed object) and takes 4 bytes (assuming a 32-bit machine)
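
The rounding in the boxed-byte example is plain alignment arithmetic. As a small sketch: with 4-byte alignment, 8+1 = 9 bytes becomes 12. The same formula turns the 12-byte header quoted at the top into 16 if the heap aligns on 8 bytes, which is my assumption about where that "rounded to 16" figure comes from. The helper name round_up is made up.

```python
def round_up(size, alignment):
    """Smallest multiple of alignment that is >= size."""
    return (size + alignment - 1) // alignment * alignment


boxed_byte = round_up(8 + 1, 4)     # 8-byte overhead + 1-byte payload, 4-byte aligned
header_64bit = round_up(12, 8)      # 12-byte header under assumed 8-byte alignment
already_aligned = round_up(8, 4)    # a size that is already a multiple is unchanged
```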

[13] mutable data types: python ilt java #dictKey

As illustrated in python object^variable, Type, Value, immutable, initialize.. and P29 [[python essential ref]], python simple int Age variable is implicitly a pointer to a reference-counted, copy-on-write, IMmutable pointee object. That begs the question ..

Q: so, how do 2 variables share a Mutable object?
A: Use instance methods like mutablePerson.setAge()???
A: Lists and dictionaries also offer versatile “shared mutable objects”. listofMutables[0] would return a reference to the first element.

I feel these 2 answers cover 80% of the use cases.

In summary,
– “scalar” or primitive type Variables point to immutable Objects. Example — Everyday strings and numbers
– most “composite” objects are mutable, such as dict, list and user-defined objects
– between the 2 well-understood categories, there exist some special data types
** tuples are composite but immutable
** method objects?
** class objects?
** modules?

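Incidentally, (in a bold departure from java/c#[2]) only hashable objects, which for built-in types means immutables (string, tuple, “myInt” variables), can be dictionary keys[1]. Lists and dictionaries are mutable and unhashable, therefore disqualified. User-defined objects are a special case: by default they hash by identity and so are accepted as keys, unless the class defines __eq__ without __hash__ (Python 3). A “fake” immutable tuple containing a list is also disqualified; just try it. In real projects, we only use strings and ints as keys.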

[1] the underlying Object must be immutable, even though the variables can re-bind.
[2] c# expects but does not require keys to be immutable
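
A short sketch to try the points above. It shows rebinding an int variable versus sharing a mutable object through two variables and through a list element, then probes which objects qualify as dictionary keys. Person and is_hashable are hypothetical names used only for illustration.

```python
class Person:
    """Hypothetical mutable user-defined class, for illustration only."""

    def __init__(self, age):
        self.age = age

    def set_age(self, age):
        self.age = age


def is_hashable(obj):
    """True if obj can be used as a dictionary key."""
    try:
        hash(obj)
        return True
    except TypeError:
        return False


# Two variables bound to one immutable int: rebinding one leaves the other alone.
age = 30
same_age = age
same_age += 1            # rebinds same_age to a different int object

# Two variables sharing one mutable object: a change via one is seen via the other.
alice = Person(30)
alias = alice
alias.set_age(31)

# A list element is yet another reference to the same object.
people = [alice]
people[0].set_age(32)

key_check = [
    is_hashable("abc"),        # string: fine
    is_hashable((1, 2)),       # tuple of immutables: fine
    is_hashable([1, 2]),       # list: rejected
    is_hashable({}),           # dict: rejected
    is_hashable((1, [2])),     # "fake" immutable tuple holding a list: rejected
    is_hashable(Person(1)),    # user-defined object: hashed by identity by default
]
```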


http://en.wikipedia.org/wiki/Multicast shows(suggests?) that broadcast is also time-efficient since sender only does one send. However, multicast is smarter and more bandwidth-efficient.

IPv6 disabled broadcast — to prevent disturbing all nodes in a network when only a few are interested in a particular service. Instead it relies on multicast addressing, a conceptually similar one-to-many routing methodology. However, multicasting limits the pool of receivers to those that join a specific multicast receiver group.

MSVS project – Don’t include unversioned files

If a file is versioned and included in VS project, normal.
If a file is versioned and Excluded in VS project, then no real risk even if the file causes build errors.
If a file is un-versioned and included in VS project, then build will complain explicitly.
If a file is un-versioned and Excluded in VS project, then do you need to back up the file? If you do then better version it.