c++ low-latency connectivity IV (nQuant) #2

This IV is heavy on low-level QQ in C/C++. Such obscure knowledge won’t help GTD and is not significant zbs. It may improve your design though.

Q3: Memory alignment – what if on the stack I declare 2 char variables? See post on memory alignment.

Q3b: what if I have 2 char fields in a struct?

Q3c: I have two 64-bit ints, one misaligned. When I use them what problems will I have?
A: not much performance penalty. See https://lemire.me/blog/2012/05/31/data-alignment-for-speed-myth-or-reality/
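
For Q3/Q3b, one low-effort way to check is to ask the compiler. A minimal sketch, assuming a mainstream ABI such as x86-64 Linux; the struct names are hypothetical and the sizes in the comments are typical, not guaranteed by the standard.

```cpp
#include <cstddef>

// Hypothetical structs, for illustration only.
struct TwoChars {
    char a;
    char b;   // char has alignment 1, so no padding is needed between a and b
};

struct CharThenInt {
    char c;   // typically followed by padding so that i lands on an int boundary
    int  i;
};

// Sizes and alignments as reported by this compiler.
constexpr std::size_t kTwoCharsSize     = sizeof(TwoChars);     // typically 2
constexpr std::size_t kTwoCharsAlign    = alignof(TwoChars);    // typically 1
constexpr std::size_t kCharThenIntSize  = sizeof(CharThenInt);  // typically 8
constexpr std::size_t kCharThenIntAlign = alignof(CharThenInt); // same as alignof(int)
```

The two chars pack back to back, while mixing a char with an int usually introduces padding.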

Q1: If inside a member function I call “delete this”, what happens?
%%A: what if this “this” points to an object embedded as a field of an umbrella object? The deallocation would happen, but the destruction of the umbrella object may again deallocate it? This is confirmed in the FAQ (https://isocpp.org/wiki/faq/freestore-mgmt#delete-this)
%%A: how do we know the host obj is on heap, stack or global area i.e. data section.

Q1b: To achieve heap-only, my class has private ctors and private op= and a static factory method. Will it work?
%%A: according to moreEffC++ P146, I would say yes, with certain caveats.
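
A minimal sketch of the Q1b idea, assuming C++11. The class name HeapOnly and the create() factory are hypothetical, and I use deleted copy operations where the question says "private op=". One caveat I am sure of: the class's own members and friends can still create non-heap instances.

```cpp
#include <memory>

// Hypothetical class; outside code can only obtain instances via create().
class HeapOnly {
public:
    // The factory is the single public entry point and always allocates on the heap.
    static std::unique_ptr<HeapOnly> create(int value) {
        return std::unique_ptr<HeapOnly>(new HeapOnly(value));
    }

    int value() const { return value_; }

    // No copying, so a heap instance cannot be copied into a stack variable.
    HeapOnly(const HeapOnly&) = delete;
    HeapOnly& operator=(const HeapOnly&) = delete;

private:
    explicit HeapOnly(int value) : value_(value) {}   // private ctor blocks "HeapOnly h(1);"
    int value_;
};
```

Outside code writing `HeapOnly h(1);` fails to compile, while `auto p = HeapOnly::create(1);` works.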

Q2: What’s reinterpret_cast vs dynamic_cast vs static_cast?

Q2b: What other casts are there?

Q: Placement new – can I use the regular “delete”?
%%A: probably no. Need to call the dtor manually? See P42 moreEffC++
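
A small sketch of the manual-dtor approach. Gadget and placement_new_demo are hypothetical names; the live_count counter exists only to show that the destructor really ran.

```cpp
#include <new>   // placement new

struct Gadget {                  // hypothetical type
    static int live_count;       // number of currently constructed Gadgets
    int id;
    explicit Gadget(int i) : id(i) { ++live_count; }
    ~Gadget() { --live_count; }
};
int Gadget::live_count = 0;

int placement_new_demo() {
    // Raw storage on the stack, suitably aligned for a Gadget.
    alignas(Gadget) unsigned char buffer[sizeof(Gadget)];

    // Construct inside the existing storage; nothing is allocated from the heap.
    Gadget* g = new (buffer) Gadget(7);
    int seen = g->id;

    // "delete g" would try to free memory that operator new never handed out.
    // Instead, run the destructor by hand and let the buffer go out of scope.
    g->~Gadget();
    return seen;
}
```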

Q: How does tcp handshake work? (I don’t know why this nlg is even relevant)
A: http://en.wikipedia.org/wiki/Transmission_Control_Protocol#Connection_establishment

Q: Some tcp parameter to speed it up?
A: larger TCP window size?
AA: TCP_NODELAY
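
TCP_NODELAY disables Nagle's algorithm, so small writes are not held back waiting to be coalesced. A minimal sketch, assuming a POSIX/Linux socket API; the helper name is hypothetical and a real client would go on to connect() instead of closing the socket.

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>   // TCP_NODELAY
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: creates a TCP socket, sets TCP_NODELAY, reads the option
// back, and closes the socket. Returns true if every step succeeded.
bool tcp_nodelay_demo() {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return false;

    int on = 1;
    bool ok = setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)) == 0;

    int readback = 0;
    socklen_t len = sizeof(readback);
    ok = ok && getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &readback, &len) == 0
            && readback != 0;

    close(fd);
    return ok;
}
```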

Q: tcp client to specify a non-random port? See post on bind()

Q: If a c++ app runs fine in debug build (compiler optimizations removed), but crashes in release mode, what guesses/clues do you have?
%%A: conditional compilation, like in my c# project
%%A: the compiler optimization leads to unusual execution speed between 2 threads, and cooks up a rare corner case
%%A: I have seen assertions turned on in debug build (a debug STL can also be used), so we know one data file F1 is unusable, and another data file F2 is usable. In release build, someone else tries F1 and it crashes somewhere else.

See c++debug build can modify app behavior!

sync primitives]pthreads^winThreads

(see also post on “top 2 threading constructs in java ^ c#”)

[[Art of concurrency]] P89 has a 3-pager on pthreads vs win threads, the 2 dominant thread libraries. It claims “… most of the functionality in one model can be found in the other.”

In pthreads (like java), the 2 “most common” controls are mutex and condition var

In win threads library (and later dotnet), the 2 essential controls are

  1. event WaitHandles – kernel objects. Also known as event mutex, these are comparable to condVar, according to P175 [[object-oriented multithreading using c++]]
  2. locks
    • mutex — kernel object, cross process
    • CRITICAL_SECTION — userland object, single-process

nQuant c++IV #1 #monitoring

Overall, I remember nQuant IV was rather low-level and heavy on optimization. Mostly QQ.

Q6: how could you implement a userland thread?
%%A: setjmp and longjmp, as in early JVM

Q6b: how? I feel it’s better to be modest and offer my tentative guesses.

Q6c: how is setjmp different from goto? Seldom asked!
AA: http://ecomputernotes.com/what-is-c/function-a-pointer/what-is-the-difference-between-goto-and-longjmp-and-setjmp and http://www.geekinterview.com/question_details/3330
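
My own tentative illustration of Q6c: goto can only jump within one function, whereas longjmp unwinds across several stack frames back to the setjmp point. The names below are hypothetical. In C++ this is only safe when no object with a non-trivial destructor is skipped by the jump.

```cpp
#include <csetjmp>

static std::jmp_buf g_env;   // hypothetical global holding the saved context

// Recurse a few frames deep, then jump straight back to the setjmp point.
void dive(int depth) {
    if (depth == 0) {
        std::longjmp(g_env, 42);   // never returns to the caller
    }
    dive(depth - 1);
}

int setjmp_demo() {
    switch (setjmp(g_env)) {
        case 0:          // first pass: context saved, now go deep
            dive(3);
            return -1;   // not reached, because dive() ends in longjmp
        case 42:         // second pass: we arrived here via longjmp
            return 42;
        default:
            return -2;
    }
}
```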

Q: by default, is a linux mutex cross-process?
%%A: I guess the mutex in pthreads by default isn’t. Outside linux, the windows mutex is cross-process.
AA: Beyond the default, linux mutex can be cross-process, by using shared memory — http://stackoverflow.com/questions/9389730/is-it-possible-to-use-mutex-in-multiprocessing-case-on-linux-unix

Q: in linux, what’s the difference between a process and a thread?

Q: start 5 copies of the current process, using fork? (Can use the example in [[head first c]] )
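
A minimal sketch for the fork question, assuming Linux/POSIX. The function name is hypothetical, and the child exits right away where a real program would do its work.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: forks `copies` children and waits for all of them.
// Returns how many children exited normally.
int spawn_copies(int copies) {
    int started = 0;
    for (int i = 0; i < copies; ++i) {
        pid_t pid = fork();
        if (pid == 0) {
            // Child. Exit here so the child does not continue the loop
            // and fork grandchildren of its own.
            _exit(0);
        }
        if (pid > 0) ++started;   // parent: this fork succeeded
    }

    int reaped = 0;
    for (int i = 0; i < started; ++i) {
        int status = 0;
        if (wait(&status) > 0 && WIFEXITED(status)) ++reaped;
    }
    return reaped;
}
```

The classic trap is forgetting the early exit in the child, which turns 5 copies into 31.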

Q: If a variable in (2-process) shared memory is marked volatile, is it a reasonable usage of volatile keyword?
AA: now i think so. Similar to a variable writable by a temperature sensor. http://embeddedgurus.com/barr-code/2012/01/combining-cs-volatile-and-const-keywords/ shows use of “const volatile” on a variable in shared memory.
AA: [[moving from c to c++]] section on Volatile (P75) seems to agree

Q: what’s a limitation of the select() system call when there are too many sockets to check, and each message (~ 2KB) is important?
A: select() has a max socket count, since an fd_set holds at most FD_SETSIZE descriptors (commonly 1024). Use epoll instead. See http://stackoverflow.com/questions/5357445/select-max-sockets

Q: what linux command to monitor memory usage by a given process, showing size of heap, stack, text, even broken down to shared lib vs static lib
A: cat /proc/{pid}/smaps
A: pmap -x {pid}

Q: given a one-line c program “while(1);”, launch it and /usr/bin/top would show 100% cpu usage. What does it mean?

Q: what’s inside a socket? My socket book has detailed diagrams.
%%A: http://bigblog.tanbin.com/2011/06/5-parts-in-socket-object.html

Q: can a socket be shared by 2 processes?
AA: Yes. See https://bintanvictor.wordpress.com/2017/04/29/socket-shared-between-2-processes/

———I feel most of the questions below are rarely asked.

Q: how many sockets can a single process open? Not too sure. A few hundred?
A: http://stackoverflow.com/questions/410616/increasing-the-maximum-number-of-tcp-ip-connections-in-linux

Q: what linux command to monitor network performance?
%%A: Besides netstat, I have seen tools that report error rates that indicate saturation
A: ss

 

Q: how to remove the word “option” from a resume, unless it is a sub-word? Use perl or python
%%A: Word boundary symbol?
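
The question asks for perl or python, but the same word-boundary idea can be sketched with C++ std::regex (ECMAScript grammar, which supports \b). The function name is hypothetical and the match is case-sensitive.

```cpp
#include <regex>
#include <string>

// Removes "option" only when it stands as a whole word.
// "optional" and "options" are left alone because \b does not match inside them.
std::string remove_whole_word_option(const std::string& text) {
    static const std::regex whole_word(R"(\boption\b)");
    return std::regex_replace(text, whole_word, "");
}
```

Surrounding spaces are left in place, so a real resume clean-up would probably squeeze the double spaces afterwards.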

Q: when a user land thread makes a syscall, what’s the implication?
%%A: the thread enters kernel mode?

Q: what’s offered on Layer 2? Can IP simply operate on top of physical layer without Layer 2? Too deep and seldom asked.
A: ethernet is a L2 technology.
A: Each device on a network has a hardware address or MAC address, used by the data link layer

Libor, Eurodollar, OIS, Fed Fund rate … common features

deposit — All are based on the simple instrument of “deposit” — $1 deposited today grows to $1.00x

unsecured — when I deposit my $1 with you, you may go down with my money. Credit risk is low but non-zero.

inter-bank — the deposit (or the lending) is between banks. The lending rate is typically higher when lending to non-banks.

short-term — overnight, 3M etc, up to 12M.

arbitrage involving a convex/concave contract

(I doubt this knowledge has any value outside the exams.) Suppose a derivative contract is written on S(T), the terminal price of a stock. Assume a bank account with 0 interest rate either for deposit or loan. At time 0, the contract can be overpriced or under-priced, each creating a real arbitrage.

Basic realities (not assumptions): stock price at any time is non-negative.

— If the contract is concave, like L = log S, then a stock (+ bank account) can super-replicate the contract. (Can't subreplicate). The stock's range-of-possibilities graph is a straight tangent line touching the concave curve from above at a S(T) value equal to S(0) which is typically $1 or $10. The super-replication portfolio should have time-0 price higher than the contract, otherwise arbitrage by selling the contract.

How about C:=(100 - S^2) and S(0) = $10 and C(0) = 0? Let's try {-20S, -C, +$200} so V(t=0) = $0 and V(t=T) = S^2 - 20 S + 100. At Termination,

If S=10, V = 0 ←global minimum

If S=0, V= 100

If S=11, V= 1

How about C:=sqrt(S)? S(0) = $1 and C(0) = $1? Let's try {S, +$1, -2C}. V(t=0) = 0. V(t=T) = S + 1 – 2 sqrt(S). At termination,

If S=0, V = 1

If S=1, V= 0 ←global minimum

If S=4, V= 1

If S=9, V= 4
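
Both terminal values above are perfect squares, which is why they never go negative. Written out (S_T is my notation for S at termination):

```latex
% Concave contract C = 100 - S^2, portfolio {-20S, -C, +$200}
V(T) = -20\,S_T - (100 - S_T^2) + 200 = S_T^2 - 20\,S_T + 100 = (S_T - 10)^2 \ge 0

% Concave contract C = \sqrt{S}, portfolio {+S, +$1, -2C}
V(T) = S_T + 1 - 2\sqrt{S_T} = (\sqrt{S_T} - 1)^2 \ge 0
```

Each portfolio costs $0 at time 0 and pays a non-negative amount at T, with zero payout only at the tangent point, so each is an arbitrage.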

— If the contract is convex, like exp(S), 2^S, S^2 or 1/S, then a stock position (+ bank account) can sub-replicate the contract. (Can't super-replicate). The replication range-of-possibilities graph is a straight tangent line touching the convex from below. This sub-rep should have a time-0 price below the contract, otherwise arbitrage by buying the contract and selling the replication.

string,debugging+other tips:[[moving from c to c++]]

[[moving from c to c++]] is fairly practical. Not full of good-on-paper “best practice” advice.

P132 don’t (and why) put “using” in header files
P133 nested struct
P129 varargs suppressing arg checking
P162 a practical custom Stack class non-template
P167 just when we could hit “missing default ctor” error. It’s a bit complicated.

–P102 offers practical tips on c++ debugging

* macro DEBUG flag can be set in #define and also … on the compiler command line
* frequently people (me included) don’t want to recompile a large codebase just to add DEBUG flag. This book shows simple techniques to turn on/off run-time debug flags
* perl dumper receives a variable $abc and dumps the value of $abc and also ….. the VARIABLE NAME “abc”. C has a similar feature via the preprocessor stringize operator “#”
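
A small sketch of the stringize trick. DUMP_VAR and dump_named are hypothetical names of my own, not taken from the book.

```cpp
#include <sstream>
#include <string>

// #var turns the macro argument's source text into a string literal,
// so the helper receives both the name and the value.
#define DUMP_VAR(var) dump_named(#var, (var))

template <typename T>
std::string dump_named(const char* name, const T& value) {
    std::ostringstream out;
    out << name << " = " << value;
    return out.str();
}
```

With `int abc = 42;`, `DUMP_VAR(abc)` yields the string `abc = 42`.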

— chapter on the standard string class — practical, good for coding IV

* ways to initialize

* substring

* append

* insert

[[linux programmer’s toolbox]]

MALLOC_CHECK_ is a glibc env var
–debugger on optimized code

P558 Sometimes without compiler optimization performance is unacceptable.

To prevent optimizer removing your variables, mark them volatile.

An inline function may not appear in call stack. Consider “-fno-inline”

–P569 double-free may not show issues until the next free() or malloc()

–P470 – 472 sar command
can show per-process performance data

can monitor network devices

—P515 printf + macros for debugging

buffering behavior differs between terminal ^ log files

c# coding drill for IV

These questions will seldom hit the phone round…

Q: Implement IEnumerable<string> foo(IEnumerable<string> aa, IEnumerable<string> b) such that foo returns the strings found in either aa or b but not both.

To keep things simple, let’s assume aa holds unique elements, and b too.
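
The question is posed in C#, but the logic is language-neutral. Here is a sketch in C++ with hash sets, assuming (as stated) that each input holds unique elements; the function name is hypothetical.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Returns the strings that appear in exactly one of the two inputs:
// first those only in aa (in aa's order), then those only in b (in b's order).
std::vector<std::string> in_one_but_not_both(const std::vector<std::string>& aa,
                                             const std::vector<std::string>& b) {
    const std::unordered_set<std::string> in_aa(aa.begin(), aa.end());
    const std::unordered_set<std::string> in_b(b.begin(), b.end());

    std::vector<std::string> result;
    for (const auto& s : aa) {
        if (in_b.count(s) == 0) result.push_back(s);
    }
    for (const auto& s : b) {
        if (in_aa.count(s) == 0) result.push_back(s);
    }
    return result;
}
```

Expected cost is linear in the combined input size, versus quadratic for the naive nested loop.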

Q: regex engine supporting quantifier “*”, and the wild-card “.”
Q: IteratorTest from MS
Q: circular buffer
Q: reverse a linked list in-place
Q: rotate a long array by 3
Q: spreadsheet concretization

(610610 blog has a letter to XR, which shows a list of coding practice projects I completed.)

familiarity with localSys(+mainstream tools) is KPI for GTD

(I have been learning this lesson since 2007, but Not before.)

We techies, including the coolest techies, can get stressed out by technical issues, by deadlines, by users and managers. These really are the same kind of issue: the damn thing doesn’t work. This not-working issue constitutes a major stressor, though not the only stressor. When things are seriously bad, our basic competency comes into question, meaning managers question if we are up to the job. When this stressor gets bad,

we feel we are treated differently from other team members
we lose manager’s trust, and we get micro-managed,
we are required to justify our (big or small) designs
we are forced to abandon what we wrote (99% working) and adopt another design imposed on us
we are asked to explain each delay
we are forced to work late
we voluntarily decide to cancel our vacation
we worry about our job
we have no time/energy to hit the gym
we lose sleep and lose appetite

On the other hand, when we are clearly capable of handling this type of issue, we feel invincible and on top of the challenges. We can GetThingsDone — see blog post on GTD.

To effectively reduce this stressor, we really need to get comfortable with the “local system”. Let’s assume a typical financial system uses java/c#/c++, SQL, sproc, scripts, svn, autosys … In addition, there are libraries (spring, gemfire, hadoop etc) and tools like IDE, Database tools, debuggers… riding on these BASE technologies.

We are expected/required to know all of these tools [1] pretty well. If however we are slow/unfamiliar with one of these tools, we get blamed for underperforming. We are expected to catch up with our colleagues. Therefore each tool can become a stressor.

[1] see also http://bigblog.tanbin.com/2012/11/2-kinds-of-essential-developer-tools-on.html

Therefore a basic survival skill is familiarity with all of these tools + the local system. If I’m familiar with all the common issues [2] in my local system then I can estimate effort, and I can tell my boss/users how long it takes. Basically, I’m on top of the tech challenge.

[2] If some part of java (say the socket layer or the concurrent iterators) never gives me problems, then I really don’t need that familiarity.

Q: how about non-mainstream tools like spring-integration, jmock? Just like local system knowledge. Investing your time learning these is not highly strategic.

When I change job, again there’s a body of essential know-how I need in order to / fend off / the stressors. Part of this know-how I already have – the portable tech knowledge. Frequently, my new team would use some unfamiliar java feature. More seriously, the local system knowledge is often the bulk of the learning load. If I were in a greenfield development phase I would write part of the local system, and I would have a huge advantage.

A major class of tools are poorly understood with insufficient proven solutions
– About half of Windows tools.
** OC GMDS systems kept crashing for no reason.
** When I got started with perl/php/javascript/mysql vs VBA/dos FTP, I soon noticed the difference in quality.

top 2 threading constructs in java ^ win32

Update: Now I think condition is not a fundamental construct in c#. The wait handle is. WH are based on primitive kernel objects…. See other posts
—–

(Soundbyte: I feel the know-how about low level constructs is more valuable/versatile/powerful. Interviewers often recognize that. These constructs include CAS and conditions.)

See other posts on the additional threading constructs added by dotnet …
See also my post on NSPR, a cross-platform library with heavy concurrency emphasis.

Most important, practical and popular [1] constructs —
* pthreads       — 1) locks, 2) conditions
* java              — 1) locks, 2) conditions. On a “higher” layer, timers; thread pools; queues and other thread-safe collections
* win32/dotnet — 1) locks, 2) event WaitHandle (not conditions)… Also timers, thread pools. Task is supposed to be popular too.
* the dbx debugger offers “mutex” and “condition” commands as the only 2 thread-related features (beside the “thread” command)

In Win32, there are 2 lock constructs
1a) mutex : usable cross-process , like other kernel objects
1b) CRITICAL_SECTION : single-process only, like other userland objects

In general, locks alone are sufficient for simple thread coordination. Sometimes you need fancier tools —
– In java, you can use wait/notify, which is another primitive.
– In C#, wait/notify seems less popular. WaitHandle seems to be popular. Wait handles are designed (by MS) to be a more complex but feature-rich notification /construct/ than conditions. See http://msdn.microsoft.com/en-us/library/ms228964.aspx#signaling. However, experts agree conditions are more powerful. The IAsyncResult uses wait handles for inter-thread signaling.

Interlocked/atomicVar seems to be equally popular in java and c#.

I think, except for CAS, all other threading constructs rely on locks and conditions. As [[c# threading]] points out, you can simulate most features of dotnet Wait Handles by simple Condition techniques. However, wait handles (but not conditions) support IPC.

[1] some techniques are important and practical but poorly marketed and unpopular — wait/notify, immutable, interrupts,..

producer/consumer threading pattern #YH

YH,

You and I are big fans of the producer/consumer threading pattern. Now I realize there are drawbacks.

There is synchronization overhead since the task queue is shared mutable data. The more fine-grained the tasks, the more frequently threads add to or remove from the task queue, and the higher the overhead.

I feel P/C is a high-level “general purpose” threading pattern, good for most everyday concurrency, but not for those extreme requirements. In a lot of high-end gigs, they look right past the standard techniques like P/C and look for specialized techniques/experience that is needed for higher performance. I think they look for lockfree, and they look for parallelism without inter-thread interference.

For example, if the (market) data floods in thick and fast, I fear P/C may suffer. Perhaps a naive design is for the “dispatcher” or “boss” thread to accumulate a chunk of data and package it into a task and add it to the queue. Such a design may make the dispatcher a hotspot or bottleneck.

The lower we descend, the less overhead is tolerated, and the less synchronization code I see. I feel the low-level coders avoid synchronization like the plague.

What’s your view?

bond investment is safe, in the long run (Hendricks)

If you buy a bond at $98.5 (1.5% discount from face value), and hold it till maturity, then there’s no uncertainty in how much you will get at the end.

I have heard a lot of “common sense” wisdom that a bond’s price rises and falls as interest rates move, and that bonds are therefore volatile and risky.

(I usually assume default risk is very low for the bonds I consider, at least much lower than the sensitivity to yield.)

However, if indeed a bond loses value due to rate hike, then bond holders always have the “safe” option to hold it till maturity. Its price will eventually rise and end up exactly $100.  Therefore there’s absolutely no uncertainty about the terminal value like there is about options, stocks, or futures contracts.

This is one of the most fundamental features of bond as an asset class. I don’t know another asset having this feature.

dynamic delta hedge – continual rebalancing

To dynamically hedge a long call position, we need to short dC/dS shares. That number is equal to the delta of the call. In practice, given an option position, people adjust its delta hedge on a daily basis, but in theory, rebalancing is a continual, non-stop process.
The call delta BS-formula makes it clear that “t” alone will change the delta even if everything else is held constant. Specifically, even if S is not changing and interest rate is zero, the option delta won’t stay constant as maturity approaches, so we need to adjust the number of shares in the hedge.
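
To see the time effect in numbers, here is a sketch of the Black-Scholes call delta N(d1) for a non-dividend stock. The function names and the sample inputs (S=100, K=90, sigma=20%, r=0) are my own assumptions.

```cpp
#include <cmath>

// Standard normal CDF via the complementary error function.
double norm_cdf(double x) {
    return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

// Black-Scholes delta of a European call on a non-dividend stock.
// tau = time to maturity in years (must be > 0), sigma = annualized volatility.
double bs_call_delta(double S, double K, double r, double sigma, double tau) {
    const double d1 = (std::log(S / K) + (r + 0.5 * sigma * sigma) * tau)
                      / (sigma * std::sqrt(tau));
    return norm_cdf(d1);
}
```

With S=100, K=90, r=0 and sigma=0.2, the delta is roughly 0.73 with one year to go and noticeably higher with 0.1 year to go, even though S has not moved. So the hedge has to be rebalanced purely because of the passage of time.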

martingale – phrasebook

Like the volatility concept, mg is a fundamental concept but not a simple concept. It's like an elephant for the 5 blind men. It has many aspects.
process? a martingale is a process. At any time the process has a value.

MG property? A security could have many features and one of them could be the mg property meaning the security's fair value is a process and it meets the mg definition and is a mg process.

0-expectation? Expn(M_tomorrow – M_now) = 0

no-drift? A variable or a price (that qualifies as a process) with no drift is a mg.

replication – 1-step binomial model, evolving to real option

Now I feel the binomial tree model is a classic analytical tool for option pricing….
The 1-step binomial scenario is simple but non-trivial. Can be mind-bending. Usually we are given 5 numbers for Sd, S0, Su, Cd, Cu, and the problem is phrased like “Use some number of stock and bond to replicate the contract C” to get the same “payout outcomes” [1].

First, ignore the Cd, Cu values. The Sd, S0, Su 3 numbers alone imply RN probabilities of the up-move and down-move.

Next, using the RNP values we can replicate ANY contract including the given C contract.

The number of shares in the replication is actually the delta-hedge of the call.

[1] “Payout outcomes” mean the contract pays Cd dollars in the down-state and Cu dollars in the up-state.
—— That’s the first knowledge pearl to internalize…————

* 1-step binomial call option 
– this option contract can be replicated with stocks + bonds. Rebalancing not necessary.
– RNP/MG is an alternative to replication
* 1-step 3-state call option
– can’t replicate with stocks + ….
– RNP non-unique
(That’s assuming the 3 outcomes don’t accidentally line up.)
* 2-step 3-state call option, i.e. allowing rebalancing
– can replicate with stocks + bonds but needs rebalancing (self-financed, of course)
– RNP/MG is an alternative to replication

* fine-grained call options — infinite steps, many states
– can replicate (terminal) payout with stocks + bonds, but needs dynamic delta-hedge (self-financed of course)
– required number of stocks = delta

differential ^ integral in Ito’s formula

See posts on Ito being the most precise possible prediction.

Given the dynamics of S, dS = mu dt + sigma dW, and given a process f that is a function f() of S, Ito’s rule says

    df = df/dS * dS + 1/2 d(df/dS)/dS * (dS)^2

There are really 2 different meanings to d____

– The df/dS term is ordinary differentiation wrt S, treating S as just an ordinary variable in ordinary calculus.
– The dt term, if present, isn’t a differential. All the d__ appearing outside a division (like d_/d__) actually indicates an implicit integral.
** Specifically, The dS term (another integral term) contains a dW component. So this is even more “unusual” and “different” from the ordinary calculus view point.
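
A worked version, plugging the given dynamics into Ito's rule with the usual conventions (dW)^2 = dt, dt·dW = 0 and (dt)^2 = 0. I assume constant mu and sigma, and an f that depends on S only.

```latex
(dS)^2 = (\mu\,dt + \sigma\,dW)^2 = \sigma^2\,dt

df = f'(S)\,dS + \tfrac{1}{2} f''(S)\,(dS)^2
   = \left( \mu f'(S) + \tfrac{1}{2}\sigma^2 f''(S) \right) dt + \sigma f'(S)\,dW

% Integral form, which the differential notation is shorthand for:
f(S_t) - f(S_0) = \int_0^t \left( \mu f'(S_u) + \tfrac{1}{2}\sigma^2 f''(S_u) \right) du
                + \int_0^t \sigma f'(S_u)\,dW_u
```

Here f' and f'' are ordinary derivatives wrt S, while every dt and dW stands for an integral (the dW one being a stochastic integral).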

signal-noise ^ predictive formula – GBM

The future price of a bond is predictable. We use a prediction formula like bond_price(t) = ….

The future price of a stock, assumed GBM, can be described by a signal-noise formula

S(t) =

This is not a prediction formula. Instead, this expression says the level of S at time t is predicted to be a non-random value plus a random variable (i.e. a N@T)

In other words, S at time t is a noise superimposed on a signal. I would call it a signal-noise formula or SN formula.

How about the expectation of this random variable S? The expectation formula is a prediction formula.

LogR, LRR, RF – usages

LogR = log return

RF = return factor

RR = Level return rate

VaR assuming norm distribution? probably RF or RR
VaR assuming Lognormal distribution of return factor? use LogR

linear factor model like capm and Fama-French and principal component? RF or RR

Mean-variance? RR

principal component? RR?

MV optimization with risk-free rate – sum(weight vector) == 1 @@

In a MV optimization context without a risk free asset, the weight vector must sum to 1. Any leftover funds have nowhere to go, not even a bank account.

In a MV world with a risk-free rate (our world), the weight vector doesn’t need to sum to 1. Any difference is an allocation to the risk-free asset.

Then we work out a tangency portfolio, whose weight vector is scaled (?) to sum to 1. I feel this is just for convenience. If the tangency weight vector sums to 25, we still can construct all MV portfolios on the MV frontier by varying the “allocation to tangency” from 0 to 0.04 and beyond.

0 means all invested in risk-free asset.

0.04 means all invested in tangency.

0.05 means short risk-free  to get 25% more cash and invest all 125% into tangency portfolio.

c++IV data analyst(WorldQuant

struct base{
char * buffer;
base(){ buffer = new char[1000]; }
..
}; //how do we improve this class?

What if new throws an exception?

Q: if you make buffer a smart array then what do you do with the big3?
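
One possible answer, as a sketch only. I use std::unique_ptr<char[]> as a standard-library stand-in for a "smart array" (the interviewer may well have meant boost scoped_array); SafeBase and kSize are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>

struct SafeBase {
    static constexpr std::size_t kSize = 1000;
    std::unique_ptr<char[]> buffer;

    // If new[] throws std::bad_alloc, the ctor exits before anything is owned,
    // so there is nothing to leak.
    SafeBase() : buffer(new char[kSize]()) {}

    // Big 3: no dtor is needed, since unique_ptr<char[]> calls delete[] for us.
    // Copying is disabled by default because unique_ptr is move-only.
    // If a deep copy is wanted, it has to be spelled out:
    SafeBase(const SafeBase& other) : buffer(new char[kSize]) {
        std::copy(other.buffer.get(), other.buffer.get() + kSize, buffer.get());
    }
    SafeBase& operator=(const SafeBase& other) {
        if (this != &other) {
            std::copy(other.buffer.get(), other.buffer.get() + kSize, buffer.get());
        }
        return *this;
    }
};
```

If copying makes no sense for the class, simply omit the two copy operations and let the compiler reject copies.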

Q: diff between scoped_array vs shared_array?

Q: what if I have a derived class without virtual dtor?

base * a = new derived;
delete a;
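
Without a virtual dtor in the base class, the delete above is undefined behavior; in practice the derived dtor is typically skipped. A sketch of the fixed version, using hypothetical class names and counters that exist only to show which dtors run:

```cpp
struct BaseV {
    static int dtor_calls;
    virtual ~BaseV() { ++dtor_calls; }   // virtual: delete via BaseV* is now well-defined
};

struct DerivedV : BaseV {
    static int dtor_calls;
    ~DerivedV() override { ++dtor_calls; }
};

int BaseV::dtor_calls = 0;
int DerivedV::dtor_calls = 0;

void delete_through_base() {
    BaseV* a = new DerivedV;
    delete a;   // runs ~DerivedV first, then ~BaseV
}
```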

Q: What if I make derived a virtual subclass and have a derivedDerived class?

c++buy-side data support IV (wq) – phone round

Q: inline function vs macro. Not on my Tier 1/2
%%A: macro is frowned upon. Inline is a hint to compiler

Q: reverse a single linked list in-place. How many temp variables needed and how?
A: 3. I wrote this, in this blog
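
A sketch of the 3-pointer reversal (prev, curr, next). The Node struct is a hypothetical minimal list node.

```cpp
struct Node {
    int value;
    Node* next;
};

// Reverses the list in place and returns the new head.
Node* reverse_in_place(Node* head) {
    Node* prev = nullptr;
    Node* curr = head;
    while (curr != nullptr) {
        Node* next = curr->next;   // remember the rest of the list
        curr->next = prev;         // flip this node's link
        prev = curr;               // advance prev
        curr = next;               // advance curr
    }
    return prev;                   // prev ends up on the old tail
}
```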

Q: volatile keyword in c++
A: used to be hardware-related. Also used for threading? Not according to a stackoverflow post

Q: throw exception in constructors? Best practices?
A: memory cleanup needed … JVM/CLR handles it!

Q: given an unsigned integer, how do you test if it’s a power of 2?
A: for a power of 2, x-1 has all the lower bits set (all-1), so x & (x-1) is 0. Zero needs a separate check.
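
The bit trick as a one-liner (the function name is hypothetical):

```cpp
// A power of 2 has exactly one bit set. Subtracting 1 clears that bit and sets
// all lower bits, so the AND is zero. Zero itself is excluded explicitly.
bool is_power_of_two(unsigned int x) {
    return x != 0 && (x & (x - 1)) == 0;
}
```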

template expansion/instantiation c++^c#[[pro .Net perf]]

— Based on P154 [[pro .net performance]] —

A regular c++ class Dog has 2 forms i.e. source file and the compiled binary file (usually as a library).

In contrast, a c++ class template like a simple BigList<T> has 1 form only. There’s no such thing as a compiled form of a class template (unlike java/c#). The BigList<T> source is used only when we compile a concretized class based on the template, such as BigList<string>. If we include BigList<T> source in a project but never use it, then the compiler parses it but generates no code for it.

If BigList<T> defines a method dance() that’s never called on BigList<string>, then this method is never instantiated. In contrast, an invoked method (such as add()) is “expanded” with the type-argument “string”, to become a concretized method, and then compiled. I feel the expansion mechanism is similar to macro expansion.

What if dance() contains something unsupported by string? Well, rest assured: it is not compiled.

In C#, BigList<string> is not an “expanded” or concretized version of BigList<T>.
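
A sketch of this lazy instantiation. BigList here is my own toy version, not the book's code, and dance() deliberately calls a member that std::string does not have.

```cpp
#include <cstddef>
#include <string>
#include <vector>

template <typename T>
class BigList {
public:
    void add(const T& item) { items_.push_back(item); }
    std::size_t size() const { return items_.size(); }

    // Only valid for a T that has a dance() member; std::string has none.
    // The body is instantiated only if some code actually calls it.
    void dance() {
        for (auto& item : items_) item.dance();
    }

private:
    std::vector<T> items_;
};

std::size_t biglist_demo() {
    BigList<std::string> names;   // instantiates the class, but not dance()
    names.add("fido");
    names.add("rex");
    return names.size();
}
```

This compiles fine. Adding a call to `names.dance()`, or an explicit instantiation `template class BigList<std::string>;`, would turn it into a compile error.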

high return, high sharpe, high beta

Initially we want high return, or equivalently, high excess return.

naivety 1: we are ignoring the variance of the return.

-} so now we want high sharpe ratio

naivety 2: we didn’t know that all the expected return over the next year will be mostly driven by the market return.

-} so now we use beta to guide our selection.

* We might want a high beta, magnifying market return

** small stocks tend to exhibit high beta

* We might want a low beta, resistant to market up and down

* We might want 0 beta like a time deposit or something uncorrelated with the market

eq-fwd contract pricing – internalize

Even if not actively traded, the equity forward contract is fundamental to arbitrage pricing, risk-neutral pricing, and derivative pricing. We need to get very familiar with the math, which is not complicated but many people aren’t proficient.

At every turn on my option pricing learning journey, we encounter our friend the fwd contract. Its many simple properties are not always intuitive. (See P 110 [[Hull]])

* a fwd contract (like a call contract) has a contractual strike and a contractual maturity date. Upon maturity, the contract’s value is frozen and stops “floating”. The PnL gets realized and the 2 counter-parties settle.
* a fwd contract’s terminal value is stipulated (ST – K), positive or negative. This is a function of ST, i.e. terminal value of underlier. There’s even a “range of possibilities” graph, in the same spirit of the call/put’s hockey sticks.
* (like a call contract) an existing fwd contract’s pre-maturity MTM value reacts to 1) passage of time and 2) current underlier price. This is another curve but the horizontal axis is current underlier price not terminal underlier price. I call it a “now-if” graph, not a  “range of possibilities” graph. The curve depicts

    pre-maturity contract price denoted F(St, t) = St - K exp(-r(T-t)) ……… [1]
    pre-maturity contract price denoted F(St, t) = St exp(-q(T-t)) - K exp(-r(T-t)) .. [1b] continuous div

This formula [1b] is not some theorem but a direct result of the simplest replication. Major Assumption — a constant IR r.

Removing the assumption, we get a more general formula
              F(St, t) = St exp(-q(T-t)) – K Zt
where Zt is today’s price of a $1 notional zero-bond with maturity T.

Now I feel replication is at the heart of everything fwd. You could try but won’t get comfortable with the many essential results [2] unless you internalize the replication.

[2] PCP, fwd price, Black model, BS formula …

Notice [1] is a function of 2 independent variables (cf call).  When (T – now) becomes 0, this formula degenerates to (ST – K). In other words, as we approach maturity, the now-if graph morphs into the “range of possibilities” graph.

The now-if graph is a straight line at 45-degrees, crossing the x-axis at    K*exp(-r  (T-t)  )

Since Ft is a multivariate function of t and St , this thing has delta, theta —

delta = 1.0, just like the stock itself
theta = – r K exp(-r  (T-t)  ) …… negative!

To internalize [1b], assume exp(-q(T-t)) = 0.98. Recall that a “bundle” of 0.98 shares now (at time t) continuously generates dividends that convert to additional shares, so the 0.98 shares grow exponentially to 1.0 share at T. So the bundle’s value grows from 0.98St to ST , while the bond holding grows from K*Zt to K. Long the bundle and short the bond, and we replicate the fwd contract.
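
Formula [1b] and the theta of [1] in code, as a sketch under the constant-rate assumption. tau stands for T-t in years, and the function names are hypothetical.

```cpp
#include <cmath>

// Pre-maturity value of a long fwd contract struck at K, per [1b].
// With q = 0 this reduces to [1].
double fwd_value(double S, double K, double r, double q, double tau) {
    return S * std::exp(-q * tau) - K * std::exp(-r * tau);
}

// Theta of [1] (no dividend): dF/dt = -r K exp(-r tau), negative when r > 0.
double fwd_theta_no_div(double K, double r, double tau) {
    return -r * K * std::exp(-r * tau);
}
```

At tau = 0 the value collapses to S - K, which is the terminal payout (ST – K). For example, with S=102, K=100 and tau=0 the value is exactly 2.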

 —————Ft / St is usually (above or below) close to 0 when K is close to S.  For example if K = $100 and stock is trading $102, then the fwd contract would be cheap with a positive (or negative) value.
** most fwd contracts are constructed with very low initial value.
* note the exp() is applied on the K. When is it applied on the S? [1]
* compare 2 fwd contracts of different strikes?
* fwd contract’s value has delta = 1

[1] A few cases. ATMF options are struck at the fwd price.