initialize matrix with invalid value, not 0 #DP

In simple bottom-up dynamic programming algos, we often need to build a matrix of previous results to derive later results.

In contrast, harder DP problems may need other tools, but often in addition to, not in place of, a matrix.

Q: (often neglected question): what initial value to put into matrix?

  • In some problems, the top-right triangular area is not used. So the dump() had better use some obviously invalid values to highlight the used cells. 999999 is often a reasonable choice, but then dump() would need alignment.
    • If you use the read_matrix technique discussed in another blogpost, then I think dump() should use read_matrix
  • Sometimes we can use an initial high value because the DP algo will use min() to compare. I think this is fine.
  • Zero is often a reasonable-looking initial value, but a lot of times zero is a legit vale in the algorithm ! Therefore, the initial value looks like a computed value and very confusing.

 

Advertisements

With queues,say pop/append !!enqueue/dequeue

The standard phrases “enqueued/enqueuing, dequeued/dequeuing” are sometimes too long and less readable. Some texts actually use the shorter words —

  1. pop, popped, popping .. for dequeue
    • This word is used in C++ std::list::pop_front and pop_back
    • This word is used in python list.pop() and python collections.deque.popleft()
  2. append, appending, appended .. for enqueue.
    • This word is more direct, more visual.

Don’t use null/None to indicate empty

Update — [[c++codingStandard]] says we should use (smart) ptr parameter if it can be null.

Many developers return (python) None/null (java) and put such a value in a containers to positively indicate an empty value.

This practice creates unnecessary ambiguity (rather than reduce ambiguity) because many built-in modules authors use None/null when they have no choice.  If you positively return this value, it’s harder to differentiate the various scenarios. This becomes an unnecessary ambiguity when troubleshooting in production.

In Java, I often create a dummy instance of a MyType and return it to mean “empty”.  I think applicable to c++ too.

In python, I tend to return a negative number from a function that ordinarily returns  a MyType, since python functions can return type1 in ContextA but type2 in ContextB! Python list also allows unrelated types.

spreadsheet concretize #Junli Part2

Note the java algo is event-queue based — every newly concretized cell is an event added to the event queue. When we encounter this cell again after a dequeue, All registered dependents of this cell are checked. If the check results in a new cell concretized, this cell is enqueued.

In contrast, my c++ algo is a modified BFT. Key idea is, whenever a branch node can’t be concretized (due to an unresolved upstream reference) we basically ignore that node’s subtree. The other root nodes’s BFT would eventually visit this node, unless there’s a cycle.

I believe both algorithms are relatively easy to visualize at a high level. Which algo implementation is more tricky and error-prone? I guess the BFT but not really sure.

— Topo-sort — “topological sorting” is the reusable general technique for similar problems like even scheduling. As I described to Kyle, the idea is “Given a directed graph, assign (artificial) integer ranks to all nodes so that every arrow is from a low rank to a high rank”

There are linear time algorithms to assign the ranks. I think some form of BFT may work… need more thinking.

I think it depends on what’s more natural — start from leaf nodes or start from root nodes. The start level would use lower integers.

For a typical spreadsheet, I feel it’s natural to start from nodes that have no downstream.

My c++ implementation was similar to Kahn’s algorithm.

[[Algorithms]] P 550 presents an elegant DFT  algo but not so intuitive to me yet. I think it DFT can’t solve this spreadsheet.

–Some additional notes on the c++ implementation

  • 🙂 I encountered much few seg-faults than in other projects. I think it’s because very few arrays (including vectors) are used.
  • Before I look up a map/set, I always use count() to verify.
  • 🙂 I didn’t need to check against end() as lower_bound() and find() functions would require.
  • no smart ptr needed. container of raw ptr worked well. See source code comments
  • In fact, container of cell names (as strings) is even “safer”.

read_matrix(r,c)beats matrix[r][c]

Usually read_matrix(r,c) simply returns matrix[r][c] but

Such a wrapper function can

  • assertions
  • read-count + write-count on each element
  • write-once constraint enforced
  • saves typing — a(r,c) vs mat[r][c]
  • encapsulation — the matrix can be encapsulated as a local static object.
  • simplifies conditional logic when r==0 or r==c. Without this wrapper, I often need separate code to initialize those special cells
  • In python, defaultdict() can handle index-out-of-range automatically but read_matrix() also does a fine job

t_implBestPractice^t_idiom^t_c++patt^namedDesignPatt^t_c++ECT66^c++tecniq^t_workTimeSaver

  • t_c++pattern — must be a named pattern
  • t_c++idiom — must be well-established small-scale
  • t_ECT and t_c++idiom are mutually exclusive
  • t_tecniq — higher than syntaxTips, less selective than t_c++idiom or t_c++pattern but Should NOT be dumping ground
  • t_implBestPractice is like gentmp

##c++good habits like java q[final]

  • Good habit — internal linkage
    • ‘static’ keyword on file-level static variables
    • anonymous namespace
  • if (container.end() ==itrFromLower_bound) {…}
  • use typedef to improve readability of nested containers
  • typedef — “typedef int FibNum” as source documentation
  • initialize local variables upon declaration.
  • use ‘const/constexpr’ whenever it makes sense. The c++11 constexpr is even better.
  • [c++11] – use ‘override’ or ‘final’ keywords when overriding inherited virtual methods. Compiler will break if there’s nothing to override.

goto: justifications

Best-known use case: break out of deeply nested for/while blocks.

  • Alternative — extract into a function and use early return instead of goto. If you need “goto: CleanUp”, then you can extract the cleanup block into a function returning the same data type and replace the goto with “return cleanup()”
  • Alternative — extract the cleanup block into a void function and replace the goto with a call to cleanup()
  • Alternative (easiest) — exit or throw exception

Sometimes none of the alternatives are easy. To refactor the code to avoid goto requires too much testing, approval and release. The code is in a critical module in production. Laser surgery is preferred — introduce goto.

 

append _v2 to methods and classes

Relax — It’s fairly painless to use IDE to rename back and forth any
method or class.
When I feel one class is fairly complete among 20 incomplete or
“volatile” classes, I mark it someClass_v9, signifying it’s version 0.9.
When I create a mockup class I often mark it MyClass_v1. Useful when
putting together a paper aircraft so as to build out a prototype
architecture.
It helps to see these little information-radiators when so many things
are in a flux.

long constructor signature vs tons of setters

A mild code smell is a long list of (say 22) constructor parameters to set 22 fields. Alternative is 22 setters.
If some field should be final, then the ctor Pain is justified.

If some of the 22 fields are optional, then it’s good to initialize only the compulsory fields in the ctor. Optional fields can use setter.

If one field’s initialization is lengthy, then this single initialization can prolong the ctor call, during which the incomplete new-born reference can leak. Therefore, it’s perhaps safer to postpone this slow initialization to a setter.

If setter has non-trivial logic for a field, then setter is a reasonable choice.

Best benefit in the ctor route is the final keyword. With setters, we need to deny access to a field before initialization. Final fields are beautiful.

It’s often hard to know which is which looking at the 22 arguments in a ctor call (e.g. they include literals). However, a cheap sweetener is the IDE-refactor tool introduce-local-variable. In contrast, setter serves as simple documentation, but some IDE are unable to update setter names when we rename a field.

instance fields vs long argument list

One of the “mild” code smells is the habit to convert instance-method params into fields. It complicates code tracing. Much cleaner is a local var – writable only in a local scope. In contrast, a field (or global) can be updated from many places, so it’s harder to visualize data flow.

Practical Scenario – If there are 5 objects I keep passing among a bunch of instance-methods, I would consider converting them to instance fields of the class (so methods receive fewer args). As a middle ground between fields and long signatures, perhaps an immutable method object is acceptable.

static methods perfect; static fields dangerous

Hi XR,

Over the last 5 projects, I am moving into a new design direction — use static (rather than non-static) methods whenever possible, while keeping mutable static fields to a minimum.

A digression first — a note on local “auto” variables as a superior alternative to instance fields. Instance fields represent object state, which is often shared. System complexity (as measured by testing effort) is proportional to the number of stateful variables — mostly instance fields + some static fields. In contrast, Local variables including some function parameters [3] are temporary, on the stack, far more thread-friendly, and often stateless except temporary state. The best ones are scratch pad variables that become unreachable soon, so they don’t pollute, don’t linger, and has no side effects.

[3] If an object passed in as argument has a large “scope” somewhere else, then it doesn’t enjoy the simplicity of local variables.

Similar to local vars, the best static methods are stateless and thread-friendly — these methods don’t change object state i.e. instance fields or static fields. For example, most static debugger statements are stateless; most type converters are stateless; factory methods are often stateless as they create new objects; most math functions are static; many string utils are static

Static methods are often simpler. If you were to implement the same logic with instance methods, they often end up entangled in a web of dependencies.

I can easily move a static method between classes. Perfect lego blocks.

Object dependency is a major part of system complexity. Look at dependency injection frameworks. Java Swing is complex partly for this — Compared to the average non-swing app, I feel the average swing app has more objects and more inter-dependencies.

If you design objects to represent business domain entities (like accounts, requests, patients, shipments, houses..), then instance methods represent behavior of objects; static methods represent … utilities

When I look at a non-trivial OO system, and estimate the percentage of code in instance methods vs static methods, I find invariably instance methods outnumber static methods by at least 3 times (75% vs 25%). Habitually, I refactor them into static methods whenever feasible. Time well spent, as system invariably gets simpler and cleaner.

C# elevates static methods pattern into a language feature — static class

[[the art of readable code]] P98 says — to restrict access to class members, make as many methods static as possible. This let reader know “these lines of code are isolated from those variables”

Any experience to share?

create void methods

Many developers told me they seldom write void methods. Instead, their methods by default return a int/char status or something similar.

There are justifications for void methods. I think void method applies widely, not just when you have no choice.

* if you use void now, you can more easily add a return type later. More flexible.
* if you return something but don’t read it, then other programmers may get confused/mis-lead. Inevitably you end up adding comments
to explain that.
* if you read return value even though you don’t need, then that’s extra work and other programmers will feel they need to
understand what’s going on.
* for some simple methods, void makes the method body simpler. Non-void requires 1 or sometimes several return statements. What a
waste if you don’t need it at all!
* The need to indicate success/failure can be met with judicial use of runtime exceptions. Checked exeption usage requires
justification.
* IDE refactoring frequently creates void methods.

asserts for production system diagnosis

I think Assertions were added to java1.4 partly for documentation and partly to help us investigate production systems — restart the system with -ea and you may see error messages before a null pointer exception.

Put another way, without asserts enabled, the error condition (stack trace etc) might be inexplicable and baffling. You enable asserts to let the (production!) system fail earlier so you can see the early signs previously unreported/unnoticed. This is probably faster than reproducing the problem in QA.

Ideally you want to add more logging or error checking, but asserts offer some unique advantages.

de-couple – using abstract^concrete class

When authors say a client class A “depends on a concrete class” B they mean A.java instanticiates new B(). This is less advisable than using a abstract/interface. Now I think there are many hidden costs and abstract arguments.

One of the most understandable argument is testability. But in this post, we talk about another argument — the “expected services” between client object A and “service” object B.

Scenario 0: client object A instantiates new B(). The maintainers of A.java and B.java have to assume every public B.java method [1] is required. Suppose A.java and B.java are maintained in separate companies like spring^hibernate. The 2 maintainers dare not change many things in A.java or B.java. But all software need changes!

Scenario 1: Interface C.java offers more flexibility — decoupled. If A.java only uses C, then both maintainers know the services that B.java must support. A maintainer can swap in (runtime) another implementation of interface C. Likewise, B maintainer have more room for maneuver.

Analogy: I wrote HTTP clients in telnet and Perl. They follow the HTTP protocol so they can interact with any web server, thanks to a published standard on the expected behaviour of client and server. Both sides depend on the abstract HTTP protocol only. A java Interface strives for the same flexibility and decoupling.

During OO design, I think it’s fine to introduce as many interfaces as you like. I think interfaces don’t add baggage or tie us down as classes do. Remember your team lead makes one DAO interface for every dao class!

[1] Public fields, too.

best way2report bad arguments #java

Say you are writing a utility Class1.getByIndex(int arrayIndex). Best way to enforce that arrayIndex is positive?

  • throw an unchecked error and crash the jvm? Recommended by many experts. I guess the downside is tolerable, and there are no better candidates as a standard, universal solution.
    • What if too often? Caller code needs a fix!
  • throw a checked exception and force other programmers to try-catch? I think the overhead grows when you set this as standard practice.
  • return a boolean result? Many methods/constructors can’t.

Ideally, you want to remind other programmers to check their input first, but not force them to put up a useless try-catch.

baseclass dependent on subclass

A parent calls a method overridden by a subclass .. template method. A lot of plugin, callback, hollywood principal uses something like template method.

Say you have just one class. Within it, you call your own methods. Those private and final methods are fine but all other methods are virtual (ie overridable and “subject to overriding”). Problematic if this happens by accident not by design. Now I think baseclass depending on subclass is a very likely mistake.


The MIT course note (in an earlier blog) points out the danger when a super class method B.m1() calls m2(), which is defined in B, overriden in subclass C.

In this case, “dependent” means “when the subclass C.m2() changes, B may need change, since B’s behaviour is affected.”

Q: whose m2() runs?
A: the innermost onion’s m2(), in this case, C.m2()

Q: how do u specify B’s m2() to be used in m1()? B.m2 ?
A: B.m2 will probably fail. B.m2() requires m2 be static.
A: I think u need to provide a private final m2a(), and make m2() call m2a().

If your class B need to call your own method m2(), you need to decide whether to call the m2 defined in this class or a subclass (by dynamic binding) or a parent class. Not sure if it’s possible to implement your decision.

For a full example, see Example 60 [[ java precisely ]]