Start by identifying some high-quality, flexible, working code base that’s as close to our requirement as possible. Then slowly add X) business features + Y) optimizations (on throughput, latency, etc.). I feel [Y] is harder than [X], though [X] gives higher business value. Latency tuning is seldom high-value, but data volume could be a show-stopper.
Both X/Y enhancements could benefit from the trusty old SQL database or an in-memory data store. We can also introduce MOM (message-oriented middleware). These are mature tools to help with both X and Y.
As I told many peers, my priorities as architect are 1) instrumentation 2) transparent languages 3) product maturity
GTD Show-stopper: data rate overflow. Already addressed.
GTD Show-stopper: frequent crashes. Unlikely to happen if you start with a mature working code base. Roll back to the last working version and retest incrementally. Sometimes the crash is intermittent and hard to reproduce 😦 Good luck with those.
To blast through the stone walls, you need power tools like instrumentation, debuggers … I feel these are more important to GTD than optimization skills.
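Instrumentation doesn’t have to be fancy. Here is a minimal sketch (names invented, not from any production system) of a RAII scope timer in C++ that logs how long a code path takes:

```cpp
#include <chrono>
#include <cstdio>

// Minimal RAII scope timer: logs elapsed time when the scope exits.
// A sketch of cheap, always-on instrumentation; names are illustrative.
class ScopeTimer {
public:
    explicit ScopeTimer(const char* label)
        : label_(label), start_(std::chrono::steady_clock::now()) {}
    ~ScopeTimer() {
        auto us = std::chrono::duration_cast<std::chrono::microseconds>(
                      std::chrono::steady_clock::now() - start_).count();
        std::fprintf(stderr, "[timer] %s took %lld us\n",
                     label_, static_cast<long long>(us));
    }
private:
    const char* label_;
    std::chrono::steady_clock::time_point start_;
};

int main() {
    ScopeTimer t("tick-processing");   // destructor prints on scope exit
    // ... parse + publish a market data tick here ...
}
```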
To optimize, you can also introduce a memory manager such as the ring buffer and custom allocator in TP, or the custom malloc() at Facebook. If performance doesn’t improve, just roll back, as in Rebus.
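To make the ring buffer idea concrete, here is a minimal single-producer/single-consumer ring in C++ (a generic sketch, not the actual TP or Facebook code):

```cpp
#include <atomic>
#include <cstddef>
#include <optional>

// Generic single-producer/single-consumer ring buffer -- a sketch of the
// "memory manager" idea, not any production implementation.
// Capacity must be a power of two so index wrap-around is a cheap mask.
template <typename T, std::size_t N>
class SpscRing {
    static_assert((N & (N - 1)) == 0, "N must be a power of two");
public:
    bool push(const T& item) {            // producer thread only
        auto head = head_.load(std::memory_order_relaxed);
        auto tail = tail_.load(std::memory_order_acquire);
        if (head - tail == N) return false;   // full: caller decides (drop/block)
        buf_[head & (N - 1)] = item;
        head_.store(head + 1, std::memory_order_release);
        return true;
    }
    std::optional<T> pop() {              // consumer thread only
        auto tail = tail_.load(std::memory_order_relaxed);
        auto head = head_.load(std::memory_order_acquire);
        if (head == tail) return std::nullopt;   // empty
        T item = buf_[tail & (N - 1)];
        tail_.store(tail + 1, std::memory_order_release);
        return item;
    }
private:
    T buf_[N];
    std::atomic<std::size_t> head_{0};   // next write slot
    std::atomic<std::size_t> tail_{0};   // next read slot
};
```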
For the backend, there are many high- and low-cost products, so those areas are effectively out of scope, including things like EOD PnL, position management, risk management and reporting. Ironically, many products in these domains advertise themselves as “trading platforms”. In contrast, what I consider in-scope would be the algo executor, OMS, market data engine and real-time PnL.
— The “easy route” above is probably an over-simplification, but architects must be cautiously optimistic to survive the inevitable onslaught of adversities and setbacks —
It’s possible that such a design gradually becomes outdated, like GMDS or the Perl codebase in PWM-commissions, but that happens to many architects, often through no fault of their own. The better architects may start with a more future-proof design, but more likely, the stronger architects are better at adjusting both the legacy design and the new requirements.
Ultimately, you are benchmarked against your peers in terms of how fast you figure things out and GTD….
Socket tuning? Might be required to cope with data rate. Latency is seldom a hard requirement.
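If socket tuning is needed, the first knob is usually a bigger kernel receive buffer, so bursts of market data get queued instead of dropped. A minimal sketch (the 8 MB figure is illustrative; on Linux the effective size is also capped by net.core.rmem_max):

```cpp
#include <sys/socket.h>
#include <netinet/in.h>
#include <cstdio>

// Enlarge the kernel receive buffer to absorb bursts of market data.
// The size here is illustrative, not a recommendation.
bool enlargeRecvBuffer(int sock) {
    int bytes = 8 * 1024 * 1024;
    if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) != 0) {
        std::perror("setsockopt(SO_RCVBUF)");
        return false;
    }
    return true;
}
```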
Threading? A single-threaded model is probably best; scale out with multiple processes rather than multiple threads.
Shared memory? Even though shared memory is the fastest way to move data between processes, the high-performance, high-throughput ticket plant (TP) uses TCP/multicast instead.
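For reference, subscribing to a multicast feed needs nothing beyond plain POSIX sockets. A sketch (the group address and port are placeholders, not a real feed):

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <cstdio>
#include <unistd.h>

// Join a multicast market data group -- the TCP/multicast route above.
// Group address and port are invented placeholders.
int joinFeed(const char* group = "239.1.1.1", unsigned short port = 5000) {
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock < 0) { std::perror("socket"); return -1; }

    sockaddr_in addr{};
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(port);
    if (bind(sock, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
        std::perror("bind"); close(sock); return -1;
    }

    ip_mreq mreq{};
    mreq.imr_multiaddr.s_addr = inet_addr(group);
    mreq.imr_interface.s_addr = htonl(INADDR_ANY);
    if (setsockopt(sock, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq)) < 0) {
        std::perror("IP_ADD_MEMBERSHIP"); close(sock); return -1;
    }
    return sock;   // ready for recvfrom() in the single-threaded loop
}
```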
MOM? For a high-speed market data gateway, many banks use MOM because it’s simpler and more flexible.
Inter-process data encoding? TP uses a single, simplified, FIX-like monolithic format named “CTF”. There are thousands of token types defined in a “master” data dictionary — semi-static data.
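CTF itself is proprietary, but as a rough flavor of such formats, a FIX-like tag=value codec can be sketched like this (the tag numbers and the '|' delimiter are invented for illustration):

```cpp
#include <map>
#include <sstream>
#include <string>

// A rough flavor of a FIX-like tag=value wire format. Tags and the '|'
// delimiter are invented; a real dictionary defines thousands of tokens.
using Message = std::map<int, std::string>;

std::string encode(const Message& msg) {
    std::ostringstream out;
    for (const auto& [tag, value] : msg)
        out << tag << '=' << value << '|';
    return out.str();
}

Message decode(const std::string& wire) {
    Message msg;
    std::istringstream in(wire);
    std::string field;
    while (std::getline(in, field, '|')) {
        auto eq = field.find('=');
        if (eq != std::string::npos)
            msg[std::stoi(field.substr(0, eq))] = field.substr(eq + 1);
    }
    return msg;
}

// Usage: encode({{35, "D"}, {55, "IBM"}, {44, "101.5"}})
//        yields "35=D|44=101.5|55=IBM|"
```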
Scheduled tasks? These are less common in high-speed trading engines and seldom latency-sensitive. I would rely on database jobs or java/c++ async timers. For batch tasks, I would use scripts/cron.
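For the in-process case, a bare-bones repeating async timer in C++ might look like this (a sketch only; a production engine would add prompt shutdown and error handling):

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <thread>

// A bare-bones repeating async timer, the in-process alternative to cron
// for the few scheduled tasks a trading engine does need.
class RepeatingTimer {
public:
    RepeatingTimer(std::chrono::milliseconds period, std::function<void()> task)
        : worker_([this, period, task = std::move(task)] {
              while (running_.load()) {
                  std::this_thread::sleep_for(period);
                  if (running_.load()) task();
              }
          }) {}
    // Note: may wait up to one period before the worker notices shutdown.
    ~RepeatingTimer() { running_.store(false); worker_.join(); }
private:
    std::atomic<bool> running_{true};   // declared before worker_ on purpose
    std::thread worker_;
};

// Usage (publishEodSnapshot is a hypothetical task):
//   RepeatingTimer t(std::chrono::seconds(30), [] { /* publishEodSnapshot(); */ });
```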
Testing? I would use scripts as much as possible.
eg: the GMDS architect chose memory-mapped files, which was the wrong choice. Either design still requires an exchange interface.
Data store is a must; MOM is optional.
If it crashes once a day, we could still cope. Most trading engines can shut down when the market is closed.