The noSQL products all provide some GUI/query tool, but none is very good. Piroz had to write a web GUI to show the content of Gemfire. Without a GUI it’s very hard to manage anything that’s built on Gemfire.
As data stores, even binary files are valuable.
Note snoop/capture is not a data store, but it falls in the same category as logging. Both are easily suppressed, including critical error messages.
Why is RDBMS my #1 pick? ACID requires every datum to be persistent/durable, and therefore viewable from any 3rd-party app, so we aren’t dependent on the writer application.
I feel the technology churn is remarkably low.
New low-level latency techniques come up frequently, but these topics are actually “shallow” and of low complexity for the app developer.
- epoll replacing select()? Yes, that’s churn, but much less tragic than the stories of Swing, Perl, Struts
- most of the interview topics are unchanging
- concurrency? not always needed. If needed, then often fairly simple.
I think naturally-occurring rvalue objects are rare. Strict temporary objects safe for “robbing/stealing” include:
- literals — but these objects don’t hold any resources via a heap pointer
- string1 + “.victor”
- myInventory - 5000
- myVector.push_back(Trade(12345)): there is actually a temp Trade object, and the compiler will call the rvr overload of push_back(). https://github.com/tiger40490/repo1/blob/cpp1/cpp1/rvrDemo.cpp is my investigation. My temp object actually holds a resource via a heap pointer, but this usage scenario is rare in my opinion.
However, if you have a regular nonref variable std::string myStr = "hello", you can generate an rvr variable:
std::string && rvr2 = std::move(myStr);
By using std::move(), you promise to the compiler not to use myStr object or myStr variable afterwards.
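A minimal sketch of the two lines above (the function name demoBind is mine, not from any library). Note that std::move() changes only the value category; nothing is actually moved until a move ctor or move-assignment runs:

```cpp
#include <string>
#include <utility>

// std::move() itself moves nothing: it merely casts myStr to an rvalue,
// so rvr2 below is just another name for the same string object.
std::string demoBind() {
    std::string myStr = "hello";
    std::string&& rvr2 = std::move(myStr); // rvr2 refers to myStr's object
    rvr2 += " world";                      // modifies myStr through the reference
    return myStr;                          // nothing was stolen yet
}
```

The promise not to touch myStr only starts to matter once the rvr is handed to something that really steals the buffer, such as a move ctor.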
I used to dismiss “commodity” skills like market data, risk system, J2EE… I used to prefer high-end specializations like algo-trading, quant-dev, derivative pricers.
As I get older, it makes sense to prefer market depth rather than “elite” (high-end niche) domains. A job market with depth (eg market-data) offers a large number of positions. The typical salaries of the top 10% and the median are not very different: small gaps. In contrast, the elite domains feature bigger gaps. As I grow older, I may need to reconsider the specialist vs generalist-manager choice.
Reminders about this preference (See also the spreadsheet):
- stagnation in my orgradient
- may or may not use my specialist skills in math, concurrency, algorithms, or SQL …
- robust demand
- salary probabilities(distro): mgr^NBA#marketDepth etc
–case study: Algo trading domain
The skillset overlap between HFT and other algo systems (sell-side, OTC, RFQ, automated pricing/execution ..) is questionable. So is “accumulation” across the boundary. There seems to be a formidable “dragon gate” (鲤鱼跳龙门: the carp leaping over the dragon gate).
Within C++-based HFT, accumulation is conceivable, but the job pool is so small that I worry about market depth. My friend Shanyou agreed that most of the technical requirement is latency; C/C++ latency techniques are different from java’s.
Outside HFT, the level of sophistication and latency-sensitivity varies. Given the vague definition, there are many (mostly java) jobs related to algo trading, i.e. better market depth. Demand is more robust. Less elitist.
Example — you programmed java for 6+ months, but you scored below 50% on those (basic) java knowledge questions I asked you in our Skype chat. You only learn what to study when you attend interviews. Without interviews, you won’t encounter those topics in your projects.
Example — I had used SQL for at least 3 years before I joined Goldman Sachs. Until then I had used no outer join, no self-join, no HAVING clause, no CASE, no correlated sub-query, no index tweaking. These topics were lightly used in Goldman but needed in interviews. So without interviews, I wouldn’t know to pay attention to these topics.
Example — I have programmed TCP sockets many times. The socket interview questions I got from 2010 to 2016 were fairly basic. When I came to ICE I looked a bit deeper into our socket codebase but didn’t learn anything in particular. Then my interviews started showing me the direction. Among other things, interviewers look for in-depth understanding of
· Fast/slow receivers
· Buffer overflow
How on earth can we figure out that these are the high-value TCP topics without interviews? I would say no way, even if I spent 2 years on this job.
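For what it’s worth, the receive-buffer knob behind the fast/slow-receiver discussion can be inspected with getsockopt(). A minimal POSIX sketch (the helper name recvBufferSize is mine; assumes Linux):

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Returns the kernel's default receive buffer size for a fresh TCP socket.
// A slow receiver lets this buffer fill up, which eventually stalls the
// sender via TCP flow control (zero window).
int recvBufferSize() {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int size = 0;
    socklen_t len = sizeof(size);
    getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &size, &len);
    close(fd);
    return size; // Linux typically reports ~200KB by default
}
```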
while True:
    if node is None: return None
    if node.key == k: return node.val
    node = node.next
The above exit condition is far more visible than in
while node is not None:
    if node.key == k: return node.val
    node = node.next
return None
In the 2000s, Wall Street java jobs were informally categorized into core-java vs J2EE. Nowadays “J2EE” is replaced by “full-stack” and “big-data”.
The typical core java interview requirements have remained unchanged: collections, lots of multi-threading, JVM tuning, compiler details (including keywords, generics, overriding, reflection, serialization), …, but very few add-on packages.
(With the notable exception of java collections) Those add-on packages are, by definition, not part of the “core” java language. The full-stack and big-data java jobs use plenty of add-on packages. It’s no surprise that these jobs pay on par with core-java jobs. More than 5 years ago J2EE jobs, too, used to pay on par with core-java jobs, and sometimes higher.
My long-standing preference for core-java rests on one observation: churn. The add-on packages tend to have a relatively short shelf-life; they become outdated and lose relevance. I remember some of the add-on packages:
- Hibernate, iBatis
- Servlet, JSP
- XML-related packages (more than 10)
- JMS, Tibco EMS, Solace …
- functional java
- Protobuf, json
- Gemfire, Coherence, …
- ajax integration
None of them is absolutely necessary. I have seen many enterprise java systems using only one of these add-on packages (not counting Spring).
I am curious about data scientist jobs, given my formal training in financial math and my (limited) work experience in data analysis.
I feel this role is typically a generic “analyst” position in a finance-related firm, with some job functions related to … data (!):
- some elementary statistics
- some machine-learning
- cloud infrastructure
- some hadoop cluster
- noSQL data store
- some data lake
- relational database query (or design)
- some data aggregation
- map-reduce with Hadoop or Spark or Storm
- some data mining
- some slice-n-dice
- data cleansing on a relatively high amount of raw data
- high-level python and R programming
- reporting tools ranging from enterprise reporting to smaller desktop reporting software
- spreadsheet data analysis: most end users still consider the spreadsheet the primary user interface
I feel these are indeed elements of data science, but even if we identify a job with 90% of these elements, it may not be a true-blue data scientist job. Embarrassingly, I don’t have clear criteria for a real data scientist role (there are precise definitions out there), but I feel “big-data” and “data-analytics” are so vague and so much hot air that many employers jump on the bandwagon and portray themselves as data science shops.
I worry that after I work on such a job for 2 years, I may not gain a lot of insight or add a lot of value.
———- Forwarded message ———-
Date: 22 May 2017 at 20:40
Subject: Data Specialist – Full Time Position in NYC
Data Specialist– Financial Services – NYC – Full Time
My client is an established financial services consulting company in NYC looking for a Data Specialist. You will be hands on in analyzing and drawing insight from close to 500,000 data points, as well as instrumental in developing best practices to improve the functionality of the data platform and overall capabilities. If you are interested please send an updated copy of your resume and let me know the best time and day to reach you.
As the Data Specialist, you will be tasked with delivering benchmarking and analytic products and services, improving our data and analytical capabilities, analyzing data to identify value-add trends and increasing the efficiency of our platform, a custom-built, SQL-based platform used to store, analyze, and deliver benchmarking data to internal and external constituents.
- 3-5 years’ experience, financial services and/or payments knowledge is a plus
- High proficiency in SQL programming
- High proficiency in Python programming
- High proficiency in Excel and other Microsoft Office suite products
- Proficiency with report writing tools – Report Builder experience is a plus
I feel the move ctor (and move-assignment) is extremely implicit and “in-the-fabric”. I don’t know of any common function with an rvr parameter. Such a function is usually in some library, but I don’t know any std lib function like that. Consequently, in my projects I have not seen any user-level code that shows “std::move(…)”.
Let’s look at the move ctor. “In the fabric” means it’s mostly implicit, i.e. invisible. Most of the time the move ctor is picked by the compiler based on overload-resolution rules, and I have basically no influence over it.
https://github.com/tiger40490/repo1/blob/cpp1/cpp1/rvrDemo.cpp shows when I need to call move() but it’s a contrived example — I have some object (holding a resource via heap pointer), I use it once then I don’t need it any more, so I “move” its resource into a container and abandon the crippled object.
Conclusion — as app developers I seldom write code using std::move.
- P20 [[c++ std lib]] shows myCollection.insert(std::move(x)); // where x is a local nonref variable, not a heap pointer!
- I think you do this only if x has part of its internal storage allocated on heap, and only if the type X has a move ctor.
I bet that most of the time when an app developer writes “move(…)”, she doesn’t know if the move ctor will actually get picked by the compiler. Verification needed.
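One way to do that verification: an instrumented type that counts copies vs moves (the type Probe and its counters are my own illustration, not a standard facility):

```cpp
#include <utility>
#include <vector>

// Probe records which ctor the compiler actually picks.
struct Probe {
    static int copies, moves;
    Probe() {}
    Probe(const Probe&) { ++copies; }
    Probe(Probe&&) noexcept { ++moves; } // noexcept lets vector move freely
};
int Probe::copies = 0;
int Probe::moves = 0;

void runProbe() {
    std::vector<Probe> v;
    v.reserve(2);               // avoid reallocation noise in the counts
    Probe p;
    v.push_back(p);             // copy ctor runs: p is an lvalue
    v.push_back(std::move(p));  // move ctor runs: the rvr overload was picked
}
```

Running this (or stepping through in a debugger) settles the question for any particular call site.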
–Here’s one contrived example of app developer writing std::move:
vectorOfString.push_back(std::move(myStr)); //we promise to compiler we won’t use myStr any more.
Without std::move, a copy of myStr is constructed in the vector. I call this a contrived example because
- if input is a char-array, then emplace_back() is more efficient
- if input is another string, then we can simply use push_back(input)
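The push_back scenario above, as a self-contained sketch (fillVector is my name; the long literal just ensures the string owns a heap buffer rather than sitting in the small-string optimization):

```cpp
#include <string>
#include <utility>
#include <vector>

std::vector<std::string> fillVector() {
    std::vector<std::string> v;
    std::string myStr = "a string long enough to defeat small-string optimization";
    v.push_back(myStr);            // copy ctor: myStr is untouched
    v.push_back(std::move(myStr)); // move ctor: the heap buffer is stolen;
                                   // myStr is left valid but unspecified
    return v;                      // both elements hold the same text
}
```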
Interviewers should probably understand: those simple tasks take up time and don’t demonstrate my skills.
- split string
- initialize a container with zeros
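For the record, the two “simple tasks” as I would sketch them in C++ (the function names are mine):

```cpp
#include <sstream>
#include <string>
#include <vector>

// split a string on a single-character delimiter
std::vector<std::string> splitString(const std::string& s, char delim) {
    std::vector<std::string> out;
    std::istringstream iss(s);
    std::string tok;
    while (std::getline(iss, tok, delim))
        out.push_back(tok);
    return out;
}

// initialize a container with n zeros
std::vector<int> zeros(std::size_t n) {
    return std::vector<int>(n, 0); // fill-ctor value-initializes every element
}
```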