The value-add of big-data (as an industry or skillset) == tools + models + data
- If we look at 100 big-data projects in practice, each one has all 3 elements, but 90-99% of them would have limited value-add, mostly due to the models, i.e. the exploratory research.
- Data mining probably uses similar models IMHO, but we know its value-add is not so impressive.
- tools: mostly software, but they also include cloud.
- models: the essence of the tools. Tools are invented and designed mostly for models. Models are often theoretical. Some statistical tools are tightly coupled with the models…
Fundamentally, the relationship between tools and models is similar to that between quant library technology and quant research.
- Big-data technologies (acquisition, parsing, cleansing, indexing, tagging, classifying..) are not exploratory. They are more similar to database technology than to scientific research.
- Data science is an experimental/exploratory discovery task, like other scientific research. I feel it’s somewhat academic and theoretical. As a result, salaries are not comparable to those in commercial sectors. My friend Jingsong worked with data scientists in Nokia/Microsoft.
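To make the "not exploratory" point about big-data technologies concrete, here is a minimal Python sketch of my own, not taken from any real project. The sample lines, field names, the `threshold` value and every function name are hypothetical. Each stage (parsing, cleansing, tagging, indexing) is a fixed, deterministic transformation, much like database plumbing; acquisition is stubbed out as a hard-coded list.

```python
from collections import defaultdict

# Hypothetical raw records; the "acquisition" step is stubbed out as a list.
RAW_LINES = [
    "2024-01-01, sensor-7 , 21.5",
    "2024-01-01,sensor-9,",          # missing reading
    "2024-01-02,sensor-7,22.1",
]

def parse(line):
    """Parsing: split a CSV-like line into named fields."""
    date, device, reading = (field.strip() for field in line.split(","))
    return {"date": date, "device": device, "reading": reading}

def cleanse(records):
    """Cleansing: drop records with no reading, convert the rest to float."""
    return [{**r, "reading": float(r["reading"])} for r in records if r["reading"]]

def tag(record, threshold=22.0):
    """Tagging/classifying: a fixed rule (assumed threshold), not a fitted model."""
    label = "high" if record["reading"] >= threshold else "normal"
    return {**record, "tag": label}

def build_index(records):
    """Indexing: map each device to the positions of its records."""
    index = defaultdict(list)
    for pos, r in enumerate(records):
        index[r["device"]].append(pos)
    return dict(index)

def run_pipeline(lines):
    """Run parse -> cleanse -> tag, then index the surviving records."""
    records = [tag(r) for r in cleanse(parse(line) for line in lines)]
    return records, build_index(records)

records, index = run_pipeline(RAW_LINES)
```

Nothing in this sketch involves a hypothesis or an experiment. The exploratory part would begin only when someone asks what the tagged readings actually mean, and that is the data-science side.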
The biggest improvements in recent years are in … tools.
The biggest “growth” over the last 20 years is in data. I feel user-generated data is dwarfed by machine-generated data.