My working definition: big data is the set of challenges and opportunities presented by large volumes of disparate (often unstructured) data.
This data has been growing for decades. What changed?
* One change in the last 10 years or so is data processing technology. As an analogy, oil sands have been known for quite a while, but the extraction technology improved only slowly before becoming commercially viable.
* Another recent change is social media, which creates lots of user-generated content. I believe this data volume is a fraction of the machine-generated data, but it’s richer and less structured.
Many people see opportunities to make use of this data. I feel its potential usefulness is somewhat /overblown/, largely due to aggressive marketing. As a comparison, consider location data from satellites and cellular networks: useful, but not life-changing.
The current crop of big data technologies is even more hyped. I remember that XML, Bluetooth, pen computing, optical fiber ... also had their time in the spotlight. I feel none of them lived up to the promise (or the hype).
What are the technologies related to big data? I only know a few: NoSQL, inexpensive data grids, Hadoop, machine learning, statistical/mathematical Python, R, cloud, data mining technologies, data warehouse technologies…
Many of these technologies had real, validated value propositions before big data. I tend to think they will confirm and prove those original value propositions in 30 years, after the fads have long passed.
As an “investor” I have a job duty to try and spot overvalued, overhyped, high-churn technologies, so I ask:
Q: Will Hadoop (or another technology in the list) become more widely used (and therefore more valuable) in 10 years, as newer technologies come and go? I’m not sure.
http://www.b-eye-network.com/view/17017 is a concise comparison of big data and data warehousing, written by a leading data warehousing expert.