The Oracle NoSQL book uses four “V”s (Volume, Velocity, Variety and Variability) to qualify any system as a big data system. I added my annotations on the last two:
- Variety of data format: If any two data formats account for more than 99% of the data in your system, then it doesn’t meet this definition. For example, FIX is one format.
- Variability in value: Does the system treat each datum equally?
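My Variety rule of thumb above can be sketched as a quick check. This is only an illustration: the function name `meets_variety`, the choice to measure each format's share by record count rather than bytes, and the sample numbers are all my assumptions, not anything from the Oracle book.

```python
from collections import Counter


def meets_variety(format_counts, top_n=2, threshold=0.99):
    """Return True if the top_n most common formats together hold
    no more than `threshold` of all records (hypothetical helper)."""
    total = sum(format_counts.values())
    if total == 0:
        return False  # assumption: an empty system cannot show variety
    top = Counter(format_counts).most_common(top_n)
    top_share = sum(count for _, count in top) / total
    return top_share <= threshold


# Made-up record counts for an exchange-data system dominated by FIX.
exchange_feed = {"FIX": 9_950_000, "CSV": 40_000, "JSON": 10_000}
# Made-up record counts for a system with a genuinely mixed intake.
mixed_intake = {"FIX": 400, "JSON": 300, "PDF": 200, "email": 100}

print(meets_variety(exchange_feed))  # False: top two formats hold 99.9%
print(meets_variety(mixed_intake))   # True: top two formats hold 70%
```

Counting by bytes instead of records could give a different answer for the same system, so the unit of measure is worth stating whenever this test is applied.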
Most of the so-called big-data systems I have seen don’t have these four V’s. All of them have some volume but none has the Variety or the Variability.
I would venture to say that
- 1% of the big-data systems today have all four V’s
- 50%+ of the big-data systems have neither Variety nor Variability
  - 90% of financial big-data systems are probably in this category
- 10% of the big-data systems have 3 of the 4 V’s
The reason these systems are considered “big data” is the big-data technologies applied to them. You might call it “big-data technologies applied to traditional data”.
Does my exchange data qualify? Definitely high volume and velocity, but no Variety or Variability.